History of Least Squares
Deming was once asked how a mathematical physicist becomes a statistician. His answer was simple: “I’ve always been interested in the theory of errors and least squares and I had the best teachers.”
Near the end of the 18th century, you might say that using statistics to find new planets was all the rage. In 1781 astronomers discovered Uranus, whose distance from the sun fit a relatively simple predictive formula for the planets' average distances from the sun known as the Titius-Bode Law. As far back as the 16th century, Johannes Kepler, best known for his laws of planetary motion, had suggested that there might be another mystery planet between Jupiter and Mars. In 1801 Giuseppe Piazzi, a priest and astronomer working in Palermo, Sicily, discovered this phantom planet and named it Ceres after the Roman goddess of agriculture, fertility, and motherly love. As it turned out, Piazzi's discovery was not a planet but the first asteroid ever discovered. Much later, Ceres was reclassified as a dwarf planet.
Nestled in the Ceres story is an exciting bit of statistics history. For millennia, astronomy has had to tackle the problem of measurement at its heart, which requires collecting large quantities of data. However, those measurements are limited by human error, imprecision, physical conditions, and faulty instruments. A good scientist who understands these limitations will take multiple measurements and use them to adjust the data for accuracy. This body of work is called the Theory of Errors. Over the years, astronomers, along with other disciplines, have contributed a vast set of tools (primarily formulas) for dealing with these errors. Since every process has some variation (error), sound engineering requires redundant measurement techniques. The statistical approach to these errors is to summarize a group of them as a single error index. Karl Pearson formalized this index in 1893 and named it the Standard Deviation, although variant formulas based on a similar concept had been in circulation for a couple of hundred years before that.
Standard Deviation is rarely calculated by hand, but we can illustrate it with an example of ordering pizza. There are two pizza places on the way home from work, and both serve pizza of the same quality. Assume you are in a hurry and would rather not wait long for your pizza. For the sake of this exercise, let's say you have been tracking the order times of your last ten visits to both Johnny's and Leo's. If we look at the chart below, it appears that it doesn't matter which one you choose based on the average (mean) of all the order times: both places' average order time is 412 seconds (just under 7 minutes). However, if we look a little closer, we can see that Johnny's Pizza is quite a bit more consistent, with much less variation in its order times.
This difference illustrates what is called the flaw of averages. Even though the average order time is the same at both pizza places, you would be less likely to face a long wait at Johnny's. Standard Deviation is a useful mathematical concept in this context. It measures how far each measurement typically falls from the mean, by taking the square root of the average squared deviation from the mean. The Standard Deviation of Johnny's Pizza's order times is 33 seconds, whereas Leo's Pizza's Standard Deviation is over a minute (107 seconds). With Standard Deviation, we can see statistically that Johnny's Pizza is roughly three times as consistent as Leo's. You can see this variation in the following two charts comparing Johnny's Pizza against Leo's Pizza, where the blue dashes represent the average order time.
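A minimal sketch of the Standard Deviation calculation in Python. The order times below are made up for illustration (the chart's actual data is not reproduced here), chosen so that both hypothetical lists average 412 seconds, like the two pizza places, while spreading out very differently. The helper `population_std_dev` is a hypothetical name of our own; it divides by n (the population form) and is cross-checked against the standard library's `statistics.pstdev`. The article does not say whether its 33- and 107-second figures use this form or the sample form (`statistics.stdev`, which divides by n - 1).

```python
import math
import statistics


def population_std_dev(values):
    """Square root of the average squared deviation from the mean (divides by n)."""
    if not values:
        raise ValueError("need at least one value")
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))


# Hypothetical order times in seconds; NOT the article's chart data.
# Both lists average 412 seconds but spread out very differently.
steady_times = [400, 410, 412, 415, 423]
erratic_times = [300, 350, 412, 480, 518]

for label, times in [("steady", steady_times), ("erratic", erratic_times)]:
    mean = statistics.mean(times)
    sd = population_std_dev(times)
    # Cross-check against the standard library's population standard deviation.
    assert math.isclose(sd, statistics.pstdev(times))
    print(f"{label}: mean = {mean:.0f} s, standard deviation = {sd:.1f} s")
```

Both lists report the same 412-second mean, but the standard deviations come out at roughly 7.5 and 80.3 seconds: the same flaw-of-averages pattern as Johnny's versus Leo's.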
One of the problems with Giuseppe Piazzi’s discovery was that by the time he published his findings, Ceres had moved to a position where it could not be observed for some time, so other astronomers were unable to confirm his discovery. At that point, there were no good methods for working out where it was (i.e., calculating Ceres’ orbit) from the few observations available. It was a 24-year-old German mathematician named Carl Friedrich Gauss who used a novel set of statistical methods to predict the orbital path of Ceres. By the end of 1801, astronomers were able to confirm the existence of Ceres. Gauss’s approach is known today as the method of Least Squares, which is also the standard way of fitting a Linear Regression. Earlier work on the Theory of Errors, specifically formulas similar to Standard Deviation, made it possible for Gauss to solve this complex problem.
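For reference, here is the Least Squares idea in its simplest, straight-line form (the notation is ours, not the original's). Given n observations (x_i, y_i), choose the intercept a and slope b that minimize S(a, b), the sum of squared residuals. Setting the partial derivatives of S with respect to a and b to zero gives the closed-form solution below, where \bar{x} and \bar{y} are the means of the x and y values. Gauss's orbit problem was considerably more involved than this two-parameter case, so treat this as an illustration of the principle rather than his actual computation.

```latex
\begin{aligned}
S(a, b) &= \sum_{i=1}^{n} \left( y_i - a - b\,x_i \right)^2 \\
b &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
a &= \bar{y} - b\,\bar{x}
\end{aligned}
```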
To better understand Least Squares, we can extend the pizza order time example. Imagine a new manager is hired at Leo’s to help it compete better against Johnny’s Pizza. This new manager might start tracking order times as in the previous example, but she might also begin correlating tip data with the order times, as seen in the following chart.
As a motivational strategy, the new manager decides to use Least Squares to show the team how much more they could make in tips if they improved their overall order time by 20%. Least Squares is a method that lets you make predictions from existing data. Once you have fit your existing data with what is referred to as the line of best fit, you can use that line to predict the dependent value for new data. For example, using Least Squares, we can ask how much of a tip we are likely to receive if the order time were 300 seconds. Not to get too technical here, but Least Squares calculates the intercept and slope of the line that best fits the data, and those two numbers give the probable outcome for any new value. In other words, if X is the order time, what would Y (the tip) be? Based on a Least Squares calculation from the previous chart, the probable tip would be $2.90 for a 300-second order.
The manager proposes to the team that if they can reduce their order times by 20%, they could make $25.98 over the next ten orders instead of $17.75. The following chart shows the potential increase in tips from decreasing the order times in the previous chart by 20%, with the Least Squares line used to estimate the new tip for each order.
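A minimal sketch of the manager's calculation in Python. The data is hypothetical (five made-up order times and tips, not the values in the manager's chart), and `fit_least_squares` and `predict` are illustrative helper names of our own. The fit uses the standard closed-form formulas for a straight line: the slope is the sum of the products of the x and y deviations from their means, divided by the sum of squared x deviations, and the intercept makes the line pass through the point of means.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted y on the fitted line for a new x."""
    return intercept + slope * x


# Hypothetical data (NOT the article's chart): order time in seconds vs. tip in dollars.
order_times = [300, 360, 420, 480, 540]
tips = [3.10, 2.60, 2.20, 1.70, 1.40]

intercept, slope = fit_least_squares(order_times, tips)
print(f"tip = {intercept:.2f} + ({slope:.5f}) * seconds")
print(f"predicted tip for a 300-second order: ${predict(intercept, slope, 300):.2f}")

# Manager's scenario: cut every order time by 20% and sum the predicted tips.
faster_times = [0.8 * t for t in order_times]
projected_total = sum(predict(intercept, slope, t) for t in faster_times)
print(f"actual tips: ${sum(tips):.2f}, projected after 20% cut: ${projected_total:.2f}")
```

With these made-up numbers, the fitted line predicts a tip of about $3.06 for a 300-second order, and the 20% scenario raises the projected total from $11.00 to about $14.01. Some of the shortened times (240 and 288 seconds) fall below the shortest observed order time, so those particular predictions extrapolate beyond the data.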
Least Squares is used throughout finance, economics, investing, actuarial science, medicine, biology, agriculture, and just about every other field. It is also one of the fundamentals behind modern-day cloud, AI, and machine learning work.