Ordinary Least Squares (OLS) is a linear regression technique, a simple yet powerful method for analyzing data. It is often the first step in a more complex analytics journey. The core concepts within OLS are not advanced in nature; they are part of high school mathematics.
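In simple terms, the Ordinary Least Squares method tries to find a relationship or pattern between two variables. It uses the sample data set, including the mean value of each variable, to compute the equation of a straight line, chosen so that the sum of the squared distances between the actual values and the values predicted by the line is as small as possible. As simple as it sounds, the OLS method is effective when the relationship between the two variables under scrutiny is linear in nature. Mathematically, the output is the equation of a straight line, commonly written as y = a + bx, where a is the intercept and b is the slope.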
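To make the idea concrete, here is a minimal sketch of how the intercept and slope of the line y = a + b*x can be computed with nothing more than means and sums. The function name `fit_ols` and the sample numbers are illustrative assumptions, not something taken from a particular tool.

```python
def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Illustrative data only (hypothetical, not from a real data set).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_ols(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The slope is the ratio of two sums built around the means, which is why the method needs no mathematics beyond the high school level.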
The term variable is generic and has slightly different interpretations depending on context.
Human brains are attuned to linear pattern matching and try to model many things in a rational fashion in an irrational world. From an analysis perspective, the two variables can be either a dimension and a measure, or two measures. In the dimension-with-measure scenario, only a dimension of a linear nature (e.g. time) makes an effective use case, typically for predicting an overall trend. In the measure-with-measure scenario, any two measures can be used to analyze the relationship between them (e.g. stock price vs. overall market index price). Interpretation relies on the following factors:
- Overall fit – Signifies how close the data points lie to the equation; it is commonly reported as R-squared. Values closer to 100% signify a better outcome.
- Error coefficient (Residual) – Reflects the degree of scatter and the impact of outliers in the model. Values closer to zero signify a better outcome.
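Both factors can be computed by hand once a line has been fitted. The sketch below assumes the overall fit is measured as R-squared and that the residual is simply the actual value minus the predicted value. The function names, the sample data, and the coefficients (0.05 and 1.99, the least-squares values for this illustrative data) are assumptions made for the example.

```python
def residuals(x, y, intercept, slope):
    """Actual value minus the value predicted by y = intercept + slope * x."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def r_squared(y, resid):
    """Share of the variation in y that the fitted line accounts for."""
    mean_y = sum(y) / len(y)
    ss_res = sum(r ** 2 for r in resid)            # variation left unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)   # total variation around the mean
    return 1 - ss_res / ss_tot


# Illustrative data and an already fitted line (assumed values).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = 0.05, 1.99

resid = residuals(x, y, intercept, slope)
fit = r_squared(y, resid)
print("residuals:", [round(r, 2) for r in resid])
print(f"overall fit: {fit:.1%}")
```

A single large residual in the list usually points at an outlier, which is worth inspecting before trusting the overall fit.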
A tight fit with an error coefficient near zero suggests that the equation describes the data well and can be used to predict values outside the data set, provided the same linear relationship continues to hold there. Tools such as R and Python have out-of-the-box features for creating visualizations. The best way to visualize the result is a scatter plot that shows the actual data together with the output generated by the OLS equation.
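As a rough sketch of such a plot in Python, the example below assumes the third-party matplotlib library is installed and that a line has already been fitted. The helper names `fitted_values` and `plot_fit`, the output file name, and the sample numbers are all hypothetical.

```python
def fitted_values(x, intercept, slope):
    """Values predicted by the line y = intercept + slope * x."""
    return [intercept + slope * xi for xi in x]


def plot_fit(x, y, intercept, slope, path="ols_fit.png"):
    """Save a scatter plot of the actual data with the OLS line drawn over it."""
    # Imported here so the helper above works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file instead of opening a window
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y, label="Actual data")
    ax.plot(x, fitted_values(x, intercept, slope), color="red", label="OLS line")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Illustrative data and an already fitted line (assumed values).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = 0.05, 1.99
```

Calling `plot_fit(x, y, intercept, slope)` writes the image to `ols_fit.png`. Points that sit far from the red line are the ones contributing most to the error.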
Note: OLS can be extended to analyze more than two variables, which is referred to as multiple regression analysis (often loosely called multivariate regression).
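For the curious, here is one possible sketch of that extension in plain Python. It assumes the model y = b0 + b1*x1 + ... + bk*xk and solves the so-called normal equations with basic elimination. The function name `fit_multiple_ols` and the data are made up for illustration.

```python
def fit_multiple_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk and return [b0, b1, ..., bk].

    rows is a list of observations, each a list of k predictor values.
    """
    # Prepend a constant 1 to every row so that b0 acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution, from the last coefficient to the first.
    coeffs = [0.0] * p
    for i in range(p - 1, -1, -1):
        known = sum(A[i][j] * coeffs[j] for j in range(i + 1, p))
        coeffs[i] = (A[i][p] - known) / A[i][i]
    return coeffs


# Illustrative data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [[1, 1], [2, 1], [3, 2], [4, 3], [5, 5]]
y = [6, 8, 13, 18, 26]
coeffs = fit_multiple_ols(rows, y)
print([round(c, 6) for c in coeffs])
```

In practice a library routine such as NumPy's `numpy.linalg.lstsq` is usually a more numerically robust choice than hand-written elimination.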