Within the context of a linear regression model, Analysis of Variance (ANOVA) helps to analyse the overall fit of a model. When applied to linear regression, the ANOVA procedure decomposes the total variation in the dependent variable into components attributable to the model and its residuals. This helps to determine how well the regression model explains the variation in the data.
The key components of the ANOVA in this instance are:
- The Total Sum of Squares (SST) is the total variation in the dependent variable, y.
- The Regression Sum of Squares (SSR) is the variation explained by the regression model.
- The Residual Sum of Squares (SSE) is the variation not explained by the regression model.
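For a least-squares fit that includes an intercept, these three quantities are tied together by SST = SSR + SSE. As a rough illustration, the sketch below computes each one by hand with NumPy. The `x` and `y` values are made up for this example and are not the PlantGrowth data, and `np.polyfit` is used here only as a convenient way to fit a straight line.

```python
import numpy as np

# Toy data invented for illustration only (not the PlantGrowth dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a straight line by ordinary least squares.
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x  # fitted values

sst = np.sum((y - y.mean()) ** 2)      # total variation in y
ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the model
sse = np.sum((y - y_hat) ** 2)         # variation left in the residuals

# Share of the total variation explained by the model (R-squared)
r_squared = ssr / sst

print(f"SST = {sst:.4f}")
print(f"SSR = {ssr:.4f}")
print(f"SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
print(f"R-squared = {r_squared:.4f}")
```

The closer SSR is to SST, the more of the variation in y the model accounts for, which is what the ratio SSR / SST summarises.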
In order to demonstrate how ANOVA works within the context of a linear regression model, I have performed one on the PlantGrowth dataset in Python using Google Colab.
The first thing that I did was to import the libraries that I would need:
- Pandas to create the dataset and manipulate data,
- Numpy to perform numerical computations,
- Statsmodels to provide statistical functionality,
- Matplotlib to visualise the data, and