How to impose Principal Component Analysis on a House Price Regression

Crystal X
6 min read · Jul 28, 2021

My last several posts have concerned how to reduce the features of a dataset, in the hope of removing redundant or nonessential information, reducing noise and improving the accuracy of predictions. A recent post of mine on feature selection can be found here: https://medium.com/mlearning-ai/how-to-select-features-using-selectkbest-in-python-c5a5239969f0

One way to reduce the features of a dataset that is not strictly feature selection is principal component analysis, or PCA. PCA is a linear dimensionality reduction technique that uses the Singular Value Decomposition (SVD) of the data to project it onto a lower dimensional space. In linear algebra, SVD is a factorisation of a real or complex matrix. It generalises the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any m × n matrix.
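To make the SVD idea concrete, here is a minimal sketch using NumPy. The matrix `A` is random data invented purely for illustration, and the variable names are my own. It factorises an m × n matrix, rebuilds it from the factors, and checks the link to eigendecomposition: the squared singular values of `A` should match the eigenvalues of the symmetric matrix `A.T @ A`.

```python
import numpy as np

# Hypothetical example data: an arbitrary m x n matrix (m=6, n=4)
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The three factors reproduce A up to floating-point error
A_rebuilt = U @ np.diag(s) @ Vt
reconstruction_ok = np.allclose(A, A_rebuilt)

# Link to eigendecomposition: the squared singular values of A are the
# eigenvalues of the square symmetric matrix A.T @ A.
# eigvalsh returns eigenvalues in ascending order, so reverse them.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
eigen_link_ok = np.allclose(s**2, eigvals)

print(reconstruction_ok, eigen_link_ok)
```

Note that `A` itself is not square, so it has no eigendecomposition of its own. The SVD still exists, which is what makes it usable on an ordinary data table with more rows than columns.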

PCA decomposes a multivariate dataset into a set of successive orthogonal components, each of which explains the maximum amount of the remaining variance. The input data is centred, but not scaled, for each feature before the SVD is applied.
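The steps just described can be sketched by hand with NumPy. This is an illustration under my own assumptions, not the exact code a library runs: the toy data, the choice of `k = 2` and all variable names are invented for the example. The data is centred (not scaled), the SVD is taken, and the rows of `Vt` serve as the orthogonal components.

```python
import numpy as np

# Hypothetical toy data: 200 rows, 5 correlated features driven by 2 latent factors
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

# Centre each feature by subtracting its mean; no rescaling is done
X_centred = X - X.mean(axis=0)

# SVD of the centred data
U, s, Vt = np.linalg.svd(X_centred, full_matrices=False)

# Rows of Vt are the principal directions, ordered by variance explained
components = Vt
explained_variance = s**2 / (X.shape[0] - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

# Keep the first k components and project the data onto them
k = 2  # assumed number of components for this sketch
X_reduced = X_centred @ components[:k].T

print(X_reduced.shape)
print(explained_variance_ratio.round(3))
```

Because the data is only centred, a feature measured in large units can dominate the first components, which is why features on very different scales are often standardised before PCA.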

In this post I will illustrate how PCA can be used to reduce the dimensionality of a dataset with 79 features, the Ames House Price dataset. This dataset can be found on the Kaggle website under the competitions section. I have made copious submissions…

Crystal X

I have over five decades of experience in the world of work, spanning fast food, the military, business, non-profits, and the healthcare sector.