Sklearn, the Python library I have been studying for a couple of years now, can reduce high-dimensional data to a lower dimension, which makes it easier for a machine learning model to make predictions on it. Sklearn offers several techniques for this task, but the best known is principal component analysis, or PCA.
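As a quick sketch of what that looks like in practice, here is a minimal example of sklearn's PCA. The dataset (Iris) and the choice of two components are my own assumptions for illustration, not something taken from the video:

```python
# Minimal sketch: reducing 4-dimensional Iris data to 2 dimensions with PCA.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)      # 150 samples, 4 features
pca = PCA(n_components=2)              # keep the 2 directions of largest variance
X_reduced = pca.fit_transform(X)       # shape becomes (150, 2)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)   # fraction of variance each component retains
```

The reduced array can then be fed to any downstream model in place of the original features.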
I came across Edureka!'s video on this subject, so I decided to watch and review it. The notes I took whilst watching the video follow:
The need for PCA
High-dimensional data is complex to process because inconsistencies among its features increase computation time and make data processing and exploratory data analysis (EDA) more convoluted.
The curse of dimensionality refers to various phenomena that arise when analysing and organising data in high-dimensional spaces but do not occur in low-dimensional settings, such as the three-dimensional physical space of everyday experience. Dimensionally cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that as the dimensionality increases, the volume of the space increases so fast that the available data…
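To make the sparsity idea concrete, here is a small sketch of my own (not from the video): with a fixed number of random points in the unit hypercube, the average distance to a point's nearest neighbour grows sharply as the number of dimensions increases, which is one way of seeing how quickly the data "thins out".

```python
# Sketch: how sparse a fixed sample becomes as dimensionality grows.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n_points = 500

for d in (2, 10, 50, 100):
    X = rng.random((n_points, d))    # random points in the d-dimensional unit cube
    dists = cdist(X, X)              # pairwise Euclidean distances
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself
    print(f"{d:>3} dims: mean nearest-neighbour distance = {dists.min(axis=1).mean():.3f}")
```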