Cross validation, sometimes called rotation estimation or out-of-sample testing, is any of various model validation techniques for assessing how the results of a statistical analysis will generalise to an independent dataset. Cross validation is a resampling method that uses different portions of the data to test and train a model in different iterations. It is mainly used in settings where the goal is prediction and the user wants to estimate how accurately a predictive model will perform in practice.
In a prediction problem, a model is usually trained and fitted on a dataset of known data. A dataset of unknown data, the validation or test set, is then passed to the fitted estimator, which generates predictions for it.
The goal of cross validation is to test the model’s ability to make predictions on new data that was not used in the estimation process, in order to identify problems such as overfitting or selection bias and to give insight into how the model will generalise to an unknown dataset.
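As a rough illustration of this rotation idea, the sketch below splits a toy set of ten samples into five folds and uses each fold in turn as the held-out test portion. The sample values and the fold count are purely illustrative assumptions, not taken from any real dataset.

```python
# A minimal sketch of k-fold rotation, assuming a toy list of 10 samples
# and k = 5 folds; the values and fold count are illustrative only.
samples = list(range(10))          # stand-in for a real dataset
k = 5
fold_size = len(samples) // k

for i in range(k):
    # The i-th slice is held out as the test portion in this iteration
    test = samples[i * fold_size : (i + 1) * fold_size]
    # Everything else is used to train the model in this iteration
    train = samples[: i * fold_size] + samples[(i + 1) * fold_size :]
    print(f"iteration {i}: train on {train}, test on {test}")
```

Each sample ends up in the test portion exactly once, which is what lets cross validation use all of the data for both training and testing across the iterations.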
Cross Validation and sklearn
Scikit-learn (sklearn), a machine learning library for Python, provides several tools that can be used for cross validation.
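For example, cross_val_score from sklearn.model_selection performs the rotation described above automatically. The sketch below assumes the built-in iris dataset and a LogisticRegression estimator, neither of which is specified above; they simply stand in for any data and model.

```python
# A minimal sketch using sklearn's model_selection tools; the iris data
# and LogisticRegression estimator are assumptions for illustration.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cross_val_score rotates through 5 train/test splits and returns
# one accuracy score per iteration.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```

The returned array holds one score per fold, and their mean is a more robust estimate of out-of-sample performance than a single train/test split.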
It is not acceptable to make predictions on data that the model has already been trained and fitted on…
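One common way to respect this, sketched below with the same assumed iris data and LogisticRegression estimator as above, is to hold out a separate test set with train_test_split and score the model only on data it never saw during fitting.

```python
# A minimal sketch, assuming iris data, of keeping a held-out test set
# so predictions are never scored on the training data itself.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Evaluate only on data the model has never seen during fitting.
print(model.score(X_test, y_test))
```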