Get stuck in to GridSearchCV to predict the survivors of the Titanic
In my last post I discussed how sklearn’s train_test_split is a helper function that splits the data into training and validation sets as part of the cross validation process, because the data being predicted on cannot be the same data the model was trained on. Once the data has been split, it must be fitted to a model, and this is where sklearn’s simplest cross validation tool can be used: it repeats the training a set number of times and reports a mean score and its standard deviation. The cross_val_score function is sklearn’s simplest form of cross validation because it does not ask the user to do anything, or to write any additional code, to improve the model’s accuracy. The link to the post I wrote can be found here:- https://medium.com/mlearning-ai/how-to-use-the-simplest-cross-validation-technique-in-python-36257a21ff83
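As a quick recap, cross_val_score can be sketched in a few lines. This is a minimal illustration using a small synthetic dataset rather than the Titanic data, with LogisticRegression chosen purely as an example estimator:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real dataset such as the Titanic passengers
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

model = LogisticRegression(max_iter=1000)

# Five-fold cross validation: the model is trained and scored five times,
# each time validating on a different fifth of the data
scores = cross_val_score(model, X, y, cv=5)

print("Mean accuracy:", scores.mean())
print("Standard deviation:", scores.std())
```

Note that cross_val_score only reports how the model performs; it does not change any of the model’s settings for you.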
In this post I will endeavour to discuss a more complex cross validation technique that can be used to tune the parameters of a model to reach an optimal accuracy. The technique I am referring to is sklearn’s GridSearchCV, a utility that performs an exhaustive search over specified parameter values for an estimator.
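The exhaustive search works like this: you supply a dictionary of candidate parameter values, and GridSearchCV cross-validates the estimator on every combination, keeping the one with the best mean score. Here is a minimal sketch on a synthetic dataset, with a RandomForestClassifier and a small illustrative grid chosen as assumptions for the example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a real dataset such as the Titanic passengers
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Candidate hyperparameter values to search over (illustrative choices):
# 2 values x 3 values = 6 combinations, each cross-validated 5 times
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best mean CV score:", search.best_score_)
```

After fitting, the search object itself behaves like the best estimator found, so you can call search.predict on new data directly.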
Hyperparameters are parameters that are not directly learnt within estimators. It is…