I was quite surprised to open up my email account yesterday and find that I have won not one, but two bronze medals on the same Kaggle competition. The link to the code that I used to enter the competition can be found here:- https://medium.com/@tracyrenee61/use-sklearns-extra-trees-to-predict-on-used-car-prices-65c68d105db8

This competition question involved predicting on the price of used cars, but the main stumbling block in this exercise was the fact that the features in the train and test set were not from the same distribution. Therefore, it was going to be very difficult to progress up the leaderboard with a situation like that. In real life situations, the features in the test set may not be from the same distribution as the training set, so perhaps this competition question is descriptive of a real world situation.

Linear Regression

The first model that I used to solve this question was sklearn’s linear regression model.

If the variables in a regression model contain polynomial terms or interaction terms then it is important to standardise the data. The dataset did not look to me like it had any…