I enjoy working on Kaggle competitions because they help me improve my machine learning skills. The competition I am working on in this blog post is season 4, episode 9, and the task is to predict the prices of used cars.
To complete the competition, I tried a few models, including sklearn’s linear regression, TensorFlow’s linear regression, and sklearn’s extra trees. I finally settled on sklearn’s extra trees with feature selection.
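To give a feel for what extra trees with feature selection can look like, here is a minimal sketch. It is not the notebook's actual code: the synthetic data from `make_regression`, the use of `SelectFromModel` with a median threshold as the selection method, and all hyperparameters are assumptions chosen purely for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the used car data (assumption: the real features differ)
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = Pipeline([
    # Step 1: keep the features whose importance is at or above the median
    ("select", SelectFromModel(
        ExtraTreesRegressor(n_estimators=50, random_state=0),
        threshold="median")),
    # Step 2: fit the final extra trees regressor on the kept features only
    ("regress", ExtraTreesRegressor(n_estimators=100, random_state=0)),
])

model.fit(X_train, y_train)
preds = model.predict(X_val)

# Count how many of the 10 features survived the selection step
n_kept = int(model.named_steps["select"].get_support().sum())
print("features kept:", n_kept)
print("validation R^2:", model.score(X_val, y_val))
```

Wrapping both steps in a `Pipeline` means the feature selection is fitted on the training data only, which avoids leaking information from the validation set.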
I have written the program in Python, using Kaggle’s Jupyter Notebook, and saved it to my Kaggle account.
When I created the Jupyter Notebook, the first thing I did was import the libraries I would need to run it:
- NumPy to create arrays and perform numerical computations,
- Pandas to create series and dataframes, and to process data,
- os to interact with the operating system and list all the files in the competition,
- SciPy to perform scientific calculations,
- Sklearn to provide machine learning functionality,
- Matplotlib to visualise the data, and
- Seaborn to create statistical visualisations of the data.
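The imports listed above can be sketched as follows. The aliases (`np`, `pd`, `plt`, `sns`) are the usual community conventions rather than anything specific to this notebook, and the `input_files` name is only illustrative. The loop assumes the standard Kaggle layout, where competition data is mounted under `/kaggle/input`; outside Kaggle that folder will not exist and the list simply stays empty.

```python
import os

import numpy as np
import pandas as pd
import scipy
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns

# Walk the Kaggle input folder and collect the path of every competition file
input_files = []
for dirname, _, filenames in os.walk("/kaggle/input"):
    for filename in filenames:
        input_files.append(os.path.join(dirname, filename))

print(input_files)
```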