Sometimes sklearn outperforms TensorFlow when making predictions on tabular data
Lately I have been using TensorFlow whenever I can in order to become proficient with the library. In Kaggle's Playground Series competition, season 4 episode 6, I tried to solve the problem with TensorFlow, but I only achieved an accuracy of 35% when making predictions on the test set. I could not understand why, so I looked at some of the code that other Kagglers had made public and noted that one person had achieved very good results using a tree-based algorithm.
I decided to try a tree-based algorithm myself, using sklearn's Extra Trees algorithm instead of a deep learning framework. To my astonishment, I achieved an accuracy of 82.6% once I swapped out the algorithm.
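To give a feel for what fitting an Extra Trees model looks like, here is a minimal sketch. It is not the notebook from the competition: the data is synthetic (generated with `make_classification`), and the number of trees, the split size and the random seeds are assumptions chosen purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic multi-class tabular data; a stand-in, NOT the Kaggle dataset.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=10,
    n_classes=3,
    random_state=42,
)

# Hold out 20% of the rows to estimate accuracy on unseen data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# n_estimators=200 is an illustrative choice, not a tuned value.
model = ExtraTreesClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy: {val_accuracy:.3f}")
```

The accuracy printed here says nothing about the competition data; it only shows the fit, predict and score steps that a tree-based approach involves.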
The dataset for Kaggle's playground competition, Classification with an Academic Success Dataset, can be found here: https://www.kaggle.com/competitions/playground-series-s4e6
When I created the Jupyter Notebook, I imported the libraries that I would need to run the program:
- NumPy to create arrays and perform numerical computations,
- pandas to create dataframes and process data,
- os to interact with the operating system,