Use seaborn to create a barplot of feature importances in sklearn’s Random Forest model
Although I have been studying TensorFlow and use it for most of my machine learning work, there are some instances when sklearn can outperform the deep learning library. For instance, I recently worked on a census dataset and found that sklearn’s Random Forest Classifier outperformed TensorFlow at predicting salaries. The dataset in question can be found here: https://www.kaggle.com/competitions/minim-al-census-income/leaderboard
In that particular competition, I began with TensorFlow’s Sequential model, but I also tried sklearn’s Random Forest Classifier. To my surprise, Random Forest made significantly better predictions than TensorFlow. The lesson, therefore, is to try several different models on a dataset and select the one that yields the highest accuracy.
One thing that sklearn can do that I have not observed in TensorFlow is print out the feature importances of a fitted model. Feature importance in machine learning refers to a technique for determining the relevance or contribution of each feature in a predictive model. It helps us understand which features have the most significant impact on the model’s predictions.
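To make that concrete, here is a minimal sketch of reading feature importances from a fitted Random Forest. It is not the code from the competition: the data is synthetic (generated with sklearn’s `make_classification`), and the names `feature_0` to `feature_4` are hypothetical placeholders standing in for real census columns.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption, not the census dataset):
# 5 features, of which 2 actually carry signal.
X, y = make_classification(
    n_samples=500,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# feature_importances_ holds one impurity-based score per feature;
# the scores are normalized so that they sum to 1.
importances = model.feature_importances_

# Pair each name with its score and print from most to least important.
ranked = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The sorted name and score pairs are also a convenient input for a bar chart, since they are already in the order you would want the bars to appear.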