Use sklearn’s predict_proba method to predict the probability of an insurance cross-sell
Every month, the data science website Kaggle runs a playground competition to help budding data scientists hone their skills. This month’s competition, season 4 episode 7, concerned cross-selling in the insurance industry.
Cross-selling is the process of selling related or complementary products to an existing customer, and it is one of the most effective methods of marketing.
I found this month’s competition somewhat difficult because the datasets were very large, running to millions of rows, and therefore needed a lot of computing power. I tried several models, but the sheer volume of data made it hard to get them to work:
- I tried a TensorFlow binary classifier, but the class_weight hyperparameter did not work.
- I tried the ExtraTrees classifier, but the program ran out of memory, so I could not proceed.
- I tried logistic regression and got it to work, although with lower accuracy than the TensorFlow model.
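To show what the logistic regression route looks like with predict_proba, here is a minimal sketch. It does not use the competition data or reproduce the notebook: the synthetic dataset from make_classification, the 88/12 class split, and the class_weight="balanced" setting are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: a small synthetic, imbalanced dataset stands in for the
# real insurance data, which is far larger.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.88, 0.12], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" reweights the classes inversely to their frequency.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

# predict_proba returns one column per class, in the order of model.classes_.
proba = model.predict_proba(X_val)

# Column 1 holds the probability of the positive class (label 1).
positive_proba = proba[:, 1]

print("classes:", model.classes_)
print("validation ROC AUC:", roc_auc_score(y_val, positive_proba))
```

The key point is that predict returns hard 0/1 labels, whereas predict_proba returns a probability for each class, so the second column is the one to use when a probability of the positive outcome is wanted.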
I wrote the program in a Kaggle Jupyter Notebook and saved it to my Kaggle account.