Use sklearn’s predict_proba method to predict the probability of an insurance cross-sell

4 min read · Jul 5, 2024

Every month the data science website Kaggle runs a playground competition to help budding data scientists hone their skills. This month’s competition, season 4 episode 7, concerned cross-selling in the insurance industry.

Cross-selling is the practice of selling related or complementary products to an existing customer, and it is one of the most effective methods of marketing.

I found this month’s competition somewhat difficult because the datasets were very large, with millions of rows, and so required a lot of computing power. I tried several models, but the volume of data made it very difficult to get them to work:

I tried a TensorFlow binary classifier, but the class_weight hyperparameter did not work.
I tried the ExtraTrees classifier, but the program ran out of memory, so I could not proceed.
I tried logistic regression and was able to get it to work, although with lower accuracy than the TensorFlow model.
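
To show what the logistic regression route looks like with predict_proba, here is a minimal sketch. It is not the notebook from the competition: the data is synthetic (generated with make_classification, with an assumed class imbalance of roughly 88/12), and the scaling step and class_weight="balanced" setting are illustrative choices rather than the settings actually used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the competition data (assumption: imbalanced binary target).
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.88, 0.12], random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale the features, then fit a logistic regression that reweights the rare class.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# predict_proba returns one column per class, ordered as in model.classes_.
proba = model.predict_proba(X_valid)
positive_proba = proba[:, 1]  # probability of class 1 (the cross-sell response)

auc = roc_auc_score(y_valid, positive_proba)
print(f"Validation ROC AUC: {auc:.3f}")
```

The second column of the predict_proba output is the estimated probability of the positive class, which is the kind of value a probability-based submission would contain, as opposed to the hard 0/1 labels returned by predict.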

I wrote the program in a Kaggle Jupyter notebook and saved it to my Kaggle account.

Written by Crystal X