Use sklearn’s predict_proba method to predict the probability of an insurance cross-sell

Crystal X
4 min read · Jul 5, 2024

Every month, the data science website Kaggle runs a playground competition to help budding data scientists hone their skills. This month’s competition, season 4 episode 7, concerned cross-selling in the insurance industry.

Cross-selling is the process of selling related or complementary products to an existing customer, and it is one of the most effective methods of marketing.

I found this month’s competition somewhat difficult because the datasets were very large, running to millions of rows, and so required a lot of computing power. I tried several models, but the volume of data made it very difficult to get them to work:

  1. I tried a TensorFlow binary classifier, but the class_weight parameter did not work.
  2. I tried the ExtraTrees classifier, but the program ran out of memory, so I could not proceed.
  3. I tried logistic regression and was able to get it to work, although with less accuracy than the TensorFlow model.
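
To illustrate the logistic regression route, here is a minimal sketch of how predict_proba can be used to get a probability for each customer instead of a hard yes/no label. It is not the code from the competition notebook: the data is synthetic (generated with make_classification), and the scaling step, the class_weight="balanced" setting, and the ROC AUC score are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the competition data (assumption: an imbalanced
# binary target, with roughly 12% of customers in the positive class).
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    n_informative=5,
    weights=[0.88, 0.12],
    random_state=0,
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale the features, then fit a logistic regression.
# class_weight="balanced" reweights the classes by inverse frequency.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# predict_proba returns one column per class; column 1 holds the
# estimated probability of the positive class (label 1).
proba = model.predict_proba(X_valid)
positive_proba = proba[:, 1]

# Score the probabilities directly, without thresholding them first.
valid_auc = roc_auc_score(y_valid, positive_proba)
print(f"Validation ROC AUC: {valid_auc:.3f}")
```

On a real submission, the same positive_proba column would be computed on the test set and written out as the predicted probability for each row.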

I have created the program using Kaggle’s Jupyter Notebook and saved it in my Kaggle account.

Written by Crystal X

I have over five decades of experience in the world of work, including fast food, the military, business, non-profits, and the healthcare sector.