Interview Question: Calculate log loss on the Churn dataset

Tracyrenee
4 min read · Nov 16, 2023

I think it is a good idea to go over interview questions and to work through them because there are likely to be occasions when a concept comes up that the student is not familiar with. For example, I had not really encountered log loss before working on this interview question.

The interview question is to build a logistic regression model on the Customer Churn dataset, with the dependent variable being Churn and the independent variable being MonthlyCharges. The task in this interview question is to find the log loss for the model.

Log loss, also known as logarithmic loss or cross-entropy loss, is a common evaluation metric for binary classification models. It measures the performance of a model by quantifying the difference between predicted probabilities and actual values. Log loss indicates how close each predicted probability is to the corresponding true value, penalising a prediction more heavily the further it is from that value. Lower log loss indicates better model performance.
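
To make the penalty concrete, here is a minimal sketch of the contribution of a single prediction, which is the negative log of the probability the model gave to the true class. The function name sample_log_loss and the three probabilities are made up for illustration.

```python
import math

def sample_log_loss(y_true, p):
    """Log loss contribution of one prediction; p is the predicted P(y = 1)."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# The true label is 1 in every case; only the predicted probability changes.
for p in (0.9, 0.6, 0.1):
    print(f"predicted P(y=1) = {p}: loss = {sample_log_loss(1, p):.3f}")
# predicted P(y=1) = 0.9: loss = 0.105
# predicted P(y=1) = 0.6: loss = 0.511
# predicted P(y=1) = 0.1: loss = 2.303
```

A confident prediction that turns out to be wrong (0.1 when the true label is 1) costs far more than a hesitant one, which is why log loss rewards well-calibrated probabilities.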

Log loss is one of the most widely used classification metrics based on probabilities. Raw log loss values are difficult to interpret on their own, but the metric is still a good one for comparing models on the same problem, where a lower log loss value means better predictions.

The formula for computing log loss is:-
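
For binary classification, the standard textbook form of log loss can be written as below. The symbols are introduced here for illustration: N is the number of observations, y_i is the true label (0 or 1) of observation i, p_i is the predicted probability that y_i equals 1, and log is the natural logarithm.

```latex
\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]
```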

Fortunately, Python’s machine learning library, sklearn, has a function, log_loss in the sklearn.metrics module, that calculates the log loss of a classification model, so this is what will be used in this blog post.
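
As a rough sketch of how the pieces fit together, the snippet below fits a logistic regression on one feature and passes the predicted probabilities to log_loss. It does not use the real Customer Churn file: the data is synthetic, the link between charges and churn is an assumption made up for illustration, and the model is scored on the same data it was trained on to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Synthetic stand-in for the Customer Churn data (NOT the real dataset).
rng = np.random.default_rng(0)
monthly_charges = rng.uniform(20, 120, size=500)

# Assumed relationship, for illustration only: higher charges, more churn.
churn_prob = 1 / (1 + np.exp(-(monthly_charges - 70) / 20))
churn = (rng.uniform(size=500) < churn_prob).astype(int)

# sklearn expects a 2-D feature matrix, even with a single feature.
X = monthly_charges.reshape(-1, 1)  # plays the role of MonthlyCharges
y = churn                           # plays the role of Churn (1 = churned)

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns one column per class; column 1 is P(Churn = 1).
proba = model.predict_proba(X)[:, 1]

loss = log_loss(y, proba)
print(f"Log loss: {loss:.4f}")
```

Note that log_loss needs predicted probabilities from predict_proba, not the hard 0/1 labels returned by predict.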

I have written the Python program for this interview question in Google Colab, which is a free, online Jupyter Notebook hosted by Google. I have to say that Google Colab is a great platform to use to write code in Python, with the only exception being that it does not have an undo function. It is important, therefore, not to accidentally overwrite or delete valuable code because it is possible that it may not be retrievable.

The first thing that I did was to import the libraries that I would need to execute the program, being:-

  1. Numpy to create numpy arrays and carry out numerical computations,
  2. Pandas to create dataframes and process data,
  3. Matplotlib to…

