Use logistic regression to predict multiclass probabilities on Kaggle’s cirrhosis dataset
According to the NHS website, cirrhosis is scarring of the liver caused by continuous, long-term liver damage. Scar tissue replaces healthy tissue in the liver and prevents the liver from working properly. The damage caused by cirrhosis can’t be reversed and can eventually become so extensive that your liver stops functioning.
While many people assume that cirrhosis is caused only by drug and alcohol abuse, it can also have other causes, such as overeating. Non-alcoholic fatty liver disease (NAFLD) is the term for a range of conditions caused by a build-up of fat in the liver. It’s usually seen in people who are overweight or obese. Early-stage NAFLD does not usually cause any harm, but if it gets worse it can lead to serious liver damage, including cirrhosis.
In Kaggle’s Playground Series competition (Season 3, Episode 26), the task is to predict, for patients who have cirrhosis, the probability of each class of a multiclass outcome label. The label has three classes: C (censored), CL (censored due to liver transplant), and D (death).
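To make the task concrete, here is a minimal sketch of multinomial logistic regression that returns one probability per class. It is not the author’s notebook: it assumes scikit-learn is installed and uses synthetic data from `make_classification` in place of the competition’s training file, with the integer targets relabelled C, CL, and D.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the competition's numeric features
# (assumption: the real Kaggle data is not loaded here).
X, y_int = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
# Map the integer targets onto the competition's three class labels.
y = np.array(["C", "CL", "D"])[y_int]

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize features, then fit logistic regression. With the default
# lbfgs solver, recent scikit-learn versions fit a single multinomial
# (softmax) model over all three classes.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# One column per class, ordered as in model.classes_; each row sums to 1.
proba = model.predict_proba(X_valid)
print(model.classes_)  # ['C' 'CL' 'D']
print(proba[:3].round(3))
print("validation log loss:", log_loss(y_valid, proba, labels=model.classes_))
```

The column order of `predict_proba` follows `model.classes_` (sorted labels), so it is worth checking that order before writing probabilities into a submission file.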
The metric used to score competition entries is log loss. Log loss, also known as logarithmic loss or cross-entropy loss, is a common evaluation metric for classification models, both binary and multiclass. It measures the performance of a model by…
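For the multiclass case, log loss is usually written as −(1/N) Σ_i Σ_j y_ij · log(p_ij), where N is the number of samples, y_ij is 1 if sample i belongs to class j and 0 otherwise, and p_ij is the predicted probability of class j for sample i. The sketch below computes it from scratch with NumPy. The function name `multiclass_log_loss` is hypothetical, and the row rescaling and clipping at `eps=1e-15` are assumed conventions (common on Kaggle), not details confirmed by this article.

```python
import numpy as np

def multiclass_log_loss(y_true, proba, classes, eps=1e-15):
    """Mean multiclass log loss (cross-entropy).

    y_true  : sequence of class labels, e.g. ["C", "D", ...]
    proba   : array of shape (n_samples, n_classes), columns ordered as `classes`
    eps     : clipping bound that keeps log() finite (assumed convention)
    """
    proba = np.asarray(proba, dtype=float)
    # Rescale each row to sum to 1, then clip away exact 0s and 1s.
    proba = proba / proba.sum(axis=1, keepdims=True)
    proba = np.clip(proba, eps, 1 - eps)
    # Since y_ij is one-hot, only the probability assigned to each
    # sample's true class contributes to the double sum.
    index = {c: j for j, c in enumerate(classes)}
    rows = np.arange(len(y_true))
    cols = np.array([index[c] for c in y_true])
    return float(-np.mean(np.log(proba[rows, cols])))

classes = ["C", "CL", "D"]
y_true = ["C", "D", "CL"]
proba = [[0.7, 0.1, 0.2],
         [0.2, 0.2, 0.6],
         [0.3, 0.5, 0.2]]
loss = multiclass_log_loss(y_true, proba, classes)
print(round(loss, 4))  # 0.5202
```

Because of the logarithm, a single confident mistake dominates the score: predicting probability 1.0 for C when the true class is D contributes about 34.5 (that is, −ln 10⁻¹⁵) after clipping, so well-calibrated probabilities matter more here than hard class labels.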