
A comparison of logistic regression versus decision tree in a heart disease dataset

5 min read · Sep 17, 2022

I love working on different datasets because each one has a different story to tell. In this blog post I work with a dataset that indicates whether or not a person has heart disease. Because this is a binary classification problem, in addition to predicting whether a person has heart disease, I have also included goodness-of-fit metrics that show how well each model fits the data. I have made predictions with two models:

Logistic regression, and
Decision tree classifier
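The post does not show its code, so the following is only a minimal, dependency-free sketch of what the two model types do, assuming a hypothetical one-feature toy dataset (the values of `X` and `y` are invented, not taken from the heart disease data). Logistic regression is fit by plain gradient descent on log-loss, and the decision tree is cut down to a single split (a depth-1 "stump" chosen by Gini impurity). In practice one would more likely use scikit-learn's `LogisticRegression` and `DecisionTreeClassifier`; the helper names here (`fit_logistic`, `fit_stump`, `log_loss`, `accuracy`) are hypothetical.

```python
import math

# Hypothetical toy data (invented): one numeric feature x and a binary
# label y, where 1 = disease and 0 = no disease.
X = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on mean log-loss."""
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(X, y)]
        w -= lr * sum(e * xi for e, xi in zip(errors, X)) / n
        b -= lr * sum(errors) / n
    return w, b


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(X, y):
    """Depth-1 decision tree: choose the split x <= t with the lowest weighted Gini impurity."""
    best = None
    for t in sorted(set(X))[:-1]:  # the largest value would leave the right side empty
        left = [yi for xi, yi in zip(X, y) if xi <= t]
        right = [yi for xi, yi in zip(X, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            # Each leaf predicts the fraction of positives that fall into it
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    _, t, p_left, p_right = best
    return t, p_left, p_right


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for yt, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= yt * math.log(p) + (1 - yt) * math.log(1 - p)
    return total / len(y_true)


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of correct 0/1 predictions when predicting 1 for p >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(pr == yt for pr, yt in zip(preds, y_true)) / len(y_true)


w, b = fit_logistic(X, y)
t, p_left, p_right = fit_stump(X, y)

logit_probs = [sigmoid(w * xi + b) for xi in X]
tree_probs = [p_left if xi <= t else p_right for xi in X]

for name, probs in (("logistic regression", logit_probs), ("decision stump", tree_probs)):
    print(name, "accuracy:", accuracy(y, probs), "log-loss:", round(log_loss(y, probs), 4))
```

Log-loss is used here as one example of a goodness-of-fit measure for probabilistic predictions; it is an assumption, not necessarily the metric the author reported. Note that the stump produces only two distinct probabilities (one per leaf), while logistic regression produces a smooth curve that increases with x; this difference shows up directly in the shape of each model's ROC curve.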

In this post I have included a ROC curve, a diagnostic tool that helps interpret probabilistic forecasts in binary classification predictive modelling problems. ROC curves summarise the trade-off between the true positive rate and the false positive rate of a predictive model across different probability thresholds. ROC curves are most appropriate when the observations in the dataset are roughly balanced between the two classes; on heavily imbalanced data, a precision-recall curve is often more informative.
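To make the threshold trade-off concrete, here is a small pure-Python sketch that turns predicted probabilities into confusion-matrix counts at a given threshold and reports the two rates. The labels and scores are invented for illustration, and `confusion_counts` and `rates` are hypothetical helpers, not part of any library. The sketch assumes both classes are present; otherwise one of the divisions is by zero.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting 1 for score >= threshold."""
    tp = fp = tn = fn = 0
    for yt, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and yt == 1:
            tp += 1
        elif pred == 1 and yt == 0:
            fp += 1
        elif pred == 0 and yt == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(y_true, scores, threshold):
    """Return (true positive rate, false positive rate) at one threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    tpr = tp / (tp + fn)  # sensitivity, the "hit rate"
    fpr = fp / (fp + tn)  # the "false alarm rate"
    return tpr, fpr


# Hypothetical labels and predicted probabilities, invented for illustration
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.8, 0.45, 0.6, 0.7, 0.9]

for t in (0.2, 0.5, 0.75):
    print(t, rates(y_true, scores, t))
# 0.2 (1.0, 0.75)
# 0.5 (0.75, 0.25)
# 0.75 (0.5, 0.0)
```

Raising the threshold from 0.2 to 0.75 drives the false positive rate down to zero, but the true positive rate falls from 1.0 to 0.5 along the way: that is the trade-off a ROC curve traces out.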

The receiver operating characteristic (ROC) curve is a plot of the true positive rate (y-axis) against the false positive rate (x-axis) for a number of candidate thresholds between 0 and 1. In other words, it plots the hit rate against the false alarm rate.
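A ROC curve can be built by sweeping the threshold over a grid of candidates between 0 and 1 and recording one (FPR, TPR) point per threshold; the area under the resulting curve (AUC) is a common single-number summary. Below is a minimal pure-Python sketch, assuming invented labels and scores, a 101-point threshold grid, and the trapezoidal rule for the area; `roc_points` and `auc_trapezoid` are hypothetical names. Library functions such as scikit-learn's `roc_curve` and `roc_auc_score` compute the same quantities using the distinct score values as candidate thresholds.

```python
def roc_points(y_true, scores, thresholds):
    """Return (FPR, TPR) pairs, one per candidate threshold, sorted along the curve."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for yt, s in zip(y_true, scores) if s >= t and yt == 1)
        fp = sum(1 for yt, s in zip(y_true, scores) if s >= t and yt == 0)
        points.append((fp / neg, tp / pos))
    # Lowering the threshold never decreases either rate, so sorting
    # the pairs puts them in order from (0, 0) to (1, 1).
    return sorted(points)


def auc_trapezoid(points):
    """Area under a piecewise-linear curve through the sorted points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted probabilities, invented for illustration
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.8, 0.45, 0.6, 0.7, 0.9]

# 101 evenly spaced candidate thresholds: 0.00, 0.01, ..., 1.00
thresholds = [i / 100 for i in range(101)]
points = roc_points(y_true, scores, thresholds)
print(auc_trapezoid(points))  # → 0.8125
```

To draw the curve, the FPR values go on the x-axis and the TPR values on the y-axis (for example with matplotlib's `plot`). A model that ranks cases no better than chance sits on the diagonal from (0, 0) to (1, 1), with an AUC of 0.5.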


Written by Crystal X
