A comparison of logistic regression and a decision tree on a heart disease dataset

Crystal X
5 min read · Sep 17, 2022

I love working on different datasets because each one has a different story to tell. In this post I work with a dataset used to predict whether a person has heart disease. This is a binary classification problem, so in addition to predicting whether or not a person has heart disease, I have included goodness-of-fit metrics that show how well each model fits the data. I make predictions with two models:

  1. Logistic regression and
  2. Decision tree classifier
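
As a rough illustration of this setup, the sketch below fits both models and reports accuracy and ROC AUC for each. It is not the code used in this post: the data comes from scikit-learn's `make_classification` as a synthetic stand-in for the heart disease dataset, and the 13-feature shape, the 75/25 split, and the `max_depth=4` setting are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart disease data (assumption: 13 numeric
# features and a binary target; the real dataset is not used here).
X, y = make_classification(
    n_samples=1000, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The two models being compared; hyperparameters are illustrative only.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class, needed for the ROC AUC.
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, proba),
    }
    print(name, results[name])
```

Holding out a test set keeps the comparison fair, since both models are scored on rows they did not see during training.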

In this post I have included a ROC curve, a diagnostic tool that helps interpret probabilistic forecasts in binary classification problems. ROC curves summarise the trade-off between the true positive rate and the false positive rate of a model across different probability thresholds. They are most appropriate when the observations in the dataset are roughly balanced between the two classes.

The receiver operating characteristic (ROC) curve is a plot of the false positive rate (x-axis) against the true positive rate (y-axis) for a number of candidate thresholds between 0 and 1. In other words, it plots the false alarm rate against the hit rate.
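
To make the threshold sweep concrete, here is a minimal sketch in plain Python that computes the points of a ROC curve by hand. The function names `roc_points` and `trapezoid_auc`, the four toy labels and scores, and the list of thresholds are all made up for illustration; in practice a library routine such as scikit-learn's `roc_curve` does the same job.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (fpr, tpr) pair per threshold.

    A sample is predicted positive when its score is >= the threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        # x-axis: false positive rate (false alarms); y-axis: true positive rate (hits).
        points.append((fp / negatives, tp / positives))
    return points


def trapezoid_auc(points):
    """Area under the curve through the given (fpr, tpr) points, by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): true labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# Thresholds from strictest to most lenient.
thresholds = [1.0, 0.8, 0.4, 0.35, 0.1]

points = roc_points(y_true, scores, thresholds)
auc = trapezoid_auc(points)
print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc)     # 0.75
```

Lowering the threshold moves the curve from the bottom-left corner (nothing flagged as positive) to the top-right corner (everything flagged), and the area under it summarises performance across all thresholds in a single number.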
