A comparison of logistic regression versus decision tree in a heart disease dataset
I love working on different datasets because each one has a different story to tell. In this blog post I work on a dataset used to predict whether a person has heart disease. This is a binary classification problem, so in addition to predicting whether or not a person has heart disease, I have included goodness-of-fit metrics that show how well each model fits the data. I make predictions with two models:
- Logistic regression and
- Decision tree classifier
In this post I have included a ROC curve, a diagnostic tool that helps interpret probabilistic forecasts in binary classification problems. A ROC curve summarises the tradeoff between the true positive rate and the false positive rate of a predictive model across different probability thresholds. ROC curves are most appropriate when the observations in the dataset are roughly balanced between the two classes.
The receiver operating characteristic (ROC) curve is a plot of the false positive rate (x-axis) against the true positive rate (y-axis) for a number of candidate thresholds between 0 and 1. In other words, it plots the false alarm rate against the hit rate.
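To make the threshold idea concrete, here is a minimal sketch in plain Python that computes the points of a ROC curve by hand. The function name `roc_points`, the labels and the predicted probabilities are all made up for illustration; they are not taken from the heart disease dataset or from any library.

```python
def roc_points(y_true, y_score, thresholds):
    """Return a list of (threshold, fpr, tpr) tuples.

    y_true  : 0/1 labels (1 = positive class)
    y_score : predicted probability of the positive class
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = []
    for t in thresholds:
        # Predict the positive class when the score reaches the threshold.
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        # False positive rate = false alarm rate; true positive rate = hit rate.
        points.append((t, fp / negatives, tp / positives))
    return points


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]

for t, fpr, tpr in roc_points(y_true, y_score, [0.0, 0.25, 0.5, 0.75, 1.0]):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

At a threshold of 0 every observation is predicted positive, so both rates are 1; at a threshold of 1 nothing is predicted positive here, so both rates are 0. Plotting the FPR values on the x-axis against the TPR values on the y-axis traces out the curve between those two corners.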