I have been studying data science for over three years now, and one problem that periodically crops up is how to make predictions on imbalanced datasets.
One interview question I have come across on this topic, which I found in a news article, is: Define unbalanced information.
Since the question was about data science, I am assuming that it refers to imbalanced datasets.
A classification data set with skewed class proportions is called imbalanced. Classes that make up a large proportion of the data set are called majority classes. Those that make up a smaller proportion are minority classes.
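To make the definition concrete, here is a minimal sketch that computes each class's share of a target column and picks out the majority and minority classes. The function name `class_proportions` and the 90/10 label list are hypothetical, made up purely for illustration; they do not come from any of the datasets discussed in this post.

```python
from collections import Counter


def class_proportions(labels):
    """Return each class's share of the dataset, largest class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}


# Hypothetical target column: 90 "negative" labels and 10 "positive" labels.
labels = ["negative"] * 90 + ["positive"] * 10

proportions = class_proportions(labels)
majority = max(proportions, key=proportions.get)  # class with the largest share
minority = min(proportions, key=proportions.get)  # class with the smallest share

print(proportions)
print("majority:", majority, "| minority:", minority)
```

With a pandas DataFrame, `value_counts(normalize=True)` on the target column would give the same proportions.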
The imbalance in a dataset can be mild, moderate or extreme:-
I have picked out some examples of datasets that I have personally worked with to show the extent of any imbalance.
The image below shows the iris dataset, which is not imbalanced, as each of the three target classes has 50 samples:-
The image below shows the survivors of the Titanic shipwreck, and it can be seen that the classes are mildly skewed:-