There is no guarantee that a dataset will have a balanced target. Some datasets have highly imbalanced targets, and the imbalance must be addressed if the model is to predict accurately.
The diagram below is a histogram of a highly imbalanced target that I was recently working on. There are significantly more zeros than ones, which means a model trained on it as-is will have difficulty predicting the ones accurately.
Whenever working on an imbalanced target, it is important to find a way to balance the dataset so that the model can make accurate predictions. A model from Python’s sklearn library will likely have a class_weight parameter that can be set to ‘balanced’. When the model comes from outside the sklearn library, the class weights have to be computed and supplied manually.
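To make the ‘balanced’ option concrete: sklearn documents it as weighting each class by n_samples / (n_classes * count of that class), so rarer classes get larger weights. The sketch below reproduces that formula in plain Python. The helper name balanced_class_weights and the 90/10 labels are made up for illustration and are not the dataset shown in the histogram.

```python
from collections import Counter


def balanced_class_weights(y):
    """Hypothetical helper: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Made-up target with 90 zeros and 10 ones (illustrative only)
y_example = [0] * 90 + [1] * 10

manual_weights = balanced_class_weights(y_example)
print(manual_weights)
```

With these labels the minority class gets a weight of 5.0 and the majority class roughly 0.56. A dictionary like this can then be passed to whatever weighting argument the non-sklearn model exposes, which varies from library to library.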
The code below shows how to compute the class weights using sklearn’s compute_class_weight function. The function’s own arguments must be passed correctly, otherwise the call will not work.
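A minimal sketch of such a call is given here, on made-up labels rather than the dataset in the histogram. It passes every argument by keyword, which is the safe form: recent sklearn releases appear to accept classes and y only as keyword arguments, so a positional call that worked in older versions may raise an error. The variable names are illustrative.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up imbalanced target: 90 zeros and 10 ones (illustrative only)
y = np.array([0] * 90 + [1] * 10)

# The distinct class labels, as an array
classes = np.unique(y)

# One weight per entry in `classes`, in the same order
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# Pair each label with its weight to get a dictionary of class weights
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print(class_weight)
```

The resulting dictionary maps each class label to its weight, so the rare class counts for more during training than the common one.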