Why I could not complete Kaggle’s December 2021 tabular competition
It is with a heavy heart that I have to confess that I was unable to complete Kaggle’s final tabular competition for December 2021. The reason is that the train dataset had 4,000,000 examples and a multiclass label in which classification number 5 appeared only once.
Because the train dataset was so large, the system crashed on me many times. Whilst trying to find out why, I looked at a Kaggle problem page and found out that the system has a tendency to crash when in GPU mode!
The system also crashed when I tried to normalise the data, so I had to take out normalisation.
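A possible workaround, assuming the crash came from running out of memory (I never confirmed the cause), is to normalise the array in place and in chunks, so that no second full-size copy of the data is created. The sketch below uses NumPy only; the function name `standardise_in_place`, the chunk size and the toy array are made up for illustration and are not taken from my competition notebook.

```python
import numpy as np

def standardise_in_place(X, chunk_rows=100_000):
    """Scale each column of a 2-D float array to zero mean and unit variance, in place."""
    n_rows, n_cols = X.shape
    total = np.zeros(n_cols, dtype=np.float64)
    total_sq = np.zeros(n_cols, dtype=np.float64)

    # First pass: accumulate column sums chunk by chunk.
    # Only one chunk at a time is copied to float64.
    for start in range(0, n_rows, chunk_rows):
        block = X[start:start + chunk_rows].astype(np.float64)
        total += block.sum(axis=0)
        total_sq += (block ** 2).sum(axis=0)

    mean = total / n_rows
    var = np.maximum(total_sq / n_rows - mean ** 2, 0.0)
    std = np.sqrt(var)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero

    # Second pass: slicing gives a view, so these updates modify X itself.
    for start in range(0, n_rows, chunk_rows):
        block = X[start:start + chunk_rows]
        block -= mean.astype(X.dtype)
        block /= std.astype(X.dtype)
    return mean, std

# Toy data standing in for the real features (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=(10_000, 4)).astype(np.float32)
mean, std = standardise_in_place(X, chunk_rows=1_000)
```

Storing the features as `float32` rather than the default `float64` halves the memory the array needs, which may help on its own. The array must already have a float dtype, because the in-place subtraction will not work on integer columns.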
The dataset for the competition had a class imbalance, with only one example of classification number 5. This meant that I could not stratify y when I was splitting the dataset into training and validation sets, because a stratified split needs at least two examples of every class. A lot of models could not cope with that single example either, so I had to delete the row that held the only instance of classification number 5.
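Deleting the single-example class and then stratifying can be sketched as follows. This is a minimal illustration on invented data, not my actual notebook: the label values other than 5, the class proportions and the variable names are all assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Toy stand-in for the competition data: three common classes plus one row of class 5.
y = np.concatenate([rng.choice([1, 2, 3], size=999, p=[0.6, 0.3, 0.1]), [5]])
X = rng.normal(size=(y.size, 4))

# Keep only the classes that have at least two examples,
# which is the minimum a stratified split can work with.
classes, counts = np.unique(y, return_counts=True)
keep = np.isin(y, classes[counts >= 2])
X_kept, y_kept = X[keep], y[keep]

# With the lone class 5 row gone, stratify=y_kept no longer raises a ValueError.
X_train, X_val, y_train, y_val = train_test_split(
    X_kept, y_kept, test_size=0.2, stratify=y_kept, random_state=0
)
```

Filtering by class count rather than hard-coding the value 5 means the same lines would also catch any other class that turned out to have a single example.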
Because the label has a class imbalance, I tried to use SMOTE, but either the code would not work or it would crash the system, so I had to leave that piece of code out.
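An alternative that does not create any new rows is class weighting. To be clear, this is a different technique from SMOTE and not something I used in the competition: instead of synthesising extra minority examples, it tells the model to count mistakes on rare classes more heavily. The sketch below uses scikit-learn's `compute_class_weight` on an invented label array.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced labels (hypothetical counts, not the competition's).
y_train = np.array([1] * 600 + [2] * 300 + [3] * 100)

classes = np.unique(y_train)
# "balanced" gives each class the weight n_samples / (n_classes * class_count).
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)

# A plain dict mapping each label to its weight.
class_weight = dict(zip(classes.tolist(), weights.tolist()))
```

Many scikit-learn classifiers, such as `LogisticRegression` and `RandomForestClassifier`, accept a dictionary like this through their `class_weight` parameter, so the training set stays the same size and memory use does not grow.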