
Why I could not complete Kaggle’s December 2021 tabular competition

Crystal X
2 min read · Dec 3, 2021

It is with a heavy heart that I confess I was unable to complete Kaggle’s final tabular competition for December 2021. The train dataset had 4,000,000 examples, and its multiclass label included one class, number 5, that had only a single example.

Because the train dataset was so large, the system crashed on me many times. While trying to find out why, I looked at a Kaggle problem page and learned that the system has a tendency to crash when in GPU mode!

The system also crashed when I tried to normalise the data, so I had to take out normalisation.
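For readers who hit the same wall, one memory-lighter way to normalise is to work through the features one column at a time in 32-bit floats, rather than transforming the whole table at once. The sketch below is only an illustration: the function name `standardise_in_place` and the column names are hypothetical, and I cannot say whether this would have prevented the crashes described above.

```python
import numpy as np
import pandas as pd

def standardise_in_place(df, feature_cols):
    """Standardise each feature column as float32, one column at a time."""
    for col in feature_cols:
        # Convert a single column to float32, so only one column-sized
        # temporary array exists at any moment.
        values = df[col].to_numpy(dtype=np.float32)
        mean = values.mean()
        std = values.std()
        if std == 0:
            # Constant column: avoid dividing by zero.
            std = np.float32(1.0)
        df[col] = (values - mean) / std
    return df

# Tiny made-up frame standing in for the real training data.
demo = pd.DataFrame({"f1": [1, 2, 3, 4], "f2": [7, 7, 7, 7]})
standardise_in_place(demo, ["f1", "f2"])
```

The frame is modified in place on purpose, because copying a 4,000,000-row table would double its memory footprint.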

The dataset for the competition had a class imbalance, with only one example of classification number 5. A stratified split needs at least two examples of every class, so I could not stratify on y when splitting the dataset into training and validation sets. The single example also ruled out a lot of models, so I had to delete the row that held the only instance of classification number 5.
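The row deletion and the stratified split can be sketched roughly as follows. This is a minimal illustration on made-up data, not the code I actually ran: the helper `drop_rare_classes` and the column names `feature` and `label` are hypothetical, and it assumes pandas and scikit-learn are available.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def drop_rare_classes(df, label_col, min_count=2):
    """Keep only rows whose class appears at least min_count times."""
    counts = df[label_col].value_counts()
    keep = counts[counts >= min_count].index
    return df[df[label_col].isin(keep)]

# Made-up data: classes 1 and 2 have five rows each, class 5 has one.
demo = pd.DataFrame({
    "feature": range(11),
    "label": [1] * 5 + [2] * 5 + [5],
})

filtered = drop_rare_classes(demo, "label")

# With the singleton class gone, stratifying on the label works.
X_train, X_val, y_train, y_val = train_test_split(
    filtered[["feature"]],
    filtered["label"],
    test_size=0.2,
    stratify=filtered["label"],
    random_state=0,
)
```

Filtering by count instead of hard-coding the number 5 means the same helper would also catch any other class that is too small to split.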

Because the label has a class imbalance, I tried to use SMOTE, but either the code would not work or it would crash the system, so I had to leave that piece of code out of the algorithm.
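A lighter alternative to SMOTE, which I did not try here, is class weighting: instead of generating synthetic rows, each class gets a weight inversely proportional to its frequency, so nothing is added to an already large dataset. The sketch below uses a hypothetical helper, `balanced_class_weights`, on made-up labels, and follows the same formula as scikit-learn's "balanced" heuristic, n_samples / (n_classes * class_count).

```python
import numpy as np

def balanced_class_weights(y):
    """Return {class: weight}, with rarer classes weighted more heavily."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}

# Made-up imbalanced labels: six of class 1, three of class 2, one of class 3.
labels = np.array([1, 1, 1, 1, 1, 1, 2, 2, 2, 3])
weights = balanced_class_weights(labels)
```

Many scikit-learn classifiers accept a dictionary like this through their `class_weight` parameter, so the imbalance can be handled without changing the number of rows.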


Written by Crystal X

I have over five decades of experience in the world of work, in fast food, the military, business, non-profits, and the healthcare sector.
