How I won my 16th bronze medal by entering Kaggle’s English Learners competition
Yesterday I had a few hours to spare, so I looked on the Kaggle website to see if there was a competition I could enter. I am limited in what I can do with Kaggle competitions because my computer does not have much memory, and for that reason I have not yet learned deep learning techniques. I did, however, find a competition that involves natural language processing, so I decided to give it a go.
I entered the English Language Learners competition with no expectation of winning because:
- I do not reside in the US, so I cannot collect any monetary award.
- I only have limited time available to devote to a Kaggle competition.
Nevertheless, I decided to have a stab at the competition. The dataset consisted of one column of text, which is what needed to be analysed, and six target columns to predict.
I decided to use sklearn’s TfidfVectorizer to vectorise the one column of text data. Sklearn is, in my view, not as efficient or accurate as spaCy, but I decided to use this library’s functions as a first attempt.
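To illustrate the vectorising step, here is a minimal sketch of TfidfVectorizer turning a column of text into numeric features. The three short essays are made up for illustration and are not taken from the competition data, and the vectoriser is left on its default settings, which may differ from what was used in the actual entry.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up essays standing in for the competition's single text column.
essays = [
    "I like to read books in the library.",
    "My friend and I play football after school.",
    "Reading books helps me learn new words.",
]

# Default settings are an assumption; the original parameters are not stated.
vectorizer = TfidfVectorizer()

# fit_transform learns the vocabulary and returns a sparse matrix with
# one row per essay and one column per vocabulary term.
X = vectorizer.fit_transform(essays)

print(X.shape)
print(len(vectorizer.vocabulary_), "distinct terms in the vocabulary")
```

Each row of `X` is the TF-IDF weighted representation of one essay, which is the form a downstream regressor expects.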
The six-column target was a bit trickier, but I decided to use sklearn’s MultiOutputRegressor for that purpose. I chose sklearn’s support vector regressor (SVR) as the estimator for…
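As a rough sketch of how these pieces can fit together, the example below wraps an SVR in MultiOutputRegressor so that one regressor is fitted per target column. The essays and the six columns of scores are placeholders invented for illustration (the scores are random numbers in an arbitrary range), and both the vectoriser and the SVR use default settings, which is an assumption rather than the configuration from the actual entry.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Made-up essays standing in for the competition's text column.
essays = [
    "I like to read books in the library.",
    "My friend and I play football after school.",
    "Reading books helps me learn new words.",
    "We went to the park and saw many birds.",
]

# Placeholder targets: six random scores per essay, in an arbitrary range.
rng = np.random.default_rng(0)
y = rng.uniform(1.0, 5.0, size=(len(essays), 6))

# Turn the text into TF-IDF features (default settings assumed).
X = TfidfVectorizer().fit_transform(essays)

# MultiOutputRegressor fits one independent SVR per target column.
model = MultiOutputRegressor(SVR())
model.fit(X, y)

preds = model.predict(X)
print(preds.shape)  # one row per essay, one column per target
```

Because SVR only predicts a single value on its own, the wrapper is what allows all six columns to be handled with one `fit` and one `predict` call.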