Predict steel defects using sklearn’s MultiOutputClassifier
Sklearn’s MultiOutputClassifier is a meta-estimator class that lets the data scientist make predictions on a multilabel target by fitting one classifier per label. Not all Python machine learning libraries have this functionality, so it is helpful that sklearn includes this and other similar classes in its library.
I decided to use sklearn’s MultiOutputClassifier in Kaggle’s Playground Series Season 4, Episode 3 because the target in this competition’s dataset has seven labels. The competition asks the entrant to predict the probability of a steel defect for each of those labels, so I chose a base classifier that has a predict_proba method for ease of computation.
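To make the idea concrete, here is a minimal sketch of how MultiOutputClassifier and predict_proba fit together. It is not the competition code: the data is synthetic, the variable names are made up for illustration, and LogisticRegression is only an assumed stand-in for whichever base classifier you prefer, as long as it supports predict_proba.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in for the competition data (an assumption for illustration):
# 500 rows, 10 numeric features, and 7 binary labels per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
W = rng.normal(size=(10, 7))
Y = (X @ W + rng.normal(size=(500, 7)) > 0).astype(int)

# Wrap a base estimator that has predict_proba; one copy is fitted per label.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

# predict_proba returns a list with one (n_samples, 2) array per label.
proba_list = clf.predict_proba(X)

# Column 1 of each array is the probability of the positive class,
# so stacking those columns gives one probability column per label.
proba = np.column_stack([p[:, 1] for p in proba_list])
print(proba.shape)
```

The stacking step matters because predict_proba on a MultiOutputClassifier returns a list of arrays rather than a single matrix, and a submission file typically needs one probability column per label.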
I wrote the program in Kaggle’s free online Jupyter Notebook and stored it in my Kaggle account.
The first thing I did, after creating the Jupyter Notebook, was to import the libraries that I would need to execute the program:
- pandas to create dataframes and process data,
- NumPy to create arrays and perform numerical computations,
- os to interact with the operating system and locate the needed files,
- SciPy to perform scientific computations on the dataset,