
Predicting steel defects with sklearn’s MultiOutputClassifier

Crystal X
4 min read · Mar 8, 2024


Sklearn’s MultiOutputClassifier is a meta-estimator that makes it easy for the data scientist to make predictions on a multilabel target: it wraps a base classifier and fits one copy of it per target column. Not every Python machine learning library offers this functionality, so it is very helpful that sklearn includes this and similar wrappers, such as MultiOutputRegressor, in its library.
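As a minimal sketch of how the wrapper behaves, the block below fits MultiOutputClassifier around a LogisticRegression base estimator. The synthetic data from make_multilabel_classification, the choice of LogisticRegression, and the seven-column target (chosen to match the number of labels in the competition) are assumptions for illustration, not the competition setup.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic multilabel data (an assumption for illustration):
# Y is a dense 0/1 indicator matrix with 7 columns, one per label.
X, Y = make_multilabel_classification(
    n_samples=300, n_features=10, n_classes=7, random_state=0
)

# MultiOutputClassifier fits one independent copy of the base estimator per target column.
model = MultiOutputClassifier(LogisticRegression(max_iter=1000))
model.fit(X, Y)

print(len(model.estimators_))      # one fitted estimator per label
print(model.predict(X[:3]).shape)  # 3 samples x 7 labels
```

Because each label gets its own estimator, the wrapper treats the labels independently; it does not model correlations between them.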

I decided to use sklearn’s MultiOutputClassifier in Kaggle’s Playground Series, Season 4, Episode 3, because the target in this competition’s dataset has seven labels. The competition asks entrants to predict the probability that the steel has each type of defect, so, for ease of computation, I chose a base classifier that provides the predict_proba method.

I wrote the program in Kaggle’s free online Jupyter Notebook and saved it to my Kaggle account.

After creating the Jupyter Notebook, the first thing I did was import the libraries needed to run the program, namely:

  1. pandas to create dataframes and process data,
  2. NumPy to create arrays and perform numerical computations,
  3. os to interact with the operating system and retrieve the needed files,
  4. SciPy to perform scientific computations on the dataset,
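A minimal sketch of such a first cell follows, assuming Kaggle’s convention of mounting input data under `/kaggle/input`. The helper `list_input_files` is hypothetical and not taken from the original notebook; it simply walks a directory with `os.walk` and returns every file path it finds.

```python
import os

import numpy as np
import pandas as pd
import scipy


def list_input_files(root="/kaggle/input"):
    """Return the sorted full paths of all files under `root`.

    os.walk yields nothing for a missing directory, so this returns []
    when run outside Kaggle.
    """
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)


# Confirm the libraries imported correctly, then show the available data files.
print(pd.__version__, np.__version__, scipy.__version__)
for path in list_input_files():
    print(path)
```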


Written by Crystal X

I have over five decades of experience in the world of work, in fields including fast food, the military, business, non-profits, and the healthcare sector.
