Interview question: How to deal with missing values in a dataset?

Tracyrenee
Jan 25, 2023

When I am working on a dataset, one of the first things I do is check for missing values. It is important to deal with them, because many models, including most scikit-learn estimators, will raise an error if the data they are given contains missing values.
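A quick way to run this check is sketched below with pandas. The DataFrame df and its columns are made up for illustration; isnull().sum() counts the missing entries in each column, and a second .sum() gives the total for the whole table.

```python
import numpy as np
import pandas as pd

# A small, made-up DataFrame with a few missing entries (NaN / None).
df = pd.DataFrame({
    "age": [25, np.nan, 47, 31],
    "city": ["Leeds", "York", None, "Hull"],
    "score": [88.0, 92.5, np.nan, np.nan],
})

# Number of missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Total number of missing values in the whole DataFrame.
total_missing = int(df.isnull().sum().sum())
print(total_missing)
```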

Missing values also matter because, depending on why they are missing, they can bias the results of an analysis or prediction. For example, if values are missing for a reason related to the value itself rather than completely at random, the rows that remain are no longer representative of the whole. Bias in data is an error that occurs when certain elements of a dataset are overweighted or overrepresented. Biased datasets don’t accurately represent a model’s use case, which leads to skewed outcomes, systematic prejudice, and low accuracy.
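To make the bias concrete, here is a small sketch with invented salary figures. It assumes the two highest earners declined to answer, so their values are missing because of their size (often called "missing not at random"). Averaging only the values that remain gives a badly skewed result.

```python
import pandas as pd

# Hypothetical salaries for six people.
true_salaries = pd.Series([30_000, 35_000, 40_000, 45_000, 90_000, 120_000])

# Suppose everyone earning over 80,000 left the question blank:
# .where keeps values meeting the condition and sets the rest to NaN.
observed = true_salaries.where(true_salaries <= 80_000)

true_mean = true_salaries.mean()
observed_mean = observed.mean()  # pandas skips NaN by default (skipna=True)

print(true_mean)      # 60000.0
print(observed_mean)  # 37500.0
```

Because the missing values are not a random sample, the observed mean understates the true mean by a wide margin.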

There are two broad ways to handle missing values:

  1. Delete the missing values
  2. Impute the missing values

There are two ways to delete missing values from a dataset:

  1. Delete the entire row
  2. Delete the entire column

The Python code to delete an entire column of data is:

If the entire row needs to be deleted instead, then “axis=0, inplace=True” would be inserted inside the brackets of the function.
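A minimal sketch of both kinds of deletion, assuming the function in question is pandas' DataFrame.dropna and using a small made-up DataFrame. With axis=1 every column containing a missing value is removed; with axis=0 (the default for dropna) every such row is removed.

```python
import numpy as np
import pandas as pd

# Made-up data: "age" is missing in row 1, "score" is missing in row 2.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 31],
    "city": ["Leeds", "York", "Bath", "Hull"],
    "score": [88.0, 92.5, np.nan, 70.0],
})

# Delete every column that contains at least one missing value.
cols_dropped = df.copy()
cols_dropped.dropna(axis=1, inplace=True)
print(cols_dropped.columns.tolist())  # ['city']

# Delete every row that contains at least one missing value.
rows_dropped = df.copy()
rows_dropped.dropna(axis=0, inplace=True)
print(rows_dropped.index.tolist())  # [0, 3]
```

Note that with inplace=True the DataFrame is modified directly and dropna returns None, so writing df = df.dropna(inplace=True) would replace df with None.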

I personally prefer to just drop missing values, especially if they are in the training set. The reason is that one never knows what the missing value should have been in the first place.

There are some instances when a person needs to impute missing data instead, especially in a test set, where every row usually needs a prediction and so cannot simply be dropped. I have written the code below to illustrate some different ways that null values can be imputed in a dataset:
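As a rough illustration of a few common choices, the sketch below uses pandas' Series.fillna on a made-up DataFrame; the column names and values are hypothetical. Numeric columns are filled with the mean or median, a categorical column with its mode, and any column can be filled with a constant.

```python
import numpy as np
import pandas as pd

# Made-up data: "age" is missing in row 1, "city" is missing in row 2.
df = pd.DataFrame({
    "age": [20.0, np.nan, 30.0, 70.0],
    "city": ["York", "Hull", None, "York"],
})

# Numeric column: fill with the mean (40.0) or the median (30.0) of observed values.
age_mean = df["age"].fillna(df["age"].mean())
age_median = df["age"].fillna(df["age"].median())

# Categorical column: fill with the mode, i.e. the most frequent value ("York").
city_mode = df["city"].fillna(df["city"].mode()[0])

# Any column: fill with a constant chosen by the analyst.
city_constant = df["city"].fillna("Unknown")
```

When imputing a test set, it is common practice to compute the mean, median or mode from the training set and reuse it, so that information from the test set does not leak into the model.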

Sklearn

Python’s machine learning library, sklearn (scikit-learn), has several facilities to impute missing values, including:

  1. SimpleImputer: missing values are replaced with a provided constant value, or with the mean, median, or mode (most frequent value) of the column in which each missing value is located.
  2. IterativeImputer: this imputer models each feature with missing values as a function of other…
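A minimal sketch of both imputers on a small made-up NumPy array. It fits SimpleImputer on the training data and reuses the learned column means on the test data. IterativeImputer is marked experimental in scikit-learn, so it has to be enabled with an explicit import before it can be used; the max_iter and random_state values here are arbitrary choices.

```python
import numpy as np
from sklearn.impute import SimpleImputer
# IterativeImputer is experimental; this import is required to enable it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Made-up training and test data with missing entries.
X_train = np.array([[1.0, 2.0],
                    [np.nan, 4.0],
                    [3.0, 6.0],
                    [5.0, np.nan]])
X_test = np.array([[np.nan, 8.0]])

# SimpleImputer: learn each column's mean on the training data (3.0 and 4.0),
# then apply those same means to the test data.
mean_imputer = SimpleImputer(strategy="mean")
X_train_mean = mean_imputer.fit_transform(X_train)
X_test_mean = mean_imputer.transform(X_test)

# Other strategies are "median", "most_frequent" (mode) and "constant".
constant_imputer = SimpleImputer(strategy="constant", fill_value=0.0)
X_train_const = constant_imputer.fit_transform(X_train)

# IterativeImputer: estimates each feature with missing values from the other features.
iter_imputer = IterativeImputer(max_iter=10, random_state=0)
X_train_iter = iter_imputer.fit_transform(X_train)
```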


Tracyrenee

I have close to five decades of experience in the world of work, in fast food, the military, business, non-profits, and the healthcare sector.