Interview question: How to deal with missing values in a dataset?

Tracyrenee
Jan 25, 2023

When I am working on a dataset, one of the first things I do is check for missing values. It is important to deal with them because many models will not work if there are any missing values in the dataset.
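As a minimal sketch of that first check, assuming the dataset is loaded into a pandas DataFrame (the name `df` and the sample columns below are made up for illustration), the missing values in each column could be counted like this:

```python
import numpy as np
import pandas as pd

# Small made-up dataset; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "city": ["Leeds", "York", None, "Bath"],
    "income": [32000, 41000, 38000, 52000],
})

# isna() marks each missing cell as True; sum() counts them per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Total number of missing values across the whole DataFrame.
total_missing = int(missing_per_column.sum())
print(total_missing)
```

A column with a count of zero needs no further attention, while the others are candidates for deletion or imputation.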

Missing values are also important because, depending on the type of missing value, they can sometimes bias the results of an analysis or prediction. Bias in data is an error that occurs when certain elements of a dataset are overweighted or overrepresented. Biased datasets don’t accurately represent a model’s use case, which leads to skewed outcomes, systematic prejudice, and low accuracy.

There are two ways to handle missing values:

  1. Delete the missing values
  2. Impute the missing values
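To illustrate the second option, here is a minimal sketch that assumes pandas and a made-up DataFrame. Filling a numeric column with its mean and a text column with its most frequent value is just one common choice, used here as an assumption rather than a recommendation from this article:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in each column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 46.0],
    "city": ["Leeds", "York", None, "Leeds"],
})

# Work on a copy so the original DataFrame is left unchanged.
imputed = df.copy()

# Numeric column: replace missing values with the column mean.
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())

# Text column: replace missing values with the most frequent value.
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

print(imputed)
```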

There are two ways to delete missing values in a dataset:

  1. Delete the entire row
  2. Delete the entire column

The code in Python to delete an entire column of data is:
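As a minimal sketch, assuming the data is held in a pandas DataFrame and that the function in question is `dropna` (the name `df` and the sample columns are made up for illustration), dropping the columns that contain missing values could look like this, with the row-wise variant shown for comparison:

```python
import numpy as np
import pandas as pd

# Small made-up dataset; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "city": ["Leeds", "York", None, "Bath"],
    "income": [32000, 41000, 38000, 52000],
})

# Work on copies so the original DataFrame is left untouched.
cols_dropped = df.copy()
# axis=1 drops every column that contains at least one missing value.
cols_dropped.dropna(axis=1, inplace=True)

rows_dropped = df.copy()
# axis=0 drops every row that contains at least one missing value.
rows_dropped.dropna(axis=0, inplace=True)

print(list(cols_dropped.columns))
print(len(rows_dropped))
```

Because `inplace=True` modifies the DataFrame directly and returns `None`, the result should not be assigned back to a variable.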

If an entire row needs to be deleted instead, then “axis=0, inplace=True” would be inserted inside the brackets of the function.

I personally prefer to just drop missing values, especially if they are in the training set. The reason for this is…
