
Interview Question: What are the differences between overfitting and underfitting?

Crystal X
3 min readOct 25, 2022


Anyone studying data science will, at some point, likely prepare for an interview for a role in data science or machine learning. One question that frequently comes up is: What are the differences between overfitting and underfitting?

Overfitting occurs when a statistical model fits too closely against its training data. When this happens, the model cannot perform well on unseen data. When machine learning algorithms are constructed, they train and fit a model to a sample dataset. If the model trains too long on the sample data, or if the model is too complex, it can begin to learn the noise, i.e. irrelevant information, within the dataset. When the model memorises the noise and fits too closely to the training set, it is said to be overfitted and is unable to generalise well to new data (the validation or test set). A model that cannot generalise to new data cannot make the predictions it was built for.

Overfitted models tend to have low error rates on the training data but high variance, so their predictions swing wildly on data they have not seen.

One way to check whether a model is overfitted is to compare its error rates. If a model has a low error rate on the training dataset but a high error rate on the validation or test dataset, this is a…
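The comparison described above is easy to see in a small experiment. The sketch below (a hypothetical toy example, using only NumPy) fits polynomials of increasing degree to noisy quadratic data: the low-degree model underfits, the matching degree generalises well, and the very high-degree model drives its training error towards zero while its validation error blows up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: a noisy quadratic signal
x = rng.uniform(-3, 3, 30)
y = x**2 + rng.normal(0, 2, 30)

# Hold out the last 10 points as a validation set
x_train, y_train = x[:20], y[:20]
x_val, y_val = x[20:], y[20:]

def train_val_mse(degree):
    """Fit a polynomial of the given degree on the training
    split and return (training MSE, validation MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    model = np.poly1d(coeffs)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    val_mse = np.mean((model(x_val) - y_val) ** 2)
    return train_mse, val_mse

for degree in (1, 2, 15):
    t, v = train_val_mse(degree)
    print(f"degree {degree:2d}: train MSE {t:8.3f}, val MSE {v:8.3f}")
```

Degree 1 underfits (both errors are high), degree 2 matches the true signal, and degree 15 memorises the noise: its training error is tiny, but its validation error is far larger, which is exactly the gap an interviewer expects you to name as the telltale sign of overfitting.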
