Member-only story

Interview Question: What are confounding variables?

Crystal X
3 min readDec 21, 2023

--

I have been studying data science for three years now, and I have to say that I have only just recently heard of confounding variables. I therefore had to do a bit of research on this subject before I could adequately answer this interview question. If I had never heard of confounding variables then I can bet lots of other students of data science have not heard of it either.

Confounding variables are variables that affect both independent and dependent variables in a dataset. Confounding variables have an effect on the accuracy of a machine learning problem. For example, in a regression problem, confounding variables can change the regression line, or even the sign of the line.

Several approaches can be taken to avoid losing accuracy from the impact of confounding variables. The best approach is to design a project where confounding variables are avoided. Sometimes confounding variables can be detected by performing an exploratory data analysis (EDA) of the dataset.

Confounding variables can be identified by looking at the correlation of the features in a dataset. By analysing correlations, data scientists can select relevant features and eliminate redundant or highly correlated ones, thereby improving model performance. Two Python libraries that are able to identify highly correlated features are pandas…

--

--

Crystal X
Crystal X

Written by Crystal X

I have over five decades experience in the world of work, being in fast food, the military, business, non-profits, and the healthcare sector.

No responses yet