Statistics interview question: What is a confounding variable?
In statistics, a confounding variable is an unmeasured third variable that influences both the independent and dependent variables. For example, in a scenario with three variables, X is the independent variable, y is the dependent variable, and Z is potentially the confounding variable.
A confounding variable cannot occur when there are only two variables in a dataset, as this is a phenomenon that can only occur with three or more variables.
Some challenges to identifying confounders in observational research are:-
- The variable has to be easily measurable.
- It is never actually known if something is a confounder.
- It is always a valid criticism that a proposed set of confounders don’t satisfy the backdoor criterion. The backdoor criterion for identifying the effect of X on y involves finding a set of nodes to condition on that collectively block all backdoor paths between X and y.
The Titanic dataset
One dataset that potentially has confounding variables in it is the Titanic dataset. Anyone who has studied this dataset…