Interview question: What is survivorship bias in data science and machine learning?

Tracyrenee
6 min read · Feb 15, 2024

I have been studying data science and machine learning for a few years now and have come across quite a few terms used in the profession. One term I came across only recently is survivorship bias.

Survivorship bias, or survival bias, is the logical error of concentrating on entities that passed a selection process while overlooking those that did not. This can lead to incorrect conclusions because of incomplete data.

Survivorship bias is a form of selection bias that can lead to overly optimistic beliefs because failures are overlooked, such as when companies that no longer exist are excluded from analyses of financial performance. It can also lead to the false belief that the successes in a group share some special property when their success is really just coincidence, which is the same mistake as assuming that correlation “proves” causality.
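To make the financial example concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the funds are hypothetical, their yearly returns are pure noise centred on 0%, and a fund is assumed to close as soon as its value falls below 70% of where it started. The function name simulate_funds and all of its parameters are my own invention and do not come from any library.

```python
import random
import statistics


def simulate_funds(n_funds=1000, n_years=10, seed=0):
    """Simulate hypothetical funds whose yearly returns are pure noise."""
    rng = random.Random(seed)
    funds = []
    for _ in range(n_funds):
        value = 1.0
        alive = True
        for _ in range(n_years):
            # Assumed return model: mean 0%, standard deviation 15%.
            value *= 1.0 + rng.gauss(0.0, 0.15)
            # Assumed closure rule: the fund shuts down below 70% of its start value.
            if value < 0.7:
                alive = False
                break
        funds.append((value, alive))
    return funds


funds = simulate_funds()
all_values = [value for value, _ in funds]
survivor_values = [value for value, alive in funds if alive]

mean_all = statistics.mean(all_values)
mean_survivors = statistics.mean(survivor_values)

print(f"Funds still open: {len(survivor_values)} of {len(funds)}")
print(f"Mean final value, all funds:      {mean_all:.3f}")
print(f"Mean final value, survivors only: {mean_survivors:.3f}")
```

No fund in this sketch has any skill, yet the survivors' average is higher than the average over all funds, simply because the closed funds were dropped from the calculation. An analyst who only has data on funds that still exist would see the second number and might conclude that the typical fund performs better than it really does.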

Selection bias is the bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, thereby failing to ensure that the sample obtained is representative of the population intended to be analysed. If the selection bias is not taken into account, then some conclusions of the study may be false.
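The difference between a properly randomized sample and a biased one can be sketched in a few lines of Python. The data here is made up: I assume a hypothetical population of 10,000 satisfaction scores, and I assume that in the biased case the chance of an individual being included rises with their score (for example, happier customers being more likely to answer a survey). It is only an illustration of the idea, not a method for detecting selection bias in real data.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 satisfaction scores centred on 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Proper randomization: every individual is equally likely to be chosen.
random_sample = rng.sample(population, 1_000)

# Biased selection (an assumption for this sketch): the probability of
# being included grows linearly with the score itself.
lowest, highest = min(population), max(population)
biased_sample = [
    score
    for score in population
    if rng.random() < (score - lowest) / (highest - lowest)
]

pop_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(f"Population mean:    {pop_mean:.2f}")
print(f"Random sample mean: {random_mean:.2f}")
print(f"Biased sample mean: {biased_mean:.2f}")
```

The random sample lands close to the population mean, while the biased sample overstates it, even though the biased sample is the larger of the two. Collecting more data does not help if the way the data is selected is not representative.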

Tracyrenee

I have five decades of experience in the world of work, having worked in fast food, the military, business, non-profits, and the healthcare sector.