At some point in their studies, most data science students will interview for a data science position, so it pays to be prepared for the questions interviewers may ask.
One question that may come up in an interview is: What is selection bias?
When we collect data for a data science experiment, we rarely capture the entire population. If the sample we collect is not representative of that population, a model learned from it may perform well on the sample but poorly on the population as a whole. For example, if a data scientist collects the purchasing patterns of people in the town I live in, Reading, these won’t necessarily reflect the purchasing patterns of the whole of England.
Selection bias is the bias introduced when the sample of data for analysis is selected in such a way that proper randomisation is not achieved, so the sample is not guaranteed to be representative of the population being analysed. If selection bias is not taken into account, some conclusions of the analysis may be incorrect.
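To make the definition concrete, here is a minimal simulation sketch using only Python's standard library. Everything in it is an assumption made up for illustration: the ten towns, their average weekly spend, and the sample sizes are hypothetical, not real purchasing data. It compares a sample drawn from a single town with a sample drawn at random from the whole population.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: weekly spend for 100,000 shoppers in 10 towns.
# Each town has its own average spend, so towns differ systematically.
town_means = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
population = [
    (town, random.gauss(mean, 10))
    for town, mean in enumerate(town_means)
    for _ in range(10_000)
]

all_spend = [spend for _, spend in population]
population_mean = statistics.fmean(all_spend)

# Non-random selection: 1,000 shoppers taken from town 0 only.
town0_spend = [spend for town, spend in population if town == 0]
biased_sample = random.sample(town0_spend, 1_000)

# Random selection: 1,000 shoppers taken from the whole population.
random_sample = random.sample(all_spend, 1_000)

# How far each sample mean lands from the true population mean.
biased_error = abs(statistics.fmean(biased_sample) - population_mean)
random_error = abs(statistics.fmean(random_sample) - population_mean)

print(f"population mean:      {population_mean:.2f}")
print(f"single-town error:    {biased_error:.2f}")
print(f"random-sample error:  {random_error:.2f}")
```

Under these assumptions the single-town sample misses the population mean by a wide margin, while the random sample of the same size lands close to it. Collecting more data from the same town would not close that gap, because the problem lies in how the sample was selected, not in how large it is.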
Bias is a type of error that systematically skews results in a certain direction. Selection bias occurs when the researcher, rather than a random process, decides who or what is going to be studied. It is most often discussed in relation to study participants, where it arises when participants are not selected at random.
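The word "systematically" is what separates bias from ordinary sampling noise, and a small sketch can show the difference. The scenario below is hypothetical: the satisfaction scores and the rule that only people scoring above 4 are surveyed are assumptions chosen for illustration. The sketch repeats the sampling many times and records the signed error of each estimate.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 50,000 satisfaction scores centred on 5.
population = [random.gauss(5.0, 2.0) for _ in range(50_000)]
true_mean = statistics.fmean(population)

# Assumed non-random selection rule: only people scoring above 4 are
# ever surveyed (for example, customers who did not leave).
selected_pool = [score for score in population if score > 4.0]

random_errors = []
biased_errors = []
for _ in range(200):
    # Signed error of a sample mean under each selection scheme.
    random_errors.append(
        statistics.fmean(random.sample(population, 100)) - true_mean
    )
    biased_errors.append(
        statistics.fmean(random.sample(selected_pool, 100)) - true_mean
    )

mean_random_error = statistics.fmean(random_errors)
mean_biased_error = statistics.fmean(biased_errors)
share_biased_high = sum(e > 0 for e in biased_errors) / len(biased_errors)

print(f"average error, random selection: {mean_random_error:+.3f}")
print(f"average error, biased selection: {mean_biased_error:+.3f}")
print(f"share of biased estimates too high: {share_biased_high:.0%}")
```

With random selection the individual errors fall on both sides of zero and largely cancel out across repetitions. With the assumed selection rule, the estimates err in the same direction nearly every time, so averaging more studies of the same design does not remove the error.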