A solid understanding of statistics is necessary to become a good data scientist, because statistical tests are one of the main ways to gain greater insight into data.
One statistical test that is often performed is the t-test, which is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another. T-tests are generally used on small samples of data, where the population standard deviation is unknown and has to be estimated from the sample.
The t-test calculates the t-statistic, which is the ratio of the difference between the estimated value of a parameter and its hypothesised value to the standard error of that estimate. It is used in hypothesis testing via Student’s t-test to decide whether to reject or fail to reject the null hypothesis.
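To make the ratio concrete, here is a minimal sketch of the one-sample case, where the parameter is a mean. The function name `one_sample_t_statistic` and the measurements are made up for illustration and are not from any particular library or dataset. It assumes the standard error of the mean is the sample standard deviation divided by the square root of the sample size.

```python
import math
import statistics


def one_sample_t_statistic(sample, hypothesised_mean):
    """Return (sample mean - hypothesised mean) / standard error of the mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    sample_mean = statistics.mean(sample)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    if standard_error == 0:
        raise ValueError("all observations are identical; t is undefined")
    return (sample_mean - hypothesised_mean) / standard_error


# Hypothetical measurements, for illustration only.
measurements = [5.1, 4.9, 5.6, 5.8, 5.3, 5.5]
t = one_sample_t_statistic(measurements, hypothesised_mean=5.0)
print(f"t-statistic: {t:.3f}")
```

A t-statistic far from zero means the sample mean sits many standard errors away from the hypothesised value. Turning it into a decision still requires comparing it against the t-distribution with n - 1 degrees of freedom, which this sketch leaves out.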
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.
There are five main steps in hypothesis testing:
- State the research hypothesis as a null hypothesis (H0) and an alternative hypothesis (Ha or H1).
- Collect data in a way designed to test the hypothesis.