Statistics is one of the foundation subjects you need to master in order to become a data scientist. In my last blog post I used Python to write the algorithms for various statistical metrics. In this post I use Python to explore how useful statistics is in the field of machine learning.
I have written a program in Google Colab, which is a free online Jupyter Notebook hosted by Google. I have also used a dataset concerning diabetes to carry out a statistical analysis.
I have used pandas, a data analysis library whose name derives from "panel data", to read the dataset into the program and load it into a dataframe, df:-
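As a rough sketch of this step, the snippet below reads CSV data into a dataframe with `pd.read_csv`. The actual diabetes file is not reproduced here, so the column names and the four rows are invented placeholders, and `csv_text` is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the diabetes CSV file.
# The column names and values are made up purely for illustration.
csv_text = """Glucose,BMI,Age,Outcome
148,33.6,50,1
85,26.6,31,0
183,23.3,32,1
89,28.1,21,0
"""

# pd.read_csv accepts a file path, a URL, or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Show the first few rows to check that the data loaded as expected
print(df.head())
```

In Colab, the same call would normally be given the path of the uploaded file instead of an in-memory string.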
I used the pandas describe method to perform a basic statistical analysis:-
- The number of non-null rows is counted for each column.
- The mean, a measure of centre, is calculated for each column of data.
- Measures of spread are calculated, including the standard deviation and the five-number summary.
- The five-number summary consists of the minimum, the first quartile (25%), the median (50%), the third quartile (75%), and the maximum.
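The points above can be seen in a minimal sketch that calls `describe` on a small dataframe. The two columns and their values are invented for illustration and are not taken from the diabetes dataset.

```python
import pandas as pd

# Tiny illustrative dataframe; the values are made up, not real diabetes data
df = pd.DataFrame(
    {
        "Glucose": [85, 89, 148, 183],
        "BMI": [23.3, 26.6, 28.1, 33.6],
    }
)

# describe() returns one row per statistic and one column per numeric column
summary = df.describe()
print(summary)

# Individual statistics can be looked up by row label and column name
print("Mean glucose:", summary.loc["mean", "Glucose"])
print("Median glucose:", summary.loc["50%", "Glucose"])
```

The rows of the result are labelled count, mean, std, min, 25%, 50%, 75% and max, which correspond to the items in the list above.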