Learn basic statistics for machine learning using Python

Crystal X
20 min read · Jan 13, 2023

Statistics is one of the foundation subjects you need to be proficient in to become a data scientist. In my last blog post I used Python to write the algorithms for various statistical metrics. In this post, however, I intend to use Python to explore how statistics is useful in the field of machine learning.

I have written a program in Google Colab, which is a free, hosted Jupyter Notebook environment from Google. I have also used a dataset concerning diabetes to carry out a statistical analysis.

I have used pandas, a data analysis library whose name derives from "panel data", to read the dataset into the program and convert it to a dataframe, df.
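The original notebook cell is not reproduced here, so the following is only a minimal sketch of this step. The column names and values are made-up placeholders, not the diabetes dataset used in this post, and an in-memory string stands in for the CSV file. In Colab, the same `pd.read_csv` call would typically take the path of the uploaded file instead.

```python
import io

import pandas as pd

# Placeholder CSV text standing in for the real dataset file.
# Column names and values are invented for illustration only.
csv_text = """Glucose,BMI,Age,Outcome
100,25.0,30,0
140,31.5,45,1
95,22.0,28,0
160,35.0,52,1
"""

# read_csv accepts a file path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.head())  # first few rows, to confirm the data loaded as expected
```

Checking `df.shape` and `df.head()` straight after loading is a quick way to confirm that the columns were parsed as intended before any statistics are calculated.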

I used pandas' describe method to perform a basic statistical analysis:

  1. The number of non-null values in each column is counted.
  2. The mean, a measure of centre, is calculated for each column of data.
  3. Measures of spread are calculated: the standard deviation and the five-number summary.
  4. The five-number summary consists of the minimum, the three quartiles (25%, 50% and 75%) and the maximum.
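The items in this list can be sketched on a tiny example. The single column below holds placeholder values chosen so the results are easy to check by hand; it is not taken from the diabetes dataset, and the column name is only illustrative.

```python
import pandas as pd

# Hypothetical single-column frame; the values are placeholders, not real data.
df = pd.DataFrame({"Glucose": [80, 90, 100, 110, 120]})

# describe() returns a DataFrame indexed by statistic name.
summary = df.describe()
print(summary)

# Individual statistics can be read from the summary by row label.
count = summary.loc["count", "Glucose"]  # number of non-null values
mean = summary.loc["mean", "Glucose"]    # measure of centre
std = summary.loc["std", "Glucose"]      # sample standard deviation (ddof=1)

# Five-number summary: minimum, the three quartiles, and maximum.
five_number = summary.loc[["min", "25%", "50%", "75%", "max"], "Glucose"]
print(five_number)
```

Note that the `std` row is the sample standard deviation (dividing by n - 1), and that the 50% row is the median, which can differ from the mean when a column is skewed.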

Written by Crystal X

I have over five decades of experience in the world of work, spanning fast food, the military, business, non-profits, and the healthcare sector.