Perform an exploratory data analysis (EDA) on the Galton dataset
The Galton height dataset is a collection of data on the heights of adult children and their parents, collected by Francis Galton, a cousin of Charles Darwin, in the 1880s. It records the number of adult children in each family along with their heights.
I was originally going to carry out the exploratory data analysis (EDA) on this dataset in R, but I was unable to import the dataset from the internet on the myCompiler website, which I had been using to analyse datasets in R. Because the dataset is too large to enter manually into the program, I opted to perform the analysis in Python with its statistical library, statsmodels, instead.
I wrote the EDA in Python using Google Colab, which is a fantastic platform for writing Python code because many of the libraries Python needs are already installed. Colab also lets the user work with datasets taken from the internet, which I have found not to be the case with other online compilers.
I started by importing the libraries that I would need:
- Pandas to create the dataframe,
- NumPy to perform numerical computations,