
Learn statistics with R using the iris dataset

Crystal X
10 min read · Jan 2, 2023


Statistics is an integral part of data science, so I have been spending a fair amount of time studying it. Because R is a programming language designed for statistical work, I have also been spending time learning R.

I have not been able to find a free online R interpreter that I like, so I decided to use Kaggle’s free online Jupyter Notebook, which lets the user program in either Python or R. In this instance I chose R because it was designed for statistical work.

I decided to use the Iris dataset because it is well known among data scientists and machine learning engineers. It consists of 50 samples from each of three iris species (setosa, versicolor, and virginica), for 150 samples in total. Four features were measured on each sample: the length and width of the sepals and of the petals.
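As a quick check of that description, here is a minimal sketch that loads the data and inspects its shape. It assumes the copy of `iris` that ships with base R (in the `datasets` package) rather than a CSV file attached to a Kaggle notebook, so the column names may differ from a Kaggle-hosted version.

```r
# Assumption: use the iris data frame bundled with base R,
# not a CSV uploaded to Kaggle.
data(iris)

# Structure: number of rows, column names, and column types
str(iris)

# Number of samples per species
table(iris$Species)

# Summary statistics for the four measured features
summary(iris[, 1:4])
```

The `table()` call is the simplest way to confirm that each species contributes the same number of samples.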

Because numeric computations cannot work directly on strings, I had to create an additional column that holds the Species column ordinally encoded.
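One way to do this encoding is sketched below. It assumes the built-in `iris` data frame, where `Species` is already stored as a factor. The new column name `Species_code` is a hypothetical choice for illustration, not necessarily the name used in the original notebook.

```r
# Assumption: built-in iris data frame, where Species is a factor
data(iris)

# as.integer() on a factor returns the integer code of each level.
# Levels are in alphabetical order here: setosa, versicolor, virginica.
# Species_code is a hypothetical column name chosen for this sketch.
iris$Species_code <- as.integer(iris$Species)

# Cross-tabulate the labels against the codes to verify the mapping
table(iris$Species, iris$Species_code)
```

The integer codes only reflect the order of the factor levels. The species have no natural ranking, so the codes are best treated as labels rather than as quantities with meaningful distances between them.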



