How to create a dataset of blobs and then make predictions on it

Crystal X
4 min read · Jul 15, 2021

When practicing on data science projects, it is nice to have a unique dataset to work with. Kaggle, the free data science website, has many datasets in its repositories, but there is always the possibility that none of them really suits the experiments one has in mind. Sklearn, the Python library focused primarily on machine learning, has several small toy datasets that can be used, and in the past I have written blog posts on how to use those datasets. OpenML is another site that provides datasets people can practice on, and sklearn even has a function, fetch_openml, that allows people to access OpenML datasets through that library.

Sometimes, however, a data scientist does not want to search for the right dataset, and that is when sklearn's facility to create toy datasets becomes handy. I have previously written about how to make the two moons dataset, and that post can be found here: https://medium.com/mlearning-ai/how-to-create-a-two-moon-dataset-and-make-predictions-on-it-dcc090c829af

In this post I will endeavour to show the reader how to use sklearn to create blobs that can be predicted on.
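As a quick preview of the idea, here is a minimal sketch of creating blobs with sklearn's make_blobs function and predicting on them. The sample size, the number of centers, the random seed, and the choice of LogisticRegression as the model are all assumptions made for illustration, not necessarily the settings used in the rest of this post.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Create 300 two-dimensional points grouped into 3 blobs.
# All of these values are illustrative assumptions.
X, y = make_blobs(
    n_samples=300,
    centers=3,
    n_features=2,
    cluster_std=1.0,
    random_state=42,
)

# Hold out 25% of the points so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Logistic regression is an assumed stand-in; any classifier could be used.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Mean accuracy on the held-out points.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Here X holds the coordinates of each point and y holds the label of the blob it belongs to, so the task is to predict the blob from the coordinates. Raising cluster_std makes the blobs overlap more, which in turn makes the prediction problem harder.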

I have written the script in Google Colab, which is a free online Jupyter Notebook. The only main drawback of Google Colab that I have found is the…

Written by Crystal X

I have over five decades of experience in the world of work, in fast food, the military, business, non-profits, and the healthcare sector.