
For those looking for datasets to practise on, GOV.UK publishes many statistical datasets that can be used to make predictions with artificial intelligence. The link to these statistical datasets can be found here:- Statistical data sets - GOV.UK (www.gov.uk)

I recently used a CSV file from GOV.UK to make predictions on the price of bananas, and the link can be found here:- Facebook Prophet can be used to predict the price of Bananas in the UK

In this post I have used another CSV file from GOV.UK …



At the time of writing, the world has experienced the fourth major mutation of COVID-19, which has come out of Brazil. As a result of this new, highly infectious mutation, the UK, which also has its own mutation of the virus, has banned international travel to and from many South American countries, as well as Portugal because of its close ties to those countries.

Because there are now four highly infectious strains of COVID-19, it is worth taking a look at the numbers of new cases of the virus in these countries:-

The UK, which unfortunately has the most new cases, is trying to contain the virus by initiating a third lockdown, mandating the wearing of face masks in public places, and shutting down all nonessential enterprises. The median age in the UK is 41 and life expectancy is 81, and older people are highly susceptible to this virus. …



As I have been studying time series forecasting, I happened across Jason Brownlee’s website and attempted the tutorial he posted. As an initial step, I copied and pasted the code into a Google Colab Jupyter Notebook and, try as I might, I was unable to get the code to work. I therefore endeavored to use the code that I already had and was able to forecast the values using Facebook Prophet and also statsmodels. …



For those who struggle to find datasets to experiment on, the GOV.UK website has many free datasets that can be easily accessed and used. One such dataset that I took an interest in was the bananas dataset, and the link to this dataset can be found here:- Banana prices - GOV.UK (googleweblight.com)

The bananas dataset gives the prices of bananas coming into the UK from 1995 until the present date. This is a time series dataset arranged by group, so it is a bit tricky to make predictions on. The sparse documentation of the different libraries makes it all the more difficult to make predictions on unusual datasets, which is one reason why it is important to select appropriate datasets and then carry out experiments on them. …
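
One way to make a grouped time series easier to handle is to split it into one series per group before forecasting. The following is only a minimal sketch using the Python standard library. The column names `Origin`, `Date` and `Price`, the helper `split_by_group`, and the sample rows are all assumptions made up for illustration; they are not taken from the GOV.UK file.

```python
import csv
import io
from collections import defaultdict


def split_by_group(csv_text, group_col="Origin", date_col="Date", value_col="Price"):
    """Return {group: [(date, value), ...]}, each series sorted by date.

    The column names are hypothetical defaults; adjust them to match
    the header row of the real file.
    """
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        series[row[group_col]].append((row[date_col], float(row[value_col])))
    for points in series.values():
        points.sort()  # ISO-formatted dates sort chronologically as strings
    return dict(series)


# Made-up rows, for illustration only (not real banana prices).
sample = """Origin,Date,Price
costa_rica,1995-01-13,0.62
colombia,1995-01-06,0.55
costa_rica,1995-01-06,0.60
colombia,1995-01-13,0.57
"""

by_origin = split_by_group(sample)
for origin, points in sorted(by_origin.items()):
    print(origin, points)
```

Each value in `by_origin` is then an ordinary single time series, which can be passed to a forecasting library one group at a time.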



One website tells data scientists how to make predictions on a multiple time series dataset by rearranging the dataset and applying several functions. I tried to use the code that was given, but it was not presented in a logical fashion and was very time consuming to follow. In addition, there were no clear instructions on what the prediction actually was. After working on this methodology for a day, I had to abandon it in exasperation. If the code was not presented in an organised fashion with a clear statement of what the prediction actually was, how could I use it? …



I have posted about Kaggle’s House Price competition a few times, but I decided that it would be a good idea to post about how to enter this competition and make a video of it from beginning to end. The reason is that I was very daunted when I began my studies in data science and decided to open my own Kaggle account. I had taken a few free online courses in Linux, data science and Python. I had also watched YouTube videos on Python, data science, machine learning and time series analysis.

It is a good idea to embark upon any pursuit with a period of study and reflection on what has been learned, but at some point the student will be required to put what has been learned into practice, and that is where competitive programming comes in. There are several sites that offer programming competitions, but Kaggle is one of the best known and it is the first site I came across that offered competitions. …



The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. It is also widely used for training and testing in the field of machine learning. I have posted about MNIST twice and have provided a solution to Kaggle’s Digit Recognizer competition, which is based on the MNIST dataset. The link can be found here:- Get started with Kaggle by entering their Digit Recognizer competition | AI In Plain English | Medium

Fashion MNIST is an alternative to MNIST, and is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size and the same structure of training and testing splits. The reasons given for swapping MNIST out for Fashion MNIST are that MNIST is too easy, it is overused, and it cannot represent modern computer vision tasks. …



Kaggle was the first data science platform that I joined when I began studying data science. The site has micro-courses to help people learn the basics while embarking on their machine learning journey. At some point, after taking courses and doing tutorials, people wanting to break into the field will need to begin working on projects. Some data scientists decide to enter competitions to help them to hone their skills, and Kaggle has many competitions that are well suited for this purpose.

What I have found with Kaggle, however, is that people are for the most part required to learn the basics of data science on their own before they can look at entering competitions. The Titanic competition is the first one that people are asked to enter because it is considered the easiest. Once a person has learned the basics of data science, there are other competitions on the Kaggle site that are just as easy to complete as the Titanic competition. …



The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. It is one of the most common datasets used for image classification and is accessible from many different sources.

One place where a small version of the MNIST dataset can be accessed is Google Colab, a free online Jupyter Notebook environment. …
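
As a rough sketch of how such a file could be read, the snippet below parses MNIST-style CSV rows into a label and a square grid of pixels. It assumes each row holds the label first, followed by the pixel values, with no header row. The helper name `parse_mnist_rows` is hypothetical, and the Colab path in the comment is an assumption that should be checked in the notebook's file browser.

```python
import csv


def parse_mnist_rows(lines, side=28):
    """Yield (label, image) pairs from MNIST-style CSV lines.

    Assumes each row is: label, then side * side pixel values.
    `image` is a list of `side` rows, each a list of `side` ints.
    """
    for row in csv.reader(lines):
        values = [int(v) for v in row]
        label, pixels = values[0], values[1:]
        if len(pixels) != side * side:
            raise ValueError(f"expected {side * side} pixels, got {len(pixels)}")
        image = [pixels[r * side:(r + 1) * side] for r in range(side)]
        yield label, image


# In Colab this might be used as follows (path is an assumption):
# with open("sample_data/mnist_train_small.csv", newline="") as f:
#     digits = list(parse_mnist_rows(f))

# Tiny made-up 2 x 2 "image" to show the shape of the result.
demo = ["7,0,255,128,0"]
label, image = next(parse_mnist_rows(demo, side=2))
print(label, image)
```

Keeping the parser independent of the file path means the same function works on any iterable of CSV lines, including an open file handle.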



I first heard about Kaggle in a YouTube video created by Ken Jee, in which he mentioned that the site is a good place to learn about data science. I therefore dutifully searched the Kaggle site and took a few of their micro-courses in an attempt to learn about data science and to learn how to code.

I am basically an impatient person and learn better on the job, so I would rather learn by working on a project. I therefore began my journey into data science by taking a few courses on data science, following tutorials on Python programming, and then taking what seemed to me to be the huge step of entering a competition. …

About

Tracyrenee

I have over 45 years’ experience in the world of work, having worked in fast food, the military, business, non-profits, and the healthcare sector.
