Use NumPy to classify an image as pizza or not pizza
Computer vision is a field of computer science that focuses on enabling computers to identify and understand objects and people in images and videos. Like other types of AI, computer vision seeks to perform and automate tasks that replicate human capabilities. In this case, computer vision seeks to replicate both the way humans see, and the way humans make sense of what they see.
Modern computer vision applications increasingly rely on deep learning. With deep learning, a computer vision application is built on a type of model called a neural network, which learns patterns directly from example images and can often deliver more accurate analyses than hand-written rules. During training the network adjusts its weights in response to the images it is shown, so it generally becomes more accurate as it is trained on more data.
In my last blog post I compared images of dogs and cats, and that blog post can be seen here:- https://medium.com/@tracyrenee61/classify-dog-and-cat-images-with-an-easy-numpy-neural-network-04681646e0f7
In the dog and cat blog post, I used a very small dataset so that the model would be quick to train and evaluate. There were only 20 images of dogs and cats, so the model had very little data to learn from.
In this blog post I therefore intend to use more data, and have selected the pizza or not pizza dataset, which can be found here:- https://www.kaggle.com/datasets/carlosrunner/pizza-not-pizza/data
I have used a neural network built with Python’s NumPy library. This neural network has three layers, being an input layer, a hidden layer, and an output layer:-
I have written the program in Python using Kaggle’s free online Jupyter Notebook, and stored it in my Kaggle account.
Once the Jupyter Notebook was created, I imported the libraries that I would need to execute the program, being:-
- NumPy to create the neural network and perform numeric computations,
- pandas to create dataframes and process the data,
- random to create random numbers,
- PIL (Pillow) to load and process the images,
- cv2 (OpenCV) to process images,
- os to walk the file system and retrieve the files used in the program,
- glob to find files whose names match a pattern,
- shutil to copy and move files, and
- Matplotlib to visualise the data.
I then used the os library to retrieve all of the files in the directory that would be used in the program:-
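A minimal sketch of this step is shown here. The helper name list_files is hypothetical, and "/kaggle/input" is assumed to be the folder where the Kaggle notebook mounts the attached dataset; on another machine the path would need to point at the folder holding the images.

```python
import os

def list_files(root_dir):
    """Return the full path of every file found under root_dir."""
    paths = []
    for dirname, _, filenames in os.walk(root_dir):
        for filename in sorted(filenames):
            paths.append(os.path.join(dirname, filename))
    return paths

# Assumed location of the dataset inside a Kaggle notebook.
if os.path.isdir("/kaggle/input"):
    for path in list_files("/kaggle/input")[:10]:
        print(path)
```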
I visualised two images in the dataset, one of pizza and one that is not pizza:-
I used the function load_and_preprocess_images to form the preprocessing stage of the program:-
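A preprocessing function of this kind might look like the sketch below. Only the function name comes from this post; the parameters, the 64 x 64 image size, the conversion to RGB, and the scaling to the range 0 to 1 are assumptions made for illustration.

```python
import os
import numpy as np
from PIL import Image

def load_and_preprocess_images(folder, label, image_size=(64, 64)):
    """Load every readable image in folder as a flat, scaled vector.

    Returns (data, labels): data has shape (n, width * height * 3) with
    values in [0, 1]; labels is a length-n array filled with label.
    """
    images = []
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        try:
            with Image.open(path) as img:
                img = img.convert("RGB").resize(image_size)
                pixels = np.asarray(img, dtype=np.float32) / 255.0
        except OSError:
            continue  # skip anything PIL cannot read as an image
        images.append(pixels.reshape(-1))  # flatten to one row per image
    n_features = image_size[0] * image_size[1] * 3
    data = np.array(images, dtype=np.float32).reshape(-1, n_features)
    labels = np.full(len(images), label, dtype=np.float32)
    return data, labels
```

Flattening each image into a single row means the whole dataset becomes one 2D array, which is the shape a fully connected input layer expects.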
I then defined the file paths for the images that are pizza and the images that are not pizza and processed them through the function that had previously been defined.
I stacked the pizza and not pizza data and concatenated the pizza and not pizza labels.
I shuffled the data and defined the independent and dependent variables as X and y respectively.
I then split the dataset into training and testing sets:-
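The stacking, shuffling, and splitting steps can be sketched as follows. Small random arrays stand in for the real image data, and the variable names, the fixed seed, and the 80/20 split ratio are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the shuffle is repeatable

# Stand-ins for the preprocessed arrays (hypothetical sizes):
# 6 "pizza" and 4 "not pizza" images with 12 features each.
pizza_data = rng.random((6, 12))
not_pizza_data = rng.random((4, 12))
pizza_labels = np.ones(len(pizza_data))
not_pizza_labels = np.zeros(len(not_pizza_data))

# Stack the images row-wise and join the labels in the same order.
X = np.vstack((pizza_data, not_pizza_data))
y = np.concatenate((pizza_labels, not_pizza_labels))

# Shuffle images and labels with one shared permutation so pairs stay aligned.
order = rng.permutation(len(X))
X, y = X[order], y[order]

# Hold out the last 20% of the shuffled rows for testing (assumed ratio).
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```

Shuffling before the split matters here because the stacked array holds all of the pizza images first; without it the test set would contain only one class.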
Once the data had been processed, I developed the neural network.
I defined the architecture of the neural network by initialising the input_size, hidden_size, and output_size. I also initialised the learning rate and the number of epochs.
I then initialised the weights and biases to matrices of all zeros:-
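The set-up described above can be sketched like this. The variable names for the weights and biases and every numeric value are assumptions chosen for illustration; only input_size, hidden_size, and output_size are names taken from this post.

```python
import numpy as np

input_size = 64 * 64 * 3   # assumed: one input per pixel channel of a 64x64 RGB image
hidden_size = 64           # assumed number of hidden units
output_size = 1            # a single output: the probability of "pizza"
learning_rate = 0.01       # assumed
epochs = 1000              # assumed

# Weights and biases start as matrices of zeros, as described in the text.
weights_input_hidden = np.zeros((input_size, hidden_size))
bias_hidden = np.zeros((1, hidden_size))
weights_hidden_output = np.zeros((hidden_size, output_size))
bias_output = np.zeros((1, output_size))
```

One design point worth noting: when every weight starts at the same value, each hidden unit computes the same output and receives the same update. A common alternative is to start the weights at small random values, for example `np.random.randn(input_size, hidden_size) * 0.01`, while leaving the biases at zero.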
It was at this point that I defined the two functions that would be used in the neural network, being the sigmoid function and the sigmoid_derivative.
The sigmoid function is a mathematical function having a characteristic “S”-shaped curve or sigmoid curve. It maps input values to a value between 0 and 1, making it useful for binary classification and logistic regression problems. It is commonly used as an activation function in artificial neural networks, particularly in feedforward neural networks:-
The formula for the sigmoid function is:-
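Written out, with x as the input and e as Euler's number:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}
```

For large positive x the output approaches 1, for large negative x it approaches 0, and an input of 0 gives exactly 0.5.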
The derivative of the sigmoid function describes the function’s instantaneous rate of change at a certain point, as seen in the diagram below:-
The Python code for the sigmoid function and the sigmoid derivative is shown below:-
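A minimal version of the two functions is sketched here. It assumes that sigmoid_derivative receives a value that has already been passed through the sigmoid, which is a common convention in NumPy networks of this kind, and it relies on the identity that the slope of the sigmoid at x equals sigmoid(x) * (1 - sigmoid(x)).

```python
import numpy as np

def sigmoid(x):
    """Squash any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(activated):
    """Slope of the sigmoid, given a value that has already been through it.

    Uses the identity sigma'(x) = sigma(x) * (1 - sigma(x)).
    """
    return activated * (1.0 - activated)
```

For example, sigmoid(0) returns 0.5, and sigmoid_derivative(0.5) returns 0.25, which is the steepest slope the curve reaches.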
I then trained the neural network on the training data:-
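A training loop for a three-layer network of this kind can be sketched as below. To keep the example self-contained it runs on a small synthetic dataset rather than the pizza images. The layer sizes, learning rate, epoch count, and variable names are illustrative assumptions, and the weights start from small random values (a swapped-in choice, not the zero initialisation described earlier) so that the hidden units can learn different things.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(activated):
    return activated * (1.0 - activated)

rng = np.random.default_rng(0)

# Synthetic stand-in for the training set: 200 samples, 8 features,
# labelled 1 whenever the first feature exceeds 0.5.
X_train = rng.random((200, 8))
y_train = (X_train[:, :1] > 0.5).astype(float)  # shape (200, 1)

input_size, hidden_size, output_size = 8, 4, 1
learning_rate, epochs = 0.5, 2000

W1 = rng.normal(0.0, 0.1, (input_size, hidden_size))
b1 = np.zeros((1, hidden_size))
W2 = rng.normal(0.0, 0.1, (hidden_size, output_size))
b2 = np.zeros((1, output_size))

n = len(X_train)
losses = []
for epoch in range(epochs):
    # Forward pass: input -> hidden -> output.
    hidden = sigmoid(X_train @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Record the mean squared error for this epoch.
    error = y_train - output
    losses.append(float(np.mean(error ** 2)))

    # Backward pass: push the error back through each sigmoid.
    delta_output = error * sigmoid_derivative(output)
    delta_hidden = (delta_output @ W2.T) * sigmoid_derivative(hidden)

    # Gradient step, averaged over the batch.
    W2 += learning_rate * hidden.T @ delta_output / n
    b2 += learning_rate * delta_output.mean(axis=0, keepdims=True)
    W1 += learning_rate * X_train.T @ delta_hidden / n
    b1 += learning_rate * delta_hidden.mean(axis=0, keepdims=True)
```

The losses list holds one value per epoch, so it can be plotted with Matplotlib to check that the error is falling as training proceeds.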
After I had trained the neural network on the training set, I evaluated it on the testing set:-
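Testing amounts to running a forward pass on the held-out images and comparing the predictions with the true labels. The sketch below shows one way to do that. The function names predict and accuracy, the parameter names, and the 0.5 decision threshold are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(X, W1, b1, W2, b2, threshold=0.5):
    """Run a forward pass and turn each probability into a 0/1 label."""
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    return (output >= threshold).astype(int).ravel()

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    return float(np.mean(y_true == y_pred))
```

With trained weights and a test set in hand, a call such as accuracy(y_test, predict(X_test, W1, b1, W2, b2)) gives the proportion of test images classified correctly.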
In this particular example, I achieved an accuracy of 59.39%. It is not the best accuracy that could be achieved, but it gives the reader insight into the mechanics of a neural network. Pretrained models are now widely available and will typically give much greater accuracy.
I have created a code review to accompany this blog post and it can be found here:- https://youtu.be/xbIY1rx9M3Q
