I have spent the past few days working on a book recommender system, using a Kaggle dataset for the purpose. The dataset can be found here: https://www.kaggle.com/datasets/zygmunt/goodbooks-10k?datasetId=1938&sortBy=dateRun&tab=profile
Unfortunately, recommender systems are very memory intensive, and I can no longer view or edit the code I wrote in Kaggle because the system keeps crashing. I did manage to copy and paste some of the code before that happened, so I decided to use it in this blog post and get something productive out of the time I spent on the project.
The model I used to build the recommendation system was sklearn’s nearest neighbors. The sklearn.neighbors module provides functionality for both unsupervised and supervised neighbors-based learning methods. The principle behind nearest neighbors is to find a predefined number of training samples closest in distance to a new point, and to predict a label from them. In the unsupervised case, the neighbors themselves are the result.
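As a rough illustration of the idea, and not the code from my Kaggle notebook, here is a minimal sketch of an unsupervised neighbor lookup. The tiny `ratings` matrix, the choice of cosine distance, and the variable names are all assumptions made up for this example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data (not from the goodbooks-10k dataset):
# each row is a book, each column is a user, values are ratings (0 = unrated).
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
])

# Assumption: cosine distance with a brute-force search, a common choice
# for small or sparse rating matrices.
model = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute")
model.fit(ratings)

# Look up the 2 rows closest to book 0. The query row is part of the
# fitted data, so the first neighbor returned is book 0 itself.
distances, indices = model.kneighbors(ratings[[0]])
print(indices[0])
print(distances[0])
```

In a recommender built this way, the neighbors returned after the query book itself would be the candidates to recommend.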
I created the program in Kaggle’s free online Jupyter Notebook. Once the notebook was set up, I imported the libraries I would need to run the program. The libraries I imported are:
- NumPy, which performs numerical computations,
- pandas, which creates and manipulates DataFrames,