One of the great things about Kaggle competitions is that they help people develop their data science skills. The most recent Kaggle playground competition, Season 3 Episode 25, is a regression problem that predicts the hardness of minerals. The competition can be found here:- https://www.kaggle.com/competitions/playground-series-s3e25/overview
Although there were other models I could have used, I decided to implement a linear regression model in JAX in order to show the different ways that JAX can be used. JAX is essentially a Just-In-Time (JIT) compiler focused on harnessing the maximum number of FLOPs to generate optimised code while retaining the simplicity of pure Python; JIT compilation is one of its salient features.
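To illustrate what JIT compilation looks like in practice, here is a minimal sketch (not from the original notebook; the `mse` function and its inputs are invented for illustration). Wrapping a pure function in `jax.jit` compiles it with XLA on the first call, and later calls reuse the compiled code:

```python
import jax
import jax.numpy as jnp

# A simple mean-squared-error function that JAX can trace and compile.
def mse(w, x, y):
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

# jax.jit compiles the function on first call; subsequent calls
# with the same shapes reuse the compiled version.
mse_jit = jax.jit(mse)

w = jnp.ones(3)
x = jnp.arange(6.0).reshape(2, 3)
y = jnp.array([1.0, 2.0])
loss = mse_jit(w, x, y)  # same result as mse(w, x, y), but compiled
```

The compiled and uncompiled versions return the same values; the speed-up only matters for larger workloads.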
I have not used JIT compilation in this problem because the dataset is small. One thing I have found is that, on Kaggle, larger datasets can crash the notebook when JAX's more sophisticated functions are used. Therefore, to keep things simple, I have used JAX functions that are compatible with NumPy. Although JAX's numpy API (`jax.numpy`) closely mirrors NumPy, it is not an exact duplicate, so some NumPy code has to be altered to make it compatible with JAX.
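A short sketch of the kind of alteration involved (the arrays here are illustrative, not taken from the competition data): most NumPy calls port directly to `jax.numpy`, but JAX arrays are immutable, so in-place assignment must be rewritten using the `.at[...]` syntax:

```python
import numpy as np
import jax.numpy as jnp

# Most NumPy code works unchanged under jax.numpy.
a_np = np.linspace(0.0, 1.0, 5)
a_jx = jnp.linspace(0.0, 1.0, 5)

# One key difference: JAX arrays are immutable.
a_np[0] = 10.0               # fine in NumPy (mutates in place)
a_jx = a_jx.at[0].set(10.0)  # JAX equivalent; returns a new array
```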
I have written the program in a free online Jupyter Notebook hosted by Kaggle and have saved it into my Kaggle account.
When the Jupyter Notebook was created, I imported the libraries that I would need, being:-
- pandas to create dataframes and process data,
- JAX to create the model and perform numerical operations,
- scikit-learn (sklearn) to provide machine learning functionality,
- os to navigate the file system and retrieve the files used in the program,
- Matplotlib to visualise the data, and
- Seaborn to produce statistical visualisations.
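The import cell might look something like the sketch below. The aliases are the conventional ones, and the specific sklearn imports are an assumption on my part, since the original post only says sklearn provides "machine learning functionality":

```python
import os

import pandas as pd
import jax
import jax.numpy as jnp
import matplotlib.pyplot as plt
import seaborn as sns

# Assumed sklearn utilities; commonly used in regression notebooks.
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
```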
I then used jax.random to set up a pseudorandom number generator (PRNG) key for the program:-
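Unlike NumPy, JAX handles randomness explicitly: you create a PRNG key from a seed and split it whenever fresh randomness is needed. A minimal sketch (the seed value 42 is a placeholder, not necessarily the one used in the notebook):

```python
import jax

# Create an explicit PRNG key from a seed.
key = jax.random.PRNGKey(42)

# Split the key so each random draw gets its own subkey,
# keeping results reproducible.
key, subkey = jax.random.split(key)
noise = jax.random.normal(subkey, (3,))
```

Reusing the same seed and the same splits reproduces exactly the same draws, which is what makes JAX programs deterministic by construction.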