Use JAX linear regression to predict the probability of software defects
I really enjoy working on Kaggle's community competitions because they let me try out new techniques in a competitive environment. In this competition, Season 3, Episode 23, I was predicting the probability of a software defect.
Python's machine learning library, sklearn, can predict class probabilities for some of its models through the predict_proba() method. However, not every algorithm, and not every library, provides predict_proba, so I had to improvise. I read on Stack Overflow that predicting a probability can be treated as a regression problem, so I decided to give it a try. Therefore, instead of using the logistic regression algorithm to solve this problem, I used a linear regression algorithm.
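A minimal sketch of that idea, assuming synthetic data rather than the competition files: a JAX linear regression fitted by gradient descent on 0/1 defect labels, with its output clipped to [0, 1] so it can be read as a probability. The function names (`fit`, `predict_probability`), the learning rate and the step count are illustrative choices, not the author's code.

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 0.1  # illustrative value, not taken from the post
STEPS = 500          # illustrative value, not taken from the post


def predict(params, X):
    """Plain linear regression: X @ w + b (the output is unbounded)."""
    w, b = params
    return X @ w + b


def mse_loss(params, X, y):
    """Mean squared error against 0/1 defect labels."""
    return jnp.mean((predict(params, X) - y) ** 2)


@jax.jit
def update(params, X, y):
    """One gradient-descent step; jax.grad differentiates w.r.t. (w, b)."""
    grads = jax.grad(mse_loss)(params, X, y)
    return tuple(p - LEARNING_RATE * g for p, g in zip(params, grads))


def fit(X, y):
    """Start from zero weights and run a fixed number of gradient steps."""
    params = (jnp.zeros(X.shape[1]), jnp.array(0.0))
    for _ in range(STEPS):
        params = update(params, X, y)
    return params


def predict_probability(params, X):
    """Clip the regression output to [0, 1] so it can be read as a probability."""
    return jnp.clip(predict(params, X), 0.0, 1.0)


# Hypothetical synthetic stand-in for the competition data, for illustration only.
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (200, 3))
true_w = jnp.array([1.0, -2.0, 0.5])
y = (X @ true_w > 0).astype(jnp.float32)

params = fit(X, y)
probs = predict_probability(params, X)
print(probs[:5])
```

The clipping step matters because, unlike logistic regression, a linear model does not squash its output through a sigmoid, so raw predictions can fall below 0 or above 1.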
I wrote the program in a free online Jupyter Notebook on Kaggle, which is stored in my account on that platform.
After creating the Jupyter Notebook, I imported the libraries I would need to run the program, namely:
- pandas to create dataframes and process data,
- os to access the operating system and retrieve the files used in the program,
- sklearn to provide machine learning functionality,
- JAX to create the linear regression model,
- Matplotlib to visualise the data, and
- Seaborn to visualise the data statistically.
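The imports listed above might look something like this at the top of the notebook. This is a sketch: the post does not say which sklearn modules were used, so `train_test_split` is only an assumed example, and `list_input_files` is a hypothetical helper wrapping the `os.walk` loop that Kaggle's notebook template uses to list the files under `/kaggle/input`.

```python
import os  # walk the file system to find the competition files

import jax  # JAX: automatic differentiation and compilation for the model
import jax.numpy as jnp  # NumPy-like arrays used by the linear regression
import matplotlib.pyplot as plt  # general plotting
import pandas as pd  # dataframes and data processing
import seaborn as sns  # statistical plots built on Matplotlib
from sklearn.model_selection import train_test_split  # assumed example of an sklearn helper


def list_input_files(root="/kaggle/input"):
    """Hypothetical helper: return the path of every file under `root`."""
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return paths


# On Kaggle this prints the competition's data files; elsewhere it prints nothing.
for path in list_input_files():
    print(path)
```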