Over the past couple of weeks I have been studying probability, so it was a pleasant surprise that this month’s Kaggle competition also concerned probabilities.
The task in this month’s competition was to predict the probability that an applicant would be given a loan. The dataset contained both categorical and numerical features, so the categorical features had to be encoded before they could be passed to the model. The target had a very large class imbalance, so I used a tree-based model, ExtraTreesClassifier, in an attempt to address that issue.
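To make that setup concrete, here is a minimal sketch of encoding a categorical column and fitting an ExtraTreesClassifier to get probabilities. It is not the exact notebook code: the toy data, the column names `income` and `home_ownership`, the choice of `OrdinalEncoder`, and the `class_weight="balanced"` setting are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data standing in for the competition dataset.
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n_rows),                       # numerical
    "home_ownership": rng.choice(["RENT", "OWN", "MORTGAGE"], n_rows),  # categorical
})

# Imbalanced binary target: 20 positives out of 200 rows (assumed ratio).
y = np.zeros(n_rows, dtype=int)
y[:20] = 1

# Encode the categorical column as integers; numerical columns pass through.
preprocess = ColumnTransformer(
    transformers=[("cat", OrdinalEncoder(), ["home_ownership"])],
    remainder="passthrough",
)

# class_weight="balanced" is one assumed way to counter the imbalance.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", ExtraTreesClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    )),
])
model.fit(df, y)

# Predicted probability of the positive class for each row.
proba = model.predict_proba(df)[:, 1]
```

Wrapping the encoder and the classifier in one `Pipeline` means the same encoding is applied automatically when predicting on the test set.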
I wrote the algorithm in Kaggle’s free Jupyter Notebook, which I saved to my account.
Once I had created the Jupyter Notebook, I imported the libraries that would be needed to execute it, namely:
- NumPy to create arrays and carry out numerical computations,
- pandas to create dataframes and process data,
- os to interact with the operating system,
- SciPy to perform statistical and scientific computations,
- scikit-learn (sklearn) to provide machine learning functionality,
- Matplotlib to visualise the data, and
- seaborn to produce statistical visualisations of the data.
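
As a sketch, the import cell for the libraries listed above might look like the following. The aliases (`np`, `pd`, `plt`, `sns`) are the common community conventions rather than something taken from the notebook itself, and in practice only the specific SciPy and scikit-learn submodules in use would be imported.

```python
import os                        # operating system access (paths, directory listings)

import numpy as np               # arrays and numerical computation
import pandas as pd              # dataframes and data processing
import scipy                     # statistical and scientific computation
import sklearn                   # machine learning functionality
import matplotlib.pyplot as plt  # plotting
import seaborn as sns            # statistical plotting built on Matplotlib
```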