In previous blog posts I have been studying SQL in order to enhance my skillset. For this post I have selected a BigQuery dataset that can be used in a Kaggle Jupyter Notebook. The dataset, a record of incidents in Austin, Texas, can be found here: https://www.kaggle.com/datasets/jboysen/austin-incidents/data
I created a Jupyter Notebook in Kaggle and saved it to my account. The program is written in Python, and the SQL queries are embedded as Python strings.
After creating the notebook to analyse this dataset, I imported the libraries that I would need to execute the program (a sketch of the import cell appears after this list):
- google.cloud.bigquery, the client library that lets Python query BigQuery datasets,
- bq_helper, a convenience wrapper around the BigQuery client that simplifies querying from Kaggle,
- NumPy to create arrays and perform numeric computations,
- pandas to create dataframes and process data,
- Matplotlib to visualise data, and
- seaborn to create statistical visualisations.
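
A minimal sketch of that import cell, assuming the standard Kaggle environment where google-cloud-bigquery and bq_helper are already installed:

```python
# Core libraries for the analysis (all pre-installed on Kaggle).
from google.cloud import bigquery   # BigQuery client library
import bq_helper                    # convenience wrapper around the client
import numpy as np                  # numeric arrays and computations
import pandas as pd                 # dataframes and data processing
import matplotlib.pyplot as plt     # plotting
import seaborn as sns               # statistical visualisations
```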
Once the libraries were imported, I set the dataset up to work with the bq_helper library by creating a BigQueryHelper object that points at it.
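
A minimal sketch of that setup, assuming the data is exposed through the bigquery-public-data project under a dataset name such as austin_incidents; the dataset and table names below are assumptions rather than values confirmed by the Kaggle page:

```python
import bq_helper

# Point bq_helper at the public BigQuery project and dataset.
# "austin_incidents" is an assumed dataset name; adjust if the
# Kaggle/BigQuery listing uses a different one.
austin = bq_helper.BigQueryHelper(active_project="bigquery-public-data",
                                  dataset_name="austin_incidents")

# Quick sanity checks before writing any SQL.
print(austin.list_tables())                # tables available in the dataset
austin.head("incidents_2016", num_rows=5)  # preview rows; table name is an assumption

# The SQL lives in an ordinary Python string and is passed to bq_helper,
# which runs it on BigQuery and returns a pandas DataFrame.
query = """
    SELECT COUNT(*) AS n_incidents
    FROM `bigquery-public-data.austin_incidents.incidents_2016`
"""
df = austin.query_to_pandas_safe(query, max_gb_scanned=1)
print(df)
```

Using query_to_pandas_safe with a scan limit is a guard against accidentally running a query that reads more data than the free quota allows.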