Practicalities involved in performing an Exploratory Data Analysis (EDA) in Python
Before making predictions from data, it is important to perform an exploratory data analysis (EDA) to glean as much information from that data as possible.
An EDA is a data analytics process for understanding the data in depth and learning its characteristics, often by visual means. It gives the user a better feel for the data and reveals the patterns inherent in it: which variables are important, which play no significant role in the output, and which are correlated with one another. Any errors in the data also need to be identified and corrected.
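As a rough illustration of that first look at the data, the sketch below uses pandas on a small made-up dataset. The column names (`area_m2`, `rooms`, `price`) and the values are assumptions chosen purely for the example, with `price` standing in for the output variable.

```python
import pandas as pd

# Hypothetical toy dataset; the column names and values are made up for illustration.
df = pd.DataFrame({
    "area_m2": [50, 65, 80, 95, 120, 150],
    "rooms": [2, 2, 3, 3, 4, 5],
    "price": [150, 190, 240, 280, 350, 440],
})

# Summary statistics for every numeric column (count, mean, std, min, quartiles, max).
summary = df.describe()

# Pairwise Pearson correlations between the numeric columns.
correlations = df.corr()

# Correlation of each feature with the assumed output column, strongest first.
target_corr = correlations["price"].drop("price").sort_values(ascending=False)

print(summary)
print(target_corr)
```

Variables that correlate strongly with the output are candidates for the important ones, while a pair of features that correlate strongly with each other may be carrying much the same information.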
The steps involved in an EDA are:
- Collect the data. Data can be collected individually or found on data science websites, such as Kaggle.
- Clean the data by removing unwanted variables and values from the dataset and correcting any irregularities. This involves removing missing values, outliers, and unnecessary rows and columns, as well as reindexing and reformatting the data.
- Perform a univariate analysis, which examines one variable at a time and can employ either graphical or non-graphical means.
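The cleaning and univariate steps might look something like the following in pandas. This is only a sketch: the data frame, its column names (`id`, `age`, `city`), and the choice of the 1.5 × IQR rule for spotting outliers are all assumptions made for the example, not a prescribed method.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value, an outlier, and an unwanted column.
raw = pd.DataFrame({
    "id": [101, 102, 103, 104, 105, 106, 107],
    "age": [23, 25, np.nan, 31, 29, 27, 240],
    "city": ["Leeds", "York", "Leeds", "Leeds", "York", "Bath", "York"],
})

# Drop a column that is not needed for the analysis.
clean = raw.drop(columns=["id"])

# Remove rows that contain missing values.
clean = clean.dropna()

# Remove outliers with the 1.5 * IQR rule (one common convention, assumed here).
q1, q3 = clean["age"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Keep the in-range rows and rebuild the index so it runs 0..n-1 again.
clean = clean[in_range].reset_index(drop=True)

# Non-graphical univariate analysis: look at one variable at a time.
age_stats = clean["age"].describe()           # numeric variable
city_counts = clean["city"].value_counts()    # categorical variable

print(age_stats)
print(city_counts)
```

For a graphical view of the same variable, `clean["age"].plot(kind="hist")` would draw a histogram, assuming matplotlib is installed.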