Exploratory Data Analysis – An Insight?
John Tukey, the American mathematician best known for developing the box plot and co-developing the Fast Fourier Transform algorithm, described exploratory data analysis as an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as the things we believe might be there. Large organisations face a constant challenge to excel. In this endeavour, they often plan their analytics haphazardly, without knowing what they are looking for or where to look. An exploratory study assesses the company's available data and audits it to understand where the gaps are. Turning a mountain of data into a molehill of suitable decisions is no easy task.
Exploratory data analysis is the first step in data analysis. It starts with assessing the data sets available within the company so that comprehensible decisions can be made. Here, structured and unstructured data are explored for patterns, characteristics and so on. The purpose is to identify trends or points for further analysis. Data exploration helps to create more structured data sets than number crunching alone. A successful exploratory data analysis combines internal and external data, as together they provide the context required to develop a model and interpret its results more accurately.
Data exploration is an art as well as a science. While machine learning in its advanced forms can help us bypass exploratory data analysis, it has been observed that a lack of proper analysis of the data sets leads only to crystal-ball gazing rather than to an exploration of the reasons and underlying problems. Exploratory data analysis often provides unexpected insights, some of which benefit many stakeholders. This form of analysis is more than just making vague predictions. It is about understanding your data and taking steps in the direction of data mining. It is a way to avoid building inaccurate models, or building accurate models on the wrong data. Exploratory data analysis has many advantages, among them helping to spot missing or wrong information.
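To make the idea of spotting missing or wrong information concrete, here is a minimal sketch in Python using only the standard library. The records, column names and plausibility ranges are hypothetical and chosen purely for illustration. In a real project the ranges would come from business rules, not from this example.

```python
# Hypothetical records: None marks a missing value; the columns are illustrative only.
records = [
    {"customer_id": 1, "age": 34, "monthly_spend": 120.0},
    {"customer_id": 2, "age": None, "monthly_spend": 80.5},
    {"customer_id": 3, "age": 210, "monthly_spend": 95.0},   # implausible age
    {"customer_id": 4, "age": 45, "monthly_spend": None},
    {"customer_id": 5, "age": 29, "monthly_spend": -15.0},   # negative spend
]

# Assumed plausibility ranges (inclusive); these are placeholders, not real business rules.
valid_ranges = {"age": (0, 120), "monthly_spend": (0.0, 100000.0)}


def audit(rows, ranges):
    """Count missing and out-of-range values for each column listed in `ranges`."""
    report = {col: {"missing": 0, "out_of_range": 0} for col in ranges}
    for row in rows:
        for col, (low, high) in ranges.items():
            value = row.get(col)
            if value is None:
                report[col]["missing"] += 1
            elif not (low <= value <= high):
                report[col]["out_of_range"] += 1
    return report


report = audit(records, valid_ranges)
for column, counts in report.items():
    print(column, counts)
```

A per-column count like this is only a starting point, but it shows quickly which fields need attention before any modelling is attempted.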
At Nu-Pie, we believe the CRISP-DM framework is best suited to exploratory analysis in a large organisation. Our in-house methodology involves the following stages of EDA.
1. Business Understanding – We emphasise that a strong understanding of the business processes is essential to exploratory analysis. It helps in providing insights with a strong business perspective and can serve as the foundation for implementing analytics.
2. Data Understanding – A key step in which the analyst gains a thorough understanding of the data. This helps the analyst become familiar with the data and weed out any data quality issues. A significant part of data understanding is mapping the internal processes and identifying the relevant data generated by them. During this stage, the optimal data extraction method is also evaluated.
3. Modelling – This stage begins with data preparation, which builds the final data set from the raw data. At this point, external data that could support additional analytics is strongly considered. Several tests, such as univariate and bivariate tests, are conducted on the data to understand the significance of each variable and its correlation with other data.
4. Evaluation – In the evaluation stage, the analyst tests candidate models and chooses, based on the above learnings, those that can best deliver results.
5. Deployment – Once the model has been evaluated successfully, it is deployed and feedback is gathered continuously from its users. This feedback is used to improve the model and achieve better insights.
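The univariate and bivariate checks mentioned in the Modelling stage can be sketched as follows. This is an illustrative example only, not Nu-Pie's actual tooling: the variables `ad_spend` and `sales` and their values are hypothetical, and Pearson correlation is used here as one common bivariate measure among several possible choices.

```python
import math
import statistics

# Hypothetical paired observations; the variable names and values are illustrative only.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]


def univariate_summary(values):
    """Describe a single variable: centre, spread and range."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return co_deviation / (spread_x * spread_y)


summary = univariate_summary(ad_spend)  # univariate: one variable at a time
r = pearson_correlation(ad_spend, sales)  # bivariate: relationship between two variables
print(summary)
print(round(r, 3))
```

A coefficient close to 1 or -1 suggests a strong linear relationship worth examining further, while a value near 0 suggests little linear association. Neither result says anything about causation.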
Exploring the data successfully ensures that companies don’t miss out on opportunities that could strengthen their position in the market.
An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem – John Tukey
Author: Jerrin Thomas; Co-author: Benila Jacob