Business Intelligence
Data Readiness for Analytics
The quality and readiness of data determine the outcome and precision of an analytics system. According to Sharda et al. (2020), the metrics that define how ready data is for an analytics study are data source reliability, content accuracy, accessibility, security and privacy, richness, consistency, currency/timeliness, granularity, validity, and relevancy (p. 124). Data typically requires a significant amount of preprocessing before the analytics process can start. “Data issues such as nonprinting characters, misspellings, text embedded in the quantitative data, interpretation, imputation, or unit conversions all must be accomplished before the data is ready for analysis” (Snyder, 2019).
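To make a few of the issues Snyder lists concrete, the following is a minimal Python sketch that strips nonprinting characters, pulls a number out of text embedded in a quantitative field, and converts units. The field (a body weight), the function names, and the rule that unlabeled values are kilograms are all assumptions made for illustration; they do not come from either source.

```python
import re
import unicodedata
from typing import Optional

LB_TO_KG = 0.45359237  # definition of the international avoirdupois pound


def strip_nonprinting(text: str) -> str:
    """Remove control and format characters (Unicode categories starting with 'C')."""
    kept = (ch for ch in text if unicodedata.category(ch)[0] != "C")
    return "".join(kept).strip()


def parse_weight_kg(raw: str) -> Optional[float]:
    """Extract a weight from messy text and return it in kilograms.

    Assumption for this sketch: a value with no unit label is already in kg.
    Returns None when no number can be found, so the caller can decide
    whether to impute or discard the record.
    """
    cleaned = strip_nonprinting(raw).lower()
    match = re.search(r"(\d+(?:\.\d+)?)\s*(kg|lbs?)?", cleaned)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2) or "kg"
    return value * LB_TO_KG if unit.startswith("lb") else value
```

A call such as parse_weight_kg("approx. 150 lbs") returns the weight converted to kilograms, while an entry such as "n/a" returns None and is left for a later imputation decision.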
Data Preprocessing Steps
Data preprocessing consists of four steps. The first step is data consolidation, during which the relevant data is collected, selected, and merged (Sharda et al., 2020, p. 129). During this data gathering, a significant amount of irrelevant data is discarded, and only the relevant data is selected and integrated. The second step is data cleansing, in which missing values are handled, noise in the data is identified and reduced, and erroneous data is found and eliminated (Sharda et al., 2020, p. 129). This step matters because it brings clarity to the collected data. The third step is data transformation, in which the data is discretized and/or aggregated, for example by converting numerical variables to categorical values, or by grouping a state variable with 50 distinct values into several regions (Sharda et al., 2020, p. 130). The final step is data reduction, in which the number of attributes is reduced, the number of records is reduced, and skewed data is balanced (Sharda et al., 2020, p. 131). According to Snyder (2019), “80% of the time spent by a data scientist is on gathering, cleansing, and storing the data, while 20% of the time is spent on analyzing the data” (pg. 23). For the best outcome, it is critical to preprocess data before analytics is performed.
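The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer and order records, the state-to-region mapping, the choice of the median for imputation, and the rule that a negative amount is erroneous are all hypothetical, and balancing skewed data is left out for brevity.

```python
from statistics import median

# Hypothetical raw sources: customer records and a separate order lookup.
customers = [
    {"id": 1, "state": "NY", "fax": None},
    {"id": 2, "state": "CA", "fax": None},
    {"id": 3, "state": "TX", "fax": None},
    {"id": 4, "state": "WA", "fax": None},
]
orders = {1: 120.0, 2: None, 3: -5.0, 4: 80.0}  # id -> purchase amount

# Hypothetical mapping used to aggregate states into regions.
STATE_TO_REGION = {"NY": "Northeast", "CA": "West", "WA": "West", "TX": "South"}


def consolidate(customers, orders):
    """Step 1: merge the two sources on the shared id."""
    return [{**c, "amount": orders.get(c["id"])} for c in customers]


def cleanse(rows):
    """Step 2: drop erroneous (negative) amounts, then impute missing ones."""
    valid = [r for r in rows if r["amount"] is None or r["amount"] >= 0]
    known = [r["amount"] for r in valid if r["amount"] is not None]
    fill = median(known)  # assumption: median imputation is acceptable here
    return [{**r, "amount": fill if r["amount"] is None else r["amount"]} for r in valid]


def transform(rows):
    """Step 3: aggregate the state variable into a coarser region variable."""
    return [{**r, "region": STATE_TO_REGION[r["state"]]} for r in rows]


def reduce_attributes(rows, keep=("id", "region", "amount")):
    """Step 4: keep only the attributes needed for the analysis."""
    return [{k: r[k] for k in keep} for r in rows]


prepared = reduce_attributes(transform(cleanse(consolidate(customers, orders))))
```

Each function takes and returns a plain list of records, so the steps can be reordered or replaced independently. In practice a data frame library would usually take the place of these list comprehensions.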
References
Sharda, R., Delen, D., & Turban, E. (2020). Deep learning. Analytics, data science, & artificial
intelligence: Systems for decision support (pp. 121-145). NJ, Pearson.
Snyder, J. (2019). Data Cleansing: An Omission from Data Analytics Coursework. Information
Systems Education Journal, 17(6), 22–29.