Running head: CS688 – Data Analytics with R
CS688 – Data Analytics with R
Surendra Parimi
CS688 – Introduction to CRISP-DM and the R platform IP 1
Colorado Technical University
07/10/2019
Table of Contents
Introduction to CRISP-DM and the R Platform
    Organizational Background
    CRISP-DM (Cross-Industry Standard Process for Data Mining)
    Data Maturity
    Role of Data Analyst
    How Do We Implement the R Platform
R Modeling With Regressions and Classifications (TBD)
Model Performance Evaluation (TBD)
Visualizations With R (TBD)
Machine Learning (TBD)
References
Introduction to CRISP-DM and the R Platform
Organizational Background:
The organization I currently work for, and where I plan to apply the techniques from this data analytics course, is T-Mobile USA, which offers wireless mobile phone services to over 80 million customers in the United States. It is a large enterprise whose large-scale information technology systems support the business. The company is seeing significant business growth, and the IT systems that support the business are growing with it. As a DevOps engineer, I deploy code to these mission-critical systems, host them, and operate them to make sure they work as expected. As the landscape of our IT systems grows, we want to identify issues in advance so that we can prevent them before they cause a business outage. To achieve this, our IT system logs need to be analyzed in depth to uncover critical insights about system performance, and that feedback needs to be applied to improve our systems.
CRISP-DM (Cross-Industry Standard Process for Data Mining):
CRISP-DM helps us ensure that our data analysis adheres to certain standards, and it is a proven methodology worldwide. Corporations such as IBM have further enhanced or customized the standard and come up with their own methodology, known as the 'Analytics Solutions Unified Method for Data Mining/Predictive Analytics (ASUM-DM)'.
The CRISP-DM methodology involves six steps:
Business Understanding: Building knowledge of the business requirements and objectives from a functional perspective, and transforming this knowledge into a data mining objective with an implementation plan.
Data Understanding: Collecting data from diverse sources, then reviewing and understanding it in order to identify problems that compromise data quality and to gain an initial understanding of what the data can deliver.
Data Preparation: The data preparation phase covers all activities to build the final dataset from the initial raw data collected.
Modeling: Modeling techniques are chosen based on the objective of the problem being addressed. The problem determines the model, and the model in turn determines which data is collected.
Evaluation: The evaluation phase begins once a few models have been tested. The models are evaluated against each other to see which one best fits the need.
Deployment: Generally this means deploying a code representation of the model into an operational system to score or categorize new, unseen data as it arrives, and creating a mechanism for using that new information to solve the original business problem. Importantly, the code representation must also include all the data preparation steps leading up to modeling, so that the model treats new raw data in the same manner as during model development.
You may well observe that there is nothing special here and that’s largely true. From today’s data science perspective this seems like common sense. This is exactly the point. The common process is so logical that it has become embedded into all our education, training, and practice.
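The deployment requirement described above, that data preparation must travel with the model, can be sketched in R as follows. This is a minimal illustration under stated assumptions, not our production approach: the data frame, the column names (requests, latency_ms), and the helper functions prepare() and score() are all hypothetical, and the data is made up.

```r
# Hypothetical hourly summary derived from application logs (made-up values).
raw <- data.frame(
  requests   = c(120, 340, NA, 560, 410, 290, 630, 150),
  latency_ms = c(35, 61, 48, 97, 72, 55, 110, 38)
)

# Data preparation kept as a function so the same steps run at scoring time.
# fill_value is learned from the training data and passed in explicitly.
prepare <- function(df, fill_value) {
  df$requests[is.na(df$requests)] <- fill_value
  df$log_requests <- log1p(df$requests)
  df
}

# Model development: learn the fill value, prepare the data, fit the model.
fill_value <- median(raw$requests, na.rm = TRUE)
train <- prepare(raw, fill_value)
model <- lm(latency_ms ~ log_requests, data = train)

# The deployable artifact bundles the preparation parameter with the model.
artifact <- list(fill_value = fill_value, model = model)

# Scoring applies the identical preparation to new raw data before predicting.
score <- function(artifact, new_raw) {
  prepared <- prepare(new_raw, artifact$fill_value)
  predict(artifact$model, newdata = prepared)
}

new_raw <- data.frame(requests = c(200, NA))
predictions <- score(artifact, new_raw)
```

One option for moving such an artifact between environments is base R's saveRDS() and readRDS(), which would keep the fill value and the fitted model in a single file.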
Data Maturity:
Data maturity in our organization has so far followed a traditional approach. Applications write critical data, ranging from business and transactional records to application logs, which include diverse information such as the functional health of the application and nonfunctional aspects such as performance. All of this data is written into RDBMS databases as tables and rows, and it is organized so that the data warehouse schemas logically segregate the data by business function. Over time, as the systems matured, we realized that there is an abundance of useful information in our data that the RDBMS design does not capture, and we applied big data techniques to include the unstructured data that holds a lot of critical information for our business.
Once we started including unstructured data in our analysis, we were able to draw better insights, but the results were still not perfect. We therefore adopted additional research methodologies to focus on the right data (data quality), which helps us get the right insights and aids critical decisions, from business decisions to better maintenance of our application suite based on the insights we get from our application logs.
Having arrived at this stage, we are able to perform predictive analyses to some extent. Based on the results, we are updating our data strategy so that the right data is captured in the systems from the very beginning and a robust analysis can be done to achieve critical insights through prediction, research, and classification.
A separate metadata repository has been created with logical grouping so that data organization and utilization are efficient and accurate. Metadata management has also helped us avoid redundant tasks in dealing with the data and has sped up analytics. All teams, including stakeholder teams, were involved and agreed on the metadata management policies, which were implemented across the organization to achieve consistency across systems.
Role of Data Analyst:
Dealing with the cutting-edge needs of data analysis makes the analyst's role interesting and very dynamic. While using the data, the analyst now thinks about what kind of data needs to be collected and calls out missing data elements beforehand, so that all the critical pieces of data, both structured and unstructured, get into the data analytics system. The analyst is expected to be able to visualize the insights to be drawn from the data, perform relative comparisons, and develop predictive models.
To cover all the diverse insights that the data has to offer, the analyst needs to think about correlation factors and about the data needed to correlate, compare, and apply the predictive models.
Moving the analyst's role from typical analysis to data-focused analysis would greatly enhance the analyst's understanding of the organizational objectives and of how to extract the desired insights from the data. More importantly, it would lead the analyst to collaborate with all the required teams to get the right data to work on.
How Do We Implement the R Platform:
We use the R platform's robust data analytics capability together with RStudio to develop and experiment with our planning and implementation. Because R is an integrated suite, we plan to use its computing power for statistical data manipulation and its graphical display for analysis, presentations, and reviews. R's suite of operators for calculations over arrays of data, especially with huge data sets, would be very helpful in achieving our goal.
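As a small illustration of calculations over arrays, the sketch below summarizes a matrix of response times per service and flags unusual readings. The service names, the values, and the 1.5x-median threshold are assumptions made up for this example; they do not come from our systems.

```r
# Hypothetical response times (ms): one row per service, one column per interval.
latency <- matrix(
  c(110, 120, 115, 400,
     80,  85,  90,  88,
    200, 210, 205, 198),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("billing", "auth", "catalog"), paste0("t", 1:4))
)

# Whole-array operations: no explicit loops are needed.
service_mean   <- rowMeans(latency)
service_max    <- apply(latency, 1, max)
service_median <- apply(latency, 1, median)

# Flag readings above 1.5x the service's own median (arbitrary threshold).
# The length-3 median vector is recycled down each column, so every row
# is compared against its own median.
flagged <- latency > 1.5 * service_median

# Row and column positions of the flagged readings.
flagged_cells <- which(flagged, arr.ind = TRUE)
```

With these made-up values, only the last billing reading exceeds its threshold. The same expressions apply unchanged to a much larger matrix.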
R offers linear and nonlinear modeling, time-series analysis, and various other statistical techniques as libraries, and it is easy to implement in our organization because it lays the groundwork for data analysis.
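A brief sketch of two of these capabilities, using only the stats package that ships with R. The hourly request-rate series is synthetic, generated here for illustration, and the variable names are hypothetical.

```r
set.seed(1)  # make the synthetic noise reproducible

# Synthetic hourly request rate over three days:
# a slow upward trend, a daily cycle, and random noise.
hours <- 1:72
req_rate <- 100 + 0.5 * hours + 20 * sin(2 * pi * hours / 24) + rnorm(72, sd = 3)

# Linear modeling: estimate the overall trend with lm().
trend_fit <- lm(req_rate ~ hours)
slope <- coef(trend_fit)[["hours"]]

# Time-series analysis: declare a 24-hour cycle and split the series
# into trend, seasonal, and random components with decompose().
req_ts <- ts(req_rate, frequency = 24)
parts <- decompose(req_ts)
```

Calling summary(trend_fit) or plot(parts) would be the natural next step for inspecting the fit and the components. Nonlinear fits are available through nls() in the same package.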
R Modeling With Regressions and Classifications (TBD)
Model Performance Evaluation (TBD)
Visualizations With R (TBD)
Machine Learning (TBD)
References
https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/
https://www.r-project.org/about.html
https://www.dataversity.net/the-5-stages-of-the-data-maturity-model/