Project Report
Applied Data Analytics Capstone Report
for
PROJECT NAME
Student Name
for
Course Number and Name
Professor Name
Prepared as course requirement for National Louis University
Date Approved:
_______________________
Month/Year
TABLE OF CONTENTS
1. EXECUTIVE SUMMARY
2. INTRODUCTION
3. DATA SELECTION APPROACH
4. METHODS USED
5. ANALYSIS AND RESULTS
5.1 Data Analysis Objectives
5.2 Preparatory Work – Data Editing
5.3 Preparatory Work – Data Cleaning
5.4 Preparatory Work – Data Coding
5.5 Types of Variables (Scales of Measurement)
5.6 Data Reduction
5.7 Data Visualization and Meaning
5.8 Sample Size (Training and Test Data)
5.9 Analysis Bias
6. CONCLUSION
6.1 Model Summary
6.1.1 Actual vs. Predicted Values
6.2 Coefficient Interpretation
6.3 Evaluation Metrics
6.3.1 Mean Absolute Error Result and Interpretation
6.3.2 Mean Square Error Result and Interpretation
6.3.3 Root Mean Square Error Result and Interpretation
6.3.4 R-Squared Score Result and Interpretation
APPENDICES
A. Independent and Dependent Variable Distribution Graph
B. HeatMap Plot Variable Relationships
C. Pair Plot Variable Relationships
D. Others
E. Etc.
1. EXECUTIVE SUMMARY
Summarize the data study scope and accomplishments.
2. INTRODUCTION
This section should briefly state the purpose of the study, the dataset being used, and the correlation being sought between the dataset's dependent and independent variables.
3. DATA SELECTION APPROACH
Describe the acquisition approach used to get the dataset (where it came from and the rationale for choosing it). Describe the most important data used for the analysis. This should include the types of data, the record count, any pre-processing needed, whether there were missing values and the steps taken to handle them, and any need for formatting, normalizing, or transforming categorical values into quantitative variables.
4. METHODS USED
Describe the methods you used to gather the data and the setup used for the analysis.
5. ANALYSIS AND RESULTS
Include in this section what was analyzed and the conclusions you drew from the analysis. Insert any charts you created from the data in this section. The analysis should be based on techniques that have been covered in class, including descriptive statistics, correlations, ANOVA, linear regression, and multiple linear regression.
5.1 Data Analysis Objective(s)
This section should describe the rationale for the data analysis.
Example: The dataset chosen provides information on the number of COVID-19 tests performed and the number of individuals who were infected. The data analysis seeks to find the relationship (correlation) between the number of tests and the number of infected cases.
· Independent variable – Number of tests
· Dependent variable – Number of infected cases
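As a rough illustration of the objective above, the correlation between an independent and a dependent variable can be computed directly. The sketch below uses only the Python standard library; the function name `pearson_r` and the daily figures are hypothetical and invented for illustration, not taken from any real dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily figures, invented for illustration only.
tests = [100, 200, 300, 400, 500]      # independent variable
infected = [12, 19, 33, 41, 48]        # dependent variable

r = pearson_r(tests, infected)
print(f"Pearson r = {r:.3f}")
```

A value of r near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.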
5.2 Data Editing
This section describes if any data was reclassified in the dataset being analyzed.
Example: The dataset contained different ways of coding gender (F, M, Female, Male, Fimale, Mal). The column was standardized to M for males and F for females.
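One possible way to carry out the standardization in the example is a lookup table that maps every observed spelling to a single code. This is a minimal sketch; the names `GENDER_CODES` and `standardize_gender` are hypothetical, and returning `None` for unrecognized values is an assumption made so that unexpected entries can be reviewed rather than silently guessed.

```python
# Map every observed spelling (lower-cased) to a standard code.
# The variants are the ones listed in the example above.
GENDER_CODES = {
    "f": "F", "female": "F", "fimale": "F",
    "m": "M", "male": "M", "mal": "M",
}

def standardize_gender(value):
    """Return 'F' or 'M', or None when the value is not recognized."""
    return GENDER_CODES.get(value.strip().lower())

raw = ["F", "M", "Female", "Male", "Fimale", "Mal"]
cleaned = [standardize_gender(v) for v in raw]
print(cleaned)
```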
5.3 Data Cleaning
This section describes if any data was edited in the dataset being analyzed.
Example: The dataset was adjusted to remove extraneous information. The longitude and latitude features were removed because the city location was also part of the dataset.
5.4 Data Coding
This section describes if any data was coded in the dataset being analyzed.
Example: The dataset was recoded to identify neighborhood areas by name instead of number (e.g., Neighborhood 342 was recoded to “The north side” and Neighborhood 231 was recoded to “The east side”).
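The recoding in the example could be sketched as follows. The table `NEIGHBORHOOD_NAMES` and the helper `recode_neighborhood` are hypothetical; only the two code-to-name pairs come from the example, and labeling unmapped codes instead of dropping them is an assumption.

```python
# Hypothetical lookup table; the two entries come from the example above.
NEIGHBORHOOD_NAMES = {
    342: "The north side",
    231: "The east side",
}

def recode_neighborhood(code):
    """Return the neighborhood name, keeping unknown codes visible for review."""
    return NEIGHBORHOOD_NAMES.get(code, f"Unmapped ({code})")

codes = [342, 231, 999]
recoded = [recode_neighborhood(c) for c in codes]
print(recoded)
```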
5.5 Types of Variables (Scales of Measurement)
The purpose of this section is to define the variables included in the dataset and the scales of measurement.
Example: The dependent variable identified in the dataset is home price (unit of measurement: thousands). The independent variables are average income per household (in thousands) and number of bathrooms (in single units, decimal).
5.6 Data Reduction
The purpose of this section is to outline whether any invalid data (null values) were found in the dataset and whether the affected records (rows) were removed. This section also identifies whether any extraneous features (columns) were removed as a result of identified duplication.
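The two reduction steps described here, removing rows with null values and removing duplicated columns, might look like the following sketch. The records, column names, and the helper `duplicate_columns` are all hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical records; None marks a missing (null) value.
rows = [
    {"city": "Chicago", "city_name": "Chicago", "price": 310, "income": 72},
    {"city": "Evanston", "city_name": "Evanston", "price": None, "income": 95},
    {"city": "Skokie", "city_name": "Skokie", "price": 280, "income": 68},
]

# Step 1: drop rows that contain any null value.
complete = [r for r in rows if all(v is not None for v in r.values())]

def duplicate_columns(records):
    """Return names of columns whose values repeat an earlier column exactly."""
    seen = {}
    duplicates = []
    for name in records[0]:
        values = tuple(r[name] for r in records)
        if values in seen:
            duplicates.append(name)
        else:
            seen[values] = name
    return duplicates

# Step 2: drop columns that duplicate another column.
extra = duplicate_columns(complete)
reduced = [{k: v for k, v in r.items() if k not in extra} for r in complete]

print(f"rows: {len(rows)} -> {len(reduced)}, dropped columns: {extra}")
```

Reporting the row count before and after, as the last line does, gives the reader a concrete measure of how much data was removed.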
5.7 Data Visualization and Meaning
Discuss in this section the distribution of your dependent and independent variables. What graphs (plots) did you use to identify any outliers in your dataset? Include any graphs used (distribution or heatmap graphs).
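Outliers spotted on a plot can be confirmed numerically. The sketch below applies the common 1.5 × IQR rule, which is the rule a standard box plot typically uses to draw its whiskers. The helper `iqr_outliers` and the price values are hypothetical; the quartiles use the default method of `statistics.quantiles`, and other quartile methods can give slightly different cutoffs.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

# Hypothetical home prices in thousands; 950 is planted as an outlier.
prices = [210, 225, 230, 240, 245, 250, 260, 275, 950]
outliers = iqr_outliers(prices)
print(outliers)
```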
5.8 Sample Size (Training and Test Data)
The purpose of this section is to identify the chosen split between the test and training data during your model building. What is the rationale of your percentage split of the dataset?
This section should include a copy of the code used to train the model, showing the data split between the training and test subsets.
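For reference, a split between training and test data can be sketched as below. This is a hand-written helper using only the standard library, not the function of any particular machine learning package; the name `split_train_test`, the 80/20 split, and the seed value are assumptions chosen for illustration.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 10 (x, y) observations.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = split_train_test(data, test_fraction=0.2)
print(len(train), len(test))
```

Shuffling before splitting guards against any ordering in the source file (for example, records sorted by date) leaking into one subset.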
5.9 Analysis Bias
The purpose of this section is to identify any bias in the dataset. Outline the different kinds of bias that can be introduced into a dataset.
6. CONCLUSION
The purpose of this section is to outline the conclusions drawn from the analysis of the chosen dataset. Do the independent variables impact the dependent variable? If so, provide some examples from the data.
6.1 Model Summary
The purpose of this section is to identify the model approach used and any other conclusions pertinent to the analysis.
6.1.1 Actual vs. Predicted Values
The purpose of this section is to outline/list the actual vs. predicted values of your model. Are the differences between the actual and predicted values within acceptable range(s), or too far apart? What does the difference tell us?
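A side-by-side listing of actual and predicted values, with the difference (residual) for each row, is one simple way to present this comparison. The values below are hypothetical and stand in for the output of a fitted model.

```python
# Hypothetical actual and predicted values from a fitted model.
actual = [250, 300, 410, 520]
predicted = [260, 290, 400, 560]

# Residual = actual - predicted; positive means the model under-predicted.
residuals = [a - p for a, p in zip(actual, predicted)]

print(f"{'actual':>8} {'predicted':>10} {'residual':>9}")
for a, p, r in zip(actual, predicted, residuals):
    print(f"{a:>8} {p:>10} {r:>9}")
```

Residuals that are small and scattered around zero suggest a reasonable fit, while residuals that grow with the size of the actual value suggest the model is missing part of the relationship.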
6.2 Coefficient Interpretation
The purpose of this section is to list the model coefficient(s) and explain what each means in relation to the dataset variables (dependent and independent variables).
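To show how a coefficient is read, the sketch below fits a simple linear regression with the closed-form least squares formulas. The helper `fit_line` and the income and price figures are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: household income vs. home price, both in thousands.
income = [40, 50, 60, 70, 80]
price = [180, 220, 250, 300, 330]

slope, intercept = fit_line(income, price)
print(f"price = {intercept:.1f} + {slope:.2f} * income")
```

With these invented numbers the slope is 3.8, which would be interpreted as: each additional thousand in household income is associated with a home price about 3.8 thousand higher.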
6.3 Evaluation Metrics
The purpose of this section is to list the different evaluation metrics that can be used in the analysis. What does each metric mean?
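The four metrics covered in the subsections that follow can all be computed from the same pairs of actual and predicted values. This is a minimal sketch using only the standard library; the function name `regression_metrics` and the sample values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, and R-squared for paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n       # average size of the error
    mse = sum(e ** 2 for e in errors) / n       # penalizes large errors more
    rmse = math.sqrt(mse)                       # same units as the target
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e ** 2 for e in errors)
    r2 = 1 - ss_residual / ss_total             # share of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical actual and predicted values.
actual = [250, 300, 410, 520]
predicted = [260, 290, 400, 560]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MAE and RMSE are expressed in the units of the dependent variable, which makes them easier to explain than MSE. R-squared has no units; a value closer to 1 means the model explains more of the variation in the dependent variable.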
6.3.1 Mean Absolute Error Result
The purpose of this section is to list the Mean Absolute Error of your analysis.
6.3.2 Mean Square Error Result
The purpose of this section is to list the Mean Square Error of your analysis.
6.3.3 Root Mean Square Error Result
The purpose of this section is to list the Root Mean Square Error of your analysis.
6.3.4 R-Squared Score Result
The purpose of this section is to list the R-Squared Score of your analysis.
APPENDICES
Data Analysis Reports use appendices to include supporting data, additional detailed graphs, plots, etc., or more information than what is suggested in the main text of the document. While no appendices are required for this final assignment, the following are examples of what can be included as appendices.
Appendix A
Independent and Dependent Variable Distribution Graph
Appendix B
HeatMap Plot Variable Relationships
Appendix C
Pair Plot Variable Relationships
Appendix D
Others