Project Report

Applied Data Analytics Capstone Report

for

PROJECT NAME

Student Name

for

Course Number and Name

Professor Name

Prepared as course requirement for National Louis University

Date Approved:

_______________________

Month/Year

TABLE OF CONTENTS

1. EXECUTIVE SUMMARY

2. INTRODUCTION

3. DATA SELECTION APPROACH

4. METHODS USED

5. ANALYSIS AND RESULTS

5.1 Data Analysis Objectives

5.2 Preparatory Work – Data Editing

5.3 Preparatory Work – Data Cleaning

5.4 Preparatory Work – Data Coding

5.5 Types of Variables (Scales of Measurement)

5.6 Data Reduction

5.7 Data Visualization and Meaning

5.8 Sample Size (Training and Test Data)

5.9 Analysis Bias

6. CONCLUSION

6.1 Model Summary

6.1.1 Actual vs. Predicted Values

6.2 Coefficient Interpretation

6.3 Evaluation Metrics

6.3.1 Mean Absolute Error Result and Interpretation

6.3.2 Mean Square Error Result and Interpretation

6.3.3 Root Mean Square Error Result and Interpretation

6.3.4 R-Squared Score Result and Interpretation

APPENDICES

A. Independent and Dependent Variable Distribution Graph

B. HeatMap Plot Variable Relationships

C. Pair Plot Variable Relationships

D. Others

E. Etc.

1. EXECUTIVE SUMMARY

Summarize the data study scope and accomplishments.

2. INTRODUCTION

This section should briefly state the purpose of the study, the dataset being used, and the correlation being sought between the dataset variables (dependent and independent).

3. DATA SELECTION APPROACH

Describe the approach used to acquire the dataset (where it came from and the rationale for choosing it). Describe the most important data used for analysis in this section. This should include the types of data, the record count, any pre-processing needed, whether there were missing values and the steps taken to handle them, and any need for formatting, normalizing, or transforming categorical values into quantitative variables.

4. METHODS USED

Describe the methods you used to gather the data and the setup used for analysis.

5. ANALYSIS AND RESULTS

Include in this section what was analyzed and the conclusions you drew from the analysis. Insert any charts you created from the data in this section. The analysis should be based on techniques that have been covered in class, including descriptive statistics, correlations, ANOVA, linear regression, and multiple linear regression.

5.1 Data Analysis Objective(s)

This section should describe the rationale for the data analysis.

Example : The chosen dataset provides information on the number of COVID-19 tests performed and the number of individuals who were infected. The data analysis seeks to determine the relationship (correlation) between the number of individuals tested and the number who become infected with the virus.

· Independent variable – Number of tests

· Dependent variable – Number of infected cases
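
One way to quantify the relationship between an independent and a dependent variable is the Pearson correlation coefficient. The following is a minimal sketch using only the Python standard library. The function name `pearson_r` and the sample values are illustrative assumptions, not data from any real dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, not real COVID-19 data.
tests = [100, 200, 300, 400, 500]        # independent variable
infected = [12, 19, 33, 41, 48]          # dependent variable

r = pearson_r(tests, infected)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none.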

5.2 Data Editing

This section describes if any data was reclassified in the dataset being analyzed.

Example : The dataset contained different ways of coding gender (F, M, Female, Male, Fimale, Mal). The column was standardized to denote M for males and F for females.
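
The gender standardization described in the example could be sketched as follows. The mapping `GENDER_MAP` and the helper `standardize_gender` are hypothetical names introduced for illustration. The raw codes are the ones listed in the example above.

```python
# Hypothetical mapping from lower-cased raw codes to the two standard codes.
GENDER_MAP = {
    "f": "F", "female": "F", "fimale": "F",
    "m": "M", "male": "M", "mal": "M",
}

def standardize_gender(raw):
    """Return 'F' or 'M' for a known raw code, or None if it is unrecognized."""
    return GENDER_MAP.get(raw.strip().lower())

raw_values = ["F", "M", "Female", "Male", "Fimale", "Mal"]
cleaned = [standardize_gender(value) for value in raw_values]
print(cleaned)
```

Returning None for unrecognized codes keeps unexpected values visible for manual review instead of silently guessing a category.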

5.3 Data Cleaning

This section describes if any data was edited in the dataset being analyzed.

Example : The dataset was adjusted to remove extraneous information. The longitude and latitude features were removed because the city location was also part of the dataset.

5.4 Data Coding

This section describes if any data was coded in the dataset being analyzed.

Example : The dataset was recoded to clearly define neighborhood areas by name instead of number (e.g., Neighborhood 342 was recoded to “The north side” and Neighborhood 231 was recoded to “The east side”).
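
A recoding like the one in the example can be expressed as a lookup table. This is a minimal sketch: the two code-to-name pairs come from the example above, while the column name `neighborhood`, the helper `recode_neighborhood`, and the extra code 999 are assumptions made for illustration.

```python
# Code-to-name pairs taken from the example above.
NEIGHBORHOOD_NAMES = {342: "The north side", 231: "The east side"}

def recode_neighborhood(code):
    """Return the neighborhood name for a code, leaving unknown codes unchanged."""
    return NEIGHBORHOOD_NAMES.get(code, code)

# Hypothetical records; 999 stands in for a code that has no name assigned.
rows = [{"neighborhood": 342}, {"neighborhood": 231}, {"neighborhood": 999}]
recoded = [
    {**row, "neighborhood": recode_neighborhood(row["neighborhood"])}
    for row in rows
]
print(recoded)
```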

5.5 Types of Variables (Scales of Measurement)

The purpose of this section is to define the variables included in the dataset and the scales of measurement.

Example : The dependent variable identified in the dataset represents home prices (unit of measurement: thousands). The independent variables are average income per household (in thousands) and number of bathrooms (in single units, decimal).

5.6 Data Reduction

The purpose of this section is to outline whether any invalid data (null values) were found in the dataset and whether the affected records (rows) were removed. This section also identifies whether any extraneous features (columns) were removed as a result of identified duplication.
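
The two reduction steps, removing rows with null values and removing duplicated columns, could be sketched as follows. The helper names, column names, and sample records are hypothetical, and null values are assumed to be represented as None.

```python
def drop_null_rows(rows):
    """Keep only the rows in which every value is present (not None)."""
    return [row for row in rows if all(v is not None for v in row.values())]

def drop_duplicate_columns(rows):
    """Remove any column whose values repeat an earlier column in every row."""
    if not rows:
        return rows
    seen = set()
    keep = []
    for col in rows[0]:
        values = tuple(row[col] for row in rows)
        if values not in seen:
            seen.add(values)
            keep.append(col)
    return [{col: row[col] for col in keep} for row in rows]

# Hypothetical records: one null price and one duplicated column.
rows = [
    {"price": 250, "income": 80, "income_copy": 80},
    {"price": None, "income": 60, "income_copy": 60},
    {"price": 310, "income": 95, "income_copy": 95},
]
reduced = drop_duplicate_columns(drop_null_rows(rows))
print(reduced)
```

Reporting the row and column counts before and after each step makes the extent of the reduction easy to document in this section.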

5.7 Data Visualization and Meaning

Discuss in this section the distribution of your dependent and independent variables. What graphs (plots) did you use to identify any outliers in your dataset? Include any graphs used (distribution or heatmap graphs).
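
Outliers seen in a plot can be cross-checked numerically. The sketch below applies the common 1.5 × IQR rule, which is one convention among several and is not prescribed by this template. The function name `iqr_outliers` and the home price values are illustrative assumptions.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up home prices in thousands, with one extreme value.
prices = [210, 225, 230, 240, 245, 260, 900]
outliers = iqr_outliers(prices)
print(outliers)
```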

5.8 Sample Size (Training and Test Data)

The purpose of this section is to identify the chosen split between the test and training data during your model building. What is the rationale of your percentage split of the dataset?

This section should include a copy of the code used to train the model, showing the data split between the training and test subsets.
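
A shuffled split between training and test subsets could be sketched as follows using only the standard library. The function name `split_train_test`, the 80/20 ratio, and the seed value are assumptions chosen for illustration. Libraries such as scikit-learn provide an equivalent `train_test_split` function.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for 100 dataset records.
records = list(range(100))
train, test = split_train_test(records, test_fraction=0.2)
print(len(train), len(test))
```

Fixing the seed lets the reported results be reproduced, which helps when justifying the chosen split percentage.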

5.9 Analysis Bias

The purpose of this section is to identify any bias in the dataset. Outline the different kinds of bias that can be introduced into a dataset.

6. CONCLUSION

The purpose of this section is to outline the conclusions drawn from the analysis of the chosen dataset. Does the independent variable impact the dependent variable? If so, provide some examples from the data.

6.1 Model Summary

The purpose of this section is to identify the model approach used and any other conclusions pertinent to the analysis.

6.1.1 Actual vs. Predicted Values

The purpose of this section is to outline/list the actual vs. predicted values of your model. Are the differences between the actual and predicted values within acceptable range(s), or too far apart? What does the difference tell us?
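
A side-by-side listing of actual and predicted values, with the difference (residual) for each record, could be produced as in the sketch below. The values are made up for illustration and do not come from any fitted model.

```python
# Made-up values in thousands; replace with your model's test-set output.
actual = [250, 310, 180, 420]
predicted = [240, 330, 185, 400]

# Residual = actual - predicted; positive means the model under-predicted.
residuals = [a - p for a, p in zip(actual, predicted)]

print(f"{'Actual':>8} {'Predicted':>10} {'Residual':>9}")
for a, p, r in zip(actual, predicted, residuals):
    print(f"{a:>8} {p:>10} {r:>9}")
```

Residuals that are small relative to the scale of the dependent variable, with no consistent sign, suggest the predictions are within an acceptable range.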

6.2 Coefficient Interpretation

The purpose of this section is to list the analysis coefficient and what it means in relation to the dataset variables (dependent and independent variables).
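
For a simple linear regression, the coefficient (slope) and intercept can be computed by ordinary least squares, as in this minimal sketch. The function name `fit_line` and the income and price values are illustrative assumptions, constructed so that the points fall exactly on a line.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up values in thousands: household income and home price.
income = [40, 60, 80, 100]
price = [150, 210, 270, 330]

slope, intercept = fit_line(income, price)
print(f"price = {slope:.2f} * income + {intercept:.2f}")
```

With these values the slope is 3, which would be read as: each additional thousand in household income is associated with an increase of three thousand in home price.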

6.3 Evaluation Metrics

The purpose of this section is to list the different evaluation metrics that can be used in the analysis. What does each metric mean?
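
The four metrics covered in the subsections that follow can be computed from the actual and predicted values, as in this sketch. The function name `regression_metrics` and the sample values are assumptions made for illustration.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R-squared for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n      # average absolute error
    ss_res = sum(e ** 2 for e in errors)       # residual sum of squares
    mse = ss_res / n                           # average squared error
    rmse = math.sqrt(mse)                      # same units as the target
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot                   # share of variance explained
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Made-up values in thousands.
actual = [250, 310, 180, 420]
predicted = [240, 330, 185, 400]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAE and RMSE are expressed in the units of the dependent variable, MSE in squared units, and R-squared is unitless.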

6.3.1 Mean Absolute Error Result

The purpose of this section is to list the Mean Absolute Error of your analysis.

6.3.2 Mean Square Error Result

The purpose of this section is to list the Mean Square Error of your analysis.

6.3.3 Root Mean Square Error Result

The purpose of this section is to list the Root Mean Square Error of your analysis.

6.3.4 R-Squared Score Result

The purpose of this section is to list the R-Squared Score of your analysis.

APPENDICES

Data Analysis Reports use appendices to include supporting data, additional detailed graphs, plots, etc., or more information than what is suggested in the main text of the document. While no appendices are required for this final assignment, the following are examples of what can be included as appendices.

Appendix A

Independent and Dependent Variable Distribution Graph

Appendix B

HeatMap Plot Variable Relationships

Appendix C

Pair Plot Variable Relationships

Appendix D

Others