QUANT homework

profilemilesss
ExcelPracticeSpreadsheetNumericalSummaries.xlsx

Introduction

Numerical Summaries: summarize the data set
Parameters : calculated using population data
Statistics: calculated using sample data
with a well defined random sample, can be used to estimate the population parameter
Examples:
1 variable numerical summaries
measures of the center (mean, median, mode)
Measures of spread (variance, standard deviation, range, IQR, standard error)
Skewness: how symmetric is the distribution? (perfect symmetric then skewness = 0)
positive (right skewed)
negative (left skewed)
Kurtosis: how much probability is in the tails of the distirbution?
usually compares to Normal distirbution tails (kurtosis = 3)
Skewed distributions Kurtosis
2 variable numerical summaries (will get to these later)
correlation
coefficient of determination
Numerical summaries of relative position of data values
z-scores
percentiles

Symbols

Descriptive Statistics

Can use Data Analysis Toolpak in Excel (File … Options … Data Analysis Toolpack … Go)
Data … Data Analysis…Descriptive Statistics
Age
80
91 Age
77
85 Mean 79.1046728209
71 Standard Error 0.9521947114
89 Median 79.2189722038
82 Mode ERROR:#N/A
79 Standard Deviation 9.5694384163
74 Sample Variance 91.5741516027
68 Kurtosis 0.0747918953
103 Skewness 0.2033261591
92 Range 47.803631584
85 Minimum 55.4111721156
81 Maximum 103.2148036997
82 Sum 7989.5719549096
64 Count 101
81
84
100
86
83
67
84
78
63
70
85
99
93
79
88
69
86
103
85
73
61
71
89
98
71
81
83
81
69
65
77
82
80
78
60
71
71
73
86
84
79
77
85
72
83
89
85
91
73
76
87
73
86
85
78
94
79
76
79
76
64
78
65
71
72
68
81
84
77
68
70
82
66
81
82
79
55
99
68
74
80
76
89
69
79

Frequency Distribution Table1

Un-grouped Frequency Distribution Table
Class Frequency
Dr77: Frequency: Unit counts for each class
Percent
Dr77: Percent: =100*(Frequency/ Frequency Total)
Relative Frequency
Dr77: Relative Frequency: = Frequency / Total Frequency
Cummulative Frequency
Dr77: Cumulative Frequency: = Sum of all prior and the current Frequency
Plastic 9 25.7142857143
Wheat 7 20
Insect 12 34.2857142857
Metal 2 5.7142857143
Liquid 5 14.2857142857
Total
Dr77: Total: Use =Sum feature in Excel
35

Additional table columns can be added beyond just the Frequency column

Percent Plastic Wheat Insect Metal Liquid 25.714285714285712 20 34.285714285714285 5.7142857142857144 14.285714285714285

z-score

Standard deviation unit distance a data value is from the mean of the distribution
You need
mean
standard deviation
Age Z-score
80
91
77
85
71
89
82
79
74
68
103
92
85
81
82
64
81
84
100
86
83
67
84
78
63
70
85
99
93
79
88
69
86
103
85
73
61
71
89
98
71
81
83
81
69
65
77
82
80
78
60
71
71
73
86
84
79
77
85
72
83
89
85
91
73
76
87
73
86
85
78
94
79
76
79
76
64
78
65
71
72
68
81
84
77
68
70
82
66
81
82
79
55
99
68
74
80
76
89
69
79

percentile

Find the value where x% of the data values fall bellow
Use PERCENTILE in Excel
write percentages in decimal format
Age
103 10th percentile 67.714681164
103
100 23rd percentile
99
99 79th percentile
98
94
93
92
91
91
89
89
89
89
88
87
86
86
86
86
85
85
85
85
85
85
85
84
84
84
84
83
83
83
82
82
82
82
82
81
81
81
81
81
81
80
80
80
79
79
79
79
79
79
79
78
78
78
78
77
77
77
77
76
76
76
76
74
74
73
73
73
73
72
72
71
71
71
71
71
71
70
70
69
69
69
68
68
68
68
67
66
65
65
64
64
63
61
60
55

SAT Subject Test Percentiles

Quartiles

Use QUARTILE in Excel
5 Number Summary
Age Q1: Quartile 1 72
80 Q2: Quartile 2 (Median)
91 Q3: Quartile 3 85
77 Minimum (=Min()) 55
85 Maximum (=Max()) 103
71
89
82 Interquartile Range (IQR) = Q3 - Q1 13
79 range over 50% of data values
74
68 Use the IQR to identify outlier boundaries
103
92 Upper Bound = Q3 + 1.5(IQR) 105
85 Lower Bound = Q1 - 1.5(IQR) 52
81
82 Question: Are there outliers in the Age dataset?
64
81
84
100
86
83
67
84
78
63
70
85
99
93
79
88
69
86
103
85
73
61
71
89
98
71
81
83
81
69
65
77
82
80
78
60
71
71
73
86
84
79
77
85
72
83
89
85
91
73
76
87
73
86
85
78
94
79
76
79
76
64
78
65
71
72
68
81
84
77
68
70
82
66
81
82
79
55
99
68
74
80
76
89
69
79

Select Data ... Insert ... Recommended Charts ... All Charts ...Box and Whisker

=QUARTILE(Data, quartile number)

histogram

Create a Histogram to see shape of distribution ( 2 ways to do this in Excel) 2) Use Insert Menu in excel
Age Bin
80 60 Bin Frequency
91 70 60 1
77 80 70 18
85 90 80 34
71 100 90 37
89 110 100 8
82 110 3
79
74
68
103
92
85
81
82
64
81
84
100
86
83
67
84
78
63
70
85
99
93
79
88
69
86
103
85
73
61
71
89
98
71
81
83
81
69
65
77
82
80
78
60
71
71
73
86
84
79
77
85
72
83
89
85
91
73
76
87
73
86
85
78
94
79
76
79
76
64
78
65
71
72
68
81
84
77
68
70
82
66
81
82
79
55
99
68
74
80
76
89
69
79

1) Data ... Data Analysis ... Histogram

Histogram

Frequency 60 70 80 90 100 110 1 18 34 37 8 3

Bin

Frequency

Upper Bound for Classes

histogram2

Use Histograms to visualize approximate shapes of data distirbutions

Other Charts 1

Categorical
Bar Chart
Pie Chart Pareto

Other Charts 2

Quantitive
Line Graph
Ogive (Cumulative Frequency Polygon)
Use Cumulative Frequency column of a frequency distribution table in Excel when graphing

To select disjoin columns in Excel, hold down your CTRL button while selecting the columns

empirical rule

Need a bell-shaped distribution to use the Empirical Rule
Age
80 Age
91
77 Mean 79.1046728209
85 Standard Error 0.9521947114
71 Median 79.2189722038
89 Mode ERROR:#N/A
82 Standard Deviation 9.5694384163
79 Sample Variance 91.5741516027
74 Kurtosis 0.0747918953
68 Skewness 0.2033261591 Question1: Approximately what percent of the age data is within 1 standard deviation unit of the mean?
103 Range 47.803631584 68% of the data values
92 Minimum 55.4111721156
85 Maximum 103.2148036997
81 Sum 7989.5719549096 Question2: Within what interval are approximately 68% of the age data values? (Hint: need standard deviation of the distirbution)
82 Count 101 upper bound 88.6741112372
64 lower bound 69.5352344046
81
84 Approximately 68% of the age data will fall between 69 years to 89 years
100
86
83
67
84
78
63
70
85
99
93
79
88
69
86
103
85
73
61
71
89
98
71
81
83
81
69
65
77
82
80
78
60
71
71
73
86
84
79
77
85
72
83
89
85
91
73
76
87
73
86
85
78
94
79
76
79
76
64
78
65
71
72
68
81
84
77
68
70
82
66
81
82
79
55
99
68
74
80
76
89
69
79