PPT4DescriptiveStatistics.ppt

*

  • After collecting data, researchers are faced with pages of unorganized numbers, stacks of survey responses, etc.
  • The goal of descriptive statistics is to aggregate the individual scores (data) in a way that can be readily summarized
  • Following are several options that can be used to get a “picture” of how scores were distributed

*

  • Example figure: Car Sales by Year (in millions)
  • A bar chart is a simple graph that displays the frequency counts or percentages of a categorical variable
  • It is particularly useful for categorical variables with 2 or more levels or categories
  • It is an effective way to compare frequency counts or percentages graphically across categories
  • In a bar chart,

Horizontal axis: levels or categories of the variable of interest (categorical)

Vertical axis: frequency counts or percentages
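A bar chart is built from the frequency count and percentage of each category. The sketch below, a minimal Python illustration, computes those numbers for a categorical variable and draws one text bar per category. The vehicle categories in `responses` are hypothetical example data, not the car sales figures from the slide.

```python
from collections import Counter


def bar_chart_counts(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    n = len(values)
    return [(cat, c, 100.0 * c / n) for cat, c in counts.most_common()]


def text_bar_chart(values):
    """One row per category; the bar length equals the frequency count."""
    lines = []
    for cat, c, pct in bar_chart_counts(values):
        lines.append(f"{cat:<8} {'#' * c} {c} ({pct:.1f}%)")
    return "\n".join(lines)


responses = ["Sedan", "SUV", "SUV", "Truck", "Sedan", "SUV"]  # hypothetical data
print(text_bar_chart(responses))
```

The bars here run sideways (categories down the page), but the information is the same as a vertical bar chart: one bar per category, with height (here, length) equal to the count.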


*

  • Useful in statistical analysis
  • Also excellent for huge quantities of data

Can show patterns otherwise invisible

Each point represents one observation, plotted at the corresponding pair of values recorded for it.

It can be used to spot outliers or possible data errors, although these are not always apparent.

*

A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis.

The class frequencies are represented by the heights of the bars and the bars are drawn adjacent to each other.
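To make the idea of classes and class frequencies concrete, here is a minimal sketch that sorts numeric data into equal-width classes and counts each one. The half-open class boundaries (lower limit included, upper limit excluded, except for the last class) are an assumption of this sketch, and the delivery times are hypothetical.

```python
def histogram_counts(data, low, width, n_classes):
    """Class frequencies for equal-width classes starting at `low`.

    Class i covers [low + i*width, low + (i+1)*width); the last class also
    includes its upper boundary. Values outside the overall range are skipped.
    """
    counts = [0] * n_classes
    high = low + n_classes * width
    for x in data:
        if x < low or x > high:
            continue
        i = min(int((x - low) // width), n_classes - 1)
        counts[i] += 1
    return counts


def print_histogram(data, low, width, n_classes):
    """Draw adjacent bars sideways: one row per class, bar length = class frequency."""
    for i, count in enumerate(histogram_counts(data, low, width, n_classes)):
        start = low + i * width
        print(f"{start:>3}-{start + width:<3} {'#' * count}")


times = [12, 15, 18, 22, 25, 25, 28, 31, 35, 40]  # hypothetical delivery times (minutes)
print_histogram(times, low=10, width=10, n_classes=4)
```

Unlike a bar chart, the classes share boundaries, so the bars sit next to each other with no gaps.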

*

  • Which statistics and graphs to report depends on the type of variable of interest:

For continuous variables

N, mean, standard deviation, minimum, maximum

histograms, dot plots, box plots, scatter plots

For categorical variables

frequency counts, percentages

one-way tables, two-way tables

bar charts
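The summaries listed above can be sketched with Python's standard library: `describe` reports N, mean, standard deviation, minimum and maximum for a continuous variable, and `Counter` gives one-way and two-way frequency tables for categorical variables. Using the sample standard deviation (n − 1 denominator) is an assumption, since the slides do not say which version to report; all data values are hypothetical.

```python
import statistics
from collections import Counter


def describe(values):
    """Summary statistics for a continuous variable."""
    return {
        "N": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 in the denominator
        "min": min(values),
        "max": max(values),
    }


def two_way_table(row_var, col_var):
    """Frequency count for every (row category, column category) pair."""
    return Counter(zip(row_var, col_var))


heights = [160, 165, 170, 175, 180]   # hypothetical continuous variable (cm)
sex = ["F", "M", "F", "M", "F"]       # hypothetical categorical variables
smoker = ["no", "no", "yes", "no", "no"]

print(describe(heights))
print(Counter(sex))                # one-way table
print(two_way_table(sex, smoker))  # two-way table
```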

*

  • A frequency distribution displays the number (or percent) of individuals that obtained a particular score or fell in a particular category
  • As such, these tables provide a picture of where people respond across the range of the measurement scale
  • One goal is to determine where the majority of respondents were located

*

  • Frequency

The number of individuals that obtained a particular score (or response)

  • Percent

The corresponding percentage of individuals that obtained a particular score

  • Cumulative Percent

The percentage of individuals that fell at or below a particular score (not relevant for nominal variables)
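The three columns of a frequency distribution can be computed directly from the raw scores, as in this minimal sketch with hypothetical 1 to 5 survey ratings. Cumulative percent is built from the running frequency total rather than by adding rounded percents, which avoids rounding drift; as noted above, it only makes sense when the scores have an order.

```python
from collections import Counter


def frequency_table(scores):
    """Rows of (score, frequency, percent, cumulative percent), lowest score first."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    cumulative = 0
    for score in sorted(counts):
        freq = counts[score]
        cumulative += freq  # number of individuals at or below this score
        rows.append((score, freq, 100 * freq / n, 100 * cumulative / n))
    return rows


ratings = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]  # hypothetical 1-5 survey ratings
for score, freq, pct, cum in frequency_table(ratings):
    print(f"{score}  {freq:>3}  {pct:5.1f}%  {cum:5.1f}%")
```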

*

  • Frequency distribution showing the pizza delivery times of employees at a restaurant

*

  • The normal curve is often called the Gaussian distribution, after Carl Friedrich Gauss; Germany honored Gauss on its 10 Deutschmark bill.
  • From http://www.willamette.edu/~mjaneba/help/normalcurve.html

*

  • A mathematical model, or an idealized conception, of the form a distribution might take under certain circumstances.

The mean of a large random sample drawn from almost any distribution (one with finite variance) is approximately normally distributed (Central Limit Theorem)

Many observations (height of adults, weight of children, intelligence) have approximately normal distributions
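The Central Limit Theorem can be checked by simulation. This sketch draws many samples from an exponential distribution (strongly right-skewed, mean 1, SD 1), chosen here only as an example of a non-normal population, and looks at the distribution of the sample means. The sample size of 50 and the 2,000 repetitions are arbitrary choices for illustration.

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples, seed=0):
    """Mean of each of n_samples independent samples of size sample_size."""
    rng = random.Random(seed)
    return [sum(draw(rng) for _ in range(sample_size)) / sample_size
            for _ in range(n_samples)]


# Population: exponential with mean 1 and SD 1, strongly right-skewed (not normal).
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=50, n_samples=2000)

m = statistics.mean(means)
s = statistics.stdev(means)
share = sum(abs(x - m) <= s for x in means) / len(means)

# CLT: the means should center near 1, with SD near 1 / sqrt(50) (about 0.14),
# and roughly 68% of them should lie within one SD of their center.
print(round(m, 3), round(s, 3), round(share, 3))
```

Although individual exponential values are far from normal, the histogram of `means` is close to bell-shaped.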

  • Shape

Bell-shaped, with most of the data near the middle

Symmetric, with the mean, median, and mode at the same point

To indicate the spread of the distribution, we use the standard deviation (SD) around the mean as the unit.

±1 SD around the mean: about 68% of observations

±2 SD around the mean: about 95% of observations

±3 SD around the mean: about 99.7% of observations
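The 68/95/99.7 percentages are areas under the normal curve. For a normal distribution, the proportion within k standard deviations of the mean equals erf(k / √2), which Python's `math.erf` computes directly; this small sketch reproduces the rule.

```python
import math


def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"±{k} SD: {share_within(k):.1%}")
# ±1 SD: 68.3%
# ±2 SD: 95.4%
# ±3 SD: 99.7%
```

The "95%" in the rule is the rounded value of 95.4%.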

*

(Figure from http://www.music.miami.edu/research/statistics/normalcurve/images/normalCurve1.gif)

*

  • Theoretical construction
  • Also called Bell Curve or Gaussian Curve
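  • Perfectly symmetrical about the mean
  • The mean of the distribution is at the midpoint of the curve
  • The tails extend infinitely in both directions, approaching but never touching the horizontal axis
  • Mean of the curve = median = mode
  • Distances along the horizontal axis are measured in standard deviations from the mean; the area under the curve between two points is the proportion of observations falling in that range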

*

Whenever you see a normal curve, you should imagine the bar graph within it.

*


If your data fits a normal distribution, approximately 68% of your subjects will fall within one standard deviation of the mean.

Over 99% of your subjects will fall within three standard deviations of the mean.

*


When you have a subject’s raw score, you can use the mean and standard deviation to calculate his or her standardized score: z = (raw score - mean) / standard deviation. Interpreting that score with normal-curve percentages assumes the distribution of scores is normal. Standardized scores are useful when comparing a student’s performance across different tests, or when comparing students with each other. Your assignment for this unit involves calculating and using standardized scores.

z-score     -3    -2    -1     0     1     2     3

T-score     20    30    40    50    60    70    80

IQ score    55    70    85   100   115   130   145

SAT score  200   300   400   500   600   700   800
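Each row of the conversion table is a linear rescaling of z: score = scale mean + scale SD × z. The sketch below reads each scale's mean and SD off the table's z = 0 and z = 1 columns (T: 50 and 10, IQ: 100 and 15, SAT: 500 and 100), and uses z-scores to compare one hypothetical student across two tests with different means and SDs.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations the raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


# (mean, SD) of each scale, from the table's z = 0 and z = 1 columns: score = mean + SD * z.
SCALES = {"T": (50, 10), "IQ": (100, 15), "SAT": (500, 100)}


def from_z(z, scale):
    """Convert a z-score to another standardized scale."""
    mean, sd = SCALES[scale]
    return mean + sd * z


# Hypothetical student: 80 on a math test (mean 72, SD 8), 85 on a reading test (mean 80, SD 10).
z_math = z_score(80, 72, 8)       # 1.0
z_reading = z_score(85, 80, 10)   # 0.5, so the math score is relatively stronger
print(z_math, from_z(z_math, "T"), from_z(z_math, "IQ"), from_z(z_math, "SAT"))  # 1.0 60.0 115.0 600.0
```

The raw reading score is higher, but relative to each class the math score is one full SD above the mean while the reading score is only half an SD above.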

*


Normal distributions (bell-shaped) are a family of distributions that have the same general shape. They are symmetric (the left side is an exact mirror of the right side), with scores more concentrated in the middle than in the tails. Examples of normal distributions are shown to the right. Notice that they differ in how spread out they are.

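Pearson defined kurtosis, $\beta_2 = \mu_4 / \sigma^4$ (the fourth central moment divided by the square of the variance), as a measure of departure from normality in a paper published in Biometrika; for a normal distribution, $\beta_2 = 3$. A distribution is platykurtic if it is flatter than the corresponding normal curve ($\beta_2 < 3$) and leptokurtic if it is more peaked than the normal curve ($\beta_2 > 3$).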
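Pearson's measure can be estimated from data by replacing the population moments with sample moments (divide by n). This sketch, an illustration rather than a recommended estimator, compares simulated normal, uniform and Laplace data; the uniform (theoretical β₂ = 1.8) is an example of a flatter, platykurtic shape, and the Laplace, generated as the difference of two exponential draws (theoretical β₂ = 6), is an example of a more peaked, leptokurtic shape. The sample size and seed are arbitrary.

```python
import random


def kurtosis(values):
    """Pearson's beta_2 = m4 / m2**2, using population (divide-by-n) moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


rng = random.Random(1)
n = 20000
normal = [rng.gauss(0, 1) for _ in range(n)]                           # beta_2 = 3 in theory
flat = [rng.uniform(0, 1) for _ in range(n)]                           # platykurtic: 1.8 in theory
peaked = [rng.expovariate(1) - rng.expovariate(1) for _ in range(n)]   # Laplace, leptokurtic: 6 in theory

for name, data in (("normal", normal), ("uniform", flat), ("laplace", peaked)):
    print(name, round(kurtosis(data), 2))
```

Many software packages report excess kurtosis, β₂ − 3, so that a normal distribution scores 0.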

*


The mean and standard deviation are useful ways to describe a set of scores. If the scores are grouped closely together, they will have a smaller standard deviation than if they are spread farther apart.
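A quick numerical check of this point: two hypothetical sets of scores with the same mean, one tightly grouped and one spread out. `statistics.pstdev` treats each list as a whole population (divide by n), which is a choice made for this sketch.

```python
import statistics

close = [68, 69, 70, 71, 72]   # hypothetical scores grouped tightly around 70
spread = [50, 60, 70, 80, 90]  # hypothetical scores with the same mean, spread farther apart

for name, scores in (("close", close), ("spread", spread)):
    # pstdev: population standard deviation (divide by n)
    print(name, statistics.mean(scores), round(statistics.pstdev(scores), 2))
# close: mean 70, SD about 1.41; spread: mean 70, SD about 14.14
```

The means are identical, so only the standard deviation distinguishes the two sets.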

*

Figure: Small Standard Deviation vs. Large Standard Deviation; panels: Different Means, Different Standard Deviations / Different Means, Same Standard Deviations / Same Means, Different Standard Deviations

Figure: histogram of shoe sizes (horizontal axis: Length of Right Foot, 4 to 14; vertical axis: Number of People with that Shoe Size, 1 to 8)

Data do not always form a normal distribution. When most of the scores are high, the distribution is not normal but negatively (left) skewed.

Skew refers to the tail of the distribution.

*

Because the tail is on the negative (left) side of the graph, the distribution has a negative (left) skew.

Figure: histogram of shoe sizes (horizontal axis: Length of Right Foot, 4 to 14; vertical axis: Number of People with that Shoe Size, 1 to 8)

When most of the scores are low, the distribution is not normal but positively (right) skewed.

Because the tail is on the positive (right) side of the graph, the distribution has a positive (right) skew.

*

When data are skewed, they do not possess the characteristics of the normal curve (distribution). For example, you can no longer expect about 68% of the subjects to fall within one standard deviation above or below the mean, and the mean, median, and mode do not fall on the same score. The mode is still the highest point of the distribution, but the mean is pulled toward the side with the tail, and the median falls between the mode and the mean.
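The ordering of mode, median and mean in skewed data can be checked on a small hypothetical data set. The sketch also computes the moment coefficient of skewness (third central moment divided by the SD cubed), a common measure that the slides do not define and which is used here only to confirm the direction of the tail: positive for a right tail, negative for a left tail.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness m3 / m2**1.5 (population, divide-by-n moments).

    Positive -> tail on the right; negative -> tail on the left.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def centers(values):
    """(mode, median, mean) of the scores."""
    return statistics.mode(values), statistics.median(values), statistics.mean(values)


right = [4, 5, 5, 5, 6, 6, 7, 8, 10, 14]  # hypothetical: most scores low, long right tail
left = [18 - x for x in right]             # mirror image: most scores high, long left tail

print(centers(right), skewness(right) > 0)  # mode 5 < median 6.0 < mean 7, skew positive
print(centers(left), skewness(left) < 0)    # mean 11 < median 12.0 < mode 13, skew negative
```

In both cases the mean sits closest to the tail and the median falls between the mode and the mean.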

*

Figure: Negative or Left Skew Distribution and Positive or Right Skew Distribution, each with the mean, median, and mode marked