Lesson One: Introduction to Inferential Statistics

Week One

Inferential Statistics

Email: [email protected]

Cell: 941-822-1000

Appointment: text or email

RATIONALE BEHIND STATISTICS

This course is designed to help you build models that apply the scientific method to problems while strengthening your ability to analyze data objectively. You might also need to conduct research in your present position, so I hope you will be able to use these tools in the near future. In any situation in which you want to make a sound decision, inferential statistics empowers you to control for random chance. If we fail to control for random chance, numbers may suggest one thing when they in fact mean something else. The results of inferential statistics tests are based on the probability of error, so a probability statement like p < .05 allows us to use sample data to make inferences about larger populations. In this example, there is less than a five percent chance that results like these would arise from random chance alone. Put loosely, we would be more than 95 percent sure of our results (100 – 5 = 95).

Research conducted for the local folk is called “action research,” which is a form of applied research. Action research is not designed for large external audiences such as those who read professional journals. It is geared more to the needs of local folk such as faculty committees and board members. To be solid, however, action research needs to have the same component parts (i.e., introduction, literature review, methods, analysis, and conclusions) that larger studies feature.

Applied and theoretical research

Applied

Applied research in general is used to test theory or address a problem associated with a particular situation. The results of this type of research can be published if it is good enough. Case studies could also fall into the applied category.

Theoretical

Theoretical research is typically exploratory in nature and attempts to establish an understanding of relationships among variables. This type of research is ground-breaking and invites further study to test its tentative conclusions. To some extent all research should invite further study. Theoretical studies require lots of model building and narrative. Case studies can also be used in theoretical research.

Induction and Deduction

Induction is looking at many pieces of evidence to arrive at a larger conclusion. Qualitative studies certainly follow an inductive trajectory.

These kinds of studies require voluminous writing (narrative) in developing the theory and showing its potentiality to the situation at hand.

These kinds of studies may still require the use of statistical analyses.

Dr. Vann’s EdD dissertation was 80 pages long, but it had five stats tests.

Dr. Vann’s PhD dissertation was 302 pages long, but it still had a lot of numbers that were evaluated with a statistical test (a chi square statistic).

His historical geography book, In Search of Ulster Scots Land, also uses lots of numbers and a stats test, so use numbers when they are needed to objectively make inferences about cause and effect relationships. Remember that in the vast majority of cases, inferential statistics tests ask one of two questions: is there a difference (causal-comparative) or is there a similarity (correlational)?

Deduction, on the other hand, starts with a hypothesis and then uses a logical system to test it. Deduction is nearly always quantitative and dependent on tests.

Quantitative Really Means Deduction

Most studies that are applied or experimental use a deductive approach designed to answer specific questions. We are always interested in knowing whether or not there is a difference or a similarity among numbers. In a deductive study, a researcher tests a hypothesis. There are two kinds of hypotheses: null and alternate (aka research).

The bottom line is this:

Statistical tests are tools that add to a study’s objectivity.

They are not the “end all.” Don’t worship them.

Don’t be afraid of them either!

Use them to clarify conclusions and frame future studies.

Collecting Data

Data can be found in many places, but sometimes we have to create our own survey forms.

Survey forms need to be valid and reliable.

What do these terms mean? Valid means that the survey measures what it purports to measure, and reliable means that the survey or instrument is consistent. As a general rule, reliability is measured by a correlation test (r = .80 or higher), whereas validity is determined by the eyes of experts on the subject.
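As a rough illustration of the reliability check, the sketch below computes a Pearson correlation between two administrations of the same survey to the same respondents (a test-retest design, which is only one way to assess consistency). The scores and the names `first_round`, `second_round`, and `pearson_r` are invented for illustration and are not from the course materials; only the r = .80 rule of thumb comes from the text above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each list
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical survey totals for six respondents, measured twice (assumption)
first_round = [12, 15, 9, 20, 17, 11]
second_round = [13, 14, 10, 19, 18, 10]

r = pearson_r(first_round, second_round)
is_reliable = r >= 0.80  # rule of thumb quoted in the text
print(f"r = {r:.2f}, meets the .80 rule of thumb: {is_reliable}")
```

The function is written out by hand so that each part of the correlation formula is visible; in Excel the same figure would come from a correlation function or the regression output.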

Tests

We will learn to use Excel to conduct a number of inferential statistics tests:

Matched pairs (aka correlated samples t-tests)

Independent samples t-tests

Chi square

ANOVA (Analysis of Variance)

Regression

Spearman’s rs

Z tests

With these seven tests, it is possible to analyze many problems you are likely to face in your career. There are certainly more advanced tests out there, but they follow the same logic as these tried and tested methods.

Part Two: Basic Concepts

Why is it important to be open to analysis involving numbers?

There are descriptive statistics and inferential statistics. Descriptive statistics (a singular noun) is a number or figure that summarizes a set of data (mean, median, mode, standard deviation, and range, as well as skew).

The use of the concept of “inferential statistics” has been around for 200 years; it grew out of a need to explain political circumstances and solve environmental problems. Charts and tables have also been used to show statistical data for about the same amount of time.

Inferential statistics is a technique (aka test or tool) that allows us to generalize data generated from a sample to a larger, unmeasured population. Inferential statistics is necessary because of the need to control for chance, or the probability of an error, in interpreting data from a sample. It begins with analyzing actual data and ends with a statement that considers chance. The data collected, if it is a sample, must be randomly selected; otherwise, the data will reflect only a portion of the population in question.

We never write conclusions in absolute terms because we are dealing with humans. We learn and are hence not always predictable. As with predicting any sort of human behavior, the statistical answers or reports are rooted in probability. With that stated, however, I am 99 percent sure that the data analyzed in the following report came from three different groups, which were analyzed by ANOVA. This does not necessarily mean a cause and effect relationship, but it does make inferences about it: “The difference in delinquency rates among adolescents is likely associated with membership in at least one of three kinds of informal social organizations (M = 2.2) F (4, 180) = 3.44, p < .01.”

Let me explain how to read the numbers in the parentheses: (M = 2.2) refers to the mean infractions of the kids; F (4, 180) gives the degrees of freedom, which are the reference points used to locate the critical statistic on a table of significance; 3.44 is the calculated test statistic resulting from the ANOVA. If we were to consult an F table, where we find critical values, it would show that the results are significant at the .01 level; hence my 99 percent confidence in the results.
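To make the pieces of an F report more concrete, here is a minimal sketch of how a one-way ANOVA arrives at a calculated F and its two degrees of freedom. The three small groups are invented numbers, not the delinquency data quoted above, and the function name `one_way_anova` is my own. The sketch stops at the calculated statistic; comparing it with a critical value still requires an F table.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)                              # number of groups
    n_total = sum(len(g) for g in groups)        # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups (assumption, for illustration only)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_stat, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The two numbers in the parentheses of an F report are the between-groups and within-groups degrees of freedom, in that order, which is why the sketch returns them alongside the calculated F.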

Skew

Skew refers to the shape of the distribution of scores or numbers in the set of data we are examining. A normal distribution is a bell-shaped curve. Its skew is neither negative nor positive.

A positive skew has more low numbers than large numbers, so its thin or scant section is tapered toward the larger numbers. A negative skew, on the other hand, has more large numbers than small numbers, and its thin, tapered section stretches toward the low numbers. Do not think of these as good or bad; they simply describe the shape of a distribution.

Normal Distribution

Positive and Negative Distributions:

Related concepts:

A parameter is some numerical (number) or nominal (name) characteristic of a population. When we think of how data from a sample relates to a population, we are asking how well the data generalizes to the population. In other words, we are talking about inferences that can be made to a segment of the population.

Variables can take the form of one of four different scales: nominal (name), interval (equal distance to other points), ordinal (order or rank), or a numerical or continuous scale (a number that is counted from zero to positive or negative infinity); the last of these can have decimals or fractions.

An example:

Let’s look at how the number 7, for example, can be used in all cases:

“Johnny wore number seven; he and the other racers were separated by seven lanes. The poor guy finished in 7th place, and it took him seven minutes to finish the mile.”

What are measures of central tendency and why are they important?

Measures of central tendency refer to the mean, median, and mode. They describe a numerical distribution, as in a normal curve (aka bell curve) or a positively or negatively skewed distribution. In a normal curve, about 95 percent of all numbers will be within two standard deviations of the mean. Standard deviation refers to “the measure of the amount of variability around the mean of the distribution.”
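The two-standard-deviation figure can be checked with a short sketch using `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later). The helper name `share_within` is my own, and the sketch assumes a perfectly normal curve, which real data only approximate.

```python
from statistics import NormalDist

# A standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.4f}")
```

For two standard deviations the result is roughly .9545, which is where the familiar "95 percent" shorthand comes from.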

It is a good idea to become familiar with the symbols of these concepts. A handy way to learn them is to go to the index in Spatz’s book, find the concept, and read about it. The symbol is usually presented near the narrative. The shape of the distribution is important because, together with the number of groups in the study, it can tell you what kind of test to employ. If the distribution has two or more modes, you use a non-parametric test (chi square, Spearman’s rs, Mann-Whitney U).

Measures of Central Tendency, Range, and How to Use Excel to Analyze Data

Part Three of Lesson

Setting up Excel for Data Analysis

You will need to add the statistics function (the data analysis tool pack) to Excel. To add it in Office 2010, see http://www.youtube.com/watch?v=nfv1z2ko6jk&noredirect=1. For Office 2007, see this video: http://www.youtube.com/watch?v=-ubtpQJ1smI

Once you have added the statistics tool pack, you are ready to analyze data.

In the past, we had to do stats tests by hand, perhaps with a calculator. Doing a t-test took up to an hour. Excel makes doing stats tests pretty easy. It frees us up to do more important things like coming up with solutions to problems.

The Excel data analysis tool pack does not currently include a chi square test, nor does it have a Spearman’s rs test; therefore, we will learn to do those by hand with a calculator.

Practice Data

Factor One Factor Two
83 42
76 90
76 80
34 79
87 78
79 79
82 87
77 75
84 82

If you have not already loaded or added the statistics analysis (tool pack) into Excel, you will have to do so now (see the videos highlighted above). Once you have added the tool pack, left click on “Data” at the top of the Excel screen. A new set of icons appears across the top of the sheet. Look to the far right and locate the “Data Analysis” icon and left click on it.

 

Now let us calculate the descriptive statistics for the data in the above table. Descriptive statistics include the mean (the average number), median (the middle number), and mode (the most common number). These make up what are called measures of central tendency. Other important descriptive statistics include the standard deviation and the range (highest to lowest numbers).

To calculate descriptive statistics, (A) left click on “Descriptive Statistics” and then left click “OK.” (B) Place the cursor in the open box identified as “Input Range” and click on it. (C) Move the cursor to the upper left cell identified as “Factor One,” then left click and drag the cursor to the bottom right cell, which has the number “82” in it; the table will become outlined by flashing dashed lines, and a series of letters and numbers ($A$1:$B$10) will appear in the Input Range box. (D) Now left click on the “Labels in the First Row” box and a check mark will appear in it. (E) Now left click the “Summary statistics” box and a check mark will appear in it. (F) Finally, click “OK” and the table will be replaced with the summary box:

A file with screenshots is loaded in the Week 1 folder for step-by-step instructions.

Here are the descriptive statistics:

Statistic             Factor One    Factor Two
Mean                  75.33333      76.88889
Standard Error        5.322906      4.626147
Median                79            79
Mode                  76            79
Standard Deviation    15.96872      13.87844
Sample Variance       255           192.6111
Kurtosis              7.574598      6.421049
Skewness              -2.66706      -2.34495
Range                 53            48
Minimum               34            42
Maximum               87            90
Sum                   678           692
Count                 9             9

It is important to note that the same process is followed for more than two columns. Also note that a line of cells running horizontally is called a row.
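If you would like to cross-check the Excel summary outside of Excel, the sketch below recomputes most of the same figures for the practice data using Python's standard `statistics` module. The function name `describe` is my own. The skewness and kurtosis lines use the sample-adjusted formulas that I am assuming match what Excel reports; treat them as an assumption to verify against your own output rather than as a statement of how Excel is built.

```python
import statistics
from math import sqrt

# Practice data from the table in this lesson
factor_one = [83, 76, 76, 34, 87, 79, 82, 77, 84]
factor_two = [42, 90, 80, 79, 78, 79, 87, 75, 82]


def describe(data):
    """Recompute the main summary statistics for one column of numbers."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)            # sample standard deviation (n - 1)
    z = [(x - mean) / sd for x in data]    # standardized scores

    # Sample-adjusted skewness and excess kurtosis (assumed to match Excel)
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return {
        "mean": mean,
        "standard_error": sd / sqrt(n),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "standard_deviation": sd,
        "sample_variance": statistics.variance(data),
        "kurtosis": kurt,
        "skewness": skew,
        "range": max(data) - min(data),
        "minimum": min(data),
        "maximum": max(data),
        "sum": sum(data),
        "count": n,
    }


for label, column in (("Factor One", factor_one), ("Factor Two", factor_two)):
    print(label)
    for name, value in describe(column).items():
        print(f"  {name}: {value}")
```

Both columns have a strongly negative skewness because each contains one score (34 and 42) far below the rest, which stretches the thin tail toward the low numbers.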

Practice…

Read the related chapters in Spatz and try exercise one in this folder.

We will start class next week by discussing this.
