
Introduction:

The following materials are presented as supplementary support for the content presented in class discussions. These materials have been drawn from edited and revised materials I have used for other research and statistics courses. The information has been organized to provide learners with additional information, or review data, for the content covered in class that I consider to be of importance. I have prepared them, not as a substitute for a text exactly, but as a translation of some of the arcane language typically found in statistics textbooks and as a way of including research design content which is often excluded from such texts. These materials are fair game for inclusion in examinations and for background to classroom discussions. I have included all of this stuff, which some might argue doesn’t belong in a statistics course, because it is very difficult for me to separate the issues of research from the issues involved in statistics. Statistics, after all, are simply tools used in research of one type or another, and the selection and interpretation of statistics is the meat of quantitative research decision-making. An understanding, therefore, of the purposes and conceptual underpinnings of statistics requires at least a working knowledge of research and research terminology.

The logic behind the organization of these materials parallels the logic in the topical sequence found in the syllabus. It should be relatively easy to identify the chunks of content relevant to each major section of the course and to gauge your reading and preparation accordingly.

I. The Research process/scientific method: An overview and/or review

A. The process of “inquiry” in nursing is involved with the procedures used to acquire and adapt knowledge necessary to the efficient and effective delivery of clinical, educational and administrative nursing activities. The philosophy of science suggests that there are many “ways of knowing,” all of which are applied by practitioners at different times for different purposes. These “ways of knowing” include at least the following:

1. Tradition: We “know” that the way to do something is “this way” because we’ve always done it “this way.”

2. Authority: We know things because “so and so” says that something is so. Or...we do things this way because that’s the way we’re told to do them.

3. Experience (or trial and error): I do things this way because it seems to work best for me.

4. Logic: We do things this way because it seems logical or reasonable to do them this way. This “way of knowing” also involves the systematic application of logical procedures:

a. Inductive logic: proceeding from a single observation (the specific) to broad generalizations (the general).

b. Deductive logic: proceeding from a general principle to a specific conclusion (from the general to the specific).

5. Research: We do things this way because empirical data exist which demonstrate that this is a good way to do things.

B. Research might be defined as the systematic application of logical procedures to problem solving and information generation using the scientific method. The scientific method, you’ll recall, is the systematic procedure applied in research. It consists of four steps which are followed, in sequence, when approaching a problem scientifically:

Problem Identification ----> Hypothesis Generation ----> Data Collection & Analysis ----> Conclusion

Research, or scientific inquiry, is distinguished from other ways of knowing by at least four specific characteristics:

1. A concern with theory -- building or testing;

2. Empiricism -- the collection of data based upon objective, verifiable reality (which makes scientific investigations possible to replicate);

3. Generalizability -- the ability to apply research conclusions in situations, and to subjects and settings, beyond the specific confines of the research setting; and

4. Order and control -- “order” meaning systematic and “control,” a very large concept in research design, referring to procedures implemented which limit the alternative explanations for research results and enhance the confidence in research findings.

II. Research Designs

A. Research designs are the systematic plans for the conduct of research. Different plans, or designs, are appropriate for different kinds of research. These designs are presented here because a part of the process of selecting a statistical procedure involves knowledge of the design employed. The “typology” which follows presents a brief overview of the concepts and terminology used to describe research. While a lengthier discussion is included later in these materials, I thought a picture of the major categories of research might make it a little easier to review the information:

Research Procedures
    Quantitative
        Experiments
            True
            Quasi-
        Non-experiments
            Descriptive
            Comparative
            Correlational
    Qualitative (a different course)

B. The research design: Okay, with the above picture in mind, the research design (the systematic plan) should:

1. Provide a set of step-by-step procedures which gives the researcher guidance on what things to do, in what order to do them, and how to go about organizing the information for analysis; and

2. Provide the control necessary to ensure that:

a. The research results are generalizable to a meaningful population;

b. The “treatment” (if there is one) or the procedures (if not experimental research) are sufficiently well described to permit application; and

c. The researcher has confidence that the results were, in fact, due to the experimental treatments or other hypothesized variables and not to chance variance (error).

C. Experimental research designs: When most texts and/or researchers discuss design, most of the discussion surrounds designs intended for use in settings where an experiment of some kind is undertaken. By “experiment,” I mean that the researcher sets out to identify the effect of two or more alternative approaches to the problem. These approaches represent the independent variable(s) in research. While many pieces of nursing research involve description, comparison or correlation rather than experimentation, we will begin by talking about designs appropriate to the conduct of experiments.

1. True experiments have three things in common:

a. Manipulation of the independent variable: In plain English, this means that the researcher administers some treatment to some subjects and varies or withholds that treatment for some other subjects. This is the characteristic critical to identifying a research design as an experiment.

b. Randomization: Equivalence of groups (treatment and control groups, for instance) is assured through random assignment of subjects to treatment conditions (as opposed to using already existing groups as treatment and control groups).

c. Control: This is a large concept in research that means the researcher “controls” for the effects of a variety of things which may represent alternative explanations or rival hypotheses for the results. The most visible control procedure is the use of a control or comparison group of some type or another.

2. These three characteristics are not always possible, or even desirable, to arrange, even in studies involving experiments, so a variety of designs have been developed which endeavor to approximate the characteristics of true experiments. They are better or worse depending upon the degree to which they control for the effect of various threats to validity and are characterized as “Quasi-experiments.” Quasi-experiments lack at least one of the characteristics of true experiments. Typically, the missing ingredient is the random assignment of subjects to groups. Manipulation of the independent variable(s), however, is always present.

D. Some considerations regarding designs for experiments:

1. Research designs "evolve" based upon the needs of a particular investigation. In most research situations, true experimental rigor is the goal, but it is often not possible to attain. Rigor, here, refers to the number and importance of threats to validity which are controlled for by the features of the design. As a way of demonstrating what different designs "look like," I have reproduced and discussed three designs.

2. For all of these designs, I have borrowed and tinkered with the conventions of Campbell and Stanley (Experimental and Quasi-Experimental Designs for Research, 1963). To ensure that we are all discussing the same things, the following are the symbols used:

R = Random assignment to treatment or control conditions;

X = An experimental treatment (independent variable) of some type;

O = An Observation or measurement of the dependent variable;

3. A pre-experimental design: The One-shot Case Study ("Pre-experimental," here, refers to a design which tries to be an experiment, but fails):

X O

This design represents a single group which received some treatment presumed to cause change. In this design, one measure of the dependent variable was taken following the implementation of the treatment. This design has such a total absence of control as to be of little or no value. It is still used, however, in some research and should be understood, if only as a bad example of research design. An example might be represented by a nurse researcher in an adult diabetic treatment program who wished to demonstrate that patient education influenced affect among patients. To test this hypothesis, s/he designed and delivered a patient education program and then administered some measure of affect (emotion/feeling) and concluded that the positive affect noted was due to the patient education program.

Problems include:

a. No manipulation of the independent variable -- the treatment occurred and was received by all subjects;

b. No comparison group;

c. No randomization -- with no comparison group, no random assignment is possible.

With the absence of control represented by these problems, it is possible to conclude that the patient education program influenced affect, but it is equally possible to conclude that the affect noted was the result of dozens of other uncontrolled variables and/or occurrences.

4. A quasi-experimental design (The non-equivalent control group design):

X    O
     O

This is a widely used design in nursing and education. It is widely used because in a large proportion of the research conducted in these disciplines, random assignment to treatment or control conditions, or the use of other procedures to ensure equivalence of the groups, is not possible. A great deal of experimental research, then, must be completed with "intact groups." This is a quasi-experimental design because it fails to meet the criterion for true experiments of randomization. This design, however, does involve manipulation of the independent variable and uses a control group as the main control feature.

5. A true experimental design (The posttest only control group design)

R    X    O
R         O

This design is nearly identical to the non-equivalent control group design, but is a true experimental design because the researcher has added the dimension of random assignment of subjects to treatment or control conditions. While certain threats to validity potentially exist, this design controls for the effects of many of the major "rival hypotheses" or "alternative explanations" and may be referred to as "rigorous."
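The "R" in the posttest only control group design can be illustrated with a short sketch. This is only one simple way to do random assignment, not a prescribed procedure: the function name randomly_assign, the subject IDs, and the equal split into two groups are all assumptions made for illustration.

```python
import random

def randomly_assign(subject_ids, seed=None):
    """Split subjects into treatment and control groups by chance alone."""
    rng = random.Random(seed)       # seeded only so the sketch is repeatable
    shuffled = list(subject_ids)    # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "treatment": shuffled[:half],   # R  X  O
        "control": shuffled[half:],     # R     O
    }

# 20 hypothetical subject IDs (not real data)
subjects = [f"S{i:02d}" for i in range(1, 21)]
groups = randomly_assign(subjects, seed=1)
print(len(groups["treatment"]), len(groups["control"]))  # 10 10
```

Because chance alone decides who lands in which group, pre-existing differences among subjects have no systematic way of lining up with the treatment. That is what an "intact groups" design cannot promise.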

E. Research design considerations for non-experimental research....and/or additional "mechanical considerations" in designing research.

1. One of the purposes of a research design is to specify the "mechanical" activities that influence how the study is to be conducted. Such things as the ways in which data are to be analyzed are included in this category of design considerations. When no experiment is to be conducted, these additional mechanical steps may constitute the entire design. There are several considerations that influence these things:

a. These additional considerations are very language based. That is to say, you need to understand the terms used by researchers to describe what they did before you can effectively critique what they did ... or ... you have to understand ways to describe your intent before you can effectively select an appropriate statistical procedure. Some of these terms and their meanings are:

1.) Factorial designs: This term describes research which looks at more than a single variable, or factor, simultaneously. The term applies to a method of allocating subjects and their data to various groups.

a.) Take, for example, a study in which the researcher is interested both in the effects of patient education and private room placement on patient satisfaction. The researcher could use a factorial design to investigate the effects of both factors by assigning subjects to one of four groups, or analyzing data separately for subjects who fell into the cells if random assignment was impossible to arrange. The groups and the diagram of such a study might look like this:

                  Patient        No Patient
                  Education      Education
                 ___________________________
Private Room     |             |             |  Row Total
                 |_____________|_____________|
Semi-Private     |             |             |  Row Total
Room             |_____________|_____________|
                  Column         Column
                  Total          Total

b.) Mean data on the dependent variable (patient satisfaction) could be entered in the four cells which would allow the researcher to look at the various effects of the possible combinations of the factors under consideration. These analyses would allow the researcher to look at:

Main Effects: The differences between groups when only considering one of the independent variables (factors) at a time may be analyzed by comparisons of the row totals (for the main effect of Room Placement) and then the column totals (for the main effect of patient education).

Interaction Effects: Comparing each cell to each other cell allows the researcher to see what effect the interaction, or combination, of the two variables has on the outcome. In other words, the researcher can determine if some combination of the independent variables is preferable in some way to some other combination. Does that make any sense?

c.) In most cases where multiple factors are of interest, however, one or more of the variables may not be under the control of the researcher -- things like age, gender, wellness, diagnosis, etc. These variables, things the researcher is interested in but can't control, may be looked at and described as "blocking variables." This term does not imply that the variables "get in the way," but that for analytical purposes these variables form blocks of data that may be analyzed separately. When the researcher feels that these variables are important in a particular piece of research, s/he can investigate the effects of the variable by using a block design which treats the variable as a factor, even though true randomization is not possible.

Such a design might look like the following drawing if the previously described research were looking at the interaction of gender and diagnosis relative to patient satisfaction:

                  Acute          Chronic
                  Illness        Illness
                 ___________________________
Males            |             |             |  Row Total
                 |_____________|_____________|
Females          |             |             |  Row Total
                 |_____________|_____________|
                  Column         Column
                  Total          Total

The same analyses as those performed on a factorial design would be performed here and information regarding main and interaction effects (information gleaned through statistical techniques) could be used in research conclusions.
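For those who like to see the arithmetic, the main and interaction effects described for the patient education by room placement example can be sketched as follows. The four cell means are invented for illustration (they are not data from any study), and the sketch assumes equal numbers of subjects per cell, so that row and column means are simple averages of the cell means. I use means here rather than totals, which comes to the same comparison when cell sizes are equal.

```python
# Hypothetical mean satisfaction scores for the four cells (not real data).
# Keys are (room placement, education condition).
cell_means = {
    ("private", "education"): 8.0,
    ("private", "no_education"): 6.0,
    ("semi_private", "education"): 7.0,
    ("semi_private", "no_education"): 3.0,
}

def marginal_mean(cells, position, level):
    """Average the cell means that share one level of one factor."""
    values = [m for key, m in cells.items() if key[position] == level]
    return sum(values) / len(values)

# Main effects: compare the row means, then the column means.
room_effect = (marginal_mean(cell_means, 0, "private")
               - marginal_mean(cell_means, 0, "semi_private"))
education_effect = (marginal_mean(cell_means, 1, "education")
                    - marginal_mean(cell_means, 1, "no_education"))

# Interaction: does the benefit of education depend on room placement?
gain_private = (cell_means[("private", "education")]
                - cell_means[("private", "no_education")])
gain_semi = (cell_means[("semi_private", "education")]
             - cell_means[("semi_private", "no_education")])
interaction = gain_private - gain_semi

print(room_effect, education_effect, interaction)  # 2.0 3.0 -2.0
```

With these made-up numbers, education helps in both kinds of rooms but helps more in semi-private rooms, which is what a non-zero interaction looks like. Whether such differences are statistically significant is a separate question that this descriptive sketch does not answer.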

III. Some Additional Terminology Associated with the Design of Research (Some of this stuff is information we may not have time to discuss specifically in class, but which I think might be useful to you as the semester rolls along...let me know if any of this information is hopelessly muddled).

A. Longitudinal vs. cross-sectional studies

1. Technically, all studies are longitudinal, since the process of collecting data, implementing treatments, etc. requires time, but a more specific meaning is usually attached to these terms.

2. Longitudinal studies: These are studies that look at the effect of independent variables on dependent measures over time -- usually a fairly long period of time, but the cut-off point seems to be about six months. There is a fairly famous study conducted at the University of Michigan that is a good example of longitudinal studies. This study, begun around 1950, is still going on. In it, a "cohort" (group) of identical twins who were reared apart from each other was identified. Extensive, on-going data collection was begun in 1950 to begin to identify whether "nature" (the genetic determinants shared by the twins) or "nurture" (the environment in which they were raised) contributed more to their personalities, life successes, etc.

3. Cross-sectional studies: Studies that are interested in dependent variable measures at only a single point in time and in which the implementation of the independent variable(s) doesn't take too long -- say, less than six months. So a study which is time limited to less than six months is usually considered a cross-sectional study. When multiple measures are conducted over a short period of time, incidentally, these are still considered cross-sectional and are given the somewhat descriptive title of "interrupted time series," or "interrupted multiple measures" designs.

B. Cohort Studies: Cohort simply means "group." A cohort study is a study in which a group (or subgroup) is studied because of a specific interest in that group. Cohort studies are usually longitudinal studies, incidentally, but not always (the "twins reared apart study" is a cohort study). Another example might be represented by a researcher who is interested in the employment patterns of nurses receiving their R.N. degree between 1970 and 1975. The cohort (nurses graduating during those years) would be sampled and measured on some variable (like where they are working) and conclusions drawn apply only to this particular cohort. Usually, in cohort studies which are longitudinal in nature, different samples of the cohort are drawn for multiple measures over time. When the same subjects are studied over time, the study is a special case of longitudinal, cohort studies called a "panel study."

C. Counterbalanced designs: When repeated measures or treatments are involved in the study and when it may be assumed that the order in which the treatments or measures are administered may have an effect on the outcome, a counterbalanced design is used. In this type of design, subjects are assigned (preferably at random) to all possible orders in which the treatments (or measurements) could be presented (or obtained).

For example, pretend that a researcher wanted to know the effect of the age of patients on nurses' communication patterns. In other words, the researcher suspects that nurses communicate differently with patients of different ages. The dependent variable may have been "operationalized" (specifically defined and measured) as the number of times nurses initiated conversations in a 10-minute period. The independent variable, age of patients, might be operationalized as: 1) between 10 and 15 years; 2) between 16 and 30 years; 3) between 31 and 65 years; and 4) over 65 years of age. If you had only one group of nurses to deal with, you'd have to observe them with patients of different ages and note the "conversation initiating behavior." To avoid the effect of the order in which patients were seen, you would have to systematically vary the order in which each nurse saw patients. There are four different age groups, so there are 24 possible orders (4 x 3 x 2 x 1 = 24; dredge out your math books regarding permutations and combinations if you don't see why). Therefore, an absolute minimum of 24 different nurses would be required for the study. As you can see, counterbalanced designs require a lot of folks and are, for this reason, sometimes difficult to conduct. Counterbalanced designs usually, therefore, have a limited number of treatment conditions that are investigated simultaneously. Can you follow that?
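If you would rather not dredge out the math books, the count of possible orders in the example above can be checked with a few lines of code. This is just a sketch: the nurse labels are hypothetical, and giving each of 24 nurses exactly one order (shuffled so the pairing is random) is one simple way to lay out a fully counterbalanced schedule.

```python
import random
from itertools import permutations

age_groups = ["10-15", "16-30", "31-65", "over 65"]

# Every possible order in which one nurse could see the four age groups.
orders = list(permutations(age_groups))
print(len(orders))  # 24

# Hypothetical nurses, one per order; shuffling makes the pairing random.
nurses = [f"Nurse {i}" for i in range(1, len(orders) + 1)]
random.Random(0).shuffle(orders)
schedule = dict(zip(nurses, orders))

for nurse in nurses[:3]:
    print(nurse, "sees patients in this order:", schedule[nurse])
```

Adding a fifth age group would push the count to 120 orders, which is why counterbalanced designs tend to keep the number of conditions small.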

IV. Some language associated with non-experimental research:

A. Non-experimental research is typically considered to be research conducted when no effort is made to establish cause and effect relationships between variables. (Note: In some instances, however, an effort is made to establish cause and effect retrospectively when ethical constraints do not allow manipulation of the variable. An example is the research which identified the link between tampon use and Toxic Shock Syndrome. In this study, a group of women suffering from Toxic Shock Syndrome was identified and interviewed. They were asked to recount, retrospectively, everything they did prior to the onset of symptoms. The commonality found was that all [or almost all] of these women used tampons prior to the onset of symptoms. It should be understood, however, that when no manipulation is possible, the conclusion that a cause and effect relationship exists is quite tenuous.) Non-experimental research designs do not manipulate an independent variable. Control, however, is still important to handle the effect of error variance.

B. Non-experimental research is an important area of inquiry and is sometimes required when the independent variable is essentially not controllable (disease states, for example, or gender) or where there are ethical or practical constraints in controlling the variable (smoking among school children, for example).

C. Non-experimental research comes in many varieties and different texts and instructors use different terminology. I categorize non-experimental research in a couple of ways. The first has to do with the purpose of the research:

1. Descriptive Research: Research which is simply interested in describing something. Usually descriptive research involves only a single group and the question has to do with things like, for example:

a. How much do nurses in Missouri know about pre-menstrual syndrome?

b. What is the incidence of AIDS among residents of Missouri?

2. Comparative Research: Research which is interested in not only describing a group, but in comparing that group to some other group(s). For example:

a. Do hospital nurses know more or less about pre-menstrual syndrome than community nurses?

b. Is the incidence of AIDS in Missouri higher or lower than the national average?

3. Correlational Research: Research which seeks to establish a "functional" relationship between two or more variables. The purpose here is to see if variables are "linked" or correlated in some predictable way. Research which correlates pulmonary artery temperature and rectal temperatures in post-M.I. patients would be an example. I believe that the example we discussed had to do with the correlation between Graduate Record Examination Scores (GRE Scores) and Grade Point Average (GPA) among graduate nursing students.

4. Secondary Analysis: Researchers will often collect a mountain of information in the course of the study. These data may go far beyond the intended research question. Secondary analysis refers to the process of subjecting data already in existence to new analyses for the purpose of answering new research questions. These analytical procedures, and sometimes the research in which the procedures are used, are often referred to as "post hoc" or "ex post facto" (after the fact).
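To make the idea of a correlation (item 3 above) a little more concrete, here is a minimal sketch of the Pearson correlation coefficient, one common index of how closely two variables are "linked." The GRE and GPA values are invented for illustration and do not come from any real group of students; the function name pearson_r is my own.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # How the two variables vary together ...
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # ... scaled by how much each varies on its own.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / sqrt(ss_x * ss_y)

# Hypothetical scores for six graduate nursing students (not real data).
gre = [1000, 1100, 1150, 1200, 1300, 1350]
gpa = [3.0, 3.2, 3.1, 3.5, 3.6, 3.8]

r = pearson_r(gre, gpa)
print(round(r, 2))
```

The coefficient always falls between -1 and +1. A value near +1 means students with higher GRE scores tend to have higher GPAs; it says nothing, by itself, about one causing the other.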

D. The second way of categorizing non-experimental research has more to do with the procedures used to conduct the research. Bear in mind that a given research project or report may be described in both ways. For example, a single research project may be both correlational research and survey research (See below):

1. Survey Research: This kind of research can investigate a wide range of phenomena for the purposes of description, comparison or correlation. The characteristics which identify a piece of research as a survey are:

a. The selection of a sample of individuals;

b. The gathering of data regarding their reactions, attitudes, knowledge, feelings, etc., through the use of questionnaires or interviews (which may involve mailings, face-to-face interviews, the telephone, etc.); and

c. The summary, analysis and reporting of data.

2. Retrospective Research: Research that looks at something that has already occurred and tries to identify relationships between or among the variables involved (backward-looking research, if you will). The way in which toxic shock syndrome was first linked to tampon use, by studying patients already suffering from the syndrome, is an example of retrospective research. The common element of tampon use was then more closely investigated to infer cause and effect relationships. Historical research and record reviews are also examples of retrospective research.

3. Prospective Research: Hypothesizing some relationship between variables, identifying some subjects who exhibit the presumed causal or predictor variable, and waiting to see if the predicted effect occurs (forward-looking research, maybe). A special case of prospective research involves "prediction studies."

V. Some Additional Research Terminology

A. Hypothesis: A hypothesis is the researcher's prediction of the probable outcome of the study. This prediction is based upon the theoretical/conceptual framework, the researcher's clinical experience, common sense, or other literature in the general area. There are two, or maybe three, common forms that hypotheses take:

1. Null hypothesis: A prediction that the research results will support the finding of no significant differences between groups or no significant relationships between variables. Since, regardless of the form of the hypothesis used to predict the outcome, the null hypothesis will be tested using statistical procedures, the null hypothesis is sometimes called the "statistical hypothesis." Example: There is no difference in the effectiveness with which nurses and physicians perform triage functions as measured by the XYZ Triage Test.

2. Research Hypothesis: Statement that the expected outcome of the research will be the finding of significant differences between groups or relationships between variables. Research hypotheses may take one of two forms. Which is used determines, among other things, the "tailedness" of the statistical test to be used (more on this later).

a. Directional hypotheses: Statement that not only will there be a difference or a relationship, but a prediction of the direction in which the difference or relationship will exist. Example: Nurses perform triage functions, as measured by the XYZ Triage Test, more effectively than physicians.

b. Non-directional hypotheses: Statement that there are differences or relationships expected, but no prediction of the directionality of the findings. Example: There is a difference in the ability of physicians and nurses to perform triage functions as measured by the XYZ Triage Test.

Do you see the difference?

3. Occasionally in experimental research, and more often in non-experimental research, no hypothesis is formulated. Rather, a question about the outcome is raised, with no specific prediction made. This is called, not surprisingly, a research question. Example: Are there differences in the ability of nurses and physicians to perform triage functions as measured by the XYZ Triage Test? The justification for a research question being used in experimental research arises when there is no theoretical basis in the literature, the researcher's experience, or logical inference for a prediction of any kind.
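The "tailedness" mentioned under research hypotheses can be previewed with a small sketch. I have used an exact permutation test here because it needs nothing beyond counting; it is a stand-in chosen for illustration, not necessarily the test one would use in practice. The XYZ Triage Test scores below are invented, and the variable names are my own.

```python
from itertools import combinations

# Hypothetical XYZ Triage Test scores (not real data).
nurses = [82, 88, 90, 94, 85]
physicians = [78, 84, 80, 86, 79]

def mean(values):
    return sum(values) / len(values)

observed = mean(nurses) - mean(physicians)

# Under the null hypothesis the group labels are arbitrary, so look at
# every possible way of relabeling five of the ten scores as "nurses."
pooled = nurses + physicians
differences = []
for chosen in combinations(range(len(pooled)), len(nurses)):
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    differences.append(mean(group_a) - mean(group_b))

# Directional hypothesis (nurses score higher): count one tail only.
p_one_tailed = sum(d >= observed for d in differences) / len(differences)

# Non-directional hypothesis (a difference either way): count both tails.
p_two_tailed = (sum(abs(d) >= abs(observed) for d in differences)
                / len(differences))

print(len(differences), p_one_tailed, p_two_tailed)
```

Both p-values test the same null hypothesis against the same data. The directional hypothesis counts only results as extreme as the observed one in the predicted direction, so its p-value is smaller; with equal group sizes, as here, it is half the two-tailed value.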

B. Variables: Variables are things which change from subject to subject, situation to situation, etc. .... things which are not constant. There are a number of adjectives that get stuck in front of the noun "variable" which clarify which kind of a variable the researcher is talking about:

1. Independent variable: The variable that is manipulated by the researcher -- the variable that is the presumed "cause" of changes in the dependent variable -- the treatment variable.

2. Dependent variable: The outcome variable -- that variable which is presumed to be caused by variance in the independent variable -- the presumed "effect."

3. Attribute variable: A characteristic of a subject which is not amenable to manipulation -- e.g., gender, age, height, weight, state of wellness, etc.

4. Active variable: A variable which can be manipulated, which can be an independent, or treatment, variable.

5. Predictor variable: In correlational research, the equivalent of the independent variable. In correlational research, it is presumed that knowledge of the predictor variable allows the researcher to reliably predict the ....(read on)

6. Criterion variable: In correlational research, the equivalent of the dependent variable.

C. Statistic: A characteristic of a sample. "Statistics," then, refers to the set of mathematical procedures used to analyze information collected from one or more samples.

D. Parameter: A characteristic of a population.

E. Population: The entire set of persons, objects, concepts, settings, etc. to which the researcher wishes to generalize findings. Typically, the actual set of individuals from which the researcher selects participants (the sample) for research studies.

F. Sample: The subset of the population from which the researcher gathers data with the purpose of generalizing to the population.
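The distinction among statistic, parameter, population and sample (items C through F above) can be seen in a small simulation. Everything here is made up for illustration: the "population" is 5,000 simulated ages, the sample size of 100 is arbitrary, and the seed is fixed only so the sketch gives the same numbers each time it is run.

```python
import random
from statistics import mean

rng = random.Random(42)   # fixed seed so the sketch is repeatable

# Hypothetical population: simulated ages of 5,000 nurses (not real data).
population = [rng.randint(22, 65) for _ in range(5000)]
parameter = mean(population)            # a characteristic of the population

# A simple random sample drawn from that population, without replacement.
sample = rng.sample(population, 100)
statistic = mean(sample)                # a characteristic of the sample

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In real research the parameter is usually unknown, which is the whole point: the statistic computed from the sample is used to estimate it, and the two will rarely match exactly.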

VI. Data Analysis: General Introduction

A. Okay....the researcher has collected information based upon a research design. Now what? Data analysis is the process of organizing, interpreting and communicating findings to people who may be interested in what has been learned. It is also the title of this course, so we're finally moving into the designated content.

B. There are certain conventions and language used in the analysis of data that researchers (and statisticians) follow to:

1. Eliminate or reduce bias;

2. Explain relationships, differences, etc.; and

3. Communicate the strength of their findings and the confidence they have in those findings.

C. These conventions need to be understood by the research consumer in order to:

1. Understand what was done;

2. Understand the limitations and applications of the findings; and

3. Critically review the findings and their implications for practice, education or administration in nursing.

D. Data, by themselves, do not communicate very much about the research questions or the answers to those questions. The data need to be systematically analyzed to answer research questions, test hypotheses, generalize to meaningful populations, etc. Generally, two types of data can be collected through the research process and the analyses are dictated, in part, by the nature of the data.

1. Qualitative data: Qualitative data are those data which are not put into numerical form. Qualitative data are typically the result of research endeavors such as interviews, ethnographic and other observations and the report of impressions, attitudes or feelings. These types of data are analyzed in ways which are different from quantitative procedures and have become quite sophisticated in the last few years.

There are several specifically labelled approaches to qualitative analysis including "grounded theory approaches," "trend analytic procedures," and "content analytic procedures." The analysis of qualitative data is often described as a process of "interpretation." However the process is described, qualitative data analysis is the process of organizing the information and trying, intuitively in many instances, to decide what they mean, what the implications might be and what applications are potential consequences of the data collection process. While qualitative research and the analytic methods appropriate to this method are very legitimate research procedures, qualitative analysis of data is reported in the nursing literature much less frequently than the analyses of quantitative data and is, in any event, not the principal focus of this course -- so consider the above information something to store in your trivia treasure chest. Onward!

2. Quantitative data. These are data which are expressed as numbers. These numbers can be subjected to a variety of analytical procedures to assist the researcher in determining the meaning of the data and the implications for further research, practice, teaching or administration. Qualitative data, by the way, can be quantified, in many instances, and subjected to quantitative data analytic procedures, and this is often done in exploratory or descriptive research.

3. Levels of measurement: There are different ways to quantify variables and, as the term "levels of measurement" would indicate, some are considered better than others. While this is not exactly true, some types of quantification (levels of measurement) leave the researcher with wider choices of statistical procedures which may be used with them. In any case, the following are the four levels of measurement usually included in research and statistics texts. These are relatively important to remember since the level of data you are dealing with determines, in part, the statistical procedure of choice:

a. Nominal: The lowest level. Nominal measures simply identify a subject as being in one particular category or another....this measure simply tells the researcher whether given subjects or groups of subjects are "alike" or "different" with reference to the variable under consideration (e.g., males vs. females, Republicans vs. Democrats, B.S.N.s vs. A.D.N.s vs. Diploma Nurses, etc.).

b. Ordinal: A little better. Ordinal data rank order subjects on the variable of interest from first to last, top to bottom, most to least, heaviest to lightest, etc.. In ordinal data, each subject receives one specific rank. For example, when we are rank ordering subjects as to weight, from heaviest to lightest, each subject will receive a separate ranking -- "1" or "first," "2" or "second," on down to "n" or "last." Just putting subjects into "groups" like "Very Heavy," "Heavy," "Average," "Light," and "Very Light," is not ranking -- it is "naming categories" and, therefore, represents nominal data. Spotting the difference sometimes takes some practice.

There is another context in which ordinal is sometimes (often, actually) used by researchers and statisticians. To understand this other context, you need to have clear in your head that the “distance” from 1st to 2nd is not necessarily the same as the “distance” from 2nd to 3rd and so on. If the heaviest person in the group (ranked 1st) weighs 250 pounds and the next heaviest (ranked 2nd) is 240 pounds, the distance (or interval--see next paragraph) is 10 pounds. Further imagine that the 3rd ranked person in weight is 100 pounds. The distance, or interval, here is 140 pounds (the distance between 2nd and 3rd). The problem is that a ranking of 1, 2, 3 makes the intervals look the same. Get it? Anyway, statisticians sometimes consider any measurement device in which the distances between values are not equal as ordinal level. For the purposes of this class, I won’t ask you to make this fine a distinction, but I wanted you to know the actual definitions of the levels. Enough!

c. Interval: Interval measures specifically attach a number to the degree to which a subject exhibits a given variable (e.g., "The patient's temperature is 99.6 degrees Fahrenheit."). Using this datum (the singular of data), you can compare the subject to another subject and say specifically how far apart they are with relation to the variable under consideration (what the interval between subjects is) (e.g., "Patient A's temperature is 2 degrees Fahrenheit higher than patient B's temperature."). The kicker is that interval data do not have a "true zero point." Zero degrees Fahrenheit is not the absence of heat, it is merely an arbitrary designation of "zero" to represent a particular amount of heat. The consequence is that you cannot state the relationship as a ratio (e.g., 50 degrees is not twice as hot as 25 degrees).

d. Ratio: The difference between ratio and interval data is the presence of a true zero point. Weight (more technically, "mass") is an example of a ratio measure. Zero pounds is the absence of weight, and 100 pounds is twice as heavy as 50 pounds as well as being 50 pounds more than 50 pounds. More mathematical manipulations may be performed on these data than on interval (or nominal or ordinal) data.

Incidentally, the form that data take may be altered "downward" but not "upward." That is to say, data that are in ratio form (like weight) can be expressed as interval data (by simply indicating the distance, in pounds, from one subject to another with the lightest subjects being arbitrarily designated as zero), as ordinal data (by rank ordering subjects from heaviest to lightest), or as nominal data (by arbitrarily establishing categories like "heavy," "medium weight," and "light"). You can't, however, express Fahrenheit temperature as a ratio variable. Is that clear? If not, bring it up in class and we'll go over it again.
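As a rough illustration of moving data "downward," here is a minimal Python sketch. It reuses the three weights from the ordinal example (250, 240 and 100 pounds); the subject names and the category cutoffs are invented for illustration only.

```python
# Hypothetical subjects; the weights (in pounds) are ratio-level data.
weights = {"Ann": 250, "Bob": 240, "Cal": 100}

# Interval form: distance in pounds from the lightest subject,
# who is arbitrarily designated as zero.
lightest = min(weights.values())
as_interval = {name: w - lightest for name, w in weights.items()}

# Ordinal form: rank order from heaviest (1) to lightest (n).
heaviest_first = sorted(weights, key=weights.get, reverse=True)
as_ordinal = {name: rank for rank, name in enumerate(heaviest_first, start=1)}

# Nominal form: named categories. The cutoffs are arbitrary assumptions.
def category(w):
    if w >= 200:
        return "heavy"
    if w >= 130:
        return "medium weight"
    return "light"

as_nominal = {name: category(w) for name, w in weights.items()}

print(as_interval)  # distances from the lightest subject
print(as_ordinal)   # ranks 1..n
print(as_nominal)   # category labels
```

Notice that each step throws information away: the ranks no longer show that the gap between 2nd and 3rd is far larger than the gap between 1st and 2nd, and the category labels no longer distinguish the two heaviest subjects at all. That lost information is why the conversion cannot be run in the "upward" direction.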

VII. Statistics: The procedures of data analysis.

The term "statistic," you'll recall, refers to a characteristic of a sample. "Statistics," then, refers to those arithmetic procedures used to present data representative of characteristics of samples. The researcher may have, as his/her objective, one of two major purposes in presenting sample data: 1.) The description of the sample; or 2.) the drawing of inferences from those descriptions. When the researcher is interested in describing the sample, the procedures used are referred to as descriptive statistics. When the researcher is interested in drawing inferences from sample data, the procedures used are referred to as inferential statistics. (Logical, yes?)

A. Descriptive Statistics

1. As noted, descriptive statistics are those procedures used to present the characteristics of a sample. Averages, percentages, percentiles, frequencies, measures of variability and sometimes correlations are examples of descriptive statistics.

2. Simply providing tables of frequencies, percentages, percentiles, etc., however, does not help the reader quickly interpret what the findings mean. Statistics, therefore, are organized in various ways to give a more readable "picture" of the sample. Some examples include:

a. Frequency distributions: These are simply tables which arrange all the scores from the sample in numerical value from the highest to the lowest (or vice versa) and tell the reader how many subjects received each score or numerical value. An example follows:

Examination Score    Frequency (N)    Cumulative Frequency

       75                  1                   1
       78                  2                   3
       80                  6                   9
       82                  6                  15
       84                  8                  23
       85                 11                  34
       86                 17                  51
       87                 12                  63
       88                  9                  72
       90                  7                  79
       91                  4                  83
       92                  2                  85

The third column, the "Cumulative Frequency" is a running total of how many subjects have achieved at a given level or below. For example, you should be able to see from that column that there were 85 students in the class, or group, in question. You should also be able to see that 51 students scored 86 or less. If you can't see that, let me know, because it's a relatively important little statistic used in understanding, and calculating, "percentiles." More on that later.
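For anyone who wants to check the running total, the following is a minimal Python sketch that rebuilds the cumulative frequency column from the score and frequency columns of the table above. The variable names are illustrative.

```python
from itertools import accumulate

# Scores and frequencies copied from the examination table above.
scores = [75, 78, 80, 82, 84, 85, 86, 87, 88, 90, 91, 92]
freqs = [1, 2, 6, 6, 8, 11, 17, 12, 9, 7, 4, 2]

# The cumulative frequency is a running total, counted from the lowest score up.
cum_freqs = list(accumulate(freqs))

for score, f, cf in zip(scores, freqs, cum_freqs):
    print(f"{score:>5} {f:>5} {cf:>5}")

total_students = cum_freqs[-1]                 # 85 students in the class
at_or_below_86 = cum_freqs[scores.index(86)]   # 51 students scored 86 or less
```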

b. Graphs. Graphs also give a quick picture of the characteristics of a sample. The two most frequently used graphs are: 1) Histograms -- bar graphs; and 2.) Frequency polygons -- "line graphs" or bar graphs with tops connected and the bars erased (the "normal distribution" or bell curve is the most well known example of a frequency polygon and we'll deal with it enough that you'll grow sick of it).

3. Distributions .... Frequency distributions, bar graphs, and frequency polygons all give a picture of a distribution -- how the numerical values of subjects' scores are distributed. These pictures are nice, but take up a lot of space and rely upon the visual interpretation skills of the reader for conclusions. Statistical procedures increase the precision of communication of findings by allowing the researcher to narratively and numerically describe all of the relevant characteristics of the sample using: 1) A measure of central tendency; 2.) a measure of variability; and 3.) a description of the shape of the distribution.

a. Measures of central tendency: These are measures that tell the reader about the "average" or center part of the distribution; about the ways in which the scores group themselves for most subjects. There are three commonly used measures and I'll review them quickly here:

1.) Mode: The most frequently occurring score;

2.) Median: The "middle" score; the score that evenly cuts the distribution in half; and

3.) Mean: The arithmetic average. This is the most commonly used measure of central tendency because it is generally the most stable and reliable single descriptor of central tendency for the sample, assuming that the data are in interval or ratio form and that the distribution of the characteristic in the population may be assumed to be approximately "normal" (more later on that). The mean is calculated by adding up all the scores and dividing by the number of subjects.
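To make the three measures concrete, here is a minimal Python sketch that computes them for the examination-score frequency distribution presented earlier. It uses Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Scores and frequencies from the examination table presented earlier.
scores = [75, 78, 80, 82, 84, 85, 86, 87, 88, 90, 91, 92]
freqs = [1, 2, 6, 6, 8, 11, 17, 12, 9, 7, 4, 2]

# Expand the table into one entry per student (85 entries in all).
data = [score for score, f in zip(scores, freqs) for _ in range(f)]

mode = statistics.mode(data)      # 86, the most frequently occurring score
median = statistics.median(data)  # 86, the 43rd of the 85 ordered scores
mean = statistics.mean(data)      # 7286 / 85, about 85.72

print(mode, median, round(mean, 2))
```

In this distribution all three measures land close together, which is what you would expect when the scores pile up around a single central peak.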

b. Measures of variability: These are measures that tell the reader how spread out or "variable" the scores in a distribution are. Measures of variability, then, tell the reader to what degree the subjects in the sample are similar to, or different from, one another with respect to the variable under consideration. There are many measures of variability, but there are three that are more commonly used than the others:

1.) Range: This is the simplest (and least stable -- one individual score can change the range, but as I hope you'll see later, changing other measures of variability requires that a large number of scores change) of the measures of variability. It is simply the distance from the lowest score to the highest score. In the example of the frequency distribution presented earlier, the high score is 92 and the low score is 75. The range, then, for that distribution is 17 (92 - 75 = 17).

2.) Semi-Interquartile Range or "Q" (Many texts refer to this measure as the "Semiquartile Range" -- same difference). This measure of variability describes the middle 50% of the distribution but indicates nothing about the spread of scores outside of this "envelope."

An understanding of percentiles is necessary to the computation of this measure. A percentile score indicates the percent of the subjects in the sample who scored at or below a given raw score. It is the percentage equivalent of the cumulative frequency (see the connection?). For example, a score at the 35th %ile indicates that 35% of the sample scored at or below this level. The 75th %ile is that score at, or below which, 75% of the scores fall. The 25th %ile is that score at, or below which, 25% of the scores fall. Fifty percent of the scores, then, fall between the 25th and 75th %iles. This is known as the "interquartile range." Half of this distance is known as the semi-interquartile range. An example might help: If you determine that, for a given distribution, the 75th %ile falls at 90 and the 25th %ile falls at 60, then you know that the middle half of the sample obtained scores between 60 and 90. Half of this distance (30/2) is 15 and the semi-interquartile range (Q) for this distribution is 15. This number tells you something about how "spread" the scores were on the hypothetical measure. I hope you can see that the smaller the Q value, the less variability there is among subjects in the middle half of the distribution. Another way of saying that, I suppose, is that the smaller the Q, the more homogeneity there is in the group. Make any sense?

3.) Standard Deviation. The standard deviation is by far the most commonly used measure of variability. It is the most stable and reliable estimate of variability and gives more information about the sample than other procedures. For example, where the Q tells you about the middle 50% of the sample only, the "sd" (standard deviation) describes the middle 68% (2/3) of the distribution specifically and describes the distribution of scores across the entire range of scores. We'll deal very specifically and at some length with this statistic when we discuss the characteristics of the normal distribution.
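The three measures of variability can be sketched in a few lines of Python. The range uses the examination table presented earlier, and Q uses the 25th and 75th percentile values (60 and 90) from the example above. The function name is illustrative, and the choice of `statistics.pstdev` (which treats the 85 scores as the whole group of interest) is an assumption; `statistics.stdev` would be the choice if the scores were treated as a sample from a larger population.

```python
import statistics

# Scores and frequencies from the examination table presented earlier.
scores = [75, 78, 80, 82, 84, 85, 86, 87, 88, 90, 91, 92]
freqs = [1, 2, 6, 6, 8, 11, 17, 12, 9, 7, 4, 2]
data = [score for score, f in zip(scores, freqs) for _ in range(f)]

# 1.) Range: distance from the lowest score to the highest score.
score_range = max(data) - min(data)   # 92 - 75 = 17

# 2.) Semi-interquartile range: half the distance between the
#     25th and 75th percentiles.
def semi_interquartile_range(p25, p75):
    return (p75 - p25) / 2

q_example = semi_interquartile_range(60, 90)   # (90 - 60) / 2 = 15.0

# 3.) Standard deviation of the 85 scores (population form; see note above).
sd = statistics.pstdev(data)

print(score_range, q_example, round(sd, 2))
```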

c. Descriptions of shape: When you draw a frequency polygon, the resulting picture has a certain shape. The shape of the distribution tells you some important things about the sample itself and some important things about which statistics should be used for data analysis. Some ways of describing the shape of a distribution include:

1.) Symmetry: If you folded the distribution in half, a symmetrical distribution would be superimposed upon itself. Hmmm! Not very clear. Let me try again. In a symmetrical distribution, the right and left halves of the distribution are mirror images of one another. The normal distribution, incidentally, is symmetrical. There are other symmetrical shapes, however, that are non-normal. Read on!

2.) Modality: When you draw a graph, you may have one, or more, peaks or most frequently occurring scores (modes). A uni-modal distribution has only 1 peak or mode, while a bi-modal distribution has two peaks, etc. A normal distribution is symmetrical and uni-modal. It is possible for a distribution to be multi-modal and symmetrical. Consider the following bi-modal distribution:

[Figure: a symmetrical bi-modal frequency polygon]

This distribution has two peaks (two modes), but if you fold the picture in half, it is still "mirror imaged" (allowing for flaws in my artistic abilities).

3.) Skewness: Non-symmetrical distributions are referred to as "skewed" .... that's skewed (rather than whatever else it sounds like). The shape of the non-symmetrical distribution determines whether the distribution is referred to as positively or negatively skewed. Pictures are probably better than words to describe this phenomenon:

[Figure: two frequency polygons, labelled Positive Skewness (left) and Negative Skewness (right)]

It is the direction of the "tail" that determines the positive or negative nature of the skew. The tail goes to the right (toward the higher numbers) in a positively skewed distribution and to the left (toward the lower numbers) in a negatively skewed distribution. Technically, the "positive" or "negative" refers to the direction in which the mean is pulled. In a skewed distribution, the median and the mean are "pulled apart." The mean is typically higher than the median in a positively skewed distribution and typically lower than the median in a negatively skewed distribution.

4.) Kurtosis: This refers to the "height" or "peakedness" of the curve. A distribution with a very high peak is called leptokurtic, while a distribution with a very flat look is called platykurtic. Again pictures might help:

[Figure: two frequency polygons, labelled Leptokurtic (left) and Platykurtic (right)]
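The way skew pulls the mean away from the median can be seen in a small Python sketch. Both data sets below are invented for illustration: the first has a single extreme high score (a tail to the right), and the second is its mirror image (a tail to the left).

```python
import statistics

# Hypothetical scores with one extreme high value: tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]

# Mirror image of the same scores: tail to the left.
left_skewed = [21 - x for x in right_skewed]

for label, data in [("positive skew", right_skewed),
                    ("negative skew", left_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.2f}, median = {median}")
```

In the first set the one extreme score drags the mean above the median; in the second it drags the mean below the median. The median barely notices the extreme score in either case, which is one reason it is preferred for skewed distributions.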

d. The question arises: "When do you use which descriptive statistic?" Basically, the researcher wants the strongest, most reliable statistic consistent with the data. For central tendency, the mean is the first choice. For variability, the standard deviation is preferred. Both of these statistics, however, are predicated upon the assumption that the distribution is normal and that the data are at the interval or ratio level. When the distribution is non-normal or when the data are ordinal or nominal, the researcher must pick other statistics. Basically, the guidelines are:

1.) Interval/Ratio Data....Normal Distribution -- Mean and Standard Deviation are used;

2.) Normal or non-normal distribution ....nominal data -- Mode and Range are used;

3.) Ordinal data and/or non-normal distribution -- Median and Q are used.
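The three guidelines above amount to a small decision rule, which can be written out as a Python sketch. The function name and the string labels are illustrative assumptions, not standard terminology from any library.

```python
def choose_descriptives(level, is_normal):
    """Return (central tendency, variability) following the guidelines above.

    level: "nominal", "ordinal", "interval" or "ratio"
    is_normal: True if the distribution may be assumed to be normal
    """
    level = level.lower()
    if level not in ("nominal", "ordinal", "interval", "ratio"):
        raise ValueError(f"unknown level of measurement: {level}")
    # Guideline 2: nominal data, whatever the shape of the distribution.
    if level == "nominal":
        return ("mode", "range")
    # Guideline 3: ordinal data and/or a non-normal distribution.
    if level == "ordinal" or not is_normal:
        return ("median", "Q")
    # Guideline 1: interval/ratio data with a normal distribution.
    return ("mean", "standard deviation")

print(choose_descriptives("ratio", True))      # mean and standard deviation
print(choose_descriptives("interval", False))  # median and Q
print(choose_descriptives("nominal", True))    # mode and range
```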

4. All of the above statistics refer to the description of a single variable from a single sample. There are ways of describing multiple variables or multiple samples:

a. Contingency Tables. Contingency tables are not really statistics. Contingency tables are simply pictures of the numbers for different variables across different groups. All comparative, experimental and much correlational research involves multiple samples and/or multiple variables for a single sample, so you will see contingency tables used to present data from these studies. For example, the following shows some fictitious data for two groups for whom we have gathered data regarding Grade Point Averages:

                             GPA

              0.0-1.0   1.1-2.0   2.1-3.0   3.1-4.0
              _____________________________________

Group # 1        4         6        12         6
              _____________________________________

Group # 2        6         8        10         2
              _____________________________________
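A contingency table is easy to hold in code as well. The following minimal Python sketch stores the fictitious GPA counts above and adds row and column totals, which published tables often include; the totals are computed here and do not appear in the original table.

```python
# GPA bands and fictitious counts copied from the table above.
gpa_bands = ["0.0-1.0", "1.1-2.0", "2.1-3.0", "3.1-4.0"]
counts = {
    "Group # 1": [4, 6, 12, 6],
    "Group # 2": [6, 8, 10, 2],
}

# Row totals: number of subjects in each group.
row_totals = {group: sum(row) for group, row in counts.items()}

# Column totals: number of subjects in each GPA band across both groups.
column_totals = [sum(column) for column in zip(*counts.values())]

print(f"{'':<10}" + "".join(f"{band:>9}" for band in gpa_bands) + f"{'Total':>9}")
for group, row in counts.items():
    print(f"{group:<10}" + "".join(f"{n:>9}" for n in row) + f"{row_totals[group]:>9}")
print(f"{'Total':<10}" + "".join(f"{n:>9}" for n in column_totals)
      + f"{sum(column_totals):>9}")
```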

b. Correlations: Correlations can be considered either descriptive or inferential statistics depending upon the purpose for using the correlation and its interpretation. Simply stating the "r" value for variables within a group is descriptive in nature.

VIII. Inferential Statistics

Note: While descriptive statistics equip you to describe the characteristics of the sample, the purpose of most research is to draw conclusions about populations based upon the descriptive statistics. These conclusions must be inferred from the statistics we have regarding the sample. The mathematics used to draw, or infer, conclusions are called, not surprisingly, inferential statistics.

The purposes of this course include both equipping you to calculate selected statistics and to choose appropriate statistical procedures for various research purposes. This latter process will be dealt with on and off throughout the semester, but all efforts to deal with this task will relate, at least roughly, to the following taxonomy of statistical tools:

Statistical Procedures

    Descriptive Statistics (Description of Samples)

        Central Tendency: Mean, Median, Mode

        Variability: Range, Q, Standard Deviation

        Shape: Kurtosis, Skewness, Modality

    Inferential Statistics

        Parametric Statistics

            Univariate, Bivariate, Multivariate

            Test Hypotheses and Estimate Population Parameters

        Non-Parametric Statistics

            Univariate, Bivariate, Multivariate

            Test Hypotheses

A. Okay, with this picture in mind, some narrative explanation may help:

1. Descriptive statistics: We've already discussed these. These are the devices used to describe a sample in terms of central tendency, variability and shape.

2. Inferential Statistics: These are the mathematical procedures which allow the researcher to draw inferences from sample data (statistics) about population characteristics (parameters) and/or to test hypotheses. There are two major kinds of inferential statistical procedures:

a. Parametric statistics: These are the statistical procedures which may be used to estimate population parameters (hence the name) and to test hypotheses. These procedures are very "powerful" and may be used with more confidence than non-parametric statistics, but their use is predicated upon some assumptions about the nature of the data. These include:

1.) Data must be in interval or ratio form;

2.) Normality -- it is assumed that the variable in question is normally distributed in the population to which we wish to generalize findings;

3.) Homogeneity of variance -- (technically called, are you ready for this?, "homoscedasticity") it is further assumed that variance across multiple samples from the population would be approximately equal; and

4.) Reasonable number of subjects (judgement call).

b. Non-parametric statistics. These procedures are not as powerful as parametric statistics and may be only used to test hypotheses....not to estimate population parameters. They are used because they are not restricted by assumptions about the population -- sometimes, therefore, these procedures are called "distribution-free" statistics. They may be used with nominal or ordinal data and small sample sizes do not materially affect the results.

“The Normal Distribution”

I. The Normal Distribution: Importance, Background and Application

A. The properties of the normal distribution are the basis of all quantitative statistical procedures which rely upon probability – this, basically, is all the statistics which report levels of significance – which is, functionally, all the quantitative statistics. Consequently, an understanding of the normal distribution is critical to an understanding of any of the quantitative procedures. Once you have it under your belt, a conceptual understanding of statistical procedures should come relatively easily.

B. The normal distribution, I hope you’ll recall, is a frequency polygon which pictorially displays data from a hypothetical population.

C. It is the shape of the frequency polygon which defines it as “normal.” Regardless of the number of scores included in any distribution, the normal distribution always looks the same. With apologies for my electronic artistic abilities, the normal distribution looks something like the diagram below:

[Figure: the normal distribution, a symmetrical bell-shaped frequency polygon]

D. What the normal distribution allows researchers, clinicians, teachers, administrators and others to do, in essence, is to compare different groups or individuals on some variable measure when the raw score comparisons might be meaningless. It is the predictable character of the normal distribution which does not change from variable to variable that allows this. The characteristics of the shape of the frequency polygon include:

1. It is symmetrical (which means, among other things, that if you folded it in half, both sides would look the same). Statistically, this means that it is not “skewed;”

2. It is uni-modal (has only one peak, or mode—the most frequently occurring value);

3. The “tails” (extreme ends of the polygon) never touch the ground…there is, theoretically, no zero or infinite quantity on the normal distribution; and

4. It has an absolutely predictable angle of ascent and descent on either side of the mean (meaning that the “kurtosis” is regular and predictable – it is this feature which allows for absolute predictions of “area under the curve” … more on this later).

E. When any variable (say, intelligence) can be assumed to be normally distributed in the population, any given value (individual score) or set of sample data (statistics) can be compared to the normal distribution for that variable. For example, if, for some bizarre reason, we wished to compare the relative, measured intelligence of nurses to the general population, we could do so rather easily. All that would be necessary would be to measure the IQs of a sample of nurses and compare the mean IQ of that sample to the “norms” (normal distribution of intelligence in the general population). We could, then, say (silly) things like whether nurses had higher or lower IQs than the general population, etc.. Similarly, any individual IQ score could be compared to the population in making relative statements. This is what is done with things like admission requirements, certification examinations, etc.. An individual score is compared to the population so that you can determine where the individual (or, as noted, the group) falls in relation to the population. As silly as this kind of comparison seems, the ability of the normal distribution to allow this kind of comparison makes possible the two purposes of quantitative, inferential statistics:

1. The estimate of population parameters: By comparing sample characteristics to a theoretical distribution, a researcher can estimate how much spread would be expected to occur in a distribution of sample means....thus estimating the actual population mean within certain limits (± 1 standard deviation on the sampling distribution) – the standard deviation of this sampling distribution is called “the standard error of the mean.” (Don’t worry if this makes no sense, yet. We’ll deal with it a lot more later and in class.)

An example might help: Suppose a researcher wished to determine the level of knowledge regarding gerontology among nurses in Missouri and had a valid and reliable measure of this knowledge. To determine the level of knowledge among Missouri nurses, he/she would select a sample of Missouri nurses and administer the instrument to those nurses. After tabulating all of the scores, he/she can calculate the sample mean. The question is: “Is this the actual mean score which would have been achieved if all nurses in Missouri had taken the test?” The simple answer is: “Probably not.” The statistical answer must be significantly more precise and relies upon the comparison of the actual distribution of scores to the hypothetical normal distribution of the population. What statistical procedures can estimate, based upon the variance of the sample data and the size of the sample, is the most likely variance of the entire population. This will allow for a calculated estimate of the “standard deviation of the sampling distribution.” This calculation takes care of the “probably” part of the “probably not” answer given above. It will tell the researcher how inaccurate his/her sample mean “probably” is as an estimate of the population mean. The researcher can then report the mean score of the population in terms of the likely range in which the population mean actually falls. S/he would probably report something like: “The mean score of Missouri Nurses on the XYZ Test of Gerontological Knowledge is 78 ± 3 (3 being the standard deviation of the sampling distribution).” Do you get it? If not, read on. It might clear itself up.

2. The testing of hypotheses: In a similar fashion, the properties of the normal distribution assist the researcher in determining the likelihood that differences between two groups, for example, are large enough to make it unlikely that the differences could be explained by chance variation. This is known as hypothesis testing.

What happens statistically in the comparison of group means is this: First, individuals in different groups (treatment and control, for example) take a test or something similar and a score is calculated for each individual. Then an average score (usually the mean) is calculated for each sample. What the researcher wants to know is: “Are these two means from the same population?” Or, stated another way, “What is the likelihood that two samples from the same population would achieve mean scores that were this different?” So, next the researcher, using statistical techniques, will calculate a grand mean of all the scores for both groups combined and, assuming normal distribution, will see where the two mean scores fall. If it is demonstrably unlikely that two scores would be as different by chance in the hypothetical (normal) distribution, the researcher asserts that the scores are “really” different. If it is likely that the magnitude of the difference could occur by chance, the researcher is forced to conclude that “no significant difference is evident.”
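Both purposes can be sketched in Python under some loudly stated assumptions. All scores below are invented. The standard error is estimated as the sample standard deviation divided by the square root of the sample size. For the group comparison, the sketch swaps in a simple large-sample z statistic for the difference between two means; this is one common approximation chosen for illustration, not necessarily the procedure that will be used in class.

```python
import math
import statistics
from statistics import NormalDist

def standard_error(sample):
    """Estimated standard deviation of the sampling distribution of the mean."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def two_sample_z(a, b):
    """Approximate z statistic and two-sided probability for mean(a) - mean(b).

    Assumes a normal sampling distribution for the difference between means.
    """
    difference = statistics.mean(a) - statistics.mean(b)
    se_difference = math.sqrt(statistics.variance(a) / len(a)
                              + statistics.variance(b) / len(b))
    z = difference / se_difference
    # Probability of a difference at least this large arising by chance.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical test scores for two groups.
treatment = [82, 85, 88, 90, 79, 84, 86, 91]
control = [75, 80, 78, 83, 77, 74, 81, 79]

# Purpose 1: report the sample mean with its standard error.
print(f"{statistics.mean(treatment):.1f} ± {standard_error(treatment):.1f}")

# Purpose 2: how likely is a difference this large by chance alone?
z, p = two_sample_z(treatment, control)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p says the observed difference would rarely arise by chance if both groups came from the same population; a large p leaves the researcher with "no significant difference is evident."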

F. It was mentioned earlier that the characteristics of the normal distribution allow comparisons between or among groups or individuals when raw scores provide little help. For example, if an administrator wished to know which of two groups or individuals was “more competent” based upon standardized test scores, a comparison of raw scores or raw mean scores might not help. Let’s say, for example, that group 1 had a mean score on the Krantz Test of Nursing Skill of 125, while group 2 achieved a mean score of 100 on the Jones Certification of Nursing Skills. Which group is “more competent” or achieved the higher score? There’s no way to know since you’re trying to compare apples (The Krantz Test) and oranges (The Jones Inventory). Assuming normal distribution of the variable, however, they can be compared using percentile scores or standard scores, the meat of the normal distribution. It might clear this up a little if I told you that the average score on the Krantz test (the average of all individuals who have ever taken it) is 150 and the average score for the Jones Inventory is 80. Knowing that, you should be able to see that the group that took the Krantz Test scored below the mean while the group that took the Jones Inventory scored above the mean...in other words, even though they had a higher raw score, the Krantz Test group did worse than the Jones Inventory group. Can you follow that? Read on.

II. Percentile Scores: A dimension of the normal distribution

A. To get a handle on percentile scores, let’s go back to the concept of the frequency distribution. This is simply a way of organizing a table of all the scores in a sample.

B. A frequency distribution presents, typically, three pieces of information (data points) about the sample results:

1. The score value itself (usually designated as “x” or “y”).

2. The number of individuals who achieved each score (usually designated as f(x) or f(y) -- which means “frequency of x” or “frequency of y”).

3. Cumulative Frequency -- a running total of the number of persons, counting from the bottom up, that have received scores at that level or below (usually designated as “cum f(x)” or “cum f(y)”).

C. The cumulative frequency is the data point most closely associated with percentile. Percentile, as a matter of fact, could be legitimately defined as “the cumulative frequency of a given value expressed as a percent.” Hmmmm! May not be clear yet. Let’s try an example.

D. Pretend, for example, that the following represents the 25 scores for a class in statistics on the final exam:

             (Number of Persons         (Cumulative
(Score)      Achieving Each Score)      Frequency)

   x                 f(x)                cum f(x)

  78                   1                     1
  79                   2                     3
  80                   3                     6
  81                   6                    12
  82                   4                    16
  83                   3                    19
  84                   3                    22
  85                   2                    24
  86                   1                    25

1. Okay. There are a number of things that could be said about this distribution (including the fact that the class did not do very well on the exam). As you can probably see, the first column is scores on the exam. The second column represents the number of students who got each score. The third column, the cumulative frequency, keeps a running count of the number of individuals who achieved scores from the bottom to the top. There were, for example, 6 students who achieved scores of 80 or below. Another way of stating this is that the “Cumulative Frequency associated with the score of 80 is 6.” Now for the kicker – What percent of students achieved a score of 80 or below? The answer is 24% (6 divided by 25 – 6 out of 25 students achieved a score of 80 or less). A score of 80, therefore, falls at the 24th percentile.

2. You can similarly calculate the percentile for each score. The score, 85, for example, falls at the 96th percentile—24 out of 25 students fell at or below this level. If a student, therefore, received an 85, he or she scored as well as or better than 96% of the class. Do you get it? If not, try a few more till you get the mechanics down.

3. Percentile scores provide you with significantly more practical information about individual or group scores than raw scores. Percentiles, for example, will permit you to rather easily compare an individual to a group or one group to another using different instruments which measure the same thing. For example, the earlier example of the Krantz and Jones Tests of nursing competence could benefit from knowledge of percentile scores. For example, if it is known that a score of 125 on the Krantz Test represents the 40th percentile and that a score of 100 on the Jones Inventory represents the 85th percentile, you can now more easily, and more precisely, compare the two groups. Assuming the tests are valid measures of skill, the group scoring 100 on the Jones Inventory demonstrated more competence than the group scoring 125 on the Krantz test. Do you see that?
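The mechanics of turning a cumulative frequency into a percentile can be checked with a minimal Python sketch built on the 25 final-exam scores above. The function name is illustrative.

```python
from itertools import accumulate

# x and f(x) copied from the final-exam table above.
scores = [78, 79, 80, 81, 82, 83, 84, 85, 86]
freqs = [1, 2, 3, 6, 4, 3, 3, 2, 1]

cum_freqs = list(accumulate(freqs))   # cum f(x), counted from the bottom up
n = cum_freqs[-1]                     # 25 students in all

def percentile_of(score):
    """Cumulative frequency of the score expressed as a percent of the class."""
    return 100 * cum_freqs[scores.index(score)] / n

print(percentile_of(80))   # 24.0 -> the 24th percentile
print(percentile_of(85))   # 96.0 -> the 96th percentile
```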

III. Standard Scores: The Basis of the Consistency of the Normal Distribution

A. It has been stated that, regardless of the measure under consideration and regardless of the number of subjects in the distribution, a normal distribution is always shaped exactly the same way. This means there is an absolutely predictable relationship between the distance along the bottom axis and the height of the curve at any given point. This is possible because of percentiles, which are the proportion of the sample which achieve at a given level or below. For ease of communication and mathematical manipulation of data, the concepts surrounding percentiles can be expressed in terms of standard scores, which are usually designated as “z” scores. We will go into the concept of standard deviation in great depth later, but for now, store away the definition of a standard (z) score as a given score expressed in standard deviation units.

B. As an introduction, let me clarify what I mean by deviation. When we say “deviate” we mean “differ from.” Okay, differ from what? In this instance, we’re talking about deviation from the mean. Pretend that on a midterm exam, the average for the class was 75. Each person’s score differs or deviates from that mean score to a greater or lesser degree. If Joe got an 80, he “deviated” from the average by +5. If David scored 70, he “deviated” from the average by –5. If Doug scored 75, he “deviated” from the average by 0. Conceptually, the “standard deviation” is the typical size of all the deviation scores, ignoring their signs (the exact calculation, which we will get to later, squares the deviations before averaging them). If we say the standard deviation on the midterm was 5, we are communicating that, roughly speaking, individuals varied (deviated) from the mean by about 5 points. With that in mind, standard deviation is an expression of the predictable ways in which scores vary in a normal distribution. Variance, in this context, refers to the pattern of distance any given score or set of scores is from the mean of the distribution. A predictable percentage of scores fall between the mean and any given point along the distribution. A picture would probably help. The diagram that follows presents a normal distribution (again, with apologies) with the mean, standard deviations and the associated percentiles for each standard deviation noted. The mean, incidentally, is noted with a z score of 0 since the mean is 0 standard deviations away from the mean. Understand?

__________________________________________

                      (Mean)

z       -3    -2    -1     0    +1    +2    +3

%ile    .1     2    16    50    84    98  99.9   (rounded)

C. What this picture says is something like:

1. The mean, the center of the normal distribution, also falls at the 50th percentile (%ile, in my shorthand). This means that 50% of the scores in the distribution fall at or below the mean. Since the distribution is symmetrical, 50% of the scores also fall above the mean.

2. Sixteen percent of the scores fall at or below the point designated as z = -1. This means that 16% of the scores fall at one standard deviation (sd) below the mean or lower. Again, it should be logical to you that 84% (100% - 16%) of the scores fall above the z score of –1. See that?

3. The point marked z = +1 falls at the 84th percentile. This means that 84% of the scores fall at or below a z score of +1. Once again, 16 percent of the scores fall above a z score of +1. And so on. Do you get it?
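If you would like to check the rounded percentiles in the picture, here is a minimal sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The function name `z_to_percentile` is my own label for the illustration, not something from your textbook or Table B.

```python
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def z_to_percentile(z):
    """Percent of scores falling at or below a given z score."""
    return 100 * standard_normal.cdf(z)

# Print the percentile at each whole-number z score from -3 to +3.
for z in range(-3, 4):
    print(f"z = {z:+d}   percentile = {z_to_percentile(z):.1f}")
```

Rounded to whole numbers, the values at z = -1, 0, +1 and +2 come out to 16, 50, 84 and 98, matching the picture.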

D. When a researcher has calculated the standard deviation for a given distribution, a specific score in the distribution can be expressed in terms of that standard deviation – as a z score.

1. For example, if a given score is expressed as z = +1.2, that means the score fell at 1.2 standard deviations above the mean. If another score is reported to be z = -.75, that means that the score fell at .75 standard deviations below the mean. Finally, if a score is reported as z = 0, that means that the score fell exactly on the mean, or at 0 standard deviations above or below the mean. Okay?

2. Any raw score can be converted to a z score if you know the mean and the standard deviation for that distribution. In a distribution, for example, in which the mean is 100 and the standard deviation is 10, a raw score of 110 can be expressed as z = +1 (it is one standard deviation above the mean: 100 + 10 = 110). Similarly, a z score of -.5 corresponds to a raw score of 95 (.5, or one-half, standard deviation below the mean: 100 – 5 = 95).

3. There is a formula for conversion of raw scores to z scores. There is also a derived formula for conversion of z scores to raw scores. Both rely upon knowing the mean and the standard deviation for the distribution.

The formula for conversion of raw score (x) to standard score (z) is:

z = (x – X̄) / sd

Where X̄ = mean, sd = standard deviation and x = raw score.

An example: In a distribution where the mean is 100 and the standard deviation is 20, calculate the standard score (z) for raw score = 135.

z = (135 – 100) / 20

z = 35 / 20

z = 1.75

In this distribution, a raw score of 135 falls 1.75 standard deviations above the mean or has a z value of 1.75.
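The raw-to-z formula is simple enough to express as a one-line function. This is a small sketch; the name `raw_to_z` and its parameter names are illustrative choices of mine.

```python
def raw_to_z(x, mean, sd):
    """Convert a raw score to a standard (z) score: z = (x - mean) / sd."""
    return (x - mean) / sd

# The worked example: mean 100, standard deviation 20, raw score 135.
print(raw_to_z(135, mean=100, sd=20))  # 1.75

# The earlier example: mean 100, standard deviation 10, raw score 110.
print(raw_to_z(110, mean=100, sd=10))  # 1.0
```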

The reverse formula, the one for deriving raw scores from standard scores, is:

x = z(sd) + X̄

An example: In a distribution in which the mean is 100 and the standard deviation is 20, what is the raw score equivalent of a standard score of –1.5?

x = -1.5(20) + 100

x = -30 + 100

x = 70

In this distribution, a z score value of –1.5 (a score falling 1.5 standard deviations below the mean), has a raw score equivalent of 70. Get it?
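The reverse conversion can be sketched the same way. Again, the function name `z_to_raw` is only an illustrative label. Working the arithmetic by machine is a useful check on hand calculation: 1.5 standard deviations of 20 points each is 30 points, so the raw score sits 30 points below the mean.

```python
def z_to_raw(z, mean, sd):
    """Convert a standard (z) score back to a raw score: x = z * sd + mean."""
    return z * sd + mean

# Mean 100, standard deviation 20, z = -1.5.
print(z_to_raw(-1.5, mean=100, sd=20))  # 70.0

# Mean 100, standard deviation 10, z = -0.5.
print(z_to_raw(-0.5, mean=100, sd=10))  # 95.0
```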

IV. Interpreting Standard Scores as Percentiles

A. Okay....combining the last two sections, I hope that I have convinced you that standard deviations divide the normal distribution up into predictable proportions of the total pot. The distance from any given standard score to the bottom of the distribution can be expressed as a percentile. From a few pages back, you should recall that the point designated as –1.0 (1 standard deviation below the mean) represents the 16th percentile (16% of the distribution falls at, or below, the z score of –1.0).

B. Any spot on the distribution can be identified in a similar fashion as a percentile, but labeling points between whole-number z scores requires the use of Table B, included with your syllabus materials. It is labeled “The Standard Normal Distribution.” This table, as the picture at the top of the table tries to indicate, tells you the percentage of scores that fall between the mean and any given distance, expressed as z scores, from the mean.

C. The z scores themselves are indicated on the margins of the table (both ways). The top to bottom (left side) margin provides z scores to 1 decimal place (e.g., .1, .2, 1.3, etc.). The left to right (top of the page) margin adds the second decimal place (e.g., .01, .02, etc.). Using these two margins allows you to locate the percentage of scores between the mean and any given z score value, correct to two decimal places. For example: If you go down the left margin to the row marked 1.0 and then proceed across the page to the column headed “.03,” you have found the spot which indicates the percent of scores that fall between the mean and 1.03 standard deviations from the mean. The number you should find at this intersection is .3485. This means that 34.85% of the scores fall between the mean and a z score of 1.03.

D. Knowing that the normal distribution is symmetrical and that the mean divides the distribution exactly in half (50% of the scores are above and 50% of scores are below this line), allows you to use this table to label any z score as a percentile.

E. Let’s say, for example, that you wished to convert a z score of +1.6 to a percentile. This means you want to know what percent of the scores in a distribution fall at, or below, a z score of +1.6.

1. Well, you already know that the mean is the 50th percentile and that 50% of the scores fall below that. All you need to know to find the percentile equivalent of +1.6 is to find out how many scores fall between the mean and +1.6. You can add this percent to 50 and come up with the percentile.

2. So, go to the table and proceed down the left margin to 1.6. In that row, for z = 1.60, you find the number .4452. This means that 44.52% of the scores fall between the mean and a z score of 1.6.

3. Now, if 50% fall between the mean and the bottom of the distribution and an additional 44.52% fall between the mean and the z score of 1.6, then you know that 94.52% fall between the z score of +1.6 and the bottom of the distribution. The percentile, therefore, is 94.52.

How about a picture:

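[Figure: normal curve, z axis from -3 to +3 with the mean at 0. The area below the mean is 50%, the area between the mean and z = +1.6 is 44.52%, and the total area at or below z = +1.6 is 94.52%.]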

4. The only trick now is to remember whether to add the value in the body of Table B to 50% or to subtract it from 50%. The answer can either be conceived of visually and logically, or mathematically.

a. When you want to know the percentile equivalent of a z score which is above the mean, you know that the percent between the mean and the z score (taken from Table B) must be added to 50% (can you see that from the diagram above?). Conversely, for a z score below the mean, you know that you must subtract the identified percentage from 50%.

b. If you can’t see that visually, don’t panic. Just remember that when the z score is positive, add the identified percentage to 50. When the z score is negative, subtract. Okay?
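The add-or-subtract rule can be sketched in Python as a check on your table work. This is an illustration under stated assumptions: `NormalDist` from the standard `statistics` module stands in for Table B, and the function names `area_mean_to_z` and `percentile_from_z` are hypothetical labels of mine.

```python
from statistics import NormalDist

def area_mean_to_z(z):
    """Proportion of scores between the mean and z (the body of Table B)."""
    return abs(NormalDist().cdf(z) - 0.5)

def percentile_from_z(z):
    """Add the table area to 50% for positive z, subtract it for negative z."""
    area = area_mean_to_z(z)
    if z >= 0:
        return 100 * (0.5 + area)
    return 100 * (0.5 - area)

print(round(area_mean_to_z(1.03), 4))     # 0.3485, the table entry for 1.03
print(round(area_mean_to_z(1.6), 4))      # 0.4452, the table entry for 1.60
print(round(percentile_from_z(1.6), 2))   # 94.52
print(round(percentile_from_z(-1.0), 2))  # 15.87, about the 16th percentile
```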

F. Table B can also “be reversed” to help you convert percentile scores to z scores (standard scores).

1. Remember that the numbers in the body of the table (the ones with four decimal places, e.g., .4652) tell you the percent of scores which fall between the mean and any given value of z.

2. If, for example, you wish to identify the z score for the 60th percentile, what you really want to do is to identify the z score value at which, or below, 60% of the scores fall. Again, you know that 50% of the scores fall at, or below, the mean, so you want to find the z score which adds 10% more to the 50% below the mean. It also tells you, since you know this point will be above the mean, that the z score will be positive.

3. So, you go to Table B to find the z score which encloses 10% of scores between it and the mean. Looking in the body of the table, look for the four digit number which comes closest to .1000 (10.00%). The closest value is .0987, which occurs at the intersection of .2 (row) and .05 (column). This means that 9.87% (which can be rounded to 10%) of the scores fall between the mean and a z score of .25 (.2 + .05). The 60th percentile, then, is at z = +.25.

4. Again a picture might help:

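[Figure: normal curve, z axis from -3 to +3 with the mean at 0. The area below the mean is 50% and the area between the mean and z = +.25 is 10%, so 60% of scores fall at or below z = +.25.]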

5. One more example to illustrate what you do for percentages and percentiles below the mean. Suppose, for example, you wish to identify the z score equivalent of the 30th percentile. You wish to find that point at, or below which, 30% of the scores fall. You know that 50% fall at, or below, the mean. So the 30th percentile will be below the mean (and the z score will be negative). But how far below the mean? Twenty percent below. You know that you want the cut-off for 30% and you know where the 50% cut-off is. So you find the z score which is 20 percentage points from the mean—look in the table for the number closest to .2000. The closest value is .1985 which is at the intersection of .5 (row) and .02 (column). So you know that 19.85% of the scores fall between the mean and .52 standard deviations. Since you know you will be below the mean, you know the z score is negative (-.52):

[Figure: normal curve, z axis from -3 to +3 with the mean at 0. 30% of scores fall at or below z = -.52, and 20% fall between z = -.52 and the mean.]
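Reading Table B "in reverse" corresponds to the inverse of the cumulative normal function. Here is a minimal sketch, assuming Python 3.8 or later; `percentile_to_z` is an illustrative name of mine. Because the software does not round to the nearest table entry, the unrounded values differ very slightly from the table answers, but they agree to two decimal places.

```python
from statistics import NormalDist

def percentile_to_z(percentile):
    """z score at or below which the given percent of scores fall."""
    return NormalDist().inv_cdf(percentile / 100)

print(round(percentile_to_z(60), 2))  # 0.25
print(round(percentile_to_z(30), 2))  # -0.52
print(round(percentile_to_z(50), 2))  # 0.0, the mean
```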

V. Converting raw scores, percentile scores and standard scores

A. Okay! If any of the gobbledygook above makes any sense to you, you should be able to see that any point on the distribution can be expressed in at least three ways: 1) raw score; 2) z score; and 3) percentile score.

B. For any given set of sample scores, then, one specific score can be expressed in several ways. Take an easy one. For a distribution in which the mean is 50 and the standard deviation is 10, the mean score can be expressed in all three ways: 1) As a raw score = 50; 2) as a z score = 0; and 3) as a percentile score = 50th.

C. Here is what that distribution looks like with all of the values included:

______________________________________

Raw Scores     20     30     40     50     60     70     80

z Scores       -3     -2     -1      0     +1     +2     +3

Percentiles   .1%     2%    16%    50%    84%    98%  99.9%

D. Any score, anywhere along the bottom axis, can be expressed in three ways. Statistically, it is important that you be able to convert scores back and forth with some ease. This will get you used to comparing numbers from different samples with each other. Let’s do some examples from this distribution:

1. Convert a raw score of 55 to a z score and a percentile score:

a. For converting from raw to z score, you can use the formula provided earlier:

z = (x – X̄) / sd

z = (55 – 50) / 10

z = 5 / 10

z = +.5

b. To convert the z score to a percentile, you use Table B.

First, find the z score of .5 on the left margin of the table.

The value indicated for .5 is .1915. This means that 19.15% of the scores fall between the mean and a z score of .5.

Since the z score is positive (+.5), you know that you need to add the 19.15% to the 50% which fall below the mean. The percentile equivalent, then, is 69.15 (50 + 19.15).

2. You can begin with any of the three values and calculate the other two. For example, for a z score (in this distribution) of 1.35, calculate the raw score and then the percentile score:

a. First, convert the z score to a raw score using the formula:

x = z(sd) + X̄

x = 1.35(10) + 50

x = 13.5 + 50

x = 63.5

b. Next, use the table to identify the percentile. Finding 1.3 in the left margin and going across to the .05 column (1.3 + .05 = 1.35), the decimal value you find is .4115, so 41.15% of the scores fall between the mean and 1.35 standard deviations. Since the z score is positive, you add this percentage to the 50% that fall below the mean.

Percentile = 50% + 41.15%

Percentile = 91.15

3. Finally, convert a percentile of 27 to a raw score and a z score:

a. First, convert the percentile to a z score. You do this using Table B. Remember that you wish to identify that point at, or below which, 27% of the scores fall. Since you know that 50% fall at, or below, the mean, you want to find that point which encloses 23% (50% - 27%) of the scores. Look, then, for the four digit value closest to .2300. This number (.2291) is found at the intersection of .6 (row) and .01 (column). This means that 22.91% of the scores fall between the mean and .61 standard deviations. Since you’re below the mean, the z score will be negative: z = -.61.

b. Now you can convert the z score to a raw score using the formula:

x = z(sd) + X̄

x = -.61(10) + 50

x = -6.1 + 50

x = 43.9
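All three worked conversions for this distribution (mean 50, standard deviation 10) can be checked with a short Python sketch. The helper names below are illustrative, and `NormalDist` from the standard `statistics` module (Python 3.8 or later) is standing in for Table B. Since the software does not round z to two decimal places the way the table does, an unrounded answer can differ slightly from the hand-worked one; rounded as shown, the results agree.

```python
from statistics import NormalDist

# The example distribution from this section: mean 50, standard deviation 10.
dist = NormalDist(mu=50, sigma=10)

def raw_to_z(x):
    return (x - dist.mean) / dist.stdev

def z_to_raw(z):
    return z * dist.stdev + dist.mean

def raw_to_percentile(x):
    return 100 * dist.cdf(x)

def percentile_to_raw(p):
    return dist.inv_cdf(p / 100)

# Example 1: start from a raw score of 55.
print(raw_to_z(55), round(raw_to_percentile(55), 2))

# Example 2: start from a z score of 1.35.
raw = z_to_raw(1.35)
print(round(raw, 1), round(raw_to_percentile(raw), 2))

# Example 3: start from the 27th percentile.
raw = percentile_to_raw(27)
print(round(raw, 1), round(raw_to_z(raw), 2))
```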

E. If these are still fuzzy, practice a few and I think they will come more easily.

VI. Area Under the Curve or “Area Under the Ordinate”

A. While the identification of “corresponding scores” (percentiles, z scores, raw scores) is important, they are “tool skills.” In dealing with sample data, and in comparing one sample to another using things like t-tests (more later), it is usually more valuable to measure distances from one spot on the distribution to another rather than simply measuring proportional distance from the bottom to any raw or z score value.

B. Table B will allow you to do this with only a slight extension of your visual imagination.

C. Since this table tells you what percent of scores fall between the mean and any given z score, it can also help you determine the percent of scores that fall between any two z scores. Remember that the mean is a z score of 0, so this is not really a new skill.

D. Let’s take an example. Suppose I ask you to tell me the percentage of the distribution that falls between +1.0 standard deviations and –1.0 standard deviations. To answer this question, you need to find what percentage falls between the mean and +1.0 and the percentage that falls between the mean and –1.0 and add these together. Table B identifies the percentage associated with a z score of 1.0 as .3413, so 34.13% of the scores fall between the mean and 1.0 standard deviations. So, 34.13% fall between the mean and +1.0 and 34.13% fall between the mean and –1.0 standard deviations. Adding them together, you can respond that 68.26% of the scores in any normal distribution fall between –1.0 and +1.0 standard deviations. A picture of this follows.

[Figure: normal curve with z scores of -1.0, 0 and +1.0 marked. 34.13% of scores fall between -1.0 and the mean and 34.13% fall between the mean and +1.0, for a total of 68.26%.]

E. The process is precisely the same, but the logic looks more complicated when both z scores are on the same side of the mean. Suppose, for example, that I ask you what percent of scores fall between +1.0 and +2.0 standard deviations. The first step is to identify the percent of scores that fall between the mean and the larger of the two z scores, +2.0. Going to Table B, you should be able to see that 47.72% of scores fall in this area. Then get the same information for the other z score. 34.13% of scores fall between the mean and +1.0 standard deviations. The answer to the question, then, is the difference between the two (47.72 – 34.13 = 13.59). So, 13.59% of scores fall between the z scores +1.0 and +2.0. A last picture:

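[Figure: normal curve with the mean, z = +1.0 and z = +2.0 marked. 34.13% of scores fall between the mean and +1.0 and 47.72% fall between the mean and +2.0, leaving 13.59% between +1.0 and +2.0.]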
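Both cases, z scores on opposite sides of the mean and z scores on the same side, reduce to one subtraction when you work from cumulative proportions. Here is a minimal sketch, with the hypothetical function name `percent_between` and `NormalDist` standing in for Table B. Note that the software gives about 68.27% for the range from –1.0 to +1.0, while adding the two rounded table values gives 68.26%; the small gap is only rounding in the table.

```python
from statistics import NormalDist

def percent_between(z_low, z_high):
    """Percent of scores falling between two z scores (z_low < z_high)."""
    standard_normal = NormalDist()
    return 100 * (standard_normal.cdf(z_high) - standard_normal.cdf(z_low))

print(round(percent_between(-1.0, 1.0), 2))  # 68.27
print(round(percent_between(1.0, 2.0), 2))   # 13.59
```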

F. Again, there is an exact analogue of this when both are on the bottom side of the distribution, below the mean. Draw a picture for yourself and try a few problems like, for example, what percent of scores fall between –2.3 standard deviations (z = –2.3) and –1.3 standard deviations (z = –1.3). If it still isn’t clicking, please come and see me before exam time. Good luck!

Exercises for Block 1: The Normal Distribution

1. For a distribution having the following characteristics:

Mean = 53.5

sd = 6.3

Fill in the blank spaces:

Raw Score      Standard Score      Percentile Score

60             ____________        _____________

_________      -1.3                _____________

_________      ____________        89th

48             ____________        _____________

_________      .67                 _____________

_________      ____________        40th

2. For a distribution having the following characteristics:

Mean = 348

sd = 32

Answer the following:

If this were a diagnostic tool in which the top 5% were considered pathological, what would be the cutoff score?

What percent of subjects fall in the clinically normal range, defined for this tool as between z scores of -1.5 and +1.5?

What percent of subjects fall above a raw score of 360?

If this were a certification tool, and certification was granted to only the top 20%, what is the minimum raw score acceptable for certification?

For a research project, a researcher wishes to exclude both the bottom 20% and the top 20% of subjects. Above and below what scores will potential subjects be excluded?

3. Who scored “better” on a given exam: Fred, who received a standard score of +.28, John, who received a standard score of -.15, or George, who received a percentile equivalent score of the 54th percentile?

4. Two individuals took parallel competency exams in ACLS. Individual A completed an exam with a mean of 100 and a standard deviation of 10. Individual B completed an exam with a mean of 150 and a standard deviation of 15. Their scores were:

Individual A: 126 Individual B: 166

If the hospital’s requirement for certification is that individuals score at, or above, the 80th percentile, which individual(s) successfully completed the exam?
