Chapter 9 Notes

The purpose of evaluation is to determine the merit or worth of an evaluand. That is, we want to know whether a program had the intended effect on its participants as specified by the program's theory and model. Our ability to faithfully and confidently determine the effects of a program is in part determined by the manner in which we design the evaluation. There are many ways to think about designs within the context of evaluation, and designing an evaluation is a complex endeavor. Moreover, it is important to note that different designs can be used for different types of applications. Regardless of how we conceptualize and frame the relationship between the purposes and methods of an evaluation process, there are two major questions that must be explicitly addressed:

1. To what extent are the effects we observe in participants really due to the program and not some other reason?

2. To what extent can the results observed in participants be expected to generalize (extend to) other situations?

Both of these questions pertain more formally to the concept of validity. And there are two specific forms of validity that as evaluators we must be concerned with:

• Internal validity – Refers to the extent to which a research design includes enough control of the conditions and experiences of participants that it can demonstrate a single unambiguous explanation for a manipulation, that is cause and effect.

To what extent are the effects we observe in participants really due to the program and not some other reason?

When we have adequately attended to issues involving internal validity within the evaluation process, it means that the evaluator has controlled the effects of variables other than the treatment, in order to say with confidence that the results reflect the treatment. Hence, we can confidently say that the observed effects are caused by the program and nothing else.

• External validity – Extent to which observations made in a study generalize beyond the specific manipulations or constraints in the study

To what extent can the results observed in participants be expected to generalize (extend to) other situations?

When we have adequately attended to issues involving external validity, it means that the evaluator has ensured that the participants of the program are representative of the population, and therefore that if the treatment is applied with another group of people from that population under similar circumstances, it should be effective there as well.

WHAT FACTORS DIMINISH OR THREATEN VALIDITY OF EVALUATIONS?

We can classify threats to the validity of our conclusions in terms of internal and external threats.

Internal Validity Threats

Threat: Description

History: Events occurring during a study (other than the program treatment) that can influence results.

Maturation: Naturally occurring physical or psychological changes in program participants (e.g., growth, development, aging) that can influence results.

Testing: Administering a test before and after the program might influence scores on the test independent of the program (e.g., familiarity with the test results in changes in scores).

Instrumentation: Having a pretest and posttest that differ in content, structure, format, or difficulty can lead to differences in scores that are due not to the program treatment but to differences in the instruments used.

Statistical regression: Having extreme groups in a program may artificially decrease or increase scores independent of the program treatment. For example, if all members of a group are already scoring at the highest levels and their scores can't go any higher, any observed decline in scores may be due to the test (measurement error) rather than the program treatment.

Differential selection: Differences between the groups compared (treatment vs. no-treatment groups) on important characteristics may account for observed differences, but these are not due to the program treatment.

Experimental mortality: Differential dropout of participants in the treatment and no-treatment groups yields differences in observed effects that are not a function of the program treatment but rather an artifact of attrition within the groups.

Treatment diffusion: Proximity among participants in the treatment and no-treatment groups leads to treatment exposure for the no-treatment group.

Compensatory rivalry: The no-treatment group outperforms the treatment group, but those differences are due not to the treatment effects but to competition (the John Henry effect).

Compensatory equalization of treatments: If one group receives something and the other receives nothing, then any effects on the first group may be due to the fact that this group received something, and not to the specifics of what it received.

Resentful demoralization: When members of the no-treatment group realize they did not get something that the treatment group received, they may become demoralized because they are being excluded, not because they did not get the specific treatment.
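The statistical regression threat listed above can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not part of the course material: the score scale, the noise level, and the function name `simulate_regression_to_mean` are all hypothetical. No treatment is applied at all, yet a group selected for extreme pretest scores still shows a lower average on the posttest, simply because part of each extreme score was measurement error.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, top_fraction=0.1, seed=42):
    """Simulate pretest and posttest scores with NO treatment applied.

    All numeric settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each person has a stable "true" score; every test adds independent noise.
    true_scores = [rng.gauss(50, 10) for _ in range(n)]
    pretest = [t + rng.gauss(0, 10) for t in true_scores]
    posttest = [t + rng.gauss(0, 10) for t in true_scores]

    # Form an extreme group: the highest scorers on the pretest.
    cutoff = sorted(pretest)[int(n * (1 - top_fraction))]
    extreme = [i for i in range(n) if pretest[i] >= cutoff]

    pre_mean = statistics.mean(pretest[i] for i in extreme)
    post_mean = statistics.mean(posttest[i] for i in extreme)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"extreme group pretest mean:  {pre_mean:.1f}")
print(f"extreme group posttest mean: {post_mean:.1f}")
```

An evaluator who compared only these two means for an extreme group might mistake the drop for a program effect, which is why a comparison group matters when participants are selected for extreme scores.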

External Validity Threats

Threat: Description

Selection Treatment Interaction: Refers to the possibility that the program results may be applicable only to the population from which the treatment and no-treatment groups were chosen; hence results may be internally valid but not generalizable.

Testing Treatment Interaction: Refers to the fact that the program results may be generalizable to other groups only when a pretest is also given.

Situation effects / Experimenter effects: Refers to the existence of multiple factors associated with the program itself. Results may be due to a particularly charismatic instructor rather than the content of the program.

Multiple treatment effects: Participants are involved in multiple programs at the time of the evaluation; hence the findings may not be generalizable to other settings because of the confounding of multiple treatments.

Population validity: Extent to which results observed in a study will generalize to the population from which a sample was selected. Homogeneous attrition: Rates of attrition are about the same in

Ecological validity: Extent to which results observed in a study will generalize across settings or environments.

Temporal validity: Extent to which results observed in a study will generalize across time and at different points in time.

Outcome validity: Extent to which results observed in a study will generalize across different but related dependent variables (DVs).

HOW DO WE MITIGATE AGAINST THREATS TO INTERNAL AND EXTERNAL VALIDITY?

An evaluator can try to mitigate these potential threats by selecting an evaluation design that, through the manner in which it is executed, reduces the influence of the particular threat. There are many ways to characterize evaluation designs. Mertens and Wilson (2012) distinguish between quantitative and qualitative data, but we can also classify designs as experimental, quasi-experimental, and non-experimental. I will use this latter classification to highlight how the various designs attempt to address the validity threats we just discussed.

Experimental research designs use methods and procedures to make observations in which the researcher fully controls the conditions and experiences of participants by applying three required elements of control: randomization, manipulation, and comparison/control.

• Randomization: involves randomly selecting participants into the study so that all individuals have an equal chance of being selected; it also involves randomly assigning participants to the experimental conditions.

• Manipulation: involves the systematic application of an experimental treatment.

• Control: involves controlling who gets or does not get a particular treatment and ensuring that all other aspects of the experimental process are the same except for who gets or does not get a particular treatment.
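The two steps of randomization described above (random selection into the study, then random assignment to conditions) can be sketched in a few lines. This is an illustrative sketch only; the function name `randomly_select_and_assign`, the group labels, and the made-up population are assumptions, not something taken from the course text.

```python
import random


def randomly_select_and_assign(population, sample_size, seed=None):
    """Randomly select a sample, then randomly assign it to two groups."""
    rng = random.Random(seed)
    # Random selection: every individual has the same chance of entering the study.
    sample = rng.sample(population, sample_size)
    # Random assignment: shuffle the sample and split it in half, so chance
    # alone decides who ends up in the treatment group.
    rng.shuffle(sample)
    half = len(sample) // 2
    return {"treatment": sample[:half], "control": sample[half:]}


# Hypothetical population of 100 potential participants.
population = [f"person_{i}" for i in range(100)]
groups = randomly_select_and_assign(population, sample_size=20, seed=1)
print(len(groups["treatment"]), len(groups["control"]))
```

Because group membership depends only on chance, preexisting differences between participants (the differential selection threat) should balance out across groups on average.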

Experimental research designs are the only research designs capable of establishing cause-effect relationships. To demonstrate that one factor causes changes in a dependent variable, the conditions and experiences of participants must be under the full control of the researcher. This often means that an experiment is conducted in a laboratory and not in an environment where a behavior may occur naturally.

Strength: Capable of demonstrating cause and effect.

Limitation: Behavior that occurs under controlled conditions may not be the same as behavior that occurs in a natural environment.

We can categorize experimental research designs into one of 4 possible types of designs.

Box 9.4 in Mertens & Wilson (p. 316) provides an alternative way to conceptualize designs. You will note that (R) designates randomization for all of those designs, (O) indicates an observation, and (X) denotes a treatment. There are 5 different experimental designs which we can use to evaluate the impact of a program. Each one affords particular advantages that, if relevant to the validity concerns and purpose of the evaluation, enable you to more faithfully assess the program.

Whether you are able to employ these designs for evaluation depends on whether or not you can randomize, manipulate, and control. To the extent that you can randomize (randomly select and randomly assign participants to a treatment and a no-treatment group), manipulate (determine which group receives a treatment and which group does not), and control (control for extraneous factors that may influence participants but do not involve the treatment itself, e.g., the effects of lighting and temperature on performance), you are able to use one of the experimental designs described (see pp. 316-319). Besides practical concerns, you have to think about ethical concerns with regard to the potential risks and benefits of randomizing, manipulating, and controlling the treatment and the participants: how ethical is it to withhold a potential treatment for cancer from a terminally ill patient?

If you cannot randomize, manipulate, or control within your evaluation design, then the alternative is to employ a quasi-experimental design. To be an experimental design, a design must include the following three elements of control:

1. Randomization

2. Manipulation

3. Comparison/control group

A quasi-experiment is similar to an experiment, except that the design does one or both of the following:

• Includes a quasi-independent variable. Quasi-independent variable: A preexisting variable that is often a characteristic inherent to an individual, which differentiates the groups or conditions being compared in a research study (e.g., gender (man, woman), health status (lean, overweight, obese)).

• Lacks an appropriate or equivalent control group.

Strength: Allows researchers to study factors related to the unique characteristics of participants.

Limitation: Cannot demonstrate cause and effect.

Again, there are many ways to classify the various types of quasi-experimental designs. What is most important is to choose the design that matches the purposes of the evaluation and addresses the validity threats that may influence it. Mertens and Wilson describe the relevant issues with regard to quasi-experimental designs in Box 9.5 (pp. 320-325).

WHAT ABOUT OTHER DESIGNS THAT DO NOT CONFORM TO THE EXPERIMENT AND QUASI-EXPERIMENT CLASSIFICATION?

The last category of designs involves what I refer to as non-experimental designs, or what Mertens and Wilson (2012) classify as qualitative designs. These designs do not share any of the characteristics that are required for experimentation (e.g., randomization, manipulation, control). These designs use methods and procedures to make observations in which the behavior or event is observed “as is,” without an intervention from the researcher.

Strength: Can be used to make observations in settings where the behaviors and events being observed naturally operate (e.g., interactions between an athlete and coach during a game).

Limitation: Lacks the control needed to demonstrate cause and effect.

Correlational Designs

• Measurement of two or more factors to determine or estimate the extent to which the values for the factors are related or change in an identifiable pattern

• Correlation coefficient: Statistic used to measure the strength and direction of the linear relationship, or correlation, between two factors

• The value of r can range from -1.0 to +1.0

Naturalistic Observation

The observation of behavior in the natural setting where it is expected to occur, with limited or no attempt to overtly manipulate the conditions of the environment where the observations are made (e.g., buying behavior in a grocery store, parenting behavior in a residential home). Generally associated with high external validity, but low internal validity.
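As a worked illustration of the correlation coefficient introduced under Correlational Designs, the sketch below computes Pearson's r directly from its definition (the covariation of two factors divided by the square root of the product of their sums of squares). The function name `pearson_r` and the small data sets are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient for two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations: how the two factors vary together.
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations: how much each factor varies on its own.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a factor has no variability")
    return sp_xy / math.sqrt(ss_x * ss_y)


# Hypothetical data: hours in a program and scores on an outcome measure.
hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))   # perfect positive relationship
print(pearson_r(hours, [10, 8, 6, 4, 2]))   # perfect negative relationship
print(pearson_r(hours, [2, 1, 4, 3, 5]))    # strong but imperfect relationship
```

The sign of r gives the direction of the relationship and its absolute value gives the strength. Even a value near +1.0 or -1.0 does not demonstrate cause and effect, because the design lacks randomization, manipulation, and control.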

Qualitative Designs

• Use of scientific method to make nonnumeric observations, from which conclusions are drawn without the use of statistical analysis

• Adopts the assumption of determinism; however, it does not assume that behavior itself is universal

• Determinism: Assumption in science that all actions in the universe have a cause

• Based on the holistic view, or “complete picture,” that reality changes and behavior is dynamic

Phenomenology (Individual)

• Analysis of the conscious experiences of phenomena from the first-person point of view

• The researcher interviews a participant then constructs a narrative to summarize the experiences described in the interview

• Conscious experience is any experience that a person has lived through or performed and can bring to memory

• The researchers must be considerate of the intentionality or meaning of a participant’s conscious experiences

• Identify objects of awareness, which are those things that bring an experience to consciousness

Ethnography (Group)

• Analysis of the behavior and identity of a group or culture as it is described and characterized by the members of that group or culture

• A culture is a “shared way of life” that includes patterns of interaction, shared beliefs and understandings, adaptations to the environments, and many more factors

• To observe a group or culture, it is often necessary to get close up to or participate in that group or culture

• To gain entry into a group or culture without causing participants to react or change their behavior:

  • Researchers can covertly enter a group

  • Researchers can announce or request entry into a group

• Participant observation: Researchers participate in or join the group or culture they are observing

• Researchers need to remain neutral in how they interact with members of the group

• Common pitfalls associated with participant observation:

  • The “eager speaker” bias

  • The “good citizen” bias

  • The “stereotype” bias

Case Study

Analysis of an individual, group, organization, or event used to illustrate a phenomenon, explore new hypotheses, or compare the observations of many cases.

Case history: An in-depth description of the history and background of the individual, group, or organization observed. A case history can be the only information provided in a case study for situations in which the researcher does not include a manipulation, treatment, or intervention.

Illustrative: Investigates rare or unknown cases.

Exploratory: Preliminary analysis that explores potentially important hypotheses.

Case studies have two common applications:

1. General inquiry

2. Theory development

The level of control in a research design is directly related to internal validity, or the extent to which the research design can demonstrate cause and effect. Experimental research designs have the greatest control and therefore the highest internal validity. Nonexperimental research designs typically have the least control and therefore the lowest internal validity.

Internal validity – Extent to which a research design includes enough control of the conditions and experiences of participants that it can demonstrate a single unambiguous explanation for a manipulation, that is, cause and effect.

External validity – Extent to which observations made in a study generalize beyond the specific manipulations or constraints in the study.

Constraint: Any aspect of the research design that can limit observations to the specific conditions or manipulations in a study.

See also Mertens & Wilson (2012), Box 9.8.