Creating a Single-System (Subject) Design Study

EDUCATION AND TREATMENT OF CHILDREN Vol. 31, No. 4, 2008

A Preliminary Examination to Identify the Presence of Quality Indicators in Single-subject Research

Melody Tankersley

Kent State University

Bryan G. Cook and Lysandra Cook

University of Hawaii

Abstract

Scholars in the field of special education put forth a series of papers that proposed quality indicators for specific research designs that must be present for a study to be considered of high quality, as well as standards for evaluating a body of research to determine whether a practice is evidence-based. The purpose of this article was to pilot test the quality indicators proposed for single-subject research studies in order to identify points that may need clarification or revision. To do this, we examined the extent to which the proposed quality indicators were present in two single-subject studies, both examining the effects of teacher praise on specific behaviors of school-age children. Our application of the quality indicators indicated that neither study met the minimal acceptable criteria for single-subject research. We discuss the use of the quality indicators in relation to their clarity and applicability and suggest points for deliberation as the field moves forward in establishing evidence-based practices.

Advocating that educators base practice on research--in other words, that evidence-based practices be the primary means of instruction utilized in classrooms--first requires that specific practices have been identified as evidence-based. Although not all educators would agree (e.g., Gallagher, 2006), we assert that scientific research is the most reliable means for determining an educational practice to be effective or evidence-based (e.g., Kauffman & Sasso, 2006; Landrum & Tankersley, 2004). But just how should research findings be synthesized to determine the effectiveness of a practice?

By yielding an overall effect size across studies examined, meta-analyses (Glass, 1976; Kavale, 2001) have become popular for synthesizing research findings and their results have advanced our understanding of effective educational practices. However, no established approach currently exists for identifying the quality of studies that are synthesized (Cooper & Hedges, 1994), which may allow a poorly designed and executed study to influence the overall effect size--thereby potentially misidentifying an ineffective practice as effective, or vice versa. Moreover, no firm guidelines clearly establish the minimum number of studies needed to produce reliable meta-analytic results (Cooper & Hedges). Adding to this, no agreed-upon process exists for determining effect sizes (the metric required to conduct a meta-analysis) for single-subject research, although several methods have been proposed and debated (e.g., R² as discussed by Allison & Gorman, 1993; percentage of non-overlapping data as discussed by Scruggs & Mastropieri, 2001).

Correspondence should be addressed to: Melody Tankersley, 405 White Hall, Kent State University, Kent, Ohio 44242; e-mail: [email protected]; telephone: 330.672-0605.

Pages 523-548

TANKERSLEY, COOK, and COOK

To address such matters surrounding the identification of evidence-based practices, researchers in other fields developed and implemented frameworks for determining effective practices. For example, the Division 12 Task Force of the American Psychological Association (Chambless et al., 1998), the National Association of School Psychologists (Kratochwill & Stoiber, 2002), and the What Works Clearinghouse (WWC, established in 2002 by the U.S. Department of Education; http://www.whatworks.ed.gov/) have established guidelines for evidence-based practices in clinical psychology, school psychology, and general education, respectively. Although utilizing an existing framework, such as that developed for general education, for determining evidence-based practices for students with disabilities would be efficient, the WWC did not originally consider single-subject research in determining whether a practice is evidence-based. The WWC has recently added single-case designs as a special type of quasi-experimental design that, in the absence of severe design or implementation problems, can be categorized as meeting evidence standards with reservations (the same level at which randomized control trials with severe design or implementation problems are categorized). As of February 2008, the WWC had not yet disseminated standards for evaluating single-case research (WWC Evidence Standards for Reviewing Studies, 2006). Given the low incidence of many disabilities and the individualized nature of special education, special educators have frequently applied single-subject research to examine the effectiveness of a wide variety of practices (e.g., Lloyd, Tankersley, & Talbott, 1994; Odom & Strain, 2002; Tawney & Gast, 1984). As well-designed single-subject studies exhibit functional control, it seems that single-subject research should play a prominent role in determining evidence-based practices in special education.

QUALITY INDICATORS IN SINGLE-SUBJECT RESEARCH

The Council for Exceptional Children's Division for Research (CEC-DR) sponsored a series of papers published in a special issue of Exceptional Children that proposed quality indicators for experimental, single-subject, correlational, and qualitative research that must be present for a study to be considered of high quality in special education (Graham, 2005). Horner, Carr, Halle, McGee, Odom, and Wolery (2005) recommended that high quality single-subject research studies meet a number of minimally acceptable methodological criteria in seven areas: description of participants and settings, the dependent variable, the independent variable, baseline, experimental control/internal validity, external validity, and social validity. Moreover, they proposed standards for evaluating a body of single-subject research to determine if a practice is evidence based: (a) at least five studies that meet minimally acceptable methodological criteria, document experimental control, and have been published in peer-reviewed journals; (b) the studies must be conducted by at least three different researchers across at least three different geographical locations; and (c) the studies cumulatively include a total of at least 20 participants.
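The three standards for a body of research, (a) through (c), can be read as a simple decision rule. The following is a minimal sketch of that rule, not a procedure given in the source: the `Study` record, its field names, and the function name are hypothetical, and the sketch assumes that standards (b) and (c) are counted only over the studies that satisfy standard (a).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    # All field names are hypothetical labels chosen for this illustration.
    research_team: str          # researcher or team that conducted the study
    location: str               # geographical location of the study
    participants: int           # number of participants in the study
    meets_criteria: bool        # meets the minimally acceptable methodological criteria
    experimental_control: bool  # documents experimental control
    peer_reviewed: bool         # published in a peer-reviewed journal


def is_evidence_based(studies):
    """Apply standards (a)-(c) to a body of single-subject studies.

    Assumption: researchers, locations, and participants are counted
    only across the studies that qualify under standard (a).
    """
    qualifying = [
        s for s in studies
        if s.meets_criteria and s.experimental_control and s.peer_reviewed
    ]
    enough_studies = len(qualifying) >= 5                                # (a)
    enough_teams = len({s.research_team for s in qualifying}) >= 3       # (b)
    enough_locations = len({s.location for s in qualifying}) >= 3        # (b)
    enough_participants = sum(s.participants for s in qualifying) >= 20  # (c)
    return (enough_studies and enough_teams
            and enough_locations and enough_participants)
```

Under this reading, a single study that fails the methodological criteria simply drops out of the count, which is why the evaluation of individual studies comes before any judgment about the practice itself.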

The minimally acceptable methodological criteria provide a framework for evaluating individual single-subject research studies to determine whether the results can be used for establishing evidence-based practices. Although such an evaluative role might not have been the intent of Horner et al. (2005), and using the criteria in an evaluative sense may be an unfair use of them, the overall goal of the special issue in which these criteria appeared was to "establish a set of quality indicators that were clearly stated, understandable, and readily available for use as guides for identifying high-quality research in special education" (Odom et al., 2005, p. 142). As such, several papers have already been published that use the methodological criteria and their specified quality indicators set forth in the special issue of Exceptional Children to evaluate the methodological merit of research studies (e.g., Browder, Wakeman, Spooner, Ahlgrim-Delzell, & Algozzine, 2006), making this exercise meaningful in terms of establishing how useful they are as an evaluative tool.

Establishing guidelines for high-quality research and standards for evidence-based practice, such as those proposed by Horner et al. (2005), is an endeavor that has the potential to focus the efforts of special education and improve the outcomes of students with disabilities (see Lloyd, Pullen, Tankersley, & Lloyd, 2006). To build upon this foundation, it appears to us that the next step in this process of establishing evidence-based practices in special education is to "pilot test" the proposed quality indicators by applying them to actual studies. If meaningful difficulties are encountered during pilot testing, the quality indicators can then be revised in order to optimize their reliability and validity. In the following sections, we therefore provide an overview of the quality indicators proposed for single-subject research, accompanied by a description of our thought process as we assessed the presence of the quality indicators in the reports of two research studies.

Quality Indicators for Single-subject Research

Horner et al. (2005) specified seven broad methodological features, or quality indicators, that must be present and adequately addressed for a study "to be a credible example of single-subject research" (p. 173): description of participants and setting, dependent variable, independent variable, baseline, experimental control/internal validity, external validity, and social validity. Horner et al. enumerated specific criteria required to achieve each of these quality indicators (see Figure 1). In this paper, we examine the degree to which the quality indicators and their criteria, as proposed by Horner et al., are present in two single-subject studies. For our pilot test, we chose single-subject studies that assessed the effectiveness of the same independent variable, teacher attention, with a similar group of participants (students with or at-risk for emotional and behavioral disorders) but were conducted many years apart. Because the quality indicators were intended to examine the research base regarding the extent to which interventions cause meaningful change in dependent variables, we thought they should be applicable to the entire library of studies available for a particular intervention--studies conducted in the distant past as well as those that are more recent. We chose only two research articles instead of the entire body of literature related to the independent variable so that we could explore in depth the process of applying the quality indicators and describe the experience in detail; therefore, we do not make a determination of the effectiveness of the practice, but instead, only describe our application of the quality indicators.

In the first study, Hall, Lund, and Jackson (1968) investigated the effects of contingent teacher attention on the study behavior of six students who were nominated for participation by their teachers for disruptive and dawdling behaviors. This study appeared as the first research article in the Journal of Applied Behavior Analysis and has subsequently been cited 284 times according to the Web of Science (accessed January, 2008). The results of the ABAB withdrawal designs indicated that students increased their study behavior when the teachers provided attention contingent upon students being engaged in appropriate behavior. In the second study, Sutherland, Wehby, and Copeland (2000) investigated the effects of a teacher's behavior-specific praise on the on-task behavior of nine students identified with emotional and behavioral disorders. During each intervention session, the teacher set a goal of delivering six behavior-specific praise statements and observers provided him feedback about his use of praise following each session. Results suggest that the teacher's increased use of behavior-specific praise caused an increase in students' on-task behavior. Although Sutherland et al. also discussed teacher behavior in their study (e.g., number of behavior-specific praise statements), we only focus on the analysis of the student behavior for this article. The Sutherland et al. study has been cited 13 times according to the Web of Science (accessed January, 2008) and represents a more recent application of teacher praise than the Hall et al. study.

Describing Participants and Settings
1. Participants described with sufficient detail to allow others to select individuals with similar characteristics (e.g., age, gender, disability, diagnosis).
2. The process for selecting participants is described with replicable precision.
3. Critical features of the physical setting are described with sufficient precision to allow replication.

Dependent Variable
1. Dependent variables are described with operational precision.
2. Each dependent variable is measured with a procedure that generates a quantifiable index.
3. Measurement of the dependent variable is valid and described with replicable precision.
4. Dependent variables are measured repeatedly over time.
5. Data are collected on the reliability of interobserver agreement associated with each dependent variable, and IOA levels meet minimal standards (e.g., IOA = 80%, Kappa = 60%).

Independent Variable
1. Independent variable is described with replicable precision.
2. IV is systematically manipulated and under the control of the experimenter.
3. Overt measurement of the fidelity of implementation for the independent variable is highly desirable.

Baseline
1. The majority of single-subject research studies will include a baseline phase that provides repeated measurement of a dependent variable and establishes a pattern of responding that can be used to predict the pattern of future performance, if introduction or manipulation of the independent variable did not occur.
2. Baseline conditions are described with replicable precision.

Experimental Control/Internal Validity
1. The design provides at least three demonstrations of experimental effect at three different points in time.
2. The design controls for common threats to internal validity (e.g., permits elimination of rival hypothesis).
3. The results document a pattern that demonstrates experimental control.

External Validity
1. Experimental effects are replicated across participants, settings, or materials to establish external validity.

Social Validity
1. The dependent variable is socially important.
2. The magnitude of change in the DV resulting from the intervention is socially important.
3. Implementation of the IV is practical and cost-effective.
4. Social validity is enhanced by implementation of the IV over extended time periods, by typical intervention agents, in typical physical and social contexts.

Figure 1. Quality Indicators and Criteria for Determining Whether a Study Meets the Acceptable Methodological Rigor Needed to be a Credible Example of Single-Subject Research as Proposed by Horner et al. (2005)

In our pilot test the first author reviewed each study in relation to the 21 components of the 7 quality indicators put forth by Horner et al. (2005), using a binary scale (yes, the quality indicator is present or no, the quality indicator is not present) and described her justification for her score by making notes of specific evidence offered in the research studies or used to explain the quality indicator. We used a binary scale because Horner et al. only stated what information should be in evidence in quality studies. The first and second authors then reviewed each quality indicator in relation to its presence in both studies and confirmed a rating together, discussing points of disagreement and gaining consensus on a final score. Using this final score as the rating standard (see Table 1), the third author independently rated the presence of each component of the quality indicators for each study. We calculated inter-rater reliability by dividing agreements by the total number of quality indicators and multiplying by 100. Perhaps it should be noted that each of the authors has some experience with single-subject research methods--as researchers (e.g., Mancina, Tankersley, Kamps, Kravits, & Parrett, 2000), but more extensively as translators of research through written outlets (e.g., Cook, Rumrill, Webb, & Tankersley, 2001; Tankersley, Harjusola-Webb, & Landrum, in press; Tankersley, McGoey, Dalton, Rumrill, & Balan, 2006) and educators in graduate programs that incorporate single-subject research methods extensively into their curricula.

The independent rater's decisions agreed with the collaborative decisions of the first two authors on 16 of the 21 criteria for the Hall et al. (1968) study and 13 of the 21 criteria for the Sutherland et al. (2000) study--resulting in a total inter-rater agreement rate of .69 (an asterisk beside an item in Table 1 indicates disagreement between the collaborative rating of the first two authors and the independent rating of the third author). In the following sections, we examine whether the quality indicators are present in the studies and describe our decision-making process for evaluating each in relation to the criteria put forth by Horner et al. (2005).

Table 1. Summary of Quality Indicators Met by Hall et al. (1968) and Sutherland et al. (2000)

Quality Indicator / Component                   Hall et al. (1968)    Sutherland et al. (2000)
Describing participants and settings            No                    No
  Participant description                       no                    no
  Participant selection                         no                    no
  Setting description                           no                    yes*
Dependent variable (DV)                         No                    No
  DV description                                no                    yes
  Measurement of DV quantifiable                yes                   yes
  Measurement of DV valid and well-described    no                    yes
  DV measured repeatedly                        no                    no*
  Interobserver agreement                       yes                   yes*
Independent variable (IV)                       No                    No
  IV description                                no                    yes*
  IV systematically manipulated                 yes*                  yes*
  Fidelity of implementation                    no                    no
Baseline                                        No                    No
  Repeated measurement, established pattern     no*                   no
  Baseline description                          no                    no
Experimental control/Internal validity          No                    No
  Three demonstrations of experimental effect   yes                   yes
  Internal validity                             yes*                  yes*
  Pattern of results                            no                    no
External validity                               Yes                   Yes
Social validity                                 No                    No
  DV is socially important                      yes                   yes
  Change in DV is socially important            no*                   no*
  IV is practical and cost effective            no*                   no*
  Use in typical contexts                       yes                   yes

* indicates disagreement between the collaborative rating of the first two authors (reported as yes/no in the table) and the independent rating of the third author
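The agreement figures reported above follow directly from the asterisks in Table 1. As a minimal sketch of that arithmetic, the snippet below encodes the 21 component ratings for each study as they appear in Table 1 (an asterisk marks a disagreement with the third author) and counts agreements. The variable and function names are illustrative assumptions; the proportion is simply agreements divided by the total number of component ratings, and multiplying it by 100 gives the percentage.

```python
# Component ratings transcribed from Table 1, grouped by quality indicator.
# A trailing "*" marks disagreement between the consensus rating of the
# first two authors and the independent rating of the third author.
HALL = ("no no no | no yes no no yes | no yes* no | no* no | "
        "yes yes* no | yes | yes no* no* yes")
SUTHERLAND = ("no no yes* | yes yes yes no* yes* | yes* yes* no | no no | "
              "yes yes* no | yes | yes no* no* yes")


def count_agreements(ratings):
    """Return (agreements, total) for one study's component ratings."""
    components = ratings.replace("|", " ").split()
    agreements = sum(1 for r in components if not r.endswith("*"))
    return agreements, len(components)


hall_agree, hall_total = count_agreements(HALL)              # (16, 21)
suth_agree, suth_total = count_agreements(SUTHERLAND)        # (13, 21)
overall = (hall_agree + suth_agree) / (hall_total + suth_total)

print(f"Hall et al.: {hall_agree}/{hall_total}")             # 16/21
print(f"Sutherland et al.: {suth_agree}/{suth_total}")       # 13/21
print(f"Overall agreement: {overall:.2f}")                   # 0.69
```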

Description of Participants and Setting

The first of Horner et al.'s (2005) seven quality indicators focuses on description of the participants and setting of the study. Horner et al. (p. 166) stated that, "single-subject research requires operational definitions of the participants, settings, and the process by which participants were selected." According to Horner et al., the aim of these criteria is to allow other researchers to replicate the selection of persons with similar characteristics and to demonstrate the effects of the intervention in a similar setting.

Participants. Hall et al. (1968) noted that teachers (who were recommended by their principals) nominated students who were disruptive or dawdled. Hall et al. described a variety of general types of nonstudying behavior that participating students displayed (e.g., snapping rubber bands), but little else. In the abstract (not in the text description of participants), the authors indicated that one student was a first-grader and the other five were third-graders, but did not consistently provide other demographic data for the participants. Furthermore, Hall et al. did not describe the specific process teachers used to nominate students for participation nor did they provide a definition of disruptive and dawdling behaviors that teachers applied in making those nominations.

Sutherland et al. (2000) provided a more detailed description of their participants and the setting in which the study occurred. Specifically, the authors provided students' ages, genders, and races. Moreover, participants were identified as having emotional and behavioral disorders (EBD). However, Sutherland et al. did not describe the process for selecting these participants, the specific characteristics associated with EBD that these students presented (e.g., aggression, withdrawal, noncompliance), or the processes for determining they had EBD. As Horner et al. (2005) stated, "global descriptions such as identifying participants as having developmental disabilities would be insufficient" (p. 167).

Setting. The primary description of the settings provided by Hall et al. (1968) was not sufficient for purposes of replication. Hall et al. stated only that, "the studies were carried out in classrooms of two elementary schools located in the most economically deprived area of Kansas City, Kansas" (p. 1). In the only detail provided about the classrooms, the authors noted the number of students in the classes for five of the six participants. Sutherland et al. (2000), however, provided more information and described the classroom in which the intervention occurred as "a fifth-grade self-contained classroom for students with EBD" (p. 3) and presented the type of school and its geographic location, along with the arrangement of the classroom. This description provides enough detail to select a similar setting and therefore satisfies this criterion.

Given the limited information about participants and settings, as well as the selection of participants, it appears to us that the Hall et al. study does not meet the description of participants and settings quality indicator. Even though they provided a detailed description of the setting and the participants, Sutherland et al. also fail to meet the description of participants and settings quality indicator because they did not describe the participants or the process for selecting the participants with replicable precision.

Dependent Variable

Horner et al. (2005) proposed that for a single-subject study to meet the dependent variable quality indicator, the dependent variables must be: operationally defined, measured quantifiably, measured using a technique that is valid and described with replicable precision, measured repeatedly, and assessed for consistency in recording (meeting minimal interobserver agreement ratings standards of 80% or kappa = 60%).

Operationally defined. The dependent variable for all of the participants in the study reported by Hall et al. (1968) was study behavior (in addition, teacher attention was analyzed and reported for one student and disruptive behavior for another student). Hall et al. generally defined study behavior as "orientation toward the appropriate object or person: assigned course materials, lecturing teacher, or reciting classmates, as well as participation by the student when requested by the teacher" (p. 2). Although study behavior is defined clearly in the global sense, Hall et al. further stated that each child had his or her own specific definition of study behavior, yet did not provide those specific definitions. Given this omission, we conclude that the Hall et al. study did not meet the criterion for operationally defining the dependent variable. Sutherland et al. (2000) investigated change in students' on-task behavior, which they defined similarly to Hall et al. (1968) as "orientation by the target student(s) toward the appropriate object or person. This behavior included following directions given by the teacher, paying attention to the speaker (peer or adult), and working on assigned tasks" (p. 4). We determined that Sutherland et al. provided an adequate operational definition of the target behavior and meet this criterion.

Measured quantifiably, using a technique that is valid and described with replicable precision. In relation to measurement of the dependent variable, Hall et al. (1968) used a 10-sec interval recording technique to measure study behavior, a procedure that does result in a quantifiable index and seems to be a valid method for measuring study behavior (Horner et al., 2005, provided no guidelines for determining validity of measurement). Although the authors described where observers sat in the room and observers' behavior during the observations, they did not indicate which type of interval recording system was used (e.g., whole, partial, momentary time sampling) or describe how behavior was recorded so that readers can deduce which type of system was used. Therefore, Hall et al. do not meet the criterion for defining the measurement technique with replicable precision. Sutherland et al. (2000) used a 1-min interval, momentary time sampling procedure to observe the behavior of students in quadrants of the classroom. This measurement technique is valid for measuring on-task behavior, generated a quantifiable index, and was reported with sufficient precision to allow for replication.
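To illustrate why the type of interval recording system matters for replication, the following minimal sketch scores one and the same behavior record under the three systems named above. Everything in it is an illustrative assumption rather than a procedure reported in either study: the function name, the made-up second-by-second record, and the choice to sample the final second of each interval for momentary time sampling.

```python
def score_intervals(record, interval_len, system):
    """Percentage of intervals scored as an occurrence of the behavior.

    record: list of booleans, one per second (True = behavior observed).
    interval_len: interval length in seconds.
    system: "whole", "partial", or "momentary".
    """
    intervals = [
        record[i:i + interval_len]
        for i in range(0, len(record) - interval_len + 1, interval_len)
    ]
    if not intervals:
        raise ValueError("record is shorter than one interval")
    if system == "whole":        # behavior must fill the entire interval
        scored = [all(iv) for iv in intervals]
    elif system == "partial":    # behavior at any moment in the interval
        scored = [any(iv) for iv in intervals]
    elif system == "momentary":  # behavior at the final moment (assumed)
        scored = [iv[-1] for iv in intervals]
    else:
        raise ValueError(f"unknown recording system: {system}")
    return 100 * sum(scored) / len(scored)


# Hypothetical 30-second record split into three 10-sec intervals:
# studying throughout, studying for the first half, studying in the last second.
record = [True] * 10 + [True] * 5 + [False] * 5 + [False] * 9 + [True]

for system in ("whole", "partial", "momentary"):
    print(system, f"{score_intervals(record, 10, system):.1f}")
# whole 33.3, partial 100.0, momentary 66.7
```

Because the same behavior yields three different percentages, a report that names only "10-sec interval recording" leaves the quantifiable index underdetermined.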

Measured repeatedly. In terms of repeated measurement, Hall et al. (1968) measured the dependent variable of studying between 26 and 51 times per student over the course of the study, and each phase of the reversal designs included between 3 and 20 data points (mean = 8.7, median = 7). Although Horner et al. (2005) did not specify a minimal number of data points required for each condition, they stated that "sufficient assessment occasions are needed to establish the overall pattern of performance under that condition (e.g., level, trend, variability)" (p. 167). For evaluating the baseline and comparison conditions, however, Horner et al. stated that, "documentation of a predictable pattern during baseline typically requires multiple data points (five or more, although fewer data points are acceptable in specific cases)" (p. 168). It seems to us that patterns of performance, especially trend, cannot be calculated adequately with only three data points, and because Hall et al. provided no rationale for including so few measures of the dependent variable in some conditions, we conclude that Hall et al. do not meet the criterion for repeated measurement of the dependent variable. Similarly, Sutherland et al. (2000) measured on-task behavior 25 times over the course of their reversal design, with 3 to 10 data points per condition. Because the reversal phase with three data points does not provide sufficient opportunity to document a pattern and because no rationale is provided for including so few data, we also conclude that Sutherland et al. do not meet the criterion for repeated measurement of the dependent variable.

Assessed for consistency in recording. In relation to consistency in measurement, Hall et al. (1968) reported "periodic checks [were] made during each condition" (p. 4) for each student, with interobserver agreement ranging from "consistently over 80%" (p. 10) to 96%, using interval-by-interval calculation procedures. Although the percentage of agreement is strong, knowing how many times the researchers collected agreement data would provide further support for this criterion. Yet Horner et al. (2005) did not specify how many observation checks should be conducted. Instead, they stated that "reporting interobserver agreement only for the baseline condition or only as one score across all measures in a study would not be appropriate" (p. 167). Hall et al. clearly stated that they assessed agreement during each condition and provided a range across conditions per student; therefore, we conclude that they meet the criterion for assessing consistency in recording. Sutherland et al. reported interobserver agreement data on the occurrence and nonoccurrence of on-task behavior for 32% of the observation sessions across all phases of their study, with mean agreement of 82.5% (range 60%-100%). Although specific points fall below Horner et al.'s minimum standard of 80% for interobserver agreement, the average is above the standard and we therefore deem Sutherland et al. as also meeting this criterion.
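For readers unfamiliar with the two agreement indices mentioned in the criterion, the following minimal sketch computes interval-by-interval agreement and Cohen's kappa for two observers. The function names and the ten-interval records are hypothetical and are not data from either study; the kappa formula is the standard two-rater, two-category form.

```python
def interval_agreement(obs1, obs2):
    """Interval-by-interval agreement: percentage of intervals scored alike."""
    if len(obs1) != len(obs2) or not obs1:
        raise ValueError("records must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(obs1, obs2))
    return 100 * agreements / len(obs1)


def cohens_kappa(obs1, obs2):
    """Cohen's kappa for two observers scoring occurrence/nonoccurrence."""
    n = len(obs1)
    p_observed = interval_agreement(obs1, obs2) / 100
    p1 = sum(obs1) / n  # proportion of intervals observer 1 scored as occurrence
    p2 = sum(obs2) / n  # proportion of intervals observer 2 scored as occurrence
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical records for ten intervals (True = on-task).
observer_1 = [True, True, True, True, False, False, True, True, False, True]
observer_2 = [True, True, True, False, False, False, True, True, True, True]

print(f"{interval_agreement(observer_1, observer_2):.1f}")  # 80.0
print(f"{cohens_kappa(observer_1, observer_2):.2f}")        # 0.52
```

In this made-up example the two observers reach 80% interval-by-interval agreement while kappa is only about .52, because kappa discounts the agreement expected by chance. The two example thresholds in the criterion are therefore not interchangeable.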

Considering all of the criteria, we conclude that neither study met the dependent variable quality indicator. Hall et al. (1968) did not meet the criteria associated with defining and measuring the dependent variable and neither Hall et al. nor Sutherland et al. (2000) measured the dependent variable repeatedly.

Independent Variable

The third quality indicator proposed by Horner et al. (2005) involves the independent variable. The three criteria assessed in relation to the independent variable are (a) describing the independent variable with replicable precision; (b) actively, rather than passively, manipulating the independent variable; and (c) documenting the fidelity of independent variable implementation.

Description. Hall et al. (1968) described their intervention in the following way:

The observer held up a small square of colored paper in a manner not likely to be noticed by the pupil whenever the pupil was engaged in study. Upon this signal, the teacher attended to the child, moved to his desk, made some verbal comment, gave him a pat on the shoulder, or the like. During weekly after-school sessions, experimenters and teachers discussed the rate of study achieved by the pupil and the effectiveness of attention provided by the teacher, and made occasional adjustments in instructions as required. (p. 2)

Although the description explains what generally occurred during intervention, as we thought of implementing the intervention based on this description, we questioned when in the process of the student studying the observer would hold up the colored paper. For example, must the student be oriented toward the appropriate person or object for a period of time, say 5 sec, before the paper was held up? Were disruption and dawdling (the two behaviors that were the basis of student nomination) ignored? What type of attention did the teacher provide to the students? As Horner et al. (2005, p. 167) noted, general descriptions of interventions "that are prone to high variability in implementation," do not meet this criterion.

The intervention in the Sutherland et al. (2000) study first involved the observer meeting with the teacher and describing the benefits of behavior-specific praise on student on-task behavior, providing examples of praise statements used during the baseline observations, and discussing the teacher's rate of praise during baseline. The observer and teacher then set a goal of six behavior-specific praise statements during observation sessions for the intervention phases. Next, also prior to data collection for the intervention stage, the observer met with the teacher to remind him of the goal and provide an example of a behavior-specific praise statement. Then after observation, the observer presented the teacher with feedback on his (the teacher's) rate of praise and reviewed examples of his praise statements provided in class. We found the description of the intervention to be clear and replicable.

Systematically manipulated. The independent variables in both the Hall et al. (1968) and the Sutherland et al. (2000) studies were actively implemented, meaning they were systematically introduced and withdrawn in relation to the design and were under the control of the experimenters. The authors introduced interventions after assessing outcomes during baseline conditions, withdrew interventions, and reinstated them after the second baseline condition--thereby satisfying this criterion.

Fidelity of implementation. In relation to measuring implementation fidelity, Horner et al. (2005) stated that this element "is expected either through continuous direct measurement of the independent variable, or an equivalent" (p. 168). Hall et al. (1968) alluded to giving teachers feedback on the effectiveness of their attention to students, which might be a form of intervention fidelity. However, they did not describe how or how often they assessed it, so this criterion was not met by Hall et al. Likewise, Sutherland et al. (2000) did not meet this criterion as they provided no documentation of how often observers met with the teacher prior to intervention to remind him of the goal, provide an example of behavior-specific praise, or provide feedback on his use of behavior-specific praise.

Hall et al. (1968) met one of the three criteria (i.e., the independent variable is actively manipulated) related to the independent variable quality indicator, whereas Sutherland et al. (2000) met two (i.e., they actively manipulated the independent variable and provided specific descriptions of procedures and actions in relation to the independent variable). However, neither study provided sufficient evidence of the fidelity of independent variable implementation and, as such, neither meets this quality indicator.
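The decision rule applied in this review, under which a quality indicator counts as met only when every one of its criteria is met, can be sketched briefly. The Python sketch below is illustrative only: the criterion labels are shorthand assumed for this example, and the True/False values simply restate the judgments reported above for the independent variable indicator.

```python
# Criterion-level judgments for the independent variable indicator,
# restated from the review text. The labels are shorthand assumed for
# this sketch, not wording taken from the quality indicators themselves.
independent_variable = {
    "Hall et al. (1968)": {
        "description": False,
        "systematically manipulated": True,
        "fidelity of implementation": False,
    },
    "Sutherland et al. (2000)": {
        "description": True,
        "systematically manipulated": True,
        "fidelity of implementation": False,
    },
}


def criteria_met(criteria):
    """Count how many criteria were judged as met."""
    return sum(criteria.values())


def indicator_met(criteria):
    """An indicator is met only if all of its criteria are met."""
    return all(criteria.values())


for study, criteria in independent_variable.items():
    print(
        f"{study}: {criteria_met(criteria)} of {len(criteria)} criteria met; "
        f"indicator met: {indicator_met(criteria)}"
    )
```

Because the rule is conjunctive, a single unmet criterion (here, fidelity of implementation) is enough for a study to fail the indicator, regardless of how many other criteria it satisfies.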

Baseline

Most single-subject studies include a baseline, or comparison, condition to which the performance of the subject(s) during intervention conditions is compared. Horner et al. (2005) stated that baseline conditions should establish a sufficiently consistent pattern of responding that can be used to predict future performance if the independent variable were not introduced. That is, a baseline must include multiple data points that either show no substantive trend or show a trend in the direction opposite to that expected during intervention. Moreover, Horner et al. stated that the baseline condition should be described with replicable precision.

Repeated measurement/established pattern of responding. The initial baseline conditions for all participants in the Hall et al. (1968) study included multiple data points (7-15), and the initial baseline condition for Sutherland et al. (2000) included 10 data points. In their description of this criterion, Horner et al. (2005) specified that "documentation of a predictable pattern during baseline typically requires multiple data points (five or more, although fewer data points are acceptable in specific cases) without substantive trend, or with a trend in the direction opposite that predicted by intervention" (p. 168). Both Hall et al. and Sutherland et al. used reversal designs, which require that baseline conditions be reinstated during the course of the study. As such, we also evaluated the number of data points during the reversal conditions and found that both studies used fewer than the specified five, without providing a justification or rationale. Hall et al. incorporated only three or four data points during the reversal phase for four of their participants, and Sutherland et al. included only three.
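One way a reviewer might apply the five-point guideline quoted above is sketched below in Python. This is a minimal sketch under stated assumptions, not a procedure taken from either study or from the quality indicators: the trend is estimated with an ordinary least-squares slope, the `max_slope` cutoff for a "substantive" trend is a hypothetical value (no operational definition is given in the source), and the data series are invented for illustration.

```python
def ols_slope(values):
    """Least-squares slope of values against session number (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def check_phase(values, expected_direction=1, min_points=5, max_slope=1.0):
    """Check a baseline or reversal phase against the quoted guideline.

    expected_direction: +1 if the intervention should raise the behavior,
    -1 if it should lower it. max_slope is a hypothetical cutoff.
    """
    slope = ols_slope(values)
    enough_points = len(values) >= min_points
    # A trend opposite to the predicted intervention effect is acceptable;
    # only a trend in the predicted direction counts against the phase.
    trend_ok = slope * expected_direction <= max_slope
    return {
        "n": len(values),
        "slope": slope,
        "enough_points": enough_points,
        "trend_ok": trend_ok,
        "meets": enough_points and trend_ok,
    }


# Invented example data (percentage of intervals on task), for illustration only.
initial_baseline = [40, 42, 38, 41, 39, 40, 41]
reversal = [45, 43, 44]

print(check_phase(initial_baseline))
print(check_phase(reversal))
```

In this sketch, a three-point reversal phase fails on the count alone, which mirrors the judgment made for the reversal conditions in both studies. A real application would also need a rule for the "specific cases" in which fewer points are acceptable.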

Description. Hall et al. (1968) described their baseline conditions as 30-min observations, scheduled at a time of the day when the students were engaged in seatwork, occurring 2 to 4 times per week for a minimum of two weeks. Although the description of the baseline condition provides the necessary information for data collection (e.g., length of observation sessions), it does not meet Horner et al.'s (2005) criterion of replicable precision, as it provided no discussion of what happened in the classroom during the baseline condition. Sutherland et al. (2000) described baseline conditions as follows:

During the baseline phase, no changes in the teacher behavior were made. The sessions typically consisted of teacher-led social skills lessons during which the students were encouraged to actively participate through discussion and role play. (p. 4)

Sutherland et al. also provided information about the social skills lessons that occurred during all phases of the study, noting topics and instructional methods used, and that a structured curriculum was not followed. Although the condition was described, it seems unlikely that another researcher could replicate this baseline in terms of content. Like the Hall et al. (1968) study, Sutherland et al.'s baseline condition seemed to be the teacher's typical practice. "Typical practice" may be a legitimate baseline condition, but when applying Horner et al.'s (2005) standard of describing the baseline with "replicable precision," both Hall et al. and Sutherland et al. lacked sufficient specificity.

Neither Hall et al. (1968) nor Sutherland et al. (2000) satisfied the baseline quality indicator because neither described their baseline conditions with replicable precision nor graphically represented sufficient data points in their reversal conditions.

Experimental Control/Internal Validity

Horner et al.'s (2005) fifth quality indicator, experimental control/internal validity, "is demonstrated when predicted change in the dependent variable covaries with manipulation of the independent variable" (p. 168). They specified that three criteria be met to document experimental control: (a) a minimum of three demonstrations of experimental effect at three different points in time, (b) common threats to internal validity are controlled for (e.g., passage of time, measurement effects, uncontrolled variables), and (c) results document a pattern that illustrates experimental control.

Three demonstrations. Both Hall et al. (1968) and Sutherland et al. (2000) utilized reversal designs for each participant, a design that, by definition, establishes three demonstrations of experimental control: when the intervention is (a) first introduced, (b) withdrawn, and (c) re-introduced. Although simply providing opportunities at different points in time does not mean that experimental control has been demonstrated, we have interpreted this criterion to evaluate only whether three opportunities at three different times are available. Evaluating whether experimental control has been established seemed to us to be the point of the final criterion of this quality indicator. As such, both studies meet the criterion requiring three opportunities over three different times to demonstrate experimental control.

Internal validity. We were not sure how to assess the criterion for controlling common threats to internal validity. For the most part, we conclude that properly executed single-subject designs such as reversal designs account for such threats. Without authors presenting additional information indicating that such threats existed in the study (e.g., changes in the environment or measurement technique occurred; Kazdin, 1982), it appeared to us that implementing an acceptable single-subject research design, as both Hall et al. (1968) and Sutherland et al. (2000) did, satisfies this criterion.

Documented experimental control. Horner et al. (2005) prescribed integrating information about the "level, trend, and variability of performance occurring during baseline and intervention conditions" (p. 171) to determine the extent to which a functional relationship between the dependent and independent variables exists. Horner et al. specified that,

Demonstration of a functional relationship is compromised when (a) there is a long latency between manipulation of the independent variable and change in the dependent variable, (b) mean changes across conditions are small and/or similar to changes within conditions, and (c) trends do not conform to those predicted following introduction or manipulation of the independent variable. (p. 171)

Hall et al. (1968) provided mean levels of the dependent variable for all phases for all participants except one, for whom the mean level for the second intervention phase is described as "... stabilizing at a rate ranging between 70% and 80%" (p. 4). In relation to trend, variability, and latency of performance, other than sporadic use of phrases that included terms related to direction or stability, such as "study rates continued to rise" (p. 4) and "a fluctuating but declining rate of study behavior" (p. 7), Hall et al. did not consistently report on these features of their findings. Similarly, the Sutherland et al. (2000) study reported only mean levels for each phase of their design and did not discuss trends or variability in the data (although these features could be estimated through visual inspection of the graphed data provided in both studies).

Evaluating whether the studies met the criterion that results document a pattern illustrating experimental control for this quality indicator proved difficult for us. On the one hand, we can visually inspect the data and determine that no significant, unpredictable trends in the data were present, that there was no latency between the manipulation of the independent variable and the response of the dependent variable, and that the data were relatively stable for both studies. However, Horner et al. (2005) discussed the "interpretation" of these elements and the "integration" of results as determining whether a functional relationship exists, suggesting that the responsibility of documentation falls to the researchers. Given the importance of the interpretation of the data in establishing experimental control, we conclude that neither Hall et al. (1968) nor Sutherland et al. (2000) meets this criterion because they did not adequately describe how their results demonstrated experimental control in their discussion of study findings.

Although both Hall et al. (1968) and Sutherland et al. (2000) included three opportunities at three different times to demonstrate experimental control and both implemented single-subject research designs that controlled for threats to internal validity, they did not meet the experimental control/internal validity quality indicator because they did not discuss the extent to which their results document a pattern that establishes experimental control.

External Validity

To meet the quality indicator of external validity, Horner et al. (2005) required that multiple participants, settings, materials, or behaviors be included in a study. We interpreted this to mean that utilizing multiple participants, settings, or (rather than and) behaviors satisfies this criterion. Hall et al. included six participants in their study, which included three different interventionists (i.e., teachers) in three different classroom settings. Moreover, they discussed the effects of the intervention on behaviors other than their primary dependent variable, study behavior (e.g., grades, disruptive behaviors, academic achievement), thereby satisfying the quality indicator. Sutherland et al.'s (2000) application of the intervention allowed documentation of results in relation to the one setting, one interventionist, and six students in the classroom. Although teacher use of praise statements (behavior-specific and non-behavior-specific) is reported, the only student outcome reported is on-task behavior. Sutherland et al. met the requirements of external validity by incorporating more than one student in the investigation.

Social Validity

The final quality indicator focuses on the social validity of the goals, procedures, and findings of a study. Horner et al. stated that the criteria for demonstrating social validity are:

• The dependent variable is socially important;
• The magnitude of change in the dependent variable resulting from the intervention is socially important;
• Implementation of the independent variable is practical and cost effective; and
• Social validity is enhanced by implementation of the independent variable over extended time periods, by typical intervention agents, in typical physical and social contexts. (p. 174)

Social importance. Both Hall et al. (1968) and Sutherland et al. (2000) discussed how the dependent variables related to important school behavior. Hall et al. briefly discussed how "disruptive behaviors and low interest in academic achievement [were] conditions generally conceded to make teaching and motivation for study difficult" (p. 11) in their conclusion. Sutherland et al. globally identified off-task behavior as an inappropriate behavior often demonstrated by students with EBD and, as such, targeted on-task behavior as an incompatible behavior to increase students' classroom success. Although the points they made in relation to the social importance of the dependent variables were brief and unsupported by references to empirical evidence, we found that both Hall et al. and Sutherland et al. satisfied this criterion.

Magnitude of change. In addition to targeting a socially important dependent variable, Horner et al. (2005) required that the change in the dependent variable as a result of the intervention be meaningful. Although the data clearly demonstrated that the participants in both the Hall et al. (1968) and Sutherland et al. (2000) studies gained in their targeted behaviors, the authors made no connection between these results and the clinical needs of their participants. As such, neither study meets this criterion.

Practical and cost effective. Horner et al. (2005) noted in their criteria for documenting the social validity of a study that the implementation of the independent variable be practical and cost-effective. However, Horner et al. provided no specific guidelines for evaluating this criterion. We can speculate that an intervention that uses little more than a cue to praise or feedback on use of behavior-specific praise would be practical and cost effective; however, when we consider the fact that the cue was another person holding a card (as in the Hall et al., 1968, study) or the feedback was given by an observer in the classroom (as in the Sutherland et al., 2000, study), the costs increase dramatically. Regardless of our speculations, neither Hall et al. nor Sutherland et al. provided any discussion as to their interventions being practical or cost effective. As such, we must determine that neither study met this criterion.

Use in typical contexts. Horner et al. (2005) stated that social validity is enhanced when the intervention is implemented over time, by typical interventionists, and in typical contexts. Both Hall et al. (1968) and Sutherland et al. (2000) incorporated interventions that typical teachers implemented within their own classrooms for multiple sessions and therefore meet this criterion.

Although both studies targeted dependent variables that are socially important and evaluated the interventions in typical contexts with typical interventionists over time, both studies fail to meet the social validity quality indicator because neither provided evidence that the independent variable was practical and cost effective, nor discussed the relationship between the change in the dependent variable and the needs of the subjects.

Discussion of Applying Quality Indicators to Two Studies

We selected these two studies (i.e., Hall et al., 1968; Sutherland et al., 2000) to review based on no specific criteria other than that they examined the same treatment and outcome for students with or at risk for developing EBD, their design (i.e., single-subject), and that they appeared to be of high quality after an initial reading. Because our goal was to pilot the quality indicators for single-subject research as proposed by Horner et al. (2005) and to discuss that process in depth, we also wanted studies that were separated by several years. It is noteworthy that our cursory search of the literature revealed a considerably higher number of studies that used single-subject research methods to determine intervention effectiveness for students with EBD, in comparison to experimental or quasi-experimental investigations. The multitude of single-subject studies that exist on a variety of behavioral interventions suggests that the literature base is sufficiently robust to determine whether many educational practices are evidence-based vis-a-vis single-subject research.

We determined that neither of the studies reviewed was of acceptable quality according to the quality indicators set forth by Horner et al. (2005). Although not explicitly stated, it appears that Horner et al. require that all quality indicators be met for a study to be considered of high quality. However, both Hall et al. (1968) and Sutherland et al. (2000) met each of the criteria for only one of the seven quality indicators (i.e., external validity). See Table 1 for an overview of the results of our application of the quality indicators to the two studies.

A number of possible explanations exist as to why so few quality indicators were present in the studies we examined. For example, it may be that the studies we selected happened not to be of very high methodological quality. In other words, perhaps applying the quality indicators correctly identified the designs and procedures of these studies as being unacceptable. As we elaborate in the following paragraphs, we do not think this is the primary cause of these studies not meeting the proposed quality indicators.

A second possible reason that we found both studies we reviewed to be of unacceptable methodological quality is that the quality indicators are excessively rigorous. Quality indicators for special education research should, we posit, be guided by a consensus among leaders in the field regarding what constitutes high quality research. Just how high the bar for methodological excellence should be set is a critical matter that warrants serious consideration and debate. Setting the bar too high excludes meaningful studies; setting it too low allows flawed research to influence which practices the field regards as evidence-based.

After applying the proposed quality indicators to two single-subject research studies, we agree with the spirit of all the proposed quality indicators and criteria. In fact, the quality indicators set forth by Horner et al. (2005) parallel the dimensions of applied behavior analysis as called for by Baer, Wolf, and Risley (1968) to guide the reporting of single-subject research studies. According to these pioneers, the evaluation of an applied behavior analysis study "must be applied, behavioral, and analytic; in addition, it should be technological, conceptually systematic, and effective, and it should display some generality" (p. 92, italics in original). Certainly, the quality indicators proposed by Horner et al. are sound methodological tools for investigating outcomes of interventions, whether those interventions are derived from a behavior analytic perspective or from other conceptual frameworks that contribute to applied and experimental work (see Kazdin, 1982, for a discussion).

Although we agree with the spirit of the quality indicators and recognize they are built firmly on the guiding principles of applied behavior analysis, we found that without operational definitions of the quality indicators, we encountered difficulties in applying them. Not only did we, independently and collaboratively, have a difficult time making decisions regarding the presence of the quality indicators, we also had an unacceptably low level of inter-rater agreement. Horner et al. (2005) specified a criterion of at least .80 for inter-rater agreement for studies reviewed, which seems reasonable to apply to the application of quality indicators as well. If subsequent applications of Horner et al.'s proposed quality indicators for single-subject research yield similarly low levels of inter-rater reliability, refinement of the quality indicators seems clearly warranted.
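To make the .80 agreement criterion concrete, the following Python sketch computes simple percent agreement and Cohen's kappa for two reviewers' met/not-met judgments. It is an illustration under stated assumptions: the two rating lists are invented for this example and are not the ratings produced in this review, and the function names are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of criteria on which two raters gave the same judgment."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Ratings must be non-empty and of equal length.")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented met (True) / not met (False) judgments on ten criteria.
reviewer_1 = [True, True, False, False, True, False, True, False, False, True]
reviewer_2 = [True, False, False, False, True, True, True, False, True, True]

agreement = percent_agreement(reviewer_1, reviewer_2)
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
print("meets .80 criterion:", agreement >= 0.80)
```

Percent agreement is the simpler statistic, but with only two rating categories it can be inflated by chance agreement, which is why kappa is reported alongside it in this sketch.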

Indeed, we suggest that the field make efforts to clarify and perhaps even re-state some of the criteria. For example, although implementing a practical, cost-effective independent variable is an important consideration in single-subject research, practicality might not be an essential element of a practice's effectiveness. Moreover, we think that research reports that do not address every requirement discussed in the quality indicators might still be meaningful. For example, Horner et al. (2005) stated that specifying the instruments and processes used to determine a disability is important for replication and should be included in the description of participants. We agree that such information is useful; however, studies that do not supply that level of detail, but instead only state the identified disability category of the participants, perhaps should not be eliminated from further consideration in determining whether the study contributed meaningfully to the research literature.

It is also possible that we misinterpreted and/or misapplied the quality indicators. That is, at times, we found it difficult to determine whether the studies satisfied particular quality indicators and we may have erred in concluding that they did not. Our confusion in applying the quality indicators seemed to arise primarily from two sources: (a) lack of clarity/completeness in reporting the research and (b) lack of clarity in describing the quality indicators. Often, when the researchers reported insufficient information for us to clearly conclude whether a quality indicator had been addressed, the shortcoming appeared to be a function of reporting, rather than a reflection of a significant methodological inadequacy in the studies. For example, neither Hall et al. (1968) nor Sutherland et al. (2000) adequately described the process for selecting participants or clearly discussed that the patterns in their findings demonstrated experimental control. Yet we are confident that these experienced researchers could easily have provided more in-depth descriptions of participant selection procedures and discussed trends and variability of their data. Indeed, aside from the areas of intervention implementation fidelity and interventionist acceptability of treatment, all unmet quality indicators in the studies we reviewed seem to be related to omissions in reporting, rather than inappropriate methodology.

Page limitations of most journals and concerns regarding the "readability" of articles may often compromise authors' abilities to report every aspect of a research study in detail. Many high quality journals are notoriously restrictive in their formatting requirements, and reviewers or editors may abridge reports of research in order to improve the readability of the published articles. It also may be problematic to penalize, in a sense, studies for not reporting information that the authors did not know they needed to account for (e.g., cost effectiveness). It appears ill-advised to discount studies because of incomplete reporting when the missing information does not represent an important methodological flaw, especially when the researchers were not aware of future standards for reporting that information. Alternatively, assuming that studies did meet quality indicators when presented with incomplete information increases the risk of "false positives" (i.e., considering studies to be of high quality when they actually are not).

The other primary source of difficulty for us in determining whether the studies we reviewed met the quality indicators was confusion regarding what exactly was required to meet the proposed criteria. For example, in the published table that lists the quality indicators and their criteria, Horner et al. (2005) stated that measurement of intervention implementation fidelity is "highly desirable" (p. 174) in evaluating the quality of a study. Yet in their descriptive text preceding the table they stated that "documentation of adequate implementation fidelity is expected..." (p. 168). Such conflicting information (desirable vs. expected) made it unclear as to whether reporting this information was absolutely required.

We also found several of the criteria to require some degree of subjective determination. For example, terms such as sufficient detail and replicable precision are open to interpretation and were difficult to apply. Similarly, Horner et al. (2005) provided no guidance for determining which critical features of the physical setting must be described, establishing the practicality and cost-effectiveness of an independent variable, or ascertaining the social importance of the dependent variable. Moreover, when Horner et al. provided guidance, it was sometimes less than apparent to us how to apply it. For example, interobserver agreement levels of 80% and kappa coefficients of 60% were offered as examples, not as criteria to apply in all cases. Although five or more data points were identified as necessary for baseline conditions, we were unclear as to whether a similar number was necessary for other conditions. And, in regard to baselines, Horner et al. noted that "fewer data points are acceptable in specific cases" (p. 168), making even that guidance provisional.

We also wrestled more than once with the question of where the burden of proof lay. For example, are researchers required to discuss each point of analysis to document a pattern that demonstrates experimental control (i.e., trends, latency, variability of data, levels)? Or are readers supposed to determine whether experimental control exists based on their own visual inspection of graphed data? Must researchers document the social validity of the dependent variable, or are readers left to determine whether it is socially important? If fewer than five data points exist for a certain phase, is it the responsibility of the authors to justify having fewer than five data points, or can readers come to their own determination regarding whether a trend is sufficiently established with fewer than five data points? Such questions kept us from being able to quickly apply the quality indicators (together, the first two authors spent more than 12 hours reviewing these two studies in relation to the quality indicators) and required that we conclude that the burden of documentation must rest with the researchers; without such a decision, we could not reliably apply the quality indicators. As a consequence of adhering to this decision, we determined that the studies did not meet a number of criteria even though we could have easily provided our own rationale for how the studies appeared to meet them. This situation concerns us because we think that standards for reporting that were developed and applied after an article was published will result in many well-designed and properly executed research studies being considered as unacceptable. "Retro-fitting" (Cook & Tankersley, 2007) quality indicators to previously conducted studies may, then, have the effect of inappropriately reducing the pool of research on which the field will determine whether a practice is evidence-based.

From our analysis, we determined that two points of action may decrease some of the confusion we encountered in applying the proposed quality indicators for single-subject research. First, it may be necessary to establish a means by which reviewers can determine, by post hoc analysis, whether specific criteria associated with reporting results (such as the extent to which "a pattern demonstrates experimental control" [Horner et al., 2005, p. 174] or whether "sufficient assessment occasions" occurred [Horner et al., p. 167]) have been met in existing research. For example, during the course of our review, we were able to estimate mean levels and trends, as well as determine the variability or stability of data and the immediacy with which dependent variables responded to changes in the independent variables. Ultimately, however, we decided for this review that the quality indicators should be evaluated based on what the researchers reported and not on our interpretations of their data or reports. However, given that we could do it, it seems plausible that guidelines for post hoc analysis of data could be established for applying the quality indicators to existing studies that did not provide sufficient discussion of issues such as demonstration of experimental control. A system in which reviewers determine whether certain criteria are met by conducting their own post hoc analysis of results may be analogous to what often occurs when researchers conducting a meta-analysis encounter a relevant study that does not report an effect size. From the data reported, current researchers can often compute an effect size and include that study in their meta-analysis. Having specific guidelines for how to determine aspects of the design and results that are not reported may allow the field to consider many previously conducted, high-quality studies that did not report information related to one or more quality indicators in sufficient depth. However, once the field has clearly established quality indicators, all research from that point forward should be required to address them adequately and the post hoc criteria would not apply.
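The kind of post hoc analysis described above (estimating level, trend, variability, and immediacy of effect from graphed data) could be sketched as follows. This Python sketch is only one possible approach under stated assumptions: the phase data are invented rather than read from either study's graphs, trend is taken as a least-squares slope, variability as a sample standard deviation, and immediacy as the difference between the mean of the first `k` points of a phase and the mean of the last `k` points of the preceding phase, with `k = 3` chosen arbitrarily.

```python
from statistics import mean, stdev


def ols_slope(values):
    """Least-squares slope of values against session number (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def summarize_phase(values):
    """Level (mean), trend (slope), and variability (SD) for one phase."""
    return {
        "level": mean(values),
        "trend": ols_slope(values),
        "variability": stdev(values) if len(values) > 1 else 0.0,
    }


def immediacy(previous_phase, next_phase, k=3):
    """Change from the last k points of one phase to the first k of the next."""
    return mean(next_phase[:k]) - mean(previous_phase[-k:])


# Invented ABAB (reversal design) data, percentage of intervals on task.
phases = [
    ("Baseline 1", [20, 25, 22, 24, 21]),
    ("Intervention 1", [60, 65, 70, 68, 72]),
    ("Baseline 2", [30, 28, 27]),
    ("Intervention 2", [66, 70, 74, 73, 75]),
]

for label, data in phases:
    s = summarize_phase(data)
    print(
        f"{label}: level={s['level']:.1f}, trend={s['trend']:+.2f}, "
        f"sd={s['variability']:.2f}"
    )

for (prev_label, prev), (next_label, nxt) in zip(phases, phases[1:]):
    print(f"{prev_label} -> {next_label}: change = {immediacy(prev, nxt):+.1f}")
```

Any guideline built on such summaries would still need agreed thresholds for what counts as a stable phase or an immediate effect, which is the operationalization problem this section raises.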

Second, clarifying points of conflicting information and decreasing subjectivity for several quality indicators and their criteria would be helpful. Although we understand that all ambiguity cannot be eliminated from the process, describing the quality indicators more concretely and consistently will undoubtedly remove some of the difficulty we came upon in applying them. Operationalizing the quality indicators and their criteria will not be an easy feat: if too much prescriptive detail is stipulated, we may lose the flexibility necessary to apply the quality indicators to the diverse range of single-subject research. Conversely, if criteria are too general or subjective, reviewers will be left to their own interpretation and the criteria will be applied in unreliable ways. By continuing to pilot test and revise the quality indicators, we are confident that a balance between prescriptive, specific criteria and conceptual, general standards can be achieved.

Conclusion

In concluding, we acknowledge and thank Horner and his colleagues (2005) for developing quality indicators for single-subject research, as well as Hall et al. (1968) and Sutherland et al. (2000) for conducting edifying and applicable research. Following the advice of Gersten, Fuchs, Compton, Coyne, Greenwood, and Innocenti (2005), we pilot tested the quality indicators for single-subject research in an attempt to begin the process of examining, refining, and ultimately applying them to identify evidence-based practices in special education. We conclude that some revision in the proposed quality indicators is needed. On the basis of reviewing two studies, we think that some of the quality indicators and related criteria need to be more objectively stated and that reviewers be provided with clearer guidelines for what is necessary to satisfy them. Furthermore, we recommend that the quality indicators be examined to determine whether they are too dependent on researchers' interpretations, especially in relation to how they are "retro-fitted" to existing research (Cook & Tankersley, 2007), and whether guidelines for post hoc analysis can be determined for existing research. Otherwise, we speculate that, as currently formulated and as applied to previously conducted studies, the quality indicators will discount some meaningful studies as being of insufficient quality due to reporting deficiencies rather than actual methodological flaws. Revised versions of the proposed quality indicators will, we think, permit special educators to further the process of meaningfully establishing evidence-based practices for students with disabilities.

References

Allison, D. B., & Gorman, B. S. (1993). Calculating effect sizes for meta-analysis: The case of the single case. Behaviour Research and Therapy, 32, 885-890.

Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91-97.

Browder, D. M., Wakeman, S. Y., Spooner, F., Ahlgrim-Delzell, L., & Algozzine, B. (2006). Research on reading instruction for individuals with significant cognitive disabilities. Exceptional Children, 72, 392-408.

Chambless, D. L., Baker, M. J., Baucom, D. H., Beutler, L. E., Calhoun, K. S., Crits-Christoph, P., Daiuto, A., DeRubeis, R., Detweiler, J., Haaga, D. A. F., Bennett Johnson, S., McCurry, S., Mueser, K. T., Pope, K. S., Sanderson, W. C., Shoham, V., Stickle, T., Williams, D. A., & Woody, S. R. (1998). Update on empirically validated therapies, II. The Clinical Psychologist, 51(1), 3-16.

Cook, B., Rumrill, P. Jr., Webb, J., & Tankersley, M. (2001). Quantitative research designs. In P. D. Rumrill, Jr. and B. G. Cook (Eds.), Research in Special Education: Designs, Methods, and Applications (pp. 121-158). Springfield, IL: Charles C. Thomas.

Cook, B. G., & Tankersley, M. (2007). A preliminary examination to identify the presence of quality indicators in experimental research in special education. In J. B. Crockett, M. M. Gerber, and T. J. Landrum (Eds.), Achieving the radical reform of special education: Essays in honor of James M. Kauffman (pp. 189-212). Mahwah, NJ: Lawrence Erlbaum Associates.

Cooper, H., & Hedges, L. V. (Eds.). (1994). The handbook of research synthesis. New York: The Russell Sage Foundation.

Gallagher, D. J. (2006). If not absolute objectivity, then what? A reply to Kauffman and Sasso. Exceptionality, 14, 91-107.

Gersten, R., Fuchs, L. S., Compton, D., Coyne, M., Greenwood, C., & Innocenti, M. S. (2005). Quality indicators for group experimental and quasi-experimental research in special education. Exceptional Children, 71, 149-164.

Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3-8.

Graham, S. (2005). Criteria for evidence-based practice in special education [special issue]. Exceptional Children, 71.

Hall, R. V., Lund, D., & Jackson, D. (1968). Effects of teacher attention on study behavior. Journal of Applied Behavior Analysis, 1, 1-12.

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71, 165-180.

Kauffman, J. M., & Sasso, G. M. (2006). Toward ending cultural and cognitive relativism in special education. Exceptionality, 14, 65-90.

Kavale, K. A. (2001). Meta-analysis: A primer. Exceptionality, 9, 177-183.

Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York: Oxford University Press.

Kratochwill, T. R., & Stoiber, C. (2002). Evidence-based interventions in school psychology: Conceptual foundations of the procedural and coding manual of Division 16 and the Society for the Study of School Psychology Task Force. School Psychology Quarterly, 17, 341-389.

Landrum, T. J., & Tankersley, M. (2004). Science at the schoolhouse: An uninvited guest. Journal of Learning Disabilities, 37, 207-212.

Lloyd, J. W., Pullen, P. C., Tankersley, M., & Lloyd, P. A. (2006). Critical dimensions of experimental studies and research syntheses that help define effective practices. In B. G. Cook and B. R. Schirmer (Eds.), What is special about special education (pp. 136-154). Austin, TX: Pro-Ed.

Lloyd, J. W., Tankersley, M., & Talbott, E. (1994). Using single-subject methodology to study learning disabilities. In S. Vaughn and C. Bos (Eds.), Research Issues in Learning Disabilities: Theory, Methodology, Assessment, and Ethics (pp. 163-177). New York: Springer-Verlag.

Mancina, C., Tankersley, M., Kamps, D., Kravits, T., & Parrett, J. (2000). Brief report: Reduction of inappropriate vocalizations for a child with autism using a self-management treatment program. Journal of Autism and Developmental Disorders, 30, 599-606.

Odom, S., & Strain, P. S. (2002). Evidence-based practice in early intervention/early childhood special education: Single-subject design research. Journal of Early Intervention, 25, 169-179.

Scruggs, T. E., & Mastropieri, M. A. (2001). How to summarize single-participant research: Ideas and applications. Exceptionality, 9, 227-244.

Sutherland, K. S., Wehby, J. H., & Copeland, S. R. (2000). Effect of varying rates of behavior-specific praise on the on-task behavior of students with EBD. Journal of Emotional and Behavioral Disorders, 8, 2-8.

Tankersley, M., Harjusola-Webb, S., & Landrum, T. J. (in press). Using single-subject research to establish the evidence-base of special education. Intervention in School & Clinic.

Tankersley, M., McGoey, K. E., Dalton, D., Rumrill, P. D., & Balan, C. M. (2006). Speaking of research: Single subject research methods in rehabilitation. Work: A Journal of Assessment, Prevention, & Rehabilitation, 26, 85-92.

Tawney, J. W., & Gast, D. L. (1984). Single-subject research in special education. Columbus, OH: Merrill.

What Works Clearinghouse Evidence Standards for Reviewing Studies. (2006). What Works Clearinghouse evidence standards for reviewing studies. Retrieved on February 21, 2007 from http://www.whatworks.ed.gov/reviewprocess/study.standards_final.pdf

COPYRIGHT INFORMATION

TITLE: A Preliminary Examination to Identify the Presence of Quality Indicators in Single-subject Research

SOURCE: Educ Treat Child 31 no4 N 2008

The magazine publisher is the copyright holder of this article and it is reproduced with permission. Further reproduction of this article in violation of the copyright is prohibited.