Education EDU530 Week 5 assignment

Chapter 2

Deciding What to Assess

Chief Chapter Outcome

An ability to identify, and understand the impact of, key factors that can help teachers determine the measurement targets for their classroom assessments

Learning Objectives

2.1 Identify key factors impacting teachers’ decision making when determining measurement targets for classroom assessments.

2.2 Identify key components of the Common Core State Standards when determining measurement targets for classroom assessments.

If classroom assessment does not help teachers do a better job of educating their students, then classroom assessment has no reason to exist. Consistent with that premise, this book about classroom assessment is really about teaching, not testing. That’s because well-conceived classroom assessment almost always leads to better taught students. So, before getting into some of the measurement concepts typically linked to educational testing, this chapter addresses an overridingly important educational question that each teacher must answer about classroom assessment. This crucial question, suitably highlighted in uppercase type, is

WHAT SHOULD A CLASSROOM TEACHER TRY TO ASSESS?

In this chapter you’ll encounter a collection of factors that teachers should consider before deciding what sorts of things their classroom assessments should measure.

M02_POPH0936_10_SE_C02.indd 36 09/11/23 6:01 PM

What to Assess

Far too many teachers simply stumble into an assessment pattern without giving serious consideration to why they’re assessing what they’re assessing. Many teachers, for example, test students in order to dispense grades in a manner that somehow reflects the levels of academic performances that students have displayed on those tests. Students who score well on the teacher’s tests are given good grades; low-scoring students get the other kind. Traditionally, the need to dole out grades to students has been a key factor spurring teachers to assess their students. Other teachers, of course, also employ classroom tests as motivational tools—that is, to encourage students to “study harder.”

Yet, as suggested in Chapter 1, there are a number of other significant reasons why teachers construct and use assessment instruments. For instance, results of classroom assessments may be employed to identify certain students’ areas of deficiency so the teacher can more effectively target additional instruction at those content or skill areas where there’s the greatest need. Another important function of classroom assessment is to help teachers understand more clearly, before designing an instructional sequence, what their end-of-instruction targets really are. This clarification occurs because properly constructed assessment procedures can exemplify, and thereby illuminate, the nature of instructional targets for teachers.

Decision-Driven Assessment

Teachers use tests to get information about their students. Teachers typically make score-based interpretations about their students’ status with respect to whatever curricular aims are being represented by the tests. Based on these interpretations, teachers then make decisions. Sometimes the decisions are as straightforward as whether to give Manuel Ortiz an A or a B. Sometimes the decisions are more difficult, such as how to adjust an ongoing instructional unit based on students’ performances on an along-the-way exam. But whatever the decisions are, classroom assessment should be unequivocally focused on the action options teachers have at their disposal. The information garnered from assessing students is then used to help a teacher make the specific decision that’s at issue. Because the nature of the decision to be illuminated by the test results will usually influence the kind of assessment approach the teacher selects, it is important to clarify, prior to the creation of a test, just what decision or decisions will be influenced by students’ test performances.

It may seem silly to you, or at least somewhat unnecessary, to identify in advance what decisions are linked to your classroom assessments, but it really does make a difference in determining what you should assess. For instance, suppose the key decision riding on a set of test results is how to structure a series of remedial instructional activities for those students who performed poorly during a teaching unit focused on a higher-order thinking skill. In this sort of situation, the teacher would definitely need specific diagnostic information about the “building-block” subskills or enabling knowledge each student did or didn’t possess. Thus, it is not sufficient for a test merely to assess students’ mastery of the overall skill being taught. There would also need to be a sufficient number of items assessing students’ mastery of enabling knowledge or each building-block subskill. Based on the diagnostic data derived from the test, a sensible set of remedial instructional activities could then be designed for the low performers.

If, on the other hand, the decision linked to test results is a simple determination of whether the teacher’s instruction was sufficiently effective, a pretest–posttest assessment of more global (and less diagnostic) outcomes would suffice. It’s also appropriate, when judging the quality of a teacher’s instruction, to assess students’ attitudes relevant to what was being taught. We’ll look into how you can measure your students’ attitudes in Chapter 10.

In short, the decisions to be informed by assessment results should always influence the nature of the assessments themselves. Teachers should, therefore, routinely consider the decision(s) at issue prior to creating a classroom assessment device.

A fairly easy way to decide whether a test’s results will really influence a classroom teacher’s decision is to imagine that the test results turn out in two opposite ways—for example, a set of excellent student test performances versus a set of disappointing student test performances. If the teacher would be likely to make a different decision based on those disparate sets of performances, then the teacher truly has a test that can help inform decisions. If the teacher’s decision would be pretty much the same no matter what the test results were, there’s a strong likelihood that the assessment procedure will be more ritual than genuine help for the teacher.

The Role of Curricular Aims

Much classroom assessment takes place after an instructional sequence has been completed. There are exceptions, of course, such as the during-instruction monitoring of students’ progress as part of the formative-assessment process (about which you’ll learn lots more in Chapter 12). There’s also the pretesting a teacher might use as part of a pretest–posttest evaluation of the teacher’s instructional effectiveness. Yet many classroom tests are used at the close of instruction. And, because much classroom instruction is intended to help students achieve specified outcomes, it is quite reasonable to think that if, early on, teachers consider the outcomes they want their students to achieve, those teachers can more readily answer the what-to-assess question. What teachers should assess will, in most instances, stem directly from the intended consequences of instruction, because those hoped-for consequences will influence what the teacher will be teaching and, most likely, what the students will be learning.

Let’s pause for just a moment to consider how best to label a teacher’s instructional intentions. This might seem unnecessary to you because, after all, why should the way teachers label their instructional aspirations make any real difference? Aren’t a teacher’s instructional aspirations simply what a teacher hopes will happen to kids as a consequence of the teacher’s instruction?

In recent years, the label “content standards” has emerged as the most fashionable descriptor for what in one locale might be called “learning outcomes” and, in another setting, “instructional goals.” If you accept this reality as you rumble through the book, you might prefer to mentally slide in whatever synonym suits you. Regrettably, there is confusion among educators about how to describe the outcomes we hope students will achieve. If you are aware of this potential confusion, you can adroitly dodge it and move on to more important concerns. If you don’t understand the nature of curriculum-label confusion, however, you are likely either to become confused yourself or, worse, add to the confusion.

Getting down to basics, curriculum consists of the sought-for ends of instruction—for example, changes in students’ knowledge or skills that a teacher hopes students will experience as a result of what was taught. In contrast, instruction represents the means teachers employ to promote students’ achievement of the curricular ends being sought. Instruction, therefore, consists of the activities a teacher has students carry out in an attempt to accomplish one or more intended curricular outcomes. Simply put, curriculum equals ends and instruction equals means. Confusion arises when educators try to plaster diverse sorts of labels on the curricular ends they hope their students will accomplish.

No widespread consensus among educators exists regarding what to call the intended outcomes of instruction. During the past half-century, for example, we’ve seen curricular outcomes referred to as goals, objectives, outcomes, expectations, benchmarks, and content standards. In most instances, certain labels were supposed to apply to broader, more general curricular outcomes, while other labels were given to more specific, less general outcomes. What’s important when carving out what you want your classroom assessments to measure is that you must understand precisely the meaning of whatever curricular labels are being used by colleagues in your school, district, or state. In most instances, whatever the curricular labels are, they are presented (often with examples) in some sort of official curricular document. Be sure you have a handle on what sorts of curricular descriptors are being used in your own corner of the world. You’ll dodge much curricular chaos if you do. Most crucially, do not assume that you and your coworkers are using curricular labels in the same way. More often than not, you won’t be. So, in passing, just do a brief definition dance to see whether you are all dancing to the same tune.

More often than not in this book, the label curricular aim will be employed to characterize the desired outcomes of instruction. That phrase curricular aim has not been around for all that long and, therefore, has not had most of its meaning leeched from it. You’ll sometimes encounter in the book such phrases as “educational objectives” or “instructional objectives” to describe curricular aims—just for a descriptive change of pace. But, as indicated, you need to find out what sorts of curricular labels are being used in the setting where you teach. Odds are that those labels will be different from the descriptors used here. To illustrate, the learnings being fostered in this book’s chapters are described as “learning objectives,” but might be referred to by your colleagues as “learning outcomes,” or “interim aims.” That’s okay, so long as you know what’s going on—and can clarify terminology tangles if they transpire.

Consideration of your own curricular aims, then, is one action you can take to help get a fix on what you should assess. The more clearly you state those aims, the more useful they will be to you in answering the what-to-assess question. In the 1960s and 1970s, there was widespread advocacy for behavioral objectives—that is, educational objectives stated in terms of the postinstruction behavior of learners. The push for behavioral objectives was fueled, at least in part, by a dissatisfaction with the traditional form of extremely general and ambiguous instructional objectives then used by educators. Here is an example of such an overly general objective: “At the end of the course, students will understand the true function of our federal government.” My favorite gunky instructional objective of that era was one found in a state-level language arts syllabus: “The student will learn to relish literature.” Thereafter, I kept on the lookout for an objective such as “The student will mayonnaise mathematics.” I haven’t encountered one—yet.

Still, in their attempt to move from unclear general objectives toward super-clear behavioral objectives, the proponents of behavioral objectives (and I was right in there proponing up a storm) made a serious mistake. We failed to realize if we encouraged teachers to use instructional objectives that were too specific, they would generate a profusion of small-scope behavioral objectives. The resulting piles of hyperspecific instructional objectives would so overwhelm teachers that they would end up paying attention to no objectives at all.

If you are currently a classroom teacher, you already know that you can’t realistically keep track of whether your students are achieving hundreds of curricular aims. If you are preparing to be a classroom teacher, you’ll discover that dozens and dozens of curricular aims will soon overwhelm you. The trick in conceptualizing curricular aims that help rather than hinder is to frame those aims broadly enough so you can sensibly organize instruction around them, while making sure that the aims are still measurable. The degree of measurability is clearly a key in the way you should state your curricular aims. The broader the objective is, while remaining measurable, the more useful the objective will be, because you can be guided by an intellectually manageable set of curricular intentions. If you can conceptualize your curricular aims so that one broad, measurable aim subsumes a bevy of lesser, smaller-scope objectives, you’ll have a curricular aim that will guide your instruction more effectively and, at the same time, will be a big help in deciding what to assess.

In language arts instruction, for example, when teachers set out to have their students become good writers, a curricular aim is often focused on students’ ability to generate an original composition. That aim, although still measurable (by simply having students write an original essay), encompasses a number of smaller-scope aims (such as the students’ ability to employ appropriate syntax, word choice, and proper spelling).

The concept of measurability, however, is trickier than it first appears to be. Let’s see why. A mere two paragraphs ago you were urged to frame your curricular aims broadly—so they subsumed more specific curricular intentions. However, you might also want to be sure that your students’ mastery of a curricular aim can be measured by a single, conceptually unitary assessment approach—not a fragmented flock of separate assessment tactics. Beginning teachers soon discover how difficult it is to arrive at instructional decisions based on a diverse array of dissimilar assessment tactics.

To illustrate, when teachers measure a student’s ability to compose an original persuasive essay by asking students to whip up such an essay (usually referred to as a “writing sample”) from scratch, those teachers can use the writing sample to get a fix on students’ punctuation skills, content mastery, spelling, and word usage—all at one time. A unitary assessment strategy has been employed that, in one fell swoop, captures students’ mastery of lesser, contributory subskills and bodies of knowledge. The way to use students’ writing samples to arrive at an inference about their composition prowess—and the separate subcomponents of that prowess—is an example of unitary measurability.

On the other hand, if the only way to measure a student’s mastery of a curricular aim is by relying on a collection of substantively different assessment tactics, then unitary measurability is not present. To illustrate, suppose a teacher had formulated a curricular aim labeled “Geometry” that embraces (1) congruence, (2) similarity, (3) circles, (4) equations, and (5) geometric measurement. However, in order to determine students’ mastery of the “Geometry” goal, five separate sets of items were constructed, each focused on one of the five separate subcomponents of the “Geometry” curricular aim. This conceptualization, as you can see, does not constitute unitary measurement of students’ curricular-aim mastery. The five distinct subcomponents of the curricular aim have not been consolidated in such a way that a single assessment strategy can accurately reflect students’ attainment of the five subcomponents. What we would have here is what might be characterized as “counterfeit coalescence.” What teachers should try to identify is a modest number of high-import curricular aims that can be measured via a unitary assessment approach. A bevy of divergent, uncoalesced subtests clearly does not constitute unitary measurability.

Decision Time: On-Demand Assessment

Yuki Nishimura is a fourth-grade teacher who has set up her social studies activities on a mastery learning basis. Students are allowed to work through most of their social studies assignments at their own speed. Much of the social studies program in Yuki’s class is based on reading assignments, so students can move at their own pace. When students believe they are ready to demonstrate mastery of the social studies skills and knowledge that Yuki has described in written documents distributed early in the school year, students set up an oral assessment. Yuki then spends 10 to 15 minutes presenting a series of short-answer items that students must answer orally. So that students do not discover from previously assessed students what the items on the assessment are, Yuki selects items at random from a pool of nearly 50 items for each of the 4 major social studies assessments during the year.

Although most students seem to appreciate Yuki’s willingness to let them be assessed when they’re ready, several students have complained that they “got a harder test” than some of their classmates. The dissatisfied students have encouraged Yuki to retain her mastery learning model, but to assess all students at the same time.

Yuki is deciding whether to maintain her on-call oral assessments or revert to her former practice of using written examinations administered to the entire class at the same time.

If you were Yuki, what would your decision be?

Many educators these days use the concept of grain size to represent the breadth of a curricular aim. More specific curricular aims are considered “small-grain-size” instructional objectives, whereas more general curricular aims are regarded as “large-grain-size” instructional objectives. One useful way of distinguishing among curricular aims in terms of grain size is to estimate the amount of instructional time it will take for students to accomplish what’s embodied in a particular curricular aim. If the curricular aim can be achieved by students in a short while, perhaps in a few days or even in one class session, then the aim is usually thought of as a small-grain-size outcome. If, on the other hand, a curricular aim seems likely to take weeks or months to accomplish, such an aim is regarded as a large-grain-size outcome.

Here are two examples of small-scope (small-grain-size) curricular aims in language arts:

• When presented with reading passages containing unfamiliar words, students can employ context clues to infer the unfamiliar words’ meanings.

• The student will be able to write sentences in which verbs agree in number with relevant nouns and pronouns.

Now let’s look at a broad-scope (large-grain-size) language arts curricular aim that subsumes these two small-scope objectives plus a number of similar small-scope objectives:

• After reading an age-appropriate nonfiction essay, the student will be able to compose two paragraphs, the first of which summarizes the essay’s central message and the second of which describes the student’s personal reaction to the essay’s message.

Strive to come up with a half-dozen or so truly salient, broad yet measurable curricular aims for your own classroom. Too many small-scope, hyperspecific objectives will be of scant value to you because, if you’re at all normal, you’ll soon disregard an overwhelming array of overly specific curricular targets. On the other hand, a small number of intellectually manageable, broad yet measurable curricular aims will not only prove helpful to you instructionally but will also help you answer the what-to-assess question. Simply put, you need to create assessments to help you determine whether your students have, as you hoped, achieved your chief curricular aims. This means that when deciding what to measure in your classroom tests, you will frequently need curricular courage—that is, the courage to measure well what’s most important.

“Right-Sizing” Assessment Results

As you roll breathlessly through subsequent chapters in this fun-filled book, you’ll often be reminded of the importance that the size of test-result reports can play in the decisions made not only by teachers, but also by students themselves. This is as it should be. Realistically, the collection of assessment-provided evidence should lead to more defensible decisions about not only whole classes of students, but also individual students. If teachers, however, pay insufficient attention to the grain size embodied by reports of students’ test performances, you can be certain that those results will have little impact on teachers’ decisions. Under such circumstances, testing becomes a “makes-no-difference” enterprise that has consumed teacher time and student time, but has made no bona fide contribution to students’ learning.

One of the two most common perils in determining how to report a test’s results is making the reporting size too broad and too general so that both teachers and students will only have, at best, a fuzzy understanding of what has been measured. The second common mistake is to report at a too specific level and, as a consequence, end up with so many small-scope reports that the user of such tiny-dollop reports is literally overwhelmed by too many specific descriptions of students’ performances. Either of these two shortcomings in determining the grain size of students’ reported performances can completely undermine whatever advantages had been sought from such evidence.

Regrettably, the quests for not-too-broad and not-too-narrow reports of students’ assessment results are in direct conflict with one another. In later chapters we will consider ways to apply a Goldilocks solution to coming up with the right grain size for reporting students’ results. If teachers pay insufficient heed to those results because they’re too general or too specific, the dividends of employing assessments to benefit students simply evaporate. We will dig into this key issue in several later chapters.

Cognitive, Affective, and Psychomotor Assessment

In thinking about what to assess, classroom teachers can often be helped by considering potential assessment targets according to whether they are chiefly focused on cognitive, affective, or psychomotor targets. Cognitive assessment targets are those that deal with a student’s intellectual operations—for instance, when the student displays acquired knowledge or demonstrates a thinking skill such as decision making or problem solving. Affective assessment targets are those that deal with a student’s attitudes, interests, and values, such as the student’s self-esteem, risk-taking tendencies, or attitudes toward learning. Psychomotor assessment targets are those focused on a student’s large-muscle or small-muscle skills. Examples of psychomotor assessments include tests of a student’s keyboarding skills in a computer class and tests of a student’s prowess in shooting a basketball in gym class.

The distinction among cognitive, affective, and psychomotor educational outcomes was first introduced in 1956, when Benjamin Bloom and his colleagues (Bloom et al., 1956) proposed a classification system for educational objectives organized around those three categories. While recognizing that there are usually admixtures of all three domains of behavior in most things teachers ask students to do on tests, the 1956 Taxonomy of Educational Objectives made it clear there is typically a dominant kind of student behavior sought when teachers devise educational objectives for their students. For example, if a student is completing an essay examination in a social studies class, the student may be using a pencil to write (psychomotor) the essay and may be feeling remarkably confident (affective) about the emerging quality of the essay. But the chief domain of performance involved is cognitive because it is the student’s intellectual analysis of the test’s items and the student’s intellectual organization of a response that mostly account for what the student’s essay contains. Bloom’s cognitive taxonomy included six levels of students’ intellectual activity—ranging from the lowest level, knowledge, all the way up to the highest level, evaluation (1956).

Standards-Based Classroom Assessment

Whether you are an elementary teacher who has instructional responsibility for a number of subject areas, or a secondary teacher who has instructional responsibility for one or two subject areas, you should be concerned that your instruction and your assessments embody appropriate content. Fortunately, in the last decade or so, various national organizations have attempted to isolate the most important content to teach in their respective subject areas. These efforts to delineate content can prove quite helpful to teachers who are trying to “teach the right things.” And they were a pivotal part of the 1990s educational standards movement.

First off, let’s define what an educational standard actually is. Unfortunately, you’ll find that different educators use the term all too loosely and often incorrectly. Actually, there are two types of educational standards you need to know about:

• A content standard describes the knowledge or skills that educators want students to learn. Such curricular aims are also referred to as “academic content standards.”

• A performance standard identifies the desired level of proficiency at which educators want a content standard mastered. Such sought-for performance levels are also referred to as “academic achievement standards.”

It should be apparent that although we might refer to both of the above as “standards,” they are very different creatures. Just as the actors in a western movie might loosely refer to “cattle,” those folks know that boy-cattle and girl-cattle perform decisively different functions. It’s the same story today when educators utter the word standards. An academic content standard (a curricular aim) is patently different from an academic achievement standard (a performance level). To refer to them both as “standards” can cause a cattle-car full of confusion!

During the 1990s, standards were recommended by many organizations, led by the National Council of Teachers of Mathematics (NCTM), which in 1989 published Curriculum and Evaluation Standards for School Mathematics (1989). In that publication, NCTM made a major effort to reconceptualize what mathematics should be taught by identifying a set of important new content standards. Those content standards defined the chief curricular aim of mathematics education as the promotion of students’ mathematical power—that is, students’ ability to use mathematical thinking and understanding to solve problems and to communicate the results of those solutions mathematically.

Soon thereafter, other national subject-matter associations began to bring forth their own sets of content standards and, in some instances, performance standards as well. Most of them turned out to be very impressive. All of the standards documents represent the efforts of top-flight subject-area specialists who devoted a great deal of time to describing the knowledge and skills in a particular content area that students should learn.

It would be great if you could spend some time reviewing these standards documents to identify what you might be teaching (and, therefore, testing). But if you ever do engage in such a binge review of national content standards, try not to meekly succumb to the curricular preferences expressed in those standards collections. The reason for such a warning is straightforward. These standards documents were produced by subject-matter specialists. And subject-matter specialists, more often than not, simply adore their subject matters. Subject-matter specialists would like to see students master almost everything in their field.

Subject specialists often approach their standards-identification tasks not from the perspective of a busy classroom teacher, but from the perch of a content devotee. Content devotees and classroom teachers, just like steers and cows, are different members of the cattle crowd. If teachers tried to teach students to do all the things set forth in most standards documents, there’d be no time left for food, sleep, or films! Content specialists, or so it would seem, have never bumped into the process of prioritizing. Establishing curricular priorities is far more challenging than one might think. Nonetheless, from an instructional point of view, it is far better for students to master deeply a modest number of a subject’s highest-priority content standards than to possess a spotty or superficial mastery of an expansive range of that subject’s content.

Perhaps the safest curricular course of action for teachers, particularly novice teachers, is to consult the most recent national standards document(s) relevant to your own teaching responsibilities, and then see which of the content standards you have time to address instructionally. Without question, you’ll get some wonderful ideas from the standards documents. But select your own content standards judiciously; don’t try to stuff a whole wad of national standards into your students’ skulls. That wad, far too often, just won’t fit.

Norm-Referenced Versus Criterion-Referenced Assessment

There are two rather distinctive, yet widely used, assessment strategies available to educators these days: norm-referenced measurement and criterion-referenced measurement. The most fundamental difference between these two approaches to educational assessment is in the nature of the interpretation that’s employed to make sense out of students’ test performances.

With norm-referenced measurement, educators interpret a student’s performance in relation to the performances of students who have previously taken the same examination. This previous group of test-takers is usually called the norm group. Thus, when educators try to make sense of a student’s test score by “referencing” the score back to the norm group’s performances, it is apparent why these sorts of interpretations are characterized as norm referenced.

To illustrate, when a teacher asserts that a student “scored at the 90th percentile on a scholastic aptitude test,” the teacher means the student’s test performance has exceeded the performance of 90 percent of students in the test’s norm group. In education, norm-referenced interpretations are most frequently encountered when reporting students’ results on academic aptitude tests, such as the SAT, or on widely used standardized achievement tests, such as the Iowa Tests of Basic Skills or the California Achievement Tests. In short, norm-referenced interpretations are relative interpretations of students’ performances, because such interpretations focus on how a given student’s performance stacks up in relation to the performances of other students, usually previous test-takers.

In contrast, a criterion-referenced measurement strives to be an absolute interpretation, because it hinges on the degree to which the criterion (for example, a curricular aim) that’s represented by the test has actually been mastered by the student. For instance, instead of a norm-referenced interpretation such as “the student scored better than 85 percent of the students in the norm group,” a criterion-referenced interpretation might be that “the student mastered 85 percent of the test’s content and can be inferred, therefore, to have mastered 85 percent of the curricular aim’s skills and/or knowledge represented by the test.” Note that a criterion-referenced interpretation doesn’t depend at all on how other students performed on the test. The focus is on whatever curricular aim (or educational variable) has been measured by the test.
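The difference between these two interpretations can be made concrete with a short Python sketch. Everything in it is hypothetical and invented only for illustration: the function names, the ten norm-group scores, and the 40-item test. The percentile calculation assumes the simple "percent of norm-group scores below this score" convention, which is only one of several in use.

```python
def percentile_rank(score, norm_group_scores):
    """Norm-referenced: percent of norm-group scores falling below `score`."""
    below = sum(1 for s in norm_group_scores if s < score)
    return 100 * below / len(norm_group_scores)


def percent_mastered(items_correct, items_total):
    """Criterion-referenced: percent of the test's content answered correctly."""
    return 100 * items_correct / items_total


# Hypothetical norm group: raw scores of ten earlier test-takers (40 items).
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36]

# Hypothetical student who answered 34 of the 40 items correctly.
student_raw_score = 34

relative = percentile_rank(student_raw_score, norm_group)
absolute = percent_mastered(student_raw_score, 40)

print(f"Norm-referenced: scored higher than {relative:.0f}% of the norm group")
print(f"Criterion-referenced: mastered {absolute:.0f}% of the test's content")
```

In this sketch, swapping in a different norm group would change the first figure but leave the second untouched, which is the sense in which one interpretation is relative and the other absolute.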

As you can see, the meaningfulness of criterion-referenced interpretations is directly linked to the clarity with which an assessed curricular aim is explicated. Accurately described curricular aims can yield crisp, understandable criterion-referenced interpretations. Ambiguously defined curricular aims are certain to yield fuzzy criterion-referenced interpretations of scant utility.

Although many educators refer to “criterion-referenced tests” and “norm-referenced tests,” there are, technically, no such creatures. Rather, there are criterion- and norm-referenced interpretations of students’ test performances. For example, educators in a school district might have built a test to yield criterion-referenced interpretations, then used the test for several years and, in the process, gathered substantial data regarding the performances of district students. As a consequence, the district’s educators could build normative tables permitting norm-referenced interpretations of the test that, although initially devised to provide criterion-referenced inferences, can still provide meaningful norm-referenced interpretations.

Because most of the assessment-influenced decisions faced by classroom teachers benefit from those teachers’ understanding of what it is students can and cannot do, not merely their relative standing in relation to one another, classroom teachers will generally want to arrive at criterion-referenced rather than norm-referenced interpretations. A teacher’s instructional choices, for example, are better served by evidence about particular students’ skills and knowledge than by evidence about how those students compare with one another.

About the only instance in which classroom teachers will need to employ norm-referenced interpretations occurs in fixed-quota settings, that is, when there are insufficient slots for all of a teacher’s students to be assigned to a given educational experience. For example, suppose there were five students to be chosen for a special outside-of-class enrichment program in mathematics, and the teacher had been directed by the school principal to choose the “best” mathematics students for the program. In such a situation, because best is, by definition, a relative descriptor, norm-referenced interpretations would be warranted. A teacher might build a test to spread students out according to their mathematics skills so that the top students could be identified. Barring those rare situations, however, most classroom teachers would be better off using tests yielding criterion-referenced interpretations. Criterion-referenced interpretations provide a far more lucid idea of what it is that students can and can’t do. Teachers need such clarity if they’re subsequently going to make solid instructional decisions.
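A fixed-quota decision of this kind amounts to ranking students and filling the available slots from the top. The Python sketch below illustrates the idea with hypothetical student names and scores. How to handle ties at the cutoff is a policy question not addressed here; as an assumption for this example, tied students simply keep the order in which they were listed.

```python
def select_top(scores, slots):
    """Return the names of the `slots` highest scorers (a relative decision)."""
    # sorted() is stable, so students with equal scores keep their input order.
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:slots]]


# Hypothetical mathematics test scores for a class of eight students.
math_scores = {
    "Ana": 47, "Ben": 39, "Cho": 44, "Dev": 50,
    "Eli": 41, "Fay": 36, "Gus": 45, "Hal": 33,
}

# Five enrichment slots, as in the example above.
chosen = select_top(math_scores, 5)
print(chosen)
```

Notice that who is chosen depends entirely on how classmates scored: the same score could earn a slot in one class and miss out in another.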

Later, in Chapter 13, you’ll be considering standardized achievement tests, assessment devices usually built to provide norm-referenced interpretations. At that time, the appropriate and inappropriate uses of test results from such assessment instruments will be identified. For most classroom assessment, however, the measurement strategy should almost always be criterion referenced rather than norm referenced.

Before saying farewell to the distinction between criterion-referenced and norm-referenced assessment strategies, let’s highlight a confusion that first emerged about 50 years ago but seems to be bubbling up again in recent years.

When Robert Glaser (1963) first introduced the notion of criterion-referenced measurement more than five decades ago, there were actually two different meanings he provided for the term criterion. On the one hand, Glaser used it in a fairly traditional manner to represent a level of performance, such as one might employ when saying, “Let’s get the quality of all trainees’ performance up to a suitable criterion.” In the 1960s, most people tended to use the word criterion in precisely that way, as a desired level of performance. On the other hand, in Glaser’s article he also used “criterion” to represent a clearly defined criterion behavior such as a well-explicated cognitive skill or a tightly delineated body of knowledge. Well, in those early days, many educators who were beginning to learn about criterion-referenced testing were not sure which of those two very different notions to employ.

Yet, it soon became apparent that only the “criterion as defined behavior” notion really made much sense because, if the “criterion as level of performance” interpretation were adopted, then all of the existing education tests could instantly be designated as criterion referenced simply by choosing a desired performance level for those tests. Such an approach to assessment would be fundamentally no different from the traditional approaches to educational testing so widely used at the time. Glaser was looking for a new way of thinking about educational testing, not more of the same (1963). Thus, in the 1970s and 1980s, criterion-referenced testing was seen as a way of ascertaining a test-taker’s status with regard to one or more well-defined criterion behaviors. This interpretation was sensible.

Just during the last few years, however, many measurement specialists are once more touting the “criterion as level of performance” notion. This is a clear instance of regression toward confusion. It is not a suitable way to think about criterion-referenced assessment. Please be on guard so you can spot such flawed thinking when it crosses your path. Criterion-referenced measurement is built on an assessment strategy intended to determine a test-taker’s status with respect to a clearly defined set of criterion behaviors, typically well-defined skills or knowledge.

Selected Responses and Constructed Responses

Assuming a classroom teacher has come up with a reasonable answer to the what-to-assess question, an important point in determining how to assess it arises when the teacher must choose between assessment strategies that elicit selected responses and those that elicit constructed responses from students. If you think for a moment about the way educators assess students’ educational status, you’ll realize there are really only two kinds of responses students can make. Sometimes students select their responses from alternatives we present to them, for example, when we give students multiple-choice tests and students’ responses must be selected from each item’s available options. Other examples of selected-response assessment procedures are binary-choice items, such as those found in true–false tests where, for each item, students must select either a true or a false answer.

In contrast to selected-response tests, in constructed-response tests, students construct all sorts of responses. In an English class, for instance, students construct original essays. In a woodshop class, students construct coffee tables. In a speech class, students construct 5-minute oral speeches. In a drama class, students construct their responses while they present a one-act play. In a homemaking class, students construct soufflés and upside-down cakes. Constructed-response tests lead either to student products, such as the coffee tables and soufflés, or to behaviors, such as the speeches and the one-act plays. When the students’ constructed responses take the form of a tangible product, classroom teachers have the luxury of judging the quality of the product at their leisure. For instance, after students create an original essay, the teacher can take this product home on the weekend and, while relaxing in the jacuzzi, either grade or immerse each essay.

When students’ constructed responses are in the form of behaviors, however, such as when students are asked in a physical education class to run a mile in less than 12 minutes, the teacher must somehow make a record of the behavior—or else it disappears. The record might be created by the teacher’s use of written summaries or be captured electronically via an audio or video recording. For example, when students in a speech class give a series of impromptu speeches, the teacher typically fills out some sort of evaluation form for each speech. Similarly, physical education teachers use their stopwatches to note how much time it takes for students to run a mile, and then note those elapsed times in a record book.

It was formerly thought that, although selected-response tests were difficult to create, they were easy to score. Similarly, it was once believed constructed-response tests were just the opposite: simple to create but tough to score. These days, however, many educators have become more sophisticated about constructed-response tests and are employing more types of constructed-response assessment procedures than essay tests and short-answer tests. One prominent reason for teachers to increase their use of constructed-response items is a rising tide of dissatisfaction with tests that seem to focus too heavily on items measuring students’ memorization capacities rather than analytic skills. It is generally conceded that both selected-response and constructed-response tests require taking considerable care when creating them. It is still true, however, that scoring selected-response tests is substantially easier than scoring constructed-response tests. Ease of test scoring, of course, constitutes a definite lure for many teachers.

If, as a classroom teacher, you’re choosing between these two assessment schemes, you need to focus on the nature of the score-based inference you wish to make from your students’ responses. Some score-based inferences simply shriek out for selected-response assessment schemes. For instance, if you want to make inferences about how many historical facts your students know, a selected-response procedure will not only fill the bill, but will also be simple to score. Other score-based inferences you might wish to make really require students to display complex behaviors, not merely choose among options. If, for example, you want to make inferences about how well students can compose sonnets, then selected-response assessment schemes just won’t cut it. To see whether students can create a sonnet, you need to have them construct sonnets. Reliance on a selected-response test in such a situation would be psychometrically silly. As usual, the test-based inference the teacher must make should guide the choice between selected-response and constructed-response assessment strategies. Nonetheless, if teachers plan to incorporate constructed-response items in their tests, they must be sure to have sufficient time and energy available to score students’ responses accurately.

It is always possible, of course, for teachers to rely on both of these two assessment strategies in their classrooms. A given assessment, for instance, might represent a blend of both selected- and constructed-response approaches. As always, the assessment procedure should be employed to help the teacher get a fix on the students’ status with respect to an otherwise unobservable variable in which the teacher is interested.

It should be noted that the distinction between selected-response and constructed-response test items is not meaningfully related to the previously drawn distinction between norm-referenced and criterion-referenced measurement. A test composed exclusively of selected-response items could lead to results destined for either norm-referenced or criterion-referenced interpretations. Similarly, a test consisting only of constructed-response items might legitimately lead to norm-referenced or criterion-referenced interpretations about test-takers. The more comfortable a teacher is with various types of testing tactics, the more likely it is that the teacher will select the most appropriate assessment alternative to inform whatever decision is being informed by the results of a classroom test.

A State’s Official Curricular Aims

A state’s designated educational policymakers typically decide on the curriculum aims (that is, the content standards) to be promoted for the state’s public school students. This is because education, not having been identified in the U.S. Constitution as a federal responsibility, has historically been regarded as a state obligation. Until the last decade or so, the usual way for a state to identify the curricular targets for its schools was to assemble groups of content-capable educators who, after substantial deliberation, identified the content standards to be promoted in the state’s public schools. As might be expected, considerable agreement was seen between the official curricular aims of different states, but meaningful differences in curricular preferences were also observed. With the arrival of the Common Core State Standards (CCSS) in 2010, however, this changed rather dramatically. Many states accepted the mathematics and English language arts (ELA) curricular aims represented by the CCSS. Officials in states where the CCSS were adopted also received some federal dollars to implement the adoption process. Yet, because of the perception that the CCSS were being “pushed” on states by the federal government, substantial antipathy developed in many states toward what was often seen as a federal intrusion on states’ curricular choices. Although, as was pointed out in Chapter 1, a number of states have adopted the CCSS as their official state curricular goals, numerous states have renamed them, in part to discourage the perception that the state’s official curricular aims were fashioned with the support of “the feds.” As was true in earlier years when state-specific curricular aims were identified by each state, the adoption of the CCSS, or a differently labeled version of them, was usually accomplished by a state board of education or, in some instances, by a state legislature. Although, at present, we can see many states in which their current content standards are demonstrably derivative from the original, unmassaged CCSS, other states have arrived at their own state’s content standards by other routes.

In the next few pages you will find recommendations for national content standards in ELA and mathematics (from CCSS) and for science (Next Generation Science Standards, or NGSS). If the state in which you are located has chosen statewide curricular aims in one or more of these fields, it is surely imperative for you to familiarize yourself with at least the depiction of those curricular targets seen in this book’s upcoming descriptions. Clearly, if you end up teaching in a state where your state’s official curricular targets are the same as (or mostly similar to) the curricular aims described here in mathematics, ELA, or science, then you should certainly supplement the descriptive information provided here. However, if your state’s curricular goals in those three content areas do not coincide with what’s presented in this book, then it would be sensible for you to do some additional state-level curriculum digging on your own.

Parent Talk

Suppose that Ms. Collins, a parent of one of your students, has cornered you after school and asked for a few minutes of your time. She wants to know what all the talk about “new, challenging standards” really means. She’s read an article about educational standards in the local newspaper, and she wonders whether the “new” standards are really new. Her question to you is, “Why is there all this fuss about educational standards these days when it seems that teachers in this school are still teaching pretty much what they’ve always taught, and testing what they’ve always tested?”

If I were you, here’s how I’d respond to Ms. Collins:

“I really understand your concern about educational standards, Ms. Collins. They’ve certainly been receiving plenty of attention from educators during recent years. But, regrettably, there’s still a fair amount of confusion about standards, even among educators.

“Actually, there are two kinds of educational standards that you and I need to consider. A content standard simply identifies the skills and knowledge we want students to acquire in school. For example, your daughter has recently been improving her composition skills quite dramatically. She can now write narrative and persuasive essays that are quite wonderful. An example of a fairly general content standard would be ‘Students’ mastery of written communication skills.’

“A performance standard, on the other hand, indicates how well a content standard should be mastered. In this school we score our students’ essays on a four-point scale, and the school’s teachers have agreed that a student’s written submissions should achieve at least a three in written communication to be considered satisfactory.

“What we now call content standards, we formerly described as goals, objectives, or outcomes. What we now call performance standards, we formerly described as passing levels or cut-scores. So, some of the descriptive labels may be new, but the central idea of trying to teach children important skills and knowledge, this hasn’t really changed very much.

“What is new, at least to me, is a much greater concern about the defensibility of our content and performance standards. All across the nation, educators are being urged to adopt more challenging standards, standards that demand more of our children. This has also been taking place in our own school. Thus, if you look more closely at what we’re teaching and testing here, I think you’ll see important differences in the kinds of skills and knowledge we’re pursuing, and the levels of excellence we expect of our students.”

Now, how would you respond to Ms. Collins?

THE COMMON CORE STATE STANDARDS

As noted earlier, on June 2, 2010, the CCSS were released by the National Governors Association (NGA) and the Council of Chief State School Officers (CCSSO) (2010). In the next section of this chapter, an attempt will be made to provide a brief introductory picture of the CCSS in English language arts and mathematics. Then, immediately thereafter, a similar short description will be presented for a set of national content standards for science. Yet, because these national curricular aims have now been accepted (sometimes after being renamed) by many of the 50 states, it should be apparent that any teacher in a state where the curricular aims have been adopted (whatever they are called) should have much more than a nodding acquaintance with these curricular targets. Accordingly, educators are encouraged to consult either the CCSSO (http://www.ccsso.org) or the NGA (http://www.nga.org/cms/home.html) for a far more detailed depiction of what makes up the CCSS. Moreover, if you are a teacher in a state where some rendition of the CCSS has been adopted, it is likely that your state or district will have information available regarding the nature of these curricular aims.

When the eighth edition of this book was published, the CCSS were at the height of their heyday. Almost all states had accepted them as their official state curricular aims, and it was thought that a pair of federally supported, consortium-based accountability tests would soon be in place to measure students’ mastery of those nearly identical content standards. Accordingly, that earlier edition of the book contained a considerably expanded treatment of the CCSS and urged readers to familiarize themselves with those curricular aims. But, to the surprise of many observers, the quest for widespread acceptance of an essentially identical set of curricular aims simply evaporated, and the availability of essentially identical consortium-built accountability tests disappeared in the same way. As a consequence, what you’ll find in this new edition of Classroom Assessment is an abbreviated treatment of the CCSS. After all, if you are teaching in a state where a set of curricular targets other than the CCSS has been adopted, why do you need to pore over the intricacies of a set of curriculum targets at which your state’s schools are not aiming? Unless, of course, you are experiencing a serious case of intrastate curricular envy. If you are, fear not. No empirical evidence exists indicating that such a malady is either fatal or even long-lasting.

Clearly, however, teachers must become comfortably conversant with whatever way their state has chosen to carve up its curricular goals. Moreover, if your state has chosen a set of assessments to measure students’ mastery of the skills and knowledge embodied in those curricular aims, try to cozy up to descriptions of those state assessments. What’s advocated at the state level can, and should, have an impact on what you teach and test. Even though it might be interesting to contrast your state’s curricular aspirations with those of other states, what’s imperative is that you understand what curricular targets your state’s teachers are supposed to be pursuing and, of course, how students’ mastery of those targets is going to be assessed.

Before presenting what will be, then, an abbreviated introduction to the chief features of the CCSS, a few stage-setting remarks regarding these curricular aims are in order. The CCSS did not arrive until June 2010, and in the early months thereafter many U.S. educators were not familiar with the nature of the CCSS. When many educators began to dig into the substance of the CCSS, they anticipated that these newly spawned curricular targets would be similar to the sets of state-specific content standards then seen throughout the nation. Those state-identified curricular aims had typically consisted of a list of content standards (that is, the sets of the knowledge and skills that students were supposed to master), typically grouped under subject-area headings such as reading, writing, and mathematics. Beneath those subject headings, the content standards themselves usually were grouped under a set of general labels (sometimes called “strands”) such as geometry, numeration, and algebra. Beneath those content standards, educators usually found further particularization, such as the specific bodies of knowledge or skills a student should achieve in order to attain mastery of any content standard. These collections of state curricular aims were often too numerous, at least in the view of the educators who were required to promote students’ mastery of them, but these state-specific aims, as typically formulated, were relatively easy to understand.

Regrettably, such was not the case with the CCSS. Several categorization strata had been employed to describe the curricular targets in the CCSS, and unfortunately, the labeling scheme employed for the ELA curricular targets is not the same as the labeling approach employed in mathematics. Thus, elementary teachers who were instructionally responsible for both English language arts and math were obliged to shift gears continually when seeking guidance from CCSS documents regarding appropriate curricular targets for those two content fields.

In the next few paragraphs, then, a particularly brief introduction to the organizational structure of the CCSS ELA and mathematics content standards will be provided. This “handshake” with the CCSS can be followed up with a much more meaningful “hug” by your spending time with the actual content of the CCSS (NGA and CCSSO, 2010). Let’s begin with English language arts. A consideration of how the CCSS tried to resolve its curriculum challenges can be illuminating. By examining what the CCSS architects ended up with, you can gain a rough idea of how such national curricular guides often look.

ENGLISH LANGUAGE ARTS

In an effort to identify K–12 standards to help ensure that all U.S. students would be college and career ready in literacy no later than the close of high school, those crafting the CCSS identified a set of curricular targets both in English Language Arts and in Literacy in History/Social Studies, Science, and Technical Subjects. As you review the complete array of these standards, your comprehension of them will usually be enhanced if you are aware of the several ways in which the ELA and literacy standards have been categorized in the CCSS. Let’s consider those categorizations.

For openers, there’s a fairly straightforward distinction between grades K–8 and grades 9–12. For grades K–8, the standards are set out on a grade-by-grade basis. At the secondary level, however, two grade bands are used to describe the standards for grades 9–10 and 11–12.

Then, at each K–8 grade level you will find sets of standards for reading, writing, speaking and listening, and language. For the reading outcomes, there’s an important distinction drawn between standards based on students’ reading of literature versus informational text. Architects of the CCSS reading standards wanted U.S. students to experience greater exposure to informational text than in years past. In grades K–5, in fact, the standards are broken out separately for literature and for informational texts. Ten “College and Career Readiness Anchor Standards” are presented under the following four general categories: (1) Key Ideas and Details, (2) Craft and Structure, (3) Integration of Knowledge and Ideas, and (4) Range of Reading and Level of Text Complexity.

In addition to the ELA reading standards, the CCSS also spell out anchor standards for writing, for speaking and listening skills, and for language. As was the case with reading, these curricular aims are broken out by separate grades in K–8, and then two grade ranges are used in grades 9–10 and 11–12.

Beyond the identification of ELA standards as described here, separate standards are also provided in the CCSS for literacy in grades 6–12 dealing with history/social studies, science, and technical subjects. Both reading and writing standards are given for students in three grade ranges: 6–8, 9–10, and 11–12. Authors of these grade 6–12 literacy standards in reading intend for instructional attention to those standards to be integrated along with instruction focused on the K–5 reading standards described earlier (NGA and CCSSO, 2010, p. 61).

The original publication by CCSSO and NGA (2010) not only contains substantial introductory material related to the CCSS but, in the case of the ELA standards, also presents a number of components intended to assist in the implementation of the standards. To illustrate, the CCSS authors strongly advocate that students become ever more able to engage in “close analysis” of increasingly complex textual materials. Accordingly, helpful illustrations of increasingly complex materials, both literary and informational text, are supplied (NGA and CCSSO, 2010, p. 32).

Given the several ways in which the designers of the CCSS attempt to organize the curricular aims embodied in their conceptualization of what students should learn, educators who wish to take full advantage of the CCSS, not only in ELA but also in mathematics, should become flat-out familiar with any CCSS segments in which they are interested. Thus, rather than being overwhelmed and/or confused by a set of seemingly off-putting curricular aims, educators can become knowledgeable about these influential collections of curriculum goals. If you’re an educator in a state where the curricular aims are decisively different from the content standards found in the CCSS, of course, you have far less reason to cuddle up to the CCSS curricular aims.

MATHEMATICS

Let’s look now at the CCSS for mathematics. The working group that identified the math standards divided those standards into two groups, the mathematical practice standards and the mathematical content standards. The content standards are identified for each grade level, kindergarten through grade 8, and also at high school. These mathematics curricular aims are categorized in three ways.

Standards define what students should understand and be able to do. Here is an example of a standard: “Use place value understanding to round whole numbers to the nearest 10 or 100.”

Clusters represent groups of related standards. Here is an example of a cluster that includes the preceding standard: “Use place value understanding and properties of operations to perform multi-digit arithmetic.”

Domains are larger groups of related standards. For example, the domain “Number and Operations in Base Ten” includes the preceding clusters of related standards.

To illustrate how this sort of categorization system works, let’s take a quick look at one grade level and the numbers of domains, clusters, and standards identified for that particular grade. In the fourth grade, we find five domains, each of which contains one, two, or three clusters of standards. And each of these grade 4 clusters contains one to three standards. (Several of the standards have been broken down further into subdivisions reflecting two, three, or four different aspects of a student’s understanding of a given standard.) In all, then, there are 28 grade 4 CCSS standards with, if one also considers the three “subdivided” standards, a grand total of 37 curricular targets in the CCSS mathematics standards.
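For readers who find nested categories easier to follow as a data structure, here is a minimal Python sketch of the domain, cluster, and standard hierarchy. The class and function names are hypothetical, and only the single domain, cluster, and standard quoted above are entered. The final lines rework the grade 4 totals under one loudly stated assumption: that the three subdivided standards have two, three, and four subdivisions respectively, which is one reading consistent with the figures of 28 and 37 given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Standard:
    text: str


@dataclass
class Cluster:
    text: str
    standards: list = field(default_factory=list)


@dataclass
class Domain:
    name: str
    clusters: list = field(default_factory=list)


# The one domain -> cluster -> standard chain quoted in the text.
rounding = Standard(
    "Use place value understanding to round whole numbers "
    "to the nearest 10 or 100."
)
multi_digit = Cluster(
    "Use place value understanding and properties of operations "
    "to perform multi-digit arithmetic.",
    [rounding],
)
base_ten = Domain("Number and Operations in Base Ten", [multi_digit])


def count_standards(domains):
    """Count every standard nested under the given domains."""
    return sum(len(c.standards) for d in domains for c in d.clusters)


print(count_standards([base_ten]))

# Arithmetic behind the grade 4 totals reported in the text.
grade4_standards = 28
# ASSUMPTION: the three subdivided standards have 2, 3, and 4 subdivisions.
subdivision_counts = [2, 3, 4]
grade4_targets = grade4_standards + sum(subdivision_counts)
print(grade4_targets)
```

Under that assumption the subdivisions add 9 targets to the 28 standards, which reproduces the stated total of 37.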

Turning from the math content standards, let’s look at the mathematical practice standards. The practice standards remain the same across the grades because they describe varieties of expertise that mathematics educators at all levels should seek to foster in their students. These practice standards are drawn from processes and proficiencies that have traditionally occupied important places in mathematics education. They are presented in Figure 2.1.

As the designers of the CCSS saw it, the Mathematical Practice standards describe ways in which those who engage in the discipline of mathematics should increasingly engage with the subject matter as they grow in mathematical expertise and maturity. It was the hope of those who crafted the CCSS mathematical standards that educators would make a serious effort to connect the Standards for Mathematical Practice to the Standards for Mathematical Content. Candidly, given the relatively few years that the CCSS math standards have been with us, there is still much work to be done in finding the most sensible ways to interrelate these two meaningfully different sets of expectations.

There is an important curriculum truth nicely revealed in the efforts of the CCSS architects to come up with the practice standards shown in Figure 2.1 and then urge educators to adroitly blend them with the CCSS mathematics content standards. This truth is that integrating different conceptualizations of standards is tough to do. For example, how does a fifth-grade teacher whomp up items to measure students’ skill in solving elementary geometric problems while, at the same time, gauging students’ ability to “reason abstractly and quantitatively” (Mathematical Practice Standard No. 2)? Whereas it is understandable that those who generated the mathematical standards would wish students to master key content and, at the same time, adopt sound mathematical practices, it is far easier to request such double-dipping than to implement it.

Turning to high school, the CCSS mathematics standards are listed as conceptual categories: (1) Number and Quantity, (2) Algebra, (3) Functions, (4) Modeling, (5) Geometry, and (6) Statistics and Probability. These six categories are believed to represent a coherent view of high school mathematics. To illustrate, a student’s work with functions can cross a number of traditional course boundaries up through and including calculus.

Each of these conceptual categories is broken out at the high school level by the CCSS architects. For example, the geometry category is subdivided into (1) Congruence, (2) Similarity, Right Triangles, and Trigonometry, (3) Circles, (4) Expressing Geometric Properties with Equations, (5) Geometric Measurement and Dimension, and (6) Modeling with Geometry. And each of these subdivisions typically comprises several further subdivisions. Slight differences also exist for high school in the mathematical practices for the conceptual categories that, in elementary grades, had remained the same.

Figure 2.1 Common Core Standards for Mathematical Practice

1. Make sense of problems and persevere in solving them.

2. Reason abstractly and quantitatively.

3. Construct viable arguments and critique the reasoning of others.

4. Model with mathematics.

5. Use appropriate tools strategically.

6. Attend to precision.

7. Look for and make use of structure.

8. Look for and express regularity in repeated reasoning.

SOURCE: National Governors Association Center for Best Practices and Council of Chief State School Officers. All rights reserved. © Copyright 2010.

It should be apparent that the manner in which the CCSS mathematics standards are organized could easily become confusing. There are so many curricular targets— in different categories and at different levels—embraced by the math standards, that educators really do need to arrive at a “mentally manageable” way of representing the diverse mathematical outcomes being sought for students and expected of them.

“Many a Slip . . .”

The older I get, the fewer folks I find who are familiar with some of the maxims I’ve relied on most of my life. One of those sayings is, “There’s many a slip twixt the cup and the lip!” Besides giving me an opportunity to use the word “twixt” (an option that doesn’t come along all that often), this particular maxim reminds us that as we raise our cups of hot cocoa toward our mouths or make any set of serious plans, numerous things can go wrong. Through the years, loads of laps have been scalded by errant, about-to-be-consumed drinks of one sort or another. Yet, this little-used adage still alerts us to the fact that well-conceived, headlong hopes often stumble en route to the finish line. And so it was with curricular aims such as those embodied in the CCSS.

For the most part, the CCSS curricular aims are equal to or better than the state curricular goals they were replacing—that is, the ones that were adopted by individual states during the last several decades. Indeed, a report issued in late 2011 by the respected nonpartisan Center on Education Policy in Washington, DC, seemed to confirm such an estimate of CCSS quality. After surveying school districts in the 44 states where the CCSS had been officially adopted, this report concluded that almost 60 percent of officials in those districts agreed “that the new standards in math and English language arts are more rigorous than the ones they are replacing” (Center on Education Policy, 2011, p. 1). Moreover, many district-level officials recognize the merits of those curricular aims. Where, then, does “slip twixting” mar this melodious picture? Here’s an answer to that question: Curricular aims—whether local, state, or national—are just words, and sometimes fairly meaningless words. That’s right, no matter how laudable the curricular aspirations embodied in the CCSS or similar goals are, if they are poorly assessed, then their positive impact will almost certainly be minimal.

Science: Not to Be Left Behind

At the same time that the CCSS was coming into being, and thereby exerting a serious influence on curricular thinking about English language arts and mathematics, another important content area was undergoing some major rethinking as well—namely, science. This thinking culminated in the Next Generation Science Standards (NGSS). Teachers who have responsibility for the teaching of science, whether in elementary school or secondary school, are certain to be encountering curricular thought stemming directly from the conceptualization of science education that’s contained in the NGSS.

The NGSS derived from a multistate effort to create a new set of curricular aims for science predicated on the belief that there are three distinct, yet equally important, dimensions to learning science: (1) disciplinary core ideas, (2) scientific and engineering practices, and (3) crosscutting concepts. These three dimensions are combined to form every content standard. Ideally, each dimension works with the other two to help students build a coherent understanding of science over time. These important science standards were developed by a consortium of 26 states, along with the National Science Teachers Association, the American Association for the Advancement of Science, the National Research Council, and Achieve, a nonprofit organization active in the generation of content standards in English and mathematics. Unlike the development procedures witnessed in the creation of the CCSS, a major effort was made so that interested parties could review the emerging versions of these new science standards. To illustrate, not only were science-specialist reviewers called on from the participating consortium states to help shape the draft science standards, but the public was also invited to review and react to the standards that were under development. The final version of the NGSS was released in April 2013.

Central to the NGSS are the previously mentioned three dimensions: disciplinary core ideas, scientific and engineering practices, and crosscutting concepts. Disciplinary core ideas consist of specific content and subject areas. The second dimension, scientific and engineering practices, is intended to foster students’ mastery of specified content and subjects, but to do so via the methods employed by engineers and scientists. Finally, crosscutting concepts help students explore interrelationships among four domains of science—that is, physical science, life science, earth and space science, and engineering design. For example, when such concepts as “cause and effect” are made explicit for students, those concepts can help students develop a coherent and scientifically based view of the world surrounding them.

But, as we saw with respect to the CCSS in mathematics, melding different ways of describing curricular aspirations, as was done when math practices were to be amalgamated with math content, is no trivial challenge. Given that the developers of the NGSS were committed to creating a science curriculum wherein all three NGSS dimensions (disciplinary core ideas, scientific and engineering practices, and crosscutting concepts) coalesce, it is apparent that teachers charged with promoting their students’ scientific mastery will need ample support in the way of instructional materials, assessment instruments, and guidance regarding how best to test and teach their students.

James Pellegrino, co-chair of the National Academy of Sciences committee tasked with developing assessments for NGSS, believes that a coherent science assessment system must include (1) large-scale assessments to audit student learning over time so as to evaluate the effectiveness of science education programs, (2) classroom assessments to help identify areas where students are making progress or struggling, as well as for assigning course grades, and (3) regularly collected information on the quality of classroom instruction to determine whether students have had an opportunity to learn what’s being tested via both large-scale and classroom science assessments (Pellegrino, 2016, p. 1). Clearly, teachers who wish to promote in their classroom assessments the conceptualization of science embodied in the NGSS will need to devote some serious thinking, ideally along with colleagues, to decide how best to implement this conceptually captivating but challenging way of thinking about how to teach science.

Reports from teachers who have attempted to both teach and test their students’ mastery of the NGSS indicate that the multidimensionality of those curricular aims presents an implementation obstacle to even the most ardent NGSS proponents. Remembering that the three basic dimensions to learning science are (1) disciplinary core ideas, (2) scientific and engineering practices, and (3) crosscutting concepts, many teachers who are not entirely comfortable with scientific content have difficulty arriving at coherent ways of blending instruction and, thereafter, measuring their students’ attainment. So, if you embark on a personal attempt to fit those three basic dimensions together in your own teaching and testing, do not be surprised if substantial effort is required. Making the distinct dimensions mesh merrily with one another is flat-out tough to do.

What’s Assessed on Your State’s Accountability Test?

Teachers working with students who are on the receiving end of a state’s annual accountability tests should, without question, take into consideration what is to be assessed each year on state accountability assessments. Such tests are typically administered in the spring, near the close of a school year, and the results can have a substantial impact on the evaluations awarded to schools, school districts, and teachers. Accordingly, when deciding what to measure in your own classroom tests, you would be foolish not to consider what’s being assessed by such unarguably significant tests.

Because the education officials of each state must determine what sorts of accountability tests will be employed in their state, every teacher needs to find out for certain what’s to be measured by state assessments in that teacher’s state. Having done so, the teacher needs to determine whether his or her students, because of grade levels taught or subject matter taught, will be assessed each year regarding what the teacher was supposed to be teaching. Obviously, what’s eligible for assessment on such high-stakes tests should have a major impact on what the teacher attempts to teach and what the teacher tries to assess.

Even teachers whose students will be assessed only in later years should become conversant with what’s to be measured in those assessments to come. For example, teachers of first-graders or second-graders—whose students will not be tested each year—should surely consider what’s to be tested in the third grade, when state-level accountability testing begins. Similarly, a 9th-grade English teacher whose students won’t be taking an accountability test until the 10th or 11th grade should still look into what will be assessed in English language arts when accountability testing resumes.

Just a few years ago, many educators still clung to a belief that because of more widely accepted common curricular aims, such as those embraced in the CCSS and the NGSS (or in modest variations of those content standards), it might still be possible for a small set of accountability tests to measure various states’ mastery of essentially identical curricular goals. Yet, in the words of the ancient Greek philosopher Heraclitus, who claimed that “change is the only constant in life,” we seem to be in the midst of an era of all-out assessment flux. State education authorities are, with relatively few exceptions, going their own way. Although it seems likely that small groups of states may collaborate to develop mutually acceptable curricular goals and accountability assessments, the vision of an accountability testing system in which test results from nearly all states could be meaningfully contrasted seems as dead as the proverbial dodo.

For a variety of reasons, such as (1) cost considerations, (2) a public backlash against too much testing, and (3) the perception that the Common Core State Standards had been a federal rather than a state initiative, many states have withdrawn their acceptance of the CCSS altogether, only to adopt a set of state-determined curricular aims and state-commissioned assessments of how well the state’s students have achieved those aims. Accordingly, what’s to be assessed by important external tests will have an impact on a teacher’s decisions about the content of classroom assessments—especially if the teacher’s own students will be taking those assessments immediately or in later years. This doesn’t mean that whatever is tested on a later, important test needs to be completely mirrored by a teacher’s classroom tests. But some serious scrutiny of what’s to be measured by any subsequent significant assessment is called for.

It is invariably expected that a state’s accountability tests should be aligned with the state’s official curricular aims. Let’s look briefly at the term alignment. In the field of education, alignment is the only 9-letter word that seems to function as a 4-letter word. This is because many people play fairly fast and loose with a term that, by its nature, seems to evoke fast and loose play. Alignment is the degree to which there is a meaningful agreement between two or more of the following: curriculum, instruction, and assessment. What is generally called for, of course, is the agreement of a state’s accountability tests with the state’s content standards. To the extent that such alignment between curriculum and assessment has been achieved, teachers would have to be ninnies not to recognize that it is sensible to also align their classroom curricula and assessments with whatever state content standards are to be assessed by the accountability tests in those teachers’ states.

A major difficulty with the concept of alignment, unfortunately, is that it is decidedly squishy. What constitutes alignment in one educator’s eyes may be seen as nonalignment by another educator. And there is often much self-interest in “seeing alignment where none exists.” So, whenever you hear someone touting the alignment among instruction and curriculum and assessment, try to dig a bit deeper so you can see whether the alignment being described is based on rigorous or relaxed ways of determining agreement levels. (An excellent guide to the determination of alignment has been provided by Cizek et al., 2018.)

Awash in a Sea of Assessment Changes

One challenge regarding how to carry out educational assessment in the midst of today’s meaningful assessment flux becomes apparent when we see G. Leef, in an issue of the National Review (April 22, 2022), supporting Richard Phelps’s belief that when the federal government installed the No Child Left Behind Act (NCLB), “government has made a terrible mess of educational testing.” Leef points out that the essence of Phelps’s criticism is that in turning over to local educators the chief responsibility for policing the very process that’s used to evaluate local schools and districts, all hands-off objectivity vanishes. We are, as Leef puts it, “judging schools and teachers based on those NCLB test scores they themselves proctor.” Clearly, the more intimately that local educators are involved in accountability-focused testing, the more changes we may need to make to our evaluation and assessment strategies during the years ahead.

Then, too, one of the most intensely defended benefits associated with public school teaching is tenure. Tenure is frequently misunderstood as guaranteeing an instructor a lifetime position, and that’s not a trivial guaranty during COVID-triggered changes in the way the United States operates its public schools. Interestingly, we often see meaningful disagreements regarding tenure in higher education that, in time, are almost certain to be tussled with by K–12 teachers and their representatives. Walt Gardner, a former high school English teacher in Los Angeles, has, for more than a decade, published thought-provoking opinion pieces in America’s most widely read periodic publications. For example, in an essay featured by the James G. Martin Center (June 19, 2022), Gardner raises serious concerns about preserving tenure’s positive features for both higher education and K–12 schools.

Gardner concludes that “some colleges are now breaking with the past in addressing tenure. In October 2021, for example, the Georgia Board of Regents approved a new policy that allows administrators in its public universities to remove a tenured professor with little or no faculty input. Whether demonstrably bad teaching might lead to such an outcome remains to be seen, but there is reason to hope.” Will future turmoil in the collection of tenure-relevant evidence oblige assessment specialists to urge evidence-collection tactics that are dramatically different than those we currently employ? This is another “time-will-tell” issue likely to involve those on both sides of any pro–con tenure-focused disagreements.

Other Exams You Should Know a Bit About

Although classroom teachers should be at least aware of key international assessments of students’ learning, as well as the most prominent national assessment of students’ achievements, rarely will the content of those examinations have a meaningful impact on what teachers attempt to assess with their own students. Nonetheless, to be totally unaware of such large-scale standardized educational tests would not be the mark of a knowledgeable educational professional. Accordingly, here is a quick summary of the best known of these exams.

The National Assessment of Educational Progress (NAEP) is a congressionally mandated project of the U.S. Department of Education. It assesses U.S. students’ knowledge and skills in geography, reading, writing, mathematics, science, U.S. history, the arts, civics, and other academic subjects. Since 1969, NAEP has periodically surveyed the achievement of students at ages 9, 13, and 17 and, since the 1980s, in grades 4, 8, and 12. NAEP was originally intended to measure U.S. students’ levels of academic achievement over time. Because NAEP scores, for several of its subject areas, are employed to rank states’ performances, NAEP results typically generate considerable interest—especially for those states ranked high or low. Due to the sampling procedures used, NAEP scores, however, are not reported for individual students, schools, or districts—other than two dozen or so of the nation’s largest school districts. Given the purposefully national orientation of NAEP, it is remarkably difficult for classroom teachers to derive practical instruction-related insights from what’s assessed by the NAEP tests. Thus, although NAEP is widely regarded as “the gold standard” of educational testing because of the technical care with which its tests are developed, administered, and analyzed, its implications for classroom assessments are limited.

Turning from national to international assessments, one soon seems to be operating in an alphabet soup of testing labels. The oldest of three best-known international educational assessments is TIMSS (Trends in International Mathematics and Science Study), which was initially administered in 1995. TIMSS measures students of 10 and 14 years of age every 4 years, and, as its title implies, it assesses students’ skills and knowledge in mathematics and science. TIMSS is managed by the International Association for the Evaluation of Educational Achievement (IEA).

In 2000 we first saw the administration of PISA (Programme for International Student Assessment), a project of the Organization for Economic Cooperation and Development (OECD). These examinations measure students at age 15 in reading, mathematics, science, and problem solving. They are administered every 3 years, and special emphasis is given to one of those 4 areas during each assessment.

Finally, IEA began offering another international exam in 2001 with the release of PIRLS (Progress in International Reading Literacy Study). PIRLS is administered to samples of 10-year-old students every 5 years, and it focuses on their abilities in reading.

Because NAEP is carried out by the U.S. federal government, the actual size of the national samples and the frequency with which different assessments are repeated depend substantially on governmental funding. For the three international studies, students from a meaningful number of nations take part. To illustrate, for PISA we often see about 70 nations participating, for TIMSS the number of nations approximates 60, and for PIRLS more than 50 countries typically take part. Reports from these assessments, as well as NAEP, are available from the usual Internet sources.

Because this second chapter of Classroom Assessment is focused on a teacher’s decisions regarding what to assess in the teacher’s own classroom, are there any content determiners for a teacher’s classroom tests lurking in these major exams and what they measure? Well, a substantial diversity of opinion exists with respect to what NAEP or the trio of international assessments offer in the way of curricular guidance or, for that matter, what their results tell us about the quality of schooling offered in the United States and, in the case of NAEP, in its various states. Yes, for every person who supports the importance of these prominent exams, there seems to be another who worries that the substantial differences among nations—and states—render the rankings of nations—and states—according to students’ results fraught with so many cautions as to render the implications of those rankings largely meaningless. If you ever decide to probe these sorts of issues in greater depth yourself, you will discover that reaching a personal decision about the instructional or evaluative merits of NAEP, TIMSS, PISA, and PIRLS is far from easy.

Moreover, given the financial demands and energy requirements of most of these formidable assessments, more than a handful of educators have registered doubts regarding the cost/benefit payoffs of these large-scale tests. These assessment programs are swimming in seas of prestige and respect, but are their results, though unarguably interesting, all that beneficial for the education of children in the world’s schools?

Assessment Blueprints

A number of teachers find it helpful, especially when constructing major examinations, to develop what’s called an assessment blueprint or a test blueprint. Such blueprints can vary dramatically in the level of rigor and specificity they embody. For the most part, these blueprints are two-way tables in which one dimension represents the curricular aims the teacher wants to assess, and the other dimension represents the levels of cognitive functioning being sought. For instance, the following example from a teacher’s assessment blueprint illustrates how these organizational schemes work by dividing each curricular aim’s items into knowledge (low-level recall of information) and cognitive levels higher than mere recall.

                    Number of Items
                    Cognitive Level
Curricular Aim      Knowledge      Above Knowledge
1                   3              1
2                   1              4
3                   2              2
etc.                etc.

Assessment blueprints can help teachers identify curricular aims that are not being measured with sufficient items or that lack sufficient items at an appropriate level of cognitive demand. As noted earlier in the chapter, the Bloom Taxonomy of Educational Objectives (Bloom et al., 1956) has been around for well over a half-century, and for about 50 years it was the only system used to contrast different levels of cognitive demand. As you can see in the previous illustration, sometimes Bloom’s six levels of cognitive demand were scrunched into a smaller number—for instance, into a simple dichotomy between Bloom’s lowest cognitive level (that is, memorized knowledge) and any cognitive demand above knowledge.

However, in 2002, Norman Webb of the University of Wisconsin provided educators with a procedure for gauging the alignment between a test’s items and the curricular aims supposedly being measured by that test (Webb, 2002). A key component of Webb’s alignment process was his proposed four Depth of Knowledge categories, which functioned as a cognitive-demand classification system similar to Bloom’s. Because Webb’s depth of knowledge (DOK) system is often used by today’s educators not only in connection with assessment blueprints, but also when choosing among contending curricular aims, you should be familiar with its composition:

DOK Level      Title of Level
1              Recall and Reproduction
2              Skills and Concepts
3              Short-Term Strategic Thinking
4              Extended Thinking

As you can see, the DOK levels range from a lowest level of Recall and Reproduction, which is comparable to Bloom’s lowest levels of Knowledge and Comprehension, up to a highest level of Extended Thinking, which is akin to Bloom’s highest levels of Synthesis and Evaluation (1956).

Clearly, if you are ever involved in making any sort of curricular choices among competing curricular aims or are attempting to create an assessment blueprint for your own tests or for those of your colleagues, you should become more knowledgeable about both the Bloom and the Webb frameworks for categorizing the cognitive demands reflected in different curricular goals. It is much tougher to secure agreement on the levels of cognitive challenge associated with diverse curricular aims than it first might appear.
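For teachers who keep their blueprints in a spreadsheet or a short script, the two-way blueprint described earlier in this section can be sketched in a few lines of Python. This is only an illustrative sketch. The item counts for curricular aims 1 through 3 come from the chapter’s sample blueprint, whereas the two thresholds and the function name `flag_gaps` are hypothetical choices that an individual teacher would set to suit the test at hand.

```python
# Each entry maps a curricular aim to its number of items at each cognitive level.
# The counts for aims 1-3 are taken from the chapter's sample blueprint.
blueprint = {
    1: {"knowledge": 3, "above_knowledge": 1},
    2: {"knowledge": 1, "above_knowledge": 4},
    3: {"knowledge": 2, "above_knowledge": 2},
}

# Hypothetical thresholds: a teacher would choose values that fit the test.
MIN_ITEMS_PER_AIM = 5
MIN_ABOVE_KNOWLEDGE = 2


def flag_gaps(blueprint, min_total, min_above):
    """Return {aim: [problems]} for aims that fall short of either threshold."""
    flags = {}
    for aim, row in blueprint.items():
        problems = []
        if sum(row.values()) < min_total:
            problems.append("too few items overall")
        if row["above_knowledge"] < min_above:
            problems.append("too few items above knowledge")
        if problems:
            flags[aim] = problems
    return flags


gaps = flag_gaps(blueprint, MIN_ITEMS_PER_AIM, MIN_ABOVE_KNOWLEDGE)
for aim, problems in gaps.items():
    print(f"Curricular aim {aim}: {', '.join(problems)}")
```

With these assumed thresholds, aim 1 is flagged on both counts, aim 3 is flagged only for its overall item count, and aim 2 passes. Different thresholds would, of course, flag different aims, which is precisely the judgment call a blueprint is meant to surface.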

Call on a Colleague

Too many classroom teachers carry out their professional chores in isolation. If you’re trying to figure out what to assess in your own classroom, one straightforward scheme for tackling the task is to seek the counsel of a colleague, particularly one who teaches the same sort of class and/or grade level you teach.

Always keeping in mind that what you assess should be governed by the decisions to be influenced by your students’ assessment results, simply solicit the advice of a respected colleague regarding what sorts of things you should be assessing in your class. If you’re uncomfortable about making such a request, simply read (with feeling) the following scripted question: “If you were I, what would you assess?” The phrasing of this question isn’t exotic, but it will get the job done. What you’re looking for is some sensible advice from a nonpartisan coworker. If you do choose to solicit the advice of one or more colleagues regarding what you plan to test, remember you are getting their advice. The final decision is yours.

If you’ve already devoted quite a bundle of thought to the what-to-assess question, you might simply ask your colleague to review your early ideas so you can discern whether your planned assessment targets seem reasonable to another teacher. A second pair of eyes can frequently prove useful in deciding whether you’ve really staked out a defensible set of emphases for your planned classroom assessments.

A Collection of What-to-Assess Considerations

So far, eight factors have been identified for potential consideration by teachers who are trying to decide what to measure with their classroom tests. Teachers who consider all or most of these factors will generally come up with more defensible assessment targets for their classroom assessments than will teachers who do not take these factors into consideration. Accordingly, this chapter’s chief intended outcome is that you will not only remember what these eight factors are, but will also understand the probable impact of each of those factors on a teacher’s choice regarding what to measure with classroom tests.

Presented next, then, are eight potential factors a teacher should consider when determining what to assess via classroom tests. You’ll note that each consideration is now framed as a question to be answered, and brief labels have also been added to each factor in a transparent attempt to make the factors more readily memorable. This should help if you ever choose to employ them in constructing your own tests.

1. Decision Focus: Are clearly explicated decision options directly linked to a test’s results?

2. Number of Assessment Targets: Is the number of proposed assessment targets for a test sufficiently small so that those targets represent an instructionally manageable number?

3. Assessment Domain Emphasized: Will the assessments to be built focus on the cognitive domain, the affective domain, or the psychomotor domain?

4. Norm Referencing and/or Criterion Referencing: Will the score-based infer- ences to be based on students’ test performances be norm referenced, crite- rion referenced, or both?

M02_POPH0936_10_SE_C02.indd 65M02_POPH0936_10_SE_C02.indd 65 09/11/23 6:01 PM09/11/23 6:01 PM

66 Ch pter 2 Dsciding ­What atto AAsAA

5. Selected-Response Mode Versus Constructed-Response Mode: Can stu- dents’ responses be selected responses, constructed responses, or both?

6. Relevant Curricular Configurations: Will students’ performances on this classroom assessment contribute to students’ mastery of your state’s offi- cially approved set of curricular aims, such as, for example, a set of multistate chosen content standards?

7. National Subject-Matter Organizations’ Recommendations: Are the knowl- edge, skills, and/or affect assessed by this classroom assessment consonant with the curricular recommendations of prominent and respected national subject-matter associations?

8. Collegial Input: If a knowledgeable colleague is available, has this educator reacted to the proposed assessment targets in a classroom test to be built?

Finally, Bev Perdue—former North Carolina governor, former classroom teacher, and former chair of the National Assessment Governing Board, a national nonpartisan board that sets policy for the National Assessment of Educational Progress (NAEP was described earlier in this chapter)—in an April 22, 2022, Education Week Commentary, marshals a powerful argument that many children, particularly children of color, children in poverty, and children from rural communities, are currently being left behind. She pulls no punches in arguing that if the federal government does not correct our current situation by making education a bona fide national priority, a disastrous loss of talent will ensue. She cites results of a January 2022 survey indicating that “55 percent of teachers report that the COVID pandemic has prompted them to consider retiring or leaving the profession” (emphasis in original). Perdue concludes her analysis emphatically: “The federal government has the imperative to ensure that education is a national priority, and right now, the data show it’s not.”

Summing up, then, educational testing as currently carried on in our schools is not a classic “same old, same old” phenomenon. On the contrary, we often find ourselves experiencing a flock of serious changes that will oblige us to tackle brand-new problems in brand-new ways. Rarely will you be able to deal with many brand-new assessment dilemmas merely by looking up a solution in a textbook’s table of contents or its nicely alphabetized index. No, this is a time when you will often personally need to discover how today’s challenges must be met by using tomorrow’s thinking.

Tests as Instructional Influencers

Historically, teachers have administered classroom tests at or near the close of an instructional sequence. And, of course, teachers would like their students to perform well on any end-of-instruction tests. If students are successful on end-of-instruction classroom tests, it usually means that they have learned well and, accordingly, teachers have taught well. More often than not, teachers’ classroom tests have been constructed only after instruction is over or when instruction is nearing its conclusion (for instance, near the end of a semester, a school year, or a short-duration teaching unit). Obviously, classroom assessments devised after instruction is over will have little impact on what the teacher teaches. The teacher’s instruction, or at least the majority of it, is a “done deal” by the time the test is constructed.

Classroom assessments, therefore, have typically had little impact on instructional planning. At least this was true until the era of educational accountability arrived in the 1970s and 1980s. During that period, legislators in most states enacted a string of laws requiring students to pass tests on basic skills in order to receive high school diplomas or, in some cases, to be advanced to higher grade levels. Thus originated what educators have come to call “high-stakes” tests, so named because there were important consequences for the students who were required to take those tests.

There is another sense in which statewide tests became high-stakes tests. In schools where all, or almost all, students passed the tests, a school’s teachers and administrators were applauded. But in schools where many students failed the tests, a school’s teachers and administrators were blamed. The stakes associated with statewide testing were elevated even more in the 1980s, when newspapers began to publish annual statewide test scores of all districts and all schools in the state. Each school and district was ranked in relation to the state’s other schools and districts. Such ranking, of course, often spawns rancor among those being ranked.

As noted in Chapter 1, what was being measured by the tests soon began to influence the nature of what was taught in classrooms. The phrase measurement-driven instruction became widely used during the eighties to mean an educational reform strategy in which it was hoped that the content of high-stakes tests would have a substantial and positive influence on the content of classroom instruction.

Because of this heightened attention to student performance on important statewide and districtwide tests, we also saw increased attention being given to the evidence of student success on teacher-made classroom tests. Many educational administrators and policymakers concluded that if students’ performances on statewide tests reflected a school’s educational effectiveness, then even at the classroom level, test results could indicate whether good teaching was taking place. Greater and greater emphasis was given to “evidence of effectiveness” in the form of improved student performance on whatever types of classroom tests were being employed. Increasingly, evidence of student learning—evidence in the form of student performance on classroom tests—is being used in personnel appraisals of teachers. In many settings, a teacher’s competence is currently being assessed in terms of how well the teacher’s students perform on the teacher’s classroom tests. During the past several years, considerable pressure has been brought on educational leaders to make students’ test scores a prominent factor when evaluating individual teachers. We’ll look into recent developments regarding teacher evaluation in Chapter 15.

Given this ever-increasing administrative reliance on student test results as one indicator of a teacher’s instructional effectiveness, it was only natural that many teachers tried to address instructionally whatever was to be tested. Tests had clearly begun to influence teaching. And it was increasingly recognized by teachers not only that the content of their tests could influence their teaching but also that the content of those tests should influence their teaching.

If testing should influence teaching, why not construct classroom assessments prior to instructional planning? In this way, any planned instructional sequence could mesh more effectively with the content of the test involved. Moreover, why not build classroom assessments with the instructional implications of those assessments deliberately in mind? In other words, why not build a classroom assessment not only before any instructional planning, but also in such a way that the assessment could beneficially inform the teacher’s instructional planning? The contrast between a more traditional educational approach, in which instruction influences assessment, and the kind of assessment-influenced instruction being described here can be seen in Figure 2.2.

In a traditional approach to instructional design, an approach in which instruction influences assessment, the teacher (1) is guided by the curriculum that’s been adopted by the state and/or district, (2) plans instructional activities to promote the educational objectives set forth in that curriculum, and (3) assesses students. In an assessment-influenced approach to instructional design, also indicated in Figure 2.2, the teacher (1) starts with curricular aims, (2) then moves to create assessments based on those goals, and (3) only thereafter plans instructional activities intended to promote students’ mastery of the knowledge, skills, and/or attitudes to be assessed. In both approaches, curriculum is the starting point. Curricular aims still govern the entire process. But the two contrasting approaches differ in the order in which instructional planning and assessment development occur.

Dividends of Assessment Illumination

Assessment-illuminated instruction rests on the assumption that, in many instances, what is to be assessed will influence the teacher’s instructional decisions. Consequently, if the assessment can be planned prior to the teacher’s instructional planning, the resulting clarity will help the teacher make more appropriate instructional choices.

When teachers develop their classroom assessments before making instructional plans, teachers are meaningfully clarifying the curricular aims being pursued. In other words, the assessment serves as the operational definition of whether a curricular aim has been achieved. Such clarity provides far more insight for teachers when, subsequently, they make their instructional plans.

It is important to remember that classroom assessments represent the knowledge and/or skills being promoted. Teachers should never teach toward the classroom test itself. Instead, they should teach toward the knowledge, skills, and/or attitudes sampled by the classroom test. As you will learn in Chapter 4 dealing with validity, the entire edifice of educational testing rests on our arriving at score-based inferences about what’s lurking inside students’ crania. Teaching directly to a test’s specific items can seriously muck up the accuracy of such inferences. Happily, when teachers construct classroom assessments prior to instructional decision making, they will have a markedly better understanding of what the instructional targets really are. And, of course, the more clearheaded teachers are about curricular ends, the more skillfully they can select suitable instructional means to accomplish those ends.

Instruction-Influenced Assessment: Curriculum → Instruction → Assessment

Assessment-Influenced Instruction: Curriculum → Assessment → Instruction

Figure 2.2 Traditional Instruction-Influenced Assessment Versus Assessment-Influenced Instruction, Both of Which Are Governed by Curricular Aims

Increased clarity regarding the nature of the curricular aims being promoted yields all sorts of instructional-planning benefits. Here are a few of the most salient dividends you, as a teacher or a teacher-in-preparation, will receive:

• More accurate learning progressions. Because you’ll know more clearly what the terminal results of an instructional sequence are supposed to be, you can better pinpoint the enabling knowledge or the subskills your students must acquire along their way to mastering what’s being taught. (You’ll learn more about “learning progressions” in Chapter 12.)

• More on-target practice activities. Because you’ll know better what kinds of end-of-instruction outcomes are being sought for your students, you can make sure to select guided-practice and independent-practice activities that are more accurately aligned with the target outcomes.

• More lucid expositions. Because you’ll understand more clearly what’s to be assessed at instruction’s conclusion, during instruction you can provide clearer explanations to your students about the content involved and where the instructional activities are heading.

Instructional planning will benefit from early development of classroom assessments most fundamentally because you will, as a consequence of prior assessment construction, better understand what it is hoped that your students will be able to do when instruction has been concluded.

A Profusion of Item Types

Beyond the basic constructed-response versus selected-response distinction considered earlier, there are numerous types of items available for inclusion in classroom assessment procedures. Certain of these item types have been employed for many, many years. The essay test, for example, has probably been around since Socrates was strutting his instructional stuff. Multiple-choice tests have been with us for more than a century. True–false tests were probably used, at least in oral form, by prehistoric teachers who might have tossed such true–false toughies to their students as “True or False: Today’s dinosaur was a leaf-munching Apatosaurus (or Brontosaurus) rather than a people-munching Tyrannosaurus rex.”

During recent years, however, there have been some significant advances in educational measurement procedures so that, for example, some educational measurement specialists now advocate wider use of portfolio assessment and performance assessment. These significant item types must now be added to the menu of assessment alternatives open to today’s classroom teacher. In coming chapters you’ll be learning how to create classroom assessments using all of the major item types now available in the classroom teacher’s assessment arsenal. And, of course, as each hour goes by, advances in micro-technology enable teachers to rely on a number of sophisticated computer programs to provide online or other gyrations leading to computer-configured assessments. Fortunately, states and districts are substantially expanding the range of electronic options available to today’s (and tomorrow’s) teachers.

We’re going to start off with the more traditional sorts of selected- and constructed-response items because, quite frankly, you’re more likely to be familiar with those item types, and you’ll have less difficulty in learning about how to crank out, for instance, a simply splendid short-answer test item. Thereafter, in separate chapters, we’ll be considering the special kinds of items to use when you’re trying to get a fix on students’ affective status.

As a preview, then, in subsequent chapters you’ll be encountering the following types of items that can be employed in classroom assessments:

• Binary-choice items (Chapter 6)

• Multiple binary-choice items (Chapter 6)

• Multiple-choice items (Chapter 6)

• Matching items (Chapter 6)

• Short-answer items (Chapter 7)

• Essay items (Chapter 7)

• Observational approaches (Chapter 8)

• Performance tests (Chapter 8)

• Portfolios (Chapter 9)

• Affective assessment procedures (Chapter 10)

Many of these types of items can be presented to test-takers via computers, so we’ll also be looking at how common this practice has become, particularly for large-scale educational exams.

As you can see, there are plenty of options open to classroom teachers when it comes to item types. When you’re choosing among item types, select the one that best allows you to make the test-based inference in which you’re interested and whatever decision is linked to it. Sometimes it won’t make much difference which types of items you choose. In many instances, a batch of multiple-choice items is not vastly superior to a group of matching items. In other cases, however, there will be substantial differences in the suitability of differing item types. To make your own choice, simply ask yourself, Will my students’ responses to this type of item elicit evidence that allows me to reach a defensible inference contributing to the educational decision facing me?

Unless, of course, you’re using a red geography textbook. If so, forget everything you’ve just read.

As we consider various types of items later, you’ll become more conversant with the particulars of each item type, and thus more skilled in identifying their utility to you in your own classroom assessments.

What Do Classroom Teachers Really Need to Know About What to Assess?

What is the main message of this chapter for classroom teachers who must decide what to assess? The answer is rather straightforward. All you need to do is to think seriously about this issue. Any experienced teacher will, if pressed, confess that instructional inertia plays a key role in what usually goes on in the teacher’s classroom. Classroom teachers find it far easier to engage in activities this year that they engaged in last year, simply because it takes less effort to employ “same-old same-old” instructional procedures than to install new ones. There’s also assessment inertia inclining teachers to rely on whatever assessment schemes they’ve previously utilized. It’s so much easier to re-use or slightly massage last year’s true–false exams than to whip up a brand-new performance test.

But What Does This Have to Do with Teaching?

Most first-year teachers have only a foggy idea of what stuff they should be teaching. Oh, of course, there is often a state curriculum syllabus setting forth a seemingly endless array of knowledge and skills that students might possess. But, unfortunately, many teachers find such syllabi altogether overwhelming. So new teachers often gravitate toward whatever content is addressed in textbooks. And, when I was a first-year teacher, that’s just what I did. Along with courses in high school English, speech, and U.S. government, I also had to teach a course in geography. Having never taken a college course in geography, and knowing barely more than the four major directions on a compass, I thanked heaven each night for the bright red geography textbook that allowed me to feign geographical competence. Had it not been for that comforting red textbook, I surely would have perished pedagogically before I learned the difference between the equator and a Mercator map.

It should be apparent to you that if teachers carry out a consummately classy instructional act, but focus on the wrong curricular content, this constitutes a genuine waste. Yet many beginning teachers don’t devote sufficient thought to what they need to teach and, therefore, what they need to assess. And many seasoned teachers also can’t provide a solid rationale for what they’re teaching.

The impact of the federal accountability legislation on most teachers’ instruction—and classroom assessment—has been considerable. That’s why you really need to find out how your state carries out its accountability program. If your state’s leaders have installed suitable accountability tests, you’ll be in a good position to key off those tests and supply your students with equally sensible classroom tests and on-target instruction. If your state’s accountability tests are inappropriate, however, you’ll frequently find yourself smack in the midst of an insoluble puzzle.

This chapter’s focus on what to test and how to test it is intended to help you recognize that you should be assessing the most important skills, knowledge, and affect you want your students to acquire. And this recognition should help you realize that those assessment foci are also what you should be teaching.

In recognition of these all-too-human tendencies to adopt the path of least resistance, it becomes more important to try to “get it right the first time.” In other words, the more up-front thought teachers give to answering the what-to-assess question, the more likely they’ll be able to dodge serious assessment errors that, because of temporal pressures, could plague them for years.

Because of the impact of federal accountability laws on you and your fellow teachers, you need to understand the nature of the tests your state’s leaders have chosen to satisfy the state’s accountability needs. You definitely need to find out whether the state’s accountability tests are winners or losers. Those tests will have an enormous impact on what you try to teach, what you try to test, and how you try to teach it.

A Three-Chapter Preview

Because the next three chapters in the book are highly related, here’s a little previewing of what you’ll be encountering in those chapters. Later in the book, you’ll learn all sorts of nifty ways to assess your students. Not only will you learn how to create a variety of traditional paper-and-pencil tests, such as the true–false and multiple-choice instruments, but you’ll also discover how to generate many of the alternative forms of technology-assisted educational assessment that are receiving so much attention these days.

If you’re evaluating a test—whether it’s your own test, a test developed by another teacher, or a test distributed by a commercial test publisher—you will need some criteria to help you decide whether the test should be wildly applauded or, instead, sent scurrying to a paper shredder.

Judging educational tests, in fact, is fundamentally no different from evaluating brownies. If, for example, you are ever called on to serve as a judge in a brownie bake-off, you’ll need some evaluative criteria to use when you assess the merits of the contending brownies. I realize your preferences may differ from mine, but if I were chief justice in a brownie-judging extravaganza, I would employ the three evaluative criteria presented in Table 2.1. If you are a brownie aficionado and prefer other evaluative criteria, I can overlook such preferences. You may, for example, adore nut-free and high-rise brownies. I have no personal interest in establishing a meaningful brownie-bond with you.

Table 2.1 Potential Criteria for Brownie Judging

Criterion | Optimal Status
Fudginess | Should brim with chocolate flavor—so much, indeed, as to be considered fundamentally fudgelike.
Chewiness | Must be so chewy as to approximate munching on a lump of fudge-flavored Silly Putty.
Acceptable Additives | Although sliced walnuts and extra chocolate chips are permissible, shredded raw brussels sprouts are not.

When we evaluate just about anything, we employ explicit or implicit evaluative criteria to reach a judgment about quality. And although there may be substantial disagreements among brownie devotees regarding the criteria to employ when appraising their favorite baked goods, there are far fewer disputes when it comes to judging the quality of educational assessment devices. In the next three chapters, we’re going to look at the three widely accepted criteria that are used to evaluate educational assessment procedures. Two of these criteria have been around for a long while. The third is a relative newcomer, having figured in the evaluation of educational tests only during the last few decades.

Table 2.2 lists the three criteria we’ll be considering in the next three chapters. The capsule description of each criterion presented in the table will let you know what’s coming. After considering each of these three criteria in its very own chapter, you’ll get some recommendations regarding what classroom teachers really need to know about the three criteria. In other words, you’ll get some advice on whether you should establish an intimate relationship or merely a nodding acquaintance with each criterion.

Of all the things you will learn from reading this book, and there will be lots of them, the topics treated in the next three chapters are regarded by psychometricians (assessment specialists) as the most important. Chapters 3, 4, and 5 will familiarize you with the pivotal criteria by which assessment specialists tell whether educational assessment procedures are fabulous or feeble. But because this text is focused on classroom assessment, we’ll be looking at the content of the next three chapters in relation to the assessment-illuminated educational decisions that must be made by classroom teachers.

Table 2.2 Preview of the Three Test-Evaluation Criteria to Be Considered in Chapters 3, 4, and 5

Criterion | Brief Description
Reliability | Reliability represents the consistency with which an assessment procedure measures whatever it’s measuring.
Validity | Validity reflects the degree to which evidence and theory support the accuracy of interpretations of test scores for proposed uses of tests.
Fairness | Fairness signifies the degree to which assessments are free of elements that would offend or unfairly penalize particular groups of students on the basis of students’ gender, ethnicity, and so on.

Chapter Summary

One major question guided the content of this chapter: What should a classroom teacher assess? It was contended that a teacher’s conclusions must be primarily influenced by the decisions the teacher hopes to illuminate on the basis of assessment data gathered from students.

A teacher’s curricular aims play a prominent role in the teacher’s choice of assessment emphases. In particular, large-grain-size, measurable objectives were recommended as helpful vehicles for identifying potential assessment targets. The three domains of student outcomes contained in Bloom’s Taxonomies of Educational Objectives (1956) were also suggested as a helpful framework for deciding whether certain kinds of outcomes are being overemphasized in instruction or assessment. A state’s curricular aims, like nearly all intended curricular outcomes, will be clearly understood only when assessments to measure students’ mastery of the aims have been created.

This chapter identified eight considerations for teachers to keep in mind when deciding what to assess: (1) decision focus, (2) number of assessment targets, (3) assessment domain emphasized, (4) norm-referenced versus criterion-referenced assessment, (5) selected-response versus constructed-response mode, (6) relevant curricular configurations, (7) national subject-matter organizations’ recommendations, and (8) collegial input. Teachers who attend to all or most of these eight considerations will be more likely to arrive at defensible decisions about what to measure in their classroom assessments.

References

Bloom, B. S., et al. (1956). Taxonomy of educational objectives: Handbook I: Cognitive domain. New York: David McKay.

Center on Education Policy. (2011). District officials view common state standards as more rigorous, expect new standards to improve learning. Press Release, September 14. Author.

Cizek, G. J., Kosh, A. E., & Toutkoushian, E. K. (2018). Gathering and evaluating validity evidence: The generalized assessment alignment tool, Journal of Educational Measurement, 55(4): 477–512.

Francis, E. M. (2022). Deconstructing depth of knowledge: A method and model for deeper teaching and learning. Solution Tree Press.

Gamson, D., Eckert, S., & Anderson, J. (2019). Standards, instructional objectives and curriculum design: A complex relationship, Phi Delta Kappan, 100(6): 8–12.

Gardner, G. (2022, June 29). Tenure’s false promise. The James G. Martin Center for Academic Renewal. https://www.jamesgmartin.center/2022/06/tenures-false-promise/

Glaser, R. (1963). Instructional technology and the measurement of learning outcomes: Some questions, American Psychologist, 18, 519–521.

IES–NCES. (2022). Trends in International Mathematics and Science Study (TIMSS). IES–NCES. https://nces.ed.gov/timss/

Leef, G. (2022, April 22). How government meddling ruined educational testing in the U.S., National Review. https://www.nationalreview.com/corner/how-government-meddling-ruined-educational-testing-in-the-u-s/

McMillan, J. H. (2018). Classroom assessment: Principles and practice for effective standards-based instruction (7th ed.). Pearson.

Michigan Assessment Consortium’s Assessment Learning Network. (2019). Criterion- and norm-referenced score reporting: What is the difference? michiganassessmentconsortium.org/aln. https://www.michiganassessmentconsortium.org/wp-content/uploads/LP_NORM-CRITERION.pdf

NAEP: National Assessment of Educational Progress. (2022). About NAEP: A common measure of student achievement. NAEP. https://nces.ed.gov/nationsreportcard/about/

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. The Council.

National Governors Association and the Council of Chief State School Officers. (2010). The Common Core State Standards initiative. Author.

NGSS Lead States. (2013). Next Generation Science Standards: For states, by states. The National Academies Press.

OECD.org. (2022). PISA: Programme for International Student Assessment. OECD. https://www.oecd.org/pisa/

Pellegrino, J. W. (2016). 21st century science assessment: The future is now. SRI Education White Paper. SRI International.

Pellegrino, J. W., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. National Research Council, Division of Behavioral and Social Sciences and Education, Center for Education, Board on Testing and Assessment. National Academy Press.

Perdue, B. (2022, April 22). What if we treated public education like the crisis it is? Education Week. https://www.edweek.org/policy-politics/opinion-what-if-we-treated-the-state-of-public-education-like-the-crisis-it-is/2022/04

Popham, W. J. (2014a). Criterion-referenced measurement: Half a century wasted? Educational Leadership, 71(6): 62–67.

Popham, W. J. (2014b). The right test for the wrong reason, Phi Delta Kappan, 96(1): 46–52.

Queensu.ca. (2022). Deciding what to assess. Teaching and Learning in Higher Education. Retrieved September 20, 2022, from https://www.queensu.ca/teachingandlearning/modules/assessments/05_s1_02_deciding_what_to_assess.html

Renaissance. (2022). What’s the difference? Criterion-referenced tests vs. norm-referenced tests. Renaissance. https://www.renaissance.com/2018/07/11/blog-criterion-referenced-tests-norm-referenced-tests/

Webb, N. L. (2002). Alignment study in language arts, mathematics, science, and social studies of state standards and assessment for four states. Council of Chief State School Officers.

A Testing Takeaway

Deciding What’s Worth Assessing: Prioritization Is the Key*

W. James Popham, University of California, Los Angeles

A key moment arrives during the development of any educational test, whether the test is a nationally standardized achievement exam or a classroom quiz in a U.S. history class, when a decision must be made about what to assess. Depending on how the “what to assess” question is answered, today’s educators are often pressured to improve students’ performances on educational tests that measure way too many assessment targets. Let’s look at this harmful test-development mistake.

Because a chief function of educational tests is to collect evidence to help teachers arrive at accurate inferences about students’ hidden knowledge and skills, and then to teach those students, evidence from educational tests must support accurate inferences about students’ covert abilities. Thus, each cognitive skill or body of knowledge measured by an educational test must be represented by enough test items. For instructional purposes, it would make scant sense to use an achievement test intended to measure students’ mastery of 25 different things with a 25-item test—one item per thing.

Moreover, because most educational tests are intended to satisfy one dominant purpose, the nature of a test’s measurement mission is pivotal in determining how many items should measure what’s being assessed. For example, if a test is supposed to support instruction, then each thing measured ought to be assessed by enough items (say, 5–10) to help teachers arrive at an accurate, actionable interpretation regarding each assessed skill or body of knowledge.
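The arithmetic behind this point can be sketched in a few lines of Python. This is only an illustration: the function name `max_targets` and the idea of dividing a fixed item budget evenly across targets are assumptions made for the example, not a procedure prescribed in this chapter. The 25-item test length and the 5–10 items-per-target range come from the paragraphs above.

```python
def max_targets(total_items: int, items_per_target: int) -> int:
    """Return how many assessment targets a test can cover when every
    target is measured by the same number of items.

    Hypothetical helper for illustration only; it assumes the items are
    split evenly and that leftover items are simply not used.
    """
    if total_items < 0 or items_per_target <= 0:
        raise ValueError("total_items must be >= 0 and items_per_target > 0")
    return total_items // items_per_target


# A 25-item test at 1, 5, and 10 items per target.
for items_per_target in (1, 5, 10):
    covered = max_targets(25, items_per_target)
    print(f"{items_per_target} item(s) per target: {covered} target(s)")
```

Under these assumptions, a 25-item test that devotes 5 to 10 items to each target has room for only about 2 to 5 targets, not 25, which is why prioritizing the targets matters.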

The chief obstacle to choosing appropriate content to assess is that most curricular decision-makers (including many teachers who build their own classroom tests) lack the courage to prioritize assessable content. Yes, those deciding what a test will assess often identify too many curricular targets: too many to be taught well in the available instructional time, and too many to be measured well in the available testing time. Those determining what’s to be measured simply “want it all”! Yet having too many assessment targets, which invariably translates into too many instructional targets, often leads to superficial teaching in a valiant effort by teachers to “cover everything that’s being assessed.”

If you, personally, ever have an opportunity to offer reactions to proposed test content, be sure to urge decision-makers to have the courage to prioritize. It is far better to teach students to master deeply a modest number of important, prioritized outcomes than to teach them to master too many unprioritized outcomes superficially.

*From Chapter 2 of Classroom Assessment: What Teachers Need to Know, 10th ed., by W. James Popham. Copyright 2022 by Pearson, which hereby grants permission for the reproduction and distribution of this Testing Takeaway, with proper attribution, for the intent of increasing assessment literacy. A digitally shareable version is available from https://www.pearson.com/store/en-us/pearsonplus/login.
