
1 Assessment: A Foundation


Pretest

1. Assessment refers to use of tests to measure children’s achievements. T/F
2. Assessment always has the same purpose of determining if and how much children are learning. T/F
3. One of the best ways to use assessment outcome information is to determine if changes need to be made in methods and services to children. T/F
4. The advent of testing means that early childhood professionals do not need to be concerned about their biases and the manner in which these biases might influence assessment outcomes. T/F
5. Federal funding of public education is linked to students’ performance on standardized tests. T/F

Answers can be found at the end of the chapter.


Learning Objectives

By the end of this chapter, you should be able to:

• Describe the evolution of assessment practices in education.
• Understand and use the key language of assessment.
• Explain how assessment assists early childhood personnel in making decisions to continually improve the quality of services.
• Summarize the principles associated with quality assessment.
• Explain the primary shifts in education and the manner in which these trends affect assessment practices.

Remi is an early intervention professional and has been working with infants and toddlers and their families for 15 years. She has a deep understanding of child development and knows what to expect of children at different ages. She also recognizes when children’s development is significantly or even slightly more advanced or delayed than what is expected at a given age.

When she started her career, Remi relied heavily on formal assessments to determine students’ performance level, but now she rarely does more than observe children and take mental notes. Remi is such a keen observer that she feels it is unnecessary to document in writing whether children are responding to parents’ or professionals’ intervention. The only assessments she completes are those required by law.

Recently, however, Remi was asked to take on a new professional apprentice, Anika, who has completed a degree in early childhood (EC) education. Anika is eager to put theory into practice; she wants to use all the tools she studied in school. Not only is Anika aware of changes in early childhood that increasingly require more assessment evidence, she also knows that regular data collection is important for being an effective professional.

Anika writes notes, collects observational data, and keeps a record of children’s successes on work samples. She analyzes the data and compares the results often to see whether children are making progress. When she approaches Remi and suggests that, based on her observations, they work with parents to introduce rich language activities, Remi becomes defensive. Remi sees no reason to change what they are doing, but she cannot support her opinion because she does not have any assessment information that tracks her children’s language gains. On the other hand, Anika shows Remi plenty of data and compares the children’s progress to that which can be expected over time. The gap between the two groups is apparent. When the two teachers finally agree to introduce new language intervention supports to families, their continued positive assessment outcomes make a believer out of Remi.

Introduction

The term education is derived from Latin words meaning “I lead” or “I conduct.” If educators’ purpose is to lead or conduct learning, then it stands to reason that they would want to know if their leadership actually results in learning. If educators’ goal is to be great educators, they would want their leadership to make a difference and result in the highest level of growth possible. This is the purpose of assessment.


As the opening vignette illustrates, assessment is important. Many educators have strong instincts about children, and these can be helpful in some ways. However, instincts are incapable of truly and usefully reflecting what children know and can do. Furthermore, without assessment to guide instruction, valuable time and opportunities can be wasted. As one becomes aware of the positive impact of appropriate assessment in action, one can, like Remi, come to believe in the power of assessment. In order to use assessments most effectively, it is important to first have a solid understanding of assessment.

1.1 A Brief History of Assessment

If we consider schooling across human history, the practice of measuring human traits, and particularly mental abilities, is relatively new in education. In fact, measurement of human behavior was largely misunderstood until the mid-19th century. Until recently, precisely measuring if and how well children learn, how children interact with their world, and their physical and social well-being was neither necessary nor sophisticated. Now assessment is considered not only useful but essential to educational goals.

Early Assessment Practices

Prior to the 19th century, few children received a formal education. Those who did typically received individual or very small group instruction from family tutors or governors over the course of many years. As a consequence, informal and ongoing appraisal of learning was intimate and comprehensive. Tutors knew the point at which their students had mastered concepts as well as when children appeared to be incapable of learning the material. If someone (a parent, perhaps) wanted to know how a student was performing, they could ask the tutor, who would know exactly what knowledge and skills the student had mastered. There were no formal records, scores, or grades, only the description of skills learned as provided by a student’s teacher.

In this setting students were not compared to one another, and there was no evaluative scale to document growth or describe children’s performance (such as A, B, C, D, F). The only important “measurement” of the day was whether students did or did not learn the content and skills that had been set forth by their tutor. For centuries this descriptive method was the only assessment method used.

The First Grading Scale

In 1792 William Farish, a tutor at Cambridge University in England, invented a grading scale with scores of 1 to 10. He thought it might help reduce widespread favoritism when evaluating students at Cambridge (Sonjak, 2010). During this same time, known as the industrialization era, the number of children attending school increased. It became impractical to expect teachers to know each child’s abilities intimately, now that they were responsible for 15 to 20 students instead of 1 to 3. The uniform grading system introduced by Farish and modified over the years made it possible to appraise many students, though more superficially than the descriptive method previously used by tutors and governors.

Over the centuries grading systems varied, including Yale’s categorical system in which optimi was a title given to top performers and pejores to the lowest performers. Harvard used a 100-point scale in the late 18th century before moving to the 4-point scale (0–4.0) that is commonly used in most U.S. schools today. In the late 19th century, Harvard instituted the first known letter grading scale (that is, A–E) but then substituted this with a Class I, II, III, IV and V scale. In the 20th century institutions moved back and forth between mostly numerical scales such as the one Farish and Harvard developed and letter grade scales.

Development of Standardized Testing

Just as standard grading scales evolved to accommodate growing numbers of students, standard assessment tools evolved in a similar manner. As explained, Farish introduced numerical grades to reduce bias by increasing uniformity in grading. The purpose was not so much to compare one student to another but to evaluate the proficiency of individual students in meeting performance expectations. In other words, he wanted to know how well the graduates demonstrated mastery of the curriculum.

To compare students to each other, Horace Mann introduced uniform or standardized written exams in the United States in the early 1800s. By the mid-19th century, grading for the sake of making comparisons across students became common practice. These comparisons helped schools assess students’ relative achievement and separated weaker students from those who qualified for higher educational opportunities (Edwards, 2006). Many countries around the world continue to use this method of testing students to determine advancement from elementary school to high school. In the United States many colleges and universities require students to take college entrance exams (such as the SAT and ACT exams) and select those with the highest scores for admission.

Standardized testing of mental abilities also arose in the 19th century when an English inventor, anthropologist, explorer, and geographer named Sir Francis Galton began developing tests of human traits such as intelligence. More importantly, he began using intelligence test results to derive correlations with the results of other tests, such as reaction time (Goldstein, 2012).

Galton was fascinated with understanding what factors correlated with intelligence. He famously concluded that there was a relationship between men’s mental abilities and their degree of baldness. The basis for this flawed correlation was a belief that the overheated brains of great male thinkers burned the hair off their heads (presumably, the less hair, the more intelligence). Silly as his conclusion may seem today, Galton stirred great interest in the assessment of cognitive traits.

In a later effort, Galton awarded test participants a score that revealed how smart they were, which was based on 17 measurements that included height and weight, head circumference, and the power of their punch. Still later, Galton began to measure intellectual ability by timing participants’ reactions to sounds. He refined his intelligence assessment after determining that cognitive ability was reflected in the speed with which humans can process and respond to incoming information. Galton’s creative and persistent passion for assessment of human traits sparked the intelligence testing movement that would follow.

The IQ Test

Building on the early work of Galton, French psychologist Alfred Binet developed the first modern test of intelligence in 1905, the Binet–Simon scale (http://childpsych.umwblogs.org/intelligence-testing-2/binet-simon-scale/). Binet’s test yielded a score, which would later be called an intelligence quotient, or IQ score. Initially, Binet used his IQ test to show the academic potential of French children as a way to advocate for universal education. After comparing many children, including those with and without disabilities, Binet developed a scale based on mental age. Children’s mental age was calculated by assessing what skills they could perform relative to other children of the same chronological age (Mackintosh, 2011). Once the mental age was determined, the IQ score was derived as follows:

IQ score = (Mental age ÷ Chronological age) × 100
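As a worked illustration of the ratio above, the short Python sketch below computes the score for three hypothetical children who share a chronological age of 8 but differ in mental age. The function name `ratio_iq` and the sample ages are assumptions made for this example only; they are not taken from Binet's materials or from any published scoring procedure.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Return the ratio score: (mental age / chronological age) * 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# Hypothetical children (illustrative values, not data from the text).
print(ratio_iq(10, 8))  # mental age ahead of chronological age
print(ratio_iq(8, 8))   # mental age equal to chronological age
print(ratio_iq(6, 8))   # mental age behind chronological age
```

Under this formula, a score of 100 indicates that mental age and chronological age match, scores above 100 indicate a mental age ahead of chronological age, and scores below 100 indicate the reverse.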

Binet’s test required children from ages 2 to 12 to complete a variety of tasks, including repeating back a sequence of numbers, making a complete sentence from three words, and defining words (see Figure 1.1) (Kaplan & Saccuzzo, 2012). If approximately half the children at a given age passed an item, it was considered a valid mental measurement for that chronological age.

Figure 1.1: Items on a 1908 version of the Binet–Simon scale, from ages 3 to 13 years

When children were tested using the Binet–Simon scale, their actual abilities (mental age) were compared to what they should be able to do according to their chronological age. Each age level has several items, but it is expected that normally developing children will only be able to master some of them. The number of items expected to be completed is indicated next to each age level.

Age 3 (five items)

• Point to various specified parts of face

• Repeat two digits forward

Age 4 (four items)

• Name familiar objects

• Repeat three digits forward

Age 5 (five items)

• Copy drawing of a square

• Repeat a sentence with 10 syllables

Age 6 (seven items)

• Give age

• Repeat a sentence with 16 syllables

Age 7 (eight items)

• Copy drawing of a diamond

• Repeat five digits forward

Age 8 (six items)

• Recall two pieces of information from a passage

• Identify the difference between two objects

Age 9 (six items)

• Recall six pieces of information from a passage

• Recite the days of the week

Age 10 (five items)

• Create a sentence that includes three specified common words

• Recite the months of the year, in order

Age 11 (five items)

• Define abstract words

• Determine the problem with absurd statements

Age 12 (five items)

• Repeat seven digits forward

• Explain the meaning of pictures

Age 13 (three items)

• Explain the difference between pairs of abstract terms

Source: Based on Kaplan & Saccuzzo, 2012.

Whereas Binet wanted education to be more egalitarian, the U.S. translator of the assessment, H. H. Goddard, wanted to use the tool to maintain White upper class supremacy, as well as the rigid social class structure that existed at the time (Gregory, 2007). As a proponent of the eugenics movement of the late 19th and early 20th centuries, Goddard believed that the human race could be improved through selective breeding. He viewed White Americans as intellectually superior to other races and believed those who were intellectually inferior should be prevented from reproducing in order to further strengthen the White race.


These beliefs were widely accepted in the United States, and laws were passed in 31 states to allow citizens with low IQs to be sterilized (Comfort, 2014). This was Goddard’s motivation in translating the Binet–Simon Test of Intellectual Capacity: to have a means by which to test and systematically identify those with intellectual weakness who should be prevented from procreating. His promotion of the Binet–Simon scale popularized IQ testing in the United States and caught the interest of several prominent psychologists, including Lewis Terman at Stanford University (Gregory, 2007).

In 1916 Terman standardized Binet’s test for U.S. children and later expanded the test for older children and adults. Like Goddard, Terman intended it to be used to identify those with intellectual inferiority and systematically prevent them from reproducing. Terman, too, believed this was socially justified based on an assumption that crime and poverty were linked to low intelligence.

Later, Terman became more interested in studying children who scored on the highest end of the scale (see the feature box titled A Tale of Terman and the Termites), which resulted in the development of the Stanford–Binet intelligence scales. This test is still widely used today to measure intellectual abilities based on the observation that IQ is distributed across a range of abilities; most individuals have average IQ, and a very small number have very high or very low IQs.

A Tale of Terman and the Termites

When Alfred Binet developed the Binet–Simon scale, it became possible to classify people based on their IQ score. Lewis Terman believed that a child’s IQ score could be used to predict his or her success as an adult. Terman began his research, the Genetic Study of Genius, in 1921 with 1,528 children aged 5 years (mostly from California) who scored above 135 on the IQ test. Terman tracked these geniuses, whom he dubbed “Termites,” and their respective accomplishments for the next 35 years. In 2000, nearly 50 years after Terman’s death, the subjects were still being tracked, making this the longest running research project in history.

Unfortunately, this study is marred by two major flaws. First, Terman became a close mentor to many of the participants, which introduced the possibility of bias in his interpretation of outcomes. Second, and more importantly, Terman did not compare his group of geniuses to other children, so there is no way of knowing if these children were more or less successful because of their IQ score. Later analyses showed that although the Termites’ accomplishments were good, they were not remarkable once family resources were taken into account. That is, many Termites did very well as adults, but others led only average lives. What tended to distinguish one from the other was the income level of the child’s family when the child entered the study. Even without conducting IQ tests, it is likely that any randomly selected group of children that share the Termites’ same income levels would have similar outcomes (Leslie, 2000).

The lesson from Terman’s research is that it is a fallacy to assume that scores on IQ or other standardized tests can accurately predict children’s future accomplishments. Rather, these tests tell one story: that individual growth and development is significantly affected by children’s educational opportunities and life experiences. These outweigh early performance on tests, especially when just one test is used to measure a child’s traits.


http://files.eric.ed.gov/fulltext/EJ781688.pdf


Critical-Thinking Questions

1. If an IQ test is incapable of reliably predicting a child’s future, what purpose might it serve?

2. Do you think other types of assessment of young children (and older children) are vulnerable to the influence of family resources in the way that performance on IQ tests is influenced by wealth and opportunity? What are the implications of your conclusions?

Source: Adapted from Leslie, M., 2000. https://alumni.stanford.edu/get/page/magazine/article/?article_id=40678

Uses for Assessments

Intelligence testing set the stage for development of a wide range of behavioral assessments whose results would allow individuals to be sorted and classified. Indeed, in the early 20th century, there was an escalation in development and widespread use of formal testing. During and after World War I, Terman and Harvard professor Robert Yerkes developed, and the U.S. government used, the army mental tests to help determine who would thrive in armed forces leadership positions (Edwards, 2006). The tests themselves were deeply flawed, with many items based on assumptions of class and race and often having nothing to do with mental ability (such as knowing the rules for a card game, or naming the city in which a rare and expensive car was manufactured) (Owen, 1985).

Yerkes’s test was revised slightly and renamed the Scholastic Aptitude Test (SAT) by Carl Brigham, a reformed eugenics proponent and psychologist at Princeton University. The SAT continues to be used today as a college entrance exam (Bock, 2014). Brigham’s SAT was designed to predict success of candidates in higher education and was preceded by standardized achievement tests used in public schools. For example, in 1922, Samuel Brooks, a superintendent of schools in New Hampshire, proposed using standardized scores on tests given at the end of each school year in English, mathematics, history, and geography to determine if students met the minimum competencies for moving to the next grade. Brooks also used students’ scores on these tests as an incentive for teachers, by linking them to pay raises and bonuses.

The idea that standardized intelligence and achievement tests could be used to make important decisions about children infiltrated schools largely because of the Educational Testing Service, a noneducational, nongovernmental agency started in 1947 (Bock, 2014). This corporation, which publishes the SAT, has tirelessly promoted and developed standardized testing as a way to measure competencies. As a demonstration of its confidence in standardized tests, by 1965 the federal government mandated that schools report test results to prove their accountability for federal funds they received (Scott, 2004). This requirement marked the beginning of a testing culture that is evident in contemporary schools.

The Tracking Movement

During the 1960s and 1970s, schools began to use assessments to track or categorize students by test performance (or perceived ability) and then sort them into high-, average-, and low-performing groups, with the intention of increasing educational efficiency (Edwards, 2006).

https://alumni.stanford.edu/get/page/magazine/article/?article_id=40678

http://libcudl.colorado.edu:8081/MediaManager/srvr?mediafile=MISC/UCBOULDERCB1-58-NA/1508/i73498403.pdf

https://www.ets.org/


The logic behind this system was that if schools put all high-achieving students together, they would not be held back by slower learning students and would be able to advance more quickly. Similarly, putting all lower achieving students together would permit schools to more closely match curriculum to their learning level. In other words, professional educators believed that teachers could provide more targeted and effective education when students were placed in classrooms according to test scores.

The tracking movement was later discredited when it was found to result in several unintended consequences. For example, lower performing students, students with disabilities, and minority students were found to do far worse when segregated by test scores than they did when placed in more heterogeneous groupings (Werblow, Urick, & Duesbery, 2013). When low-performing students are placed together, they quickly come to believe (either by deduction or through bullying) that their potential to learn is limited. After they conclude that their work is unlikely to yield success, they often respond by putting forth minimal effort and having low expectations (Rist, 2014). Further, middle- and high-achieving students did not perform appreciably better in tracked classes. It is possible that factors leading to their success (such as expectations of success, enrichment opportunities, quality of teachers and curriculum, and wealth and social skills) were unchanged even though they were grouped with similar students (Celik & Ekinci, 2012).

Using assessments for tracking highlighted the important relationship between outcomes and expectations of children (by both families and teachers). On the basis of findings that tracking did not help most children, many schools began to de-track by the 1980s. However, in the 21st century tracking has again become a widespread solution to concerns that U.S. children are not keeping academic pace with children in other nations (Mehan, 2015). It is too early to say if the current movement will have the same results as the earlier tracking movement.

The Accountability Movement

In 1983 the National Commission on Excellence in Education published A Nation at Risk, a report that marked the first of many warnings that U.S. schools were falling behind those of other nations. The investigation that led to the report was fueled by concerns that the United States was declining both economically and militarily. The report concluded that compared to previous years, U.S. students’ illiteracy rate was higher; test scores were lower (compared to other developed nations); and test scores worsened as students moved from lower to upper grades (Wong, Guthrie, & Harris, 2014). Once the world’s standard for educational excellence, U.S. schools became criticized for lacking innovation and responsiveness to the emerging digital age. Initial reforms proposed to repair America’s slide in international rankings included improving teacher quality, increasing student engagement time, raising standards, bolstering academic content, and improving school leadership (Wong et al., 2014).

Despite the recommendations and subsequent response to A Nation at Risk, U.S. students have continued their downward trajectory on international rankings of science, mathematics, and literacy (Carnoy & Rothstein, 2013). A flurry of reforms was offered to fix America’s so-called troubled schools, including school choice, charter schools, voucher systems, teacher merit pay, back-to-basics curricula, standard curricula, and accountability (Mathison & Ross, 2013). Of these reforms, accountability—which is based on standardized test scores—emerged as the primary means by which schools are controlled (Mathison & Ross, 2013).

Reflection

Why might tracking be on the rise again, if it was unsuccessful when first implemented?

http://www2.ed.gov/pubs/NatAtRisk/index.html


The Elementary and Secondary Education Act (ESEA) has been the primary funding and policy mechanism of the federal government since 1965. In 2002 the ESEA underwent a fundamental overhaul in the form of No Child Left Behind (NCLB), a law that made school and teacher accountability for student success the centerpiece of education reform. The motivation for NCLB was the belief that U.S. students are being left behind by students from nations studied by the Organisation for Economic Co-operation and Development and that the U.S. rankings continue to decline from year to year (Programme for International Student Assessment, 2012). At the same time, student performance on standardized tests has actually improved; so it is not a matter of students getting worse, but more that improvements in U.S. schools are not keeping pace with improvements in other nations.

To address this and other issues, President Barack Obama proposed comprehensive preschool services for 4-year-olds in his 2013 State of the Union address. In an attempt to move the U.S. student achievement ranking up from its current position of 28 out of 38 nations, Obama called on Congress to expand quality of services to 4-year-olds across the country through the “Preschool for All Initiative” (Slack, 2014). In order to qualify for this proposed funding, quality programs would be required to include:

1. state standards for early learning,
2. qualified teachers for preschool programs, and
3. a plan for developing and implementing comprehensive data-collection and assessment systems.

Arne Duncan, U.S. secretary of education, proposed improving the quality of services to children with disabilities by using assessments. In order to receive federal funding, states must show that children with disabilities have made progress through scores on the National Assessment of Educational Progress (U.S. Department of Education, 2014). Although there is much skepticism about this proposal, particularly after more than a decade of high-stakes testing fatigue, it is clear that there will be no early retreat from the belief that standardized testing is the go-to tool for improving education.

To respond to the decline in student rankings, NCLB set out to transform the relationship between education and assessment by making testing the driving force of public kindergarten through 12th grade (K–12) education. In order to receive federal funding, schools must achieve certain scores on federally mandated tests. Schools that consistently fail to achieve at this level risk losing their funding and may be closed. Additionally, teacher pay and tenure are increasingly linked to student performance on these tests. According to NCLB, these dual threats make schools and teachers accountable for students’ success in a way that will lead to improved academic achievement. With so much to lose, NCLB-mandated assessments that determine funding have become known as high-stakes testing.

The accountability movement has had two primary consequences. First, children in the United States are tested more often and more extensively than in any other country in the world (Ravitch, 2010). Yet there is little evidence that this culture of testing has made more than a modest difference in student outcomes (Nichols, Glass, & Berliner, 2012). Perhaps this is because standardized testing is only one means of assessment—and, for all the resources that have been invested, the least meaningful.

There is much controversy over the pervasive reliance on standardized testing as the basis for leveraging change in education. Research findings show that high-stakes testing has driven teachers to “teach to the test,” meaning teachers focus on content only if it is going to be on a test. They also spend a great deal of time teaching test-taking skills rather than content (Allington & Gabriel, 2012). Schools have narrowed the curriculum to focus on those areas on which students will be tested, which sacrifices a robust curriculum that fully integrates science, social studies, and art (Milner, 2013).

http://www.ed.gov/esea

http://data.oecd.org/united-states.htm

Second, students tend to react to the pressure to do well with less motivation and higher stress levels, making children as early as prekindergarten (pre-K) feel like failures when they get test questions wrong (Lingard, 2013). As we will discuss in later chapters, the current standardized testing movement overshadows other forms of assessment that have greater potential to guide educators in providing high-quality education.

Challenge

Complete the timeline below, identifying contributions to current assessment practices.

Year Contribution to assessment

1750

1792

Early 1800s

1850

Late 1800s

1905

1917

1922

1947

1965

1970s

1983

2003

Present

Refer to the Appendix for the answers.


1.2 The Language of Assessment

Over the centuries assessment practices have evolved from poorly defined to highly structured. Along the way assessment has become critical to understanding learning. Today assessment is an essential educational tool that complements curriculum and instruction. The intent of this text is to help those who serve very young children develop a deep understanding of assessment’s potential and to embrace the means by which important decisions advance the needs of children. To do this, it helps to be fluent in the language commonly used in assessment. As a broad construct, assessment is not difficult; even the tools of assessment are usually not difficult to understand. However, designing assessment so that educators select the right tool for the job is a bit more challenging. Assessment procedures must be meaningful without being overly burdensome, and their results must be used in a way that informs the most powerful instruction.

Because of these challenges, many educators do not use assessment to its greatest advantage (Duncan, 2009). Yet it is important that educators find value in assessment, in the same way that curricula and instruction are valued. In fact, assessment and instruction are the two main parts of the whole of teaching. Simply put, assessment informs instruction, which influences outcomes (see Figure 1.2). This relationship will be examined and referred to throughout this text.

Figure 1.2: Assessment teaching cycle

Assessment and teaching are connected, with one leading to the other and back again. Assessing children informs teaching, and subsequent teaching influences learning, which is again assessed to determine progress; this in turn informs further teaching.

[Figure: a cycle in which Assess leads to Teach and Teach leads back to Assess]

The term assessment is derived from the Latin word assessus, which is translated as “seated beside a judge.” This is a helpful translation, because all forms of educational assessment make a judgment about a person. A teacher who is seated beside a student makes a judgment about what progress the child is making in signing “more.” When an administrator mentors a new faculty member, he or she sits beside the teacher and makes a judgment about how effectively the teacher executes strategies to guide students’ social play. Thus, whether intentional or not, educators continually make judgments and therefore continually conduct assessment.

Equally useful terms used by educators to describe assessment are measurement and evaluation. Measurement is a process of counting or using numbers to explain the amount or size of something. Evaluation takes advantage of measurement data to make a judgment about people’s behaviors or the outcome products of behaviors. To illustrate the difference between measurement and evaluation, picture a teacher named Dr. O’Dell reading a story to a group of children. As she interjects questions, she counts the number of meaningful answers, providing a measurement of listening comprehension. With this number, O’Dell is able to evaluate the information in several possible ways: to compare children’s listening comprehension to previous performance, to compare some children’s comprehension to that of other children, to determine if children have met curricular objectives, and/or to evaluate instruction to determine if children need more help.

It is helpful to distinguish between assessment as a process by which information is gathered and assessment as the means by which that process is carried out. We use various procedures (processes) to assess children, and there is a range of tools (means) used to execute the process. For example, a speech pathologist may assess a student’s language development by administering (assessment process) a test (one assessment tool), such as the Expressive One Word Picture Vocabulary Test 4. Assessment may also be conducted by observing and writing down the types of words used and the purposes for which they are used, or by rating students’ language development on a skills checklist.

Types of Data

All forms of assessment rely on the collection of data, which are facts or information used to make calculations or for analysis. In educational assessment, data or facts include written descriptions of a child’s behaviors during dramatic play. Collecting these descriptions over time permits analysis of behavior patterns, which can then be used to make intervention decisions.

For example, suppose Ms. Peabody observes her new student, Beth, playing alone. This single day of isolated play will probably not raise any red flags, since Beth was familiarizing herself with her new environment. On Beth’s second day, however, Ms. Peabody notices that she is again playing alone. After seeing the behavior repeated, Ms. Peabody is now more concerned and begins to purposefully look for interaction patterns. In recording the types of interactions, their length, and the types of activities during which peer interactions occur, Ms. Peabody discovers that Beth rarely sustains joint attention with peers for more than 30 seconds and exhibits a pattern of disengagement as soon as other children take their turn in play.

The type of anecdotal or descriptive data Ms. Peabody collected is referred to as qualitative data. It is possible to be very descriptive about children’s behavior using qualitative data and to consider factors that might not reveal themselves if simply looking at numbers. That is, an observer of this same child might collect data by counting the number of times she initiates contact with another child. By only counting the number of interactions, this observer might miss the tendency of the child to make contact when there is only one other child nearby.

http://www.pearsonclinical.com/language/products/100000338/expressive-and-receptive-one-word-picture-vocabulary-tests-fourth-edition-rowpvt-4-eowpvt-4.html


On the other hand, an observer collecting only qualitative data may not realize that there is a 10% increase in the number of contacts made each week. The observer who counted initiations is collecting quantitative data. So, when data give rich information about the quality of behavior, the data collection is qualitative; when precise information about the quantity of behaviors is collected, the data are quantitative. Table 1.1 compares the two types of data.

Table 1.1: Comparison of qualitative and quantitative data

| Qualitative data | Quantitative data |
| --- | --- |
| Sam calms down when I give him a texturized toy to hold. | Ari named 27 animals in the zoo book. |
| Tarik will work until she has found the right place for a puzzle piece before moving on to the next piece. | Justice asks for help an average of 17 times per day. |
| Rocky will count to 20 one day, but then he only counts to 5 the next. | Megan’s standing jump was 2.6 feet. |
| Lilith will not go outside unless accompanied by an adult. | Kori named 15 letters with 90% accuracy. |
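To make the quantitative side of this comparison concrete, the short Python sketch below shows how a weekly tally of peer contacts could be turned into a week-to-week percentage change, the kind of trend a purely descriptive record might miss. The function name `weekly_percent_change` and the weekly counts are hypothetical, invented only for illustration; they are not data from the classroom examples above.

```python
def weekly_percent_change(counts):
    """Return the percent change between each pair of consecutive weekly counts."""
    changes = []
    for previous, current in zip(counts, counts[1:]):
        if previous == 0:
            # Percent change is undefined when the earlier week had no contacts.
            changes.append(None)
        else:
            changes.append(round((current - previous) / previous * 100, 1))
    return changes


# Hypothetical tallies of child-initiated contacts over four weeks.
contacts_per_week = [20, 22, 24, 27]
print(weekly_percent_change(contacts_per_week))
```

A tally like this complements, rather than replaces, the written observations: the numbers show how much contact is changing, while the descriptions show under what conditions it happens.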

Formative and Summative Assessments

There are two major types of educational assessment: formative and summative. Educators use many kinds of formative assessments on a daily basis. The term formative explains its purpose: a continuous gathering of evidence as children’s knowledge and skills are formed, in order to make adjustments in intervention as needed. These tools provide ongoing information regarding the speed and accuracy of student progress. Formative assessments include but are not limited to systematic and informal observations of students, skills checklists and rubrics, tests, and the products of daily activities.

Although each of these tools and many other useful formative measures will be discussed thoroughly in later chapters, it is worth noting that of the two types of assessment, formative assessment is the most important for improving the quality of education. As evidence of this, Australian educational researcher John Hattie (2009) compiled and synthesized decades of research and ranked educational practices according to their relative effect on student achievement. Of the 138 factors ranked (which included teacher preparation, socioeconomic status, typical multimonth summer breaks, homework, and parental involvement), conducting formative evaluations ranked 3rd and providing feedback (an aspect of formative evaluation) ranked 10th. Moreover, when teachers use data to provide formative feedback, their impact on student achievement is even higher. An example of putting the two together would be, “Evan, your castle is so creative (feedback); before you had one main gate to get in and out, but with your tunnels, your knights now have a way to get in and out without being seen by the dragon (formative).”

Summative is a term that also describes its purpose: to sum up learning. Rather than being conducted on an ongoing basis, summative assessments take place at the end of learning. Learning may take place over the course of a unit (for example, forest animals); a regular time frame (for example, beginning, middle, and end of the year); or a treatment (for example, mental health intervention); after which assessment is used to determine if or how much learning took place. Typically, summative tools are used to make important programmatic decisions about a child, a classroom, or a program. For example, when a speech therapist wants to know how much language progress a toddler has made from a home intervention over a 6-month period, he or she measures growth by administering a summative assessment. Common forms of summative evaluations include curriculum and standards-based assessments, standardized ability or achievement tests, and screening tools.

Types of Summative Assessments

The best known type of summative tool is a standardized assessment. Standardized assessments include tests used for screening, diagnostics, and ability testing. Each of these types of assessment has one thing in common: a standard scale against which individuals taking the test are compared. A scale is a range of scores from low to high (for example, 1–10 or 1–100). For a standardized test, the scale is developed by collecting data from a large sample of individuals who are believed to represent the characteristics of those who will be taking the test. The scale establishes norms, or the range of scores that represent “normal” or average behavior for a particular age or grade level. Those individuals with scores that do not fall within the normal range on the standard scale are more rare and may be considered special, either because they score unusually high or unusually low.

Two important technical features that distinguish useful tests from those whose results should be used cautiously (if at all) are validity and reliability. Although a more detailed explanation of these two qualities will take place in a later chapter, in short a test is considered valid if it tests what it is supposed to test. For example, a teacher wishing to measure children’s early number sense would want an assessment that measures understanding of counting, number names and order, and relative amounts. If the assessment measures addition and subtraction, this would not be a valid test for number sense. A mathematics test may also be rendered invalid by designing problem-solving items that require reading from children who are not good readers. The reading requirement makes the test invalid for any child who has difficulty reading but might otherwise be able to solve a mathematics problem. If a test is not valid, it is not useful.

A second feature of standardized tests is reliability. A reliable test is one whose results can be depended on to yield a similar score for a child every time it is administered. Therefore, reliability refers to the degree to which an assessment consistently measures targeted behaviors. If a test is not reliable or dependable, it is not useful.

Screening tools are standardized assessments used to quickly and easily identify children who may be special in some way. Often used with large groups of children, screening instruments may also be administered individually. When a professional wishes to determine if children within a group or individual children are at risk of developmental, health, or achievement delays, a screening tool provides an indicator, or “red flag,” without having to conduct time-consuming and expensive comprehensive assessments. If children fall far below the norms on a screening (in developmental areas such as emotional development, visual acuity, motor skills, and language competencies), then more comprehensive assessments might be conducted. These quick and easy assessments are not comprehensive and consequently tend to be less valid and reliable. This makes their results unsuitable for important decision making. However, because of their ease of administration, they are quite useful for identifying those children to whom greater attention should be paid.

Curriculum-referenced assessments may actually be formative or summative, since ongoing assessments monitor progress within a curriculum as well as children’s overall progress at natural transitions, such as the end of a unit or the year. For example, suppose Mr. Cruella, a kindergarten teacher, uses a curriculum titled Let’s Begin with the Letter People for literacy instruction, which has built-in assessments that he can use on a daily basis. Mr. Cruella also periodically compares children’s progress to standards for early literacy to determine if children are below, at, or above state kindergarten benchmarks. This may be accomplished by using a range of standardized curriculum-referenced tools such as Aimsweb, which can be conducted quarterly for all children and weekly for children who are at risk and receiving supplemental instruction.

Challenge

Check the column(s) where the descriptor on the left is a feature of one or both of the two types of assessment.

| Description | Formative assessment | Summative assessment |
| --- | --- | --- |
| Used infrequently | | |
| Used frequently | | |
| Observation | | |
| Checklists | | |
| Rubrics | | |
| Tests | | |
| Child work products | | |
| Standardized assessments | | |
| Screening | | |
| Ability or achievement tests | | |
| Curriculum- or standards-based assessments | | |

Refer to the Appendix for the answers.

http://ies.ed.gov/ncee/wwc/interventionreport.aspx?sid=271

http://www.aimsweb.com


1.3 The Assessment-Learning Process

There are many models of the assessment-learning process, and they all have at least three stages: assessment, planning, and teaching. Each stage in the process leads to another and eventually back to the first, where the cycle begins again. Another way to consider the assessment-learning process is a spiral in which assessment, planning, and teaching are progressive (see Figure 1.3). One important thing to note about the cycle is that it begins and ends with assessment. In between assessment there is continuous planning, teaching, and learning. Throughout the spiraling process, there are many points at which data are used to keep children on the learning pathway.

Figure 1.3: Spiraling formative assessment process

Although it may appear that assessment and teaching cycle, there is actually a spiral effect, whereby each assessment should reflect more advanced learning. Teaching plans then build on student growth, and learning advances even further, as reflected in assessment. This process is dynamic, with advances in learning influencing what is assessed, and assessment influencing more advanced teaching, and so on.

[Figure: a spiral passing repeatedly through Assessment, Planning, Teaching, and Learning.]

Knowing Where to Begin

Children of the same age or grade level do not develop at the same rate. Therefore, a one-size-fits-all approach to instruction is not usually effective, and sometimes a teacher may unknowingly proceed with instruction that is not working for any children. Before teaching begins, it is essential that educators know where to start instruction. If instruction begins with a skill a student already possesses, time is wasted. At the same time, beginning instruction for an advanced skill when a child does not possess prerequisite skills can be frustrating for both the student and the teacher. Finding the point at which a child is most ready to learn offers the most promising educational beginning (Trumble, 2013).

In order to find the path’s best starting point, it is necessary to carefully measure the behavior or traits under consideration. Instructional efficiency is often described by three levels of learning: frustration, acquisition, and mastery (Alberto & Troutman, 2012). Children experience a high degree of frustration when instruction is such that they make more than 30% errors, and most behaviors are considered mastered when children make fewer than 10% errors. In both cases instruction is less efficient than it could be. In the first case children make too many errors and can become easily confused or give up; in the latter case children are spending valuable instructional time practicing skills they already know. So the beginning point should be somewhere between 70% and 90% accuracy, where a child is growing while experiencing a good deal of encouraging success.
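The three levels can be expressed as a simple rule applied to a child's responses on a probe. The Python sketch below is only an illustration of the thresholds stated above (more than 30% errors for frustration, fewer than 10% errors for mastery); the function name `instructional_level` and the sample scores are hypothetical, and how to treat a score that falls exactly on a boundary is an assumption made here, not something the source specifies.

```python
def instructional_level(correct, attempts):
    """Classify a probe result as 'frustration', 'acquisition', or 'mastery'."""
    if attempts <= 0 or not 0 <= correct <= attempts:
        raise ValueError("need attempts > 0 and 0 <= correct <= attempts")
    errors = attempts - correct
    # Compare with integer arithmetic to avoid floating-point rounding at the boundaries.
    if errors * 100 > 30 * attempts:      # more than 30% errors
        return "frustration"
    if errors * 100 < 10 * attempts:      # fewer than 10% errors
        return "mastery"
    # Assumption: exactly 10% or exactly 30% errors counts as acquisition.
    return "acquisition"


# Hypothetical probe results: (correct responses, total attempts).
for correct, attempts in [(6, 10), (8, 10), (19, 20)]:
    print(correct, attempts, instructional_level(correct, attempts))
```

A result in the acquisition range suggests the skill is a reasonable place to begin instruction; results in the other two ranges suggest choosing an easier or a more advanced skill.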

The following example illustrates the importance of finding the right beginning point. In an Early Head Start class, 2-year-old Imelda can run and jump faster and farther than anyone else in the class, but she still primarily uses one-word phrases. Jose, who is the same age, uses three- to four-word sentences and issues narratives that would be more typical of 4-year-olds. However, Jose does not follow instructions well, and he frequently shows his frustration by throwing objects.

From this scenario, we can see that Imelda and Jose have different learning priorities. We can also see that some level of assessment has been conducted to determine present performance. This information establishes the beginning point of the children’s respective learning paths. Knowing where Jose is developmentally enables a teacher to plan intervention to improve his listening and compliance skills. Other strategies will be designed to make the most of and enrich his precocious language. Imelda’s teachers, on the other hand, will focus on language development. They will try to engage her in naturally occurring social interactions in which progressively more complicated phrases are modeled and expected (for example, when Imelda wants a toy, she will be asked to name it with two words before the toy is given to her). Her teachers will also take advantage of Imelda’s physical prowess to teach two- to three-word phrases by modeling, “Imelda is jumping up,” and then asking, “What is Imelda doing?”

Determining the Best Path to Follow

Once educators know where to begin, they can plot a path or make a plan. This stage of instruction may be referred to as goal setting. It is not possible to know what to teach unless it can be known where the teaching will lead. Once teachers know where a child is starting, they can use assessment data to determine steps that will lead toward the child’s mastery of the behavior or skill. The initial assessment to determine where to start the learning process will also provide information that helps educators use particular learning activities, teaching strategies, and the child’s interests to build the path. Educators also have the option to use the assessment information and compare it with developmental benchmarks that are linked directly to a curriculum, which will provide a direct path toward an established goal. Table 1.2 provides an example of benchmarks and corresponding objectives and goals that can be used to outline a learning path.


Table 1.2: Benchmarks and goals

| | Imelda | Jose |
| --- | --- | --- |
| Benchmark | Uses 3- to 4-word sentences with a verb and a noun; uses approximately 400 different words | Uses words to get what he wants; listens to and follows simple instructions |
| Goal | In naturally occurring interactions, will use 2- to 3-word phrases | When asked, will follow 1- to 2-step instructions as requested |
| Mastery achieved (end of path) | By age 3, expressive vocabulary of 400 words; 3-word phrases in 80% of conversational turns | Completes request within 10 seconds, 4 out of 5 times |

Source: Coffey, Guttentag, Johnson, Landry, & Zucker, 2012.
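A mastery criterion such as Jose's ("within 10 seconds, 4 out of 5 times") is specific enough to be checked mechanically against recorded trials. The Python sketch below is one hedged way to do so; the function name `meets_criterion`, the choice to look only at the five most recent trials, and the sample response times are all assumptions made for illustration and do not come from the source table.

```python
def meets_criterion(latencies, max_seconds=10, required=4, window=5):
    """Check whether enough of the most recent trials were completed in time.

    latencies: seconds taken to complete each request, oldest first;
               None marks a request that was not completed.
    """
    recent = latencies[-window:]
    if len(recent) < window:
        # Not enough trials recorded yet to judge mastery.
        return False
    successes = sum(1 for t in recent if t is not None and t <= max_seconds)
    return successes >= required


# Hypothetical response times (in seconds) for five consecutive requests.
print(meets_criterion([8, 12, 6, 9, 7]))     # four of five within 10 seconds
print(meets_criterion([8, None, 15, 9, 7]))  # only three of five within 10 seconds
```

Writing the criterion this explicitly makes it easier for everyone working with a child to agree on when the end of the path has been reached.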

Knowing When the Path Is Complete

As mentioned, educators would not want to teach a child something the child already knows, nor would they want to stop teaching before mastery is fully achieved. By regularly assessing student progress against the goals they have established, educators know when the end of the path has been reached. The most salient reason for assessing young children is to answer the question, “Has the child’s learning advanced?” There are several ways to determine this.

First, a teacher may compare a child’s initial performance with his or her present performance. For example, if a child was using two-word phrases in November and four-word sentences in June, a teacher may have confidence that he or she played a hand in helping the child progress. Another approach to determining progress would be to compare the child’s performance to other children of the same age. If the child is performing at about the same level or better than other children, a teacher may be satisfied that the provided learning opportunities kept the child on track.

Even if a student improved measurably, a teacher should also ask, “Yes, but was it enough?” From the beginning of intervention until assessment is completed, a child may have made progress, but a professional will want to know if that progress is sufficient when compared to the growth that is expected. Another dimension of assessment is to regularly compare a child’s progress on class activities to grade-level expectations established by the curriculum. For example, it constitutes growth if Imelda has advanced from 30 to 100 words by the time she is age 3, even though she will still be far behind norms established for 3-year-olds.

Knowing When a Different Path Is Needed

Children develop and learn at different rates; some require more or different intervention than others. Children who are precocious should receive intervention that challenges them socially and cognitively. On the other hand, children with disabilities may require a range of services that typically developing children do not need.

The Individuals With Disabilities Education Improvement Act of 2004 (IDEIA; http://www.parentcenterhub.org/repository/idea/) requires states to provide intervention and services to children with disabilities from birth through age 21. However, in order for children to be eligible for services, they must meet certain state criteria. As an example, for 10-year-old Ryan to qualify for services under the category of intellectual disability, he must have a significant deficit in intellectual and adaptive abilities (practical skills such as self-care and social behaviors). Intellectual disability is determined by administering both an IQ test and a measure of adaptive behavior. Scores on these two tests must be very far below average for Ryan to qualify for special education under this category. Either way, Ryan will follow a slightly different path than other children his age, one that is appropriate for his educational needs.

Children who are more advanced than others their age may benefit from enrichment services. It is inappropriate to assume that highly capable children will learn regardless of their experiences; these children need to be challenged and encouraged from an early age in the same way that children who are delayed need individualized attention (Subotnik, Olszewski-Kubilius, & Worrell, 2011). For example, suppose Kestrel is a bright 3-year-old whose parents are college educated; she has advantages at home that need to be supplemented in early learning settings. Unlike Ryan, however, Kestrel will probably not be entered into a special educational program, since very few states fund early childhood gifted programs (Coleman, Gallagher, & Job, 2012). The gifted field has recently been evolving past the notion that gifted means “high IQ” and toward a focus on talents such as creativity, critical thinking, and music (Pfeiffer, 2012). In the past, using assessments to classify children as gifted was fraught with error and controversy. This change in focus may result in clearer definitions, more accurate identification through assessment, and greater availability of services. Until then, Kestrel may receive the specialized opportunities she needs from responsive and creative professionals within programs that currently exist.

Knowing If the Pace of Learning Is Appropriate

Although all the purposes of assessment are important, some are more important than others. Just because a child is put on an appropriate path, he or she will not necessarily find the most direct route to its end. If a child’s route is erratic and continually derailed, the journey will be very long (see Figure 1.4). Therefore, regular and continuous progress assessment is needed to ensure a child stays the course. If a child’s progress slows or digresses, or if there are many errors, changes should be made to redirect the child. These may include trying different activities or making changes to modeling, scheduling, prompting, and feedback.

Figure 1.4: Two paths to achievement

The erratic line illustrates instruction with no formative assessment guidance, and the straight line represents a direct path to mastery guided by formative assessment. To ensure that learning is most efficient (straight line), teachers should collect and use formative assessments.

[Figure: two roads, each running from “Beginning of the road” to “End of the road”: an erratic line labeled Opinion, Anecdote Path and a straight line labeled Formative Assessment Path.]


There are approximately 1,900 days from the time a child is born until the day he or she enters kindergarten. Every day counts in terms of making the most of a child’s early childhood (Coffey et al., 2012). Generally speaking, the more frequent the assessment, the more responsive a professional may be to delays or diversions in learning pathways where children fail to make progress. A direct path toward learning mastery is the most efficient, and ongoing formative assessment increases the chances that progress will be direct. When delays are detected early through formative assessment, a professional may respond quickly to put the child back on a straight path to success.

Knowing If Pathways Intersect

It is important to know what, if any, conditions interfere with children’s progress. For example, suppose that through daily probes, Aisha’s teacher notices that she has stopped making progress on letter sounds after learning the first 14 in the letter skill sequence. Her teacher also notices that Aisha has difficulty paying attention in small groups for more than 1 or 2 minutes. After assessing Aisha’s ability to stay focused on a task, her teacher concludes that an intervention designed to improve task attention would help Aisha learn letter sounds more efficiently. In this way, assessments are used to find the intersection between the academic learning pathway and social behavior pathway. By connecting the two through assessment, Aisha’s teacher can align the pathways to work together.

Comparing One Child’s Path to Another’s

Some of the most common types of educational comparisons involve achievement, intelligence, language, and motor development. In comparing a child to what would be expected based on the child’s age or grade, it is possible to determine if he or she is ahead or behind (and if so, how far). Group comparisons are also made; one school’s performance can be compared to that of other schools, and one state’s performance can be compared to that of other states. School districts often rank the quality of each school in the district, from best to worst, according to comparisons of overall achievement of students within each school. States are similarly ranked based on achievement in the arts, reading, science, writing, mathematics, and other areas (National Assessment of Educational Progress, 2014).

Such comparisons are useful for several reasons. For example, a school district might decide that a lagging school should receive additional resources and support in order to raise the level of achievement. As mentioned earlier, accountability means that educators and educational programs must prove that they meet the public’s expectation of quality by performing well compared to past performance and compared to the performance of children in other schools, school districts, states, and nations. For individual children, knowing that a child or group of children trails their same-age or same-grade peers should trigger the investment of different or additional resources to try to close the achievement gap.

Knowing If the Path Is Being Properly Tended

Those responsible for executing an intervention plan also benefit from external evaluation. Teachers, therapists, paraprofessionals, program administrators, and even parents are responsible for the integrity of services delivered to children.

Increasingly, school reform demands that early childhood education (ECE) personnel become more accountable for children’s performance on high-stakes testing. In some states teacher employment, advancement, and merit pay are linked to student achievement gains (Corcoran, 2010). This direction in educational practice may or may not improve the quality of teaching performance, and there are other more effective and less threatening approaches to professional development based on more direct teacher evaluation and feedback. A quality evaluation is one form of continuous teaching improvement (Danielson, 2011).

In the same way, educators may serve as coaches who have the goal of improving parents’ self-efficacy in meeting their children’s needs (Moore, Barton, & Chironis, 2013). As with the evaluation, planning, and teaching cycle described earlier, continuous assessment of parents’ success ensures that parent coaches provide targeted mentoring. It should be noted, however, that parents are voluntary partners; assessment of their progress should be more informal than that used to evaluate teachers. The assessment should be adapted for different parents and their interests and used only for the purpose of helping families achieve their own objectives.

For example, a professional might explain to Juan’s parents that his language development is high and that it might be helpful to ask him to “use his words” to explain when he is frustrated. The professional may offer, “If you like, I can show you how this would work” or “Would you like more information on this strategy?” Even better, in demonstrating the partnership, this professional could say, “This might be more difficult for some children who are not as verbal as Juan. Here is what I am going to try in class whenever Juan shows frustration by throwing something. Because I know he cannot listen when he is upset, I will show him how to take a couple of deep breaths and wait for him to calm down. Then I will ask him to use his words to tell me what he needs. I will tell him that after he puts the toy back, we can take care of his need, showing Juan that using his words will work better than throwing. If you try this at home, too, we can compare how well it works.” This last communication focuses on Juan’s positive abilities and is followed by a message that the parent and educator are partners. In this way assessing parent success can be authentic and nonintrusive.

Knowing If Programs Are Effective

Educational organizations regularly conduct self-assessment to determine if organizational practices and outcomes align with their missions. Mission statements are often further defined by organization goals, and it is progress toward these goals that lets organizations know if they are on the right road. To do this, many forms of evidence are needed: individual and classroom progress data, teacher evaluations, family involvement, organization management, and fiscal information (Bernhardt, 2013). As an example, suppose Play A-Long Academy is a private preschool that caters to affluent families that have choices about where they send their children. Parents want to know that the program is accredited by the National Association for the Education of Young Children (NAEYC), the foremost professional body in early childhood education.

To stay accredited, Play A-Long Academy conducts ongoing assessments to ensure it is always in compliance with NAEYC standards (http://www.naeyc.org/academy/). Children are assessed on progress toward benchmarks; teachers are observed and assessed by administrators; and families and communities are invited to provide input on program quality. An aggregation and analysis of all these data help Play A-Long Academy continuously improve itself. If the information gathered suggests that one or more important program areas are weak, such as home–school collaboration or early academic skills development, Play A-Long will be able to respond to the assessment data and work to improve these areas to maintain its accreditation with the NAEYC.

Reflection

What skills might a parent coach need in order to successfully work with families?

Challenge

Draw a picture of a learning pathway. Along the road, illustrate all of the ways in which assessment can help children get from the beginning to the end of their path.

1.4 General Principles of Assessment

In addition to using quality individual assessment tools that are valid and reliable, it is also important to consider the process, that is, how assessment is conducted. To derive the greatest benefit from assessment tools and the assessment process, several general principles apply. Evidence of best practices in assessment provides the basis for these principles. The principles described below apply to both formative and summative evaluation (Shultz, Whitney, & Zickar, 2013; Spodek & Saracho, 2014).

Use Multiple Measurements

No single measure of a child is sufficient for important decision making, whether the decision pertains to everyday teaching or long-term decisions regarding eligibility for services. For example, standardized tests are fallible for three reasons:

1. They measure only a snapshot of time. What if that period is influenced by whether the child had breakfast, was ill, or was anxious because his or her father was taken to jail the night before? This small slice may be misleading and cannot provide a comprehensive picture of a child.

2. Very few standardized tests have strong reliability and validity. In other words, most tests are not very good at their jobs.

3. It is difficult to design a standardized test that represents all the possible children who might be taking the test. If, for example, a test did not include children with disabilities in its norming (and few do), then it should not be given to children with disabilities.

Formative assessments are just as problematic—perhaps more so, because there is generally no effort made to establish whether they are valid and reliable. Nor can these tests be used to compare children to each other, since they are not normed. In addition, formative assessments are just as vulnerable to testing bias, since they are usually not developed to represent children of different races, cultures, languages, geographic origin, and ability. However, because teachers tend to conduct multiple formative assessments of student performance, these weaknesses may be overcome. The way to improve the accuracy and usefulness of evidence used to make decisions is to collect and evaluate multiple assessment sources, both summative and formative. Combining summative and formative assessment results is an excellent way to fully understand children.


Consider Your Sources

In the same way that one assessment tool is unlikely to yield a complete picture of a child, a single assessor is unlikely to provide a thorough and objective assessment profile. When serious concerns arise, it is helpful to gather information from all relevant sources. Possible candidates for an assessment team include children, educators, psychologists and behavior specialists, speech therapists and other related service providers, administrators, and—most importantly—parents. Parents and other important caregivers are essential for several reasons. These are typically the most informed and invested members of a team. Although professionals have a different kind of understanding of early childhood education, in general, the experts on individual children are their parents and caregivers. Therefore, parents are a key source of rich and contextualized information about individual children and how effectively a program serves them. An assessment profile created by all relevant entities will unveil the most complete and objective understanding of a child.

Use the Right Assessment for the Job

Selecting an assessment that will do what is needed is critical to the assessment-teaching relationship. In making the selection, the first question to ask is, “What decision is being made?” and then, “What information is needed to make this decision?” Finding the tool that will answer the question, “Did the children learn what I thought they were going to learn?” requires carefully considering the learning-path objectives. If a teacher wants a child to learn to write a short paragraph, it must be clear what the teacher expects from students prior to assessing the work. Does the teacher want creativity? Accurate facts? Topic and summary sentences? Correct spelling? More challenging than determining expectations is determining how best to measure the expectations. For example, creativity may be assessed by counting the number of new ideas or elaborate explanations, or by rating the degree of creativity on a scoring rubric.

We can think of formative assessment as a tool for answering shorter term questions (What do I do today for this child?) and summative evaluations for answering longer term questions (What decision will be best for this child’s future?). For a long time many educators have depended too heavily on standardized tools for decision making (Kohn, 2000). As a result of this overdependence, children of color are much more likely to be identified as at risk and with disabilities than White children (Jencks & Phillips, 2011). Although standardized assessments are still useful for some decisions, there is a growing understanding that the right tool for educational decision making is usually one that is ongoing in nature, is responsive to changes in children, and assesses whether children are making progress according to individual learning objectives and learning standards; in other words, assessments that are authentic (Brassard & Boehm, 2011).

Do It Right

Standardized tests not only have standard scores, but are administered using standard procedures. Every time a certain test is given, the instructions are delivered using the same words. Participants respond in the same mode (such as orally or in writing). Time limits are imposed and answers are scored according to strict guidelines. Adhering to these administrative protocols is essential to maintain test validity. If a test is given in one way to one student and another way to another student, the two students’ scores cannot be compared; one may have had an advantage. When tests are given in the same prescribed way each time, there is said to be administrative fidelity.

Ensuring that assessments are conducted in a way that will yield the most accurate outcomes is important for all forms of measurement. One of the most serious issues in assessment fidelity is assessor objectivity, or the ability to see things as they are (Reed & Sturges, 2013). Over time, humans tend to see the behavior of others through a filter that builds without their awareness. Based on experiences with those individuals, this subjective filter may embellish or mute sensitivity to performance. In order to maintain measurement fidelity, it is helpful to have a second person observe the same behaviors or rescore a work sample.
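One simple way to check two observers against each other is to compute how often their records match. The sketch below is illustrative only: the function name, the interval-by-interval recording format, and the data are hypothetical, and the text does not prescribe any particular agreement statistic.

```python
def percent_agreement(observer_a, observer_b):
    """Return the share of observation intervals on which two observers agree."""
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must score the same intervals.")
    if not observer_a:
        raise ValueError("At least one interval is required.")
    matches = sum(1 for a, b in zip(observer_a, observer_b) if a == b)
    return matches / len(observer_a)


# Hypothetical data: True means the target behavior was recorded in that interval.
teacher = [True, False, True, True, False, True, False, False, True, True]
second_observer = [True, False, False, True, False, True, False, True, True, True]

agreement = percent_agreement(teacher, second_observer)
print(f"Agreement: {agreement:.0%}")  # the two records match on 8 of 10 intervals
```

Low agreement does not show which observer is right; it only signals that the observation procedure or the observers' judgments need a closer look.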

To illustrate, suppose Bella is an infant specialist who recently earned her bachelor’s degree. She takes her first job on a Native American reservation in Montana because the state offers to pay her student loan debt for this service. She is eager and willing when she takes the position, but after a while she begins to feel isolated by her cultural differences and her lack of knowledge about Native American culture. She frequently consults her developmental milestones checklist to assess the infants across developmental domains. Over time it seems that the children with whom she works are more frequently and more significantly delayed. Due to her surface understanding of the population and her own growing anxiety, she attributes changing child outcomes to poverty, family chaos, and substance abuse. However, in reality the children are no different; it is Bella’s filter that has changed. Her experiences influenced her observations; they impacted her ability to assess children objectively and maintain measurement fidelity.

Do It for the Right Reasons

The most important reason to undertake assessment is to improve the quality of instruction. An educator whose motivation lies elsewhere should question whether the assessment should be used at all. For example, educators often give assignments or tests in order to assign a student a grade. If these assignments are not assessed in a way that further informs student understanding or informs a teacher about the degree to which a child is mastering a concept, then the quality of instruction is not advanced.

Assessment Bias

Assessments are vulnerable to several biases, such as cultural and language bias. Therefore, educators must be ever vigilant to reduce this bias. Many standardized tests, for example, are not designed for children with disabilities or those whose first language is not English. Similarly, assessments designed by professionals often introduce bias inadvertently. If children are unable to demonstrate their ability because their first language is not English, an assessment delivered in English will not serve them. Over time repeated underestimations of children’s ability can have serious consequences. They can result in lowered expectations, erroneous recommendations for special education, and lifelong consequences of underachievement. Therefore, when necessary, professionals must alter assessment procedures to permit students to work around conditions such as native language, vision or hearing impairments, learning disabilities, attention-deficit/hyperactivity disorder (ADHD), autism, and emotional and mental health disabilities.

The exception to such alterations is when using a standardized test protocol that requires “sameness.” Even then, there are times when standard protocols unfairly measure student progress because of language or disability, and alterations in testing procedure are both ethical and permissible. Again, the point of assessment is to improve instruction, and if the results fail to accurately measure a student’s ability because of a language, experience, or disability barrier, educators are likely to take wrong turns.

http://www.preschoollearningcenter.org/images/upload/developmental_checklist.pdf

Share Outcomes With Parents

Assessment outcomes are critical to effective teaching and are also required as part of a program or by state and federal policies. However, the most important stakeholders in the assessment of children’s progress are parents. As mentioned, parents are essential sources of assessment information. Likewise, parents are key recipients of assessment information. After all, the people most invested in educational outcomes are those who are committed to raising a child. Through regular communication, families may use learning updates from teachers to inform their activities at home, thereby supplementing instruction. Building parent–educator partnerships is particularly important for children who are at risk (De Carvalho, 2014).

All parents like to hear that their children are thriving, so sharing stories of success as well as quantitative outcomes that illustrate progress can only improve the quality of relationships between educators and families. However, many parents of children who are challenged report that they only hear negative reports from school officials (Laivani, 2012). Most parents prefer to be in the loop when there are problems, but parents are likely to disengage if they tend to only hear negative stories about their child. At the very least, such parents will experience anxiety and become likely to avoid contact with educational personnel (Zeitlin & Curcic, 2014). Therefore, balancing communication so that families hear about progress (even if it is slow) as well as information about challenges will help the two parties work together.

Sharing assessment information among team members is also a way to maximize the power of data. A child study team is one name for the collaboration that takes place between families and professionals around the needs of a child. Members of a child study team could include an educator, parents, a mental health care or social worker, a physical or occupational therapist, a speech and language specialist, a psychologist, and an administrator. Teachers who serve groups of children in a classroom or another setting are also surrounded by professionals who work closely together, which may include other classroom professionals, parents, physical educators, librarians, arts specialists, administrators, and others. Regularly communicating assessment information with team members improves instruction and allows professionals to develop a deeper understanding of the children they serve.

FERPA

Though educators are encouraged to share assessment information with parents, there are limits on what can be shared and with whom. The Family Educational Rights and Privacy Act (FERPA), also known as the Buckley Amendment, governs the sharing of educational information. FERPA has two important tenets with which educators should be familiar (Records, 2012). First, educational programs must permit parents to review any of their child’s records, including test results, evaluations completed by educators or other personnel, and intervention plans.

The second key element of FERPA is that it prohibits the release of any personally identifiable records to unauthorized persons. Records—including assessment information gathered by a teacher—may be shared with others within a school if such persons have a legitimate need to know. Beyond what is needed to serve a child well, however, otherwise confidential information may not be shared. Educators have a legal and moral obligation to refrain from casual, gossipy, or any other way of disclosing assessment information without carefully considering its legitimacy.

Challenge Fill in the blanks with the assessment term that best fits.

A 2-year-old boy named Tinker was enrolled in Early Head Start. In 2 weeks it became apparent that Tinker was not using words to express his needs. Mr. Joseph, the teacher, was worried that this might be a sign of cognitive delay, so he asked the program psychologist to administer a standardized intelligence test for toddlers. Tinker’s score was right where it should be for a 2-year-old, and Mr. Joseph realized he had used the wrong ______. He needed to understand Tinker better, so Mr. Joseph watched Tinker play at the sand table and collected observational data. During this observation, it was clear that Tinker was a good problem solver, and though he did not talk to his peers, he did seem to enjoy being around them. He found that Tinker used one-word phrases 25% of the time and gestures to communicate 80% of the time. By collecting ______ measurement, Mr. Joseph now had a better understanding of Tinker.

Mr. Joseph now believed that Tinker had an expressive but not receptive language delay, and he wanted to find out more. He searched online and found a quick (free) early childhood language assessment, which was a checklist that he filled out and scored for Tinker. Based on this assessment and his observations, Mr. Joseph planned a language intervention for Tinker. Quickly, though, he could see that his intervention was going nowhere.

When he discussed this with his program language therapist (according to ______, he could share these data with her since she had a legitimate right to review Tinker’s records), she asked what tool he had used to assess Tinker’s language. She rolled her eyes and said, “Look at the quality of that test, Mr. Joseph, it has poor ______ or consistency. It is really designed to measure receptive language, so it is not ______ for your purposes. Also, it was normed on children in Tahiti so it is ______.” Mr. Joseph felt a little dumb, then he brightened. “But what about my observational data? This is pretty good!” The language therapist said, “Did you double-check your first observation?” Mr. Joseph admitted he had not.

“Ok, then let’s collect some data together and see if we see the same behaviors you observed,” said the therapist.

The two collected data for 3 days and then compared their results. The language therapist’s results were consistent from day to day, but Mr. Joseph’s data were all over the place. It seemed that Mr. Joseph did not follow the same observational procedures, so there was no ______ to his assessment. Furthermore, when Mr. Joseph explained why he had scored certain communications the way he did, he seemed to be influenced by what he wanted to see more than what he actually saw; Mr. Joseph lacked ______.

While it was clear that Mr. Joseph’s heart was in the right place and that he was collecting data for the right reason—that is to ______—he required more intensive training to become a reliable assessor. The language therapist suggested that the first place to start in collecting information about Tinker was to visit with his ______.

Refer to the Appendix for the answers.

Section 1.5Current Trends in Assessment

1.5 Current Trends in Assessment

Perhaps the most visible and controversial term in education right now is assessment, or more accurately, testing. This trend has largely been driven by politics but is influenced by a number of other trends that are shaping the future of education. Understanding how assessment policies and practices are evolving for elementary and secondary education offers a better understanding of trends specific to early childhood education that will be discussed in Chapter 2.

Authentic Assessment

It is an interesting time in education, a time in which two major trends in assessment may be on divergent pathways. As mentioned earlier, heavy emphasis is placed on accountability using standardized tests. However, these tests are inauthentic in that they do not directly measure the skills and knowledge children gain from instruction within their classroom curricula. Since school curricula and instructional philosophies vary from state to state and district to district (if not teacher to teacher), what children learn in school does not consistently align with high-stakes assessment items. For example, because high-stakes testing in K–12 classrooms is completed on computers, skills that cannot be measured on a computer (e.g., social development, presentation, group, and long-term project development) cannot be assessed. To accurately measure what children are actually learning, educators need authentic assessments that are designed to align with the curriculum and instruction.

At the same time that schools are under intense pressure to prepare students to do well on these accountability-driven standardized tools, the value of authentic assessment methods to improve teaching and learning is increasingly being promoted. Authentic assessments involve assessing children in activity-based curricula and using the products of these activities to measure learning. Authentic measurements—such as teacher observations and student work samples—have the potential to supplement, if not supplant, more traditional assessments. An assumption of authentic assessment is that teachers get to know children deeply by using a variety of tools that measure progress while children are engaged in developmentally appropriate activities that are in turn designed to help children achieve learning objectives. Since the goal of education is to prepare students to master the skills that will enable them to thrive outside of school, it is believed that relevant or authentic activities increase the transfer of learning from school to the real world.

It stands to reason, then, that authentic assessment would measure the degree to which real-world skills are learned. Examples of such assessments include stories or journal entries (pictures and words), essays, scrapbooks, scale reproductions (such as of a volcano or a community), plays, surveys, photo essays, laboratory reports, murals, experiments, and exhibitions (Schurr, 2012). These examples make clear that the activity and its natural product are the assessment. Measuring not only what students know (product) but how they use what they know (process) is authentic assessment, providing the capacity to determine if both aspects of learning have been mastered in students’ work (Wiggins, 2011).

Common Core Standards

Standards-driven instruction has culminated in the Common Core Standards (CCS), which have now been adopted by 46 states (states opting out of CCS are Alaska, Texas, Virginia, and Nebraska). The CCS are a set of K–12 learning standards developed by four private organizations and later endorsed by the U.S. Department of Education (Ravitch, 2010). The intent of the CCS was to raise the bar nationally on academic achievement and postsecondary skills that are thought to relate to adult success in higher education and work (Porter, McMaken, Hwang, & Yang, 2011). A second purpose of the CCS was to provide a level playing field (common) for teaching foundational abilities (core) for students across the country so that expectations (standards) would be high for all children regardless of the state in which they reside. Raising the bar in itself has influenced assessment practices, since states have adopted national high-stakes tests aligned with the CCS such as Smarter Balanced assessments.

Even though they are widely adopted, the CCS have raised considerable controversy. Diane Ravitch, a former member of the Bush administration who helped write and advocate for NCLB, is a loud critic of the CCS (and implementation of NCLB). Her criticisms include the near complete exclusion of educators from development of the CCS; absence of any field testing to see if the standards had validity; very little public engagement in the development of the CCS; assumptions that are unfounded (such as that children will better learn critical thinking from informational text than fictional text); and the fact that some states actually had better and well-validated tests that were replaced by the CCS (for example, Maryland) (LaVenia, Cohen-Vogel, & Lang, 2015).

It is a misconception that the CCS were developed by the U.S. Department of Education, though one that is understandable, considering that the federal government made access to funding contingent on their adoption. In many states, adoption of the CCS has already resulted in large declines in student test scores, a factor related to higher dropout rates (Plunk, Tate, Bierut, & Grucza, 2014). Controversy related to the CCS is likely to continue, as their vision and implementation failed to include important stakeholders. For now, however, high-stakes assessment practices in the United States are largely driven by expectations of the CCS.

21st-Century Learning Skills

The CCS have changed the landscape in more ways than simply raising standards for achievement. In addition to emphasizing basic skills, the new standards embed priorities from 21st-century learning skills (Voogt & Roblin, 2012). Stakeholders from many fields, including business, politics, and education, believe that the 21st century is vastly different and changing so rapidly that adults must possess a set of skills that are more complex and flexible than those targeted by traditional curricula.

Indeed, the world around us is so mercurial that some believe the most important skill to teach is flexibility, since today’s educators cannot predict what skills children born today will need in order to thrive as adults (Carroll, 2013). Although the term has been used generally as well as explicitly by different groups, there is agreement that some of the most important 21st-century competencies are infused throughout the CCS, which include critical thinking or problem solving, creativity and innovation, digital literacy, teamwork, cultural competence, ethical and emotional awareness, and communication skills (Kereluik, Mishra, Fahnoe, & Terry, 2013). The CCS and 21st-century learning skills framework are both part of school reform efforts. Changing standards impose on teachers the dual challenges of finding ways to facilitate acquisition of these learning priorities and then finding ways to measure acquisition.

http://www.smarterbalanced.org/smarter-balanced-assessments/


Data-Driven Decision Making

Accompanying the standards-based high-stakes testing environment has been a broad trend toward data-driven decision making (DDDM), a model borrowed from the field of business (Marsh, Pane, & Hamilton, 2006). The DDDM movement has morphed into a healthier approach to assessment, which values formative data as much as standardized testing. The emphasis has shifted, if only to be more inclusive of other forms of data, from “‘assessment of learning’ to ‘assessment for learning’” (Mandinach & Jackson, 2012, p. 16).

This trend is a departure from past decades, in which data-driven practices were viewed as a burden, if not outright opposed to educational goals. Today, however, it is recognized that formative data increase objectivity and accountability due to the use of multiple measures that provide a more accurate and ongoing view of children’s progress; without data, decisions are based on opinion, anecdotes, and judgments, and “that is no longer good enough” (Mandinach & Jackson, 2012, p. 28). DDDM aims for continuous improvement. To this end, it uses a variety of data-driven strategies, including:

• adapting instruction as needed to meet students’ different needs;
• forming and testing hypotheses about student needs and instructional solutions;
• using many sources of data for decision making;
• featuring progress monitoring data that includes actual student performance in the classroom as well as formal data (annual and intermediate standardized and curriculum-based benchmark results);
• ongoing modification of instructional tools based on outcome data;
• conducting analyses of error patterns as well as student mastery of skills to gain deep understanding of student learning;
• relying on not just test results but many different kinds of student work to monitor outcomes, including projects, homework, group work, and so on;
• attending to outcomes and learning patterns for all children, not just those with special needs or who are precocious; and
• using data teams to analyze the data, plan interventions, and monitor outcomes (Mandinach & Jackson, 2012).

Data Literacy

Before educators can be expected to effectively use data in decision making, new and different ways of preparing personnel to become data literate are needed. The current secretary of education, Arne Duncan, cautions that educators are not fully prepared to use DDDM effectively because they are not consistently prepared for data literacy or to make day-to-day data-driven teaching decisions (Duncan, 2012). Consequently, through the accrediting bodies for college teacher education programs (the National Association for the Education of Young Children for early childhood educator preparation programs and the Council for the Accreditation of Educator Preparation for P–12 educator preparation programs), greater pressure is being placed on personnel trainers to teach the data-based decision-making process (Mandinach & Gummer, 2013).

https://www.naeyc.org/

http://caepnet.org

Technology

Advances in technology affect every aspect of education, and assessment is no exception. Digital tools are now used to create and assess activities, and for collecting, aggregating, displaying, and interpreting data. Shute (2011) coined the term stealth assessment to refer to software-based assessment that is seamlessly integrated into the learning environment.

For example, video learning games that are both engaging and provide learning opportunities may also measure social skills, critical thinking, and tenacity. Games are programmed to assess isolated responses such as what kinds of decisions are made, how long a child will persist with a difficult problem, and content understanding (for example, mathematics, science). Physics Playground provides an example of stealth assessment, in which students’ skills are assessed as they attempt to resolve 80 different levels of physics problems. Machine-based reasoning (algorithms) in stealth assessment calculates progress toward success on learner objectives. These programs can also aid in assessment by drawing conclusions about learners that might be too complex or time-consuming for teachers to draw.

The best of these stealth assessment tools give immediate feedback and adjust learner activities based on assessment information. Harnessing the power of such programs is not only easier than traditional manual assessment and individually delivered feedback to children provided by professionals, but may be a more accurate and responsive way to formatively evaluate students for the purposes of enhancing learning. Tools such as electronic “all student response systems” automatically collect and summarize classroom and individual responses using a variety of input devices such as handheld clickers or laptops. Project-based technology programs introduce a way to monitor learning as it happens, instead of at the end stage (Marshall, Petrosino, & Martin, 2010). Technology, coupled with assessment, will simplify data input to help professionals cope with increased expectations for rigorous evaluation, analyze and summarize data in ways that make them more useful, and provide richer analysis of individual learning so that instruction can be targeted to children’s needs.

Student-Driven Assessment

Over the past 2 decades, there has been a growing acknowledgement of the benefit of involving students in their own learning. Transferring control from teacher to student ranges from offering choices in activities to comprehensive student-directed learning. Self-assessment and peer assessment are among the strategies used to engage students in ways that will develop good judgment, self-efficacy, and self-reflection—skills that help individuals become independent learners (Willey & Gardner, 2010). As it turns out, students, including very young ones, tend to quite accurately assess the quality of their work and their understanding of content (Falchikov, 2013). Since the primary purpose of evaluation is continuous improvement, it appears that shifting some of the responsibility to students has the potential to inform learning at the most fundamental level. Since no one has more investment in learning than students themselves, the more they discover about their own learning through self-assessment, the better they are able to attune to this purpose (Boud, 2013).

Peer-to-peer evaluations are also widely touted as a way to build collaborative skills for working with other children. However, peer evaluators are less accurate than self-evaluators (Strijbos & Sluijsmans, 2010). One study found that 80% of feedback received by elementary students came from peers—and that 80% of this feedback was erroneous (Nuthall, 2007). It may be that although peer-to-peer assessment can be an effective approach for learning, it requires careful training and oversight.

http://www.empiricalgames.org/projects.html


Response to Intervention

Categories of children with mild disabilities, including those with learning disabilities and mild intellectual disabilities, are overrepresented in groups of poor children and children of color (Zhang, Katsiyannis, & Roberts, 2014). It is believed that many of these children become eligible for special education because of a gradually widening gap created by experiential, social, financial, and cultural differences, rather than the existence of an actual disability. An additional structural problem in special education has been a decades-long dissatisfaction by professional educators with the process used to classify children under the learning disabilities category.

In this process, psychologists use an indirect route to determine if a child has a disability. They administer two tests: an achievement and an intelligence test. As discussed earlier, if a child scores significantly higher on the IQ test (potential to achieve) than the achievement test (actual achievement), the discrepancy is presumed to be the result of a disability. Educators have never fully embraced this method, since the logic of the discrepancy method does not describe the learning needs of children who are served under this category (Scanlon, 2013).

Response to intervention (RTI) is a term used to describe a process of identifying children who lag in social development, mathematics, or reading, and providing a tiered level of intervention to help these children reach age- or grade-level benchmarks. RTI provides a possible solution to the flawed process used to identify learning disabilities and an environmentally and educationally created failure to meet children’s developmental needs. The RTI model will be discussed further in subsequent chapters, but the widespread adoption of RTI in schools across the United States is a trend affecting assessment and is worth considering here.

RTI provides relatively short and moderate to intensive supplemental intervention for children as soon as they begin to fall behind their peers in reading, math, or social skills. The process calls for intermittent assessment of all children to determine if they are at or behind grade-based benchmarks. Once children fall below a certain threshold, they qualify for RTI services (many programs consider a 25% lag to be the point at which a child should receive RTI services).
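As a rough illustration of this screening step, the sketch below flags children whose scores fall a given fraction below a benchmark. The function names, the sample scores, and the definition of lag as (benchmark - score) / benchmark are assumptions made for illustration; programs define lag and cutoffs in their own ways, and the 25% default simply mirrors the threshold mentioned above.

```python
def lag_fraction(score, benchmark):
    """Return how far a score falls below the benchmark, as a fraction of the benchmark.

    Assumed definition for illustration: (benchmark - score) / benchmark,
    with scores at or above the benchmark counted as zero lag.
    """
    if benchmark <= 0:
        raise ValueError("benchmark must be positive")
    return max(0.0, (benchmark - score) / benchmark)


def qualifies_for_rti(score, benchmark, threshold=0.25):
    """True when the lag meets or exceeds the threshold (25% by default)."""
    return lag_fraction(score, benchmark) >= threshold


# Hypothetical screening results against a hypothetical benchmark of 40.
scores = {"Child A": 38, "Child B": 29, "Child C": 41}
benchmark = 40
flagged = [name for name, score in scores.items() if qualifies_for_rti(score, benchmark)]
print(flagged)
```

In this made-up example only Child B, whose score is more than 25% below the benchmark, would be referred for supplemental intervention.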

Once a student qualifies for RTI, educators establish learning goals for the student relative to learning benchmarks for their age or grade. Sometimes the benchmarks themselves are the goal, and sometimes a less aggressive goal is established if a child is far behind. More intensive intervention is provided, and the child’s progress is monitored closely and frequently. It is compared to a goal line on a chart to determine if intervention is helping narrow the learning gap (see Figure 1.5). Intervention continues until the learning goal is achieved and the child regains benchmark performance. If the gap does not close, teachers respond by altering intervention strategies and gradually increasing the intensity of intervention. If none of these data-based decisions result in significant improvement, or if the child does not respond to intervention, the educational team may conclude there is an actual learning disability.
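The comparison against a goal line can be sketched in a few lines of code. This is a minimal illustration, assuming a straight goal line from a baseline score to a target score and a simple decision rule (consider a change when the most recent consecutive scores all fall below the goal line). The scores, the rule, and all names are hypothetical and are not taken from a specific RTI program.

```python
def goal_line(baseline, target, weeks):
    """Expected score at the end of each week on a straight line from baseline to target."""
    step = (target - baseline) / weeks
    return [baseline + step * week for week in range(1, weeks + 1)]


def needs_change(actual, expected, run_length=3):
    """True when the last `run_length` recorded scores all fall below the goal line."""
    if len(actual) < run_length:
        return False
    recent = list(zip(actual, expected))[-run_length:]
    return all(score < goal for score, goal in recent)


# Hypothetical 5-week letter-naming goal: from 5 to 30 letters per minute.
expected = goal_line(baseline=5, target=30, weeks=5)
actual = [10, 15, 16, 17, 17]  # hypothetical weekly scores that plateau after week 2

# Weekly gap between the goal line and the child's actual score.
gaps = [goal - score for score, goal in zip(actual, expected)]
print(gaps)
print(needs_change(actual, expected))
```

With these invented scores the gap widens from week 3 onward, which is the kind of pattern that would prompt a teacher to alter or intensify the intervention.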

Figure 1.5: Example of RTI intervention chart

In this chart, the RTI teacher has set a 5-week goal for naming letters, against which progress will be measured. Each week, the RTI teacher monitors progress on letter naming and records the child’s score on the graph (actual progress line). Though progress was promising for the first 2 weeks, the child began to plateau during weeks 3, 4, and 5. This lack of progress is visually seen in the gap between the goal and actual progress, and informs the RTI teacher that a change may be needed.

[Line chart titled "Letter Naming." Horizontal axis: weeks (1 to 5). Vertical axis: number of letters correctly named per minute. Lines shown: goal line and actual progress line.]

RTI represents a significant shift in education toward using data to make educational decisions. Although many mistakenly believe RTI is a special education initiative, it is actually a regular education process for children who are at risk of a learning disability but have not been identified as having one. Teachers do not directly determine if a child has a disability, but based on cumulative RTI data, they may make a recommendation that special education services be considered. Then a professional team (that includes the teacher) completes the legal process to determine eligibility in special education. Interestingly, the progress monitoring elements of RTI provide a powerful formative assessment model that all educators could use for all students.

Challenge

Design an essay question for this chapter. Make it challenging. Write an answer to your question. Then determine how to score the answer. Ask a peer to evaluate your answer, and then evaluate your own answer without looking at your peer’s assessment. Compare and analyze the similarities and differences in the evaluations. Are the outcomes consistent with the research on self-evaluation and peer evaluation?

Summary and Resources

Chapter Summary

In education, assessment is a process by which a professional collects and uses information to make judgments about students. In this field, assessment outcomes are used to advance the interests of students through data-informed decision making. From the 18th century to today, both the purposes and the means by which children are assessed have evolved considerably. Today assessment is at the core of education; it is an essential element of a cycle in which assessment informs teaching and teaching influences learning, which is again assessed to strengthen teaching.

The spiraled relationship between assessment, teaching, and learning is supported by many small and large data-driven decisions along the way. These are intended to guide children’s learning so that they know where to begin and, when they get there, how to make the journey without getting lost. These assessments also indicate when additional supports are needed and when to skip children ahead within a curriculum. Thus, assessment must be ongoing, and instruments must be capable of accurately measuring the targeted behaviors. This formative assessment is often complemented by more formal summative assessment, both of which are useful to professionals.

In order for assessment to effectively inform decision making, instruments and strategies for gathering data must be selected and used effectively. Professionals need tools that are valid, reliable, and free of bias. To gather the most meaningful information about a child, professionals need to collect multiple samples of information, be well trained in collecting and interpreting information, and administer assessments in the way they are designed.

Terms, principles, types of assessments, and trends affecting assessment discussed in this chapter will appear throughout this text. In the following chapters, elements introduced here will be unpacked and examined in greater detail. Chapter 2 explores many of the same topics as Chapter 1, though it is focused singularly on assessment in early childhood education. Chapter 3 covers core principles of assessment that will be revisited throughout the text. Chapters 4 to 9 describe the purposes, design, application, and interpretation of authentic (observation, performance-based, and curriculum-based) and formal summative assessment (screening, diagnostic, and standardized). The final chapter discusses how assessment outcomes are interpreted and used for decision making at individual, classroom, and program levels.

Posttest

1. The first grading scale was developed to ______.
   a. inform parents of children’s progress
   b. reduce favoritism and introduce uniformity and fairness in evaluation
   c. permit Farish to teach many students and easily evaluate them in order to make education more profitable
   d. determine whether students would graduate from college

2. The driving force behind today’s assessment culture is ______.
   a. a fear associated with low international rankings
   b. concern for teacher satisfaction
   c. the need to provide educators with effective tools to make important instructional decisions
   d. to support educators’ efforts in continuing the excellence of public education

3. An ongoing assessment designed to measure progress and make decisions about instruction is ______.
   a. summative
   b. teacher based
   c. a test
   d. formative

4. Why would a professional conduct an assessment prior to beginning instruction?
   a. to avoid teaching something that is already known
   b. to satisfy curiosity
   c. to be able to report any problems to parents
   d. to build a solid relationship with a child

5. What would indicate that a child is taking an indirect route to mastering a learning objective?
   a. receiving a “C”
   b. being absent from school
   c. having a fever
   d. making many errors

6. To achieve the most thorough understanding of a child’s growth, a teacher should assess using ______.
   a. valid assessments that are administered without deviation, regardless of student characteristics
   b. only teacher-made assessments
   c. multiple assessments over time
   d. high-stakes tests

7. What should a professional do if he or she is asked to administer a test written in English for a child whose primary language is Mandarin, even though the child understands most of what is said in school?
   a. Do not give the test.
   b. Administer the test in English.
   c. Ask the child’s brother, who speaks English well, to administer the test in Mandarin.
   d. Adapt the parts of the test that the child does not understand.

8. To ensure that professionals both share students’ assessment data with families and keep assessment data confidential, Congress passed ______.
   a. IDEIA
   b. NCLB
   c. NAEYC
   d. FERPA

9. Which of the following are considered 21st-century skills and are key to the Common Core Standards and high-stakes tests?
   a. art and science
   b. critical-thinking and creativity
   c. social skills and teamwork
   d. music and technology

10. Response to intervention is ______.
   a. a process of identifying learning gaps and moderate to intense remediation early on, to help children catch up
   b. a special education program that serves children with mild learning problems in a regular education setting
   c. an attempt to provide mentorship to educators whose performance has been assessed as weak
   d. a description of the manner in which very young children respond to intervention provided by parents at home

Answers: 1 (b), 2 (a), 3 (d), 4 (a), 5 (d), 6 (c), 7 (d), 8 (d), 9 (b), 10 (a)

Critical-Thinking and Discussion Questions

1. It has been said that intelligence is defined as whatever knowledge and skills the test developer tests. Given your understanding of the evolution of IQ tests, how would you respond to this idea?

2. In what way does knowing the history of assessment in education inform your understanding of assessment practices used today?

3. Is there a benefit to a wayward pathway to learning?

4. Do you think it is accurate to say that parents are the “experts” for their child? If not, why? If so, is this always the case?

5. If you were tasked with improving education, would standardized assessment be the primary way you would choose to improve the quality of schools? If not, what three alternatives would you try first?

Additional Resources

John Hattie’s factors that affect quality education describes the most effective practices in education based on his research analysis of more than 800 research analyses.
http://visible-learning.org/hattie-ranking-influences-effect-sizes-learning-achievement

This interactive module on response to intervention provides a rich introduction to RTI, along with videos, case studies, and practice models.
http://iris.peabody.vanderbilt.edu/module/rti01-overview

The NAEYC on assessment practices is a position paper on assessment practices by the foremost early childhood professional organization.
http://www.naeyc.org/positionstatements/cape

Film: The history of language testing explains how assessment practices for language evolved.
http://englishagenda.britishcouncil.org/seminars/language-testing-looking-back-and-looking-forward

Teaching Channel: The formative assessment video series explains what formative assessment is and how professionals can use formative assessment in decision making.
https://www.teachingchannel.org/blog/2015/01/30/formative-assessment-practices-sbac

Answers and Rejoinders to Chapter Pretest

1. False. In education, assessment is a process of making judgments about behavior to form a conclusion and may take many different forms. Testing is one small part of educational assessment, which ranges from informal (observational, anecdotal, work samples) to formal (screening, diagnostic, ability testing).

2. True. Assessment has evolved from personal and unstructured methods to complex methods that range from very unstructured to highly structured, but always with the same purpose of determining if and how much children are learning.

3. True. By carefully assessing children’s developmental and academic growth, early childhood professionals can use outcome information to determine if changes need to be made in methods and services to children.

4. False. Bias is unavoidable in testing. However, the degree of bias can be reduced by selecting quality assessment tools and adapting assessment procedures as needed to yield the most fair outcomes.

5. True. The No Child Left Behind Act requires schools to conduct annual assessments of children using standardized tests. If schools consistently perform poorly, the federal government can close schools and shift money elsewhere.

Rejoinders to Posttest

1. Farish was the first known educator to develop a scale by which to evaluate students. His purpose was to reduce the widespread bias that existed at Cambridge University in evaluating students on oral exams. His was a 10-point scale.

2. Since the publication of A Nation at Risk in 1983, there has been a national frenzy associated with the demise of U.S. student standing compared to other nations. A fear that low test scores will translate to a decline in economic and military superiority has been a major factor in driving school reform. Increasing school accountability through performance on standardized assessments is the lynchpin to school reform in the 21st century.

3. Formative assessments are designed to measure progress on an ongoing basis and are used to make decisions about instruction. Summative assessments, on the other hand, are conducted at the end of instruction to determine if learning occurred as a result of instruction.

4. There are many reasons to conduct assessment prior to instruction. One is to avoid teaching something that children already know. Others include discovering the point at which children will learn without becoming frustrated and knowing how far a child has come since instruction began.

5. If formative assessment shows that children are making a lot of errors (more than 30%), they are likely on a wayward path that will delay the time it takes for them to master learning objectives.

6. Using multiple measurements provides a larger piece of the picture, offers confidence that behavior patterns rather than a single data point are more accurate, and tracks growth in a child that reveals more than simply the amount learned.

7. If a child does not understand all the items on a test or assessment, it will unfairly evaluate what the child knows. To fairly assess children and determine their true capabilities, assessors should adapt the test as necessary. In some cases this would involve an interpreter, but not the brother.

8. The Family Educational Rights and Privacy Act, or Buckley Amendment, was passed to ensure that families could request and review the contents of their children’s education files. At the same time, only those professionals with a legitimate need to see the files are permitted to have access.

9. The Common Core Standards were written to embrace 21st-century skills. These skills include communication and teamwork, technology and innovation, critical thinking and problem solving, creativity and tenacity, and content knowledge. High-stakes tests now align with the CCS and attempt to measure student acquisition of these 21st-century skills.

10. RTI is a process of identifying learning gaps and moderate to intense remediation early on, to help children catch up. By undergoing regular universal assessment, children are identified as soon as they begin to fall behind.

Key Terms

21st-century learning skills: A set of skills that will enable graduates to thrive in global technical environments of the 21st century, such as critical thinking, teamwork, creativity, and technology.

accredited: Programs that undergo evaluation by an outside professional organization to determine whether the program meets minimal quality standards. An accredited program is considered to be of good quality.

assessment: The act of collecting educational data on children in order to make a judgment about development and learning.

authentic assessment: An assessment that measures children’s growth in real contexts, using actual materials and instructions.

best practices: Practices that current research supports as being most valid in a particular field.

child study team: Professionals and family members who work together to make decisions regarding a child’s education.

curriculum-referenced assessments: Tests that are used to determine whether children are making progress in learning skills taught within a particular curriculum.

data: Facts or information used to make calculations or for analysis.

data literacy: The ability to use data to make day-to-day data-driven teaching decisions.

eugenics: A movement that was based on a belief that intellectual inferiority led to crime and poverty; it advocated systematic sterilization to eliminate both.

evaluation: A process that makes a judgment about children’s behaviors or the outcome of children’s behaviors.

Family Educational Rights and Privacy Act (FERPA): A federal law protecting educational records from access by unauthorized entities and giving parents the right to view and contest educational records.

fidelity: A circumstance in which tests are given in the same way each time following testing guidelines.

formative assessments: A continuous gathering of evidence as children’s knowledge and skills are formed in order to make adjustments in intervention as needed.

goal setting: A process in which professionals establish expectations for measurable learning within a given period.

intelligence quotient: A score on an individual ability test providing an estimate of a child’s ability to benefit from education and solve problems.

mastery: A situation in which a skill or ability has been learned to a very high degree, often determined by reaching a learning objective.

measurement: A process of counting or using numbers to explain the amount or size of something.

norms: Score ranges on a standardized test that represent “normal” or average behavior for a particular age or grade level.

objectivity: The ability to see things as they are.

qualitative data: Descriptive information.

quantitative data: Numerical information.

reliability: The degree to which a test measures behavior with the same outcomes from one assessment to the next.

response to intervention (RTI): A term used to describe a process of identifying children who lag in social development, mathematics, and reading, then providing a tiered level of intervention to help these children reach age- or grade-level benchmarks.

scale: A range of scores from low to high.

screening tools: Brief assessments used to determine whether children are at risk of having delays or disabilities.

standardized assessment: A formal testing procedure in which administration and scoring procedures are the same for all children; used to compare children tested to established norms.

summative assessments: Tests conducted at the end of learning (unit, time period, year) in order to determine how much has been learned as a result of instruction.

test: A formal assessment tool with explicit response requirements, scoring, and interpretation.

tracking: A system of grouping children according to ability based on test scores.

valid: A situation in which items on the test accurately measure the intended target behaviors.