
PSYC 545

Further Topics in Assessment

Table of Contents

Item Development/Formats
Validity
Standard Setting
Computer-Based Testing
Testing Special Populations
Performance Assessment
Cognitive/Principled (Evidence-Centered) Assessment Design
Portfolio Assessment

Item Development/Formats

Baron, J. B. (1991). Strategies for the development of effective performance exercises. Applied Measurement in Education, 4, 305-318.

Bennett, R., & Ward, W. (1993). Construction versus choice in cognitive measurement. Hillsdale, NJ: Lawrence Erlbaum Associates.

Crehan, K. D., Haladyna, T. M., & Brewer, B. W. (1993). Use of an inclusive option and the optimal number of options for multiple-choice items. Educational and Psychological Measurement, 53, 241-247.

Cronbach, L. J. (1946). Response sets in objective tests. Educational and Psychological Measurement, 6, 475-494.

Downing, S. M., & Haladyna, T. M. (1997). Test item development: Validity evidence from quality assurance procedures. Applied Measurement in Education, 10, 61-82.

Downing, S. M., & Haladyna, T. M. (Eds.). (2006). Handbook of test development. Mahwah, NJ: Lawrence Erlbaum.

Haladyna, T. M. (1992). The effectiveness of several multiple-choice item formats. Applied Measurement in Education, 5, 73-88.

Haladyna, T. M. (1994). Developing and validating multiple-choice test items. Hillsdale, NJ: Lawrence Erlbaum.

Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item writing rules. Applied Measurement in Education, 2, 37-50.

Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309-334.

Haladyna, T. M., & Shindoll, R. R. (1989). Item shells: A method for writing effective multiple-choice items. Evaluation & The Health Professions, 12, 97-106.

Kreiter, C. D., & Frisbie, D. A. (1989). Effectiveness of multiple true-false items. Applied Measurement in Education, 2, 207-216.

Lukhele, R., Thissen, D., & Wainer, H. (1994). On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests. Journal of Educational Measurement, 31, 234-250.

Martinez, M. E. (1999). Cognition and the question of test item format. Educational Psychologist, 34, 207-218.

Martinez, R. J., Moreno, R., Martin, I., & Trigo, M. E. (2009). Evaluation of five guidelines for option development in multiple-choice item writing. Psicothema, 21, 326-330.

Mentzer, T. L. (1982). Response biases in multiple-choice test item files. Educational and Psychological Measurement, 42, 437-448.

Osterlind, S. J. (1989). Constructing test items. Hingham, MA: Kluwer.

Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3-13.

Sireci, S. G., Wiley, A., & Keller, L. A. (2002). An empirical evaluation of selected multiple-choice item writing guidelines. CLEAR Exam Review, 13(2), 20-26.

Thissen, D., Wainer, H., & Wang, X-B. (1994). Are tests comprising both multiple-choice and free-response items necessarily less unidimensional than multiple-choice tests? An analysis of two tests. Journal of Educational Measurement, 31, 113-123.

Validity

Aiken, L. R. (1980). Content validity and reliability of single items or questionnaires. Educational and Psychological Measurement, 40, 955-959.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning (CBAL): A preliminary theory of action for summative and formative assessment. Measurement, 8(2–3), 70–91.

Bhola, D. S., Impara, J. C., & Buckendahl, C. W. (2003). Aligning tests with states' content standards: Methods and issues. Educational Measurement: Issues and Practice, 22(3), 21-29.

Brennan, R. L. (2001). Some problems, pitfalls, and paradoxes in educational measurement. Educational Measurement: Issues and Practice, 20(4), 6-18.

Burger, S. E., & Burger, D. L. (1994). Determining the validity of performance-based assessment. Educational Measurement: Issues and Practice, 13(1), 9-15.

Crocker, L. M., Miller, D., & Franks, E. A. (1989). Quantitative methods for assessing the fit between test and curriculum. Applied Measurement in Education, 2, 179-194.

Cronbach, L. J. (1988). Five perspectives on the validity argument. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 3-17). Hillsdale, NJ: Lawrence Erlbaum.

D’Agostino, J. V., et al. (2008). The rating and matching item-objective alignment methods. Applied Measurement in Education, 21, 1-21.

Hambleton, R. K. (1984). Validating the test score. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 199-230). Baltimore: Johns Hopkins University Press.

Joint Committee on Testing Practices (2004). Code of Fair Testing Practices in Education. Washington, DC: American Psychological Association. Available for download at http://www.apa.org/science/fairtestcode.html.

Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527-535.

Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Washington, DC: American Council on Education/Praeger.

Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1-73.

Kuehn, P. A., Stallings, W. M., & Holland, C. L. (1990). Court-defined job analysis requirements for validation of teacher certification tests. Educational Measurement: Issues and Practice, 9(4), 21-24.

Martone, A., & Sireci, S. G. (2009). Evaluating alignment among curriculum, assessments, and instruction. Review of Educational Research, 79(4), 1332-1361.

Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). Washington, DC: American Council on Education.

Nelson, D. S. (1994). Job analysis for licensure and certification exams: Science or politics? Educational Measurement: Issues and Practice, 13(3), 29-35.

O’Neil, T., Sireci, S. G., & Huff, K. F. (2004). Evaluating the consistency of test content across two successive administrations of a state-mandated science assessment. Educational Assessment, 9, 129-151.

Porter, A. C., Polikoff, M. S., Zeidner, T., & Smithson, J. (2008). The quality of content analyses on state student achievement tests and content standards. Educational Measurement: Issues and Practice, 27(4), 2-14.

Sireci, S. G. (1998). Gathering and analyzing content validity data. Educational Assessment, 5, 299-321.

Sireci, S. G. (1998). The construct of content validity. Social Indicators Research.

Sireci, S. G. (2007). On test validity theory and test validation. Educational Researcher, 36(8), 477-481.

Sireci, S. G. (2009). Packing and unpacking sources of validity evidence: History repeats itself again. In R. Lissitz (Ed.), The Concept of Validity: Revisions, New Directions and Applications (pp. 19-37). Charlotte, NC: Information Age Publishing Inc.

Sireci, S. G. (2013). Agreeing on validity arguments. Journal of Educational Measurement, 50, 99-104.

Sireci, S. G., & Geisinger, K. F. (1998). Equity issues in employment testing. In J. H. Sandoval, C. Frisby, K. F. Geisinger, J. Scheuneman, & J. Ramos-Grenier (Eds.), Test interpretation and diversity (pp. 105-140). Washington, DC: American Psychological Association.

Sireci, S. G., & Green, P. C. (2000). Legal and psychometric criteria for evaluating teacher certification tests. Educational Measurement: Issues and Practice, 19(1), 22-31, 34.

Smith, I. L., & Hambleton, R. K. (1990). Content validity studies of licensing examinations. Educational Measurement: Issues and Practice, 9(4), 7-10.

Wainer, H., & Braun, H. (Eds.). (1988). Test validity. Hillsdale, NJ: Lawrence Erlbaum.

Wainer, H., & Sireci, S. G. (2005). Item and test bias. Encyclopedia of social measurement (Vol. 2, pp. 365-371). San Diego: Elsevier.

Lu, Y., & Sireci, S. G. (2007). Validity issues in test speededness. Educational Measurement: Issues and Practice, 26(4), 29-37.

Standard Setting

Cizek, G. J. (1996). Setting passing scores [An NCME instructional module]. Educational Measurement: Issues and Practice, 15(2), 20-31.

Cizek, G. J. (2001). Standard setting: Concepts, methods, and perspectives. Mahwah, NJ: Lawrence Erlbaum.

Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.

Davis-Becker, S. L., Buckendahl, C. W., & Gerrow, J. (2011). Evaluating the bookmark standard setting method: The impact of random item ordering. International Journal of Testing, 11, 24-37.

Hambleton, R. K., Brennan, R. L., Brown, W., Dodd, B., Forsythe, R. A., Mehrens, W. A., Nellhaus, J., Reckase, M., Rindone, D., van der Linden, W. J., & Zwick, R. (2000). A response to “setting reasonable and useful performance standards” in the National Academy of Sciences “Grading the Nation’s Report Card.” Educational Measurement: Issues and Practice, 19(2), 5-14.

Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: American Council on Education/Praeger.

Karantonis, A., & Sireci, S. G. (2006). The bookmark standard setting method: A literature review. Educational Measurement: Issues and Practice, 25(1), 4-12.

Linn, R. L. (2003, September 1). Performance standards: Utility for different uses of assessments. Education Policy Analysis Archives, 11(31). Retrieved September 1, 2003 from http://epaa.asu.edu/epaa/v11n31.

Livingston, S. A., & Zieky, M. J. (1982). Passing scores: A manual for setting standards of performance on educational and occupational tests. Princeton, NJ: Educational Testing Service.

Meara, K. P., Hambleton, R. K., & Sireci, S. G. (2001). Setting and validating standards on professional licensure and certification exams: A survey of current practices. CLEAR Exam Review, 12(2), 17-23.

Perie, M. (2008). A guide to understanding and developing performance-level descriptors. Educational Measurement: Issues and Practice, 27(4), 15-29.

Reckase, M. D. (2006a). A conceptual framework for a psychometric theory for standard setting with examples of its use for evaluating the functioning of two standard setting methods. Educational Measurement: Issues and Practice, 25(2), 4-18.

Reckase, M. D. (2006b). Rejoinder: Evaluating standard setting methods using error models proposed by Schulz. Educational Measurement: Issues and Practice, 25(3), 14-17.

Sireci, S. G., Hambleton, R. K., & Pitoniak, M. J. (2004). Setting passing scores on licensure exams using direct consensus. CLEAR Exam Review, 15(1), 21-25.

Sireci, S. G., Randall, J., & Zenisky, A. (2012). Setting valid performance standards on educational tests. CLEAR Exam Review, 23(2), 18-27.

Sireci, S. G., Robin, F., & Patelis, T. (1999). Using cluster analysis to facilitate standard setting. Applied Measurement in Education, 12, 301-325.

Zieky, M. J., Perie, M., & Livingston, S. A. (2008). Cutscores: A manual for setting standards of performance on educational and occupational tests. Princeton, NJ: Educational Testing Service.

Computer-Based Testing

Almond, R. G., Steinberg, L. S., & Mislevy, R. J. (2002). Enhancing the design and delivery of assessment systems: A four-process architecture. Journal of Technology, Learning, and Assessment, 1(5). Available at http://www.bc.edu/research/intasc/jtla/journal/v1n5.shtml.

Drasgow, F., & Olson-Buchanan, J. B. (Eds.) (1999). Innovations in Computerized Assessment. Mahwah, NJ: Lawrence Erlbaum.

Huff, K. L., & Sireci, S. G. (2001). Validity issues in computer-based testing. Educational Measurement: Issues and Practice, 20(3), 16-25.

Luecht, R. M. (2005). Some useful cost-benefit criteria for evaluating computer-based test delivery models and systems. Journal of Applied Testing Technology, available at http://www.testpublishers.org/Documents/JATT2005_rev_Criteria4CBT_RMLuecht_Apr2005.pdf

Luecht, R. M., & Sireci, S. G. (2011). A review of models for computer-based testing. Research report 2011-2012. New York: The College Board.

Randall, J., Sireci, S. G., Li, X., & Kaira, L. (2013). Evaluating the comparability of paper- and computer-based science tests across sex and SES subgroups. Educational Measurement: Issues and Practice, 31(4), 2-12.

Sands, W. A., Waters, B. K., & McBride, J. R. (Eds.). (1997). Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association.

Sireci, S. G. (2004). Computerized-adaptive testing: An introduction. In J. Wall and G. Walz (Eds.), Measuring up: Assessment issues for teachers, counselors, and administrators (pp. 685-694). Greensboro, NC: CAPS Press.

Sireci, S. G., & Zenisky, A. L. (2006). Innovative item formats in computer-based testing: In pursuit of improved construct representation. In S. M. Downing and T. M. Haladyna (Eds.), Handbook of test development (pp. 329-347). Mahwah, NJ: Lawrence Erlbaum.

Wainer, H. (1993). Some practical considerations when converting a linearly administered test to an adaptive format. Educational Measurement: Issues and Practice, 12, 15-20.

Wainer, H. (Ed.). (2000). Computerized adaptive testing: A primer (2nd edition). Hillsdale, NJ: Lawrence Erlbaum.

Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185-201.

Zenisky, A. L., & Sireci, S. G. (2002). Technological innovations in large-scale assessment. Applied Measurement in Education, 15, 337-362.

Testing Special Populations

Abedi, J., & Ewers, N. (2013). Accommodations for English learners and students with disabilities: A research based decision algorithm. Smarter Balanced Assessment Consortium.

Almond, P., Winter, P., Cameto, R., Russell, M., Sato, E., Clarke, J., Torres, C., Haertel, G., Dolan, R., Beddow, P., & Lazarus, S. (2010). Technology-enabled and universally designed assessment: Considering access in measuring the achievement of students with disabilities—A foundation for research. Dover, NH: Measured Progress and Menlo Park, CA: SRI International.

Council of Chief State School Officers (1992). Recommendations for improving the assessment and monitoring of students with limited English proficiency. Washington, DC: Author.

Fisher, R. J. (1994). The Americans With Disabilities Act: Implications for measurement. Educational Measurement: Issues and Practice, 13(3), 17-26, 37.

Geisinger, K. F. (1994). Cross-cultural normative assessment: Translation and adaptation issues influencing the normative interpretation of assessment instruments. Psychological Assessment, 6, 304-312.

Geisinger, K. F. (1994). Psychometric issues in testing students with disabilities. Applied Measurement in Education, 7, 121-140.

Hambleton, R. K. (1994). Guidelines for adapting educational and psychological tests: A progress report. European Journal of Psychological Assessment, 10, 229-244.

Phillips, S. E. (1994). High-stakes testing accommodations: Validity versus disabled rights. Applied Measurement in Education, 7, 93-120.

Ramsey, P. A. (1993). Sensitivity review: The ETS experience as a case study. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 367-388). Hillsdale, NJ: Erlbaum.

Sireci, S. G. (2005). Unlabeling the disabled: A perspective on flagging scores from accommodated test administrations. Educational Researcher, 34(1), 3-12.

Sireci, S. G. (2009). No more excuses: New research on assessing students with disabilities. Journal of Applied Testing Technology, 10(2). Available at http://www.testpublishers.org/Documents/Special%20Issue%20article%201%20.pdf.

Sireci, S. G. (2011). Evaluating test and survey items for bias across languages and cultures. In D. Matsumoto and F. van de Vijver (Eds.), Cross-cultural research methods in psychology (pp. 216-240). Oxford, UK: Oxford University Press.

Sireci, S. G., DeLeon, B., & Washington, E. (2002, Spring). Improving teachers of minority students’ attitudes towards and knowledge of standardized tests. Academic Exchange Quarterly, 162-167.

Sireci, S. G., Han, K. T., & Wells, C. S. (2008). Methods for evaluating the validity of test scores for English language learners. Educational Assessment, 13, 108-131.

Sireci, S. G., & Mullane, L. A. (1994). Evaluating test fairness in licensure testing: The sensitivity review process. CLEAR Exam Review, 5(2), 22-28.

Sireci, S. G., & Parker, P. (2006). Validity on trial: Psychometric and legal conceptualizations of validity. Educational Measurement: Issues and Practice, 25(3), 27-34.

Sireci, S. G., & Pitoniak, M. J. (2007). Assessment accommodations: What have we learned from research? In C. C. Laitusis & L. Cook (Eds.), Large scale assessment and accommodations: What works? (pp. 53-65).

Sireci, S. G., Scarpati, S., & Li, S. (2005). Test accommodations for students with disabilities: An analysis of the interaction hypothesis. Review of Educational Research, 75, 457-490.

Thompson, S., & Thurlow, M. (2002, June). Universally designed assessments: Better tests for everyone! Policy Directions, Number 14. Minneapolis, MN: National Center on Educational Outcomes.

Performance Assessment

Clauser, B. E., Subhiyah, R. G., Nungester, R. J., Ripkey, D. R., Clyman, S. G., & McKinley, D. (1995). Scoring a performance-based assessment by modeling the judgments of experts. Journal of Educational Measurement, 32, 397-415.

Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4, 289-303.

Lane, S. (1993). The conceptual framework for the development of a mathematics performance assessment instrument. Educational Measurement: Issues and Practice, 12(2), 16-23.

Linn, R. L., & Burton, E. (1994). Performance-based assessment: Implications of task specificity. Educational Measurement: Issues and Practice, 13(1), 5-8, 15.

Pearson, P. D., & Garavaglia, D. R. (1997). Improving the information value of performance items in large scale assessments. Paper commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for Research.

Quellmalz, E. S. (1991). Developing criteria for performance assessments: The missing link. Applied Measurement in Education, 4, 319-331.

Shavelson, R. J., Baxter, G., & Pine, J. (1991). Performance assessment in science. Applied Measurement in Education, 4, 347-362.

Cognitive/Principled (Evidence-Centered) Assessment Design

Gorin, J. S. (2006). Test design with cognition in mind. Educational Measurement: Issues and Practice, 25(4), 21-35.

Gorin, J. S. (2007). Test construction and diagnostic testing. In J. Leighton & M. Gierl (Eds.), Cognitive diagnostic assessment for education (pp. 173-201). Cambridge: Cambridge University Press.

Huff, K., & Goodman, D. P. (2007). The demand for cognitive diagnostic assessment. In J. Leighton & M. Gierl (Eds.), Cognitive diagnostic assessment for education (pp. 19-60). Cambridge: Cambridge University Press.

Leighton, J. P., & Gierl, M. J. (Eds.). (2007). Cognitive diagnostic assessment for education: Theories and applications. Cambridge: Cambridge University Press.

Mislevy, R. J. (2009). Validity from the perspective of model-based reasoning. In R. Lissitz (Ed.), The Concept of Validity: Revisions, New Directions and Applications (pp. xx-xx). Charlotte, NC: Information Age Publishing Inc.

Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2002). On the role of task model variables in assessment design. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp. 97-128). Mahwah, NJ: Lawrence Erlbaum.

Williamson, D. M., Mislevy, R. J., & Almond, R. G. (2004). Evidence-centered design for certification and licensure. CLEAR Exam Review, 15(2), 14-18.

Portfolio Assessment

Hardin, J., & Wright, V. H. (2017). E-portfolio assessment: A mixed methods study of an instructional leadership program’s assessment system. Research in the Schools, 24(1), 63-79.

Kim, Y., & Yazdian, L. S. (2014). Portfolio Assessment and Quality Teaching. Theory Into Practice, 53(3), 220-227. doi:10.1080/00405841.2014.916965

Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The Vermont portfolio assessment program: Findings and implications. Educational Measurement: Issues and Practice, 13(3), 5-16.

Reckase, M. D. (1995). Portfolio assessment: A theoretical estimate of score reliability. Educational Measurement: Issues and Practice, 14(1), 12-14, 31.

Younger, D. (2015). Meaningful rigor in portfolio assessment. The Journal of Continuing Higher Education, 63(2), 126-129. doi:10.1080/07377363.2015.1042956
