PSYC 545 DB
See attachments
DiscussionAssignmentInstructions.docx
Discussion Assignment Instructions
The student will post one thread of at least 400 words. The student must then post 2 replies of at least 200 words. For each thread, students must support their assertions with at least one (1) scholarly and at least one (1) scriptural citation in current APA format. Acceptable sources include: The Bible, the textbook, and other peer-reviewed articles and/or books.
Before writing your Discussion post, please read the document "Further Topics in Assessment."
You will pick one of the topics in the "Further Topics in Assessment" document. Find and read 3-5 articles about that topic from the list in that document. These articles will need to be found within the LU Library database (https://www.liberty.edu/library). If you can’t find a source, or it is unavailable through the LU Library, please choose a different article from the document. If you would like more sources or another topic, please email me. Write your post describing the topic to your classmates in "layman's terms," so that they will be able to understand it given the content they have learned in this course.
Please use the following questions to guide your post (it may not be possible to answer all questions for your topic):
1. In a few sentences, what does the technique or assessment-related topic entail?
2. How is this approach more beneficial than a multiple-choice assessment?
3. How has this approach furthered the field of assessment?
4. How does this approach relate to validity of the assessment scores?
FurtherTopicsinAssessment.docx
PSYC 545
Further Topics in Assessment
Table of Contents
Item Development/Formats
Validity
Standard Setting
Computer-Based Testing
Testing Special Populations
Performance Assessment
Cognitive/Principled (Evidence-Centered) Assessment Design
Portfolio Assessment
Item Development/Formats
Baron, J. B. (1991). Strategies for the development of effective performance exercises. Applied Measurement in Education, 4, 305-318.
Bennett, R., & Ward, W. (1993). Construction versus choice in cognitive measurement. Hillsdale, NJ: Lawrence Erlbaum Associates.
Crehan, K. D., Haladyna, T. M., & Brewer, B. W. (1993). Use of an inclusive option and the optimal number of options for multiple-choice items. Educational and Psychological Measurement, 53, 241-247.
Cronbach, L. J. (1946). Response sets in objective tests. Educational and Psychological Measurement, 6, 475-494.
Downing, S. M., & Haladyna, T. M. (1997). Test item development: Validity evidence from quality assurance procedures. Applied Measurement in Education, 10, 61-82.
Downing, S. M., & Haladyna, T. M. (Eds.). (2006). Handbook of test development (pp. 329-347). Mahwah, NJ: Lawrence Erlbaum.
Haladyna, T. M. (1992). The effectiveness of several multiple-choice item formats. Applied Measurement in Education, 5, 73-88.
Haladyna, T. M. (1994). Developing and validating multiple-choice test items. Hillsdale, NJ: Lawrence Erlbaum.
Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item writing rules. Applied Measurement in Education, 2, 37-50.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309-334.
Haladyna, T. M., & Shindoll, R. R. (1989). Item shells: A method for writing effective multiple-choice items. Evaluation & The Health Professions, 12, 97-106.
Kreiter, C. D., & Frisbie, D. A. (1989). Effectiveness of multiple true-false items. Applied Measurement in Education, 2, 207-216.
Lukhele, R., Thissen, D., & Wainer, H. (1994). On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests. Journal of Educational Measurement, 31, 234-250.
Martinez, M. E. (1999). Cognition and the question of test item format. Educational Psychologist, 34, 207-218.
Martinez, R. J., Moreno, R., Martin, I., & Trigo, M. E. (2009). Evaluation of five guidelines for option development in multiple-choice item writing. Psicothema, 21, 326-330.
Mentzer, T. L. (1982). Response biases in multiple-choice test item files. Educational and Psychological Measurement, 42, 437-448.
Osterlind, S. J. (1989). Constructing test items. Hingham, MA: Kluwer.
Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3-13.
Sireci, S. G., Wiley, A., & Keller, L. A. (2002). An empirical evaluation of selected multiple-choice item writing guidelines. CLEAR Exam Review, 13(2), 20-26.
Thissen, D., Wainer, H., & Wang, X-B. (1994). Are tests comprising both multiple-choice and free-response items necessarily less unidimensional than multiple-choice tests? An analysis of two tests. Journal of Educational Measurement, 31, 113-123.
Validity
Aiken, L. R. (1980). Content validity and reliability of single items or questionnaires. Educational and Psychological Measurement, 40, 955-959.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning (CBAL): A preliminary theory of action for summative and formative assessment. Measurement, 8(2–3), 70–91.
Bhola, D. S., Impara, J. C., & Buckendahl, C. W. (2003). Aligning tests with states' content standards: Methods and issues. Educational Measurement: Issues and Practice, 22(3), 21-29.
Brennan, R. L. (2001). Some problems, pitfalls, and paradoxes in educational measurement. Educational Measurement: Issues and Practice, 20(4), 6-18.
Burger, S. E., & Burger, D. L. (1994). Determining the validity of performance-based assessment. Educational Measurement: Issues and Practice, 13(1), 9-15.
Crocker, L. M., Miller, D., & Franks, E. A. (1989). Quantitative methods for assessing the fit between test and curriculum. Applied Measurement in Education, 2, 179-194.
Cronbach, L. J. (1988). Five perspectives on the validity argument. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 3-17). Hillsdale, NJ: Lawrence Erlbaum.
D’Agostino, J. V., et al. (2008). The rating and matching item-objective alignment methods. Applied Measurement in Education, 21, 1-21.
Hambleton, R. K. (1984). Validating the test score. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 199-230). Baltimore: Johns Hopkins University Press.
Joint Committee on Testing Practices (2004). Code of Fair Testing Practices in Education. Washington, DC: American Psychological Association. Available for download at http://www.apa.org/science/fairtestcode.html.
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527-535.
Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Washington, DC: American Council on Education/Praeger.
Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1-73.
Kuehn, P. A., Stallings, W. M., & Holland, C. L. (1990). Court-defined job analysis requirements for validation of teacher certification tests. Educational Measurement: Issues and Practice, 9(4), 21-24.
Martone, A., & Sireci, S. G. (2009). Evaluating alignment among curriculum, assessments, and instruction. Review of Educational Research, 79(4), 1332-1361.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). Washington, DC: American Council on Education.
Nelson, D. S. (1994). Job analysis for licensure and certification exams: Science or politics? Educational Measurement: Issues and Practice, 13(3), 29-35.
O’Neil, T., Sireci, S. G., & Huff, K. F. (2004). Evaluating the consistency of test content across two successive administrations of a state-mandated science assessment. Educational Assessment, 9, 129-151.
Porter, A. C., Polikoff, M. S., Zeidner, T., & Smithson, J. (2008). The quality of content analyses on state student achievement tests and content standards. Educational Measurement: Issues and Practice, 27(4), 2-14.
Sireci, S. G. (1998). Gathering and analyzing content validity data. Educational Assessment, 5, 299-321.
Sireci, S. G. (1998). The construct of content validity. Social Indicators Research.
Sireci, S. G. (2007). On test validity theory and test validation. Educational Researcher, 36(8), 477-481.
Sireci, S. G. (2009). Packing and unpacking sources of validity evidence: History repeats itself again. In R. Lissitz (Ed.), The Concept of Validity: Revisions, New Directions and Applications (pp. 19-37). Charlotte, NC: Information Age Publishing Inc.
Sireci, S. G. (2013). Agreeing on validity arguments. Journal of Educational Measurement, 50, 99-104.
Sireci, S. G., & Geisinger, K. F. (1998). Equity issues in employment testing. In J. H. Sandoval, C. Frisby, K. F. Geisinger, J. Scheuneman, & J. Ramos-Grenier (Eds.), Test interpretation and diversity (pp. 105-140). Washington, DC: American Psychological Association.
Sireci, S. G., & Green, P. C. (2000). Legal and psychometric criteria for evaluating teacher certification tests. Educational Measurement: Issues and Practice, 19(1), 22-31, 34.
Smith, I. L., & Hambleton, R. K. (1990). Content validity studies of licensing examinations. Educational Measurement: Issues and Practice, 9(4), 7-10.
Wainer, H., & Braun, H. (1988). Test validity. Hillsdale, NJ: Lawrence Erlbaum.
Wainer, H., & Sireci, S. G. (2005). Item and test bias. In Encyclopedia of social measurement (Vol. 2, pp. 365-371). San Diego: Elsevier.
Lu, Y., & Sireci, S. G. (2007). Validity issues in test speededness. Educational Measurement: Issues and Practice, 26(4), 29-37.
Standard Setting
Cizek, G. J. (1996). Setting passing scores [An NCME instructional module]. Educational Measurement: Issues and Practice, 15(2), 20-31.
Cizek, G. J. (2001). Standard setting: Concepts, methods, and perspectives. Mahwah, NJ: Lawrence Erlbaum.
Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.
Davis-Becker, S. L., Buckendahl, C. W., & Gerrow, J. (2011). Evaluating the bookmark standard setting method: The impact of random item ordering. International Journal of Testing, 11, 24-37.
Hambleton, R. K., Brennan, R. L., Brown, W., Dodd, B., Forsythe, R. A., Mehrens, W. A., Nellhaus, J., Reckase, M., Rindone, D., van der Linden, W. J., & Zwick, R. (2000). A response to “setting reasonable and useful performance standards” in the National Academy of Sciences “Grading the Nation’s Report Card.” Educational Measurement: Issues and Practice, 19(2), 5-14.
Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: American Council on Education/Praeger.
Karantonis, A., & Sireci, S. G. (2006). The bookmark standard setting method: A literature review. Educational Measurement: Issues and Practice, 25(1), 4-12.
Linn, R. L. (2003, September 1). Performance standards: Utility for different uses of assessments. Education Policy Analysis Archives, 11(31). Retrieved September 1, 2003 from http://epaa.asu.edu/epaa/v11n31.
Livingston, S. A., & Zieky, M. J. (1982). Passing scores: A manual for setting standards of performance on educational and occupational tests. Princeton, NJ: Educational Testing Service.
Meara, K. P., Hambleton, R. K., & Sireci, S. G. (2001). Setting and validating standards on professional licensure and certification exams: A survey of current practices. CLEAR Exam Review, 12(2), 17-23.
Perie, M. (2008). A guide to understanding and developing performance-level descriptors. Educational Measurement: Issues and Practice, 27(4), 15-29.
Reckase, M. D. (2006a). A conceptual framework for a psychometric theory for standard setting with examples of its use for evaluating the functioning of two standard setting methods. Educational Measurement: Issues and Practice, 25(2), 4-18.
Reckase, M. D. (2006b). Rejoinder: Evaluating standard setting methods using error models proposed by Schulz. Educational Measurement: Issues and Practice, 25(3), 14-17.
Sireci, S. G., Hambleton, R. K., & Pitoniak, M. J. (2004). Setting passing scores on licensure exams using direct consensus. CLEAR Exam Review, 15(1), 21-25.
Sireci, S. G., Randall, J., & Zenisky, A. (2012). Setting valid performance standards on educational tests. CLEAR Exam Review, 23(2), 18-27.
Sireci, S. G., Robin, F., & Patelis, T. (1999). Using cluster analysis to facilitate standard setting. Applied Measurement in Education, 12, 301-325.
Zieky, M. J., Perie, M., & Livingston, S. A. (2008). Cutscores: A manual for setting standards of performance on educational and occupational tests. Princeton, NJ: Educational Testing Service.
Computer-Based Testing
Almond, R. G., Steinberg, L. S., & Mislevy, R. J. (2002). Enhancing the design and delivery of assessment systems: A four-process architecture. Journal of Technology, Learning, and Assessment, 1(5). Available at http://www.bc.edu/research/intasc/jtla/journal/v1n5.shtml.
Drasgow, F., & Olson-Buchanan, J. B. (Eds.) (1999). Innovations in Computerized Assessment. Mahwah, NJ: Lawrence Erlbaum.
Huff, K. L., & Sireci, S. G. (2001). Validity issues in computer-based testing. Educational Measurement: Issues and Practice, 20(3), 16-25.
Luecht, R. M. (2005). Some useful cost-benefit criteria for evaluating computer-based test delivery models and systems. Journal of Applied Testing Technology, available at http://www.testpublishers.org/Documents/JATT2005_rev_Criteria4CBT_RMLuecht_Apr2005.pdf
Luecht, R. M., & Sireci, S. G. (2011). A review of models for computer-based testing. Research report 2011-2012. New York: The College Board.
Randall, J., Sireci, S. G., Li, X., & Kaira, L. (2013). Evaluating the comparability of paper- and computer-based science tests across sex and SES subgroups. Educational Measurement: Issues and Practice, 31(4), 2-12.
Sands, W. A., Waters, B. K., & McBride, J. R. (Eds.). (1997). Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association.
Sireci, S. G. (2004). Computerized-adaptive testing: An introduction. In J. Wall & G. Walz (Eds.), Measuring up: Assessment issues for teachers, counselors, and administrators (pp. 685-694). Greensboro, NC: CAPS Press.
Sireci, S. G., & Zenisky, A. L. (2006). Innovative item formats in computer-based testing: In pursuit of improved construct representation. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 329-347). Mahwah, NJ: Lawrence Erlbaum.
Wainer, H. (1993). Some practical considerations when converting a linearly administered test to an adaptive format. Educational Measurement: Issues and Practice, 12, 15-20.
Wainer, H. (Ed.). (2000). Computerized adaptive testing: A primer (2nd edition). Hillsdale, NJ: Lawrence Erlbaum.
Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185-201.
Zenisky, A. L., & Sireci, S. G. (2002). Technological innovations in large-scale assessment. Applied Measurement in Education, 15, 337-362.
Testing Special Populations
Abedi, J., & Ewers, N. (2013). Accommodations for English learners and students with disabilities: A research-based decision algorithm. Smarter Balanced Assessment Consortium.
Almond, P., Winter, P., Cameto, R., Russell, M., Sato, E., Clarke, J., Torres, C., Haertel, G., Dolan, R., Beddow, P., & Lazarus, S. (2010). Technology-enabled and universally designed assessment: Considering access in measuring the achievement of students with disabilities—A foundation for research. Dover, NH: Measured Progress and Menlo Park, CA: SRI International.
Council of Chief State School Officers (1992). Recommendations for improving the assessment and monitoring of students with limited English proficiency. Washington, DC: Author.
Fisher, R. J. (1994). The Americans With Disabilities Act: Implications for measurement. Educational Measurement: Issues and Practice, 13(3), 17-26, 37.
Geisinger, K. F. (1994). Cross-cultural normative assessment: Translation and adaptation issues influencing the normative interpretation of assessment instruments. Psychological Assessment, 6, 304-312.
Geisinger, K. F. (1994). Psychometric issues in testing students with disabilities. Applied Measurement in Education, 7, 121-140.
Hambleton, R. K. (1994). Guidelines for adapting educational and psychological tests: A progress report. European Journal of Psychological Assessment, 10, 229-244.
Phillips, S. E. (1994). High-stakes testing accommodations: Validity versus disabled rights. Applied Measurement in Education, 7, 93-120.
Ramsey, P. A. (1993). Sensitivity review: The ETS experience as a case study. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 367-388). Hillsdale, NJ: Erlbaum.
Sireci, S. G. (2005). Unlabeling the disabled: A perspective on flagging scores from accommodated test administrations. Educational Researcher, 34(1), 3-12.
Sireci, S. G. (2009). No more excuses: New research on assessing students with disabilities. Journal of Applied Testing Technology, 10(2). Available at http://www.testpublishers.org/Documents/Special%20Issue%20article%201%20.pdf.
Sireci, S. G. (2011). Evaluating test and survey items for bias across languages and cultures. In D. Matsumoto & F. van de Vijver (Eds.), Cross-cultural research methods in psychology (pp. 216-240). Oxford, UK: Oxford University Press.
Sireci, S. G., DeLeon, B., & Washington, E. (2002, Spring). Improving teachers of minority students’ attitudes towards and knowledge of standardized tests. Academic Exchange Quarterly, 162-167.
Sireci, S. G., Han, K. T., & Wells, C. S. (2008). Methods for evaluating the validity of test scores for English language learners. Educational Assessment, 13, 108-131.
Sireci, S. G., & Mullane, L. A. (1994). Evaluating test fairness in licensure testing: The sensitivity review process. CLEAR Exam Review, 5(2), 22-28.
Sireci, S. G., & Parker, P. (2006). Validity on trial: Psychometric and legal conceptualizations of validity. Educational Measurement: Issues and Practice, 25(3), 27-34.
Sireci, S. G., & Pitoniak, M. J. (2007). Assessment accommodations: What have we learned from research? In C. C. Laitusis & L. Cook (Eds.), Large scale assessment and accommodations: What works? (pp. 53-65).
Sireci, S. G., Scarpati, S., & Li, S. (2005). Test accommodations for students with disabilities: An analysis of the interaction hypothesis. Review of Educational Research, 75, 457-490.
Thompson, S., & Thurlow, M. (2002, June). Universally designed assessments: Better tests for everyone! Policy Directions, Number 14. Minneapolis, MN: National Center on Educational Outcomes.
Performance Assessment
Clauser, B. E., Subhiyah, R. G., Nungester, R. J., Ripkey, D. R., Clyman, S. G., & McKinley, D. (1995). Scoring a performance-based assessment by modeling the judgments of experts. Journal of Educational Measurement, 32, 397-415.
Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4, 289-303.
Lane, S. (1993). The conceptual framework for the development of a mathematics performance assessment instrument. Educational Measurement: Issues and Practice, 12(2), 16-23.
Linn, R. L., & Burton, E. (1994). Performance-based assessment: Implications of task specificity. Educational Measurement: Issues and Practice, 13(1), 5-8, 15.
Pearson, P. D., & Garavaglia, D. R. (1997). Improving the information value of performance items in large scale assessments. Paper commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for Research.
Quellmalz, E. S. (1991). Developing criteria for performance assessments: The missing link. Applied Measurement in Education, 4, 319-331.
Shavelson, R. J., Baxter, G., & Pine, J. (1991). Performance assessment in science. Applied Measurement in Education, 4, 347-362.
Cognitive/Principled (Evidence-Centered) Assessment Design
Gorin, J. S. (2006). Test design with cognition in mind. Educational Measurement: Issues and Practice, 25(4), 21-35.
Gorin, J. S. (2007). Test construction and diagnostic testing. In J. Leighton & M. Gierl (Eds.), Cognitive diagnostic assessment for education (pp. 173-201). Cambridge: Cambridge University Press.
Huff, K., & Goodman, D. P. (2007). The demand for cognitive diagnostic assessment. In J. Leighton & M. Gierl (Eds.), Cognitive diagnostic assessment for education (pp. 19-60). Cambridge: Cambridge University Press.
Leighton, J. P., & Gierl, M. J. (Eds.). (2007). Cognitive diagnostic assessment for education: Theories and applications. Cambridge: Cambridge University Press.
Mislevy, R. J. (2009). Validity from the perspective of model-based reasoning. In R. Lissitz (Ed.), The Concept of Validity: Revisions, New Directions and Applications (pp. xx-xx). Charlotte, NC: Information Age Publishing Inc.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2002). On the role of task model variables in assessment design. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp. 97-128). Mahwah, NJ: Lawrence Erlbaum.
Williamson, D. M., Mislevy, R. J., & Almond, R. G. (2004). Evidence-centered design for certification and licensure. CLEAR Exam Review, 15(2), 14-18.
Portfolio Assessment
Hardin, J., & Wright, V. H. (2017). E-portfolio assessment: A mixed methods study of an instructional leadership program’s assessment system. Research in the Schools, 24(1), 63-79.
Kim, Y., & Yazdian, L. S. (2014). Portfolio Assessment and Quality Teaching. Theory Into Practice, 53(3), 220-227. doi:10.1080/00405841.2014.916965
Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The Vermont portfolio assessment program: Findings and implications. Educational Measurement: Issues and Practice, 13(3), 5-16.
Reckase, M. D. (1995). Portfolio assessment: A theoretical estimate of score reliability. Educational Measurement: Issues and Practice, 14(1), 12-14, 31.
Younger, D. (2015). Meaningful rigor in portfolio assessment. The Journal of Continuing Higher Education, 63(2), 126-129. doi:10.1080/07377363.2015.1042956
Page 2 of 17
DiscussionAssignmentInstructions.docx
Discussion Assignment Instructions
The student will post one thread of at least 400 words. The student must then post 2 replies of at least 200 words. For each thread, students must support their assertions with at least one (1) scholarly and at least one (1) scriptural citation in current APA format. Acceptable sources include: The Bible, the textbook, and other peer-reviewed articles and/or books.
Before writing your Discussion post, please read the document "Further Topics in Assessment"
You will pick one of several topics in the "Further Topics in Assessment" document. Find and read 3-5 articles about that topic from the list in the "Further Topics in Assessment" document. Write your post describing the topic to your classmates in "layman's terms." These articles will need to be found within the LU Library database (https://www.liberty.edu/library). If you can’t find a source, or it is unavailable through the LU Library, please choose a different article from the document. If you would like more sources or another topic, please email me. Try to write your post so that they will be able to understand it given the content they have learned in this course.
Please use the following questions to guide your post (it may not be possible to answer all questions for your topic):
1. In a few sentences, what does the technique or assessment-related topic entail?
2. How is this approach more beneficial than a multiple-choice assessment?
3. How has this approach furthered the field of assessment?
4. How does this approach relate to validity of the assessment scores?
FurtherTopicsinAssessment.docx
PSYC 545
Further Topics in Assessment
Table of Contents Item Development/Formats 2 Validity 4 Standard Setting 8 Computer-Based Testing 10 Testing Special Populations 12 Performance Assessment 15 Cognitive/Principled (Evidence-Centered) Assessment Design 16 Portfolio Assessment 17
Item Development/Formats
Baron, J. B. (1991). Strategies for the development of effective performance exercises. Applied Measurement in Education, 4, 305-318.
Bennet, R., & Ward, W. (1993). Construction versus choice in cognitive measurement. Hillsdale, NJ: Lawrence Erlbaum Associates.
Crehan, K. D., Haladyna, T. M., & Brewer, B. W. (1993). Use of an inclusive option and the optimal number of options for multiple-choice items. Educational and Psychological Measurement, 53, 241-247.
Cronbach, L. J. (1946). Response sets in objective tests. Educational and psychological measurement, 6, 475-494.
Downing, S. M., & Haladyna, T. M. (1997). Test item development: Validity evidence from quality assurance procedures. Applied Measurement in Education, 10, 61-82.
Downing, S. M., & Haladyna, T. M. (Eds.) (2006). Handbook of Testing (pp. 329-347). Mahwah, NJ: Lawrence Erlbaum.
Haladyna, T. M. (1992). The effectiveness of several multiple-choice item formats. Applied Measurement in Education, 5, 73-88.
Haladyna, T. M. (1994). Developing and validating multiple-choice test items. Hillsdale: Lawrence Erlbaum.
Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item writing rules. Applied Measurement in Education, 2, 37-50.
Haladyna, T. M., & Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309-334.
Haladyna, T. M., & Shindoll, R. R. (1989). Item shells: A method for writing effective multiple-choice items. Evaluation & The Health Professions, 12, 97-106.
Kreiter, C. D., & Frisbie, D. A. (1989). Effectiveness of multiple true-false items. Applied Measurement in Education, 2, 207-216.
Lukhele, R. Thissen, D., & Wainer, H. (1994). On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests. Journal of Educational Measurement, 31, 234-250.
Martinez, M. E. (1999). Cognition and the question of test item format. Educational Psychologist, 34, 207-218.
Martinez, R. J., Moreno, R., Martin, I, & Trigo, M. E. (2009). Evaluation of five guidelines for option development in multiple-choice item writing. Psicothema, 21, 326-330.
Mentzer, T. L. (1982). Response biases in multiple-choice test item files. Educational and Psychological Measurement, 42, 437-448.
Osterlind, S. J. (1989). Constructing test items. Hingham, MA: Kluwer.
Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3-13.
Sireci, S.G, Wiley, A., & Keller, L.A. (2002). An empirical evaluation of selected multiple-choice item writing guidelines. CLEAR Exam Review, 13(2), 20-26.
Thissen, D., Wainer, H., & Wang, X-B. (1994). Are tests comprising both multiple-choice and free-response items necessarily less unidimensional than multiple-choice tests? An analysis of two tests. Journal of Educational Measurement, 31, 113-123.
Validity
Aiken, L. R. (1980). Content validity and reliability of single items or questionnaires. Educational and Psychological Measurement, 40, 955-959.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning (CBAL): A pre- liminary theory of action for summative and formative assessment. Measurement, 8(2–3), 70–91.
Bhola, D. S., Impara, J. C., & Buckendahl, C. W. (2003). Aligning tests with states' content standards: Methods and issues. Educational Measurement: Issues and Practice, 22(3), 21-29.
Brennan, R. L. (2001). Some problems, pitfalls, and paradoxes in educational measurement. Educational Measurement: Issues and Practice, 20(4), 6-18.
Burger, S. E., & Burger, D. L. (1994). Determining the validity of performance-based assessment. Educational Measurement: Issues and Practice, 13(1), 9-15.
Crocker, L. M., Miller, D., & Franks E. A. (1989). Quantitative methods for assessing the fit between test and curriculum. Applied Measurement in Education, 2, 179-194.
Cronbach, L. J. (1988). Five perspectives on the validity argument. In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 3-17). Hillsdale, New Jersey: Lawrence Erlbaum.
D’Agostino, J. V., et al. (2008). The rating and matching item-objective alignment methods. Applied Measurement in Education, 21, 1-21.
Hambleton, R. K., (1984). Validating the test score In R.A. Berk (Ed.), A guide to criterion-referenced test construction. Baltimore: Johns Hopkins University Press, pp. 199-230.
Joint Committee on Testing Practices (2004). Code of Fair Testing Practices in Education. Washington, DC: American Psychological Association. Available for download at http://www.apa.org/science/fairtestcode.html.
Kane, M.T. (1992). An argument-based approach to validity. Psychological Bulletin, 112,527-535.
Kane, M. (2006). Validation. In R. L. Brennan (Ed). Educational measurement (4th edition, pp. 17-64). Washington, DC: American Council on Education/Praeger.
Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1-73.
Kuehn, P. A., Stallings, W. M., Holland, C. L. (1990). Court-defined job analysis requirements for validation of teacher certification tests. Educational Measurement: Issues and Practice, 9 (4), 21-24.
Martone, A., & Sireci, S. G. (2009). Evaluating alignment among curriculum, assessments, and instruction. Review of Educational Research 4, 1332-1361.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement, (3rd ed.) (pp. 13-103). Washington, D.C.: American Council on Education.
Nelson, D. S. (1994). Job analysis for licensure and certification exams: Science or politics? Educational Measurement: Issues and Practice, 13(3), 29-35.
O’Neil, T., Sireci, S. G., & Huff, K. F. (2004). Evaluating the consistency of test content across two successive administrations of a state-mandated science assessment. Educational Assessment, 9, 129-151.
Porter, A. C., Polikoff, M. S., Zeidner, T., & Smithson, J. (2008). The quality of content analyses on state student achievement tests and content standards. Educational Measurement: Issues and Practice, 27(4), 2-14.
Sireci, S.G. (1998). Gathering and analyzing content validity data. Educational Assessment,5, 299-321.
Sireci, S. G. (1998). The construct of content validity. Social Indicators Research.
Sireci, S. G. (2007). On test validity theory and test validation. Educational Researcher, 36(8), 477-481.
Sireci, S. G. (2009). Packing and upacking sources of validity evidence: History repeats itself again. In R. Lissitz (Ed.), The Concept of Validity: Revisions, New Directions and Applications (pp. 19-37). Charlotte, NC: Information Age Publishing Inc.
Sireci, S. G. (2013). Agreeing on validity arguments. Journal of Educational Measurement, 50, 99-104.
Sireci, S.G., & Geisinger, K.F. (1998). Equity issues in employment testing. In J.H. Sandoval, C. Frisby, K.F. Geisinger, J. Scheuneman, & J. Ramos-Grenier (Eds.). Test interpretation and diversity (pp. 105-140). American Psychological Association: Washington, D.C.
Sireci, S.G., & Green, P.C. (2000). Legal and psychometric criteria for evaluating teacher certification tests. Educational Measurement: Issues and Practice, 19(1), 22-31, 34.
Smith, I. L., & Hambleton, R. K. (1990). Content validity studies of licensing examinations. Educational Measurement: Issues and Practice, 9(4), 7-10.
Wainer, H., & Braun, H. (1988). Test validity. Lawrenceville, NJ: Erlbaum.
Wainer, H., & Sireci, S. G. (2005). Item and test bias. Encyclopedia of social measurement volume 2, 365-371. San Diego: Elsevier.
Lu, Y., & Sireci, S. G. (2007). Validity issues in test speededness. Educational Measurement: Issues and Practice, 26(4), 29-37.
Standard Setting
Cizek, G. J. (1996). Setting passing scores [An NCME instructional module]. Educational Measurement: Issues and Practice, 15(2), 20-31.
Cizek, G. J. (2001). Standard setting: Concepts, methods, and perspectives. Mahwah, NJ: Lawrence Erlbaum.
Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.
Davis-Becker, S. L., Buckendahl, C. W., & Gerrow, J. (2011). Evaluating the bookmark standard setting method: The impact of random item ordering. International Journal of Testing, 11, 24-37.
Hambleton, R. K., Brennan, R. L., Brown, W., Dodd, B., Forsythe, R. A., Mehrens, W. A., Nellhaus, J., Reckase, M., Rindone, D., van der Linden, W. J., & Zwick, R. (2000). A response to “setting reasonable and useful performance standards” in the National Academy of Sciences “Grading the Nation’s Report Card.” Educational Measurement: Issues and Practice, 19(2), 5-14.
Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: American Council on Education/Praeger.
Karantonis, A., & Sireci, S. G. (2006). The bookmark standard setting method: A literature review. Educational Measurement: Issues and Practice, 25(1), 4-12.
Linn, R. L. (2003, September 1). Performance standards: Utility for different uses of assessments. Education Policy Analysis Archives, 11(31). Retrieved September 1, 2003 from http://epaa.asu.edu/epaa/v11n31.
Livingston, S. A., & Zieky, M. J. (1982). Passing scores: A manual for setting standards of performance on educational and occupational tests. Princeton, NJ: Educational Testing Service.
Meara, K. P., Hambleton, R. K., & Sireci, S. G. (2001). Setting and validating standards on professional licensure and certification exams: A survey of current practices. CLEAR Exam Review, 12(2), 17-23.
Perie, M. (2008). A guide to understanding and developing performance-level descriptors. Educational Measurement: Issues and Practice, 27(4), 15-29.
Reckase, M. D. (2006a). A conceptual framework for a psychometric theory for standard setting with examples of its use for evaluating the functioning of two standard setting methods. Educational Measurement: Issues and Practice, 25(2), 4-18.
Reckase, M. D. (2006b). Rejoinder: Evaluating standard setting methods using error models proposed by Schulz. Educational Measurement: Issues and Practice, 25(3), 14-17.
Sireci, S. G., Hambleton, R. K., & Pitoniak, M. J. (2004). Setting passing scores on licensure exams using direct consensus. CLEAR Exam Review, 15(1), 21-25.
Sireci, S. G., Randall, J., & Zenisky, A. (2012). Setting valid performance standards on educational tests. CLEAR Exam Review, 23(2), 18-27.
Sireci, S.G., Robin, F., & Patelis, T. (1999). Using cluster analysis to facilitate standard setting. Applied Measurement in Education, 12, 301-325.
Zieky, M. J., Perie, M., & Livingston, S. A. (2008). Cutscores: A manual for setting standards of performance on educational and occupational tests. Princeton, NJ: Educational Testing Service.
Computer-Based Testing
Almond, R. G., Steinberg, L. S., & Mislevy, R. J. (2002). Enhancing the design and delivery of assessment systems: A four-process architecture. Journal of Technology, Learning, and Assessment, 1(5). Available at http://www.bc.edu/research/intasc/jtla/journal/v1n5.shtml.
Drasgow, F., & Olson-Buchanan, J. B. (Eds.) (1999). Innovations in Computerized Assessment. Mahwah, NJ: Lawrence Erlbaum.
Huff, K. L., & Sireci, S. G. (2001). Validity issues in computer-based testing. Educational Measurement: Issues and Practice, 20(3), 16-25.
Luecht, R. M. (2005). Some useful cost-benefit criteria for evaluating computer-based test delivery models and systems. Journal of Applied Testing Technology, available at http://www.testpublishers.org/Documents/JATT2005_rev_Criteria4CBT_RMLuecht_Apr2005.pdf
Luecht, R. M., & Sireci, S. G. (2011). A review of models for computer-based testing. Research report 2011-2012. New York: The College Board.
Randall, J., Sireci, S. G., Li, X., & Kaira, L. (2013). Evaluating the comparability of paper- and computer-based science tests across sex and SES subgroups. Educational Measurement: Issues and Practice, 31(4), 2-12.
Sands, W. A., Waters, B. K., & McBride, J. R. (Eds.). (1997). Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association.
Sireci, S. G. (2004). Computerized-adaptive testing: An introduction. In J. Wall and G. Walz (Eds.), Measuring up: Assessment issues for teachers, counselors, and administrators (pp. 685-694). Greensboro, NC: CAPS Press.
Sireci, S. G., & Zenisky, A. L. (2006). Innovative item formats in computer-based testing: In pursuit of improved construct representation. In S. M. Downing and T. M. Haladyna (Eds.), Handbook of test development (pp. 329-347). Mahwah, NJ: Lawrence Erlbaum.
Wainer, H. (1993). Some practical considerations when converting a linearly administered test to an adaptive format. Educational Measurement: Issues and Practice, 12, 15-20.
Wainer, H. (Ed.). (2000). Computerized adaptive testing: A primer (2nd edition). Hillsdale, NJ: Lawrence Erlbaum.
Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185-201.
Zenisky, A.L., & Sireci, S.G. (2002). Technological innovations in large-scale assessment. Applied Measurement in Education, 15, 337-362.
Testing Special Populations
Abedi, J., & Ewers, N. (2013). Accommodations for English learners and students with disabilities: A research-based decision algorithm. Smarter Balanced Assessment Consortium.
Almond, P., Winter, P., Cameto, R., Russell, M., Sato, E., Clarke, J., Torres, C., Haertel, G., Dolan, R., Beddow, P., & Lazarus, S. (2010). Technology-enabled and universally designed assessment: Considering access in measuring the achievement of students with disabilities—A foundation for research. Dover, NH: Measured Progress and Menlo Park, CA: SRI International.
Council of Chief State School Officers (1992). Recommendations for improving the assessment and monitoring of students with limited English proficiency. Washington, DC: Author.
Fisher, R. J. (1994). The Americans With Disabilities Act: Implications for measurement. Educational Measurement: Issues and Practice, 13(3), 17-26, 37.
Geisinger, K. F. (1994). Cross-cultural normative assessment: Translation and adaptation issues influencing the normative interpretation of assessment instruments. Psychological Assessment, 6, 304-312.
Geisinger, K. F. (1994). Psychometric issues in testing students with disabilities. Applied Measurement in Education, 7, 121-140.
Hambleton, R. K. (1994). Guidelines for adapting educational and psychological tests: A progress report. European Journal of Psychological Assessment, 10, 229-244.
Phillips, S. E. (1994). High-stakes testing accommodations: Validity versus disabled rights. Applied Measurement in Education, 7, 93-120.
Ramsey, P. A. (1993). Sensitivity review: The ETS experience as a case study. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 367-388). Hillsdale, NJ: Erlbaum.
Sireci, S. G. (2005). Unlabeling the disabled: A perspective on flagging scores from accommodated test administrations. Educational Researcher, 34(1), 3-12.
Sireci, S. G. (2009). No more excuses: New research on assessing students with disabilities. Journal of Applied Testing Technology, 10(2). Available at http://www.testpublishers.org/Documents/Special%20Issue%20article%201%20.pdf.
Sireci, S. G. (2011). Evaluating test and survey items for bias across languages and cultures. In D. Matsumoto and F. van de Vijver (Eds.) Cross-cultural research methods in psychology (pp. 216-240). Oxford, UK: Oxford University Press.
Sireci, S. G., DeLeon, B., & Washington, E. (2002, Spring). Improving teachers of minority students’ attitudes towards and knowledge of standardized tests. Academic Exchange Quarterly, 162-167.
Sireci, S. G., Han, K. T., & Wells, C. S. (2008). Methods for evaluating the validity of test scores for English language learners. Educational Assessment, 13, 108-131.
Sireci, S. G., & Mullane, L. A. (1994). Evaluating test fairness in licensure testing: The sensitivity review process. CLEAR Exam Review, 5(2), 22-28.
Sireci, S. G., & Parker, P. (2006). Validity on trial: Psychometric and legal conceptualizations of validity. Educational Measurement: Issues and Practice, 25(3), 27-34.
Sireci, S. G., & Pitoniak, M. J. (2007). Assessment accommodations: What have we learned from research? In C. C. Laitusis & L. Cook (Eds.), Large scale assessment and accommodations: What works? (pp. 53-65).
Sireci, S. G., Scarpati, S., & Li, S. (2005). Test accommodations for students with disabilities: An analysis of the interaction hypothesis. Review of Educational Research, 75, 457-490.
Thompson, S., & Thurlow, M. (2002, June). Universally designed assessments: Better tests for everyone! Policy Directions, Number 14. Minneapolis, MN: National Center on Educational Outcomes.
Performance Assessment
Clauser, B. E., Subhiyah, R. G., Nungester, R. J., Ripkey, D. R., Clyman, S. G., & McKinley, D. (1995). Scoring a performance-based assessment by modeling the judgments of experts. Journal of Educational Measurement, 32, 397-415.
Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991). Quality control in the development and use of performance assessments. Applied Measurement in Education, 4, 289-303.
Lane, S. (1993). The conceptual framework for the development of a mathematics performance assessment instrument. Educational Measurement: Issues and Practice, 12(2), 16-23.
Linn, R. L., & Burton, E. (1994). Performance-based assessment: Implications of task specificity. Educational Measurement: Issues and Practice, 13(1), 5-8, 15.
Pearson, P. D., & Garavaglia, D. R. (1997). Improving the information value of performance items in large scale assessments. Paper commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for Research.
Quellmalz, E. S. (1991). Developing criteria for performance assessments: The missing link. Applied Measurement in Education, 4, 319-331.
Shavelson, R. J., Baxter, G., & Pine, J. (1991). Performance assessment in science. Applied Measurement in Education, 4, 347-362.
Cognitive/Principled (Evidence-Centered) Assessment Design
Gorin, J. S. (2006). Test design with cognition in mind. Educational Measurement: Issues and Practice, 25(4), 21-35.
Gorin, J. S. (2007). Test construction and diagnostic testing. In Leighton, J. & Gierl, M. (Eds.) Cognitive diagnostic assessment for education (pp. 173-201). Cambridge: Cambridge University Press.
Huff, K., & Goodman, D. P. (2007). The demand for cognitive diagnostic assessment. In Leighton, J. & Gierl, M. (Eds.) Cognitive diagnostic assessment for education (pp. 19-60). Cambridge: Cambridge University Press.
Leighton, J. P., & Gierl, M. J. (Eds.). (2007). Cognitive diagnostic assessment for education: Theories and applications. Cambridge: Cambridge University Press.
Mislevy, R. J. (2009). Validity from the perspective of model-based reasoning. In R. Lissitz (Ed.), The Concept of Validity: Revisions, New Directions and Applications (pp. xx-xx). Charlotte, NC: Information Age Publishing Inc.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2002). On the role of task model variables in assessment design. In S. H. Irvine & P. C. Kyllonen (Eds.), Item generation for test development (pp. 97-128). Mahwah, NJ: Lawrence Erlbaum.
Williamson, D. M., Mislevy, R. J., & Almond, R. G. (2004). Evidence-centered design for certification and licensure. CLEAR Exam Review, 15(2), 14-18.
Portfolio Assessment
Hardin, J., & Wright, V. H. (2017). E-portfolio assessment: A mixed methods study of an instructional leadership program’s assessment system. Research in the Schools, 24(1), 63-79.
Kim, Y., & Yazdian, L. S. (2014). Portfolio Assessment and Quality Teaching. Theory Into Practice, 53(3), 220-227. doi:10.1080/00405841.2014.916965
Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The Vermont portfolio assessment program: Findings and implications. Educational Measurement: Issues and Practice, 13(3), 5-16.
Reckase, M. D. (1995). Portfolio assessment: A theoretical estimate of score reliability. Educational Measurement: Issues and Practice, 14(1), 12-14, 31.
Younger, D. (2015). Meaningful rigor in portfolio assessment. The Journal of Continuing Higher Education, 63(2), 126-129. doi:10.1080/07377363.2015.1042956