Chapter 4 Homework/ Chapter 4 Test


Susan J. Beck, Ph.D., MLS(ASCP)CM and Vicky A. LeGrys, D.A., MT(ASCP) Division of Clinical Laboratory Science, The University of North Carolina at Chapel Hill Copyright 2014: The American Society for Clinical Laboratory Science

Test Development and Analysis

The material presented in this chapter should enable the learner to:

1. Describe the sequence of steps involved in planning an instructional activity.
2. List the purposes of testing.
3. Describe placement, formative, and summative testing.
4. Discuss the uses of various testing platforms.
5. Compare norm-referenced and criterion-referenced exams with regard to function, analysis, and evaluation criteria.
6. State the limitations of testing and suggest ways of improvement.
7. Describe the development of a grading rubric to decrease subjectivity.
8. Outline the steps involved in designing a test.
9. Discuss the advantages and disadvantages of the various types of test items.
10. Given an example of a multiple choice test item, analyze the item and suggest ways to improve it using the stated criteria.
11. Given an instructional unit, develop and analyze a test using the guidelines presented in this unit.
12. Discuss the statistics used to describe the central tendency and dispersion of test scores.
13. Evaluate the performance of a test item based on the item’s difficulty index and the discrimination index.
14. Define reliability and validity.
15. Discuss the ways that an instructor can improve the reliability of test scores.
16. Given a set of objectives and an examination for a course, evaluate the content validity of the examination.
17. Given an instructional unit, develop and analyze a test using the guidelines stated in this unit.

THE INSTRUCTIONAL SEQUENCE

Goals: The first step in planning an instructional activity is to identify goals. Goals are statements that describe the general knowledge, skills, or attitudes that the learner will possess after the instructional activity.

Objectives: The next step in instructional planning is writing objectives. Objectives are statements that describe the specific learning outcomes of an instructional activity and they are written in greater detail than goals. Objectives specify the observable knowledge, skills or attitudes that the learner is expected to exhibit after completing an instructional activity. Objectives serve to communicate the instructor's intent for a particular learning experience.

Learning Activities: Instructional activities or learning activities are chosen to help the learner master the objectives. They include such things as lectures, laboratory exercises, demonstrations, and online instruction.

Evaluation: An evaluation process should be planned to tell the instructor whether or not the learners were able to master the objectives and whether or not the instructor's approach was effective in assisting learners.

PURPOSES OF TESTING

Testing is an integral part of any clinical laboratory educational experience. It is used as an evaluation tool in CLS/CLT programs, in continuing education programs and in competency testing for laboratory personnel. From the instructor’s perspective, testing is useful for:

• Making objective decisions concerning the competency of a learner or practitioner in the clinical laboratory. In competency assessment for laboratory personnel, testing is one mechanism to document that the individual possesses the necessary knowledge, skills and attitudes required to perform laboratory procedures.

• Evaluating the effectiveness of instruction and curriculum. Identifying subject areas in which the majority of the learners performed poorly provides instructors with the opportunity to alter the presentation of the material to make it more effective.

From the learner’s perspective, testing is a useful tool for:

• Assessing strengths and weaknesses and improving problem areas as the course progresses.

• Motivating the learner to study the material and challenging the competitive learner.

TYPES OF TESTS

Placement tests are used to assess the prerequisite skills needed for a course of study or the knowledge, skills and attitudes prior to instruction.

Formative tests are used to assess knowledge during a course. These test grades are often averaged into the final grade and cover a limited portion of material.

Summative tests are used to assess the mastery of material and learners’ readiness to progress in a program or into the profession. Examples of summative testing include:

• comprehensive final examinations for a course.

• comprehensive examination evaluating all areas of the curriculum given at the end of the program.

• certification examinations to assess the competency of individuals to work in the clinical laboratory.

• competency testing for laboratory personnel to assess the practitioner’s mastery of laboratory procedures.

TESTING PLATFORMS

Tests can be given in a variety of platforms such as:

1. Written

2. Oral: Used in the laboratory to question individual learners in order to develop their problem solving abilities.

3. Online: Used in standardized achievement tests, certification examinations, distance education, and in course management systems. The advantages of online testing compared to written examinations are that the testing is arranged individually, the individual answers fewer questions, testing takes less time, it is easier to enter the answers using a computer than on an answer sheet, and the results of the test are often available sooner.1

4. Practical examinations: Developed by combining the performance of a laboratory procedure with the response to written questions. Practical examinations are able to assess both psychomotor and cognitive skills. This type of test format is very useful in evaluating learner performance and is more accurate than using written tests alone to assess the learner’s ability to function effectively in the clinical laboratory.

NORM-REFERENCED AND CRITERION-REFERENCED TESTS

Characteristics of norm-referenced tests:

• They represent the traditional achievement tests in which learners are evaluated against one another on test performance.

• They are often graded on a curve establishing a set number of As, Bs, and so on. An 80% score on a norm-referenced test may be an A if everyone else in the class performed at a level below 80%, or an 80% may be a C if the mean for the test was 85%.

• The group of test takers determines the standards of acceptability. These tests may be useful in selection of learners for advancement but are limited in their ability to assess competency.

Characteristics of criterion-referenced tests:

• They assess mastery by achieving predetermined minimal competencies. Minimal competency is defined as "the knowledge, skills and attitudes necessary to perform an activity according to prevailing standards of adequacy."2

• The criteria for mastery are developed prior to test administration.

• It is possible for all learners to do well on the criterion-referenced test items if they have achieved minimal competency.

• They are used in a pass/fail situation such as certification examinations.

Both criterion-referenced and norm-referenced tests are used in clinical laboratory educational programs. Many programs establish minimum scores in both the didactic/lecture and laboratory courses which must be achieved by the student in order to remain in the program. In a profession such as clinical laboratory science, all individuals must be able to master certain skills such as typing blood, identifying abnormal cells, and recognizing panic values. These skills must be achieved by all individuals, regardless of peer performance. The thought of an individual who did not achieve minimal competency but passed "on a curve" and graduated from a program is contrary to ethical standards of practice. After establishing minimum competency standards, represented by course or curriculum objectives, many clinical laboratory education programs use norm referencing to assign learners a grade relative to the performance of others.

Characteristic | Norm-Referenced Tests | Criterion-Referenced Tests
Function | Ranks the learner’s performance relative to the group’s performance | Compares the learner’s performance with predetermined standards
Analysis | Occurs after the test administration | Occurs before and after test administration
Pass/Fail criteria | Determined by the test group’s performance | Pre-determined, independent of the test group’s performance
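The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not a grading method prescribed by this chapter: the z-score bands used for the curve, the 70% cutoff, and the two hypothetical classes are all assumptions chosen to mirror the 80% example above.

```python
from statistics import mean, pstdev


def criterion_referenced(scores, cutoff):
    """Pass/fail against a cutoff that is fixed before the test is given."""
    return {name: "pass" if score >= cutoff else "fail"
            for name, score in scores.items()}


def norm_referenced(scores):
    """Assign a letter from each learner's z-score within the group.

    The bands (A: z >= 1, B: z >= 0, C: z >= -1, else D) are an assumed
    curve for illustration only.
    """
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    grades = {}
    for name, score in scores.items():
        # If everyone scored the same there is no spread to rank on.
        z = 0.0 if sigma == 0 else (score - mu) / sigma
        if z >= 1.0:
            grades[name] = "A"
        elif z >= 0.0:
            grades[name] = "B"
        elif z >= -1.0:
            grades[name] = "C"
        else:
            grades[name] = "D"
    return grades


# Hypothetical classes: learner_1 scores 80% in both.
weaker_class = {"learner_1": 80, "learner_2": 70, "learner_3": 65,
                "learner_4": 60, "learner_5": 55}
stronger_class = {"learner_1": 80, "learner_2": 85, "learner_3": 95,
                  "learner_4": 90, "learner_5": 75}

for label, group in (("weaker", weaker_class), ("stronger", stronger_class)):
    print(label,
          "norm:", norm_referenced(group)["learner_1"],
          "criterion:", criterion_referenced(group, cutoff=70)["learner_1"])
```

Under these assumptions the same 80% earns an A in the weaker class and a C in the stronger class (where the mean is 85%), while the criterion-referenced result is "pass" in both because it does not depend on the group.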

PITFALLS OF TESTS

Problem areas include:

1. Limited assessment of cognitive skills
2. Assessing irrelevant material
3. Assessing material not covered in the objectives
4. Subjective grading
5. Poorly designed tests

1. Limited assessment of cognitive skills

The most critical shortcoming of tests for clinical laboratory science learners is that they contain too many items at the recall level. While it is important that all learners know certain basic facts, it is essential that clinical laboratory science learners be able to analyze, interpret, and evaluate data. Often the higher level cognitive skills are included in the course objectives, yet when the test is constructed, only the lower skills are tested. Learners soon discover that the majority of the questions are at the recall level and, therefore, study by memorizing the facts. In an effort to make the test more difficult, many instructors include trivial or esoteric facts in their test items, ignoring fundamental principles. Learners who perform well on these tests may be good at memorization but may not be competent to practice.

Suggestions for improvement: To ensure that higher cognitive level items are included in the test, the instructor should present novel situations requiring application, analysis, interpretation, and evaluation. A key point in designing higher level test items is to present data requiring the learner to interpret unfamiliar examples. Test items can ask the learner to: 2,3

• Perform calculations

• Identify unfamiliar examples of peripheral blood smears

• Interpret unfamiliar examples of QC

• Recognize and correct discrepant results

• Evaluate the reportability of test results

• Judge appropriate troubleshooting action

• Interpret test results in a case study

• Resolve a quality control problem

• Determine appropriate further testing or follow up action

• Predict results and outcomes

2. Assessing irrelevant material

Another problem can occur with tests when the content of the tests is not relevant to current practice in the clinical laboratory. In this case, the test cannot assess competency for entry level practitioners.

Suggestions for improvement: Instructors should continually evaluate and update their objectives and tests to ensure that the objectives and the tests reflect current practice.4,5

3. Assessing material not covered in the objectives

A common and often valid complaint from learners concerning testing is that they studied the wrong material or did not know what to expect on the test.

Suggestions for improvement: This situation can be rectified by designing the test material based on the course objectives. The instructor should avoid stating the objectives in one manner and testing in another. For example:

Objective: Discuss common interferences in creatinine assays.

Which of the following test questions best matches this objective?

1. Serum creatinine is used to assess the function of the:
A. liver
B. kidney
C. bone
D. CNS

2. Which of the following situations will result in an increase in serum creatinine values?
A. hepatitis
B. diabetic ketoacidosis
C. myocardial infarction
D. metastatic bone disease

Question 2 matches the objective.

4. Subjective grading

Another frequent learner complaint is that the grading on a test or assignment was too subjective or vague. If a learner does not understand what constitutes acceptable and unacceptable performance, it is difficult for the instructor and the learner to assess performance.

Suggestions for improvement: Criteria for acceptable performance should be established in the objectives at the beginning of the course and should be given to the learner. For assignments whose evaluation is subjective (e.g., research papers, oral presentations, discussion forums), a grading rubric can be useful. A grading rubric consists of a grid linking scaled scores to descriptive criteria. See an example of a rubric below. The use of a rubric is beneficial to both learners and instructors as a mechanism for clarifying expectations and decreasing subjectivity in evaluation. To develop a rubric, the instructor should begin by developing a list of qualities that comprise proficiency in the desired learning outcomes. Generally a numerical score is associated with the level of quality for a quantitative evaluation, and the criteria can be weighted differently. Once a draft rubric is developed, the instructor should ask colleagues to review it and provide feedback. The instructor may find that after use, revisions are needed for future use. While developing a grading rubric can initially be time consuming for the instructor, it eventually saves time by standardizing evaluation and decreasing grading ambiguity. 6-8

Example of a Rubric for Evaluating an Oral Presentation

Student: Jane Doe
Topic: Evaluation of a Novel PCR Mastermix
Overall Score: 84

Organization: 30 points, each criterion worth 6 points

• Logical progression of ideas: 6
• Focused: 6
• Easy to follow: 6
• Contained introduction: 4
• Contained valid conclusion: 6

Evaluation and comment: 28 points. The presentation was well organized, logical, focused and easy to follow. The conclusion was supported by the information presented. The introduction could be improved by providing more detail concerning the justification for the project.

Delivery: 30 points, each criterion worth 6 points

The presenter:

• Spoke at an appropriate pace: 4
• Spoke at an appropriate volume: 6
• Established eye contact: 4
• Spoke enthusiastically: 6
• Spoke clearly: 4

Evaluation and comment: 24 points. Jane was an enthusiastic speaker with appropriate volume. However, she seemed very nervous, which caused her to rush her presentation and, at times, mumble her words. She should practice looking at the audience more than looking at her notes in future presentations.

Content: 40 points, each criterion worth 8 points

• Well researched: 6
• Comprehensive: 5
• Accurate: 8
• Current and appropriate references: 5
• Presenter used the material on the slides for illustration and organization but orally expanded on the printed text: 8

Evaluation and comment: 32 points. The presentation was well prepared and an accurate description of the project. However, it failed to address all of the components of a validation study in that there was no discussion of precision. The references should have included the current CAP requirement for method validation. Jane did elaborate on the materials on the slides and did more than simply read them to the audience.
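The arithmetic behind a weighted rubric such as this one can be sketched in a few lines of Python. This is only an illustration of how section totals and the overall score follow from the per-criterion scores. The data structure and function names are hypothetical, and the scores for "Contained introduction" and "Established eye contact" are set to 4, the value implied by the 28-point and 24-point section totals.

```python
# Each section maps to (maximum points per criterion, scores awarded).
# Scores are transcribed from the example rubric; the two marked
# "implied" are assumptions derived from the stated section totals.
RUBRIC = {
    "Organization": (6, {
        "Logical progression of ideas": 6,
        "Focused": 6,
        "Easy to follow": 6,
        "Contained introduction": 4,       # implied by the 28-point total
        "Contained valid conclusion": 6,
    }),
    "Delivery": (6, {
        "Spoke at an appropriate pace": 4,
        "Spoke at an appropriate volume": 6,
        "Established eye contact": 4,      # implied by the 24-point total
        "Spoke enthusiastically": 6,
        "Spoke clearly": 4,
    }),
    "Content": (8, {
        "Well researched": 6,
        "Comprehensive": 5,
        "Accurate": 8,
        "Current and appropriate references": 5,
        "Expanded orally on the slides": 8,
    }),
}


def section_total(points_each, scores):
    """Sum one section, rejecting scores outside 0..points_each."""
    for criterion, score in scores.items():
        if not 0 <= score <= points_each:
            raise ValueError(f"{criterion}: {score} is outside 0..{points_each}")
    return sum(scores.values())


def overall_score(rubric):
    """Add up every section total."""
    return sum(section_total(p, s) for p, s in rubric.values())


for name, (points_each, scores) in RUBRIC.items():
    maximum = points_each * len(scores)
    print(f"{name}: {section_total(points_each, scores)} of {maximum}")
print("Overall:", overall_score(RUBRIC))
```

Giving the Content criteria 8 points each, against 6 for the other sections, is how this rubric weights content more heavily than organization or delivery.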

5. Poorly designed tests

Due to time constraints, test construction is often postponed until the last minute. In a panic, the instructor may retrieve the test from last year, change the date, and use it again or may quickly create a new test without taking the time needed to revise or edit the items. When an old test is reused, some items may be irrelevant and the test may be incomplete. A hastily prepared new test may be given only to discover that some items have no one best answer. In these situations it is difficult to arrive at any useful assessment of learner performance.

Suggestions for improvement: The ideal method of test construction is to create several test items after each lecture. It is easier to create plausible distracters (incorrect answers) when learners’ perceptions are fresh in the instructor's mind. The following section on designing test material will describe a systematic approach to creating an effective test.

DESIGNING TEST MATERIAL

Tests should always be based on objectives and should cover all taxonomic levels. It has been suggested that for an examination for laboratory professionals, 50% of the items should be at the recall level, 30% application and 20% analysis.2

1. Choose the type of questions for the examination.

Decide whether the test will include multiple choice, matching, short answer, or essay questions. The choice will depend on the purpose of the examination. Each type of question has its characteristic strengths and weaknesses which are described later in this unit. No one testing form is perfect in assessing competency; therefore, other forms of evaluation should be included in the overall assessment of a learner’s knowledge, skills and attitudes.

2. Determine the number of items on the examination and the time limitations for the examination.

The test should be as long as possible, given the time constraints, to ensure reliability.

• Multiple choice questions take approximately 1 minute to answer.

• Completion or short answer items take 1 to 3 minutes to answer.

• Essay questions vary widely in their response time.

3. Allocate the number of test items devoted to each topic.

In order for the test to be valid, the test should reflect the amount of time spent on each subject category and should include a representative sampling of the material covered in the course. In choosing items for inclusion, the instructor should decide on the importance of the material to the learners' future needs.
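Steps 2 and 3, together with the suggested 50/30/20 split by cognitive level, come down to simple proportional arithmetic. The following Python sketch shows one way to do it, assuming a hypothetical 50-minute, all-multiple-choice examination and hypothetical lecture hours per topic. The function names and the largest-remainder rounding rule are illustrative choices, not something specified in this chapter.

```python
def max_items(minutes_available, minutes_per_item):
    """Largest whole number of items that fits in the time limit."""
    return int(minutes_available // minutes_per_item)


def allocate(total_items, weights):
    """Split total_items in proportion to weights.

    Uses largest-remainder rounding so the counts always sum to
    total_items even when the proportions do not divide evenly.
    """
    weight_sum = sum(weights.values())
    quotas = {k: total_items * w / weight_sum for k, w in weights.items()}
    counts = {k: int(q) for k, q in quotas.items()}
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k],
                          reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# About 1 minute per multiple choice item, as estimated in step 2.
n_items = max_items(minutes_available=50, minutes_per_item=1)

# Suggested split by cognitive level (percent of items).
by_level = allocate(n_items, {"recall": 50, "application": 30, "analysis": 20})

# Hypothetical hours of instruction per topic (step 3).
by_topic = allocate(n_items, {"topic A": 5, "topic B": 3, "topic C": 2})

print(n_items, by_level, by_topic)
```

For 50 items this gives 25 recall, 15 application and 10 analysis items. The same function allocates items across topics in proportion to instructional time, which the instructor can then adjust for the importance of each topic.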

4. Create the test items.

Clearly written course objectives are used to guide the instructor in item construction.

5. Group the test items according to the type of question.

Group all the multiple choice questions together and all the matching or short answer items together.

6. Organize the test items according to the topic and begin the test with the easiest items in order to build the learners' confidence.

7. Develop the test directions.

Directions should state where to record the answers, the time limit, the maximum points assigned to the items, and the number of test items. The directions should also state whether there is only one answer per item or whether some items may have multiple answers. Directions should state what resources, if any, the student may use while taking the exam, such as the textbook, an atlas or a calculator. The Honor Code for the institution should be stated and the student should sign, acknowledging their acceptance.

8. Review the test for ambiguity and typographical errors.

If possible, have a colleague review the test for clarity.

ITEM CONSTRUCTION

Test items can be divided into two main groups:

• Objective items that include multiple choice, true/false, and matching

• Subjective items that include short answer and essay questions

Each type of item has advantages, disadvantages, and guidelines for construction. 8-10

Item Type | Advantages | Disadvantages
Multiple Choice | Easy to grade; Samples a large amount of material | Difficult to construct; May test at a lower cognitive level
True/False | Easy to grade; Samples a large amount of material | Difficult to construct; Encourages guessing; Tests at the lower cognitive level
Matching | Easy to construct; Useful for testing specific facts | Encourages memorization; Tests at the lower cognitive level
Essay/Short Answer | Easy to construct; Useful for testing higher cognitive levels; Stimulates creativity, communication and organization skills; Minimizes guessing | Time-consuming to grade; Lack of objectivity in grading; Limited sampling ability

Multiple Choice Items

Multiple choice items consist of a stem, which poses the question or problem, and several responses, which include the correct answer and distracters:

Which of the following represents a 1:10 dilution? (Stem)

A. 0.5 mL of serum plus 9.5 mL of diluent (Distracter)
B. 0.1 mL of serum plus 0.9 mL of diluent (Correct answer)
C. 1 mL of serum plus 10 mL of diluent (Distracter)
D. 1.5 mL of serum plus 9.5 mL of diluent (Distracter)
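The keyed answer implies the usual convention that a 1:10 dilution means one part serum in ten total parts (serum plus diluent). A short Python sketch, with a hypothetical helper name, checks each response against that convention. Volumes are passed as strings so that exact fractions are used instead of floating point.

```python
from fractions import Fraction


def dilution_factor(sample_ml, diluent_ml):
    """Return total volume divided by sample volume (10 for a 1:10 dilution)."""
    sample = Fraction(sample_ml)
    diluent = Fraction(diluent_ml)
    return (sample + diluent) / sample


# (serum mL, diluent mL) for each response in the example item.
options = {
    "A": ("0.5", "9.5"),
    "B": ("0.1", "0.9"),
    "C": ("1", "10"),
    "D": ("1.5", "9.5"),
}

for letter, (serum, diluent) in options.items():
    factor = dilution_factor(serum, diluent)
    marker = "  <- 1:10" if factor == 10 else ""
    print(f"{letter}: 1:{float(factor):g}{marker}")
```

Only response B gives a factor of exactly 10. The distracters correspond to plausible slips: A is a 1:20 dilution, and C (1:11) treats the "10" as the diluent volume instead of the total volume.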

Multiple choice items:

• are the most popular type of test items

• are found in certification examinations

• are useful to test the recall of facts

• can be designed to measure higher cognitive skills such as analysis and evaluation 2,3

Advantages of multiple choice items for the instructor:

• Easier to evaluate and analyze than subjective test items because scoring is highly objective and lends itself to automated grading systems.

• Can be quickly answered by the learners, so the tests can contain numerous items. This permits a greater sampling of the course material.

Advantages of multiple choice items for the learner:

• Individual items are not heavily weighted and the possibility of guessing the correct answer exists.

• The right answer is present, requiring recognition of the correct response instead of creating the appropriate response, as would be the case with a short answer or essay test item.

Disadvantages of multiple choice items for the instructor:

• They are more time-consuming to develop than other test formats.

• They may end up measuring only recall and promoting memorization. Instructors may think that they are testing analysis, only to discover upon closer inspection that the situation in the item was discussed in class. When this occurs, recall is measured, not analysis.

Disadvantages of multiple choice items for the learner:

• The instructors, in an attempt to increase the item difficulty, may base the item on a trivial fact, instead of focusing on current, relevant knowledge required for entry level competency.

Suggestions for Improvement: To improve the quality of multiple choice items the following guidelines should be followed: 8-10

1. The directions for the multiple choice items should state that the learner is to select the one best answer.

2. The item should measure significant facts, not trivia.

Poor: Sanger sequencing is also called:
A. Pyrosequencing
B. PCR
C. Q PCR
* D. Direct sequence analysis

Better: In Sanger sequencing, incorporation of which of the following reagents stops elongation?
* A. ddNTP
B. dNTP
C. Taq Pol
D. quencher probe

Note: The asterisk (*) indicates the correct answer.

3. The stem should present the problem clearly and specifically either in the form of a question or an incomplete statement.

Poor: Hemoglobin A1C
* A. Is measured by immunoassay
B. Reflects diabetic control over several hours
C. Is a new thalassemia variant
D. Has decreased affinity for oxygen

Better: Hemoglobin A1C can be measured in the clinical laboratory by
A. Titration
* B. Immunoassay
C. ELISA
D. Nephelometry

4. The stem should be clearly worded and concise. Avoid any unnecessary information in the stem unless it is important for the learner to be able to recognize irrelevant data.

Poor: A Group O Rh positive patient was admitted to the hospital for elective surgery. He had been a blood donor for 5 years and had never experienced any adverse reactions to blood donation. Four units of blood were crossmatched for this patient; however, only two units were used during the surgery. Two days after his surgery his white cell count was elevated and he developed a fever. A culture of the surgical wound was obtained and he was given antibiotic therapy. On the fifth day after his surgery his white count and his temperature were normal but it was noted that his hemoglobin level was falling and his serum bilirubin was slightly elevated. There were no signs of active bleeding. Most probably, this patient
A. Received ABO incompatible blood
B. Has post-transfusion hepatitis
C. Had an allergic reaction to the transfused blood
* D. Had a delayed hemolytic transfusion reaction

Better: A patient received two units of red blood cells with no complications. Five days later his hemoglobin had not risen above the pre-transfusion level. The patient had no signs of active bleeding and it was noted on his chart that his serum bilirubin was slightly elevated. Most probably, this patient
A. Received ABO incompatible blood
B. Has post-transfusion hepatitis
C. Had an allergic reaction to the transfused blood
* D. Had a delayed hemolytic transfusion reaction

5. The stem should be stated in a positive tone since the purpose of learning is to know the correct information, not the incorrect information. The instructor cannot assume that if learners know what does not apply that they will also know what does apply. If, however, it is necessary to state the stem in a negative form, underline the negative word and avoid double negatives between the stem and the alternatives.

Poor: Which of the following descriptions does not apply to DNA structure?
A. It does not contain uracil
B. It is a double-stranded helix
C. It is stabilized by hydrogen bonding
* D. It is composed of ribose molecules

Better: Which of the following describes the structure of DNA?
A. It is composed of uracil bases
* B. It is a double-stranded helix
C. It is stabilized by base pair covalent bonding
D. It contains ribose molecules

6. The stem should be qualified if a comparison is required or if the list of distracters is a subset of all possible answers. The phrase "of the following" can be used for this purpose. For example:

Which of the following toxicology methods represents the most definitive confirmatory procedure?
A. HPLC
B. Immunoassays
* C. GC/MS
D. Spot Tests

7. The alternatives should be concise with any repetitive information placed in the stem.

Poor: An adult male patient with glomerulonephritis had a creatinine clearance test performed. Which of the following results would be consistent with the diagnosis?
* A. A creatinine clearance of 60 mL/min
B. A creatinine clearance of 100 mL/min
C. A creatinine clearance of 140 mL/min
D. A creatinine clearance of 180 mL/min

Better: Which of the following creatinine clearance results is consistent with a diagnosis of glomerulonephritis in an adult male?
* A. 60 mL/min
B. 100 mL/min
C. 140 mL/min
D. 180 mL/min

8. There should be four or five responses for each item. With more responses the probability that a test-taker will get an answer correct by guessing decreases. However, it is better to have four well-constructed responses than five poorly constructed ones. With some items only three responses are possible. For example:

The BUN concentration in a patient receiving a high protein diet probably would be
A. Normal
* B. Increased
C. Decreased
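Guideline 8's point about guessing can be made concrete with a small calculation. The Python sketch below assumes blind guessing, with every response equally likely and items independent of one another. That is a simplification, since real test-takers can often eliminate some distracters. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def guess_probability(n_responses):
    """Chance of picking the keyed answer at random on one item."""
    return Fraction(1, n_responses)


def prob_at_least(n_items, min_correct, n_responses):
    """Binomial chance of getting at least min_correct items right by guessing."""
    p = guess_probability(n_responses)
    return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(min_correct, n_items + 1))


for n in (2, 3, 4, 5):
    print(f"{n} responses: chance per item = {guess_probability(n)}")

# Chance of reaching 7 of 10 correct by guessing alone, by item format.
for n in (2, 4, 5):
    print(f"{n} responses: P(>= 7 of 10) = {float(prob_at_least(10, 7, n)):.4f}")
```

The per-item chance falls from 1/2 for a two-response (true/false style) item to 1/4 and 1/5 for four and five responses, and the chance of reaching a passing score by guessing alone falls much faster as the number of items grows.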

9. The stem and responses should agree in tense and grammar. This can be tested by reading each response with the stem. To avoid clues, use a(n) or is/are.

Poor: An antibody that reacts with an individual's own red cell antigens is called an
* A. Autoantibody
B. Monoclonal antibody
C. Alloantibody
D. Isoantibody

(In this example, "an" is a clue that "B" is incorrect. To improve this item, either use a(n) or reword the stem.)

Better: The term for an antibody that reacts with an individual's own red cell antigens is
* A. Autoantibody
B. Monoclonal antibody
C. Alloantibody
D. Isoantibody

10. The words "always," "never," "all," or "none" should be avoided in the responses because these terms are often associated with false statements and provide clues for the learner.

11. The responses should be of similar length, complexity, and format. Often the correct answer is longer and more qualified than the incorrect distracters.

Poor: The most common cause of iron deficiency anemia in adults is
* A. Chronic loss of blood from GI tract as in bleeding ulcers
B. Intestinal malabsorption
C. Inadequate diet
D. Repeated pregnancies

Better: The most common cause of iron deficiency anemia in adults is
* A. Chronic blood loss
B. Intestinal malabsorption
C. Inadequate diet
D. Repeated pregnancies

12. The responses "all of the above" and "none of the above" should be used sparingly, if at all. Often these phrases are employed when it is difficult to create a plausible distracter. If learners know two out of three responses are either correct or incorrect then they are able to select "all" or "none of the above."

13. The position of the correct responses should be randomized.

14. The responses should be placed one below the other instead of side by side. The alternatives should not be separated from the stem by paging.

15. The instructor should try to think like a learner when designing plausible distracters and base the responses on the learners' perceptions. This is especially useful in calculation problems where the distracters can be common math errors.

16. Each test item should be independent of the other items. It should not be necessary to answer one item in order to be able to answer a subsequent item because this unnecessarily penalizes the learner. It is, however, appropriate to have several test items related to one set of data or a case study as long as the items can be answered independently.

17. If the item is an opinion or guideline the source should be identified in the stem. For example:

According to the Food and Drug Administration, polyspecific antiglobulin reagents must contain
A. Anti-IgG, Anti-IgM, and Anti-IgA
B. Anti-IgG and Anti-C4
* C. Anti-IgG and Anti-C3d
D. Anti-C4 and Anti-C3d

18. All responses should be expressed in the same units.

19. When the responses are number values, they should be arranged logically in increasing or decreasing order.

20. Multiple choice items can be designed to have multiple answers. When this occurs, the format for the answers should be consistent throughout the test, so that students do not spend valuable time ascertaining the answering scheme. There are many formats for multiple answer questions, an example of which appears below.

A. If alternatives 1, 2, and 3 are correct
B. If alternatives 1 and 3 are correct
C. If alternatives 2 and 4 are correct
D. If only alternative 4 is correct
E. If all alternatives are correct

Example: In a family study the mother typed as Group O and the father typed as Group B. Which of the following genotypes could their children have?
1. BO
2. BB
3. OO
4. AB

Correct answer: B

True/False Items

True/false test items are used infrequently in clinical laboratory science because there are few absolutes in science. True/false items encourage guessing which diminishes the effectiveness of a test. Often true/false items can be transformed into multiple choice items. For example:

True or False item: Giardia lamblia is the most commonly diagnosed intestinal flagellate of humans. (T)

A multiple choice item on the same content: The most commonly diagnosed intestinal flagellate of humans is
A. Chilomastix mesnili
B. Trichomonas hominis
* C. Giardia lamblia
D. Dientamoeba fragilis

True/false items can be improved when the instructor uses the following guidelines: 8-10

• The statement should be as unambiguous as possible so that it is clearly true or false.

• The words "always," "never," "all," "none," or "only" should be avoided because they are associated with false statements. "Usually," "may," and "sometimes" should also be avoided since they are associated with true statements.

• To minimize the potential for guessing, the learners can be required to correct the false statements.

Matching

Matching items are often employed when testing a body of facts and are useful in correlating data. However, matching items tend to promote memorization and not understanding.

Suggestions for improvement: 8-10

• Clearly state in the directions whether or not the responses can be used more than once and where the responses should be placed.

• Arrange the columns in a logical order, for example, alphabetically or increasing numerically.

• Include several extra responses to minimize guessing.

• Keep the tests short and concise.

• Keep the matching items all on the same page.

Example:

Directions: Match the disease in Column A with the causative microorganism in Column B. Write the number from Column B next to the correct letter in Column A. You may use an answer from Column B only once.

Column A: Disease | Column B: Microorganism
____ A. Toxic Shock Syndrome | 1. Bacillus
____ B. Anthrax | 2. Bordetella
____ C. Dysentery | 3. Corynebacterium
____ D. Syphilis | 4. Haemophilus
____ E. Tuberculosis | 5. Klebsiella
____ F. Typhoid fever | 6. Mycobacterium
____ G. Whooping cough | 7. Salmonella
 | 8. Shigella
 | 9. Staphylococcus
 | 10. Treponema

Subjective Tests: Short Answer and Essay Items

Advantages:

• Easy to create

• Useful in assessing learners' understanding of the material, their ability to organize their thoughts and their ability to communicate at the higher cognitive levels.

Disadvantages:

• Time-consuming to evaluate.

• Essay questions are limited in their sampling power and may prove more difficult for the learner with poor writing skills. Short answer questions broaden the sampling opportunities.

Suggestions for improvement: 9-13

1. Decide on the scope of the question. An instructor may choose to restrict the question or leave it open-ended. An example of an open-ended essay test item would be:

"Design a clinical chemistry laboratory."

A test item such as this has no qualifying criteria and allows the learner to express creativity, organization, and communication skills. However, a question of this type is subject to interpretation and makes grading more difficult. It also may encourage rambling and excessive discussion on the part of a learner who feels that quantity is more important than quality. A restricted essay question assigns direction to the learner's response by stating criteria. An example of a restricted essay question would be:

"Compare a calibrator to a control with regard to composition and purpose in the clinical laboratory."

The restricted question form is easier to grade but if it is too specific it can limit the learner's creativity and organizational skills.

2. Place the points assigned to each question on the examination.
3. Set a reasonable time limit.
4. Do not allow learners a choice in item selection because this practice lowers test reliability.
5. Tests composed of short answer items should be designed to include an appropriate balance of lower and higher cognitive level questions.
6. To evaluate essay and short answer items with the minimum amount of subjectivity:

 Develop answers and criteria for each question prior to grading the test. Assign points for each criterion.

 Read all the learners' answers to one question at a time.

 Grade the test without knowing the learner's identity. (This may not be possible if the class size is small.) This will avoid the "halo effect" whereby the instructor looks for the superior answer from the "A" learner and the inferior answer from the "D" learner.

 If possible, have the test evaluated by more than one instructor.

TEST ANALYSIS

The analysis of test items and test scores can provide important information for instructors and learners. The information used in test analysis can be manually calculated or it can be obtained using computer programs. Test statistics are most reliable when there are a large number of test takers. Because most clinical laboratory programs have a small number of learners, the information from any one test administration should be interpreted with caution. Test analysis enables the instructor to:

 Compare the performance of a group of learners with the performance of learners from a previous test by calculating the test mean and range of scores.

 Determine the effectiveness and appropriateness of particular items on an examination. For example, an instructor may find that an item was answered incorrectly by the majority of learners. On closer inspection, this item may be too difficult, ambiguously worded, or incorrectly scored on the answer key.

 Identify test items that did or did not effectively discriminate between high achieving learners and low achieving learners.

Test analysis enables learners to understand how they performed in relation to others in the class, in which areas they achieved mastery, and which areas need further study.

Measures of Central Tendency and Dispersion

Central Tendency: Measurements of central tendency are used in analyzing test results to determine the grouping of the results and include:

Mode. This is the score or response that appears most frequently. The mode would be useful in determining the most frequent response.

Median. The median divides the distribution of test scores in half. The median is derived by arranging the test scores in increasing or decreasing order and selecting the score that is in the middle. If there is an even number of test scores, the median is the average of the two middle values. The median is a good measurement of central tendency if the test scores are skewed; that is, if there are a few very high or very low results. Also, if the number of test scores is small, the median may provide sufficient information for the instructor and the learners and there may be no need to calculate the mean.

Mean. The mean is the average score and it is determined by adding all the test scores together and dividing that number by the number of learners who took the test. Because the mean is based on the value of each score in the set, it is the most stable statistic and it is the most commonly used measure of central tendency. The mean is affected by extreme values, however, and in severely skewed distributions, it may be more appropriate to report the median.
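As a minimal sketch, the three measures above, together with the range and standard deviation discussed in the next paragraph, can be computed with Python's standard statistics module. The scores are invented for illustration and the function name summarize_scores is hypothetical. The sketch uses the population standard deviation (pstdev), which matches the "average squared deviation" description; some programs instead divide by one less than the number of scores (statistics.stdev).

```python
import statistics


def summarize_scores(scores):
    """Return central tendency and dispersion measures for a list of test scores."""
    return {
        "mode": statistics.multimode(scores),  # every score tied for most frequent
        "median": statistics.median(scores),   # averages the two middle scores when n is even
        "mean": statistics.mean(scores),
        "range": max(scores) - min(scores),
        # Population standard deviation: square root of the average
        # squared deviation of the scores around the mean.
        "std_dev": statistics.pstdev(scores),
    }


# Hypothetical scores for a class of eight learners, including one very low score.
scores = [45, 78, 82, 85, 85, 88, 90, 95]
summary = summarize_scores(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this invented data set the single low score pulls the mean (81) below the median (85), which illustrates why the median may be the better summary for a skewed distribution.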

Dispersion: Measurements of dispersion of test scores are useful in determining whether the scores are homogeneous or heterogeneous. A simple measure of dispersion is the range, which is the difference between the highest and lowest scores. The range is a limited measurement of dispersion because it is influenced by extreme values and it does not consider the majority of test scores. The range is typically reported with the median. The most popular measurement of dispersion is the standard deviation, which is calculated from the deviations of the scores around the mean. The standard deviation is calculated by taking the square root of the average squared deviation around the mean, and it is reported with the mean.

Item Analysis

Difficulty and discrimination indices are useful in determining the effectiveness of individual items on an examination. Often a review of these indices will identify items that should be omitted from the scoring for a current examination and should be revised for future use. A simple review of the distribution of responses to each distracter can also reveal important information for test scoring and item construction.

Difficulty Index.

The difficulty index is the percentage of test takers selecting the correct answer on a particular item (passing rate). This can be calculated by dividing the number of learners who correctly answered the item by the total number of learners. The difficulty index of a good test item is usually between 0.60 and 0.80 (60% and 80%); however, some variation in item difficulty is appropriate. In analyzing items, the instructor should look at items with very high or very low difficulty indices. These items may be too easy or too difficult for the learners. An item with a very low difficulty index may indicate a problem with the item construction or that the item was keyed incorrectly. Items may also be too difficult because they cover material that was not clearly identified in the objectives or because they test trivial material. An item with a high difficulty index (most learners pass the item) may be appropriate if the instructor wishes to ensure that all learners possess a basic competency or if an instructor wishes to provide some easy questions at the beginning of an examination to build learner confidence.
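The calculation can be sketched in a few lines of Python. The function names are hypothetical; the response counts are those of Item 1 in the examples later in this section, where D is the keyed answer. The sketch also lists distracters that no learner chose, since the distribution of responses is part of item analysis.

```python
def difficulty_index(response_counts, correct_option):
    """Proportion of learners who selected the correct option (passing rate)."""
    total = sum(response_counts.values())
    if total == 0:
        raise ValueError("no responses recorded for this item")
    return response_counts[correct_option] / total


def unused_distracters(response_counts, correct_option):
    """Incorrect options that no learner selected."""
    return [option for option, count in response_counts.items()
            if option != correct_option and count == 0]


# Response counts for Item 1 (D is the keyed answer).
item_1 = {"A": 3, "B": 0, "C": 5, "D": 17}
p = difficulty_index(item_1, "D")
print(f"difficulty index: {p:.2f}")
print("distracters never chosen:", unused_distracters(item_1, "D"))

# Flag items outside the 0.60 to 0.80 range for a closer look.
if not 0.60 <= p <= 0.80:
    print("review this item: unusually easy or unusually difficult")
```

For Item 1, 17 of 25 learners chose the keyed answer, giving the difficulty index of 0.68 reported in the example.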

Discrimination Index.

The discrimination index is a measurement of how well an item discriminates between the high achieving learners and low achieving learners. A good test item should be able to separate the learners who know the most about a subject from those who know the least. The discrimination index can be manually calculated by first selecting the learners who scored in the upper and lower 27 percent of the class on the examination. The discrimination index compares the number of learners who got the item correct in the upper and lower groups: 12

Discrimination index = (Number correct in upper 27% - Number correct in lower 27%) / (Number of learners in the upper or lower group)

In many computer generated item analysis programs, the biserial or point biserial correlation coefficient is used as a discrimination index. These statistics compare the score on an individual test item with the score on the overall test for all the examinees.

Evaluating the discrimination index:

 The discrimination index can range from -1.0 to +1.0.

 An item with a negative discrimination index should be revised or discarded because it indicates that the correct answer was chosen more often by the low achieving learners than by the high achieving learners.

 An item with a zero discrimination index indicates that the correct response was chosen equally by the high and low achieving learners.

 The ideal item would have a discrimination index greater than 0.4.

 Items with a discrimination index in the range 0.30 to 0.39 are considered good, items in the 0.20 to 0.29 range are marginal, and items with a discrimination index below 0.20 are considered poor. 13
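The manual calculation and the cut-offs listed above can be sketched as follows. This is an illustration under stated assumptions, not a standard implementation: the function names and the learner data are invented, ties in total score are broken arbitrarily by sort order, and the exact handling of boundary values (for example, treating 0.40 itself as ideal) is a choice made here. The point_biserial function shows the correlation-based alternative as an ordinary Pearson correlation between the item score (1 or 0) and the total test score.

```python
import math


def discrimination_index(results, fraction=0.27):
    """Upper-lower discrimination index for one item.

    results: list of (total_test_score, answered_item_correctly) pairs,
    one pair per learner.
    """
    if not results:
        raise ValueError("no learner results supplied")
    ranked = sorted(results, key=lambda pair: pair[0], reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    correct_upper = sum(1 for _, correct in upper if correct)
    correct_lower = sum(1 for _, correct in lower if correct)
    return (correct_upper - correct_lower) / group_size


def rate_item(index):
    """Label an item using the cut-offs listed above (boundary handling is an assumption)."""
    if index < 0:
        return "revise or discard"
    if index < 0.20:
        return "poor"
    if index < 0.30:
        return "marginal"
    if index < 0.40:
        return "good"
    return "ideal"


def point_biserial(results):
    """Pearson correlation between the item score (1/0) and the total test score."""
    item = [1.0 if correct else 0.0 for _, correct in results]
    totals = [float(score) for score, _ in results]
    mean_item = sum(item) / len(item)
    mean_total = sum(totals) / len(totals)
    covariance = sum((i - mean_item) * (t - mean_total) for i, t in zip(item, totals))
    spread_item = math.sqrt(sum((i - mean_item) ** 2 for i in item))
    spread_total = math.sqrt(sum((t - mean_total) ** 2 for t in totals))
    if spread_item == 0 or spread_total == 0:
        raise ValueError("correlation is undefined when all values in a list are equal")
    return covariance / (spread_item * spread_total)


# Hypothetical class of 15 learners: (total test score, item answered correctly).
results = [
    (95, True), (92, True), (90, True), (88, False), (85, True),
    (83, True), (80, False), (78, True), (75, True), (72, False),
    (70, True), (65, False), (60, True), (55, False), (50, False),
]
d = discrimination_index(results)
print(f"discrimination index: {d:.2f} ({rate_item(d)})")
print(f"point biserial: {point_biserial(results):.2f}")
```

With 15 learners the upper and lower groups each contain four learners; three in the upper group and one in the lower group answered correctly, so the index is (3 - 1) / 4 = 0.50.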

Relationship between the two indices. Difficulty and discrimination indices are closely related. The maximum discrimination index (1.0) is possible only when the difficulty index is 0.50 (50 percent). If the discrimination index were 1.0, this would mean that all the learners who performed the best on the examination chose the correct answer and none of the learners who performed poorly on the examination chose the correct answer. If most learners correctly answer an item (high difficulty index) or if most learners incorrectly answer an item (low difficulty index), the item will have a poor discrimination index. The following examples illustrate the relationship between difficulty and discrimination indices and the information that can be obtained from item analysis.

Examples: (* indicates the correct answer)

Item 1:

Responses             A    B    C    D*    Difficulty Index    Discrimination Index
Number of learners    3    0    5    17    0.68                0.30

Comment: From a statistical point of view, this is a good item. The difficulty index indicates that 68% of the learners selected the correct answer and the discrimination index indicates the learners who performed well on the overall exam selected the correct answer more often than the learners who performed poorly on the overall examination. It is also noteworthy that no learners selected B as the correct answer. The distracter for B may not be plausible, even to the learner who is performing poorly.

Item 2:

Responses             A    B*   C    D     Difficulty Index    Discrimination Index
Number of learners    7    8    6    4     0.32                0.05

Comment: This is a difficult item with only 32% of the learners selecting the correct answer. The discrimination index is very useful in this case because it indicates that the item is not separating the learners who performed well from the learners who did not perform well on the overall test. This item may be poorly written, it may test material that the learners have not studied, or it may test trivial material. The spread of responses indicates that some learners are selecting each of the responses and the learners may have been guessing on this item.

Item 3:

Responses             A*   B    C    D     Difficulty Index    Discrimination Index
Number of learners    8    4    10   3     0.32                0.48

Comment: This is also a difficult item; however, the discrimination index for this item is much better than the discrimination index in item 2. Although only 32% of the learners correctly answered this item, the item discriminated between those who performed well and those who performed poorly. This item may be a higher cognitive level item that separated those learners with a thorough understanding of the material from those learners who are at a lower level of comprehension. Items 2 and 3 demonstrate the need to analyze both the difficulty index and discrimination index when evaluating test items.

Reliability and Validity

Reliability. The reliability of a test is related to the consistency of test scores and it is reported as a correlation coefficient. The reliability of standardized tests is usually over 0.90 (90%); however, the reliability of classroom examinations is typically lower, in the range of 0.60 to 0.80 (60 to 80%). 14 Reliability refers to the results, or test scores, rather than to the test itself. Reliability is affected by the:

 quality of the test items

 number of test items

 objectivity of the grading process

 physical and emotional state of the learners when taking the test. 15,16

Ideally, the reliability of test scores would be determined by comparing two sets of scores obtained under identical conditions. Because this is impossible to do, several other approaches to determining reliability have been developed. One method involves giving the same test to the same group of individuals twice. If the test scores are reliable, the scores for individuals taking the same test on two different administrations should be highly correlated. This "test-retest" approach to determining reliability is problematic because the conditions for taking the test are never exactly the same at different points in time. Other determinations of reliability calculate the consistency of scores based on a single test administration. An example of this type of reliability coefficient is the split-half method. Using this method, the scores on the even items of an examination are compared to the scores on the odd items of the examination. If the test is internally consistent, these scores should be highly correlated.

The reliability of a test can be improved by:

 Increasing the number of questions on an examination. In general, adding more items of the same quality as the items already on the examination will improve reliability. A longer test provides a better sample of the test taker’s knowledge and it lessens the influence of guessing.

 Providing good instructions on the test. This ensures that the learners’ responses reflect their best effort and are not the result of confusion over directions.

 Using consistent, objective criteria for grading. A reliable test could be graded by a second instructor and the learner would get a similar score.

 Using items with moderate difficulty and a good discrimination index.

 Providing a test site that is comfortable and free of distractions.

Validity. The validity of a test refers to how well the test measures what it was designed to measure or, in other words, the extent to which the test serves its intended purpose. The two types of validity that are most applicable to clinical laboratory science educators are content validity and criterion validity.

Content validity refers to the extent to which the test is related to and samples the intended content or domain of tasks. Content validation usually occurs during test construction when the instructor creates a representative sample of test items that match the objectives for the instructional activity. Content validity is not measured statistically; rather, it is inferred from the match between the test, the content of the course, and the sample of items.

Criterion validity, or predictive validity, refers to the extent to which the examination predicts some specific outcome. For example, the results of a practical in a learner laboratory will have criterion validity if the learners' success on this examination predicts their success in a similar clinical laboratory situation. Similarly, the criterion validity of the scores on a practice certification examination could be assessed by their ability to predict success on an actual certification examination. A correlation coefficient could be used to report the predictive validity of a test. The reliability of test scores will affect the validity of those scores. If test scores are not reliable, it is difficult to use the test as a predictor of success on other outcome measures.
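Both the split-half reliability coefficient and a predictive validity coefficient come down to a correlation between two sets of scores for the same learners. The sketch below is one way to compute them, assuming a Pearson correlation is the coefficient wanted; the function names and all of the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores in a list are equal")
    return covariance / (spread_x * spread_y)


def split_half_reliability(item_scores):
    """Correlate each learner's odd-item total with the even-item total.

    item_scores: one list per learner, 1 for a correct item and 0 for an incorrect one.
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


# Hypothetical results: six learners on an eight-item test.
item_scores = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]
reliability = split_half_reliability(item_scores)
print(f"split-half reliability: {reliability:.2f}")

# Hypothetical practice-exam and certification-exam scores for the same six learners.
practice = [62, 70, 75, 81, 88, 93]
certification = [410, 455, 440, 500, 540, 560]
validity = pearson_r(practice, certification)
print(f"predictive validity: {validity:.2f}")
```

With so few learners and items, coefficients like these are unstable and should be interpreted with the same caution the text advises for small classes.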

RETURNING THE TEST

Tests should be graded as soon as possible so that learners receive prompt feedback on their performance. Prior to distributing the tests, the instructor could state the mean and standard deviation, or the median and the range. Because good test items are difficult to construct and may be used from year to year, many instructors do not allow learners to keep the tests. This practice is especially important in rotations in which learners take the same test at different times.

SUMMARY

When evaluating a test, the following questions need to be addressed by the instructor: 17

1. Was the test relevant and balanced? That is, did it correlate closely with the course objectives?
2. Was the test of an appropriate length for the time allowed?
3. Was the grading objective?
4. Did the test discriminate between the high and low achieving learners?
5. Was the test difficult enough to be challenging but not unfair?
6. Was the test reliable?
7. Was the test valid?

REFERENCES

1. Procedures for Examination and Certification. American Society for Clinical Pathology. http://www.ascp.org/PDF/BOC-PDFs/procedures/Examination-Procedures.pdf. Accessed June 17, 2014.
2. Doig K: Common formats of multiple choice questions testing analytical skills in clinical laboratory science. Clin Lab Sci 2000;13:40-46.
3. Doig K: Common formats of multiple choice questions testing application of knowledge in clinical laboratory science. Clin Lab Sci 2000;13:32-39.
4. Doig K and Beck S. CLT and CLS Job Responsibilities: Current Distinctions and Updates. Clin Lab Sci 2001;14:173-182.
5. Beck S and Laudicina R. Clinical Laboratory Scientists' Views of the Competencies Needed for Current Practice. Clin Lab Sci 1999;12:98-103.
6. Allen D and Tanner K: Rubrics: Tools for Making Learning Goals and Evaluation Criteria Explicit for Both Teachers and Learners. CBE Life Sciences Education. 5: 197-203, Fall 2006.
7. Using Rubrics. http://www.cte.cornell.edu/teaching-ideas/assessing-student-learning/using-rubrics.html Accessed June 2, 2014.
8. Rubrics. http://www.cte.cornell.edu/documents/Science%20Rubrics.pdf Accessed June 2, 2014.
9. Gronlund NE: Measurement and Evaluation in Teaching. New York, Macmillan, 1985, pp. 146-192.
10. Ebel RE and Frisbie DA: Essentials of Educational Measurement. Englewood Cliffs, NJ, Prentice-Hall, 1986, pp. 126-201.
11. Gronlund NE: Constructing Achievement Tests. Englewood Cliffs, NJ, Prentice-Hall, 1982, pp. 36-80.
12. Ebel R and Frisbie DA: Essentials of Educational Measurement. Englewood Cliffs, NJ, Prentice-Hall, 1986, pp. 126-201.
13. Ebel R, Frisbie DA: Essentials of Educational Measurement. Englewood Cliffs, New Jersey: Prentice-Hall, Inc. 1986. pp 226-227.
14. Ebel R, Frisbie DA: Essentials of Educational Measurement. Englewood Cliffs, New Jersey: Prentice-Hall, Inc. 1986. p 234.
15. Gronlund NE, Constructing Achievement Tests. 3rd. Ed. Englewood Cliffs, New Jersey: Prentice Hall Inc., 1982. pp 135.
16. Gronlund NE: Measurement and Evaluation in Teaching. New York, Macmillan, 1985, pp. 100-105.
17. Ebel R: Essentials of Educational Measurement. Englewood Cliffs, NJ, Prentice Hall, 1972, pp 359-382.
