Item Reliability

Introduction

Have you ever weighed yourself on a bathroom scale and then stepped off and on again to find that your weight registered a pound or two heavier or lighter? You know that your real weight could not have changed in a few seconds. The change was a characteristic of the scale: the scale has less-than-perfect reliability.

In psychological testing, there are two levels of reliability to consider: item reliability and test reliability. This week, you explore item reliability.

Objectives

Students will:

· Calculate coefficient alpha, true scores, standard errors of measurement, and standard errors of estimate

· Analyze reliability of test items

Readings

· Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall. Chapter 4, “Reliability.”

· Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1(2), 199–223. Retrieved from the Walden Library databases.

· Schriesheim, C. A., & Hill, K. D. (1981). Controlling acquiescence response bias by item reversals: The effect on questionnaire validity. Educational and Psychological Measurement, 41, 1101–1114. Copyright 1981 Sage Publications Inc. Journals. Used with permission from Sage Publications, Inc. via the Copyright Clearance Center.

· Traub, R. E., & Rowley, G. L. (1991). Understanding reliability. Educational Measurement: Issues and Practice, 10, 37–45. Copyright 1991 Blackwell Publishing Journals. Used with permission from Blackwell Publishing, Inc. via the Copyright Clearance Center.

· Wanous, J. P., & Hudy, M. J. (2001). Single-item reliability: A replication and extension. Organizational Research Methods, 4(4), 361–375. Retrieved from the Walden Library databases.

Media

· Laureate Education, Inc. (Executive Producer). (2010). Scatter plot interactive. Baltimore, MD: Author. Note: If you are unable to view the above media piece due to a visual impairment, please contact the Director of Disability Services at Walden University.

For this Knowledge Assessment, you perform test item analysis on the MoneyData dataset provided in the Week 2 Learning Resources.

The data file includes a six-item scale to measure financial risk-taking tendencies:

· R1: I'd rather run my own business than work for someone else

· R2: I don't mind risking large amounts of money if there is a good chance I can come out ahead

· R3: I'm willing to take real chances to get ahead financially

· R4: I get bored unless I'm taking some risks with my career

· R5: Being too conservative with your investments can cause financial problems

· R6: Running a business is something that I think of as interesting and exciting

QUESTION 1

Calculate Cronbach’s coefficient alpha estimate of reliability for these six items. Click ANALYZE > SCALE > RELIABILITY ANALYSIS. Move the six items into the “Items” box. Click “Statistics” and, under “Descriptives for,” select “Scale” and “Scale if item deleted.” Click “Continue.” Select “Alpha” for the model. Click “OK.” The alpha coefficient is:

a. .45
b. .57
c. .79
d. .84
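Outside SPSS, the same coefficient can be computed directly from the item responses using the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below is a minimal Python version, assuming complete data (no missing values) arranged one row per respondent with one column per item; the function name `cronbach_alpha` and the toy data are illustrative, not part of SPSS or any library.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix (no missing values).

    items[r][i] is respondent r's answer to item i.
    """
    k = len(items[0])  # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample (n - 1) variance of each item column.
    item_vars = [variance([row[i] for row in items]) for i in range(k)]
    # Sample variance of each respondent's summed scale score.
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: two items that do not covary give alpha = 0.
print(cronbach_alpha([[1, 1], [1, 3], [3, 1], [3, 3]]))
```

Because the same variance formula is applied to the items and to the total, the choice of n or n − 1 in the denominator cancels out of the ratio.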

QUESTION 2

Review the results you generated for the previous question. The scale standard deviation is:

a. 5.02
b. 4.02
c. 25.2
d. 17.1
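The scale standard deviation describes the summed score across all items, not any single item. A minimal sketch, assuming a respondents-by-items layout with complete data and the sample (n − 1) formula; `scale_statistics` is a hypothetical helper name and the three-respondent data are invented for illustration.

```python
from statistics import mean, stdev, variance


def scale_statistics(items: list[list[float]]) -> dict[str, float]:
    """Mean, variance, and SD of the summed scale score (sample formulas)."""
    totals = [sum(row) for row in items]  # one total score per respondent
    return {
        "mean": mean(totals),
        "variance": variance(totals),  # n - 1 in the denominator
        "sd": stdev(totals),
        "n_items": len(items[0]),
    }


# Hypothetical: three respondents answering three items.
stats = scale_statistics([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(stats["sd"])
```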

QUESTION 3

A scale’s standard error of measurement can be calculated from its standard deviation and reliability. Based on the coefficient alpha and standard deviation that you just calculated, what is this scale’s standard error of measurement? (Hint: see Anastasi & Urbina, 1997, p. 107).

a. 2.32
b. 3.28
c. 3.56
d. 4.56
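The standard formula is SEM = SD × √(1 − r_xx), where r_xx is the reliability coefficient (here, coefficient alpha). A minimal Python sketch; the SD of 15 and reliability of .89 in the example are hypothetical values, not results from MoneyData.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r_xx) for a scale with reliability r_xx."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical scale: SD = 15, reliability = .89.
print(round(standard_error_of_measurement(15, 0.89), 2))  # 4.97
```

Perfect reliability gives an SEM of zero, while zero reliability makes the SEM equal to the scale's own standard deviation.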

QUESTION 4

Jane obtained a score of 20 on the financial risk-taking scale. There is a 99% chance that her true score is between:

a. 6.3–33.7
b. 14.0–26.0
c. 11.1–23.1
d. 8.2–31.8
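A band for the true score is the observed score plus or minus z standard errors of measurement, where z is about 1.96 for 95% and about 2.58 for 99% under a normal error model. The sketch below assumes normally distributed errors and centers the band on the observed score; the score of 100 and SEM of 5 in the example are hypothetical.

```python
from statistics import NormalDist


def true_score_band(observed: float, sem: float,
                    confidence: float = 0.99) -> tuple[float, float]:
    """Two-sided band observed +/- z * SEM under a normal error model."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 2.576 for 99%
    return observed - z * sem, observed + z * sem


low, high = true_score_band(100, 5, confidence=0.95)
print(f"{low:.1f} - {high:.1f}")  # 90.2 - 109.8
```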

QUESTION 5

Which item would you eliminate to improve Cronbach’s coefficient alpha for the scale?

a. R1
b. R3
c. R5
d. R6
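The “Scale if item deleted” output recomputes alpha with each item left out in turn; an item whose removal raises alpha is a candidate for elimination. A minimal sketch of that procedure, assuming complete data; the helper names and the three-item toy data are hypothetical, and `cronbach_alpha` implements the standard formula so the block stands alone.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix (no missing values)."""
    k = len(items[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in items]) for i in range(k)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(items: list[list[float]],
                          names: list[str]) -> dict[str, float]:
    """Recompute alpha once per item, leaving that item out each time."""
    result = {}
    for drop, name in enumerate(names):
        reduced = [[v for i, v in enumerate(row) if i != drop] for row in items]
        result[name] = cronbach_alpha(reduced)
    return result


# Hypothetical items: A and B agree perfectly, C is unrelated to both.
data = [[1, 1, 1], [1, 1, 3], [3, 3, 1], [3, 3, 3]]
print(cronbach_alpha(data))  # full-scale alpha
print(alpha_if_item_deleted(data, ["A", "B", "C"]))
```

Dropping an item can raise alpha while narrowing what the scale covers, so the numeric result is one input to the decision rather than the whole of it.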

Discussion

Test Item Reliability

Test item reliability indicates how consistent the results produced by the items on a test are. Consistency can refer to an item’s stability over time or to the items’ consistency with one another. If an item is unreliable, statistical relationships involving it will appear weaker than they really are, and researchers may draw inappropriate conclusions about the relationships between variables.
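The weakening of relationships described above has a classical formula: if two measures have reliabilities r_xx and r_yy, the observed correlation is expected to equal the true-score correlation multiplied by √(r_xx · r_yy), and dividing by that factor gives Spearman's correction for attenuation. A minimal sketch; the true correlation of .50 and the reliabilities of .70 and .80 are hypothetical.

```python
import math


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation expected when both measures contain error."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_r(observed_r: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction: estimate the true-score correlation."""
    return observed_r / math.sqrt(rel_x * rel_y)


observed = attenuated_r(0.50, 0.70, 0.80)
print(round(observed, 2))  # 0.37
print(round(disattenuated_r(observed, 0.70, 0.80), 2))  # 0.5
```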

Reliability is the extent to which an observed score (the true score plus or minus error) accurately reflects the true score. Returning to the example in this week’s Introduction, if your true weight were 150 pounds and you stepped on the scale hundreds of times, it would sometimes show 149, sometimes 151, and sometimes 150. If you averaged all of those readings, you would come close to your true weight. If you looked at how much the readings varied, you would have a good measure of the scale’s error. The situation is similar with a psychological test: a score on an IQ test is an estimate of the theoretical “true” IQ, but that observed score also includes error.
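The bathroom-scale example can be simulated directly. This sketch assumes each reading equals the true weight plus normally distributed error with a standard deviation of 1 pound (an illustrative choice, not a property of any real scale); averaging many readings recovers the true weight, and their spread estimates the scale's error.

```python
import random
from statistics import mean, stdev


def simulate_weighings(true_weight: float = 150.0, error_sd: float = 1.0,
                       n: int = 500, seed: int = 1) -> tuple[float, float]:
    """Return the mean and SD of n simulated readings: true weight + random error."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    readings = [true_weight + rng.gauss(0.0, error_sd) for _ in range(n)]
    return mean(readings), stdev(readings)


avg, spread = simulate_weighings()
print(f"average reading {avg:.2f}, spread {spread:.2f}")
```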

Researchers and test developers express a test’s reliability as a reliability coefficient, generally a positive correlation coefficient less than 1.00. (A coefficient of 1.00 would indicate perfect reliability, which is unattainable in practice because all measurement contains some error.) Reliability coefficients for psychological tests or test items are generally considered acceptable at .70 or higher. If you know a test’s reliability and standard deviation, you can calculate its margin of error, a “plus or minus” band that indicates an interval likely to contain the true score.

A test item is reliable if its variations over time primarily reflect variations in what you are measuring. An unreliable item shows changes over periods in which the construct being measured could not have changed or is not expected to change. For instance, personality is believed to be relatively stable over years or decades. An item stating “I feel happier than usual today” would be unreliable for measuring personality, because it taps mood, which can change from day to day, much more quickly than personality.

For this week’s Discussion, think of a specific testing scenario. Then consider a reliable test item for that testing scenario and an unreliable item for that same testing scenario. Consider how you might know if these items are reliable or unreliable.

With these thoughts in mind:

Post a brief description of a specific testing scenario. Then describe one reliable test item and one unreliable test item for that testing scenario. Finally, explain what determines whether an item is reliable or unreliable within the scenario you presented.

Be sure to support your postings and responses with specific references to the Learning Resources.