Malec, T. & Newman, M. (2013). Research methods: Building a knowledge base. San Diego, CA: Bridgepoint Education, Inc. ISBN-13: 9781621785743, ISBN-10: 1621785742. Chapter 4: Survey Research-Describing and Predicting Behavior

Chapter 4

Survey Research—Describing and Predicting Behavior

Chapter Contents

· Introduction to Survey Research

· Designing Questionnaires

· Sampling From the Population

· Analyzing Survey Data

· Ethical Issues in Survey Research

In a highly influential book published in the 1960s, the sociologist Erving Goffman (1963) defined stigma as an unusual characteristic that triggers a negative evaluation. In his words, "The stigmatized person is one who is reduced in our minds from a whole and usual person to a tainted, discounted one" (1963, p. 3). People's beliefs about stigmatized characteristics exist largely in the eye of the beholder but have substantial influence on social interactions with the stigmatized (see Snyder, Tanke, & Berscheid, 1977). A large research tradition in psychology has been devoted to understanding both the origins of stigma and the consequences of being stigmatized. According to Goffman and others, the characteristics associated with the greatest degree of stigma have three features in common: They are highly visible, they are perceived as controllable, and they are misunderstood by the public.

Recently, researchers have taken considerable interest in people's attitudes toward members of the gay and lesbian community. Although these attitudes have become more positive over time, this group still encounters harassment and other forms of discrimination on a regular basis (see National Gay Task Force, 1984). One of the leading experts on this subject is Gregory Herek, professor of psychology at the University of California at Davis (http://psychology.ucdavis.edu/herek/). In a 1988 article, Herek reported a survey of heterosexuals' attitudes toward both lesbians and gay men, with the goal of understanding the predictors of negative attitudes. Herek approached this research question by constructing a scale to measure attitudes toward these groups. In three studies, participants were asked to complete this attitude measure, along with other existing scales assessing attitudes about gender roles, religion, and traditional ideologies.

Herek's (1988) research revealed that, as hypothesized, heterosexual males tended to hold more negative attitudes about gay men and lesbians than heterosexual females did. However, the same psychological mechanisms seemed to explain the prejudice in both genders. That is, negative attitudes were associated with increased religiosity, more traditional beliefs about family and gender, and fewer experiences actually interacting with gay men and lesbians. These associations meant that Herek could predict people's attitudes toward gay men and lesbians based on knowing their views about family, gender, and religion, as well as their past interactions with the stigmatized group. Herek's primary contribution to the literature in this paper was the insight that reducing stigma toward gay men and lesbians "may require confronting deeply held, socially reinforced values" (1988, p. 473). This insight was possible only because people were asked to report these values directly.

4.1 Introduction to Survey Research

Whether you are aware of it or not, you have been encountering survey research for most of your life. Every time your telephone rings at dinnertime and the person on the other end of the line insists on knowing your household income and favorite brand of laundry detergent, he or she is helping to conduct survey research. When news programs try to predict the winner of an election two weeks early, these reports are based on surveys of eligible voters. In both cases, the researcher is trying to make predictions about the products people buy or the candidates they will elect based on people's reports of their own attitudes, feelings, and behaviors.

Surveys can be used in a variety of contexts and are most appropriate for questions that involve people describing their attitudes, their behaviors, or a combination of the two. For example, if you wanted to examine the predictors of attitudes toward the death penalty, you could ask people their opinions on this topic and also ask them about their political party affiliation. Based on these responses, you could test whether political affiliation predicted attitudes toward the death penalty. Or, imagine you wanted to know whether students who spent more time studying were more likely to do well on their exams. This question could be answered using a survey that asked students about their study habits and then tracked their exam grades. We will return to this example near the end of the chapter as we discuss the process of analyzing survey data to test our hypotheses about predictions.

The common thread running through these two examples is that they require people to report either their thoughts (e.g., opinions about the death penalty) or their behaviors (e.g., the hours they spend studying). Thus, in deciding whether a survey is the best fit for your research question, the key is to consider whether people will be both able and willing to report these things accurately. We will expand on both of these issues in the next section.

In this chapter, we continue our journey along the continuum of control, moving on to survey research, in which the primary goal is either describing or predicting attitudes and behavior. For our purposes, survey research refers to any method that relies on people's reports of their own attitudes, feelings, and behaviors. So, for example, in Herek's (1988) study, the participants reported their attitudes toward lesbians and gay men, rather than these attitudes being somehow directly observed by the researchers. Compared with the qualitative and descriptive designs for observing behavior we discussed in Chapter 3, survey research tends to yield more control over both data collection and question content. Thus, survey research falls somewhere between quantitative descriptive research (Chapter 3) and the explanatory research involved in experimental designs (Chapter 5). This chapter provides an overview of survey research from conceptualization through analysis. We will cover the types of research questions that are best suited to survey research and provide an overview of the decisions to consider in designing and conducting a survey study. We will then cover the process of data collection, with a focus on selecting the people who will complete your survey. Finally, we will cover the three most common approaches for analyzing survey data, bringing us full circle back to our research questions.

Distinguishing Features of Surveys

Survey research designs have three distinguishing features that set them apart from other designs. First, all survey research relies on either written or verbal self-reports of people's attitudes, feelings, and behaviors. This means that researchers will ask participants a series of questions and record their responses. This approach has several advantages, including being relatively straightforward and allowing access to psychological processes (e.g., "Why do you support candidate X?"). However, researchers are also cautious in their interpretation of self-reported data because participants' responses can reflect a combination of their true attitude and their concern over how this attitude will be perceived. Scientists refer to this as social desirability, which means that people may be reluctant to report unpopular attitudes. So if you were to ask people their attitudes about different racial groups, their answers might reflect both their true attitude and their desire not to appear racist. We will return to the issue of social desirability in Section 4.2, where we discuss some tricks for designing questions that can help to sidestep these concerns and capture respondents' true attitudes.

The second distinguishing feature of survey research is that it has the ability to access internal states that cannot be measured through direct observation. In our discussion of observational designs in Chapter 3, we learned that one of the limitations of these designs was a lack of insight into why people do what they do. Survey research is able to address this limitation directly: By asking people what they think, how they feel, and why they behave in certain ways, researchers come closer to capturing the underlying psychological processes.

However, people's reports of their internal states should be taken with a grain of salt, for two reasons. First, as mentioned, these reports may be biased by social desirability concerns, particularly when unpopular attitudes are involved. Second, there is a large literature in social psychology suggesting that people may not be very accurate at understanding the true reasons for their behavior. In a highly cited review paper, psychologists Richard Nisbett and Tim Wilson (1977) argued that we make poor guesses about why we do things, and those guesses are based more on our assumptions than on any real introspection. Thus, survey questions can provide access to internal states, but these should always be interpreted with caution.

The third distinguishing feature of survey research is more practical: Surveys allow us to collect large amounts of data with relatively little effort and few resources. However, their actual efficiency depends on the decisions made during the design process. In reality, efficiency is often in a delicate balance with the accuracy and completeness of the data.

Broadly speaking, survey research can be conducted using either verbal or written self-reports (or a combination of the two). Before we dive into the details of writing and formatting a survey, it is important to understand the pros and cons of administering your survey as an interview (i.e., an oral survey) or a questionnaire (i.e., a written survey).

Interviews

An interview involves an oral question-and-answer exchange between the researcher and the participant. This exchange can take place either face-to-face or over the phone. So our telemarketer example from earlier in the chapter represents an interview because the questions are asked orally, via phone. Likewise, if you are approached in a shopping mall and asked to answer questions about your favorite products, you are experiencing a survey in interview form because the questions are administered out loud. And, if you have ever taken part in a focus group, in which a group of people gives their reactions to a new product, the researchers are essentially conducting an interview with the group. (For a more in-depth discussion of focus groups and other interview techniques, see Chapter 3, Section 3.2, Qualitative Research Interviews.)

Interview Schedules

Regardless of how the interview is administered, the interviewer (i.e., the researcher) has a predetermined plan, or script, for how the interview should go. This plan, or script, for the progress of the interview is known as an interview schedule. When conducting an interview (including those telemarketing calls), the researcher/interviewer has a detailed plan for the order of questions to be asked, along with follow-up questions depending on the participant's responses.

Broadly speaking, there are two types of interview schedules. A linear (also called "structured") schedule will ask the same questions in the same order for all participants. In contrast, a branching schedule unfolds more like a flowchart, with the next question dependent on participants' answers. A branching schedule is typically used in cases with follow-up questions that make sense only for some of the participants. For example, you might first ask people whether they have children; if they answer "yes," you could then follow up by asking how many.
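To make the flowchart idea concrete, here is a minimal Python sketch of the children example. It is illustrative only: the question ids, the added age question, and the dictionary-of-rules representation are our own assumptions, not a standard format used by any interview software. Each question carries a rule that picks the next question from the participant's answer, and a linear schedule is simply the special case in which every rule ignores the answer.

```python
# Hypothetical schedules for illustration. Each entry maps a question id to
# (question text, rule that picks the next question id from the answer).
# A rule that returns None ends the interview.
BRANCHING = {
    "children": ("Do you have children?",
                 lambda answer: "how_many" if answer == "yes" else "age"),
    "how_many": ("How many children do you have?", lambda answer: "age"),
    "age": ("What is your age?", lambda answer: None),
}

# A linear schedule: every rule ignores the answer, so everyone gets the same path.
LINEAR = {
    "children": ("Do you have children?", lambda answer: "age"),
    "age": ("What is your age?", lambda answer: None),
}


def run_schedule(schedule, start, answers):
    """Walk a schedule like a flowchart and return the (question, answer) pairs asked.

    `answers` stands in for a live participant: it maps question ids to replies.
    """
    asked = []
    question_id = start
    while question_id is not None:
        text, next_rule = schedule[question_id]
        answer = answers[question_id]
        asked.append((text, answer))
        question_id = next_rule(answer)
    return asked


parent = run_schedule(BRANCHING, "children", {"children": "yes", "how_many": "2", "age": "41"})
non_parent = run_schedule(BRANCHING, "children", {"children": "no", "age": "23"})

for pair in non_parent:
    print(pair)
```

In this sketch the parent is asked three questions, while the non-parent skips the follow-up about how many children and is asked only two. The branching rules are also where the researcher's assumptions about relationships between variables get written into the schedule.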

One danger in using a branching schedule is that it is based partly on your assumptions about the relationships between variables. Granted, it is fairly uncontroversial to ask only people with children to indicate how many children they have. But imagine the following scenario in which you first ask participants for their household income, and then ask about their political donations:

· "How much money do you make? $18,000? Okay, how likely are you to donate money to the Democratic Party?"

· "How much money do you make? $250,000? Okay, how likely are you to donate money to the Republican Party?"

The assumption implicit in the way these questions branch is that wealthier people are more likely to be Republicans and less wealthy people to be Democrats. This might be supported by the data or it might not. But by planning the follow-up questions in this way, you are unable to capture cases that do not fit your stereotypes (i.e., the wealthy Democrats and the poor Republicans). The lesson here is to be careful about letting your biases shape the data collection process, as this can create invalid or inaccurate findings.

Advantages and Disadvantages of Interviews

Spoken interviews have a number of advantages over written surveys. For one, people are often more motivated to talk than they are to write. Let's say that an undergraduate research assistant is dispatched to a local shopping mall to interview people about their experiences in romantic relationships. The researcher may have no trouble at all recruiting participants, many of whom will be eager to divulge the personal details about their recent relationships. But for better or for worse, the same experiences would be more difficult to capture with a written questionnaire. Related to this observation, people's oral responses are typically richer and more detailed than their written responses. Think of the difference between asking someone to "Describe your views on gun control" versus "Indicate on a scale of 1 to 7 the degree to which you support gun control." The former is more likely to capture the richness and subtlety involved in people's attitudes about guns.

[Photo: A woman taking notes while listening to a man talk. Conducting interviews may allow a researcher to gather more detailed and richer responses.]

On a practical note, using an interview format also allows you to ensure that respondents understand the questions. If written questionnaire items are poorly worded, people are forced to guess at your meaning, and these guesses introduce a big source of error variance (variance from random sources that are irrelevant to the trait or ability the questionnaire is purporting to measure). But if an interview question is poorly asked, people find it much easier to ask the interviewer to clarify. Finally, using an interview format allows you to reach a broader cross-section of people and to include those who are unable to read and write, or perhaps unable to read and write the language of your survey.

Interviews also have three clear disadvantages compared with written surveys. First, interviews are more costly in terms of both time and money. It certainly used more of my time to go to a shopping mall than it would have taken to mail out packets of surveys (but no more money, since these research assistant gigs tend to be unpaid!). Second, the interview format offers many opportunities for the interviewer's personal biases to creep into the interview. These biases are unlikely to be deliberate, but participants can often pick up on body language and subtle facial expressions when the interviewer disagrees with their answers. These cues may lead them to shape their responses in order to make the interviewer happier (the influence of social desirability again). Third, interviews can be difficult to score and interpret, especially with open-ended questions. Although administering them may be easy, scoring them is relatively more complicated and often involves subjectivity or bias in the interpretation. Because the researcher often has to make judgments based on personal beliefs about the quality of the response, multiple raters are generally used to score the responses in order to minimize bias.

The best way to understand the pros and cons of interviewing is to recognize that both are a consequence of personal interaction. The interaction between interviewer and interviewee allows for richer responses but also creates the potential for these responses to be biased. As a researcher, you have to weigh these pros and cons and decide which method is the best fit for your survey. In the next section, we turn our attention to the process of administering surveys in writing.

Questionnaires

A questionnaire is a survey that involves a written question-and-answer exchange between the researcher and the participant. The questionnaire can use an open-ended format (i.e., the participant writes in his or her answer) or a forced-choice response format (i.e., the participant selects from a set of responses, as with multiple-choice questions, rating scales, or true/false questions); both formats are discussed later in this chapter. The exchange is a bit different from what we saw with interview formats. In written surveys, the questions are designed ahead of time and then given to participants, who write their responses and return the questionnaire to the researcher. In the next section, we will discuss details for designing these questions. But before we get there, let's take a quick look at the process of administering written surveys.

Distribution Methods

Questionnaires can be distributed in three primary ways, each with its own pattern of advantages and disadvantages.

Distributing by Mail

Until recently, one common way to distribute surveys was to send paper copies through the mail to a group of participants (see Section 4.3, Sampling From the Population, for more discussion on how this group is selected). Mailing surveys is relatively inexpensive and relatively easy to do, but it is unfortunately one of the worst methods when it comes to response rates. People tend to ignore questionnaires they receive in the mail, dismissing them as one more piece of junk. There are a few tricks available to researchers to increase response rates, including providing incentives, making the survey interesting, and making it as easy as possible to return the results (e.g., with a postage-paid envelope). However, even using all of these tricks, researchers consider themselves extremely lucky to get a 30% response rate from a mail survey. That is, if you mail 1,000 surveys, you will be doing well to receive 300 back. Because of this low return on investment, researchers have begun using other methods for their written surveys.
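The arithmetic in this example is easy to sketch in Python, and running it in reverse can help with planning: if you expect a given response rate, you can estimate how many surveys to mail to get a target number back. The helper names below are hypothetical, and the 30% figure is the rough, optimistic benchmark from the text rather than a rate any particular mailing is guaranteed to reach.

```python
def response_rate(returned, distributed):
    """Proportion of distributed surveys that came back completed."""
    if distributed <= 0:
        raise ValueError("distributed must be a positive count")
    return returned / distributed


def surveys_to_send(target_returns, expected_rate_percent):
    """Surveys to mail so that, at the expected rate, about `target_returns` come back.

    The rate is a whole-number percentage (e.g., 30) so the arithmetic stays in
    integers; -(-a // b) is ceiling division, rounding up to a whole survey.
    """
    if not 0 < expected_rate_percent <= 100:
        raise ValueError("expected_rate_percent must be in (0, 100]")
    return -(-target_returns * 100 // expected_rate_percent)


print(response_rate(300, 1000))   # 0.3
print(surveys_to_send(300, 30))   # 1000
print(surveys_to_send(300, 15))   # 2000
```

Halving the expected response rate doubles the number of surveys you would need to mail, which is one way to see why low-return methods get expensive quickly.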

Distributing in Person

Another option is to distribute a written survey in person, simply handing out copies and asking participants to fill them out on the spot. This method is certainly more time-consuming, as a researcher has to be stationed for long periods of time in order to collect data. In addition, people may be less likely to answer the questions honestly because the presence of a researcher makes them worry about social desirability. Last, the sample for this method is limited to people who are in the physical area at the time that questionnaires are being handed out. As we will discuss later, this might lead to problems in the composition of the sample. On the plus side, however, this method tends to result in higher compliance rates because it is harder to say no to someone face-to-face than it is to ignore a piece of mail.

Distributing Online

Over the past 20 years, Internet, or Web-based, surveys have become increasingly common. In Web-based survey research, the questionnaire is designed and posted on a Web page, to which participants are directed in order to complete the questionnaire. The advantages of online distribution are clear: This method is easiest for both researchers and participants and may give people a greater sense of anonymity, thereby encouraging more honest responses. In addition, response times are faster, and the data are easier to analyze because they are already in digital format. The disadvantages include the underrepresentation of specific groups that do not have access to the Internet, little to no researcher control over sample selection, and responses only from those who are interested in the topic (so-called self-selection bias). All these limitations could raise questions about the validity and reliability of the data collected. In addition, several ethical issues might arise regarding informed consent and the privacy of participants. So when considering conducting Web-based surveys, researchers should evaluate all the advantages and disadvantages, as well as any ethical or legal implications.

For readers interested in more information on designing and conducting Internet research, Sam Gosling and John Johnson's 2010 book Advanced Methods for Conducting Online Behavioral Research is an excellent resource. In addition, several groups of psychological researchers have been attempting to understand the psychology of Internet users.

Advantages and Disadvantages of Questionnaires

Just as with interview methods, written questionnaires have their own set of advantages and disadvantages. Written surveys allow researchers to collect large amounts of data with little cost or effort, and they can offer a greater degree of anonymity than interviews. Anonymity can be a particular advantage in dealing with sensitive or potentially embarrassing topics. That is, people may be more willing to answer a questionnaire about their alcohol use or their sexual history than they would be to discuss these things face-to-face with an interviewer. On the downside, written surveys miss out on the advantages of interviews because no one is available to clarify confusing questions or to gather more information as needed. Fortunately, there is one relatively easy way to minimize this problem: Write questions (and response choices, if using multiple-choice or forced-choice formats) that are as clear as possible. In the next section, we turn our attention to the process of designing questionnaires.

4.2 Designing Questionnaires

One of the most important elements when conducting survey research is deciding how to construct and assemble the questionnaire items. In some cases, you will be able to use questionnaires that other researchers have developed in order to answer your research questions. For example, many psychology researchers use standard scales that measure behavior or personality traits, such as self-esteem, prejudice, depression, or stress levels. The advantage of these ready-made measures is that other people have already gone to the trouble of making sure they are valid and reliable. So if you are interested in the relationship between stress and depression, you could distribute the Perceived Stress Scale (Cohen, Kamarck, & Mermelstein, 1983) and the Beck Depression Inventory (Beck, Steer, Ball, & Ranieri, 1996) to a group of participants and move on to the fun part of data analyses. For further discussion, see Chapter 2, Section 2.2, Reliability and Validity.

However, in many cases there is no perfect measure for your research question, either because no one has studied the topic before or because the current measures do not accurately assess the construct you are investigating. When this happens, you will need to go through the process of designing your own questionnaire. In this section, we discuss strategies for writing questions and choosing the most appropriate response format.

Five Rules for Designing Better Questionnaires

Each of the following rules is designed to make your questions as clear and easy to understand as possible in order to minimize the potential for error variance. We discuss each one and illustrate it with contrasting pairs of items, consisting of "bad" items that do not follow the rule and "better" items that do.

1. Use simple language. One of the simplest and most important rules to keep in mind is that people have to be able to understand your questions. This means you should avoid jargon and specialized language whenever possible.

BAD: "Have you ever had an STD?"
BETTER: "Have you ever had a sexually transmitted disease?"

BAD: "What is your opinion of the S-CHIP?"
BETTER: "What is your opinion of the State Children's Health Insurance Program?"

[Photo: Close-up of a simple questionnaire. Simple language is one characteristic of an effective questionnaire.]

It is also a good idea to simplify the language as much as possible so that people spend time answering the question rather than trying to decode your meaning. For example, words like assist and consider can be replaced with words like help and think. This may seem odd, or perhaps even condescending to your participants, but it is always better to err on the side of simplicity. Remember, if people are forced to guess at the meaning of your questions, these guesses add error variance to their answers.

In addition, when developing and administering surveys to various populations, it is important to remember to use language and examples that are not culturally or linguistically biased. For example, participants from various cultures may not be familiar with questions involving the historical development of the United States or the nature of gender roles in the United States. Thus, with the changing U.S. population and the different languages being spoken in the home, it is important to use language that does not discriminate against particular populations. Also, it is advisable to avoid slang at all costs.

2. Be precise. Another way to ensure that people understand the question is to be as precise as possible in your wording. Questions that are ambiguous in their wording will introduce an extra source of error variance into your data because people may interpret these questions in varying ways.

BAD: "What kind of drugs do you take?" (Legal drugs? Illegal drugs? Now? In college?)
BETTER: "What kind of prescription drugs are you currently taking?"

BAD: "Do you like sports?" (Playing? Watching? Which sports?)
BETTER: "How much do you like watching basketball on television?"

3. Use neutral language. It is important that your questions be designed to measure your participants' attitudes, feelings, or behaviors rather than to manipulate their responses. That is, you should avoid leading questions, which are written in such a way that they suggest an answer.

BAD: "Do you beat your children?" (Clearly, beating isn't good; who would say yes?)
BETTER: "Is it acceptable to use physical forms of discipline?"

BAD: "Do you agree that the president is an idiot?" (Hmmm . . . I wonder what the researcher thinks. . . .)
BETTER: "How would you rate the president's job performance?"

This guideline can also be used to sidestep social desirability concerns. If you suspect that people may be reluctant to report holding an attitude such as using corporal punishment with their children, it helps to phrase the question in a nonthreatening way ("using physical forms of discipline" versus "beating your children"). Many current measures of prejudice adopt this technique. For example, McConahay's Modern Racism Scale contains items such as "Discrimination against Blacks is no longer a problem in the United States" (McConahay, 1986). People who hold prejudicial attitudes are more likely to agree with statements like this one than with more blunt ones like "I hate people from Group X."

4. Ask one question at a time. One remarkably common error that people make in designing questions is to include a double-barreled question, which fires off more than one question at a time. When you fill out a new patient questionnaire at your doctor's office, these forms often ask whether you suffer from "headaches and nausea." What if you suffer from only one of these? Or what if you have a lot of nausea and only an occasional headache? The better approach is to ask about each of these symptoms separately.

BAD: "Do you suffer from pain and numbness?"
BETTER: "How often do you suffer from pain?" and "How often do you suffer from numbness?"

BAD: "Do you like watching football and boxing?"
BETTER: "How much do you enjoy watching professional football on TV?" and "How much do you enjoy watching boxing on TV?"

5. Avoid negations. One final and simple way to clarify your questions is to avoid questions with negative statements because these can often be difficult to understand. In the following examples, the first is admittedly a little silly, but the second comes from a real survey of voter opinions.

BAD: "Do you never not cheat on your exams?" (Wait, what? Do I cheat? Do I not cheat? What is this asking?)
BETTER: "Have you ever cheated on an exam?"

BAD: "Are you against rejecting the ban on pesticides?" (Wait, so am I for the ban? Against the ban? What is this asking?)
BETTER: "Do you support the current ban on pesticides?"

Participant Response Options

In this section, we turn our attention to the issue of deciding how participants should respond to your questions. The decisions you make at this stage affect the type of data you ultimately collect, so it is important to choose carefully. We also review the primary decisions you will need to make about response options, as well as the pros and cons of each one.

One of the first choices you have to make is whether to collect open-ended or fixed-format responses. As the names imply, fixed-format responses require participants to choose from a list of options (e.g., "Pick your favorite color"), whereas open-ended responses ask participants to provide unstructured responses to a question or statement (e.g., "How do you feel about legalizing marijuana?"). Open-ended responses tend to be richer and more flexible but harder to translate into quantifiable data, a trade-off analogous to the one we discussed in comparing written versus oral survey methods. To put it another way, some concepts are difficult to reduce to a seven-point fixed-format scale, but these scales are easier to analyze than a paragraph of free-flowing text.

Another reason to think carefully about this decision is that fixed-format responses will, by definition, restrict people's options in answering the question. In some cases, these restrictions can even act as leading questions. In a study of people's perceptions of history, James Pennebaker and his colleagues (2006) asked respondents to indicate the "most significant event over the last 50 years." When this was asked in an open-ended way (i.e., "list the most significant event"), 2% of participants listed the invention of computers. In another version of the survey, this question was asked in a fixed-format way (i.e., "choose the most significant event"). When asked to select from a list of four options (World War II, Invention of Computers, Tiananmen Square, or Man on the Moon), 30% chose the invention of computers! In exchange for having easily coded data, the researchers accidentally forced participants into a smaller number of options and ended up with a skewed sense of the importance of computers in people's perceptions of history.

Fixed-Format Options

Although fixed-format responses can sometimes constrain or skew participants' answers, the reality is that researchers tend to use them more often than not. This decision is largely practical; fixed-format responses allow for more efficient data collection from a much larger sample. (Imagine the chore of having to hand-code 2,000 essays!) But once you have decided on this option for your questionnaire, the decision process is far from over. In this section, we discuss three possibilities you can use to construct a fixed-format response scale.

True/False

One option is to ask questions using a true/false format, which asks participants to indicate whether they endorse a statement. For example:

"I attended church last Sunday."   True   False

"I am a U.S. citizen."   True   False

"I am in favor of abortion."   True   False

This last example may strike you as odd, and in fact it illustrates an important point about the use of true/false formats: They are best used for statements of fact rather than attitudes. It is relatively straightforward to indicate whether you attended church or whether you are a U.S. citizen. However, people's attitudes toward abortion are often complicated: One can be "pro-choice" but still support some common-sense restrictions, or "pro-life" but support exceptions (e.g., in cases of rape). The point is that a true/false question cannot even come close to capturing this complexity. For survey items that involve simple statements of fact, though, the true/false format can be a good option.

Multiple Choice

A second option is to use a multiple-choice format, which asks participants to select from a set of predetermined responses.

"Which of the following is your favorite fast-food restaurant?"

a. McDonald's

b. Burger King

c. Wendy's

d. Taco Bell

"Who did you vote for in the 2008 presidential election?"

a. John McCain

b. Barack Obama

"How do you travel to work on most days? (Select all that apply.)"

a. drive alone

b. carpool

c. take public transportation

As you can see in these examples, multiple-choice questions give you quite a bit of freedom in both the content and the response scaling of your questions. You can ask participants either to select one answer or, as in the last example, to select all applicable answers. You can cover everything from preferences (e.g., favorite fast-food restaurant) to behaviors (e.g., how you travel to work).

In addition to assessing preferences or behaviors, as in the preceding examples, multiple-choice questions are often used to examine knowledge and abilities. For example, intelligence and achievement tests use multiple-choice questions to assess the cognitive and academic abilities of individuals. In these cases, there is only one correct response choice and three to four incorrect, or "distracter," response choices. Common guidance holds that there should be at least three but no more than four incorrect response choices; providing more than four can make the question too difficult. It is also important that the correct response choice not obviously differ from the incorrect choices. Thus, all response choices, both correct and incorrect, should be approximately the same length and be plausible. If the incorrect response choices are obviously wrong, the item becomes too easy, which undermines its validity. In addition, including obviously wrong answers allows individuals who do not know the answer to deduce the correct response by elimination.

Regardless of whether you are developing multiple-choice questions to assess behaviors, knowledge, or abilities, all questions should be clear, unambiguous, and brief so that they can be easily understood. In addition, as with all types of question formats, it is best to avoid negatively worded questions such as "Which of the following is not correct?" because they can confuse test-takers and make the question invalid (i.e., it no longer measures what it is supposed to measure). Finally, it is best practice to avoid response choices such as "All of the above" or "None of the above." Many test-takers have learned that such options are often the correct answer when they appear, so they can answer correctly without knowing the material.

You may have already spotted a downside to multiple-choice formats. Whenever you provide a set of responses, you are restricting participants to those choices. This is the problem that Pennebaker and colleagues encountered when asking people about the most significant events in recent history. In each of the preceding examples, the categories fail to capture all possible responses. What if your favorite restaurant is In-N-Out Burger? What if you voted for Ralph Nader? What if you telecommute or ride your bicycle to work? There are two relatively easy ways to avoid (or at least minimize) this problem. First, plan carefully when choosing the response options. During the design process, it helps to brainstorm with other people to ensure you are capturing the most likely responses. However, in many cases, it is almost impossible to provide every option that people might think of. The second solution is to provide an "other" response to your multiple-choice question. This allows people to write in an option that you neglected to include. For example, our last question about traveling to work could be rewritten as follows:

"How do you travel to work on most days? (Select all that apply.)"

a. drive alone

b. carpool

c. take public transportation

d. other (please specify):__________________

This way, people who telecommute, bicycle, or even ride their trained pony to work will have a way to respond rather than skipping the question. And if you start to notice a pattern in these write-in responses (e.g., 20% of people adding "bicycle"), then you will have gained valuable knowledge to improve the next iteration of the survey.
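To spot a pattern like the one just described, you can tally the write-in answers alongside the fixed options. The Python sketch below is illustrative only: the five responses are invented, and storing each write-in as an ("other", text) pair is an assumption about how the answers were entered, not a standard format.

```python
from collections import Counter

# Hypothetical answers to "How do you travel to work on most days? (Select all that apply.)"
# Each respondent's answer is a list; a write-in is stored as ("other", "text typed").
responses = [
    ["drive alone"],
    ["carpool", "take public transportation"],
    [("other", "Bicycle")],
    ["drive alone", ("other", "bicycle ")],
    [("other", "telecommute")],
]


def tally(responses):
    """Count fixed-option selections and normalized write-in answers."""
    options = Counter()
    write_ins = Counter()
    for answer in responses:
        for choice in answer:
            if isinstance(choice, tuple) and choice[0] == "other":
                options["other"] += 1
                # Normalize case and stray spaces so "Bicycle" and "bicycle " match
                write_ins[choice[1].strip().lower()] += 1
            else:
                options[choice] += 1
    return options, write_ins


options, write_ins = tally(responses)
# Share of all respondents who wrote in each answer
write_in_shares = {text: count / len(responses) for text, count in write_ins.items()}
print(options["other"], write_in_shares)
```

If a write-in such as "bicycle" keeps turning up at a noticeable rate, it is a candidate for its own response option in the next version of the survey.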

Rating Scales

Last, but certainly not least, another option is to use a rating scale format, which asks participants to respond on a scale that represents a continuum.

"Sometimes it is necessary to sacrifice liberty in the name of security."

1	2	3	4	5
strongly agree	agree	neither agree nor disagree	disagree	strongly disagree

"I would vote for a candidate who supported the death penalty."

1	2	3	4	5
always	often	about half of the time	seldom	never

"The political party in power right now has messed things up."

1	2	3	4	5
strongly agree	agree	neither agree nor disagree	disagree	strongly disagree

This format is well suited to capturing attitudes and opinions and, in fact, is one of the most common approaches to attitude research. Rating scales are easy to score, and they give participants some flexibility in indicating their agreement with or endorsement of the questions. As a researcher, you have two critical decisions to make about the construction of rating scale items. Both have implications for how you will analyze and interpret your results.

First, you'll need to decide on the anchors, or labels, for your response scale. Rating scales offer a good deal of flexibility in these anchors, as you can see in the preceding examples. You can frame questions in terms of "agreement" with a statement or "likelihood" of a behavior; alternatively, you can customize the anchors to match your question (e.g., "not at all necessary"). Scales that use anchors of strongly agree and strongly disagree are also referred to as Likert scales. At a fairly simple level, the choice of labels affects the interpretation of the results. For example, if you asked the "political party" question, you would have to be aware that the anchors were phrased in terms of agreement with the statement. In discussing these results, you would be able to discuss how much people agreed with the statement, on average, and whether agreement correlated with other factors. If this seems like an obvious point, you would be amazed at how often researchers (or the media) will take an item like this and spin the results to talk about the "likelihood of voting" for the party in power, confusing an attitude with a behavior! So, in short, make sure you are being honest when presenting and interpreting research data.
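Because the "political party" item runs from 1 = strongly agree to 5 = strongly disagree, a lower average means more agreement, which is easy to misreport. One common safeguard, shown here as a minimal Python sketch with invented ratings, is to reverse-code the responses so that higher numbers mean more agreement before summarizing them. The function name `agreement_score` is hypothetical.

```python
from statistics import mean

# Printed anchors for the "political party" item:
# 1 = strongly agree, 2 = agree, 3 = neither agree nor disagree,
# 4 = disagree, 5 = strongly disagree


def agreement_score(rating, points=5):
    """Reverse-code a rating so that higher numbers mean MORE agreement."""
    if not 1 <= rating <= points:
        raise ValueError(f"rating {rating} is outside the 1-{points} scale")
    return points + 1 - rating


ratings = [1, 2, 2, 4, 5, 2]  # hypothetical responses from six participants
raw_mean = mean(ratings)  # on the printed scale, LOWER means more agreement
agreement_mean = mean([agreement_score(r) for r in ratings])  # HIGHER means more agreement
print(f"raw mean {raw_mean:.2f}, agreement mean {agreement_mean:.2f}")
```

Both numbers describe the same answers. What matters is stating clearly which direction means agreement, and reporting the result as agreement with the statement rather than as a prediction of voting behavior.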

At a more conceptual level, you need to decide whether the anchors for your rating scale make use of a bipolar scale, which has polar opposites at its endpoints and a neutral point in the middle, or a unipolar scale, which assesses the presence or absence of a single construct. The difference between these scales is best illustrated by an example:

Bipolar: How would you rate your current mood?

1	2	3	4	5	6	7
very happy	happy	slightly happy	neither happy nor sad	slightly sad	sad	very sad

Unipolar: How would you rate your current mood?

1	2	3	4	5
not at all sad	slightly sad	moderately sad	very sad	completely sad

1	2	3	4	5
not at all happy	slightly happy	moderately happy	very happy	completely happy

With the bipolar option, participants are asked to place themselves on a continuous scale somewhere between sad and happy, which are polar opposites. The assumption in using a bipolar scale is that the endpoints represent the only two options: Participants can be sad, happy, or somewhere in between. In contrast, with the unipolar option, participants rate themselves on two separate continuous scales, one indicating their level of sadness and the other their level of happiness. The assumption in using a pair of unipolar scales is that it is possible to experience varying degrees of each feeling at the same time: For example, participants can be moderately happy but also a little bit sad. The decision to use a bipolar or a unipolar scale comes down to the context. What is the most logical way to think about these constructs? What have previous researchers done?

In the 1970s, Sandra Lipsitz Bem revolutionized the way researchers thought about gender roles by arguing against a bipolar approach. Previously, gender role identification had been measured on a bipolar scale from "masculine" to "feminine," the assumption being that a person could be one or the other. Bem (1974) argued instead that people could easily have varying degrees of both masculine and feminine traits. Her scale, the Bem Sex Role Inventory, asks respondents to rate themselves on a set of 60 unipolar traits. Someone with mostly feminine and hardly any masculine traits would be described as "feminine." Someone with high ratings on both masculine and feminine traits would be described as "androgynous." And someone with low ratings on both masculine and feminine traits would be described as "undifferentiated." You can view and complete Bem's scale online.
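The four-way classification described above can be sketched in a few lines of Python. This is an illustration, not Bem's official scoring procedure: it assumes each person already has an average masculinity score and an average femininity score, and it uses the sample medians as the cutoffs for "high" versus "low," which is a simplifying assumption. The four respondents and their scores are invented.

```python
from statistics import median


def classify(masculine, feminine, masc_cut, fem_cut):
    """Label a respondent from two separate unipolar scores."""
    high_m = masculine > masc_cut
    high_f = feminine > fem_cut
    if high_m and high_f:
        return "androgynous"
    if high_f:
        return "feminine"
    if high_m:
        return "masculine"
    return "undifferentiated"


# Hypothetical average trait ratings: (masculine score, feminine score)
respondents = {
    "A": (2.1, 5.8),
    "B": (5.5, 5.9),
    "C": (2.0, 2.4),
    "D": (5.7, 2.2),
}

# Assumed cutoffs: the median of each score across this small sample
masc_cut = median(m for m, _ in respondents.values())
fem_cut = median(f for _, f in respondents.values())

labels = {name: classify(m, f, masc_cut, fem_cut) for name, (m, f) in respondents.items()}
print(labels)
```

This shows why the unipolar design matters: on a single masculine-to-feminine continuum, respondents B and C would land at essentially the same point, yet two separate scores identify one as androgynous and the other as undifferentiated.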

The second critical decision in constructing a rating scale item is to decide on the number of points in the response scale. You may have noticed that all the examples in this section have an odd number of points (e.g., five or seven). This is usually preferable for rating scale items because the middle of the scale (e.g., "3" or "4") allows respondents to give a neutral, middle-of-the-road answer. That is, on a scale from strongly disagree to strongly agree, the midpoint can be used to indicate "neither" or "I'm not sure." However, in some cases, you may not want to allow a neutral option in your scale. By using an even number of points (e.g., four or six), you can essentially force people to either agree or disagree with the statement; this type of scaling is referred to as forced choice.

So how many points should your scale have? As a general rule, more points will translate into more variability in responses: The more choices people have, the more likely they are to distribute their responses among those choices. From a researcher's perspective, the big question is whether this variability is meaningful. For example, if you wanted to assess college students' attitudes about a student fee increase, opinions will likely vary depending on the size of the fee and the ways in which it will be used. Thus, a five- or seven-point scale would be preferable to a two-point (yes or no) scale. However, past a certain point, increasing the scale range ceases to be linked to meaningful variation in attitudes. In other words, the difference between a 5 and a 6 on a seven-point scale is fairly intuitive for your participants to grasp. But what is the real difference between an 80 and an 81 on a 100-point scale? When scales become too large, you risk introducing another source of error variance as participants impose their own interpretations on the scaling. In sum, more points do not always translate into a better scale.

Back to our question: How many points should you have? The ideal compromise supported by many statisticians is to use a seven-point scale for bipolar scales. The reason has to do with the differences between scales of measurement. As you'll remember from our discussion in Chapter 2, the way variables are measured has implications for data analyses. For the most popular statistical tests to be legitimate, variables need to lie on either an interval scale (i.e., with equal intervals between points) or a ratio scale (i.e., with a true zero point). Based on mathematical modeling research, statisticians have concluded that the variability generated by a seven-point scale is most likely to mimic an interval scale (see, e.g., Nunnally, 1978). So a seven-point scale is often preferable because it allows us the most flexibility in data analyses. Note, however, that whereas seven-point scales produce reliable results with bipolar scales, unipolar scales tend to perform best with five-point scales.

Finalizing the Questionnaire

Once you have finished constructing the questionnaire items, one last important step remains before beginning to collect data. This section discusses a few guidelines for assembling the items into a coherent questionnaire. The main issues at this stage are to think carefully about the order of the individual items; how many items to include; whether to include open-ended, fixed-format, or multiple-choice questions; and writing the instructions in clear and concise language.

First, keep in mind that the first few questions will set the tone for the rest of the questionnaire. It is best to start with questions that are both interesting and nonthreatening to help ensure that respondents complete the questionnaire with open minds. For example:

BAD OPENING: "Do you agree that your child's teacher is incompetent?" (threatening and also a leading question)
BETTER OPENING: "How would you rate the performance of your child's teacher?"

BAD OPENING: "Would you support a 1% sales tax increase?" (boring)
BETTER OPENING: "How do you feel about raising taxes to help fund education?"

Second, strive whenever possible to have continuity in the different sections of your questionnaire. Imagine you are constructing a survey to give to college freshmen: You might have questions on family background, stress levels, future plans, campus engagement, and so on. It is best to have the questions grouped by topic on the survey. So, for instance, students would fill out a set of questions about future plans on one page and then a set of questions about campus engagement on another page. This approach makes it easier for participants to progress through the questions without having to mentally switch between topics.

Third, remember that individual questions are always read in context. This means that if you start your college student survey with a question about plans for the future and then ask about stress, respondents will likely have their future plans in mind when they think about their level of stress. Another example comes from a graduate school that administered a gigantic survey packet to every student enrolled in its Introductory Psychology course. One year, a faculty member included a measure of identity, asking participants to complete the statements "I am ______" and "I am not ______." As the researchers started to analyze data from this survey, they found that an astonishing 60% of respondents had filled in the blank with "I am not a homosexual." This response seemed pretty unusual until they realized that the questionnaire immediately preceding this one in the packet was a measure of prejudice toward gay and lesbian individuals. So, as respondents completed the identity measure, they had homosexuality on their minds and felt compelled to point out that they were not homosexual. This example illustrates once again how easily results can be skewed by context.

Finally, once you have assembled a draft version of your questionnaire, do a test run. This test run, called pilot testing, involves giving the questionnaire to a small sample of people, getting their feedback, and making any necessary changes. One of the best ways to pilot test is to find a patient group of friends to complete your questionnaire because this group will presumably be willing to give more extensive feedback. Another effective way to pilot test a questionnaire is to administer it to individuals in the target group. In soliciting their feedback, you should ask questions like the following:

Was anything confusing or unclear?

Was anything offensive or threatening?

How long did the questionnaire take you to complete?

Did it get repetitive or boring? Did it seem too long?

Were there particular questions that you liked or disliked? Why?

The answers to these questions will give you valuable information to revise and clarify your questionnaire before devoting the resources for a full round of data collection. In the next section, we turn our attention to the question of how to find and select participants for this stage of the research.

Research: Thinking Critically

"Beautiful People Convey Personality Traits Better During First Impressions"

Medical News Today

A new University of British Columbia study has found that people identify the personality traits of people who are physically attractive more accurately than others during short encounters.

The study, published in the December [2010] edition of Psychological Science, suggests people pay closer attention to people they find attractive, and is the latest scientific evidence of the advantages of perceived beauty. Previous research has shown that individuals tend to find attractive people more intelligent, friendly, and competent than others.

The goal of the study was to determine whether a person's attractiveness impacts others' ability to discern their personality traits, says Prof. Jeremy Biesanz, UBC Dept. of Psychology, who coauthored the study with PhD student Lauren Human and undergraduate student Genevieve Lorenzo.

For the study, researchers placed more than 75 male and female participants into groups of five to 11 people for three-minute, one-on-one conversations. After each interaction, study participants rated partners on physical attractiveness and five major personality traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Each person also rated his or her own personality.

Researchers were able to determine the accuracy of people's perceptions by comparing participants' ratings of others' personality traits with how individuals rated their own traits, says Biesanz, adding that steps were taken to control for the positive bias that can occur in self-reporting.

Despite an overall positive bias towards people they found attractive (as expected from previous research), study participants identified the "relative ordering" of personality traits of attractive participants more accurately than others, researchers found.

"If people think Jane is beautiful, and she is very organized and somewhat generous, people will see her as more organized and generous than she actually is," says Biesanz. "Despite this bias, our study shows that people will also correctly discern the relative ordering of Jane's personality traits (that she is more organized than generous) better than others they find less attractive."

The researchers say this is because people are motivated to pay closer attention to beautiful people for many reasons, including curiosity, romantic interest, or a desire for friendship or social status. "Not only do we judge books by their covers, we read the ones with beautiful covers much closer than others," says Biesanz, noting the study focused on first impressions of personality in social situations, like cocktail parties.

Although participants largely agreed on group members' attractiveness, the study reaffirms that beauty is in the eye of the beholder. Participants were best at identifying the personalities of people they found attractive, regardless of whether others found them attractive.

According to Biesanz, scientists spent considerable effort a half-century ago seeking to determine what types of people perceive personality best, with largely mixed results. With this study, the team chose to investigate this long-standing question from another direction, he says, focusing not on who judges personality best, but rather on whether some people's personalities are better perceived.

Think about it:

1. Suppose the following questions were part of the questionnaire given after the three-minute one-on-one conversations in this study. Based on the goals of the study and the rules discussed in this chapter, identify the problem with each of the following questions and suggest a better item.

a. Jane is very neat.

1	2	3	4	5
strongly agree	agree	neither agree nor disagree	disagree	strongly disagree

main problem:

better item:

b. Jane is generous and organized.

1	2	3	4	5
strongly agree	agree	neither agree nor disagree	disagree	strongly disagree

main problem:

better item:

c. Jane is extremely attractive.	TRUE	FALSE

main problem:

better item:

2. What are the strengths and weaknesses of using a fixed-format questionnaire in this study versus open-ended responses?

3. The researchers state that they took steps to control for the "positive bias that can occur in self-reporting." How might social desirability influence the outcome of this particular study? What might the researchers have done to reduce the effect of social desirability?

George, R. (2010, December 31). Beautiful people convey personality traits better during first impressions. Medical News Today. Retrieved from http://www.medicalnewstoday.com/articles/212245.php

4.3 Sampling From the Population

By now, you should have a good feel for how to construct survey items. Once you have finalized your measures, the next step is to find a group of people to fill out the survey. But where do you find this group? And how many of them do you need? On the one hand, you want as many people as possible in order to capture the full range of attitudes and experiences. On the other hand, researchers have to conserve time and other resources, which often means choosing a smaller sample of people. In this section, we will examine the strategies researchers can use in selecting samples for their studies.

Researchers refer to the entire collection of people who could possibly be relevant for a study as the population. For example, if you were interested in the effects of prison overcrowding in this country, you would want to study the population of prisoners in the United States. If you wanted to study voting behavior in the next U.S. presidential election, your population would be United States residents eligible to vote. And if you wanted to know how well college students cope with the transition from high school, your population would include every college student who has graduated from high school and is now enrolled in any college in the country.

You may have spotted an obvious practical complication with these populations. How on earth are you going to get every college student, much less every prisoner, in the country to fill out your questionnaire? You can't; instead, researchers will collect data from a sample, a subset of the population. Instead of trying to reach all prisoners, you might sample inmates from a handful of state prisons. Rather than attempt to survey all college students in the country, researchers might restrict their studies to a collection of students at one university.

The goal in choosing a sample for quantitative research is to make it as representative as possible of the larger population. This is the ideal, though it is not always practical to achieve. That is, if you choose students at one university, they need to be reasonably similar to college students elsewhere in the country. If the phrase "reasonably similar" sounds vague, this is because the basis for evaluating a sample varies depending on the hypothesis and the key variables of your study. For example, if you wanted to study the relationship between family income and stress levels, you would need to make sure that your sample mirrored the population in the distribution of income levels. Thus, a sample of students from a state university might be a better choice than students from, say, Harvard (which costs about $50,000 per year). On the other hand, if your research question dealt with the pressures faced by students in selective private schools, then Harvard students could be a representative sample for your study.

Figure 4.1 is a conceptual illustration of both a representative and a nonrepresentative sample, drawn from a larger population. The population in this case consists of 144 individuals, split evenly between Xs and Os. Thus, we would want our sample to come as close as possible to capturing this 50/50 split. The sample of 20 individuals on the left is representative of the population because it is split evenly between Xs and Os. But the sample of 20 individuals on the right is nonrepresentative because it contains 75% Xs. Because there are far fewer Os than we would expect in the right-hand sample, this sample does not accurately represent the population. This failure of the sample to represent the population is also referred to as sampling bias.
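The comparison in Figure 4.1 comes down to simple arithmetic: compare the share of Xs in each sample with the share in the population. A minimal Python sketch follows; the counts come from the figure as described (72 Xs and 72 Os in the population; 10/10 and 15/5 in the two samples), and the helper name `share` is just illustrative.

```python
def share(group, symbol):
    """Proportion of members in a group that match a symbol."""
    return sum(1 for member in group if member == symbol) / len(group)


# Counts taken from Figure 4.1
population = ["X"] * 72 + ["O"] * 72        # 144 people, 50/50 split
representative = ["X"] * 10 + ["O"] * 10    # left-hand sample of 20
nonrepresentative = ["X"] * 15 + ["O"] * 5  # right-hand sample of 20 (75% Xs)

for label, sample in [("representative", representative), ("nonrepresentative", nonrepresentative)]:
    gap = share(sample, "X") - share(population, "X")
    print(f"{label}: {share(sample, 'X'):.0%} Xs, gap from population = {gap:+.0%}")
```

A gap of zero means the sample mirrors the population on this characteristic; the 25-point gap in the right-hand sample is the sampling bias described above.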

Figure 4.1: Representative and nonrepresentative samples of a population

[Figure: one box shows the total population of Xs and Os; a second box shows a representative sample with equal numbers of Xs and Os; a third box shows a nonrepresentative sample with an unbalanced number of Xs and Os.]

So, where do these samples come from? As a researcher, you have two broad categories of sampling strategies at your disposal: probability sampling and nonprobability sampling.

Probability Sampling

Probability sampling is used when each person in the population has a known chance of being in the sample. This is possible only in cases where you know the exact size of the population. For instance, the 2010 population of the United States was 308,745,538 (United States Census Bureau, 2010). If you were to have selected a U.S. resident at random, each resident would have had a one in 308,745,538 chance of being selected. Whenever you have information about the total population, probability-sampling strategies are the most powerful approach because they greatly increase the odds of getting a representative sample. Within this broad category of probability sampling are three specific strategies: simple random sampling, stratified random sampling, and cluster sampling.

Simple Random Sampling

Simple random sampling, the most straightforward approach, involves randomly picking study participants from a list of everyone in the population. The term for this list is a sampling frame (e.g., imagine a list of every resident of the United States). To have a truly representative random sample, several criteria need to be met: You must have a sampling frame, you must choose from it randomly, and you must have a 100% response rate from those you select. (As we discussed in Chapter 2, it can threaten the validity of your hypothesis test if people drop out of your study.)
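Given a sampling frame, drawing a simple random sample is a short operation in most languages. In this Python sketch, the frame of 10,000 residents is hypothetical, and `random.Random.sample` draws without replacement so that every listed person has the same chance (n/N) of ending up in the sample. Note what the code cannot do: it cannot make the selected people respond, so the 100% response-rate criterion still has to be met in the field.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n people from the sampling frame, without replacement, all equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)


# Hypothetical sampling frame: a list of every resident of a small town
frame = [f"resident_{i}" for i in range(10_000)]

sample = simple_random_sample(frame, n=100, seed=2013)
inclusion_probability = len(sample) / len(frame)  # each resident's chance of selection
print(sample[:3], inclusion_probability)
```

With a frame of 308,745,538 residents and a sample of one, the same ratio gives the one-in-308,745,538 chance mentioned earlier.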

Stratified Random Sampling

[Photo: a group of multiracial students huddled together. iStockphoto/Thinkstock]

Stratified random sampling allows researchers to include all subgroups of a population in a study.

Stratified random sampling, a variation of simple random sampling, is used when subgroups of the population might be left out of, or badly underrepresented in, a purely random sampling process. Imagine a city with a population that is 80% Caucasian, 10% Hispanic, 5% African American, and 5% Asian. If you were to choose 100 residents at random, you would expect roughly 80 Caucasian residents and only about five African American and five Asian residents, far too few to say anything meaningful about those groups. As a result, you would inadvertently shortchange the perspectives of ethnic minority residents. To prevent this problem, researchers use stratified random sampling: breaking the sampling frame into subgroups and then randomly sampling a set number of people from each subgroup. In the preceding city example, you could divide your list of residents into four ethnic groups and then pick a random 25 from each of these groups. The result would be a sample of 100 people who captured opinions from each ethnic group in the population.
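The city example can be sketched in Python. The city of 10,000 residents is hypothetical (it simply follows the 80/10/5/5 percentages above), and the function name `stratified_sample` is illustrative. The first part computes how many people from each group a simple random sample of 100 would contain on average; the second draws 25 at random from each subgroup.

```python
import random
from collections import Counter

# Hypothetical city of 10,000 residents with the proportions from the text
group_sizes = {"Caucasian": 8000, "Hispanic": 1000, "African American": 500, "Asian": 500}
frame = [(f"{group}-{i}", group) for group, size in group_sizes.items() for i in range(size)]

# Expected composition of a simple random sample of 100 (no stratification)
total = sum(group_sizes.values())
expected_srs = {group: 100 * size / total for group, size in group_sizes.items()}


def stratified_sample(frame, per_stratum, seed=None):
    """Split the frame into subgroups, then randomly draw per_stratum people from each."""
    rng = random.Random(seed)
    strata = {}
    for person_id, group in frame:
        strata.setdefault(group, []).append(person_id)
    sample = []
    for group in sorted(strata):  # sorted for a reproducible order
        chosen = rng.sample(strata[group], per_stratum)
        sample.extend((person_id, group) for person_id in chosen)
    return sample


sample = stratified_sample(frame, per_stratum=25, seed=1)
print(expected_srs)
print(Counter(group for _, group in sample))
```

Because every group gets 25 slots regardless of its size, this sample over-represents the smaller groups relative to the city. If you later want a single citywide estimate, the responses would typically need to be weighted back to the population proportions.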

Cluster Sampling

Cluster sampling, another variation of random sampling, is used when you do not have access to a full sampling frame (i.e., a full list of everyone in the population). Imagine that you wanted to do a study of how cancer patients in the United States cope with their illness. Because there is not a list of every cancer patient in the country, you have to get a little creative with your sampling. The best way to think about cluster sampling is as "samples within samples." Just as with stratified sampling, you divide the overall population into groups; however, cluster sampling is different in that you are dividing into groups based on more than one level of analysis. In the cancer example, you could start by dividing the country into regions, then randomly selecting cities from within each region, then randomly selecting hospitals from within each city, and finally randomly selecting cancer patients from each hospital. The result would be a random sample of cancer patients from, say, Phoenix, Miami, Dallas, Cleveland, Albany, and Seattle; taken together, these patients would constitute a fairly representative sample of cancer patients around the country.
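This kind of multistage cluster sampling can be sketched as a series of nested random draws. Everything in the Python example below is hypothetical: two regions, three cities per region, two hospitals per city, and ten patients per hospital, with generic labels standing in for real places. Following the cancer example, every region is kept, and then one city per region, one hospital per city, and three patients per hospital are chosen at random.

```python
import random

# Hypothetical nested frame: region -> city -> hospital -> list of patients.
# In a real study you would only need patient lists for the hospitals actually
# selected; here everything is generated up front to keep the example small.
country = {
    region: {
        f"{region}-city{c}": {
            f"{region}-city{c}-hosp{h}": [f"{region}-city{c}-hosp{h}-pt{p}" for p in range(10)]
            for h in range(2)
        }
        for c in range(3)
    }
    for region in ("West", "East")
}


def cluster_sample(country, n_cities, n_hospitals, n_patients, seed=None):
    """Keep every region, then randomly draw cities, hospitals, and patients in turn."""
    rng = random.Random(seed)
    sample = []
    for region in sorted(country):
        cities = country[region]
        for city in rng.sample(sorted(cities), n_cities):
            hospitals = cities[city]
            for hospital in rng.sample(sorted(hospitals), n_hospitals):
                sample.extend(rng.sample(hospitals[hospital], n_patients))
    return sample


sample = cluster_sample(country, n_cities=1, n_hospitals=1, n_patients=3, seed=7)
print(sample)
```

The sample size is the product of the draws at each level (2 regions x 1 city x 1 hospital x 3 patients = 6 patients here).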

Nonprobability Sampling

The other broad category of sampling strategies is known as nonprobability sampling. These strategies are used in the (remarkably common) case in which you do not know the odds of any given individual being in the sample. This is an obvious shortcoming: If you do not know the exact size of the population and do not have a list of everyone in it, there is no way to know that your sample is representative! But despite this limitation, researchers use nonprobability sampling on a regular basis.

Nonprobability sampling is used in qualitative, quantitative, and mixed methods research whose focus is on selecting relatively small samples in a purposeful manner rather than selecting samples that are representative of the entire population. Unlike probability sampling, nonprobability sampling does not include randomly selected, stratified samples that provide for generalization. Rather, the procedures used to select samples are purposeful; that is, the samples are selected to obtain information-rich cases that yield in-depth insights. The sampling techniques used for quantitative and qualitative research probably make up one of the biggest differences between the two methods. The following sections discuss some of the most common nonprobability strategies. All of the nonprobability strategies described (except convenience sampling) are considered categories of purposive sampling, or sampling with a purpose. Convenience sampling is considered a form of so-called accidental or haphazard (serendipitous) sampling, since the cases are selected based on ready availability. Even so, convenience sampling can be purposeful, in that researchers do their best to recruit in ways that are convenient and yet attract individuals who meet meaningful criteria.

In many cases, it is not possible to obtain a sampling frame. When researchers study rare or hard-to-reach populations, or investigate potentially stigmatizing conditions, they often recruit by word of mouth. The term for this is snowball sampling: Imagine a snowball rolling down a hill, picking up more snow (or participants) as it goes. If you wanted to study how often homeless people took advantage of social services, you would be hard-pressed to find a sampling frame that listed the homeless population. Instead, you could recruit a small group of homeless people and ask each of them to pass the word along to others, and so on. The resulting sample is unlikely to be representative, but researchers often have to compromise for the sake of obtaining access to a population.
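Snowball recruitment behaves much like a breadth-first walk through a network of acquaintances: start with a few seed participants, then add the people they refer, wave by wave. The Python sketch below uses an invented referral network and a hypothetical `snowball_sample` function. It also shows why the result is unlikely to be representative: people with no connection to the seeds (G and H here) are never reached.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Recruit the seeds, then each participant's referrals, until max_size is reached."""
    recruited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        recruited.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # skip anyone already recruited or already referred
                seen.add(contact)
                queue.append(contact)
    return recruited


# Hypothetical referral network: whom each participant passes the word along to
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["F"],
    "G": ["H"],  # a separate social circle the seed never reaches
}

print(snowball_sample(referrals, seeds=["A"], max_size=10))
```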

One of the most popular nonprobability strategies is known as convenience sampling, or simply enrolling people who show up for the study. Any time you see results of a viewer poll on your favorite 24-hour news station, the results are likely based on a convenience sample. CNN and Fox News do not randomly select from a list of their viewers; they post a question on-screen or online, and people who are motivated enough to respond will do so. For that matter, a large majority of psychology research studies are based on convenience samples of undergraduate college students. Often, experimenters in psychology departments advertise their studies on a bulletin board or website, and students sign up for studies for extra cash or to fulfill a research requirement for a course. Students often pick a particular study based on whether it fits their busy schedules or whether the advertisement sounds interesting. Another example of convenience sampling is a researcher seeking to collect information from human resources managers to study the issue of bullying in organizations. The researcher might use a database of potential participants who belong to one or more chapters of the Society for Human Resource Management (SHRM), an organization for human resources professionals. In this example, the convenience component of the sample is the use of an existing membership database; although these members are not representative of all human resources managers or all organizations, they are a useful proxy for a broad sample of human resources managers across industries.
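A small calculation shows why a viewer poll can mislead. Suppose, purely hypothetically, that 30% of viewers favor some proposal and that viewers who favor it are five times as likely to bother responding (10% versus 2%). The Python sketch below computes the share of "yes" answers the poll would report under those assumed response rates; the function name and all numbers are illustrative.

```python
def expected_poll_share(true_share, rate_if_yes, rate_if_no):
    """Expected share of 'yes' among the people who choose to respond to a poll."""
    yes_responders = true_share * rate_if_yes
    no_responders = (1 - true_share) * rate_if_no
    return yes_responders / (yes_responders + no_responders)


# Hypothetical numbers: 30% of viewers say yes; yes-voters respond 5x as often
biased = expected_poll_share(0.30, rate_if_yes=0.10, rate_if_no=0.02)
# Same viewers, but everyone equally likely to respond
unbiased = expected_poll_share(0.30, rate_if_yes=0.05, rate_if_no=0.05)
print(f"poll says {biased:.0%} yes; true share is 30%; equal response rates give {unbiased:.0%}")
```

Under these assumptions a minority view looks like a majority view simply because of who chooses to respond, and collecting more responses the same way would not fix it.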

Other popular nonprobability strategies include extreme or deviant case sampling, typical case sampling, heterogeneous sampling, expert sampling, criterion sampling, and theory-based sampling. Whatever strategy you choose, keep in mind that every decision you make as a researcher has inherent strengths and weaknesses.

Extreme or deviant case sampling is used when we want information-rich data on unusual or special cases. For example, if we are interested in studying university diversity plans, it would be important to examine plans that work exceptionally well, as well as plans that have high expectations but are not working for one reason or another. The focus is on examining outstanding successes and failures to learn lessons about those conditions.

On the other hand, sometimes researchers are not interested in learning about unusual cases but rather want to learn about typical cases. Typical case sampling involves sampling the most frequent or "normal" case. It is often used to describe typical cases to people unfamiliar with a setting, program, or process. For example, a typical case sample may be used to study the practicum experiences of psychology students from universities that are rated average. One of the biggest drawbacks of this method is the difficulty of knowing how to identify or define a typical or normal case.

Heterogeneous sampling, which is also called maximum variation sampling, may include both extreme and typical cases. It is used to select a wide variety of cases in relation to the phenomenon being investigated. The idea here is to obtain a sample that includes diverse characteristics from multiple dimensions. For example, if researchers are interested in examining the experiences of community health clinic patients, they might conduct focus groups with 10 different groups of patients, drawn from several health clinics in the area, that represent various demographics. Heterogeneous sampling is beneficial when it is desirable to view patterns across a diverse set of individuals. It is also valuable in describing experiences that may be central or core to most individuals. Heterogeneous sampling can be problematic with small samples, though, because a small sample makes it difficult to capture a wide variety of cases.

Expert sampling involves sampling a panel of individuals who have known expertise (knowledge and training) in a particular area. While expert sampling can be used in both qualitative and quantitative research, it is used more often in qualitative research. Expert sampling involves a process of identifying individuals with known expertise in an area of interest, obtaining informed consent from each expert, and then collecting information from them either individually or as a group.

The goal of criterion sampling is to select cases "that meet some predetermined criterion of importance" (Patton, 2002, p. 238). A criterion could be a particular illness or experience that is being investigated, or even a program or a situation. For example, criterion sampling could be used to investigate a range of topics: the experiences of individuals who have attempted suicide, why students are frequently absent from online course rooms, or cases that have exceeded the standard waiting time at a doctor's office. The idea with this method is that all participants meet a particular criterion of interest.

Theory-based sampling is a version of criterion sampling that focuses on obtaining cases that represent theoretical constructs. As Patton (2002) discussed, theory-based sampling involves the researcher obtaining "sampling incidents, slices of life, time periods, or people on the basis of their potential manifestation or representation of important theoretical constructs" (p. 238). This type of sampling arose from grounded theory research and follows a more deductive or theory-testing approach. As Glaser (1978) described, theory-based sampling is "the process of data collection for generating theory whereby the analyst jointly collects, codes, and analyzes his data and decides which data to collect next and where to find them" (p. 36). Thus, data collection is driven by the emerging theory, and participants are selected based on their knowledge of the topic.

Choosing a Sampling Strategy

Although quantitative researchers strive for representative samples, there is no such thing as a perfectly representative one. There is always some degree of sampling error, defined as the degree to which the characteristics of the sample differ from the characteristics of the population. Instead of aiming for perfection, then, researchers aim for an estimate of how far from perfection their samples are. These estimates are known as the error of estimation, or the degree to which the data from the sample are expected to deviate from the population as a whole.

One of the main advantages of a probability sample is that we are able to calculate these errors of estimation (or margins of error). In fact, you have likely encountered errors of estimation every time you see the results of an opinion poll. For example, CNN may report that "Candidate A is leading the race with 60% of the vote, ± 3%." This means Candidate A's percentage in the sample is 60%, but based on statistical calculations, her true percentage in the population most likely falls between 57% and 63% (polls usually report this range at a 95% level of confidence). The smaller the error (3% in this example), the more closely the results from the sample match the population. Naturally, researchers conducting these opinion polls want the error of estimation to be as small as possible; imagine how unpersuasive it would be to learn that "Candidate A has a 10-point lead, ± 20 points." In general, these errors are minimized when three conditions are met: The overall population is smaller, the sample itself is larger, and there is less variability in the sample data. When samples are created using a probability method, all of this information is available because these methods require knowing the population.
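The three conditions above can be seen in the usual textbook formula for the margin of error of a proportion. This is a minimal sketch under several assumptions that the chapter itself does not state:

- the sample is a simple random sample;
- the normal approximation holds;
- z = 1.96, which corresponds to roughly 95% confidence;
- the standard finite population correction is applied when the population size is known.

The function name and poll numbers are hypothetical.

```python
import math

def margin_of_error(p, n, population=None, z=1.96):
    """Approximate margin of error for a sample proportion.

    p: proportion observed in the sample (e.g., 0.60 for 60%).
    n: sample size.
    population: population size, if known; applies the finite population correction.
    z: critical value; 1.96 corresponds to roughly 95% confidence.
    """
    # Standard error of a proportion under simple random sampling.
    # p * (1 - p) is the variability of a yes/no answer; it is largest at 50/50.
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: sampling a large share of a small
        # population leaves less room for error.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# Hypothetical poll: 60% support among 1,000 randomly sampled voters.
baseline = margin_of_error(0.60, 1000)            # about 0.030, i.e., ± 3 points
larger_sample = margin_of_error(0.60, 4000)       # larger sample
less_variability = margin_of_error(0.90, 1000)    # more lopsided answers
smaller_population = margin_of_error(0.60, 1000, population=2000)

for label, moe in [("baseline", baseline), ("larger sample", larger_sample),
                   ("less variability", less_variability),
                   ("smaller population", smaller_population)]:
    print(f"{label}: ± {moe * 100:.1f} points")
```

Each of the last three lines changes one condition, and each produces a smaller margin than the baseline.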

If probability sampling is so powerful, why are nonprobability strategies so popular? One reason is that convenience samples are more practical; they are cheaper and easier, and they can almost always be assembled with relatively few resources because you avoid the costs of large-scale sampling. A second reason is that convenience is often a good starting point for a new line of research. For example, if you wanted to study the predictors of relationship satisfaction, you could start by testing hypotheses in a controlled setting using college student participants, and then you could extend your research to the study of adult married couples. Finally, and relatedly, in many types of qualitative research, it is acceptable to have a nonrepresentative sample because you do not need to generalize your results. If you want to study the prevalence of alcohol use in college students, it may be perfectly acceptable to use a convenience sample of college students. Even in this case, however, you would have to keep in mind that you were studying drinking behaviors among students who volunteered to complete a study on drinking behaviors.

There are also cases, however, where it is critical to use probability sampling despite the extra effort it requires. Specifically, researchers use probability samples any time it is important to generalize and any time it is important to predict the behavior of a population. The best example for understanding these criteria is to think of political polls. In the lead-up to an election, each campaign is invested in knowing exactly what the voting public thinks of its candidate. In contrast to a CNN poll, which is based on a convenience sample of viewers, polls conducted by a campaign will be based on randomly selected households from a list of registered voters. The resulting sample is much more likely to be representative, much more likely to tell the campaign how the entire population views its candidate, and therefore, much more likely to be useful.
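Drawing "randomly selected households from a list of registered voters" is, at its simplest, a simple random sample from a sampling frame. The sketch below uses Python's standard `random.sample`. The household IDs, frame size, sample size, and seed are all hypothetical. Real campaign polls may use more elaborate probability designs, such as stratification or weighting, which are not shown here.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n members from a sampling frame without replacement.

    Every member of the frame has the same chance of selection, which is
    what makes the result a probability sample.
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: IDs for 50,000 registered-voter households.
frame = [f"household-{i:05d}" for i in range(50_000)]
sample = simple_random_sample(frame, 1_000, seed=2013)
print(len(sample), sample[:3])
```

The key requirement is the list itself. Without a frame like `frame`, there is nothing to draw from, which is exactly why snowball and convenience samples are used when no such list exists.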

Determining a Sufficient Sample Size

Several factors come into play when determining what a sufficient sample size is for a research study. Probably the most important factor is whether the study is a quantitative or a qualitative one. In quantitative research, there is one basic rule for sample size: The larger, the better. This is because a larger sample tends to be more representative of the population being studied. Having said that, even small samples using quantitative data can yield meaningful correlations if the researcher makes efforts to achieve a representative sample or a meaningful convenience sample.

How does one make a practical decision, though, regarding the exact sample size a particular research situation requires? Gay, Mills, and Airasian (2009) provided the following guidelines for determining sufficient sample size based on the size of the population:

· For populations that include 100 or fewer individuals, the entire population should be sampled.

· For populations that include 400–600 individuals, 50% of the population should be sampled.

· For populations that include 1,500 individuals, 20% of the population should be sampled.

· For populations larger than 5,000, about 8% of the population should be sampled.

Thus, as we can see, the larger the population size, the smaller the percentage required to obtain a representative sample. This does not mean that the sample shrinks as the population increases, but rather the opposite: The sample size required actually increases when studying larger populations.
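To see the falling percentage alongside the rising head count, the sketch below encodes only the four guidelines listed above. It returns `None` for population sizes the listed guidelines do not address, such as 200 or 1,000, rather than interpolating a value the guidelines do not give. The function name is hypothetical.

```python
def guideline_sample_size(population):
    """Sample size suggested by the Gay, Mills, and Airasian (2009) rules of thumb.

    Returns None for population sizes the listed guidelines do not cover
    (e.g., 101-399 or 601-1,499), rather than guessing.
    """
    if population <= 100:
        return population                  # sample everyone
    if 400 <= population <= 600:
        return round(population * 0.50)    # 50% of the population
    if population == 1500:
        return round(population * 0.20)    # 20% of the population
    if population > 5000:
        return round(population * 0.08)    # about 8% of the population
    return None

for size in [100, 500, 1500, 10_000]:
    print(size, guideline_sample_size(size))
```

Going from 500 to 10,000 people, the percentage drops from 50% to 8%, yet the number of people to sample grows from 250 to 800.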

Although larger sample sizes are generally preferred in quantitative studies, the size of an adequate sample also depends on the similarities and dissimilarities among members of the population. If the population is fairly diverse for the construct being measured, then a larger sample will likely be required in order to obtain a representative sample. Populations whose members have similar characteristics will require smaller samples. Researchers have developed fairly sophisticated methods for determining sufficient sample sizes through the use of statistical power analysis; however, such methods are beyond the scope of this book. Gay et al.'s (2009) guidelines are sufficient for most types of research.

As discussed in Chapter 3, recommended samples for qualitative research are typically much smaller than for quantitative research, which strives to be representative of larger populations. In qualitative research, sample sizes are not based on numbers but rather on how well the variables of interest are represented (Houser, 2009). Unlike quantitative approaches, which require large samples, qualitative techniques do not have any rules regarding sample size. Thus, sample size depends more on what the researcher wants to know, the purpose of the inquiry, what the findings will be useful for, how credible they will be, and what can be done with available time and resources. Qualitative research can be very costly and time-consuming, so choosing information-rich cases will yield the greatest return on investment. As noted by Patton (2002), "The validity, meaningfulness, and insights generated from qualitative inquiry have more to do with the information-richness of the cases selected and the observational/analytical capabilities of the researcher than with sample size" (p. 245).

Nonresponse Bias in Survey Research

Sometimes participants do not submit a survey or do not fill it out completely. When this occurs, the study is vulnerable to nonresponse bias, which can affect the size and characteristics of the sample, as well as the external validity of the study. External validity refers to how well the results obtained from a sample can be extended to make predictions about the entire population, which will be further discussed in Chapter 5. Nonresponses are particularly problematic if a large number of participants or a specific group of participants fails to respond to particular questions. For example, perhaps all females skip a certain question, or several participants skip certain questions because they are too personal. These omissions create bias in the characteristics of the sample (e.g., the sample may no longer be representative) and can also shrink the usable sample below the size the study requires.

Nonresponses can occur for many reasons, including the survey being too long, the questions being worded awkwardly, the survey topic being uninteresting, or, in the case of Web-based surveys, the participants not knowing how to access or log into the website to complete the survey.

There are a few ways to minimize the threat of nonresponse bias. These include increasing the sample size to account for the possibility of nonresponses, making sure that survey directions and questions are worded clearly, making sure the survey is not too long, providing rewards or incentives for completing the survey, sending out reminders to complete the survey, and providing a cover letter that describes the exact reasons for conducting the survey.
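The first strategy above, increasing the sample size to allow for nonresponse, comes down to simple arithmetic. This sketch assumes you can estimate a response rate in advance, for example from a pilot study or an earlier survey. The function name and numbers are hypothetical. Inviting more people protects the size of the sample, but it does not by itself remove bias if the people who do not respond differ systematically from those who do.

```python
import math

def invitations_needed(required_completes, expected_response_rate):
    """Number of people to invite so that, at the expected response rate,
    roughly `required_completes` usable surveys come back."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("response rate must be greater than 0 and at most 1")
    # Round up: you cannot invite a fraction of a person.
    return math.ceil(required_completes / expected_response_rate)

# Hypothetical plan: 300 completed surveys needed, half expected to respond.
print(invitations_needed(300, 0.5))   # 600
```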

4.4 Analyzing Survey Data

Once you have designed a survey, chosen an appropriate sample, and collected some data, now comes the fun part. As with the quantitative descriptive designs covered in Chapter 3, the goal of analyzing survey data is to subject your hypotheses to a statistical test. Surveys can be used both to describe and to predict thoughts, feelings, and behaviors. However, since we have already covered the basics of descriptive analysis in Chapter 3, this section will focus on predictive analyses, which are designed to assess the associations between and among variables.

Researchers typically use three approaches to test predictive hypotheses: correlational analyses, chi-square analyses, and regression analyses. Each one has its advantages and disadvantages, and each is most appropriate for a different kind of data. Correlational analysis allows one to examine the strength, direction, and statistical significance of a relationship; chi-square analysis determines whether two nominal variables are independent from or related to one another; and simple linear regression is the method used to "predict" scores for one variable based on another. In this section, we will walk through the basics of each analysis.

Correlational Analysis

In the beginning of this chapter, we encountered an example of a survey research question: What is the relationship between the number of hours that students spend studying and their grades in the class? In this case, the hypothesis claims that we can predict something about a student's grades by knowing how many hours he or she spends studying.

Imagine we collected a small amount of data to test this hypothesis, shown in Table 4.1. (Of course, if we really wanted a good test of this hypothesis, we would need more than 10 people in the sample, but this will do as an illustration.)

Table 4.1: Data for quiz grade/hours studied example
Participant	Hours Studied	Quiz Grade
1	1	2
2	1	3
3	2	4
4	3	5
5	3	6
6	3	6
7	4	7
8	4	8
9	4	9
10	5	9

The Logic of Correlation

The important question here is whether and to what extent we can predict grades based on study time. One common statistic for testing these kinds of hypotheses is a correlation, which assesses the linear relationship between two variables. A stronger correlation between two variables translates into a stronger association between them. Or, to put it differently, the stronger the correlation between study time and quiz grade, the more accurately you can predict grades based on knowing how long the student spends studying.

Before we calculate the correlation between these variables, it is always a good idea to visualize the data on a graph. The scatterplot in Figure 4.2 displays our sample data from the studying/quiz grade study.

Figure 4.2: Scatterplot for quiz grade/hours studied example

Scatterplot showing the sample data from the study of the relationship between hours spent studying and quiz grade. The number of hours studied is shown on the x axis and the quiz grade is shown on the y axis.

Each point on the graph represents one participant. For example, the point in the top right corner represents a student who studied for 5 hours and earned a 9 on the quiz. The two points in the bottom left represent students who studied for only 1 hour and earned a 2 and a 3 on the quiz.

Figure 4.3: Curvilinear relationship between arousal and performance

Graph showing the curvilinear relationship between arousal and performance. Arousal is shown on the x axis and performance is shown on the y axis.

There are two reasons to graph data before conducting statistical tests. First, a graph conveys a general sense of the pattern; in this case, students who study less appear to do worse on the quiz. As a result, we will be better informed going into our statistical calculations. Second, the graph lets us check whether the relationship between the variables is linear. This is a very important point about correlations: The math is based on how well the data points fit a straight line, which means nonlinear relationships might be overlooked. Figure 4.3 demonstrates a robust nonlinear finding in psychology regarding the relationship between task performance and physiological arousal. As this graph shows, people tend to perform their best on just about any task when they have a moderate level of arousal.

When arousal is too high, it is difficult to calm down and concentrate; when arousal is too low, it is difficult to care about the task at all. If we simply ran a correlation with data on performance and arousal, the correlation would be close to zero because the points do not fit a straight line. Thus, it is critical to visualize the data before jumping ahead to the statistics. Otherwise, you risk overlooking an important finding in the data.
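A small numeric illustration shows how a strong but curved relationship can produce a correlation near zero. The arousal and performance numbers below are made up to form an inverted U; they are not the data behind Figure 4.3. `pearson_r` is a hand-written helper that implements the standard Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inverted-U data: performance peaks at moderate arousal.
arousal = [1, 2, 3, 4, 5, 6, 7]
performance = [2, 5, 7, 8, 7, 5, 2]

r = pearson_r(arousal, performance)
rising = pearson_r(arousal[:4], performance[:4])  # low-to-moderate arousal only
print(f"full range: |r| = {abs(r):.2f}")
print(f"rising half only: r = {rising:.2f}")
```

Performance clearly depends on arousal, yet across the full range the coefficient is essentially zero. Looking only at the rising half gives a strong positive r, which is equally misleading about the whole picture.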

Interpreting Coefficients

Once we are satisfied that our data look linear, it is time to calculate our statistics. This is typically done using a computer software program, such as SPSS (IBM), SAS/STAT (SAS), or Microsoft Excel. The number used to quantify our correlation is called the correlation coefficient. This number ranges from –1 to +1 and contains two important pieces of information:

· The size of our relationship is based on the absolute value of our correlation coefficient. The farther our coefficient is from zero in either direction, the stronger the relationship between variables. For example, both a +.80 and a –.80 indicate strong relationships.

· The direction of the relationship is based on the sign of our correlation coefficient. A +.80 would indicate a positive correlation, meaning that as one variable increases, so does the other variable. A –.80 would indicate a negative correlation, meaning that as one variable increases, the other variable decreases. (Refer back to Section 2.1, Overview of Research Designs, for a review of these two terms.)

So, for example, a +.20 is a weak positive relationship and a –.70 is a strong negative relationship.

When we calculate the correlation for our quiz grade study, we get a coefficient of .96, indicating a strong positive relationship between studying and quiz grade. What does this mean in plain English? Students who spend more hours studying tend to score higher on the quiz.
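The .96 can be checked directly from Table 4.1. The sketch below applies the standard Pearson formula: the sum of cross-products of deviations from each mean, divided by the square root of the product of the two sums of squared deviations. `pearson_r` is a hypothetical helper name. On Python 3.10 or later, the standard library's `statistics.correlation` computes the same Pearson coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations from each variable's mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Data from Table 4.1 (participants 1-10).
hours = [1, 1, 2, 3, 3, 3, 4, 4, 4, 5]
grades = [2, 3, 4, 5, 6, 6, 7, 8, 9, 9]

r = pearson_r(hours, grades)
print(f"r = {r:.2f}")  # r = 0.96
```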

How do we know whether to get excited about a correlation of .96? As with all of our statistical analyses, we look up this value in a critical value table or, more commonly, let the computer software do this for us. This gives us a p value: the probability of obtaining a correlation at least this strong by random chance alone if there were no real relationship between the variables. In this case, the p value is less than .001. This means that the chance of seeing a correlation this strong as a random fluke is less than 1 in 1,000, so we can feel pretty confident in our results. Note, however, that because the p value associated with r depends on sample size, even a tiny, unimportant correlation can be statistically significant when it is computed from a very large sample.
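The "look it up in a table" step and the sample-size caveat can both be illustrated with the standard t test for a Pearson correlation. That test uses t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The critical values in the sketch are taken from a standard two-tailed t table. Statistical software would instead report an exact p value. The function and constant names are hypothetical.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing whether a correlation differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed critical values of t from a standard t table.
CRITICAL_T_DF8_P001 = 5.041     # df = 8, p = .001
CRITICAL_T_DF8_P05 = 2.306      # df = 8, p = .05
CRITICAL_T_LARGE_DF_P05 = 1.96  # very large df, p = .05 (normal approximation)

quiz = t_for_correlation(0.96, 10)
print(f"quiz study: t(8) = {quiz:.2f}, past p = .001 cutoff: {quiz > CRITICAL_T_DF8_P001}")

# The same weak correlation, r = .10, with a small and a large sample.
small_n = t_for_correlation(0.10, 10)
large_n = t_for_correlation(0.10, 1000)
print(f"r = .10, n = 10: t = {small_n:.2f}, significant at .05: {small_n > CRITICAL_T_DF8_P05}")
print(f"r = .10, n = 1,000: t = {large_n:.2f}, significant at .05: {large_n > CRITICAL_T_LARGE_DF_P05}")
```

The weak r = .10 is nowhere near significant with 10 people but clears the .05 cutoff with 1,000. This is the caveat in the paragraph: statistical significance is not the same as a large effect.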

We now have all the information we need to report this correlation in a research paper. The standard way of reporting a correlation coefficient includes information about the sample size and p value, as well as the coefficient itself. Our quiz grade study would be reported as shown in Figure 4.4.

Figure 4.4: Correlation coefficient diagram

A correlation coefficient diagram of the quiz grade study with each part of the correlation labeled.

So where does this leave our hypothesis? We started by predicting that students who spend more time studying would perform better on their quizzes than those who spend less time studying. We then designed a study to test this hypothesis by collecting data on study habits and quiz grades. Finally, we analyzed these data and found a significant, strong, positive correlation between hours studied and quiz grade. Based on this study, our hypothesis has been supported: Students who study more have higher quiz grades! Of course, because this is a correlational study, we are unable to make causal statements. It could be that studying more for an exam helps you to learn more. Or, it could be the case that previous low quiz grades make students give up and study less. Or, the third variable of motivation could cause students to both study more and perform better on the quizzes. To tease these explanations apart and determine causality, we will need an experimental type of research design, which we will cover in Chapter 5.

Regression Analysis

Correlations are the best tool for testing the linear relationship between pairs of quantitative variables. However, in many cases, we are interested in comparing the influence of several variables at once. Imagine that you wanted to expand the investigation of hours studying and quiz grade by looking at other variables that might predict students' quiz grades. We have already learned that the hours students spend studying positively correlate with their grades. But what about SAT scores? We might predict that students with higher standardized test scores will do better in all of their college classes. Or what about the number of classes that students have previously taken in the subject area? We might predict that increased familiarity with the subject would be associated with higher scores. In order to compare the influence of all three variables, we will use a slightly different analytic approach. Multiple regression analysis is a variation on correlational analysis in which more than one predictor variable is used to predict a single outcome variable. In this example, we would be attempting to predict the outcome variable of quiz scores based on three predictor variables: SAT scores, number of previous classes, and hours studied.

Multiple regression analysis requires an extensive set of calculations; consequently, it is always done using computer software. A detailed look at these calculations is beyond the scope of this book, but a conceptual overview will help you understand the unique advantages of this form of analysis. Essentially, the calculations for multiple regression are based on the correlation coefficients between each of our predictor variables, and between each of these variables and the outcome variable. These correlations for our revised quiz grade study are shown in Table 4.2. If we scan the top row, we can see the correlations between quiz grade and the three predictor variables: SAT score (r = .14), previous classes (r = .24), and hours studied (r = .25). The remainder of the table shows correlations between the various predictor variables; for example, hours studied and previous classes correlate at r = .24. When we conduct multiple regression analysis using computer software, the software will use all of these correlations in performing its calculations.

Table 4.2: Correlations for a multiple regression analysis
	Quiz Grade	SAT Score	Previous Classes	Hours Studied
Quiz Grade	—	.14	.24*	.25*
SAT Score		—	.02	–.02
Previous Classes			—	.24*
Hours Studied				—

The advantage of multiple regression is that it considers both the individual and the combined influence of the predictor variables. Figure 4.5 is a visual diagram of the individual predictors of quiz grades. The numbers along each line are known as regression coefficients, or beta weights. These are standardized coefficients that allow comparison across predictors. Their values are very similar to correlation coefficients but differ in an important way: They represent the effects of each predictor variable while controlling for the effects of all the other predictors. That is, the value of b = .21 linking hours studied with quiz grades is the independent contribution of hours studied, controlling for SAT scores and previous classes. If we compared the size of these regression coefficients, we would see that, in fact, hours spent studying were still the largest predictor of quiz grades (b = .21) compared with both SAT scores (b = .14) and previous classes (b = .19).
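To make "based on the correlation coefficients" concrete, the beta weights can be recovered from the correlations in Table 4.2 alone. With all variables standardized, they solve the system R_xx · β = r_xy. Here R_xx is the matrix of correlations among the predictors, and r_xy holds each predictor's correlation with quiz grade. The sketch solves this system with plain Gaussian elimination. It is a conceptual illustration, not necessarily how a statistics package computes regression internally. The helper name is hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

predictors = ["SAT score", "previous classes", "hours studied"]
# Correlations among the predictors (Table 4.2).
r_xx = [
    [1.00, 0.02, -0.02],
    [0.02, 1.00, 0.24],
    [-0.02, 0.24, 1.00],
]
# Correlation of each predictor with quiz grade (top row of Table 4.2).
r_xy = [0.14, 0.24, 0.25]

betas = solve_linear_system(r_xx, r_xy)
for name, beta in zip(predictors, betas):
    print(f"{name}: beta = {beta:.2f}")
```

The results round to .14, .19, and .21, matching Figure 4.5. Previous classes and hours studied correlate with each other (r = .24), so each one's beta weight ends up somewhat smaller than its raw correlation with quiz grade.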

Figure 4.5: Predictors of quiz grades

Visual diagram showing the individual predictors of quiz grades. The three predictors: SAT scores, previous classes, and hours studied, are shown in separate boxes. Arrows point from these boxes to a fourth box labeled quiz grades. Numbers along each arrow are the regression coefficients, or beta weights.

Even if individual variables have only a small influence, they can add up to a larger combined influence. So, if we were to analyze the predictors of quiz grades in this study, we would find a combined multiple correlation coefficient of R = .34. The multiple correlation coefficient represents the combined association between the outcome variable and the full set of predictor variables. Note that in this case, the combined R of .34 is larger than any of the individual correlations (r) in Table 4.2, which ranged from .14 to .25. These numbers mean that we are better able to predict quiz grades from examining all three variables than we are from examining any single variable. In other words, the combination of predictors tells us more than any one of them alone.
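Where does the combined value of .34 come from? For standardized beta weights (labeled b in the text), a standard identity links the multiple correlation, often written R, to the individual pieces. R² equals the sum, across predictors, of each beta weight times that predictor's correlation with the outcome. Plugging in the rounded values from Figure 4.5 and Table 4.2 reproduces the reported value, up to rounding:

$$
\begin{aligned}
R^2 &= \sum_j \beta_j \, r_{yj} \\
    &= (.14)(.14) + (.19)(.24) + (.21)(.25) \\
    &= .0196 + .0456 + .0525 = .1177 \\
R   &= \sqrt{.1177} \approx .34
\end{aligned}
$$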

Multiple regression analysis is an incredibly useful and powerful analytic approach, but it can also be a tough concept to grasp. Before we move on, let's revisit the concept in the form of an analogy. Imagine you've just eaten the most delicious hamburger of your life and are determined to understand what made it so good. Lots of factors will contribute to the taste of your hamburger: the quality of the meat, the type and amount of cheese, the freshness of the bun, perhaps the smoked chili peppers layered on top. If you were to approach this investigation using multiple regression analysis, you would be able to separate out the influence of each variable (e.g., How important is the cheese compared with the smoked peppers?) as well as take into account the full set of ingredients (e.g., Does the freshness of the bun really matter when the other elements taste so good?). Ultimately, you would be armed with the knowledge of which elements are most important in crafting the perfect hamburger. And you would understand more about the perfect hamburger than if you had examined each ingredient in isolation.

Chi-Square Analyses

Both correlations and regressions are well suited to testing hypotheses about prediction, as long as it is possible to demonstrate a linear relationship between two variables. But linear relationships require that variables be measured on one of the quantitative scales, that is, ordinal, interval, or ratio scales (see Section 2.3, Scales and Types of Measurement, for a review). What if we wanted to test the association between nominal, or categorical, variables? In these cases, we would need an alternative statistic called the chi-square statistic, which determines whether two nominal variables are independent from or related to one another. Chi-square is often abbreviated with the symbol χ², which shows the Greek letter chi with the superscript 2 for squared. (This statistic is also referred to as the chi-square test for independence, a slightly longer but more descriptive synonym.)

The idea behind this test is similar to that of the correlation coefficient. If two variables are independent, then knowing the value of one variable does not tell you anything about the value of the other. As we will see in the following examples, a larger chi-square reflects a larger deviation from what we would expect by chance and is thus an index of statistical significance.

The Logic of Chi-Square

To determine whether two variables are associated, the chi-square works by comparing the observed frequencies (collected data) with the expected frequencies if the variables were unrelated. If the results significantly deviate from these expected frequencies, then we conclude that our variables are related. And, consequently, we are able to predict one variable based on knowing the values of the other. Let's look at a couple of examples to make this more concrete.

First, let's say we wanted to know whether gender is related to political party affiliation. We might randomly select 100 men and 100 women and ask them whether they identified as Republican or Democrat. Because both of these variables are nominal (that is, they identify only categories, not quantitative measures), chi-square will be our best choice to test the association between them. The first step in conducting this analysis is to arrange our data in a contingency table, which displays the number of individuals in each of the combinations of our nominal variables. We encountered these tables before in our examples of observational studies in Chapter 3 (Section 3.1) but stopped short of conducting the statistical analyses. So imagine we get the results shown in Table 4.3a from our survey of gender and party affiliation.

Table 4.3a: Gender and party affiliation
	Male	Female
Democrat	60	60
Republican	40	40

In this case, there is no association between sex and party affiliation. It does not matter that the sample consists of 60% Democrats and 40% Republicans. What matters for our hypothesis test is that the pattern for males is the same as the pattern for females: Our sample contains 1.5 times as many Democrats as Republicans for both sexes. In other words, knowing a person's sex does not tell us anything about their political affiliation.
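Building a contingency table from raw survey answers, and then comparing the pattern within each column, can be sketched with Python's standard `collections.Counter`. The raw responses below are hypothetical: one (party, gender) pair per respondent, constructed to reproduce the counts in Table 4.3a. The helper names are also hypothetical.

```python
from collections import Counter

def contingency_table(responses, row_levels, col_levels):
    """Count how many responses fall in each (row, column) combination."""
    counts = Counter(responses)
    return [[counts[(row, col)] for col in col_levels] for row in row_levels]

def column_proportions(table):
    """Share of each column falling in each row (e.g., % Democrat among men)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[cell / total for cell, total in zip(row, col_totals)] for row in table]

# Hypothetical raw survey responses that reproduce Table 4.3a.
responses = ([("Democrat", "Male")] * 60 + [("Republican", "Male")] * 40
             + [("Democrat", "Female")] * 60 + [("Republican", "Female")] * 40)
parties = ["Democrat", "Republican"]
genders = ["Male", "Female"]

table = contingency_table(responses, parties, genders)
print(table)                        # [[60, 60], [40, 40]]
print(column_proportions(table))    # [[0.6, 0.6], [0.4, 0.4]]
```

Both columns show the same 60/40 split, so knowing the column (gender) does not change our guess about the row (party). That is what "no association" looks like in a contingency table.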

For illustration purposes, imagine we tested the same hypothesis again but could recruit only 50 women, compared with 100 men. Now, if we found the same 60%/40% split among men again, and assuming that the variables were still not related, here's the question: What would we expect the split to look like among women? If the ratio of Democrats to Republicans remains at 1.5 to 1, we would expect to see the women divided into 30 Democrats and 20 Republicans (i.e., the same ratio; shown in Table 4.3b). This concept is referred to as the expected frequency, or the frequency you would expect to see if the variables were not related. In this example, we have a 60/40 split among men. If gender were unrelated to party affiliation, we would expect to see the same pattern among women.

Table 4.3b: Gender and party affiliation with unequal ns
	Male	Female
Democrat	60	30
Republican	40	20

The chi-square statistic is calculated by comparing our observed data with these expected frequencies. In our gender and party affiliation example, the observed data match the expected frequencies, meaning that the variables are not related. Let's walk through another example and see how these calculations work.

Calculating Chi-Square

In this second example, imagine we wanted to know whether people in rural or urban areas were more likely to support a sales tax increase. It would be easy to speculate why either group might be more likely to do so: Perhaps people living in cities are more politically liberal, or perhaps people living in small towns are better able to see the benefits of higher local taxes. So, once again, imagine we surveyed a sample of 100 people, asking them to indicate both where they live (rural or urban area) and their support for a sales tax proposal. We get the contingency table of results shown in Table 4.4a. Notice that we have more urban than rural residents, reflecting the higher population density in cities. But, as with our preceding gender and political affiliation example, the raw numbers are less important than the ratios within each group.

Table 4.4: Chi-square example: Support for a sales tax increase
4.4a: Observed data
	Rural	Urban	Total
Support	10	45	55
Don't support	30	15	45
Total	40	60	100

4.4b: Expected frequencies
	Rural	Urban	Total
Support	10 (22)	45 (33)	55
Don't support	30 (18)	15 (27)	45
Total	40	60	100

4.4c: Calculating deviations between observed and expected values
	Rural	Urban	Total
Support	(10 – 22)²/22	(45 – 33)²/33	55
Don't support	(30 – 18)²/18	(15 – 27)²/27	45
Total	40	60	100

4.4d: Deviations between observed and expected values
	Rural	Urban	Total
Support	6.55	4.36	55
Don't support	8	5.33	45
Total	40	60	100

The first stage in calculating our chi-square is to determine the expected frequencies. We begin by calculating the sums across each row and column, as shown in Table 4.4a. This gives us a sense of the overall patterns in the data. Overall, 40% of the sample consisted of rural residents, compared with 60% urban residents. And, overall, 55% supported the sales tax increase, while 45% did not. But, as in our previous example, these descriptive statistics do not tell us anything about the relationship between the two variables. If the variables are independent, then the 55/45 split in support for the sales tax will not differ based on where people live.

So we need to determine how much these observed data differ from what would be expected under independence. That is, what would these cells look like if there were no relationship? These expected frequencies are calculated using the following formula:

$$\text{Expected frequency} = \frac{R \times C}{\text{Total } N}$$

For each of the four cells, we multiply the row total (R) by the column total (C) and then divide by the total N in the sample. For example, in the rural resident/support cell, we would multiply the row total (55) by the column total (40) and then divide by the total sample size (100): (55 × 40) ÷ 100 = 22. Conceptually, this makes perfect sense: If the two variables are unrelated, we can estimate the value of each cell from the overall totals. Table 4.4b shows the expected frequency for each cell in parentheses.

The second stage in calculating chi-square is to determine the extent to which our observed data deviate from these expected frequencies. We will need to calculate this deviation in each cell and then add the deviations up for a total chi-square. So, for each cell we (1) subtract the expected from the observed value; (2) square the difference to remove any negative numbers; and (3) divide by the expected frequency in order to standardize the deviation. The final chi-square value is obtained by adding up all four of these deviation scores, which translates into the following formula:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

For example, in our rural resident/support cell, we calculated an expected frequency of 22, representing the number we would expect under independence. But in our sample, there were 10 people in this cell. To calculate how much this deviates from what is expected, we (1) subtract 22 from 10 (= –12), (2) square this difference to remove the negative number (= 144), and (3) divide this by the expected frequency to standardize the deviation (144 ÷ 22 = 6.55). Tables 4.4c and 4.4d illustrate the steps for obtaining these deviation scores in each of the four cells.

Finally, we add up all four of our deviation scores (one for each cell) to get the total chi-square value:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = 6.55 + 4.36 + 8 + 5.33 = 24.24$$

Our final chi-square value, 24.24, represents the sum of our deviations from the expected values. The larger this number, the more our observed data differ from the expected frequencies. Remember that these expected frequencies represent our null hypothesis: We would expect these frequencies only if the variables were unrelated. So the greater our chi-square value, the stronger the evidence that our variables are related to one another. In the present example, this means we can predict a person's support for a sales tax increase based on where he or she lives, which is consistent with our initial hypothesis.

But how do we know whether our value of 24.24 is meaningful? As with the other statistical tests we have discussed, this requires looking up our result in a critical value table to determine whether the calculated value is above threshold. In this case, a 2 × 2 table has (2 – 1) × (2 – 1) = 1 degree of freedom, and the critical value for a chi-square with 1 degree of freedom at the .05 significance level is 3.84. So we can feel confident in our value of 24.24, which is more than six times the threshold value!
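The two stages described above (expected frequencies, then summed standardized deviations) can be sketched in Python using only the standard library. This is an illustrative sketch rather than the textbook's procedure: the function names `expected_frequencies` and `chi_square` are hypothetical, and instead of a printed critical value table the code derives the .05 cutoff from the normal distribution. A chi-square with 1 degree of freedom is the square of a standard normal variable, so the cutoff is 1.96² ≈ 3.84; that shortcut applies only to 2 × 2 tables.

```python
from statistics import NormalDist

# Observed counts from Table 4.4a.
# Rows: support, don't support. Columns: rural, urban.
observed = [
    [10, 45],  # support
    [30, 15],  # don't support
]


def expected_frequencies(table):
    """Expected count for each cell under independence: (row total x column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_frequencies(observed)
chi2 = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) x (columns - 1)

# Valid only for df == 1: the .05 cutoff is the squared 97.5th percentile of N(0, 1).
critical_value = NormalDist().inv_cdf(0.975) ** 2

print(expected)                      # [[22.0, 33.0], [18.0, 27.0]]
print(round(chi2, 2))                # 24.24
print(df, round(critical_value, 2))  # 1 3.84
print(chi2 > critical_value)         # True
```

Statistical packages provide ready-made versions of this test; some of them apply a continuity correction to 2 × 2 tables by default, which yields a somewhat smaller value than the hand calculation above.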

However, unlike correlation and regression coefficients, our chi-square results cannot tell us anything about the direction or magnitude of the relationship. A larger chi-square reflects a larger deviation from what we would expect by chance and is thus an index of statistical significance. In order to interpret the patterns of our data, we need to visually inspect the numbers in our data table. Better yet, we can create a bar graph like we did in Chapter 3 to visually display these frequencies.

As Figure 4.6 shows, the cell frequencies suggest a fairly clear interpretation: People who live in urban settings are much more likely than people who live in rural settings to support a sales tax increase. In fact, urban residents support the increase by a 3-to-1 margin, while rural residents oppose the increase by a 3-to-1 margin.
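As a quick check on the 3-to-1 reading, the within-group percentages can be computed directly from Table 4.4a. This is again a minimal Python sketch with illustrative names, not part of the original text; it describes the direction of the relationship, which the chi-square value itself does not.

```python
# Observed counts from Table 4.4a, grouped by where respondents live.
counts = {
    "Rural": {"Support": 10, "Don't support": 30},
    "Urban": {"Support": 45, "Don't support": 15},
}

summary = {}
for area, responses in counts.items():
    total = sum(responses.values())
    share_support = responses["Support"] / total         # proportion supporting
    majority = max(responses, key=responses.get)          # larger response group
    margin = max(responses.values()) / min(responses.values())  # e.g., 3.0 means 3 to 1
    summary[area] = (share_support, majority, margin)
    print(f"{area}: {share_support:.0%} support; {majority} leads {margin:.0f} to 1")
```

Comparing percentages within each group, rather than raw counts, keeps the comparison fair even though the sample has more urban than rural residents.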

Figure 4.6: Graph of chi-square results

Bar graph showing the number of people who live in rural vs. urban areas who support or don't support a sales tax increase.

4.5 Ethical Issues in Survey Research

Like all research, surveys should be carried out in ways that protect participants and avoid harm to them. Informed consent, as we discussed in Chapter 1 (Section 1.7, Ethics in Research), is a requirement for survey research and should include a clear description of the survey content and the purpose for conducting the research. Whether the survey is completed in person or online, it should include a cover letter that contains this information. Although signed consent forms are not always required for surveys (as there is minimal risk of harm to participants taking questionnaires and surveys), some researchers like to obtain them anyway. This is especially true for institutional review boards (IRBs), which want to ensure that participants were fully informed of any sensitive information that may be collected, any potential limits to the confidentiality of the data, or access to private records (such as medical records) that are being sought in addition to the survey. In any case, a researcher should never administer a survey if the participant has not provided verbal or written consent and should never use the data for purposes other than those to which the participant consented.

Assurances of participant anonymity and confidentiality are also important, both for responses to sensitive questions and for response rates in general. Participants are likely to feel more comfortable responding to sensitive subjects when they know that they are participating anonymously. Anonymity ensures that there is no way for their responses to be linked back to them. Conducting survey research in a completely anonymous manner is much easier with online surveys; however, researchers can take steps to protect the anonymity of participants in individual or group administrations as well. For example, anyone who has access to the surveys and completed data must commit in writing to preserving their confidentiality. Any links between the answers and the participants' personal identifying information, such as names, email addresses, and phone numbers, should be minimized by removing this information from the data and replacing it with a code (e.g., numbering each participant with an ID code rather than using his or her name). If personal information must be kept, it should be separated from the survey responses and destroyed as soon as the study is over. Any person who could identify a participant from the pattern of responses, such as a supervisor, should not be permitted to view the survey responses. It is also important to be careful when reporting results for a small subpopulation of the sample, whose members may be identifiable. Finally, when the study is completed, researchers must ensure that they either destroy all survey responses and personal information or store them securely.

Another important consideration for researchers is to be aware of their own biases and the impact that those might have on the testing process. This is especially important in face-to-face inquiries. For example, if a researcher has a strong opinion or bias toward a particular topic, the questions might be worded in a way that persuades the participants to answer in a specific manner. Additionally, if the researcher has a strong belief about the topic, he or she may make this evident during the testing process, which may encourage the participant to respond in a way that is consistent with what the researcher believes. As we discussed in Chapter 3 (Section 3.1), participant-expectancy bias can occur during observations and interviews, as well as during survey research, and needs to be considered when analyzing and interpreting data.

Finally, when using standardized questionnaires or surveys that have been developed by other researchers, professional training and competence in the tests being used are essential. It is unethical for researchers to administer, score, and interpret a test on which they have not been trained. In addition, it is unethical for researchers to select tests based on limited knowledge and experience and to assume that these tests are reliable and valid for the purpose for which they are using them.


Summary

This chapter has covered the process of survey research from conceptualization through analysis. We first discussed the types of research questions that are best suited to survey research: essentially, those that can be answered based on people's observations of their own behavior and characteristics. Survey research can involve either verbal reports (i.e., interviews) or written reports (i.e., questionnaires). In both cases, surveys are distinguished by their reliance on people's self-reports of their attitudes, feelings, and behaviors.

This chapter covered several key points for writing survey items. The take-home point of our Five Rules for Designing Better Questionnaires is that your questions should be written as clearly and unambiguously as possible. This helps to minimize the error variance that might result from participants imposing their own guesses and interpretations on the material. In designing survey items, you also have a broad choice between open-ended and fixed-format responses. The former provide richer and more extensive data but are harder to score and code; the latter are easier to code but can constrain people's responses to your choice of categories. If and when you settle on a fixed-format response, you have another set of decisions to make regarding the response scaling, labels, and general format.

Once you have constructed the scale, it is time to begin collecting the data. This chapter discussed the concept of sampling, or choosing a portion of the population to use for your study. Broadly speaking, sampling can be either "probability" or "nonprobability," depending on whether you have a known population size from which you sample randomly. Probability sampling is more likely to result in a representative sample, but this approach is not possible in all studies. In fact, a significant proportion of psychology research studies uses a form of nonprobability sampling called convenience sampling, meaning that the sample consists of those who show up for the study.

This chapter also covered three approaches to analyzing survey data and testing hypotheses about prediction. The first, correlational analysis, is a very popular way to analyze survey data. The correlation is a statistical test that gives an assessment of the linear relationship between two variables. The stronger the correlation between variables, the more accurately we can predict one based on knowing the other. Second, regression analyses allow us to expand our investigations to multiple predictors. The advantage of multiple regression analysis is that it considers both the individual and the combined influence of the predictor variables. However, both correlation and regression require the variables to be quantitative (i.e., measured on an ordinal, interval, or ratio scale). In cases where our survey produces nominal or categorical data, we use an alternative called the chi-square statistic, which determines whether two nominal variables are independent or related. The chi-square works by examining the extent to which our observed data deviate from the pattern we would expect if the variables were unrelated (i.e., the null hypothesis). The common thread running through these analyses is that they measure the association between variables and do not tell us anything about the causal relationship between them. To make causal statements, we have to conduct experiments, which we will cover in the next chapter.

Finally, this chapter discussed the ethical issues that arise in survey research. In addition to the concepts discussed in Chapter 1 regarding informed consent and confidentiality, it is important that researchers conducting survey research ensure anonymity so that participants feel safe responding to sensitive topics and comfortable participating in the overall study. Participant response rates tend to be higher when participants know that there is no way to link them to their responses. Additionally, researchers should be aware of their own biases and the impact that these might have on the collection, analysis, and interpretation of the data, as well as on how participants may respond. Also, when using standardized tests or tests that have been developed by other researchers, it is imperative that researchers be trained and competent in administering and scoring the tests, as well as in interpreting the results.