Research method
Quantitative Methods in Social Science
Stephen Gorard
continuum NEW YORK • LONDON
Continuum, The Tower Building, 11 York Road, London SE1 7NX; 15 East 26th Street, New York, NY 10010
www.continuumbooks.com
© Stephen Gorard 2003
Reprinted 2004
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers.
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
ISBN 0-8264-65870 (hardback) 0-8264-65862 (paperback)
Typeset by BookEns Ltd., Royston, Herts
Printed and bound in Great Britain by Biddles Ltd, King's Lynn, Norfolk
5
Surveying the field: questionnaire design
WHY DO A SURVEY?
Many new researchers appear to assume that their project must be based on a questionnaire survey (in the same way as many appear to assume that it should be based on semi-structured interviews). Indeed, the practice is so widespread in social science research that some commentators appear to equate quantitative approaches only with surveys. However, the 'decision' to use a survey is often quite hard to justify (Gillham 2000a considers the relative merits of interviews and surveys). Surveys are generally inferior as a design compared with experiments, as they are less well-theorized (see Chapter Eight). Even good ones cannot hope to establish a causatory explanation for any observed phenomenon (see Chapter Seven). Surveys are also generally less complete than official statistics, providing data of poorer quality (see Chapter Two). Their use is therefore far from automatic, and should be as reasoned as any other stage of the research design.
The use of a survey is indicated when the data required does not already exist and the research questions are not susceptible to experimental trial for practical reasons such as lack of resources or ethical constraints. Surveys are better at gathering relatively simple facts (such as respondents' current occupation) or reports of behaviour (such as how often the respondent misses a day at work) than at gathering opinions, attitudes or explanations. Viewed in this way, a survey is not a positive solution to a design problem but almost a position of last resort (and much the same comments could be made about the equally common approach of completing a couple of dozen interviews with a 'grounded theory' analysis). According to Gillham (2000b) no single method has been so abused as the questionnaire - 'the quick fix of social research methods'.
Since even good ones tend to generate much poor data, when they are used it is perhaps better that they are used as part of a larger study also involving other approaches.
SAMPLING
The intricate steps involved in selecting a sample (see Chapter Four) should come before the other stages of survey design. Obviously the sample does not actually have to be selected first, but you should at least have made all of the sampling design decisions first. Most importantly, you need to make a preliminary decision about the population, sample size and method of selecting cases. Many of the problems in survey design that follow have no best solution, but must be considered in relation to the sample required.
For example, if the population of study is five-year-old children then a postal questionnaire is not likely to be appropriate and a face-to-face interaction may be preferable. Since face-to-face delivery is more costly in research time than postal, the sample size may therefore need to be smaller. On the other hand, if the population is all of the householders in the country and the method of selection is random, then a postal delivery would seem more efficient. Face-to-face delivery would be very difficult since there would most likely be a widely scattered sample in geographical terms, necessitating arduous travel to remote areas for rather small clusters of cases. The research process is therefore iterative and messy, and not linear like following the steps for an instant cake mix. Your sample depends in part on your instrument, which depends in part on how you intend to analyse the results, which depends in part on your research questions, etc.
For simplicity this chapter assumes that the respondents are people, but a survey does not have to be of people. It could be of books or buildings, for example (and most of the comments made here would still apply).
METHOD OF DELIVERY
A key decision affecting the likely response rate, cost, speed, sample size and length of your questionnaire is how you intend to deliver it to your sample. There are many variations, but the most common choices are between face-to-face, self-administered and technology-based. In the following discussion, each is by implication being compared with the others.
Face-to-face
Face-to-face delivery takes place when the researcher is present while the questionnaire is being completed, and can therefore record the responses herself. This approach is very useful in allowing a wide response that includes those with low levels of literacy and those with visual challenges, who would find a self-administered questionnaire very difficult. Face-to-face, the researcher can read the questions aloud, explain any difficult points if necessary and record the responses in as much detail as desired. Since they are present, the researcher can also check who is answering the questions (i.e. that it is the right person) and can stop him or her answering the questions in a non-standard order (i.e. by flipping ahead to see what is coming).
Conducting a door-to-door survey, where you are invited into someone's home, given tea and seated on their sofa, can be a very rewarding experience. It is also possible to take a longer time, for a fuller set of questions, than you might achieve in requesting, say, a postal response. As well as being on hand to explain difficulties, you can use cue cards and visual mnemonics easily (lists of possible multiple-choice responses, perhaps), and can even add an element of multi-media via a laptop computer if this is deemed helpful. In their own homes, respondents can also check the accuracy of their answers by reference to their personal records such as certificates, diaries and so on. Maybe the single most important advantage of being present at the administration of a survey is the potential for observation, field notes and ad hoc interviews that other methods of delivery deny you. You can see facial expressions, type of house, age of car and a hundred other little details that might help you interpret your findings. You can talk to other people on the way in and out of the interview, and these 'staircase' meetings can be very fruitful for new ideas or contacts. Once you are on the road, everything becomes data.
The biggest single drawback for this form of delivery is the length of time it takes with all its practical consequences. Whereas postal surveys, for example, allow parallel mailings far afield, a visit requires travel and so constrains the nature of the sample used. Travel is more expensive than sending a questionnaire the same distance. This also creates a greater temptation to shirk on the call-back procedure for those not available at first (if you have to travel 300 kilometres to see someone and they are not there, will you really go back and try again next week?), and so leads to an even greater possibility of bias in the sample. Also, if the research takes a
long time to complete then the nature of the phenomenon under investigation may change (owing to new legislation, the natural ageing of children, etc.).
If the time problem is solved by having a team of researchers working in parallel, the design now needs to ensure consistency between them in terms of their administration of the questions. It is clear even from work with tight experimental designs (see Chapter Eight) that the presence of the researcher can give unconscious cues to the other participants. This point is even more important in the more relaxed design of a survey. Respondents will react to the appearance, manner, body language and tone of a face-to-face interviewer in a way that is simply not possible using other methods of delivery. Other actors in the scene can also play a part. I once piloted a household questionnaire that asked the householder about the number of life partners they had had prior to their current relationship. It became quite clear, and seems obvious in retrospect, that the presence of their current partner making us coffee in the next room was creating a constraining influence. Finally, and quite importantly for the individual novice researcher, there is the personal safety aspect. Although unlikely, respondents might be abusive or threatening and, whereas abuse by letter or telephone is unpleasant, such a breakdown of communication face-to-face is extremely alarming. All of these potential advantages and disadvantages could be taken into account before a decision is made on the most appropriate compromise.
Self-administered
Self-administered (by the respondent) questionnaires are usually mailed. There are also considerable opportunities for dropping off and collecting forms in batches at institutions such as hospitals or schools, thereby reducing the cost of postage and travel. If the respondents complete the survey form themselves there are several key advantages. There is much less of the reactivity effect or interviewer bias that can be created by the presence of someone who has a vested interest in the results. It can be arranged that the responses are not only confidential (which is standard practice) but also anonymous (so that even the researcher does not know to whom each returned form belongs). This can help create an atmosphere of trust, and therefore lead perhaps to more truthful answers. This method of delivery is easier if the questions come in batteries of similar types with the same scaled response (e.g. from agree to disagree), or where the list of possible multiple-choice
responses is very long. Both of these designs are difficult to handle efficiently face-to-face without resorting to at least some elements of self-administration (such as show cards). Self-administered questions can also be created in a form, such as optical marks, that are already computer-readable, so avoiding the time and potential errors involved in coding and transcription. They can also be sent and returned via email with many of the same advantages (see below).
There have been claims that the average response rate to postal surveys is low (20% perhaps), but these claims tend to conflate figures from market 'research', which are generally lower than those from academic studies. If you follow the advice given in this chapter you should obtain much higher rates than those generally quoted. Aim high. Bernard (2000) suggests there is little real difference between the response rates for face-to-face surveys (80%) and for well-designed mail surveys (73% +).
If the researchers are not present at administration they cannot check the identity of the respondent or for frivolous treatment of the questions (Gillham 2000b). They cannot preserve the order of reading the questions, and therefore the secrecy of later questions, and they are not available to explain the meaning of questions or to answer questions about the use to which the data will be put. Self-administration is clearly impossible for those unable to read or write effectively.
Technology-based
To a large extent, the use of ICT and technology-based delivery represents a compromise between the previous approaches. This is most obvious in the use of telephone surveys where no travel is involved and the interviewer is depersonalized to some extent (although not in terms of accent or speech patterns), but is still available to explain questions and help motivate the interviewee. Although there is a charge for telephone calls, the cost of these is falling relative to mail delivery and travel, and with an appropriate sample can actually lead to the cheapest form of questionnaire delivery. There are no problems of gaining access via security guards, receptionists, 'doormen' for apartment blocks, etc. The use of random digit dialling is very convenient, does not require a list of telephone numbers to start with, and can avoid the bias introduced to such lists by ex-directory and other unlisted numbers. As your research career takes off and you find yourself running a survey with several investigators, the use of a telephone schedule with a
switchboard will allow you to monitor centrally the quality and consistency of the work of each interviewer in your team.
Email approaches are even better in some respects, leading to cheaper use of telephone lines (or digital television), easier access to worldwide samples and at present an atmosphere of camaraderie and friendly informality. Response rates to email surveys may also be better than by telephone (Selwyn 2002). Selwyn and Robson (1998) cite examples of 50-90% response rates using email and they compare this to rates of 20-50% in conventional mail surveys (Frankfort-Nachmias and Nachmias 1996). The times taken to respond are excellent (almost instantaneous) and the responses can be returned in an already computer-readable format.
The disadvantages of using technology to collect your data are relatively obvious. Random digit dialling cannot distinguish between the number of telephone lines for each area code, thereby over-representing those from rural areas. Not all potential respondents have a telephone and not all who do have a telephone appear in published lists of numbers. In Wales, for example, as many as 10% of people in the year 2000 did not even have access to a shared public payphone at home (Gorard et al. 2000b). As many as 67% of people in Wales in the year 2000 did not have access to a computer (never mind whether it is email-capable) either at work or home. In addition and in general, those who do not have access to telephones and computers are systematically different from those who do. Any of these forms of technology are less likely to be available to those who are older, unqualified or economically inactive. The potential bias from this is considerable when using email in particular, so in order to obtain the response rates suggested above, any remotely delivered questionnaire must be brief (see the example below). The choice of email as a method is also not simply about delivery; because of the tendency of respondents to use a simplified register of language, or even symbols for expressions, the form of data collected is altered (Gorard and Selwyn 2001). In addition to all of these problems, anonymity of the sort possible in mail surveys is often just about impossible (else how do you know the telephone number or email address?), but much of the tacit information available face-to-face is lost.
If I had to have an overall preference it would be for self-administered questions delivered either by mail or, preferably, by the researcher to natural groups of respondents (such as school classes, see Gorard 1997b). This approach is generally better if the respondents are literate and well-motivated, and have no clear need
for individual attention. Whichever method you consider for the delivery of your questionnaire, bear in mind the issues described above, such as cost, time, geography, length, complexity, control of the question order, visual aids, the use of respondents' personal records for reference, rapport, sensitivity, sample bias, response rate, response bias, knowledge of non-responders and so on. Perhaps a combination of methods can maximize the advantages to you as the researcher.
TYPE OF SURVEY
Another topic for brief consideration before we get to the design of the actual questionnaire instrument is the type of survey you are planning. You may find thereby that there are constraints imposed upon your design. What is the goal of the survey? Is it, for example, to describe something accurately or to test one or more hypotheses? Is it to be a one-off snapshot of a certain period or will it collect historical information? Is it to be repeated in the future or will it repeat questions from previous studies?
A longitudinal (repeated) survey allowing prolonged study of the lives of one group of respondents has many attractions. Data from such a study could be richer, may be more accurate and could help us to understand the process of change over time. However, it would also be expensive and time-consuming and might entail many compromises. It can lead to complex statistical problems, so longitudinal data is often collapsed into a format of one or more cross-sections, or 'snapshots', for analysis anyway (Crouchley 1987). Long-term studies also suffer from respondent attrition, with the result that even the best ones may end up with an overall response rate that clearly suggests bias through self-selection in the sample (Dolton et al. 1994). For example, Banks et al. (1992) had response rates of between 60% and 70% for the first sweep of their study, but if this response rate was similar on each occasion that they attempted to contact the original respondents in successive years, then the overall response rate for the third sweep could easily be less than 25% of the original target sample. The respondents in the Banks study became proportionately more middle-class in each wave. Similarly, only 45% of the respondents in the Youth Cohort Study took part in all three sweeps to 1991 (Whitfield and Bourlakis 1991). Long-term studies also face a threat to internal validity coming from the necessity to test and re-test the same individuals (Hagenaars 1990, see also Chapter Eight).
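The attrition arithmetic in the Banks et al. example can be checked with a short sketch. It assumes, as the text does hypothetically, that the same response rate applies at every sweep and that only those who responded last time can respond again; the function name is illustrative and not taken from any of the studies cited.

```python
def cumulative_response_rate(per_sweep_rate: float, sweeps: int) -> float:
    """Share of the original target sample still responding after `sweeps`
    sweeps, assuming an identical response rate at each sweep (a simplification)."""
    if not 0.0 <= per_sweep_rate <= 1.0:
        raise ValueError("per_sweep_rate must be between 0 and 1")
    if sweeps < 1:
        raise ValueError("sweeps must be at least 1")
    return per_sweep_rate ** sweeps


# The two ends of the 60% to 70% range quoted for the first sweep.
for rate in (0.60, 0.70):
    overall = cumulative_response_rate(rate, 3)
    print(f"{rate:.0%} per sweep -> {overall:.1%} of the target sample by sweep 3")
```

At 60% per sweep the third-sweep figure is roughly 22% of the original target sample, which is how an apparently respectable rate per sweep can fall below 25% overall.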
One way around this is to use a trend design collecting data from different groups for each sweep, but this design does not allow a consideration of change in individuals. Also, since the second sample is not from the same population as the first, in the statistical sense, then this causes problems in looking for changes in population parameters over time. A compromise, which might have the advantages or the disadvantages of both, is to use a rolling sample, whereby a proportion of the sample for each sweep remains longitudinal. For example, the Labour Force Survey (see Chapter Two) contacts 80,000 households every three months, of which 60,000 have also been used in the previous quarter.
Longitudinal studies also face problems of comparability over time (Glenn 1977). In educational research, for example, the modes and titles of certified public examinations change over time (Gorard 2001b). Even where equivalencies between them are established it is not clear that their value-in-exchange actually remains constant. An A-Level may have meant a lot more in 1970 than in 1990, not because it was any harder to attain, but simply because there were fewer of them. Similar issues arise in most fields of research. However, such considerations are even more difficult for a long-term study since the instrument to be used for all sweeps has to be designed before the changes that it needs to encompass (a nearly impossible task), or else has to be changed between sweeps, exacerbating comparability problems and opening researchers to the charge that the study is not actually longitudinal as the questions have changed.
A retrospective study, asking respondents to recall past events, has the advantage of hindsight. A retrospective study, as opposed to a simple cross-sectional study, also avoids many of the other problems noted above. Retrospective employment histories, such as the 1984 Women and Employment Survey (Martin and Roberts 1984), and learning histories, such as the National Training Survey 1975/76 (Greenhalgh and Stewart 1987), are much used by economists (McNabb and Whitfield 1994). They are not, of course, immune to criticism since a wide range of life variables and events may be difficult for the respondents to recall (although the use of household records can be encouraged). Among these variables are attitudes, which are notoriously unreliable post hoc, and some figures such as income and health measurements.
All of these factors need to be considered before designing an instrument for replication, retrospective or longitudinal work, or a snap-shot picture (and they obviously also have implications for drawing the appropriate sample).
INSTRUMENT DESIGN
Before writing actual questions it is useful to consider the overall design of your questionnaire instrument. Perhaps the most crucial issue here is the order in which items will appear. This applies to the order of the questions in each section and the order of each section within the whole. A good example of the importance of the first of these points is provided in the novel Yes Prime Minister (Lynn and Jay 1986), where the prime minister's cabinet secretary is demonstrating to a colleague how surveys can be designed to produce whatever result a government official wants. If, for example, the government want support for their plans to reintroduce compulsory National Service in the armed forces, they might ask their sample the following series of questions.
1. Are you worried about the rise in crime among teenagers?
2. Do you think there is a lack of discipline and vigorous training in our schools?
3. Do you think young people welcome some structure and leadership in their lives?
4. Do you think young people respond to a challenge?
5. Might you be in favour of reintroducing National Service?
Here, there is strong encouragement to answer 'yes' to question 5 to maintain consistency if you have answered 'yes' to the previous questions. Then, of course, only the responses to question 5 are published under the heading 'Majority of public support National Service'. If, on the other hand, opponents of the government wish to obtain a contrary view they might ask the following series of questions.
1. Are you worried about the danger of war?
2. Are you unhappy about the growth of armaments?
3. Do you think there's a danger in giving young people guns and teaching them how to kill?
4. Do you think it is wrong to force people to take up arms against their will?
5. Would you oppose the reintroduction of National Service?
Here, there is strong encouragement to answer 'yes' to question 5 again, even though it now says the opposite of the version above. Again, only the last responses might be used and published under the heading 'Majority of public oppose National Service'.
Now, I am not advocating that either of these approaches be used or that you use leading questions at all. But this example does show how sensitive our responses can be to the precise ordering of questions in a questionnaire. Other than being aware of the problem, the best defence may be to use more than one version of your questionnaire with differing question orders. You can then allocate these versions randomly to your sample and analyse their responses in terms of the sub-groups faced with each version. If there is no obvious difference in the response patterns between groups then you can report with some conviction that order has been eliminated as a possible confounding variable in your results. If there is a difference between responses to different versions, then at least you can use this difference as an estimate of the size and direction of the bias.
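The idea of allocating several question orders at random and comparing the sub-groups might be sketched as follows. This is only an illustration under stated assumptions: the function names, the two version labels and the invented answers are all hypothetical, and judging whether a difference is 'obvious' would in practice need a proper comparison of the kind discussed in later chapters.

```python
import random
from collections import defaultdict


def allocate_versions(respondent_ids, versions, seed=None):
    """Randomly allocate each respondent to one questionnaire version,
    keeping the group sizes as equal as possible."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    # Deal the shuffled respondents out to the versions in turn.
    return {rid: versions[i % len(versions)] for i, rid in enumerate(ids)}


def proportion_yes_by_version(allocation, answers):
    """Proportion answering 'yes' to the key question within each version.
    `answers` maps respondent id -> True/False; non-responders are simply absent."""
    counts = defaultdict(lambda: [0, 0])  # version -> [yes count, total count]
    for rid, said_yes in answers.items():
        version = allocation[rid]
        counts[version][0] += int(said_yes)
        counts[version][1] += 1
    return {v: yes / total for v, (yes, total) in counts.items()}


# Hypothetical illustration: 200 respondents and two question orders, A and B.
allocation = allocate_versions(range(200), ["A", "B"], seed=1)
# Invented answers, purely to exercise the code; real answers come from the returns.
answers = {rid: (rid % 3 != 0) for rid in range(200)}
rates = proportion_yes_by_version(allocation, answers)
order_effect = rates["A"] - rates["B"]  # rough estimate of size and direction of bias
print(rates, round(order_effect, 3))
```

Because allocation is random, any systematic gap between the two proportions can be attributed to question order rather than to differences between the people who received each version.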
The sections of a typical questionnaire might include an introduction (to secure the cooperation of respondents), a question or two about the respondent (as a selection, identification or quota check to make sure you are addressing the right person), the substantive questions (about the research) and background questions (concerning respondents' personal characteristics). This list is in a logical order. The introduction is first. The selection check ensures that no time is wasted answering questions unnecessarily. The substantive questions come next as they are the most interesting and are, after all, what the respondent has agreed to answer. The background questions come last because, although important, they can appear intrusive. Therefore, having them at the end encourages people to start the questionnaire, and once started they are more likely to complete the task. It also means that even if they drop out at this section you still have their responses to the substantive questions (and you may not need background data from everyone).
The introduction should be brief and easy to follow. It might contain the purpose of the study, who is conducting it, who is paying for it, why it is important, what will happen to the results and why the respondent has been selected. Rather than having a complex introduction, it is preferable to use a separate covering letter. This letter could briefly explain the nature and purpose of the study, how the respondent was selected, why their help is needed and how to return the completed form (or even the incomplete one). If you know the respondent's name it is probably better to use it, but reassure them of the confidentiality of their answers (so if the form has any identifying marks these should be explained). Some authorities suggest using stamps on the pre-addressed return envelopes. This can be expensive, especially for an unfunded study.
Alternatively, try and arrange to use FREEPOST (through your department perhaps). In this way you will have to pay postage only on those forms returned, and potential non-respondents will not be tempted to steam off your unused stamps.
If possible, do not put any questions on the front cover, but have a title and the name and address and lots of space. Similarly, on the last page you could have a simple word of thanks and lots of inviting space for any open-ended comments on the survey as a whole. Although the use of incentives for completing the form is sometimes advocated, I prefer to encourage a full response by making the instrument easy to complete and stressing the value of each response to the study. It is also useful and courteous to offer to supply all respondents with a summary of your eventual findings. Curiosity about research is a key motivator, especially in areas of public policy like health, crime and education where everyone feels they are an 'expert'. Generally, the use of words like 'University' early on in the document is useful to establish that there is no sales or advertising threat to follow. For similar reasons, words such as 'study' and 'research' are more attractive to respondents than 'survey'. The use of photographs or elaborate logos on the front page is dangerous. Whatever you intend them to signify, such illustrations carry multiple messages and are easily misinterpreted. The first substantive question in the instrument should be relevant to all respondents (since if it applies only to some then this can be demotivating for the others), easy and interesting (so put harder or duller questions towards the end), but non-threatening and probably closed in format (see below).
I recommend a questionnaire of eight core pages as a maximum, and preferably fewer for self-administered instruments. Looked at another way, do not go much above 100 separate questions (and even this figure presupposes that most questions use the same response format). Use a standard paper size (A4 in the UK), printed in black on a white background (although some authorities suggest that light green is the most attractive paper colour). Questions should be grouped as far as possible into topics, with spaces between them. Each question should have no more than two sentences of instruction, and the instructions should be set in a different typeface from the questions, as long as the two typefaces are similar. Varying between capitals, bold, underlining or italics, or even a different font size, can be effective: the instructions in capitals and questions in lower case, for example. Use a normal-sized reading
font (12 or 14 point). Changing the font entirely (e.g. between Times and Courier) rarely works aesthetically.
Minimize, or eliminate entirely, the use of skip and filter questions or branching instructions that ask respondents to move to a question other than the next in sequence. I have seen branching questions go badly wrong with even the most motivated and educated respondents (try getting all undergraduates even in their final examinations to read and follow an examination rubric that is not like the one in the sample paper!). For similar reasons, although it is tempting to save paper, use only one side of each sheet. I once found that the responses to a six-side questionnaire from an entire school covered only the three sides that faced them as they flicked through the instrument. That was a very false economy for me. Again, for similar reasons do not split a question between pages.
Like many readers (I suspect) I very rarely use the grammar checker on my word processor proactively. It just appears to insist on criticizing my use of the passive voice. The one occasion on which I would thoroughly recommend it to the full is when designing a questionnaire. The grammar involved should be simple and clear, the spelling standard for your audience, and the meaning of each sentence easy to understand. Most checkers will provide you with a readability report, including measures such as a Flesch Index of readability. Make sure that the questionnaire is of an appropriate readability for the age and literacy of your entire target sample. If you are working in one language and translating your instrument into another language before completion (a common process for overseas students), then use the technique of back-translation as well. In this, the translated version is translated back into the original language by a third person as a check on the preservation of the original meaning. See Birbil (2000) for advice on what to do if there are still deficits in the translation.
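For readers without a grammar checker to hand, a readability score can be approximated directly. The sketch below uses the standard Flesch Reading Ease formula, in which higher scores indicate easier text. The syllable counter is a crude heuristic of my own (counting runs of vowels), so the scores will differ a little from those reported by a word processor, and the two sample questions are invented for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels and drop a silent final 'e'.
    This is a heuristic, not a dictionary lookup."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


# Two hypothetical ways of wording questionnaire items.
plain = "Do you have a job? How old are you?"
dense = ("Please indicate the approximate frequency of your occupational "
         "absenteeism during the preceding financial year.")
print(round(flesch_reading_ease(plain), 1), round(flesch_reading_ease(dense), 1))
```

The plain wording scores far higher than the dense one, which is the kind of contrast to look for when checking each item against the age and literacy of the target sample.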
Finally, where possible it is useful to have the responses pre-coded on the actual form, but also to allow space for respondents to make further comments (which are often the most interesting part of the response). Above all, do not cut corners. If, and this is a big 'if', a survey is to be your main method of data collection then you need it to be successful. Don't be mean with the photocopying, paper or postage. If you cannot afford to carry out a proper survey, then do not attempt it.
QUESTION DESIGN
As with projects involving secondary data (see Chapter Two), it is important to realize that you do not have to start any questionnaire from scratch. Many questions are 'old favourites' (see below). Also, many instruments are available commercially and many are available from academic and other public archives. For example, the ESRC Data Archive (University of Essex, Colchester, Essex CO4 3SQ) has the complete instruments used in much of their publicly funded social science research in the UK. The Centre for Applied Social Surveys (CASS, National Centre for Social Research, 35 Northampton Square, London EC1V 0AX, UK) has a large question bank formed from past studies (whose current address is http://qb.soc.surrey.ac.uk). The Social Survey division of the Office for National Statistics also have on-line information on survey methods and quality, and they publish a methodology bulletin twice per year (http://www.statistics.gov.uk/ssd/default.asp). The advantages of using such previous instruments and questions are clear. The instruments will have been piloted and used before, probably on a far larger scale than you could envisage. They will be mature and ready to use. They may carry some extra authority for your readers. Most usefully, they will enable you to compare the responses in your study with those gained previously, to show changes over time or between locations perhaps. Looking at other questionnaires also helps you see what is good and bad about them, and this should give you confidence since even many famous instruments look terribly imperfect in retrospect.
Good question design is the key to easy survey analysis. You do not commit yourself to any particular form of analysis just by thinking about it before designing your questions, but you do restrict the kinds of analysis available to you by the design of your instrument. Therefore, as I have already emphasized, consideration of analysis is more like the first rather than the last stage of research design. You do not want to ask any question that you cannot analyse, otherwise you will waste resources in preparing the question, waste the respondents' time in answering it (so endangering the response rate) and waste more of your time coding and entering the responses. Even worse, you may need an answer to a particular question, but have asked the question in the wrong form (or even the wrong question). Each question should therefore have an explicit purpose. Once you have formed the question you need to consider whether the respondents could know
the answer, whether they can report their answer, whether they would want to answer, whether they might be tempted to lie or pretend, and whether they might be in a rush and so make a mistake. Thinking about these five issues might then lead you to change the format of your question.
There are many different forms of questions. They include requests for information (such as 'how many...?'), tick-box categories ('yes or no'), multiple choice ('which of these...'), scales ('how strongly do you feel...?'), ranking procedures ('put the following in order'), grids or tables (for multidimensional questions) and open-ended questions. Each of these is discussed below:
One of the biggest problems you will face in designing a question is likely to be that you end up using the wrong metric or level of measurement (see Chapter Three). This will affect the power and type of statistics that you can use later. You can sometimes convert from one scale to another but this can introduce bias and measurement error, so it is better to ask the questions in the form in which you are intending to analyse them. At best, using the wrong metric loses power and is therefore equivalent to using a smaller sample. Poorly designed questions therefore have much the same effect as throwing away responses.
The best metric to use is a real number such as age, number of children or years in employment. This generally allows the use of all/any statistical tests, including the most powerful. The weakest but the most common formats in social science are categorical variables, such as gender or family religion. Sometimes these categories are artificially devised, such as occupational class (see below). If the use of categories is unavoidable, then I advise keeping the number of categories per question to a minimum. Thinking that using more categories leads to greater accuracy is a fallacy, and I have too often seen students collect answers on a seven-point scale and immediately collapse the responses to the three-point scale they intended to use all along. Why bother, and why make respondents worry about seven points?
Open-ended questions
Perhaps the easiest types of question to design are those using an open-ended format. They are easy because they are the most natural way of expressing a question in everyday conversation. This ease does not necessarily make them the most appropriate for a questionnaire, but it may tempt researchers to over-use them. Their biggest drawback comes when they are subjected to systematic analysis. Simple closed scales (such as those described below) mean that the respondent is the main source of measurement error, but open-ended questions with post hoc classification of the results add another layer of measurement error due to the researcher. Open-ended questions are best used in two situations: where it is already clear how the responses will be analysed or where the responses will be used not to create a statistical pattern, but to help explain it.
Such choices of question design are far from trivial. Farrall et al. (1997) found that the reported fear of crime was much greater in surveys using closed rather than open-ended questions. Therefore, the results of your study depend on more than simply its face validity (i.e. looking like the right question). People may also respond more sensitively to open-ended approaches. Since there may be so little similarity between responses to forced-choice and open-ended questions it is probably advisable to mix the types of questions in any instrument. Vocabulary and precise phrasing are also more generally important in question design. In a large survey in the US it was recorded that a much larger number of people were in favour of assistance for the poor than were in favour of welfare. The terms you use should be neutral, as far as possible, and familiar but not patronizing. This can lead to problems when you are dealing with very different sub-groups such as parents and their children. Should you change the wording for each group, and run the risk of asking different questions of each, or should you find a common wording and run the risk of patronizing one group?
In early studies of school choice, which tried to identify the reasons reported by families for using a particular new school, there were two main approaches. These involved giving respondents a list (or menu) of choices or else giving them a blank sheet and asking them to list their reasons. Where a list of these potential reasons is presented to respondents for them to tick or rate as appropriate, the list is usually incomplete, not containing all possible reasons for choosing a school. This can lead to serious omissions in the responses, which may well bias the study (Kim and Mueller 1978) by making other criteria appear more important than they truly are (Maddala 1992). For example, a survey by Dennison (1995) used 25 choice criteria but excluded religious preference and the size of the school, which have both been shown to be important to some families in other studies.
Direct evidence of the importance of such omissions from a questionnaire comes from my own study of choice (Gorard 1997b). In one of my focus schools, I mistakenly issued a set of
questionnaire forms with one page, containing 25 of the 73 suggested reasons, missing. The criteria accidentally left out included 'good public examination results', 'firm discipline' and 'small classes', which were all found to be very important overall. Although there was a section for respondents to write any other reasons not covered by the list, not one of the affected respondents suggested any of the 25 missing reasons, and so presumably without the prompt did not notice their lack. This would have the effect of increasing the apparent importance of other variables. Yet few researchers in any field can truly claim that they have tried to make their lists as complete as possible, and it is strange that this phenomenon is not more widely discussed in the literature. A pre-fixed list may also suggest reasons to respondents which they might, in retrospect, feel are important, but which they did not, in this example, consider at the time of making a selection of schools.
On the other hand, the method of asking respondents to create their own list of reasons for choosing a school (for example) by asking an open-ended question relies more heavily on the imperfect memory of the respondents, will over-represent the views of the more literate and highly motivated (Payne 1951) and is likely to produce as many differently worded responses as there are respondents (Oppenheim 1992). This makes them very difficult to analyse. Some groups of respondents, those with the most education for example, may produce more reasons each. Therefore, even if all reasons can be assumed to be simple and unrelated constructs, which they patently are not, but which should be a necessary precondition for their frequencies to be computed, they cannot all be given equal weight. It is not reasonable to assume that both of two reasons given by one respondent are each as important as one reason given by another. Neither can it be assumed that each is only half as important. Such considerations begin to give a clue to the complexity of the analysis of open-ended questions.
If you want to collect real-number answers (in many ways the ideal), then a simple form of open-ended question is one aimed at the apparently straightforward collection of facts. Examples might be, 'How many years have you worked in this factory?', or the simpler, 'How old are you (in years)?', or even simpler, 'In which year were you born?'. In each of these examples the respondent simply writes a number. Three common problems with this type of question are lack of clarity, lack of knowledge and intrusiveness. Lack of clarity can usually be sorted out at the pilot stage. One example would be lack of clarity about the units involved, such as in, 'How tall are you?'.
Should the answer be in feet or metres? Another would be lack of clarity about parameters, such as in, 'How many people are there in your university?'. Does this mean today or on the roll? Does it mean students or staff or both? Does it include service staff? If the question is, 'How many schools have you been to?' does this refer to attendance as a student or visits? Lack of knowledge arises when you ask someone about something they cannot possibly answer. Most children do not know their parents' incomes, for example, and many parents would not know the full range of subjects taken by their children at school. Some commentators believe that direct questions such as these are anyway very intrusive, and suggest that closed questions should be used instead. People may find it easier to tell you their annual income to within a certain range than to give you a figure, either because they do not know exactly or because they do not want to tell you.
Closed questions
Close-ended (or closed) questions are somewhat harder to design well than open-ended questions but should then be much easier to analyse. The reasons why they are hard to design can be experienced in those semi-serious tests that appear in magazines with titles such as, 'How compatible are you?' or, 'Are you a thinker or a doer?'. Whenever I attempt one of these (only at the dentist's, obviously), I can hardly answer any of the questions since none of the possible responses is right for me. Imagine possible answers such as, 'Do you a) whisk your partner off to Paris for the weekend; or b) sulk for the next three weeks and then buy your partner some chocolates?'. You see my difficulty. What if it is Rome and not Paris, or only one week, or a CD rather than chocolates? What if my response is something completely different? Of course, these are trivial examples but even 'proper' research can lead to questions that appear to exclude the very people they are aimed at by denying them the chance to tell us what they know. Closed questions should ideally be as inclusive and flexible as open-ended ones. Herein lies their difficulty.
Make sure that each question allows for all possible responses, but without overlap. This would usually involve adding categories for 'don't know' (in my opinion a perfectly valid answer to most questions) and for 'other, please specify'. You should try and make this last option of 'other' redundant by making other categories as inclusive as possible, but still retain it as a fail-safe (at least for your pilot study). Consider the difference between these two versions of the same question.
a) What is your highest qualification?
   A-Level or equivalent (or above)
   GCSE or equivalent
   None
b) What is your highest qualification?
   A-Level or equivalent (or above)
   GCSE or equivalent
   None
   Don't know
   Other (please specify) ...
While neither version is perfect the second is preferable to the first in allowing everyone to answer something, whereas the first will lead to some null responses.
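The requirement that categories cover all possible responses without overlap can be checked mechanically when the categories are numeric bands (for age or income, say). The following is a minimal sketch only: the function name `check_bands` and the age bands are invented for illustration, and it assumes whole-number answers and inclusive band limits.

```python
def check_bands(bands, lowest, highest):
    """Report gaps and overlaps in inclusive whole-number bands (low, high)."""
    problems = []
    expected = lowest  # the next value that still needs a category
    for low, high in sorted(bands):
        if low > expected:
            problems.append(f"gap: {expected}-{low - 1}")
        elif low < expected:
            problems.append(f"overlap: {low}-{min(high, expected - 1)}")
        expected = max(expected, high + 1)
    if expected <= highest:
        problems.append(f"gap: {expected}-{highest}")
    return problems


# Hypothetical age bands for a closed question (not taken from any real instrument).
flawed = [(16, 25), (25, 40), (41, 60)]             # 25 appears twice; over-60s omitted
fixed = [(16, 24), (25, 40), (41, 60), (61, 120)]

print(check_bands(flawed, 16, 120))
print(check_bands(fixed, 16, 120))
```

A check of this kind only covers numeric ranges; for categories such as qualifications, the 'don't know' and 'other' options remain the practical fail-safe.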
Avoid also the use of negative statements if possible (which are surprisingly confusing) and double-barrelled questions (or two questions in one). Making questions easy to answer involves avoiding hypothetical situations, jargon, technical language and ambiguity. Avoid the danger of assuming a falsely shared premise. To aid recall by your respondents do not ask for more information than you need (or than you are intending to analyse). If, for example, you wish to know how many different jobs a respondent has, then it is not necessary to ask him to list all of his previous jobs.
The following example questions could all lead to problems. The first does not allow respondents to separate their reactions to the two parts of the question. The second (very common in style) is asking something that most people would have no evidence about and therefore should not answer. Note, however, that the added danger of asking such questions is that people may respond even when they have no knowledge. The third is ambiguous. Does it refer to the respondent or to his/her partner as well? Does it refer to each child separately or to all of them together? Does a verbal reprimand count as punishment? And so on.
c) How do you rate the new government for achievement and presentation? [high/medium/low]
d) Are people better educated today than 10 years ago? Yes/No
e) How often do you punish your children? Never/Monthly/Weekly/At least daily
Although it is also usually recommended that questions are not 'loaded', this technique can occasionally be useful to provoke responses in difficult situations. If this is what you intend then build it into your design and your later description of the method used. An example might be when you know that respondents have been selected because they have a characteristic that they may wish to cover up, and you want to let them know both that you know about it and that it is all right (e.g. 'How many times have you been arrested?').
Scales
By 'scales' I refer here not just to closed questions in general, but to the use of batteries of similar format questions using a standard scale aimed at the indirect measurement of an underlying concept. A very common example of such a concept would be attitude. I have already stated that, in my opinion, questionnaires are not good at gathering anything other than the most straightforward information about respondents. Therefore, it should come as no surprise to realize that I am not a great fan of this particular use of scales. I will not go into great detail here, but for those interested there is further discussion of these in Oppenheim (1992) and elsewhere.
Complex scales are multiple indicators, often used to measure things like stress, political stance, attitudes or prejudice. They should only be used when a single or even a proxy (substitute) measure is not possible. Their use requires considerable care, since we are not even sure exactly what these things are (if they exist), and are even less sure how to measure them. Simply putting a lot of similar questions together and treating the responses to each question equally (as in scoring a multiple-choice examination) does not automatically lead a social scientist to an underlying variable. There is a lot of make-believe in this technique, since multiple responses are not necessarily any more accurate than a single one. A good multiple scale requires a lot of work and much testing. Their creators often use ordinal scales such as 'strongly agree' to 'strongly disagree', in which respondents, especially less-educated ones, have a tendency towards agreement whatever the associated statement. These responses in ordinal form are then often treated as real numbers (see Chapter Three), which has led to 'intellectual pollution' in the opinion of some writers (e.g. Mitchell 1994). Mitchell claims that the legacy of Spearman (a famous statistician) is a pseudo-science, combining contempt for real information with a worship of false quantification, and ignoring the fact that epistemology and logic are more
important than statistical technique. The users of complex scales are therefore often like the second 'villain' in Chapter One, determined to work with numbers at any cost, and convinced of their authority regardless of their substantive meaning (see Prandy 2002).
The old favourites
Many questionnaires you see or design will ask standard background questions about the age, sex, social class and ethnicity of the respondent (and perhaps also the family religion). These are some of the old favourites of social researchers, because they can almost always be relied upon to point up systematic differences in the responses to the more substantive questions. There are not many large-scale studies that do not report differences in employment, educational attainment, attitude, participation, or confidence in terms of young and old, men and women, middle and working class, or white and a minority ethnic group. Questions about the first two (age and sex) are relatively simple to devise. If you feel that asking people their age is too intrusive you could ask instead for their date of birth, or year of birth if that is all you really need. Practical problems arise in forming questions about the other two (social class and ethnicity), so much so in fact that I have never seen (much less devised) a satisfactory version of either question. This may be partly due to lack of clarity in the concepts and the lack of an agreed meaning for either term. On reflection, what is astonishing is that, despite these flaws, the many systematic differences between these groups (however they are defined) are so great that even a poorly designed question will usually identify them.
Many social class schemes are actually based on occupational prestige. Until 1971, the UK Registrar-General's class scheme used in the population Census and other official figures was an ordinal classification of occupations according to reputed standing in community (Rose 1996). In 1980, this notion of prestige was exchanged for levels of skill, which sounded more objective but were in some ways more confusing. There are other scales in common use, based on both nominal and continuous variables (in particular look at the Cambridge/Cardiff scale, www.cardiff.ac.uk/socsi/camsis, for a radically different approach). There are also Standard Industrial Classifications (SIC) and Standard Occupational Classifications (SOC), appropriate for different purposes. However, the RG scale remains the most widely used. Originally designed to relate to measures of infant mortality and adult fertility, the traditional scale looked like this (Table 5.1).
Table 5.1: The Registrar-General's class scheme
I     Professional occupations (e.g. medical doctor, lawyer)
II    Managerial and technical occupations (e.g. company director, teacher)
IIIN  Non-manual skilled occupations (e.g. clerical assistant)
IIIM  Manual skilled occupations (e.g. craftspeople, plumbers)
IV    Partly-skilled occupations (e.g. lathe operator)
V     Unskilled occupations (e.g. litter collector)
As can be seen, this list represents a mixture of both skill and occupational prestige. For many analytic purposes you may prefer to work with only three divisions - Service class (I + II), Intermediate class (IIIN + IIIM) and Working class (IV + V) - since this may lead to fewer difficult decisions in classifying cases and produces more cases per cell for analysis (see Chapter Six). The scale is primarily male in focus, and thus works less well with what are predominantly women's jobs. Using the scale for women tends to inflate their class since fewer are involved in manual work. It is questionable to suggest that simply working in an office makes a person middle-class (Intermediate). The scale also does not recognize unpaid labour, and makes it difficult to classify those without employment. The newer social class categories introduced in 1998 are based not on skill or prestige but employment conditions, and so overcome some of these problems. This 'socio-economic classification' generally makes it easier to classify the jobs of women, by giving less emphasis to the distinction between manual and non-manual jobs (Table 5.2). Where people do not have a job, you can ask them about their usual occupation or about the occupation of their parents. You will also need a category of 'Unclassified' for students and for no valid response. A self-coding version of this is available from National Statistics (see Chapter Two), suitable for postal surveys in which you do not want full details of the respondents' jobs (following the principle restated throughout this book of not asking for unused detail, but rather asking questions in the format that they will be analysed).
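The three-way collapse of the traditional scale described above amounts to a simple recoding at the analysis stage. The sketch below is illustrative only: the list of responses is invented, and the use of an 'Unclassified' bucket for anything that does not match a class code is an assumption borrowed from the advice about the newer scheme.

```python
from collections import Counter

# Registrar-General classes collapsed into the three divisions named in the text.
THREE_CLASS = {
    "I": "Service",
    "II": "Service",
    "IIIN": "Intermediate",
    "IIIM": "Intermediate",
    "IV": "Working",
    "V": "Working",
}


def collapse(rg_code):
    """Map a Registrar-General code to its division, else 'Unclassified'."""
    return THREE_CLASS.get(rg_code, "Unclassified")


# Invented coded responses, purely for illustration.
responses = ["I", "IIIN", "V", "II", "IIIM", "student", "IV", "II"]
counts = Counter(collapse(code) for code in responses)
print(counts)
```

Collapsing in this way gives more cases per cell, but if the three divisions are all that will be analysed it is worth considering whether the question needs to collect more detail than that in the first place.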
Table 5.2: The Registrar-General's class scheme 1998 (used 2001)
1. Higher professional and managerial occupations
   a Employers and managers, company directors, health service and bank managers
   b Higher professionals, university and college lecturers, scientists, doctors, teachers, librarians, social workers, clergy
2. Lower professional and managerial occupations, laboratory technicians, nurses and midwives, journalists, artists, actors and musicians, police
3. Intermediate professions, secretaries, dental nurses, electrical equipment installers, piano tuners
4. Small employers and own account workers, farmers, publicans, restauranteurs
5. Lower supervisory, craft and related jobs, plumbers, butchers, train drivers
6. Semi-routine occupations, shop assistants, security guards, hairdressers
7. Routine occupations, waiters, cleaners, couriers
8. Never worked, and long-term unemployed
The other standard question that gives the researcher a great deal of trouble but which is worth persevering with relates to the ethnic background of respondents. There is perhaps even less agreement about what this constitutes than there is about social class. Again, the standard question would be based on that used by the Office for the Population Census (Table 5.3). As can be seen, this list is a peculiar mixture of skin colour, other racial characteristics, country of 'origin' and primary state religion.
The situation was improved somewhat in the classification for the 2001 census (Table 5.4), largely by the addition of the 'mixed' category. It is still not clear whether a respondent with white skin born in India would be 'Indian' or 'White other'. With the addition of 'Irish' (and 'Scottish' in Scotland, but not 'Welsh', in Wales) as opposed to 'Born in Ireland', it is no longer clear whether these categories are intended to be based on area of birth, residence, language or self-attribution. Can someone be Black and Irish (or Welsh) for example? The same applies to Asian British and Black British. Is 'Indian', for example, a description of birthplace, parental birthplace, or something vaguer? How can 'British' be a sub-set of White and also a modifier for 'Asian British', for example?
Table 5.3: Ethnic groups 1991 census
Main group: sub-groups
White: White
Black groups: Black Caribbean; Black African; Black other
Indian sub-continent: Indian; Pakistani; Bangladeshi
Chinese/other groups: Chinese; Asian other; Other
Born in Ireland: Born in Ireland
Table 5.4: Ethnic groups 2001 census
Ethnic group: sub-groups
White: British; Irish; Other White
Mixed: White and Black Caribbean; White and Black African; White and Asian; Other mixed
Asian or Asian British: Indian; Pakistani; Bangladeshi; Other Asian
Black or Black British: Caribbean; African; Other Black
Chinese or other: Chinese; Any other
Most crucially, how mixed does one have to be to be classified as mixed? Are we not all mixed to some extent? Consider the fact that as I have two parents, four grandparents, eight great-grandparents and so on, then 40 generations ago I had 2^40 antecedents, or over one trillion (one thousand billion) people. If each generation, for the sake of argument, reproduced on average after every 25 years, then 40 generations represents 1,000 years. Therefore, I had more ancestors 1,000 years ago than there were people alive at that time (more even than everyone who has ever been alive). Put another way, as recently as 500 years ago (the era of the Tudor monarchs and 'discovery' of the USA perhaps), everyone in the entire world must have been related to me. The notion of 'pure' ethnic groups in terms of genetics or ancestry is therefore somewhat unrealistic. If, on the other hand, ethnicity is defined by our shared local cultures and patterns of behaviour, this means that a change of lifestyle (or country) could lead to a change of ethnic group (meaning therefore that we can alter our ethnicity by altering our circumstances). Perhaps the concept of ethnicity has become so complex and delicate that it has passed its usefulness. Yet, as with social class, however poorly thought out your question, the categories you use will appear to approximate to a social process so powerful that you will still find significant differences between them.
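The arithmetic behind the ancestry argument is easy to reproduce. This is a small sketch using only the figures stated in the paragraph above (a doubling in each generation and an assumed 25 years per generation); the function name `ancestor_slots` is invented for illustration.

```python
YEARS_PER_GENERATION = 25  # the figure assumed "for the sake of argument" in the text


def ancestor_slots(years_ago):
    """Number of positions in the family tree that many years back (2 per generation)."""
    generations = years_ago // YEARS_PER_GENERATION
    return 2 ** generations


for years in (500, 1000):
    print(f"{years} years ago: {ancestor_slots(years):,} positions in the family tree")
```

The count is of positions in the family tree rather than of distinct people, which is the point of the argument: the positions can only be filled if the same individuals appear many times over.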
Other issues
A further difficult issue relates to clearly sensitive questions. Often as researchers we wish to consider emotional and controversial topics since these are also often important and interesting. The key technique here is to be clear and unemotional in wording questions.
My advice is, however long you make the preamble to a difficult question, keep the question itself short. Avoid all pejorative or leading words (even commonly used terms such as 'truancy' for unauthorized absence imply something about the views of their author). I once asked a large group of students how many had been present at or involved in committing a crime. None had. I then asked how many had been present at or involved in shop-lifting, speeding in a car or the use of illegal drugs. More than half had. Responses are horribly sensitive to the precise phrasing of the question. If you want respondents to be prepared to report dangerous or possibly incriminating matters, then a number of designs that have been worked out could help. How much help they actually are is something you can decide in your pilot study (see below).
A simple example might run as follows. If you wish to ask a difficult question, use a preliminary question such as, 'Toss a coin, if it is heads answer the next question, but if it is tails toss again and then put "yes" for the next question if it is heads and "no" if it is tails'. In theory, therefore, half of the people answering the next question 'yes' or 'no' are genuine and half are talking about their second coin toss. You do not know which is which (so their anonymity is secure) but you do know that the chances of heads or tails are 50:50. So you need to subtract a quarter of your total sample from the 'yes' responses and a quarter from the 'no' to the next question to be left with the genuine answers (assuming you have a large sample). Please note that I have never tried this and, although it sounds fine in theory, there is an awful lot that can go wrong. The question just seems too complicated to work in real life.
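The subtraction described above can be tried out on simulated data before risking it in the field. The following is a sketch under stated assumptions: fair coins, respondents who follow the instructions exactly, and an invented 'true' rate of 30 per cent; the function names and the sample size are hypothetical.

```python
import random


def simulate_survey(n, true_rate, rng):
    """Recorded answers under the coin-toss design, assuming perfect compliance."""
    answers = []
    for _ in range(n):
        if rng.random() < 0.5:   # first toss heads: a genuine answer
            answers.append("yes" if rng.random() < true_rate else "no")
        else:                    # first toss tails: the second toss decides
            answers.append("yes" if rng.random() < 0.5 else "no")
    return answers


def estimate_rate(answers):
    """Subtract a quarter of the sample from each of 'yes' and 'no', as in the text."""
    n = len(answers)
    yes = answers.count("yes")
    genuine_yes = yes - n / 4
    genuine_no = (n - yes) - n / 4
    return genuine_yes / (genuine_yes + genuine_no)


rng = random.Random(2003)  # fixed seed so the run is repeatable
answers = simulate_survey(20000, 0.30, rng)
print(f"estimated genuine 'yes' rate: {estimate_rate(answers):.3f}")
```

With a large simulated sample the estimate should fall close to the invented true rate; with a small one it can be badly off, which is why the technique presupposes a large sample. The simulation says nothing about whether real respondents would understand or trust the instructions.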
Other notoriously tricky questions involve grids or two-dimensional tables, and those questions where respondents have to rank a set of responses, and indeed any question where the respondent can legitimately respond more than once. These questions are often so difficult to analyse that they are not worth including even if they seem the natural way to ask the question. They are also difficult to complete, and so might endanger your response rate. I suggest you keep away from these until you are more experienced. I have not managed to make one work successfully and have resolved in future to find another way of getting at the same information.
PILOT STUDIES
All research designs need to be piloted or pre-tested, so the comments made here about surveys could apply equally well to experiments, observation studies, interview schedules and so on. Researchers are always working to a deadline and so the temptation to skimp on the pilot study is very strong. Resist this temptation, at least until you are more experienced. Pilots are sometimes misinterpreted as applying only to the survey instrument. Rather, a pilot study should be seen as a full 'dress rehearsal' for the whole research design. Thus, a good pilot study involves selecting a sample in the same way as intended for the final study, negotiating access in the same way, delivering the instrument in the same way, calculating response rates and analysing the results in the same way. Problems will probably appear at every stage. This kind of pre-test does generally have two main differences from the 'real thing'. It will involve a much smaller sample, making it quicker and cheaper than the final survey, and it involves asking participants some supplementary questions about the design itself, making it slightly longer and more complex again.
I recommend a two-stage pre-testing process. First, try your questionnaire out on experts, friends, family and anybody else you can bully into helping. Ideally try it out in face-to-face interview or focus groups with a few people from your intended population (but not from your sample). Ask for comments and criticisms. Note where people are hesitant or do not understand the question. Note carefully any non-responses. Consider whether there are any pressures to produce socially acceptable or desirable answers. In particular, note if the respondents' first reaction is not actually an answer to the question (often a clue to a design problem). Fix any problems. And there will be problems. Anyone who tells you it is all fine is either lying or cannot be bothered to help. Then pre-test again. Remember to date each draft of your questionnaire so that you know which is up-to-date, but keep the earlier versions in case you change your mind again.
Second, move on to the full pilot. Analysing even four responses in the way that you will in the full study forces you to design this stage early on (so that at least you will not come to my office in six months' time with a pile of questionnaires, saying, 'so, what am I supposed to do now?'). It will also help you face up to flaws. Are the respondents really able to answer the questions? For example, do people know how many litres of petrol they used last year or how
many employees work in the same company as themselves? Or are they guessing to try and please you?
If the pilot leads to a few changes, you might then proceed to the main study. If, however, things go seriously wrong then the changes you need to make are so major that you will need to pilot the whole thing again. This is the social science equivalent of your aeroplane design crashing on its first test flight.
AN EXAMPLE OF A SIMPLE QUESTIONNAIRE
The example questionnaire below comes from a pilot project investigating the relationship between the use of digital technology and patterns of participation in lifelong learning (the work is represented by Gorard and Selwyn 1999, Selwyn and Gorard 2002). It was sent to all of the users of a particular Internet-based educational course, in an attempt to garner information about their background (Figure 5.1). It was sent by email (acceptable given the nature of the population), and completed interactively by the recipients (thus reducing transcription). Our primary concern was with widening participation, and we needed to see whether the kind of people using web-based instruction were different in any significant way from those following more traditional courses at the same level. In essence, has technology broken down the barriers faced by those previously excluded from learning in adult life? Or has it reinforced them? We already knew that patterns of participation in traditional adult learning varied by gender, age, location, employment, social class and prior educational attainment. Therefore, this is what our questions asked about. The temptation to include questions about the nature of their learning experiences and other superficially interesting matters was very strong. We resisted it because we added one final question: 'Would you be willing to be interviewed as part of this project?' It was in the follow-up interviews with a sub-sample that we decided to approach questions about attitudes, learner identities, the nature of barriers and possible transformative experiences. The questionnaire was intended to elicit basic 'facts' only.
INFORMATION ABOUT YOUR LEARNING
1. How often do you use the on-line Welsh for learners website?
   At least once per week / Less than once per week / No longer use it
2. Where do you access the Internet from?
   Home / Work / Elsewhere*
   *if 'elsewhere' please specify
3. Are you still in full-time education?
   Yes / No*
   *if 'no', how old were you when you left full-time education?
4. Which of the following levels best describes your highest qualification?
   Level three: 2+ A-Levels (or equivalent), GNVQ Advanced, NVQ3, OND, etc.
   Level two: 5+ GCSEs grade A*-C (or equivalent), 5 O-Levels, 5 CSE grade 1, etc.
   Level one: less than 5 GCSEs grade A*-C (or equivalent)
INFORMATION ABOUT YOURSELF
5. Sex
   Male / Female
6. Date of birth
7. Postcode (or area name)
8. Are you currently employed?
   Yes / No
9. What is your current or usual occupation?
Figure 5.1: Draft questionnaire on background to web-based participation
As a result of the responses and follow-up interviews, we made some modifications even to this simple design. The responses about usual occupation were hard to classify, and the use of equivalence levels in the question about qualifications was not a great success. Nevertheless, I include this simple instrument to make the point that questionnaires do not have to be complicated to be useful to the researcher (as this one has been). The information we requested includes the key predictors of adult learning patterns derived from our previous work. We did not need, in this instance, to ask any more questions. We did need more responses (but that is another story!).
COMMON PROBLEMS IN QUESTIONNAIRE DESIGN
There are many potential pitfalls in the design of a survey instrument, and several have been described in this chapter. Most can be avoided by careful proof-reading followed by a full pilot study. A selection follows.
Asking the research questions
Use of leading questions
Making the instrument too long
Asking pointless questions
Use of offensive language
Asking the research questions
Some novices become confused between their research questions, which define what they are trying to find out, and the questions they use in an investigation to answer those research questions. Research questions do not generally make good test items. Suppose, as a simple example, you wanted to know whether most employers believed that graduates were genuinely more multi-skilled than non-graduates. You could not use the following item in a questionnaire to a sample of employers:
Do most employers believe that graduates are more multi-skilled than non-graduates?
(please circle your answer) Yes No Don't know
Employers cannot answer for most employers, only for themselves. Your job as researcher is to aggregate the answers of many employers to decide what most of them believe. Even so, you probably cannot simply convert the question to, 'Do you believe that graduates are more multi-skilled than non-graduates?'. The question is still too much like the research question and therefore too complex. People may want to know more about what multi-skilling is or in what areas of employment this is meant to be relevant. People may feel resistance to answering either 'yes' or 'no', sensing that it is too extreme and wanting to assess different parts of a job differently. The proper development of survey items from research questions is a complex and rewarding business.
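The aggregation step mentioned above, in which each employer answers only for themselves and the researcher then decides what most of them believe, can be sketched in a few lines of Python. This is a minimal illustration and not part of the original study: the function names and the ten answers are invented for the example, and a real analysis would also have to consider how the sample was drawn and who did not respond.

```python
from collections import Counter


def summarise_responses(responses):
    """Tally individual answers and return each answer's share of the sample."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: count / total for answer, count in counts.items()}


def majority_answer(responses):
    """Return the answer given by more than half of respondents, or None."""
    for answer, share in summarise_responses(responses).items():
        if share > 0.5:
            return answer
    return None


# Hypothetical answers from ten employers, each speaking only for themselves.
answers = ["Yes", "Yes", "No", "Yes", "Don't know",
           "Yes", "No", "Yes", "Yes", "No"]

shares = summarise_responses(answers)   # proportion giving each answer
most_common = majority_answer(answers)  # the majority view, if there is one
```

The claim about 'most employers' is then a property of the tallied sample, which is something no single respondent could have reported.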
Use of leading questions
I have regularly seen introductions to surveys that 'give the game away' by leading the potential respondent to answer in a certain way or share some unnecessary assumptions with the researcher. For example, I recently saw a letter addressed to heads of schools
starting, 'I am a student researching the current shortage of teachers ...'. One of the objectives of the research was to establish whether there was a teacher shortage (although the student-researcher clearly believed that there was).
Less common, as it is easier to spot perhaps, is where the lead is in the question (as in the legendary, 'When did you stop beating your wife?'). I have paraphrased the following question slightly for anonymity, but the example is a genuine one from a PhD student whose dissertation I was examining:
'How important is the quality of music teaching to you when assessing a new school?
1 some importance 2 medium importance 3 very important.'
This candidate, for whatever reason, could not conceive of someone not caring at all about the quality of music teaching when assessing a school.
Making the instrument too long
All of us tend to make questionnaires too long. I have seldom managed to analyse all of the questions in a piece of survey research. Despite planning and piloting, some of the questions simply do not work. Working in a team makes the situation worse as each team member tends to have 'favourite' questions that he or she wishes to retain. All these problems exist and must be faced. What is absurd, though, is any desire for length for its own sake. One of the most ridiculous things I have ever heard concerned a PhD student who was repeatedly criticized during his pilot study for having an insufficiently long questionnaire. The complainants did not point out any key issues that had been omitted, merely claiming that the current length was suited only for a Masters project. In their opinion, a government-funded PhD project required a more substantial instrument. While clearly laughable, there is a little of this attitude in many of us. Resist it.
Asking pointless questions
Typical problems here involve asking questions to which we already know the answer or asking for information that we can obtain more easily by other means. One example I have seen in real studies involved questionnaires sent to named individuals who had been selected on the basis of sex. The first question was, 'Are you male or
female?'. Another involved asking teachers at named schools how many pupils there were in their school, where this information could be more accurately obtained from official statistics (see Chapter Two).
Perhaps the most peculiar example of a pointless question I have come across occurred in a paper by Coldron and Boulton (1991). They asked one group of people for their own views, and for their views of the views of others and then concluded that the 'two' (sic) sets of views were related. Even though the researchers were interested in the views of the pupils, only 'parents . . . were asked to report their children's reasons for wanting to go to a particular school' (Coldron and Boulton 1991, p. 175). It is not clear, in this case, why the 11-year-old children were not felt able to speak for themselves. It is, however, hardly surprising that the authors concluded that 'from these figures it appears that children chose mainly on the same basis as their parents', since the two sets of views they were comparing were in fact both from the parents. A similar situation is evident in a study by West et al. (1995), in which parents were asked about their child's reasons for choosing a new school, and which found that 83% stated that the child wanted the same school as themselves. The inaccuracy of parents' and children's reports about each other has been shown several times (e.g. Pifer and Miller 1995), and so the value of findings like the two above is suspect.
Use of offensive language
Clearly no sensible researcher would set out to use deliberately offensive language in a questionnaire, so all of the examples I have come across have been unintentional. Sometimes the use of offensive language is the result of a misjudged attempt at informality and therefore approachability. While a questionnaire should not be pompous or use long technical words inappropriately, it is probably best to stick to a relatively formal style throughout to encourage a serious frame of mind in the respondent. Sometimes the use of offensive language is the result of naivete or ignorance. Sometimes it is due to cultural or national differences. I have seen a question for teachers in the UK refer to a 'retard' or retarded pupil, and another for adults asking whether they were 'low class'. In both cases fashions in terminology had changed and made both questions seem unpleasant in tone. Be careful. Be up-to-date. I have seen questions use analogies and terminology from the drinking of alcoholic beverages in instruments for a general population
including Muslims. Why take the risk? Don't turn people away by your use of language.
This chapter has concentrated on the design of a survey instrument. For more on general survey design see Thomas (1999), Czaja and Blair (1996), Hakim (1992), Oppenheim (1992), Payne (1951) or Sudman and Bradburn (1982). See Bernard (2000) for examples of more esoteric survey designs. Chapter Six continues by describing some simple statistical techniques for analysing the kinds of data collected from a survey.