Lecture: Survey Usage and Finding Correlations
Survey and Correlation Research Designs
Slide 1 Transcript
Surveys and correlations are usually found in quantitative approaches, although some specific forms may appear in qualitative designs as well. In this module, types of surveys, development of items used, and overall survey design and administration will be the subject of interest. The second component of the module is about correlation analysis. Included in both descriptive and inferential statistics, correlations describe the relationship between two paired variables. Surveys and correlations are regarded as nonexperimental research and will serve to cover that category among the three research designs introduced earlier in the course.
Survey Designs
Determine incidence, frequency, distributed characteristics
Opinions, attitudes, previous experience Self report Sample taken from larger population Census taken from smaller population Summarized as percentages, frequencies, indices
Particular point in time, like a snapshot
Categories of surveys Instrumentation (interviews, questionnaires) Span of time
cross-sectional longitudinal (trend, cohort, panel, follow-up)
Slide 3 Transcript
A survey is a study designed to determine the incidence, frequency, and distribution of certain characteristics in a population. Some of the characteristics might include opinions, attitudes, or previous experiences, gathered using self-report responses. Surveys are either sample surveys or census surveys. Usually, a sample is taken from a population; this is often called a descriptive or normative survey. Census surveys are usually conducted when a population is relatively small and readily accessible. Once the responses are tabulated, they are summarized in percentages, frequencies, or indices. Surveys often are depicted as snapshots, or moments captured in a particular frame of time. There are other types of surveys, e.g., geological or inventory types, but here we are focusing on those done with humans. Surveys can be categorized by instrumentation or by span of time involved. Instrumentation designs include interviews or questionnaires, either of which might be conducted orally or in writing. The span of time needed to complete a survey might be cross-sectional, collecting information at a single point in time to compare variables, or longitudinal, with multiple data collection points over an extended period to examine changes. A key aspect, then, is the number of times the survey is administered. Cross-sectional surveys have the advantage of providing data relatively quickly. However, cross-sectional surveys do not provide a broad perspective to inform decisions about reliably changing systems, or to understand trends over time.
Longitudinal surveys, though, require an extended period of time to study an issue. Longitudinal surveys are categorized into four types: trend surveys address a particular trait or characteristic, cohort surveys involve a population selected at a particular time, panel surveys follow the same individuals throughout the study, and follow-up surveys address change in a previously studied population.
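The tabulation step mentioned in the transcript, where responses are summarized as frequencies and percentages, can be sketched in a few lines. This is only an illustration: the response list is made-up data, and the function name `summarize` is hypothetical rather than part of any survey tool.

```python
from collections import Counter

# Hypothetical self-report responses to one closed-ended item (made-up data).
responses = ["agree", "agree", "neutral", "disagree", "agree",
             "neutral", "agree", "disagree", "agree", "agree"]


def summarize(answers):
    """Return {answer: (frequency, percentage)} for a list of responses."""
    counts = Counter(answers)
    total = len(answers)
    return {answer: (n, 100.0 * n / total) for answer, n in counts.items()}


summary = summarize(responses)

# Print one line per response category, in alphabetical order.
for answer, (n, pct) in sorted(summary.items()):
    print(f"{answer:<10} n={n:<3} {pct:.1f}%")
```

The same summary could be produced in a spreadsheet; the point is only that each response category gets a count and a share of the total.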
Types of Survey Items
Existing surveys or items
Forms of surveys
Questionnaires (scales) Interviews
Structured surveys (closed-ended)
Limits responses available Common with interviews, but will have some open responses
Unstructured surveys (open-ended)
Difficulties with clarity and categories Partial structure
Slide 5 Transcript
Because survey researchers often seek information that is not already available, they usually need to develop an appropriate instrument, i.e., questions. When a valid and reliable instrument already exists, researchers can use it, but what is more important is that the survey contains the appropriate questions. Surveys generally take one of two forms: questionnaires, in which a written collection of questions is put to a particular group of participants, or interviews, in which a researcher and a participant hold a person-to-person question and answer session. For questionnaires, no item should be included that is not directly related to the topic. Scales are often used in questionnaires, e.g., semantic differential or Likert, or ranked items, e.g., in order of importance. Most survey items are structured, which is to say they are closed-ended items and limit the response options available. Interviews are more likely to use structured questions but allow for open-ended responses. An unstructured item allows the participant complete freedom in their responses, and although they are simpler to construct, unstructured items present several challenges for the researcher. Sometimes respondents will not take the time to answer these items (especially with a questionnaire), or they may give unclear answers. Also, scoring these items is more difficult and time-consuming than with closed-ended items. This is due, in part, to the issue of having to categorize the answers and develop a procedure to decide which category the response should be in. A variation of the open-ended item is the partially open-ended one, which restricts some answers but has items or choices where the participant can respond more freely. For interviews, which are more likely to use open-ended items, participants can wander briefly from the topic or include multiple responses, which present difficult decisions about how to categorize the statements.
Rules for Writing Surveys
Questionnaires Attractive, brief, and easy Every item related to topic Provide point of reference
For questionnaires or interviews
Avoid leading or sensitive questions Avoid multiple questions in same item (double-barreled) Avoid patterns in questions Range in scales must be appropriate Observe neutral terms
Slide 7 Transcript
Questionnaires should be attractive, brief, and easy to respond to. For questionnaires, no item should be included that is not directly related to the topic. Structured items work best if the participant only needs to circle the answer, or check it on the computer, rather than write down a response. Try to provide a point of reference, like a comparison to something, that guides participants in answering the questions. Whether using questionnaires or interviews, avoid leading questions that might suggest one response is more appropriate than another. You also should avoid sensitive questions that the participant may not answer honestly. And don’t ask a question that assumes a fact not necessarily true. Also avoid items that have multiple questions built in, known as double-barreled items. You would also want to avoid questions that begin to establish a pattern in their answers; this is referred to as the response set pitfall. One way to offset this tendency is to reverse the scale on some items; however, this can also increase the possibility of inaccurate responses by participants. Where you have several rating scales, they should be consistent in their range and numbering, since participants don’t necessarily agree about what various points along a scale mean and you can, at least, keep their responses in a consistent pattern. With a Likert-type scale, you will often see a range of five points, which allows for a neutral point. However, limiting responses to four choices forces a respondent to move more to one side than the other. Scales that extend to 7 or 10 points offer more variability, but the smaller differences between points, and their significance, can be very difficult to validate. When constructing survey items, it is best to use neutral terms with regard to gender, race, and other sensitive matters, unless, of course, these are objectives for the survey.
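When some items have their scale reversed to offset a response set, those items need to be recoded before responses are summed or compared. The sketch below assumes a five-point Likert-type scale coded 1 to 5; the function names, the answers, and the choice of which items are reversed are all hypothetical.

```python
# Assumed coding for a five-point Likert-type scale (1 = lowest, 5 = highest).
SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(score, low=SCALE_MIN, high=SCALE_MAX):
    """Flip a score so that low becomes high and vice versa (1<->5, 2<->4)."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} scale")
    return low + high - score


def recode(answers, reversed_items):
    """Return answers with the reverse-worded items flipped back.

    answers        -- list of raw scores, one per item
    reversed_items -- set of item positions (0-based) that were reverse-worded
    """
    return [reverse_score(s) if i in reversed_items else s
            for i, s in enumerate(answers)]


# Hypothetical respondent: items at positions 1 and 3 were reverse-worded.
raw_answers = [5, 1, 4, 2]
recoded = recode(raw_answers, reversed_items={1, 3})
print(recoded)
```

After recoding, a high number means the same thing on every item, which keeps the scales consistent in range and direction.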
Survey Administration
Have a plan for capturing responses
Ways to administer Records Mail Telephone Group or individual Electronic
Field test Adjust, schedule, consent, administer Mailing may include inaccurate addresses
Computerized surveys
Confidentiality and anonymity not assured
Survey return rates can be challenging
Tele-interviews invite a few concerns
Slide 9 Transcript
Before administering a survey, the researcher must have a clear plan for how to capture the responses and other data. Using a computer or smartphone with a checklist or spreadsheet is one way. Having an electronic template provides flexibility and opportunity to gather information on individual participants or situations. Different technology, like CyberTracker, can follow a participant into situations the researcher can’t observe directly. There are several ways to administer surveys, e.g., using records, mailing the survey, doing telephone surveys, by group or individual interviews, and electronically, of course. First, though, a field test or pretest should be conducted with a few volunteers to find weak spots in the questions. Once adjustments are made, the sample or group is identified using procedures covered earlier, and times are arranged. The next step is to obtain informed consent to participate. Often, as part of this process, the researcher may first meet with participants in person or by video to explain the purpose of the survey. A questionnaire might be accessed at a particular website, which is especially relevant for selected samples that need to limit access to the questionnaire. When mailing, several issues may arise, like inaccurate addresses, misdeliveries, or uncertainty about who actually responded to the survey. More open surveys can use a link to a service like SurveyMonkey, although it is difficult to control who is actually completing the questionnaire, and it also is difficult to screen out those who may not meet the profile for your desired type of participant. One of the problems with sending out questionnaires electronically is in providing confidentiality or anonymity. Maximizing the return rate for a questionnaire is important, and strategies to improve the rate might include choosing a convenient time, making a good first impression when participants see the questionnaire, and motivating potential responders.
Sometimes, an advance letter can be sent indicating they will be receiving a survey, or a nominal payment may be offered. A survey of records uses a different source than other surveys discussed here because records are nonreactive and do not involve a direct response from a person. Some of the problems with records surveys are that the information may be incomplete, yearly comparisons may not be compatible, or the purpose of the records may be unrelated to the purpose of the survey. For person-to-person interviews, a telephone or an electronic process, like Skype, can be used. Teleinterviews, though, present some potential artifacts, or contaminating influences, when cameras are used or sound quality is poor.
Correlational Research Designs
Survey data often used with correlational designs
Considers extent of differences in one variable with another
Correlation = one variable increases > predicts another
Quantitative data are required
Often investigate a major, complex variable
High correlation prompts further causal-comparative studies
Higher the correlation, more accurate the predictions
Slide 11 Transcript
Survey data are often used in correlational research designs. A correlational study considers to what extent differences in one variable or characteristic are associated with differences in other characteristics or variables. A correlation exists if, when one variable increases, another predictably increases or decreases. In this way, knowing the value of one variable allows for prediction of the value for another variable. The degree of relation is expressed as a correlation coefficient. To perform a correlational study, quantitative data are required. Keep in mind, this may apply within nearly any type of research design. Correlational research is sometimes regarded as a type of descriptive research because it describes an existing condition. Correlational studies typically investigate several variables believed to be related to a major, complex variable, something like achievement. Variables found to be highly related are studied further in causal-comparative or experimental studies to determine the nature of the relations. We have heard many times that high correlation between two variables does not imply that one causes another. But even though correlational relations are not cause-effect ones, they still permit prediction. The higher the correlation, the more accurate the predictions become. There is a temptation for some researchers to correlate all manner of variables just to see what might turn up, but this is strongly discouraged and is very inefficient.
Relationship Between Variables
Correlation coefficient = +1.00 to -1.00
High positive correlation > high relationship with the other
Indicates size and direction of relationship
High negative correlation > low relationship with the other
Less than .35 is weak Between .35 to .65 is moderate Above .65 is strong
Correlation coefficient > shared variance between variables
Slide 13 Transcript
When two variables are correlated, the result is a correlation coefficient, which is a decimal ranging from -1.00 to +1.00. The coefficient indicates the size and direction of the relationship. A coefficient near +1.00 represents a strong relationship and positive direction.
In other words, with a high positive correlation, a high value on one variable is likely to go with a high value on the other variable. If the coefficient is near zero, the variables are not linearly related. Now, if there is a high negative value, approaching -1.00 for instance, it means there is a strong relation, and that when the value on one variable is high, the value on the other variable will tend to be low. As a guide, the size of a correlation coefficient, regardless of sign, can be interpreted in bands, e.g., less than .35 is weak, between .35 and .65 is moderate, and greater than .65 is strong. A coefficient less than .50 is generally considered useless for prediction. Those above .60 are adequate for group prediction purposes and above .80 for individual predictions. Many beginning researchers mistakenly think a correlation coefficient of, say, .50 means that two variables are 50% related. This is incorrect. The square of the correlation coefficient indicates the proportion of variance shared by the variables, so a coefficient of .50 corresponds to 25% shared variance. Common variance, also called shared variance, indicates the extent to which variables vary in a systematic way.
Shared variance is the variation in one variable that is attributable to its tendency to vary with another variable. So, the more the common variance, the higher the correlation coefficient. If two variables are perfectly related, the variability in one set of values is fully accounted for by the variability in the other set of values.
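As a rough illustration of the coefficient, its square, and the interpretation bands given above, the sketch below computes a product moment correlation by hand. The data are made up, and the function names `pearson_r` and `strength_band` are hypothetical; the band cut-offs are the ones quoted in this lecture, not a universal standard.

```python
import math


def pearson_r(x, y):
    """Product moment correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength_band(r):
    """Label the size of r using the bands quoted in the lecture."""
    size = abs(r)  # direction is ignored; only the size matters here
    if size < 0.35:
        return "weak"
    if size <= 0.65:
        return "moderate"
    return "strong"


# Made-up paired data: hours studied and quiz score for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
shared_variance = r ** 2  # proportion of variance the two variables share

print(f"r = {r:.3f}, r squared = {shared_variance:.2f}, {strength_band(r)}")
```

For these values the coefficient is about .77 while the shared variance is .60, which shows why the coefficient itself should not be read as a percentage.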
Limitations and Interpretation
Interpretation depends on how the coefficient is used
Prediction study Accuracy in predictions
Hypothesis study Statistical significance Not importance, but likelihood of chance Relative to sample size (df) Not about cause and effect
Techniques to determine correlation coefficient
Pearson r (ratio or interval data) Spearman rho (rank or ordinal data) Phi coefficient (categorical data)
To avoid possibility of underestimation
Eta coefficient if curvilinear Low reliability yields attenuation Restricted range of values
Slide 15 Transcript
Interpretation of a correlation coefficient depends on how it is used. In a prediction study, the correlation coefficient value is important for accurate predictions. In a study to examine hypothesized relations, a correlation coefficient is interpreted in terms of its statistical significance. Statistical significance refers to the probability that the results would have occurred just by chance. As always, keep in mind that significance does not mean importance; rather, it indicates probability of results compared with chance occurrence. Another aspect of significance testing with correlations is that it is computed relative to the sample size. To demonstrate a significant relation, the correlation coefficient for small sample sizes must be higher than those for larger sample sizes. The difference is due to the larger degrees of freedom for the larger sample. As mentioned before, it is essential to keep in mind that correlation is about relation, not cause and effect. While it can be tempting to conclude that high correlation values establish cause and effect, they merely suggest it and open the way to experimental designs to actually confirm if true causation exists. The most common technique to compute a correlation coefficient is the product moment correlation coefficient, usually referred to as the Pearson r, which is used when both variables are expressed as continuous, i.e., ratio or interval, data. The Pearson r results in the most precise estimate of correlation. If the data for at least one variable are rank or ordinal data, the more appropriate technique is the rank difference correlation, known as the Spearman rho. The Spearman rho is useful with a small number of participants and relatively easy to compute. Used less often is the phi coefficient, when both variables are expressed categorically, e.g., by gender or political affiliation.
Even so, there are different approaches depending on whether such dichotomies are natural or artificial, i.e., created by operationally defining a midpoint and categorizing as falling above or below. Most correlational techniques are based on the assumption of a linear relationship, that is, one in which a change in one variable corresponds with a change in the other. However, some relationships are curvilinear, and for these an eta coefficient is appropriate; otherwise, the computations may underestimate the relationship or show that none exists. There are other factors that may lead to underestimation. Attenuation will reduce the values and occurs where the measures being correlated have low reliability. A second limitation might occur if there is a restricted range of values in the data. This is because the more variability in each set of values, the higher the correlation coefficient is likely to be.
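Two of the points above can be made concrete with a short sketch. The lecture does not give formulas, so the ones used here are assumptions drawn from standard statistics texts: the rank difference formula for the Spearman rho (valid only when there are no tied ranks) and the t statistic commonly used to test a Pearson r, with n - 2 degrees of freedom. The data, the function names, and the critical values (taken from a standard two-tailed t table at the .05 level) are for illustration only.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) to n. This sketch does not handle ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not supported in this sketch")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Rank difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def t_statistic(r, n):
    """t value for testing a correlation r from n pairs (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Made-up rankings of five projects by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_a, judge_b)
print(f"Spearman rho = {rho:.2f}")

# The same r of .50 tested with a small and a larger sample.
# Critical t (two-tailed, .05): 2.306 for df = 8, 2.048 for df = 28.
t_small = t_statistic(0.50, 10)
t_large = t_statistic(0.50, 30)
print(f"n = 10: t = {t_small:.3f} (below 2.306, not significant)")
print(f"n = 30: t = {t_large:.3f} (above 2.048, significant)")
```

The second half shows the sample size point from the transcript: the same coefficient fails the test with 10 pairs but passes with 30.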
Correlation, Regression, and Prediction
Linear regression Simple (one IV predicts for the DV) Multiple (two or more IV predict the DV)
Scatter Plot Shows direction and strength of correlation Points plotted on axes for x (predictor v.) and y (criterion v.)
Line of best fit as Y = bX + a use equations for b (slope) and a (intercept) calculated using sum of least squares
Slide 17 Transcript
A regression equation examines how accurately one or more variables enable predictions to be made regarding values of another. A simple linear regression generates an equation in which a single independent variable yields prediction for the dependent variable. A multiple regression yields an equation in which two or more independent variables are used to predict the dependent variable. Previously covered was how correlation indicates direction of a relationship (positive or negative) between variables and strength of the relationship as expressed in the correlation coefficient. Both of these, direction and strength, can be shown in a scatter plot, or diagram, that summarizes how the factors are related. Data points are plotted on an x and y axis. Then, the regression line is the best-fitting straight line that minimizes the distance that all data points fall from it. The extent to which two factors are related is determined by how far the points fall from the regression line. The issue of variability for interpreting correlation coefficients was explained earlier. A scatter plot and regression line are visual depictions of this variability. With linear regression, first it is necessary to identify the predictor variable, which is the one we know and is plotted on the x axis, and the criterion variable, the one we don’t know, which is plotted on the y axis. Then, a bit of geometry is needed to complete the equation. There are many expressions for the equation to calculate the line of best fit, and the one we use here is Y = bX + a. Y is the criterion variable, X is the predictor variable, b is the slope of the regression line, and a is where the line crosses the y axis (the intercept), or how high up the y axis it goes. The best-fitting line is calculated using the least squares method, with b and a obtained from the means and standard deviations of the x and y values and their correlation.
The line’s location is found as follows: for each potential line, the vertical distance between each data point and the line is squared, and these squares are summed, so that the line with the least sum of squares is selected as the best fit. The formulas for b and a are derived using calculus, which is beyond the scope of what is covered here. So, it really is not something you can eyeball and draw to precision, although you would probably be very close.
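The calculation of the line Y = bX + a can be sketched as follows. The transcript does not state the formulas, so this assumes the standard least squares results, b = r * (SD of y / SD of x) and a = mean of y - b * mean of x. The data and the function names are made up for illustration.

```python
import math


def fit_line(x, y):
    """Return (b, a) for the least squares line Y = bX + a."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)    # correlation coefficient
    sd_x = math.sqrt(sxx / (n - 1))   # sample standard deviations
    sd_y = math.sqrt(syy / (n - 1))
    b = r * sd_y / sd_x               # slope
    a = mean_y - b * mean_x           # intercept
    return b, a


def sum_squared_errors(x, y, b, a):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))


# Made-up data: predictor (x) and criterion (y) values for five cases.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b, a = fit_line(x, y)
best_sse = sum_squared_errors(x, y, b, a)
other_sse = sum_squared_errors(x, y, b + 0.1, a)  # a slightly tilted line

print(f"Y = {b:.2f}X + {a:.2f}")
print(f"sum of squares: best fit {best_sse:.2f}, tilted line {other_sse:.2f}")
print(f"predicted Y when X = 6: {b * 6 + a:.2f}")
```

Comparing the two sums of squares shows the least squares idea directly: any line other than the fitted one leaves a larger total of squared vertical distances.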
Validity and Reliability Effects
Correlation depends on how well variables are measured
Poor reliability or validity diminishes relations Extreme values affect the means
Reminder: Correlation does not indicate causation
Does not describe influence or enhancement Assists in highlighting areas for further research
Experimental designs needed to establish cause-effect
Slide 19 Transcript
A statistical correlation between variables depends on how well those variables have been measured. In quantitative research designs, if the instruments used have poor validity and reliability, the likelihood of finding any real relationship is slim. One thing is certain: you can discover substantial correlations between characteristics only if you can measure both variables with a reasonable level of validity and reliability. One of the threats to regression equations is the presence of extreme values, since they affect the means and the calculation of the line of best fit. As mentioned before, correlation, either positive or negative, does not necessarily indicate causation. This means that a correlation should not be described as influencing or enhancing something, since that implies a cause and effect relationship, something a correlation cannot provide. That remains the province of the experimental designs. So, a correlation is like a signpost: it should lead you to wonder about the underlying reason for the relationship and stimulate further research that is more directed to determine cause and effect for the association. So, we have learned about two ways to conduct non-experimental research designs using the tools of surveys and correlation. That ends this presentation and I hope you have a fine week going forward.