Business 436
Essentials of Marketing Research
Part 3: Gathering and Collecting Accurate Data
Chapter 7: Measurement and Scaling
© McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom.
No reproduction or further distribution permitted without the prior written consent of McGraw-Hill Education.
Because learning changes everything.®
Value of Measurement in Information Research
Measurement is an integral part of the modern world.
Precise measurement is critical to airline pilots or physicians.
In most marketing situations, measurement applies to abstract things such as people’s preferences.
Accurate measurement is essential to effective decision making.
Overview of the Measurement Process
Measurement is the integrative process of determining the intensity (or amount) of information about constructs, concepts, or objects.
As part of the measurement process, researchers assign either numbers or labels to phenomena they measure.
The measurement process consists of two tasks.
Construct selection/development.
The goal is to precisely identify and define what is to be measured.
Scale measurement.
This determines how to precisely measure each construct.
What is a Construct?
An abstract idea or concept formed in a person’s mind.
The idea is a combination of a number of similar characteristics of the construct.
The characteristics are the variables that collectively define the concept and make its measurement possible.
Suppose the research objective is to identify the characteristics associated with a restaurant’s satisfaction construct.
The researcher is likely to:
Review the literature.
Conduct interviews.
Use personal experience to identify variables of restaurant satisfaction.
Combine characteristics into a framework enabling an empirical investigation.
Construct Development
Marketing constructs must be clearly defined.
A construct is an unobservable concept measured indirectly by a group of related variables.
Construct development is an integrative process where researchers determine what specific data should be collected for solving the defined research problem.
Objects relevant to the research problem are identified first.
Then the objective (concrete) and subjective (abstract) properties of each object are specified.
Exhibit 7.1 – Concrete and Abstract Properties of Objects
Access the text alternative for this image.
Exhibit 7.1 – Concrete and Abstract Features of Marketing Constructs
Categorical Types of Information
When the problem requires relevant information and verifiable facts about the characteristics of individuals, objects, or organizations, the information is considered state-of-being in nature.
State-of-mind information represents a person’s mental attributes or emotional feelings about an object – not directly observable or available through external sources.
State-of-behavior information is the current observable actions or recorded past actions of individuals or organizations.
State-of-intention information represents the expressed plans of individuals or organizations to undertake specified future behavioral actions.
Properties of Measurement Scales
The amount of information obtained from measurement scales depends on which scaling properties are activated in the design of a measurement scale.
Assignment property – the use of unique descriptors to identify each object in a set.
Order property – establishes “relative magnitudes” between the descriptors creating hierarchical rank-order relationships among objects.
Distance property – enables the researcher and respondent to identify, understand, and express absolute (or assumed) differences between objects.
Origin property – a unique scale descriptor designated as being a “true natural zero” or “true state of nothing.”
Scale Measurement
Scale measurement involves assigning a set of scale descriptors to represent the range of possible responses to a particular question.
The scale descriptors are a combination of labels, such as “Strongly Agree” or “Strongly Disagree” and numbers, such as 1 to 7, which are assigned using a set of rules.
Scale measurement assigns degrees of intensity to the responses.
The degrees of intensity are commonly referred to as scale points.
Four scale levels:
Nominal.
Ordinal.
Interval.
Ratio.
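The four scale levels can be read as cumulative combinations of the four scaling properties described earlier. The sketch below encodes that reading. The cumulative mapping is an assumption based on the usual textbook interpretation rather than something these slides spell out, and the names ScaleLevel and scaling_properties are hypothetical.

```python
from enum import Enum


class ScaleLevel(Enum):
    """The four scale levels, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Assumed cumulative ordering: each level keeps the properties of the one below it.
PROPERTIES = ["assignment", "order", "distance", "origin"]


def scaling_properties(level: ScaleLevel) -> list:
    """Return the scaling properties assumed to be activated at a scale level."""
    return PROPERTIES[: level.value]


for level in ScaleLevel:
    print(f"{level.name.lower():8s} -> {', '.join(scaling_properties(level))}")
```

Read this way, a ratio scale is the only level that activates the origin property, which matches the "true natural zero" description of ratio scales.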
Exhibit 7.3: Examples of Nominal Scales
A nominal scale uses questions requiring respondents to provide only some type of descriptor as the raw response.
Exhibit 7.4: Examples of Ordinal Scales
An ordinal scale allows a respondent to express relative magnitude between the answers to a question.
Mode, median, frequency distributions, and ranges can be applied to ordinal scales.
Access the text alternative for this image.
Exhibit 7.5: Examples of Interval Scales
An interval scale demonstrates absolute differences between each scale point.
Access the text alternative for this image.
Exhibit 7.6: Examples of Ratio Scales
A ratio scale allows the researcher not only to identify the absolute differences between each scale point but also to make comparisons between the responses.
This enables a “true natural zero” or “true state of nothing” response.
Access the text alternative for this image.
Evaluating Measurement Scales – Scale Reliability
Scale reliability is the extent to which a scale can reproduce the same or similar measurement results in repeated trials – measures consistency.
The test-retest technique involves repeating the scale measurement with either the same sample at two different times or two different samples of respondents under nearly the same conditions.
Respondents from the first administration may be unavailable for the second.
Respondents’ scale sensitivity may alter the second measurement.
Environment may change between the two administrations.
In the equivalent form technique, researchers create two equivalent scale measurements and administer both forms to either the same sample of respondents or two samples of respondents.
It may not be worth the time, effort, and expense.
It is difficult, if not impossible, to create two equivalent scales.
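One common way to summarize a test-retest comparison is to correlate the scores from the two administrations. The slides do not name a statistic, so using Pearson's correlation here is an assumption, and the scores are hypothetical. A minimal sketch:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores from the same five respondents at two points in time.
time_1 = [5, 3, 4, 2, 5]
time_2 = [4, 3, 5, 2, 5]

retest_r = pearson_r(time_1, time_2)
print(round(retest_r, 2))
```

A correlation close to 1 would suggest the scale reproduces similar results across the two administrations. The same function could be applied to scores from two equivalent forms.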
Scale Reliability – Internal Consistency
Internal consistency is the degree to which the individual questions of a construct are correlated.
Two popular techniques are used to assess internal consistency.
Split-half test.
Scale questions are divided into two halves and scores for each half are correlated against one another.
High correlations indicate good internal consistency.
Coefficient alpha.
Calculates the average of all possible split-half measures that result from different ways of dividing the scale questions.
A coefficient alpha below 0.7 or above 0.95 is a problem: a low value signals weak internal consistency, and a very high value suggests redundant items.
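Both techniques can be sketched in a few lines. The split-half function below uses one possible split (odd-numbered versus even-numbered questions). The alpha function uses the commonly used variance-based formula for Cronbach's coefficient alpha, swapped in for the slide's verbal description. The response data and all function names are hypothetical.

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_r(responses):
    """Correlate totals of odd-numbered questions with even-numbered ones."""
    half_a = [sum(row[0::2]) for row in responses]
    half_b = [sum(row[1::2]) for row in responses]
    return pearson_r(half_a, half_b)


def coefficient_alpha(responses):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def alpha_is_problem(alpha, low=0.7, high=0.95):
    """Flag alpha values outside the 0.7 to 0.95 band mentioned on the slide."""
    return alpha < low or alpha > high


# Hypothetical data: six respondents (rows) answering four questions (columns).
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]

half_r = split_half_r(responses)
alpha = coefficient_alpha(responses)
print(round(half_r, 3), round(alpha, 3), alpha_is_problem(alpha))
```

With this made-up data the questions move together so closely that alpha lands slightly above 0.95, so the check flags it.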
Evaluating Measurement Scales – Validity
Scale validity assesses whether a scale measures what it is supposed to measure – a measure of accuracy.
With face validity, researchers use expert judgment.
While content validity:
Measures the extent a construct represents all the relevant dimensions.
Requires rigorous statistical assessment.
Is assessed before data is collected.
Convergent validity occurs when multiple measures of the same construct share more than 50 percent of their variance.
Discriminant validity is the extent a single construct differs from others and is unique.
Two ways to obtain data to assess validity:
A pilot study.
A panel of experts.
Developing Measurement Scales
Designing measurement scales requires:
Understanding the research problem.
Identifying and developing constructs.
Establishing detailed data requirements.
State of being, mind, behavior, and intention.
Understanding the scaling properties.
Selecting the appropriate measurement scale.
Criteria for Scale Development
Understanding of the questions.
Pretest for understanding.
The discriminatory power of scale descriptors is the scale’s ability to differentiate between the scale responses.
The more scale points, the more variability and the greater the discriminatory power.
A balanced scale has an equal number of positive and negative response alternatives.
An unbalanced scale has more options on one side.
A forced-choice scale does not have a neutral descriptor.
A scale including a neutral response is a nonforced or free-choice scale.
Minimize negatively worded statements, and use caution with those that remain.
Measures of central tendency include: mean, median, mode.
Measures of dispersion include: frequency distribution, range, and standard deviation.
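The balanced/unbalanced and forced/nonforced definitions above lend themselves to a small check. The sketch below is illustrative only: the ScaleDescriptors class is hypothetical, and the descriptors are the purchase-intention options from Exhibit 7.7.

```python
from dataclasses import dataclass


@dataclass
class ScaleDescriptors:
    """Scale descriptors grouped by direction (hypothetical helper class)."""
    positive: list
    negative: list
    neutral: list

    def is_balanced(self) -> bool:
        # Balanced: equal number of positive and negative alternatives.
        return len(self.positive) == len(self.negative)

    def is_forced_choice(self) -> bool:
        # Forced choice: no neutral descriptor is offered.
        return len(self.neutral) == 0


even_point = ScaleDescriptors(
    positive=["Probably will buy", "Definitely will buy"],
    negative=["Definitely will not buy", "Probably will not buy"],
    neutral=[],
)
odd_point = ScaleDescriptors(
    positive=["Probably will buy", "Definitely will buy"],
    negative=["Definitely will not buy", "Probably will not buy"],
    neutral=["Neither will nor will not buy"],
)

print(even_point.is_balanced(), even_point.is_forced_choice())
print(odd_point.is_balanced(), odd_point.is_forced_choice())
```

Both versions are balanced; only the even-point version is a forced-choice scale.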
Exhibit 7.7: Examples of Forced-Choice and Nonforced Scale Descriptors
Access the text alternative for these images.
Exhibit 7.8: Relationships between Scale Levels and Measures of Central Tendency and Dispersion
Nominal scales can be analyzed using frequency distributions and mode.
Ordinal scales can use medians and ranges as well as modes and frequency distributions.
For interval or ratio scales, means and standard deviations are the most appropriate, but any of the measures can be used.
Access the text alternative for this image.
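The relationships in Exhibit 7.8 can be expressed as a function that computes only the statistics listed as appropriate for a given scale level. This is a rough sketch: the function name, the integer coding of ordinal answers, and the sample data are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median, median_low, multimode, stdev

LEVELS = ("nominal", "ordinal", "interval", "ratio")


def summarize(values, level):
    """Compute the statistics Exhibit 7.8 lists as appropriate for `level`."""
    if level not in LEVELS:
        raise ValueError(f"unknown scale level: {level}")
    # Frequency distribution and mode apply to every scale level.
    summary = {
        "frequency_distribution": dict(Counter(values)),
        "mode": multimode(values),
    }
    if level != "nominal":
        # median_low keeps the ordinal median on an actual scale point.
        summary["median"] = median_low(values) if level == "ordinal" else median(values)
        summary["range"] = (min(values), max(values))
    if level in ("interval", "ratio"):
        summary["mean"] = mean(values)
        summary["standard_deviation"] = stdev(values)
    return summary


# Hypothetical responses.
marital_status = ["married", "single", "married", "divorced"]
recommend_1_to_7 = [7, 5, 6, 7, 3]

print(summarize(marital_status, "nominal"))
print(summarize(recommend_1_to_7, "interval"))
```

Asking for a mean on the marital-status answers is simply not possible through this function, which mirrors the exhibit's point that means and standard deviations are inappropriate for nominal and ordinal data.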
Adapting Established Scales
There are hundreds of published scales in marketing.
Handbook of Marketing Scales by Bearden, Netemeyer, and Haws.
Marketing Scales Handbook by Bruner.
The online Measures Chest by the Academy of Management.
Some scales can be used in their published form but most need to be adapted to meet current psychometric standards.
Scales to Measure Attitudes and Behaviors
This section discusses three scale formats:
Likert scales.
Semantic differential scales.
Behavioral intention scales.
General steps in construct development/scale measurement process.
Step 1: Identify and define the construct.
Step 2: Use qualitative research to identify a theory.
Step 3: Refine problem using qualitative judgment and analysis.
Step 4: Design scales and pretest.
Step 5: Evaluate reliability and validity.
Step 6: Purify scales by eliminating poorly designed statements.
Step 7: Complete the final scale evaluation.
Exhibit 7.10: Example of a Likert Scale
A Likert scale is an ordinal scale format asking respondents to indicate the extent to which they agree or disagree with a series of mental or behavioral belief statements about a given subject.
Uses five scale descriptors: strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree.
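In practice, Likert answers are often coded numerically and summed across statements. The sketch below shows one way to do that. The 1 to 5 coding, the reverse-coding of negatively worded statements, and the function name are assumptions rather than part of the slide. Summing the codes also treats the answers as if they were interval data, even though the slide classifies the Likert format as ordinal.

```python
# Assumed numeric coding of the five descriptors listed on the slide.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def score_likert(answers, reverse_items=frozenset()):
    """Sum coded answers; positions listed in `reverse_items` are reverse-coded."""
    total = 0
    for index, label in enumerate(answers):
        code = LIKERT_CODES[label.strip().lower()]
        if index in reverse_items:
            code = 6 - code  # 1 <-> 5, 2 <-> 4, 3 stays 3
        total += code
    return total


# Hypothetical respondent answering three statements; the third is negatively worded.
answers = ["Agree", "Strongly agree", "Disagree"]
print(score_likert(answers))
print(score_likert(answers, reverse_items={2}))
```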
Exhibit 7.11: Example of a Semantic Differential Scale Format for Jennifer Garner as a Credibility Spokesperson
A semantic differential scale is a unique bipolar ordinal scale format that captures a person’s attitudes or feelings toward a given object.
Access the text alternative for this image.
Semantic Differential Scale – Non-bipolar Descriptors
A problem encountered in designing semantic differential scales is the inappropriate narrative expressions of the scale descriptors.
A well-designed scale has truly bipolar anchors.
Confusion arises when a negative pole descriptor is not truly the opposite of the positive one.
Another problem is the use of an odd number of scale points, creating a so-called neutral response dividing positive and negative poles.
The neutral response has no diagnostic value to researchers.
To avoid this problem, use an even-point (or forced-choice) scale point format and incorporate a “non-applicable” response out to the side.
Exhibit 7.13 – Retail Store: Shopping Intention Scale for Casual Clothes
A behavioral intention scale is designed to capture the likelihood that people will demonstrate predictable behavior with future purchases.
Noncomparative Rating Scales
A noncomparative rating scale is used when the objective is to have a respondent express their attitudes, behavior, or intentions about a specific object without making reference to another object.
In contrast, a comparative rating scale is used when the objective is to have a respondent express their attitudes, feelings, or behaviors about an object on the basis of some other object.
Graphic rating scales use a scaling descriptor format that presents a respondent with a continuous line as the set of possible responses.
Comparative Rating Scales – Rank-Order Scales
Rank-order scales enable respondents to compare objects by indicating their order of preference or choice from first to last.
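Rank-order answers usually need a basic check before analysis, for example that a respondent used each rank exactly once. The sketch below assumes a "top three" format like the banking-methods example in Exhibit 7.4. The function name and the sample answers are hypothetical.

```python
def ranking_problems(ranks, top_n=3):
    """Return a list of problems found in one respondent's top-n ranking.

    `ranks` maps each ranked object to the rank the respondent wrote down;
    objects the respondent did not rank are simply left out.
    """
    problems = []
    given = sorted(ranks.values())
    expected = list(range(1, top_n + 1))
    if len(given) != len(set(given)):
        problems.append("the same rank was assigned to more than one object")
    if given != expected:
        problems.append(f"ranks should be exactly {expected}, got {given}")
    return problems


valid = {"ATM": 1, "internet banking": 2, "debit card": 3}
invalid = {"ATM": 1, "debit card": 1, "bank by mail": 2}

print(ranking_problems(valid))
print(ranking_problems(invalid))
```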
Comparative Rating Scales – Constant Sum Scales
Constant sum scales require the respondent to allocate a given number of points, usually 100, among the listed attributes or features, with each allocation made relative to all the others.
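A constant sum response is only usable if the points add to the required total. The sketch below checks one respondent's allocation and converts it to importance shares. The function names and the point values are hypothetical; the feature labels come from the banking example in the text alternative for this slide.

```python
def constant_sum_problems(allocation, total=100):
    """Return a list of problems with one respondent's point allocation."""
    problems = []
    if any(points < 0 for points in allocation.values()):
        problems.append("points cannot be negative")
    assigned = sum(allocation.values())
    if assigned != total:
        problems.append(f"points add to {assigned}, not {total}")
    return problems


def importance_shares(allocation):
    """Convert allocated points into shares of the total that sum to 1."""
    assigned = sum(allocation.values())
    return {feature: points / assigned for feature, points in allocation.items()}


# Hypothetical allocation across the seven banking features.
allocation = {
    "convenience/location": 30,
    "inside banking hours": 10,
    "good service charges": 15,
    "interest rates on loans": 10,
    "bank's reputation": 5,
    "interest rates on savings": 20,
    "online banking availability": 10,
}

print(constant_sum_problems(allocation))
print(importance_shares(allocation)["convenience/location"])
```

A feature that is "not at all important" simply receives 0 points, which the check accepts.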
Other Measurement Scale Issues
A single-item scale collects data about only one attribute of the object.
A multiple-item scale collects data on several attributes of an object.
Two factors help decide whether to use a single- or multi-item scale.
The number of dimensions of the construct.
The reliability and validity.
When phrasing the question element of the scale, use clear wording.
Avoid ambiguity.
Avoid using “leading” words or phrases.
Keep instructions simple and clear.
Make sure scale point descriptors are relevant and adequate.
Pretest and evaluate for reliability and validity.
Misleading Scaling Formats
A double-barreled question includes two or more attributes in the same question, but responses allow comment on a single issue.
Include a question for each attribute or topic.
A leading question influences the respondent’s answers.
A loaded question suggests a socially desirable answer or involves an emotionally charged issue.
Ambiguous questions involve possible responses that can be interpreted a number of ways.
Complex questions are worded in a way making the respondent unsure how to respond.
A double negative question contains two negative thoughts in the same question.
Scale responses should be mutually exclusive.
End of Main Content
www.mheducation.com
Accessibility Content: Text Alternatives for Images
Exhibit 7.1 – Concrete and Abstract Properties of Objects – Text Alternative
The concrete properties of a consumer are their age, sex, marital status, income, brand last purchased, dollar amount of purchase, types of products purchased, and hair or eye color.
The abstract properties of a consumer are attitudes toward a product, brand loyalty, high-involvement purchases, emotions (love, fear, anxiety), intelligence, and personality.
The concrete properties of an organization are the name of the company, the number of employees, the number of locations, total assets, Fortune 500 rating, computer capacity, and types and numbers of product and service offerings.
The abstract properties of an organization are the competence of its employees, quality control, channel power, competitive advantages, company image, and consumer-oriented practices.
Exhibit 7.1 – Concrete and Abstract Features of Marketing Constructs – Text Alternative
The concrete properties of brand loyalty are the number of times a particular brand is purchased, the frequency of purchases of a particular brand, and amount spent.
The abstract properties of brand loyalty are like/dislike of a particular brand, the degree of satisfaction with the brand, and overall attitude toward the brand.
Concrete properties of customer satisfaction are identifiable attributes that make up a product, service, or experience, while abstract properties include liking or disliking of the individual attributes making up the product and positive feelings toward the product.
Concrete properties of service quality are identifiable attributes of a service encounter, for example amount of interaction, personal communications, and service provider’s knowledge.
Abstract properties of service quality are expectations held about each identifiable attribute, and evaluative judgment of performance.
Concrete properties of advertising recall are factual properties of the ad (e.g., message, symbols, movement, models, text), and aided and unaided recall of ad properties.
Abstract properties of advertising recall are favorable or unfavorable judgments and attitude toward the ad.
Exhibit 7.3: Examples of Nominal Scales – Text Alternative
Example one: The respondent is asked to indicate their marital status and is offered five options: married, single, separated, divorced, and widowed.
In the second example, the respondent is asked: Do you like or dislike chocolate ice cream? The respondent can choose one of two options: like or dislike.
The third example asks the respondent: Which of the following supermarkets have you shopped at in the past 30 days? Respondents choose all that apply from: Albertson’s, Winn-Dixie, Publix, Safeway, and Walmart.
The last example asks respondents to indicate their gender and offers three options: female, male, and transgender.
Exhibit 7.4: Examples of Ordinal Scales – Text Alternative
The first example gives respondents the following instructions: We would like to know your preferences for actually using different banking methods. Among the methods listed below, please indicate your top three preferences using a “1” to represent your first choice, a “2” for your second preference, and a “3” for your third choice of methods. Please write the numbers on the lines next to your selected methods. Do not assign the same number to two methods. Respondents have seven choices: inside the bank, drive-in (drive-up) windows, ATM, debit card, bank by mail, bank by telephone, and internet banking.
The second example gives respondents the following instructions: Which one statement best describes your opinion of the quality of an Intel PC processor? (Please check just one statement.) Respondents have three choices: higher than AMD’s PC processor, about the same as AMD’s PC processor, and lower than AMD’s PC processor.
The final example gives respondents the following instructions: For each pair of retail discount stores, circle the one store at which you would be more likely to shop. Respondents have three choices: Costco or Target, Target or Walmart, and Walmart or Costco.
Exhibit 7.5: Examples of Interval Scales – Text Alternative
The first example asks respondents: How likely are you to recommend the Santa Fe Grill to a friend? The respondent is offered a range of numbers from one to seven with one being “Definitely Will Not Recommend” and seven being “Definitely Will Recommend.”
The second example gives respondents the following instruction: Using a scale of 0–10, with “10” being Highly Satisfied and “0” being Not Satisfied At All, how satisfied are you with the banking services you currently receive from (read name of primary bank)? The respondent is offered a single answer space to write their number.
The final example gives respondents the following instruction: Please indicate how frequently you use different banking methods. For each of the banking methods listed below, circle the number that best describes the frequency you typically use each method. Respondents are provided with seven banking methods: inside the bank, drive-up window, 24-hour ATM, debit card, bank by mail, bank by phone, and bank by internet. For each banking method, the respondent is offered a range of numbers from 0 to 10 with 0 being “Never Use” and 10 being “Use Very Often” and respondents rate each method with their chosen number.
Exhibit 7.6: Examples of Ratio Scales – Text Alternative
The first example asks respondents to please circle the number of children under 18 years of age currently living in their household. Respondent is given the numbers 0 to 7 to choose from with an additional option “If more than 7, please specify” and provides space for a written number.
The second example asks: In the past seven days, how many times did you go online to shop at Amazon.com? Respondent is provided a space to write the “# of times.”
The final example asks: In years, what is your current age? Respondent is provided a space to write “# of years old.”
Exhibit 7.7: Examples of Forced-Choice and Nonforced Scale Descriptors – Text Alternative
There are two examples of even-point, forced-choice rating scale descriptors and the same two examples using odd-point, nonforced-choice rating scale descriptors.
The first example is for purchase intention (either not buy or buy). The even-point scale offers the respondent the following options: Definitely will not buy, Probably will not buy, Probably will buy, and Definitely will buy. The odd-point scale offers the following response options: Definitely will not buy, Probably will not buy, Neither will nor will not buy, Probably will buy, and Definitely will buy.
The second example is on personal beliefs or opinions (either agreement or disagreement). The even-point scale offers the respondent the options of: Definitely disagree, somewhat disagree, somewhat agree, and definitely agree. The odd-point scale offers these options: Definitely disagree, somewhat disagree, neither disagree nor agree, somewhat agree, and definitely agree.
Exhibit 7.8: Relationships between Scale Levels and Measures of Central Tendency and Dispersion – Text Alternative
When using nominal scales, the measurements of mode and frequency distribution are appropriate while other measurements are inappropriate.
When using ordinal scales, the mode and frequency distribution are appropriate but the median and range are more appropriate. Both mean and estimated standard deviation are inappropriate.
When using either interval or ratio scales, mode and median are appropriate but the mean is the most appropriate. Frequency distribution and range are both appropriate but estimated standard deviation is most appropriate.
Exhibit 7.10: Example of a Likert Scale – Text Alternative
The instructions to the respondent are: For each listed statement below, please check the one response that best expresses the extent to which you agree or disagree with that statement.
The respondent is presented with four statements and asked to rate the statements as one of the following: definitely disagree, somewhat disagree, slightly agree, somewhat agree, and definitely agree.
The statements are as follows.
I buy many things with a credit card.
I wish we had a lot more money.
My friends often come to me for advice.
I am never influenced by advertisements.
Exhibit 7.11: Example of a Semantic Differential Scale Format for Jennifer Garner as a Credibility Spokesperson – Text Alternative
This example of a semantic differential scale instructs the respondent: We would like to know your opinions about the expertise, trustworthiness, and attractiveness you believe Jennifer Garner brings to Capital One bank card service advertisements. Each dimension below has five factors that may or may not represent your opinions. For each listed item, please check the space that best expresses your opinion about that item.
Responses are broken down into two groups: expertise and trustworthiness and each section has five facets of that descriptor. Each facet has two opposite anchors and respondents are asked to choose from seven spaces between those anchors.
Expertise facets are: knowledgeable-unknowledgeable, expert-not an expert, skilled-unskilled, qualified-unqualified, and experienced-inexperienced.
Trustworthiness facets are: reliable-unreliable, sincere-insincere, trustworthy-untrustworthy, dependable-undependable, and honest-dishonest.
Exhibit 7.13 – Retail Store: Shopping Intention Scale for Casual Clothes – Text Alternative
The respondent is given the following directions: When shopping for casual wear for yourself or someone else, how likely or unlikely are you to shop at each of the following types of retail stores? (Please check one response for each store type.)
There are four types of retail stores with examples provided for each type, including: department stores like Macy’s or Dillard’s; discount department stores like Walmart, Costco, and Target; clothing specialty shops like Wolf Brothers, and Surrey’s George Ltd.; and finally casual wear specialty shops like The Gap, Banana Republic, and Aca Joe’s.
Respondents are to check one of the four responses for each type of store. The four responses are: definitely would shop at (90-100% chance), probably would shop at (50-89% chance), probably would not shop at (10-49% chance) and definitely would not shop at (less than 10% chance).
Noncomparative Rating Scales – Text Alternative
There are two examples of graphic rating scales in this image.
The first example rates the satisfaction of browser descriptors and provides a time line beginning at zero with “not at all satisfied” and ends at 100 with “completely satisfied.” The line is broken up into increments of 10.
The second example uses smiling face descriptors and the faces have seven levels. The lowest level, at level one, has a sad face and the faces get progressively happier, through numbers two-six until the happiest face at level seven.
Comparative Rating Scales – Rank-Order Scales – Text Alternative
The scale provides the respondent with the following instructions: Thinking about the different types of music, please rank your top three preferences of types of music you enjoy listening to by writing in your first choice, second choice, and third choice on the lines provided below.
The respondent is asked to list their first preference, their second preference, and their third preference.
Comparative Rating Scales – Constant Sum Scales – Text Alternative
The respondent is given the following instructions: Below is a list of seven banking features. Allocate 100 points among the features. Your allocation should represent the importance each feature has to you in selecting your bank. The more points you assign to a feature, the more importance that feature has in your selection process. If the feature is “not at all important” in your process, you should not assign it any points. When you have finished, double-check to make sure your total adds to 100.
Respondents are asked to allocate points across seven banking features so that the points total 100.
The banking features are: convenience/location, inside banking hours, good service charges, the interest rates on loans, the bank’s reputation, the interest rates on savings, and online banking availability.