Descriptive Statistics in Marketing
On the one hand, data analysis is a core competence of market research professionals. Thus, any graduate-level market research text will devote hundreds of pages to this topic. On the other hand, the material in those chapters is virtually indistinguishable from what might be found in any text on applied statistics or statistics for the social and behavioral sciences. In that sense, there is little that is unique about the data analyses performed for market research. The same statistical procedures are used across all the social sciences, and many of the practitioners of data analysis in market research will have a PhD in a related discipline such as psychology, sociology, or economics. In fact, three specific statistical procedures, widely used across the social sciences, account for the bulk of the data analyses actually performed in day-to-day commercial market research. The limited goal of this chapter is to give you, as a general manager or engineering manager, a handle on these procedures so that you know what to expect.
Given the brevity of this chapter, its goals must be especially limited. Think of it as a briefing. I can’t teach you how to do data analysis in this space: that requires at least the hundreds of pages found in the standard market research text or, more properly, the years of practice designing and conducting statistical analyses that form the core of most contemporary social science PhD programs. What I can do is give you the names of things and put these names in a context. If this reduces your bewilderment or increases your composure the next time a market research study is discussed, then you should be able to ask more critical questions and ensure that the proposed study meets your needs as a decision maker. In addition, I hope to make you a more critical reader of secondary research and completed market research reports.
Procedure
First, an overview of how quantitative “data” come into existence and get analyzed and reported.
1. In most cases, somehow, some way, an Excel spreadsheet containing numbers corresponding to respondents’ answers gets created. If the procedure was a web-based survey, involved computer-assisted telephone interviewing, or was administered on a computer (as in the case of many conjoint studies), then the Excel spreadsheet is created automatically as part of data collection. If paper forms were used, then someone entered the responses, represented numerically, into Excel. The Excel data are generally in matrix form (“flat file,” in computerese) where the rows correspond to individual respondents and the columns contain a numerical representation of each respondent’s answer to each of the questions administered. For example, if a 5-point scale was used to measure purchase intentions, and that respondent indicated he would definitely purchase the new product when available, a “5” might be entered in the corresponding cell of the spreadsheet.
2. From Excel, the data are typically imported to a specialized statistical analysis program. SPSS (www.spss.com) and SAS (www.sas.com) are two comprehensive packages used by many academics; numerous other such packages exist, including free versions available over the web. Within the statistical package, raw data can be recoded, transformed, aggregated, or disaggregated at will, and virtually any statistical analysis can be performed simply by pulling down a menu and selecting a few options.
3. The data analyst often has a PhD, but routinized analyses on standardized instruments, such as a satisfaction questionnaire, may be performed by MBAs or other people who have accumulated hands-on experience. If the study is a one-off affair, then the analyst will probably generate a variety of tentative analyses never seen by the client as the analyst assimilates the data and decides on a reporting approach. If the study is more routinized, then analysis may be as simple as pushing a button to trigger a canned series of tests. In this case, the analyst sees roughly the same output as the client.
4. The results of the analyses are formatted as tables and embedded into a narrative (which, in routinized cases, such as the nth satisfaction study done within the banking category, may be largely boilerplate).
5. Results are presented to the client and discussed. Depending on the contract and what was paid for, the research firm may attempt to add quite a bit of interpretation to the data, to the point of recommending specific courses of action based on the data. Alternatively, the research firm may confine itself to explaining, clarifying, and defending the validity of the results and the procedures used, leaving substantive interpretation to client management.
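To make steps 1 and 2 concrete, here is a minimal sketch in Python, using only the standard library, of what “recoding” and “aggregating” a flat file amount to. The column names (resp_id, region, intent), the six respondents, and the top-two-box recode (counting a 4 or 5 on the 5-point purchase-intention scale as positive intent) are hypothetical choices for illustration, not features of any particular study or package.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical flat file: one row per respondent, one column per question.
# "intent" is a 5-point purchase-intention scale (5 = definitely will buy).
RAW = """resp_id,region,intent
1,West,5
2,West,3
3,East,4
4,East,2
5,West,4
6,East,1
"""

def load_flat_file(text):
    """Parse CSV text into a list of dicts, converting the scale item to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["intent"] = int(row["intent"])
    return rows

def recode_top_two_box(rows):
    """Recode: 1 if the respondent chose 4 or 5 on the intent scale, else 0."""
    for row in rows:
        row["intent_t2b"] = 1 if row["intent"] >= 4 else 0
    return rows

def mean_by_group(rows, group_col, value_col):
    """Aggregate: mean of value_col within each level of group_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {g: mean(v) for g, v in groups.items()}

rows = recode_top_two_box(load_flat_file(RAW))
print(mean_by_group(rows, "region", "intent"))      # mean rating per region
print(mean_by_group(rows, "region", "intent_t2b"))  # share in top two boxes
```

In SPSS or SAS these steps would be menu operations; the sketch only shows the underlying logic.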
Types of Data Analysis in Market Research
Most data analysis in market research consists of one of the following:
1. Tabulating and cross-tabulating proportions; an example would be agreement with an opinion item cross-tabulated with some other factor such as brand owned or education, in the consumer case, or size of business in the B2B case.
2. Comparing means (averages) across items, groups of customers, or time periods; an example would be total annual expenditure on the product category for males versus females, in summer versus winter, on the West Coast versus the East Coast, or by homeowners versus renters.
3. Predicting an outcome as a function of antecedent variables, as when level of satisfaction is shown to covary with expenditure, length of relationship with the vendor, number of changes in account team personnel, size of customer, and type of service contract.
Many more esoteric kinds of analyses, such as multidimensional scaling or structural equations models with latent variables (to convey the flavor of the jargon), also play a role in market research, but these three types of analyses are the workhorses employed every day. In the case of each of these procedures, individual numbers are compared in an attempt to detect meaningful or real differences. How do customers of Brand A differ from customers of Brand B? Which attributes are worth a lot of money to customers, and which attributes are not worth any money? Which factors serve to increase customer satisfaction, and which have no effect?
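As a sketch of the second workhorse, comparing means, the Python function below compares average annual spending for two groups such as homeowners and renters. It is an assumption-laden illustration: the spending figures are simulated rather than real, the test is Welch’s version of the two-sample t-test, and the p-value uses a normal approximation that is reasonable only when each group has, say, 30 or more respondents. A statistics package would use the exact t distribution.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

def compare_means(a, b):
    """Welch-style comparison of two group means.

    Returns (difference, z, two_sided_p). The p-value uses a normal
    approximation to the t distribution (assumption: large groups).
    """
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))  # Welch standard error
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided tail probability
    return diff, z, p

# Hypothetical data: simulated annual spend in dollars, 200 respondents per group.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
owners = [rng.gauss(520, 150) for _ in range(200)]
renters = [rng.gauss(480, 150) for _ in range(200)]
diff, z, p = compare_means(owners, renters)
print(f"difference = {diff:.1f}, z = {z:.2f}, p = {p:.3f}")
```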
Sometimes the reality of the difference between two numbers is deemed to be obvious, as in the following cross-tabulation:
If the sample is large, further statistical analysis of this comparison only confirms what we see at a glance: owners of Brands A and B hold very different opinions. Much of the time, however, the data look more like this breakdown of the customer base for each of three brands:
Or maybe like this:
As we move from 2 × 2 cross-tabulations to 3 × 5 cross-tabulations, as the results for different groups move closer together, and as the numbers become more abstract (as with the preference ratings), it becomes more and more difficult to say with confidence that customers of Brand A tend to have lower incomes than customers of Brand B, or that different segments place different importance weights on key features.
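For cross-tabulations like these, the usual test is Pearson’s chi-square test of independence, which asks whether the pattern in the table departs from what you would expect if brand and income band were unrelated. The Python sketch below computes the statistic and its p-value with the standard library only; the counts are hypothetical, and the hand-rolled chi2_sf helper is a simple series approximation, not a replacement for a statistics package.

```python
from math import exp, lgamma, log

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution.

    Uses the series for the regularized lower incomplete gamma function;
    adequate for the moderate values seen in typical cross-tabs.
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    k = 1
    while term > 1e-15 * total:
        term *= z / (s + k)
        total += term
        k += 1
    lower = exp(-z + s * log(z) - lgamma(s)) * total
    return max(0.0, 1.0 - lower)

def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical counts: rows = Brands A, B, C; columns = five income bands,
# lowest to highest (a 3 x 5 cross-tabulation).
table = [
    [30, 42, 38, 25, 15],
    [22, 35, 40, 33, 20],
    [18, 25, 37, 40, 30],
]
stat, df, p = chi_square_independence(table)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
```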
As the sampling chapter explained, all these numbers obtained from the sample are only fallible estimates of the true population values. The question becomes, When does a difference between two numbers obtained in the sample represent a reliable, actionable difference, and when is the difference only apparent and dismissible as an artifact of the random variation inherent in all sample data? (Sometimes the question is more naturally thought of as whether two items have a real association; but this reduces to the question of whether the degree of association is different from zero. Hence, I shall refer to “difference” throughout this discussion.)
Modern statistical data analysis arose to address precisely this question: Which apparent differences in data are real, and which are not? It is important to understand that statistical analysis gives us no direct access to the truth. What it does is indicate how likely it is that a difference as large as the one observed in the sample would have arisen by chance alone if there were no real difference in the population. When that likelihood is small, we treat the difference as signal rather than noise.
By convention, an apparent difference is accepted as a real difference if, when the appropriate test statistic is applied, it indicates that a difference this large would arise by chance, in the absence of any real difference, in fewer than 5 of 100 cases. Now, let us unpack that unwieldy sentence. Every kind of data difference has associated with it one or more statistical procedures, and each such statistical procedure allows a computation of a number known as a test statistic. These test statistics are computed using the same sorts of assumptions about probability and probability distributions as underlie the discussion of sample size.
Suffice it to say that when you calculate a test statistic, you acknowledge that if the study were repeated with a new probability sample, you would not get exactly the same results each time. This variability follows from the fact that you are drawing a limited sample from a very large population. The mathematics underlying the test statistic then envisions repeating the study an infinite number of times, supposing there were no real difference in the population, and uses the data in the present sample to estimate how often, in that infinite series of repetitions, you would get a difference this big by chance alone. If the answer is fewer than 5 in 100, the result is conventionally marked with an asterisk and a footnote that reads something like “p < .05.”
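The “repeat the study many times” idea can be simulated directly. The Python sketch below assumes two segments of 200 respondents each, drawn from a population in which both segments share the same true agreement rate (47.5 percent here, a hypothetical figure), repeats that study a few thousand times, and counts how often chance alone produces a gap of 9 percentage points or more. The function name simulated_p_value and all parameter values are illustrative.

```python
import random

def simulated_p_value(observed_diff, rate, n_per_group, reps=5000, seed=1):
    """Estimate how often a gap at least as large as observed_diff appears
    by chance alone.

    Each repetition draws both groups from a population with the SAME true
    agreement rate, so any gap in the simulated samples is sampling noise.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        agree_1 = sum(rng.random() < rate for _ in range(n_per_group))
        agree_2 = sum(rng.random() < rate for _ in range(n_per_group))
        gap = abs(agree_1 - agree_2) / n_per_group
        if gap >= observed_diff:
            hits += 1
    return hits / reps

# Hypothetical study: a 9-point gap (e.g., 52% vs. 43% agreement) with
# 200 respondents per segment, simulated under a common rate of 47.5%.
print(simulated_p_value(observed_diff=0.09, rate=0.475, n_per_group=200))
```

Because it uses a finite number of simulated repetitions rather than an infinite series, the estimate wobbles slightly from seed to seed; an analytic test in a statistics package gives the limiting value.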
The important thing to retain from this discussion is that statistical analysis is simply a set of mathematically based conventions for determining when you can accept an apparent difference as a real difference. More pointedly, if no statistical analysis has been applied, and the difference is not of the order of 80–20 versus 20–80, you ought not to assume that the apparent difference is a real one. If you are reading a report that makes much of certain apparent differences (say, 52 percent agreement in segment 1 versus 43 percent agreement in segment 2) but includes no statistical analysis, then you should become very suspicious. Similarly, if there is a plethora of statistical analyses but the sample is not a probability sample, then you should question how much weight can be placed on the results. Significance tests assume a probability sample; applied to a nonprobability sample, their results are of unknown validity. You simply don’t know whether to accept them or not.
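Whether a gap like 52 versus 43 percent clears the p < .05 bar depends heavily on sample size, which such reports sometimes omit. The Python sketch below applies the textbook two-proportion z-test (a large-sample normal approximation) to that gap under two hypothetical sample sizes, 200 and 500 respondents per segment.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_test(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two sample proportions.

    Uses the pooled proportion for the standard error, the usual
    large-sample textbook form. Returns (z, p_value).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Same 52% vs. 43% gap, two hypothetical sample sizes per segment.
for n in (200, 500):
    z, p = two_proportion_test(0.52, n, 0.43, n)
    print(f"n = {n} per segment: z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions, the same 9-point gap gives a p of roughly .07 with 200 respondents per segment but well under .01 with 500, which is why a report should state both the test and the sample size.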
Managerial Perspective on Data Analysis
As a general manager receiving the results of data analyses, your primary responsibility is to understand that apparent differences need not be real differences. It behooves a general manager to have a deep humility concerning the ability of any research endeavor to produce a picture of the world as it truly is. Given this stance, you know to be skeptical as soon as you are intrigued. That is, when you see a difference that seems actionable or would resolve an uncertainty, your next response should always be, Is it real? Your second responsibility, then, is to understand the role played by statistical analysis in vetting apparent differences. You must have the discipline not to accept reported differences at face value, absent an appropriate test. Your final responsibility is to accept that even statistical analysis delivers a judgment about odds, not a certainty. It takes quite a bit of tough-mindedness to accept that some statistical judgments indicating that a difference is real will be wrong: whenever no real difference exists, about 5 tests in 100 will still reach p < .05. Because general managers may encounter hundreds of data comparisons in a year, they can be virtually certain that some differences vetted as real are not. As I said at the outset, market research reduces uncertainty but cannot eliminate it.
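The arithmetic behind “virtually certain” is short. Assuming a batch of independent comparisons in which no real difference exists, each tested at p < .05, the Python sketch below computes how many false alarms to expect and the chance of getting at least one. Independence is a simplifying assumption; comparisons drawn from the same survey are usually correlated.

```python
def false_alarm_odds(num_null_comparisons, alpha=0.05):
    """For comparisons where NO real difference exists, return the expected
    number flagged significant at level alpha and the probability that at
    least one is flagged. Assumes the comparisons are independent."""
    expected = alpha * num_null_comparisons
    at_least_one = 1 - (1 - alpha) ** num_null_comparisons
    return expected, at_least_one

for k in (1, 20, 100):
    expected, prob = false_alarm_odds(k)
    print(f"{k:>3} null comparisons: expect {expected:.1f} false alarms; "
          f"P(at least one) = {prob:.3f}")
```

With 100 such comparisons, about five false alarms are expected, and the chance of at least one is roughly 99.4 percent.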
Put in more positive terms, if a good sample was obtained (that is, a correctly sized probability sample), if two key numbers appear to be really different (p < .05), and if you have no countervailing data or experience, then you should feel comfortable acting on these results. The odds are strongly in favor of the data being vindicated. But remember: It is never 100 percent certain.
Dos and Don’ts
· Don’t be intimidated by statistical tests, no matter how scanty your education in this area. From a managerial standpoint, every test, no matter how esoteric, is just an effort to mark out certain findings as real and noteworthy.
· Do be suspicious of any quantitative research report that doesn’t include statistical tests. Without a test, how are you to know which nominal differences can support a decision?
· Don’t accept a statistical test at face value in the absence of information about the sampling procedure and a judgment as to whether it was a probability sample.
· Do acquire a habit of skepticism when presented with table after table of numbers, and slide after slide of full-color charts and graphs. Which differences are real, and which are just random variation?