8    Interestingness of Argument

In this chapter and the next, we broaden our discussion of the narrative aspects of statistical claims. We ask what makes a statistical claim interesting to a research audience. This is an important issue, because when a statistical story becomes a conversation piece, further research is likely to be generated. If a claim is so blah that no one cares to read or talk about it, the chances are small that it will enter the lore of a field—much less stimulate further investigation. Thus high interest acts as a magnifier, and low interest as a filter, shaping the body of lore in the direction of more interesting claims.

Yet the nature of interestingness is elusive. Philosophers (Davis, 1971), psychologists (Hidi & Baird, 1986; Tesser, 1990), computer scientists (Schank, 1979; Wilensky, 1983), and others have grappled with this concept. After a preliminary discussion, we focus on the question of what makes research claims theoretically interesting, and only make passing reference to popular interest, or pizazz.

CAN STATISTICS BE INTERESTING?

At the outset, we must confront the widespread stereotype of statistics as a dull subject. (When working on this book, friends and acquaintances would ask me what it was about. “Statistics,” I would say. “Oh… Yes…”, would come the reply. “…And how is your family?”)

Interesting Claims and Interesting Methods

The reputed dullness of statistics is often assumed to spread like some musty odor, covering everything statistical with a layer of suffocating tedium. Students burdened with this stereotype fail to realize that the point of a statistical argument can be interesting, even if the technical substance of its rhetoric is somewhat dry. What is more, in some cases a clever statistical analysis can itself be interesting in the way it manages to reveal something not previously known or properly understood.

Example: A Case of Disputed Authorship. A topic may not of itself be of great importance to nonspecialists, but a statistical story about it may be interesting because of the unexpected use of a pattern of clues, much as in a satisfying detective story. In a classic example of a scholarly “whodunit,” statisticians Mosteller and Wallace (1964) set out to infer the true authorship of several Federalist Papers long in dispute as between James Madison and Alexander Hamilton. For many years, inconclusive debate had raged about stylistic similarities between the unattributed papers and the Federalist Papers known to have been written by Madison and by Hamilton, respectively. Authorship arguments based on ideological content had gotten nowhere, and scholars had begun to look at quantitative indices such as sentence length or average numbers of subordinate clauses per sentence. After study, Mosteller and Wallace rejected these methods; instead, they counted particular word usages (such as while versus whilst) in the disputed papers and the known papers of these two authors. With the aid of Bayesian reasoning they came to the conclusion that the contested manuscripts were almost certainly penned by James Madison. It is surprising that idiosyncrasy in skilled human expression could be much more readily identified by very concrete details than by general stylistic tendencies.

The Statistician as Grinch. Interest in the authorship example requires academic curiosity. There are many other examples of clever statistical detective work, dealing with hotter topics than Hamilton and Madison, and bearing on beliefs held by the general public. In chapter 1, we mentioned the statistical sleuthing by Carroll (1979) that exposed the flaw in the claim of special longevity for orchestra conductors. In chapter 2 we saw a statistical reinterpretation of the legendary hot hand in basketball, and in chapter 5, the revelation of a statistical peculiarity in some mental telepathy data. Also in chapter 5, we presented an incisive debunking job on the supposed baby boom from the New York blackout.

Copyright © 1995. Psychology Press. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.

EBSCO Publishing : eBook Collection (EBSCOhost) - printed on 11/2/2021 8:37 PM via SAINT LEO UNIVERSITY AN: 19299 ; Robert P. Abelson.; Statistics As Principled Argument Account: stleocol

In many such examples, the statistician is cast in the role of a skeptical investigator who does not readily accept a popular explanation of some newsworthy phenomenon. His statistical reanalysis impugns the credibility of an existing belief, showing that the true magnitude of a hypothesized effect was nil, or its basis artifactual. As we elaborate next, the potential to change belief is characteristic of interesting statistical stories.

In cases where beliefs are spoiled, a regrettable side effect is that statisticians are made to seem like grinches who loiter about, waiting for opportunities to snatch legends from unsuspecting populaces. In practice, the public is protected from such unpleasantness by its imperviousness to statistics, either because skeptical reanalyses are not sufficiently publicized, or because the public is inattentive or disbelieving.

Like the general public, researchers do not readily abandon their pet hypotheses either. But they must face evidence and argument more squarely than the public does. Debate over the interpretation of results is commonplace, and interestingness plays a role in these confrontations.

THEORETICAL INTEREST

Let us turn to the concept of theoretical interest. Here we are concerned specifically with the interestingness of research claims based on statistical evidence; thus, we might equally well use the term scientific interest. This might be variously defined, but for our purposes the following conception is appropriate: A statistical story is scientifically interesting when it has the potential to change what scientists believe about important causal relationships.

Change of Belief

The key concept is change of belief, which could consist of strengthening old or creating new beliefs, of weakening existing beliefs, or of modifying beliefs depending on context. New results may create a disparity between observation and expectation, putting pressure on research audience members to re-examine the basis for their expectations, which in turn may change their beliefs. In other words, research must be surprising in order to play a role in potential belief change. Thus interest arises from surprising results on an important issue.

Note that we refer to potential belief change. An investigator may make claims that are not accepted, and therefore do not actually change what people believe. If flaws in the conduct and analysis of the research are obvious, it might be dismissed out of hand, and never even be seen as interesting. In other cases, acceptance depends on the persuasive force of the statistical evidence. This depends on the magnitude (chap. 3), articulation (chap. 6), generality (chap. 7), and credibility (chap. 9) of the effects. Claims that are highly surprising and of great theoretical (or applied) consequence excite great interest and great skepticism simultaneously. The claim by Wilson and Herrnstein (1985) that criminality is genetically transmitted is an illustration of a startling proposal that invokes resistance.

Important beliefs are not readily changed in the typical research community. Beliefs acquire their importance by being anchored in networks of interrelated propositions, often as part of a theory. Change in one belief usually entails changes in others, which in turn imply still further changes, foreshadowing an unwelcome cascade of alterations. As a result, even if the surprising claim is persuasive on paper, cycles of argument and counterargument, to say nothing of further research, may be necessary before beliefs change. During the period following the claim, interest hangs in limbo, the research community collectively not knowing whether to take the claim seriously. Attitudes toward the claim during this limbo period could be paraphrased as, “It seems interesting, but….”

The tension might finally be resolved by the acceptance, rejection, or modification of the initial claim. Following the change or reaffirmation of all the relevant beliefs, the claim loses current interest. Alternatively, the research community may divide into camps with rival beliefs, in which case interest may last until both sides exhaust their ideas on how to do further useful research.

Too Incredible to Be True. In the extreme, if a claim were so totally bizarre that almost anyone hearing it would immediately dismiss it as incredible, then its interest value for all but true believers would be nil. To give a cockeyed geophysical example, I once heard about a theory that the earth was spherical, all right, but with everyone living on the inside of the sphere, and the heavens in the center. This theory is not an interesting topic for empirical research (though it might be interesting as a delusion), because there are many ways to falsify it with existing knowledge. The incredible claim by Philpott (1950) of an infinitesimal mental time unit, discussed in chapter 3, is a case in point from the psychological literature.

SURPRISINGNESS

Our emphasis on surprisingness is consistent with the advice of Leon Festinger and his students that counterintuitiveness should be a major criterion for good research hypotheses. He argued that if you performed a piece of research that provided evidence for something your grandmother could have told you already, then you had wasted your time. In this view, for example, it is uninteresting to show that individuals dislike people who disagree with them, or that people will exert greater effort for a larger reward. This doctrine has a weak spot, however: Sometimes what your grandmother (or anyone else) “knows” coexists with knowledge of its opposite. (“Out of sight, out of mind,” plus “Absence makes the heart grow fonder.”)

I agree, however, that when an unambiguous prediction of a folk theory or a scientific theory is generally believed, it is usually more interesting to cast doubt on it than to provide evidence strengthening it. Also interesting would be research and statistical analysis illuminating the particular circumstances under which the existing theory holds or fails to hold. According to McGuire's (1989) perspectivist view of psychology, conditions can virtually always be arranged under which any given relationship, even the most obvious, can be reversed. It is instructive to take examples and try to dream up what such conditions might be. When, perchance, might people show increased liking for someone who had been critical of them? (When they knew they had performed badly, and deserved criticism. See Deutsch & Solomon, 1959.) The bottom line in all this is that to be interesting, a result has to make you think about the topic—or at least make you want to think.

Surprising Ticks in a New Area

One way to create surprise is with a research initiative on a neglected topic, producing results that bruise our intuitions or seem to defy logic.

Example: Milgram's Study of Obedience. A sensation was created by Milgram's (1963) obedience study, in which a majority of ordinary people were induced by an “experimenter” to deliver apparently dangerous levels of electric shock to a helpless victim. Hardly anyone predicted this outcome. Every time a new tick surprises us because it contradicts common wisdom, there are actually two interrelated changes of belief we are called upon to make. One is the acceptance of the reality of the new phenomenon, and the other is a diminution in the perceived force of the prior wisdom.

In the Milgram (1963) case, the common assumption is that evil things are done by evil people, not by evil situations. This is a hard assumption to free ourselves from, as it seems to provide a simple explanation of many events in the social world, and it is a tenet that often arises in law, politics, and religion. Milgram argued against the evildoer theory of cruelty by characterizing the people who obediently deliver the shocks as obedient sheep rather than predatory wolves. Even with the intellectual help of this metaphor, however, it is still difficult not to feel repeatedly surprised every time one thinks of the Milgram study.

Example: Comprehension and Belief. In a less dramatic, yet quite important way, the research of Daniel Gilbert (1991) provides another example that promotes the replacement of a lifelong presupposition in favor of a new conception. Gilbert was concerned with the general relationship between one's comprehension of novel statements and one's belief in them.

The usual view of the matter is that when a statement is presented, the first cognitive task for the receiver is to comprehend it. If the statement is understood, a subsequent decision is made whether to believe or disbelieve it. Since the early years of persuasion research (Hovland et al., 1953), investigators have posited that the comprehension of a persuasive passage is followed by a process of acceptance or rejection. In fact, this conception predates psychological research by more than three centuries, going back at least to Descartes.

A radical alternative view, credited to Spinoza, is that comprehension entails initial belief, following which there is the possibility of later unbelief. In this way of putting the matter, belief is the default state. Barring an active subsequent process of rejection, “perceiving is believing.”

If the Spinozan view is correct, then early interruption of the cognitive processing of each of a mixed set of true and false statements ought to yield a bias toward accepting the false ones as true, compared to a control condition with no interruption. An appropriate experiment by Gilbert (1991) yielded just such a surprising result. This will probably require a great deal of rethinking of the relation between comprehension and belief.

Accumulating Buts

In chapter 7, on generality, we discussed the various kinds of replication attempts that may follow on the publication of claims in a new area. Replications are especially likely when the initial study is interesting.

Suppose that almost every replication agrees with the initial claim. If we imagine a cumulative meta-analysis updated after each new replication, the changes in the estimated overall effect size will tend to be smaller and smaller, thus less and less surprising and interesting. After 20 studies, we usually won't know much more than we did after 19 studies.
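The shrinking updates described above can be sketched numerically. The following is a minimal illustration, not a full meta-analytic procedure: it assumes an unweighted running mean (real meta-analyses usually weight studies by precision), and the 20 effect sizes are made-up values chosen only so that every replication roughly agrees with the first study. The function names are hypothetical.

```python
from typing import List


def cumulative_means(effects: List[float]) -> List[float]:
    """Running unweighted mean of effect sizes after each new study."""
    means = []
    total = 0.0
    for k, effect in enumerate(effects, start=1):
        total += effect
        means.append(total / k)
    return means


def successive_changes(means: List[float]) -> List[float]:
    """Absolute change in the running estimate caused by each new study."""
    return [abs(later - earlier) for earlier, later in zip(means, means[1:])]


# Hypothetical effect sizes from 20 replications that all roughly agree.
effects = [0.50, 0.42, 0.55, 0.47, 0.52, 0.44, 0.51, 0.49, 0.46, 0.53,
           0.48, 0.50, 0.45, 0.54, 0.47, 0.51, 0.49, 0.52, 0.46, 0.50]

means = cumulative_means(effects)
changes = successive_changes(means)

print(f"change in estimate after study 2:  {changes[0]:.4f}")
print(f"change in estimate after study 20: {changes[-1]:.4f}")
```

Because the k-th study carries only 1/k of the weight in the running mean, its capacity to move the estimate, and hence to surprise, falls as studies accumulate.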

This decrease of interest with increasing replication is especially pronounced if replications are performed mindlessly, each time with some haphazardly chosen, minor variation. The way to revive interest in an effect is to find contextual factors that qualify it, either by nullifying it, or (even more interesting) reversing it. Hopefully, such qualifying factors would be meaningfully related to the effect in question. If a particular laboratory effect failed to occur only when the moon was full, that might make an interesting story, but it would be incoherent unless further explicated. (See chap. 9.)

When many replications are carried out, it is quite likely that under some conditions there will be failure to reproduce the original result. If these failures are credible (see chap. 9), then the initial tick will accumulate buts.

Example: Butting an Early Claim of Dissonance Theory. In the dissonant situation in which someone is rewarded for speaking out contrary to his original beliefs, the claim coming out of the Festinger and Carlsmith (1959) study was that a small reward ($1) is more influential than a large reward ($20) in causing the person to change his beliefs (chap. 2).

Many investigators subsequently performed similar studies (with smaller overall rewards to meet criticism of the $20 amount), and at least three clear buts emerged. Linder, Cooper, and Jones (1967) showed a qualitative interaction in a 2 × 2 design: When subjects were given a choice whether to write an essay counter to their opinions, a group paid 50¢ indeed changed their opinions more than a group paid $2.50, but when subjects had no choice, the relative effects of the rewards reversed. Helmreich and Collins (1968) tested whether public commitment by the subject was a necessary condition for replicating the dissonance prediction. They had subjects agree to argue against their own point of view, some to make a videotape with their names and faces prominent, and others to record their statements anonymously on audiotape. Under the public video manipulation, small reward was more effective than large reward, but under the anonymous audio condition, the effect of reward disappeared.

A third limiting condition was the nature of the consequences likely to follow from the counterattitudinal performance. Nel, Helmreich, and Aronson (1969) gently persuaded some squeaky clean Texas undergraduates to prepare to speak for the virtues of marijuana to an audience of nonsmoking young high school students, and others to speak to a college audience already favorable to marijuana. The former audience was the more consequential one, because the speech might push the innocents down the path to drug abuse. Dissonance predictions held only for this consequential condition, not for the speech to the already corrupted collegians.

The upshot of these studies (and others; we have simplified matters) was the eventual appearance of summary statements in journals and books, to the effect that opinion or belief change from expressing the other side of an issue is indeed greater when the reward is smaller, but only if the behavior is voluntary, publicly committal, and perceived to have negative consequences.

Actually, the first qualification, that subjects should think they had a free choice, had been predicted by dissonance theory (Brehm & Cohen, 1962). The necessity for subjects to be committed to their behavior was strongly implicit in the theory as well. The requirement that the subject's behavior have negative consequences did not flow so readily from the theory, and required some effort to make it fit. The procedures of the Festinger and Carlsmith (1959) experiment, it should be noted, satisfied the three constraints, although only the factor of choice was explicit. This could be considered lucky for the investigators, although I suspect that Festinger knew intuitively what sort of scenario would work.

Interest in a striking phenomenon tends to be maintained when its prevailing explanation does not quite cover all the empirical buts. Typically there is tension between the integrity of a theory and the need to stretch it to accommodate qualifications. If the theory is stretched, it may no longer be the same theory. An often cited example is the addition of ugly epicycles to make Ptolemy's conception of the heavenly bodies fit orbital observations.

In the dissonance case, the account of self-persuasion for a small reward has gradually been bent into a different form. The subject is no longer seen as motivated merely by inconsistency between his beliefs and his behavior. When he voluntarily agrees to exercise a harmful public deception for a payment of 50¢ or $1, he is making a fool of himself. Belief change can be seen as an attempt to justify an otherwise sleazy performance (Aronson, 1969). Under this view, variously articulated by several social psychologists, the subject is motivated by self-esteem maintenance rather than mere inconsistency reduction.

Thesis, Antithesis, Synthesis

In a very thoughtful article, Tesser (1990) discussed the nature of interesting ideas in psychological research. He suggested that ideally, psychological hypotheses should be about processes rather than static abstractions, and that empirical results should tell a story. (As he put it, if you want to avoid boring ideas, tell yourself “process, process, process, plot, plot, plot.”)

One of his formulas for producing surprise and interest is this: For any thesis, generate the antithesis, and propose a synthesis. In our lingo, start with a tick, find a but for it, and finally reframe the issue so that the but becomes another tick. In one example Tesser (1990) gave, the thesis is that an outstanding performance by someone produces jealousy in those close to him. But the opposite can also be demonstrated, whereby an excellent performance arouses pride in close others, a “basking in reflected glory” (Cialdini et al., 1976). The resolution of this contradiction (Tesser, 1988) is that what is at stake for the close other is the maintenance of his self-esteem. (The reappearance here of this particular concept is coincidental.) If the outstanding performance is on some activity that is relevant to his self-esteem, he is likely to be jealous; if it is irrelevant, he will bask in reflected glory.

Quantifying Surprisingness

A Basic Formula. If we attend not merely to ticks and buts, but also to the magnitudes of expected and observed effects, we can roughly quantify our intuitions about surprisingness. In chapter 3 we mentioned a surprisingness coefficient (S); we now formalize it.

That a result is surprising means that it is much stronger or weaker, or even in the reverse direction from what we expected. This suggests that we choose some directional magnitude measure by which expectation and observation can be compared. In the simplest type of Bayesian analysis, the probability of a hypothesis is assessed after the data are gathered, and can be compared with the prior probability. For the reasons given in chapter 3, however, an observed effect size in relation to the expected effect size seems a preferable magnitude concept to use for indexing surprisingness.

The measure of effect size could be a mean difference, raw or standardized; a correlation coefficient (Rosenthal, 1991); or a causal efficacy, objective or subjective. Staunch Bayesians could, if they wished, stick with probability as a measure. Denoting the measure of magnitude by m, our surprisingness coefficient is:

S = (m[o] - m[e])^2 / (|m[o]| + |m[e]|)

Here, m[o] is the observed magnitude of effect for the comparison of interest in a specified context, and m[e] is the expected magnitude, assuming general consensus for the expectation. A null outcome would set m[o] to zero, and a null expectation would set m[e] to zero. (If the study concerned a novel relationship about which there was no expectation one way or another, then m[e] would be undefined, and the formula inapplicable.) Effect sizes are of course directional; when m[e] and m[o] are opposite in sign, this indicates that the outcome was the reverse of what was expected. The rationale for the formula is that it is the simplest expression intuitively capturing how surprise arises as a function of effect size, with the constraint that we want the measure of surprise to be in the same units as the measures of effect size.

TABLE 8.1
Potential Values of the Surprisingness Coefficient

Case   Expected Effect Size   Observed Effect Size   Surprisingness Coefficient
A      0                      .5                     .5
B      .1                     .5                     .27
C      .3                     .5                     .05
D      .5                     .5                     0
E      .5                     .3                     .05
F      .5                     0                      .5
G      .5                     -.5                    1.0
H      .7                     -.7                    1.4
I      u                      -v                     u + v (when u, v > 0)

Behavior of the Coefficient. It proves helpful to play with the aforementioned formula, inserting different hypothetical values for the effect magnitude. For simplicity of illustration, let us take the correlation coefficient as the measure m, potentially ranging from -1 to +1. Table 8.1 specifies the coefficient S for several basic situations.

As the table indicates for Cases A–D, when the observed effect size is .5 (i.e., when the observed correlation between some putative cause and its claimed effect is .5), the surprisingness coefficient S is itself .5 when the expected correlation was zero, but declines very sharply as the expected correlation increases. Of course, if the expected correlation were also .5, there would be no surprise whatever.

In symmetrical fashion, if we fix the expected correlation at .5, and ask how S varies as the observed correlation falls off below .5 (Cases D–G), we find that surprisingness increases. This increase is gradual at first, but when the observed correlation is zero in the face of the expectation that the correlation should be about .5, S is again .5. If we expect .5 and we get minus .5, surprisingness doubles, to 1.0. As Cases H and I indicate, surprisingness can go even higher, if expectation and observation are both strong, and opposite in direction. Using correlation as the effect size measure, the maximum value of S is 2.0.
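The numeric rows of Table 8.1 can be recomputed from the formula as a check. The sketch below is only an illustration: the function name `surprisingness` is hypothetical, and returning 0 when both magnitudes are zero is an assumption of the sketch (the formula itself is 0/0 in that case).

```python
def surprisingness(m_o: float, m_e: float) -> float:
    """S = (m_o - m_e)^2 / (|m_o| + |m_e|).

    Returns 0.0 when both magnitudes are zero; that convention is an
    assumption of this sketch, since the formula is 0/0 there.
    """
    denom = abs(m_o) + abs(m_e)
    if denom == 0:
        return 0.0
    return (m_o - m_e) ** 2 / denom


# (case, expected, observed) for the numeric rows of Table 8.1.
cases = [("A", 0.0, 0.5), ("B", 0.1, 0.5), ("C", 0.3, 0.5), ("D", 0.5, 0.5),
         ("E", 0.5, 0.3), ("F", 0.5, 0.0), ("G", 0.5, -0.5), ("H", 0.7, -0.7)]

table = {case: surprisingness(m_o=obs, m_e=exp) for case, exp, obs in cases}
for case, s in table.items():
    print(f"Case {case}: S = {s:.2f}")

# With correlation as the measure, the extreme case is an expectation of +1
# met by an observation of -1 (or the reverse).
s_max = surprisingness(m_o=-1.0, m_e=1.0)
print(f"maximum S for correlations: {s_max:.1f}")
```

Case I follows the same arithmetic: with an expectation of u and an observation of -v, the numerator is (u + v)^2 and the denominator is u + v, leaving u + v.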

Blurring of the Coefficient. According to our formula, the absence of an effect that everybody expected is as surprising as the presence of an effect that nobody expected. This seems to be a reasonable intuition, although we should note a qualification. Expectations can sometimes be sharply focused at the exact value of zero; for example, doubters of ESP can believe that there simply is no such thing. By contrast, outcomes are never sharply focused at exactly zero (or at any other exact value, for that matter); there is always some confidence interval within which the results may be said to lie. (Gilovich et al., 1985, didn't prove the nonexistence of the hot hand in basketball; their results only imply that the effect, if any, is of limited magnitude.) Demonstrations of the absence of an effect are thus apt to be more uncertain than expectations of its absence. Accordingly, allowance should be made for the blurring of an m[o] of zero (or any other fixed central value) throughout some range around zero (or other central value). Actually, if we follow this line of reasoning, we should also allow for the blurring of expected effect magnitudes other than zero. The general case, then, would be one in which m[o] and m[e] had probability distributions, the only fixed case being an occasional m[e] of zero. In all cases, the coefficient S would have a distribution, rather than a single value. However, this analysis carries us beyond the level of sophistication we need or wish to achieve here. When we refer to m[o] and m[e], therefore, we are supposing that central values for each are sufficiently exact for our purposes.
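For readers curious about what a distribution of S might look like, here is a rough simulation sketch. It is not part of the analysis above and rests on several assumptions: the blurring is taken to be normal, the spread of 0.1 is an arbitrary illustrative value, and the function names are hypothetical. It treats the case singled out in the text, an expectation fixed exactly at zero paired with a null outcome that is blurred around zero.

```python
import random
import statistics


def surprisingness(m_o: float, m_e: float) -> float:
    """S = (m_o - m_e)^2 / (|m_o| + |m_e|), with 0.0 assumed for the 0/0 case."""
    denom = abs(m_o) + abs(m_e)
    if denom == 0:
        return 0.0
    return (m_o - m_e) ** 2 / denom


def blurred_s(center_o, spread_o, center_e, spread_e, draws=20000, seed=0):
    """Sample S when m[o] and m[e] are blurred around central values.

    Normal blurring is an assumption of this sketch; a spread of 0 keeps
    the corresponding magnitude fixed at its central value.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        m_o = rng.gauss(center_o, spread_o)
        m_e = rng.gauss(center_e, spread_e)
        samples.append(surprisingness(m_o, m_e))
    return samples


# Expectation sharply focused at zero; observed null effect blurred around zero.
samples = blurred_s(center_o=0.0, spread_o=0.1, center_e=0.0, spread_e=0.0)
point_s = surprisingness(0.0, 0.0)
mean_s = statistics.mean(samples)

print(f"S at the central values: {point_s:.3f}")
print(f"mean of blurred S:       {mean_s:.3f}")
```

Under these assumptions S is no longer a single value: even when both central values are zero, the blurred outcome leaves a small positive average surprise.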

Heterogeneous Initial Beliefs

Although we are choosing not to analyze variations within research observers, we must confront variations between observers. This is because, in any real research situation, different investigators may have different beliefs about the magnitude (or even the very existence) of a given phenomenon.

Consider the simplest case of heterogeneous beliefs, in which there are two groups of scientists, each with a different level of belief in a theoretically important phenomenon. Suppose that for the larger of the two groups, the expected effect size is zero; that is, they disbelieve the existence of the phenomenon. The minority group has some positive expectation, m[e]. Imagine that members of this group want to convince the skeptical majority. They run an experiment on the phenomenon, and observe a positive result with effect size m[o] precisely equal to their expectation m[e]. The surprisingness of this result stands to be different for the two groups: For the skeptical majority, the surprisingness coefficient apparently equals m[o], the magnitude of the unanticipated effect; for the minority group, the coefficient equals zero, because the result is exactly what they expected.

At this point, if the two groups were rivalrous and uncivil, the dialogue between them might be caricatured thus:

Minority: There, you see! That result ought to surprise you!
Majority (defensively): Doesn't surprise us at all. It's only an illusion. Your experiment is flawed, and we don't accept your claim.
Minority: And why is that?
Majority: Because… [ARGUMENT].
Minority: But… [COUNTERARGUMENT]! And furthermore….
Majority: Don't waste your breath. We're not interested.
Minority: And you call yourselves scientists!

Here, one group tries to command the attention of the other by surprising them with data they don't expect, and (because surprisingness along with importance creates interest) thence to interest them. The second group can appear unsurprised and disinterested by declining to accept the validity of the data. This of course requires the development of an argument criticizing the claim made by the first group, possibly supported by the presentation of a replication that fails to confirm the claim (see chap. 9). If the majority is unable to damage the claim by the minority, but is not yet willing to give up, their state of mind might be described as reluctant interest: They have to pay attention.
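The two surprisingness values in the two-group case follow directly from the formula for S. As a worked check, writing m_o and m_e for m[o] and m[e], and assuming (as in the scenario) that the minority's expectation is positive and exactly confirmed, so that m_o = m_e > 0:

```latex
% m_o and m_e stand for m[o] and m[e]; the minority's prediction is confirmed, so m_o = m_e > 0.
\begin{align*}
S_{\text{majority}} &= \frac{(m_o - 0)^2}{|m_o| + |0|} = \frac{m_o^2}{m_o} = m_o, \\
S_{\text{minority}} &= \frac{(m_o - m_e)^2}{|m_o| + |m_e|} = \frac{0}{2\,m_o} = 0 .
\end{align*}
```

The same data are therefore maximally informative for one group and wholly unsurprising for the other, provided the skeptical group accepts the data as valid.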

Contextual Qualification of Beliefs

Let us elaborate our analysis of surprisingness to apply to the case in which the expected effect size is of magnitude r, but the observed effect is contingent upon some context variable. In the presence of a particular context variable, the result indeed comes out to be of magnitude r, but in its absence, the effect vanishes. How surprised ought the observer to be?

A simple approach to this type of question is to take the mean of the surprisingness coefficients calculated with the context factor present and with it absent, respectively. In the specified case, the coefficients for these two situations are 0 and r, yielding an average surprisingness of r/2. In other words, when an observer expects a general effect, and it fails in half the situations, he will be half as surprised as he would be were the expected effect to fail universally. This is of course a rough way of stating our intuitions.

If the observer were to anticipate that the context variable might make a difference, she could have different expectations about the effect size for the context-present and context-absent situations. The observer might be somewhat off in one or both of her two expectations, and this would occasion a modicum of surprise. If, on the other hand, she correctly foresaw the two respective effect sizes, her surprise would be zero. Of course, other researchers might be surprised and interested, and their beliefs might change.

The process of introducing further context variables that produce unanticipated differences, and therefore elicit surprise, could in principle continue indefinitely. However, the greater the number of combinations of context factors, the less impact minor contextual refinements would have—which is another way of saying that research areas tend to lose interest when everyone learns to understand approximately what outcomes to expect under the most crucial circumstances.

IMPORTANCE

Interestingness, we have said, depends on importance as well as surprisingness. The importance of any single empirical result is a direct function of the number of consequences it has for relationships between variables pertinent to the issue at hand. The importance of the issue, in turn, depends on its density of connections to other (important) issues. Insights about cancer are more important than insights about callouses, because more people (and more biological and psychological phenomena) are more deeply affected by the former. This example is obvious, and may make it seem that a judgment of importance is easy. In fact this is a hard judgment to make, especially for theoretical rather than applied research, because the ramifications of theories are often difficult to anticipate.

Differences in Importance for Different Investigators

What seems important to some investigators may seem unimportant to others. If I have heard somewhere that all mammals have periods of rapid eye movements (REM) during sleep, and I idly run across a research report that there is an Australian armadillo that does not show REM sleep, I will hardly be riveted with fascination. I care so little one way or the other that as I write this, I'm not even sure I have my facts straight. Of course, there will be researchers or others who have concerned themselves with REM sleep or armadillos, who would find the new fact interesting. For me, the matter is peripheral; for them, it is central.

The Illusion of Importance. Indeed, scholars of a particular topic are prone to generate dense networks of conceptual relationships within the topic area, thus lending by the sheer weight of number of relationships an aura of apparent importance to each contribution to the topic.¹⁰ But to nonspecialists in the area, the topic might have very little importance, because knowledge gained therein does not shed much light on the understanding of other topics. We refer to this phenomenon as the illusion of importance.

This skeptical characterization of narrow, ingrown fields of research may seem unfair, because one cannot confidently anticipate whether connections to other research fields or practical applications will be forthcoming in any given case. In my framing of this phenomenon, I mainly want to emphasize that knowing a lot about a particular subject matter creates subjective importance for it, whether or not it is objectively warranted.

The exhaustive study within cognitive psychology many years ago of the principles of learning of lists of nonsense syllables may be a case in point. Despite the density of knowledge on this topic, the whole enterprise (arguably) lacked major importance because its findings did not extend well to the learning of meaningful prose—or for that matter, to the learning of content material that was not in the form of lists requiring rote memorization.

We do not attempt to develop a formula for importance. That would require a model of knowledge representations carrying us far afield from our core concerns. Nevertheless, the key question to ask in diagnosing the importance of a given result is, “What can I learn from this about other things that are also important?”

1 There are examples, of course, in which statistical analysis supports a popular belief. One illustration concerns the “long, hot summer” hypothesis for the urban riots of the late 1960s, namely, that the probability of riot occurrence increased with increasing temperature. The psychological notion here is that the hotter the day, the greater the discomfort of an already frustrated inner-city population, lowering its threshold for an explosion of anger. Baron and Ransberger (1978) examined maximum ambient temperatures for the dates and locations of the occurrence of riots during the turbulent summers of 1967–1971. On the basis of their statistical analysis and a refined reanalysis by Carlsmith and Anderson (1979), one can be quite confident that higher temperatures were systematically associated with greater riot propensity in those summers.

2 For a discussion of distinctions between knowledge and belief, see Abelson (1979).

3 Anyone who has ever heard old-style Soviet academics argue will recognize the typical opening line of vicious criticisms: “Comrade Potchky's analysis of the problem is very interesting. However,…[whamski, bamski, socko].”

4 The “thin red line” between implausibility and sheer lunacy is notoriously difficult to locate, unfortunately, so that we encounter here a problem similar to the one we met in chapter 3 in our discussion of Bayesian prior probabilities. From the standpoint of the challenger of orthodoxy, to have one's creative visions dismissed out of hand as mad delusions seems quite unfair. Indeed, once in a blue moon, it is.

5 If replications support Gilbert's (1991) findings in favor of the Spinozan conception, the implications are extensive. At an informal level, light is shed on several apparently senseless eccentricities of human behavior, such as why people deliberately set their watches 5 minutes fast, and why children are so monumentally upset when repeatedly called by the wrong name. At a theoretical level, a number of phenomena in the literatures on persuasion and propaganda become more intelligible: the surprising effectiveness of the Big Lie in propaganda, the persuasive effects of distraction in communication (Festinger & Maccoby, 1964), and the success of persuasion in fiction (Gerrig & Prentice, 1991), to name three.

6 The subjects really only have an illusion of free choice. In the typical high-choice manipulation, the experimenter gives the subject a very effective soft sell, emphasizing that the subject's participation would really be appreciated, but that “It's entirely up to you.” Virtually all subjects cooperate.

7 If probability were used as the magnitude measure, the nil value for directional hypotheses would be p = .5.

8 Any other measure of effect size, such as d, could be used instead of r in the formula. The measure S is to be interpreted in the scale units of the measure of effect magnitude.

9 Old-timers in psychology will recall the feisty exchanges between the Hullians, with their behaviorist view of learning, and the cognitivist Tolmanians. Poor Tolman kept inventing ever and ever more colorful attempts to demonstrate the existence of “cognitive maps” in rats, hoping to get a rise out of the Hullians. He never succeeded, although years later, cognitive approaches became respectable in learning theory.

10 The tendency for scholars to want to become expert in tiny domains has been lampooned in the old aphorism, “College deans learn less and less about more and more, until they know nothing about everything. Professors learn more and more about less and less, until they know everything about nothing.”

With a cosmic metaphor, social psychologist Richard Nisbett (personal communication, April 30, 1994) said of a particular field of psychology, “Sometimes I think that…[this field]… has imploded, and become a white dwarf.”

9    Credibility of Argument

To this point, we have discussed four criteria affecting the persuasiveness of arguments based on the statistical analysis and interpretation of empirical data: magnitude, articulation, generality, and interestingness. Failure to satisfy one or more of these criteria will weaken the force of the investigator's argument, increasing the likelihood that the results will be ignored. Indeed, the research may not even be published. A good rule of thumb (the rule of two criticisms) is that two deficiencies among these four criteria will result in rejection by journal editors.¹

By contrast, if these four criteria are apparently met satisfactorily, but the research claim lacks credibility, the reported results will be likely to set off debate. When research presentations advance claims that many or most readers deem incredible, these claims are vulnerable to severe challenge. In response, there will typically be a rebuttal by the investigator, and then a fresh round of criticism. The burden of proof shifts back and forth between the investigator and the critic in what might be called the game of “burden tennis.”

WHY RESEARCH CLAIMS ARE DISBELIEVED

There are two different ways in which a research claim may not seem credible to an audience: The claim may be based on poor methodology; or it may contradict a strongly held conception—a popular theory, a world view, or even just common sense.

Characteristically, the critic who disbelieves a research claim primarily for conceptual reasons will nevertheless bolster his case by putting forward one or more methodological objections, that is, complaints about the research design or the statistical analysis. It is contrary to scientific norms to reject an empirical finding solely because the critic does not believe it. It would seem arrogant for a journal editor to write the investigator, “We are rejecting your manuscript, The Life Force in Snails, because we just don't believe it.” The editor and the reviewers might feel that way, but protocol restrains such brutal frankness. The editor might politely suggest another journal as more suitable for this manuscript, or (more interesting for our discussion) the editor might follow the rule of two criticisms. That is, there would be some mention of objections to the questionable conceptual status of the result being proposed, but also an elaboration of one or more methodological criticisms. Often there is consensus about poor methodology in a field,² so that such criticism can be powerful.

If, however, the methodological attack is persuasively countered by the investigator, there is a chance that a revised manuscript might be reconsidered and accepted, thus opening the matter for general debate. Among other conceivable outcomes, the claim that was once incredible might eventually be vindicated.

Although not always perfect models of objectivity and decorum, debates can be creatively constructive. This is among the reasons why it has lately become fashionable for various disciplines in the behavioral sciences to sponsor journals such as The Behavioral and Brain Sciences that welcome, indeed promote, clashing views. A scholar with a position considered controversial or outrageous is asked to write a target article. A number of critics, supporters, and wise old heads contribute commentaries, to which the target person responds.

Empirical evidence free of methodological flaws is crucial to putting one's best case forward in an extended public debate. Success depends on the details of research design and procedure, and of statistical analysis, in relation to the status of the debate. Good research management includes good debate management. In the examples that follow, we consider how empirical results influence argument, and how argument stimulates new studies.

THE STRUCTURE OF DEBATES ON DISBELIEVED CLAIMS

When a Positive Universal is Challenged by a Counterexample

The prototypic debate starts with a research result that challenges a strongly held current theory or belief system. Often, the nature of the challenged belief is that of a positive or negative universal, a statement with the structure, “All X is Y,” or “Under conditions C, phenomenon P is impossible.” Challenge comes in the form of an empirical counterexample purporting to show an X that is not Y, or an instance of P under conditions C. For example, the universal proposition that all human behavior is self-interested would be challenged by a research demonstration of pure altruism.

To give life to the subsequent dialogue in these scripted debates, we name the character expressing the universal Professor Neat, and the rebel with the counterexample Professor Scruffy. (The former name suggests a preference for orderly, formal statements and procedures; the latter connotes a tolerance for realism and messiness; Abelson, 1981). Their dialogue might go like this:

Neat: Xs are always Y.

Scruffy: In my experiment, subjects high on X were randomly assigned to a foofram or a no-foofram condition. Sixty-three percent of the foofram subjects showed no Y whatsoever. Thus Y is not inevitably associated with X.

Neat: The so-called refutation by Scroughy³ of the (X,Y) Law is clearly an artifact. Fooframming inhibits the registration of Y on the standard measure. Thus it is not surprising that many subjects appeared to be at zero.

[or] Scruffy's attack on the (X,Y) law is not justified by his experiment. His data are irrelevant, for the reason that true Xs do not occur in his subject population.

These two possible volleys by Neat are similar in style—they dismiss the challenge—but they differ in nature. The first puts it that the methodology was biased so as to disconfirm the universal. The second states, in effect, that Scruffy lacked conceptual understanding of the universal, as he didn't even use a relevant sample of subjects.

Scruffy has potential ways to respond to each of these shots. To the accusation of artifact, an effective line of defense is to give statistical or procedural details that undercut the accusation (e.g., by revealing that 34% of the non-foofram group also showed zero Y). To the conceptual criticism, he must respond conceptually (e.g., by arguing why his subjects are indeed Xs).

An entirely different tack may be taken by Scruffy. He or she may produce another counterexample. If the same rejoinders by Neat do not apply to the new study, then Neat is driven into the position of proposing different artifacts and conceptual rationalizations for each example. Extrapolate to a half-dozen counterexamples, and Neat's ability to retort coherently seems to shrink to zero. Yet, as we soon see, Neat has a last-ditch rhetorical weapon.

Example: The Model of the Rational Actor. The interdisciplinary field of behavioral economics has produced controversy that illustrates the phenomenon of repeated exceptions to a universal. Mainstream economic theory depends heavily on a universalistic model of rational actors all trying to maximize their economic well-being. Political theory, meanwhile, has been massively influenced by the analogue model of political actors seeking to maximize their political self interests. Suitably mathematized and operationalized, this model has the facility to spin out predictions and explanations in a wide variety of economic and political situations.

Neats love the model. If data would conform to its predictions, the model would unify an impressive array of phenomena. When the rational actor model has been tested, however, either the tests have been
