In Chapter 1, we read about asking and answering good questions, as well as constructing and using good concepts. Those are often considered the fundamental building blocks of comparative political analysis. But how does comparative politics use the scientific method to answer its questions? This second chapter includes readings that cast light on this question.
The readings move from the more general to the more specific. We begin with a classic work by the philosopher Karl Popper on the logic of hypothesis testing (or what he calls the method of “conjectures and refutations”). If Popper is right, these arguments apply to hypothesis testing in general across the sciences, including social science. We then turn to a contemporary classic by three social scientists (Gary King, Robert Keohane, and Sidney Verba), Designing Social Inquiry, which makes a distinctive argument about a common logic across different types of social science analysis. If they are right, both mathematically oriented researchers who use many cases and qualitatively oriented researchers with their in-depth studies of one or several cases are doing something logically similar when they try to test their hypotheses: they are using empirical evidence to draw “causal inferences.” This is followed by a recent piece on comparative-historical analysis (by James Mahoney and Celso Villegas), one that gets close to the heart of the main type of comparative political analysis, which you will read about in later sections of this book.
Paying attention to the logic of using comparisons to test hypotheses is important. Indeed, this is what the discipline is all about: comparative politics, above all, is a method. The readings in this chapter should therefore be useful to you regardless of which questions in comparative politics you want to answer. In other words, they are equally relevant to questions about economic development, democracy, nationalism, or institutional design, among many other topics we might study.
2.1 OBJECTIVE KNOWLEDGE
An Evolutionary Approach
In this excerpt, the philosopher of science Karl Popper provides a concise articulation of his highly influential conception of how science progresses. Many people think that science advances by testing hypotheses and proving them correct. According to Popper, this is naive. Although we can never be sure that our theories and hypotheses are true, we can try our best to prove them wrong. If we are unable to prove them wrong despite our best efforts, we can draw the conclusion that they are probably true. The notion that science achieves absolute certainty, however, is a myth. Although Popper discusses all sciences, as you read, consider how this excerpt applies to comparative politics in particular.
I have so often described what I regard as the self-correcting method by which science proceeds that I can be very brief here: The method of science is the method of bold conjectures and ingenious and severe attempts to refute them.
A bold conjecture is a theory with a great content—greater at any rate than the theory which, we are hoping, will be superseded by it.
That our conjectures should be bold follows immediately from what I have said about the aim of science and the approach to truth: boldness, or great content, is linked with great truth content; for this reason, falsity content can at first be ignored.
But an increase in truth content is in itself not sufficient to guarantee an increase in verisimilitude; since increase in content is a purely logical affair, and since increase in truth content goes with increase in content, the only field left for scientific debate—and especially for empirical tests—is whether or not the falsity content has also increased. Thus our competitive search for verisimilitude turns, especially from the empirical point of view, into a competitive comparison of falsity contents (a fact which some people regard as a paradox). It seems as if it holds in science also that (as Winston Churchill once put it) wars are never won but always lost.
We can never make absolutely certain that our theory is not lost. All we can do is to search for the falsity content of our best theory. We do so by trying to refute our theory; that is, by trying to test it severely in the light of all our objective knowledge and all our ingenuity. It is, of course, always possible that the theory may be false even if it passes all these tests; this is allowed for by our search for verisimilitude. But if it passes all these tests then we may have good reason to conjecture that our theory, which as we know has a greater truth content than its predecessor, may have no greater falsity content. And if we fail to refute the new theory, especially in fields in which its predecessor has been refuted, then we can claim this as one of the objective reasons for the conjecture that the new theory is a better approximation to truth than the old theory.
CRITICAL DISCUSSION, RATIONAL PREFERENCE, AND THE PROBLEM OF THE ANALYTICITY OF OUR CHOICES AND PREDICTIONS
Seen in this way, the testing of scientific theories is part of their critical discussion; or, as we may say, it is part of their rational discussion, for in this context I know no better synonym for “rational” than “critical.” The critical discussion can never establish sufficient reason to claim that a theory is true; it can never “justify” our claim to knowledge. But the critical discussion can, if we are lucky, establish sufficient reasons for the following claim:
“This theory seems at present, in the light of a thorough critical discussion and of severe and ingenious testing, by far the best (the strongest, the best tested); and so it seems the one nearest to truth among the competing theories.”
To put it in a nutshell: we can never rationally justify a theory—that is, a claim to know its truth—but we can, if we are lucky, rationally justify a preference for one theory out of a set of competing theories, for the time being; that is, with respect to the present state of the discussion. And our justification, though not a claim that the theory is true, can be the claim that there is every indication at this stage of the discussion that the theory is a better approximation to the truth than any competing theory so far proposed.
Let us now consider two competing hypotheses $h_1$ and $h_2$. Let us abbreviate by $d_t$ some description of the state of the discussion of these hypotheses at the time $t$, including of course the discussion of relevant experimental and other observational results. Let us denote by

(1) $c(h_1, d_t) < c(h_2, d_t)$

the statement that the degree of corroboration of $h_1$ in the light of the discussion $d_t$ is inferior to that of $h_2$. And let us ask what kind of assertion (1) is.
In actual fact (1) will be a somewhat uncertain assertion, if for no other reason than that $c(h_1, d_t)$ changes with the time $t$, and can change as fast as thought. In many cases, the truth or falsity of (1) will be just a matter of opinion.
But let us assume “ideal” circumstances. Let us assume a prolonged discussion which has led to stable results, and especially to agreement on all the evidential components, and let us assume that there is no change of opinion with t for some considerable period.
Under such circumstances we can see that, while the evidential elements of $d_t$ are of course empirical, the statement (1) can be, provided $d_t$ is sufficiently explicit, logical or (unless you dislike the term) “analytic.”
This is particularly clear if $c(h_1, d_t)$ should be negative, because the agreement of the discussion at time $t$ is that the evidence refutes $h_1$, while $c(h_2, d_t)$ is positive, because the evidence supports $h_2$. Example: take $h_1$ to be Kepler’s theory, and $h_2$ to be Einstein’s theory. Kepler’s theory may be agreed at time $t$ to be refuted (because of the Newtonian perturbations), and Einstein’s theory may be agreed at time $t$ to be supported by the evidence. If we take $d_t$ to entail all this, then
(1) $c(h_1, d_t) < c(h_2, d_t)$

amounts to the statement that some unspecified negative number is smaller than some unspecified positive number, and this is the kind of statement which may be described as “logical” or “analytic.”
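The step from the Kepler and Einstein example to the analytic character of (1) can be written out explicitly. Here $a$ and $b$ are symbols introduced only for this illustration and are not used in Popper’s text. They stand for the two degrees of corroboration, and the agreed evidence fixes only their signs:

$$a = c(h_1, d_t) < 0 \quad\text{and}\quad b = c(h_2, d_t) > 0 \quad\Longrightarrow\quad a < 0 < b \quad\Longrightarrow\quad c(h_1, d_t) < c(h_2, d_t).$$

Nothing about the actual magnitudes of $a$ and $b$ is needed. The conclusion follows from the ordering of the numbers alone, and this is the sense in which (1) is “logical” rather than empirical.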
Of course, there will be other cases; for example, if “$d_t$” is merely a name like “the state of the discussion on 12 May 1910.” But just as one would say that the result of the comparison of two known magnitudes was analytic, so we can say that the result of the comparison of two degrees of corroboration, if sufficiently well known, will be analytic.
But only if the result of the comparison is sufficiently well known can it be said to be the basis of a rational preference; that is, only if (1) holds can we say that $h_2$ is rationally preferable to $h_1$.
Let us see further what will happen if $h_2$ is, in the sense explained, rationally preferable to $h_1$: we shall base our theoretical predictions as well as the practical decisions which make use of them upon $h_2$ rather than upon $h_1$.
All this seems to me straightforward and rather trivial. But it has been criticized for the following reasons.
If (1) is analytic, then the decision to prefer $h_2$ to $h_1$ is also analytic, and therefore no new synthetic predictions can come out of the preference for $h_2$ over $h_1$.
I am not quite certain, but the following seems to me to sum up the criticism which was first advanced by Professor Salmon against my theory of corroboration: either all the steps described are analytic—then there can be no synthetic scientific predictions; or there are synthetic scientific predictions—then some steps cannot be analytic, but must be genuinely synthetic or ampliative, and therefore inductive.
I shall try to show that the argument is invalid as a criticism of my views. $h_2$ is, as is generally admitted, synthetic, and all (non-tautological) predictions are derived from $h_2$, not from (1). This is enough to answer the criticism. The question why we prefer $h_2$ over $h_1$ is to be answered by reference to $d_t$, which, if sufficiently specific, is also non-analytic.
The motives which led to our choice of $h_2$ cannot alter the synthetic character of $h_2$. The motives—in contrast to ordinary psychological motives—are rationally justifiable preferences. This is why logic and analytic propositions play a role in them. If you like, you can call the motives “analytic.” But these analytic motives for choosing $h_2$ never make $h_2$ true, to say nothing of “analytic”; they are at best logically inconclusive reasons for conjecturing that it is the most truthlike of the hypotheses competing at the time $t$.
SCIENCE: THE GROWTH OF KNOWLEDGE THROUGH CRITICISM AND INVENTIVENESS
I see in science one of the greatest creations of the human mind. It is a step comparable to the emergence of a descriptive and argumentative language, or to the invention of writing. It is a step at which our explanatory myths become open to conscious and consistent criticism and at which we are challenged to invent new myths. (It is comparable to the conjectural step in the early days of the genesis of life when types of mutability became an object of evolution through elimination.)
Long before criticism there was growth of knowledge—of knowledge incorporated in the genetic code. Language allows the creation and mutation of explanatory myths, and this is further helped by written language. But it is only science which replaces the elimination of error in the violent struggle for life by non-violent rational criticism.
GARY KING, ROBERT KEOHANE, AND SIDNEY VERBA
2.2 DESIGNING SOCIAL INQUIRY
Scientific Inference in Qualitative Research
The following is an excerpt from an important book on the theory and methodology of social science by Gary King, Robert Keohane, and Sidney Verba. The book, which has been a staple of the training of graduate students in political science and related fields since the 1990s, is often referred to as KKV (after the last names of the three authors). The book is famous for its central argument that quantitative (that is, statistical) and qualitative (that is, descriptive/narrative) approaches, rather than being two completely different ways to do social science, are actually based on a single logic: “causal inference.” The authors argue that although social science indeed tries to answer “why” questions, as discussed in the previous section, causes as such are not directly observable. For example, a candidate may give a certain reason for deciding to run for office, but we cannot really get inside her head to observe whether that is the true cause behind her decision (nor can we even be sure that she really knows all the causes of her action). This presents challenges for causal inference. At the level of large-scale social processes, we see something similar: for example, economic development is correlated with democracy. We can see this using statistics. However, we cannot directly see whether one is causing the other through these same statistics. Instead, we must try to think of ways of drawing inferences about causality. For example, perhaps we think that economic development causes democracy because it creates a middle class, as a number of scholars have argued: we can search for data that would allow us to see whether the growth of a middle class tends to “mediate” between development and democratization. Do you think that KKV privileges the logic of quantitative research at the expense of qualitative methods?
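The mediation idea in the last example can be sketched with simulated data. Everything in the sketch is a hypothetical assumption made for illustration: the variable names, the effect sizes (0.8 and 0.5), and the premise that development affects democracy only through the middle class. The sketch compares two quantities. The first is the total association between development and democracy. The second is the association that remains once the size of the middle class is held fixed by regression. If the middle class carries the whole effect, the second number should be close to zero.

```python
import random
from statistics import mean


def centered(values):
    """Subtract the mean so regressions below need no explicit intercept."""
    m = mean(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slope(y, x):
    """OLS slope of y on a single regressor x (intercept included via centering)."""
    yc, xc = centered(y), centered(x)
    return dot(xc, yc) / dot(xc, xc)


def partial_slope(y, x1, x2):
    """OLS coefficient on x1 when y is regressed on both x1 and x2."""
    yc, c1, c2 = centered(y), centered(x1), centered(x2)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, yc), dot(c2, yc)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


rng = random.Random(1)
n = 5_000
development = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical mechanism: development enlarges the middle class...
middle_class = [0.8 * d + rng.gauss(0, 1) for d in development]
# ...and only the middle class affects democracy (no direct path by construction).
democracy = [0.5 * m + rng.gauss(0, 1) for m in middle_class]

total = slope(democracy, development)  # expected near 0.8 * 0.5 = 0.4
direct = partial_slope(democracy, development, middle_class)  # expected near 0
print(f"total effect of development: {total:.2f}")
print(f"effect holding middle class fixed: {direct:.2f}")
```

In real data the pattern would rarely be this clean. A coefficient that shrinks when a mediator is added is only suggestive of mediation, because the result also depends on how well the mediator is measured and on there being no omitted confounders.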
THE SCIENCE IN SOCIAL SCIENCE
INTRODUCTION
Two Styles of Research, One Logic of Inference
Our main goal is to connect the traditions of what is conventionally denoted “quantitative” and “qualitative” research by applying a unified logic of inference to both. The two traditions appear quite different; indeed they sometimes seem to be at war. Our view is that these differences are mainly ones of style and specific technique. The same underlying logic provides the framework for each research approach. This logic tends to be explicated and formalized clearly in discussions of quantitative research methods. But the same logic of inference underlies the best qualitative research, and all qualitative and quantitative researchers would benefit by more explicit attention to this logic in the course of designing research.
The styles of quantitative and qualitative research are very different. Quantitative research uses numbers and statistical methods. It tends to be based on numerical measurements of specific aspects of phenomena; it abstracts from particular instances to seek general description or to test causal hypotheses; it seeks measurements and analyses that are easily replicable by other researchers.
King, Gary, Robert Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton: Princeton University Press.
Qualitative research, in contrast, covers a wide range of approaches, but by definition, none of these approaches relies on numerical measurements. Such work has tended to focus on one or a small number of cases, to use intensive interviews or depth analysis of historical materials, to be discursive in method, and to be concerned with a rounded or comprehensive account of some event or unit. Even though they have a small number of cases, qualitative researchers generally unearth enormous amounts of information from their studies. Sometimes this kind of work in the social sciences is linked with area or case studies where the focus is on a particular event, decision, institution, location, issue, or piece of legislation. As is also the case with quantitative research, the instance is often important in its own right: a major change in a nation, an election, a major decision, or a world crisis. Why did the East German regime collapse so suddenly in 1989? More generally, why did almost all the communist regimes of Eastern Europe collapse in 1989? Sometimes, but certainly not always, the event may be chosen as an exemplar of a particular type of event, such as a political revolution or the decision of a particular community to reject a waste disposal site. Sometimes this kind of work is linked to area studies where the focus is on the history and culture of a particular part of the world. The particular place or event is analyzed closely and in full detail.
A major purpose of this book is to show that the differences between the quantitative and qualitative traditions are only stylistic and are methodologically and substantively unimportant. All good research can be understood—indeed, is best understood—to derive from the same underlying logic of inference. Both quantitative and qualitative research can be systematic and scientific. Historical research can be analytical, seeking to evaluate alternative explanations through a process of valid causal inference. History, or historical sociology, is not incompatible with social science (Skocpol 1984: 374–86).
Breaking down these barriers requires that we begin by questioning the very concept of “qualitative” research. We have used the term in our title to signal our subject matter, not to imply that “qualitative” research is fundamentally different from “quantitative” research, except in style.
Most research does not fit clearly into one category or the other. The best often combines features of each. In the same research project, some data may be collected that is amenable to statistical analysis, while other equally significant information is not. Patterns and trends in social, political, or economic behavior are more readily subjected to quantitative analysis than is the flow of ideas among people or the difference made by exceptional individual leadership. If we are to understand the rapidly changing social world, we will need to include information that cannot be easily quantified as well as that which can. Furthermore, all social science requires comparison, which entails judgments of which phenomena are “more” or “less” alike in degree (i.e., quantitative differences) or in kind (i.e., qualitative differences).
Defining Scientific Research in the Social Sciences
Our definition of “scientific research” is an ideal to which any actual quantitative or qualitative research, even the most careful, is only an approximation. Yet, we need a definition of good research, for which we use the word “scientific” as our descriptor.1 This word comes with many connotations that are unwarranted or inappropriate or downright incendiary for some qualitative researchers. Hence, we provide an explicit definition here. As should be clear, we do not regard quantitative research to be any more scientific than qualitative research. Good research, that is, scientific research, can be quantitative or qualitative in style. In design, however, scientific research has the following four characteristics:
1. The goal is inference. Scientific research is designed to make descriptive or explanatory inferences on the basis of empirical information about the world. Careful descriptions of specific phenomena are often indispensable to scientific research, but the accumulation of facts alone is not sufficient. Facts can be collected (by qualitative or quantitative researchers) more or less systematically, and the former is obviously better than the latter, but our particular definition of science requires the additional step of attempting to infer beyond the immediate data to something broader that is not directly observed. That something may involve descriptive inference—using observations from the world to learn about other unobserved facts. Or that something may involve causal inference—learning about causal effects from the data observed. The domain of inference can be restricted in space and time—voting behavior in American elections since 1960, social movements in Eastern Europe since 1989—or it can be extensive—human behavior since the invention of agriculture. In either case, the key distinguishing mark of scientific research is the goal of making inferences that go beyond the particular observations collected.
2. The procedures are public. Scientific research uses explicit, codified, and public methods to generate and analyze data whose reliability can therefore be assessed. Much social research in the qualitative style follows fewer precise rules of research procedure or of inference. As Robert K. Merton ([1949] 1968:71–72) put it, “The sociological analysis of qualitative data often resides in a private world of penetrating but unfathomable insights and ineffable understandings. . . . [However,] science . . . is public, not private.” Merton’s statement is not true of all qualitative researchers (and it is unfortunately still true of some quantitative analysts), but many proceed as if they had no method—sometimes as if the use of explicit methods would diminish their creativity. Nevertheless they cannot help but use some method. Somehow they observe phenomena, ask questions, infer information about the world from these observations, and make inferences about cause and effect. If the method and logic of a researcher’s observations and inferences are left implicit, the scholarly community has no way of judging the validity of what was done. We cannot evaluate the principles of selection that were used to record observations, the ways in which observations were processed, and the logic by which conclusions were drawn. We cannot learn from their methods or replicate their results. Such research is not a public act. Whether or not it makes good reading, it is not a contribution to social science.
All methods—whether explicit or not—have limitations. The advantage of explicitness is that those limitations can be understood and, if possible, addressed. In addition, the methods can be taught and shared. This process allows research results to be compared across separate researchers and research projects, studies to be replicated, and scholars to learn.
3. The conclusions are uncertain. By definition, inference is an imperfect process. Its goal is to use quantitative or qualitative data to learn about the world that produced them. Reaching perfectly certain conclusions from uncertain data is obviously impossible. Indeed, uncertainty is a central aspect of all research and all knowledge about the world. Without a reasonable estimate of uncertainty, a description of the real world or an inference about a causal effect in the real world is uninterpretable. A researcher who fails to face the issue of uncertainty directly is either asserting that he or she knows everything perfectly or that he or she has no idea how certain or uncertain the results are. Either way, inferences without uncertainty estimates are not science as we define it.
4. The content is the method. Finally, scientific research adheres to a set of rules of inference on which its validity depends. Explicating the most important rules is a major task of this book.2 The content of “science” is primarily the methods and rules, not the subject matter, since we can use these methods to study virtually anything. This point was recognized over a century ago when Karl Pearson (1892: 16) explained that “the field of science is unlimited; its material is endless; every group of natural phenomena, every phase of social life, every stage of past or present development is material for science. The unity of all science consists alone in its method, not in its material.”
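The third feature, that inferences must come with an estimate of uncertainty, can be illustrated with the simplest descriptive inference: estimating a proportion from a sample. The survey numbers below are hypothetical. The normal-approximation (Wald) interval is only one of several ways to quantify this kind of uncertainty; it is used here because it is short, not because the authors prescribe it.

```python
import math


def proportion_with_ci(successes, n, z=1.96):
    """Estimate a population proportion with an approximate 95% confidence interval.

    Uses the normal (Wald) approximation, which is reasonable for large n
    and for proportions not too close to 0 or 1.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p, se, (p - z * se, p + z * se)


# Hypothetical survey: 520 of 1,000 respondents report voting in the last election.
p, se, (low, high) = proportion_with_ci(520, 1000)
print(f"estimate {p:.3f}, standard error {se:.4f}, 95% CI [{low:.3f}, {high:.3f}]")
```

With 520 of 1,000 respondents, the point estimate is 52 percent, but the 95 percent interval runs from roughly 48.9 to 55.1 percent. Reported alone, “52 percent” would invite the conclusion that a majority voted. The interval shows that the data cannot rule out a figure below one half.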
These four features of science have a further implication: science at its best is a social enterprise. Every researcher or team of researchers labors under limitations of knowledge and insight, and mistakes are unavoidable, yet such errors will likely be pointed out by others. Understanding the social character of science can be liberating since it means that our work need not be beyond criticism to make an important contribution—whether to the description of a problem or its conceptualization, to theory or to the evaluation of theory. As long as our work explicitly addresses (or attempts to redirect) the concerns of the community of scholars and uses public methods to arrive at inferences that are consistent with rules of science and the information at our disposal, it is likely to make a contribution. And the contribution of even a minor article is greater than that of the “great work” that stays forever in a desk drawer or within the confines of a computer.
MAJOR COMPONENTS OF RESEARCH DESIGN
Social science research at its best is a creative process of insight and discovery taking place within a well-established structure of scientific inquiry. The first-rate social scientist does not regard a research design as a blueprint for a mechanical process of data-gathering and evaluation. To the contrary, the scholar must have the flexibility of mind to overturn old ways of looking at the world, to ask new questions, to revise research designs appropriately, and then to collect more data of a different type than originally intended. However, if the researcher’s findings are to be valid and accepted by scholars in this field, all these revisions and reconsiderations must take place according to explicit procedures consistent with the rules of inference. A dynamic process of inquiry occurs within a stable structure of rules.
Social scientists often begin research with a considered design, collect some data, and draw conclusions. But this process is rarely a smooth one and is not always best done in this order: conclusions rarely follow easily from a research design and data collected in accordance with it. Once an investigator has collected data as provided by a research design, he or she will often find an imperfect fit among the main research questions, the theory and the data at hand. At this stage, researchers often become discouraged. They mistakenly believe that other social scientists find close, immediate fits between data and research. This perception is due to the fact that investigators often take down the scaffolding after putting up their intellectual buildings, leaving little trace of the agony and uncertainty of construction. Thus the process of inquiry seems more mechanical and cut-and-dried than it actually is.
Some of our advice is directed toward researchers who are trying to make connections between theory and data. At times, they can design more appropriate data-collection procedures in order to evaluate a theory better; at other times, they can use the data they have and recast a theoretical question (or even pose an entirely different question that was not originally foreseen) to produce a more important research project. The research, if it adheres to rules of inference, will still be scientific and produce reliable inferences about the world.
Wherever possible, researchers should also improve their research designs before conducting any field research. However, data has a way of disciplining thought. It is extremely common to find that the best research design falls apart when the very first observations are collected—it is not that the theory is wrong but that the data are not suited to answering the questions originally posed. Understanding from the outset what can and what cannot be done at this later stage can help the researcher anticipate at least some of the problems when first designing the research.
Improving Theory
A social science theory is a reasoned and precise speculation about the answer to a research question, including a statement about why the proposed answer is correct. Theories usually imply several more specific descriptive or causal hypotheses. A theory must be consistent with prior evidence about a research question. “A theory that ignores existing evidence is an oxymoron. If we had the equivalent of ‘truth in advertising’ legislation, such an oxymoron should not be called a theory” (Lieberson 1992:4; see also Woods and Walton 1982).
The development of a theory is often presented as the first step of research. It sometimes comes first in practice, but it need not. In fact, we cannot develop a theory without knowledge of prior work on the subject and the collection of some data, since even the research question would be unknown. Nevertheless, whatever amount of data has already been collected, there are some general ways to evaluate and improve the usefulness of a theory. We briefly introduce each of these here but save a more detailed discussion for later chapters.
First, choose theories that could be wrong. Indeed, vastly more is learned from theories that are wrong than from theories that are stated so broadly that they could not be wrong even in principle.3 We need to be able to give a direct answer to the question: What evidence would convince us that we are wrong?4 If there is no answer to this question, then we do not have a theory.
Second, to make sure a theory is falsifiable, choose one that is capable of generating as many observable implications as possible. This choice will allow more tests of the theory with more data and a greater variety of data, will put the theory at risk of being falsified more times, and will make it possible to collect data so as to build strong evidence for the theory.
Third, in designing theories, be as concrete as possible. Vaguely stated theories and hypotheses serve no purpose but to obfuscate. Theories that are stated precisely and make specific predictions can be shown more easily to be wrong and are therefore better.
Moreover, if we are wrong, we need not stop writing after admitting defeat. We may add a section to our article or a chapter to our book about future empirical research and current theoretical speculation. In this context, we have considerably more freedom. We may suggest additional conditions that might be plausibly attached to our theory, if we believe they might solve the problem, propose a modification of another existing theory or propose a range of entirely different theories. In this situation, we cannot conclude anything with a great deal of certainty (except perhaps that the theory we stated at the outset is wrong), but we do have the luxury of inventing new research designs or data-collection projects that could be used to decide whether our speculations are correct. These can be very valuable, especially in suggesting areas where future researchers can look.
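The advice to choose theories that could be wrong, and to state them concretely, echoes Popper’s view that a theory says more the more it prohibits. What follows is a toy sketch of that idea. The three hypotheses and the observed cases are hypothetical, and each case is reduced to two yes/no features (developed, democratic). For each hypothesis, the sketch counts how many of the four possible outcomes it rules out and lists the observed cases that contradict it.

```python
from itertools import product

# Every possible case, as a pair (developed, democratic). Illustrative only.
OUTCOMES = list(product([True, False], repeat=2))

# Hypothetical hypotheses, each written as a rule saying which outcomes it permits.
HYPOTHESES = {
    "vague": lambda dev, dem: True,  # "development may or may not matter"
    "developed implies democratic": lambda dev, dem: dem or not dev,
    "democratic iff developed": lambda dev, dem: dev == dem,
}


def prohibited(rule):
    """Outcomes the hypothesis rules out; more prohibitions means more content."""
    return [o for o in OUTCOMES if not rule(*o)]


def refuting_cases(rule, observations):
    """Observed cases that contradict the hypothesis."""
    return [o for o in observations if not rule(*o)]


# Hypothetical observations: a developed democracy, a poor autocracy,
# and a developed autocracy.
observations = [(True, True), (False, False), (True, False)]

for name, rule in HYPOTHESES.items():
    print(f"{name}: prohibits {len(prohibited(rule))} of {len(OUTCOMES)} outcomes, "
          f"refuted by {refuting_cases(rule, observations)}")
```

The vague hypothesis prohibits nothing, so no observation could ever refute it; by the standard in the text, it is not a theory at all. The two more precise hypotheses each rule out some outcomes, and both are contradicted by the hypothetical developed but nondemocratic case.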
Thinking Like a Social Scientist: Skepticism and Rival Hypotheses
The uncertainty of causal inferences means that good social scientists do not easily accept them. When told A causes B, someone who “thinks like a social scientist” asks whether that connection is a true causal one. It is easy to ask such questions about the research of others, but it is more important to ask them about our own research. There are many reasons why we might be skeptical of a causal account, plausible though it may sound at first glance. We read in the newspaper that the Japanese eat less red meat and have fewer heart attacks than Americans. This observation alone is interesting. In addition, the explanation—too much steak leads to the high rate of heart disease in the United States—is plausible. The skeptical social scientist asks about the accuracy of the data (how do we know about eating habits? what sample was used? are heart attacks classified similarly in Japan and the United States so that we are comparing similar phenomena?). Assuming that the data are accurate, what else might explain the effects: Are there other variables (other dietary differences, genetic features, life-style characteristics) that might explain the result? Might we have inadvertently reversed cause and effect? It is hard to imagine how not having a heart attack might cause one to eat less red meat, but it is possible. Perhaps people lose their appetite for hamburgers and steak late in life. If this were the case, those who did not have a heart attack (for whatever reason) would live longer and eat less meat. This fact would produce the same relationship that led the researchers to conclude that meat was the culprit in heart attacks.
It is not our purpose to call such medical studies into question. Rather we wish merely to illustrate how social scientists approach the issue of causal inference: with skepticism and a concern for alternative explanations that may have been overlooked. Causal inference thus becomes a process whereby each conclusion becomes the occasion for further research to refine and test it. Through successive approximations we try to come closer and closer to accurate causal inference.
NOTES
1. We reject the concept, or at least the word, “quasi-experiment.” Either a research design involves investigator control over the observations and values of the key causal variables (in which case it is an experiment) or it does not (in which case it is nonexperimental research). Both experimental and nonexperimental research have their advantages and drawbacks: one is not better in all research situations than the other.
2. Although we do cover the vast majority of the important rules of scientific inference, they are not complete. Indeed, most philosophers agree that a complete, exhaustive inductive logic is impossible, even in principle.
3. This is the principle of falsifiability (Popper 1968). It is an issue on which there are varied positions in the philosophy of science. However, very few of them disagree with the principle that theories should be stated clearly enough so that they could be wrong.
4. This is probably the most commonly asked question at job interviews in our department and many others.
REFERENCES
Lieberson, Stanley. 1992. “Einstein, Renoir, and Greeley: Some Thoughts about Evidence in Sociology.” American Sociological Review 56 (February): 1–15.
Merton, Robert K. [1949] 1968. Social Theory and Social Structure. Reprint. New York: Free Press.
Pearson, Karl. 1892. The Grammar of Science. London: J. M. Dent and Sons, Ltd.
Popper, Karl R. 1968. The Logic of Scientific Discovery. New York: Harper and Row.
Skocpol, Theda. 1984. “Emerging Agendas and Recurrent Strategies in Historical Sociology.” In Theda Skocpol, ed. Vision and Method in Historical Sociology. New York: Cambridge University Press.
Woods, John, and Douglas Walton. 1982. Argument: The Logic of the Fallacies. New York: McGraw-Hill Ryerson Ltd.
JAMES MAHONEY AND CELSO M. VILLEGAS
2.3 Historical Enquiry and Comparative Politics
Whereas the previous piece (by King, Keohane, and Verba) provides a general overview of the idea of causal inference and how it might be operative in social science, Mahoney and Villegas delve into the more specific issues involved in comparative-historical analysis. This type of analysis—which is foundational in comparative politics—is based on the recognition that social and political phenomena are processes that unfold in time. As such, we cannot just study them “in the present,” but must take the greater historical context into consideration. To be clear, this means that most questions of comparative politics are historical. This seems obvious in some cases (for example, how and why did the Bourbon and Pombaline Reforms in 18th-century Latin America impact subsequent political developments differently in the various colonies?), but it is equally true of more “contemporary cases” (for example, why are some current Latin American regimes more left leaning than others?). Mahoney and Villegas describe current trends in historically oriented comparative politics and highlight the differences between “within-case” and “cross-case” analysis, as well as different ways of explaining difference over time. Be particularly attentive to their discussion of the important concept of “path dependence.”
Historical enquiry has always been central to the field of comparative politics. Scholars from Alexis de Tocqueville and Max Weber to Gabriel Almond and Seymour Martin Lipset to Theda Skocpol and Margaret Levi have explained political dynamics by comparing the historical trajectories of two or more cases. In doing so, they have suggested that the roots of major political outcomes often rest most fundamentally with causal processes found well in the past. Moreover, they have maintained that to elucidate these causal processes one must look closely at the unfolding of events over substantial periods of time. Comparative analysts who engage in historical enquiry have explored topics almost as varied as those that characterize contemporary political science. And they have developed explanations that cross the full gamut of theoretical orientations in the field. One cannot therefore delimit historical analysis by subject matter or theoretical orientation. Nevertheless, comparativists who practice historical analysis do employ a distinctive approach to asking and answering questions. Most basically, these analysts ask questions about the causes of major outcomes in particular cases. The goal of their analyses then becomes explaining adequately the specific historical outcomes in each and every case that falls within their argument’s scope (Mahoney and Rueschemeyer 2003). By adopting this approach, historical researchers differ from cross-national statistical analysts, who are concerned with generalizing about average causal effects for large populations and who do not ordinarily seek to explain specific outcomes in particular cases. Whereas a cross-national statistical analyst might ask about the average causal effect of development on democracy for a large population of cases, a historical researcher will ask about the causal factors that make possible or combine to produce democracy in one or more particular cases (Mahoney and Goertz 2006). 
Or, to cite actual research, historical analysts ask about the causes of contrasting state-regime complexes in specific early modern European cases (Downing 1992; Ertman 1997; Tilly 1990); the factors that wrought different kinds of welfare states in the advanced capitalist countries (Esping-Andersen 1990; Hicks 1999; Huber and Stephens 2001); the origins of social revolution in certain types of historical and contemporary countries (Foran 2005; Goldstone 1991; Skocpol 1979); and the sources of democracy and dictatorship in regions such as Central America (Mahoney 2001; Paige 1997; Yashar 1997). In each of these research areas, the goal of analysis is to explain specific outcomes of interest in the particular sets of cases under investigation.1
This orientation to asking and answering questions is associated with at least three other methodological traits which also help us to recognize historical research as a singular approach within comparative politics. First, historical analysts employ their own distinctive tools of causal analysis. Some of these tools involve techniques for analyzing necessary and/or sufficient causes, whereas others entail procedures for assessing hypotheses through within-case analysis. Both kinds of techniques contrast in major ways with statistical methods (Brady and Collier 2004; George and Bennett 2005; Mahoney 2004; Mahoney and Goertz 2006). Second, historical analysts are centrally concerned with the temporal dimensions of political explanation. To account for the occurrence of specific outcomes, they attribute great causal weight to the duration, pace, and timing of events (Pierson 2004; Thelen 2003). Finally, historical researchers develop a deep understanding of their major cases and establish a strong background in the relevant historiography. This kind of case expertise is essential for the successful explanation of particular outcomes in specific cases, and it is achieved through the mastery of secondary and/or primary source material (Skocpol 1984; Ragin 1987). Here we explore each of these three traits in turn.
Mahoney, James and Celso Villegas. 2007. “Historical Enquiry and Comparative Politics,” in Carles Boix and Susan Stokes, eds., Oxford Handbook of Comparative Politics, pp. 73–89. New York: Oxford University Press.
1 METHODS OF CAUSAL ANALYSIS
1.1 CROSS-CASE ANALYSIS
Early discussions of cross-case analysis and hypothesis testing in historical research usually focused on Mill’s methods of agreement and difference (e.g., Skocpol and Somers 1980) and Przeworski and Teune’s (1970) most similar and most different research designs. In more recent periods, however, the methodology of necessary and sufficient conditions, Boolean algebra, and fuzzy-set logic have superseded earlier formulations (e.g., Goertz and Starr 2003; Ragin 1987, 2000).
Mill’s methods of agreement and difference are tools for eliminating necessary and sufficient causes (see Dion 1998; George and Bennett 2005; Mahoney 1999). The method of agreement is used to eliminate potential necessary causes, whereas the method of difference is used to eliminate potential sufficient causes. The methods usually operate deterministically, such that a single deviation from a hypothesized pattern of necessary or sufficient causation is enough to conclude that a given factor is not (by itself) necessary or sufficient for the outcome of interest. While this deterministic approach is controversial,2 methodologists agree that it is essential to the ability of the methods of agreement and difference to systematically eliminate rival hypotheses when only a small number of cases are selected.
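The deterministic elimination logic described above can be made concrete in a few lines. The following is a minimal sketch, not a procedure taken from the cited authors: the factor names A, B, C, the outcome key "Y", the function names, and the three cases are all hypothetical. The method of agreement looks only at cases where the outcome is present and drops any factor that is absent in even one of them; the method of difference drops any factor that is present in a case where the outcome is absent.

```python
from typing import Dict, List, Set

# A case records whether each factor is present (True) or absent (False),
# plus the outcome under the key "Y". All data below are hypothetical.
Case = Dict[str, bool]


def eliminate_necessary(cases: List[Case], factors: List[str], outcome: str = "Y") -> Set[str]:
    """Method of agreement: among cases where the outcome is present,
    a factor that is absent in even one case cannot be necessary."""
    positives = [c for c in cases if c[outcome]]
    return {f for f in factors if any(not c[f] for c in positives)}


def eliminate_sufficient(cases: List[Case], factors: List[str], outcome: str = "Y") -> Set[str]:
    """Method of difference: a factor present in even one case where the
    outcome is absent cannot be sufficient by itself."""
    negatives = [c for c in cases if not c[outcome]]
    return {f for f in factors if any(c[f] for c in negatives)}


CASES: List[Case] = [
    {"A": True, "B": True, "C": False, "Y": True},
    {"A": True, "B": False, "C": True, "Y": True},
    {"A": False, "B": True, "C": False, "Y": False},
]
FACTORS = ["A", "B", "C"]

print(sorted(eliminate_necessary(CASES, FACTORS)))   # ['B', 'C']
print(sorted(eliminate_sufficient(CASES, FACTORS)))  # ['B']
```

Factor A survives both tests. In keeping with the logic of the methods, that only means it has not been eliminated; it has not thereby been shown to be necessary or sufficient.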
Methods designed to test necessary and/or sufficient causes need not be deterministic, however. One can easily evaluate causes that are necessary or sufficient at some quantitative benchmark, such as necessary or sufficient 90 percent of the time (e.g., Braumoeller and Goertz 2000; Dion 1998; Ragin 2000). And if a modest number of cases is selected (e.g., N = 15), scholars can achieve standard levels of statistical confidence for their findings. Likewise, there is no reason why one needs to use dichotomous variables when testing hypotheses about necessary or sufficient causation. For example, necessary causation can mean that the absence of a particular range of values on a continuously coded independent variable will always (or usually) be associated with the absence of a particular range of values on a continuously coded dependent variable.
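One way to see how a modest N can yield conventional confidence for a probabilistic necessity claim is a one-sided binomial test against the benchmark. This is a sketch under stated assumptions, not the specific procedure of any cited author: the benchmark of 0.8, the function names, and the 15 hypothetical cases are illustrative choices. The code computes the share of outcome-present cases that also show the factor, and the probability of seeing a share at least that high if the factor were present only at the benchmark rate.

```python
from math import comb
from typing import Dict, List, Tuple

Case = Dict[str, bool]


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def necessity_test(cases: List[Case], factor: str, outcome: str = "Y",
                   benchmark: float = 0.8) -> Tuple[float, float]:
    """Return the share of outcome-present cases that also show the factor,
    and the one-sided probability of a share at least this high if the
    factor accompanied the outcome only at the benchmark rate."""
    positives = [c for c in cases if c[outcome]]
    n = len(positives)
    k = sum(1 for c in positives if c[factor])
    return k / n, binomial_upper_tail(k, n, benchmark)


# Hypothetical: 15 cases with the outcome, all of which show factor X.
cases = [{"X": True, "Y": True} for _ in range(15)]
share, p_value = necessity_test(cases, "X", benchmark=0.8)
print(share, round(p_value, 3))  # 1.0 0.035
```

The result depends heavily on the benchmark chosen: the same 15-of-15 pattern tested against a 0.9 benchmark gives a probability of about 0.21 (0.9 raised to the 15th power), which is not significant at conventional levels.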
In comparative politics, a widely used method of cross-case analysis is typological theory (George and Bennett 2005). With this technique, one treats the dimensions of a typology as independent variables; different values on the dimensions reflect alternative values on independent variables. The categories or “types” in the cells of the typology represent the values on the dependent variable. The dimensions of the typology are thus hypothesized to be jointly (not individually) sufficient for particular values on the dependent variable. There are numerous examples of works in comparative politics that implicitly or explicitly employ this kind of typological theory—Downing’s (1992) study of political regimes in Europe, Goodwin’s (2001) work on revolutions, and Jones-Luong’s (2002) analysis of party and electoral system dynamics, for example.
Other techniques evaluate necessary and sufficient causes with more formal methods. Perhaps the best known of these is Boolean algebra (Ragin 1987), which is used to test whether combinations of dichotomous variables are jointly sufficient for an outcome. Because several different combinations of factors may each be causally sufficient, this method allows for multiple paths to the same outcome, or what is sometimes called equifinality. More recently, Ragin (2000) has introduced fuzzy-set analysis to assess continuously coded variables within a probabilistic Boolean framework. Dozens of comparative studies have now used Ragin’s techniques for testing hypotheses about necessary and sufficient causes (see the citations at www.compasss.org/).
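The core step of the Boolean approach, checking which combinations of conditions are always followed by the outcome, can be sketched as a truth table. This is a simplified illustration, not Ragin's full procedure: it omits Boolean minimization (the step that reduces configurations to simpler expressions), and its fuzzy-set part uses only the basic subset criterion (membership in the causal combination not exceeding membership in the outcome) plus the minimum rule for logical AND. All names, cases, and membership scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Case = Dict[str, bool]


def truth_table(cases: List[Case], conditions: Sequence[str], outcome: str = "Y"):
    """Group cases by their configuration of dichotomous conditions and
    collect the outcome values observed for each configuration."""
    table: Dict[Tuple[bool, ...], List[bool]] = defaultdict(list)
    for c in cases:
        table[tuple(c[k] for k in conditions)].append(c[outcome])
    return dict(table)


def sufficient_configurations(cases: List[Case], conditions: Sequence[str],
                              outcome: str = "Y") -> List[Tuple[bool, ...]]:
    """Configurations whose observed cases all show the outcome: candidate
    jointly sufficient combinations. More than one means equifinality."""
    table = truth_table(cases, conditions, outcome)
    return sorted(cfg for cfg, ys in table.items() if all(ys))


def fuzzy_and(*memberships: float) -> float:
    """Membership in a conjunction of fuzzy sets: the minimum membership."""
    return min(memberships)


def fuzzy_subset_share(x: Sequence[float], y: Sequence[float]) -> float:
    """Share of cases whose membership in the causal set does not exceed
    their membership in the outcome set (x_i <= y_i)."""
    return sum(1 for xi, yi in zip(x, y) if xi <= yi) / len(x)


CASES: List[Case] = [
    {"A": True, "B": True, "C": False, "Y": True},
    {"A": True, "B": True, "C": False, "Y": True},
    {"A": False, "B": False, "C": True, "Y": True},
    {"A": True, "B": False, "C": False, "Y": False},
    {"A": False, "B": True, "C": False, "Y": False},
]
print(sufficient_configurations(CASES, ["A", "B", "C"]))
# [(False, False, True), (True, True, False)]  i.e. ~A*~B*C and A*B*~C

# Hypothetical fuzzy memberships in conditions A and B and in outcome Y.
a = [0.2, 0.6, 0.9, 0.7]
b = [0.8, 0.9, 1.0, 0.4]
y = [0.5, 0.7, 0.8, 0.9]
ab = [fuzzy_and(ai, bi) for ai, bi in zip(a, b)]  # [0.2, 0.6, 0.9, 0.4]
print(fuzzy_subset_share(ab, y))  # 0.75
```

The two sufficient configurations in the crisp example are the "multiple paths to the same outcome" described above. In the fuzzy example, the combination A AND B is a subset of Y in three of four cases, a share that would then be judged against a chosen benchmark.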
To conclude, cross-case analysis usually involves the assessment of hypotheses about necessary and/or sufficient causation, and a whole class of methodologies exists for testing these kinds of hypotheses. By contrast, as multiple methodologists (both qualitative and quantitative) have pointed out, mainstream statistical techniques are not designed for the analysis of necessary and sufficient causes (Braumoeller 2003; Goertz and Starr 2003; but see Clark, Gilligan, and Golder 2006).
1.2 WITHIN-CASE ANALYSIS
Writings on within-case analysis have a distinguished pedigree in the field of qualitative methods (e.g., Barton and Lazarsfeld 1969; Campbell 1975; George and McKeown 1985). In recent years, there has been considerable effort to formally codify the specific procedures entailed in different modes of within-case analysis (e.g., George and Bennett 2005; Brady and Collier 2004; Mahoney 1999). We briefly discuss some of these procedures.
First, some historical researchers use insights from within their cases to locate the intervening mechanisms linking a hypothesized explanatory variable to an outcome. These scholars follow methodological writings that suggest that causal analysis not only involves establishing an association between explanatory variables and an outcome variable, but also entails identifying the intervening mechanisms that link explanatory variables with the outcome variable (Hedstrom and Swedberg 1998; Goldthorpe 2000). Intervening mechanisms are the processes through which an explanatory variable produces a causal effect. The effort to infer causality through the identification of mechanisms can be called “process tracing” (George and McKeown 1985; George and Bennett 2005) and the data thereby generated are “causal-process observations” (Brady and Collier 2004).
Process tracing is often used to help analysts who work with a small number of cases avoid mistaking a spurious correlation for a causal association. Specifically, mechanisms that clearly link a presumed explanatory variable and outcome variable increase one’s confidence in the hypothesis. For example, Skocpol’s (1979, 170–1) work on the origins of social revolutions employs process tracing to reject the hypothesis that ideologically motivated vanguard movements caused social revolutions. Although ideologically motivated vanguard movements were active in her three cases of social revolution, she contends that they were not responsible for triggering widespread revolts against landlords and states. Rather, the movements were marginal to the central political processes that characterized social revolutions in France, Russia, and China, appearing on the scene only to take advantage of situations they did not create.
Other scholars use process tracing not to eliminate causal factors but to support their own explanations. For example, Collier and Collier (1991) identify mechanisms linking different types of labor incorporation periods with different types of party systems. In their analysis of Colombia and Uruguay, Collier and Collier systematically identify the processes and events through which the incorporation pattern of “electoral mobilization by a traditional party” led to the party system outcome of “electoral stability and social conflict.” These processes included: a period in which the party that oversaw incorporation briefly maintained power, the gradual emergence of conservative opposition, a period of intense political polarization, a military coup, and, finally, the creation of a party system marked by stable electoral politics and social conflict. Each of these events acts as a mechanism linking labor incorporation with a particular party system outcome. Indeed, although any work can potentially benefit from process tracing, it is an especially important tool for those studies such as Collier and Collier’s in which explanatory and outcome variables are separated by long periods of time.
A given hypothesis might suggest specific features of a case besides the main outcome that should be present if the central hypothesis is correct. These features need not be intervening variables. Thus, some historical researchers use within-case analysis not to identify intervening mechanisms, but to evaluate whether certain hypothesized features are in fact present. This is how Marx (1998) proceeds in his comparative study of racial orders in the United States, South Africa, and Brazil. He asserts that where whites were divided, as in the US and South Africa following the Civil War and Boer War, white unity and nationalist loyalty were forged through the construction of systems of racial domination that systematically excluded blacks. Where no major intra-white cleavage developed, as in Brazil, whites did not have to achieve unity through exclusion and thus a much higher degree of racial harmony could develop. Marx supports this argument using within-case evidence that confirms implicit and explicit predictions about other things that should be true if this argument is valid. For instance, Marx suggests that, if intra-white conflict really is decisive, efforts to enhance black status should produce increased white conflict along the North–South fault line in the US and between British and Afrikaners in South Africa. By contrast, progressive racial reforms should not generate similar intra-white divisions in Brazil. Likewise, if intra-white divisions really are the key, then Marx suggests that we should see evidence that more progressive white factions view political stability as more important than racial equality. His historical narrative then backs up these propositions. Overall, he suggests that it is highly unlikely that these auxiliary facts are accidental; rather, he contends that they are symptoms of a valid main thesis.
2 METHODS OF TEMPORAL ANALYSIS
Historical enquiry in comparative politics is sensitive to temporal processes. Researchers often understand cases as spatial units within which one observes patterns of temporally ordered events, such as sequences, cycles, and abrupt changes. While statistical researchers will sometimes develop hypotheses that consider temporal dimensions, the focus of historical researchers on specific outcomes in particular cases calls central attention to temporality. At the level of particular cases, issues of timing and sequencing often seem paramount in a way that may not be true when one wishes to generalize about averages for large populations using available quantitative data. Hence, when a historical researcher hypothesizes that “X is causally related to Y,” it is quite likely that variable X is defined in part by temporal dimensions, such as its duration or its location in time vis-à-vis other variables. In this sense, “history matters” to comparative-historical researchers in part because temporally defined concepts are key variables of analysis. We can examine here three temporal concepts that historical researchers use frequently: path dependence, duration, and conjuncture.
2.1 PATH DEPENDENCE
The concept of path dependence is associated with the effort of researchers to understand the repercussions of early events on subsequent and possibly historically distant outcomes. A quite significant literature in economics, political science, and sociology now exists to codify the various tools of analysis used to study path-dependent sequences (Arthur 1994; David 1985; Goldstone 1998; North 1990; Pierson 2000, 2004; Mahoney 2000; see also Clemens and Cook 1999; Collier and Collier 1991; Thelen 2003). For our purposes, two examples illustrate the breadth of the use of this concept.
Goldstone (1998, 2007) argues that the industrial revolution in England was the result of a path-dependent process. He contends that “there was nothing necessary or inevitable” about England’s breakthrough to modern industrialism (1998, 275). Rather, the outcome was a product of a number of small events that happened to come together in eighteenth-century England. Perhaps most importantly, the industrial revolution depended on the advent of Thomas Newcomen’s first steam engine in 1712—it made possible the subsequent creation of more efficient steam engines that dramatically improved the extraction of coal. Efficient coal extraction reduced the price of coal. In turn:
Cheap coal made possible cheaper iron and steel. Cheap coal plus cheap iron made possible the construction of railways and ships built of iron, fueled by coal, and powered by engines producing steam. Railways and ships made possible mass national and international distribution of metal tools, textiles, and other products that could be more cheaply made with steam-powered metal-reinforced machinery. (1998, 275)
Thus, the sequence of events leading to the industrial revolution ultimately depended on the advent of the first steam engine. Yet, Newcomen did not pursue his invention in order to spur an industrial revolution. Instead, he was trying to devise a means to pump water from deep-shaft coal mines: his engine used steam to drive a pump that lifted water out of the shafts. It was necessary to remove water from the mine shafts because the surface coal of the mines had been exhausted, which had led the miners to dig deeper, which had caused the mines to fill with water. And of course the surface coal of the mines was exhausted in the first place because England was exceptionally dependent on coal for heating. Going even further back, as Goldstone does, England was dependent on coal (rather than wood) because of its limited forest area, its cold climate, and its geology, which featured thick seams of coal near the sea.
Orren’s (1991) study of Belated Feudalism offers a different kind of example of path dependence, one in which path dependence involves the stable reproduction of a particular outcome. Orren calls attention to the remarkable persistence of status-based labor legislation in the United States. From its inception until well into the twentieth century, the United States legally defined all able-bodied individuals without independent wealth as workers who could be subject to criminal charges for not selling their labor in the marketplace. This “law of master and servant” was originally established in feudal England, but it managed to carry over into the United States, and it then persisted for more than 150 years despite the supposed liberal orientation of American culture.
To explain this specific outcome, Orren emphasizes the key role of American courts in upholding the law. In her view, judges enforced the law because they believed it was legitimate, even though it increasingly clashed with American mores and norms. Specifically, “the judges believed that what was at stake was no less than the moral order of things,” and hence upheld the law (Orren 1991, 114). Orren emphasizes that American judges did not follow precedent simply because of personal gain (1991, 90). Likewise, she contends that judges did not simply support legislation on behalf of the interests of economic elites, even though the employment legislation clearly benefited employers (1991, 91). Rather, she argues “that the law of labor relations was on its own historical track, and that it carried protection of business interests along for the ride” (1991, 112).
In both examples of path dependence, Goldstone and Orren identify “critical junctures” where events early in the process have lasting effects, even after those initial causes have disappeared. Scholars using the critical juncture concept emphasize how such events are contingent—that is, they are unpredictable by theory or perhaps truly random (Mahoney 2000; David 1985)—and focus on how these events, at that time, were hardly an indication of the path to follow. The Newcomen steam engine’s role in the English industrial revolution is a case in point: Newcomen did not intend to begin an industrial revolution, nor was his machine heralded at that time as the harbinger of the tremendous transformation to come, yet it spurred a series of events that led England down an unrepeatable path towards industrialization.
Other scholars have focused on important political choices during critical juncture periods whose institutional implications were unforeseen, but would have significant results in the future. Collier and Collier’s (1991) study of labor incorporation provides the iconic example of critical junctures—the means by which political elites managed the introduction of labor into the political sphere had lasting, long-term effects on party dynamics far removed from the initial decision to forcibly exclude labor or incorporate it through populist, traditional, or radical parties. Certainly political elites in Chile and Brazil did not assume that through their repression of labor in the 1930s they would precipitate the political conditions for military coups in 1973 and 1964, respectively.
Goldstone’s argument in particular shows how path dependence may involve reaction–counterreaction dynamics, such that an initial event triggers a reaction and thereby logically leads to another quite different event, which triggers its own reaction, and so on, until a particular outcome of interest is reached. Mahoney uses the phrase “reactive sequence” to characterize these “chains of temporally ordered and causally connected events” (2000, 526). The narrative mode of analysis used in historical analysis generally describes sequences characterized by tight causal linkages that are nearly uninterruptible, such that A leads to B, which leads to C, which leads to D, and so on until one arrives at Z, or the logical termination point of the sequence.
By contrast, Orren’s argument focuses on a kind of path-dependent sequence in which a particular outcome happens to occur at a critical juncture, and then this outcome is subject to self-reproducing mechanisms, causing it to repeatedly exist across time, even long after its original purposes have ceased. Scholars use the label “self-reproducing” to describe these sequences in which a given outcome is stably reinforced over time (Thelen 2003; Pierson 2004; Mahoney 2000). Self-reproducing sequences are also the norm in work on increasing returns, which models processes in which each step in a particular direction induces further movement in that same direction (Arthur 1994, 1989; Pierson 2000).
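The increasing-returns idea can be illustrated with the Pólya urn, a stylized process often used in this literature to show how early, essentially random events can lock in a path. The sketch below is an illustration under stated assumptions, not a model of any particular case: the urn starts with one ball of each color, and the function name, seeds, and checkpoints are arbitrary choices.

```python
import random
from typing import Dict, Iterable


def polya_urn(steps: int, seed: int, checkpoints: Iterable[int]) -> Dict[int, float]:
    """Simulate a Polya urn: start with one red and one black ball; each draw
    returns the drawn ball plus one more of the same color, so whichever
    color gets ahead becomes more likely to be drawn again.
    Returns the share of red balls at each requested checkpoint."""
    rng = random.Random(seed)
    wanted = set(checkpoints)
    red, black = 1, 1
    shares: Dict[int, float] = {}
    for t in range(1, steps + 1):
        # Probability of drawing red equals red's current share of the urn.
        if rng.random() < red / (red + black):
            red += 1
        else:
            black += 1
        if t in wanted:
            shares[t] = red / (red + black)
    return shares


for seed in range(4):
    path = polya_urn(10_000, seed, checkpoints=(100, 1_000, 10_000))
    print(seed, {t: round(s, 2) for t, s in path.items()})
```

Each run tends to settle on its own share, and different runs settle on different shares, even though every run follows identical rules. The share reached after the early draws usually changes little afterwards, which is the sense in which early steps in a particular direction induce further movement in that same direction.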
In some cases, however, self-reproduction and lock-in capture only part of a path-dependent process; scholars may look to ideas such as institutional layering and conversion to explain why and how certain aspects of institutions persist and why some aspects change. According to Thelen, “institutional survival is often strongly laced with elements of institutional transformation to bring institutions in line with changing social, political, and economic conditions” (2003, 211, emphasis in the original). Through institutional layering, actors choose not to remake existing institutional configurations, but instead add new components that bring the institution in alignment with their needs. For example, the Bill of Rights and subsequent amendments to the US Constitution altered pre-existing arrangements while leaving the core the same. In addition, institutions initially set up to foster a certain social or political arrangement are often “converted” to suit other purposes. Orren’s analysis of the law of master and servant is a good example of this: while the law in its English form fostered feudal ties between landlord and serf, as American judges reinterpreted it, the law was converted to support free labor policy.
2.2 DURATION AND CONJUNCTURAL ANALYSIS
Historical researchers also invoke duration as a key temporal variable by exploring the causes of the length of a given process for a particular outcome (Aminzade 1992, 459). According to Mickey and Pierson, “attending to duration can both help scholars more clearly specify the mechanisms by which independent variables affect outcomes of interest, and can help generate new causal accounts” (2004, 7). Some duration arguments refer to repeated processes over a long time period. For example, Huber and Stephens’s (2001) work on welfare states in advanced industrial countries highlights the importance of “electoral success over an extended period of time” to the maintenance of long-lasting welfare state institutions (Pierson 2004, 85, emphasis in the original). Other duration arguments explore the importance of slow-moving processes that may take years to unfold. For instance, Tilly’s (1990) analysis of state making is centrally concerned with explaining the pace at which modern states were formed in Europe across perhaps centuries of time.
The fact that many sequences of events have a typical or normal duration allows one to speak of processes that are “too short” or “too long” or “just right” (Mickey and Pierson 2004, 15). Compressed processes often lead to significantly different outcomes because they entail a particularly rapid sequence of events. Karl notes that oil booms spur compressed processes of economic and social development. “The restraint inherent in more limited revenues . . . is abruptly removed, both psychologically and in reality” (1997, 66). As a result:
Policymakers, once torn between their preoccupations with diversity and equity, now think they can do both. The military demands modernized weapons and improved living conditions; capitalists seek credits and subsidies; the middle class calls for increased social spending, labor for higher wages, and the unemployed for the creation of new jobs. (Karl 1997, 65)
Bureaucracies expand uncontrollably and “ultimately contribute to growing budget and trade deficits and foreign debt” (1997, 65). For Karl, oil booms accelerate processes that eventually overwhelm states and produce economic busts.
Historical researchers also often develop hypotheses about the intersection of various causal processes (see Aminzade 1992; Pierson 2004; Zuckerman 1997). Whether and when two or more processes meet in time and/or space can have a large impact on subsequent outcomes. Conjunctural analysis considers specifically the intersection point of two or more separately determined sequences, or as Pierson puts it, “the linking of discrete elements or dimensions of politics in the passage of time” (2004, 55).
In his classic work Modernization and Bureaucratic-Authoritarianism, O’Donnell (1979) notes certain social conditions that gradually came into being and then remained as “constants” throughout subsequent Argentine history. Each such condition worked to “load the dice more and more against an effectively working political system” (1979, 118). By the 1960s, three historical constants came together: political traditions and social processes for national unification, international economic integration, and political mobilization (O’Donnell 1979, 119–31). The conjuncture or coming together of these processes served to limit the political choices available to actors in a way that would not have been true if the sequences had not intersected at this particular time. Ultimately, the conjuncture had the effect of stimulating a determined effort by established sectors “to close any significant political access to a politically activated urban popular sector” (O’Donnell 1979, 131). In turn, this outcome set the stage for the emergence of harsh bureaucratic–authoritarian regimes.
REFERENCES
AMINZADE, R. 1992. Historical sociology and time. Sociological Methods and Research, 20: 456–80.
ARTHUR, W. B. 1989. Competing technologies and lock-in by historical events. Economic Journal, 99: 116–31.
ARTHUR, W. B. 1994. Increasing Returns and Path Dependence in the Economy. Ann Arbor: University of Michigan Press.
BARTON, A. H., and LAZARSFELD, P. 1969. Some functions of qualitative analysis in social research. Pp. 163–205 in Issues in Participant Observation, ed. G. J. McCall and J. L. Simmons. Reading, Mass.: Addison-Wesley.
BRADY, H. E., and COLLIER, D. eds. 2004. Rethinking Social Inquiry: Diverse Tools, Shared Standards. Lanham, Md.: Rowman & Littlefield.
BRAUMOELLER, B. F. 2003. Causal complexity and the study of politics. Political Analysis, 11 (3): 209–33.
BRAUMOELLER, B. F., and GOERTZ, G. 2000. The methodology of necessary conditions. American Journal of Political Science, 44 (4): 844–58.
CAMPBELL, D. T. 1975. “Degrees of freedom” and the case study. Comparative Political Studies, 8: 178–93.
CLARK, W. R., GILLIGAN, M. J., and GOLDER, M. 2006. A simple multivariate test for asymmetric hypotheses. Political Analysis, 14: 311–31.
CLEMENS, E. S., and COOK, J. M. 1999. Politics and institutionalism. Annual Review of Sociology, 25: 441–66.
COLLIER, R. B., and COLLIER, D. 1991. Shaping the Political Arena: Critical Junctures, the Labor Movement, and Regime Dynamics in Latin America. Princeton: Princeton University Press.
DAVID, P. A. 1985. Clio and the economics of QWERTY. American Economic Review, 75: 332–7.
DION, D. 1998. Evidence and inference in the comparative case study. Comparative Politics, 30 (2): 127–45.
DOWNING, B. M. 1992. The Military Revolution and Political Change: Origins of Democracy and Autocracy
NOTES
* James Mahoney’s work on this project is supported by the National Science Foundation under Grant No. 0093754. We thank Carles Boix and Susan Stokes for helpful comments on a previous draft.
1. It bears emphasis that historical researchers often generalize their explanations across all cases that fall within their theory scope. However, the scope of their theory—defined as a domain in which assumptions of causal homogeneity are valid—is usually restricted to a small to medium number of cases. For a discussion, see Mahoney and Rueschemeyer (2003, 7–10); Mahoney and Goertz (2006).
2. Statistical methodologists usually assume that determinism is wholly inappropriate for the social sciences (e.g., Lieberson 1991; Goldthorpe 1997). Some qualitative methodologists share this view. However, determinism can be justified on the grounds that, when one is not generalizing from a sample to a large population but rather explaining particular cases, it is meaningless to say that a cause exerts a probabilistic effect. For any particular case, a cause either exerts a given effect or it does not.