Discussion 7
Approaches to cognitive modeling
Probabilistic models of cognition: exploring representations and inductive biases Thomas L. Griffiths1, Nick Chater2, Charles Kemp3, Amy Perfors4 and Joshua B. Tenenbaum5
1 Department of Psychology, University of California, Berkeley, 3210 Tolman Hall MC 1650, Berkeley CA 94720-1650, USA 2 Division of Psychology and Language Sciences, University College London, Gower Street, London WC1E 6BT, UK 3 Department of Psychology, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh PA 15213, USA 4 School of Psychology, University of Adelaide, Level 4, Hughes Building, Adelaide, SA 5005, Australia 5 Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, Building 46-4015, 77 Massachusetts Avenue,
Cambridge, MA 02139, USA
Opinion
Glossary
Backpropagation: a gradient-descent based algorithm for estimating the
weights in a multilayer perceptron, in which each weight is adjusted based
on its contribution to the errors produced by the network.
Bottom-up/mechanism-first explanation: a form of explanation that starts by
identifying neural or psychological mechanisms believed to be responsible for
cognition, and then tries to explain behavior in those terms.
Emergentism: a scientific approach in which complex behavior is viewed as
emerging from the interaction of simple elements.
Gradient-descent learning: learning algorithms based on minimizing the error
of a system (or maximizing the likelihood of the observed data) by modifying
the parameters of the system based on the derivative of the error.
Hypothesis space: the set of hypotheses assumed by a learner, as made
explicit in Bayesian inference and potentially implicit in other learning
algorithms.
Inductive biases: factors that lead a learner to favor one hypothesis over
another that are independent of the observed data. When two hypotheses fit
the data equally well, inductive biases are the only basis for deciding between
them. In a Bayesian model, these inductive biases are expressed through the
prior distribution over hypotheses.
Inductive problem: a problem in which the observed data are not sufficient to
unambiguously identify the process that generated them. Inductive reasoning
requires going beyond the data to evaluate different hypotheses about the
generating process, while maintaining uncertainty.
Likelihood: the component of Bayes’ rule that reflects the probability of the
data given a hypothesis, p(djh). Intuitively, the likelihood expresses the extent
to which the hypothesis fits the data.
Posterior distribution: a probability distribution over hypotheses reflecting the
learner’s degree of belief in each hypothesis in light of the information
provided by the observed data. This is the outcome of applying Bayes’ rule,
p(hjd).
Prior distribution: a probability distribution over hypotheses reflecting the
learner’s degree of belief in each hypothesis before observing data, p(h). The
prior captures the inductive biases of the learner, because it is a factor that
contributes to the extent to which learners believe in hypotheses that is
independent of the observed data.
Top-down/function-first explanation: a form of explanation that starts by
Cognitive science aims to reverse-engineer the mind, and many of the engineering challenges the mind faces involve induction. The probabilistic approach to modeling cognition begins by identifying ideal solutions to these inductive problems. Mental processes are then modeled using algorithms for approximating these solutions, and neural processes are viewed as mechanisms for imple- menting these algorithms, with the result being a top- down analysis of cognition starting with the function of cognitive processes. Typical connectionist models, by contrast, follow a bottom-up approach, beginning with a characterization of neural mechanisms and exploring what macro-level functional phenomena might emerge. We argue that the top-down approach yields greater flexibility for exploring the representations and inductive biases that underlie human cognition.
Strategies for studying the mind Most approaches to modeling human cognition agree that the mind can be studied on multiple levels. David Marr [1] defined three such levels: a ‘computational’ level charac- terizing the problem faced by the mind and how it can be solved in functional terms; an ‘algorithmic’ level describing the processes that the mind executes to produce this solution; and a ‘hardware’ level specifying how those pro- cesses are instantiated in the brain. Cognitive scientists disagree over whether explanations at all levels are useful, and on the order in which levels should be explored. Many connectionists advocate a bottom-up or ‘mechanism-first’ strategy (see Glossary), starting by exploring the problems that neural processes can solve. This often goes with a philosophy of ‘emergentism’ or ‘eliminativism’: higher- level explanations do not have independent validity but are at best approximations to the mechanistic truth; they describe emergent phenomena produced by lower-level mechanisms. By contrast, probabilistic models of cognition pursue a top-down or ‘function-first’ strategy, beginning
Corresponding author: Griffiths, T.L. ([email protected]).
1364-6613/$ – see front matter � 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2010.0
with abstract principles that allow agents to solve pro- blems posed by the world – the functions that minds per- form – and then attempting to reduce these principles to psychological and neural processes. Understanding the lower levels does not eliminate the need for higher-level models, because the lower levels implement the functions specified at higher levels.
considering the function that a particular aspect of cognition serves, explaining
behavior in terms of performing that function.
5.004 Trends in Cognitive Sciences 14 (2010) 357–364 357
Opinion Trends in Cognitive Sciences Vol.14 No.8
Explanations at a functional level have a long history in cognitive science. Virtually all attempts to engineer human-like artificial intelligence, from the Logic Theory Machine [2] to the most successful contemporary para- digms [3], have started with computational principles rather than hardware mechanisms. The great potential of probabilistic models of cognition comes from the solutions they identify to inductive problems, which play a central role in cognitive science: Most of cognition, in- cluding acquiring a language, a concept, or a causal model, requires uncertain conjecture from partial or noisy infor- mation. A probabilistic framework lets us address key questions about these phenomena. How much information is needed? What representations subserve the inferences people make? What constraints on learning are necessary? These are computational-level questions and they aremost naturally answered by computational-level theories.
Taking a top-down approach leads probabilistic models of cognition to explore a broad range of different assump- tions about how people might solve inductive problems, and what representations might be involved. Representa- tions and inductive biases are selected by considering what is needed to account for the functions the brain performs, assuming only that those functions of perception, learning, reasoning, and decision can be described as forms of prob- abilistic inference (Figure 1). By contrast, connectionism makes strong pre-commitments about the nature of people’s representations and inductive biases based on a certain view of neural mechanisms and development: representations are graded, continuous vector spaces, lack- ing explicit structure, and are shaped almost exclusively by experience through gradual error-driven learning algor- ithms. This approach rejects a long tradition of research into knowledge representation in cognitive science, discarding notions such as rules, grammars, and logic that
[(Figure_1)TD$FIG]
Figure 1. Theoretical commitments of connectionism and probabilistic models of
cognition. Based on a certain view of brain architecture and function, connectionist
models makes strong assumptions about the representations and inductive biases
to be used in explaining human cognition: representations lack explicit structure
and inductive biases are very weak. By contrast, probabilistic models explore a
larger space of possibilities, including representations of diverse forms and
degrees of structure, and inductive biases of greatly varying shapes and strength.
These possibilities include highly structured representations and inductive
constraints that have proven valuable – and arguably necessary – for explaining
many of the functions of human cognition.
358
have proven useful in accounting for the functions of higher-level cognition.
The rest of this article presents our argument for the top-down approach, focusing on the importance of repres- entational diversity. The next section describes how struc- tured representations of different forms can be combined with statistical learning and inference in probabilistic models of cognition, using a case study in semantic cogni- tion that has also been the focus of recent work in the connectionist tradition [4]. We then give a broader survey, across different domains and tasks, of how probabilistic models have exploited a range of representations and inductive biases to explain different aspects of cognition that pose a challenge to accounts restricted to the limited forms of representations and weaker inductive biases assumed by connectionism. We emphasize breadth over depth of coverage because our goal is to illustrate the greater explanatory scope of probabilistic models. We then discuss how probabilistic models of cognition should be interpreted in terms of lower levels of analysis, a common point of confusion in critiques of this approach, and close with several other considerations in choosing whether to pursue a top-down, ‘function-first’ or bottom-up, ‘mechan- ism-first’ approach to cognitive modeling.
Knowledge representation and probabilistic models A probabilistic model starts with a formal characterization of an inductive problem, specifying the hypotheses under consideration, the relation between these hypotheses and observable data, and the prior probability of each hypoth- esis (Box 1). Probabilistic models therefore provide a trans- parent account of the assumptions that allow a problem to be solved and make it easy to explore the consequences of different assumptions. Hypotheses can take any form, from weights in a neural network [5,6] to structured symbolic representations, as long as they specify a probability distri- bution over observable data. Likewise, different inductive biases can be captured by assuming different prior distri- butions over hypotheses. The approach makes no a priori commitment to any class of representations or inductive biases, but provides a framework for evaluating different proposals.
Box 1. Probabilistic inference
Probability theory provides a solution to the problem of induction,
indicating how a learner should revise her degrees of belief in a set
of hypotheses in light of the information provided by observed data.
This solution is encapsulated in Bayes’ rule: if a learner considers a
set of hypotheses H that might explain observed data d, and assigns
each hypothesis h2H a probability p(h) before observing d (known
as the ‘prior’ probability), then Bayes’ rule indicates that the
probability p(hjd) assigned to h after seeing d (known as the
‘posterior’ probability) should be
pðhjdÞ ¼ pðdjhÞpðhÞ P
h2H pðdjhÞpðhÞ (1)
where p(djh) is the ‘likelihood’, indicating the probability of observ-
ing d if h were true, and the sum in the denominator simply ensures
that the posterior probabilities sum to one. Bayes’ rule thus indicates
that the conclusions reached by the learner will be determined by
how well hypotheses cohere with prior knowledge, and how well
they explain the data.
Opinion Trends in Cognitive Sciences Vol.14 No.8
Figure 2 illustrates one way in which a probabilistic approach can illuminate the nature of mental representa- tions. Consider a property induction problem where participants learn that horses, cows, and dolphins have a certain property then must decide whether all mammals are likely to have this property. Some researchers have proposed that inferences about novel properties of animals are supported by tree-structured representations [7], but others suggest that the underlying mental representations are closer to continuous spaces [8]. One way to resolve this debate is to define a probabilistic framework that can use either type of representation, and to see which representa- tion best explains human inferences [9]. The results in Figure 2a suggest that a tree structure is the better of these two alternatives.
Connectionist models typically focus on a single form of knowledge – whatever can be encoded in distributed codes over layers of hidden units. Unlike the connectionist approach, the probabilistic approach is open to the idea that qualitatively different representations are used for
[(Figure_2)TD$FIG]
Figure 2. Qualitatively different representations are needed to account for inductive in
human responses for a property induction task where participants learn that several
property. Each point in each scatterplot corresponds to a trio of mammals, and the ver
property after learning that the animals in this trio have the property. The horizontal axis
a tree tend to have similar properties, or that nearby animals in a two dimensional spac
and the spatial model relies on the two-dimensional space shown. (b) Results for a task
spatial model now performs better than the tree model. (c) Relations between biolog
dimensional space, but a probabilistic model can discover that a tree best accounts for
different types of inferences. Figure 2b shows results from a property induction experiment where the items are cities and participants are told, for example, that a certain type of Native American artifact is found near Houston, Dur- ham, and Orlando, and then asked whether this artifact is likely to be found near all major American cities. The probabilistic framework that was previously applied to the animal data (Figure 2a) now suggests that inferences about spatial relations between cities are better captured by a low-dimensional space than a tree. The same prob- abilistic framework also suggests how people might learn qualitatively different representations for different domains [9] (Figure 2c).
Rogers and McClelland have argued that connectionist models can implicitly capture representations like hier- archically-structured taxonomies, but some types of infer- ences seem to rely on explicit representations. For example, explicit representations provide a natural way to incorporate high-level semantic information provided by natural language and informed by social reasoning. To a
ferences about different domains (adapted from [13]). (a) Model predictions and
animals have a property, then decide whether all animals are likely to have this
tical axis indicates how strongly humans believe that all mammals have a certain
shows the predictions of probabilistic models which assume that nearby animals in
e tend to have similar properties. The tree model relies on the tree structure shown
where participants make inferences about US cities rather than animal species. The
ical species could be represented using a tree, a ring, a set of clusters, or a low-
the observable features of these species.
359
Opinion Trends in Cognitive Sciences Vol.14 No.8
child who believes that dolphins are fish, hearing a simple message from a knowledgeable adult (‘dolphins might look like fish but are actually mammals’) might drastically modify the inferences she makes. A learner equipped with a hierarchically structured system of categories can rearrange the hierarchy on hearing such an utterance. By contrast, a connectionist model cannot easily reconfi- gure itself through linguistic input. More generally, whereas both types of approaches might learn well from observing the world, only structured probabilistic approaches offer a natural route to acquiring knowledge through instruction or other forms of social communi- cation.
Although we have focused so far on simple representa- tions such as trees and low-dimensional spaces, many other types of representations are possible and useful. Probabilistic models defined over causal graphs, phrase structure grammars, logical rules or theories have been proposed for language, vision, and many other areas of cognition (see Figure 3, and the following section). These models inherit classic advantages of structured repres- entations that connectionist models give up [10,11]: they
[(Figure_3)TD$FIG]
Figure 3. Structured statistical models provide a way to describe multiple levels of abst
to be able to discover how sounds are organized into words, how words are organized in
of these levels can be described in terms of probabilistic inference over a structured hy
used to describe the set of objects in a scene and the surfaces that comprise those ob
360
generate infinite hypothesis spaces by combinatorial oper- ations on basic elements and capture core properties of human symbolic thought, such as compositionality and recursion. Connectionists have criticized symbolic models for failing to handle exceptions or produce graded gener- alizations, or to account for how representations are learned [4]. Combining structured representations with probabilistic inference meets those challenges, and also explain the rich and sophisticated uses of knowledge in human cognition that appear to require symbolic forms of representation.
The advantages of representational pluralism With their ability to operate over a broad range of candi- date representations and inductive biases, probabilistic models provide a unifying framework for explaining the inferences that people make in different settings. Here we briefly summarize how probabilistic approaches have addressed several aspects of human inductive reasoning and learning that have not previously been well explained in computational terms, and in particular, that would be difficult to explain in a connectionist framework.
raction in a way that applies across different domains. In language, a learner needs
to sentences, and how a language is characterized by a grammar. Learning at each
pothesis space [36]. Analogous problems apply in vision, where grammars can be
jects (figure adapted from [38]).
Opinion Trends in Cognitive Sciences Vol.14 No.8
Rapid and flexible generalization
Human learners routinely draw successful generaliz- ations from very limited evidence. Even young children can infer the extensions of new words or concepts, the hidden properties of objects, or the existence of causal relations from a handful of relevant observations. These abilities outstrip those of conventional machine learning algorithms, but probabilistic models have shown how rapid word learning [12], property induction [13], and causal learning [14] can be explained as Bayesian inferences. Probabilistic models have explained why people might appear to generalize differently in different contexts as a consequenceofapplying the samerulesof optimal statistical inference over different priors [15] orknowledge representa- tions [13] (Figure 2), and why some phenomena, such as Shepard’s universal exponential law [16], might arise in an entirely representation-independent way [17]. Algorithmic- levelmodels of generalization oftenposit different processes – rules to account for all-or-none generalizations, exemplar similarity to account for more graded generalizations – but probabilistic computational theories [18,19] have explained why we have these particular processes, why they work as theydo, andwhypeopleuse a rule-likeprocess in some cases and a similarity process in others.
Probabilisticmodels have alsomade successful empirical predictions about novel factors that can influence children’s generalizations, such as the sampling processes generating the data learners observe. Preschoolers andeven infants are sensitive to whether objects exemplifying a new word or hidden property are drawn specifically from the set of positive examples (‘strong sampling’), or instead from some more general or accidental process (‘weak sampling’), and generalize more sharply in the former case [20,21]. Prob- abilistic models naturally explain these findings, giving sampling processes a central role in the statistical problem of generalization through the likelihood term of Bayes’ rule [12,19]. By contrast, informative sampling was not con- sidered in previous algorithmic models and is not easily accommodated within standard connectionist models of statistical learning.
Causal learning
Discovering the causal relations between objects and events in the environment is a basic problem of human learning. Computational-level analyses of causal learning have provided two types of insights. First, they introduce the distinction between structure and strength [22]. When scientists explore causal relations, they distinguish be- tween questions of whether a relation exists (determining causal structure), and how strong that relation is. This distinction is blurred in associative accounts of causal learning, but is explicit when causal learning is framed as Bayesian inference over causal graphical models [23,24]. Probabilistic models based on this approach have given compelling quantitative accounts of human causal judgments [22,25–27]. Second, probabilistic inference pro- vides a way to understand how prior knowledge is com- bined with statistical evidence in causal learning, characterizing the different types of constraints that prior knowledge can impose [14] and explaining how these con- straints themselves could be learned [28,29].
Learning language
Children appear to be able to learn what utterances are, and are not, allowed in their native language, to some approximation, from exposure only to positive examples of the language. Learning merely from positive instances of a category has often been viewed as fundamentally problematic, sometimes leading to strong nativist con- clusions. The probabilistic approach provides powerful tools, both theoretical [30] and computational [31], for exploring how much learning is possible with minimal language-specific innate biases. More broadly, because linguistic representations can be highly structured, prob- abilistic models provide the means to analyze what can be learned given what sort of input, and can even be used to evaluate what sorts of structures (e.g. what type of grammar or phrase structure representation) provide the best model of the data. Because all probabilistic models are couched in the common language of prob- ability theory, they also provide a natural way to combine different sources of data (e.g. social cues and co-occur- rence relations when learning the meaning of words [32]). Probabilistic models have already been applied to many problems in language development, from the acquisition of syntax [31,33,34] to word segmentation [35] to learning meanings in communicative contexts [32]. On the engin- eering side of natural language processing, where the same ability to learn with hierarchical, compositional or recursive representations of meaning is crucial, struc- tured statistical models have come to dominate and reshaped the state of the art [36].
Visual perception
Probabilistic models have also revolutionized compu- tational theories of visual perception. Models for low-level vision such as motion estimation or shape perception operate over high-dimensional continuous representa- tions: vector fields representing motion components or depth gradients [37]. Models for higher-level visual tasks often resemble probabilistic parsing in natural language: they operate over hierarchically structured representations of objects and parts, assumed to be gener- ated by a probabilistic grammar for natural scenes [38,39] (Figure 3).
Learning to learn
Children learn their first words slowly, but in building their initial vocabulary they also quickly acquire the ability to learn new words much more rapidly [40]. Hierarchical Bayesian models have been used to explain how humans ‘learn to learn’ words [41], as well as categories [42] and causal relations [28,43], by performing inference on multiple levels of abstraction. Connectionist models have explored similar phenomena [44] but have not explained how children can learn to learn so quickly, constructing abstract knowledge of the appropriate form from relatively little experience in a domain [9,43].
The psychological and neural interpretation of probabilistic models Probabilistic models explain human learning and induc- tive reasoning in terms of Bayesian inference, and specify
361
Box 3. Probabilistic models and neural computation
Probabilistic models of cognition rarely emphasize inspiration from
neuroscience, or appeal to neural plausibility. Increasingly, how-
ever, the link between probabilistic inference and neural function is
drawing the attention of modelers from diverse backgrounds.
One route for linking Bayesian cognitive models to the brain uses
connectionism as a mediating paradigm: many familiar connec-
tionist algorithms for learning and inference have natural Bayesian
interpretations [5,6,52], and to the extent that these algorithms are
neurally plausible, they suggest how certain types of probabilistic
inferences could be implemented in the brain. Several connectionist
researchers have emphasized explicitly probabilistic formulations
for learning and inference, while still attempting to preserve the
distinctive ‘connectionist style’ of distributed representations ar-
ranged in hierarchical layers [53].
Another group of researchers aim to show how core computa-
tions and models from Bayesian statistics and machine learning –
many of which are also central in probabilistic models of cognition –
can be implemented in neurally plausible mechanisms. For
instance, Pouget, Beck, Ma and colleagues have studied how to
implement Bayesian parameter estimation and decision-making
using probabilistic population codes in networks of spiking neurons
[54]. Lee and Mumford [55] suggested that cortical hierarchies could
implement a form of particle filtering, which is also a candidate for
making algorithmic-level models (see main text).
Although research on the ‘Bayesian brain’ holds great promise,
there is presently a gulf between such a research program and the
Bayesian models of higher-level cognition reviewed in this article.
We have argued that probabilistic inference over structured
representations is crucial for explaining the use and origins of
human concepts, language, or intuitive theories. Yet little is known
concerning how these structured representations can be implemen-
ted in neural systems (however, see the research program of
Smolensky and colleagues [56]). In our view, the single biggest
challenge for theoretical neuroscience is not to understand how the
brain implements probabilistic inference, but how it represents the
structured knowledge over which such inference is defined.
Box 2. Connecting to process models
The discussion in the main text shows how connections between
the computational, algorithmic, and hardware levels might not be
transparent. However, exploring these connections is an important
part of the strategy of working through levels of analysis from the
top down. One way to do so is to consider psychological processes
that could approximate the computations required for probabilistic
inference. Applications of statistical models in machine learning and
artificial intelligence rely on such approximation algorithms,
because computing exact probabilities is typically intractable for
complex, real-world problems. These algorithms provide rational
approximations to probabilistic inference, and thus are a potential
source of ‘rational process models’ [45].
One class of approximation algorithms is Monte Carlo methods,
in which a probability distribution is approximated with a set of
samples from that distribution. One sophisticated Monte Carlo
method, importance sampling, can be implemented using the same
computations as the exemplar models used as process models of
categorization [46,47], requiring people to store a few hypotheses in
memory and activate them based on their similarity to observed
data [45]. A related set of algorithms known as ‘particle filters’
provide a way to approximately update a probability distribution as
data are observed. They have been used to model deviations from
ideal performance in category learning [48], associative learning
[49], detecting changes in temporal sequences [50], and sentence
processing [51], and could provide a way to connect all the way to
the neural level (Box 3).
Opinion Trends in Cognitive Sciences Vol.14 No.8
hypothesis spaces that often have symbolic structure. Critics of probabilistic models often argue that they are implausible as accounts of human cognition, pointing to the computational difficulty involved in calculating Baye- sian inference as well as the requirement of specifying the hypothesis space in advance (see, e.g. the commentaries to [19]). However, all models – including connectionist models – build in hypothesis spaces; probabilistic models simply make the space explicit. Moreover, this criticism presup- poses that a computational-level analysis in terms of Bayesian inference requires the algorithmic- or hard- ware-level analysis to take the same form. This assump- tion is false: using probabilistic models to provide a computational-level explanation does not require that hy- pothesis spaces or probability distributions be explicitly represented by the underlying psychological or neural processes, or that people learn and reason by explicitly using Bayes’ rule.
To illustrate how the computational-level specification of a model can differ significantly from its realization at the algorithmic and hardware levels, it is useful to apply this approach to one of the best-known connectionist models, the multilayer perceptron. A multilayer percep- tron can be characterized at the computational level as a nonlinear function approximator. Its weights parameter- ize an infinite, high-dimensional hypothesis space of non- linear functions mapping input vectors to outputs. Learning involves searching this hypothesis space for a function that minimizes error on a training dataset. This can actually be cast in Bayesian terms: the error corre- sponds to the negative log likelihood of a hypothesis and the prior is either uniform or prefers smaller weights [5,6].
Described as Bayesian inference in an infinite, high- dimensional hypothesis space, learning the weights of a multilayer perceptron might sound implausible as a cog- nitive process. However, considering ways to solve this computational problem approximately but tractably suggests more plausible psychological and even neural interpretations. We can find at least a local maximum of the Bayesian posterior by computing its gradient in weight space and adjusting the weights iteratively along this gradient. Familiar gradient-descent learning algorithms such as backpropagation implement this strategy in a parallel network of neuron-like units, each computing only local functions of the activation and error signals of neigh- boring units. This algorithm does not require explicit enumeration or scoring of the full space of hypotheses, nor even any explicit application of Bayes’ rule.
Similarly, we view the structured representations and Bayesian calculations used in probabilistic models of cog- nition as computational abstractions that could be imple- mented in the mind and brain in a variety of implicit and approximate ways. Such implementation could differ across problems, and need not look like explicit structured representations or Bayesian inference.Work on connecting probabilistic models to psychological process models (Box 2) and neural computation (Box 3) illustrates this point, and indicates a possible route towards synthesis with more bottom-up, mechanistically constrained approaches to modeling the mind (Box 4).
362
Conclusion: start at the top, or at the bottom? Top-down and bottom-up approaches to traversing levels of analysis are analogous to building a single bridge from different ends. Nonetheless, we expect that more rapid
Box 5. Response to McClelland et al.
We enjoyed reading McClelland et al.’s article, and hope that this
comparison of probabilistic and connectionist models will continue
to result in productive interactions. In this spirit, we first clarify
several respects in which our position differs from its characteriza-
tion in their article. Rather than treating people as optimal cognizing
machines, we believe that considering optimal solutions to the
computational problems people face can provide insight into
human behavior. People undoubtedly approximate these optimal
solutions in a variety of ways. We do not ignore mechanism and
implementation, but view computational level analyses as a guide
to understanding the function that those mechanisms and imple-
mentations fulfill. We do not see the fact that different mechanisms
are involved when task demands differ as being inconsistent with
this approach. Finally, we do not make an a priori commitment to
particular types of representations. Probabilistic models are a tool
for exploring different sets of assumptions about representations
and inductive biases, making it possible for data to lead us to an
account of human cognition.
McClelland et al. warn of the dangers of mis-specifying the
computational problems that people are solving, and point to
Chomskyan linguistics as an illustration. It is certainly true that any
specific high-level explanation, whether probabilistic or not, can be
questioned. However, McClelland et al. go much further than this,
arguing that most if not all aspects of cognition should be explained
without the ‘often misleading’ constructs of high-level explanation.
This radical position opposes not just the probabilistic approach,
but standard practice across the cognitive sciences. By contrast, we
believe that greater danger lies in committing to particular incorrect
low-level mechanisms – a real possibility because most connec-
tionist networks are vastly oversimplified when compared with
actual neurons. Connectionist networks are opaque, and it is
typically difficult to understand what shapes their behavior and
what constraints they might be implementing. This makes it hard to
understand the consequences of changing the underlying mechan-
isms. By contrast, the transparency of probabilistic models makes it
easier to understand the consequences of changing our assump-
tions, and thus to recover from errors of mis-specification.
Chomskyan linguistics is an interesting choice as an illustration of
the dangers of the probabilistic approach. Whereas many of the
ideas introduced by Chomsky have subsequently been revised or
rejected, the notion of a generative grammar, once paired with the
statistical principles that underlie probabilistic models, has been the
basis for considerable advances in both computational linguistics
and psycholinguistics (e.g. [33,34,36,51]). This marriage of structure
and statistics is also at the heart of almost all modern machine
learning algorithms, having become the method of choice for
solving the types of real-world learning problems that people face
every day. Although cognitive modeling and machine learning are
two different enterprises, a basic challenge for both is to match
human-level performance in domains such as language, vision, and
reasoning. Of the modeling approaches that psychologists have
considered, the structured statistical approach comes closest to
meeting this challenge.
Box 4. Outstanding questions
� What are the connections between probabilistic models at the
computational level, and the psychological and neural processes
involved in cognition?
� How (and to what extent) might human behavior be understood
as an approximation to the ‘ideal observer’ behavior predicted by
the probabilistic approach? To what extent can approximations
built into probabilistic models implementing human-like cognitive
limitations account for divergences between human and model
performance?
� How might probabilistic inference and structured representations
be implemented in neural hardware?
� What questions about human cognition are more naturally
framed at levels lower than the computational level? Are there
any phenomena for which no computation-level explanation is
possible?
Opinion Trends in Cognitive Sciences Vol.14 No.8
progress will come from attempts to reduce abstract prob- abilistic analyses of cognition to psychological and neural mechanisms, rather than studies of how analogous com- putational functions might emerge from connectionist networks (Box 5). The flexibility to explore different assumptions about representation and inductive biases, and to naturally capture inferences over rich and struc- tured forms of knowledge, are central advantages of the top-down approach. However, there are two other import- ant differences between these approaches.
First, the top-down strategy fits particularly well with understanding solutions to the computational problems that the mind faces. Finding engineering solutions to these problems is the type of process that typically operates top- down, from high-level specification to physical imple- mentation. A probabilistic approach to reverse-engineer- ing themind forges strong connectionswith the latest ideas from computer science, machine learning, and statistics. Bottom-up accounts can be harder to interpret: we might simulate a complex system and find that its emergent behavior solves a cognitive problem, but that does not mean we will necessarily know how or why it solves it successfully.
Second, bottom-up accounts could be highly sensitive to details of the underlying mechanisms, and these details are either unknown or abstracted away in most current models. For instance, small differences in how neurons process information, adjust their weights, or connect with other neurons could lead to very different emergent beha- vior in a large neural network. These possibilities are particularly problematic given the rapidly evolving state of neuroscience research and the increasingly unclear relation between connectionist networks and biological neural circuits. Committing to a set of assumptions about the representations and inductive biases involved in human cognition thus seems premature.
Whereas the phenomena of human cognition must ulti- mately be analyzed at all of Marr’s levels, we are far from understanding how rich knowledge structures can be implemented in neural circuits. Whether such imple- mentations will ultimately resemble conventional connec- tionist models is an open question. However, when a neural-level understanding of human knowledge and its origins is eventually achieved, we predict that it will build on a deep understanding of these questions at the compu-
tational level – and that this understanding will be best framed using the concepts and principles of probabilistic inference.
References 1 Marr, D. (1982) Vision, W.H. Freeman 2 Newell, A. and Simon, H. (1956) The logic theory machine: a complex
information processing system. IRE Trans. Information Theory IT-2, 61–79
3 Russell, S.J. and Norvig, P. (2002) Artificial Intelligence: a Modern Approach, (2nd edn), Prentice Hall
4 Rogers, T. and McClelland, J. (2004) Semantic Cognition: a Parallel Distributed Processing Approach, MIT Press
5 Neal, R.M. (1996)Bayesian Learning for Neural Networks (Number 118 in Lecture Notes in Statistics), Springer-Verlag
363
Opinion Trends in Cognitive Sciences Vol.14 No.8
6 MacKay, D.J.C. (1995) Probable networks and plausible predictions – a review of practical Bayesian methods for supervised neural networks. Network: Comput. Neural. Systems 6, 469–505
7 Atran, S. (1998) Folk biology and the anthropology of science: cognitive universals and cultural particulars. Behav. Brain Sci. 21, 547–609
8 Rips, L.J. (1975) Inductive judgments about natural categories. J. Verbal Learning Verbal Behav. 14, 665–681
9 Kemp, C. and Tenenbaum, J.B. (2008) The discovery of structural form. Proc. Natl. Acad. Sci. U. S. A. 105, 10687–10692
10 Fodor, J.A. and Pylyshyn, Z.W. (1988) Connectionism and cognitive architecture: a critical analysis. Cognition 28, 3–71
11 Marcus, G.F. (2001) The Algebraic Mind: Integrating Connectionism and Cognitive Science, MIT Press
12 Xu, F. and Tenenbaum, J.B. (2007) Word learning as Bayesian inference. Psychol. Rev. 114, 245–272
13 Kemp, C. and Tenenbaum, J.B. (2009) Structured statistical models of inductive reasoning. Psychol. Rev. 116, 20–58
14 Griffiths, T.L. and Tenenbaum, J.B. (2009) Theory-based causal induction. Psychol. Rev. 116, 661–716
15 Griffiths, T.L. and Tenenbaum, J.B. (2006) Optimal predictions in everyday cognition. Psychol. Sci. 17, 767–773
16 Shepard, R.N. (1987) Towards a universal law of generalization for psychological science. Science 237, 1317–1323
17 Chater, N. and Vitanyi, P.M.B. (2001) The generalized universal law of generalization. J. Math. Psychol. 47, 346–369
18 Tenenbaum, J.B. (2000) Rules and similarity in concept learning. In Advances in Neural Information Processing Systems (Vol. 12) (Solla, S.A., ed.), In pp. 59–65, MIT Press
19 Tenenbaum, J.B. and Griffiths, T.L. (2001) Generalization, similarity, and Bayesian inference. Behav. Brain Sci. 24, 629–641
20 Xu, F. and Tenenbaum, J.B. (2007) Sensitivity to sampling in Bayesian word learning. Dev. Sci. 10, 288–297
21 Gweon, H. et al. (2010) Infants consider both the sample and the sampling process in inductive generalization. Proc. Natl. Acad. Sci. U. S. A. 107, 9066–9071
22 Griffiths, T.L. and Tenenbaum, J.B. (2005) Structure and strength in causal induction. Cogn. Psychol. 51, 354–384
23 Spirtes, P. et al. (1993) Causation Prediction and Search, Springer- Verlag
24 Pearl, J. (2000) Causality: Models, Reasoning and Inference, Cambridge University Press
25 Gopnik, A. et al. (2004) A theory of causal learning in children: causal maps and Bayes nets. Psychol. Rev. 111, 1–31
26 Cheng, P. (1997) From covariation to causation: a causal power theory. Psychol. Rev. 104, 367–405
27 Lu, H. et al. (2008) Bayesian generic priors for causal learning. Psychol. Rev. 115, 955–984
28 Kemp, C. et al. (2007) Learning causal schemata. In Proceedings of the Twenty-Ninth Annual Conference of the Cognitive Science Society (McNamara, D.S. and Trafton, J.G., eds), pp. 389–394, Cognitive Science Society
29 Lucas, C.G. and Griffiths, T.L. (2010) Learning the form of causal relationships using hierarchical Bayesian models. Cogn. Sci. 34, 113–
147 30 Chater, N. and Vitányi, P. (2007) Ideal learning of natural language:
positive results about learning from positive evidence. J. Math. Psychol. 51, 135–163
31 Perfors, A. et al. (2006) Poverty of the stimulus? A rational approach. In Proceedings of the 28th Annual Conference of the Cognitive Science Society (Sun, R. and Miyake, N., eds), pp. 663–668, Cognitive Science Society
32 Frank, M. et al. (2009) Using speakers’ referential intentions to model early cross-situational word learning. Psychol. Sci 20, 578–585
364
33 Klein, D. and Manning, C. (2004) Corpus-based induction of syntactic structure: models of dependency and constituency. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics 486–497
34 Liang, P. et al. (2007) The infinite PCFG using hierarchical Dirichlet Processes. In Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 688–697, Association for Computational Linguistics
35 Goldwater, S. (2009) A Bayesian framework for word segmentation: exploring the effects of context. Cognition 112, 21–54
36 Manning, C. and Schütze, H. (1999) Foundations of Statistical Natural Language Processing, MIT Press
37 Weiss, Y. et al. (2002) Motion illusions as optimal percepts. Nature Neurosci. 5, 598–604
38 Han, F. and Zhu, S.-C. (2005) Bottom-up/top-down image parsing by attribute graph grammar. In Proceedings of the Tenth IEEE International Conference on Computer Vision, pp. 1778–1785
39 Tu, Z. et al. (2005) Image parsing: unifying segmentation, detection, and recognition. Int. J. Computer Vision 63, 113–140
40 Smith, L.B. et al. (2002) Object name learning provides on-the-job training for attention. Psychol. Sci. 13, 13–19
41 Kemp, C. et al. (2007) Learning overhypotheses with hierarchical Bayesian models. Dev. Sci. 10, 307–321
42 Perfors, A. and Tenenbaum, J. (2009) Learning to learn categories. In Proceedings of the 31st Annual Conference of the Cognitive Science Society (Taatgen, N. and van Rijn, H., eds), pp. 136–141, Cognitive Science Society
43 Goodman, N.D. et al. (2009) Learning a theory of causality. In Proceedings of the 31st Annual Conference of the Cognitive Science Society (Taatgen, N. and van Rijn, H., eds), pp. 2188–2193, Cognitive Science Society
44 Colunga, E. and Smith, L.B. (2005) From the lexicon to expectations about kinds: a role for associative learning. Psychol. Rev. 112, 347–382
45 Shi, L. et al. (2008) Performing Bayesian inference with exemplar models. In Proceedings of the Thirtieth Annual Conference of the Cognitive Science Society (Love, B.C. et al., eds), pp. 745–750, Cognitive Science Society
46 Medin, D.L. and Schaffer, M.M. (1978) Context theory of classification learning. Psychol. Rev. 85, 207–238
47 Nosofsky, R.M. (1986) Attention, similarity, and the identification- categorization relationship. J. Exp. Psychol.: General 115, 39–57
48 Sanborn, A.N. et al. (2006) A more rational model of categorization. In Proceedings of the 28th Annual Conference of the Cognitive Science Society (Sun, R. and Miyake, N., eds), pp. 726–731, Cognitive Science Society
49 Daw, N. and Courville, A.C. (2008) The pigeon as particle filter. In Advances in Neural Information Processing Systems (Vol. 20) (Platt, J.C. et al., eds), In MIT Press
50 Brown, S.D. and Steyvers, M. (2009) Detecting and predicting changes. Cogn. Psychol. 58, 49–67
51 Levy, R. et al. (2009) Modeling the effects of memory on human online sentence processing with particle filters. In Advances in Neural Information Processing Systems (Vol. 21) (Koller, D., eds), pp. 937–944
52 McClelland, J.L. (1998) Connectionist models of Bayesian inference. In Rational Models of Cognition (Oaksford, M. and Chater, N., eds), Oxford University Press
53 Hinton, G.E. (2007) Learningmultiple layers of representation.Trends Cogn. Sci. 11, 428–434
54 Ma, W.J. (2008) Spiking networks for Bayesian inference and choice. Curr. Opin. Neurobiol. 18, 217–222
55 Lee, T-S. and Mumford, D. (2003) Hierarchical Bayesian inference in the visual cortex. J. Opt. Soc. Am. A 20, 1434–1448
56 Smolensky, P. and Legendre, G. (2006)TheHarmonic Mind, MIT Press
- Probabilistic models of cognition: exploring representations and inductive biases
- Strategies for studying the mind
- Knowledge representation and probabilistic models
- The advantages of representational pluralism
- Rapid and flexible generalization
- Causal learning
- Learning language
- Visual perception
- Learning to learn
- The psychological and neural interpretation of probabilistic models
- Conclusion: start at the top, or at the bottom?
- References