
Reward Maximization Justifies the Transition from Sensory Selection at Childhood to Sensory Integration at Adulthood

Pedram Daee1*, Maryam S. Mirian1, Majid Nili Ahmadabadi1,2

1 Cognitive Robotics Laboratory, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran, 2 School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran

Abstract

In a multisensory task, human adults integrate information from different sensory modalities, behaviorally in an optimal Bayesian fashion, while children mostly rely on a single sensory modality for decision making. The reason behind this change of behavior over age and the process behind learning the statistics required for optimal integration are still unclear and have not been justified by conventional Bayesian modeling. We propose an interactive multisensory learning framework that makes no prior assumptions about the sensory models. In this framework, learning in every modality and in their joint space is done in parallel using a single-step reinforcement learning method. A simple statistical test on confidence intervals on the mean of the reward distributions is used to select the most informative source of information among the individual modalities and the joint space. Analyses of the method and simulation results on a multimodal localization task show that the learning system autonomously starts with sensory selection and gradually switches to sensory integration. This is because relying more on individual modalities (i.e. selection) at early learning steps (childhood) is more rewarding than favoring decisions learned in the joint space, since the smaller state space of each modality results in faster learning in every individual modality. In contrast, after gaining sufficient experience (adulthood), the quality of learning in the joint space matures, while learning in the modalities suffers from insufficient accuracy due to perceptual aliasing. This results in a tighter confidence interval for the joint space and consequently causes a smooth shift from selection to integration. It suggests that sensory selection and integration are emergent behaviors and that both are outputs of a single reward maximization process; i.e. the transition is not a preprogrammed phenomenon.

Citation: Daee P, Mirian MS, Ahmadabadi MN (2014) Reward Maximization Justifies the Transition from Sensory Selection at Childhood to Sensory Integration at Adulthood. PLoS ONE 9(7): e103143. doi:10.1371/journal.pone.0103143

Editor: Robert J. van Beers, VU University Amsterdam, Netherlands

Received March 19, 2014; Accepted June 27, 2014; Published July 24, 2014

Copyright: © 2014 Daee et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The authors confirm that all data underlying the findings are fully available without restriction. All relevant data are within the paper.

Funding: The authors have no funding or support to report.

Competing Interests: The authors have declared that no competing interests exist.

* Email: [email protected]

Introduction

To make an appropriate decision, our brain has to perceive the current state of the environment. However, even our best senses are noisy and can only provide an uncertain estimate of the underlying state. The biological solution for achieving the best perception is integration of uncertain individual estimates. Human adults integrate sensory information, both across and within different modalities, seemingly with the purpose of reducing the uncertainty of their perception. The overwhelming majority of behavioral studies have shown that this uncertainty reduction happens in a statistically optimal fashion [1], [2]. One way to model this optimal integration is to employ the Bayesian framework. In this framework and under some assumptions, the integration procedure is modeled by a weighted average of the individual sensors' estimates. Each sensor's weight is proportional to its relative reliability, i.e. the inverse of its uncertainty. It can be shown that the reliability of the integrated estimate is higher than that of any individual estimate.

Nevertheless, many behavioral studies indicate that this optimal behavior, and in some cases even its neural foundations, are not present at birth. Furthermore, it is only in the later stages of development that multisensory functions appear and take the main role in multisensory decision making; see [3] for a comprehensive review. An increasing number of studies in different sensory modalities on adults and children have shown that, unlike adults, children make their judgments based on only one of the available sources of information. Some instances of this sensory selection behavior have been observed in visual and haptic modalities for size and orientation discrimination [4], visual landmarks and self-motion information for navigation [5], and visual stereoscopic and texture information for estimating surface slant [6].

The interesting open questions here are “Why does optimal integration occur so late?” [7], why there is a tendency toward sensory selection in children, and finally, how and based on what measures the transition from sensory selection at childhood to sensory integration at adulthood happens. While there are a considerable number of hypotheses regarding the reasons behind these phenomena (see [6], [3], [7]), to our knowledge, no existing study has addressed these three questions with a unified computational model. The primary aim of this research is to investigate the computational advantages of the transition from sensory selection at early ages toward multisensory integration at adulthood. The second goal is to check whether the above three questions can be addressed by a single computational model.

PLOS ONE | www.plosone.org 1 July 2014 | Volume 9 | Issue 7 | e103143

We hypothesize that this selection and integration are emergent behaviors of a single reward maximization system. To verify our hypothesis, we propose a mathematically sound and general reward-dependent learning framework (see Method) and test it in a multisensory localization task (see Experiments and Results). The learning method is value-based [8], [9], and the progress of learning in the framework corresponds to the development of the agent over age. This choice is natural, as there are supporting studies indicating that multisensory integration is not innate and that there should be a learning mechanism behind its development (see [3], [10]). Furthermore, this framework does not require most of the strict mathematical assumptions that are the building blocks of the conventional Bayesian framework, which is widely used to explain multisensory integration.

Method

Consider an agent with $k$ sensors $O_1, O_2, \ldots, O_k$, where $O_i$ is the observation space of the $i$th sensor. Furthermore, assume that the environment is fully observable in the Cartesian product of the observation spaces, i.e. $S = O_1 \times O_2 \times \cdots \times O_k$. At each time step, the agent should choose an action from its action set $A$ according to the perceptual input (state) $s = (o_1, o_2, \ldots, o_k)$, where $o_i$ is the current reading of the $i$th sensor. After performing the action, the agent receives an immediate reinforcement signal (reward) $r$ from the environment. It is assumed that all the reward distributions, corresponding to the state-action pairs, are unknown with support in $[0,1]$. The goal of the agent is to maximize the total amount of reward it receives over its lifetime. To achieve this goal, the agent should learn the appropriate action in response to members of the joint sensory space $S$.

The primary challenge here is that the state space $S$ is high dimensional. Therefore, to learn the best action corresponding to each member of $S$, a large number of experiences (samples) is needed. This problem is known as the curse of dimensionality. One way to tackle this problem is to use the experiences in the subspaces of $S$, such as $O_i$, for decision making [11], [12]. However, the environment as seen through $O_i$ is partially observable, which creates a many-to-one mapping between real states of the environment and observations in $O_i$. This problem is known as Perceptual Aliasing (PA) [13] and is generally avoided. Nevertheless, PA might be beneficial in learning a task [11], since it can partially free the learner from the curse of dimensionality if states sharing the same $o_i$ have similar optimal policies. PA might be helpful at the early stages of learning as well, where learning a moderately rewarding policy over $O_i$ is faster than learning a policy with the same reward over the joint space $S$. In these two cases, learning in the subspaces results in generalization of experiences. In contrast, PA can be very undesirable when functionally different states of the environment, i.e. states with very different policies, are mapped to the same observation in $O_i$. This case of PA turns the accumulated experience in that subspace into “garbage” [14]. Figure 1 illustrates these concepts in a simple example. Our proposed statistical test (see Generalization Test) has the ability to detect the different cases of perceptual aliasing that are illustrated in the figure.

In order to benefit from PA and to avoid its harms, a statistical test is proposed to discriminate estimates of the expected reward that are instances of generalization (beneficial cases of PA) from garbage information. The proposed test is in part inspired by McCallum's work on learning with incomplete perception [15]. Then, a selection policy for choosing the most reliable source of information is employed. Finally, based on the selected information, a decision making policy is introduced that considers the exploration and exploitation trade-off. A schematic overview of the proposed method, including the Generalization Test (G Test) and the Decision Making phase, is illustrated in Figure 2. In the following subsections, the proposed multisensory learning and decision making method is explained in detail.

In general, there are two approaches for learning a task: learning through labeled samples and learning by interaction. State estimation in a supervised setting requires having the specifications of the states at hand. Nevertheless, in reality we should learn the states either directly or through learning the optimal policy. In the problem at hand, the agent begins its life in a tabula rasa state, and there is no information available regarding the observation models of the sensors or the relation between the agent's sensory space $S$ and its action space $A$. Furthermore, the only teacher that the agent can interact with is the environment. Therefore, the agent can learn to act properly only through interactions with the environment. In this problem we are not interested in learning the observation models of individual sensors, nor do we have the necessary sources of feedback to do so. Therefore, this problem is different from conventional supervised learning, where a teacher provides a set of labeled data and the agent needs only to learn the observation models of the sensors and perform a state estimation task.

1. Modeling

The actual value of choosing action $a \in A$ when the agent is in state $s = (o_1, o_2, \ldots, o_k)$ is denoted as $Q^*(a, s \in S)$, and its estimated value as $Q(a, s \in S)$. All the estimated values (Q-values) are represented in a $|O_1| \times |O_2| \times \cdots \times |O_k| \times |A|$ dimensional table, known as the Q-table. Q-values are updated after each time step using

$$Q(a, s \in S) = Q(a, s \in S) + \beta(a, s \in S)\,\left[r(a, s \in S) - Q(a, s \in S)\right],$$

where $r(a, s \in S)$ is the reward received after performing $a$ in $s$, and $0 < \beta(a, s \in S) \le 1$ is the learning rate for the given state and action. We assume that the reward distributions are fixed throughout the learning; i.e. the environment is stationary. In stationary environments, it is rational to employ $\beta(a, s \in S) = \frac{1}{\#(a, s \in S)}$, where $\#(a, s \in S)$ is the sample size, i.e. the number of times that action $a$ is performed in state $s$. By using this learning rate, the above equation becomes identical to the incremental update formula for computing the average reward [8]. Therefore, Q-values are the sample means and $Q^*$s are the actual means of the underlying reward distributions.
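The equivalence between the update rule with learning rate 1/# and the plain sample mean can be checked numerically. The following is a minimal Python sketch; the function name `update_q` and the reward values are hypothetical and chosen only for illustration.

```python
def update_q(q, count, reward):
    """Apply one Q-value update with learning rate beta = 1 / count.

    `count` is the number of samples seen before this reward arrives.
    """
    count += 1
    beta = 1.0 / count
    q = q + beta * (reward - q)
    return q, count


# Hypothetical rewards received for one (action, state) pair.
rewards = [0.25, 1.0, 0.0, 0.75]

q, n = 0.0, 0
for r in rewards:
    q, n = update_q(q, n, r)

# The incrementally updated Q-value matches the sample mean of the rewards.
print(q, sum(rewards) / len(rewards))
```

Because the learning rate shrinks as 1/#, every reward ends up with equal weight, which is why the Q-value can be read as a sample mean.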

As will be explained in the following sections, we need confidence intervals on the $Q^*$s for our generalization test and decision making method. For a moderately large number of samples, we can create a confidence interval on $Q^*(a, s \in S)$ using the following bound [16]:

$$P\left( Q(a, s \in S) - t^{\#(a, s \in S)-1}_{\alpha/2} \times \frac{std(a, s \in S)}{\sqrt{\#(a, s \in S)}} \le Q^*(a, s \in S) \le Q(a, s \in S) + t^{\#(a, s \in S)-1}_{\alpha/2} \times \frac{std(a, s \in S)}{\sqrt{\#(a, s \in S)}} \right) = 1 - \alpha \qquad (1)$$

The Transition from Sensory Selection to Sensory Integration in Humans


Figure 1. Different types of perceptual aliasing in subspaces. $O^i = \{o^i_1, o^i_2\}$ represents the observation set of the $i$th sensor for $i = 1, 2$. $S = \{s^1, s^2, s^3, s^4\}$ is the state set and $A$ is the action set of the agent (three actions, drawn as symbols in the figure). $a^+$ and $a^-$ are the best and the worst actions in the given state, respectively. Accumulated experience in $o^1_2$ is a perfect generalization for $s^1$ and $s^2$, since these two states have the same optimal policy and $o^1_2$ is common between them. In contrast, accumulated experience in $o^2_2$ is garbage information because functionally different states are mapped to the same observation. The situation for $o^2_1$ and $o^1_1$ is a little different: only for the best action in $o^2_1$ and the worst action in $o^1_1$ do we have generalization; for the other action this is not the case. doi:10.1371/journal.pone.0103143.g001

Figure 2. A schematic overview of the proposed framework for multisensory learning and decision making. $s = (o_1, o_2, \ldots, o_k)$ is the perceptual input, $o_i$ is the current reading of the $i$th sensor, and $LB_i$ is the learning block of the $i$th sensor. For each action and based on the previously received rewards, each learning block calculates a confidence interval ($CI$) on the mean of the reward distribution corresponding to the given observation and action pair. The proposed Generalization Test (G Test) tests the generalization ability of each individual source against the joint space. If an individual source passes the G Test, its confidence interval is considered in the decision making phase. In the decision making phase, an appropriate action is selected based on the given intervals, taking the exploration and exploitation trade-off into account. doi:10.1371/journal.pone.0103143.g002


In (1), $t^{\#(a, s \in S)-1}_{\alpha/2}$ is the $\alpha/2$ critical value of the Student t distribution with $\#(a, s \in S) - 1$ degrees of freedom. The parameter $\alpha \in [0,1]$ controls the confidence that $Q^*$ will fall inside the confidence interval. Finally, the value $std(a, s \in S)$ is the estimated standard deviation of the underlying reward distribution, defined by

$$std(a, s \in S) = \sqrt{\frac{\#(a, s \in S) \sum r^2(a, s \in S) - \left(\sum r(a, s \in S)\right)^2}{\#(a, s \in S) \times \left(\#(a, s \in S) - 1\right)}},$$

where $\sum r(a, s \in S)$ is the sum of the rewards and $\sum r^2(a, s \in S)$ is the sum of the squares of the rewards received by performing $a$ in $s$.
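As an illustration of how bound (1) can be evaluated from the three running statistics alone, here is a small Python sketch. The function names, the reward values, and the decision to pass the t critical value in as an argument (rather than computing it) are assumptions made for this example, not part of the original method description.

```python
import math
import statistics


def std_from_sums(n, sum_r, sum_r2):
    """Estimated standard deviation from the sample count, the sum of rewards,
    and the sum of squared rewards (the formula given with bound (1))."""
    if n < 2:
        raise ValueError("at least two samples are needed")
    variance = (n * sum_r2 - sum_r ** 2) / (n * (n - 1))
    return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding errors


def t_interval(q, n, std, t_crit):
    """Interval of bound (1); t_crit is a Student t critical value
    supplied by the caller (for example from a t table)."""
    half_width = t_crit * std / math.sqrt(n)
    return (q - half_width, q + half_width)


# Hypothetical rewards for one (action, state) pair.
rewards = [0.25, 1.0, 0.0, 0.75]
n = len(rewards)
std = std_from_sums(n, sum(rewards), sum(r * r for r in rewards))

# 2.353 is approximately the critical value for alpha = 0.1 and 3 degrees of freedom.
low, high = t_interval(sum(rewards) / n, n, std, t_crit=2.353)
print(std, statistics.stdev(rewards), (low, high))
```

Storing only the count, the sum, and the sum of squares keeps the memory cost per table entry constant, which matters when the joint table is large.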

The confidence interval in (1) is mathematically valid when either the number of samples ($\#(a, s \in S)$) is moderately large or the reward distribution is Normal (Gaussian). Although these conditions may seem rather restrictive, in our experience, bound (1) works reasonably well in most practical cases.

When the sample size is not sufficiently large or the reward distribution is not Gaussian, we may use Chebyshev's inequality to calculate the confidence interval. To do so, we need the true standard deviation of the reward distribution, which is not available in general. However, since the reward distribution is defined on the interval $[0,1]$, the maximum possible value for the variance is $\frac{1}{4}$. Then a very conservative Chebyshev's inequality is

$$P\left( Q(a, s \in S) - \frac{1}{\sqrt{\alpha}} \times \frac{0.5}{\sqrt{\#(a, s \in S)}} \le Q^*(a, s \in S) \le Q(a, s \in S) + \frac{1}{\sqrt{\alpha}} \times \frac{0.5}{\sqrt{\#(a, s \in S)}} \right) \ge 1 - \alpha \qquad (2)$$

Although bounds (1) and (2) are similar in essence, bound (2) is very conservative but independent of the reward distribution. The conservativeness of (2) has its roots in not taking into account the type of the reward distribution and its estimated variance. This lack of prior assumptions results in extremely conservative intervals in cases where the variances are very small or even zero. In situations like these, it is better to employ the “variance-aware” inequality proposed in [17]:

$$P\left( Q(a, s \in S) - std(a, s \in S)\sqrt{\frac{2 \ln\frac{3}{\alpha}}{\#(a, s \in S)}} - \frac{3 \ln\frac{3}{\alpha}}{\#(a, s \in S)} \le Q^*(a, s \in S) \le Q(a, s \in S) + std(a, s \in S)\sqrt{\frac{2 \ln\frac{3}{\alpha}}{\#(a, s \in S)}} + \frac{3 \ln\frac{3}{\alpha}}{\#(a, s \in S)} \right) \ge 1 - \alpha \qquad (3)$$
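To make the difference between bounds (2) and (3) concrete, the half-widths of the two intervals can be compared directly. This is a rough Python sketch; the function names and the sample values of the count, standard deviation, and alpha are hypothetical.

```python
import math


def chebyshev_half_width(n, alpha):
    """Half-width of bound (2): uses the worst-case standard deviation 0.5."""
    return (1.0 / math.sqrt(alpha)) * 0.5 / math.sqrt(n)


def variance_aware_half_width(n, std, alpha):
    """Half-width of bound (3): depends on the estimated standard deviation."""
    log_term = math.log(3.0 / alpha)
    return std * math.sqrt(2.0 * log_term / n) + 3.0 * log_term / n


n, alpha = 1000, 0.1
print("bound (2):             ", chebyshev_half_width(n, alpha))
print("bound (3), std = 0.0:  ", variance_aware_half_width(n, 0.0, alpha))
print("bound (3), std = 0.5:  ", variance_aware_half_width(n, 0.5, alpha))
```

With a zero estimated standard deviation the interval of bound (3) shrinks at rate 1/# instead of 1/sqrt(#), which is the situation where bound (2) is described as extremely conservative. When the estimated standard deviation is near its maximum of 0.5, bound (3) offers little or no advantage.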

In this study, we are mainly interested in the lengths of the confidence intervals and their lengths relative to each other. Generally, as new samples are visited, the lengths of all the intervals in bounds (1), (2), and (3) diminish gradually. Therefore, as we will see in the following sections, all the mentioned intervals are applicable in our algorithm. In the Discussions and Conclusions section, a number of practical points concerning these bounds are discussed.

For individual sensors, $Q^*(a, o_i \in O_i)$ denotes the actual mean and $Q(a, o_i \in O_i)$ denotes the sample mean of the reward received by performing action $a \in A$ when the $i$th sensor's observation is $o_i$. We can create a confidence interval on $Q^*(a, o_i \in O_i)$ by using the same procedure and only replacing the following variables in bounds (1), (2), or (3):

$$\#(a, o_i \in O_i) = \sum_{p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_k} \#(a, p_1 \in O_1, \ldots, o_i \in O_i, \ldots, p_k \in O_k) \qquad (4)$$

$$Q(a, o_i \in O_i) = \frac{1}{\#(a, o_i \in O_i)} \sum_{p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_k} Q(a, p_1 \in O_1, \ldots, o_i \in O_i, \ldots, p_k \in O_k)\, \#(a, p_1 \in O_1, \ldots, o_i \in O_i, \ldots, p_k \in O_k) \qquad (5)$$

The above equations express the marginal values for the $i$th sensor.

In order to calculate $std(a, o_i \in O_i)$ we also need to calculate two more terms:

$$\sum r^2(a, o_i \in O_i) = \sum_{p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_k} \left[ \sum r^2(a, p_1 \in O_1, \ldots, o_i \in O_i, \ldots, p_k \in O_k) \right] \qquad (6)$$

$$\sum r(a, o_i \in O_i) = \sum_{p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_k} \left[ \sum r(a, p_1 \in O_1, \ldots, o_i \in O_i, \ldots, p_k \in O_k) \right] \qquad (7)$$

Calculation of (4)–(7) does not need extra learning trials, because these variables are calculated by marginalization of the statistics of the joint space $S$.
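The marginalization in (4)–(7) amounts to summing the joint statistics over all sensors except the $i$th. A minimal Python sketch follows, assuming (hypothetically) that the joint table is stored as a dictionary mapping `(action, state_tuple)` to `(count, sum_r, sum_r2)`; this storage layout and the names are illustrative choices, not the authors' implementation.

```python
from collections import defaultdict


def marginalize(joint, sensor_index):
    """Sum the joint statistics over every sensor except `sensor_index`.

    joint: {(action, (o_1, ..., o_k)): (count, sum_r, sum_r2)}
    returns: {(action, o_i): (count, sum_r, sum_r2)}, as in (4), (6), (7).
    """
    marginal = defaultdict(lambda: [0, 0.0, 0.0])
    for (action, state), (count, sum_r, sum_r2) in joint.items():
        entry = marginal[(action, state[sensor_index])]
        entry[0] += count
        entry[1] += sum_r
        entry[2] += sum_r2
    return {key: tuple(value) for key, value in marginal.items()}


def sample_mean(stats):
    """Equation (5): the count-weighted mean equals sum_r / count."""
    count, sum_r, _ = stats
    return sum_r / count


# Hypothetical joint statistics for two sensors and a single action "a0".
joint = {
    ("a0", (0, 0)): (2, 1.5, 1.25),
    ("a0", (0, 1)): (3, 0.5, 0.25),
    ("a0", (1, 1)): (1, 1.0, 1.0),
}
first = marginalize(joint, 0)
print(first[("a0", 0)], sample_mean(first[("a0", 0)]))
```

In practice the marginal statistics could also be maintained incrementally alongside the joint table, which avoids re-summing at every time step; either way no additional learning trials are involved.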

2. Generalization Test

A statistical test is proposed to answer the following question:

Is perceptual aliasing in $o_i$ a beneficial case of generalization for action $a \in A$, or a harmful case of “garbage” information?

Based on our modeling, we can restate the question as “is $Q^*(a, o_i \in O_i)$ a reasonable representation of $Q^*(a, s \in S)$?”, where $o_i$ is the current observation of the $i$th sensor and $s = (o_1, o_2, \ldots, o_k)$. However, as previously mentioned, the $Q^*$s are unknown. As such, we use their confidence intervals by employing any of bounds (1), (2), or (3). We denote the confidence interval on $Q^*(a, s \in S)$ as $M$ and the confidence interval on $Q^*(a, o_i \in O_i)$ as $CI_i$.

To validate the generalization ability of $CI_i$, we need to test whether $CI_i$ and $M$ are estimating the same value ($Q^*(a, s \in S)$). However, due to perceptual aliasing (many-to-one mapping), $CI_i$ has also experienced all the rewards used in the calculation of $M$. Hence, checking the significance of their difference does not provide useful information. The proposed idea here is to extract the common experiences between $CI_i$ and $M$, and then perform a statistical test on the residuals of $CI_i$ and $M$. The procedure of extracting common experiences from $CI_i$ is as follows:

$$Q'(a, o_i \in O_i) = \frac{\#(a, o_i \in O_i)\, Q(a, o_i \in O_i) - \#(a, s \in S)\, Q(a, s \in S)}{\#(a, o_i \in O_i) - \#(a, s \in S)} \qquad (8)$$

$$\#'(a, o_i \in O_i) = \#(a, o_i \in O_i) - \#(a, s \in S) \qquad (9)$$

$$\sum r^{2\prime}(a, o_i \in O_i) = \sum r^2(a, o_i \in O_i) - \sum r^2(a, s \in S) \qquad (10)$$

$$\sum r'(a, o_i \in O_i) = \sum r(a, o_i \in O_i) - \sum r(a, s \in S) \qquad (11)$$

By using the variables on the left side of the above equations, a new confidence interval $CI_i'$ can be created using any of bounds (1), (2), or (3). For each action, $CI_i'$ represents the interval estimate of the mean of a reward distribution created from the experiences in the current observation of the $i$th sensor, minus the experiences in the current state of the environment. If there exists an intersection between $CI_i'$ and $M$, then there is a good chance that $CI_i$ and $M$ are estimating a similar expected value of rewards ($Q^*(a, s \in S)$). In other words, it means that the perceptual aliasing in $CI_i$ is a case of generalization. The proposed test states that at each time step for action $a$:

$$\text{Reject } CI_i \iff M \cap CI_i' = \emptyset \qquad (12)$$

Based on (12), we can expect the following behavior in different stages of learning:

• During the initial steps of learning (when the sample size is very small), $M$ and $CI_i'$ both have large confidence intervals. Consequently, $CI_i$ will be able to pass the proposed test in most time steps. Due to the low uncertainty in $CI_i$, this behavior is desirable during the initial steps.

• By gaining new samples, both $M$ and $CI_i'$ shrink. Therefore, the $i$th sensor will be able to pass the test only if its experiences are a good generalization of $M$'s experience.

• As the sample size for $M$ increases, its interval becomes smaller and smaller, to the degree that it dwindles to contain only $Q^*(a, s \in S)$. The same thing happens for $CI_i'$, but it will converge to a different point. As a result, the test will reject all the individual sensors.
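The three stages above can be reproduced with a small numerical sketch of the test in (12). The Python code below uses bound (2) for both intervals purely for simplicity; the function names, the unbounded interval for an empty sample, and the example counts are assumptions made for illustration.

```python
import math


def chebyshev_interval(count, sum_r, alpha):
    """Interval of bound (2) on the mean; unbounded when there are no samples
    (an assumption of this sketch)."""
    if count == 0:
        return (-math.inf, math.inf)
    mean = sum_r / count
    half_width = (1.0 / math.sqrt(alpha)) * 0.5 / math.sqrt(count)
    return (mean - half_width, mean + half_width)


def passes_g_test(joint, marginal, alpha=0.1):
    """Return True if the sensor's interval CI_i is accepted for this action.

    joint:    (count, sum_r) for the current (action, state) pair
    marginal: (count, sum_r) for the current (action, o_i) pair
    """
    n_s, sum_s = joint
    n_o, sum_o = marginal
    # Equations (9) and (11): remove the experiences shared with the joint state.
    n_res, sum_res = n_o - n_s, sum_o - sum_s
    m = chebyshev_interval(n_s, sum_s, alpha)
    ci_res = chebyshev_interval(n_res, sum_res, alpha)
    # Equation (12): reject if and only if the two intervals do not intersect.
    return max(m[0], ci_res[0]) <= min(m[1], ci_res[1])


# Early learning: few samples, wide intervals, the sensor passes.
print(passes_g_test(joint=(2, 1.0), marginal=(5, 2.0)))
# Late learning, residual mean 0.9 matches the joint mean 0.9: generalization.
print(passes_g_test(joint=(2000, 1800.0), marginal=(6000, 5400.0)))
# Late learning, residual mean 0.2 versus joint mean 0.9: garbage, rejected.
print(passes_g_test(joint=(2000, 1800.0), marginal=(6000, 2600.0)))
```

Subtracting the joint statistics before testing is what keeps the comparison informative: without it, the marginal interval would always contain rewards that were also used to build $M$.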

3. Decision Policy

As mentioned earlier, the agent starts with no prior information about the environment and the task at hand. Consequently, throughout the learning it faces the dilemma of gaining new experience by choosing one of the less explored decisions or exploiting the past experiences by selecting one of the well-rewarded decisions. This problem is known as the exploration versus exploitation trade-off [8].

At each state $s \in S$, it can be assumed that there are $|A|$ unknown reward distributions, one for each action in the action set $A$. The best action $a^*$ is the one corresponding to the distribution with the greatest mean, i.e. $a^* = \arg\max_{a \in A} Q^*(a, s \in S)$. However, the $Q^*$s are unknown and the agent should make the decision based on their estimates. A good decision policy should consider both the Q-value (sample mean statistic) and the uncertainty regarding its expected value. The value of the sample mean controls the exploitative selections, while its uncertainty controls the explorative decisions. Clearly, the uncertainty of the sample mean tends to zero as the number of samples tends to infinity, resulting in a smooth transition from exploration to exploitation as the number of samples increases.

A well-studied family of decision policies, which considers these two criteria, works based on the idea of creating an upper confidence interval on the mean of each reward distribution. Based on the calculated upper bounds, the decision policy selects the action with the greatest upper confidence interval [18]. This idea is known as the “optimism in the face of uncertainty” principle. It has been proved that variations of these decision policies, such as UCB1 [19], achieve logarithmic expected regret (i.e. the expected loss due to the fact that the agent does not always choose the optimal action) uniformly over the total number of samples of the given state. This amount of regret is the smallest possible expected regret, up to a constant factor. Fortunately, in the proposed approach we have already employed confidence intervals on the means of the reward distributions. The only difference in our problem is that we have a set of confidence intervals, instead of one, for each action. Therefore, we need to integrate the available confidence intervals into one, and then employ the mentioned idea.

Table 1. The function that implements the MOS method.

function MOS(M, Accepted)
Input: M is the confidence interval on the joint space; Accepted is the array storing the confidence intervals of the sources that passed the generalization test
1: $MOS \leftarrow \arg\max_{CI \in Accepted} \overline{CI}$
2: $v \leftarrow \min(\overline{MOS}, \overline{M})$
3: return $v$

doi:10.1371/journal.pone.0103143.t001

Table 2. The function that implements the LUS method.

function LUS(M, Accepted)
Input: M is the confidence interval on the joint space; Accepted is the array storing the confidence intervals of the sources that passed the generalization test
1: $LUS \leftarrow \arg\min_{CI \in Accepted} (\overline{CI} - \underline{CI})$
2: if $(\overline{M} - \underline{M} < \overline{LUS} - \underline{LUS})$ then $LUS \leftarrow M$
3: $v \leftarrow \min(\overline{LUS}, \overline{M})$
4: return $v$

doi:10.1371/journal.pone.0103143.t002

One can devise various methods for integrating a set of intervals. However, in this study we are interested in finding, specifically, the source of information that has the greatest impact on the final decision. As a result, we reduce the integration problem to the selection of one of the available intervals as the representative interval for the given action. We propose two methods for this interval selection. The first method works by selecting the Most Optimistic Source (MOS), while the second method chooses the Least Uncertain Source (LUS). Details of these methods are as follows:

At each state $s \in S$ and for each action $a \in A$, given the set of confidence intervals of the individual sensors that were able to pass the previously mentioned test (12), the MOS method selects the interval with the greatest upper bound. The LUS method, on the other hand, selects the interval with the shortest length. The upper bound value of the selected interval will be used as the representative value for action $a$. However, if this value is greater than $M$'s upper bound, then $M$'s upper bound will be used as the representative value. The reason behind this constraint is that, regardless of its great uncertainty, $M$ is still the most reliable (with the lowest aliasing) source of information regarding the actual mean of the underlying reward distribution. Therefore, any value greater than $M$'s upper bound is unrealistically optimistic. The idea behind LUS is that shorter intervals indicate lower uncertainty, and it is always desirable to attend to the least uncertain source of information for decision making. The pseudo-codes of the MOS and LUS methods are shown in Table 1 and Table 2. For a bound $B$, the notations $\overline{B}$ and $\underline{B}$ represent the upper bound and lower bound values of $B$, respectively.
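The two selection rules of Table 1 and Table 2 can be sketched in a few lines of Python. Intervals are represented here as `(lower, upper)` tuples. The fallback to $M$ when no sensor has passed the test is an assumption of this sketch, since the pseudo-code does not state what happens for an empty set.

```python
def mos(m, accepted):
    """Most Optimistic Source: pick the accepted interval with the greatest
    upper bound, then cap the result at M's upper bound."""
    if not accepted:  # assumption: use M when no sensor passed the test
        return m[1]
    chosen = max(accepted, key=lambda ci: ci[1])
    return min(chosen[1], m[1])


def lus(m, accepted):
    """Least Uncertain Source: pick the shortest interval among the accepted
    ones and M, then cap the result at M's upper bound."""
    # M is placed last, so on a tie an accepted interval is kept,
    # matching the strict inequality in line 2 of Table 2.
    candidates = list(accepted) + [m]
    chosen = min(candidates, key=lambda ci: ci[1] - ci[0])
    return min(chosen[1], m[1])


# Hypothetical intervals for one action.
accepted = [(0.4, 0.7), (0.3, 0.9)]
print(mos((0.2, 1.0), accepted), lus((0.2, 1.0), accepted))
print(mos((0.5, 0.6), accepted), lus((0.5, 0.6), accepted))
```

In the first call the joint interval is wide, so the individual sensors determine the value; in the second the joint interval is tight, so its upper bound caps both methods.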

After choosing an upper bound value (with either the MOS or the LUS method) for all the actions, the action with the maximum upper bound value is selected as the final decision. After the selected action is performed, the environment returns the reward $r \in [0,1]$. The complete pseudo-code of the proposed method is shown in Table 3. The only parameter that needs to be initialized is $\alpha \in [0,1]$, where $1 - \alpha$ is the confidence coefficient of the confidence intervals.

Experiments and Results

The task is a modified version of the localization task in the visual and auditory modalities [2], [20]. The simulation setup is based partly on [10]. At each time step, a stimulus is generated randomly in one of the 30 discrete positions and each sensor observes a noisy representation of it. The observation noise for each sensor is modeled by a Gaussian distribution with standard deviation $\delta$; see Figure 3. After observing the stimulus through its sensors, the agent chooses one of the 30 discrete positions as the desirable action and receives an immediate reinforcement value in $[0,1]$:

$$reward = \max\left(0, \left(1 - \frac{1}{\tau} \times |action - stimulus\ position|\right)\right) \qquad (13)$$

We used $\tau = 4$, which indicates that only actions (estimates) within a radius of three units from the stimulus position receive positive rewards.

Figure 3. Stimulus and observations by the auditory ($o_a$) and the visual ($o_v$) sensors. Observations are based on Gaussian noise models. Variances control the reliability of each sensor. doi:10.1371/journal.pone.0103143.g003

Table 3. The proposed Algorithm for Multisensory Learning and Decision Making.

Initialize $Q(a,s)$, $\#(a,s)$, $\sum r(a,s)$, and $\sum r^2(a,s)$ to zero $\forall s \in S, a \in A$
1: Repeat at each time step
2:   $s = (o_1, o_2, \ldots, o_k)$
3:   for each $a \in A$ do
4:     $Accepted \leftarrow \emptyset$
5:     for each sensor $i$ do
6:       Calculate $M$, $CI_i$, and $CI_i'$ based on any of bounds (1), (2), or (3)
7:       if $(M \cap CI_i') \ne \emptyset$ then $Accepted \leftarrow Accepted \cup \{CI_i\}$
8:     $value(a) \leftarrow MOS(M, Accepted)$ or $LUS(M, Accepted)$
9:   Perform $a^+ = \arg\max_{a' \in A} value(a')$, observe reward $r$
10:  $\#(a^+, s) = \#(a^+, s) + 1$
11:  $\sum r(a^+, s) = \sum r(a^+, s) + r$
12:  $\sum r^2(a^+, s) = \sum r^2(a^+, s) + r^2$
13:  $Q(a^+, s) = \sum r(a^+, s) \,/\, \#(a^+, s)$
14: Until the end of the learning

doi:10.1371/journal.pone.0103143.t003

The agent has no prior information about the task, the observation models, or the relation between the sensory space and actions. Therefore, throughout the learning, it should learn the appropriate action based only on the sensory inputs and previously received rewards. On the other hand, the optimal Bayesian observer [2] assumes that all of the mentioned information is available and chooses its action according to the following integration rule:

$$action = \frac{1/\delta_a^2}{1/\delta_a^2 + 1/\delta_v^2}\, o_a + \frac{1/\delta_v^2}{1/\delta_a^2 + 1/\delta_v^2}\, o_v, \qquad (14)$$

where $\delta_a$ and $\delta_v$ are the standard deviations of the Gaussian noise models for the auditory and visual inputs, respectively. Moreover, $o_a$ and $o_v$ are the representations of the stimulus in the auditory and visual observation spaces. Behavioral studies have shown that adults integrate information from sensors in a statistically optimal manner, which, based on the Gaussian observation models, can be formulated by equation (14).
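As a quick numerical illustration of equation (14), the following Python sketch computes the reliability-weighted estimate for a pair of readings. The function names and the example readings are hypothetical; the second function uses the standard result that the variance of the combined estimate of two independent Gaussian cues is the inverse of the summed inverse variances.

```python
def bayes_integrate(o_a, o_v, var_a, var_v):
    """Equation (14): weight each reading by its relative reliability (1 / variance)."""
    total_reliability = 1.0 / var_a + 1.0 / var_v
    w_a = (1.0 / var_a) / total_reliability
    w_v = (1.0 / var_v) / total_reliability
    return w_a * o_a + w_v * o_v


def integrated_variance(var_a, var_v):
    """Variance of the combined estimate for two independent Gaussian cues."""
    return 1.0 / (1.0 / var_a + 1.0 / var_v)


# Hypothetical readings, with auditory variance 5 and visual variance 3.
print(bayes_integrate(o_a=10.0, o_v=18.0, var_a=5.0, var_v=3.0))
print(integrated_variance(5.0, 3.0))
```

With these variances the visual reading receives weight 0.625 and the auditory reading 0.375, and the combined variance is lower than either individual variance. Equation (14) yields a continuous value; how it is mapped to one of the discrete positions is not specified here, so the sketch leaves it unrounded.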

In all the following experiments, the proposed method uses the Cartesian product of the observation spaces of all the sensors for its state space. The agent's learning and decision making is based on Table 3.
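To show how the pieces of Table 3 fit together on the localization task, here is a compact Python sketch. It is only an illustration under several assumptions that are not taken from the paper: it uses bound (2) for all intervals, the MOS rule for selection, rounding and clipping to discretize the noisy observations, random tie-breaking, an unbounded interval for unvisited entries, and marginal statistics that are kept incrementally instead of being summed from the joint table. All names are hypothetical.

```python
import math
import random

N_POS, TAU, ALPHA = 30, 4.0, 0.1
NOISE_STD = (math.sqrt(3.0), math.sqrt(5.0))  # visual, auditory


def observe(stimulus, std, rng):
    """Noisy reading, rounded and clipped to a valid position (assumed)."""
    return min(N_POS - 1, max(0, round(rng.gauss(stimulus, std))))


def reward(action, stimulus):
    """Equation (13)."""
    return max(0.0, 1.0 - abs(action - stimulus) / TAU)


def interval(count, sum_r):
    """Bound (2); unbounded when there is no sample (assumed)."""
    if count <= 0:
        return (-math.inf, math.inf)
    half = 0.5 / math.sqrt(ALPHA * count)
    return (sum_r / count - half, sum_r / count + half)


def run(steps, seed=0):
    rng = random.Random(seed)
    joint = {}       # (a, o_v, o_a) -> [count, sum_r]
    marg = [{}, {}]  # per sensor: (a, o_i) -> [count, sum_r]
    total = 0.0
    for _ in range(steps):
        stim = rng.randrange(N_POS)
        s = tuple(observe(stim, std, rng) for std in NOISE_STD)
        values = []
        for a in range(N_POS):
            n_s, r_s = joint.get((a,) + s, (0, 0.0))
            m = interval(n_s, r_s)
            accepted = []
            for i in range(2):
                n_o, r_o = marg[i].get((a, s[i]), (0, 0.0))
                res = interval(n_o - n_s, r_o - r_s)  # equations (9), (11)
                if max(m[0], res[0]) <= min(m[1], res[1]):  # test (12)
                    accepted.append(interval(n_o, r_o))
            best = max((ci[1] for ci in accepted), default=m[1])  # MOS
            values.append(min(best, m[1]))
        top = max(values)
        action = rng.choice([a for a, v in enumerate(values) if v == top])
        r = reward(action, stim)
        total += r
        updates = ((joint, (action,) + s),
                   (marg[0], (action, s[0])),
                   (marg[1], (action, s[1])))
        for table, key in updates:
            entry = table.setdefault(key, [0, 0.0])
            entry[0] += 1
            entry[1] += r
    return total / steps, joint


average_reward, _ = run(steps=500, seed=0)
print(average_reward)
```

A run of a few hundred steps is far too short to show the transition reported in the experiments; the sketch is meant only to make the control flow of Table 3 concrete.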

Experiment 1

In the first experiment we use d2v ~3 and d 2 a~5 (see Figure 3). In

order to validate our method, we employ three different agents.

Two of the agents (Visual and Auditory agents) use only the

individual sensors which will result in a state-action space of size

30|30 for each. The third one (Visual|Auditory agent) uses both sensors for its learning and decision makings and has a state-

action space of size 30|30|30. For these three agents, we employ the UCB1 policy [19] for decision making. UCB1

calculates upper bounds on the means of the reward distributions

based on the Hoeffding inequality. At each state s, UCB1 chooses

the action that maximizes

upperBound (a)~Q(a,s[S)z

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi r| ln (

X a’[A

#(a’,s[S))

#(a,s[S)

vuuut , ð15Þ

where Q(a,s[S) is the average reward obtained from performing action a in state s, #(a,s[S) is the number of times a has been selected in s, and r is the exploration coefficient [17]. In the original version of UCB1, r is set to 2. However, this value results in a high exploration rate. We use r~0:2 in all the experiments to increase the speed of learning for the rival agents.

It should be noted that when we capitalize the name of a sensor, we are referring to the agent that learns in that sensor space. For instance, Visual refers to the agent that uses only the visual state space for its learning.

Figure 4. Performance and behavior of the method in the localization task. All graphs are results of averaging over 20 independent runs and passing a moving average window with size 500. (A) Average reward for all agents. For the proposed methods (MOS and LUS), we used Table 3, employing bound (1) with $\alpha = 0.1$ for calculating confidence intervals. The rival methods employ the UCB1 policy on the individual sensors and on the joint space. (B) Average acceptance rate (1 − rejection rate) of the individual sensors in the proposed method (MOS). (C) The average dominancy percentage of each source in decision making (MOS). In the first half of learning steps, vision is the dominant sensor while the agent prefers the integrated sensory data in the rest of learning steps. doi:10.1371/journal.pone.0103143.g004

The Transition from Sensory Selection to Sensory Integration in Humans

PLOS ONE | www.plosone.org 7 July 2014 | Volume 9 | Issue 7 | e103143

Table 4. Analyzing the learning speed and the behavior of different methods for Experiment 1 and 2.

| Percentage of accumulated reward | Learning method | Exp. 1: # time steps | Exp. 1 dominance (%): V | Exp. 1 dominance (%): A | Exp. 1 dominance (%): I | Exp. 2: # time steps | Exp. 2 dominance (%): V | Exp. 2 dominance (%): A | Exp. 2 dominance (%): N | Exp. 2 dominance (%): I |
|---|---|---|---|---|---|---|---|---|---|---|
| 60% | Joint Space | 38,113 | 0 | 0 | 100 | 1,141,640 | 0 | 0 | 0 | 100 |
| 60% | MOS, bound (1), $\alpha = 0.1$ | 8,200 | 56 | 32 | 12 | 12,455 | 62 | 27 | 7 | 4 |
| 60% | LUS, bound (1), $\alpha = 0.1$ | 5,010 | 56 | 32 | 12 | 5,557 | 61 | 32 | 5 | 2 |
| 60% | MOS, bound (2), $\alpha = 0.4$ | 7,901 | 62 | 37 | 1 | 10,828 | 60 | 32 | 8 | 0 |
| 60% | LUS, bound (2), $\alpha = 0.4$ | 5,599 | 64 | 35 | 1 | 8,852 | 60 | 29 | 11 | 0 |
| 75% | Joint Space | 81,179 | 0 | 0 | 100 | 2,437,811 | 0 | 0 | 0 | 100 |
| 75% | MOS, bound (1), $\alpha = 0.1$ | 17,393 | 52 | 28 | 20 | 33,911 | 62 | 25 | 4 | 9 |
| 75% | LUS, bound (1), $\alpha = 0.1$ | 10,341 | 58 | 29 | 13 | 17,289 | 57 | 31 | 5 | 7 |
| 75% | MOS, bound (2), $\alpha = 0.4$ | 17,854 | 61 | 37 | 2 | 35,979 | 67 | 28 | 4 | 1 |
| 75% | LUS, bound (2), $\alpha = 0.4$ | 14,138 | 67 | 31 | 2 | 35,461 | 68 | 27 | 4 | 1 |
| 90% | Joint Space | 348,945 | 0 | 0 | 100 | 10,036,225 | 0 | 0 | 0 | 100 |
| 90% | MOS, bound (1), $\alpha = 0.1$ | 72,689 | 40 | 20 | 40 | 1,148,066 | 43 | 20 | 2 | 35 |
| 90% | LUS, bound (1), $\alpha = 0.1$ | 43,281 | 50 | 25 | 25 | 974,986 | 39 | 21 | 4 | 36 |
| 90% | MOS, bound (2), $\alpha = 0.4$ | 96,437 | 53 | 38 | 9 | 1,767,754 | 58 | 30 | 3 | 9 |
| 90% | LUS, bound (2), $\alpha = 0.4$ | 94,204 | 66 | 25 | 9 | 1,831,145 | 61 | 24 | 3 | 12 |

The performance criterion is the number of time steps needed to reach a certain percentage of the Bayesian optimal observer's accumulated reward. V = visual, A = auditory, N = noise, I = integration. doi:10.1371/journal.pone.0103143.t004


The average reward against the time step for all the agents and the optimal Bayesian observer is shown in Figure 4A. For the proposed methods (MOS and LUS), we employed bound (1) with $\alpha = 0.1$. As can be seen in the figure, the proposed methods learn noticeably faster and achieve higher rewards than the Visual×Auditory agent. The Visual and the Auditory agents both have a smaller state space (only one sensor), which results in fast learning during the initial time steps. However, due to their partial perception, they can never reach the performance of the optimal Bayesian observer.

To evaluate the proposed generalization test (see Figure 2 and Generalization Test) for the proposed method (MOS), the average outcome of the test for the chosen action against the time step is shown in Figure 4B. The value on the vertical axis specifies the rate of acceptance in the test, which is 1 − rejection rate. The test completely accepts the individual sensors during the initial steps. This is in line with the individual sensors having generalization power due to more samples. Nevertheless, as the joint space learning improves, the rate of acceptance for the individual sensors decreases. This is because of sufficient experience accumulation in the joint space and the existence of perceptual aliasing in the individual sensor spaces. This decline is more noticeable for the auditory sensor, which is less reliable.

To investigate the decision making behavior of the proposed

method (MOS), the average dominancy percentage of each source

of information over time is shown in Figure 4C. In the initial steps

of learning, vision is the dominant modality. However, as the time

step increases there is a tendency to rely on the joint space for

decision making (sensory integration). Considering Figure 4A and

Figure 4C we can conclude that as the average reward received in

the joint space increases, the proposed method gradually switches

its decision policy from selection to integration. This behavior is

comparable to the humans’ shift from sensory selection at

childhood to sensory integration at adulthood.

Performance criteria for different variations of the proposed method and the Visual×Auditory agent are given in Table 4.

In Figure 4A there is a temporary decline in the average reward of the individual sensor agents and the joint space agent. The reason behind these declines is the inherent temporary exploration in UCB1. In UCB1, the policy calculates a $1 - \alpha$ upper confidence bound, where $\alpha$ has an inverse relation with the total number of samples in state s (the logarithmic term in equation (15)). Therefore, if an action has not been visited in a state for a long time, this term forces the agent to choose that action. For large state-action spaces, it creates temporary exploration phases in the learning. This exploration is beneficial in non-stationary environments; however, our environment is stationary and the exploration results in the observed decline. We reduced the exploration effect by using a small $\rho$ in (15). We also tested the individual sensor agents and the joint space agent using a constant $\alpha$ and different types of confidence intervals, and the significant superiority of the proposed method remained intact.

A non-stationary change in the environment. Having a stationary environment is one of the basic assumptions we made. To investigate the effect of an unexpected change in the environment, we decreased the reliability of the visual sensor to the lowest possible value at step $10^5$. The underlying reward distributions for the visual sensor and the joint space changed accordingly. As Figure 5A shows, this change is detected by the proposed test. As a result, the rate of acceptance of the visual sensor noticeably decreases after step $10^5$. However, in the decision making section, only the MOS method could cope with this disturbance; the LUS method failed to adapt its behavior, as it relies more on the joint space. The percentage of dominance for each source of information in the MOS method is shown in Figure 5B. After time step $10^5$, the agent relies more on the auditory sensor and only about 13% of decisions are made according to the visual data. We discuss non-stationary environments further in Discussions and Conclusions.

Parameter setting. The method (Table 3) does not need any tuning, and the only open parameter is $\alpha \in [0, 1]$, initialized at the beginning of learning. Alpha defines the agent's characteristic: a smaller value for $\alpha$ results in larger confidence intervals, which means more tendency toward exploration than exploitation. Moreover, a small value for $\alpha$ makes the test easier for individual sensors to pass and, as a result, postpones the transition from selection to integration. Figure 6 shows these effects in Experiment 1.
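The relation between the confidence parameter and the interval width can be illustrated with a short Python sketch. It assumes a two-sided Hoeffding-style interval for rewards bounded in a range of length `reward_range`; this is chosen only for illustration and is not necessarily identical to bounds (1)–(3) defined in the Method section. The function name and the sample count of 50 are hypothetical.

```python
import math


def hoeffding_half_width(n_samples, alpha, reward_range=1.0):
    """Half-width of a two-sided (1 - alpha) Hoeffding interval on a mean.

    Assumes rewards are bounded in an interval of length reward_range.
    """
    return reward_range * math.sqrt(math.log(2.0 / alpha) / (2.0 * n_samples))


# A smaller alpha gives a wider interval for the same number of samples.
for alpha in (0.05, 0.25, 0.45, 0.80):
    print(alpha, round(hoeffding_half_width(n_samples=50, alpha=alpha), 3))
```

Under this assumption the half-width shrinks as either the confidence parameter or the number of samples grows, which matches the qualitative behavior described above.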

Experiment 2

The goal of this experiment is to study the method in the

presence of an added unreliable sensor (noise). The new sensor’s

reading is uniformly distributed noise. In other words, there is no

correlation between the position of the stimulus and the sensor’s

Figure 5. Performance of the method (MOS) in response to an unexpected change in the environment. At time step $10^5$ the visual sensor fails and its variance changes to the highest possible value. All graphs are results of averaging over 10 independent runs and passing a moving average window with size 500. (A) Average acceptance rate (1 − rejection rate) of the individual sensors. (B) The average dominancy percentage of each source in decision making (MOS). After failure of the visual sensor, the method detects this change and relies on the auditory sensor for decision making. doi:10.1371/journal.pone.0103143.g005


reading. By adding this sensor, the size of the joint state-action space jumps to $30 \times 30 \times 30 \times 30$.
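The growth of the table sizes quoted for the two experiments can be checked with a few lines of Python. The sketch assumes 30 discrete values per sensor and 30 actions, as implied by the sizes stated in the text; the function name `table_size` is hypothetical.

```python
def table_size(n_sensors, bins=30, n_actions=30):
    """Number of entries in a table over the joint observation space and actions."""
    return bins ** n_sensors * n_actions


# One sensor, two sensors (Experiment 1), three sensors (Experiment 2).
for k in (1, 2, 3):
    print(k, table_size(k))
```

Each added sensor multiplies the number of state-action pairs by 30, so the joint space agent needs far more samples before every entry is visited.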

The Noise agent has no beneficial learning and its average reward curve is flat throughout its life; see Figure 7A. Furthermore, due to the presence of this unreliable sensor, learning by the joint space agent is drastically diminished compared to that of the Visual agent. The proposed method (MOS) is able to identify the unreliable source of information and is therefore superior to the joint space agent in terms of both learning speed and average reward. However, during the initial steps of learning, its average reward is slightly lower than that of the Visual agent. This is the cost of having no prior information about the unreliable sensor, which makes the method explore more at the early steps of learning.

The results of the proposed test and the percentage of dominance of each source of information in decision making are shown in Figure 7B and Figure 7C, respectively. The rate of acceptance for all subspaces declines over time, and this decline is faster for the unreliable sensor. Moreover, according to Figure 7C, the unreliable sensor determines the final decision only about 3% of the time. These noise-based selections are mostly explorative decisions. This result is evidence that the proposed method treats a subsection of its state space as unreliable and filters it out in decision making.

Comparisons. Table 4 reports learning speed in terms of the number of time steps required for each method to reach a certain percentage of the accumulated reward that the Bayesian optimal decision maker achieves. Table 4 also shows the percentage of dominance for each source of information. In all variations of the proposed method, the percentage of dominance for sensory integration increases as learning progresses. Also, in the second experiment, the dominance of the noise sensor decreases over time. The results indicate that the presence of the unreliable sensor in the joint space makes the method slower in the second experiment. This is because the agent has to rely on its reliable individual sensors until its joint space accumulates enough samples to be considered reliable.

We proposed two methods for decision making, namely MOS and LUS (see Table 1 and Table 2). The MOS method chooses the most optimistic source of information, while LUS attends to the source with the lowest uncertainty. Both of these criteria are plausible choices for decision making, and in our experience both, and even some combinations of them, work well in practice. Based on Table 4, the LUS method requires fewer time steps than the MOS method to reach a certain percentage of performance in both experiments, the only exception being bound (2) at the 90% level in Experiment 2.
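The two criteria can be sketched in Python as follows, assuming each source reports a confidence interval (lower, upper) on the mean reward of its preferred action and reading "most optimistic" as "highest upper bound". This only illustrates the selection rules as described in the paragraph above; the full procedures in Table 1 and Table 2, including the generalization test, are not reproduced. The function names and interval values are hypothetical.

```python
def most_optimistic_source(intervals):
    """MOS-style choice: the source whose interval has the highest upper bound."""
    return max(intervals, key=lambda name: intervals[name][1])


def least_uncertain_source(intervals):
    """LUS-style choice: the source whose interval is narrowest."""
    return min(intervals, key=lambda name: intervals[name][1] - intervals[name][0])


# Hypothetical (lower, upper) confidence intervals on the mean reward.
intervals = {
    "visual": (0.55, 0.75),
    "auditory": (0.40, 0.70),
    "joint": (0.30, 0.90),
}
print(most_optimistic_source(intervals))  # joint
print(least_uncertain_source(intervals))  # visual
```

In this example the joint space has few samples, so its interval is wide: an optimistic rule is drawn to its high upper bound, while an uncertainty-averse rule stays with the well-sampled visual source.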

Confidence intervals. Due to the extremely conservative nature of bounds (2) and (3), for the same $\alpha$, their learning speed is slower than that of bound (1) in most cases. On the bright side, these bounds are mathematically valid for all kinds of reward distributions. To compensate for this conservativeness, it is recommended to use larger values for $\alpha$ (smaller confidence coefficients) when employing bounds (2) and (3). Furthermore, as mentioned in the Method section, bound (3) is only appropriate in situations where the variances of the reward distributions are small. However, in most cases, there is no information available about the type of the reward distributions and their variances. In these general situations, bound (2) with a moderate value for $\alpha$ is a reasonable choice. For example, in both of the discussed experiments, by using bound (2) and increasing the value of $\alpha$ to 0.4, we achieved similar learning speed and average reward to those illustrated in Figure 4A and Figure 7A. A summary of these results is shown in Table 4.

Extension to the power set of sensors. Throughout this paper, only individual sensors along with their joint space were considered as the sources of information. However, by a slight modification in equations (4)–(7), we can calculate the necessary marginal values for any combination of sensors. Based on this idea, instead of k sensors, we can create $2^k - 2$ sources of information besides the primary joint space. By employing these sources instead of the individual sensors in line 5 of Table 3, a new variation of the proposed method is formed. Considering this modification in the algorithm, we performed Experiment 2 with the LUS method using bound (1) and $\alpha = 0.1$. The percentage of dominance of each source of information is shown in Figure 8. In the first section of

Figure 6. Impact of $\alpha$. We used four different values (0.05, 0.25, 0.45, 0.80) for $\alpha$, ranging from conservative to liberal in terms of confidence. All graphs are results of averaging over 10 independent runs and passing a moving average window with size 500. (A) Average acceptance rate (1 − rejection rate) of the individual sensors in the proposed method (MOS). The upper/lower ribbon for each value of $\alpha$ represents the visual/auditory sensor. By increasing $\alpha$, the test becomes harder for the individual sensors to pass. (B) The average dominancy percentage of each source in decision making (MOS). For each value of $\alpha$, the ascending ribbon represents integration and the two descending ribbons represent selection of the visual and auditory sensors. Increasing $\alpha$ results in an earlier crossing of the ascending and the descending ribbons; i.e. an earlier switch from selection to integration. doi:10.1371/journal.pone.0103143.g006


learning, the final decision is mostly based on the reliable individual sensors and vision is the dominant modality. However, as the agent matures, the most reliable source of information, which is the visual×auditory subspace, takes the main role in decision making. This means that the extended method has the ability to autonomously elicit the reliable subspaces and to filter the unreliable subspaces of its state space. This modification does not change the amount of required memory. However, the new processing complexity will be exponential, which is still reasonable for tasks with a few sensors.
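The set of additional sources used in this extension can be enumerated with a short Python sketch. It lists every non-empty combination of sensors except the full set, which corresponds to the primary joint space. The function name `proper_sensor_subsets` and the sensor labels are illustrative assumptions; the marginalization in equations (4)–(7) is not reproduced here.

```python
from itertools import combinations


def proper_sensor_subsets(sensors):
    """All non-empty subsets of sensors except the full set (the joint space)."""
    k = len(sensors)
    return [
        subset
        for size in range(1, k)
        for subset in combinations(sensors, size)
    ]


subsets = proper_sensor_subsets(["visual", "auditory", "noise"])
print(len(subsets))  # 2**3 - 2 = 6
for subset in subsets:
    print(subset)
```

For the three sensors of Experiment 2 this yields the three individual sensors and the three pairwise subspaces, one of which is the visual and auditory pair discussed above.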

Discussions and Conclusions

The optimal multisensory integration behavior of adults has been substantially addressed in the literature [1], [2]. However, there are fewer studies and experiments regarding the idea of sensory selection in children [3]–[6]. This lack of sufficient observations is even more significant across the complete age spectrum. As a result, there is not sufficient experimental data available to form a definite hypothesis about the transition from sensory selection to sensory integration.

One hypothesis regarding this transition has been proposed by Gori et al. [4], [21]. Their hypothesis is that children select the more accurate sense in multisensory tasks for the purpose of cross-sensory calibration between senses. They suggested that cross-sensory calibration might have an important impact on the maturation of multisensory perception. In this paper, we have illustrated that even in the absence of the cross-sensory calibration hypothesis, the mere transition from the accurate subspaces to the joint space has its own computational advantages. This smooth transition not only facilitates the maturation of multisensory perception, but is also essential for having a rewarding life.

To show these advantages, we proposed a general multisensory

learning method (see Method and Table 3). The proposed method

has the ability to autonomously choose different subsets of its state

space based on their generalization property and reliability for

decision making. Unlike the Bayesian framework, our method makes no prior assumptions about either the observation model of the sensors or the relation between the sensory space and actions.

It was shown that for an agent who starts its life in a tabula rasa

state, the seemingly optimal behavior is to rely on its individual

Figure 7. Performance and behavior of the method in response to an unreliable sensor. All graphs are results of averaging over 20 independent runs and passing a moving average window with size 1000. (A) Average reward for all agents. For the proposed method (MOS), we used Table 3, employing bound (1) with $\alpha = 0.1$ for calculating confidence intervals. The rival methods employ the UCB1 policy on the individual sensors and on the joint space. (B) Average acceptance rate (1 − rejection rate) of the individual sensors in the proposed method (MOS). (C) The average dominancy percentage of each source in decision making (MOS). Due to the unreliability of the noise sensor, it takes longer for learning in the integrated states to mature and, therefore, dominancy of the visual sensor is prolonged. doi:10.1371/journal.pone.0103143.g007

Figure 8. Dominancy of subspaces over time. The average dominancy percentage of different combinations of sensors in decision making (LUS). Subspaces including the unreliable source have been filtered. Furthermore, dependency on the integration of reliable sensors increases over time. doi:10.1371/journal.pone.0103143.g008


sensors during early life, and to switch to the joint space (sensory integration) in later stages. This behavior is compatible with the empirical findings. Experimental data indicate that children do not integrate sensory information and make their judgments based on only one sensor, whereas adults use multisensory integration for their decision making [3]–[6]. It was also shown that the proposed method is significantly superior to the individual sensor agents (sensory selection alone) and the joint space agent (sensory integration alone) in terms of both learning speed and average reward. Based on these findings, we suggest that selection and integration, which may be interpreted as two separate methods for decision making, are in fact two sides of the same coin and both serve the reward maximization behavior. In addition, the transition from selection to integration is a developmental phenomenon and is smooth.

In our framework, integration-based decisions become dominant only after the agent receives enough multisensory experience during the initial stages of its life. There is also similar empirical evidence that the maturation of integration decisions is related to early life experiences (see [22], [3]). Moreover, in [10] the authors showed that by using the reward dependent framework, the problem of causal inference in multisensory perception [23] could also be solved in an interactive fashion. To show this, they used an artificial neural network for calculating the average reward statistics in the joint sensory space. Based on the average rewards, they used a softmax policy for decision making. With some simplifications, we can say that their agent is inherently equivalent to the joint space agent used in our work. The main focus of Weisswange et al. [10] is on the ability of the learning agent to reach the performance of the Bayesian optimal observer. In our work, on the other hand, we have investigated the role of subspace selection in the efficiency of interactive learning. Our results show that our method can reach the performance of the Bayesian optimal observer as well. On top of that, our method justifies the switch from selection to integration in terms of reward maximization. These studies, along with our results, indicate that by considering the reward dependent framework, we can model (at least in the behavioral aspect) most of the age-related sensory integration phenomena without making unnecessary mathematical assumptions about the sensor system and the task.

In Experiment 2 it was shown that the algorithm is also applicable in situations where there is a completely unreliable source of information in the joint space. Even in this extreme scenario, our method outperforms its competitors but faces a slight decrease in learning speed during the initial steps. This decrease is unavoidable for any interactive learning method that explores different sources of information.

We assumed that the environment is stationary; i.e. the reward distributions are time invariant, or in other words, the sensory models are fixed throughout the learning. These assumptions are widely used in the learning literature. Nevertheless, interactive learning methods can inherently track non-stationary situations, albeit with a lag due to being experience-based. We discuss this point further below. In Figure 5 it is shown that the algorithm (using MOS) tracks the sudden change in the environment, called unexpected uncertainty [24], and adapts itself. There are also methods that deal directly with unexpected uncertainty. For example, one solution is to recalculate the required statistics after detecting unusual behavior from the environment. This can easily be done by saving the received rewards in a moving window (a short-term memory) and calculating the necessary statistics accordingly [25].
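The moving-window idea can be sketched in Python with a bounded queue that keeps only the most recent rewards. The class name `WindowedMean`, the window length and the reward sequence are hypothetical, and only the mean is tracked here; other statistics would be computed from the same window.

```python
from collections import deque


class WindowedMean:
    """Mean reward over the most recent `window` samples only."""

    def __init__(self, window):
        self.rewards = deque(maxlen=window)

    def add(self, reward):
        # Once the deque is full, appending discards the oldest reward.
        self.rewards.append(reward)

    def mean(self):
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.0


# Hypothetical reward stream: the source stops being rewarding halfway through.
stats = WindowedMean(window=3)
for r in [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]:
    stats.add(r)
print(stats.mean())  # 0.0: only the three most recent rewards are kept
```

A short window reacts quickly to a change but yields noisier estimates, so the window length trades tracking speed against the tightness of the resulting confidence intervals.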

In this work, for simplicity, we used tables for storing the

required statistics. This naturally results in the discretization of the

state space. Nevertheless, our approach can be generalized to

continuous spaces by using the idea of function approximation for

estimating the required statistics in Table 3. We believe that, to demonstrate the subspace selection behavior of the proposed method for the task at hand, a simple discrete state space strikes a suitable balance between complexity and simplicity. However, in future work we will investigate and test a continuous version of our algorithm in more complex and practical tasks.

In summary, the proposed algorithm is a dynamic subspace

selection method for decision making in interactive learning

frameworks. Our method intelligently evades the curse of

dimensionality problem by exploiting inherent perceptual aliasing

in subspaces. This results in fast learning in addition to an efficient

and self-governing transition from sensory selection to integration.

This transition is essential for having a rewarding life. In addition,

the proposed algorithm (Table 3) is easily implementable. These

properties make our method an appropriate candidate for lifetime

learning of artificial agents having a large number of sensors.

Therefore, an important direction for our research team is to extend the current single-step algorithm to a general multi-step learning and decision making algorithm (reinforcement learning).

Based on the value-based decision making framework proposed in

[9], we can categorize the main contribution of our algorithm in

the representation phase where given a set of sensory inputs, the

goal is to achieve the most rewarding state representation.

Acknowledgments

The author Pedram Daee would like to thank Amin Niazi and Habib

Zafarian for their time and comments.

Author Contributions

Conceived and designed the experiments: PD MNA. Performed the

experiments: PD. Analyzed the data: PD MNA MSM. Contributed to the

writing of the manuscript: PD MNA MSM. Developed the model: PD

MNA MSM.

References

1. Ernst MO, Banks MS (2002) Humans integrate visual and haptic information in

a statistically optimal fashion. Nature 415: 429–433.

2. Alais D, Burr D (2004) The Ventriloquist Effect Results from Near-Optimal

Bimodal Integration. Current Biology 14: 257–262.

3. Burr D, Gori M (2012) Multisensory Integration Develops Late in Humans. In:

Murray MM, Wallace MT, editors. The Neural Bases of Multisensory Processes.

Boca Raton (FL): CRC Press.

4. Gori M, Del Viva M, Sandini G, Burr DC (2008) Young Children Do Not

Integrate Visual and Haptic Form Information. Current Biology 18: 694–698.

5. Nardini M, Jones P, Bedford R, Braddick O (2008) Development of Cue

Integration in Human Navigation. Current Biology 18: 689–693.

6. Nardini M, Bedford R, Mareschal D (2010) Fusion of visual cues is not

mandatory in children. PNAS 107: 17041–17046.

7. Ernst MO (2008) Multisensory integration: a late bloomer. Current Biology 18:

R519–521.

8. Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

9. Rangel A, Camerer C, Montague PR (2008) A framework for studying the

neurobiology of value-based decision making. Nature Reviews Neuroscience 9:

545–556.

10. Weisswange TH, Rothkopf CA, Rodemann T, Triesch J (2011) Bayesian Cue

Integration as a Developmental Outcome of Reward Mediated Learning. PLoS ONE 6(7): e21575. doi:10.1371/journal.pone.0021575

11. Firouzi H, Ahmadabadi MN, Araabi BN, Amizadeh S, Mirian MS, et al. (2012) Interactive Learning in Continuous Multimodal Space: A Bayesian Approach to Action-Based Soft Partitioning and Learning. Autonomous Mental Development, IEEE Transactions on 4: 124–138.


12. Mirian MS, Ahmadabadi MN, Araabi BN, Siegwart RR (2010) Learning Active Fusion of Multiple Experts’ Decisions: An Attention-Based Approach. Neural Computation 23: 558–591.

13. Whitehead SD, Ballard DH (1991) Learning to Perceive and Act by Trial and Error. Machine Learning 7: 45–83.

14. Mccallum RA (1995) Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State. In Proceedings of the Twelfth International Conference on Machine Learning: 387–395.

15. Mccallum RA (1993) Overcoming Incomplete Perception with Utile Distinction Memory. In Proceedings of the Tenth International Conference on Machine Learning: 190–196.

16. Casella G, Berger RL (1990) Statistical inference. Belmont, CA: Duxbury Press.

17. Audibert J-Y, Munos R, Szepesvári C (2009) Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science 410: 1876–1902.

18. Lai T, Robbins H (1985) Asymptotically efficient adaptive allocation rules.

Advances in Applied Mathematics 6: 4–22.

19. Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time Analysis of the Multiarmed

Bandit Problem. Machine Learning 47: 235–256.

20. Battaglia PW, Jacobs RA, Aslin RN (2003) Bayesian integration of visual and

auditory signals for spatial localization. Journal of the Optical Society of America

A, Optics, image science, and vision 20: 1391–1397.

21. Gori M, Sandini G, Burr D (2012) Development of Visuo-Auditory Integration

in Space and Time. Frontiers in Integrative Neuroscience 6: 77.

22. Wallace MT, Stein BE (2007) Early experience determines how the senses will

interact. Journal of Neurophysiology 97: 921–926.

23. Körding KP, Beierholm U, Ma WJ, Quartz S, Tenenbaum JB, et al. (2007)

Causal Inference in Multisensory Perception. PLoS ONE 2: e943.

24. Dayan P, Yu AJ (2003) Uncertainty and learning. IETE Journal of Research 49.2/3: 171–182.

25. Narain D, van Beers RJ, Smeets JBJ, Brenner E (2013) Sensorimotor priors in

nonstationary environments. J Neurophysiol. 109: 1259–67. doi: 10.1152/

jn.00605.2012.
