
A systematic comparison of continuous and discrete mixture models

Stephane Hess∗ Michel Bierlaire† John W. Polak‡

November 17, 2006

Report TRANSP-OR 061117

Transport and Mobility Laboratory

School of Architecture, Civil and Environmental Engineering

École Polytechnique Fédérale de Lausanne

transp-or.epfl.ch

∗ Centre for Transport Studies, Imperial College London, [email protected], Tel: +44(0)20 7594 6105, Fax: +44(0)20 7594 6102

† Transport and Mobility Laboratory, School of Civil and Environmental Engineering, École Polytechnique Fédérale de Lausanne, michel.bierlaire@epfl.ch, Tel: +41(0)21 693 25 37, Fax: +41(0)21 693 55 70

‡ Centre for Transport Studies, Imperial College London, [email protected], Tel: +44(0)20 7594 6089, Fax: +44(0)20 7594 6102

Abstract

Modellers are increasingly relying on the use of continuous random coefficients models, such as Mixed Logit, for the representation of variations in tastes across individuals. In this paper, we provide an in-depth comparison of the performance of the Mixed Logit model with that of its far less commonly used discrete mixture counterpart, making use of a combination of real and simulated datasets. The results not only show significant computational advantages for the discrete mixture approach, but also highlight greater flexibility, and show that, across a host of scenarios, the discrete mixture models are able to offer comparable or indeed superior model performance.

1 Introduction and context

Allowing for variations in behaviour across decision makers is one of the most fundamental principles in discrete choice modelling, given that the assumption of a purely homogeneous population cannot in general be seen to be valid. The typical way of allowing for such variation is through a deterministic approach, linking the taste heterogeneity to variations in socio-demographic factors such as income or trip purpose.

While a deterministic approach is appealing from the point of view of interpretation (and especially for forecasting), it is often not possible to represent all variations in tastes in a deterministic fashion, for reasons of data quality, but also due to inherent randomness in choice behaviour. For this reason, random coefficient structures, such as the Mixed Multinomial Logit (MMNL) model, which allow for random variations in behaviour across respondents, have an important advantage in terms of flexibility. In general, such models have the disadvantage that their choice probabilities take on the form of integrals that do not possess a closed form solution, such that numerical processes, typically simulation, are required during estimation and application of the models. This greatly limited the use of these structures for many years after their initial development. Over recent years, gains in computer speed and in the efficiency of simulation-based estimation processes (see for example Hess et al., 2006) have however led to increased interest in the MMNL model in particular, by researchers and, to a lesser degree, also practitioners.

Despite the improvements in estimation capability, the cost of using the MMNL model remains high. While this might be acceptable in many cases, another important issue remains, namely the choice of distribution to be used for representing the random variations in tastes across respondents. Here, there is a major risk of producing misleading results when making an inappropriate choice of distribution, as discussed by Hess et al. (2005). In this paper, we explore an alternative approach, based on the idea of replacing the continuous distribution functions by discrete distributions, spreading the mass among several discrete values. Mathematically, the structure of such a discrete mixture (DM) model is a special case of a latent class model (cf. Kamakura and Russell, 1989; Chintagunta et al., 1991), assigning different coefficient values to different parts of the population of respondents, a concept discussed in the field of transport studies for example by Greene and Hensher (2003) and Lee et al. (2003). Latent class approaches make use of two sub-models, one for class allocation, and one for within-class choice. The former models the probability of an individual being assigned to a specific class as a function of attributes of the respondent and possibly of the alternatives in the choice set. The within-class model is then used to compute the class-specific choice probabilities for the different alternatives, conditional on the tastes within that class. The actual choice probability for individual n and alternative i is given by a sum of the class-specific choice probabilities, weighted by the class allocation probabilities for that specific individual.

The latent class approach is appealing from the point of view that it allows for differences in sensitivities across population groups, where the group allocation can be related to socio-demographic characteristics. However, in practice, it may not always be possible to explain group allocation with the help of a probabilistic model relating the outcome to observed variables. This situation is similar to the case where taste heterogeneity cannot be explained deterministically, leading to a requirement for using random coefficients models. In this paper, we therefore explore the use of models in which the class allocation probabilities are independent of explanatory variables, and are simply given by constants that are to be estimated during model calibration. The resulting model exploits the class membership concept in the context of random coefficients models, with a limited set of possible values for the coefficients.

Thus far, there have seemingly been only two applications of this approach in the area of transport research, by Gopinath (1995), in the context of mode choice for freight shippers, and by Dong and Koppelman (2003), who made use of discrete mixtures of MNL models in the analysis of mode choice for work trips in New York, referring to the resulting model as the "Mass Point Mixed Logit model". Although the properties of DM models have been discussed by several other authors (e.g. Wedel et al., 1999), the model structure does not seem to have received widespread exposure or application, despite its many appealing characteristics.

Given the above discussion, part of the aim of this paper is to re-explore the potential advantages of DM models, with the hope of encouraging their more widespread use. Additionally, the paper aims to offer a systematic comparison of the performance of discrete and continuous mixture models across a host of situations, making use of simulated data.

The remainder of this paper is organised as follows. The next section sets out the theory behind DM models. Section 3 presents a case study using real data, while Section 4 uses four different simulated datasets in a systematic comparison of discrete and continuous mixture models. Finally, Section 5 presents the conclusions of the paper.

2 Methodology

We begin by introducing some general notation, which is used throughout the remainder of this paper. Specifically, let $x_{in}$ be a vector defining the attributes of alternative $i$ as faced by respondent $n$ (potentially including interactions with socio-demographic variables), and let $\beta$ be a vector defining the tastes of the decision maker, where, in purely deterministic models, $\beta$ is constant across respondents. Let $x_n$ be a vector grouping together the individual vectors $x_{jn}$ across the alternatives $j$ contained in the choice set of respondent $n$, and let $\gamma$ represent an additional set of parameters, which can for example contain the structural parameters (and possibly allocation parameters) used to represent inter-alternative correlation in a Generalised Extreme Value (GEV) context. In a very general form, we can then define $P_n(i \mid x_n, C_n, \gamma, \beta)$ to give the choice probability of alternative $i$ for individual $n$, with a choice set $C_n$, conditional on the observed vector $x_n$, and for given values for the vectors of parameters $\beta$ and $\gamma$ (to be estimated). Due to the potential inclusion of socio-demographic attributes in $x_n$, this notation allows for deterministic variations in tastes across respondents.

In a discrete mixture context, the number of possible values for the taste coefficients $\beta$ is finite. Here, we divide the set of parameters $\beta$ into two sets; $\bar{\beta}$ represents a part of $\beta$ containing deterministic parameters, while $\hat{\beta}$ is a set of $K$ random parameters that have a discrete distribution. Within this set, the parameter $\hat{\beta}_k$ has $m_k$ mass points $\hat{\beta}_k^{j}$, $j = 1, \dots, m_k$, each of them associated with a probability $\pi_k^{j}$, where we impose the conditions that

$$0 \le \pi_k^{j} \le 1, \quad k = 1, \dots, K; \; j = 1, \dots, m_k, \tag{1}$$

and

$$\sum_{j=1}^{m_k} \pi_k^{j} = 1, \quad k = 1, \dots, K. \tag{2}$$

For each realisation $\hat{\beta}_1^{j_1}, \dots, \hat{\beta}_K^{j_K}$ of $\hat{\beta}$, the choice probability is given by

$$P_n\left(i \mid x_n, C_n, \gamma, \beta = \langle \bar{\beta}, \hat{\beta}_1^{j_1}, \dots, \hat{\beta}_K^{j_K} \rangle\right), \tag{3}$$

where the deterministic part $\bar{\beta}$ stays constant across realisations of the vector $\hat{\beta}$.

The unconditional (on a specific realisation of $\beta$, not on the distribution of $\hat{\beta}$) choice probability for alternative $i$ and decision maker $n$ can now be written straightforwardly as a mixture over the discrete distributions of the various elements contained in $\hat{\beta}$ as:

$$P_n\left(i \mid x_n, C_n, \gamma, \bar{\beta}, \hat{\beta}, \pi\right) = \sum_{j_1=1}^{m_1} \cdots \sum_{j_K=1}^{m_K} P_n\left(i \mid x_n, C_n, \gamma, \beta = \langle \bar{\beta}, \hat{\beta}_1^{j_1}, \dots, \hat{\beta}_K^{j_K} \rangle\right) \pi_1^{j_1} \cdots \pi_K^{j_K}, \tag{4}$$

where $\bar{\beta}$, $\hat{\beta}$ and $\pi = \langle \pi_1^{1}, \dots, \pi_1^{m_1}, \dots, \pi_K^{1}, \dots, \pi_K^{m_K} \rangle$ are vectors of parameters to be estimated in a regular maximum likelihood estimation procedure. An obvious advantage of this approach is that, if the model (3) used inside the mixture has a closed form, then so does the DM itself.
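The mixture in equation (4) can be illustrated with a short Python sketch, assuming an MNL kernel and linear-in-parameters utilities. The function names, the ordering of the attribute vectors and the numerical values are hypothetical choices made for illustration only; they are not taken from the paper or from BIOGEME.

```python
import math
from itertools import product


def mnl_probabilities(utilities):
    """Return MNL choice probabilities for a list of systematic utilities."""
    largest = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - largest) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def dm_probability(chosen, attributes, fixed_beta, mass_points, masses):
    """Discrete mixture of MNL probabilities, following equation (4).

    attributes[i] holds the attributes of alternative i, ordered as
    (attributes with fixed coefficients..., attributes with discrete coefficients...).
    mass_points[k] lists the support points of the k-th random coefficient and
    masses[k] the matching probabilities, which must sum to one for each k.
    """
    total = 0.0
    index_ranges = [range(len(points)) for points in mass_points]
    for combo in product(*index_ranges):  # all combinations (j_1, ..., j_K)
        beta = list(fixed_beta) + [mass_points[k][j] for k, j in enumerate(combo)]
        weight = math.prod(masses[k][j] for k, j in enumerate(combo))
        utilities = [sum(b * x for b, x in zip(beta, row)) for row in attributes]
        total += weight * mnl_probabilities(utilities)[chosen]
    return total


# Hypothetical example: two alternatives described by (cost, time), a fixed
# cost coefficient and a time coefficient with two mass points.
attributes = [[2.0, 30.0], [3.0, 20.0]]
p_first = dm_probability(0, attributes, [-0.5], [[-0.02, -0.10]], [[0.7, 0.3]])
p_second = dm_probability(1, attributes, [-0.5], [[-0.02, -0.10]], [[0.7, 0.3]])
```

Because every term in the sum is a closed-form MNL probability, no simulation is involved; the cost is the number of mass-point combinations, which grows as the product of the $m_k$.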

In this paper, we mainly focus on the simple case where the underlying choice model is of MNL form; however, the form given in equation (4) is appropriate for any underlying model, where, with an underlying GEV structure, the resulting model obtains a closed form expression, avoiding the need for simulation in estimation and application. In this case, the vector $\gamma$ would contain parameters that determine the nesting structure of the model. The approach can easily be extended to the case of combined discrete and continuous random taste variation, by partitioning $\beta$ into three parts; the above defined parts $\bar{\beta}$ and $\hat{\beta}$, and an additional part $\tilde{\beta}$, whose elements follow continuous distributions². This however leads to a requirement to use simulation, as with all continuous mixture models.

¹ The constraints (1) and (2) can be avoided by setting $\pi_i = e^{\alpha_i} / \sum_{j=1}^{J} e^{\alpha_j}$, where the $\alpha_j$, with $j = 1, \dots, J$, are estimated without constraints. While avoiding the need for constraints, this formulation becomes highly non-linear and difficult to handle in estimation.
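The unconstrained formulation of the mass probabilities mentioned in the first footnote can be sketched as follows. This is a minimal illustration of the logit-type transformation only; the function name is hypothetical, and the paper itself reports using constrained estimation rather than this reparameterisation.

```python
import math


def masses_from_alphas(alphas):
    """Map unconstrained parameters alpha_j to probabilities pi_j.

    pi_i = exp(alpha_i) / sum_j exp(alpha_j), so each pi_i lies in (0, 1)
    and the pi_i sum to one by construction.
    """
    largest = max(alphas)  # subtracting the maximum leaves the result unchanged
    exps = [math.exp(a - largest) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


# Two mass points with alpha = (0, ln 3) give probabilities of 1/4 and 3/4.
example_masses = masses_from_alphas([0.0, math.log(3.0)])
```

Adding the same constant to every $\alpha_j$ leaves the probabilities unchanged, so one of the $\alpha_j$ has to be fixed (for example at zero) for the remaining ones to be identified.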

Finally, independently of the additional treatment of random variations in tastes, repeated choice observations can be treated in a way analogous to the standard continuous mixture treatment, with tastes varying across individuals, but not across observations for the same individual. This is made possible by replacing the conditional choice probabilities for individual observations in equation (4) by probabilities for sequences of choices, and by using the resulting DM term inside the log-likelihood function.
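The treatment of repeated choices described above can be sketched in Python as follows, assuming an MNL kernel and a single coefficient with a discrete distribution. The data layout, the helper `time_cost_utilities` and all numerical values are hypothetical and serve only to show where the product over a respondent's observations sits relative to the sum over mass points.

```python
import math


def mnl_probability(chosen, utilities):
    """MNL probability of the chosen alternative."""
    largest = max(utilities)
    exps = [math.exp(u - largest) for u in utilities]
    return exps[chosen] / sum(exps)


def panel_log_likelihood(respondents, utility_fn, support, masses):
    """Log-likelihood with the mixture outside the product over observations.

    respondents: list of respondents, each a list of (attributes, chosen) pairs.
    utility_fn(attributes, value): utilities of all alternatives in one choice
    situation, given one support point `value` of the random coefficient.
    """
    log_lik = 0.0
    for observations in respondents:
        mixture = 0.0
        for value, mass in zip(support, masses):
            sequence_prob = 1.0  # probability of the whole sequence of choices
            for attributes, chosen in observations:
                sequence_prob *= mnl_probability(chosen, utility_fn(attributes, value))
            mixture += mass * sequence_prob
        log_lik += math.log(mixture)
    return log_lik


def time_cost_utilities(attributes, beta_time):
    """Hypothetical utilities: random time coefficient, cost coefficient fixed at -0.5."""
    return [beta_time * time + (-0.5) * cost for time, cost in attributes]


# Two hypothetical respondents with two and one choice situations respectively.
respondents = [
    [([(30.0, 2.0), (20.0, 3.0)], 0), ([(25.0, 1.0), (15.0, 4.0)], 1)],
    [([(40.0, 2.0), (30.0, 5.0)], 0)],
]
ll_mixture = panel_log_likelihood(respondents, time_cost_utilities, [-0.02, -0.10], [0.7, 0.3])
```

With a single mass point the expression collapses to the ordinary MNL log-likelihood, which gives a simple check of an implementation.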

Several issues arise in the estimation of DM models. Firstly, the log-likelihood function is not concave, so that convergence to a global maximum cannot be guaranteed, even for discrete mixtures of MNL. Given the potential presence of a high number of local maxima, performing several estimations from various starting points is advisable. Also, it is good practice to use starting values other than 0 or 1 for the $\pi_k^{j}$ parameters. Secondly, constrained maximum likelihood must be used to account for constraints (1) and (2). Thirdly, clustering of mass points (for example around the mode of the true distribution) is a frequent phenomenon with DM models, and the use of additional bounds on the mass points can be useful, based on the definition of (potentially mutually exclusive) a priori intervals for the individual mass points. In this context, a heuristic is needed to determine the optimal number of support points in actual applications.

For the purpose of this analysis, the model was coded into BIOGEME (Bierlaire, 2003), where various constraints on the parameters can be imposed to address the issues described above. This also allows modellers to test the validity of specific assumptions, such as a mass at zero for the value of travel time savings (VTTS), a concept discussed for example by Cirillo and Axhausen (2006).

² This approach can then also be used to include error components for correlation or heteroscedasticity.

3 VTTS case study

In this section, we present the findings of an analysis making use of real world data. We first give a brief description of the data in Section 3.1, before looking at model specification in Section 3.2. The estimation results are presented in Section 3.3.

3.1 Data

The study presented here makes use of Stated Preference (SP) data collected as part of a recent value of time study undertaken in Denmark (Burge and Rohr, 2004). Specifically, we make use of data describing a binary choice process for car travellers, with alternatives described only in terms of travel cost and travel time. Each respondent was presented with 9 choice situations, including one with a dominating alternative.

After eliminating the observations with a dominating alternative, as well as additional data cleaning (removing non-traders and respondents who did not choose the dominating alternative), a sample of 13,386 observations from 1,723 respondents was obtained. This equates to 3,037 observations from 392 commuters, 1,081 observations from 142 respondents travelling for education purposes, 1,767 observations from 230 people on shopping trips, 3,155 observations from 404 people travelling to visit friends or relatives, 1,752 observations from 224 general leisure travellers and 2,594 observations from 331 respondents travelling for other purposes.

To allow us to gauge the stability of the results, ten random subsamples of around 80% of the original sample size were generated for each of the six purpose segments listed above³.

³ The selection was performed at the individual-specific level, rather than the observation-specific level.

3.2 Model specification

The models used in this paper were estimated in log-WTP (willingness to pay) space (cf. Fosgerau, 2004), avoiding the effect of heterogeneous scale (cf. Fosgerau and Bierlaire, 2006), while allowing us to represent random variations in the VTTS without the issue of calculating the VTTS on the basis of separate randomly distributed coefficients for travel time and travel cost. Some modifications of the utility functions are required before estimation. Specifically, let $T_i$ and $C_i$ define the time and cost attributes of alternative $i$, and let us rearrange the data such that $T_1 > T_2$ and $C_1 < C_2$, i.e., the first alternative is slower but cheaper than the second alternative. Then, a very basic specification of utility is given by:

$$U_i = \beta_T T_i + \beta_C C_i, \tag{5}$$

where $\beta_T$ and $\beta_C$ represent the time and cost coefficients respectively, and where $i = 1, 2$.

The first alternative is then chosen if the respondent is not willing to pay $C_2 - C_1$ to obtain a reduction in travel time of $T_1 - T_2$. This equates to:

$$P_1 = P\left(\beta_T (T_1 - T_2) > \beta_C (C_2 - C_1)\right). \tag{6}$$

With both $\beta_T$ and $\beta_C$ forced to take on negative values (cf. Hess et al., 2005), and with the above detailed relationships between the cost and time attributes for the two alternatives, equation (6) can be rewritten as:

$$P_1 = P\left(-\frac{\Delta C}{\Delta T} > \frac{\beta_T}{\beta_C}\right), \tag{7}$$

where $\Delta C = C_1 - C_2$ and $\Delta T = T_1 - T_2$. After noting that the VTTS is given by $\beta_T / \beta_C$, and after taking logarithms on both sides of the inequality in equation (7), we obtain the choice probabilities in log-WTP space as follows:

$$P_1 = P\left[\ln\left(-\frac{\Delta C}{\Delta T}\right) > \alpha_{LV}\right], \tag{8}$$

where $\alpha_{LV} = \ln(\mathrm{VTTS})$.
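The step from equation (6) to equation (8) relies on the signs of the quantities involved. Written out under the stated conditions ($\beta_T < 0$, $\beta_C < 0$, $T_1 > T_2$ and $C_1 < C_2$, so that $\Delta T > 0$ and $\Delta C < 0$), the chain of equivalent inequalities is:

```latex
\begin{align*}
\beta_T (T_1 - T_2) > \beta_C (C_2 - C_1)
  &\iff \beta_T \,\Delta T > -\beta_C \,\Delta C \\
  &\iff \frac{\beta_T}{\beta_C}\,\Delta T < -\Delta C
     && \text{(dividing by } \beta_C < 0 \text{ reverses the inequality)} \\
  &\iff \frac{\beta_T}{\beta_C} < -\frac{\Delta C}{\Delta T}
     && \text{(dividing by } \Delta T > 0\text{)} \\
  &\iff \ln\!\left(\frac{\beta_T}{\beta_C}\right) < \ln\!\left(-\frac{\Delta C}{\Delta T}\right)
     && \text{(both sides are positive)} \\
  &\iff \alpha_{LV} < \ln\!\left(-\frac{\Delta C}{\Delta T}\right).
\end{align*}
```

The ratio $-\Delta C / \Delta T$ is the cost saving per minute of extra travel time offered by the first alternative, so the first alternative is chosen when this saving exceeds the respondent's VTTS.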

Since no coefficient is estimated in the utility of alternative 1, the scale needs to be estimated explicitly, and the utility functions for alternatives 1 and 2 are given by:

$$U_1 = \lambda \ln\left(-\frac{\Delta C}{\Delta T}\right) + \varepsilon_1 \tag{9}$$

and

$$U_2 = \lambda \, \alpha_{LV} + \varepsilon_2, \tag{10}$$

where $\lambda$ is estimated in addition to $\alpha_{LV}$, and where $\varepsilon_1$ and $\varepsilon_2$ give the usual type I iid extreme value terms. With travel costs given in Danish Kroner (DKK) and travel times given in minutes, the actual VTTS in DKK per hour is obtained as $60 \cdot \exp(\alpha_{LV})$.

The specification set out above can now be used in a standard discrete choice framework, with either a fixed estimate for $\alpha_{LV}$, or with random variation across respondents. At this point, it should also be noted that attempts to estimate models with an additional constant associated with the first of the two SP alternatives⁴, hence accounting for a left-right reading effect, did not lead to any significant differences in the VTTS estimates.
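As an illustration of the specification in equations (9) and (10), the following Python sketch computes the binary logit probability of choosing the slower, cheaper alternative and converts $\alpha_{LV}$ into an hourly VTTS. The function names and the trade-off of 5 DKK against 10 minutes are hypothetical; the parameter values $\alpha_{LV} = -1.11$ and $\lambda = 0.838$ are the MNL estimates reported in Table 1 for the first shopping subsample.

```python
import math


def prob_alternative_1(delta_cost, delta_time, alpha_lv, scale):
    """Binary logit probability of alternative 1 in log-WTP space.

    V1 = scale * ln(-delta_cost / delta_time) and V2 = scale * alpha_lv,
    with delta_cost = C1 - C2 < 0 and delta_time = T1 - T2 > 0.
    """
    v1 = scale * math.log(-delta_cost / delta_time)
    v2 = scale * alpha_lv
    return 1.0 / (1.0 + math.exp(v2 - v1))


def vtts_per_hour(alpha_lv):
    """VTTS in DKK per hour, with costs in DKK and times in minutes."""
    return 60.0 * math.exp(alpha_lv)


# Alternative 1 saves 5 DKK but takes 10 minutes longer (0.5 DKK per minute).
p1 = prob_alternative_1(-5.0, 10.0, alpha_lv=-1.11, scale=0.838)
hourly_vtts = vtts_per_hour(-1.11)
```

The offered saving of 0.5 DKK per minute corresponds to 30 DKK per hour, which is above the implied VTTS, so the probability of choosing the cheaper alternative exceeds one half.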

3.3 Model results

During the analysis, four different types of model were estimated on the data; a simple MNL model, a MMNL model using a Normal distribution, and two DM specifications, one with two support points, DM(2), and one with three support points, DM(3)⁵. In the MMNL and DM models, the repeated choice nature of the data was taken into account by specifying the likelihood function with the integration (respectively summation in the DM models) outside the product over replications for the same respondent.

Each of these models was estimated across the six population segments and the ten subsets of the data, leading to 240 estimated models. Given this wealth of results, we present detailed results only for shopping trips (Section 3.3.1), and give summary results for the remaining five population segments (Section 3.3.2).

3.3.1 Detailed results for shopping trips

The results for the various models estimated on the data for shopping trips are summarised in Table 1. Several differences arise across models in the presentation of the results. For the MNL model, only $\alpha_{LV}$ and $\lambda$ are estimated. For the MMNL model, $\alpha_{LV}$ follows a Normal distribution, with mean $\alpha_{LV,\mu}$ and standard deviation $\alpha_{LV,\sigma}$. For the two DM models, the value of $\alpha_{LV}$ is spread across several support points $\alpha_{LV,k}$ with associated probabilities $0 \le \pi_k \le 1$, such that $\sum_{k=1}^{K} \pi_k = 1$, with $K = 2$ and $K = 3$ in DM(2) and DM(3) respectively.

In addition, the table shows the calculated VTTS. For the MNL model, the VTTS is simply obtained through $60 \cdot \exp(\alpha_{LV})$. However, for the three mixture models, the non-linearity in the exponential means that a different approach is required. With $\alpha_{LV} \sim N(\mu_\alpha, \sigma_\alpha)$ in the MMNL model, the actual VTTS follows a log-normal distribution with mean $\mu_{VTTS} = \exp\left(\mu_\alpha + \sigma_\alpha^2 / 2\right)$ and standard deviation $\sigma_{VTTS} = \mu_{VTTS} \sqrt{\exp\left(\sigma_\alpha^2\right) - 1}$. Both $\mu_{VTTS}$ and $\sigma_{VTTS}$ can then be multiplied by 60 to obtain hourly values. For the DM models, a slightly different approach was used. With $K$ support points $\alpha_{LV,k}$ and associated probabilities $\pi_k$, a sequence of draws was generated that contained $\pi_k \cdot N$ points with a value equal to $\exp(\alpha_{LV,k})$, with $k = 1, \dots, K$. The sample mean and standard deviation from this sequence were then used as estimates of the mean and standard deviation for the actual VTTS. For the results presented here, the value of $N$ was set to 100,000, beyond which no visible differences were observed for $\sigma_{VTTS}$. Finally, along with the results for individual subsamples, the table also shows some overall measures, namely the average of the adjusted $\rho^2$ measure, the average estimation time, and the average for $\mu_{VTTS}$ and $\sigma_{VTTS}$ (together with a standard deviation of this mean across subsamples).

⁴ In preference space.

⁵ Models with more than three support points collapsed back to the more basic specifications.

Subsample                         1        2        3        4        5        6        7        8        9       10  Overall
Respondents                     185      186      180      190      182      179      188      179      180      175
Observations                   1421     1429     1382     1457     1403     1376     1446     1377     1383     1343

MNL
  Final LL                  -880.96  -877.38  -861.61  -909.25  -869.03  -862.35  -891.57  -863.82  -840.02  -837.69
  adj. ρ²                    0.1036   0.1122   0.0985   0.0977   0.1043   0.0938   0.1085   0.0929   0.1216   0.0980  0.1031
  Estimation time (s) *           1        1        1
  αLV est. *                -1.1100  -1.0500  -1.0800  -1.1000  -1.0500  -1.0600  -1.1500  -1.0200
    asy. t-ratio             -14.20   -14.80   -13.60   -13.80   -14.00   -13.20   -14.60   -13.20   -15.00   -13.30
  λ est.                     0.8380   0.8800   0.8390   0.8210   0.8490   0.7950   0.8890   0.8070   0.8960   0.8460
    asy. t-ratio              11.50    12.00    11.40    11.50    11.70    10.90    12.10    11.10    12.00    11.40
  VTTS (DKK/hour) *           19.77    21.00    20.38    19.97    21.00    20.79    19.00    21.64                    20.37 (0.77)

MMNL
  Final LL                  -849.65  -851.11  -828.27  -872.44  -842.02  -831.17  -862.25  -826.92  -818.61  -805.16
  adj. ρ²                    0.1343   0.1377   0.1322   0.1332   0.1311   0.1254   0.1367   0.1305   0.1429   0.1319  0.1336
  Estimation time (s)            75       81       74       73       80       68       76       71       80       70  74.8
  αLV,µ est. *              -1.0800  -1.0200  -1.0500  -1.0700  -1.0300  -1.1200  -0.9910
    asy. t-ratio             -11.30   -12.00   -10.70   -10.90   -11.30   -10.50   -11.60   -10.30   -12.40   -10.40
  αLV,σ est.                 0.8950   0.8260   0.9070   0.9440   0.8590   0.9420   0.8360   0.9700   0.7800   0.9000
    asy. t-ratio               8.75     8.53     8.80     9.04     8.43     8.53     8.72     8.85     8.07     8.75
  λ est. *                   1.0300   1.0500   1.0400   1.0200   0.9810   1.0700   1.0100   1.0500   1.0600
    asy. t-ratio *            12.10    12.40    12.00    12.10    11.50    12.60    11.80    12.30    12.00
  Mean VTTS (DKK/hour)        30.41    28.66    32.64    32.78    30.36    32.07    30.38    34.29    26.54    33.39  31.15 (2.35)
  VTTS standard deviation     33.70    28.35    36.88    39.31    31.72    38.34    30.55    42.86    24.29    37.30  34.33 (5.65)

Discrete mixture (2 pts.)
  Final LL                  -845.40  -847.15  -826.74  -868.64  -838.90  -828.01  -859.40  -820.48  -817.15  -803.60
  adj. ρ²                    0.1366   0.1397   0.1317   0.1349   0.1322   0.1266   0.1376   0.1351   0.1424   0.1314  0.1348
  Estimation time (s) *           1        1        1
  αLV,1 est.                 0.5410   0.4590   0.3780   0.4120   0.5750   0.4240   0.4920   0.7600   0.0685   0.1840
    asy. t-ratio               1.89     1.60     1.22     1.60     1.72     1.42     1.60     2.79     0.22     0.74
  π1 est.                    0.2130   0.2020   0.2620   0.2670   0.1940   0.2570   0.2070   0.2110   0.2680   0.3330
    asy. t-ratio (i)           3.73     3.40     3.18     4.03     3.13     3.51     3.24     4.42     2.56     3.56
  αLV,2 est.                -1.4700  -1.4300  -1.4900  -1.5400  -1.4000  -1.5400  -1.3900  -1.4500  -1.5400  -1.5500
    asy. t-ratio *           -13.40   -10.70   -12.20   -12.60   -11.30   -12.80   -14.10    -9.82    -9.60
  π2 est.                    0.7870   0.7980   0.7380   0.7330   0.8060   0.7430   0.7930   0.7890   0.7320   0.6670
    asy. t-ratio (i)          13.80    13.40     8.94    11.10    13.00    10.20    12.40    16.60     6.99     7.11
  λ est. *                   1.0100   1.0400   1.0200   1.0100   0.9680   1.0500   1.0100   1.0400
    asy. t-ratio *            12.30    12.60    12.10    12.30    11.60    12.70    12.00    12.40    12.10
  VTTS (DKK/hour)             32.81    30.64    32.92    33.62    32.61    33.12    32.16    38.18    26.64    32.51  32.52 (2.83)
  VTTS standard deviation     36.55    32.36    32.56    34.39    36.31    34.44    33.71    46.61    22.76    27.99  33.77 (6.13)

Discrete mixture (3 pts.)
  Final LL                  -844.60  -846.55  -824.79  -867.64  -837.76  -827.06  -858.06  -819.82  -816.17  -802.06
  adj. ρ²                    0.1354   0.1383   0.1317   0.1339   0.1313   0.1255   0.1369   0.1337   0.1413   0.1309  0.1339
  Estimation time (s) *           3        3        4        4        3        2                                      3.1
  αLV,1 est.                 0.7770   0.7060   0.8160   0.7170   0.9070   0.7630   0.8160   0.9170   0.6420   0.7410
    asy. t-ratio               2.26     1.88     2.31     2.11     2.34     1.98     2.30     2.98     1.26     1.85
  π1 est.                    0.1560   0.1430   0.1460   0.1750   0.1290   0.1610   0.1340   0.1750   0.1050   0.1470
    asy. t-ratio (i)           2.69     2.25     2.53     2.48     2.56     2.21     2.56     3.51     1.38     1.92
  αLV,2 est. *              -1.7800  -0.8940  -0.7890  -0.7920  -0.8690  -0.7890  -0.8380  -1.7800  -0.6650
    asy. t-ratio              -5.40    -1.81    -2.71    -1.87    -2.50    -1.76    -2.63    -4.87    -6.63    -2.00
  π2 est.                    0.4550   0.3810   0.4490   0.3460   0.4560   0.3600   0.4500   0.4330   0.4860   0.4110
    asy. t-ratio (i)           1.53     1.09     2.82     1.75     1.76     1.72     2.04     1.30     2.29     2.79
  αLV,3 est.                -0.9100  -1.7000  -1.8800  -1.8100  -1.7900  -1.8300  -1.7700  -0.9670  -0.7450  -1.8300
    asy. t-ratio              -2.09    -4.92    -6.86    -6.76    -5.17    -6.34    -5.88    -2.24    -1.93    -7.40
  π3 est.                    0.3890   0.4760   0.4050   0.4800   0.4150   0.4790   0.4160   0.3920   0.4080   0.4420
    asy. t-ratio (i)           1.35     1.31     2.43     2.34     1.55     2.20     1.81     1.20     2.04     2.88
  λ est. *                   1.0300   1.0600   1.0500   1.0300   0.9890   1.0800   1.0300   1.0600   1.0700
    asy. t-ratio *            12.10    12.50    12.00    12.20    11.50    12.60    11.90    12.30    12.00
  VTTS (DKK/hour)             34.29    31.94    35.78    35.68    34.86    35.14    34.11    39.60    28.41    35.42  34.52 (2.87)
  VTTS standard deviation     41.86    37.14    42.15    40.92    44.34    41.75    40.62    51.22    30.55    38.81  40.93 (5.24)

* Fewer than ten values survive for this row. The surviving values are listed in their original order, so they cannot be matched reliably to the subsample columns.

Table 1: Estimation results on Danish shopping data
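The VTTS moments described above can be checked with a few lines of Python. The sketch below is an illustration rather than the authors' procedure: for the DM models it uses the exact mean and standard deviation of the discrete distribution instead of the sequence of $N$ draws, which gives the same result up to the negligible difference between sample and population moments. Function names are hypothetical; the inputs are the estimates reported for the first shopping subsample.

```python
import math


def lognormal_vtts(mu_alpha, sigma_alpha):
    """Hourly mean and standard deviation of the VTTS when alpha_LV is Normal."""
    mean = math.exp(mu_alpha + sigma_alpha ** 2 / 2.0)
    std_dev = mean * math.sqrt(math.exp(sigma_alpha ** 2) - 1.0)
    return 60.0 * mean, 60.0 * std_dev


def discrete_vtts(support, masses):
    """Hourly mean and standard deviation of the VTTS for a discrete mixture.

    support holds the mass points alpha_LV,k and masses the probabilities pi_k.
    """
    values = [math.exp(alpha) for alpha in support]
    mean = sum(p * v for p, v in zip(masses, values))
    second_moment = sum(p * v * v for p, v in zip(masses, values))
    std_dev = math.sqrt(second_moment - mean ** 2)
    return 60.0 * mean, 60.0 * std_dev


# Subsample 1, MMNL: mean -1.08 and standard deviation 0.895 for alpha_LV.
mmnl_mean, mmnl_sd = lognormal_vtts(-1.08, 0.895)

# Subsample 1, DM(2): mass points 0.541 and -1.47 with masses 0.213 and 0.787.
dm2_mean, dm2_sd = discrete_vtts([0.541, -1.47], [0.213, 0.787])
```

Both calls return values that agree, to the reported precision, with the first-subsample entries of Table 1 (30.41 and 33.70 for MMNL, 32.81 and 36.55 for DM(2)).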

The first observation that can be made from Table 1 is that all three mixture models offer significant improvements in model fit over the base MNL model, across all ten subsamples. Given the structural differences between the continuous and discrete mixture models, the comparison between these models is carried out using the adjusted $\rho^2$ measure rather than the log-likelihood function. Here, we can see that, overall, DM(2) offers the best performance, ahead of DM(3) and the MMNL model. While the model with three support points always obtains slightly better model fit than the model with two support points, the gains are not large enough to be significant when taking into account the additional cost in terms of the number of parameters. In other words, the model with three support points is not able to retrieve significant amounts of additional heterogeneity when compared to the model with two support points. This can partly be seen as a reflection of the success of the model with two support points, but is also an illustration of the difficulties of estimating models with more than two support points, as alluded to in Section 2.
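The paper does not spell out the formula behind the adjusted $\rho^2$ measure. The sketch below assumes the common definition $1 - (LL - k)/LL(0)$, where $k$ is the number of estimated parameters and $LL(0)$ is the log-likelihood of a model assigning equal probability to both alternatives. The parameter counts of 2, 3, 5 and 7 for MNL, MMNL, DM(2) and DM(3) are likewise an inference: with them, the formula reproduces the first-subsample values in Table 1.

```python
import math


def adjusted_rho_squared(final_ll, n_parameters, n_observations, n_alternatives=2):
    """Adjusted rho-squared relative to an equal-shares null model (assumed definition)."""
    null_ll = n_observations * math.log(1.0 / n_alternatives)
    return 1.0 - (final_ll - n_parameters) / null_ll


# First shopping subsample: 1,421 observations of a binary choice.
rho_mnl = adjusted_rho_squared(-880.96, 2, 1421)
rho_mmnl = adjusted_rho_squared(-849.65, 3, 1421)
rho_dm2 = adjusted_rho_squared(-845.40, 5, 1421)
rho_dm3 = adjusted_rho_squared(-844.60, 7, 1421)
```

Under these assumptions, the gain of 0.80 log-likelihood units from DM(2) to DM(3) is outweighed by the penalty for two extra parameters, which is why DM(3) ranks below DM(2) on this measure despite its better fit.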

With three exceptions (samples 3, 9 and 10), the DM(2) model obtains the best performance across the three structures. Overall, the differences in performance between the DM(2) model and the MMNL model are very small, such that we now focus on other factors. Here, the first observation relates to the much lower estimation cost for the DM(2) model, with an average estimation time of one second, compared to seventy-five seconds with the MMNL model. This much lower estimation cost would give the DM models a significant advantage in the case of larger datasets, where the absolute estimation times would be more substantial. Furthermore, the estimation time for the MMNL model was in this case kept low through the use of only 250 Halton draws in the estimation.

In terms of substantive results, the mean VTTS measures obtained by the three mixture models are significantly higher than the point estimate obtained with the MNL model. This is at least partly a result of the asymmetrical distribution of the VTTS in the mixture models. While there are also some differences between the three mixture models in the estimates for $\mu_{VTTS}$, these are much smaller than the difference when compared to the MNL estimates. Finally, the estimate for $\sigma_{VTTS}$ is much higher in the DM(3) model, while the estimates for the DM(2) model and the MMNL model are very similar.

3.3.2 Other results

Table 2 summarises the results for the various models estimated on the remaining five purpose segments. With very little variation across the ten subsamples, only the overall results are shown here. These in turn are very similar to those obtained on the data for shopping trips. As such, all three mixture models outperform the MNL model, where the best performance is consistently obtained by the DM(2) model. Again, the DM(3) model is not able to retrieve significant levels of additional taste heterogeneity to warrant the estimation of two additional parameters. In fact, the estimates for $\mu_{VTTS}$ and $\sigma_{VTTS}$ are almost universally equivalent across the two models⁶. As in the case of shopping trips, the advantages of the DM models in terms of estimation time are again very significant, across all five purpose segments. Finally, while there are almost no differences in the estimates for $\mu_{VTTS}$ between the three different mixture models (where the estimates are again significantly higher than those for the MNL models), the estimates for $\sigma_{VTTS}$ are now lower in the DM models, something that was not the case in the shopping segment.

⁶ It is worth noting that, with the exception of the education segment, the adjusted $\rho^2$ measure is higher for the DM(3) model than for the MMNL model.

                            Commuters  Education    Leisure      Other      Visit
MNL
  adj. ρ²                      0.1017     0.1282     0.1102     0.0888     0.1007
  Estimation time (s)               1          1          1          1          1
  Mean VTTS (DKK/hour)          29.08      29.32      26.40      22.73      23.82

MMNL
  adj. ρ²                      0.1263     0.1599     0.1395     0.1127     0.1294
  Estimation time (s)             131         51         74        107        127
  Mean VTTS (DKK/hour)          39.51      37.28      37.62      34.76      35.83
  Std. dev. VTTS                35.90      29.24      38.43      39.85      39.61

DM(2)
  adj. ρ²                      0.1291     0.1609     0.1433     0.1156     0.1337
  Estimation time (s)               2          1          1          2          2
  Mean VTTS (DKK/hour)          39.78      36.96      37.03      34.43      37.36
  Std. dev. VTTS                30.50      24.01      28.03      29.17      36.28

DM(3)
  adj. ρ²                      0.1279     0.1576     0.1412     0.1142     0.1326
  Estimation time (s)               4          1          2          3          4
  Mean VTTS (DKK/hour)          39.78      37.04      37.03      34.43      37.18
  Std. dev. VTTS                30.50      24.36      28.03      29.17      36.16

Table 2: Summary of results for commuters, education trips, leisure trips, other purposes and visits

4 Simulated data case studies

The application presented in Section 3 has shown the potential advantages of using a discrete mixture approach. However, these results cannot be generalised, as they could well be specific to the data at hand. For this, a systematic comparison between discrete and continuous mixture models is required; this is the topic of this section, which presents the findings of four case studies making use of simulated data.

In each of the four case studies, the generation of the data is based on the Danish VOT data used in the case study described in Section 3. Specifically, we use 10,776 observations from 1,347 respondents, and generate choices based on the attributes used in the original survey data. For each of the four different true models, ten sets of choices are generated for each observation, allowing us to gauge the stability of results across different samples. Unlike in the case study described in Section 3, we now work in preference space, with separate coefficients for travel time and travel cost. In each case, the travel cost coefficient is kept fixed while some random distribution is used for the travel time coefficient. Finally, the data generation was in each case carried out under the assumption of constant tastes across replications for the same individual, and the same approach was later used in model estimation.
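The assumption of constant tastes across replications for the same individual means that, in a discrete mixture, the mixing is applied to the product of a respondent's choice probabilities rather than to each observation separately. The following is a minimal sketch of that likelihood contribution, not the estimation code used for the paper. The data layout (a list of observations, each holding per-alternative `time` and `cost` values and the index of the chosen alternative) and the function names are hypothetical.

```python
import math

def logit_prob(beta_t, beta_c, obs):
    """Logit probability of the chosen alternative for one observation.

    obs is a hypothetical record: {"alts": [{"time": .., "cost": ..}, ...],
    "chosen": index of the chosen alternative}.
    """
    utilities = [beta_t * a["time"] + beta_c * a["cost"] for a in obs["alts"]]
    top = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - top) for u in utilities]
    return exp_u[obs["chosen"]] / sum(exp_u)

def dm_respondent_likelihood(support, probs, beta_c, observations):
    """Discrete mixture likelihood of one respondent's choice sequence.

    The travel time coefficient takes the value support[k] with probability
    probs[k]; the same value applies to all of the respondent's observations.
    """
    total = 0.0
    for beta_t, pi in zip(support, probs):
        sequence_prob = 1.0
        for obs in observations:
            sequence_prob *= logit_prob(beta_t, beta_c, obs)
        total += pi * sequence_prob
    return total

# Illustrative values only: two mass points for the time coefficient and a
# fixed cost coefficient, applied to three made-up binary choices.
example = [
    {"alts": [{"time": 20, "cost": 10}, {"time": 30, "cost": 4}], "chosen": 0},
    {"alts": [{"time": 15, "cost": 12}, {"time": 25, "cost": 6}], "chosen": 1},
    {"alts": [{"time": 40, "cost": 8}, {"time": 35, "cost": 9}], "chosen": 1},
]
print(dm_respondent_likelihood([-1.0, -0.5], [0.25, 0.75], -1.0, example))
```

Because the sum over support points is finite, the likelihood is available in closed form and no simulation draws are needed; the log-likelihood of a sample is the sum of the logarithms of these respondent-level terms.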

In the first two case studies, the true model is a discrete mixture, while in the final two case studies, the true model is a continuous mixture. This allows us to gauge the relative difficulties of the two types of model in dealing with data for which the other model type is more appropriate.

Before proceeding to the discussion of the results, it should be noted that all MMNL models presented here make use of a Normal distribution. Attempts to use alternative continuous distribution functions, such as Johnson's SB, did not lead to consistent results on the data used here. While the findings from this analysis are thus limited to a comparison between a discrete mixture and a Normal mixture, it should be remembered that the vast majority of MMNL studies make use specifically of this Normal distribution, such that the results are still relevant.

4.1 Case study 1: discrete mixture with two support points

The first case study makes use of data generated with the help of a discrete mixture model with two mass points for βT, at −1 and −0.5, with probabilities of 0.25 and 0.75 respectively. The travel cost coefficient is fixed at a value of −1, such that we obtain a mean VTTS of 37.5 DKK per hour with a standard deviation of 13.33 DKK per hour.

The estimation results obtained on this dataset are presented in two parts. Table 3 presents detailed results for the first of the ten subsamples, while Table 4 summarises the results obtained across all ten subsamples. In addition to a basic MNL model, we estimated a MMNL model using a Normal distribution and a discrete mixture model with two support points on this dataset (see footnote 7). In both cases, we allowed for random variations in βC as well as βT. Consistent with the true model, no variations were observed for βC in the discrete mixture model, labelled DM(2)A, such that a second model, DM(2)B, was estimated, in which βC was kept fixed.

In a comparison between the three remaining models, MNL, MMNL and DM(2)B, we observe that the discrete mixture model outperforms the continuous mixture model, which in turn outperforms the MNL model. In terms of estimation time, DM(2)B has clear advantages over the MMNL model, and the higher estimation cost when compared to MNL is well justified on the basis of the improvements in model performance. All three models offer very good performance in retrieving the mean VTTS, while the two mixture models additionally offer good performance in the estimation of the standard deviation.
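The model performance comparison rests on the adjusted ρ2 measure, which penalises the final log-likelihood by the number of estimated parameters. As a sketch, the values reported for the first subsample in Table 3 can be reproduced under two assumptions that are not stated in this section: the usual definition, adj. ρ2 = 1 − (LL − K)/LL(0), and a null model that assigns equal probability to two alternatives in each of the 10,776 observations. The function name is illustrative.

```python
import math

def adjusted_rho2(final_ll, n_params, n_obs, n_alts=2):
    """Adjusted rho-squared against an equal-shares null model (assumed)."""
    null_ll = n_obs * math.log(1.0 / n_alts)
    return 1.0 - (final_ll - n_params) / null_ll

# Final log-likelihood and number of parameters as reported in Table 3.
table3 = {
    "MNL": (-4565.42, 2),
    "MMNL": (-4122.22, 4),
    "DM(2)A": (-4007.05, 8),
    "DM(2)B": (-4007.05, 5),
}
for model, (ll, k) in table3.items():
    print(model, round(adjusted_rho2(ll, k, 10776), 4))
```

Under these assumptions the printed values agree with the adj. ρ2 row of Table 3 to four decimals, which also shows why DM(2)B is preferred to DM(2)A: the two models reach the same log-likelihood, but DM(2)B does so with three fewer parameters.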

A final point deserves some special attention. As mentioned above, we initially allowed for random variation in βC as well as βT. The estimation of the first discrete mixture model, DM(2)A, offered no evidence of such heterogeneity, such that the model was replaced by DM(2)B. However, for the continuous mixture model, MMNL, we retrieved significant heterogeneity for βC as well as for βT, despite the fact that βC was kept fixed in the generation of the data. This offers clear evidence of confounding; by being unable to retrieve the correct patterns of heterogeneity for βT, the MMNL model explains part of the remaining error in the model through heterogeneity in βC. As such, while the model is able to correctly retrieve the mean and standard deviation of the VTTS, it does so by incorrectly indicating a variation across respondents in the sensitivity to changes in travel cost.

7 No further gains in model performance were obtained by allowing for more than two support points.

                 MNL                MMNL               DM(2)A             DM(2)B
Final LL         -4565.42           -4122.22           -4007.05           -4007.05
par.             2                  4                  8                  5
adj. ρ2          0.3885             0.4476             0.4625             0.4629
est. time (s)    2                  234                17                 6

Estimates, with asymptotic t-ratios in parentheses:
βT               -0.4081 (-36.16)   -                  -                  -
βT,µ             -                  -0.6409 (-36.28)   -                  -
βT,σ             -                  0.1553 (10.31)     -                  -
βT,1             -                  -                  -0.5050 (-40.99)   -0.5050 (-40.99)
πT,1             -                  -                  0.7258 (50.22)     0.7258 (50.22)
βT,2             -                  -                  -1.0231 (-40.81)   -1.0231 (-40.81)
πT,2             -                  -                  0.2742 (18.97)     0.2742 (18.97)
βC               -0.6424 (-34.09)   -                  -                  -1.0083 (-42.20)
βC,µ             -                  -1.0613 (-36.64)   -                  -
βC,σ             -                  0.2071 (9.66)      -                  -
βC,1             -                  -                  -1.0083 (-12.20)   -
πC,1             -                  -                  0.3035 (0.00)      -
βC,2             -                  -                  -1.0083 (-24.03)   -
πC,2             -                  -                  0.6965 (0.00)      -

µVTTS            38.11              37.81              38.50              38.50
σVTTS            -                  12.75              13.75              13.75

Table 3: Detailed estimation results for first subsample for first simulated dataset

The findings from Table 3 are confirmed by a graphical analysis of the shape for the distribution of βT in Figure 1, where this comparison is made possible by the fact that the mean estimate for βC is essentially equal to −1 in all models.

Due to space considerations, no detailed results are presented for the remaining nine subsamples. The results are available on request. Nevertheless, the results presented in Table 4 give an indication of the stability of the results across the ten samples. As such, there is very little variation in terms of model performance (adj. ρ2), where the advantages of the DM model clearly remain, with the same applying in the case of estimation time. Finally, while for the mean VTTS, the results are very stable across datasets and models, the estimation of the MMNL models led to very high standard deviations for the VTTS measures in some of the subsamples, which is reflected in a higher mean value for σVTTS, along with greater variation across samples. This is a direct result of the incorrect patterns of heterogeneity retrieved for βC in these models, leading to a wider range for the VTTS.

Figure 1: Cumulative distribution function for βT, first subsample for first simulated dataset

                            MNL       MMNL      DM(2)A    DM(2)B
adj. ρ2         mean        0.3919    0.4475    0.4601    0.4605
                std.dev.    0.0053    0.0057    0.0057    0.0057
Est. time (s)   mean        1.8       258.6     19.5      5.4
                std.dev.    0.42      16.96     4.72      0.52
µVTTS           mean        37.94     37.88     38.22     38.22
                std.dev.    0.25      0.41      0.31      0.31
σVTTS           mean        -         21.89     13.25     13.25
                std.dev.    -         16.43     0.38      0.38

Table 4: Summary of results across subsamples for first simulated dataset

4.2 Case study 2: discrete mixture with three support points

In the second case study, the true model is again a discrete mixture of a MNL model, where this time, three support points are used for βT, at −1, −0.7 and −0.4, with probabilities of 0.3, 0.35 and 0.35. This leads to a true mean VTTS of 41.1 DKK per hour, with a standard deviation of 14.48 DKK per hour. Four different models were estimated on these data; along with the usual MNL and MMNL models, we estimated a DM with two support points, and a DM with three support points (see footnote 8). Again, the DM models were estimated with two different specifications, using a randomly distributed βC coefficient in DM(2)A and DM(3)A, and a fixed βC coefficient in DM(2)B and DM(3)B. The detailed results for the first sample are presented in Table 5, while the overall results are summarised in Table 6.
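The true mean and standard deviation of the VTTS quoted for this case study follow directly from the support points, their probabilities and the fixed cost coefficient. A short sketch of the calculation is given below; it assumes that travel time is measured in minutes and travel cost in DKK, so that the ratio βT/βC is multiplied by 60 to obtain DKK per hour. The function name is illustrative.

```python
import math

def discrete_vtts_moments(support, probs, beta_c, minutes_per_hour=60.0):
    """Mean and standard deviation of the VTTS under a discrete mixture for
    the time coefficient and a fixed cost coefficient."""
    vtts = [minutes_per_hour * beta_t / beta_c for beta_t in support]
    mean = sum(p * v for p, v in zip(probs, vtts))
    variance = sum(p * (v - mean) ** 2 for p, v in zip(probs, vtts))
    return mean, math.sqrt(variance)

# True values of the second case study.
mean, sd = discrete_vtts_moments([-1.0, -0.7, -0.4], [0.3, 0.35, 0.35], -1.0)
print(round(mean, 2), round(sd, 2))  # 41.1 14.48
```

The same function can be applied to the estimated support points and probabilities of a DM model with fixed βC to obtain the µVTTS and σVTTS rows of the results tables.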

The results show major improvements for the MMNL and various DM models when compared to the MNL model. All six models perform very well in terms of retrieving the mean VTTS, while the five mixture models also obtain a good approximation to the true standard deviation of the VTTS. We now look in more detail at the differences between the various mixture models. As in the case study discussed in Section 4.1, the MMNL model again falsely recovers some random variation for βC, where the level of variation is however much lower than was the case in the first case study. When only allowing for two support points, the DM models also retrieve significant variation for βC, as reflected in the drop in model fit observed from DM(2)A to DM(2)B when constraining βC to a fixed value. This is no longer the case when using three support points. Finally, as was the case in Section 4.1, the DM models again have a significant advantage over the MMNL model in terms of estimation cost.

8 No further gains could be made by using more than three support points.

                MNL                MMNL               DM(2)A             DM(2)B             DM(3)A             DM(3)B
Final LL        -4721.69           -4155.65           -4126.23           -4227.43           -4120.96           -4120.99
par.            2                  4                  8                  5                  12                 7
adj. ρ2         0.3676             0.4431             0.4465             0.4334             0.4467             0.4473
est. time (s)   1                  346                16                 6                  151                13

Estimates, with asymptotic t-ratios in parentheses:
βT              -0.3925 (-33.72)   -                  -                  -                  -                  -
βT,µ            -                  -0.6817 (-34.78)   -                  -                  -                  -
βT,σ            -                  0.2423 (25.15)     -                  -                  -                  -
βT,1            -                  -                  -0.8561 (-36.56)   -0.4005 (-36.62)   -0.3930 (-29.72)   -0.7015 (-34.17)
πT,1            -                  -                  0.6210 (27.38)     0.5028 (27.72)     0.3185 (14.88)     0.4069 (14.02)
βT,2            -                  -                  -0.4221 (-28.99)   -0.8084 (-39.33)   -0.7031 (-32.76)   -0.3927 (-29.84)
πT,2            -                  -                  0.3790 (16.71)     0.4972 (27.41)     0.4093 (13.50)     0.3187 (14.89)
βT,3            -                  -                  -                  -                  -1.0262 (-32.13)   -1.0234 (-34.26)
πT,3            -                  -                  -                  -                  0.2723 (10.94)     0.2744 (11.64)
βC              -0.5732 (-33.39)   -                  -                  -0.8783 (-40.98)   -                  -1.0084 (-39.31)
βC,µ            -                  -0.9965 (-37.48)   -                  -                  -                  -
βC,σ            -                  0.0591 (4.51)      -                  -                  -                  -
βC,1            -                  -                  -1.2023 (-35.79)   -                  -1.0015 (-23.66)   -
πC,1            -                  -                  0.5357 (13.28)     -                  0.8114 (0.69)      -
βC,2            -                  -                  -0.8469 (-33.57)   -                  -1.0454 (-6.67)    -
πC,2            -                  -                  0.4643 (11.51)     -                  0.1886 (0.16)      -
βC,3            -                  -                  -                  -                  -1.2583 (0.00)     -
πC,3            -                  -                  -                  -                  0.0000 (0.00)      -

µVTTS           41.08              41.18              41.22              41.25              41.18              41.09
σVTTS           -                  14.86              14.68              13.94              14.41              14.44

Table 5: Detailed estimation results for first subsample for second simulated dataset

                            MNL       MMNL      DM(2)A    DM(2)B    DM(3)A    DM(3)B
adj. ρ2         mean        0.3690    0.4429    0.4457    0.4351    0.4460    0.4467
                std.dev.    0.0051    0.0035    0.0039    0.0038    0.0037    0.0037
Est. time (s)   mean        1.9       338.6     17.5      6.3       133.5     13.6
                std.dev.    1.1       8.4       2.32      1.7       56.4      2.4
µVTTS           mean        40.93     40.74     40.87     40.75     40.79     40.78
                std.dev.    0.16      0.25      0.24      0.29      0.22      0.23
σVTTS           mean        -         14.62     14.43     13.77     14.35     14.30
                std.dev.    -         0.25      0.25      0.27      0.21      0.24

Table 6: Summary of results across subsamples for second simulated dataset

Figure 2 shows the cumulative distribution functions for βT in the

MMNL model, as well as in DM(2)A and DM(2)B. The advantages of

the DM models are again very obvious, especially in the case of the model

with three support points.

The results from Table 6 show very stable performance across the ten samples, for all four indicators. The fact that, unlike in the first case study (cf. Table 4), the estimate for σVTTS in the MMNL model is now very stable can be explained by the lower coefficient of variation for βC in the MMNL estimates in the second case study.

Figure 2: Cumulative distribution function for βT, first subsample for second simulated dataset

4.3 Case study 3: Normal mixture

For the third case study, a MMNL model with a normally distributed travel time coefficient was chosen as the true model. Specifically, βC is still fixed to a value of −1, while βT now follows a Normal distribution with a mean of −0.8 and a standard deviation of 0.3, leading to a mean VTTS of 48 DKK/hour, with a standard deviation of 18 DKK/hour.

The results for the first subsample of the third simulated dataset are summarised in Table 7. A slightly different strategy was employed in the model estimation in this case study. From the experience of the first two case studies, it had to be assumed that some of the distribution of βT would erroneously be picked up as heterogeneity in βC. This would apply especially in the discrete mixture models with a low number of support points. As such, alongside the MNL model, two different MMNL models were estimated, one with βC kept fixed, and one with a randomly distributed βC. In the discrete mixture models, 2 support points were used for βC, while the number of support points for βT was gradually increased up to the point where no heterogeneity was retrieved for βC, i.e. the random taste heterogeneity in the data is captured correctly by βT on its own. It was found that this point was reached between five and six support points for βT. No further gains in model performance could be obtained by increasing the number of support points for βT any further, independently of the treatment of βC.

                MNL                MMNLA              MMNLB              DM(5)A             DM(5)B             DM(6)A             DM(6)B
Final LL        -4742.06           -3912.57           -3913.9            -3910.2            -3913.54           -3908.43           -3908.61
par.            2                  4                  3                  14                 11                 16                 13
adj. ρ2         0.3649             0.4756             0.4756             0.4746             0.4746             0.4746             0.4750
est. time (s)   1                  341                233                143                41                 141                59

Estimates, with asymptotic t-ratios in parentheses:
βT              -0.4008 (-32.11)   -                  -                  -                  -                  -                  -
βT,µ            -                  -0.8359 (-36.67)   -0.8329 (-36.69)   -                  -                  -                  -
βT,σ            -                  0.3134 (25.27)     0.3113 (25.03)     -                  -                  -                  -
βT,1            -                  -                  -                  -0.1343 (-1.99)    -0.1867 (-4.82)    -0.0859 (-1.55)    -0.1293 (-1.88)
πT,1            -                  -                  -                  0.0372 (2.34)      0.0566 (3.82)      0.0245 (2.55)      0.0362 (2.32)
βT,2            -                  -                  -                  -0.4585 (-13.87)   -0.5071 (-17.37)   -0.3621 (-8.79)    -0.4449 (-14.26)
πT,2            -                  -                  -                  0.1838 (6.55)      0.2326 (9.14)      0.0742 (1.90)      0.1710 (6.66)
βT,3            -                  -                  -                  -0.7021 (-22.49)   -0.7904 (-28.13)   -0.7102 (-20.65)   -0.6800 (-24.56)
πT,3            -                  -                  -                  0.2231 (3.96)      0.3529 (9.63)      0.2026 (3.28)      0.2247 (4.71)
βT,4            -                  -                  -                  -1.1872 (-29.66)   -1.1208 (-26.75)   -0.9006 (-19.61)   -0.8830 (-20.63)
πT,4            -                  -                  -                  0.2905 (7.70)      0.3296 (9.67)      0.2653 (5.11)      0.2597 (5.79)
βT,5            -                  -                  -                  -0.8964 (-19.79)   -1.6253 (-20.78)   -0.5177 (-10.19)   -1.1502 (-29.93)
πT,5            -                  -                  -                  0.2654 (5.25)      0.0283 (2.85)      0.1441 (3.01)      0.2818 (7.47)
βT,6            -                  -                  -                  -                  -                  -1.1902 (-29.58)   -1.6423 (-21.24)
πT,6            -                  -                  -                  -                  -                  0.2893 (7.61)      0.0267 (3.10)
βC              -0.4999 (-30.13)   -                  -1.0267 (-38.53)   -                  -1.0135 (-37.67)   -                  -1.0213 (-37.66)
βC,µ            -                  -1.0254 (-38.58)   -                  -                  -                  -                  -
βC,σ            -                  0.0080 (0.47)      -                  -                  -                  -                  -
βC,1            -                  -                  -                  -0.7467 (-18.57)   -                  -0.7485 (-18.50)   -
πC,1            -                  -                  -                  0.0862 (2.67)      -                  0.0861 (2.68)      -
βC,2            -                  -                  -                  -1.0545 (-34.46)   -                  -1.0569 (-34.41)   -
πC,2            -                  -                  -                  0.9138 (28.31)     -                  0.9139 (28.42)     -

µVTTS           48.10              48.93              48.68              48.96              48.72              48.77              48.81
σVTTS           -                  18.34              18.20              18.08              18.15              18.15              18.15

Table 7: Detailed estimation results for first subsample for third simulated dataset

Again, all the different models offer good performance in retrieving the true mean value of the VTTS, while the various mixture models additionally offer a good approximation to the true standard deviation. The six mixture models offer significant improvements in model performance when compared to the MNL model. As in the other examples, the DM models again have computational advantages over the MMNL model. Given the results from the other case studies, it is of interest to look at the issue of confounding between the heterogeneity for βT and βC. In the MMNL model and the DM model with six support points, the reductions in model fit resulting from using a fixed βC coefficient are not significant. With only five support points, the drop in model fit is slightly more visible (DM(5)A vs DM(5)B), yet still not significant when taking into account the cost of estimating three additional parameters. However, in earlier models, using fewer than five support points for βT, this was not the case, and there was significant confounding (see footnote 9).
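One way to check statements of this kind is a likelihood-ratio comparison of each pair of nested models, using the final log-likelihoods and parameter counts from Table 7. The sketch below is an illustration only: the section does not state which test was applied, the chi-square reference distribution is an approximation when a restriction lies on the boundary of the parameter space, and the function name is hypothetical. The 5% critical values for one to three degrees of freedom are standard tabulated constants.

```python
# 5% critical values of the chi-square distribution, by degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

def lr_test(ll_restricted, ll_unrestricted, extra_params):
    """Likelihood-ratio statistic and whether it exceeds the 5% critical value."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    return stat, stat > CHI2_CRIT_5PCT[extra_params]

# (restricted LL, unrestricted LL, additional parameters), from Table 7.
pairs = {
    "MMNLB vs MMNLA": (-3913.9, -3912.57, 1),
    "DM(5)B vs DM(5)A": (-3913.54, -3910.2, 3),
    "DM(6)B vs DM(6)A": (-3908.61, -3908.43, 3),
}
for label, (ll_r, ll_u, df) in pairs.items():
    stat, significant = lr_test(ll_r, ll_u, df)
    print(label, round(stat, 2), "significant" if significant else "not significant")
```

With these inputs, none of the three statistics exceeds its critical value, which is in line with the conclusion that fixing βC does not significantly reduce model fit once five or six support points are used for βT.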

Finally, it is of interest to look at the specific patterns of heterogeneity retrieved by the discrete mixture models, where we focus on MMNLB, DM(5)A and DM(6)B. Here, it can be seen from Figure 3 that the two DM models offer a very good approximation to the Normal distribution.

9 Detailed results available on request.

Figure 3: Cumulative distribution function for βT, first subsample for third simulated dataset

The average results across the ten subsamples are summarised in Table 8. The results show that, on average (and unlike in the first subsample), not allowing for heterogeneity in βC leads to a minor drop in the adjusted ρ2 measure for the MMNL model and the DM(5) model. This is however again not the case for DM(6), showing that six support points are sufficient to retrieve the true heterogeneity in the data. In terms of estimation time, the DM mixtures retain their advantage, even with a higher number of support points. Finally, the results for the mean and standard deviation of the VTTS are very stable across subsamples.

                            MNL       MMNLA     MMNLB     DM(5)A    DM(5)B    DM(6)A    DM(6)B
adj. ρ2         mean        0.3701    0.4724    0.4722    0.4715    0.4710    0.4714    0.4718
                std.dev.    0.0057    0.0057    0.0057    0.0057    0.0055    0.0057    0.0056
Est. time (s)   mean        1         338.1     242.9     124.8     45.3      172       63.7
                std.dev.    0.0       7.9       25.5      31.27     8.3       30.4      14.4
µVTTS           mean        47.71     48.71     48.58     48.70     48.61     48.64     48.60
                std.dev.    0.35      0.40      0.41      0.48      0.49      0.52      0.49
σVTTS           mean        -         17.63     17.63     17.68     17.46     17.60     17.60
                std.dev.    -         0.48      0.48      0.45      0.49      0.43      0.45

Table 8: Summary of results across subsamples for third simulated dataset

4.4 Case study 4: Mixture of two Normals

For the fourth case study, a more complex mixture was used. As such, the true distribution is now a mixture of two Normal distributions, where βT = π1 βT1 + π2 βT2, with π1 = π2 = 0.5, and with βT1 ∼ N(−0.8, 0.2) and βT2 ∼ N(−0.3, 0.1). The cost coefficient βC was again kept fixed at −1. With this, we obtain a true mean VTTS of 33 DKK/hour, with a standard deviation of 17.76 DKK/hour. In model estimation, the strategy from the third case study was again adopted, gradually increasing the number of support points for βT in the DM models, while maintaining the number of support points for βC fixed at 2. Again, the issue of confounding largely disappeared when using five or more support points.
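The true VTTS moments for this case study can be checked analytically from the moments of a two-component Normal mixture. The sketch below assumes that the second argument of N(·, ·) is a standard deviation, in line with the parameterisation used in the third case study, and that the factor of 60 converts a per-minute ratio into DKK per hour; both are assumptions of the sketch, and the function name is illustrative.

```python
import math

def normal_mixture_moments(weights, means, sds):
    """Mean and standard deviation of a finite mixture of Normal distributions."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (s ** 2 + m ** 2) for w, m, s in zip(weights, means, sds))
    return mean, math.sqrt(second_moment - mean ** 2)

beta_c = -1.0
mean_bt, sd_bt = normal_mixture_moments([0.5, 0.5], [-0.8, -0.3], [0.2, 0.1])
vtts_mean = 60.0 * mean_bt / beta_c
vtts_sd = 60.0 * sd_bt / abs(beta_c)
print(round(vtts_mean, 2), round(vtts_sd, 2))
```

Under these assumptions the mean is 33 DKK/hour and the standard deviation is approximately 17.75 DKK/hour, which is close to the 17.76 DKK/hour quoted above; the small difference may simply reflect rounding or the use of realised draws.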

The results for the first subsample are presented in Table 9, with Table 10 presenting a summary of the results across all ten subsamples. Along with the MNL model, two MMNL models were estimated, where MMNLA and MMNLB again differ by using a randomly distributed and fixed βC coefficient respectively. Although the standard deviation for βC is significantly different from zero in model MMNLA, it is very small compared to the mean value, such that it is no surprise that the effect of using a fixed coefficient is very small, with very similar model performance for MMNLB. In the DM models, we experience a very small, and insignificant, drop in model fit when constraining βC to a single value. Here, two further observations can be made. In model DM(5)A, the difference between βC,1 and βC,2 is not significant beyond the 48% level of confidence, while, in model DM(6)A, it is not significant beyond the 50% level of confidence. It can also be seen that, on average, when moving from DM(5)A to DM(5)B and from DM(6)A to DM(6)B, the standard errors associated with the various πT,k parameters decrease. Finally, model DM(6)B can be seen to reduce to model DM(5)B; the additional support point, as well as its associated probability, are not significantly different from zero. All seven models again offer good performance in the retrieval of the true mean VTTS, where the six mixture models also perform well for the standard deviation. The DM models maintain their advantages in terms of estimation cost, where these are naturally smaller than before given the higher number of parameters. In terms of model performance, the MMNL models clearly outperform the MNL model, while the various DM models have a small advantage over the MMNL models. The results from Table 10 again show stable performance over the ten subsamples.

                MNL                MMNLA              MMNLB              DM(5)A             DM(5)B             DM(6)A             DM(6)B
Final LL        -5296.84           -4405.54           -4406.11           -4359.07           -4363.23           -4359.03           -4363.23
par.            2                  4                  3                  14                 11                 16                 13
adj. ρ2         0.2906             0.4096             0.4097             0.4145             0.4144             0.4143             0.4141
est. time (s)   7                  341                213                197                33                 174                75

Estimates, with asymptotic t-ratios in parentheses:
βT              -0.2153 (-27.80)   -                  -                  -                  -                  -                  -
βT,µ            -                  -0.5115 (-30.23)   -0.5155 (-30.62)   -                  -                  -                  -
βT,σ            -                  0.2930 (29.23)     0.2954 (29.82)     -                  -                  -                  -
βT,1            -                  -                  -                  -0.0844 (-1.33)    -0.0787 (-1.27)    -0.0855 (-1.36)    -0.0787 (-1.27)
πT,1            -                  -                  -                  0.0377 (1.42)      0.0363 (1.47)      0.0385 (1.41)      0.0363 (1.47)
βT,2            -                  -                  -                  -0.2831 (-7.90)    -0.2713 (-16.19)   -0.2835 (-7.01)    -0.2713 (-16.19)
πT,2            -                  -                  -                  0.3461 (0.98)      0.3761 (6.32)      0.3395 (0.77)      0.3761 (6.32)
βT,3            -                  -                  -                  -0.9833 (-20.76)   -0.3788 (-11.94)   -0.3216 (-3.95)    -0.3788 (-11.94)
πT,3            -                  -                  -                  0.1900 (4.71)      0.0921 (1.46)      0.1102 (0.25)      0.0921 (1.46)
βT,4            -                  -                  -                  -0.3253 (-4.58)    -0.6543 (-28.15)   -0.7247 (-9.82)    -0.4676 (0.00)
πT,4            -                  -                  -                  0.1047 (0.29)      0.2848 (10.16)     0.0749 (0.33)      0.0000 (0.00)
βT,5            -                  -                  -                  -0.6690 (-20.61)   -0.9546 (-30.00)   -0.6555 (-11.72)   -0.6543 (-28.15)
πT,5            -                  -                  -                  0.3215 (8.36)      0.2107 (8.94)      0.2508 (1.13)      0.2848 (10.16)
βT,6            -                  -                  -                  -                  -                  -0.9842 (-21.79)   -0.9546 (-30.00)
πT,6            -                  -                  -                  -                  -                  0.1860 (4.77)      0.2107 (8.94)
βC              -0.4194 (-31.89)   -                  -0.9721 (-37.89)   -                  -0.9737 (-37.40)   -                  -0.9737 (-37.40)
βC,µ            -                  -0.9718 (-38.17)   -                  -                  -                  -                  -
βC,σ            -                  0.0352 (1.97)      -                  -                  -                  -                  -
βC,1            -                  -                  -                  -1.0781 (-22.66)   -                  -0.8725 (-15.68)   -
πC,1            -                  -                  -                  0.5965 (4.03)      -                  0.3572 (1.70)      -
βC,2            -                  -                  -                  -0.8801 (-20.83)   -                  -1.0662 (-18.58)   -
πC,2            -                  -                  -                  0.4035 (2.73)      -                  0.6428 (3.06)      -

µVTTS           30.80              31.60              31.82              32.60              32.49              32.64              32.49
σVTTS           -                  18.15              18.23              17.39              17.10              17.38              17.10

Table 9: Detailed estimation results for first subsample for fourth simulated dataset

When looking at the retrieval of the true shape for the distribution of βT, it can be seen that the MMNL models using a single Normal distribution produce a mean that is the weighted average of the mean of the two Normal distributions. The DM models on the other hand do recover the multi-modality of the true distribution (see footnote 10). These findings are reflected in the shape of the distributions for βT in Figure 4, where the DM models (DM(5)A and DM(6)B) are better able to account for the multi-modality of the true distribution.

In closing, it should be noted that, in this example, the uni-modal MMNL model still manages to retrieve the true mean and standard deviation of the multi-modal true distribution of the VTTS. This can be explained by the fact that the probabilities for the two Normal distributions were set evenly to 0.5, where the difference in the standard deviation for βT1 and βT2 was also rather small. Different patterns could be expected in a more asymmetrical scenario.

10 It should be noted that, in the retained DM model, DM(5)B, two of the probabilities for support points, πT,1 and πT,3, are only significant at the 85% level of confidence.

Figure 4: Cumulative distribution function for βT, first subsample for fourth simulated dataset

5 Summary and Conclusions

With the availability of powerful computers and estimation tools, researchers and practitioners are increasingly making use of continuous mixture structures, such as Mixed Logit, in the representation of random taste heterogeneity across respondents. Despite the gains in estimation power, the cost of using such mixture models remains high, especially in large scale studies. Furthermore, several issues arise due to the models' reliance on specific distribution functions, whose shape is not necessarily consistent with that of the true, unobserved distribution.

                            MNL       MMNLA     MMNLB     DM(5)A    DM(5)B    DM(6)A    DM(6)B
adj. ρ2         mean        0.2896    0.4082    0.4082    0.4118    0.4115    0.4116    0.4118
                std.dev.    0.0036    0.0041    0.0041    0.0045    0.0040    0.0046    0.0045
Est. time (s)   mean        8.8       363.7     235.4     142.1     40.8      168       59.7
                std.dev.    2.4       14.8      10.9      37.28     10.0      18.0      11.8
µVTTS           mean        31.32     31.90     32.27     32.87     32.74     32.83     32.76
                std.dev.    0.38      0.48      0.44      0.51      0.46      0.46      0.47
σVTTS           mean        -         18.48     18.67     17.71     17.58     17.73     17.66
                std.dev.    -         0.33      0.31      0.31      0.35      0.29      0.34

Table 10: Summary of results across subsamples for fourth simulated dataset

In this paper, we have discussed an alternative approach for the representation of random taste heterogeneity, making use of discrete mixtures instead of continuous mixtures. Although several issues can also arise in the estimation of such models, they have the advantage of a closed form solution, and can hence be estimated and applied without relying on simulation processes. Furthermore, the models are free from a priori assumptions as to the shape of the true distribution.

The paper presents several case studies offering an in-depth comparison of the two modelling approaches, making use of real data as well as four separate simulated datasets. The results of these analyses clearly show the major advantage of the discrete mixture approach in terms of estimation cost. They also show that, across scenarios, the discrete mixture models are able to attain similar or indeed better performance than their continuous counterparts. Finally, they are better able to deal with complicated true distributions, such as the presence of multiple modes.

Although further comparisons between the two modelling approaches are required, the results from this paper do suggest that discrete mixture models present a viable alternative, partly thanks to their lower cost in estimation and application, but also due to the absence of a priori shape assumptions, which is of great interest in the context of recent discussions of the issue of the specification of continuous heterogeneity by Hess et al. (2005).

Acknowledgements

Part of the work described in this paper was carried out during a guest stay by the first author in the Institute of Transport and Logistics Studies at the University of Sydney.

References

Bierlaire, M. (2003). BIOGEME: a free package for the estimation of discrete choice models, Proceedings of the 3rd Swiss Transport Research Conference, Monte Verità, Ascona.

Burge, P. and Rohr, C. (2004). DATIV: SP Design: Proposed approach for pilot survey, Tetra-Plan in cooperation with RAND Europe and Gallup A/S.

Chintagunta, P., Jain, D. and Vilcassim, N. (1991). Investigating heterogeneity in brand preference in logit models for panel data, Journal of Marketing Research 28: 417-428.

Cirillo, C. and Axhausen, K. W. (2006). Evidence on the distribution of values of travel time savings from a six-week diary, Transportation Research Part A: Policy and Practice 40(5): 444-457.

Dong, X. and Koppelman, F. S. (2003). Mass Point Mixed Logit Model: Development and Application, paper presented at the 10th International Conference on Travel Behaviour Research, Lucerne.

Fosgerau, M. (2004). Nonparametric and semiparametric estimation of the distribution of the value of travel time, paper presented at the European Transport Conference, Strasbourg.

Fosgerau, M. and Bierlaire, M. (2006). Discrete choice models with multiplicative error terms, Technical Report TRANSP-OR 060831, Transport and Mobility Laboratory, School of Architecture, Civil and Environmental Engineering, Ecole Polytechnique Fédérale de Lausanne.

Gopinath, D. (1995). Modeling Heterogeneity in Discrete Choice Processes: Application to Travel Demand, PhD thesis, MIT, Cambridge, MA.

Greene, W. H. and Hensher, D. A. (2003). A latent class model for discrete choice analysis: contrasts with mixed logit, Transportation Research Part B: Methodological 37(8): 681-698.

Hess, S., Bierlaire, M. and Polak, J. W. (2005). Estimation of value of travel-time savings using mixed logit models, Transportation Research Part A: Policy and Practice 39(2-3): 221-236.

Hess, S., Train, K. and Polak, J. W. (2006). On the use of a Modified Latin Hypercube Sampling (MLHS) method in the estimation of a Mixed Logit model for vehicle choice, Transportation Research Part B: Methodological 40(2): 147-163.

Kamakura, K. W. and Russell, G. (1989). A probabilistic choice model for market segmentation and elasticity structure, Journal of Marketing Research 26: 379-390.

Lee, B. J., Fujiwara, A., Zhang, J. and Sugie, Y. (2003). Analysis of Mode Choice Behaviours based on Latent Class Models, paper presented at the 10th International Conference on Travel Behaviour Research, Lucerne.

Wedel, M., Kamakura, W., Arora, N., Bemmaor, A., Chiang, J., Elrod, T., Johnson, R., Lenk, P., Neslin, S. and Poulsen, C. S. (1999). Discrete and continuous representations of unobserved heterogeneity in choice modeling, Marketing Letters 10(3): 219-232.