LCLogit, Latent class logit
A systematic comparison of continuous
and discrete mixture models
Stephane Hess∗ Michel Bierlaire† John W. Polak‡
November 17, 2006
Report TRANSP-OR 061117
Transport and Mobility Laboratory
School of Architecture, Civil and Environmental Engineering
École Polytechnique Fédérale de Lausanne
transp-or.epfl.ch
∗ Centre for Transport Studies, Imperial College London,
stephane.hess@imperial.ac.uk, Tel: +44(0)20 7594 6105, Fax: +44(0)20 7594 6102
† Transport and Mobility Laboratory, School of Civil and Environmental
Engineering, École Polytechnique Fédérale de Lausanne,
michel.bierlaire@epfl.ch, Tel: +41(0)21 693 25 37, Fax: +41(0)21 693 55 70
‡ Centre for Transport Studies, Imperial College London,
j.polak@imperial.ac.uk, Tel: +44(0)20 7594 6089, Fax: +44(0)20 7594 6102
Abstract
Modellers are increasingly relying on the use of continuous random
coefficients models, such as Mixed Logit, for the representation of
variations in tastes across individuals. In this paper, we provide an
in-depth comparison of the performance of the Mixed Logit model
with that of its far less commonly used discrete mixture counterpart,
making use of a combination of real and simulated datasets. The
results not only show significant computational advantages for the
discrete mixture approach, but also highlight greater flexibility, and
show that, across a host of scenarios, the discrete mixture models are
able to offer comparable or indeed superior model performance.
1 Introduction and context
Allowing for variations in behaviour across decision makers is one of the
most fundamental principles in discrete choice modelling, given that the
assumption of a purely homogeneous population cannot in general be seen
to be valid. The typical way of allowing for such variation is through
a deterministic approach, linking the taste heterogeneity to variations in
socio-demographic factors such as income or trip purpose.
While appealing from the point of view of interpretation (and especially
for forecasting), it is often not possible to represent all variations in tastes in
a deterministic fashion, for reasons of data quality, but also due to inherent
randomness in choice behaviour. For this reason, random coefficient
structures, such as the Mixed Multinomial Logit (MMNL) model, which allow
for random variations in behaviour across respondents, have an important
advantage in terms of flexibility. In general, such models have the
disadvantage that their choice probabilities take the form of integrals
that do not possess a closed-form solution, such that numerical processes,
typically simulation, are required during estimation and application of the
models. This greatly limited the use of these structures for many years
after their initial development. In recent years, however, gains in
computer speed and in the efficiency of simulation-based estimation
processes (see for example Hess et al., 2006) have led to increased
interest in the MMNL model in particular, among researchers and, to a
lesser degree, practitioners.
Despite the improvements in estimation capability, the cost of using the
MMNL model remains high. While this might be acceptable in many cases,
another important issue remains, namely the choice of distribution to be
used for representing the random variations in tastes across respondents.
Here, there is a major risk of producing misleading results when making
an inappropriate choice of distribution, as discussed by Hess et al. (2005).
In this paper, we explore an alternative approach, based on the idea of
replacing the continuous distribution functions by discrete distributions,
spreading the mass among several discrete values. Mathematically, the
structure of such a discrete mixture (DM) model is a special case of a
latent class model (cf. Kamakura and Russell, 1989; Chintagunta et al.,
1991), assigning different coefficient values to different parts of the
population of respondents, a concept discussed in the field of transport
studies for example by Greene and Hensher (2003) and Lee et al. (2003).
Latent class approaches make use of two sub-models, one for class
allocation, and one for within-class choice. The former models the
probability of an individual being assigned to a specific class as a
function of attributes of the respondent and possibly of the alternatives
in the choice set. The within-class model is then used to compute the
class-specific choice probabilities for the different alternatives,
conditional on the tastes within that class. The actual choice probability
for individual n and alternative i is given by a sum of the class-specific
choice probabilities, weighted by the class allocation probabilities
for that specific individual.
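In symbols, and purely as an illustrative sketch, the weighted sum described
above can be written as follows. The class index $s$, the number of classes
$S$, the allocation probabilities $\pi_{ns}$ and the class-specific taste
vectors $\beta_s$ are notation assumed here for illustration; they are not
taken from the paper.

```latex
P_n(i) = \sum_{s=1}^{S} \pi_{ns} \, P_n(i \mid \beta_s),
\qquad \pi_{ns} \ge 0,
\qquad \sum_{s=1}^{S} \pi_{ns} = 1 .
```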
The latent class approach is appealing in that it allows for differences
in sensitivities across population groups, where the group allocation can
be related to socio-demographic characteristics. However, in practice, it
may not always be possible to explain group allocation with the help of a
probabilistic model relating the outcome to observed variables. This
situation is similar to the case where taste heterogeneity cannot be
explained deterministically, leading to a requirement for using random
coefficients models. In this paper, we therefore explore the use of models
in which the class allocation probabilities are independent of explanatory
variables, and are simply given by constants that are to be estimated
during model calibration. The resulting model thus exploits the class
membership concept in the context of random coefficients models, with a
limited set of possible values for the coefficients.
Thus far, there have seemingly been only two applications of this
approach in the area of transport research, by Gopinath (1995), in the
context of mode choice for freight shippers, and by Dong and Koppelman
(2003), who made use of discrete mixtures of Multinomial Logit (MNL) models
in the analysis of mode choice for work trips in New York, referring to the
resulting model as the "Mass Point Mixed Logit model". Although the
properties of DM models have been discussed by several other authors (e.g.
Wedel et al., 1999), the model structure does not seem to have received
widespread exposure or application, despite its many appealing
characteristics.
Given the above discussion, part of the aim of this paper is to re-explore
the potential advantages of DM models, with the hope of encouraging their
more widespread use. Additionally, the paper aims to offer a systematic
comparison of the performance of discrete and continuous mixture models
across a host of situations, making use of simulated data.
The remainder of this paper is organised as follows. The next section
sets out the theory behind DM models. Section 3 presents a case study
using real data, while Section 4 uses four different simulated datasets in a
systematic comparison of discrete and continuous mixture models. Finally,
Section 5 presents the conclusions of the paper.
2 Methodology
We begin by introducing some general notation, which is used throughout
the remainder of this paper. Specifically, let $x_{in}$ be a vector defining
the attributes of alternative i as faced by respondent n (potentially
including interactions with socio-demographic variables), and let $\beta$ be
a vector defining the tastes of the decision maker, where, in purely
deterministic models, $\beta$ is constant across respondents. Let $x_n$ be a
vector grouping together the individual vectors $x_{jn}$ across the
alternatives contained in the choice set of respondent n, and let $\gamma$
represent an additional set of parameters, which can for example contain
the structural parameters (and possibly allocation parameters) used to
represent inter-alternative correlation in a Generalised Extreme Value
(GEV) context. In a very general form, we can then define
$P_n(i \mid x_n, C_n, \gamma, \beta)$ to give the choice probability of
alternative i for individual n, with a choice set $C_n$, conditional on the
observed vector $x_n$, and for given values for the vectors of parameters
$\beta$ and $\gamma$ (to be estimated). Due to the potential inclusion of
socio-demographic attributes in $x_n$, this notation allows for
deterministic variations in tastes across respondents.
In a discrete mixture context, the number of possible values for the taste
coefficients $\beta$ is finite. Here, we divide the set of parameters
$\beta$ into two sets; $\bar\beta$ represents a part of $\beta$ containing
deterministic parameters, while $\hat\beta$ is a set of K random parameters
that have a discrete distribution. Within this set, the parameter
$\hat\beta_k$ has $m_k$ mass points $\hat\beta_k^j$, $j = 1, \ldots, m_k$,
each of them associated with a probability $\pi_k^j$, where we impose the
conditions that¹
$$0 \le \pi_k^j \le 1, \quad k = 1, \ldots, K; \; j = 1, \ldots, m_k, \tag{1}$$
and
$$\sum_{j=1}^{m_k} \pi_k^j = 1, \quad k = 1, \ldots, K. \tag{2}$$
For each realisation $\hat\beta_1^{j_1}, \ldots, \hat\beta_K^{j_K}$ of
$\hat\beta$, the choice probability is given by
$$P_n\left(i \mid x_n, C_n, \gamma, \beta = \langle \bar\beta, \hat\beta_1^{j_1}, \ldots, \hat\beta_K^{j_K} \rangle\right), \tag{3}$$
where the deterministic part $\bar\beta$ stays constant across realisations
of the vector $\hat\beta$.
The unconditional (on a specific realisation of $\beta$, not on the
distribution of $\hat\beta$) choice probability for alternative i and
decision maker n can now be written straightforwardly as a mixture over the
discrete distributions of the various elements contained in $\hat\beta$ as:
$$P_n\left(i \mid x_n, C_n, \gamma, \bar\beta, \hat\beta, \pi\right) = \sum_{j_1=1}^{m_1} \cdots \sum_{j_K=1}^{m_K} P_n\left(i \mid x_n, C_n, \gamma, \beta = \langle \bar\beta, \hat\beta_1^{j_1}, \ldots, \hat\beta_K^{j_K} \rangle\right) \pi_1^{j_1} \cdots \pi_K^{j_K}, \tag{4}$$
where $\bar\beta$, $\hat\beta$ and $\pi$
($\pi = \langle \pi_1^1, \ldots, \pi_1^{m_1}, \ldots, \pi_K^1, \ldots, \pi_K^{m_K} \rangle$)
are vectors of parameters to be estimated in a regular maximum likelihood
estimation procedure. An obvious advantage of this approach is that, if the
model (3) used inside the mixture has a closed form, then so does the DM
itself.
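To make the structure of equation (4) concrete, the following is a minimal
sketch for the case where the underlying model is MNL with utilities that are
linear in the attributes. The function names, the argument layout and the
numerical values are assumptions made for illustration; this is not the
BIOGEME implementation used in the paper.

```python
import itertools
import math


def mnl_probs(utilities):
    """Closed-form MNL probabilities for a single choice situation."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - top) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def dm_mnl_probs(x, beta_fixed, mass_points, masses):
    """Discrete mixture of MNL probabilities, in the spirit of equation (4).

    x           : one attribute list per alternative, ordered as
                  [attributes with fixed coefficients...,
                   attributes with random coefficients...]
    beta_fixed  : values of the deterministic coefficients
    mass_points : mass_points[k][j] is mass point j of random coefficient k
    masses      : masses[k][j] is the probability attached to that mass point
    """
    probs = [0.0] * len(x)
    index_ranges = [range(len(points)) for points in mass_points]
    # Enumerate every combination (j_1, ..., j_K) of mass points.
    for combo in itertools.product(*index_ranges):
        beta = list(beta_fixed) + [mass_points[k][j] for k, j in enumerate(combo)]
        weight = math.prod(masses[k][j] for k, j in enumerate(combo))
        utilities = [sum(b * a for b, a in zip(beta, attrs)) for attrs in x]
        for i, p in enumerate(mnl_probs(utilities)):
            probs[i] += weight * p
    return probs


# Two alternatives described by (cost, time); cost has a fixed coefficient,
# time has a random coefficient with two mass points (illustrative values).
alternatives = [[10.0, 30.0], [14.0, 20.0]]
example = dm_mnl_probs(alternatives, [-1.0], [[-1.0, -0.5]], [[0.25, 0.75]])
```

Because the sum runs over a finite number of combinations, no simulation is
involved; the cost is that the number of combinations grows as the product of
the numbers of mass points across the random coefficients.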
In this paper, we mainly focus on the simple case where the underlying
choice model is of MNL form; however, the form given in equation (4)
is appropriate for any underlying model, where, with an underlying GEV
structure, the resulting model obtains a closed-form expression, avoiding
the need for simulation in estimation and application. In this case, the
vector $\gamma$ would contain parameters that determine the nesting
structure of the model. The approach can easily be extended to the case of
combined discrete and continuous random taste variation, by partitioning
$\beta$ into three parts; the above defined parts $\bar\beta$ and
$\hat\beta$, and an additional part $\tilde\beta$, whose elements follow
continuous distributions². This however leads to a requirement to use
simulation, as with all continuous mixture models.
¹ These constraints can be avoided by setting
$\pi_i = e^{\alpha_i} / \sum_{j=1}^{J} e^{\alpha_j}$, where the $\alpha_j$,
with $j = 1, \ldots, J$, are estimated without constraints. While avoiding
the need for constraints, this formulation becomes highly non-linear and
difficult to handle in estimation.
Finally, independently of the additional treatment of random variations
in tastes, a treatment of repeated choice observations analogous to the stan-
dard continuous mixture treatment, with tastes varying across individuals,
but not across observations for the same individual, is made possible by
replacing the conditional choice probabilities for individual observations in
equation (4) by probabilities for sequences of choices, and by using the
resulting DM term inside the log-likelihood function.
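The treatment of repeated choices described above can be sketched as follows
for a single discretely distributed time coefficient and a fixed cost
coefficient. The data layout and all names are hypothetical and chosen only
for illustration; the point is that the product over a respondent's choices
sits inside the sum over mass points.

```python
import math


def mnl_prob(utilities, chosen):
    """MNL probability of the chosen alternative in one choice situation."""
    top = max(utilities)
    exp_u = [math.exp(u - top) for u in utilities]
    return exp_u[chosen] / sum(exp_u)


def panel_loglik(respondents, beta_cost, time_points, masses):
    """Log-likelihood with tastes constant across one respondent's choices.

    respondents : list of respondents; each respondent is a list of
                  (alternatives, chosen) pairs, where alternatives is a list
                  of (time, cost) tuples and chosen is an index
    time_points : mass points of the time coefficient
    masses      : probabilities attached to those mass points
    """
    loglik = 0.0
    for observations in respondents:
        mixture = 0.0
        for beta_time, mass in zip(time_points, masses):
            # Probability of the whole sequence of choices at this mass point.
            sequence_prob = 1.0
            for alternatives, chosen in observations:
                utilities = [beta_time * t + beta_cost * c for t, c in alternatives]
                sequence_prob *= mnl_prob(utilities, chosen)
            mixture += mass * sequence_prob
        loglik += math.log(mixture)
    return loglik
```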
Several issues arise in the estimation of DM models. Firstly, the
log-likelihood function is not concave, so that convergence to a global
maximum cannot be guaranteed, even for discrete mixtures of MNL. Given the
potential presence of a high number of local maxima, performing several
estimations from various starting points is advisable. Also, it is good
practice to use starting values other than 0 or 1 for the $\pi_k^j$
parameters. Secondly, constrained maximum likelihood must be used to
account for constraints (1) and (2). Thirdly, clustering of mass points
(for example around the mode of the true distribution) is a frequent
phenomenon with DM models, and the use of additional bounds on the mass
points can be useful, based on the definition of (potentially mutually
exclusive) a priori intervals for the individual mass points. In this
context, a heuristic is needed to determine the optimal number of support
points in actual applications.
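Two of the practical points above can be sketched in a few lines: the
unconstrained formulation of footnote 1, and the use of several interior
starting points. The `fit` callable is a hypothetical stand-in for whatever
estimation routine is used; it is assumed to take starting masses and return
a pair (log-likelihood, estimates). None of these names come from the paper
or from BIOGEME.

```python
import math
import random


def masses_from_alphas(alphas):
    """Map unconstrained alphas to probabilities, as in footnote 1."""
    top = max(alphas)  # subtract the maximum for numerical stability
    exp_a = [math.exp(a - top) for a in alphas]
    total = sum(exp_a)
    return [e / total for e in exp_a]


def random_starting_masses(n_points, rng):
    """Starting masses strictly between 0 and 1 that sum to one."""
    alphas = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    return masses_from_alphas(alphas)


def multi_start(fit, n_points, n_starts=10, seed=0):
    """Run the (hypothetical) estimation routine from several starting points
    and keep the run with the highest final log-likelihood."""
    rng = random.Random(seed)
    runs = [fit(random_starting_masses(n_points, rng)) for _ in range(n_starts)]
    return max(runs, key=lambda run: run[0])
```

Keeping the best of several runs does not guarantee a global maximum; it only
reduces the risk of reporting a poor local one.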
For the purpose of this analysis, the model was coded into BIOGEME
(Bierlaire, 2003), where various constraints on the parameters can be
imposed to address the issues described above. This also allows modellers
to test the validity of specific assumptions, such as a mass at zero for the
value of travel time savings (VTTS), a concept discussed for example by
Cirillo and Axhausen (2006).
² This approach can then also be used to include error components for
correlation or heteroscedasticity.
3 VTTS case study
In this section, we present the findings of an analysis making use of real
world data. We first give a brief description of the data in Section 3.1,
before looking at model specification in Section 3.2. The estimation results
are presented in Section 3.3.
3.1 Data
The study presented here makes use of Stated Preference (SP) data
collected as part of a recent value of time study undertaken in Denmark
(Burge and Rohr, 2004). Specifically, we make use of data describing a
binary choice process for car travellers, with alternatives described only
in terms of travel cost and travel time. Each respondent was presented with
9 choice situations, including one with a dominating alternative.
After eliminating the observations with a dominating alternative, as well
as additional data cleaning (removing non-traders and respondents who did
not choose the dominating alternative), a sample of 13,386 observations
from 1,723 respondents was obtained. This equates to 3,037 observations
from 392 commuters, 1,081 observations from 142 respondents travelling for
education purposes, 1,767 observations from 230 people on shopping trips,
3,155 observations from 404 people travelling to visit friends or relatives,
1,752 observations from 224 general leisure travellers and 2,594 observations
from 331 respondents travelling for other purposes.
To allow us to gauge the stability of the results, random subsamples of
around 80% of the original sample size were generated for each of the above
listed six purpose segments³, where in each case, 10 such subsamples were
created.
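A respondent-level selection of this kind can be sketched as below. The
sketch assumes that each respondent is retained independently with
probability 0.8, which yields subsamples of around, rather than exactly, 80%;
the paper does not state the exact procedure, so this is an assumption, as
are the function name and the data layout.

```python
import random


def subsample_by_respondent(observations, share=0.8, seed=0):
    """Keep or drop whole respondents, never individual observations.

    observations : list of (respondent_id, record) pairs
    share        : probability with which each respondent is retained
    """
    rng = random.Random(seed)
    respondent_ids = sorted({rid for rid, _ in observations})
    kept = {rid for rid in respondent_ids if rng.random() < share}
    return [(rid, record) for rid, record in observations if rid in kept]
```

Calling the function with ten different seeds would give ten subsamples for a
given purpose segment.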
3.2 Model specification
The models used in this paper were estimated in log-WTP (willingness
to pay) space (cf. Fosgerau, 2004), avoiding the effect of heterogeneous
scale (cf. Fosgerau and Bierlaire, 2006), while allowing us to represent
random variations in the VTTS without the issue of calculating the VTTS
on the basis of separate randomly distributed coefficients for travel time
and travel cost. Some modifications of the utility functions are required
before estimation. Specifically, let $T_i$ and $C_i$ define the time and
cost attributes of alternative i, and let us rearrange the data such that
$T_1 > T_2$ and $C_1 < C_2$, i.e., the first alternative is slower but
cheaper than the second alternative. Then, a very basic specification of
utility is given by:
$$U_i = \beta_T T_i + \beta_C C_i, \tag{5}$$
where $\beta_T$ and $\beta_C$ represent time and cost coefficients
respectively, and where i = 1, 2.
³ The selection was performed at the individual-specific level, rather than
the observation-specific level.
The first alternative is then chosen if the respondent is not willing to
pay $C_2 - C_1$ to obtain a reduction in travel time by $T_1 - T_2$. This
equates to:
$$P_1 = P\left(\beta_T (T_1 - T_2) > \beta_C (C_2 - C_1)\right). \tag{6}$$
With both $\beta_T$ and $\beta_C$ forced to take on negative values (cf.
Hess et al., 2005), and with the above detailed relationships between the
cost and time attributes for the two alternatives, equation (6) can be
rewritten as:
$$P_1 = P\left(-\frac{\Delta C}{\Delta T} > \frac{\beta_T}{\beta_C}\right), \tag{7}$$
where $\Delta C = C_1 - C_2$ and $\Delta T = T_1 - T_2$. After noting that
the VTTS is given by $\beta_T / \beta_C$, and after taking logarithms of
both sides of the inequality in equation (7), we obtain the choice
probabilities in log-WTP space as follows:
$$P_1 = P\left[\ln\left(-\frac{\Delta C}{\Delta T}\right) > \alpha_{LV}\right], \tag{8}$$
where $\alpha_{LV} = \ln(\mathrm{VTTS})$.
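The step from equation (6) to equation (8) can be spelled out as follows.
This is a reader's sketch that uses only the sign conditions stated in the
text, namely $\beta_T < 0$, $\beta_C < 0$, $\Delta T > 0$ and $\Delta C < 0$:

```latex
\begin{align*}
\beta_T (T_1 - T_2) > \beta_C (C_2 - C_1)
  &\iff \beta_T \, \Delta T > -\beta_C \, \Delta C \\
  &\iff \frac{\beta_T}{\beta_C} < -\frac{\Delta C}{\Delta T}
     && \text{(dividing by } \beta_C \, \Delta T < 0 \text{ reverses the inequality)} \\
  &\iff \ln\!\left(\frac{\beta_T}{\beta_C}\right) < \ln\!\left(-\frac{\Delta C}{\Delta T}\right)
     && \text{(both sides are positive)} \\
  &\iff \alpha_{LV} < \ln\!\left(-\frac{\Delta C}{\Delta T}\right).
\end{align*}
```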
By noting that the absence of an estimated coefficient in the utility for
alternative 1 leads to a need to explicitly estimate the scale, the utility
functions for alternatives 1 and 2 are given by:
$$U_1 = \lambda \ln\left(-\frac{\Delta C}{\Delta T}\right) + \varepsilon_1 \tag{9}$$
and
$$U_2 = \lambda \, \alpha_{LV} + \varepsilon_2, \tag{10}$$
where $\lambda$ is estimated in addition to $\alpha_{LV}$, and where
$\varepsilon_1$ and $\varepsilon_2$ give the usual type I iid extreme value
terms. With travel costs given in Danish kroner (DKK) and travel times given
in minutes, the actual VTTS in DKK per hour is obtained as
$60 \cdot \exp(\alpha_{LV})$.
The specification set out above can now be used in a standard discrete
choice framework, with either a fixed estimate for $\alpha_{LV}$, or with
random variation across respondents. At this point, it should also be noted
that attempts to estimate models with an additional constant associated with
the first of the two SP alternatives⁴, hence accounting for a left-right
reading effect, did not lead to any significant differences in the VTTS
estimates.
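As a minimal sketch of equations (9) and (10), and assuming that the iid
extreme value errors lead to the usual binary logit form, the probability of
choosing the slower but cheaper alternative for a fixed value of the log-VTTS
can be computed as below. Function and argument names are illustrative and
are not taken from the paper.

```python
import math


def prob_slow_cheap(t1, c1, t2, c2, alpha_lv, lam):
    """Binary logit probability of alternative 1 in log-WTP space.

    Alternative 1 must be slower (t1 > t2) and cheaper (c1 < c2);
    times are in minutes and costs in DKK.
    """
    delta_t = t1 - t2
    delta_c = c1 - c2
    if delta_t <= 0 or delta_c >= 0:
        raise ValueError("alternative 1 must be slower and cheaper")
    v1 = lam * math.log(-delta_c / delta_t)  # systematic part of equation (9)
    v2 = lam * alpha_lv                      # systematic part of equation (10)
    return 1.0 / (1.0 + math.exp(v2 - v1))


def vtts_per_hour(alpha_lv):
    """VTTS in DKK per hour from the log of the VTTS in DKK per minute."""
    return 60.0 * math.exp(alpha_lv)
```

When the price of the time saving offered, $-\Delta C / \Delta T$, equals the
respondent's VTTS, the two systematic utilities coincide and the probability
is one half, whatever the value of the scale parameter.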
3.3 Model results
During the analysis, four different types of model were estimated on the
data; a simple MNL model, an MMNL model using a Normal distribution,
and two DM specifications, one with two support points, DM(2), and one
with three support points, DM(3)⁵. In the MMNL and DM models, the
repeated choice nature of the data was taken into account by specifying the
likelihood function with the integration (respectively summation in the DM
models) outside the product over replications for the same respondent.
Each of these models was estimated across the six population segments
and the ten subsets of the data, leading to 240 estimated models. Given
this wealth of results, we present detailed results only for shopping trips
(Section 3.3.1), and give summary results for the remaining five population
segments (Section 3.3.2).
3.3.1 Detailed results for shopping trips
The results for the various models estimated on the data for shopping trips
are summarised in Table 1. Several differences arise across models in the
⁴ In preference space.
⁵ Models with more than three support points collapsed back to the more
basic specifications.
Subsample                        1        2        3        4        5        6        7        8        9       10        Overall
Respondents                    185      186      180      190      182      179      188      179      180      175
Observations                  1421     1429     1382     1457     1403     1376     1446     1377     1383     1343

MNL
Final LL                   -880.96  -877.38  -861.61  -909.25  -869.03  -862.35  -891.57  -863.82  -840.02  -837.69
adj. ρ²                     0.1036   0.1122   0.0985   0.0977   0.1043   0.0938   0.1085   0.0929   0.1216   0.0980         0.1031
Estimation time (s)              1        1        1        1        1        1        1        1        1        1              1
αLV est.                   -1.1100  -1.1100  -1.0500  -1.0800  -1.0800  -1.1000  -1.0500  -1.0600  -1.1500  -1.0200
  asy. t-ratio              -14.20   -14.80   -13.60   -13.80   -14.00   -13.20   -14.60   -13.20   -15.00   -13.30
λ est.                      0.8380   0.8800   0.8390   0.8210   0.8490   0.7950   0.8890   0.8070   0.8960   0.8460
  asy. t-ratio               11.50    12.00    11.40    11.50    11.70    10.90    12.10    11.10    12.00    11.40
VTTS (DKK/hour)              19.77    19.77    21.00    20.38    20.38    19.97    21.00    20.79    19.00    21.64   20.37 (0.77)

MMNL
Final LL                   -849.65  -851.11  -828.27  -872.44  -842.02  -831.17  -862.25  -826.92  -818.61  -805.16
adj. ρ²                     0.1343   0.1377   0.1322   0.1332   0.1311   0.1254   0.1367   0.1305   0.1429   0.1319         0.1336
Estimation time (s)             75       81       74       73       80       68       76       71       80       70           74.8
αLV,µ est.                 -1.0800  -1.0800  -1.0200  -1.0500  -1.0500  -1.0700  -1.0300  -1.0300  -1.1200  -0.9910
  asy. t-ratio              -11.30   -12.00   -10.70   -10.90   -11.30   -10.50   -11.60   -10.30   -12.40   -10.40
αLV,σ est.                  0.8950   0.8260   0.9070   0.9440   0.8590   0.9420   0.8360   0.9700   0.7800   0.9000
  asy. t-ratio                8.75     8.53     8.80     9.04     8.43     8.53     8.72     8.85     8.07     8.75
λ est.                      1.0300   1.0500   1.0400   1.0200   1.0200   0.9810   1.0700   1.0100   1.0500   1.0600
  asy. t-ratio               12.10    12.40    12.00    12.10    12.10    11.50    12.60    11.80    12.30    12.00
Mean VTTS (DKK/hour)         30.41    28.66    32.64    32.78    30.36    32.07    30.38    34.29    26.54    33.39   31.15 (2.35)
VTTS standard deviation      33.70    28.35    36.88    39.31    31.72    38.34    30.55    42.86    24.29    37.30   34.33 (5.65)

Discrete mixture (2 pts.)
Final LL                   -845.40  -847.15  -826.74  -868.64  -838.90  -828.01  -859.40  -820.48  -817.15  -803.60
adj. ρ²                     0.1366   0.1397   0.1317   0.1349   0.1322   0.1266   0.1376   0.1351   0.1424   0.1314         0.1348
Estimation time (s)              1        1        1        1        1        1        1        1        1        1              1
αLV,1 est.                  0.5410   0.4590   0.3780   0.4120   0.5750   0.4240   0.4920   0.7600   0.0685   0.1840
  asy. t-ratio                1.89     1.60     1.22     1.60     1.72     1.42     1.60     2.79     0.22     0.74
π1 est.                     0.2130   0.2020   0.2620   0.2670   0.1940   0.2570   0.2070   0.2110   0.2680   0.3330
  asy. t-ratio (i)            3.73     3.40     3.18     4.03     3.13     3.51     3.24     4.42     2.56     3.56
αLV,2 est.                 -1.4700  -1.4300  -1.4900  -1.5400  -1.4000  -1.5400  -1.3900  -1.4500  -1.5400  -1.5500
  asy. t-ratio              -13.40   -13.40   -10.70   -12.20   -12.60   -11.30   -12.80   -14.10    -9.82    -9.60
π2 est.                     0.7870   0.7980   0.7380   0.7330   0.8060   0.7430   0.7930   0.7890   0.7320   0.6670
  asy. t-ratio (i)           13.80    13.40     8.94    11.10    13.00    10.20    12.40    16.60     6.99     7.11
λ est.                      1.0100   1.0400   1.0200   1.0100   1.0100   0.9680   1.0500   1.0100   1.0400   1.0400
  asy. t-ratio               12.30    12.60    12.10    12.30    12.30    11.60    12.70    12.00    12.40    12.10
VTTS (DKK/hour)              32.81    30.64    32.92    33.62    32.61    33.12    32.16    38.18    26.64    32.51   32.52 (2.83)
VTTS standard deviation      36.55    32.36    32.56    34.39    36.31    34.44    33.71    46.61    22.76    27.99   33.77 (6.13)

Discrete mixture (3 pts.)
Final LL                   -844.60  -846.55  -824.79  -867.64  -837.76  -827.06  -858.06  -819.82  -816.17  -802.06
adj. ρ²                     0.1354   0.1383   0.1317   0.1339   0.1313   0.1255   0.1369   0.1337   0.1413   0.1309         0.1339
Estimation time (s)              3        3        3        3        3        3        4        4        3        2            3.1
αLV,1 est.                  0.7770   0.7060   0.8160   0.7170   0.9070   0.7630   0.8160   0.9170   0.6420   0.7410
  asy. t-ratio                2.26     1.88     2.31     2.11     2.34     1.98     2.30     2.98     1.26     1.85
π1 est.                     0.1560   0.1430   0.1460   0.1750   0.1290   0.1610   0.1340   0.1750   0.1050   0.1470
  asy. t-ratio (i)            2.69     2.25     2.53     2.48     2.56     2.21     2.56     3.51     1.38     1.92
αLV,2 est.                 -1.7800  -0.8940  -0.7890  -0.7920  -0.8690  -0.7890  -0.8380  -1.7800  -1.7800  -0.6650
  asy. t-ratio               -5.40    -1.81    -2.71    -1.87    -2.50    -1.76    -2.63    -4.87    -6.63    -2.00
π2 est.                     0.4550   0.3810   0.4490   0.3460   0.4560   0.3600   0.4500   0.4330   0.4860   0.4110
  asy. t-ratio (i)            1.53     1.09     2.82     1.75     1.76     1.72     2.04     1.30     2.29     2.79
αLV,3 est.                 -0.9100  -1.7000  -1.8800  -1.8100  -1.7900  -1.8300  -1.7700  -0.9670  -0.7450  -1.8300
  asy. t-ratio               -2.09    -4.92    -6.86    -6.76    -5.17    -6.34    -5.88    -2.24    -1.93    -7.40
π3 est.                     0.3890   0.4760   0.4050   0.4800   0.4150   0.4790   0.4160   0.3920   0.4080   0.4420
  asy. t-ratio (i)            1.35     1.31     2.43     2.34     1.55     2.20     1.81     1.20     2.04     2.88
λ est.                      1.0300   1.0600   1.0500   1.0300   1.0300   0.9890   1.0800   1.0300   1.0600   1.0700
  asy. t-ratio               12.10    12.50    12.00    12.20    12.20    11.50    12.60    11.90    12.30    12.00
VTTS (DKK/hour)              34.29    31.94    35.78    35.68    34.86    35.14    34.11    39.60    28.41    35.42   34.52 (2.87)
VTTS standard deviation      41.86    37.14    42.15    40.92    44.34    41.75    40.62    51.22    30.55    38.81   40.93 (5.24)

Table 1: Estimation results on Danish shopping data
presentation of the results. As such, for the MNL model, only $\alpha_{LV}$
and $\lambda$ are estimated. For the MMNL model, $\alpha_{LV}$ follows a
Normal distribution, with mean $\alpha_{LV,\mu}$ and standard deviation
$\alpha_{LV,\sigma}$. For the two DM models, the value of $\alpha_{LV}$ is
spread across several support points $\alpha_{LV,k}$ with associated
probabilities $0 \le \pi_k \le 1$, such that $\sum_{k=1}^{K} \pi_k = 1$,
with K = 2 and K = 3 in DM(2) and DM(3) respectively. In addition, the table
shows the calculated VTTS. For the MNL model, the mean VTTS is simply
obtained through $60 \cdot \exp(\alpha_{LV})$. However, for the three
mixture models, the non-linearity in the exponential means that a different
approach is required. With $\alpha_{LV} \sim N(\mu_\alpha, \sigma_\alpha)$
in the MMNL model, the actual VTTS follows a log-normal distribution with
mean $\mu_{VTTS} = \exp\left(\mu_\alpha + \sigma_\alpha^2 / 2\right)$ and
standard deviation
$\sigma_{VTTS} = \mu_{VTTS} \sqrt{\exp(\sigma_\alpha^2) - 1}$. Both
$\mu_{VTTS}$ and $\sigma_{VTTS}$ can then be multiplied by 60 to obtain
hourly values. For the DM models, a slightly different approach was used.
As such, with K support points $\alpha_{LV,k}$ and associated probabilities
$\pi_k$, a sequence of draws was generated that contained $\pi_k \cdot N$
points with a value equal to $\exp(\alpha_{LV,k})$, with
$k = 1, \ldots, K$. The sample mean and standard deviation from this
sequence were then used as estimates of the mean and standard deviation for
the actual VTTS. For the results presented here, the value of N was set to
100,000, beyond which no visible differences were observed for
$\sigma_{VTTS}$. Finally, along with the results for individual subsamples,
the table also shows some overall measures, namely the average of the
adjusted ρ² measure, the average estimation time, and the average for
$\mu_{VTTS}$ and $\sigma_{VTTS}$ (together with a standard deviation of
this mean across subsamples).
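The two VTTS calculations described above can be sketched as follows. For
the DM case, the sketch replaces the draw-based approach of the paper with
exact probability-weighted moments, which the draw sequence approximates as
N grows; this substitution, like the function names, is an assumption made
for illustration.

```python
import math


def lognormal_vtts(mu_alpha, sigma_alpha):
    """Mean and standard deviation of the VTTS (DKK/hour) when the log of
    the per-minute VTTS is Normal with the given mean and std. deviation."""
    mean = math.exp(mu_alpha + sigma_alpha ** 2 / 2.0)
    std = mean * math.sqrt(math.exp(sigma_alpha ** 2) - 1.0)
    return 60.0 * mean, 60.0 * std


def discrete_vtts(support_points, masses):
    """Mean and standard deviation of the VTTS (DKK/hour) for a discrete
    mixture with the given support points for the log-VTTS."""
    values = [60.0 * math.exp(a) for a in support_points]
    mean = sum(p * v for p, v in zip(masses, values))
    variance = sum(p * (v - mean) ** 2 for p, v in zip(masses, values))
    return mean, math.sqrt(variance)


# Estimates reported in Table 1 for the first subsample.
mmnl_mean, mmnl_std = lognormal_vtts(-1.08, 0.895)
dm2_mean, dm2_std = discrete_vtts([0.541, -1.47], [0.213, 0.787])
```

With the rounded estimates for the first subsample, both functions return
values that agree with the corresponding Table 1 entries to within rounding.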
The first observation that can be made from Table 1 is that all three
mixture models offer significant improvements in model fit over the base
MNL model, across all ten subsamples. Given the structural differences
between the continuous and discrete mixture models, the comparison
between these models is carried out using the adjusted ρ² measure rather
than the log-likelihood function. Here, we can see that, overall, DM(2)
offers the best performance, ahead of DM(3) and the MMNL model. While
the model with three support points always obtains slightly better model
fit than the model with two support points, the gains are not large enough
to be significant when taking into account the additional cost in terms of
the number of parameters. In other words, the model with three support
points is not able to retrieve significant amounts of additional
heterogeneity when compared to the model with two support points. This can
partly be seen as a reflection of the success of the model with two support
points, but is also an illustration of the difficulties of estimating models
with more than two support points, as alluded to in Section 2.
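The role of the parameter count in this comparison can be illustrated with
the usual adjusted ρ² formula, 1 − (LL − K) / LL(0), which is assumed here to
be the measure used in Table 1. With a binary choice, LL(0) is the number of
observations times ln(0.5). The parameter counts of 2, 3, 5 and 7 (counting
every probability as a parameter) are likewise an assumption, chosen because
they reproduce the values reported for the first subsample.

```python
import math


def adjusted_rho2(final_ll, n_params, n_obs, n_alternatives=2):
    """Adjusted rho-square relative to a model with equal choice shares."""
    null_ll = n_obs * math.log(1.0 / n_alternatives)
    return 1.0 - (final_ll - n_params) / null_ll


# First subsample of the shopping data (1421 observations), Table 1.
fits = {
    "MNL": adjusted_rho2(-880.96, 2, 1421),
    "MMNL": adjusted_rho2(-849.65, 3, 1421),
    "DM(2)": adjusted_rho2(-845.40, 5, 1421),
    "DM(3)": adjusted_rho2(-844.60, 7, 1421),
}
```

Under these assumptions, the gain of 0.8 log-likelihood units from DM(2) to
DM(3) does not offset the penalty for two extra parameters, which is why
DM(3) ranks below DM(2) on this measure despite its higher log-likelihood.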
With three exceptions (samples 3, 9 and 10), the DM(2) model obtains
the best performance across the three structures. Overall, the differences
in performance between the DM(2) model and the MMNL model are very
small, such that we now focus on other factors. Here, the first observation
relates to the much lower estimation cost for the DM(2) model, with an
average estimation time of one second, compared to seventy-five with the
MMNL model. This much lower estimation cost would give the DM models
a significant advantage in the case of larger datasets, where the absolute
estimation times would be more substantial. Furthermore, the estimation
time for the MMNL model was in this case kept low through the use of
only 250 Halton draws in the estimation.
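To indicate where the additional cost of the MMNL model comes from, the
sketch below simulates the choice probability of a single observation in
log-WTP space by averaging binary logit probabilities over Halton draws
from a Normal distribution. It is an illustration under assumed names and a
single-observation setting; it does not reproduce the estimation code used in
the paper, where the averaging is applied to sequences of choices.

```python
import math
from statistics import NormalDist


def halton(n_draws, base=2):
    """First n_draws points of the Halton sequence in the given base."""
    points = []
    for index in range(1, n_draws + 1):
        fraction, value, remainder = 1.0, 0.0, index
        while remainder > 0:
            fraction /= base
            value += fraction * (remainder % base)
            remainder //= base
        points.append(value)
    return points


def simulated_prob(log_price, mu, sigma, lam, n_draws=250):
    """Simulated probability of the slower, cheaper alternative when the
    log-VTTS is Normal(mu, sigma); log_price is ln(-dC/dT)."""
    standard_normal = NormalDist()
    total = 0.0
    for u in halton(n_draws):
        alpha = mu + sigma * standard_normal.inv_cdf(u)
        total += 1.0 / (1.0 + math.exp(lam * (alpha - log_price)))
    return total / n_draws
```

Each evaluation of the likelihood requires one logit calculation per draw,
here 250, whereas the DM(2) model requires two, one per support point.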
In terms of substantive results, the mean VTTS measures obtained by
the three mixture models are significantly higher than the point estimate
obtained with the MNL model. This is at least partly a result of the
asymmetrical distribution of the VTTS in the mixture models. While there
are also some differences between the three mixture models in the estimates
for $\mu_{VTTS}$, these are much smaller than the difference when compared
to the MNL estimates. Finally, the estimate for $\sigma_{VTTS}$ is much
higher in the DM(3) model, while the estimates for the DM(2) model and the
MMNL model are very similar.
3.3.2 Other results
Table 2 summarises the results for the various models estimated on the
remaining five purpose segments. With very little variation across the ten
subsamples, only the overall results are shown here. These in turn are very
similar to those obtained on the data for shopping trips. As such, all three
mixture models outperform the MNL model, where the best performance is
                           Commuters  Education    Leisure      Other      Visit
MNL
adj. ρ²                       0.1017     0.1282     0.1102     0.0888     0.1007
Estimation time (s)                1          1          1          1          1
Mean VTTS (DKK/hour)           29.08      29.32      26.40      22.73      23.82

MMNL
adj. ρ²                       0.1263     0.1599     0.1395     0.1127     0.1294
Estimation time (s)              131         51         74        107        127
Mean VTTS (DKK/hour)           39.51      37.28      37.62      34.76      35.83
Std. dev. VTTS                 35.90      29.24      38.43      39.85      39.61

DM(2)
adj. ρ²                       0.1291     0.1609     0.1433     0.1156     0.1337
Estimation time (s)                2          1          1          2          2
Mean VTTS (DKK/hour)           39.78      36.96      37.03      34.43      37.36
Std. dev. VTTS                 30.50      24.01      28.03      29.17      36.28

DM(3)
adj. ρ²                       0.1279     0.1576     0.1412     0.1142     0.1326
Estimation time (s)                4          1          2          3          4
Mean VTTS (DKK/hour)           39.78      37.04      37.03      34.43      37.18
Std. dev. VTTS                 30.50      24.36      28.03      29.17      36.16

Table 2: Summary of results for commuters, education trips, leisure trips,
other purposes and visits
consistently obtained by the DM(2) model. Again, the DM(3) model is not
able to retrieve significant levels of additional taste heterogeneity to
warrant the estimation of two additional parameters. In fact, the estimates
for $\mu_{VTTS}$ and $\sigma_{VTTS}$ are almost universally equivalent
across the two models⁶. As in the case of shopping trips, the advantages of
the DM models in terms of estimation time are again very significant, across
all five purpose segments. Finally, while there are almost no differences in
the estimates for $\mu_{VTTS}$ between the three different mixture models
(where the estimates are again significantly higher than those for the MNL
models), the estimates for $\sigma_{VTTS}$ are now lower in the DM models,
something that was not the case in the shopping segment.
⁶ It is worth noting that, with the exception of the education segment, the
adjusted ρ² measure is higher for the DM(3) model than for the MMNL model.
4 Simulated data case studies
The application presented in Section 3 has shown the potential advantages
of using a discrete mixture approach. However, it is clearly impossible to
generalise these results, which could well be specific to the data at hand.
For this, a systematic comparison between discrete and continuous mixture
models is required; this is the topic of this section, which presents the
findings of four case studies making use of simulated data.
In each of the four case studies, the generation of the data is based on the
Danish VOT data used in the case study described in Section 3. Specifically,
we use 10,776 observations from 1,347 respondents, and generate choices
based on the attributes used in the original survey data. For each of the four
different true models, ten sets of choices are generated for each observation,
allowing us to gauge the stability of results across different samples. Unlike
in the case study described in Section 3, we now work in preference space,
with separate coefficients for travel time and travel cost. In each case,
the travel cost coefficient is kept fixed while some random distribution is
used for the travel time coefficient. Finally, the data generation was in each
case carried out under the assumption of constant tastes across replications
for the same individual, and the same approach was later used in model
estimation.
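The data generation described above can be sketched as follows. This is an
illustration only, not the code used to produce the datasets in this paper: it
assumes a binary choice between two alternatives described by travel time and
travel cost, a discrete mixture for the travel time coefficient, and hypothetical
attribute values. The function name `simulate_panel_choices` and its arguments
are likewise assumptions. The point of the sketch is that the travel time
coefficient is drawn once per respondent and reused for all of that respondent's
observations (eight per respondent, consistent with 10,776 observations from
1,347 respondents), while the travel cost coefficient stays fixed.

```python
import math
import random


def simulate_panel_choices(attributes, support, probs, beta_c, seed=0):
    """Simulate binary choices with one taste draw per respondent.

    attributes: dict mapping a respondent id to a list of choice tasks, each
        task being ((time_1, cost_1), (time_2, cost_2)).
    support, probs: mass points and probabilities for the travel time
        coefficient (a discrete mixture is assumed for this illustration).
    beta_c: fixed travel cost coefficient.
    Returns a dict mapping respondent id to (beta_t, list of chosen indices).
    """
    rng = random.Random(seed)
    simulated = {}
    for respondent, tasks in attributes.items():
        # Constant tastes across replications: draw beta_t once per respondent.
        beta_t = rng.choices(support, weights=probs, k=1)[0]
        choices = []
        for (t1, c1), (t2, c2) in tasks:
            v1 = beta_t * t1 + beta_c * c1
            v2 = beta_t * t2 + beta_c * c2
            p1 = 1.0 / (1.0 + math.exp(v2 - v1))  # binary logit probability
            choices.append(0 if rng.random() < p1 else 1)
        simulated[respondent] = (beta_t, choices)
    return simulated


# Hypothetical attributes: two respondents with eight choice tasks each.
example_attributes = {
    1: [((20.0, 10.0), (30.0, 5.0))] * 8,
    2: [((15.0, 12.0), (25.0, 6.0))] * 8,
}
example = simulate_panel_choices(
    example_attributes, [-1.0, -0.7, -0.4], [0.3, 0.35, 0.35], beta_c=-1.0, seed=1
)
```

Repeating the call with ten different seeds would give ten sets of choices on
the same attributes, in the spirit of the ten subsamples used in each case study.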
In the first two case studies, the true model is a discrete mixture, while
in the final two case studies, the true model is a continuous mixture. This
allows us to gauge the relative difficulties of the two types of model in
dealing with data for which the other model type is more appropriate.
Before proceeding to the discussion of the results, it should be noted
that all MMNL models presented here make use of a Normal distribution.
Attempts to use alternative continuous distribution functions, such
as Johnson's SB, did not lead to consistent results on the data used here.
While the findings from this analysis are thus limited to a comparison
between a discrete mixture and a Normal mixture, it should be remembered
that the vast majority of MMNL studies make use specifically of this
Normal distribution, such that the results are still relevant.
4.1 Case study 1: discrete mixture with two support points
The first case study makes use of data generated with the help of a discrete
mixture model with two mass points for βT, at −1 and −0.5, with probabilities
of 0.25 and 0.75 respectively. The travel cost coefficient is fixed at a
value of −1, such that we obtain a mean VTTS of 37.5 DKK per hour with
a standard deviation of 13.33 DKK per hour.
The estimation results obtained on this dataset are presented in two
parts. Table 3 presents detailed results for the first of the ten subsamples,
while Table 4 summarises the results obtained across all ten subsamples.
In addition to a basic MNL model, we estimated a MMNL model using a
Normal distribution and a discrete mixture model with two support points
on this dataset⁷. In both cases, we allowed for random variations in βC as
well as βT. Consistent with the true model, no variations were observed
for βC in the discrete mixture model, labelled DM(2)A, such that a second
model, DM(2)B, was estimated, in which βC was kept fixed.
In a comparison between the three remaining models, MNL, MMNL
and DM(2)B, we observe that the discrete mixture model outperforms the
continuous mixture model, which in turn outperforms the MNL model. In
terms of estimation time, DM(2)B has clear advantages over the MMNL
model, and the higher estimation cost when compared to MNL is well
justified on the basis of the improvements in model performance. All three
models offer very good performance in retrieving the mean VTTS, while the
two mixture models additionally offer good performance in the estimation
of the standard deviation.
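The adjusted ρ² values in Table 3 penalise the final log-likelihood by the number
of estimated parameters. The sketch below assumes the usual definition,
1 − (LL − K)/LL(0), and further assumes that LL(0) corresponds to equal choice
probabilities in a binary choice over the 10,776 observations. Neither assumption
is stated in this section, but together they reproduce the values reported in
Table 3. The function name is hypothetical.

```python
import math


def adjusted_rho2(final_ll, n_params, null_ll):
    """Adjusted rho-square: 1 - (LL - K) / LL(0)."""
    return 1.0 - (final_ll - n_params) / null_ll


N_OBS = 10776
# Assumption: two alternatives with equal probability in the null model.
NULL_LL = N_OBS * math.log(0.5)

# (final log-likelihood, number of parameters) as reported in Table 3.
table3 = {
    "MNL": (-4565.42, 2),
    "MMNL": (-4122.22, 4),
    "DM(2)A": (-4007.05, 8),
    "DM(2)B": (-4007.05, 5),
}
adj = {name: adjusted_rho2(ll, k, NULL_LL) for name, (ll, k) in table3.items()}
for name, value in adj.items():
    print(f"{name:7s} {value:.4f}")
```

Under these assumptions, DM(2)A and DM(2)B share the same log-likelihood, so the
higher adjusted ρ² for DM(2)B comes entirely from its three fewer parameters.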
A final point deserves some special attention. As mentioned above, we
initially allowed for random variation in βC as well as βT. The estimation
of the first discrete mixture model, DM(2)A, offered no evidence of such
heterogeneity, such that the model was replaced by DM(2)B. However,
for the continuous mixture model, MMNL, we retrieved significant heterogeneity
for βC as well as for βT, despite the fact that βC was kept fixed in
the generation of the data. This offers clear evidence of confounding; by
being unable to retrieve the correct patterns of heterogeneity for βT, the
MMNL model explains part of the remaining error in the model through
heterogeneity in βC. As such, while the model is able to correctly retrieve
the mean and standard deviation of the VTTS, it does so by incorrectly
indicating a variation across respondents in the sensitivity to changes in
travel cost.
⁷ No further gains in model performance were obtained by allowing for more than two
support points.
               MNL                MMNL               DM(2)A             DM(2)B
Final LL       -4565.42           -4122.22           -4007.05           -4007.05
par.           2                  4                  8                  5
adj. ρ²        0.3885             0.4476             0.4625             0.4629
est. time (s)  2                  234                17                 6
Parameter      est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.)
βT             -0.4081 (-36.16)   -                  -                  -
βT,µ           -                  -0.6409 (-36.28)   -                  -
βT,σ           -                  0.1553 (10.31)     -                  -
βT,1           -                  -                  -0.5050 (-40.99)   -0.5050 (-40.99)
πT,1           -                  -                  0.7258 (50.22)     0.7258 (50.22)
βT,2           -                  -                  -1.0231 (-40.81)   -1.0231 (-40.81)
πT,2           -                  -                  0.2742 (18.97)     0.2742 (18.97)
βC             -0.6424 (-34.09)   -                  -                  -1.0083 (-42.20)
βC,µ           -                  -1.0613 (-36.64)   -                  -
βC,σ           -                  0.2071 (9.66)      -                  -
βC,1           -                  -                  -1.0083 (-12.20)   -
πC,1           -                  -                  0.3035 (0.00)      -
βC,2           -                  -                  -1.0083 (-24.03)   -
πC,2           -                  -                  0.6965 (0.00)      -
µVTTS          38.11              37.81              38.50              38.50
σVTTS          -                  12.75              13.75              13.75
Table 3: Detailed estimation results for first subsample for first simulated
dataset
The findings from Table 3 are confirmed by a graphical analysis of the
shape for the distribution of βT in Figure 1, where this comparison is made
possible by the fact that the mean estimate for βC is essentially equal to
−1 in all models.
Figure 1: Cumulative distribution function for βT, first subsample for first
simulated dataset
Due to space considerations, no detailed results are presented for the
remaining nine subsamples. The results are available on request. Nevertheless,
the results presented in Table 4 give an indication of the stability
of the results across the ten samples. There is very little variation
in terms of model performance (adj. ρ²), where the advantages of the DM
model clearly remain, with the same applying in the case of estimation
time. Finally, while for the mean VTTS, the results are very stable across
datasets and models, the estimation of the MMNL models led to very high
standard deviations for the VTTS measures in some of the subsamples,
which is reflected in a higher mean value for σVTTS, along with greater
variation across samples. This is a direct result of the incorrect patterns of
heterogeneity retrieved for βC in these models, leading to a wider range for
the VTTS.
                        MNL      MMNL     DM(2)A   DM(2)B
adj. ρ²       mean      0.3919   0.4475   0.4601   0.4605
              std.dev.  0.0053   0.0057   0.0057   0.0057
Est. time (s) mean      1.8      258.6    19.5     5.4
              std.dev.  0.42     16.96    4.72     0.52
µVTTS         mean      37.94    37.88    38.22    38.22
              std.dev.  0.25     0.41     0.31     0.31
σVTTS         mean      -        21.89    13.25    13.25
              std.dev.  -        16.43    0.38     0.38
Table 4: Summary of results across subsamples for first simulated dataset
4.2 Case study 2: discrete mixture with three support points
In the second case study, the true model is again a discrete mixture of
a MNL model, where this time, three support points are used for βT, at
−1, −0.7 and −0.4, with probabilities of 0.3, 0.35 and 0.35. This leads
to a true mean VTTS of 41.1 DKK per hour, with a standard deviation
of 14.48 DKK per hour. Four different models were estimated on these
data; along with the usual MNL and MMNL models, we estimated a DM
with two support points, and a DM with three support points⁸. Again,
the DM models were estimated with two different specifications, using a
randomly distributed βC coefficient in DM(2)A and DM(3)A, and a fixed
βC coefficient in DM(2)B and DM(3)B. The detailed results for the first
sample are presented in Table 5, while the overall results are summarised
in Table 6.
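The true VTTS moments quoted above follow directly from the support points and
their probabilities. The sketch below assumes that travel time is measured in
minutes and travel cost in DKK, so that the VTTS in DKK per hour is 60 βT/βC;
this scaling is inferred from the reported values rather than stated in this
section, and the function name is hypothetical.

```python
import math


def discrete_vtts_moments(support, probs, beta_c, minutes_per_hour=60.0):
    """Mean and standard deviation of the VTTS under a discrete mixture for beta_T."""
    ratios = [minutes_per_hour * beta_t / beta_c for beta_t in support]
    mean = sum(p * r for p, r in zip(probs, ratios))
    variance = sum(p * (r - mean) ** 2 for p, r in zip(probs, ratios))
    return mean, math.sqrt(variance)


# Support points and probabilities of the second case study, with beta_C = -1.
true_mean, true_sd = discrete_vtts_moments(
    [-1.0, -0.7, -0.4], [0.3, 0.35, 0.35], beta_c=-1.0
)
print(f"mean VTTS = {true_mean:.2f} DKK/hour, std.dev. = {true_sd:.2f} DKK/hour")
```

With these inputs the sketch gives a mean of 41.1 and a standard deviation of
about 14.48 DKK per hour, matching the values stated for this case study.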
The results show major improvements for the MMNL and various DM
models when compared to the MNL model. All six models perform very
well in terms of retrieving the mean VTTS, while the five mixture models
also obtain a good approximation to the true standard deviation of the
VTTS. We now look in more detail at the differences between the various
mixture models. As was the case in the case study discussed in Section
4.1, the MMNL model again falsely recovers some random variation for βC,
where the level of variation is however much lower than was the case in the
first case study. When only allowing for two support points, the DM models
also retrieve significant variation for βC, as reflected in the drop in model fit
observed from DM(2)A to DM(2)B when constraining βC to a fixed value.
This is no longer the case when using three support points. Finally, as was
the case in Section 4.1, the DM models again have a significant advantage
over the MMNL model in terms of estimation cost.
⁸ No further gains could be made by using more than three support points.
               MNL                MMNL               DM(2)A             DM(2)B             DM(3)A             DM(3)B
Final LL       -4721.69           -4155.65           -4126.23           -4227.43           -4120.96           -4120.99
par.           2                  4                  8                  5                  12                 7
adj. ρ²        0.3676             0.4431             0.4465             0.4334             0.4467             0.4473
est. time (s)  1                  346                16                 6                  151                13
Parameter      est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.)
βT             -0.3925 (-33.72)   -                  -                  -                  -                  -
βT,µ           -                  -0.6817 (-34.78)   -                  -                  -                  -
βT,σ           -                  0.2423 (25.15)     -                  -                  -                  -
βT,1           -                  -                  -0.8561 (-36.56)   -0.4005 (-36.62)   -0.3930 (-29.72)   -0.7015 (-34.17)
πT,1           -                  -                  0.6210 (27.38)     0.5028 (27.72)     0.3185 (14.88)     0.4069 (14.02)
βT,2           -                  -                  -0.4221 (-28.99)   -0.8084 (-39.33)   -0.7031 (-32.76)   -0.3927 (-29.84)
πT,2           -                  -                  0.3790 (16.71)     0.4972 (27.41)     0.4093 (13.50)     0.3187 (14.89)
βT,3           -                  -                  -                  -                  -1.0262 (-32.13)   -1.0234 (-34.26)
πT,3           -                  -                  -                  -                  0.2723 (10.94)     0.2744 (11.64)
βC             -0.5732 (-33.39)   -                  -                  -0.8783 (-40.98)   -                  -1.0084 (-39.31)
βC,µ           -                  -0.9965 (-37.48)   -                  -                  -                  -
βC,σ           -                  0.0591 (4.51)      -                  -                  -                  -
βC,1           -                  -                  -1.2023 (-35.79)   -                  -1.0015 (-23.66)   -
πC,1           -                  -                  0.5357 (13.28)     -                  0.8114 (0.69)      -
βC,2           -                  -                  -0.8469 (-33.57)   -                  -1.0454 (-6.67)    -
πC,2           -                  -                  0.4643 (11.51)     -                  0.1886 (0.16)      -
βC,3           -                  -                  -                  -                  -1.2583 (0.00)     -
πC,3           -                  -                  -                  -                  0.0000 (0.00)      -
µVTTS          41.08              41.18              41.22              41.25              41.18              41.09
σVTTS          -                  14.86              14.68              13.94              14.41              14.44
Table 5: Detailed estimation results for first subsample for second simulated dataset
                        MNL      MMNL     DM(2)A   DM(2)B   DM(3)A   DM(3)B
adj. ρ²       mean      0.3690   0.4429   0.4457   0.4351   0.4460   0.4467
              std.dev.  0.0051   0.0035   0.0039   0.0038   0.0037   0.0037
Est. time (s) mean      1.9      338.6    17.5     6.3      133.5    13.6
              std.dev.  1.1      8.4      2.32     1.7      56.4     2.4
µVTTS         mean      40.93    40.74    40.87    40.75    40.79    40.78
              std.dev.  0.16     0.25     0.24     0.29     0.22     0.23
σVTTS         mean      -        14.62    14.43    13.77    14.35    14.30
              std.dev.  -        0.25     0.25     0.27     0.21     0.24
Table 6: Summary of results across subsamples for second simulated dataset
Figure 2 shows the cumulative distribution functions for βT in the
MMNL model, as well as in DM(2)A and DM(2)B. The advantages of
the DM models are again very obvious, especially in the case of the model
with three support points.
The results from Table 6 show very stable performance across the ten
samples, for all four indicators. The fact that, unlike in the first case study
(cf. Table 4), the estimate for σVTTS in the MMNL model is now very stable
can be explained by the lower coefficient of variation for βC in the MMNL
estimates in the second case study.
Figure 2: Cumulative distribution function for βT, first subsample for
second simulated dataset
4.3 Case study 3: Normal mixture
For the third case study, a MMNL model with a normally distributed travel
time coefficient was chosen as the true model. Specifically, βC is still fixed
to a value of −1, while βT now follows a Normal distribution with mean
of −0.8 and a standard deviation of 0.3, leading to a mean VTTS of 48
DKK/hour, with a standard deviation of 18 DKK/hour.
The results for the first subsample of the third simulated dataset are
summarised in Table 7. A slightly different strategy was employed in the
model estimation in this case study. From the experience of the first two
case studies, it had to be assumed that some of the distribution of βT would
erroneously be picked up as heterogeneity in βC. This would apply especially
in the discrete mixture models with a low number of support points.
As such, alongside the MNL model, two different MMNL models were estimated,
one with βC kept fixed, and one with a randomly distributed βC.
In the discrete mixture models, two support points were used for βC, while
the number of support points for βT was gradually increased up to the
point where no heterogeneity was retrieved for βC, i.e. the random taste
heterogeneity in the data is captured correctly by βT on its own. It was
found that this point was reached between five and six support points for
βT. No further gains in model performance could be obtained by increasing
the number of support points for βT any further, independently of the
treatment of βC.
               MNL                MMNLA              MMNLB              DM(5)A             DM(5)B             DM(6)A             DM(6)B
Final LL       -4742.06           -3912.57           -3913.9            -3910.2            -3913.54           -3908.43           -3908.61
par.           2                  4                  3                  14                 11                 16                 13
adj. ρ²        0.3649             0.4756             0.4756             0.4746             0.4746             0.4746             0.4750
est. time (s)  1                  341                233                143                41                 141                59
Parameter      est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.)
βT             -0.4008 (-32.11)   -                  -                  -                  -                  -                  -
βT,µ           -                  -0.8359 (-36.67)   -0.8329 (-36.69)   -                  -                  -                  -
βT,σ           -                  0.3134 (25.27)     0.3113 (25.03)     -                  -                  -                  -
βT,1           -                  -                  -                  -0.1343 (-1.99)    -0.1867 (-4.82)    -0.0859 (-1.55)    -0.1293 (-1.88)
πT,1           -                  -                  -                  0.0372 (2.34)      0.0566 (3.82)      0.0245 (2.55)      0.0362 (2.32)
βT,2           -                  -                  -                  -0.4585 (-13.87)   -0.5071 (-17.37)   -0.3621 (-8.79)    -0.4449 (-14.26)
πT,2           -                  -                  -                  0.1838 (6.55)      0.2326 (9.14)      0.0742 (1.90)      0.1710 (6.66)
βT,3           -                  -                  -                  -0.7021 (-22.49)   -0.7904 (-28.13)   -0.7102 (-20.65)   -0.6800 (-24.56)
πT,3           -                  -                  -                  0.2231 (3.96)      0.3529 (9.63)      0.2026 (3.28)      0.2247 (4.71)
βT,4           -                  -                  -                  -1.1872 (-29.66)   -1.1208 (-26.75)   -0.9006 (-19.61)   -0.8830 (-20.63)
πT,4           -                  -                  -                  0.2905 (7.70)      0.3296 (9.67)      0.2653 (5.11)      0.2597 (5.79)
βT,5           -                  -                  -                  -0.8964 (-19.79)   -1.6253 (-20.78)   -0.5177 (-10.19)   -1.1502 (-29.93)
πT,5           -                  -                  -                  0.2654 (5.25)      0.0283 (2.85)      0.1441 (3.01)      0.2818 (7.47)
βT,6           -                  -                  -                  -                  -                  -1.1902 (-29.58)   -1.6423 (-21.24)
πT,6           -                  -                  -                  -                  -                  0.2893 (7.61)      0.0267 (3.10)
βC             -0.4999 (-30.13)   -                  -1.0267 (-38.53)   -                  -1.0135 (-37.67)   -                  -1.0213 (-37.66)
βC,µ           -                  -1.0254 (-38.58)   -                  -                  -                  -                  -
βC,σ           -                  0.0080 (0.47)      -                  -                  -                  -                  -
βC,1           -                  -                  -                  -0.7467 (-18.57)   -                  -0.7485 (-18.50)   -
πC,1           -                  -                  -                  0.0862 (2.67)      -                  0.0861 (2.68)      -
βC,2           -                  -                  -                  -1.0545 (-34.46)   -                  -1.0569 (-34.41)   -
πC,2           -                  -                  -                  0.9138 (28.31)     -                  0.9139 (28.42)     -
µVTTS          48.10              48.93              48.68              48.96              48.72              48.77              48.81
σVTTS          -                  18.34              18.20              18.08              18.15              18.15              18.15
Table 7: Detailed estimation results for first subsample for third simulated dataset
Again, all the different models offer good performance in retrieving the
true mean value of the VTTS, while the various mixture models additionally
offer a good approximation to the true standard deviation. The six
mixture models offer significant improvements in model performance when
compared to the MNL model. As in the other examples, the DM models
again have computational advantages over the MMNL model. Given the
results from the other case studies, it is of interest to look at the issue
of confounding between the heterogeneity for βT and βC. In the MMNL
model and the DM model with six support points, the reductions in model
fit resulting from using a fixed βC coefficient are not significant. With only
five support points, the drop in model fit is slightly more visible (DM(5)A
vs DM(5)B), yet still not significant when taking into account the cost of
estimating three additional parameters. However, in earlier models, using
fewer than five support points for βT, this was not the case, and there was
significant confounding⁹.
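The comparisons of model fit discussed above can be checked with a
likelihood-ratio statistic. The sketch below is a rough guide only: it uses the
final log-likelihoods and parameter counts from Table 7 together with standard
95% chi-square critical values, and it ignores the fact that the restrictions
involved (a zero standard deviation, or a single value for βC in a discrete
mixture) are non-standard, so that the chi-square reference distribution is
only approximate. The function name is hypothetical and this is not necessarily
the test applied by the authors.

```python
# Standard 95% critical values of the chi-square distribution.
CHI2_95 = {1: 3.841, 3: 7.815}


def lr_test(ll_restricted, ll_unrestricted, df):
    """Return the likelihood-ratio statistic and whether it exceeds the 95% critical value."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    return stat, stat > CHI2_95[df]


# Fixed-beta_C model (restricted) against random-beta_C model (unrestricted),
# first subsample of the third simulated dataset (Table 7).
comparisons = {
    "MMNLB vs MMNLA": lr_test(-3913.9, -3912.57, df=1),     # 3 vs 4 parameters
    "DM(5)B vs DM(5)A": lr_test(-3913.54, -3910.2, df=3),   # 11 vs 14 parameters
    "DM(6)B vs DM(6)A": lr_test(-3908.61, -3908.43, df=3),  # 13 vs 16 parameters
}
for name, (stat, rejected) in comparisons.items():
    print(f"{name:17s} LR = {stat:5.2f}  significant at 95%: {rejected}")
```

Under these assumptions none of the three restrictions is rejected, which is in
line with the discussion above; the DM(5) comparison comes closest to the
critical value.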
Finally, it is of interest to look at the specific patterns of heterogeneity
retrieved by the discrete mixture models, where we focus on MMNLB,
DM(5)A and DM(6)B. Here, it can be seen from Figure 3 that the two DM
models offer a very good approximation to the Normal distribution.
⁹ Detailed results available on request.
Figure 3: Cumulative distribution function for βT, first subsample for third
simulated dataset
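The kind of comparison shown in Figure 3 can be approximated numerically from
the estimates in Table 7. The sketch below evaluates the Normal cumulative
distribution function implied by MMNLB and the step function implied by the
support points of DM(6)B on a coarse grid; the grid and the function names are
assumptions made for this illustration, and the figure itself is not reproduced.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def discrete_cdf(x, support, probs):
    """Step-function CDF of a discrete mixture: total mass at or below x."""
    return sum(p for b, p in zip(support, probs) if b <= x)


# Estimates for the first subsample of the third simulated dataset (Table 7).
MMNLB_MEAN, MMNLB_SD = -0.8329, 0.3113
DM6B_SUPPORT = [-1.6423, -1.1502, -0.8830, -0.6800, -0.4449, -0.1293]
DM6B_PROBS = [0.0267, 0.2818, 0.2597, 0.2247, 0.1710, 0.0362]

grid = [-2.0 + 0.25 * i for i in range(9)]  # from -2.0 to 0.0
for x in grid:
    normal_value = normal_cdf(x, MMNLB_MEAN, MMNLB_SD)
    discrete_value = discrete_cdf(x, DM6B_SUPPORT, DM6B_PROBS)
    print(f"{x:6.2f}  Normal: {normal_value:.3f}  DM(6)B: {discrete_value:.3f}")
```

The reported probabilities sum to one only up to rounding, so the discrete CDF
ends marginally above one at the upper end of the grid.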
The average results across the ten subsamples are summarised in Table
8. The results show that, on average (and unlike in the first subsample),
not allowing for heterogeneity in βC leads to a minor drop in the adjusted
ρ² measure for the MMNL model and the DM(5) model. This is however
again not the case for DM(6), showing that six support points are sufficient
to retrieve the true heterogeneity in the data. In terms of estimation time,
the DM mixtures retain their advantage, even with a higher number of
support points. Finally, the results for the mean and standard deviation of
the VTTS are very stable across subsamples.
                        MNL      MMNLA    MMNLB    DM(5)A   DM(5)B   DM(6)A   DM(6)B
adj. ρ²       mean      0.3701   0.4724   0.4722   0.4715   0.4710   0.4714   0.4718
              std.dev.  0.0057   0.0057   0.0057   0.0057   0.0055   0.0057   0.0056
Est. time (s) mean      1        338.1    242.9    124.8    45.3     172      63.7
              std.dev.  0.0      7.9      25.5     31.27    8.3      30.4     14.4
µVTTS         mean      47.71    48.71    48.58    48.70    48.61    48.64    48.60
              std.dev.  0.35     0.40     0.41     0.48     0.49     0.52     0.49
σVTTS         mean      -        17.63    17.63    17.68    17.46    17.60    17.60
              std.dev.  -        0.48     0.48     0.45     0.49     0.43     0.45
Table 8: Summary of results across subsamples for third simulated dataset
4.4 Case study 4: Mixture of two Normals
For the fourth case study, a more complex mixture was used. As such,
the true distribution is now a mixture of two Normal distributions, where
βT is drawn from βT1 with probability π1 and from βT2 with probability π2,
with π1 = π2 = 0.5, and with βT1 ∼ N(−0.8, 0.2) and βT2 ∼ N(−0.3, 0.1),
the second argument being the standard deviation. The cost coefficient βC
was again kept fixed at −1. With this, we obtain a true mean VTTS of 33
DKK/hour, with a standard deviation of 17.76 DKK/hour. In model estimation,
the strategy from the third case study was again adopted, gradually increasing
the number of support points for βT in the DM models, while maintaining the
number of support points for βC fixed at 2. Again, the issue of confounding
largely disappeared when using five or more support points.
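The true moments quoted for this case study can be checked with the law of
total variance for a finite mixture. The sketch below assumes that the second
argument of each Normal distribution is a standard deviation and that the VTTS
in DKK per hour is 60 βT/βC (time in minutes, cost in DKK); both are
inferences from the reported values, and the function name is hypothetical.

```python
import math


def normal_mixture_moments(weights, means, sds):
    """Mean and standard deviation of a finite mixture of Normal distributions."""
    mean = sum(w * m for w, m in zip(weights, means))
    # E[x^2] of each component is sd^2 + mean^2.
    second_moment = sum(w * (s ** 2 + m ** 2) for w, m, s in zip(weights, means, sds))
    return mean, math.sqrt(second_moment - mean ** 2)


beta_t_mean, beta_t_sd = normal_mixture_moments([0.5, 0.5], [-0.8, -0.3], [0.2, 0.1])
BETA_C = -1.0
vtts_mean = 60.0 * beta_t_mean / BETA_C
vtts_sd = 60.0 * beta_t_sd / abs(BETA_C)
print(f"mean VTTS = {vtts_mean:.2f} DKK/hour, std.dev. = {vtts_sd:.2f} DKK/hour")
```

Under these assumptions the mean is 33 DKK/hour and the standard deviation is
about 17.75 DKK/hour, which agrees with the stated 17.76 up to rounding.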
The results for the first subsample are presented in Table 9, with Table
10 presenting a summary of the results across all ten subsamples. Along
with the MNL model, two MMNL models were estimated, where MMNLA
and MMNLB again differ by using a randomly distributed and fixed βC
coefficient respectively. Although the standard deviation for βC is significantly
different from zero in model MMNLA, it is very small compared to
the mean value, such that it is no surprise that the effect of using a fixed
coefficient is very small, with very similar model performance for MMNLB.
In the DM models, we experience a very small, and insignificant drop in
model fit when constraining βC to a single value. Here, two further observations
can be made. In model DM(5)A, the difference between βC,1 and
βC,2 is not significant beyond the 48% level of confidence, while, in model
DM(6)A, it is not significant beyond the 50% level of confidence. It can
also be seen that, on average, when moving from DM(5)A to DM(5)B and
from DM(6)A to DM(6)B, the standard errors associated with the various
πT,k parameters decrease. Finally, model DM(6)B can be seen to reduce
to model DM(5)B; the additional support point, as well as its associated
probability, are not significantly different from zero. All seven models again
offer good performance in the retrieval of the true mean VTTS, where the
six mixture models also perform well for the standard deviation. The DM
models maintain their advantages in terms of estimation cost, where these
are naturally smaller than before given the higher number of parameters.
In terms of model performance, the MMNL models clearly outperform the
MNL model, while the various DM models have a small advantage over the
MMNL models. The results from Table 10 again show stable performance
over the ten subsamples.
               MNL                MMNLA              MMNLB              DM(5)A             DM(5)B             DM(6)A             DM(6)B
Final LL       -5296.84           -4405.54           -4406.11           -4359.07           -4363.23           -4359.03           -4363.23
par.           2                  4                  3                  14                 11                 16                 13
adj. ρ²        0.2906             0.4096             0.4097             0.4145             0.4144             0.4143             0.4141
est. time (s)  7                  341                213                197                33                 174                75
Parameter      est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.) est. (asy. t-rat.)
βT             -0.2153 (-27.80)   -                  -                  -                  -                  -                  -
βT,µ           -                  -0.5115 (-30.23)   -0.5155 (-30.62)   -                  -                  -                  -
βT,σ           -                  0.2930 (29.23)     0.2954 (29.82)     -                  -                  -                  -
βT,1           -                  -                  -                  -0.0844 (-1.33)    -0.0787 (-1.27)    -0.0855 (-1.36)    -0.0787 (-1.27)
πT,1           -                  -                  -                  0.0377 (1.42)      0.0363 (1.47)      0.0385 (1.41)      0.0363 (1.47)
βT,2           -                  -                  -                  -0.2831 (-7.90)    -0.2713 (-16.19)   -0.2835 (-7.01)    -0.2713 (-16.19)
πT,2           -                  -                  -                  0.3461 (0.98)      0.3761 (6.32)      0.3395 (0.77)      0.3761 (6.32)
βT,3           -                  -                  -                  -0.9833 (-20.76)   -0.3788 (-11.94)   -0.3216 (-3.95)    -0.3788 (-11.94)
πT,3           -                  -                  -                  0.1900 (4.71)      0.0921 (1.46)      0.1102 (0.25)      0.0921 (1.46)
βT,4           -                  -                  -                  -0.3253 (-4.58)    -0.6543 (-28.15)   -0.7247 (-9.82)    -0.4676 (0.00)
πT,4           -                  -                  -                  0.1047 (0.29)      0.2848 (10.16)     0.0749 (0.33)      0.0000 (0.00)
βT,5           -                  -                  -                  -0.6690 (-20.61)   -0.9546 (-30.00)   -0.6555 (-11.72)   -0.6543 (-28.15)
πT,5           -                  -                  -                  0.3215 (8.36)      0.2107 (8.94)      0.2508 (1.13)      0.2848 (10.16)
βT,6           -                  -                  -                  -                  -                  -0.9842 (-21.79)   -0.9546 (-30.00)
πT,6           -                  -                  -                  -                  -                  0.1860 (4.77)      0.2107 (8.94)
βC             -0.4194 (-31.89)   -                  -0.9721 (-37.89)   -                  -0.9737 (-37.40)   -                  -0.9737 (-37.40)
βC,µ           -                  -0.9718 (-38.17)   -                  -                  -                  -                  -
βC,σ           -                  0.0352 (1.97)      -                  -                  -                  -                  -
βC,1           -                  -                  -                  -1.0781 (-22.66)   -                  -0.8725 (-15.68)   -
πC,1           -                  -                  -                  0.5965 (4.03)      -                  0.3572 (1.70)      -
βC,2           -                  -                  -                  -0.8801 (-20.83)   -                  -1.0662 (-18.58)   -
πC,2           -                  -                  -                  0.4035 (2.73)      -                  0.6428 (3.06)      -
µVTTS          30.80              31.60              31.82              32.60              32.49              32.64              32.49
σVTTS          -                  18.15              18.23              17.39              17.10              17.38              17.10
Table 9: Detailed estimation results for first subsample for fourth simulated dataset
When looking at the retrieval of the true shape for the distribution of βT,
it can be seen that the MMNL models using a single Normal distribution
produce a mean that is the weighted average of the mean of the two Normal
distributions. The DM models on the other hand do recover the
multi-modality of the true distribution¹⁰. These findings are reflected in the
shape of the distributions for βT in Figure 4, where the DM models (DM(5)A
and DM(6)B) are better able to account for the multi-modality of the true
distribution.
In closing, it should be noted that, in this example, the uni-modal
MMNL model still manages to retrieve the true mean and standard
deviation of the multi-modal true distribution of the VTTS. This can be
explained by the fact that the probabilities for the two Normal distributions
were set evenly to 0.5, where the difference in the standard deviation
for βT1 and βT2 was also rather small. Different patterns could be expected
in a more asymmetrical scenario.
¹⁰ It should be noted that, in the retained DM model, DM(5)B, two of the probabilities
for support points, πT,1 and πT,3, are only significant at the 85% level of confidence.
Figure 4: Cumulative distribution function for βT, first subsample for fourth
simulated dataset
                        MNL      MMNLA    MMNLB    DM(5)A   DM(5)B   DM(6)A   DM(6)B
adj. ρ²       mean      0.2896   0.4082   0.4082   0.4118   0.4115   0.4116   0.4118
              std.dev.  0.0036   0.0041   0.0041   0.0045   0.0040   0.0046   0.0045
Est. time (s) mean      8.8      363.7    235.4    142.1    40.8     168      59.7
              std.dev.  2.4      14.8     10.9     37.28    10.0     18.0     11.8
µVTTS         mean      31.32    31.90    32.27    32.87    32.74    32.83    32.76
              std.dev.  0.38     0.48     0.44     0.51     0.46     0.46     0.47
σVTTS         mean      -        18.48    18.67    17.71    17.58    17.73    17.66
              std.dev.  -        0.33     0.31     0.31     0.35     0.29     0.34
Table 10: Summary of results across subsamples for fourth simulated
dataset
5 Summary and Conclusions
With the availability of powerful computers and estimation tools, researchers
and practitioners are increasingly making use of continuous mixture structures,
such as Mixed Logit, in the representation of random taste heterogeneity
across respondents. Despite the gains in estimation power, the cost
of using such mixture models remains high, especially in large scale studies.
Furthermore, several issues arise due to the models' reliance on specific
distribution functions, whose shape is not necessarily consistent with that
of the true, unobserved distribution.
In this paper, we have discussed an alternative approach for the repre-
sentation of random taste heterogeneity, making use of discrete mixtures
instead of continuous mixtures. Although several issues can also arise in the
estimation of such models, they have the advantage of a closed form solu-
tion, and can hence be estimated and applied without relying on simulation
processes. Furthermore, the models are free from a priori assumptions as
to the shape of the true distribution.
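The difference between the two approaches can be made concrete with a sketch of
the likelihood contribution of a single respondent. It assumes a binary logit
kernel with travel time and travel cost as the only attributes and constant
tastes across a respondent's observations; the function names and the attribute
values in the usage note are hypothetical. For the discrete mixture, the
contribution is a finite weighted sum over the support points, whereas for the
Normal mixture the integral is approximated here by averaging over pseudo-random
draws (other types of draws could equally be used).

```python
import math
import random


def binary_logit_prob(beta_t, beta_c, task, chosen):
    """Logit probability of the chosen alternative in a two-alternative task."""
    utilities = [beta_t * time + beta_c * cost for time, cost in task]
    largest = max(utilities)
    expu = [math.exp(u - largest) for u in utilities]  # guard against overflow
    return expu[chosen] / sum(expu)


def sequence_prob(beta_t, beta_c, tasks, choices):
    """Probability of a respondent's whole sequence of choices for given tastes."""
    prob = 1.0
    for task, chosen in zip(tasks, choices):
        prob *= binary_logit_prob(beta_t, beta_c, task, chosen)
    return prob


def dm_loglik(tasks, choices, support, probs, beta_c):
    """Closed form: log of sum_k pi_k * P(sequence | beta_T = support_k)."""
    return math.log(sum(p * sequence_prob(b, beta_c, tasks, choices)
                        for b, p in zip(support, probs)))


def mmnl_simulated_loglik(tasks, choices, mean, sd, beta_c, draws=200, seed=0):
    """Simulated counterpart: average the sequence probability over Normal draws."""
    rng = random.Random(seed)
    total = sum(sequence_prob(rng.gauss(mean, sd), beta_c, tasks, choices)
                for _ in range(draws))
    return math.log(total / draws)
```

For a respondent with tasks `[((20, 10), (30, 5)), ((15, 12), (25, 6))]` and
choices `[0, 1]`, `dm_loglik` needs one evaluation of the choice sequence per
support point, while `mmnl_simulated_loglik` needs one per draw, which is the
source of the difference in estimation cost discussed in this paper.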
The paper presents several case studies offering an in-depth comparison
of the two modelling approaches, making use of real data as well as four
separate simulated datasets. The results of these analyses clearly show the
major advantage of the discrete mixture approach in terms of estimation
cost. They also show that, across scenarios, the discrete mixture models are
able to attain similar or indeed better performance than their continuous
counterparts. Finally, they are better able to deal with complicated true
distributions, such as the presence of multiple modes.
Although further comparisons between the two modelling approaches
are required, the results from this paper do suggest that discrete mixture
models present a viable alternative, partly thanks to their lower cost in
estimation and application, but also due to the absence of a priori shape
assumptions, which is of great interest in the context of recent discussions
of the issue of the specification of continuous heterogeneity by Hess et al.
(2005).
Acknowledgements
Part of the work described in this paper was carried out during a guest
stay by the first author in the Institute of Transport and Logistics Studies
at the University of Sydney.
References
Bierlaire, M. (2003). BIOGEME: a free package for the estimation of
discrete choice models, Proceedings of the 3rd Swiss Transport Research
Conference, Monte Verità, Ascona.
Burge, P. and Rohr, C. (2004). DATIV: SP Design: Proposed approach
for pilot survey, Tetra-Plan in cooperation with RAND Europe and
Gallup A/S.
Chintagunta, P., Jain, D. and Vilcassim, N. (1991). Investigating heterogeneity
in brand preference in logit models for panel data, Journal of
Marketing Research 28: 417–428.
Cirillo, C. and Axhausen, K. W. (2006). Evidence on the distribution of
values of travel time savings from a six-week diary, Transportation
Research Part A: Policy and Practice 40(5): 444–457.
Dong, X. and Koppelman, F. S. (2003). Mass Point Mixed Logit Model:
Development and Application, paper presented at the 10th International
Conference on Travel Behaviour Research, Lucerne.
Fosgerau, M. (2004). Nonparametric and semiparametric estimation of
the distribution of the value of travel time, paper presented at the
European Transport Conference, Strasbourg.
Fosgerau, M. and Bierlaire, M. (2006). Discrete choice models with multiplicative
error terms, Technical Report TRANSP-OR 060831, Transport
and Mobility Laboratory, School of Architecture, Civil and Environmental
Engineering, Ecole Polytechnique Fédérale de Lausanne.
Gopinath, D. (1995). Modeling Heterogeneity in Discrete Choice Processes:
Application to Travel Demand, PhD thesis, MIT, Cambridge, MA.
Greene, W. H. and Hensher, D. A. (2003). A latent class model for discrete
choice analysis: contrasts with mixed logit, Transportation Research
Part B: Methodological 37(8): 681–698.
Hess, S., Bierlaire, M. and Polak, J. W. (2005). Estimation of value of
travel-time savings using mixed logit models, Transportation Research
Part A: Policy and Practice 39(2-3): 221–236.
Hess, S., Train, K. and Polak, J. W. (2006). On the use of a Modified
Latin Hypercube Sampling (MLHS) method in the estimation of a
Mixed Logit model for vehicle choice, Transportation Research Part
B: Methodological 40(2): 147–163.
Kamakura, W. A. and Russell, G. J. (1989). A probabilistic choice model for
market segmentation and elasticity structure, Journal of Marketing
Research 26: 379–390.
Lee, B. J., Fujiwara, A., Zhang, J. and Sugie, Y. (2003). Analysis of Mode
Choice Behaviours based on Latent Class Models, paper presented
at the 10th International Conference on Travel Behaviour Research,
Lucerne.
Wedel, M., Kamakura, W., Arora, N., Bemmaor, A., Chiang, J., Elrod, T.,
Johnson, R., Lenk, P., Neslin, S. and Poulsen, C. S. (1999). Discrete
and continuous representations of unobserved heterogeneity in choice
modeling, Marketing Letters 10(3): 219–232.