help on writing
Not Your Parents’ Test Scores: Cohort Reduces Psychometric Aging Effects
Elizabeth M. Zelinski and University of Southern California
Robert F. Kennison California State University, Los Angeles
Abstract
Increases over birth cohorts in psychometric abilities may impact effects of aging. Data from 2
cohorts of the Long Beach Longitudinal Study, matched on age but tested 16 years apart, were
modeled over ages 55–87 to test the hypothesis that the more fluid abilities of reasoning, list and
text recall, and space would show larger cohort differences than vocabulary. This hypothesis was
confirmed. At age 74, average performance estimates for people from the more recently born
cohort were equivalent to those of people from the older cohort when they were up to 15 years
younger. This finding suggests that older adults may perform like much younger ones from the
previous generation on fluid measures, indicating higher levels of abilities than expected. This
result could have major implications for the expected productivity of an aging workforce as well
as for the quality of life of future generations. However, cohort improvements did not mitigate age
declines.
Keywords
cohort aging; longitudinal; cognition; intelligence
Over the last 50 years, there have been systematic increases in fluid intelligence measures
across birth cohorts in many developed countries (e.g., Flynn, 1987). Despite this finding,
the vast majority of studies in cognitive aging (e.g., Salthouse, 2004) have compared people
of different ages and generations to estimate aging effects. Their conclusions therefore rest
on the assumption that cohort does not bias results. In this paper, we test hypotheses about
the role of cohort on age changes on five different cognitive psychometric tests. Cohort-
sequential panel data from the Long Beach Longitudinal Study were analyzed over age
using latent growth modeling, with cohort effects tested as differences between two panels
of participants from the same age ranges but initially tested 16 years apart. Findings of
cohort differences in psychometric aging would not only have implications for theories of
Copyright 2007 by the American Psychological Association
Correspondence concerning this article should be addressed to Elizabeth M. Zelinski, Leonard Davis School of Gerontology, Andrus Gerontology Center, University of Southern California, Los Angeles, CA 90089-0191. zelinski@usc.edu. Elizabeth M. Zelinski, Leonard Davis School of Gerontology, University of Southern California; Robert F. Kennison, Department of Psychology, California State University, Los Angeles. The contributions of the authors were equal.
NIH Public Access Author Manuscript Psychol Aging. Author manuscript; available in PMC 2014 September 19.
Published in final edited form as: Psychol Aging. 2007 September ; 22(3): 546–557. doi:10.1037/0882-7974.22.3.546.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
cognitive decline, but may translate into the lengthening of the productive life span, as well
as to reduced prevalence of cognitive impairment in late old age.
Cohort Change in Intelligence
Flynn (1987) reported increases of up to 1.5 standard deviations in reasoning scores between
19-year-olds tested in 1950 and those in 1980. Those tested in 1950 were born during the
Great Depression and those in 1980 after World War II. Today, those Depression-era adults
would be in their late 70s and postwar adults would be in their late 40s. The mean
differences between individuals aged 48 and 78 on reasoning in a cross-sectional study
conducted in 2008 would theoretically include that 1.5 standard deviation difference
documented at age 19. This would inflate estimates of the 30-year age difference. Such
biasing effects could have profound consequences for conclusions about cognitive aging
because reasoning, which is considered representative of fluid intelligence, shows earlier
and more substantial age declines than tasks that are more representative of crystallized
intelligence. (e.g., McArdle, Ferrer-Caja, Hamagami, & Woodcock, 2002). Inflated age
estimates could also arise in single-panel longitudinal studies, as they generally include a
wide range of ages over a relatively short retest interval and are often analyzed over age
rather than measurement occasion (e.g., McArdle et al., 2002).
Cohort differences in abilities have been consistently observed in comparisons of different
birth groups in multicohort studies such as the Seattle Longitudinal Study (Schaie, 1996).
Estimates suggest average increases in reasoning performance for people born in 1910
compared to those born in 1896 (e.g., Schaie, 1996) suggesting that such cohort trends have
been a phenomenon of at least the past century. The dramatic rise in fluid reasoning
observed by Flynn (1987) is likely to be based on continuous increments that happened to be
sampled at a wide time interval (see also Flynn, 2003; Raven, 2000).
Despite substantial generational increases in fluid abilities, changes in normative data in
children and young adults for more crystallized abilities have been mixed. For example,
minimal cohort differences have been reported for the Mill Hill Vocabulary tests (e.g.,
Raven, 2000) and also for the arithmetic subtest of the Wechsler Intelligence Scale for
Children—Revised (Flynn, 2003). Alwin (1991; Alwin & McCammon, 2001) suggested that
once effects of education are removed, there is a reversal of the Flynn effect in a population
sample, with more recent cohorts poorer in vocabulary ability than earlier born ones.
However, this finding may be related to a confound between sampling of particular ages and
cohorts in that study; a recent study extending cohort-sequential modeling to the sample and
vocabulary test evaluated by Alwin indicated cohort increases (Bowles, Grimm, & McArdle,
2005; see also Wilson & Gove, 1999).
Although some theorists disagree whether the Flynn effect is based on actual changes in
ability levels or to a lack of psychometric invariance (e.g., Rodgers, 1999)—that is, that
differences across cohorts in intelligence exist because the scores do not have the same
measurement properties such as equal factor loadings, uniquenesses, and factor intercepts—
invariance or at least partial invariance has been established for some indices of fluid
abilities across cohorts of children and young adults in developed countries (Wicherts et al.,
Zelinski and Kennison Page 2
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
2004). Despite disagreements about the broad explanations proposed by Flynn and
colleagues (e.g., Dickens & Flynn, 2001) to explain the eponymous effect (Loehlin, 2002;
Rowe & Rodgers, 2002), and some recent studies suggesting that the Flynn effect may have
recently plateaued or reversed for military conscripts in the 1990s in two Scandinavian
countries (e.g., Sundet, Barlaug, & Torjussen, 2004; Teasdale & Owen, 2005), it is likely
that the skills underlying fluid ability performance increased during young adulthood for
cohorts that are now aging.
The most widely cited explanations for the Flynn effect are those of cultural changes,
including improvements in nutrition and hygiene, population movement from rural to urban
areas, increased access to schooling in the first half of the 20th century, increased
educational levels of parents, smaller families, parental engagement in practices that
encourage cognitive development (e.g., Williams, 1998), and changes in processing from
more characteristically verbal to more iconic representations due to the rise of visually
oriented modalities in film, television, computer games, and other media (for descriptions
see Greenfield, 1998). Blair, Gamson, Thorne, and Baker (2005) suggested that recent fluid
ability increases additionally reflect a shift in the content of mathematical curricula in
primary and secondary schools toward fluid-like tasks that involve working memory and
improve frontal functioning.
The less consistent findings of cohort-related increases for some types of crystallized
abilities have also been interpreted as due to historical changes in reinforcement.
Instructional time spent imparting traditional school related knowledge has been reduced
(see Williams, 1998), and people are more likely to watch movies and television than to
read. The visual environment of movies and television promotes basic vocabulary and use of
contextualized grammatical structures rather than the more advanced vocabulary and
complex, decontextualized grammatical forms of written literature, leading to stable
vocabulary scores on intelligence tests yet simultaneously declining verbal SAT scores (e.g.,
Greenfield, 1998). Consistent with this explanation, Bowles et al. (2005) reported that a
cohort-sequential analysis over age showed greater improvement in more recently born
cohorts for basic vocabulary items than for advanced ones.
In summary, it has been suggested that culture affects the cognitive environment so that
cognitive abilities adapt to it (e.g., Barber, 2005). This leads to the hypothesis that
discrepancies in cohort effects in older populations will vary to the extent that larger cohort
increases will be observed for the fluid-like cognitive skills that have been more emphasized
in recent decades than previously, whereas crystallized skills that have been consistently
emphasized over the past century would show less cohort change. The fluid/crystallized
theory (e.g., Horn & Cattell, 1967) also predicts age effects that parallel the cohort effects,
that is, that age declines are larger for fluid than crystallized abilities. However, it is
important to vary age and cohort systematically to determine whether the declines attributed
to age are confounded with cohort.
Zelinski and Kennison Page 3
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Cohort and Aging
Although it has been suggested that cohort differences may be responsible for much of the
observed age decline in fluid abilities (Raven, 2000), it is not likely that they explain all
apparent age-related effects, as there are substantial and reliable declines for a wide range of
cognitive abilities with age, including the crystallized-like abilities (e.g., Salthouse, 2004).
The finding of consistent age differences across abilities suggests that, if tracked over age,
different cohorts would show parallel age declines. However, it is conceivable that cohort
would interact with age due to selective attrition.
People who drop out from longitudinal studies perform more poorly and tend to be older
than those who remain (e.g., Cooney, Schaie, & Willis, 1988; Kennison & Zelinski, 2005).
There are also age differences in initial selection into a sample. Simply being willing to
participate in cognitive testing signals greater selectivity in older adults because advanced
age is associated with higher rates of refusal to complete cognitive tests in a population
survey (e.g., Zelinski, Burnight, & Lane, 2001) and in normative studies of intelligence
(e.g., Raven, 2000). Even if later-born cohorts are intellectually advantaged, elderly
individuals from earlier cohorts may be more select than younger ones just because they
have survived to the age at which they are tested (see Rabbitt et al., 2002; Verhaeghen,
2003). Because there are likely to be stronger survival effects in earlier cohorts, more recent
cohorts may be less likely to show those benefits because they are relatively less select (e.g.,
Singer, Verhaeghen, Ghisletta, Lindenberger, & Baltes, 2003). Thus cohort could
conceivably have no biasing effect in cross-sectional studies because the cohort advantage
may wash out.
Age Declines
Cohort effects may be implicated in the rate of average decline with age, which is not likely
to be constant across the entire span of late adulthood. Neugarten (1975) coined the terms
“young-old” (below age 75) and “old-old” (above 75) to differentiate age groups in the
elderly population. There are clear age differences between the young-old and the old-old,
with worse performance on cognitive tasks in old-old people, even when differences in
education and health have been accounted for (e.g., Zelinski, Crimmins, Reynolds, &
Seeman, 1998). Longitudinal studies generally report an acceleration of estimated declines
in the old-old (e.g., Rabbitt et al., 2002; Singer et al., 2003; Sliwinski, Hofer, Hall, Buschke,
& Lipton, 2003). Hypotheses based on the fluid-crystallized distinction suggest greater
acceleration of declines in the old-old for more fluid than crystallized ones (Horn & Cattell,
1967), which has been supported longitudinally (e.g., Singer et al., 2003). Thus it is
important to evaluate change across a wide range of ages in late adulthood.
Interval Scaling
Tests of the generality of the Flynn effect require direct comparisons across cognitive
measures. However, with raw scores, it is impossible to determine whether an age or cohort
difference of, say, 5 points on a memory task is equivalent to a difference of 5 points on a
reasoning test. Another problem with raw scores is that across individuals, differences
between scores on the same test may not be equivalent at different points of the scale
Zelinski and Kennison Page 4
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
(Wright & Stone, 1979). For example, the 2-point difference between scores of 20 and 18 on
a 20-item recall task may not be interpreted as reflecting the same amount of difference in
performance as the difference between scores of 14 and 12, or 2 and 0, even though the
relative ordering of individuals’ scores is clear. This is a problem for testing interactions,
where assumptions of statistical tests require that scores are equivalent across their full range
(Embretson, 1996). Mathematical transformations of raw scores, such as z scores or
proportion correct, do not redistribute them so that differences between points on the scale
are equal (Wright & Stone, 1979). However, Rasch scaling can be applied to data to assure
equivalence of scores across tests and persons by using an algorithm that treats item
difficulty and person ability separately. It ensures that the differences between scores at
every point of the scale are the same and that these differences can be directly compared
across variables through logistic transformation (see, e.g., Zelinski & Gilewski, 2003). Thus,
whether age declines accelerate differentially across psychometric tests has only been
directly evaluated by McArdle et al. (2002), who used the Rasch scaled Woodcock-Johnson
test battery. They found smaller declines that occur later in the life span for crystallized
abilities than for more fluid ones for people under age 75.
In the present study, Rasch scaled scores were used for five measures that are likely to show
differential cohort and age patterns. We evaluated decline patterns in people ages 55–87 and
tested the hypothesis that average decline accelerates with age. We estimated changes with
age and cohort independently with cohort-sequential analyses (Schaie, 1965). Two
longitudinal panels, which were treated as different cohorts studied over the same range of
ages but born 16 years apart, were examined to determine whether patterns of decline are
similar, even if there are cohort differences in performance at the intercept.
Schaie’s (1996) longitudinal estimates, as well as cross-sectional estimates in standard score
scaling for similar tasks by Salthouse (2004), suggest that slopes of age declines should be
comparable for reasoning and space. Using the same items and time restrictions for the
vocabulary test as Schaie, we expected that the age effect for vocabulary would be similar to
those of the other psychometric tests largely because that test has a strong speed component
(Hertzog, 1989; Zelinski & Lewis, 2003). Yet only small cohort effects were anticipated
because the vocabulary subtest of the Schaie–Thurstone Adult Mental Abilities Test
(STAMAT; Schaie, 1985) is of a crystallized-like ability. Extrapolating from Schaie’s and
Salthouse’s estimates, we expected accelerating age changes for vocabulary, space, and
reasoning. Both Schaie’s and Salthouse’s estimates suggest minimal acceleration of decline
in late old age for recall tasks.
Method
The Long Beach Longitudinal Study design, participants, and measures are described in
detail elsewhere (Zelinski & Burnight, 1997; Zelinski & Lewis, 2003). Participants are
volunteers residing in the communities of Long Beach and Orange County, California. The
convenience sample shows performance characteristics similar to a representative sample of
older Americans who have at least a high school education (e.g., Zelinski, Burnight, & Lane,
2001). This suggests that the sample represents the upper levels of older adult performance.
Zelinski and Kennison Page 5
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Participants
The data of two cohorts of participants varying in age from 55–87 were used for the
analyses. Cohort 1 participants (baseline n = 456) were born between 1893 and 1923 and
were tested on as many as five test occasions in 1978, 1981, 1994, 1997, and 2000 (see top
panel of Table 1). Cohort 2 participants (baseline n = 482) were born between 1908 and
1940 and were tested up to four times in 1994, 1997, 2000, and 2003. It is important to note
that the overlap in the birth years of the two cohorts does not confound comparisons because
the age matching held the 16-year cohort difference constant. However, Cohort 1
participants could have an advantage because they had been in the study longer than Cohort
2. To determine whether there might be retest effects for Cohort 1 participants in 1994,
when Cohort 2 was initially tested, participants aged 71–87 were compared on their 1994
scores. There were 76 individuals from Cohort 1 and 294 from Cohort 2 in the analyses of
variance. There were no differences on four of the tasks with Fs(1, 368) ranging from 0.0 to
2.7; however, Cohort 1 participants had significantly better 1994 scores on reasoning, F(1,
368) = 5.8, p < .05, suggesting minimal retest bias.
The mean baseline age for Cohort 1 (M = 69.81, SD = 7.17) was reliably younger than for
Cohort 2 (M = 72.15, SD = 8.46), F(1, 935) = 20.81, MSE = 61.74. Members of Cohort 1 (M
= 12.54, SD = 2.76) reported fewer years of formal education than Cohort 2 (M = 13.69, SD
= 2.94), F(1, 935) = 36.42, MSE = 8.12. Cohort 1 consisted of 53.3% women and Cohort 2
consisted of 47.8% women; there were no differences in gender representation across
cohorts, χ2(1, N = 937) = 0.08.
Test intervals were approximately 3 years apart with the exception of a 13-year interval
between the second (1981) and third (1994) testings of Cohort 1. The irregularity of this
interval is assumed to have no undue influence on data modeling because the data were
analyzed over age rather than time. In addition, Zelinski and Burnight (1997) reported no
differences in 1978 scores for Cohort 1 participants who returned in 1994, regardless of
whether they were tested in 1981 or not. All had better initial scores than permanent
dropouts. This suggests general selection effects for Cohort 1 participants returning in 1994
rather than practice effects.
Psychometric Tests
Five measures of cognitive performance were examined. Three of them were from the
STAMAT: Recognition Vocabulary, reasoning (Letter and Word Series), and space (Figure
and Object Rotation). Recognition Vocabulary required selection of definitions for 50 target
words from an array of four choices in 4 min. Reasoning was a composite score of the
STAMAT Letter and Word Series tests. Letter Series required selection of the next item in
30 series, such as a b c c b a d e f f e, in 6 min, and Word Series was a parallel version with
items using days of the week and months. Space was a composite score of the STAMAT
Figure and Object Rotation tests. In these tasks, participants had 6 min to select up to three
rotations of a target from an array of six choices. The items for Figure Rotation were
abstract line figures, whereas those for Object Rotation were line drawings of common
household items such as a bleach bottle. There were 20 items in this test. List recall was
immediate written recall of a list of 20 concrete high-frequency nouns. The words were
Zelinski and Kennison Page 6
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
presented on a sheet of paper and participants were given 3.5 minutes to study the list. The
test was not timed. Text recall involved immediate written recall of a 227-word passage
containing 104 idea units. The passage was read by participants while they also listened to
an audio reading of it presented at a rate of approximately 155 words per minute. The
proportion of idea units correctly recalled according to the parsing model of Turner and
Greene (1977) was the dependent measure of text recall.
Rasch Scaling
Interval measurement scaling is based on the following equation (Rasch, 1966):
(1)
where fni1 is the person n’s probability of scoring 1 rather than 0 on item i, bn is the ability
of person n, and di1 is the difficulty of the item. Rasch scaling assumes that the construct
being measured is unidimensional; difficulty of items on a cognitive test is consistent with
ability such that correct scores on more difficult items reflect greater ability and more
difficult items always have a lower probability of being correctly answered than less
difficult ones, independent of person ability. It also assumes that individuals with greater
levels of ability will be more likely to score correct on more items. They always have a
higher probability of correctly answering any item, independent of item difficulty, than
those with low ability (Wright & Stone, 1979).
In Rasch scaling, person parameters are conditioned out of the model when item difficulties
are being calibrated, and item parameters are conditioned out of the model when person
parameters are being calibrated by repetitive inversion of the items and persons data matrix.
Items are ordered by difficulty and persons by ability, with maximum-likelihood modeling
used to identify items that best discriminate responses and people from one another. The
most discriminating items are those that have an equal likelihood of obtaining a correct or
incorrect response at a given level of ability. The logarithmic transformations of the item
and person data shown in Equation 1 convert the ordinal data into interval data. The size of
the intervals is determined by the item and person performance probabilities. Rasch scaled
person scores are log odds units or logit scores representing the 50% probability of
responding correctly to items at the level of ability, 75% probability of being able to respond
correctly to items 1 logit below the ability level, and 25% probability of being able to
respond correctly to items 1 logit above the ability level (Bond & Fox, 2001). When data are
rescaled, the logit properties remain but vary with the scaling factors used to create the
desired score properties. For example, rescaling list recall in the present study from a
logistic score with a mean of 0 to a 0–100 range involved rescaling the logit units from 1 to
11.15. Thus, if a participant declined 11.15 points, that would indicate a 25% probability of
responding to items that previously would have been associated with a 50% probability. An
increase of 11.15 points on the recall task would indicate a 75% probability of responding
correctly to items associated with a 50% probability previously.
Fit indexes for Rasch analyses are computed separately for persons and items. They use
mean squares to show the amount of distortion in the data relative to the Rasch model. The
Zelinski and Kennison Page 7
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
expected value of the mean squares is 1.0. Values substantially less than 1.0 indicate
underfit, with possible unmodeled noise in the data; values substantially greater than 1.0
indicate overfit. Values between 0.5 and 1.5 are productive for measurement, and those
between 1.5 and 2.0 are unproductive but suggest that the measurement is not degraded;
values over 2.0 are problematic (Linacre, 2006).
The infit mean square is a fit statistic sensitive to unexpected patterns of observations made
by persons on items at their approximate ability level and by items on persons at the item’s
approximate difficulty level. The outfit mean square is a fit statistic sensitive to unexpected
observations made by persons on items that are expected to be either very easy or very
difficult or by items on persons of very low or high ability. The overall root-mean-square
error (RMSE) for the model is the square root of the average error variance computed over
persons or items. It indicates the upper limit to the reliability of measures based on the items
and persons sampled (Linacre, 2006). Reliabilities are computed separately across persons
and items, in contrast to reliabilities such as Cronbach’s alpha, which is computed for
persons only. However, the values of the person and item reliabilities are interpreted as is
alpha, with values close to 1 indicating high reliability.
The WINSTEPS Rasch measurement program Version 3.61.1 (Linacre, 2006) was used for
the interval scaling. Individual items for all tasks were the units of analysis, that is, a 1
(correct) or 0 (incorrect) for each item on each test. For the recall tasks, each word or
proposition in the study materials served as an item. Item scores within each test were
calibrated with the data stacked over occasions and cohorts so that each observation for a
particular subject was treated as independent, as would be done in the computation of z
scores over occasions. Scores were initially scaled so that at the item level they had a mean
of zero. The relative range of mean age performance could be identified from the person
ability scores. Table 2 provides fit information for the Rasch calibration of each of the five
tests. All results were good fitting and discriminating based on their infit and outfit mean
squares, low RMSEs, and high reliability for individuals and for items. Examples of interval
item scores converted from raw total scores for list recall and a mental status test are found
in Zelinski and Gilewski (2003).
For the analyses, we rescaled the Rasch item scores from the logistic scores to a 0–100 range
to increase interpretability. The rescaled means (and standard deviations) of the person
scores were, for reasoning, M = 39.66 (13.07); list recall, M = 54.78 (12.67); text recall, M =
40.29 (7.30); space, M = 63.72 (5.83); and vocabulary, M = 68.25 (15.18). The scaling
factors for the logits were 6.12, 11.15, 7.80, 6.72, and 7.4, respectively.
Longitudinal Analysis
We used growth modeling (McArdle & Bell, 2000) with the Mplus program (Version 4.2;
Muthén & Muthén, 1998–2007) to test hypotheses about age and cohort differences in
longitudinal performance on the Rasch-scaled measures. The models were fit to data
configured over 3-year age “buckets” to increase the number of observations for the ages
studied (e.g., Bowles et al., 2005). The 3-year age ranges at which people were tested were
treated as manifest variables, and those age ranges at which they were not tested as latent
variables. The ranges were 55–57, 58–60, 61–63, 64–66, 67–69, 70–72, 73–75, 76–78, 79–
Zelinski and Kennison Page 8
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
81, 82–84, and 85–87. For simplicity, we refer to these age buckets by the middle value for a
given bucket. For example, the 73–75 bucket will be referred to as age 74.
Growth models that utilize maximum likelihood methods provide accurate parameter
estimates even with missing data, provided that the missing at random assumption is met
(e.g., McArdle & Hamagami, 1991; Schafer, 1997). This assumption requires that
missingness can be estimated from the data included in the analyses. Logistic regression
analyses predicting dropout from the study by the fourth testing were conducted to aid in the
selection of appropriate covariates to increase the accuracy of estimates (see Kennison &
Zelinski, 2005). Age, cohort membership, gender, and education were included in these
analyses as possible predictors, resulting in a statistically reliable fit to the data, χ2(4) = 103.
However, the only significant predictors of dropout were cohort, Wald (1) = 51.6, and
baseline age, Wald (1) = 43.8, with Cohort 1 members and older individuals more likely to
drop out; thus no additional covariates were included in the analyses reported here.
As shown in Table 3, five different growth models were evaluated to determine the best
representation of longitudinal change for the five psychometric tests. Initial analyses were of
age basis coefficients reflecting annualized change intervals. However, because of
convergence problems in the analyses of several of the psychometric scores, coefficients
representing the different age buckets were rescaled so that they represented 6-year effects.
This rescaling resolved convergence issues and permitted consistent application of the basis
coefficients across all analyses. This scaling reflects the suggestion made by Zelinski and
Burnight (1997) that age declines are not generally observed over less than 5 years.
The models to evaluate the age basis were a level-only (no change) model, level plus linear
change, a second-order polynomial model with level, linear and quadratic parameters, a
piecewise model consisting of level and two linear pieces—one from ages 56 to 71 and the
other from 77 to 86, following the young-old/old-old distinction (Neugarten, 1975)—and a
three-piece model, representing spans from ages 56 to 66, 67 to 78, and 79 to 87, to evaluate
age effects separately for early, middle, and late old age (the authors thank an anonymous
reviewer for this suggestion). For all models the intercept was set at age 74. Fit indexes
reported are the −2 log likelihood (−2LL), Akaike’s Information Criteria (Akaike, 1987)—
an index based on −2LL that penalizes for the number of parameters tested, with lower
values indicating better fit—and the root-mean-square error of approximation, with values
less than .05 representing good fit, and .08 indicating adequate fit (Browne & Cudeck,
1993). The Δχ2/Δdf for differences in −2LL was used to evaluate change in fit.
Table 3 indicates that more complex models provided better fits than linear models. Across
psychometric tests, the two-piece and quadratic models most consistently provided the
greatest improvement in fits. Although quadratic models did not generally differ in fit from
the two-piece models, the main purpose of the study was to make direct comparisons of
young-old and old-old age changes and of cohort effects across psychometric tests, and so
the two-piece model is reported for all variables.
The two-piece latent growth model tested evaluated the effects of cohort on the level and
two age pieces using the equations:
Zelinski and Kennison Page 9
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
(2)
(3)
(4)
(5)
Equation 2 is the Level 1 equation representing an individual subject’s scores at a given age.
The a1ti term represents ages below the “knot” point of 74, and a2ti represents ages above
74. Age 74 has a zero value and represents the intercept. The term eti represents residual
error. The second-level equations represent the level π0 (3); the 56–71 age piece, π1i (4); and
the 75–86 age piece, π2i (5). Cohort membership (cohorti) was coded as 0 for Cohort 1 and 1
for Cohort 2. The v00, v10, and v20 parameters are the intercepts; and v01, v11, and v21,
respectively, are the slopes of the second-level equations. Their variance components are r0i for level, and r1i and r2i for the two age slopes.
Results
Individual data plots of reasoning and vocabulary scores for the participants of each cohort
appear in Figure 1. These plots illustrate individual change and differences in variability
among scores. They include all of the data that were included in the reported age models.
Both sets of plots suggest declines, but variability appears to be greater for vocabulary.
The parameter values of the models for the five psychometric tests appear in Table 4 for the
unconditional models. At age 74, reasoning was the most difficult test, with the lowest
growth intercept parameter, followed by text recall, then list recall, space, and vocabulary.
The 6-year unconditional effect sizes in standard deviation units (Cohen’s d) for the
estimated parameters of the young-old and old-old age pieces were .45 and .71 for
reasoning, .43 and .38 for list recall, .50 and .41 for text recall, .50 and .65 for space, and .25
and .51 for vocabulary.
Table 5 shows effects for the conditional models, which included cohort as a predictor of the
level, young-old, and old-old age pieces and are illustrated in Figure 2. Parameters included
in Table 5 are the fixed effect estimates for the intercept (level) at age 74, and the young-old
(56–71) and old-old (77–86) age pieces. Also included are the estimated regression
coefficients of cohort on the two growth estimates. The random variance estimates for the
intercept (level) at age 74, and the two age pieces appear at the bottom of the table. The
model results indicate that declines were generally observed and that they steepened in the
old-old age piece compared to the young-old age piece for reasoning, space, and vocabulary.
For list and text recall, declines were more modest for the old-old age piece compared to the
young-old age piece.
Zelinski and Kennison Page 10
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Age Changes
We tested hypotheses of different slopes for the two age pieces across psychometric tests
(see McArdle et al., 2002) by evaluating models of the equality of age parameters for each
of the pieces in the conditional models. Fit indices for models with pairs of slopes allowed to
be free were compared to models with the two slopes constrained to be identical. The Δχ2
test was used to determine whether fit declined for the constrained model compared to the
free model, using a nested comparison. Worse fits for the constrained model compared to
the free model would indicate that one slope is steeper than the other. Conversely, if model
fit was not affected by constraining the slopes to be equal, then the rate of age change is best
characterized as similar across the tests.
Results are presented in Table 6. For the young-old, the coefficients for longitudinal declines
were greater for reasoning than space, and for list compared to text, space, and vocabulary.
For the old-old, reasoning and vocabulary declines did not differ, but declines for both tests
were greater than for list or text recall, or space, which did not differ from each other. The
vocabulary decline for the older age ranges is steep and similar to that reported by Schaie
(1996); however, it is likely that the decline observed here occurred not only because of
losses of vocabulary ability, but also due to age-related declines in speed, a major
component of performance on this particular test (see Hertzog, 1989).
Cohort Differences
Positive 16-year cohort effects were expected for reasoning, list recall, text recall, and space,
but not for vocabulary. As shown in Table 5, the coefficient for cohort as a predictor of level
was 4.37 (d = .33) for reasoning, 7.54 (d = .42) for list recall, 1.76 (d = .19) for text recall,
1.72 (d = .29) for space, and 0.71 (d = .03) for vocabulary. Thus, at the intercept of age 74,
the two cohorts differed on their rescaled scores by these parameter values. The 16-year d
values reported here, except for vocabulary, compared favorably with the effect sizes
estimated for 14-year cohort improvements for the same birth cohorts observed by Schaie
(1996).
Pairwise comparisons showed that coefficients for the cohort effect varied significantly.
Table 6 indicates that the cohort effect was significantly larger for list recall than for any of
the other tasks. The cohort effect for reasoning was also greater than for text, space, and
vocabulary, which did not differ from one another.
It has been suggested that cohort may account for cross-sectional age changes (Raven,
2000). Although there were reductions in the effect sizes for age, age declines remained. The
d values from the analyses with cohort as a covariate (conditional models) for the young-old
age piece were .37, .30, .34, .38, and .12 for reasoning, list recall, text recall, space, and
vocabulary. In Table 4, we found no interactions between cohort and the young-old age
piece for any of the psychometric tests as seen in the coefficients. Reductions in age effects
also occurred for the old-old age piece with the inclusion of cohort, with ds of .49, .11, .12, .
42, and .43 for reasoning, list recall, text recall, space, and vocabulary, respectively. There
were interactions between cohort and the old-old age piece for list and text recall, and for
vocabulary. The cohort effect on the 6-year age slopes for the two recall measures was
Zelinski and Kennison Page 11
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
negative (d = −0.17 and −0.14) indicating increasing convergence for the slopes of decline
across cohort for those age pieces, as seen in Figure 2. The cohort effect on the slope for
vocabulary was positive (d = .16), indicating that the older cohort had steeper declines and
greater divergence of slopes from that of Cohort 2. That these interactions had small effect
sizes and were significant, despite low power (Hertzog, Lindenberger, Ghisletta, & von
Oertzen, 2006) and paradoxical direction, makes interpretation difficult.
Discussion
The two-piece models tested slopes of change in old age with a knot point of age 74. The d
values for age declines in the unconditional models, that is, without the cohort effect, ranged
from .25 to .71, slightly larger than those estimated from studies using standard scores,
which were extrapolated to range from .2 to .6 over 6 years, suggesting that estimated
longitudinal declines in our sample may be greater with interval scaling than with
uncalibrated scaling. Our finding that the size of age changes differs across tasks— up to
age 75—reinforces the observation by McArdle et al. (2002), who also used Rasch scaled
test scores. Schaie’s (1996) data also suggested that age declines accelerate for reasoning,
space, and vocabulary. In contrast, Salthouse (2004) compared cross-sectional decline
estimates from z scores of a wide range of tasks and concluded that there were similar slopes
of decline across speed, reasoning, and memory.
This is the first study in which direct comparisons of cohort effects on age decline were
made possible with interval scaling. Average 16-year cohort-sequential effects at age 74
indicated that effect sizes varied by task: They were moderate for list recall, reasoning, and
space; small for text recall; and almost nil for vocabulary. These results are the first to
quantify relative cohort differences in tasks in older adults that parallel the conclusions
obtained with children and young adults (e.g., Flynn, 2003). Extrapolating the ds from the
observed 16-year effect to a 32-year effect, the cohort-related increase in performance
standard deviation was about .84 for list recall, .66 for reasoning, .57 for space, about .38 for
text recall, and .07 for vocabulary.
Implications for Gf-Gc Theory
The correspondence between fluid abilities, aging, and cohort effects may be more complex
than Gf-Gc theory suggests. The theory suggests that the size of age declines should reflect
fluid ability involvement in tasks; yet large cohort effects have also been observed for fluid
abilities (e.g., Flynn, 1987). The age/cohort phenomenon was not consistent in this study for
fluid-like abilities. The amount of age decline in recall, controlling for cohort, was not
significant for the old-old, even though the age declines were substantial for reasoning.
Similar age decline rates for list and for text recall in the old-old, which were small, were
accompanied by larger cohort effects for list than for text recall. Despite increasing declines
from age 74 on, the cohort effect for vocabulary did not differ from zero. Taken together, the
findings suggest that the fluid-crystallized distinction may be more closely predictive of
cohort effects than of the size of age declines in the present study.
Interestingly, this conclusion suggests what has been implied in the intelligence literature:
The fluid/crystallized distinction does not arise because fluid abilities are culture-
Zelinski and Kennison Page 12
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
independent and more biological in nature and that crystallized ones are culture-specific, as
suggested by Horn and Cattell (1967). Instead, both types of abilities are culture-specific as
they are selectively modified by the cultural reinforcement of relevant skills (see also
McArdle, Hamagami, Meredith, & Bradway, 2000). Following this suggestion, we suggest
that reasoning and recall performance are likely to involve overlapping skills that involve
both retention and the ability to use strategies. We have previously observed that decline in
reasoning is related to decline in recall in Cohort 1 (Zelinski & Stewart, 1998). The present
study does not inform whether either decline in reasoning or in recall, or of some underlying
process such as executive functioning components (e.g., Miyake, Friedman, Emerson,
Witzki, & Howerter, 2000), is the leading indicator of change in the other variable.
However, because increases in text recall were smaller than for list recall in the younger
cohort, it may suggest that the underlying skills for text recall are not as similar to those of
reasoning. Similarly, although visuospatial processing is thought to be reinforced by
mathematical training approaches in elementary schools, especially since the 1970s (Blair et
al., 2005), the cohorts studied would have completed high school before many of these
approaches were implemented, explaining relatively smaller cohort effects for space
compared to reasoning.
The findings suggest that if the model of cultural reinforcement is correct, differential
patterns in cohort effects may be the rule. Thus, cohort effects could change or vary if
educational and other social institutions emphasize certain abilities over others and
depending on when changes are introduced (Blair et al., 2005; Greenfield, 1998). It is
therefore also possible that cohort effects can asymptote as practice changes or as
performance reaches a maximal level. This may explain the apparent plateau or decline of
fluid abilities in more recent military conscripts in Norway (Sundet et al., 2004) and
Denmark (Teasdale & Owen, 2005).
Age and Cohort
Although cohort was a significant covariate for those tests traditionally showing large age
differences, it impacted the magnitude of parameters for only 3 of 10 tested age effects in
Table 5 (2 age pieces × 5 psychometric tests). This suggests that cohort effects do not
completely confound cross-sectional age declines in the abilities studied. That is,
mechanisms related to increasingly poorer performance are not only related to cohort-
specific experiences. Small interactions of cohort with age, for the old-old age piece for list
and text recall, indicated that Cohort 2 experienced steeper recall declines than Cohort 1.
Although one explanation is that there was greater memory decline in the younger cohort, it
is more likely that there was increased selectivity for people over age 74 in the older cohort.
For example, over testing waves, the difference in mean education level between cohorts
became increasingly smaller. Selectivity in the old-old age range may be an important factor
in reducing the apparent acceleration of cognitive decline, with survival effects in late old
age masking apparent decline for the two recall tasks (see also Zelinski et al., 1998).
However, it is then a mystery why Cohort 1, if it represented greater selection, declined
more in vocabulary than Cohort 2 over the old-old age range.
Zelinski and Kennison Page 13
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Cohort did affect cross-sectional age change estimates in the context of the estimated
average performance of subjects at specific ages. In the present study, cohort had an impact
on the predicted means at various ages. Figure 2 shows that the estimated mean for Cohort 2
on reasoning at age 74 was approximately the same as the mean for age 62 from Cohort 1.
The Cohort 2 mean at age 74 was approximately the same as that of Cohort 1 at age 59 for
list recall. The age 74 means for Cohort 2 were about the same as the Cohort 1 age 65 means
for space and for text recall. Thus, 74-year-olds from Cohort 2 were estimated to be
psychometrically up to 15 years “younger” than same-aged people born about 16 years
earlier. Examining performance for the late old age piece of the models, we see that the
cohort effects were slightly less beneficial. On reasoning and space, the mean for Cohort 1 at
age 74 was about the same as the mean for age 80 in Cohort 2. For list recall, the mean for
Cohort 1 at age 74 was about the same as the mean at age 83 in Cohort 2. Finally, for text
recall, the age 74 mean was similar to that of age 77 in Cohort 2. Thus, in the context of the
model, Cohort 2 participants in late old age experienced psychometric aging “delays” of up
to 9 years.
These findings imply that average cross-sectional age declines as estimated by widely used
tasks such as recall and reasoning are likely to be inflated. By including participants from
widely differing cohorts and ages, even short-term longitudinal studies may violate the
convergence assumption (McArdle & Bell, 2000), that is, a strong correspondence between
cross-sectional and longitudinal results (Hertzog & Nesselroade, 2003). Explicit cohort-
sequential evaluations are necessary to separate age from cohort effects in order to confirm
cross-sectional age decline estimates, as demonstrated here (see also Rönnlund, Nyberg,
Bäckman, & Nilsson, 2005).
If we are to assume that cohort-related increases affect mean level performance, the
implication is that even with increasing age, people from more recently born cohorts will
perform better on abilities like reasoning or list recall at specific ages than those born earlier.
The cumulative effects of cohort differences in reasoning and list recall showed that people
at age 74 from Cohort 2 were on average as cognitively capable as those in their 60s born as
recently as 16 years earlier. This suggests that for occupations requiring strong reasoning
and memory abilities, more recently born cohorts of older adults should have the functional
ability to remain productively employed into their 70s.
These results are important for theories of cognitive aging, which have generally focused on
declines estimated between college students and young-old adults. Our study did not have
adequate data on a younger sample to make generalizations about these comparisons, which
is an important limitation. However, it is likely that cohort differences are large between
individuals born in the 1980s and those in the 1940s, as there have been larger increments in
fluid abilities after the 1970s in the United States (see Flynn, 1998). Even if there are
measurable age declines in abilities from very early into middle adulthood, it is likely that
these declines are smaller than those we have documented in our study for people aged 55–
87. It would not be surprising if a larger portion of the differences between means in many
current cross-sectional aging studies is due to cohort differences, as we demonstrated for list
recall, which has relatively small age declines compared to reasoning. At the very least, the
Zelinski and Kennison Page 14
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
findings clearly suggest that for some abilities, age norms collected in the past are not
directly applicable to contemporary older adults (see also Raven, 2000).
Limitations
To provide some context to our findings, we note that this is the first analysis of cohort and
age effects at the upper end of the life span, and the psychometric tests evaluated do not
represent the full range of cognitive tasks. Although we found that cohort could affect
selected age estimates, it did not account for age declines in cognition across all variables
and ages. Slope parameters are associated with the intercept in the growth models, and may
not be equally affected by cohort at all possible intercepts.
The sample represents older adults who already have a high school education or better,
indicating that the participants studied here are in the range of the higher end of
performance. We do not know whether individuals with less education would show larger
cohort effects; some studies have suggested that the Flynn effect may partially represent
increases in those who have traditionally had the lowest scores (see Sundet et al., 2004). We
also do not know from the present sample what specific health conditions or other lifestyle
variables might differentiate the cohorts. Clearly, this study is only the beginning of the
work that must be done to fully understand cohort increases in ability.
The first baby boomers turn 65 in 2011. As the largest segment of the population in the
United States and other countries, this cohort’s experience of cognitive aging may be even
more positive than that of people from Cohort 2 in our study, who were largely from the pre-
World War II generation. It has been suggested that high cognitive functioning in younger
adulthood will be associated with high cognitive performance and possibly survival in later
life because of behaviors that maintain health and cognition (Gottfredson & Deary, 2001). It
is not currently clear whether the longitudinal data for people of recent or upcoming birth
cohorts will bear this out. Paradoxically, the hardiness thought to produce survival in people
born in previous cohorts may wash out cohort differences. In the present study, the very
small effect sizes for the trend towards convergence for the younger cohort for recall tasks
remind us that the benefits of cohort may be limited to specific segments of old age. It will
be increasingly important to examine whether age effects combined with survival into very
old age eliminate the cohort-related improvements seen with younger cohorts (e.g., Singer et
al., 2003), or whether the positive cohort effects we observed will predominate. This could
have major implications for future cohorts as public policymakers develop predictions of
disability and low quality of life due to poor cognitive function in older adults (e.g.,
Freedman, Akyan, & Martin, 2001; Langa et al., 2001).
Acknowledgments
This research was supported by National Institute on Aging Grants R01 AG10569 and T32 AG00037.
References
Akaike H. Factor analysis and AIC. Psychometrika. 1987; 52:317–332.
Alwin DF. Family of origin and cohort differences in verbal ability. American Sociological Review. 1991; 56:625–638.
Zelinski and Kennison Page 15
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Alwin DF, McCammon RJ. Aging, cohorts, and verbal ability. Journals of Gerontology, Series B: Psychological Sciences and Social Sciences. 2001; 56:S151–S161.
Barber N. Educational and ecological correlates of IQ: A cross-national investigation. Intelligence. 2005; 33:273–284.
Blair C, Gamson D, Thorne S, Baker D. Rising mean IQ: Cognitive demand of mathematics education for young children, population exposure to formal schooling, and the neurobiology of the pre- frontal cortex. Intelligence. 2005; 33:93–106.
Bond, TG.; Fox, CM. Applying the Rasch model. Mahwah, NJ: Erlbaum; 2001.
Bowles RP, Grimm KJ, McArdle JJ. A structural factor analysis of vocabulary knowledge and relations to age. Journals of Gerontology, Series B: Psychological Sciences and Social Sciences. 2005; 60:234–241.
Browne, MW.; Cudeck, R. Alternative ways of assessing model fit. In: Bollen, KA.; Long, JS., editors. Testing structural equation models. Newbury Park, CA: Sage; 1993. p. 136-162.
Cooney TM, Schaie KW, Willis SL. The relationship between prior functioning on cognitive and personality dimensions and subject attrition in longitudinal research. Journals of Gerontology, Series B: Psychological Sciences and Social Sciences. 1988; 43:12–17.
Dickens WT, Flynn JR. Heritability estimates versus large environmental effects: The IQ paradox resolved. Psychological Review. 2001; 108:346–369. [PubMed: 11381833]
Embretson SE. Item response theory models and spurious interaction effects in factorial ANOVA designs. Applied Psychological Assessment. 1996; 20:201–212.
Flynn JR. Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin. 1987; 101:171–191.
Flynn, JR. IQ gains over time: Toward finding the causes. In: Neisser, U., editor. The rising curve: Long-term gains in IQ and related measures. Washington, DC: American Psychological Association; 1998. p. 25-66.
Flynn JR. Movies about intelligence: The limitations of g. Psychological Science. 2003; 12:95–99.
Freedman VA, Aykan H, Martin LG. Aggregate changes in cognitive impairment among older Americans: 1993 and 1998. Journals of Gerontology, Series B: Psychological Sciences and Social Sciences. 2001; 56:S100–S111.
Gottfredson LS, Deary IJ. Intelligence predicts health and longevity, but why? Current Directions in Psychological Science. 2001; 13:1–4.
Greenfield, PM. The cultural evolution of IQ. In: Neisser, U., editor. The rising curve: Long-term gains in IQ and related measures. Washington, DC: American Psychological Association; 1998. p. 81-123.
Hertzog C. Influences of cognitive slowing on age differences in intelligence. Developmental Psychology. 1989; 25:636–651.
Hertzog C, Lindenberger U, Ghisletta P, von Oertzen T. On the power of multivariate latent growth curve models to detect individual differences in change. Psychological Methods. 2006; 11:244– 252. [PubMed: 16953703]
Hertzog C, Nesselroade JR. Assessing psychological changes in adulthood: An overview of methodological issues. Psychology and Aging. 2003; 18:639–657. [PubMed: 14692854]
Horn JL, Cattell RB. Age differences in fluid and crystallized intelligence. Acta Psychologica. 1967; 26:107–129. [PubMed: 6037305]
Kennison RF, Zelinski EM. Estimating age change in 7-year list recall in AHEAD: The roles of independent predictors of missing-ness and dropout. Psychology and Aging. 2005; 20:460–475. [PubMed: 16248705]
Langa KM, Chermew ME, Kabeto MW, Herzog AR, Ofstedal MB, Willis RJ, et al. National estimates of the quantity and cost of informal caregiving for the elderly with dementia. Journal of General Internal Medicine. 2001; 16:770–777. [PubMed: 11722692]
Linacre, JM. A user’s guide to WINSTEPS MINISTEP Rasch-model computer programs [Computer software and manual]. Chicago: Winsteps; 2006.
Loehlin JC. The IQ Paradox: Resolved? Still an open question. Psychological Review. 2002; 109:754– 758. [PubMed: 12374329]
Zelinski and Kennison Page 16
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
McArdle, JJ.; Bell, RQ. An introduction to latent growth models for developmental data analysis. In: Little, TD.; Schnabel, KU.; Baumert, J., editors. Modeling longitudinal and multilevel data: Practical issues, applied approaches, and specific examples. Mahwah, NJ: Erlbaum; 2000. p. 69-107.
McArdle JJ, Ferrer-Caja E, Hamagami F, Woodcock RW. Comparative longitudinal structural analyses of the growth and decline of multiple intellectual abilities over the life span. Developmental Psychology. 2002; 38:115–142. [PubMed: 11806695]
McArdle JJ, Hamagami F. Modeling incomplete longitudinal and cross-sectional data using latent growth structural models. Experimental Aging Research. 1991; 18:145–166. [PubMed: 1459161]
McArdle JJ, Hamagami F, Meredith W, Bradway KP. Modeling the dynamic hypotheses of Gf-Gc theory using longitudinal life-span data. Learning and Individual Differences. 2000; 12:53–79.
Miyake A, Friedman NP, Emerson MJ, Witzki AH, Howerter A. The unity and diversity of executive functions and their contributions to complex “frontal lobe” tasks: A latent variable analysis. Cognitive Psychology. 2000; 41:49–100. [PubMed: 10945922]
Muthén, LK.; Muthén, BO. Software manual. 4. Los Angeles: Author; 1998–2007. Mplus user’s guide.
Neugarten BL. The future and the young-old. Gerontologist. 1975; 15:4–9. [PubMed: 1110022]
Rabbitt PMA, Watson P, Donlan C, McInnes L, Horan M, Pendleton N, Clague J. Effects of death within 11 years on cognitive performance in old age. Psychology and Aging. 2002; 17:468–481. [PubMed: 12243388]
Rasch G. An item analysis which takes individual differences into account. British Journal of Mathematical and Statistical Psychology. 1966; 19:49–57. [PubMed: 5939145]
Raven J. The Raven’s Progressive Matrices: Change and stability over culture and time. Cognitive Psychology. 2000; 41:1–48. [PubMed: 10945921]
Rodgers JL. A critique of the Flynn effect: Massive IQ gains, methodological artifacts, or both? Intelligence. 1999; 26:337–356.
Rönnlund M, Nyberg L, Bäckman L, Nilsson LG. Stability, growth, and decline in adult life span development of declarative memory: Cross-sectional and longitudinal data from a population- based study. Psychology and Aging. 2005; 20:3–18. [PubMed: 15769210]
Rowe DC, Rodgers JL. Expanding variance and the case of historical changes in IQ means: A critique of Dickens and Flynn (2001). Psychological Review. 2002; 109:759–763. [PubMed: 12374330]
Salthouse TA. What and when of cognitive aging. Current Directions in Psychological Science. 2004; 13:140–144.
Schafer, J. Analysis of incomplete multivariate data. London: Chapman & Hall; 1997.
Schaie KW. A general model for the study of developmental problems. Psychological Bulletin. 1965; 64:92–107. [PubMed: 14320080]
Schaie, KW. Schaie–Thurstone Adult Mental Abilities Test. Palo Alto, CA: Consulting Psychologists Press; 1985.
Schaie, KW. Intellectual development in adulthood. Cambridge, England: Cambridge University Press; 1996.
Singer T, Verhaeghen P, Ghisletta P, Lindenberger U, Baltes PB. The fate of cognition in very old age: Six-year longitudinal findings in the Berlin Aging Study (BASE). Psychology and Aging. 2003; 18:318–331. [PubMed: 12825779]
Sliwinski MJ, Hofer SM, Hall C, Buschke H, Lipton RB. Modeling memory decline in older adults: The importance of preclinical dementia, attrition, and chronological age. Psychology and Aging. 2003; 18:658–671. [PubMed: 14692855]
Sundet JM, Barlaug DG, Torjussen TM. The end of the Flynn effect? A study of secular trends in mean intelligence test scores of Norwegian conscripts during half a century. Intelligence. 2004; 32:349–362.
Teasdale TW, Owen DR. A long-term rise and recent decline in intelligence test performance: The Flynn Effect in reverse. Personality and Individual Differences. 2005; 39:837–843.
Turner, A.; Greene, E. Tech Rep No 63. Boulder: Department of Psychology, University of Colorado; 1977. The construction and use of a propositional text base.
Zelinski and Kennison Page 17
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Verhaeghen P. Aging and vocabulary score: A meta-analysis. Psychology and Aging. 2003; 18:332– 339. [PubMed: 12825780]
Wicherts JM, Dolan CV, Hessen DJ, Oosterveld P, Van Baal G, Caroline M, et al. Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence. 2004; 32:509–537.
Williams, WM. Are we raising smarter children today? School-and home-related influences on IQ. In: Neisser, U., editor. The rising curve: Long-term gains in IQ and related measures. Washington, DC: American Psychological Association; 1998. p. 125-154.
Wilson JA, Gove WR. The intercohort decline in verbal ability: Does it exist? American Sociological Review. 1999; 64:253–266.
Wright, BD.; Stone, MH. Best test design: Rasch measurement. Chicago: MESA Press; 1979.
Zelinski EM, Burnight KP. Sixteen-year longitudinal and time lag changes in memory and cognition in older adults. Psychology and Aging. 1997; 12:503–513. [PubMed: 9308097]
Zelinski EM, Burnight KP, Lane CJ. The relationship between subjective and objective memory in the oldest-old: Comparisons of findings from a representative and a convenience sample. Journal of Aging and Health. 2001; 13:248–266. [PubMed: 11787514]
Zelinski EM, Crimmins E, Reynolds S, Seeman T. Do medical conditions affect cognition in older adults? Health Psychology. 1998; 17:504–512. [PubMed: 9848800]
Zelinski EM, Gilewski MJ. Effects of demographic and health variables on Rasch scaled cognitive scores. Journal of Aging and Health. 2003; 15:435–464. [PubMed: 12914012]
Zelinski EM, Lewis KL. Adult age differences in multiple cognitive functions: Differentiation, dedifferentiation, or process-specific change? Psychology and Aging. 2003; 18:727–745. [PubMed: 14692860]
Zelinski EM, Stewart S. Individual differences in 16-year memory changes. Psychology and Aging. 1998; 13:622–630. [PubMed: 9883462]
Zelinski and Kennison Page 18
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Figure 1. Data plots for individuals on reasoning (top) and vocabulary (bottom) by cohort. Plots
include all data from Cohorts 1 and 2 for ages 55 to 87.
Zelinski and Kennison Page 19
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Figure 2. Estimated longitudinal changes for vocabulary, space, list recall, text recall, and reasoning
for Cohort 1 (C1) and Cohort 2 (C2) between ages 55 and 87 for the two-piece models. The
line at age 74 represents the intercept of the models.
Zelinski and Kennison Page 20
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Zelinski and Kennison Page 21
T a b
le 1
B as
el in
e D
em og
ra ph
ic C
ha ra
ct er
is ti
cs o
f th
e S
am pl
e by
W av
e of
T es
ti ng
a nd
C oh
or t
M ea
su re
W av
e 1
W av
e 2
W av
e 3
W av
e 4
W av
e 5
Y ea
rs f
ro m
b as
el in
e
C
oh or
t 1
0 3
16 19
22
C
oh or
t 2
0 3
6 9
A ge
C
oh or
t 1
M 69
.8 68
.7 64
.0 62
.3 60
.6
S D
7. 2
7. 5
6. 5
6. 4
4. 9
C
oh or
t 2
M 72
.2 70
.8 68
.2 67
.9
S D
8. 5
8. 2
7. 4
7. 4
E du
ca ti
on
C
oh or
t 1
M 12
.5 12
.9 13
.7 13
.5 14
.4
S D
2. 8
2. 8
2. 8
2. 4
2. 0
C
oh or
t 2
M 13
.7 13
.9 14
.1 14
.1
S D
2. 9
2. 9
2. 8
2. 7
F em
al e
(% )
C
oh or
t 1
53 .3
52 .2
53 .7
48 .4
54 .5
C
oh or
t 2
47 .8
50 .5
48 .8
49 .5
N C
oh or
t 1
45 6
23 2
82 31
11
C
oh or
t 2
48 2
28 2
13 8
10 6
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Zelinski and Kennison Page 22
T a b
le 2
S um
m ar
y S
ta ti
st ic
s fo
r R
as ch
S ca
li ng
P ar
am et
er
P er
so n
sa It
em s
N on
ex tr
em e
ob se
rv at
io n
s (n
)b M
ea su
re c
In fi
t O
u fi
t M
od el
R M
S E
M od
el re
li ab
il it
y
N on
ex tr
em e
ob se
rv at
io n
s (n
) M
ea su
re In
fi t
O u
tf it
M od
el R
M S
E M
od el
re li
ab il
it y
R ea
so ni
ng 2,
26 1
.4 6
.9 6
60 .0
9 1.
00
M
− 1.
13 0.
98 1.
22 0
0. 97
1. 83
S
D 2.
14 0.
35 1.
79 2.
79 0.
16 1.
88
L is
t re
ca ll
2, 17
0 .5
6 .7
6 20
.0 5
1. 00
M
0. 40
1. 00
1. 04
0 1.
00 1.
04
S D
1. 14
0. 16
0. 47
0. 81
0. 04
0. 13
T ex
t re
ca ll
2, 21
3 .2
8 .9
1 10
4 .0
6 1.
00
M
− 1.
32 1.
00 1.
01 0
1. 00
1. 01
S D
0. 94
0. 15
0. 38
1. 07
0. 08
0. 21
S pa
ce 2,
28 1
.1 9
.9 5
40 .0
3 1.
00
M
1. 92
0. 95
0. 94
0 0.
91 0.
94
S D
0. 87
0. 44
0. 76
1. 01
0. 20
0. 32
V oc
ab ul
ar y
2, 12
8 .6
2 .9
2 50
.1 0
1. 00
M
2. 62
0. 98
1. 76
0 0.
97 2.
44
S
D 2.
15 0.
33 2.
47 2.
11 0.
37 2.
41
N o te
. R M
S E
= r
oo t-
m ea
n- sq
ua re
e rr
or .
a P
er so
n m
ea su
re s
ar e
ca lc
ul at
ed o
ve r
pe rs
on s,
a s
is d
on e
w it
h tr
ad it
io na
l ra
w s
co re
s ca
li ng
; it
em m
ea su
re s
ar e
ca lc
ul at
ed o
ve r
it em
s, t
ha t
is , w
it h
th e
tr ad
it io
na l
pe rs
on -i
te m
m at
ri x
tr an
sp os
ed .
b N
on ex
tr em
e ob
se rv
at io
ns a
re o
bs er
va ti
on s
w it
h ne
it he
r pe
rf ec
t sc
or es
n or
z er
o sc
or es
; ex
tr em
e ob
se rv
at io
ns a
re p
er fe
ct ly
p re
di ct
ed b
y th
e R
as ch
m od
el a
nd a
re t
he re
fo re
n ot
i nf
or m
at iv
e.
c M ea
su re
i s
th e
lo gi
st ic
s co
re b
as ed
o n
th e
R as
ch m
od el
f or
t he
t es
t be
in g
ca li
br at
ed . I
t is
s et
t o
a m
ea n
of z
er o
fo r
th e
it em
s an
al ys
is a
nd c
an v
ar y
by p
er so
ns , b
y sa
m pl
e, a
nd b
y m
ea su
re m
en t
oc ca
si on
. T he
se s
co re
s w
er e
th en
s ca
le d
to a
r an
ge o
f 0–
10 0
fo r
th e
lo ng
it ud
in al
an al
ys es
. T hi
s sc
al in
g do
es n
ot a
ff ec
t fi
t in
di ce
s.
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Zelinski and Kennison Page 23
T a b
le 3
F it
f or
A lt
er na
ti ve
L on
gi tu
di na
l M
od el
s fo
r E
ac h
of t
he P
sy ch
om et
ri c
T es
ts
P sy
ch om
et ri
c te
st
F it
i n
d ex
es
− 2L
L F
re e
p ar
am et
er s
(n )
Δ −
2L L
Δ d
f A
IC R
M S
E A
R ea
so ni
ng
L ev
el o
nl y
15 ,0
20 3
15 ,0
25 .0
88
L in
ea r
14 ,5
82 6
43 8*
3 14
,5 93
.0 40
Q ua
dr at
ic 14
,5 30
10 52
* 4
14 ,5
49 .0
31
T w
o pi
ec es
14 ,5
42 10
40 *
4 14
,5 61
.0 34
T hr
ee p
ie ce
s 14
,5 38
15 4
5 14
,5 68
.0 36
L is
t re
ca ll
L ev
el o
nl y
16 ,5
64 3
16 ,5
70 .0
71
L in
ea r
16 ,3
60 6
20 4*
3 16
,3 73
.0 48
Q ua
dr at
ic 16
,3 26
10 34
* 4
16 ,3
45 .0
44
T w
o pi
ec es
16 ,3
18 10
42 *
4 16
,3 39
.0 43
T hr
ee p
ie ce
s 16
,3 18
15 0
5 16
,3 47
.0 43
T ex
t re
ca ll
L ev
el o
nl y
13 ,8
68 3
13 ,8
75 .0
65
L in
ea r
13 ,6
32 6
23 6*
3 13
,6 44
.0 32
Q ua
dr at
ic 13
,6 22
10 10
* 4
13 ,6
41 .0
32
T w
o pi
ec es
13 ,6
22 10
10 *
4 13
,6 41
.0 31
T hr
ee p
ie ce
s 13
,6 08
15 14
* 5
13 ,6
38 .0
31
S pa
ce
L ev
el o
nl y
11 ,9
18 3
11 ,9
23 .0
86
L in
ea r
11 ,6
22 6
29 6*
3 11
,6 33
.0 58
Q ua
dr at
ic 11
,5 90
10 30
* 4
11 ,6
15 .0
57
T w
o pi
ec es
11 ,5
98 10
32 *
4 11
,6 19
.0 57
T hr
ee p
ie ce
s 11
,5 92
15 6
5 11
,6 10
.0 57
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Zelinski and Kennison Page 24
P sy
ch om
et ri
c te
st
F it
i n
d ex
es
− 2L
L F
re e
p ar
am et
er s
(n )
Δ −
2L L
Δ d
f A
IC R
M S
E A
V oc
ab ul
ar y
L ev
el o
nl y
17 ,0
98 3
17 ,1
04 .0
58
L in
ea r
16 ,9
42 6
15 6*
3 16
,9 54
.0 36
Q ua
dr at
ic 16
,9 28
10 14
* 4
16 ,9
48 .0
36
T w
o pi
ec es
16 ,9
28 10
14 *
4 16
,9 47
.0 35
T hr
ee p
ie ce
s 16
,9 16
15 12
* 5
16 ,9
47 .0
35
N o te
. L L
= l
og l
ik el
ih oo
d; A
IC =
A ka
ik e’
s in
fo rm
at io
n cr
it er
io n;
R M
S E
A =
r oo
t- m
ea n-
sq ua
re e
rr or
o f
ap pr
ox im
at io
n.
* p <
.0 5.
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Zelinski and Kennison Page 25
T a b
le 4
U nc
on di
ti on
al A
na ly
si s
P ar
am et
er s
fo r
th e
F iv
e P
sy ch
om et
ri c
M ea
su re
s
P ar
am et
er
P sy
ch om
et ri
c te
st s
R ea
so n
in g
L is
t T
ex t
S p
ac e
V oc
ab u
la ry
G ro
w th
p ar
am et
er s
L ev
el 36
.8 7*
* *
53 .1
6* * *
39 .3
4* * *
62 .6
4* * *
69 .2
8* * *
A ge
5 6–
71 −
1. 80
* * *
− 2.
80 * * *
− 1.
53 * * *
− 0.
97 * * *
− 1.
47 * * *
A ge
7 7–
86 −
3. 67
* * *
− 2.
80 * * *
− 1.
96 * * *
− 1.
44 * * *
− 4.
12 * * *
G ro
w th
p ar
am et
er S
E s
L ev
el 0.
42 0.
57 0.
29 0.
19 0.
62
A ge
5 6–
71 0.
25 0.
40 0.
19 0.
12 0.
36
A ge
7 7–
86 0.
32 0.
45 0.
29 0.
14 0.
50
V ar
ia nc
e co
m po
ne nt
s
L ev
el 11
1. 92
* * *
13 7.
44 * * *
24 .5
8* * *
19 .2
5* * *
23 2.
26 * * *
A ge
5 6–
71 5.
45 * * *
23 .1
1* * *
0. 88
1. 23
* *
8. 25
*
A ge
7 7–
86 10
.7 7*
* *
14 .4
3* 6.
92 *
0. 13
12 .0
0
E rr
or 10
.9 6*
* *
50 .1
2* * *
20 .7
7* * *
4. 28
* * *
39 .0
3* * *
V ar
ia nc
e co
m po
ne nt
S E
s
L ev
el 7.
55 11
.9 4
3. 11
1. 45
16 .4
6
A ge
5 6–
71 1.
51 4.
92 1.
26 0.
48 4.
18
A ge
7 7–
86 3.
29 6.
56 3.
10 0.
53 6.
66
E rr
or 0.
64 2.
54 1.
03 0.
23 2.
13
* p <
.0 5.
* * p <
.0 1.
* * * p <
.0 01
.
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Zelinski and Kennison Page 26
T a b
le 5
C on
di ti
on al
A na
ly si
s P
ie ce
w is
e G
ro w
th M
od el
C oe
ff ic
ie nt
s fo
r th
e F
iv e
P sy
ch om
et ri
c T
es ts
P ar
am et
er
P sy
ch om
et ri
c te
st s
R ea
so n
in g
L is
t T
ex t
S p
ac e
V oc
ab u
la ry
G ro
w th
i nt
er ce
pt p
ar am
et er
s
L ev
el (
v 0 0)
34 .6
5* * *
49 .3
3* * *
38 .4
3* * *
61 .7
7* * *
69 .0
8* * *
A ge
5 6–
71 (
v 1 0)
− 2.
21 * * *
− 2.
86 * * *
− 1.
53 * * *
− 1.
10 * * *
− 1.
05
A ge
7 7–
86 (
v 2 0)
− 4.
55 * * *
− 1.
53 −
1. 05
− 1.
70 * * *
− 6.
28 * * *
G ro
w th
i nt
er ce
pt S
E s
L ev
el (
v 0 0)
0. 61
0. 82
0. 42
0. 28
0. 93
A ge
5 6–
71 (
v 1 0)
0. 37
0. 59
0. 28
0. 18
0. 56
A ge
7 7–
86 (
v 2 0)
0. 58
0. 83
0. 55
0. 25
0. 93
C oh
or t
ef fe
ct p
ar am
et er
s
L ev
el (
v 0 1)
4. 37
* * *
7. 54
* * *
1. 76
* * *
1. 72
* * *
0. 71
A ge
5 6–
71 (
v 1 1)
0. 71
− 0.
08 −
0. 01
0. 25
− 0.
59
A ge
7 7–
86 (
v 2 1)
0. 82
− 2.
75 *
− 1.
48 *
0. 19
2. 88
*
C oh
or t
ef fe
ct S
E s
L ev
el (
v 0 1)
0. 82
1. 10
0. 56
0. 37
1. 24
A ge
5 6–
71 (
v 1 1)
0. 46
0. 77
0. 37
0. 23
0. 72
A ge
7 7–
86 (
v 2 1)
0. 69
0. 99
0. 65
0. 29
1. 10
V ar
ia nc
e co
m po
ne nt
s
L ev
el (
r 0 )
10 5.
85 * * *
12 2.
61 * * *
23 .4
4* * *
18 .3
3* * *
22 8.
28 * * *
A ge
5 6–
71 (
r 1 )
5. 32
* *
22 .5
2* * *
0. 71
1. 18
* 7.
81
A ge
7 7–
86 (
r 2 )
11 .0
8* *
12 .1
0 5.
79 0.
18 7.
77
E rr
or (
S E
e )
10 .9
0* * *
50 .2
3* * *
20 .7
7* * *
4. 28
* * *
39 .5
0* * *
V ar
ia nc
e co
m po
ne nt
S E
s
L ev
el (
r 0 )
7. 24
11 .0
4 3.
02 1.
40 16
.1 6
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Zelinski and Kennison Page 27
P ar
am et
er
P sy
ch om
et ri
c te
st s
R ea
so n
in g
L is
t T
ex t
S p
ac e
V oc
ab u
la ry
A ge
5 6–
71 (
r 1 )
1. 48
4. 73
1. 24
0. 47
4. 12
A ge
7 7–
86 (
r 2 )
3. 37
6. 36
3. 02
0. 53
6. 27
E rr
or (
e) 0.
63 2.
53 1.
02 0.
23 2.
15
* p <
.0 5.
* * p <
.0 1.
* * * p <
.0 01
.
Psychol Aging. Author manuscript; available in PMC 2014 September 19.
N IH
-P A
A u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t N
IH -P
A A
u th
o r M
a n u scrip
t
Zelinski and Kennison Page 28
Table 6
Changes in Fit Statistics (Δχ2) for Comparisons of Free Versus Equal Parameter Models Across Psychometric
Tests
Parameter Both age pieces free vs. ages 55–74 equal Both age pieces free vs. ages 74–86 equal Cohorts free vs. equal
Compared with reasoning
List recall 0 7* 5*
Text recall 2 18* 6*
Space 6* 22* 9*
Vocabulary 3 1 4*
Compared with list recall
Text recall 4* 1 21*
Space 8* 0 25*
Vocabulary 5* 15* 17*
Compared with text recall
Space 2 2 0
Vocabulary 1 22* 1
Compared with space
Vocabulary 0 21* 0
Note. Tests of all differences involve one parameter. Significant values indicate that parameters differ between psychometric tests.
* p < .05.
Psychol Aging. Author manuscript; available in PMC 2014 September 19.