review
JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR
EFFECTS OF DELAYED CONDITIONED REINFORCEMENT IN CHAIN SCHEDULES
PAUL ROYALTY, BEN A. WILLIAMS, AND EDMUND FANTINO
UNIVERSITY OF CALIFORNIA-SAN DIEGO
The contingency between responding and stimulus change on a chain variable-interval 33-s, variable-interval 33-s, variable-interval 33-s schedule was weakened by interposing 3-s delays between either the first and second or the second and third links. No stimulus change signaled the delay interval and responses could occur during it, so the obtained delays were often shorter than the scheduled delay. When the delay occurred after the initial link, initial-link response rates decreased by an average of 77% with no systematic change in response rates in the second or third links. Response rates in the second link decreased an average of 59% when the delay followed that link, again with little effect on response rates in the first or third links. Because the effect of delaying stimulus change was comparable to the effect of delaying primary reinforcement in a simple variable-interval schedule, and the effect of the unsignaled delay was specific to the link in which the delay occurred, the results provide strong evidence for the concept of conditioned reinforcement.
Key words: unsignaled delay of reinforcement, chain schedules, conditioned reinforcement, key peck, pigeons
A chain schedule consists of a series of schedule requirements, each correlated with a unique stimulus, only the last of which terminates in primary reinforcement. Transition between the successive links of the chain is contingent upon the fulfillment of each separate schedule requirement. Chain schedules thus involve stimulus change, a contingency between responding and component transition, and a contingency between responding and stimulus change. Such schedules have been commonly used to study conditioned reinforcement, based on the premise that behavior in the early links of the chain is maintained by the reinforcing properties of the stimulus that accompanies the next link of the schedule. A given stimulus in the chain has been assumed to serve both as a discriminative stimulus for its own correlated schedule and as a conditioned reinforcer for the behavior maintained during the stimulus that preceded it.
This research was supported by NIMH Grant MH-20752, NSF Grant BNS 83-02963, and NSF Grant BNS 84-08878 to the University of California at San Diego. Reprints may be obtained from any of the authors, Department of Psychology, C-009, University of California at San Diego, La Jolla, California 92093.
1987, 47, 41-56 NUMBER 1 (JANUARY)
In order to validate the premise of conditioned reinforcement in chain schedules, the effects of the contingency between responding and stimulus change must be dissociated from the effects of the contingency between responding and component transition (i.e., between responding in the early links of the chain and eventual primary reinforcement) and from the effects of stimulus change alone (in the absence of contingency). The most frequent approach for effecting such a dissociation has been to eliminate the role of stimulus change by converting the chain schedule to a tandem schedule. In his review of experiments comparing performance under tandem and chained schedules, Gollub (1977) claimed that, "For two-component chains of FI schedules, the rate in the first component under chain was generally higher than tandem (Gollub, 1958), but not always (Malagodi, DeWeese, & Johnston, 1973)" (pp. 294-295). (The latter study found no difference between chain and tandem rates.) Subsequently, Wallace, Osborne, and Fantino (1982) reported a higher rate in the initial link of the tandem than in the initial link of the chain schedule. Those authors' review of Gollub's earlier study also suggested that their own findings were the more typical result in two-link chain-tandem comparisons. Other published studies involving chain-tandem comparisons have consistently found lower response rates in the initial link of a chain schedule than in the initial link of the corresponding tandem schedule. This result has been obtained regardless of whether the comparison involved three fixed-interval (FI) components (Kelleher & Fry, 1962; Thomas, 1964), five FI components (Gollub, 1958), three fixed-ratio (FR) components (Jwaideh, 1973; Thomas, 1967), or five FR components (Jwaideh, 1973). In summary, the comparison of chain and tandem schedules offers little or no support for the concept of conditioned reinforcement.
A second approach to examining conditioned reinforcement in chain schedules has been to investigate the effects of response-independent stimulus change. That is, a chain schedule is converted into a comparable multiple schedule. For example, a chain FI 30-s FI 30-s FI 30-s schedule would be converted into a multiple extinction (EXT) 30-s EXT 30-s FI 30-s schedule. The major difficulty with such a procedure as a control for a chain schedule is that removal of the contingency between responding and stimulus change also breaks the contingency between responding and component transition. Specifically, the dependency between responding in the early links of the chain and the time of the primary reinforcer is altered in ways that could influence response rate. A multiple-schedule control cannot, therefore, rule out the possibility that responding in the early links of a chain schedule is maintained by the dependency between responding in early links and eventual primary reinforcement alone, rather than by conditioned reinforcement.
In an attempt to circumvent this liability of the multiple-schedule control procedure, Catania, Yohalem, and Silverman (1980) compared not only chain and multiple schedules but tandem and mixed schedules as well. Higher rates of responding were maintained by the contingency between responding and stimulus change (chain schedule) than by stimulus change without contingency (multiple schedule). In the absence of any stimulus change, the contingency between responding and primary reinforcement (tandem schedule) did not produce a higher response rate than the absence of that contingency (mixed schedule). Catania et al. argued that the latter result demonstrated that the contingency between responding and primary reinforcement was unimportant in the maintenance of behavior in the early links of the chain schedule and that the difference found between the chain and multiple schedules was therefore direct support for the role of conditioned reinforcement.
The problem with using the tandem-mixed schedule comparison to control for contingency in the chain-multiple schedule comparison is that the response rates in the first two links of the tandem and mixed schedules were much higher than the corresponding rates in the first two links of the chain and multiple schedules (see Figure 1 of Catania et al., 1980). These high rates prevented the subject from encountering the difference in the contingencies between responding and component transition on the tandem and on the mixed schedules. The consequence of responding at a moderately high rate was the same on both the tandem and the mixed schedule: namely, the schedule advanced. Only the consequence of not responding differed on these two schedules; the only way to have contacted this contingency would have been to not respond for a period of time, but the high response rates prevented that contact. By contrast, the near-zero response rates in the initial links of both the chain and multiple schedules ensured contact with the differential consequences of not responding on those two schedules. Thus, the tandem-mixed schedule comparison did not rule out contingency between responding and primary reinforcement as a plausible explanation of the response-rate differences between the chain and multiple schedules.
In summary, the experimental analysis of stimulus functions in chained schedules of reinforcement has failed to make a totally convincing case for the concept of conditioned reinforcement because the control procedures most frequently used, tandem and multiple schedules, have either often produced response-rate differences in the wrong direction (tandem schedules) or have failed to preclude possible alternative interpretations (multiple schedules). Perhaps because of these difficulties, the concept of conditioned reinforcement has fallen into ill repute. For example, in a recent textbook Staddon (1983) has written:
The concept of conditioned reinforcement (that is, the response contingency between pecking and stimulus change) adds nothing to our understanding of chain schedules.... Providing the response contingency for food in the terminal links is maintained, it can be omitted in earlier links with little effect on key pecking, as long as stimulus changes continue to take place as before.... Behavior on chained schedules is determined by temporal proximity to food in the same way as behavior on multiple schedules. (p. 466)
CONDITIONED REINFORCEMENT IN CHAIN SCHEDULES
What is needed is a new method for demonstrating the action of conditioned reinforcement in chain schedules that avoids the pitfalls of tandem and multiple schedule comparisons. One such alternative approach to studying the role of conditioned reinforcement in chain schedules involves a comparison of delayed and immediate transitions between components. As suggested by Dinsmoor and Clayton (1966), the reinforcing properties of a stimulus can be indexed by whether its effects are diminished by increasing the delay between responding and stimulus presentation. Thus, to the extent that stimulus transitions during a chain schedule constitute conditioned reinforcers, the behavior during the early links of the chain should be decreased if the presentation of the stimulus for the succeeding link of the chain is delayed rather than immediately contingent on the response. The efficacy of using delay of reinforcement as a tool for investigating conditioned reinforcement is suggested by results from the unsignaled delay-of-reinforcement procedure (Catania & Keller, 1981; Sizemore & Lattal, 1977, 1978; Williams, 1976b). With this procedure, typically studied on interval schedules, the first response after a primary reinforcer is set up begins a delay timer and the reinforcer is delivered at the end of delay-timer operation. No stimulus change signals the delay interval and responses can occur during it, so the obtained delays are often shorter than those scheduled. Nevertheless, this procedure typically produces large decrements (on the order of 70% to 90%) in response rates with even very short (2- to 3-s) delays.
The current study investigated unsignaled delays of conditioned reinforcement by interposing 3-s unsignaled delays between the first and second and the second and third links of a chain of variable-interval schedules (chain VI 33-s VI 33-s VI 33-s). This procedure minimized possible confounding factors because the discriminative functions of the component stimuli and the contingencies between responding, schedule advancement, and eventual primary-reinforcer delivery were present during both baseline and delay conditions. Interreinforcement intervals were held constant across baseline and delay conditions by shortening each interval in the VI schedule by 3 s whenever the delay contingency was in effect. Thus, the delay contingency postponed presentation of the stimulus correlated with the next link of the chain by 3 s or less but did not alter the relation between responding in early links of the chain and food delivery in a way that could be influential. If presentation of each stimulus correlated with the chain does indeed serve to reinforce responding in the link that precedes it, then one would expect that the unsignaled delay of one of these stimuli would have the same effect as the unsignaled delay of primary reinforcement: namely, a substantial response-rate decrement in the preceding link.
METHOD
Subjects
Six adult male White Carneaux pigeons, all with extensive experimental histories, served as subjects. Throughout the experiment, all subjects were housed individually and had free access to water and grit. The birds were weighed after each experimental session and were fed measured amounts of Universal Feeds Pigeon Pellets to maintain them at 80% of their free-feeding body weights.
Apparatus
Six identical, rectangular, operant-conditioning chambers were used. The chambers consisted of opaque black plastic side walls, sheet aluminum front and back walls, a plywood ceiling, and a wire mesh floor. Each chamber was 32 cm high, 35 cm wide, and 36 cm deep and had three response keys, each 2.5 cm in diameter, mounted 23 cm from the floor and 7.25 cm apart, center to center, on the front wall. Each key could be transilluminated from the rear and required a minimum force of approximately 0.15 N to operate. Feedback for each effective peck on a lighted key was provided by darkening the key for 100 ms. Only the right key was used; the left and center keys remained dark and responses on them were not recorded. Access to a solenoid-operated grain hopper, when activated, was available through a rectangular opening, 5 cm high and 6 cm wide, located 9.5 cm below the center key. Reinforcers consisted of 3.5-s access to milo. While the hopper was raised, it was illuminated by a white light and the keylights were extinguished. General chamber illumination was provided by a dim blue houselight mounted 4 cm above the right key. A ventilation fan and continuously present white noise masked extraneous sounds. Scheduling of experimental events and data recording were performed by a PDP-8E® (Digital Equipment Corporation) computer located in an adjacent room.
Table 1
Order of conditions, schedules (in seconds), and number of sessions per condition.
Condition                 Schedule                             Sessions
1. Baseline               Chain VI 33 VI 33 VI 33              25
2. Initial-link delay     Chain (VI 30 delay) VI 33 VI 33      20
3. Baseline               Chain VI 33 VI 33 VI 33              20
4. Middle-link delay      Chain VI 33 (VI 30 delay) VI 33      25
5. Baseline               Chain VI 33 VI 33 VI 33              30
6. Terminal-link delay    Chain VI 33 VI 33 (VI 30 delay)      30
Procedure
Although all subjects had extensive experimental histories, they had not been active for approximately 6 months prior to the beginning of the current experiment. To reestablish key pecking, all subjects were placed on a VI 30-s schedule for five sessions. The key was white during this pretraining period and sessions terminated after 60 reinforcers had been delivered. After pretraining, subjects were exposed successively to the conditions shown in Table 1. Key colors that accompanied the initial, middle, and terminal links were blue, red, and white, respectively, throughout the experiment.
During baseline conditions, the VI schedules consisted of intervals pseudorandomly selected from a modified, 20-interval, Fleshler and Hoffman (1962) distribution. This distribution consisted of a standard, 20-interval, VI 30-s Fleshler and Hoffman distribution with 3 s added to each of the 20 intervals. When an unsignaled delay followed a given link, the standard, unmodified VI 30-s distribution was substituted for the modified distribution in that link. In this manner, the scheduled interreinforcement interval (IRI) remained constant between baseline and delay conditions. Each delay condition was preceded and followed by the immediate-transition baseline condition as indicated in Table 1.
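The arithmetic behind the constant scheduled IRI can be illustrated with a short sketch. It assumes the commonly cited form of the Fleshler and Hoffman progression, t_n = T[1 + ln N + (N - n) ln(N - n) - (N - n + 1) ln(N - n + 1)], with 0 ln 0 taken as 0; the function and variable names are hypothetical and are not taken from the original control program.

```python
import math

def fleshler_hoffman(mean_s: float, n: int = 20) -> list[float]:
    """Return n interval values (in seconds) whose mean is mean_s.

    Assumed form of the Fleshler-Hoffman (1962) progression; illustrative only.
    """
    def xlogx(k: int) -> float:
        # Convention: 0 * ln(0) is treated as 0.
        return k * math.log(k) if k > 0 else 0.0

    return [
        mean_s * (1 + math.log(n) + xlogx(n - i) - xlogx(n - i + 1))
        for i in range(1, n + 1)
    ]

DELAY_S = 3.0                                # scheduled unsignaled delay
vi30 = fleshler_hoffman(30.0, 20)            # unmodified list (delay conditions)
vi33 = [t + DELAY_S for t in vi30]           # modified list (baseline conditions)

mean_vi30 = sum(vi30) / len(vi30)
mean_vi33 = sum(vi33) / len(vi33)

# Scheduled time in a link with the delay: mean VI 30-s interval plus the 3-s delay.
scheduled_link_time_delay = mean_vi30 + DELAY_S
```

Under these assumptions the mean scheduled duration of a link is 33 s in both cases: VI 33 s during baseline, and VI 30 s plus the 3-s delay when the delay contingency is in effect.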
During baseline conditions, after a given interval had elapsed, the next peck was followed immediately by the stimulus correlated with the next link in the chain. During a delay condition, the first peck after the completion of an interval started a 3-s delay timer. At the end of the timer operation, stimulus change and component transition occurred independently of behavior. Thus, during a delay condition, the schedule in the designated component was changed from a simple VI 33-s schedule to a tandem VI 30-s FT 3-s schedule. No stimulus change signaled the delay interval and responses were free to occur during it, so the obtained delays between the last key peck and stimulus change were often shorter than the 3-s scheduled delay. A measure of the actual delays was obtained by having the key peck that started the delay timer also start a second timer. Each subsequent peck during the delay interval reset this second timer. Upon component transition, the elapsed time on the second timer was recorded on a cumulative timer from which the average delay-per-stimulus change was computed.
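The two-timer measurement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original PDP-8E program: the function name, the argument names, and the treatment of a peck occurring exactly at interval completion are all hypothetical.

```python
from typing import Optional, Sequence

def obtained_delay(peck_times: Sequence[float],
                   interval_end: float,
                   scheduled_delay: float = 3.0) -> Optional[float]:
    """Time (s) from the last peck to stimulus change in one component.

    peck_times: peck times in ascending order, measured from component onset.
    interval_end: time at which the VI interval elapsed.
    Returns None if no peck occurs once the interval has elapsed.
    """
    # The first peck after interval completion starts the delay timer
    # (assumption: a peck exactly at interval_end also counts).
    starters = [t for t in peck_times if t >= interval_end]
    if not starters:
        return None
    transition = starters[0] + scheduled_delay

    # Each later peck during the delay resets the second (measuring) timer,
    # so the recorded value is the time since the last peck before transition.
    last_peck = max(t for t in peck_times if starters[0] <= t < transition)
    return transition - last_peck

# No further pecks during the delay: the obtained delay equals the scheduled 3 s.
no_pecks_in_delay = obtained_delay([10.0, 31.0], interval_end=30.0)

# Pecks continue during the delay: the obtained delay is shorter.
pecks_in_delay = obtained_delay([10.0, 31.0, 32.5, 33.25], interval_end=30.0)
```

Averaging these values over all stimulus changes in a condition would give a mean obtained delay of the kind the procedure records.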
Sessions were conducted 5 to 7 days per week and were terminated after 60 reinforcers had been delivered. Each condition was conducted for a minimum of 20 sessions after which response rate as a function of sessions was plotted for each subject and visually examined for stability. If the data from any subject were judged unstable, all subjects received an additional five training sessions after which the data were reexamined and either the condition was terminated or an additional five sessions were conducted for a maximum of 30 sessions.
RESULTS
Figure 1 shows the mean response rate in each component of the chain during the last five sessions of the initial-link delay condition (hatched bars) and the corresponding rate in each component during baseline (solid bars). Baseline rates shown were obtained by averaging together the mean response rates during the last five sessions of the baseline conditions immediately preceding and following the initial-link delay condition. The data for the individual conditions are shown in the Appendix. As may be seen in Figure 1, initial-link response rates decreased during the delay condition for all 6 subjects with an average decrease of 77% from baseline levels. There was no systematic effect of the initial-link delay contingency on response rates in either the middle or terminal links.
[Figure 1: bar graphs in separate panels for S1 through S6; abscissa, component (initial, middle, terminal); ordinate, responses per minute.]
Fig. 1. Response rates (responses/minute) for each of 6 pigeons in each component of the chain during baseline (solid bars) and during the condition where the delay to stimulus change was imposed on initial-link responding (hatched bars). Ordinate scaling varies among subjects.
Session-by-session initial-link response rates for each of the 6 subjects are shown in Figure 2. For all subjects the delay contingency took effect quickly, with large response-rate decreases evident after only one to five sessions of exposure. Response rate then continued to decrease for the next 15 to 20 sessions and became generally stable during the last 10 sessions for 4 of the 6 subjects. For 2 subjects (S-4, S-6), however, response rates were still decreasing at the end of exposure to the delay condition. Recovery from the delay contingency was quite rapid during the return-to-baseline condition, as the level of responding from the preceding baseline was typically reached within five sessions following removal of the delay contingency.
[Figure 2: line graphs in separate panels for S-1 through S-6, with phases labeled baseline, delay, and baseline; abscissa, sessions; ordinate, initial-link response rate.]
Fig. 2. Initial-link response rates as a function of sessions during the initial-link delay condition and adjacent baseline conditions.
Figure 3 shows the mean response rate in each component of the chain during the last five sessions of the middle-link delay condition (hatched bars) and the corresponding rate in each component during baseline (solid bars). Again, baseline rates were averaged over the baseline conditions immediately before and after the middle-link delay condition. Middle-link response rates decreased during the delay condition for all subjects, with an average decrement of 59%. Terminal-link response rates decreased slightly for all subjects during the delay condition, and a slight initial-link response rate decline was also evident for 4 of the 6 birds.
[Figure 3: bar graphs in separate panels for S1 through S6; abscissa, component (initial, middle, terminal); ordinate, responses per minute.]
Fig. 3. Response rate (responses/minute) in each component of the chain during baseline (solid bars) and during the middle-link delay condition (hatched bars). Ordinate scaling varies from subject to subject.
Figure 4 shows the middle-link response rates during sessions of the middle-link delay condition and its adjacent baseline conditions. As in the initial-link delay condition, response rates dropped quickly for all 6 subjects, although the rate of decline was notably smaller for some subjects than was the rate of decline evident in Figure 2 for the initial-link delay condition. Because of the slower rate of decline, response rates after 25 sessions were still decreasing for 4 of the 6 subjects, suggesting that still lower response rates would have been obtained had the delay condition been continued further. Return to baseline after the delay condition produced an immediate increase in rate for all subjects. The rates of responding finally attained during the postdelay baseline were generally comparable to response rates during the predelay baseline.
[Figure 4: line graphs in separate panels for S-1 through S-6, with phases labeled baseline, delay, and baseline; abscissa, sessions; ordinate, middle-link response rate.]
Fig. 4. Middle-link response rates as a function of sessions during the middle-link delay condition and during adjacent baseline conditions.
Figure 5 shows the mean response rates in each component of the chain during the last five sessions of the terminal-link delay condition (hatched bars) and the mean response rate in each component during the last five sessions of the immediately preceding baseline condition (solid bars). A decline in terminal-link response rates during the terminal-link delay condition was evident for 4 subjects, whereas terminal-link response rates increased for the remaining 2 subjects. There was no systematic effect of the terminal-link delay contingency on response rates in either the initial or middle links.
[Figure 5: bar graphs in separate panels for S1 through S6; abscissa, component (initial, middle, terminal); ordinate, responses per minute.]
Fig. 5. Response rate (responses/minute) in each component of the chain during baseline (solid bars) and during the terminal-link delay condition. Ordinate scaling varies across subjects.
Session-by-session terminal-link response rates during the terminal-link delay condition and its preceding baseline are plotted in Figure 6. Response rates were considerably more variable than were those shown in Figures 2 and 4, both across sessions and across subjects. For 2 subjects (S-3, S-4), there was a regular decline across sessions, much like that seen in the earlier conditions. For 2 others (S-2, S-5), there was an increase in rate, which occurred during the first 5 to 10 sessions of exposure to the delay condition. Response rates for these 2 subjects then remained elevated for the duration of the delay condition. For the remaining 2 subjects (S-1, S-6), there was an overall decrease in rate, but performance was highly erratic across sessions, often changing by 30 to 50 responses per minute across successive sessions. Possible reasons for this variability will be considered in the Discussion.
[Figure 6: line graphs in separate panels for S-1 through S-6, with phases labeled baseline and delay; abscissa, sessions; ordinate, terminal-link response rate.]
Fig. 6. Session-by-session terminal-link response rates during the terminal-link delay condition and during the immediately preceding baseline condition.
Figure 7 summarizes the above findings by averaging response rates in each component across all 6 subjects for each of the three delay conditions and their corresponding baselines. The upper panel shows the rates during the initial-link delay condition; middle-link delay-condition rates are shown in the middle panel; and terminal-link delay-condition response rates are found in the lower panel. Evident in this figure is the specificity of the effect of the unsignaled delay contingency. Within each panel, response rates in the component followed by the delay decreased dramatically from baseline levels, whereas the rates in the remaining two components were largely unaffected by the delay contingency.
Recall that the unsignaled delay contingency specifies only the maximum delay between the last response in one component and the onset of the next component. Table 2 shows the average delays actually occurring in each of the three delay conditions, recorded as described above. Although there is some variability, in general the longest obtained delays occurred in the initial-link delay condition, and the shortest obtained delays were found in the terminal-link delay condition. An exception to this trend may be seen in the data for Subject 5 where the shortest obtained delay, 0.68 s, occurred in the initial-link delay condition, and the longest delay, 1.14 s, occurred in the middle-link delay condition. Note, however, that even this modest initial-link delay was nevertheless sufficient to produce a robust response rate decrease of 53 responses per minute.
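The general trend in obtained delays can be checked directly from the Table 2 values. The sketch below simply averages the tabled delays across subjects for each condition; the variable names are hypothetical and the calculation is an illustration, not an analysis reported by the authors.

```python
# Mean obtained delays (s) from Table 2, Subjects 1 through 6 in order.
obtained_delays = {
    "initial":  [2.52, 2.18, 2.29, 1.64, 0.68, 2.80],
    "middle":   [1.84, 1.51, 2.01, 2.53, 1.14, 1.95],
    "terminal": [1.09, 0.63, 2.11, 1.07, 0.73, 1.13],
}
SCHEDULED_DELAY_S = 3.0

# Average across the 6 subjects within each delay condition.
mean_delay = {
    condition: sum(values) / len(values)
    for condition, values in obtained_delays.items()
}

# Every obtained delay should fall below the scheduled 3-s maximum.
all_below_scheduled = all(
    value < SCHEDULED_DELAY_S
    for values in obtained_delays.values()
    for value in values
)

for condition, value in mean_delay.items():
    print(f"{condition:>8}: {value:.2f} s")
```

Averaged this way, the obtained delay is longest in the initial-link condition and shortest in the terminal-link condition, and every tabled value is below the scheduled 3 s.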
[Figure 7: bar graphs in three panels titled initial delay, middle delay, and terminal delay; abscissa, component (initial, middle, terminal); ordinate, responses per minute.]
Fig. 7. Response rate (responses/minute) in each component of the chain averaged across subjects for the initial-link delay condition and corresponding baseline (upper panel), middle-link delay condition and corresponding baseline (middle panel), and terminal-link delay condition and corresponding baseline (bottom panel). Ordinate scaling varies among panels.
Table 2
Mean obtained delays (in seconds).
                        Subject
Condition               1       2       3       4       5       6
Initial-link delay      2.52    2.18    2.29    1.64    0.68    2.80
Middle-link delay       1.84    1.51    2.01    2.53    1.14    1.95
Terminal-link delay     1.09    0.63    2.11    1.07    0.73    1.13
Table 3 shows the obtained interreinforcement intervals for each of the three delay conditions and the mean of the obtained interreinforcement intervals (IRIs) for the three baseline conditions. In general, the increase in obtained IRIs over baseline levels was greatest when the delay occurred after the initial link and smallest when the delay occurred after the terminal link. With the exception of the initial-link delay condition for Subject 6, however, in no instance did the obtained IRI during a delay condition exceed the baseline IRI by more than 7.5 s, and in six instances, the obtained IRIs during delay conditions were actually less than the baseline IRIs.
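The statements about obtained IRIs can be verified from the Table 3 values. The following sketch subtracts each subject's baseline IRI from the IRI obtained in each delay condition; the names are hypothetical, and differences are rounded to one decimal place to match the precision of the table.

```python
# Mean obtained IRIs (s) from Table 3, Subjects 1 through 6 in order.
baseline = [100.9, 101.0, 101.9, 103.1, 97.5, 101.8]
delay_iri = {
    "initial":  [108.4, 105.7, 102.6, 106.5, 104.9, 122.8],
    "middle":   [105.5, 100.4, 107.4, 103.8, 100.3, 105.1],
    "terminal": [98.5, 96.6, 98.8, 102.3, 100.3, 94.1],
}

# Change from baseline for each condition and subject (delay minus baseline).
diff = {
    condition: [round(d - b, 1) for d, b in zip(values, baseline)]
    for condition, values in delay_iri.items()
}

# (condition, subject) pairs whose IRI exceeded baseline by more than 7.5 s.
over_limit = [
    (condition, subject + 1)
    for condition, changes in diff.items()
    for subject, change in enumerate(changes)
    if change > 7.5
]

# Number of instances in which the delay-condition IRI fell below baseline.
n_below = sum(change < 0 for changes in diff.values() for change in changes)

# Mean change across subjects for each delay condition.
mean_change = {c: sum(changes) / len(changes) for c, changes in diff.items()}
```

With the tabled values, only the initial-link delay condition for Subject 6 exceeds the 7.5-s margin, six delay-condition IRIs fall below baseline, and the mean change is largest for the initial-link delay and smallest for the terminal-link delay.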
DISCUSSION The present data demonstrate that behavior
in the initial and middle links of a three-link chain schedule is maintained by the contin- gency between responding and access to the succeeding stimulus of the chain. Interposing a brief delay between responding and access to the succeeding stimulus produced major decrements in response rate. Such effects can- not be explained by increases in the temporal distance to primary reinforcement signaled by the different stimuli correlated with the dif- ferent links, because the delay procedure had no effect on the overall time to reinforcement. Slight increases in the obtained interreinforce- ment interval did occur, but these were clearly due to the delay procedure having produced large decreases in response rates, including substantial periods of no responding. The magnitude of the present effects thus suggests that a major determinant of responding in chain schedules is the conditioned reinforcement value of the succeeding stimulus of the chain. The large and consistent conditioned rein-
forcement effects seen in the present study raise the question of why comparable effects have not been found in the past. For example, in the study by Catania et al. (1980), which demonstrated higher response rates in a chain FI FI FI than in a corresponding multiple EXT EXT FI, the effects of conditioned re- inforcement were evident only in the middle
INITIAL DELAY
120
100
so
60
40
20
0
51
INTIAL MIDDLE TEIINAL
MIDDLE DELAY
2001 180
160
140
120 100
0
60
40' 20
0'
w I.-
z
:
cn w 0e
Cl) w C,) z 0 a- Cf)
cr- MTIAL MOMD0E TEIINL
120
100
80
60
40
20
0
TERMINAL DELAY
NTIM
PAUL ROYALTY, BEN A. WILLIAMS, and EDMUND FANTINO
Table 3
Mean obtained interreinforcement intervals (in seconds). Baseline IRIs shown are the means of the three baseline conditions.
Subject
Condition 1 2 3 4 5 6
Baseline 100.9 101.0 101.9 103.1 97.5 101.8 Initial delay 108.4 105.7 102.6 106.5 104.9 122.8 Middle delay 105.5 100.4 107.4 103.8 100.3 105.1 Terminal delay 98.5 96.6 98.8 102.3 100.3 94.1
link of the chain (there was no difference in response rates in the initial link) and were small in magnitude (although consistent across subjects). The smaller differences reported in that study are puzzling, because in the present study the contingency between responding and component transition was merely degraded by the addition of the unsignaled delay, whereas in the multiple-schedule condition of the study by Catania et al., the contingency between component transition and responding was completely absent. The most likely explana- tion of this difference in the magnitude of the respective findings was the use of very brief training periods (either four or eight sessions per condition) in the study of Catania et al. Several of their subjects' responding did not show discriminative control during the four or eight sessions per condition with the new schedule contingencies: Responding at sub- stantial rates occurred during the second EXT component of their multiple EXT EXT FI condition. Had training continued until re- sponding during that component ceased (as it did for some of their subjects), a larger differ- ence between their chain and multiple sched- ules might have been apparent. The present results are also in ostensible
conflict with studies comparing chain and tandem schedules; those studies have reported higher response rates during the initial links of the tandem (see introduction). This finding has been interpreted as evidence that the discriminative properties of the initial-link stimuli, rather than conditioned reinforcement, are the major determinant of responding in the initial links of the chain. Most prior studies have used chain schedules consisting of FI components. Given that the subjects discriminate the temporal properties of the FI schedule, the onset of the stimulus correlated with the next link of the chain should be a cue for nonreinforcement with respect to the
next conditioned reinforcer, just as it appears that the onset of a stimulus correlated with a simple FI schedule is a cue for nonreinforcement with respect to the primary reinforcer (see Schneider, 1969). Thus, such a stimulus event should not be an effective conditioned reinforcer, so behavior leading to that stimulus should, not surprisingly, be weakly maintained. The result is that there should be less responding in the initial link of a chain FI FI FI schedule than during the initial link of a corresponding tandem schedule. The present analysis suggests that chain VI VI VI schedules (and the corresponding tandem) would produce a different outcome: little difference in the initial-link response rates. Support for this speculation comes from a comparison of the data of Duncan and Fantino (1972) with those of Schneider (1972) obtained from studies of choice between single-stimuli and chain schedules (in the terminal links of concurrent-chains schedules). In Duncan and Fantino's study, choice was assessed between simple FI 2x and chain FI x FI x schedules. Pigeons showed a dramatic preference for the simple FI. Schneider, however, assessed choice between tandem and chain VI VI schedules, and found indifference. Moreover, there is no evidence from Schneider's study (see his Table 1, page 49) that response rates in the initial link of the tandem VI x VI y schedule differed from those in the initial link of the chain VI x VI y schedule. (For a more complete discussion of these findings, see Duncan & Fantino, 1972, and Fantino, 1977.) It should be acknowledged, however, that although the comparison of the data of Schneider with those of Duncan and Fantino (1972) supports the present emphasis on conditioned reinforcement, other results from studies of segmentation of reinforcement intervals show strong discriminative effects even with VI schedules (e.g., Fantino
& Duncan, 1972; Leung & Winton, 1985; Moore, 1982).
A more fundamental problem with the
comparison of chain and tandem schedules as evidence for the role of conditioned reinforcement is that such a comparison depends critically on implicit assumptions about the tradeoff between reinforcement rate and reinforcement type (primary vs. conditioned). As is well known, the function relating response rate to reinforcement rate is hyperbolic in shape (Catania & Reynolds, 1968; Herrnstein, 1970). This is important because to the extent that subjects trained with a tandem schedule fail to discriminate elapsed time as a result of the contingencies for component transitions, their schedule effectively becomes a simple interval schedule equal to the sum of the times required to complete the tandem requirements. Thus, the tandem-chain comparison becomes equivalent to one between a simple interval schedule of primary reinforcement and a second, shorter interval schedule of conditioned reinforcement. With a chain FI 30 FI 30 FI 30 versus tandem FI 30 FI 30 FI 30, for example, the comparison would be between a 30-s interval schedule of conditioned reinforcement and a 90-s interval schedule of primary reinforcement. The problem is that we do not know beforehand what the relative strengths of behavior maintained by each of the two schedules should be, even assuming that conditioned reinforcement is a potent determinant of behavior. Given a hyperbolic function relating response rate to reinforcement rate, which implies that there are large regions of the function in which different frequencies of reinforcement produce very similar response rates, a VI 90-s schedule of primary reinforcement may produce response rates similar to those of a VI 30-s schedule of primary reinforcement.
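As a rough numerical illustration of this point, the sketch below evaluates a hyperbola of the form R = k r / (r + r_e) (Herrnstein, 1970) for VI 30-s and VI 90-s schedules. The parameter values (k = 100 responses per minute, r_e = 10 reinforcers per hour) and the function names are hypothetical, chosen only for illustration; they are not estimates from the present data.

```python
def reinforcers_per_hour(mean_interval_s):
    """Programmed reinforcement rate of an interval schedule."""
    return 3600.0 / mean_interval_s


def hyperbolic_response_rate(r, k=100.0, r_e=10.0):
    """Hyperbola R = k * r / (r + r_e).

    k (asymptotic responses/min) and r_e (reinforcers/hr) are
    hypothetical values assumed for this illustration only.
    """
    return k * r / (r + r_e)


# VI 30 s -> 120 reinforcers/hr; VI 90 s -> 40 reinforcers/hr
rate_vi30 = hyperbolic_response_rate(reinforcers_per_hour(30))
rate_vi90 = hyperbolic_response_rate(reinforcers_per_hour(90))

print(f"VI 30-s: {rate_vi30:.1f} responses/min")
print(f"VI 90-s: {rate_vi90:.1f} responses/min")
print(f"ratio VI 90 / VI 30: {rate_vi90 / rate_vi30:.2f}")
```

With these assumed parameters, a threefold difference in reinforcement rate yields response rates that differ by less than 15%.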
Such a possibility would preclude the use of the chain-tandem comparison as a meaningful index of conditioned reinforcement because presumably a VI 30-s schedule of primary reinforcement would produce higher response rates than would a VI 30-s schedule of conditioned reinforcement. More generally, until the separate functions relating response rate to the frequency of primary versus conditioned reinforcement are specified, the comparison between chain and tandem schedules as an index of conditioned reinforcement can mean very little.
The present results are also in apparent
conflict with findings from studies of FI schedules in which subjects could produce brief presentations of "clock" stimuli correlated with successive segments of the FI. Such schedules are similar to chain schedules in that there is a contingency between responding and access to stimuli correlated with temporal distances from primary reinforcement. Despite the similarity, however, such studies (Auge, 1977; Kendall, 1972) have suggested that stimuli that accompany the initial and middle segments of the interval are not conditioned reinforcers and may in fact be conditioned punishers. For example, Auge presented pigeons an FI 32-s schedule divided into three segments cued by individual stimuli. In addition to the constant stimulus in each segment, responses during a given segment could produce brief presentations of the stimuli correlated with other segments. Thus, presentations of the middle stimulus were arranged contingent on responding during both the initial and terminal segments of the FI. Whereas the present data suggest that the middle stimulus should have been a conditioned reinforcer for responding in the initial segment and thus should have enhanced responding during the initial segment, little change was observed and responding continued at a near-zero rate.
Although such results seem in conflict with the present findings, a closer analysis suggests that the conflict is only superficial. It is uncertain, for example, whether "brief" stimulus presentations are functionally similar to the ostensibly same stimulus when presented for longer durations. Similarly, presentations of brief stimuli not only add a new stimulus to the situation, they also change the existing stimulus situation. Thus, it is possible that the lack of an apparent conditioned-reinforcement effect was due to these complexities. The present procedure did not contain these possible disruptive influences, and the resulting large effects of the delay contingency suggest strongly that the middle stimuli of a chain (and possibly of a segmented FI) have strong conditioned reinforcing properties for responding in the initial link.
An aspect of the present results that bears
comment is the smaller and less consistent effect of the delay contingency when it was imposed between responding in the terminal link and access to food (see Figures 5 and 6). Only 4 of the 6 subjects revealed an effect of the delay contingency, whereas all subjects had
shown reliable effects when the delay had been presented in either the initial or middle links of the chain. Such a result seems surprising because the terminal link of the chain was most similar to a simple VI food schedule, which previous studies have used to demonstrate strong effects of the unsignaled delay-of-reinforcement contingency. Two possible reasons for the discrepancy can be suggested. The first considers the role of elicited behavior. If both the response contingency and the occurrence of conditioned reinforcement are ignored, a three-link chain schedule is comparable to a multiple EXT EXT VT schedule, which often has been used to study autoshaped pecking. This suggests that at least part of the behavior maintained in the terminal link of the chain schedule in the present study was controlled by the Pavlovian signal properties of the terminal-link stimulus. Consequently, interposing the delay between responding and food during the terminal link should be expected to have less effect than during a comparable simple VI schedule, because in the latter the Pavlovian contingency is absent. The relatively weak delay effects seen during the terminal link may thus reflect the relative influence of Pavlovian versus operant contingencies in controlling behavior in that link. Control of behavior in the terminal link by Pavlovian contingencies would also help to explain the greater variability (noted above) in response rates in the terminal-link delay condition than in the two previous delay conditions, as behavior maintained by stimulus-reinforcer contingencies is frequently more variable than behavior maintained by response-reinforcer contingencies (e.g., Williams, 1976a).
The second possibility is that the discrepancy between the present results and those from previous studies of the unsignaled delay of reinforcement in simple VI schedules is more apparent than real. Sizemore and Lattal (1978) plotted functions relating percentage change in response rate to obtained rather than programmed unsignaled delays. The present results are quite comparable to these functions from simple VI schedules. For example, the 3 subjects in the current study whose obtained delays in the terminal-link condition were approximately 1 s (S-1, S-4, and S-6) showed response-rate decrements relative to baseline of 38%, 68%, and 46%, respectively, which
compare favorably with the decrements of 30%, 35%, and 55% observed by Sizemore and Lattal at obtained delays of 1 s (percentages estimated from Figure 2 of Sizemore & Lattal, 1978). Likewise, the decrement of 69% in the response rate of Subject S-3 in the current study with an obtained terminal-link delay of 2.1 s compares well with the 45%, 60%, and 65% decrements observed by Sizemore and Lattal at a 2-s obtained delay. Indeed, even the seemingly paradoxical response-rate increases of Subjects S-2 and S-5 in the present study (obtained delays of 0.6 and 0.7 s) are compatible with the response-rate increases observed by Sizemore and Lattal and by Lattal and Ziegler (1982) at their shortest (0 to 0.5-s) delay values. Such response-rate increases with very short delays are apparently due to the shaping of response bursts (Lattal & Ziegler, 1982).
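The comparison in the preceding paragraph can be tabulated as a simple sketch. The code only collects the percentage decrements quoted above, grouped by approximate obtained delay, and reports the range and mean for each source. Grouping Subject S-3's 2.1-s delay with the 2-s values is an assumption made here for convenience, and the Sizemore and Lattal values are the estimates read from their Figure 2 as given in the text, not raw data.

```python
from statistics import mean

# Percentage decrements in response rate quoted in the text,
# keyed by approximate obtained delay in seconds.
present_study = {1.0: [38, 68, 46],   # S-1, S-4, S-6
                 2.0: [69]}           # S-3 (obtained delay 2.1 s; grouped by assumption)
sizemore_lattal = {1.0: [30, 35, 55],
                   2.0: [45, 60, 65]}


def summarize(decrements):
    """Return the minimum, maximum, and mean of a list of decrements."""
    return {"min": min(decrements),
            "max": max(decrements),
            "mean": mean(decrements)}


summary = {
    delay: {"present": summarize(present_study[delay]),
            "sizemore_lattal": summarize(sizemore_lattal[delay])}
    for delay in present_study
}

for delay, rows in summary.items():
    for source, s in rows.items():
        print(f"{delay:.0f}-s delay, {source}: "
              f"{s['min']}-{s['max']}% (mean {s['mean']:.1f}%)")
```

The summary is descriptive only; with so few values per delay it is not a statistical test of the similarity between the two data sets.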
Finally, the present study has implications for the level of analysis appropriate for reinforcement schedules in general. Rejection of the concept of conditioned reinforcement has occurred partly because of its alliance with a molecular analysis, wherein behavior is assumed to be determined by the cumulation of the individual effects of temporally contiguous events. In contrast, molar levels of analysis have invoked concepts such as stimulus-reinforcer contingency, response-reinforcer correlations, and the like. Staddon's (1983) account provides an example of this approach with respect to chain schedules, insofar as he suggested, in lieu of conditioned reinforcement, that the variable controlling response rate is the "relative temporal proximity" to food. Such a molar notion cannot easily explain the large effects of the brief delay contingencies used in the present study, for in this case the molar temporal properties of the various discriminative stimuli were left intact. Whether a similar molecular analysis will be successful in other schedule situations remains to be seen, but the present findings provide strong encouragement for such an attempt.
REFERENCES
Auge, R. J. (1977). Stimulus functions within a fixed-interval clock schedule: Reinforcement, punishment, and discriminative stimulus control. Animal Learning & Behavior, 5, 117-123.
Catania, A. C., & Keller, K. J. (1981). Contingency, contiguity, correlation, and the concept of causation. In P. Harzem & M. D. Zeiler (Eds.), Advances in analysis of behaviour: Vol. 2. Predictability, correlation, and contiguity (pp. 125-167). Chichester, England: Wiley.
Catania, A. C., & Reynolds, G. S. (1968). A quantitative analysis of the responding maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 11, 327-383.
Catania, A. C., Yohalem, R., & Silverman, P. J. (1980). Contingency and stimulus change in chained schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 33, 213-219.
Dinsmoor, J. A., & Clayton, M. H. (1966). A conditioned reinforcer maintained by temporal association with the termination of shock. Journal of the Experimental Analysis of Behavior, 9, 547-552.
Duncan, B., & Fantino, E. (1972). The psychological distance to reward. Journal of the Experimental Analysis of Behavior, 18, 23-34.
Fantino, E. (1977). Conditioned reinforcement: Choice and information. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 313-339). Englewood Cliffs, NJ: Prentice-Hall.
Fantino, E., & Duncan, B. (1972). Some effects of interreinforcement time upon choice. Journal of the Experimental Analysis of Behavior, 17, 3-14.
Fleshler, M., & Hoffman, H. S. (1962). A progression for generating variable-interval schedules. Journal of the Experimental Analysis of Behavior, 5, 529-530.
Gollub, L. R. (1958). The chaining of fixed-interval schedules. Unpublished doctoral dissertation, Harvard University, Cambridge, MA.
Gollub, L. R. (1977). Conditioned reinforcement: Schedule effects. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 288-312). Englewood Cliffs, NJ: Prentice-Hall.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266.
Jwaideh, A. R. (1973). Responding under chained and tandem fixed-ratio schedules. Journal of the Experimental Analysis of Behavior, 19, 259-267.
Kelleher, R. T., & Fry, W. T. (1962). Stimulus functions in chained fixed-interval schedules. Journal of the Experimental Analysis of Behavior, 5, 167-173.
Kendall, S. B. (1972). Some effects of response-dependent clock stimuli in a fixed-interval schedule. Journal of the Experimental Analysis of Behavior, 17, 161-168.
Lattal, K. A., & Ziegler, D. R. (1982). Briefly delayed reinforcement: An interresponse time analysis. Journal of the Experimental Analysis of Behavior, 37, 407-416.
Leung, J. P., & Winton, A. S. W. (1985). Preference for unsegmented interreinforcement intervals in concurrent chains. Journal of the Experimental Analysis of Behavior, 44, 89-101.
Malagodi, E. F., DeWeese, J., & Johnston, J. M. (1973). Second-order schedules: A comparison of chained, brief-stimulus, and tandem procedures. Journal of the Experimental Analysis of Behavior, 20, 447-460.
Moore, J. (1982). Choice and segmented interreinforcement intervals. Journal of the Experimental Analysis of Behavior, 38, 133-141.
Schneider, B. A. (1969). A two-state analysis of fixed-interval responding in the pigeon. Journal of the Experimental Analysis of Behavior, 12, 677-687.
Schneider, J. W. (1972). Choice between two-component chained and tandem schedules. Journal of the Experimental Analysis of Behavior, 18, 45-60.
Sizemore, O. J., & Lattal, K. A. (1977). Dependency, temporal contiguity, and response-independent reinforcement. Journal of the Experimental Analysis of Behavior, 27, 119-125.
Sizemore, O. J., & Lattal, K. A. (1978). Unsignalled delay of reinforcement in variable-interval schedules. Journal of the Experimental Analysis of Behavior, 30, 169-175.
Staddon, J. E. R. (1983). Adaptive learning and behavior. Cambridge: Cambridge University Press.
Thomas, J. R. (1964). Multiple baseline investigations of stimulus functions in an FR chained schedule. Journal of the Experimental Analysis of Behavior, 7, 241-245.
Thomas, J. R. (1967). Chained and tandem fixed-interval schedule performance and frequency of primary reinforcement. Psychological Reports, 20, 471-480.
Wallace, F., Osborne, S., & Fantino, E. (1982). Conditioned reinforcement in two-link chain schedules. Behaviour Analysis Letters, 2, 335-344.
Williams, B. A. (1976a). Elicited responding to signals for reinforcement: The effects of overall versus local changes in reinforcement probability. Journal of the Experimental Analysis of Behavior, 26, 213-220.
Williams, B. A. (1976b). The effects of unsignalled delayed reinforcement. Journal of the Experimental Analysis of Behavior, 26, 441-449.
Received November 7, 1985 Final acceptance October 13, 1986
APPENDIX
Response rate (responses/minute) in each component for all conditions of the experiment.
Subject
Condition Link 1 2 3 4 5 6
Baseline Initial Intermediate Terminal
Initial delay Initial Intermediate Terminal
Baseline Initial Intermediate Terminal
Middle delay Initial Intermediate Terminal
Baseline Initial Intermediate Terminal
Terminal delay Initial Intermediate Terminal
39.12 63.08 43.45 52.47 110.68 60.75
123.43 73.65 52.80 7.81 12.78 7.58
46.64 124.50 51.04 137.58 75.66 43.93 35.95 57.96 37.03 48.00 140.49 62.37 131.63 77.39 48.27 35.57 54.89 34.25 28.04 70.01 11.41
140.57 70.11 29.80 30.32 62.53 38.17 48.72 110.68 58.32
151.63 84.49 50.03 25.11 68.25 39.42 49.40 121.19 49.76 93.34 101.70 15.72
17.56 81.73 72.36 5.53
70.50 68.92 19.62 58.93 72.93 19.23 27.39 57.91 15.86 44.52 66.66 15.91 72.46 51.23
76.52 62.87 132.14 186.88 159.43 141.73 32.67 4.24
151.81 201.79 180.75 193.10 94.84 52.05
161.06 195.31 191.98 178.95 71.83 41.74 78.02 51.80
139.77 178.27 101.07 51.64 133.87 184.52 101.15 203.89 83.26 44.01
126.12 139.34 147.39 109.19