
3 The Surgical Team

These studies revealed large individual differences between high and low performers, often by an order of magnitude.

SACKMAN, ERIKSON, AND GRANT


At computer society meetings one continually hears young programming managers assert that they favor a small, sharp team of first-class people, rather than a project with hundreds of programmers, and those by implication mediocre. So do we all.

But this naive statement of the alternatives avoids the hard problem: how does one build large systems on a meaningful schedule? Let us look at each side of this question in more detail.

The Problem

Programming managers have long recognized wide productivity variations between good programmers and poor ones. But the actual measured magnitudes have astounded all of us. In one of their studies, Sackman, Erikson, and Grant were measuring performances of a group of experienced programmers. Within just this group the ratios between best and worst performances averaged about 10:1 on productivity measurements and an amazing 5:1 on program speed and space measurements! In short, the $20,000/year programmer may well be 10 times as productive as the $10,000/year one. The converse may be true, too. The data showed no correlation whatsoever between experience and performance. (I doubt if that is universally true.)

I have earlier argued that the sheer number of minds to be coordinated affects the cost of the effort, for a major part of the cost is communication and correcting the ill effects of miscommunication (system debugging). This, too, suggests that one wants the system to be built by as few minds as possible. Indeed, most experience with large programming systems shows that the brute-force approach is costly, slow, inefficient, and produces systems that are not conceptually integrated. OS/360, Exec 8, Scope 6600, Multics, TSS, SAGE, etc.: the list goes on and on.

The conclusion is simple: if a 200-man project has 25 managers who are the most competent and experienced programmers, fire the 175 troops and put the managers back to programming.

Now let's examine this solution. On the one hand, it fails to approach the ideal of the small, sharp team, which by common consensus shouldn't exceed 10 people. It is so large that it will need to have at least two levels of management, or about five managers. It will additionally need support in finance, personnel, space, secretaries, and machine operators.

On the other hand, the original 200-man team was not large enough to build the really large systems by brute-force methods. Consider OS/360, for example. At the peak over 1000 people were working on it: programmers, writers, machine operators, clerks, secretaries, managers, support groups, and so on. From 1963 through 1966 probably 5000 man-years went into its design, construction, and documentation. Our postulated 200-man team would have taken 25 years to have brought the product to its present stage, if men and months traded evenly!

This then is the problem with the small, sharp team concept: it is too slow for really big systems. Consider the OS/360 job as it might be tackled with a small, sharp team. Postulate a 10-man team. As a bound, let them be seven times as productive as mediocre programmers in both programming and documentation, because they are sharp. Assume OS/360 was built only by mediocre programmers (which is far from the truth). As a bound, assume that another productivity improvement factor of seven comes from reduced communication on the part of the smaller team. Assume the same team stays on the entire job. Well, 5000/(10 × 7 × 7) ≈ 10; they can do the 5000 man-year job in 10 years. Will the product be interesting 10 years after its initial design? Or will it have been made obsolete by the rapidly developing software technology?
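The two schedule estimates above are simple divisions and can be checked with a short sketch. The function name and its parameters are illustrative assumptions, not anything from the text. The figures (5000 man-years, a 200-man team, a 10-man team with two productivity factors of seven) are the ones given above, and the sketch makes the same assumption that men and months trade evenly.

```python
def years_to_finish(total_man_years, team_size, productivity_factor=1.0):
    """Calendar years needed if effort divides evenly across the team.

    Hypothetical helper: it ignores training, communication overhead,
    and sequential constraints, exactly as the bounding argument does.
    """
    return total_man_years / (team_size * productivity_factor)


OS360_MAN_YEARS = 5000  # estimate quoted in the text for 1963 through 1966

# Brute-force team of 200 average programmers.
brute_force = years_to_finish(OS360_MAN_YEARS, team_size=200)

# Small, sharp team of 10: 7x from individual skill, 7x from reduced communication.
small_sharp = years_to_finish(OS360_MAN_YEARS, team_size=10,
                              productivity_factor=7 * 7)

print(f"200-man team:      {brute_force:.1f} years")
print(f"10-man sharp team: {small_sharp:.1f} years")
```

Even with both generous factors of seven, the small team needs roughly a decade, which is the point of the dilemma.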

The dilemma is a cruel one. For efficiency and conceptual integrity, one prefers a few good minds doing design and construction. Yet for large systems one wants a way to bring considerable manpower to bear, so that the product can make a timely appearance. How can these two needs be reconciled?


Mills's Proposal

A proposal by Harlan Mills offers a fresh and creative solution. Mills proposes that each segment of a large job be tackled by a team, but that the team be organized like a surgical team rather than a hog-butchering team. That is, instead of each member cutting away on the problem, one does the cutting and the others give him every support that will enhance his effectiveness and productivity.

A little thought shows that this concept meets the desiderata, if it can be made to work. Few minds are involved in design and construction, yet many hands are brought to bear. Can it work? Who are the anesthesiologists and nurses on a programming team, and how is the work divided? Let me freely mix metaphors to suggest how such a team might work if enlarged to include all conceivable support.

The surgeon. Mills calls him a chief programmer. He personally defines the functional and performance specifications, designs the program, codes it, tests it, and writes its documentation. He writes in a structured programming language such as PL/I, and has effective access to a computing system which not only runs his tests but also stores the various versions of his programs, allows easy file updating, and provides text editing for his documentation. He needs great talent, ten years' experience, and considerable systems and application knowledge, whether in applied mathematics, business data handling, or whatever.

The copilot. He is the alter ego of the surgeon, able to do any part of the job, but is less experienced. His main function is to share in the design as a thinker, discussant, and evaluator. The surgeon tries ideas on him, but is not bound by his advice. The copilot often represents his team in discussions of function and interface with other teams. He knows all the code intimately. He researches alternative design strategies. He obviously serves as insurance against disaster to the surgeon. He may even write code, but he is not responsible for any part of the code.

The administrator. The surgeon is boss, and he must have the last word on personnel, raises, space, and so on, but he must spend almost none of his time on these matters. Thus he needs a professional administrator who handles money, people, space, and machines, and who interfaces with the administrative machinery of the rest of the organization. Baker suggests that the administrator has a full-time job only if the project has substantial legal, contractual, reporting, or financial requirements because of the user-producer relationship. Otherwise, one administrator can serve two teams.

The editor. The surgeon is responsible for generating the documentation; for maximum clarity he must write it. This is true of both external and internal descriptions. The editor, however, takes the draft or dictated manuscript produced by the surgeon and criticizes it, reworks it, provides it with references and bibliography, nurses it through several versions, and oversees the mechanics of production.

Two secretaries. The administrator and the editor will each need a secretary; the administrator's secretary will handle project correspondence and non-product files.

The program clerk. He is responsible for maintaining all the technical records of the team in a programming-product library. The clerk is trained as a secretary and has responsibility for both machine-readable and human-readable files.

All computer input goes to the clerk, who logs and keys it if required. The output listings go back to him to be filed and indexed. The most recent runs of any model are kept in a status notebook; all previous ones are filed in a chronological archive.

Absolutely vital to Mills's concept is the transformation of programming "from private art to public practice" by making all the computer runs visible to all team members and identifying all programs and data as team property, not private property.

The specialized function of the program clerk relieves programmers of clerical chores, systematizes and ensures proper performance of those oft-neglected chores, and enhances the team's most valuable asset, its work-product.

Clearly the concept as set forth above assumes batch runs. When interactive terminals are used, particularly those with no hard-copy output, the program clerk's functions do not diminish, but they change. Now he logs all updates of team program copies from private working copies, still handles all batch runs, and uses his own interactive facility to control the integrity and availability of the growing product.

The toolsmith. File-editing, text-editing, and interactive debugging services are now readily available, so that a team will rarely need its own machine and machine-operating crew. But these services must be available with unquestionably satisfactory response and reliability; and the surgeon must be sole judge of the adequacy of the service available to him. He needs a toolsmith, responsible for ensuring this adequacy of the basic service and for constructing, maintaining, and upgrading special tools (mostly interactive computer services) needed by his team. Each team will need its own toolsmith, regardless of the excellence and reliability of any centrally provided service, for his job is to see to the tools needed or wanted by his surgeon, without regard to any other team's needs. The tool-builder will often construct specialized utilities, catalogued procedures, and macro libraries.

The tester. The surgeon will need a bank of suitable test cases for testing pieces of his work as he writes it, and then for testing the whole thing. The tester is therefore both an adversary who devises system test cases from the functional specs, and an assistant who devises test data for the day-by-day debugging. He would also plan testing sequences and set up the scaffolding required for component tests.

The language lawyer. By the time Algol came along, people began to recognize that most computer installations have one or two people who delight in mastery of the intricacies of a programming language. And these experts turn out to be very useful and very widely consulted. The talent here is rather different from that of the surgeon, who is primarily a system designer and who thinks representations. The language lawyer can find a neat and efficient way to use the language to do difficult, obscure, or tricky things. Often he will need to do small studies (two or three days) on good technique. One language lawyer can service two or three surgeons.

This, then, is how 10 people might contribute in well-differentiated and specialized roles on a programming team built on the surgical model.

How It Works

The team just defined meets the desiderata in several ways. Ten people, seven of them professionals, are at work on the problem, but the system is the product of one mind—or at most two, acting uno animo.

Notice in particular the differences between a team of two programmers conventionally organized and the surgeon-copilot team. First, in the conventional team the partners divide the work, and each is responsible for design and implementation of part of the work. In the surgical team, the surgeon and copilot are each cognizant of all of the design and all of the code. This saves the labor of allocating space, disk accesses, etc. It also ensures the conceptual integrity of the work.

Second, in the conventional team the partners are equal, and the inevitable differences of judgment must be talked out or compromised. Since the work and resources are divided, the differences in judgment are confined to overall strategy and interfacing, but they are compounded by differences of interest, e.g., whose space will be used for a buffer. In the surgical team, there are no differences of interest, and differences of judgment are settled by the surgeon unilaterally. These two differences (lack of division of the problem and the superior-subordinate relationship) make it possible for the surgical team to act uno animo.

Yet the specialization of function of the remainder of the team is the key to its efficiency, for it permits a radically simpler communication pattern among the members, as Fig. 3.1 shows.

Fig. 3.1 Communication patterns in 10-man programming teams
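As a rough illustration of why specialization simplifies communication, one can count the links in two idealized patterns. In a conventional team where any member may need to talk to any other, there are n(n-1)/2 pairwise links. In a team where each member has a single working relationship leading toward the surgeon (the secretaries through the administrator and the editor), the links form a tree with n-1 edges. Both models are simplifying assumptions made here for illustration, and the function names are hypothetical; the figure itself is not reproduced in this text.

```python
def pairwise_links(n):
    """Links when every member may communicate with every other member."""
    return n * (n - 1) // 2


def tree_links(n):
    """Links when each member except the surgeon has exactly one
    working relationship leading toward the surgeon (a tree)."""
    return n - 1


team_size = 10
conventional = pairwise_links(team_size)
surgical = tree_links(team_size)

print(f"conventional team: {conventional} links")
print(f"surgical team:     {surgical} links")
```

For a 10-person team this is 45 potential links against 9, which gives a sense of scale for the "radically simpler" pattern, under the stated assumptions.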

Baker's article reports on a single, small-scale test of the team concept. It worked as predicted for that case, with phenomenally good results.

Scaling Up

So far, so good. The problem, however, is how to build things that today take 5000 man-years, not things that take 20 or 30. A 10-man team can be effective no matter how it is organized, if the whole job is within its purview. But how is the surgical team concept to be used on large jobs when several hundred people are brought to bear on the task?

The success of the scaling-up process depends upon the fact that the conceptual integrity of each piece has been radically improved: the number of minds determining the design has been divided by seven. So it is possible to put 200 people on a problem and face the problem of coordinating only 20 minds, those of the surgeons.
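The size of this reduction can be sketched numerically. The sketch assumes, as a simplification not stated in the text, that coordination effort grows with the number of pairs of minds that must agree; the helper name is hypothetical. The figures of 200 people and 10-person teams come from the passage above.

```python
def coordination_pairs(minds):
    """Number of distinct pairs among the minds that must be coordinated."""
    return minds * (minds - 1) // 2


people = 200
team_size = 10                     # one surgical team
surgeons = people // team_size     # one design-determining mind per team

flat_pairs = coordination_pairs(people)
surgeon_pairs = coordination_pairs(surgeons)

print(f"surgeons to coordinate: {surgeons}")
print(f"pairs among all {people} people: {flat_pairs}")
print(f"pairs among the surgeons:   {surgeon_pairs}")
```

Under this assumption the coordination problem shrinks from 19,900 pairs to 190, although the text is clear that coordinating even 20 surgeons still needs separate techniques.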

For that coordination problem, however, separate techniques must be used, and these are discussed in succeeding chapters. Let it suffice here to say that the entire system also must have conceptual integrity, and that requires a system architect to design it all, from the top down. To make that job manageable, a sharp distinction must be made between architecture and implementation, and the system architect must confine himself scrupulously to architecture. However, such roles and techniques have been shown to be feasible and, indeed, very productive.

4 Aristocracy, Democracy, and System Design

This great church is an incomparable work of art. There is neither aridity nor confusion in the tenets it sets forth. . . . It is the zenith of a style, the work of artists who had understood and assimilated all their predecessors' successes, in complete possession of the techniques of their times, but using them without indiscreet display nor gratuitous feats of skill.

It was Jean d'Orbais who undoubtedly conceived the general plan of the building, a plan which was respected, at least in its essential elements, by his successors. This is one of the reasons for the extreme coherence and unity of the edifice.

REIMS CATHEDRAL GUIDEBOOK


Conceptual Integrity

Most European cathedrals show differences in plan or architectural style between parts built in different generations by different builders. The later builders were tempted to "improve" upon the designs of the earlier ones, to reflect both changes in fashion and differences in individual taste. So the peaceful Norman transept abuts and contradicts the soaring Gothic nave, and the result proclaims the pridefulness of the builders as much as the glory of God.

Against these, the architectural unity of Reims stands in glorious contrast. The joy that stirs the beholder comes as much from the integrity of the design as from any particular excellences. As the guidebook tells, this integrity was achieved by the self-abnegation of eight generations of builders, each of whom sacrificed some of his ideas so that the whole might be of pure design. The result proclaims not only the glory of God, but also His power to salvage fallen men from their pride.

Even though they have not taken centuries to build, most programming systems reflect conceptual disunity far worse than that of cathedrals. Usually this arises not from a serial succession of master designers, but from the separation of design into many tasks done by many men.

I will contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas. In this chapter and the next two, we will examine the consequences of this theme for programming system design:

• How is conceptual integrity to be achieved?

• Does not this argument imply an elite, or aristocracy of architects, and a horde of plebeian implementers whose creative talents and ideas are suppressed?

• How does one keep the architects from drifting off into the blue with unimplementable or costly specifications?

• How does one ensure that every trifling detail of an architectural specification gets communicated to the implementer, properly understood by him, and accurately incorporated into the product?

Achieving Conceptual Integrity

The purpose of a programming system is to make a computer easy to use. To do this, it furnishes languages and various facilities that are in fact programs invoked and controlled by language features. But these facilities are bought at a price: the external description of a programming system is ten to twenty times as large as the external description of the computer system itself. The user finds it far easier to specify any particular function, but there are far more to choose from, and far more options and formats to remember.

Ease of use is enhanced only if the time gained in functional specification exceeds the time lost in learning, remembering, and searching manuals. With modern programming systems this gain does exceed the cost, but in recent years the ratio of gain to cost seems to have fallen as more and more complex functions have been added. I am haunted by the memory of the ease of use of the IBM 650, even without an assembler or any other software at all.

Because ease of use is the purpose, this ratio of function to conceptual complexity is the ultimate test of system design. Neither function alone nor simplicity alone defines a good design.

This point is widely misunderstood. Operating System/360 is hailed by its builders as the finest ever built, because it indisputably has the most function. Function, and not simplicity, has always been the measure of excellence for its designers. On the other hand, the Time-Sharing System for the PDP-10 is hailed by its builders as the finest, because of its simplicity and the spareness of its concepts. By any measure, however, its function is not even in the same class as that of OS/360. As soon as ease of use is held up as the criterion, each of these is seen to be unbalanced, reaching for only half of the true goal.

For a given level of function, however, that system is best in which one can specify things with the most simplicity and straightforwardness. Simplicity is not enough. Mooers's TRAC language and Algol 68 achieve simplicity as measured by the number of distinct elementary concepts. They are not, however, straightforward. The expression of the things one wants to do often requires involuted and unexpected combinations of the basic facilities. It is not enough to learn the elements and rules of combination; one must also learn the idiomatic usage, a whole lore of how the elements are combined in practice. Simplicity and straightforwardness proceed from conceptual integrity. Every part must reflect the same philosophies and the same balancing of desiderata. Every part must even use the same techniques in syntax and analogous notions in semantics. Ease of use, then, dictates unity of design, conceptual integrity.

Aristocracy and Democracy

Conceptual integrity in turn dictates that the design must proceed from one mind, or from a very small number of agreeing resonant minds.

Schedule pressures, however, dictate that system building needs many hands. Two techniques are available for resolving this dilemma. The first is a careful division of labor between architecture and implementation. The second is the new way of structuring programming implementation teams discussed in the previous chapter.

The separation of architectural effort from implementation is a very powerful way of getting conceptual integrity on very large projects. I myself have seen it used with great success on IBM's Stretch computer and on the System/360 computer product line. I have seen it fail through lack of application on Operating System/360.

By the architecture of a system, I mean the complete and detailed specification of the user interface. For a computer this is the programming manual. For a compiler it is the language manual. For a control program it is the manuals for the language or languages used to invoke its functions. For the entire system it is the union of the manuals the user must consult to do his entire job.

The architect of a system, like the architect of a building, is the user's agent. It is his job to bring professional and technical knowledge to bear in the unalloyed interest of the user, as opposed to the interests of the salesman, the fabricator, etc.

Architecture must be carefully distinguished from implementation. As Blaauw has said, "Where architecture tells what happens, implementation tells how it is made to happen." He gives as a simple example a clock, whose architecture consists of the face, the hands, and the winding knob. When a child has learned this architecture, he can tell time as easily from a wristwatch as from a church tower. The implementation, however, and its realization, describe what goes on inside the case: powering by any of many mechanisms and accuracy control by any of many.

In System/360, for example, a single computer architecture is implemented quite differently in each of some nine models. Conversely, a single implementation, the Model 30 data flow, memory, and microcode, serves at different times for four different architectures: a System/360 computer, a multiplex channel with up to 224 logically independent subchannels, a selector channel, and a 1401 computer.

The same distinction is equally applicable to programming systems. There is a U.S. standard Fortran IV. This is the architecture for many compilers. Within this architecture many implementations are possible: text-in-core or compiler-in-core, fast-compile or optimizing, syntax-directed or ad-hoc. Likewise any assembler language or job-control language admits of many implementations of the assembler or scheduler.


Now we can deal with the deeply emotional question of aristocracy versus democracy. Are not the architects a new aristocracy, an intellectual elite, set up to tell the poor dumb implementers what to do? Has not all the creative work been sequestered for this elite, leaving the implementers as cogs in the machine? Won't one get a better product by getting the good ideas from all the team, following a democratic philosophy, rather than by restricting the development of specifications to a few?

As to the last question, it is the easiest. I will certainly not contend that only the architects will have good architectural ideas. Often the fresh concept does come from an implementer or from a user. However, all my own experience convinces me, and I have tried to show, that the conceptual integrity of a system determines its ease of use. Good features and ideas that do not integrate with a system's basic concepts are best left out. If there appear many such important but incompatible ideas, one scraps the whole system and starts again on an integrated system with different basic concepts.

As to the aristocracy charge, the answer must be yes and no. Yes, in the sense that there must be few architects, their product must endure longer than that of an implementer, and the architect sits at the focus of forces which he must ultimately resolve in the user's interest. If a system is to have conceptual integrity, someone must control the concepts. That is an aristocracy that needs no apology.

No, because the setting of external specifications is not more creative work than the designing of implementations. It is just different creative work. The design of an implementation, given an architecture, requires and allows as much design creativity, as many new ideas, and as much technical brilliance as the design of the external specifications. Indeed, the cost-performance ratio of the product will depend most heavily on the implementer, just as ease of use depends most heavily on the architect.

There are many examples from other arts and crafts that lead one to believe that discipline is good for art. Indeed, an artist's aphorism asserts, "Form is liberating." The worst buildings are those whose budget was too great for the purposes to be served. Bach's creative output hardly seems to have been squelched by the necessity of producing a limited-form cantata each week. I am sure that the Stretch computer would have had a better architecture had it been more tightly constrained; the constraints imposed by the System/360 Model 30's budget were in my opinion entirely beneficial for the Model 75's architecture.

Similarly, I observe that the external provision of an architecture enhances, not cramps, the creative style of an implementing group. They focus at once on the part of the problem no one has addressed, and inventions begin to flow. In an unconstrained implementing group, most thought and debate goes into architectural decisions, and implementation proper gets short shrift.

This effect, which I have seen many times, is confirmed by R. W. Conway, whose group at Cornell built the PL/C compiler for the PL/I language. He says, "We finally decided to implement the language unchanged and unimproved, for the debates about language would have taken all our effort."

What Does the Implementer Do While Waiting?

It is a very humbling experience to make a multimillion-dollar mistake, but it is also very memorable. I vividly recall the night we decided how to organize the actual writing of external specifications for OS/360. The manager of architecture, the manager of control program implementation, and I were threshing out the plan, schedule, and division of responsibilities.

The architecture manager had 10 good men. He asserted that they could write the specifications and do it right. It would take ten months, three more than the schedule allowed.

The control program manager had 150 men. He asserted that they could prepare the specifications, with the architecture team coordinating; it would be well-done and practical, and he could do it on schedule. Furthermore, if the architecture team did it, his 150 men would sit twiddling their thumbs for ten months.

To this the architecture manager responded that if I gave the control program team the responsibility, the result would not in fact be on time, but would also be three months late, and of much lower quality. I did, and it was. He was right on both counts. Moreover, the lack of conceptual integrity made the system far more costly to build and change, and I would estimate that it added a year to debugging time.

Many factors, of course, entered into that mistaken decision; but the overwhelming one was schedule time and the appeal of putting all those 150 implementers to work. It is this siren song whose deadly hazards I would now make visible.

When it is proposed that a small architecture team in fact write all the external specifications for a computer or a programming system, the implementers raise three objections:

• The specifications will be too rich in function and will not reflect practical cost considerations.

• The architects will get all the creative fun and shut out the inventiveness of the implementers.

• The many implementers will have to sit idly by while the specifications come through the narrow funnel that is the architecture team.

The first of these is a real danger, and it will be treated in the next chapter. The other two are illusions, pure and simple. As we have seen above, implementation is also a creative activity of the first order. The opportunity to be creative and inventive in implementation is not significantly diminished by working within a given external specification, and the order of creativity may even be enhanced by that discipline. The total product will surely be.

The last objection is one of timing and phasing. A quick answer is to refrain from hiring implementers until the specifications are complete. This is what is done when a building is constructed.

In the computer systems business, however, the pace is quicker, and one wants to compress the schedule as much as possible. How much can specification and building be overlapped?

As Blaauw points out, the total creative effort involves three distinct phases: architecture, implementation, and realization. It turns out that these can in fact be begun in parallel and proceed simultaneously.

In computer design, for example, the implementer can start as soon as he has relatively vague assumptions about the manual, somewhat clearer ideas about the technology, and well-defined cost and performance objectives. He can begin designing data flows, control sequences, gross packaging concepts, and so on. He devises or adapts the tools he will need, especially the record-keeping system, including the design automation system.

Meanwhile, at the realization level, circuits, cards, cables, frames, power supplies, and memories must each be designed, refined, and documented. This work proceeds in parallel with architecture and implementation.

The same thing is true in programming system design. Long before the external specifications are complete, the implementer has plenty to do. Given some rough approximations as to the function of the system that will be ultimately embodied in the external specifications, he can proceed. He must have well-defined space and time objectives. He must know the system configuration on which his product must run. Then he can begin designing module boundaries, table structures, pass or phase breakdowns, algorithms, and all kinds of tools. Some time, too, must be spent in communicating with the architect.

Meanwhile, on the realization level there is much to be done also. Programming has a technology, too. If the machine is a new one, much work must be done on subroutine conventions, supervisory techniques, searching and sorting algorithms.

Conceptual integrity does require that a system reflect a single philosophy and that the specification as seen by the user flow from a few minds. Because of the real division of labor into architecture, implementation, and realization, however, this does not imply that a system so designed will take longer to build. Experience shows the opposite, that the integral system goes together faster and takes less time to test. In effect, a widespread horizontal division of labor has been sharply reduced by a vertical division of labor, and the result is radically simplified communications and improved conceptual integrity.

5 The Second-System Effect

Adde parvum parvo magnus acervus erit.

[Add little to little and there will be a big pile.]

OVID

Turning house for air traffic. Lithograph, Paris, 1882

The Bettman Archive

If one separates responsibility for functional specification from responsibility for building a fast, cheap product, what discipline bounds the architect's inventive enthusiasm?

The fundamental answer is thoroughgoing, careful, and sympathetic communication between architect and builder. Nevertheless there are finer-grained answers that deserve attention.

Interactive Discipline for the Architect

The architect of a building works against a budget, using estimating techniques that are later confirmed or corrected by the contractors' bids. It often happens that all the bids exceed the budget. The architect then revises his estimating technique upward and his design downward for another iteration. He may perhaps suggest to the contractors ways to implement his design more cheaply than they had devised.

An analogous process governs the architect of a computer system or a programming system. He has, however, the advantage of getting bids from the contractor at many early points in his design, almost any time he asks for them. He usually has the disadvantage of working with only one contractor, who can raise or lower his estimates to reflect his pleasure with the design. In practice, early and continuous communication can give the architect good cost readings and the builder confidence in the design without blurring the clear division of responsibilities.

The architect has two possible answers when confronted with an estimate that is too high: cut the design or challenge the estimate by suggesting cheaper implementations. This latter is inherently an emotion-generating activity. The architect is now challenging the builder's way of doing the builder's job. For it to be successful, the architect must

• remember that the builder has the inventive and creative responsibility for the implementation; so the architect suggests, not dictates;

• always be prepared to suggest a way of implementing anything he specifies, and be prepared to accept any other way that meets the objectives as well;

• deal quietly and privately in such suggestions;

• be ready to forgo credit for suggested improvements.

Normally the builder will counter by suggesting changes to the architecture. Often he is right—some minor feature may have unexpectedly large costs when the implementation is worked out.

Self-Discipline—The Second-System Effect

An architect's first work is apt to be spare and clean. He knows he doesn't know what he's doing, so he does it carefully and with great restraint.

As he designs the first work, frill after frill and embellishment after embellishment occur to him. These get stored away to be used "next time." Sooner or later the first system is finished, and the architect, with firm confidence and a demonstrated mastery of that class of systems, is ready to build a second system.

This second is the most dangerous system a man ever designs. When he does his third and later ones, his prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify those parts of his experience that are particular and not generalizable.

The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. The result, as Ovid says, is a "big pile." For example, consider the IBM 709 architecture, later embodied in the 7090. This is an upgrade, a second system for the very successful and clean 704. The operation set is so rich and profuse that only about half of it was regularly used.

Consider as a stronger case the architecture, implementation, and even the realization of the Stretch computer, an outlet for the pent-up inventive desires of many people, and a second system for most of them. As Strachey says in a review:

I get the impression that Stretch is in some way the end of one line of development. Like some early computer programs it is immensely ingenious, immensely complicated, and extremely effective, but somehow at the same time crude, wasteful, and inelegant, and one feels that there must be a better way of doing things.

Operating System/360 was the second system for most of its designers. Groups of its designers came from building the 1410-7010 disk operating system, the Stretch operating system, the Project Mercury real-time system, and IBSYS for the 7090. Hardly anyone had experience with two previous operating systems. So OS/360 is a prime example of the second-system effect, a Stretch of the software art to which both the commendations and the reproaches of Strachey's critique apply unchanged.

For example, OS/360 devotes 26 bytes of the permanently resident date-turnover routine to the proper handling of December 31 on leap years (when it is Day 366). That might have been left to the operator.

The second-system effect has another manifestation somewhat different from pure functional embellishment. That is a tendency to refine techniques whose very existence has been made obsolete by changes in basic system assumptions. OS/360 has many examples of this.

Consider the linkage editor, designed to load separately-compiled programs and resolve their cross-references. Beyond this basic function it also handles program overlays. It is one of the finest overlay facilities ever built. It allows overlay structuring to be done externally, at linkage time, without being designed into the source code. It allows the overlay structure to be changed from run to run without recompilation. It furnishes a rich variety of useful options and facilities. In a sense it is the culmination of years of development of static overlay technique.

Yet it is also the last and finest of the dinosaurs, for it belongs to a system in which multiprogramming is the normal mode and dynamic core allocation the basic assumption. This is in direct conflict with the notion of using static overlays. How much better the system would work if the efforts devoted to overlay management had been spent on making the dynamic core allocation and the dynamic cross-referencing facilities really fast!

Furthermore, the linkage editor requires so much space and itself contains many overlays that even when it is used just for linkage without overlay management, it is slower than most of the system compilers. The irony of this is that the purpose of the linker is to avoid recompilation. Like a skater whose stomach gets ahead of his feet, refinement proceeded until the system assumptions had been quite outrun.

The TESTRAN debugging facility is another example of this tendency. It is the culmination of batch debugging facilities, furnishing truly elegant snapshot and core dump capabilities. It uses the control section concept and an ingenious generator technique to allow selective tracing and snapshotting without interpretive overhead or recompilation. The imaginative concepts of the Share Operating System for the 709 have been brought to full bloom.

Meanwhile, the whole notion of batch debugging without recompilation was becoming obsolete. Interactive computing systems, using language interpreters or incremental compilers, have provided the most fundamental challenge. But even in batch systems, the appearance of fast-compile/slow-execute compilers has made source-level debugging and snapshotting the preferred technique. How much better the system would have been if the TESTRAN effort had been devoted instead to building the interactive and fast-compile facilities earlier and better!

Yet another example is the scheduler, which provides truly excellent facilities for managing a fixed-batch job stream. In a real sense, this scheduler is the refined, improved, and embellished second system succeeding the 1410-7010 Disk Operating System, a batch system unmultiprogrammed except for input-output and intended chiefly for business applications. As such, the OS/360 scheduler is good. But it is almost totally uninfluenced by the OS/360 needs of remote job entry, multiprogramming, and permanently resident interactive subsystems. Indeed, the scheduler's design makes these hard.

How does the architect avoid the second-system effect? Well, obviously he can't skip his second system. But he can be conscious of the peculiar hazards of that system, and exert extra self-discipline to avoid functional ornamentation and to avoid extrapolation of functions that are obviated by changes in assumptions and purposes.

A discipline that will open an architect's eyes is to assign each little function a value: capability x is worth not more than m bytes of memory and n microseconds per invocation. These values will guide initial decisions and serve during implementation as a guide and warning to all.
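
The budgeting discipline lends itself to a mechanical check. What follows is only a minimal sketch in Python, not anything taken from OS/360: the `Budget` record, the `over_budget` helper, the capability names, and every number are hypothetical illustrations. It records the agreed ceiling of m bytes and n microseconds for each capability and reports which measured costs exceed it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Ceiling the architect assigns to one capability (hypothetical units)."""
    max_bytes: int            # m: bytes of memory
    max_microseconds: float   # n: microseconds per invocation


def over_budget(budgets, measured):
    """Return the sorted names of capabilities whose measured cost exceeds a ceiling.

    `measured` maps capability name -> (bytes, microseconds per invocation).
    """
    offenders = []
    for name, (size, time_us) in measured.items():
        limit = budgets[name]
        if size > limit.max_bytes or time_us > limit.max_microseconds:
            offenders.append(name)
    return sorted(offenders)


# Illustrative numbers only; none of them come from a real system.
budgets = {
    "date_turnover": Budget(max_bytes=64, max_microseconds=50.0),
    "overlay_load": Budget(max_bytes=2048, max_microseconds=900.0),
}
measured = {
    "date_turnover": (90, 40.0),    # exceeds the memory ceiling
    "overlay_load": (1500, 700.0),  # within both ceilings
}
print(over_budget(budgets, measured))
```

Run during implementation, such a report serves as the "guide and warning to all": an offender either gets cheaper, or the architect consciously raises its stated value.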

How does the project manager avoid the second-system effect? By insisting on a senior architect who has at least two systems under his belt. Too, by staying aware of the special temptations, he can ask the right questions to ensure that the philosophical concepts and objectives are fully reflected in the detailed design.

6

Passing the Word

He'll sit here and he'll say, "Do this! Do that!" And nothing will happen.

HARRY S. TRUMAN, ON PRESIDENTIAL POWER

"The Seven Trumpets" from The Wells Apocalypse, 14th century

The Bettman Archive

Assuming that he has the disciplined, experienced architects and that there are many implementers, how shall the manager ensure that everyone hears, understands, and implements the architects' decisions? How can a group of 10 architects maintain the conceptual integrity of a system which 1000 men are building? A whole technology for doing this was worked out for the System/360 hardware design effort, and it is equally applicable to software projects.

Written Specifications—the Manual

The manual, or written specification, is a necessary tool, though not a sufficient one. The manual is the external specification of the product. It describes and prescribes every detail of what the user sees. As such, it is the chief product of the architect.

Round and round goes its preparation cycle, as feedback from users and implementers shows where the design is awkward to use or build. For the sake of implementers it is important that the changes be quantized—that there be dated versions appearing on a schedule.

The manual must not only describe everything the user does see, including all interfaces; it must also refrain from describing what the user does not see. That is the implementer's business, and there his design freedom must be unconstrained. The architect must always be prepared to show an implementation for any feature he describes, but he must not attempt to dictate the implementation.

The style must be precise, full, and accurately detailed. A user will often refer to a single definition, so each one must repeat all the essentials and yet all must agree. This tends to make manuals dull reading, but precision is more important than liveliness.

The unity of System/360's Principles of Operation springs from the fact that only two pens wrote it: Gerry Blaauw's and Andris Padegs'. The ideas are those of about ten men, but the casting of those decisions into prose specifications must be done by only one or two, if the consistency of prose and product is to be maintained. For the writing of a definition will necessitate a host of mini-decisions which are not of full-debate importance. An example in System/360 is the detail of how the Condition Code is set after each operation. Not trivial, however, is the principle that such mini-decisions be made consistently throughout.

I think the finest piece of manual writing I have ever seen is Blaauw's Appendix to System/360 Principles of Operation. This describes with care and precision the limits of System/360 compatibility. It defines compatibility, prescribes what is to be achieved, and enumerates those areas of external appearance where the architecture is intentionally silent and where results from one model may differ from those of another, where one copy of a given model may differ from another copy, or where a copy may differ even from itself after an engineering change. This is the level of precision to which manual writers aspire, and they must define what is not prescribed as carefully as what is.

Formal Definitions

English, or any other human language, is not naturally a precision instrument for such definitions. Therefore the manual writer must strain himself and his language to achieve the precision needed. An attractive alternative is to use a formal notation for such definitions. After all, precision is the stock in trade, the raison d'être of formal notations.

Let us examine the merits and weaknesses of formal definitions. As noted, formal definitions are precise. They tend to be complete; gaps show more conspicuously, so they are filled sooner. What they lack is comprehensibility. With English prose one can show structural principles, delineate structure in stages or levels, and give examples. One can readily mark exceptions and emphasize contrasts. Most important, one can explain why. The formal definitions put forward so far have inspired wonder at their elegance and confidence in their precision. But they have demanded prose explanations to make their content easy to learn and teach. For these reasons, I think we will see future specifications to consist of both a formal definition and a prose definition.

An ancient adage warns, "Never go to sea with two chronometers; take one or three." The same thing clearly applies to prose and formal definitions. If one has both, one must be the standard, and the other must be a derivative description, clearly labeled as such. Either can be the primary standard. Algol 68 has a formal definition as standard and a prose definition as descriptive. PL/I has the prose as standard and the formal description as derivative. System/360 also has prose as standard with a derived formal description.

Many tools are available for formal definition. The Backus-Naur Form is familiar for language definition, and it is amply discussed in the literature. The formal description of PL/I uses new notions of abstract syntax, and it is adequately described. Iverson's APL has been used to describe machines, most notably the IBM 7090 and System/360.

Bell and Newell have proposed new notations for describing both configurations and machine architectures, and they have illustrated these with several machines, including the DEC PDP-8, the 7090, and System/360.

Almost all formal definitions turn out to embody or describe an implementation of the hardware or software system whose externals they are prescribing. Syntax can be described without this, but semantics are usually defined by giving a program that carries out the defined operation. This is of course an implementation, and as such it over-prescribes the architecture. So one must take care to indicate that the formal definition applies only to externals, and one must say what these are.

Not only is a formal definition an implementation, an implementation can serve as a formal definition. When the first compatible computers were built, this was exactly the technique used. The new machine was to match an existing machine. The manual was vague on some points? "Ask the machine!" A test program would be devised to determine the behavior, and the new machine would be built to match.

A programmed simulator of a hardware or software system can serve in precisely the same way. It is an implementation; it runs. So all questions of definition can be resolved by testing it.
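
To make "ask the machine" concrete, here is a minimal Python sketch of resolving questions of definition by experiment. Everything in it is an assumption made for illustration: `reference_machine` stands in for the existing implementation that serves as the standard, `candidate_machine` for the new one being built to match, and the 8-bit add with carry is an invented operation rather than one taken from any real machine.

```python
def reference_machine(a, b):
    """Stand-in for the existing machine: 8-bit add returning (result, carry)."""
    total = a + b
    return total & 0xFF, int(total > 0xFF)


def candidate_machine(a, b):
    """Stand-in for the new machine, which must match the reference everywhere."""
    result = (a + b) % 256
    carry = (a + b) // 256
    return result, carry


def find_disagreements(reference, candidate, cases):
    """Put the same questions to both implementations and list where they differ."""
    return [case for case in cases if reference(*case) != candidate(*case)]


# Every pair of 8-bit operands: the whole input space of this toy operation.
cases = [(a, b) for a in range(256) for b in range(256)]
print(find_disagreements(reference_machine, candidate_machine, cases))
```

An empty list means the candidate agrees with the standard on every question asked. Note that the comparison covers everything the reference returns, wanted or not, which is how a definition by implementation comes to prescribe more than was intended.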

Using an implementation as a definition has some advantages. All questions can be settled unambiguously by experiment. Debate is never needed, so answers are quick. Answers are always as precise as one wants, and they are always correct, by definition. Opposed to these one has a formidable set of disadvantages. The implementation may over-prescribe even the externals. Invalid syntax always produces some result; in a policed system that result is an invalidity indication and nothing more. In an unpoliced system all kinds of side effects may appear, and these may have been used by programmers. When we undertook to emulate the IBM 1401 on System/360, for example, it developed that there were 30 different "curios"—side effects of supposedly invalid operations that had come into widespread use and had to be considered as part of the definition. The implementation as a definition overprescribed; it not only said what the machine must do, it also said a great deal about how it had to do it.

Then, too, the implementation will sometimes give unexpected and unplanned answers when sharp questions are asked, and the de facto definition will often be found to be inelegant in these particulars precisely because they have never received any thought. This inelegance will often turn out to be slow or costly to duplicate in another implementation. For example, some machines leave trash in the multiplicand register after a multiplication. The precise nature of this trash turns out to be part of the de facto definition, yet duplicating it may preclude the use of a faster multiplication algorithm.

Finally, the use of an implementation as a formal definition is peculiarly susceptible to confusion as to whether the prose description or the formal description is in fact the standard. This is especially true of programmed simulations. One must also refrain from modifications to the implementation while it is serving as a standard.

Direct Incorporation

A lovely technique for disseminating and enforcing definitions is available for the software system architect. It is especially useful for establishing the syntax, if not the semantics, of intermodule interfaces. This technique is to design the declaration of the passed parameters or shared storage, and to require the implementations to include that declaration via a compile-time operation (a macro or a %INCLUDE in PL/I). If, in addition, the whole interface is referenced only by symbolic names, the declaration can be changed by adding or inserting new variables with only recompilation, not alteration, of the using program.
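
A rough modern analogue, offered only as a sketch: in Python the shared declaration can be a single class that every implementation imports rather than retypes. This is not PL/I's compile-time inclusion mechanism, and `JobRequest`, its fields, and `describe` are hypothetical names invented for the illustration. The point shown is the last one above: because the using program refers to fields only by symbolic name, a field added later does not require the using program to be altered.

```python
from dataclasses import dataclass


# The single shared declaration. In a real project it would live in one
# module that every implementation imports instead of redeclaring.
@dataclass
class JobRequest:
    job_name: str
    priority: int
    # Added later by the architect; the default keeps existing callers valid.
    region_kbytes: int = 0


def describe(request: JobRequest) -> str:
    """A using program: it touches the interface only by symbolic name."""
    return f"{request.job_name} at priority {request.priority}"


print(describe(JobRequest(job_name="PAYROLL", priority=3)))
```

Had `describe` unpacked the request by position instead, inserting a field in the middle of the declaration would have silently changed its meaning.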

Conferences and Courts

Needless to say, meetings are necessary. The hundreds of man-to-man consultations must be supplemented by larger and more formal gatherings. We found two levels of these to be useful. The first is a weekly half-day conference of all the architects, plus official representatives of the hardware and software implementers, and the market planners. The chief system architect presides.

Anyone can propose problems or changes, but proposals are usually distributed in writing before the meeting. A new problem is usually discussed a while. The emphasis is on creativity, rather than merely decision. The group attempts to invent many solutions to problems, then a few solutions are passed to one or more of the architects for detailing into precisely worded manual change proposals.

Detailed change proposals then come up for decisions. These have been circulated and carefully considered by implementers and users, and the pros and cons are well delineated. If a consensus emerges, well and good. If not, the chief architect decides. Minutes are kept and decisions are formally, promptly, and widely disseminated.

Decisions from the weekly conferences give quick results and allow work to proceed. If anyone is too unhappy, instant appeals to the project manager are possible, but this happens very rarely.

The fruitfulness of these meetings springs from several sources:

1. The same group—architects, users, and implementers—meets weekly for months. No time is needed for bringing people up to date.

2. The group is bright, resourceful, well versed in the issues, and deeply involved in the outcome. No one has an "advisory" role. Everyone is authorized to make binding commitments.

3. When problems are raised, solutions are sought both within and outside the obvious boundaries.

4. The formality of written proposals focuses attention, forces decision, and avoids committee-drafted inconsistencies.

5. The clear vesting of decision-making power in the chief architect avoids compromise and delay.

As time goes by, some decisions don't wear well. Some minor matters have never been wholeheartedly accepted by one or another of the participants. Other decisions have developed unforeseen problems, and sometimes the weekly meeting didn't agree to reconsider these. So there builds up a backlog of minor appeals, open issues, or disgruntlements. To settle these we held annual supreme court sessions, lasting typically two weeks. (I would hold them every six months if I were doing it again.)

These sessions were held just before major freeze dates for the manual. Those present included not only the architecture group and the programmers' and implementers' architectural representatives, but also the managers of programming, marketing, and implementation efforts. The System/360 project manager presided. The agenda typically consisted of about 200 items, mostly minor, which were enumerated in charts placarded around the room. All sides were heard and decisions made. By the miracle of computerized text editing (and lots of fine staff work), each participant found an updated manual, embodying yesterday's decisions, at his seat every morning.

These "fall festivals" were useful not only for resolving decisions, but also for getting them accepted. Everyone was heard, everyone participated, everyone understood better the intricate constraints and interrelationships among decisions.

Multiple Implementations

System/360 architects had two almost unprecedented advantages: enough time to work carefully, and political clout equal to that of the implementers. The provision of enough time came from the schedule of the new technology; the political equality came from the simultaneous construction of multiple implementations. The necessity for strict compatibility among these served as the best possible enforcing agent for the specifications.

In most computer projects there comes a day when it is discovered that the machine and the manual don't agree. When the confrontation follows, the manual usually loses, for it can be changed far more quickly and cheaply than the machine. Not so, however, when there are multiple implementations. Then the delays and costs associated with fixing the errant machine can be overmatched by delays and costs in revising the machines that followed the manual faithfully.

This notion can be fruitfully applied whenever a programming language is being defined. One can be certain that several interpreters or compilers will sooner or later have to be built to meet various objectives. The definition will be cleaner and the discipline tighter if at least two implementations are built initially.

The Telephone Log

As implementation proceeds, countless questions of architectural interpretation arise, no matter how precise the specification. Obviously many such questions require amplifications and clarifications in the text. Others merely reflect misunderstandings.

It is essential, however, to encourage the puzzled implementer to telephone the responsible architect and ask his question, rather than to guess and proceed. It is just as vital to recognize that the answers to such questions are ex cathedra architectural pronouncements that must be told to everyone.

One useful mechanism is a telephone log kept by the architect. In it he records every question and every answer. Each week the logs of the several architects are concatenated, reproduced, and distributed to the users and implementers. While this mechanism is quite informal, it is both quick and comprehensive.
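
The mechanics are simple enough to sketch. The Python below is only an illustration under assumptions: `LogEntry`, `weekly_digest`, and `render` are hypothetical names, the sample questions are invented, and sorting the combined log by date is a choice made here, where the text asks only that the logs be concatenated.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LogEntry:
    asked_on: date
    architect: str
    question: str
    answer: str


def weekly_digest(logs):
    """Concatenate the architects' logs into one list, ordered by date."""
    merged = [entry for log in logs for entry in log]
    return sorted(merged, key=lambda e: (e.asked_on, e.architect))


def render(entries):
    """Format the digest as plain text for reproduction and distribution."""
    lines = []
    for e in entries:
        lines.append(f"{e.asked_on.isoformat()} [{e.architect}]")
        lines.append(f"  Q: {e.question}")
        lines.append(f"  A: {e.answer}")
    return "\n".join(lines)


# Invented sample entries, one log per architect.
log_a = [LogEntry(date(2000, 1, 5), "architect A",
                  "Is the length field inclusive?", "Yes, it counts itself.")]
log_b = [LogEntry(date(2000, 1, 3), "architect B",
                  "May the flag byte be reused?", "No, it is reserved.")]
digest = weekly_digest([log_a, log_b])
print(render(digest))
```

Whatever the format, the essential property is that every answer given privately to one implementer reaches all of them within the week.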

Product Test

The project manager's best friend is his daily adversary, the independent product-testing organization. This group checks machines and programs against specifications and serves as a devil's advocate, pinpointing every conceivable defect and discrepancy. Every development organization needs such an independent technical auditing group to keep it honest.

In the last analysis the customer is the independent auditor. In the merciless light of real use, every flaw will show. The product-testing group then is the surrogate customer, specialized for finding flaws. Time after time, the careful product tester will find places where the word didn't get passed, where the design decisions were not properly understood or accurately implemented. For this reason such a testing group is a necessary link in the chain by which the design word is passed, a link that needs to operate early and simultaneously with design.

7

Why Did the Tower of Babel Fail?

Now the whole earth used only one language, with few words. On the occasion of a migration from the east, men discovered a plain in the land of Shinar, and settled there. Then they said to one another, "Come, let us make bricks, burning them well." So they used bricks for stone, and bitumen for mortar. Then they said, "Come, let us build ourselves a city with a tower whose top shall reach the heavens (thus making a name for ourselves), so that we may not be scattered all over the earth." Then the Lord came down to look at the city and tower which human beings had built. The Lord said, "They are just one people, and they all have the same language. If this is what they can do as a beginning, then nothing that they resolve to do will be impossible for them. Come, let us go down, and there make such a babble of their language that they will not understand one another's speech." Thus the Lord dispersed them from there all over the earth, so that they had to stop building the city.

GENESIS 11:1-8

P. Breughel, the Elder, "Turmbau zu Babel," 1563 Kunsthistorisches Museum, Vienna

A Management Audit of the Babel Project

According to the Genesis account, the tower of Babel was man's second major engineering undertaking, after Noah's ark. Babel was the first engineering fiasco.

The story is deep and instructive on several levels. Let us, however, examine it purely as an engineering project, and see what management lessons can be learned. How well was their project equipped with the prerequisites for success? Did they have:

1. A clear mission? Yes, although naively impossible. The project failed long before it ran into this fundamental limitation.

2. Manpower? Plenty of it.

3. Materials? Clay and asphalt are abundant in Mesopotamia.

4. Enough time? Yes, there is no hint of any time constraint.

5. Adequate technology? Yes, the pyramidal or conical structure is inherently stable and spreads the compressive load well. Clearly masonry was well understood. The project failed before it hit technological limitations.

Well, if they had all of these things, why did the project fail? Where did they lack? In two respects: communication, and its consequent, organization. They were unable to talk with each other; hence they could not coordinate. When coordination failed, work ground to a halt. Reading between the lines we gather that lack of communication led to disputes, bad feelings, and group jealousies. Shortly the clans began to move apart, preferring isolation to wrangling.

Communication in the Large Programming Project

So it is today. Schedule disaster, functional misfits, and system bugs all arise because the left hand doesn't know what the right hand is doing. As work proceeds, the several teams slowly change the functions, sizes, and speeds of their own programs, and they explicitly or implicitly change their assumptions about the inputs available and the uses to be made of the outputs.

For example, the implementer of a program-overlaying function may run into problems and reduce speed, relying on statistics that show how rarely this function will arise in application programs. Meanwhile, back at the ranch, his neighbor may be designing a major part of the supervisor so that it critically depends upon the speed of this function. This change in speed itself becomes a major specification change, and it needs to be proclaimed abroad and weighed from a system point of view.

How, then, shall teams communicate with one another? In as many ways as possible.

• Informally. Good telephone service and a clear definition of intergroup dependencies will encourage the hundreds of calls upon which common interpretation of written documents depends.

• Meetings. Regular project meetings, with one team after another giving technical briefings, are invaluable. Hundreds of minor misunderstandings get smoked out this way.

• Workbook. A formal project workbook must be started at the beginning. This deserves a section by itself.

The Project Workbook

What. The project workbook is not so much a separate document as it is a structure imposed on the documents that the project will be producing anyway.

All the documents of the project need to be part of this structure. This includes objectives, external specifications, interface specifications, technical standards, internal specifications, and administrative memoranda.

Why. Technical prose is almost immortal. If one examines the genealogy of a customer manual for a piece of hardware or software, one can trace not only the ideas, but also many of the very sentences and paragraphs back to the first memoranda proposing the product or explaining the first design. For the technical writer, the paste-pot is as mighty as the pen.

Since this is so, and since tomorrow's product-quality manuals will grow from today's memos, it is very important to get the structure of the documentation right. The early design of the project workbook ensures that the documentation structure itself is crafted, not haphazard. Moreover, the establishment of a structure molds later writing into segments that fit into that structure.

The second reason for the project workbook is control of the distribution of information. The problem is not to restrict information, but to ensure that relevant information gets to all the people who need it.

The first step is to number all memoranda, so that ordered lists of titles are available and each worker can see if he has what he wants. The organization of the workbook goes well beyond this to establish a tree-structure of memoranda. The tree-structure allows distribution lists to be maintained by subtree, if that is desirable.
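
Distribution by subtree can be pictured with a short Python sketch. It assumes, purely for illustration, that memoranda carry dotted numbers such as `2.3.1` in which each prefix names an enclosing subtree; that numbering scheme, the `ancestors` and `recipients_for` helpers, and the team names are all hypothetical and are not described in the text.

```python
def ancestors(memo_id):
    """Yield the memo's own number and each enclosing subtree.

    For '2.3.1' this yields '2.3.1', then '2.3', then '2'.
    """
    parts = memo_id.split(".")
    for depth in range(len(parts), 0, -1):
        yield ".".join(parts[:depth])


def recipients_for(memo_id, subscriptions):
    """Collect everyone subscribed to the memo's subtree or to any subtree above it."""
    people = set()
    for node in ancestors(memo_id):
        people |= subscriptions.get(node, set())
    return sorted(people)


# Invented distribution lists, keyed by subtree.
subscriptions = {
    "2": {"scheduler team"},
    "2.3": {"linkage editor team"},
    "4": {"product test"},
}
print(recipients_for("2.3.1", subscriptions))
# → ['linkage editor team', 'scheduler team']
```

A group that subscribes once to a subtree then receives every memorandum later filed beneath it, with no list maintained per document.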

Mechanics. As with so many programming management problems, the technical memorandum problem gets worse nonlinearly as size increases. With 10 people, documents can simply be numbered. With 100 people, several linear sequences will often suffice. With 1000, scattered inevitably over several physical locations, the need for a structured workbook increases and the size of the workbook increases. How then shall the mechanics be handled?

I think this was well done on the OS/360 project. The need for a well-structured workbook was strongly urged by O. S. Locken, who had seen its effectiveness on his previous project, the 1410-7010 operating system.

We quickly decided that each programmer should see all the material, i.e., should have a copy of the workbook in his own office.

Of critical importance is timely updating. The workbook must be current. This is very difficult to do if whole documents must be retyped for changes. In a looseleaf book, however, only pages need

to be changed. We had available a computer-driven text-editing system, and this proved invaluable for timely maintenance. Offset

masters were prepared directly on the computer printer, and turnaround time was less than a day.

The recipient of all these updated pages has an assimilation problem, however. When he first receives a changed page, he wants to know, "What has been changed?" When he later consults it, he wants to know, "What is the definition today?"

The latter need is met by the continually maintained document. Highlighting of changes requires other steps. First, one must mark changed text on the page, e.g., by a vertical bar in the margin alongside every altered line. Second, one needs to distribute with the new pages a short, separately written change summary that lists the changes and remarks on their significance.
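The first of these steps, the vertical bar in the margin, is mechanical and can be sketched with a line-by-line comparison of the old and new page. This is an illustration using Python's standard `difflib`, not the text-editing system the project used; the function name `mark_changes` and the sample page are assumptions. Deleted lines leave no mark in this sketch, and the change summary still has to be written by a person, since it remarks on significance.

```python
import difflib


def mark_changes(old_page, new_page):
    """Return the new page with '|' in the margin of every added or altered line."""
    matcher = difflib.SequenceMatcher(a=old_page, b=new_page, autojunk=False)
    marked = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'equal' runs are unchanged; 'replace' and 'insert' runs get a bar.
        # 'delete' runs contribute no lines of the new page (j1 == j2).
        margin = "  " if tag == "equal" else "| "
        marked.extend(margin + line for line in new_page[j1:j2])
    return marked


# Illustrative page contents only.
old = ["The queue holds 8 entries.", "Entries are served in order.", "Overflow is an error."]
new = ["The queue holds 16 entries.", "Entries are served in order.", "Overflow is an error.",
       "The limit may be changed at system generation."]

for line in mark_changes(old, new):
    print(line)
```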

Our project had not been under way six months before we hit another problem. The workbook was about five feet thick! If we had stacked up the 100 copies serving programmers in our offices in Manhattan's Time-Life Building, they would have towered above the building itself. Furthermore, the daily change distribution averaged two inches, some 150 pages to be interfiled in the whole. Maintenance of the workbook began to take a significant time from each workday.

At this point we switched to microfiche, a change that saved a million dollars, even allowing for the cost of a microfiche reader for each office. We were able to arrange excellent turnaround on microfiche production; the workbook shrank from three cubic feet to one-sixth of a cubic foot and, most significantly, updates appeared in hundred-page chunks, reducing the interfiling problem a hundredfold.

Microfiche has its drawbacks. From the manager's point of view the awkward interfiling of paper pages ensured that the changes were read, which was the purpose of the workbook. Microfiche would make workbook maintenance too easy, unless the update fiche are distributed with a paper document enumerating the changes.

Also, a microfiche cannot readily be highlighted, marked, and

commented by the reader. Documents with which the reader has

interacted are more effective for the author and more useful for the reader.

On balance I think the microfiche was a very happy mechanism, and I would recommend it over a paper workbook for very large projects.

How would one do it today? With today's system technology available, I think the technique of choice is to keep the workbook on the direct-access file, marked with change bars and revision dates. Each user would consult it from a display terminal (typewriters are too slow). A change summary, prepared daily, would be stored in LIFO fashion at a fixed access point. The programmer would probably read that daily, but if he missed a day he would need only read longer the next day. As he read the change summary, he could interrupt to consult the changed text itself.
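The LIFO change summary can be sketched as a small log that is always read newest-first. This is only an illustration of the idea: the `ChangeLog` class, its method names, and the dates and summaries are hypothetical, and the sketch assumes one summary per day, posted in date order.

```python
from datetime import date


class ChangeLog:
    """Daily change summaries kept at one well-known place and read newest-first."""

    def __init__(self):
        self._entries = []  # (day, summary) pairs, appended in date order

    def post(self, day, summary):
        self._entries.append((day, summary))

    def unread(self, last_read=None):
        """Summaries posted after last_read, most recent first (LIFO order)."""
        return [(day, text) for day, text in reversed(self._entries)
                if last_read is None or day > last_read]


# Illustrative entries only.
log = ChangeLog()
log.post(date(1975, 1, 6), "Section 2.1: queue limit raised")
log.post(date(1975, 1, 7), "Section 3.4: new return code")
log.post(date(1975, 1, 8), "Section 2.1: limit made a sysgen option")

# A reader who last looked on the 6th simply reads a little further down the stack.
for day, text in log.unread(last_read=date(1975, 1, 6)):
    print(day.isoformat(), text)
```

A reader who misses a day needs no special catch-up procedure: he reads downward from the top until he reaches a summary he has already seen.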

Notice that the workbook itself is not changed. It is still the assemblage of all project documentation, structured according to

a careful design. The only change is in the mechanics of distribution and consultation. D. C. Engelbart and his colleagues at the

Stanford Research Institute have built such a system and are using

it to build and maintain documentation for the ARPA network.

D. L. Parnas of Carnegie-Mellon University has proposed a still more radical solution.¹ His thesis is that the programmer is most effective if shielded from, rather than exposed to, the details of construction of system parts other than his own. This presupposes that all interfaces are completely and precisely defined. While that is definitely sound design, dependence upon its perfect accomplishment is a recipe for disaster. A good information system both exposes interface errors and stimulates their correction.

Organization in the Large Programming Project

If there are n workers on a project, there are (n² − n)/2 interfaces across which there may be communication, and there are potentially almost 2ⁿ teams within which coordination must occur. The purpose of organization is to reduce the amount of communication

and coordination necessary; hence organization is a radical attack

on the communication problems treated above.
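A short calculation shows how quickly these two counts grow. The pairwise formula is the one given in the text; the team count is one reading of "almost 2ⁿ", assumed here to mean every subset of at least two workers, which gives 2ⁿ − n − 1. The function names are illustrative.

```python
def pairwise_interfaces(n: int) -> int:
    """Two-person communication paths among n workers: (n^2 - n) / 2."""
    return (n * n - n) // 2


def possible_teams(n: int) -> int:
    """Subsets of n workers with at least two members: 2^n - n - 1 (an assumed reading)."""
    return 2 ** n - n - 1


for n in (10, 100, 1000):
    # The team count is reported as a number of decimal digits because it grows so fast.
    print(n, pairwise_interfaces(n), len(str(possible_teams(n))))
```

For 10, 100, and 1000 workers the pairwise interfaces number 45, 4,950, and 499,500, while the count of possible teams is already 1,013 at ten workers.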

The means by which communication is obviated are division of labor and specialization of function. The tree-like structure of organizations reflects the diminishing need for detailed communication when division and specialization of labor are applied.

In fact, a tree organization really arises as a structure of authority and responsibility. The principle that no man can serve two masters dictates that the authority structure be tree-like. But

the communication structure is not so restricted, and the tree is a barely passable approximation to the communication structure,

which is a network. The inadequacies of the tree approximation give rise to staff groups, task forces, committees, and even the

matrix-type organization used in many engineering laboratories.

Let us consider a tree-like programming organization, and examine the essentials which any subtree must have in order to be effective. They are:

1. a mission

2. a producer

3. a technical director or architect

4. a schedule

5. a division of labor

6. interface definitions among the parts

All of this is obvious and conventional except the distinction

between the producer and the technical director. Let us first consider the two roles, then their relationship.

What is the role of the producer? He assembles the team,

divides the work, and establishes the schedule. He acquires and keeps on acquiring the necessary resources. This means that a major part of his role is communication outside the team, upwards

and sideways. He establishes the pattern of communication and reporting within the team. Finally, he ensures that the schedule is

met, shifting resources and organization to respond to changing

circumstances.

How about the technical director? He conceives of the design to be built, identifies its subparts, specifies how it will look from outside, and sketches its internal structure. He provides unity and conceptual integrity to the whole design; thus he serves as a limit

on system complexity. As individual technical problems arise, he invents solutions for them or shifts the system design as required. He is, in Al Capp's lovely phrase, "inside-man at the skunk works." His communications are chiefly within the team. His work is almost completely technical.

Now it is clear that the talents required for these two roles are

quite different. Talents come in many different combinations; and the particular combination embodied in the producer and the director must govern the relationship between them. Organizations must be designed around the people available; not people fitted into pure-theory organizations.

Three relationships are possible, and all three are found in

successful practice.

The producer and the technical director may be the same man. This is readily workable on very small teams, perhaps three to six

programmers. On larger projects it is very rarely workable, for two reasons. First, the man with strong management talent and strong technical talent is rarely found. Thinkers are rare; doers are rarer; and thinker-doers are rarest.

Second, on the larger project each of the roles is necessarily a

full-time job, or more. It is hard for the producer to delegate

enough of his duties to give him any technical time. It is impossible for the director to delegate his without compromising the

conceptual integrity of the design.

The producer may be boss, the director his right-hand man. The difficulty here is to establish the director's authority to make technical decisions without impacting his time as would putting him in the management chain-of-command.

Obviously the producer must proclaim the director's technical authority, and he must back it in an extremely high proportion of

the test cases that will arise. For this to be possible, the producer

and the director must see alike on fundamental technical philosophy; they must talk out the main technical issues privately, before

they really become timely; and the producer must have a high respect for the director's technical prowess.

Less obviously, the producer can do all sorts of subtle things

with the symbols of status (office size, carpet, furnishing, carbon

copies, etc.) to proclaim that the director, although outside the

management line, is a source of decision power. This can be made to work very effectively. Unfortunately it

is rarely tried. The job done least well by project managers is to utilize the technical genius who is not strong on management talent.

The director may be boss, and the producer his right-hand man. Robert Heinlein, in The Man Who Sold the Moon, describes such an arrangement in a graphic for-instance:

Coster buried his face in his hands, then looked up. "I know it. I know what needs to be done—but every time I try to tackle a technical problem some bloody fool wants me to make a decision about trucks—or telephones—or some damn thing. I'm sorry, Mr. Harriman. I thought I could do it."

Harriman said very gently, "Don't let it throw you, Bob. You haven't had much sleep lately, have you? Tell you what—we'll put over a fast one on Ferguson. I'll take that desk you're at for a few days and build you a set-up to protect you against such things. I want that brain of yours thinking about reaction vectors and fuel efficiencies and design stresses, not about contracts for trucks." Harriman stepped to the door, looked around the outer office and spotted a man who might or might not be the office's chief clerk. "Hey you! C'mere."

The man looked startled, got up, came to the door and said, "Yes?"

"I want that desk in the corner and all the stuff that's on it moved to an empty office on this floor, right away."

He supervised getting Coster and his other desk moved into another office, saw to it that the phone in the new office was disconnected, and, as an afterthought, had a couch moved in there, too. "We'll install a projector, and a drafting machine and bookcases and other junk like that tonight," he told Coster. "Just make a list of anything you need—to work on engineering." He went back to the nominal chief-engineer's office and got happily to work trying to figure where the organization stood and what was wrong with it.

Some four hours later he took Berkeley in to meet Coster. The chief engineer was asleep at his desk, head cradled on his arms. Harriman started to back out, but Coster roused. "Oh! Sorry," he said, blushing, "I must have dozed off."

"That's why I brought you the couch," said Harriman. "It's more restful. Bob, meet Jock Berkeley. He's your new slave. You remain chief engineer and top, undisputed boss. Jock is Lord High Everything Else. From now on you've got absolutely nothing to worry about—except for the little detail of building a Moon ship."

They shook hands. "Just one thing I ask, Mr. Coster," Berkeley said seriously, "bypass me all you want to—you'll have to run the technical show—but for God's sake record it so I'll know what's going on. I'm going to have a switch placed on your desk that will operate a sealed recorder at my desk."

"Fine!" Coster was looking, Harriman thought, younger already.

"And if you want something that is not technical, don't do it yourself. Just flip a switch and whistle; it'll get done!" Berkeley glanced at Harriman. "The Boss says he wants to talk with you about the real job. I'll leave you and get busy." He left.

Harriman sat down; Coster followed suit and said, "Whew!"

"Feel better?"

"I like the looks of that fellow Berkeley."

"That's good; he's your twin brother from now on. Stop worrying; I've used him before. You'll think you're living in a well-run hospital."²

This account hardly needs any analytic commentary. This arrangement, too, can be made to work effectively.

I suspect that the last arrangement is best for small teams, as

discussed in Chapter 3, "The Surgical Team." I think the producer as boss is a more suitable arrangement for the larger subtrees of a really big project.

The Tower of Babel was perhaps the first engineering fiasco, but it was not the last. Communication and its consequent, organization, are critical for success. The techniques of communication and organization demand from the manager much thought and as much experienced competence as the software technology itself.