c Programming with pthreads
Programming with POSIX Threads
David R. Butenhof
ADDISON-WESLEY
An Imprint of Addison Wesley Longman, Inc.
Reading, Massachusetts · Harlow, England · Menlo Park, California
Berkeley, California · Don Mills, Ontario · Sydney
Bonn · Amsterdam · Tokyo · Mexico City
Trademark acknowledgments:
UNIX is a registered trademark in the United States and other countries, licensed exclusively
through X/Open Company Ltd. Digital, DEC, Digital UNIX, DECthreads, VMS, and OpenVMS
are trademarks of Digital Equipment Corporation. Solaris, SPARC, SunOS, and Sun are
trademarks of Sun Microsystems Incorporated. SGI and IRIX are trademarks of Silicon Graphics,
Incorporated. HP-UX is a trademark of Hewlett-Packard Company. AIX, IBM, and OS/2 are
trademarks or registered trademarks of the IBM Corporation. X/Open is a trademark of X/Open
Company Ltd. POSIX is a registered trademark of the Institute of Electrical and Electronics
Engineers, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book and Addison-Wesley was
aware of a trademark claim, the designations have been printed in initial caps or all caps.
The authors and publishers have taken care in the preparation of this book, but make no expressed
or implied warranty of any kind and assume no responsibility for errors or omissions. No liability
is assumed for incidental or consequential damages in connection with or arising out of the use of
the information or programs contained herein.
The publisher offers discounts on this book when ordered in quantity for special sales.
For more information, please contact:
Corporate & Professional Publishing Group
Addison-Wesley Publishing Company
One Jacob Way
Reading, Massachusetts 01867
Library of Congress Cataloging-in-Publication Data
Butenhof, David R., 1956-
Programming with POSIX threads / David R. Butenhof. p. cm. -- (Addison-Wesley
professional computing series). Includes bibliographical references and index.
ISBN 0-201-63392-2 (pbk.)
1. Threads (Computer programs) 2. POSIX (Computer software standard) 3. Electronic
digital computers--Programming. I. Title. II. Series.
QA76.76.T55B88 1997
005.4'32--dc21
97-6635
CIP
Copyright 1997 by Addison Wesley Longman, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form, or by any means, electronic, mechanical. photocopying, recording, or
otherwise, without the prior consent of the publisher. Printed in the United States of America.
Published simultaneously in Canada.
Text printed on recycled and acid-free paper.
23456789 MA 00999897
2nd Printing October, 1997
To Anne, Amy,
and
Alyssa.
Quote acknowledgment:
American Heritage Dictionary of the English Language: page 1.
ISO/IEC 9945-1:1996, ?1996 by IEEE: page 29.
�Lewis Carroll, Alice's Adventures in Wonderland: pages xv, 47, 70, 88, 97, 98, 106, 131, 142,
161, 189, 197. Reproduced by permission of Macmillan Children's Books.
Lewis Carroll, Through the Looking-Glass: pages 1, 4, 8, 20, 25, 29, 35, 45, 172, 214, 241,283,
290, 302. Reproduced by permission of Macmillan Children's Books.
Lewis Carroll, The Hunting of the Snark pages 3, 13, 28, 39, 120, 131, 134, 289, 367. Reproduced
by permission of Macmillan Children's Books.
Preface
_______________________________________________________________________________
The White Rabbit put on his spectacles,
"Where shall I begin, please your Majesty?" he asked.
"Begin at the beginning," the King said, very gravely,
"and go on till you come to the end: then stop."
--Lewis Carroll, Alice's Adventures in Wonderland
This book is about "threads" and how to use them. “Thread” is just a name for a basic
software "thing" that can do work on a computer. A thread is smaller, faster, and more
maneuverable than a traditional process. In fact, once threads have been added to an operating
system, a "process" becomes just data--address space, files, and so forth--plus one or more threads
that do something with all that data.
With threads, you can build applications that utilize system resources more efficiently, that
are more friendly to users, that run blazingly fast on multiprocessors, and that may even be easier
to maintain. To accomplish all this, you need only add some relatively simple function calls to
your code, adjust to a new way of thinking about programming, and leap over a few yawning
chasms. Reading this book carefully will, I hope, help you to accomplish all that without losing
your sense of humor.
The threads model used in this book is commonly called "Pthreads," or "POSIX threads." Or,
more formally (since you haven't yet been properly introduced), the POSIX 1003. lc-1995
standard. I'll give you a few other names later--but for now, "Pthreads" is all you need to worry
about.
Pthreads interfaces are included with Sun's Solaris; Hewlett-Packard's Tru64 UNIX,
OpenVMS, NonStop platform, and HP-UX; IBM's AIX, OS/400, and OS/390; SGI's IRIX; SCO's
UnixWare; Apple's Mac OS X; and Linux (any major distribution). There's even an Open Source
emulation package that allows you to use portable Pthread interfaces on Win32 systems.
In the personal computer market, Microsoft's Win32 API (the primary programming interface
to both Windows NT and Windows 95) supports threaded programming, as does IBM's OS/2.
These threaded programming models are quite different from Pthreads, but the important first step
toward using them productively is understanding concurrency, synchronization, and scheduling.
The rest is (more or less) a matter of syntax and style, and an experienced thread programmer can
adapt to any of these models.
The threaded model can be (and has been) applied with great success to a wide range of
programming problems. Here are just a few:
� Large scale, computationally intensive programs
� High-performance application programs and library code that can take advantage of
multiprocessor systems
� Library code that can be used by threaded application programs
� Realtime application programs and library code
� Application programs and library code that perform I/O to slow external devices (such
as networks and human beings).
Intended audience
This book assumes that you are an experienced programmer, familiar with developing code
for an operating system in "the UNIX family" using the ANSI C language. I have tried not to
assume that you have any experience with threads or other forms of asynchronous programming.
The Introduction chapter provides a general overview of the terms and concepts you'll need for the
rest of the book. If you don't want to read the Introduction first, that's fine, but if you ever feel like
you're "missing something" you might try skipping back to get introduced.
Along the way you'll find examples and simple analogies for everything. In the end I hope
that you'll be able to continue comfortably threading along on your own. Have fun, and "happy
threading."
About the author
I have been involved in the Pthreads standard since it began, although I stayed at home for
the first few meetings. I was finally forced to spend a grueling week in the avalanche-proof
concrete bunker at the base of Snowbird ski resort in Utah; watching hard-working standards
representatives from around the world wax their skis. This was very distracting, because I had
expected a standards meeting to be a formal and stuffy environment. As a result of this
misunderstanding, I was forced to rent ski equipment instead of using my own.
After the Pthreads standard went into balloting, I worked on additional thread
synchronization interfaces and multiprocessor issues with several POSIX working groups. I also
helped to define the Aspen threads extensions, which were fast tracked into X/Open XSH5.
I have worked at Digital Equipment Corporation for (mumble, mumble) years, in various
locations throughout Massachusetts and New Hampshire. I was one of the creators of Digital’s
own threading architecture, and I designed (and implemented much of) the Pthreads interfaces on
Digital UNIX 4.0. I have been helping people develop and debug threaded code for more than
eight years.
My unofficial motto is "Better Living Through Concurrency." Threads are not sliced bread,
but then, we're programmers, not bakers, so we do what we can.
Acknowledgments
This is the part where I write the stuff that I'd like to see printed, and that my friends and
coworkers want to see. You probably don't care, and I promise not to be annoyed if you skip over
it but if you're curious, by all means read on.
No project such as this book can truly be accomplished by a single person, despite the fact
that only one name appears on the cover. I could have written a book about threads without any
help--I know a great deal about threads, and I am at least reasonably competent at written
communication. However, the result would not have been this book, and this book is better than
that hypothetical work could possibly have been.
Thanks first and foremost to my manager Jean Fullerton, who gave me the time and
encouragement to write this book on the job--and thanks to the rest of the DECthreads team who
kept things going while I wrote, including Brian Keane, Webb Scales, Jacqueline Berg, Richard
Love, Peter Portante, Brian Silver, Mark Simons, and Steve Johnson.
Thanks to Garret Swart who, while he was with Digital at the Systems Research Center, got
us involved with POSIX. Thanks to Nawaf Bitar who worked with Garret to create, literally
overnight, the first draft of what became Pthreads, and who became POSIX thread evangelist
through the difficult period of getting everyone to understand just what the heck this threading
thing was all about anyway. Without Garret, and especially Nawaf, Pthreads might not exist, and
certainly wouldn't be as good as it is. (The lack of perfection is not their responsibility--that's the
way life is.)
Thanks to everyone who contributed to the design of cma, Pthreads, UNIX98, and to the
users of DCE threads and DECthreads, for all the help, thought-provoking discourse, and assorted
skin-thickening exercises, including Andrew Birrell, Paul Borman, Bob Conti, Bill Cox, Jeff
Denham, Peter Gilbert, Rick Greer, Mike Grier, Kevin Harris, Ken Hobday, Mike Jones, Steve
Kleiman, Bob Knighten, Leslie Lamport, Doug Locke, Paula Long, Finnbarr P. Murphy, Bill
Noyce, Simon Patience, Harold Seigel, A1 Simons, Jim Woodward, and John Zolnowsky.
Many thanks to all those who patiently reviewed the drafts of this book (and even to those
who didn't seem so patient at times). Brian Kernighan, Rich Stevens, Dave Brownell, Bill
Gallmeister, Ilan Ginzburg, Will Morse, Bryan O'Sullivan, Bob Robillard, Dave Ruddock, Bil
Lewis, and many others suggested or motivated improvements in structure and detail--and
provided additional skin-thickening exercises to keep me in shape. Devang Shah and Bart
Smaalders answered some Solaris questions, and Bryan O'Sullivan suggested what became the
"bailing programmers" analogy.
Thanks to John Wait and Lana Langlois at Addison Wesley Longman, who waited with great
patience as a first-time writer struggled to balance writing a book with engineering and consulting
commitments. Thanks to Pamela Yee and Erin Sweeney, who managed the book's production
process, and to all the team (many of whose names I'll never know), who helped.
Thanks to my wife, Anne Lederhos, and our daughters Amy and Alyssa, for all the things for
which any writers may thank their families, including support, tolerance, and just being there. And
thanks to Charles Dodgson (Lewis Carroll), who wrote extensively about threaded programming
(and nearly everything else) in his classic works Alice's Adventures in Wonderland, Through the
Looking-Glass, and The Hunting of the Snark.
Dave Butenhof
Digital Equipment Corporation
110 Spit Brook Road, ZKO2-3/Q18
Nashua, NH 03062
December 1996
1 Introduction "The time has come," the Walrus said,
"To talk of many things;
Of shoes--and ships--and sealing wax--
Of cabbages---and kings--
And why the sea is boiling hot--
And whether pigs have wings."
--Lewis Carroll, Through the Looking-Glass
In a dictionary, you would probably see that one of several definitions for "thread" is along
the lines of the third definition in the American Heritage paperback dictionary on my desk:
"Anything suggestive of the continuousness and sequence of thread." In computer terms, a thread
is the set of properties that suggest "continuousness and sequence" within the machine. A thread
comprises the machine state necessary to execute a sequence of machine instructions--the location
of the current instruction, the machine's address and data registers, and so forth.
A UNIX process can be thought of as a thread, plus an address space, files descriptors, and an
assortment of other data. Some versions of UNIX support "lightweight" or "variable weight"
processes that allow you to strip some or all of that data from some of your processes for
efficiency. Now, whether you're using a "thread" or a "lightweight process," you still need the
address space, file descriptors, and everything else. So, you might ask, what's the point? The point
is that you can have many threads sharing an address space, doing different things. On a
multiprocessor, the threads in a process can be doing different things simultaneously.
When computers lived in glass caves and were fed carefully prepared punch cards, the real
world outside could be kept waiting with no consequences more severe than some grumbling
programmers. But the real world doesn't do one thing at a time, and gradually computers began to
model that world by adding capabilities such as multiprogramming, time sharing, multiprocessing,
and, eventually, threads.
Threads can help you bring your application out of the cave, and Pthreads helps you do it in a
way that will be neat, efficient, and portable. This chapter briefly introduces you to what you need
to begin understanding and using threads. Don't worry--the rest of the book will follow up on the
details left dangling in this chapter.
Section 1.1 presents the framework for a number of analogies that I will use to explain
threading as we go. There is nothing all that unusual in the brief story—but hereafter you will
understand when I talk about programmers and buckets, which, otherwise, might seem mildly odd.
Section 1.2 defines some essential concepts and terms used in this book. The most important
of these concepts deserves a special introduction, which will also serve to demonstrate the
convention with which various particularly important points shall be emphasized throughout this
book:
| Asynchronous:
| Any two operations are "asynchronous" when they can proceed
| independently of each other.
Section 1.3 describes how you already use asynchronous programming on a regular basis,
both as a UNIX programmer and user, and as a human being in the real world. I wouldn't dare to
claim that asynchronous programming is easy, but the basic concepts it tries to model are so easy
and natural that you rarely need even to think about them until you try to apply them to software.
Threads are, to some extent, just one more way to make applications asynchronous, but
threads have some advantages over other models that have been used to build asynchronous
applications. Section 1.5 will show you some of the advantages as we apply various programming
methods in several versions of a simple alarm clock. You will get to see "threads in action" right
away, with a brief description of the few Pthreads interfaces needed to build this simple
application.
Armed, now, with a basic understanding of what threads are all about, you can go on to
Section 1.6, where we will explore some of the fundamental advantages of a threaded
programming model.
Although there are a lot of excellent reasons to use threads, there is a price to be paid. Section
1.7 provides a counterpoint to the previous section by describing some of the costs. What it boils
down to, though, is simply that you need to learn how the model works, and then apply it carefully.
It is not as hard as some folks would have you believe.
You have seen some of the fundamental benefits and costs. It may be obvious that you do not
want to rush out and put threads into every application or library you write. Section 1.8 asks the
question “To thread or not to thread?” and I will attempt to guide you toward determining the
proper answer in various cases.
You will know at that point what threads are, what they do, and when to use them. Aside
from brief examples, you haven't yet seen any detailed information about the particular
programming interfaces (APIs) that compose Pthreads. Section 1.9 points out some of the basic
landmarks of the Pthreads universe to get you oriented before we plunge ahead. The most
important part of this section is 1.9.3, which describes the Pthreads model for reporting
errors--which is some- what different than the rest of UNIX and POSIX.
1.1 The "bailing programmers" This was charming, no doubt: but they shortly found out
That the Captain they trusted so well
Had only one notion for crossing the ocean,
And that was to tingle his bell.
--Lewis Carroll, The Hunting of the Snark
Three programmers sail out to sea one fine day in a small boat. They sail quite some
distance from shore, enjoying the sun and sea breeze, allowing the wind to carry them. The
sky darkens, and a storm strikes. The small boat is tossed violently about, and when the storm
abates the programmers are missing their boat's sail and most of the mast. The boat has sprung
a small leak, and there is no land in sight.
The boat is equipped with food, water, oars, and a bailing bucket, and the programmers
set to work. One programmer rows, and monitors the accumulating water in the bosom of the
boat. The other programmers alternately sleep, watch the water level, or scan the horizon for
sight of land or another ship.
An idle programmer may notice rising water in the boat, and begin bailing. When
both idle programmers are awake, and become simultaneously concerned regarding their
increasing dampness, they may both lunge for the bailing bucket--but one will inevitably reach
it first, and the other will have to wait.
If the rower decides that bailing is required while both his companions sleep a nudge
is usually sufficient to awaken a programmer, allowing the other to continue sleeping. But if the
rower is in a bad mood, he may resort to a loud yell, awakening both sleeping programmers.
While one programmer assumes the necessary duty, the other can try to fall asleep again.
When the rower tires, he can signal one of the other programmers to take over the
task, and immediately fall into a deep sleep waiting to be signaled in turn. In this way,
they journey on for some time.
So, just what do the Bailing Programmers have to do with threads? I'm glad you asked! The
elements of the story represent analogies that apply to the Pthreads programming model. We'll
explore some additional analogies in later sections, and even expand the story a little, but for now
consider a few basics:
A programmer is an entity that is capable of independent activity. Our programmers
represent threads. A thread is not really much like a programmer, who, as we all know, is a
fascinatingly sophisticated mixture of engineer, mathematician, and artist that no computer can
match. Still, as a representation of the "active element" in our programming model, it will be
sufficient.
The bailing bucket and the oars are "tokens" that can be held by only one individual at a
time. They can be thought of as shared data, or as synchronization objects. The primary
Pthreads synchronization object, by the way, is called a mutex.
Nudges and shouts are communication mechanisms associated with a synchronization
object, on which individuals wait for some condition. Pthreads provides condition variables,
which may be signaled or broadcast to indicate changes in shared data state.
1.2 Definitions and terminology
"When I use a word," Humpty Dumpty said, in rather a scornful tone,
"It means just what I choose it to mean--neither more nor less."
--Lewis Carroll, Through the Looking-Glass
This book will use several critical terms that may be unfamiliar to you unless you've already
had some experience with parallel or asynchronous programming. Even if you are familiar with
them, some of the terms have seen assorted and even contradictory uses within research and
industry, and that is clearly not going to help communication. We need to begin by coming to a
mutual agreement regarding the meaning of these terms, and, since I am writing the book, we will
agree to use my definitions. (Thank you.)
1.2.1 Asynchronous Asynchronous means that things happen independently (concurrently) unless there's some
enforced dependency. Life is asynchronous. The dependencies are supplied by nature, and events
that are not dependent on one another can occur simultaneously. A programmer cannot row
without the oars, or bail effectively without the bucket--but a programmer with oars can row while
another programmer with a bucket bails. Traditional computer programming, on the other hand,
causes all events to occur in series unless the programmer takes "extraordinary measures" to allow
them to happen concurrently.
The greatest complication of "asynchrony" has been that there's little advantage to being
asynchronous unless you can have more than one activity going at a time. If you can start an
asynchronous operation, but then you can do nothing but wait for it, you're not getting much
benefit from the asynchrony.
1.2.2 Concurrency
Concurrency, which an English dictionary will tell you refer to things happening at the same
time, is used to refer to things that appear to happen at the same time, but which may occur
serially. Concurrency describes the behavior of threads or processes on a uniprocessor system. The
definition of concurrent execution in POSIX requires that "functions that suspend the execution of
the calling thread shall not cause the execution of other threads to be indefinitely suspended."
Concurrent operations may be arbitrarily interleaved so that they make progress
independently (one need not be completed before another begins), but concurrency does not imply
that the operations proceed simultaneously. Nevertheless, concurrency allows applications to take
advantage of asynchronous capabilities, and "do work" while independent operations are
proceeding.
Most programs have asynchronous aspects that may not be immediately obvious. Users, for
example, prefer asynchronous interfaces. They expect to be able to issue a command while they're
thinking about it, even before the program has finished with the last one. And when a windowing
interface provides separate windows, don't you intuitively expect those windows to act
asynchronously? Nobody likes a "busy" cursor. Pthreads provides you with both concurrency and
asynchrony, and the combination is exactly what you need to easily write responsive and efficient
programs. Your program can "wait in parallel" for slow I/O devices, and automatically take
advantage of multiprocessor systems to compute in parallel.
1.2.3 Uniprocessor and multiprocessor
The terms uniprocessor and multiprocessor are fairly straightforward, but let's define them
just to make sure there's no confusion. By uniprocessor, I mean a computer with a single
programmer-visible execution unit (processor). A single general-purpose processor with
superscalar processing, or vector processors, or other math or I/O coprocessors is still usually
considered a uniprocessor.
By multiprocessor, I mean a computer with more than one processor sharing a common
instruction set and access to the same physical memory. While the processors need not have equal
access to all physical memory, it should be possible for any processor to gain access to most
memory. A "massively parallel processor" (MPP) may or may not qualify as a multiprocessor for
the purposes of this book. Many MPP systems do qualify, because they provide access to all
physical memory from every processor, even though the access times may vary widely.
1.2.4 Parallelism Parallelism describes concurrent sequences that proceed simultaneously. In other words,
software "parallelism" is the same as English "concurrency" and different from software
"concurrency." Parallelism has a vaguely redeeming analogy to the English definition: It refers to
things proceeding in the same direction independently (without intersection).
True parallelism can occur only on a multiprocessor system, but concurrency can occur on
both uniprocessor and multiprocessor systems. Concurrency can occur on a uniprocessor because
concurrency is, essentially, the illusion of parallelism. While parallelism requires that a program
be able to perform two computations at once, concurrency requires only that the programmer be
able to pretend that two things can happen at once.
1.2.5 Thread safety and reentrancy
“Thread-safe" means that the code can be called from multiple threads without destructive
results. It does not require that the code run efficiently in multiple threads only that it can operate
safely in multiple threads. Most existing functions can be made thread-safe using tools provided
by Pthreads-- mutexes, condition variables, and thread-specific data. Functions that don't require
persistent context can be made thread-safe by serializing the entire function, for example, by
locking a mutex on entry to the function, and unlocking the mutex before returning. Functions
made thread-safe by serializing the entire function can be called in multiple threads--but only one
thread can truly perform the function at a time.
More usefully, thread-safe functions can be broken down into smaller critical sections. That
allows more than one thread to execute within the function, although not within the same part.
Even better, the code can be redesigned to protect critical data rather than critical code, which may
allow fully parallel execution of the code, when the threads don't need to use the same data at the
same time.
The putchar function, for example, which writes a character into a standard I/O (stdio) buffer,
might be made thread-safe by turning putchar into a critical section. That is, putchar might lock a
"putchar mutex," write the character, and then unlock the putchar mutex. You could call putchar
from two threads, and no data would be corrupted--it would be thread-safe. However, only one
thread could write its character at a time, and the others would wait, even if they were writing to
different stdio streams.
The correct solution is to associate the mutex with the stream, protecting the data rather than
the code. Now your threads, as long as they are writing to different streams, can execute putchar
in parallel. More importantly, all functions that access a stream can use the same mutex to safely
coordinate their access to that stream.
The term "reentrant" is sometimes used to mean "efficiently thread-safe." That is, the code
was made thread-safe by some more sophisticated measures than converting the function or
library into a single serial region. Although existing code can usually be made thread-safe by
adding mutexes and thread-specific data, it is often necessary to change the interface to make a
function reentrant. Reentrant code should avoid relying on static data and, ideally, should avoid
reliance on any form of synchronization between threads.
Often, a function can avoid internal synchronization by saving state in a "context structure"
that is controlled by the caller. The caller is then responsible for any necessary synchronization of
the data. The UNIX readdir function, for example, returns each directory entry in sequence. To
make readdir thread-safe, you might add a mutex that readdir locked each time it was called, and
unlocked before it returned to the caller. Another approach, as Pthreads has taken with readdir_r,
is to avoid any locking within the function, letting the caller allocate a structure that maintains the
context of readdir_r as it searches a directory.
At first glance, it may seem that we're just making the caller perform what ought to be the job
of readdir_r. But remember that only the caller knows how the data will be used. If only one
thread uses this particular directory context, for example, then no synchronization is needed. Even
when the data is shared between threads, the caller may be able to supply more efficient
synchronization, for example, if the context can be protected using a mutex that the application
also uses for other data.
1.2.6 Concurrency control functions
Any "concurrent system" must provide a core set of essential functions that you need to
create concurrent execution contexts, and control how they operate within your library or
application. Here are three essential facilities, or aspects, of any concurrent system:
1. Execution context is the state of a concurrent entity. A concurrent system must provide a
way to create and delete execution contexts, and maintain their state independently. It must be able
to save the state of one context and dispatch to another at various times, for example, when one
needs to wait for an external event. It must be able to continue a context from the point where it
last executed, with the same register contents, at a later time.
2. Scheduling determines which context (or set of contexts) should execute at any given point
in time, and switches between contexts when necessary.
3. Synchronization provides mechanisms for concurrent execution contexts to coordinate
their use of shared resources. We use this term in a way that is nearly the opposite of the standard
dictionary meaning. You'll find a definition much like "cause to occur at the same time," whereas
we usually mean something that might better be expressed as "prevent from occurring at the same
time." In a thesaurus, you may find that "cooperate" is a synonym for "synchronize"--and
synchronization is the mechanism by which threads cooperate to accomplish a task. This book will
use the term "synchronization," though, because that is what you'll see used, almost universally.
There are many ways to provide each of these facilities--but they are always present in some
form. The particular choices presented in this book are dictated by the book's subject--Pthreads.
Table 1.1 shows a few examples of the three facilities in various systems.
Execution context Scheduling Synchronization
Real traffic automobile traffic lights and signs turn signals and brake
lights
UNIX
(before threads)
process priority (nice) wait and pipes
Pthreads thread policy, priority condition variables
and mutexes
TABLE 1.1 Execution contexts, schedulers, and synchronization
A system's scheduling facility may allow each thread to run until it voluntarily yields the
processor to another thread ("run until block"). It may provide time-slicing, where each thread is
forced to periodically yield so that other threads may run ("round-robin"). It may provide various
scheduling policies that allow the application to control how each thread is scheduled according to
that thread's function. It may provide a "class scheduler" where dependencies between threads are
described so that, for example, the scheduler can ensure that members of a tightly coupled parallel
algorithm are scheduled at the same time.
Synchronization may be provided using a wide variety of mechanisms. Some of the most
common forms are mutexes, condition variables, semaphores, and events. You may also use
message passing mechanisms, such as UNIX pipes, sockets, POSIX message queues, or other
protocols for communicating between asynchronous processes--on the same system or across a
network. Any form of communication protocol contains some form of synchronization, because
passing data around with no synchronization results in chaos, not communication.
The terms thread, mutex, and condition variable are the main topics of this book. For now, it
is enough to know that a thread represents an "executable thing" on your computer. A mutex
provides a mechanism to prevent threads from colliding unexpectedly, and a condition variable
allows a thread, once it has avoided such a collision, to wait until it is safe to proceed. Both
mutexes and condition variables are used to synchronize the operation of threads.
1.3 Asynchronous programming is intuitive ... "In most gardens," the Tiger-lily said,
"they make the beds too soft--so that the flowers are always asleep."
This sounded a very good reason, and Alice was quite
pleased to know it.
"I never thought of that before!" she said.
--Lewis Carroll, Through the Looking-Glass
If you haven't been involved in traditional realtime programming, asynchronous
programming may seem new and different. But you've probably been using asynchronous
programming techniques all along. You've probably used UNIX, for example, and, even as a user,
the common UNIX shells from sh to ksh have been designed for asynchronous programming.
You've also been using asynchronous "programming" techniques in real life since you were born.
Most people understand asynchronous behavior much more thoroughly than they expect,
once they get past the complications of formal and restricted definitions.
1.3.1 …because UNIX is asynchronous In any UNIX system, processes execute asynchronously with respect to each other, even
when there is only a single processor. Yes, until recently it was difficult to write individual
programs for UNIX that behaved asynchronously--but UNIX has always made it fairly easy for
you to behave asynchronously. When you type a command to a shell, you are really starting an
independent program--if you run the program in the background, it runs asynchronously with the
shell. When you pipe the output of one command to another you are starting several independent
programs, which synchronize between themselves using the pipe.
| Time is a synchronization mechanism.
In many cases you provide synchronization between a series of processes yourself, maybe
without even thinking about it. For example, you run the compiler after you've finished editing the
source files. It wouldn't occur to you to compile them first, or even at the same time. That's
elementary real-life synchronization.
| UNIX pipes and files can be synchronization mechanisms.
In other cases you may use more complicated software synchronization mechanisms. When
you type "ls|more" to a shell to pass the output of the ls command into the more command, you're
describing synchronization by specifying a data dependency. The shell starts both commands right
away, but the more command can't generate any output until it receives input from ls through the
pipe. Both commands proceed concurrently (or even in parallel on a multiprocessor) with ls
supplying data and more processing that data, independently of each other. If the pipe buffer is big
enough, ls could complete before more ever started; but more can't ever get ahead of ls.
Some UNIX commands perform synchronization internally. For example, the command "cc
-o thread thread.c" might involve a number of separate processes. The cc command might be a
"front end" to the C language environment, which runs a filter to expand preprocessor commands
(like #include and #if), a compiler to translate the program into an intermediate form, an optimizer
to reorder the translation, an assembler to translate the intermediate form into object language, and
a loader to translate that into an executable binary file. As with ls|more, all these programs may be
running at the same time, with synchronization provided by pipes, or by access to temporary files.
UNIX processes can operate asynchronously because each process includes all the
information needed to execute code. The operating system can save the state of one process and
switch to another without affecting the operation of either. Any general-purpose asynchronous
"entity" needs enough state to enable the operating system to switch between them arbitrarily. But
a UNIX process includes additional state that is not directly related to "execution context," such as
an address space and file descriptors.
A thread is the part of a process that's necessary to execute code. On most computers that
means each thread has a pointer to the thread's current instruction (often called a "PC" or
"program counter"), a pointer to the top of the thread's stack (SP), general registers, and
floating-point or address registers if they are kept separate. A thread may have other things, such
as processor status and coprocessor control registers. A thread does not include most of the rest of
the state associated with a process; for example, threads do not have their own file descriptors or
address space. All threads within a process share all of the files and memory, including the
program text and data segments.
| Threads are "simpler" than processes.
You can think of a thread as a sort of "stripped down" process, lean and mean and ready to go.
The system can switch between two threads within a process much faster than it can switch
between processes. A large part of this advantage comes from the fact that threads within a process
share the address space--code, data, stack, everything.
When a processor switches between two processes, all of the hardware state for that process
becomes invalid. Some may need to be changed as part of the context switch procedure--data
cache and virtual memory translation entries may be flushed, for example. Even when they do not
need to be flushed immediately, however, the data is not useful to the new process. Each process
has a separate virtual memory address space, but threads running within the same process share
the virtual address space and all other process data.
Threads can make high-bandwidth communication easier between independent parts of your
program. You don't have to worry about message passing mechanisms like pipes or about keeping
shared memory region address references consistent between several different address spaces.
Synchronization is faster, and programming is much more natural. If you create or open a file, all
threads can use it. If you allocate a dynamic data structure with malloc, you can pass the address
to other threads and they can reference it. Threads make it easy to take advantage of concurrency.
1.3.2 ... because the world is asynchronous
Thinking asynchronously can seem awkward at first, but it'll become natural with a little
practice. Start by getting over the unnatural expectation that everything will happen serially unless
you do something "unusual." On a one-lane road cars proceed one at a time--but on a two-lane
road two cars go at once. You can go out for a cup of coffee, leaving your computer compiling
some code and fully expecting that it will proceed without you. Parallelism happens everywhere in
the real world, and you expect it.
A row of cashiers in a store serve customers in parallel; the customers in each line generally
wait their turn. You can improve throughput by opening more lines, as long as there are registers
and cashiers to serve them, and enough customers to be served by them. Creating two lines for the
same register may avoid confusion by keeping lines shorter--but nobody will get served faster.
Opening three registers to serve two customers may look good, but it is just a waste of resources.
In an assembly line, workers perform various parts of the complete job in parallel, passing
work down the line. Adding a station to the line may improve performance if it parallels or
subdivides a step in the assembly that was so complicated that the operator at the next station
spent a lot of time waiting for each piece. Beware of improving one step so much that it generates
more work than the next step on the assembly line can handle.
In an office, each project may be assigned to a "specialist." Common specialties include
marketing, management, engineering, typing pool, sales, support, and so forth. Each specialist
handles her project independently on behalf of the customer or some other specialist, reporting
back in some fashion when done. Assigning a second specialist to some task, or defining narrower
specialties (for example, assigning an engineer or manager permanently to one product) may
improve performance as long as there's enough work to keep her busy. If not, some specialists play
games while others' in-baskets overflow.
Motor vehicles move in parallel on a highway. They can move at different speeds, pass each
other, and enter and exit the highway independently. The drivers must agree to certain conventions
in order to avoid collisions. Despite speed limits and traffic signs, compliance with the "rules of
the road" is mostly voluntary. Similarly, threads must be coded to "agree" to rules that protect the
program, or risk ending up undergoing emergency debugging at the thread hospital.
Software can apply parallelism in the same ways you might use it in real life, and for the
same reasons. When you have more than one "thing" capable of doing work, you naturally expect
them to all do work at the same time. A multiprocessor system can perform multiple computations,
and any time-sharing system can perform computations while waiting for an external device to
respond. Software parallelism is subject to all of the complications and problems that we have
seen in real life--and the solutions may not be as easy to see or to apply. You need enough threads,
but not too many; enough communication, but not too much. A key to good threaded programming
is learning how to judge the proper balance for each situation.
Each thread can process similar parts of a problem, just like supermarket cashiers handling
customers. Each thread can perform a specific operation on each data item in turn, just like the
workers on an assembly line. Each thread can specialize in some specific operation and perform
that operation repeatedly on behalf of other threads. You can combine these basic models in all
sorts of ways; for example, in parallel assembly lines with some steps performed by a pool of
servers.
As you read this book you'll be introduced to concepts that may seem unfamiliar: mutexes,
condition variables, race conditions, deadlocks, and priority inversions. Threaded programming
may feel daunting and unnatural. But I’ll explain all those concepts as we move through this book,
and once you've been writing multithreaded code for a while you may find yourself noticing
real-world analogies to the concepts. Threads and all this other stuff are formalized and restricted
representations of things you already understand.
If you find yourself thinking that someone shouldn't interrupt you because you have the
conversation mutex locked, you've begun to develop an intuitive understanding of threaded
programming.* You can apply that understanding to help you design better threaded code with less
effort. If something wouldn't make sense in real life, you probably shouldn't try it in a program
either.
* It may also be a good time to take a break and read some healthy escapist fiction for a while.
1.4 About the examples in this book This book contains a number of examples. All are presented as complete programs, and they
have been built and tested on Digital UNIX 4.0d and Solaris 2.5.
All of these programs do something, but many do not do anything of any particular
importance. The purpose of the examples is to demonstrate thread management and
synchronization techniques, which are mere overhead in most real programs. They would be less
effective at revealing the details if that "overhead" was buried within large programs that "did
something."
Within the book, examples are presented in sections, usually one function at a time. The
source code is separated from the surrounding text by a header and trailer block which include the
file name and, if the example comprises more than one section, a section number and the name of
the function. Each line of the source code has a line number at the left margin. Major functional
blocks of each section are described in specially formatted paragraphs preceding the source code.
These paragraphs are marked by line numbers outside the left margin of the paragraph, denoting
the line numbers in the source listing to which the paragraph refers. Here's an example:
1-2 These lines show the header files included in most of the examples. The <pthread.h> header file
declares constants and prototypes for the Pthreads functions, and the errors.h header file includes
various other headers and some error-checking functions.
sample.c part 1 sampleinfo
I have written these examples to use error checking everywhere. That is, I check for errors on
each function call. As long as you code carefully, this isn't necessary, and some experts
recommend testing only for errors that can result from insufficient resources or other problems
beyond your control. I disagree, unless of course you're the sort of programmer who never makes
a mistake. Checking for errors is not that tedious, and may save you a lot of trouble during
debugging.
You can build and run all of the examples for yourself--the source code is available online at
http://www.aw. com/butenhof/posixcode.html. A Makefile is provided to build all of the examples,
though it requires modifications for various platforms. On Digital UNIX, the examples were built
with CFLAGS=-pthread -stdl -wl. On Solaris, they were built with CFLAGS=-D_REENTRANT
-D_POSIX_C_SOURCE= 199506 - lpthread. Some of the examples require interfaces that may
not be in the Pthreads library on your system, for example, clock_gettime, which is part of the
POSIX.lb realtime standard. The additional realtime library is specified by the RTFLAGS variable,
which is defined as RTFLAGS=-lrt on Digital UNIX, and as RTFLAGS=-lposix4 on Solaris.
On Solaris 2.5 systems, several of the examples require calls to thr_setconcurrency to ensure
proper operation. This function causes Solaris to provide the process with additional concurrency.
In a few cases, the example will not operate at all without this call, and in other cases, the example
would fail to demonstrate some behavior.
1.5 Asynchronous programming, by example "In one moment I've seen what has hitherto been
Enveloped in absolute mystery,
And without extra charge I will give you at large
A Lesson in Natural History."
--Lewis Carroll, The Hunting of the Snark
This section demonstrates some basic asynchronous programming, using a simple program
that does something vaguely useful, by pretending to be an alarm clock with a command interface
for which you would not consider paying a dime in a store. But then, this book is about threads,
not user interfaces, and the code that I need to show takes up quite enough space already.
The program prompts for input lines in a loop until it receives an error or end of file on stdin.
On each line, the first nonblank token is interpreted as the number of seconds to wait, and the rest
of the line (up to 64 characters) is a message that will be printed when the wait completes. I will
offer two additional versions-- one using multiple processes, and one using multiple threads. We'll
use the three examples to compare the approaches.
1.5.1 The baseline, synchronous version
1 Include the header file errors.h, which includes standard headers like <unistd.h> and
<stdio.h> and defines error reporting macros that are used throughout the examples in this book.
We don't use the error reporting macros in this particular example, but consistency is nice,
sometimes.
9-26 The "baseline" version, alarm.c, is a synchronous alarm program with a single routine, main.
Most of main is a loop, which processes simple commands until fgets returns a NULL (error or
end of file). Each line is "parsed" with sscanf to separate the number of seconds to wait (%d, the
first sequence of digits) from the message string to print (%64[ ^\ n l, the rest of the line, up to 64
characters excluding new line).
alarm.c
The problem with the program alarm.c is that only one alarm request can be active at a time.
If you set an alarm to remind you to do something in 10 minutes (600 seconds), you can't decide
to have it remind you of something else in 5 minutes. The program is doing something
synchronously that you would probably like to be asynchronous.
1.5.2 A version using multiple processes There are lots of ways to make this program asynchronous; for example, you could run more
than one copy of the program. One way to run multiple copies is to fork a child process for each
command, as shown in alarm_fork.c. The new version is asynchronous--you can enter commands
at any time, and they will be carried out independently. It isn't much more complicated than the
original, which is nice.
27-37 The main difference between alarm.c and alarm_fork.c is that instead of calling sleep directly,
it uses fork to create a new child process, which then calls sleep (and, eventually, printf}
asynchronously, while the parent process continues.
42-46 The primary complication in this version is the need to "reap" any child processes that have
terminated. If the program fails to do this, the system will save them all until the program
terminates. The normal way to reap terminated child processes is to call one of the wait functions.
In this case, we call waitpid, which allows the caller to specify the WNOHANG flag. The function
will immediately reap one child process if any have terminated, or will immediately return with a
process ID (pid) of 0. The parent process continues to reap terminated child processes until there
are no more to reap. When the loop terminates, main loops back to line 13 to read a new
command.
alarm_fork.c
1.5.3 A version using multiple threads Now, let us try another alarm program, alarm_thread.c. It is much like the fork version in
alarm_fork.c, except that it uses threads instead of child processes to create asynchronous alarms.
Four Pthreads calls are used in this program:
� pthread_create creates a thread running the routine specified in the third argument
(alarm_thread), returning an identifier for the new thread to the variable referenced by
thread.
� pthread_detach allows Pthreads to reclaim the thread's resources as soon as it terminates.
� pthread_exit terminates the calling thread.
� pthread_self returns the calling thread’s identifier.
4-7 The alarm_t structure defines the information stored for each alarm command, the number of
seconds until the alarm is due, and the message string that will be printed by the thread.
alarm_thread.c part 1 definitions
1-8 The alarm thread function is the "alarm thread." That is, each alarm thread is created running
this function, and when the function returns the thread terminates. The function's argument (void
*arg) is the fourth argument that was passed to pthread_create, in this case, a pointer to the control
packet (alarm_t) created for the alarm request that the thread is to satisfy. The thread starts by
"mapping" the void * argument as a pointer to a control packet. The thread detaches itself by
calling pthread_detach, which informs Pthreads that the application does not need to know when
the thread terminates or its termination status.
9-12 The thread sleeps for the number of seconds specified in its control packet, and then prints
the message string. Finally, the thread frees the control packet and returns. When a thread returns
from its initial routine, as it does here, the thread terminates. Normally, Pthreads would hold the
thread's resources so that another thread could later determine that it had exited and retrieve a final
result. Because the thread detached itself, none of that is necessary.
alarm_thread.c part 2 alarm_thread
The main program of alarm thread.c is much the same as the other two variants. It loops,
reading and interpreting command lines as long as it can read from stdin.
12-25 In this variation, main allocates heap storage (alarm_t) for each alarm command. The alarm
time and message are stored in this structure, so each thread can be given the appropriate
information. If the sscanf call fails to "parse" a correct command, the heap storage is freed.
12-26 An alarm thread is created, running function alarm_thread, with the alarm data (alarm_t) as
the thread's argument.
alarm_thread.c part 3 main
1.5.4 Summary
A good way to start thinking about threads is to compare the two asynchronous versions of
the alarm program. First, in the fork version, each alarm has an independent address space, copied
from the main program. That means we can put the seconds and message values into local
variables--once the child has been created (when fork returns), the parent can change the values
without affecting the alarm. In the threaded version, on the other hand, all threads share the same
address space--so we call malloc to create a new structure for each alarm, which is passed to the
new thread. The extra bookkeeping required introduces a little complexity into the threaded
version.
In the version using fork, the main program needs to tell the kernel to free resources used by
each child process it creates, by calling waitpid or some other member of the wait "family." The
alarm_fork.c program, for example, calls waitpid in a loop after each command, to collect all child
processes that have completed. You do not need to wait for a thread unless you need the thread's
return value--in alarm_thread.c, for example, each alarm thread detaches itself (at line 6, part 2) so
that the resources held by the thread will be returned immediately when it terminates.
In the threaded version, the "primary activities" (sleeping and printing the message) must be
coded in a separate routine. In alarm.c and alarm_fork.c, those activities were performed without a
call. In simple cases such as our alarm program, it is often easier to understand the program with
all code in one place, so that might seem like an advantage for alarm_fork. c. In more complicated
programs, though, it is rare that a program's "primary activities" are so simple that they can be
performed in a single routine without resulting in total confusion.
In a real alarm program, you wouldn't want to create a process for each alarm. You might
easily have hundreds of alarms active, and the system probably wouldn't let you create that many
processes. On the other hand, you probably can create hundreds of threads within a process. While
there is no real need to maintain a stack and thread context for each alarm request, it is a perfectly
viable design.
A more sophisticated version of alarm_thread.c might use only two threads: one to read input
from the user, and another to wait for expiration of the next alarm--I'll show that version later,
after we've worked through some more basic concepts. You could do the same thing with two
processes, of course, but it would be more cumbersome. Passing information between two threads
is easy and fast--no shared memory to map, no pipes to read or write, no concerns about whether
you are passing addresses that may not mean the same thing in both processes. Threads share
everything in their address space--any address that's valid in one thread is valid in all threads.
1.6 Benefits of threading
"'O Looking-Glass creatures,' quoth Alice, 'draw near!
'Tis an honour to see me, a favour to hear:
'Tis a privilege high to have dinner and tea
Along with the Red Queen, the White Queen, and me!'"
--Lewis Carroll, Through the Looking-Glass
Some advantages of the multithreaded programming model follow:
1. Exploitation of program parallelism on multiprocessor hardware. Parallelism is the only
benefit that requires special hardware. The others can help most programs without special
hardware.
2. More efficient exploitation of a program's natural concurrency, by allowing the program
to perform computations while waiting for slow I/O operations to complete.
3. A modular programming model that clearly expresses relationships between
independent "events" within the program.
These advantages are detailed in the following sections.
1.6.1 Parallelism On a multiprocessor system, threading allows a process to perform more than one
independent computation at the same time. A computation-intensive threaded application running
on two processors may achieve nearly twice the performance of a traditional single-threaded
version. "Nearly twice" takes into account the fact that you'll always have some overhead due to
creating the extra thread(s) and performing synchronization. This effect is often referred to as
"scaling." A two-processor system may perform 1.95 times as fast as a single processor, a
three-processor system 2.9 times as fast, a four-processor system 3.8 times as fast, and so forth.
Scaling almost always falls off as the number of processors increases because there's more chance
of lock and memory collisions, which cost time.
n
p p
Speedup
+−
=
)1(
1
FIGURE 1.1 Amdahl's law
Scaling can be predicted by "Amdahl's law," which is shown in Figure 1.1. In the equation, p
represents the ratio of "parallelizable code" over "total execution time," and n represents the
number of processors the code can use. The total elapsed time for a parallel job is the sum of the
elapsed time for the nonparallelizable work (1 - p) and the elapsed time for each processor
executing the parallelizable work (p/n).
Amdahl’s law is a simple relationship showing how parallelism is limited by the amount of
serialization needed. When the program has no parallelizable code (p is 0), the speedup is 1. That
is, it is not a parallel program. If the program requires no synchronization or other serial code (p is
1), then the speedup is n (the number of processors). As more synchronization is required,
parallelism provides less benefit. To put it another way, you'll get better scaling with activities that
are completely independent than with activities that are highly dependent: The independent
activities need less synchronization.
The diagram in Figure 1.2 shows the effect of Amdahl’s law. "Clock time" progresses from
left to right across the page, and the diagram shows the number of processors working in parallel
at any moment. Areas where the diagram has only a single horizontal line show that the process is
serialized. Areas that have several horizontal lines in parallel show where the process benefits
from multiple processors. If you can apply multiple processors for only 10% of your program's
execution time, and you have four processors, then Amdahl’s law predicts a speedup of
1/( 0.9+( 0.1/4 ) ), or about 8%.
FIGURE 1.2 Parallelism charted against time
Operations on large matrices can often be "parallelized" by splitting the matrix into pieces.
For example, each thread may be able to operate on a set of rows or columns without requiring
any data written by threads operating on other slices. You still generally need to synchronize
threads at the beginning and end of processing the matrix, frequently using a barrier.* Amdahl’s
law shows that you'll get better performance by giving each thread a large and relatively
independent "chunk" of work, requiring infrequent synchronization, than by giving them smaller
chunks.
Amdahl’s law is an excellent thought exercise to help you understand scaling. It is not,
however, a practical tool, because it is nearly impossible to accurately compute p for any program.
To be accurate, you need to consider not only all serialized regions within your code, but also
within the operating system kernel and even in hardware. Multiprocessor hardware must have
some mechanism to synchronize access to the contents of memory. When each processor has a
private data cache, the contents of those caches must be kept consistent with each other and with
the data in memory. All of this serialization must be included in any accurate calculation.
* A barrier is a simple synchronization mechanism that blocks each thread until a certain
number has reached the barrier; then all threads are unblocked. Barriers can be used, for example,
to keep any thread from executing a parallel region of code until all threads are ready to execute
the region. Section 7.1.1 describes barriers in more detail, and demonstrates the construction of a
simple barrier package.
1.6.2 Concurrency
The threaded programming model allows the program to make computational progress while
waiting for blocking operations like I/O. This is useful for network servers and clients, and it is the
main reason that client/server systems (such as OSF DCE) use threads. While one thread waits for
a long network read or write operation, that thread is blocked and other threads in your application
can execute independently. Some systems support asynchronous I/O operations, which can give
similar advantages; but most UNIX-based systems do not have asynchronous I/O*. Furthermore,
asynchronous I/O is generally a lot more complicated to use than threads.
* UNIX systems support "nonblocking I/O," but this is not the same thing as asynchronous
I/O. Nonblocking I/O allows the program to defer issuing an I/O operation until it can complete
without blocking, but asynchronous I/O can proceed while the program does something else.
For example, you need to either handle asynchronous notification when the I/O completes, or
poll for completion. If you issued an asynchronous I/O and then entered a polling loop, you would
lose the advantage of asynchronous I/O your application would just wait. If you poll elsewhere, or
handle asynchronous notification, then issuing the I/O and processing the resulting data occur in
different locations within your program. That makes the code more difficult to analyze and
maintain. When you use synchronous I/O you just perform the I/O and then do whatever comes
next. Synchronous I/O within multiple threads gives nearly all the advantages of asynchronous I/O.
In most cases you will find it much easier to write complex asynchronous code using threads than
using traditional asynchronous programming techniques.
You could write an alarm program, like those shown in Section 1.5, as an asynchronous
program without using processes or threads, with timer signals for the alarms and asynchronous
reads for input. Using timer signals is more complicated in many ways, since you are severely
limited in what you can do within a signal handler. Asynchronous I/O does not allow you to take
advantage of the convenience of stdio functions. The basic program function will be scattered
through a series of signal handlers and functions, and will probably be harder to understand.
Asynchronous I/O does have one advantage over threaded concurrency, though. Just as a
thread is usually "cheaper" (in execution time and storage space) than a process, the context
required for an asynchronous I/O operation is almost always cheaper than a thread. If you plan to
have a lot of asynchronous I/O operations active at the same time, that might be important enough
to justify using the more complicated programming model. But watch out--some "asynchronous
I/O" packages just distribute your I/O requests across a pool of threads! Most of the time you will
be better off using threads.
Another method of coding an asynchronous application is for each action to be treated as an
"event." Events are queued by some "hidden" process, and dispatched serially to be handled by the
application, usually through "callback" routines registered with the dispatcher. Event dispatchers
have been popularized by windowing interface systems such as the Apple Macintosh toolbox,
Microsoft Windows, and X Windows on UNIX (used by Motif and CDE).
The event mechanism alleviates much of the complication of using signals and asynchronous
I/O, as long as the events are supported directly by the event dispatcher. All, for example, handle
input from the keyboard or pointer device, and generally one can request a timer event to be
inserted automatically at a desired time. Thus, the alarm program, written to an event interface,
need only initialize the event dispatcher and enter a loop to process events. Input events would be
dispatched to the parser, resulting in a request for a new timer event; and timer events would be
dispatched to a function that would format and print the alarm message.
For very simple applications (and the alarm program here is certainly one example), an
event-based implementation may be simpler than the multiprocess or multithread variations I've
shown--at least when the (often substantial) overhead of initializing the event dispatcher is
removed. The limitations of events become more obvious when you build larger and more
sophisticated applications-the problem is that the events are sequential.
Events are not concurrent, and the program can do only one thing at a time. Your application
receives an event, processes it, and then receives the next event. If processing an event takes a
long time, for example, sorting a large database, the user interface may remain unresponsive for
quite a while. If an event involves a long wait, for example, reading data over a slow network
connection, then, again, the user must wait.
The response problems can be minimized by liberally sprinkling extended operations with
calls to the event dispatcher--but getting them in the right place, without substantially impacting
the performance of the operation, can be difficult. Furthermore, you may not have that option, if
the database sort is taking place in a shared library you bought from somebody else.
On the other hand, one might code the application to create a new thread that runs the
database sort, or reads from the slow network, leaving the "user interface" thread to immediately
request another event. The application becomes responsive, while the slow operation continues to
run. You can do this even if a database package, for example, cannot tolerate being run in multiple
threads, by queuing a "sort" command to a server thread that runs database operations
serially--while still retaining interface responsiveness.
1.6.3 Programming model
It may be surprising that programming with threads is a good idea even if you know your
code will never run on a multiprocessor. But it is true. Writing with threads forces you to think
about and plan for the synchronization requirements of your program. You've always had to think
about program dependencies, but threads help to move the requirements from comments into the
executable structure of the program.
Assembly language programs can use all the same sequential control structures (loops,
conditional code) as programs written in a high-level language. However, it can be difficult to
determine whether a branch instruction represents the top or bottom of a loop, a simple conditional,
a "conditional goto," or something more exotic. Switching to a higher-level language that supports
these sequential controls directly in source, for example, the C language do, while, for, if, and
switch statements, makes these sequential programming constructs explicit in the source language.
Making control structures explicit in the program source code means that more of your program's
design is explicit in the source, and that makes it easier for someone else to understand.
Similarly, a C language program (or even an assembler program) may use data encapsulation
and polymorphism by adhering to programming conventions, and with luck those conventions
may be carefully documented and the documentation kept updated. But if that same code is
written in an object-oriented language, the encapsulation and polymorphism become explicit in
the source language.
In a sequential program, synchronization requirements are implicit in the ordering of
operations. The true synchronization requirements, for example, that "a file must be opened before
data can be read from the file," may be documented only by source comments, if at all. When you
program using threads, sequential assumptions are (or at least should be) limited to small
segments of contiguous code--for example, within a single function. More global assumptions, to
be at all safe, must be protected by explicit synchronization constructs.
In traditional serial programming you call function A to do one thing, then call another
function B to do something else, even when those two functions don't require serialization. If a
developer is trying to determine what the program is doing, perhaps to trace a bug, it often isn't
obvious that there may be no need to follow both calls. Furthermore, the strictly serial model
makes it easy for someone to inadvertently make function B dependent on some side effect of
function A. If a later modification reverses the order of the calls, the program may break in ways
that aren't obvious. Program dependencies may be documented using source code comment blocks,
but comments are often ambiguous and may not be properly updated when code is changed.
The threaded programming model isolates independent or loosely coupled functional
execution streams (threads) in a clear way that's made explicit in the program's source code. If
activities are designed as threads, each function must include explicit synchronization to enforce
its dependencies. Because synchronization is executable code, it can't be ignored when
dependencies are changed. The presence of synchronization constructs allows anyone reading the
code to follow temporal dependencies within the code, which can make maintenance substantially
easier, especially for large programs with a lot of independent code.
An assembly language programmer can write better, more maintainable assembly code by
understanding high-level language programming; a C language programmer can write better, more
maintainable C code by understanding object-oriented programming. Even if you never write a
threaded program, you may benefit from understanding the threaded programming model of
independent functions with explicit dependencies. These are "mental models" (or that dreadfully
overused word, "paradigms") that are more or less independent of the specific code sequences you
write. Cleanly isolating functionally independent code may even make sequential programs easier
to understand and maintain.
1.7 Costs of threading
All this time the Guard was looking at her, first through a telescope, then
through a microscope, and then through an opera-glass.
At last he said, "You're traveling the wrong way,"
and shut up the window, and went away,
--Lewis Carroll, Through the Looking-Glass
Of course there's always "the flip side." As I showed in the previous section, threads provide
definite and powerful advantages, even on uniprocessor systems. They provide even more
advantages on a multiprocessor.
So why wouldn't you want to use threads? Everything has a cost, and threaded programming
is no exception. In many cases the advantage exceeds the cost; in others it doesn't. To be fair, the
following subsections discuss the cost of threaded programming.
1.7.1 Computing overhead
Overhead costs in threaded code include direct effects such as the time it takes to synchronize
your threads. Many clever algorithms are available for avoiding synchronization in some cases,
but none of them is portable. You'll have to use some synchronization in just about any threaded
code. It is easy to lose performance by using too much synchronization; for example, by
separately protecting two variables that are always used together. Protecting each variable
separately means you spend a lot more time on synchronization without gaining parallelism, since
any thread that needs one variable will need the other as well.
The overhead of threaded programming can also include more subtle effects. For example,
threads that constantly write the same memory locations may spend a lot of time synchronizing
the memory system on processors that support "read/write ordering." Other processors may spend
that time synchronizing only when you use special instructions such as a memory barrier, or a
"multiprocessor atomic" operation like test-and-set. Section 3.4 says a lot more about these
effects.
Removing a bottleneck in your code, for example, by adding threads to perform multiple
concurrent I/O operations, may end up revealing another bottleneck at a lower level--in the ANSI
C library, the operating system, the file system, the device driver, the memory or I/O architecture,
or the device controller. These effects are often difficult to predict, or measure, and are usually not
well documented.
A compute-bound thread, which rarely blocks for any external event, cannot effectively share
a processor with other compute-bound threads. An I/O thread might interrupt it once in a while,
but the I/O thread would block for another external event and the compute-bound thread would
run again. When you create more compute-bound threads than there are available processors, you
may gain better code structuring over a single-threaded implementation, but you will have worse
performance. The performance suffers because the multithreaded implementation adds thread
synchronization and scheduling overhead to the work you wanted to accomplish--and does it all
using the same compute resources.
1.7.2 Programming discipline Despite the basic simplicity of the threaded programming model, writing real world code is
never trivial. Writing code that works well in multiple threads takes careful thought and planning.
You have to keep track of synchronization protocols and program invariants. You have to avoid
deadlocks, races, and priority inversions. I'll describe all of these things in later sections, show
how to design code to avoid the problems, and how to find and repair them after the fact.
You will almost certainly use library code that you did not write. Some will be supplied with
the operating system you use, and most of the more common libraries will likely be safe to use
within multiple threads. POSIX guarantees that most functions specified by ANSI C and POSIX
must be safe for use by multithreaded applications. However, a lot of "interesting" functions you
will probably need are not included in that list. You will often need to call libraries that are not
supplied with the operating system, for example, database software. Some of that code will not be
thread-safe. I will discuss techniques to allow you to use most unsafe code, but they will not
always work, and they can be ugly.
All threads within a process share the same address space, and there's no protection boundary
between the threads. If a thread writes to memory through an uninitialized pointer, it can wipe out
another thread's stack, or heap memory being used by some other thread. The eventual failure will
most likely occur in the innocent victim, possibly long after the perpetrator has gone on to other
things. This can be especially important if arbitrary code is run within a thread. For example, in a
library that supports callbacks to functions supplied by its caller, be sure that the callback, as well
as the library, is thread-safe.
The important points are that good sequential code is not necessarily good threaded code, and
bad threaded code will break in ways that are more difficult to locate and repair. Thinking about
real-life parallelism can help a lot, but programming requires a lot more detailed work than most
things in real life.
1.7.3 Harder to debug
You will learn more about debugging threaded code, and, more importantly, not debugging
threaded code, in Chapter 8. You will see some of the tools you may encounter as well as some
techniques you can use on your own. By then you will know all about mutexes and memory
visibility, and you will be ready to deal with deadlocks and races. Don't worry about the details
now--the point of this brief section is to demonstrate that you will have to learn about threaded
debugging, and it is not as easy yet as anyone would like it to be. (So when was debugging ever
easy?)
Systems that support threads generally extend traditional sequential debugging tools to
provide basic debugging support. The system may provide a debugger that allows you to see the
call tree for all of your program's threads, for example, and set breakpoints that activate only in
particular threads. The system may provide some form of performance analyzer that lets you
measure the processor time accumulated within each function for a specific thread or across all
threads.
Unfortunately that's only the beginning of the problems when you're debugging asynchronous
code. Debugging inevitably changes the timing of events. That doesn't matter much when you're
debugging sequential code, but it is critical when you're debugging asynchronous code. If one
thread runs even slightly slower than another because it had to process a debugger trap, the
problem you're trying to track down may not happen. Every programmer has run into problems
that won't reproduce under the debugger. You'll run into a lot more of them when you use threads.
It is difficult to track down a memory corruptor, for example, a function that writes through
an uninitialized pointer, in a sequential program. It is even harder in a threaded program. Did some
other thread write to memory without using a mutex? Did it use the wrong mutex? Did it count on
another thread setting up a pointer without explicit synchronization? Was it just an old fashioned
sequential memory corruptor?
Various additional tools are provided by some systems to help you. None of these is standard
or widely available. Tools may check source code for obvious violations of locking protocol,
given a definition of which variables are shared and how they should be locked. They may record
thread interactions while the program runs, and allow you to analyze or even replay the
interactions to determine what happened. They may record and measure synchronization
contention and overhead. They may detect complicated deadlock conditions between a set of
mutexes.
Your most powerful and portable thread debugging tool is your mind, applied through the old
fashioned manual human-powered code review. You'll probably spend a lot of time setting up a
few breakpoints and examining lots of states to try to narrow the problem down a little and then
carefully reading the code to find problems. It is best if you have someone available who didn't
write the code, because a lot of the worst errors are embarrassingly obvious to someone who's not
burdened with detailed knowledge of what the code was supposed to do.
1.8 To thread or not to thread?
"My poor client's fate now depends on your votes."
Here the speaker sat down in his place,
And directed the Judge to refer to his notes
And briefly to sum up the case.
--Lewis Carroll, The Hunting of the Snark
Threads don't necessarily provide the best solution to every programming problem. They're
not always easier to use, and they don't always result in better performance.
A few problems are really "inherently nonconcurrent," and adding threads will only slow the
program down and complicate it. If every step in your program depends directly on the result of
the previous step, then using threads probably won't help. Each thread would have to wait for
another thread to complete.
The most obvious candidates for threaded coding are new applications that accomplish the
following:
1. Perform extensive computation that can be parallelized (or "decomposed") into multiple
threads, and which is intended to run on multiprocessor hardware, or
2. Perform substantial I/O which can be overlapped to improve throughput--many threads can
wait for different I/O requests at the same time. Distributed server applications are good
candidates, since they may have work to do in response to multiple clients, and they must
also be prepared for unsolicited I/O over relatively slow network connections.
Most programs have some natural concurrency, even if it is only reading a command from the
input device while processing the previous command. Threaded applications are often faster, and
usually more responsive, than sequential programs that do the same job. They are generally much
easier to develop and maintain than nonthreaded asynchronous applications that do the same job.
So should you use threads? You probably won't find them appropriate for every programming
job you approach. But threaded programming is a technique that all software developers should
understand.
1.9 POSIX thread concepts
"You seem very clever at explaining words, Sir," said Alice.
"Would you kindly tell me the meaning of the poem
called 'Jabberwocky'?"
"Let's hear it," said Humpty. Dumpty. "I can explain all
the poems that ever were invented--and a good many
that haven't been invented just yet."
--Lewis Carroll, Through the Looking-Glass
First of all, this book focuses on "POSIX threads." Technically, that means the thread
"application programming interfaces" (API) specified by the international formal standard POSIX
1003. lc-1995. This standard was approved by the IEEE in June 1995. A new edition of POSIX
1003.1, called ISO/IEC 9945-1:1996 (ANSI/IEEE Std 1003.1, 1996 Edition) is available from the
IEEE.* This new document includes 1003.1b-1993 (realtime), 1003.1c-1995 (threads), and
1003.1i-1995 (corrections to 1003. lb-1993). Unless you are writing an implementation of the
standard, or are extremely curious, you probably don't want to bother buying the POSIX standard.
For writing threaded code, you'll find books like this one much more useful, supplemented by the
programming documentation for the operating system you're using.
* Contact the IEEE at 1-800-678-IEEE. 9945-1:1996 Information Technology--Portable
Operating System Interface (POSIX)--Part I: System Application: Program Interface (API) [C
Language], ISBN 1-55937-573-6, order number SH94352.
As I explained in the preface, I will use the informal term "Pthreads" to refer to "POSIX 1003.
lc-1995." I will use the slightly more formal term "POSIX.lb" to refer to "POSIX 1003.1b-1993"
in the text, "POSIX.14" to refer to the POSIX 1003.14 "Multiprocessor Profile," and similar
abbreviated notation for other POSIX standards. I'll use the full names where precision is
important, for example, to compare POSIX 1003.1-1990 and POSIX 1003.1-1996, and also in
section titles and captions that appear in the table of contents.
1.9.1 Architectural overview You may remember from Section 1.2 that the three essential aspects of a thread system are
execution context, scheduling, and synchronization. When you evaluate any thread system, or
compare any two thread systems, start by categorizing the features into capabilities that support
execution contexts, scheduling, and synchronization.
With Pthreads, you create an execution context (thread) by calling pthread_create. Creating a
thread also schedules the thread for execution, and it will begin by calling a "thread start function"
that you specified. Pthreads allows you to specify scheduling parameters either at the time you
create the thread, or later on while the thread is running. A thread normally terminates when it
calls pthread_exit, or returns from the thread start function, although we will encounter a few
other possibilities later.
The primary Pthreads synchronization model uses mutexes for protection and condition
variables for communication. You can also use other synchronization mechanisms such as
semaphores, pipes, and message queues. A mutex allows one thread to lock shared data while
using it so that other threads cannot accidentally interfere. A condition variable allows a thread to
wait for shared data to reach some desired state (such as "queue not empty" or "resource
available").
1.9.2 Types and interfaces
This section briefly outlines the Pthreads data types, and some of the rules for interpreting
them. For a full description of the "object" represented by each type and how to create and use
those objects in a program, see the appropriate sections later in this book, as shown in Table 1.2.
| All Pthreads types are "opaque,"
| Portable code cannot make assumptions regarding the representation
| of these types.
All of the "pthread" types listed in Table 1.2 are considered opaque. There is no public
definition of these types' representation, and programmers should never assume anything about the
representation. You should use them only in the manner specifically described by the standard. A
thread identifier, for example, may be an integer, or a pointer, or a structure, and any code that
uses a thread identifier in a way that is not compatible with all of those definitions is incorrect.
Type Section Description
pthread_t 2 thread identifier
pthread_mutex_t 3.2 mutex
pthread_cond_t 3.3 condition variable
pthread_key_t 5.4 "access key" for thread-specific data
pthread_attr_t 5.2.3 thread attributes object
pthread_mutexattr_t 5.2.1 mutex attributes object
pthread_condattr_t 5.2.2 condition variable attributes object
pthread_once_t 5.1 "one time initialization" control context
TABLE 1.2 POSIX threads types
1.9.3 Checking for errors | Pthreads introduces a new way to report errors, without using the
| errno variable,
The Pthreads amendment is the first part of POSIX to depart from the ancient UNIX and C
language conventions regarding error status. Traditionally, functions that succeed returned a useful
value if appropriate, or otherwise indicated success by returning the value 0. On failure, they
returned the special value -1, and set the global value errno to a code specifying the type of error.
The old mechanism has a number of problems, including the fact that it is difficult to create a
function that can both report an error and return a useful value of -1. There are even worse
problems when you add multiple threads to a process. In traditional UNIX systems, and in the
original POSIX.1-1990 standard, errno was an extern int variable. Since such a variable can have
only one value at a time, it can support only a single stream of execution within the process.
| Pthreads functions don't set errno on errors!
| (But most other POSIX functions do.)
New functions in the Pthreads standard reserve the return value for error status, and errno is
not used. Pthreads functions return the value 0 on success, and include an extra output parameter
to specify an address where "useful results" are stored. When a function cannot complete
successfully, an error code from the <errno.h> header file is returned instead of 0 as the function
value.
Pthreads also provides a per-thread errno, which supports other code that uses errno. This
means that when one thread calls some function that reports an error using errno, the value cannot
be overwritten, or read, by any other thread--you may go on using errno just as you always have.
But if you're designing new interfaces you should consider following the new Pthreads convention
for reporting errors. Setting or reading the per-thread errno involves more overhead than reading
or writing a memory location, or returning a value from a function.
To wait for a thread, for example, and check for an error, you might use code like that shown
in the following code example, thread_error.c. The pthread_join function, used to wait for a thread
to terminate, will report an invalid threat identifier by returning the error code ESRCH. An
uninitialized pthread_t is likely to be an invalid thread identifier on most implementations. The
result of running this program should be a message such as "error 3: no such process."
In the unlikely event that the uninitialized thread variable has a pthread_t value that is not
invalid, it should be the ID of the initial thread (there are no other threads in this process). In this
case, pthread_join should either fail with EDEADLK, if your implementation of Pthreads detects
self-deadlock, or the thread will hang waiting for itself to exit.
thread_error.c
Note that there is no equivalent to the perror function to format and print an error value
returned by the Pthreads interfaces. Instead, use strerror to get a string description of the error
number, and print the string to the file stream stderr.
To avoid cluttering each function call in the example programs with a block of code to report
each error and call abort, I have built two error macros--err_abort detects a standard Pthreads error,
and errno_abort is used when a value of -1 means that errno contains an error code. The following
header file, called errors.h, shows these macros. The errors.h header file also includes several
system header files, which would otherwise be required by most of the example programs--this
helps to reduce the size of the examples.
errors.h
The one exception to the Pthreads error rules is pthread_getspecific, which returns the
thread-specific data value of a shared "key." Section 5.4 describes thread-specific data in detail but
for now we're just concerned with error reporting. The capability of managing thread-specific data
is critical to many applications, and the function has to be as fast as possible, so the
pthread_getspecific function doesn't report errors at all. If the pthread_key_t value is illegal, or if
no value has been set in the thread, pthread_getspecific just returns the value NULL.
2 Threads "If seven maids with seven mops
Swept it for half a year,
Do you suppose," the Walrus said,
"That they could get it clear?"
"I doubt it," said the Carpenter,
And shed a bitter tear.
--Lewis Carroll, Through the Looking-Glass
Threads are (and perhaps this will come as no surprise) the essential basis of the style of
programming that I am advocating. Although this chapter focuses on threads, you will never learn
everything you need to know about threads by simply skipping to this chapter and reading it.
Threads are a critical part of the landscape, but you can't do much with only threads. Nevertheless,
one must start somewhere, and here we are.
Section 2.1 describes the programming aspects of creating and managing threads in your
program, that is, how to create threads, how they are represented in your program, and the most
basic things you can do to them once you've created them.
Section 2.2 describes the life cycle of a thread, from creation through "recycling," taking you
through all the scheduling states threads can assume along the way.
2.1 Creating and using threads “A loaf of bread," the Walrus said,
"Is what we chiefly need:
Pepper and vinegar besides
Are very good indeed--
Now, if you're ready, Oysters dear,
We can begin to feed."
--Lewis Carroll, Through the Looking-Glass
pthread_t thread;
int pthread_equal (pthread_t t1, pthread_t t2);
int pthread_create (pthread_t *thread, const pthread_attr_t *attr, void *(*start)(void *),
void *arg);
pthread_t pthread_self (void);
int sched_yield (void);
int pthread_exit (void *value_ptr);
int pthread_detach (pthread_t thread);
int pthread_join (pthread_t thread, void **value_ptr);
The introduction covered some of the basics of what a thread is, and what it means to the
computer hardware. This section begins where the introduction left off. It explains how a thread is
represented in your program, what it means to your program, and some of the operations you can
perform on threads. If you haven't read the introduction, this would be a good time to skip back to
it. (I'll wait for you here.)
Within your program a thread is represented by a thread identifier, of the opaque type
pthread_t. To create a thread, you must declare a variable of type pthread_t somewhere in your
program. If the identifier is needed only within a function, or if the function won't return until the
thread is done, you could declare the identifier with auto storage class. Most of the time, though,
the identifier will be stored in a shared (static or extern) variable, or in a structure allocated from
the heap.
A Pthreads thread begins by calling some function that you provide. This "thread function"
should expect a single argument of type void *, and should return a value of the same type. You
create a thread by passing the thread function's address, and the argument value with which you
want the function to be called, to pthread_create.
When you create a thread, pthread_create returns an identifier, in the pthread_t value referred
to by the thread argument, by which your code refers to the new thread. A thread can also get its
own identifier using the pthread_self function. There is no way to find a thread's identifier unless
either the creator or the thread itself stores the identifier somewhere. You need to have a thread's
identifier to do anything to the thread. If you'll need to know when a thread completes, for
example, you must keep the identifier somewhere.
Pthreads provides the pthread_equal function to compare two thread identifiers. You can only
test for equality. It doesn't make any sense to ask whether one thread identifier is "greater than" or
"less than" another, because there is no ordering between threads. The pthread_equal function
returns a nonzero value if the thread identifiers refer to the same thread, and the value 0 if they do
not refer to the same thread.
| The initial thread (main) is special.
When a C program runs, it begins in a special function named main. In a threaded program,
this special stream of execution is called the "initial thread" or sometimes the "main thread." You
can do anything within the initial thread that you can do within any other thread. It can determine
its own thread identifier by calling pthread_self, for example, or terminate itself by calling
pthread_exit. If the initial thread stores its thread identifier somewhere accessible to another
thread, that thread can wait for the initial thread to terminate, or detach the initial thread.
The initial thread is special because Pthreads retains traditional UNIX process behavior when
the function main returns; that is, the process terminates without allowing other threads to
complete. In general, you do not want to do this in a threaded program, but sometimes it can be
convenient. In many of the programs in this book, for example, threads are created that have no
effect on anything outside the process. It doesn't really matter what those threads are doing, then,
if the process goes away. When the process exits, those threads, all their states, and anything they
might accomplish, simply "evaporate"--there's no reason to clean up.
| Detaching a thread that is still running doesn't affect the thread in any
| way--it just informs the system that the thread's resources can be
| reclaimed when the thread eventually terminates.
Although "thread evaporation" is sometimes useful, most of the time your process will
outlive the individual threads you create. To be sure that resources used by terminated threads are
available to the process, you should always detach each thread you create when you're finished
with it. Threads that have terminated but are not detached may retain virtual memory, including
their stacks, as well as other system resources. Detaching a thread tells the system that you no
longer need that thread, and allows the system to reclaim the resources it has allocated to the
thread.
If you create a thread that you will never need to control, you can use an attribute to create
the thread so that it is already detached. (We'll get to attributes later, in Section 5.2.3.) If you do
not want to wait for a thread that you created, and you know that you will no longer need to
control that thread, you can detach it at any time by calling pthread_detach. A thread may detach
itself, or any other thread that knows its pthread_t identifier may detach it at any time. If you need
to know a thread's return value, or if you need to know when a thread has completed, call
pthread_join. The pthread_join function will block the caller until the thread you specify has
terminated, and then, optionally, store the terminated thread's return value. Calling pthread_join
detaches the specified thread automatically.
As we've seen, threads within a process can execute different instructions, using different
stacks, all at the same time. Although the threads execute independently of each other, they always
share the same address space and file descriptors. The shared address space provides an important
advantage of the threaded programming model by allowing threads to communicate efficiently.
Some programs may create threads that perform unrelated activities, but most often a set of
threads works together toward a common goal. For example, one set of threads may form an
assembly line in which each performs some specific task on a shared data stream and then passes
the data on to the next thread. A set of threads may form a work crew and divide independent parts
of a common task. Or one "manager" thread may take control and divide work among a "crew" of
worker threads. You can combine these models in a variety of ways; for example, a work crew
might perform some complicated step in a pipeline, such as transforming a slice of an array.
The following program, lifecycle.c, creates a thread. We'll refer to this simple example in the
following sections about a thread's life cycle.
7-10 The thread function, thread_routine, returns a value to satisfy the standard thread function
prototype. In this example the thread returns its argument, and the value is always NULL.
18-25 The program creates a thread by calling pthread_create, and then waits for it by calling
pthread_join. You don't need to wait for a thread, but if you don't, you'll need to do something else
to make sure the process runs until the thread completes. Returning from main will cause the
process to terminate, along with all threads. You could, for example, code the main thread to
terminate by calling pthread_exit, which would allow the process to continue until all threads have
terminated.
26-29 When the join completes, the program checks the thread's return value, to be sure that the
thread returned the value it was given. The program exits with 0 (success) if the value is NULL, or
with 1 otherwise.
It is a good idea for all thread functions to return something, even if it is simply NULL. If
you omit the return statement, pthread_join will still return some value--whatever happens to be in
the place where the thread's start function would have stored a return value (probably a register).
lifecycle.c
If the "joining" thread doesn't care about the return value, or if it knows that the "joinee" (the
thread with which it is joining) didn't return a value, then it can pass NULL instead of &retval in
the call to pthread_join. The joinee's return value will be ignored.
When the call to pthread_join returns, the joinee has been detached and you can't join with it
again. In the rare cases where more than one thread might need to know when some particular
thread has terminated, the threads should wait on a condition variable instead of calling
pthread_join. The terminating thread would store its return value (or any other information) in
some known location, and broadcast the condition variable to wake all threads that might be
interested.
2.2 The life of a thread
Come, listen, my men, while I tell you again
The five unmistakable marks
By which you may know, wheresoever you go,
The warranted genuine Snarks.
--Lewis Carroll, The Hunting of the Snark
At any instant, a thread is in one of the four basic states described in Table 2.1. In
implementations, you may see additional "states" that distinguish between various reasons for
entering the four basic states. Digital UNIX, for example, represents these finer distinctions as
"substates," of which each state may have several. Whether they're called "substates" or additional
states, "terminated" might be divided into "exited" and "cancelled"; "blocked" might be broken up
into "blocked on condition variable," "blocked on mutex," "blocked in read," and so forth.
State Meaning
Ready The thread is able to run, but is waiting for a processor. It may have just started, or
just been unblocked, or preempted by another thread.
Running The thread is currently running; on a multiprocessor there may be more than one
running thread in the process.
Blocked The thread is not able to run because it is waiting for something; for example, it
may be waiting for a condition variable, or waiting to lock a mutex, or waiting for
an I/O operation to complete.
Terminated The thread has terminated by returning from its start function, calling
pthread_exit, or having been cancelled and completing all cleanup handlers. It
was not detached, and has not yet been joined. Once it is detached or joined, it
will be recycled.
TABLE 2.1 Thread states
These finer distinctions can be important in debugging and analyzing threaded programs.
However, they do not add substantially to the basic understanding of thread scheduling, and we
will not deal with them here.
Threads begin in the ready state. When the new thread runs it calls your specified thread start
function. It may be preempted by other threads, or block itself to wait for external events any
number of times. Eventually it completes and either returns from the thread start function or calls
the pthread_exit function. In either case it terminates. If the thread has been detached, it is
immediately recycled. (Doesn't that sound nicer than "destroyed"--and most systems reuse the
resources to make new threads.) Otherwise the thread remains in the terminated state until joined
or detached. Figure 2.1 shows the relationships between these thread states, and the events that
cause threads to move from one state to another.
2.2.1 Creation
The "initial thread" of a process is created when the process is created. In a system that fully
supports threaded programming, there's probably no way to execute any code without a thread. A
thread is likely to be the only software context that includes the hardware state needed to execute
code: registers, program counter, stack pointer, and so forth.
Additional threads are created by explicit calls. The primary way to create threads on a
Pthreads system is to call pthread_create. Threads may also be created when the process receives
a POSIX signal if the process signal notify mechanism is set to SIGEV_THREAD. Your system
may provide additional nonstandard mechanisms to create a thread.
FIGURE 2.1 Thread state transitions
When a new thread is created, its state is ready. Depending on scheduling constraints, it may
remain in that state for a substantial period of time before executing. Section 5.5 contains more
information on thread scheduling. Going back to lifecycle.c, the thread running thread_routine
becomes ready during main's call to pthread_create, at line 18.
The most important thing to remember about thread creation is that there is no
synchronization between the creating thread's return from pthread_create and the scheduling of
the new thread. That is, the thread may start before the creating thread returns. The thread may
even run to completion and terminate before pthread_create returns. Refer to Section 8.1.1 for
more information and warnings about what to expect when you create a thread.
2.2.2 Startup Once a thread has been created, it will eventually begin executing machine instructions. The
initial sequence of instructions will lead to the execution of the thread start function that you
specified to pthread_create. The thread start function is called with the argument value you
specified when you created the thread. In lifecycle.c, for example, the thread begins executing user
code at function thread_routine, with the formal parameter argument having a value of NULL.
In the initial thread, the thread "start function" (main) is called from outside your program;
for example, many UNIX systems link your program with a file called crt0.o, which initializes the
process and then calls your main. This is a minor implementation distinction, but it is important to
remember because there are a few ways in which the initial thread is different. For one thing, main
is called with different arguments than a thread start function: the program's argument array (argc
and argv) instead of a single void* argument. For another thing, when a thread start function
returns, the thread terminates but other threads continue to run. When the function main returns in
the initial thread, the process will be terminated immediately. If you want to terminate the initial
thread while allowing other threads in the process to continue running, call pthread_exit instead of
returning from main.
Another important difference to remember is that on most systems, the initial thread runs on
the default process stack, which can grow to a substantial size. “Thread" stacks may be much
more limited on some implementations, and the program will fail with a segmentation fault or bus
error if a thread overflows its stack.
2.2.3 Running and blocking
Like us, threads usually can't stay awake their entire life. Most threads occasionally go to
sleep. A thread can go to sleep because it needs a resource that is not available (it becomes
"blocked") or because the system reassigned the processor on which it was running (it is
"preempted"). A thread spends most of its active life in three states: ready, running, and blocked.
A thread is ready when it is first created, and whenever it is unblocked so that it is once again
eligible to run. Ready threads are waiting for a processor. Also, when a running thread is
preempted, for example, if it is timesliced (because it has run too long), the thread immediately
becomes ready.
A thread becomes running when it was ready and a processor selects the thread for execution.
Usually this means that some other thread has blocked, or has been preempted by a timeslice--the
blocking (or preempted) thread saves its context and restores the context of the next ready thread
to replace itself. On a multiprocessor, however, a previously unused processor may execute a
readied thread without any other thread blocking.
A thread becomes blocked when it attempts to lock a mutex that is currently locked, when it
waits on a condition variable, when it calls sigwait for a signal that is not currently pending, or
when it attempts an I/O operation that cannot be immediately completed. A thread may also
become blocked for other system operations, such as a page fault.
When a thread is unblocked after a wait for some event, it is made ready again. It may
execute immediately, for example, if a processor is available. In lifecycle.c, the main thread blocks
at line 23, in pthread_join, to wait for the thread it created to run. If the thread had not already run
at this point, it would move from ready to running when main becomes blocked. As the thread
runs to completion and returns, the main thread will be unblocked--returning to the ready state.
When processor resources are available, either immediately or after the thread becomes terminated,
main will again become running, and complete.
2.2.4 Termination
A thread usually terminates by returning from its start function (the one you pass to the
pthread_create function). The thread shown in lifecycle.c terminates by returning the value NULL,
for example. Threads that call pthread_exit or that are cancelled using pthread_cancel also
terminate after calling each cleanup handler that the thread registered by calling
pthread_cleanup_push and that hasn't yet been removed by calling pthread_cleanup_pop. Cleanup
handlers are discussed in Section 5.3.3.
Threads may have private "thread-specific data" values (thread-specific data is discussed in
Section 5.4). If the thread has any non-NULL thread-specific data values, the associated destructor
functions for those keys (if any) are called.
If the thread was already detached it moves immediately to the next section, recycling.
Otherwise, the thread becomes terminated. It will remain available for another thread to join with
it using pthread_join. This is analogous to a UNIX process that's terminated but hasn't yet been
"reaped" by a wait operation. Sometimes it is called a "zombie" because it still exists even though
it is "dead." A zombie may retain most or all of the system resources that it used when running, so
it is not a good idea to leave threads in this state for longer than necessary. Whenever you create a
thread with which you won't need to join, you should use the detachstate attribute to create it
"detached" (see Section 5.2.3).
At a minimum, a terminated thread retains the identification (pthread_t value) and the void*
return value that was returned from the thread's start function or specified in a call to pthread_exit.
The only external difference between a thread that terminated "normally" by returning or calling
pthread_exit, and one that terminated through cancellation, is that a cancelled thread's return value
is always PTHREAD_CANCELLED. (This is why "cancelled" is not considered a distinct thread
state.)
If any other thread is waiting to join with the terminating thread, that thread is awakened. It
will return from its call to pthread_join with the appropriate return value. Once pthread_join has
extracted the return value, the terminated thread is detached by pthread_join, and may be recycled
before the call to pthread_join returns. This means that, among other things, the returned value
should never be a stack address associated with the terminated thread's stack--the value at that
address could be overwritten by the time the caller could use it. In lifecycle.c, the main thread will
return from the pthread_join call at line 23 with the value NULL.
| pthread_join is a convenience, not a rule.
Even when you need a return value from a thread that you create, it is often at least as simple
to create the thread detached and devise your own customized return mechanism as it is to use
pthread_join. For example, if you pass information to a worker thread in some form of structure
that another thread can find later, you might have the worker thread simply place the result in that
same structure and broadcast a condition variable when done. The Pthreads context for the thread,
including the thread identifier, can then be recycled immediately when the thread is done, and you
still have the part you really need, the return value, where you can find it easily at any time.
If pthread_join does exactly what you want, then by all means use it. But remember that it is
nothing more than a convenience for the simplest and most limited model of communicating a
thread's results. If it does not do exactly what you need, build your own return mechanism instead
of warping your design to fit the limitations of pthread_join.
2.2.5 Recycling
If the thread was created with the detachstate attribute set to PTHREAD_CREATE_
DETACHED (see Section 5.2.3), or if the thread or some other thread has already called
pthread_detach for the thread's identifier, then the thread is immediately recycled when it becomes
terminated.
If the thread has not been detached when it terminates, it remains in the terminated state until
the thread's pthread_t identifier is passed to pthread_detach or pthread_join. When either function
returns, the thread cannot be accessed again. In lifecycle.c, for example, the thread that had run
thread_routine will be recycled by the time the main thread returns from the pthread_join call at
line 23.
Recycling releases any system or process resources that weren't released at termination. That
includes the storage used for the thread's return value, the stack, memory used to store register
state, and so forth. Some of these resources may have been released at termination; it is important
to remember that none of it should be accessed from any other thread after termination. For
example, if a thread passes a pointer to its stack storage to another thread through shared data, you
should treat that information as obsolete from the time the thread that owns the stack terminates.
3 Synchronization "That's right!" said the Tiger-lily. "The daisies are worst of all.
When one speaks, they all begin together, and it's
enough to make one wither to hear the way they go on!"
--Lewis Carroll, Through the Looking-Glass
To write a program of any complexity using threads, you'll need to share data between
threads, or cause various actions to be performed in some coherent order across multiple threads.
To do this, you need to synchronize the activity of your threads.
Section 3.1 describes a few of the basic terms we'll be using to talk about thread
synchronization: critical section and invariant.
Section 3.2 describes the basic Pthreads synchronization mechanism, the mutex.
Section 3.3 describes the condition variable, a mechanism that your code can use to
communicate changes to the state of invariants protected by a mutex.
Section 3.4 completes this chapter on synchronization with some important information about
threads and how they view the computer's memory.
3.1 Invariants, critical sections, and predicates
"I know what you're thinking about,"
said Tweedledum; "but it isn't so, nohow."
"Contrariwise," continued Tweedledee,
"If it was so, it might be; and if it were so, it would be;
but as it isn't, it ain't. That's logic."
--Lewis Carroll, Through the Looking-Glass
Invariants are assumptions made by a program, especially assumptions about the
relationships between sets of variables. When you build a queue package, for example, you need
certain data. Each queue has a queue header, which is a pointer to the first queued data element.
Each data element includes a pointer to the next data element. But the data isn't all that's
important--your queue package relies on relationships between that data. The queue header, for
example, must either be NULL or contain a pointer to the first queued data element. Each data
element must contain a pointer to the next data element, or NULL if it is the last. Those
relationships are the invariants of your queue package.
It is hard to write a program that doesn't have invariants, though many of them are subtle.
When a program encounters a broken invariant, for example, if it dereferences a queue header
containing a pointer to something that is not a valid data element, the program will probably
produce incorrect results or fail immediately.
Critical sections (also sometimes called "serial regions") are areas of code that affect a shared
state. Since most programmers are trained to think about program functions instead of program
data, you may well find it easier to recognize critical sections than data invariants. However, a
critical section can almost always be translated into a data invariant, and vice versa. When you
remove an element from a queue, for example, you can see the code performing the removal as a
critical section, or you can see the state of the queue as an invariant. Which you see first may
depend on how you're thinking about that aspect of your design.
Most invariants can be "broken," and are routinely broken, during isolated areas of code. The
trick is to be sure that broken invariants are always repaired before "unsuspecting" code can
encounter them. That is a large part of what "synchronization" is all about in an asynchronous
program. Synchronization protects your program from broken invariants. If your code locks a
mutex whenever it must (temporarily) break an invariant, then other threads that rely on the
invariant, and which also lock the mutex, will be delayed until the mutex is unlocked--when the
invariant has been restored.
Synchronization is voluntary, and the participants must cooperate for the system to work. The
programmers must agree not to fight for (or against) possession of the bailing bucket. The bucket
itself does not somehow magically ensure that one and only one programmer bails at any time.
Rather, the bucket is a reliable shared token that, if used properly, can allow the programmers to
manage their resources effectively.
"Predicates" are logical expressions that describe the state of invariants needed by your code.
In English, predicates can be expressed as statements like "the queue is empty" or "the resource is
available." A predicate may be a boolean variable with a TRUE or FALSE value, or it may be the
result of testing whether a pointer is NULL. A predicate may also be a more complicated
expression, such as determining whether a counter is greater than some threshold. A predicate may
even be a value returned from some function. For example, you might call select or poll to
determine whether a file is ready for input.
3.2 Mutexes "How are you getting on?" said the Cat,
as soon as there was mouth enough for it to speak with.
Alice waited till the eyes appeared, and then nodded.
"It's no use speaking to it," she thought,
"till its ears have come, or at least one of them."
--Lewis Carroll. Alice's Adventures in Wonderland
Most threaded programs need to share some data between threads. There may be trouble if
two threads try to access shared data at the same time, because one thread may be in the midst of
modifying some data invariant while another acts on the data as if it were consistent. This section
is all about protecting the program from that sort of trouble.
The most common and general way to synchronize between threads is to ensure that all
memory accesses to the same (or related) data are "mutually exclusive." That means that only one
thread is allowed to write at a time---others must wait for their turn. Pthreads provides mutual
exclusion using a special form of Edsger Dijkstra's semaphore [Dijkstra, 1968a], called a mutex.
The word mutex is a clever combination of "mut" from the word "mutual" and "ex" from the word
"exclusion."
Experience has shown that it is easier to use mutexes correctly than it is to use other
synchronization models such as a more general semaphore. It is also easy to build any
synchronization models using mutexes in combination with condition variables (we'll meet them
at the next comer, in Section 3.3). Mutexes are simple, flexible, and can be implemented
efficiently.
The programmers' bailing bucket is something like a mutex (Figure 3.1). Both are "tokens"
that can be handed around, and used to preserve the integrity of the concurrent system. The bucket
can be thought of as protecting the bailing critical section---each programmer accepts the
responsibility of bailing while holding the bucket, and of avoiding interference with the current
bailer while not holding the bucket. Or, the bucket can be thought of as protecting the invariant
that water can be removed by only one programmer at any time.
Synchronization isn't important just when you modify data. You also need synchronization
when a thread needs to read data that was written by another thread, if the order in which the data
was written matters. As we'll see a little later, in Section 3.4, many hardware systems don't
guarantee that one processor will see shared memory accesses in the same order as another
processor without a "nudge" from software.
FIGURE 3.1 Mutex analogy
Consider, for example, a thread that writes new data to an element in an array, and then
updates a max_index variable to indicate that the array element is valid. Now consider another
thread, running simultaneously on another processor, that steps through the array performing some
computation on each valid element. If the second thread "sees" the new value of max_index before
it sees the new value of the array element, the computation would be incorrect. This may seem
irrational, but memory systems that work this way can be substantially faster than memory
systems that guarantee predictable ordering of memory accesses. A mutex is one general solution
to this sort of problem. If each thread locks a mutex around the section of code that's using shared
data, only one thread will be able to enter the section at a time.
Figure 3.2 shows a timing diagram of three threads sharing a mutex. Sections of the lines that
are above the rounded box labeled "mutex" show where the associated thread does not own the
mutex. Sections of the lines that are below the center line of the box show where the associated
thread owns the mutex, and sections of the lines hovering above the center line show where the
thread is waiting to own the mutex.
FIGURE 3.2 Mutex operation
Initially, the mutex is unlocked. Thread 1 locks the mutex and, because there is no contention,
it succeeds immediately--thread l's line moves below the center of the box. Thread 2 then attempts
to lock the mutex and, because the mutex is already locked, thread 2 blocks, its line remaining
above the center line. Thread 1 unlocks the mutex, unblocking thread 2, which then succeeds in
locking the mutex. Slightly later, thread 3 attempts to lock the mutex, and blocks. Thread 1 calls
pthread_mutex_trylock to try to lock the mutex and, because the mutex is locked, returns
immediately with EBUSY status. Thread 2 unlocks the mutex, which unblocks thread 3 so that it
can lock the mutex. Finally, thread 3 unlocks the mutex to complete our example.
3.2.1 Creating and destroying a mutex pthread_mutex_t mutex = PTHREAD_MUNEX_INITIALIZER;
int pthread_mutex_init (pthread_mitex_t *mutex, pthread_mutexattr_t *attr);
int pthread_mutex_destroy (pthread_mutex_t *mutex);
A mutex is represented in your program by a variable of type pthread_mutex_t. You should
never make a copy of a mutex, because the result of using a copied mutex is undefined. You can,
however, freely copy a pointer to a mutex so that various functions and threads can use it for
synchronization.
Most of the time you'll probably declare mutexes using extern or static storage class, at "file
scope," that is, outside of any function. They should have "normal" (extern) storage class if they
are used by other files, or static storage class if used only within the file that declares the variable.
When you declare a static mutex that has default attributes, you should use the
PTHREAD_MUTEX_INITIALIZER macro, as shown in the mutex_static.c program shown next.
(You can build and run this program, but don't expect anything interesting to happen, since main is
empty.)
mutex_static.c
Often you cannot initialize a mutex statically, for example, when you use malloc to create a
structure that contains a mutex. Then you will need to call pthread_mutex_init to initialize the
mutex dynamically, as shown in mutex_dynamic.c, the next program. You can also dynamically
initialize a mutex that you declare statically--but you must ensure that each mutex is initialized
before it is used, and that each is initialized only once. You may initialize it before creating any
threads, for example, or by calling pthread_once (Section 5.1). Also, if you need to initialize a
mutex with nondefault attributes, you must use dynamic initialization (see Section 5.2.1).
mutex_dynamic.c
It is a good idea to associate a mutex clearly with the data it protects, if possible, by keeping
the definition of the mutex and data together. In mutex_static.c and mutex_dynamic.c, for example,
the mutex and the data it protects are defined in the same structure, and line comments document
the association.
When you no longer need a mutex that you dynamically initialized by calling pthread_
mutex_init, you should destroy the mutex by calling pthread_mutex_destroy. You do not need to
destroy a mutex that was statically initialized using the PTHREAD_MUTEX_INITIALIZER
macro.
| You can destroy a mutex as soon as you are sure no threads are
| blocked on the mutex.
It is safe to destroy a mutex when you know that no threads can be blocked on the mutex, and
no additional threads will try to lock the mutex. The best way to know this is usually within a
thread that has just unlocked the mutex, when program logic ensures that no threads will try to
lock the mutex later. When a thread locks a mutex within some heap data structure to remove the
structure from a list and free the storage, for example, it is safe (and a good idea) to unlock and
destroy the mutex before freeing the storage that the mutex occupies.
3.2.2 Locking and unlocking a mutex int pthread_mutex_lock (pthread_mutex_t *mutex);
int pthread_mutex_trylock (pthread_mutex_t *mutex)s
int pthread_mutex_unlock (pthread_mutex_t *mutex);
In the simplest case, using a mutex is easy. You lock the mutex by calling either
pthread_mutex_lock or pthread_mutex_trylock, do something with the shared data, and then
unlock the mutex by calling pthread_mutex_unlock. To make sure that a thread can read consistent
values for a series of variables, you need to lock your mutex around any section of code that reads
or writes those variables.
You cannot lock a mutex when the calling thread already has that mutex locked. The result of
attempting to do so may be an error return, or it may be a self-deadlock, with the unfortunate
thread waiting forever for itself to unlock the mutex. (If you have access to a system supporting
the UNIX98 thread extensions, you can create mutexes of various types, including recursive
mutexes, which allow a thread to relock a mutex it already owns. The mutex type attribute is
discussed in Section 10.1.2.)
The following program, alarm_mutex.c, is an improved version of alarm_thread.c (from
Chapter 1). It lines up multiple alarm requests in a single "alarm server" thread.
12-17 The alarm_t structure now contains an absolute time, as a standard UNIX time_t, which is the
number of seconds from the UNIX Epoch (Jan 1 1970 00:00) to the expiration time. This is
necessary so that alarm_t structures can be sorted by "expiration time" instead of merely by the
requested number of seconds. In addition, there is a link member to connect the list of alarms.
19-20 The alarm_mutex mutex coordinates access to the list head for alarm requests, called
alarm_list. The mutex is statically initialized using default attributes, with the
PTHREAD_MUTEX_INITIALIZER macro. The list head is initialized to NULL, or empty.
alarm_mutex.c part 1 definitions
The code for the alarm_thread function follows. This function is run as a thread, and
processes each alarm request in order from the list alarm_list. The thread never terminates--when
main returns, the thread simply "evaporates." The only consequence of this is that any remaining
alarms will not be delivered--the thread maintains no state that can be seen outside the process.
If you would prefer that the program process all outstanding alarm requests before exiting,
you can easily modify the program to accomplish this. The main thread must notify alarm_thread,
by some means, that it should terminate when it finds the alarm_list empty. You could, for
example, have main set a new global variable alarm_done and then terminate using pthread_exit
rather than exit. When alarm_thread finds alarm_list empty and alarm_done set, it would
immediately call pthread_exit rather than waiting for a new entry.
29-30 If there are no alarms on the list, alarm_thread needs to block itself, with the mutex unlocked,
at least for a short time, so that main will be able to add a new alarm. It does this by setting
sleep_time to one second.
31-42 If an alarm is found, it is removed from the list. The current time is retrieved by calling the
time function, and it is compared to the requested time for the alarm. If the alarm has already
expired, then alarm_thread will set sleep_time to 0. If the alarm has not expired, alarm_thread
computes the difference between the current time and the alarm expiration time, and sets
sleep_time to that number of seconds.
52-58 The mutex is always unlocked before sleeping or yielding. If the mutex remained locked,
then main would be unable to insert a new alarm on the list. That would make the program behave
synchronously--the user would have to wait until the alarm expired before doing anything else.
(The user would be able to enter a single command, but would not receive another prompt until
the next alarm expired.) Calling sleep blocks alarm_thread for the required period of time--it
cannot run until the timer expires.
Calling sched_yield instead is slightly different. We'll describe sched_yield in detail later (in
Section 5.5.2)--for now, just remember that calling sched_yield will yield the processor to a thread
that is ready to run, but will return immediately if there are no ready threads. In this case, it means
that the main thread will be allowed to process a user command if there's input waiting--but if the
user hasn't entered a command, sched_yield will return immediately.
If the alarm pointer is not NULL, that is, if an alarm was processed from alarm_list, the
function prints a message indicating that the alarm has expired. After printing the message, it frees
the alarm structure. The thread is now ready to process another alarm.
alarm_mutex.c part 2 alarm_thread
And finally, the code for the main program for alarm_mutex.c. The basic structure is the
same as all of the other versions of the alarm program that we've developed--a loop, reading
simple commands from stdin and processing each in turn. This time, instead of waiting
synchronously as in alarm.c, or creating a new asynchronous entity to process each alarm
command as in alarm_fork.c and alarm_thread.c, each request is queued to a server thread,
alarm_thread. As soon as main has queued the request, it is free to read the next command.
8-11 Create the server thread that will process all alarm requests. Although we don't use it, the
thread's ID is returned in local variable thread.
13-28 Read and process a command, much as in any of the other versions of our alarm program. As
in alarm_thread.c, the data is stored in a heap structure allocated by malloc.
30-32 The program needs to add the alarm request to alarm_list, which is shared by both
alarm_thread and main. So we start by locking the mutex that synchronizes access to the shared
data, alarm_mutex.
33 Because alarm_thread processes queued requests, serially, it has no way of knowing how
much time has elapsed between reading the command and processing it. Therefore, the alarm
structure includes the absolute time of the alarm expiration, which we calculate by adding the
alarm interval, in seconds, to the current number of seconds since the UNIX Epoch, as returned by
the time function.
39-49 The alarms are sorted in order of expiration time on the alarm_list queue. The insertion code
searches the queue until it finds the first entry with a time greater than or equal to the new alarm's
time. The new entry is inserted preceding the located entry. Because alarm_list is a simple linked
list, the traversal maintains a current entry pointer (this) and a pointer to the previous entry's link
member, or to the alarm_list head pointer (last).
56-59 If no alarm with a time greater than or equal to the new alarm's time is found, then the new
alarm is inserted at the end of the list. That is, if the alarm pointer is NULL on exit from the search
loop (the last entry on the list always has a link pointer of NULL.), the previous entry (or queue
head) is made to point to the new entry.
alarm_mutex.c part 3 main
This simple program has a few severe failings. Although it has the advantage, compared to
alarm_fork.c or alarm_thread.c, of using fewer resources, it is less responsive. Once alarm_thread
has accepted an alarm request from the queue, it sleeps until that alarm expires. When it fails to
find an alarm request on the list, it sleeps for a second anyway, to allow main to accept another
alarm command. During all this sleeping, it will fail to notice any alarm requests added to the head
of the queue by main, until it returns from sleep.
This problem could be addressed in various ways. The simplest, of course, would be to go
back to alarm_thread.c, where a thread was created for each alarm request. That wasn't so bad,
since threads are relatively cheap. They're still not as cheap as the alarm_t data structure, however,
and we'd like to make efficient programs--not just responsive programs. The best solution is to
make use of condition variables for signaling changes in the state of shared data, so it shouldn't be
a surprise that you'll be seeing one final version of the alarm program, alarm_cond.c, in Section
3.3.4.
3.2.2.1 Nonblocking mutex locks When you lock a mutex by calling pthread_mutex_lock, the calling thread will block if the
mutex is already locked. Normally, that's what you want. But occasionally you want your code to
take some alternate path if the mutex is locked. Your program may be able to do useful work
instead of waiting. Pthreads provides the pthread_mutex_trylock function, which will return an
error status (EBUSY) instead of blocking if the mutex is already locked.
When you use a nonblocking mutex lock, be careful to unlock the mutex only if
pthread_mutex_trylock returned with success status. Only the thread that owns a mutex may
unlock it. An erroneous call to pthread_mutex_unlock may return an error, or it may unlock the
mutex while some other thread relies on having it locked--and that will probably cause your
program to break in ways that may be very difficult to debug.
The following program, trylock.c, uses pthread_mutex_trylock to occasionally report the
value of a counter--but only when its access does not conflict with the counting thread.
4 This definition controls how long counter_thread holds the mutex while updating the counter.
Making this number larger increases the chance that the pthread_mutex_trylock in monitor_thread
will occasionally return EBUSY.
19-39 The counter_thread wakes up approximately each second, locks the mutex, and spins for a
while, incrementing counter. The counter is therefore increased by SPIN each second.
46-72 The monitor_thread wakes up every three seconds, and tries to lock the mutex. If the attempt
fails with EBUSY, monitor_thread counts the failure and waits another three seconds. If the
pthread_mutex_trylock succeeds, then monitor_thread prints the current value of counter (scaled
by SPIN).
80-88 On Solaris 2.5, call thr_setconcurrency to set the thread concurrency level to 2. This allows
the counter_thread and monitor_thread to run concurrently on a uniprocessor. Otherwise, monitor
thread would not run until counter thread terminated.
trylock.c
3.2.3 Using mutexes for atomicity
Invariants, as we saw in Section 3.1, are statements about your program that must always be
true. But we also saw that invariants probably aren't always true, and many can't be. To be always
true, data composing an invariant must be modified atomically. Yet it is rarely possible to make
multiple changes to a program state atomically. It may not even be possible to guarantee that a
single change is made atomically, without substantial knowledge of the hardware and architecture
and control over the executed instructions.
| "Atomic" means indivisible. But most of the time, we just mean
| that threads don't see things that would confuse them.
Although some hardware will allow you to set an array element and increment the array
index in a single instruction that cannot be interrupted, most won't. Most compilers don't let you
control the code to that level of detail even if the hardware can do it, and who wants to write in
assembler unless it is really important? And, more importantly, most interesting invariants are
more complicated than that.
By "atomic," we really mean only that other threads can't accidentally find invariants broken
(in intermediate and inconsistent states), even when the threads are running simultaneously on
separate processors. There are two basic ways to do that when the hardware doesn't support
making the operation indivisible and noninterruptable. One is to detect that you're looking at a
broken invariant and try again, or reconstruct the original state. That's hard to do reliably unless
you know a lot about the processor architecture and are willing to design nonportable code.
When there is no way to enlist true atomicity in your cause, you need to create your own
synchronization. Atomicity is nice, but synchronization will do just as well in most cases. So when
you need to update an array element and the index variable atomically, just perform the operation
while a mutex is locked.
Whether or not the store and increment operations are performed indivisibly and
noninterruptably by the hardware, you know that no cooperating thread can peek until you're done.
The transaction is, for all practical purposes, "atomic." The key, of course, is the word
"cooperating." Any thread that is sensitive to the invariant must use the same mutex before
modifying or examining the state of the invariant.
3.2.4 Sizing a mutex to fit the job How big is a mutex? No, I don't mean the amount of memory consumed by a
pthread_mutex_t structure. I'm talking about a colloquial and completely inaccurate meaning that
happens to make sense to most people. This colorful usage became common during discussions
about modifying existing nonthreaded code to be thread-safe. One relatively simple way to make a
library thread-safe is to create a single mutex, lock it on each entry to the library, and unlock it on
each exit from the library. The library becomes a single serial region, preventing any conflict
between threads. The mutex protecting this big serial region came to be referred to as a "big"
mutex, clearly larger in some metaphysical sense than a mutex that protects only a few lines of
code.
By irrelevant but inevitable extension, a mutex that protects two variables must be "bigger"
than a mutex protecting only a single variable. So we can ask, "How big should a mutex be?" And
we can answer only, "As big as necessary, but no bigger."
When you need to protect two shared variables, you have two basic strategies: You can assign
a small mutex to each variable, or assign a single larger mutex to both variables. Which is better
will depend on a lot of factors. Furthermore, the factors will probably change during development,
depending on how many threads need the data and how they use it.
These are the main design factors:
1. Mutexes aren't free. It takes time to lock them, and time to unlock them. Therefore, code
that locks fewer mutexes will usually run faster than code that locks more mutexes. So use as
few as practical, each protecting as much as makes sense.
2. Mutexes, by their nature, serialize execution. If a lot of threads frequently need to lock a
single mutex, the threads will spend most of their time waiting. That's bad for performance. If
the pieces of data (or code) protected by the mutex are unrelated, you can often improve
performance by splitting the big mutex into several smaller mutexes. Fewer threads will need
the smaller mutexes at any time, so they'll spend less time waiting. So use as many as makes
sense, each protecting as little as is practical.
3. Items 1 and 2 conflict. But that's nothing new or unique, and you can deal with it once
you understand what's going on.
In a complicated program it will usually take some experimentation to get the right balance. Your
code will be simpler in most cases if you start with large mutexes and then work toward smaller
mutexes as experience and performance data show where the heavy contention happens. Simple is
good. Don't spend too much time optimizing until you know there's a problem.
On the other hand, in cases where you can tell from the beginning that the algorithms will
make heavy contention inevitable, don't oversimplify. Your job will be a lot easier if you start with
the necessary mutexes and data structure design rather than adding them later. You will get it
wrong sometimes, because, especially when you are working on your first major threaded project,
your intuition will not always be correct. Wisdom, as they say, comes from experience, and
experience comes from lack of wisdom.
3.2.5 Using more than one mutex Sometimes one mutex isn't enough. This happens when your code "crosses over" some
boundary within the software architecture. For example, when multiple threads will access a
queue data structure at the same time, you may need a mutex to protect the queue header and
another to protect data within a queue element. When you build a tree structure for threaded
programming, you may need a mutex for each node in the tree.
Complications can arise when using more than one mutex at the same time. The worst is
deadlock--when each of two threads holds one mutex and needs the other to continue. More subtle
problems such as priority inversion can occur when you combine mutexes with priority scheduling.
For more information on deadlock, priority inversion, and other synchronization problems, refer to
Section 8.1.
3.2.5.1 Lock hierarchy
If you can apply two separate mutexes to completely independent data, do it. You'll almost
always win in the end by reducing the time when a thread has to wait for another thread to finish
with data that this thread doesn't even need. And if the data is independent you're unlikely to run
into many cases where a given function will need to lock both mutexes.
The complications arise when data isn't completely independent. If you have some program
invariant--even one that's rarely changed or referenced--that affects data protected by two mutexes,
sooner or later you'll need to write code that must lock both mutexes at the same time to ensure the
integrity of that invariant. If one thread locks mutex_a and then locks mutex_b, while another
thread locks mutex_b and then mutex_a, you've coded a classic deadlock, as shown in Table 3.1.
First thread Second thread
pthread_mutex_lock (&mutex_a); pthread_mutex_lock (&mutex_b);
pthread_mutex_lock (&mutex_b); pthread_mutex_lock (&mutex_a);
TABLE 3.1 Mutex deadlock
Both of the threads shown in Table 3.1 may complete the first step about the same time. Even
on a uniprocessor, a thread might complete the first step and then be timesliced (preempted by the
system), allowing the second thread to complete its first step. Once this has happened, neither of
them can ever complete the second step because each thread needs a mutex that is already locked
by the other thread.
Consider these two common solutions to this type of deadlock:
� Fixed locking hierarchy: All code that needs both mutex_a and mutex_b must always
lock mutex_a first and then rautex_b.
� Try and back off: After locking the first mutex of some set (which can be allowed to
block), use pthread_mutex_trylock to lock additional mutexes in the set. If an attempt
fails, release all mutexes in the set and start again.
There are any number of ways to define a fixed locking hierarchy. Sometimes there's an
obvious hierarchical order to the mutexes anyway, for example, if one mutex controls a queue
header and one controls an element on the queue, you'll probably have to have the queue header
locked by the time you need to lock the queue element anyway.
When there's no obvious logical hierarchy, you can create an arbitrary hierarchy; for example,
you could create a generic "lock a set of mutexes" function that sorts a list of mutexes in order of
their identifier address and locks them in that order. Or you could assign them names and lock
them in alphabetical order, or integer sequence numbers and lock them in numerical order.
To some extent, the order doesn't really matter as long as it is always the same. On the other
hand, you will rarely need to lock "a set of mutexes" at one time. Function A will need to lock
mutex 1, and then call function B, which needs to also lock mutex 2. If the code was designed
with a functional locking hierarchy, you will usually find that mutex 1 and mutex 2 are being
locked in the proper order, that is, mutex 1 is locked first and then mutex 2. If the code was
designed with an arbitrary locking order, especially an order not directly controlled by the code,
such as sorting pointers to mutexes initialized in heap structures, you may find that mutex 2
should have been locked before mutex 1.
If the code invariants permit you to unlock mutex 1 safely at this point, you would do better
to avoid owning both mutexes at the same time. That is, unlock mutex 1, and then lock mutex 2. If
there is a broken invariant that requires mutex 1 to be owned, then mutex 1 cannot be released
until the invariant is restored. If this situation is possible, you should consider using a backoff (or
"try and back off') algorithm.
"Backoff' means that you lock the first mutex normally, but any additional mutexes in the set
that are required by the thread are locked conditionally by calling pthread_mutex_trylock. If
pthread_mutex_trylock returns EBUSY, indicating that the mutex is already locked, you must
unlock all of the mutexes in the set and start over.
The backoff solution is less efficient than a fixed hierarchy. You may waste a lot of time
trying and backing off. On the other hand, you don't need to define and follow strict locking
hierarchy conventions, which makes backoff more flexible. You can use the two techniques in
combination to minimize the cost of backing off. Follow some fixed hierarchy for well-defined
areas of code, but apply a backoff algorithm where a function needs to be more flexible.
The program below, backoff. c, demonstrates how to avoid mutex deadlocks by applying a
backoff algorithm. The program creates two threads, one running function lock forward and the
other running function lock_backward. The two threads loop ITERATIONS times, each iteration
attempting to lock all of three mutexes in sequence. The lock_forward thread locks mutex 0, then
mutex 1, then mutex 2, while lock_backward locks the three mutexes in the opposite order.
Without special precautions, this design will always deadlock quickly (except on a uniprocessor
system with a sufficiently long timeslice that either thread can complete before the other has a
chance to run).
15 You can see the deadlock by running the program as backoff 0. The first argument is used to
set the backoff variable. If backoff is 0, the two threads will use pthread_mutex_lock to lock each
mutex. Because the two threads are starting from opposite ends, they will crash in the middle, and
the program will hang. When backoff is nonzero (which it is unless you specify an argument), the
threads use pthread_mutex_trylock, which enables the backoff algorithm. When the mutex lock
fails with EBUSY, the thread will release all mutexes it currently owns, and start over.
16 It is possible that, on some systems, you may not see any mutex collisions, because one
thread is always able to lock all mutexes before the other thread has a chance to lock any. You can
resolve that problem by setting the yield_flag variable, which you do by running the program with
a second argument, for example, backoff 1 1. When yield_flag is 0, which it is unless you specify
a second argument, each thread's mutex locking loop may run uninterrupted, preventing a
deadlock (at least, on a uniprocessor). When yield_flag has a value greater than 0, however, the
threads will call sched_yield after locking each mutex, ensuring that the other thread has a chance
to run. And if you set yield_ flag to a value less than 0, the threads will sleep for one second after
locking each mutex, to be really sure the other thread has a chance to run.
70-75 After locking all of the three mutexes, each thread reports success, and tells how many times
it had to back off before succeeding. On a multiprocessor, or when you've set yield_flag to a
nonzero value, you'll usually see a lot more nonzero backoff counts. The thread unlocks all three
mutexes, in the reverse order of locking, which helps to avoid unnecessary backoffs in other
threads. Calling sched_yield at the end of each iteration "mixes things up" a little so one thread
doesn't always start each iteration first. The sched_yield function is described in Section 5.5.2.
backoff.c
Whatever type of hierarchy you choose, document it, carefully, completely, and often.
Document it in each function that uses any of the mutexes. Document it where the mutexes are
defined. Document it where they are declared in a project header file. Document it in the project
design notes. Write it on your whiteboard. And then tie a string around your finger to be sure that
you do not forget.
You are free to unlock the mutexes in whatever order makes the most sense. Unlocking
mutexes cannot result in deadlock. In the next section, I will talk about a sort of "overlapping
hierarchy" of mutexes, called a "lock chain," where the normal mode of operation is to lock one
mutex, lock the next, unlock the first, and so on. If you use a "try and back off' algorithm, however,
you should always try to release the mutexes in reverse order. That is, if you lock mutex 1, mutex
2, and then mutex 3, you should unlock mutex 3, then mutex 2, and finally mutex 1. If you unlock
mutex 1 and mutex 2 while mutex 3 is still locked, another thread may have to lock both mutex 1
and mutex 2 before finding it cannot lock the entire hierarchy, at which point it will have to unlock
mutex 2 and mutex 1, and then retry. Unlocking in reverse order reduces the chance that another
thread will need to back off.
3.2.5.2 Lock chaining "Chaining" is a special case of locking hierarchy, where the scope of two locks overlap. With
one mutex locked, the code enters a region where another mutex is required. After successfully
locking that second mutex, the first is no longer needed, and can be released. This technique can
be very valuable in traversing data structures such as trees or linked lists. Instead of locking the
entire data structure with a single mutex, and thereby preventing any parallel access, each node or
link has a unique mutex. The traversal code would first lock the queue head, or tree root, find the
desired node, lock it, and then release the root or queue head mutex.
Because chaining is a special form of hierarchy, the two techniques are compatible, if you
apply them carefully. You might use hierarchical locking when balancing or pruning a tree, for
example, and chaining when searching for a specific node.
Apply lock chaining with caution, however. It is exceptionally easy to write code that spends
most of its time locking and unlocking mutexes that never exhibit any contention, and that is
wasted processor time. Use lock chaining only when multiple threads will almost always be active
within different parts of the hierarchy.
3.3 Condition variables
"There's no sort of use in knocking," said the Footman, "and that for two
reasons. First, because I'm on the same side of the door as you are:
secondly, because they're making such a noise inside, no one could
possibly hear you."
--Lewis CarrolI, Afice' s Adventures in Wonderland
A condition variable is used for communicating information about the state of shared data.
You would use a condition variable to signal that a queue was no longer empty, or that it had
become empty, or that anything else needs to be done or can be done within the shared data
manipulated by threads in your program.
Our seafaring programmers use a mechanism much like condition variables to communicate
(Figure 3.3). When the rower nudges a sleeping programmer to signal that the sleeping
programmer should wake up and start rowing, the original rower "signals a condition." When the
exhausted ex-rower sinks into a deep slumber, secure that another programmer will wake him at
the appropriate time, he is "waiting on a condition." When the horrified bailer discovers that water
is seeping into the boat faster than he can remove it, and he yells for help, he is "broadcasting a
condition."
When a thread has mutually exclusive access to some shared state, it may find that there is no
more it can do until some other thread changes the state. The state may be correct, and
consistent--that is, no invariants are broken--but the current state just doesn't happen to be of
interest to the thread. If a thread servicing a queue finds the queue empty, for example, the thread
must wait until an entry is added to the queue.
The shared data, for example, the queue, is protected by a mutex. A thread must lock the
mutex to determine the current state of the queue, for example, to determine that it is empty. The
thread must unlock the mutex before waiting (or no other thread would be able to insert an entry
onto the queue), and then it must wait for the state to change. The thread might, for example, by
some means block itself so that a thread inserting a new queue entry can find its identifier and
awaken it. There is a problem here, though--the thread is running between unlocking and
blocking.
If the thread is still running while another thread locks the mutex and inserts an entry onto the
queue, that other thread cannot determine that a thread is waiting for the new entry. The waiting
thread has already looked at the queue and found it empty, and has unlocked the mutex, so it will
now block itself without knowing that the queue is no longer empty. Worse, it may not yet have
recorded the fact that it intends to wait, so it may wait forever because the other thread cannot find
its identifier. The unlock and wait operations must be atomic, so that no other thread can lock the
mutex before the waiter has become blocked, and is in a state where another thread can awaken it.
| A condition variable wait always returns with the mutex locked.
That's why condition variables exist. A condition variable is a "signaling mechanism"
associated with a mutex and by extension is also associated with the shared data protected by the
mutex. Waiting on a condition variable atomically releases the associated mutex and waits until
another thread signals (to wake one waiter) or broadcasts (to wake all waiters) the condition
variable. The mutex must always be locked when you wait on a condition variable and, when a
thread wakes up from a condition variable wait, it always resumes with the mutex locked.
The shared data associated with a condition variable, for example, the queue "full" and
"empty" conditions, are the predicates we talked about in Section 3.1. A condition variable is the
mechanism your program uses to wait for a predicate to become true, and to communicate to other
threads that it might be true. In other words, a condition variable allows threads using the queue to
exchange information about the changes to the queue state.
| Condition variables are for signaling, not for mutual exclusion.
Condition variables do not provide mutual exclusion. You need a mutex to synchronize
access to the shared data, including the predicate for which you wait. That is why you must
specify a mutex when you wait on a condition variable. By making the unlock atomic with the
wait, the Pthreads system ensures that no thread can change the predicate after you have unlocked
the mutex but before your thread is waiting on the condition variable.
Why isn't the mutex created as part of the condition variable? First, mutexes are used
separately from any condition variable as often as they're used with condition variables. Second, it
is common for one mutex to have more than one associated condition variable. For example, a
queue may be "full" or "empty." Although you may have two condition variables to allow threads
to wait for either condition, you must have one and only one mutex to synchronize all access to
the queue header.
A condition variable should be associated with a single predicate. If you try to share one
condition variable between several predicates, or use several condition variables for a single
predicate, you're risking deadlock or race problems. There's nothing wrong with doing either, as
long as you're careful--but it is easy to confuse your program (computers aren't very smart) and it
is usually not worth the risk. I will expound on the details later, but the rules are as follows: First,
when you share a condition variable between multiple predicates, you must always broadcast,
never signal; and second, signal is more efficient than broadcast.
Both the condition variable and the predicate are shared data in your program; they are used
by multiple threads, possibly at the same time. Because you're thinking of the condition variable
and predicate as being locked together, it is easy to remember that they're always controlled using
the same mutex. It is possible (and legal, and often even reasonable) to signal or broadcast a
condition variable without having the mutex locked, but it is safer to have it locked.
Figure 3.4 is a timing diagram showing how three threads, thread 1, thread 2, and thread 3,
interact with a condition variable. The rounded box represents the condition variable, and the three
lines represent the actions of the three threads. When a line goes within the box, it is "doing
something" with the condition variable. When a thread's line stops before reaching below the
middle line through the box, it is waiting on the condition variable; and when a thread's line
reaches below the middle line, it is signaling or broadcasting to awaken waiters.
Thread 1 signals the condition variable, which has no effect since there are no waiters.
Thread 1 then waits on the condition variable. Thread 2 also blocks on the condition variable and,
shortly thereafter, thread 3 signals the condition variable. Thread 3's signal unblocks thread 1.
Thread 3 then waits on the condition variable. Thread 1 broadcasts the condition variable,
unblocking both thread 2 and thread 3. Thread 3 waits on the condition variable shortly thereafter,
with a timed wait. Some time later, thread 3's wait times out, and the thread awakens.
3.3.1 Creating and destroying a condition variable pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int pthread_cond_init (pthread_cond_t *cond, pthread_condattr_t *condattr);
int pthread_cond_destroy (pthread_cond_t *cond);
A condition variable is represented in your program by a variable of type pthread_cond_t.
You should never make a copy of a condition variable, because the result of using a copied
condition variable is undefined. It would be like telephoning a disconnected number and expecting
an answer. One thread could, for example, wait on one copy of the condition variable, while
another thread signaled or broadcast the other copy of the condition variable--the waiting thread
would not be awakened. You can, however, freely pass pointers to a condition variable so that
various functions and threads can use it for synchronization.
Most of the time you'll probably declare condition variables using the extern or static storage
class at file scope, that is, outside of any function. They should have normal (extern) storage class
if they are used by other files, or static storage class if used only within the file that declares the
variable. When you declare a static condition variable that has default attributes, you should use
the PTHREAD_COND_INITIALIZER initialization macro, as shown in the following example,
cond_static.c.
cond_static.c
| Condition variables and their predicates are "linked"--for best results,
| treat them that way!
When you declare a condition variable, remember that a condition variable and the associated
predicate are "locked together." You may save yourself (or your successor) some confusion by
always declaring the condition variable and predicate together, if possible. I recommend that you
try to encapsulate a set of invariants and predicates with its mutex and one or more condition
variables as members in a structure, and carefully document the association.
Sometimes you cannot initialize a condition variable statically; for example, when you use
malloc to create a structure that contains a condition variable. Then you will need to call
pthread_cond_init to initialize the condition variable dynamically, as shown in the following
example, cond_dynamic.c. You can also dynamically initialize condition variables that you declare
statically--but you must ensure that each condition variable is initialized before it is used, and that
each is initialized only once. You may initialize it before creating any threads, for example, or by
using pthread_once (Section 5.1). If you need to initialize a condition variable with nondefault
attributes, you must use dynamic initialization (see Section 5.2.2).
cond_dynamic.c
When you dynamically initialize a condition variable, you should destroy the condition
variable when you no longer need it, by calling pthread_cond_destroy. You do not need to destroy
a condition variable that was statically initialized using the PTHREAD_COND_INITIALIZER
macro.
It is safe to destroy a condition variable when you know that no threads can be blocked on the
condition variable, and no additional threads will try to wait on, signal, or broadcast the condition
variable. The best way to determine this is usually within a thread that has just successfully
broadcast to unblock all waiters, when program logic ensures that no threads will try to use the
condition variable later.
When a thread removes a structure containing a condition variable from a list, for example,
and then broadcasts to awaken any waiters, it is safe (and also a very good idea) to destroy the
condition variable before freeing the storage that the condition variable occupies. The awakened
threads should check their wait predicate when they resume, so you must make sure that you don't
free resources required for the predicate before they've done so--this may require additional
synchronization.
3.3.2 Waiting on a condition variable int pthread_cond_wait (pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_timedwait (pthread_cond_t *cond, pthread_mutex_t *mutex,
struct timespec *expiration);
Each condition variable must be associated with a specific mutex, and with a predicate
condition. When a thread waits on a condition variable it must always have the associated mutex
locked. Remember that the condition variable wait operation will unlock the mutex for you before
blocking the thread, and it will relock the mutex before returning to your code.
All threads that wait on any one condition variable concurrently (at the same time) must
specify the same associated mutex. Pthreads does not allow thread 1, for example, to wait on
condition variable A specifying mutex A while thread 2 waits on condition variable A specifying
mutex B. It is, however, perfectly reasonable for thread 1 to wait on condition variable A
specifying mutex A while thread 2 waits on condition variable B specifying mutex A. That is, each
condition variable must be associated, at any given time, with only one mutex--but a mutex may
have any number of condition variables associated with it.
It is important that you test the predicate after locking the appropriate mutex and before
waiting on the condition variable. If a thread signals or broadcasts a condition variable while no
threads are waiting, nothing happens. If some other thread calls pthread_cond_wait right after that,
it will keep waiting regardless of the fact that the condition variable was just signaled, which
means that if a thread waits when it doesn't have to, it may never wake up. Because the mutex
remains locked until the thread is blocked on the condition variable, the predicate cannot become
set between the predicate test and the wait--the mutex is locked and no other thread can change the
shared data, including the predicate.
| Always test your predicate; and then test it again!
It is equally important that you test the predicate again when the thread wakes up. You should
always wait for a condition variable in a loop, to protect against program errors, multiprocessor
races, and spurious wakeups. The following short program, cond.c shows how to wait on a
condition variable. Proper predicate loops are also shown in all of the examples in this book that
use condition variables, for example, alarm_cond.c in Section 3.3.4.
20-37 The wait_thread sleeps for a short time to allow the main thread to reach its condition wait
before waking it, sets the shared predicate (data.value), and then signals the condition variable.
The amount of time for which wait_thread will sleep is controlled by the hibernation variable,
which defaults to one second.
51-52 If the program was run with an argument, interpret the argument as an integer value, which is
stored in hibernation. This controls the amount of time for which wait.thread will sleep before
signaling the condition variable.
68-83 The main thread calls pthread_cond_timedwait to wait for up to two seconds (from the
current time). If hibernation has been set to a value of greater than two seconds, the condition wait
will time out, returning ETIMEDOUT. If hibernation has been set to two, the main thread and
wait_thread race, and, in principle, the result could differ each time you run the program. If
hibernation is set to a value less than two, the condition wait should not time out.
cond.c
There are a lot of reasons why it is a good idea to write code that does not assume the
predicate is always true on wakeup, but here are a few of the main reasons:
Intercepted wakeups: Remember that threads are asynchronous. Waking up from a
condition variable wait involves locking the associated mutex. But what if some other thread
acquires the mutex first? It may, for example, be checking the predicate before waiting itself. It
doesn't have to wait, since the predicate is now true. If the predicate is "work available," it will
accept the work. When it unlocks the mutex there may be no more work. It would be expensive,
and usually counterproductive, to ensure that the latest awakened thread got the work.
Loose predicates: For a lot of reasons it is often easy and convenient to use approximations
of actual state. For example, "there may be work" instead of "there is work." It is often much
easier to signal or broadcast based on "loose predicates" than on the real "tight predicates." If you
always test the tight predicates before and after waiting on a condition variable, you're free to
signal based on the loose approximations when that makes sense. And your code will be much
more robust when a condition variable is signaled or broadcast accidentally. Use of loose
predicates or accidental wakeups may turn out to be a performance issue; but in many cases it
won't make a difference.
Spurious wakeups: This means that when you wait on a condition variable, the wait may
(occasionally) return when no thread specifically broadcast or signaled that condition variable.
Spurious wakeups may sound strange, but on some multiprocessor systems, making condition
wakeup completely predictable might substantially slow all condition variable operations. The
race conditions that cause spurious wakeups should be considered rare.
It usually takes only a few instructions to retest your predicate, and it is a good programming
discipline. Continuing without retesting the predicate could lead to serious application errors that
might be difficult to track down later. So don't make assumptions: Always wait for a condition
variable in a while loop testing the predicate.
You can also use the pthread_cond_timedwait function, which causes the wait to end with an
ETIMEDOUT status after a certain time is reached. The time is an absolute clock time, using the
POSIX. lb struct timespec format. The timeout is absolute rather than an interval (or "delta time")
so that once you've computed the timeout it remains valid regardless of spurious or intercepted
wakeups. Although it might seem easier to use an interval time, you'd have to recompute it every
time the thread wakes up, before waiting again--which would require determining how long it had
already waited.
When a timed condition wait returns with the ETIMEDOUT error, you should test your
predicate before treating the return as an error. If the condition for which you were waiting is true,
the fact that it may have taken too long usually isn't important. Remember that a thread always
relocks the mutex before returning from a condition wait, even when the wait times out. Waiting
for a locked mutex after timeout can cause the timed wait to appear to have taken a lot longer than
the time you requested.
3.3.3 Waking condition variable waiters
int pthread_cond_signal (pthread_cond_t *cond);
int pthread_cond_broadcast (pthread_cond_t *cond);
Once you've got a thread waiting on a condition variable for some predicate, you'll probably
want to wake it up. Pthreads provides two ways to wake a condition variable waiter. One is called
"signal" and the other is called "broadcast." A signal operation wakes up a single thread waiting on
the condition variable, while broadcast wakes up all threads waiting on the condition variable.
The term "signal" is easily confused with the "POSIX signal" mechanisms that allow you to
define "signal actions," manipulate "signal masks," and so forth. However, the term "signal," as
we use it here, had independently become well established in threading literature, and even in
commercial implementations, and the Pthreads working group decided not to change the term.
Luckily, there are few situations where we might be tempted to use both terms together--it is a
very good idea to avoid using signals in threaded programs when at all possible. If we are careful
to say "signal a condition variable" or "POSIX signal" (or "UNIX signal") where there is any
ambiguity, we are unlikely to cause anyone severe discomfort.
It is easy to think of "broadcast" as a generalization of "signal," but it is more accurate to
think of signal as an optimization of broadcast. Remember that it is never wrong to use broadcast
instead of signal since waiters have to account for intercepted and spurious wakes. The only
difference, in fact, is efficiency: A broadcast will wake additional threads that will have to test
their predicate and resume waiting. But, in general, you can't replace a broadcast with a signal.
"When in doubt, broadcast."
Use signal when only one thread needs to wake up to process the changed state, and when
any waiting thread can do so. If you use one condition variable for several program predicate
conditions, you can't use the signal operation; you couldn't tell whether it would awaken a thread
waiting for that predicate, or for another predicate. Don't try to get around that by resignaling the
condition variable when you find the predicate isn't true. That might not pass on the signal as you
expect; a spurious or intercepted wakeup could result in a series of pointless resignals.
If you add a single item to a queue, and only threads waiting for an item to appear are
blocked on the condition variable, then you should probably use a signal. That'll wake up a single
thread to check the queue and let the others sleep undisturbed, avoiding unnecessary context
switches. On the other hand, if you add more than one item to the queue, you will probably need
to broadcast. For examples of both broadcast and signal operations on condition variables, check
out the "read/write lock" package in Section 7.1.2.
Although you must have the associated mutex locked to wait on a condition variable, you can
signal (or broadcast) a condition variable with the associated mutex unlocked if that is more
convenient. The advantage of doing so is that, on many systems, this may be more efficient. When
a waiting thread awakens, it must first lock the mutex. If the thread awakens while the signaling
thread holds the mutex, then the awakened thread must immediately block on the mutex--you've
gone through two context switches to get back where you started.*
* There is an optimization, which I've called "wait morphing," that moves a thread directly
from the condition variable wait queue to the mutex wait queue in this case, without a context
switch, when the mutex is locked. This optimization can produce a substantial performance
benefit for many applications.
Weighing on the other side is the fact that, if the mutex is not locked, any thread (not only the
one being awakened) can lock the mutex prior to the thread being awakened. This race is one
source of intercepted wakeups. A lower-priority thread, for example, might lock the mutex while
another thread was about to awaken a very high-priority thread, delaying scheduling of the
high-priority thread. If the mutex remains locked while signaling, this cannot happen--the
high-priority waiter will be placed before the lower-priority waiter on the mutex, and will be
scheduled first.
3.3.4 One final alarm program
It is time for one final version of our simple alarm program. In alarm_mutex.c, we reduced
resource utilization by eliminating the use of a separate execution context (thread or process) for
each alarm. Instead of separate execution contexts, we used a single thread that processed a list of
alarms. There was one problem, however, with that approach--it was not responsive to new alarm
commands. It had to finish waiting for one alarm before it could detect that another had been
entered onto the list with an earlier expiration time, for example, if one entered the commands "10
message 1" followed by "5 message 2."
Now that we have added condition variables to our arsenal of threaded programming tools,
we will solve that problem. The new version, creatively named alarm_cond.c, uses a timed
condition wait rather than sleep to wait for an alarm expiration time. When main inserts a new
entry at the head of the list, it signals the condition variable to awaken alarm_thread immediately.
The alarm_thread then requeues the alarm on which it was waiting, to sort it properly with respect
to the new entry, and tries again.
20-22 Part 1 shows the declarations for alarm_cond.c. There are two additions to this section,
compared to alarm_mutex.c: a condition variable called alarm_cond and the current_alarm
variable, which allows main to determine the expiration time of the alarm on which alarm_thread
is currently waiting. The current_alarm variable is an optimization--main does not need to awaken
alarm_thread unless it is either idle, or waiting for an alarm later than the one main has just
inserted.
alarm_cond.c part 1 declarations
Part 2 shows the new function alarm_insert. This function is nearly the same as the list
insertion code from alarm_mutex.c, except that it signals the condition variable alarm_cond when
necessary. I made alarm_insert a separate function because now it needs to be called from two
places--once by main to insert a new alarm, and now also by alarm_thread to reinsert an alarm that
has been "preempted" by a new earlier alarm.
9-14 I have recommended that mutex locking protocols be documented, and here is an example:
The alarm_insert function points out explicitly that it must be called with the alarm_mutex
locked.
48-53 If current_alarm (the time of the next alarm expiration) is 0, then the alarm_thread is not
aware of any outstanding alarm requests, and is waiting for new work. If current_alarm has a time
greater than the expiration time of the new alarm, then alarm_thread is not planning to look for
new work soon enough to handle the new alarm. In either case, signal the alarm_cond condition
variable so that alarm_thread will wake up and process the new alarm.
alarm_cond.c part 2 alarm_insert
Part 3 shows the alarm_thread function, the start function for the "alarm server" thread. The
general structure of alarm_thread is very much like the alarm_thread in alarm_mutex.c. The
differences are due to the addition of the condition variable.
26-31 If the alarm_list is empty, alarm_mutex.c could do nothing but sleep anyway, so that main
would be able to process a new command. The result was that it could not see a new alarm request
for at least a full second. Now, alarm_thread instead waits on the alarm_cond condition variable,
with no timeout. It will "sleep" until you enter a new alarm command, and then main will be able
to awaken it immediately. Setting current_alarm to 0 tells main that alarm_thread is idle.
Remember that pthread_cond_wait unlocks the mutex before waiting, and relocks the mutex
before returning to the caller.
35 The new variable expired is initialized to 0; it will be set to 1 later if the timed condition wait
expires. This makes it a little easier to decide whether to print the current alarm's message at the
bottom of the loop.
36-42 If the alarm we've just removed from the list hasn't already expired, then we need to wait for
it. Because we're using a timed condition wait, which requires a POSIX.lb struct timespec, rather
than the simple integer time required by sleep, we convert the expiration time. This is easy,
because a struct timespec has two members—tv_sec is the number of seconds since the Epoch,
which is exactly what we already have from the time function, and tv_nsec is an additional count
of nanoseconds. We will just set tv_nsec to 0, since we have no need of the greater resolution.
43 Record the expiration time in the current_alarm variable so that main can determine whether
to signal alarm_cond when a new alarm is added.
44-53 Wait until either the current alarm has expired, or main requests that alarm_thread look for a
new, earlier alarm. Notice that the predicate test is split here, for convenience. The expression in
the while statement is only half the predicate, detecting that main has changed current_alarm by
inserting an earlier timer. When the timed wait returns ETIMEDOUT, indicating that the current
alarm has expired, we exit the while loop with a break statement at line 49.
54-55 If the while loop exited when the current alarm had not expired, main must have asked
alarm_thread to process an earlier alarm. Make sure the current alarm isn't lost by reinserting it
onto the list.
57 If we remove from alarm_list an alarm that has already expired, just set the expired variable
to 1 to ensure that the message is printed.
alarm_cond.c part 3 alarm_routine
Part 4 shows the final section of alarm_cond.c, the main program. It is nearly identical to the
main function from alarm_mutex.c.
38 Because the condition variable signal operation is built into the new alarm_insert function,
we call alarm_insert rather than inserting a new alarm directly.
alarm_cond.c part 4 main
3.4 Memory visibility between threads The moment Alice appeared, she was appealed to by all three to settle the
question, and they repeated their arguments to her, though, as they all
spoke at once, she found it very hard to make out exactly what they
said.
--Lewis Carroll, Alice's Adventures in Wonderland
In this chapter we have seen how you should use mutexes and condition variables to
synchronize (or "coordinate") thread activities. Now we'll journey off on a tangent, for just a few
pages, and see what is really meant by "synchronization" in the world of threads. It is more than
making sure two threads don't write to the same location at the same time, although that's part of it.
As the title of this section implies, it is about how threads see the computer's memory.
Pthreads provides a few basic rules about memory visibility. You can count on all
implementations of the standard to follow these rules:
1. Whatever memory values a thread can see when it calls pthread_create can also be seen by
the new thread when it starts. Any data written to memory after the call to pthread_create may not
necessarily be seen by the new thread, even if the write occurs before the thread starts.
2. Whatever memory values a thread can see when it unlocks a mutex, either directly or by
waiting on a condition variable, can also be seen by any thread that later locks the same mutex.
Again, data written after the mutex is unlocked may not necessarily be seen by the thread that
locks the mutex, even if the write occurs before the lock.
3. Whatever memory values a thread can see when it terminates, either by cancellation,
returning from its start function, or by calling pthread_exit, can also be seen by the thread that
joins with the terminated thread by calling pthread_join. And, of course, data written after the
thread terminates may not necessarily be seen by the thread that joins, even if the write occurs
before the join.
4. Whatever memory values a thread can see when it signals or broadcasts a condition
variable can also be seen by any thread that is awakened by that signal or broadcast. And, one
more time, data written after the signal or broadcast may not necessarily be seen by the thread that
wakes up, even if the write occurs before it awakens.
Figures 3.5 and 3.6 demonstrate some of the consequences. So what should you, as a
programmer, do?
First, where possible make sure that only one thread will ever access a piece of data. A
thread's registers can't be modified by another thread. A thread's stack and heap memory a thread
allocates is private unless the thread communicates pointers to that memory to other threads. Any
data you put in register or auto variables can therefore be read at a later time with no more
complication than in a completely synchronous program. Each thread is synchronous with itself.
The less data you share between threads, the less work you have to do.
Second, any time two threads need to access the same data, you have to apply one of the
Pthreads memory visibility rules, which, in most cases, means using a mutex. This is not only to
protect against multiple writes--even when a thread only reads data it must use a mutex to ensure
that it sees the most recent value of the data written while the mutex was locked.
This example does everything correctly. The left-hand code (running in thread A) sets the
value of several variables while it has a mutex locked. The right-hand code (running in thread B)
reads those values, also while holding the mutex.
Thread A Thread B
pthread_mutex_lock (&mutexl);
variableA = 1;
variableB = 2;
pthread_mutex_unlock (&mutexl);
pthread_mutex_lock (&mutexl);
localA = variableA;
localB = variableB;
pthread_mutex_unlock (&mutexl);
Rule 2: visibility from pthread_mutex_unlock to pthread_mutex_lock. When thread B returns
from pthread_mutex_lock, it will see the same values for variableA and variableB that thread A
had seen at the time it called pthread_mutex_unlock. That is, 1 and 2, respectively.
FIGURE 3.5 Correct memory visibility
This example shows an error. The left-hand code (running in thread A) sets the value of variables
after unlocking the mutex. The right-hand code (running in thread B) reads those values while
holding the mutex.
Thread A Thread B
pthread_mutex_lock (&mutexl);
variableA = 1T
pthread_mutex_unlock (&mutexl);
variableB = 2T
pthread_mutex_lock (&mutexl);
localA = variableA;
localB = variableB;
pthread_mutex_unlock (&mutexl);
Rule 2: visibility from pthread_mutex_unlock to pthread_mutex_lock. When thread B returns
from pthread_mutex_lock, it will see the same values for variableA and variableB that thread A
had seen at the time it called pthread_mutex_unlock. That is, it will see the value 1 for variableA,
but may not see the value 2 for variableB since that was written after the mutex was unlocked.
FIGURE 3.6 Incorrect memory visibility
As the rules state, there are specific cases where you do not need to use a mutex to ensure
visibility. If one thread sets a global variable, and then creates a new thread that reads the same
variable, you know that the new thread will not see an old value. But if you create a thread and
then set some variable that the new thread reads, the thread may not see the new value, even if the
creating thread succeeds in writing the new value before the new thread reads it.
| Warning! We are now descending below the Pthreads API into details
| of hardware memory architecture that you may prefer not to know, You
| may want to skip this explanation for now and come back later.
If you are willing to just trust me on all that (or if you've had enough for now), you may now
skip past the end of this section. This book is not about multiprocessor memory architecture, so I
will just skim the surface--but even so, the details are a little deep, and if you don't care right now,
you do not need to worry about them yet. You will probably want to come back later and read the
rest, though, when you have some time.
In a single-threaded, fully synchronous program, it is "safe" to read or write any memory at
any time. That is, if the program writes a value to some memory address, and later reads from that
memory address, it will always receive the last value that it wrote to that address.
When you add asynchronous behavior (which includes multiprocessors) to the program, the
assumptions about memory visibility become more complicated. For example, an asynchronous
signal could occur at any point in the program's execution. If the program writes a value to
memory, a signal handler runs and writes a different value to the same memory address, when the
main program resumes and reads the value, it may not receive the value it wrote.
That's not usually a major problem, because you go to a lot of trouble to declare and use
signal handlers. They run "specialized" code in a distinctly different environment from the main
program. Experienced programmers know that they should write global data only with extreme
care, and it is possible to keep track of what they do. If that becomes awkward, you block the
signal around areas of code that use the global data.
When you add multiple threads to the program the asynchronous code is no longer special.
Each thread runs normal program code, and all in the same unrestricted environment. You can
hardly ever be sure you always know what each thread may be doing. It is likely that they will all
read and write some of the same data. Your threads may run at unpredictable times or even
simultaneously on different processors. And that's when things get interesting.
By the way, although we are talking about programming with multiple threads, none of the
problems outlined in this section is specific to threads. Rather, they are artifacts of memory
architecture design, and they apply to any situation where two "things" independently access the
same memory. The two things may be threads running on separate processors, but they could
instead be processes running on separate processors and using shared memory. Or one "thing"
might be code running on a uniprocessor, while an independent I/O controller reads or writes the
same memory.
| A memory address can hold only one value at a time; don't let threads
| "race" to get there first.
When two threads write different values to the same memory address, one after the other, the
final state of memory is the same as if a single thread had written those two values in the same
sequence. Either way only one value remains in memory. The problem is that it becomes difficult
to know which write occurred last. Measuring some absolute external time base, it may be obvious
that "processor B" wrote the value "2" several microseconds after "processor A' wrote the value
"1 ." That doesn't mean the final state of memory will have a "2."
Why? Because we haven't said anything about how the machine's cache and memory bus
work. The processors probably have cache memory, which is just fast, local memory used to keep
quickly accessible copies of data that were recently read from main memory. In a write-back cache
system, data is initially written only to cache, and copied ("flushed") to main memory at some
later time. In a machine that doesn't guarantee read/write ordering, each cache block may be
written whenever the processor finds it convenient. If two processors write different values to the
same memory address, each processor's value will go into its own cache. Eventually both values
will be written to main memory, but at essentially random times, not directly related to the order in
which the values were written to the respective processor caches.
Even two writes from within a single thread (processor) need not appear in memory in the
same order. The memory controller may find it faster, or just more convenient, to write the values
in "reverse" order, as shown in Figure 3.7. They may have been cached in different cache blocks,
for example, or interleaved to different memory banks. In general, there's no way to make a
program aware of these effects. If there was, a program that relied on them might not run correctly
on a different model of the same processor family, much less on a different type of computer.
The problems aren't restricted to two threads writing memory. Imagine that one thread writes
a value to a memory address on one processor, and then another thread reads from that memory
address on another processor. It may seem obvious that the thread will see the last value written to
that address, and on some hardware that will be true. This is sometimes called "memory
coherence" or "read/write ordering." But it is complicated to ensure that sort of synchronization
between processors. It slows the memory system and the overhead provides no benefit to most
code. Many modern computers (usually among the fastest) don't guarantee any ordering of
memory accesses between different processors, unless the program uses special instructions
commonly known as memory barriers.
Time Thread 1 Thread 2
t write "1" to address 1 (cache)
t+ 1 write "2" to address 2 (cache) read "0" from address 1
t+2 cache system flushes address 2
t+3 read "2" from address 2
t+4 cache system flushes address 1
FIGURE 3.7 Memory ordering without synchronization
Memory accesses in these computers are, at least in principle, queued to the memory
controller, and may be processed in whatever order becomes most efficient. A read from an
address that is not in the processor's cache may be held waiting for the cache fill, while later reads
complete. A write to a "dirty" cache line, which requires that old data be flushed, may be held
while later writes complete. A memory barrier ensures that all memory accesses that were initiated
by the processor prior to the memory barrier have completed before any memory accesses initiated
after the memory barrier can complete.
| A "memory barrier" is a moving wall, not a "cache flush" command.
A common misconception about memory barriers is that they "flush" values to main memory,
thus ensuring that the values are visible to other processors. That is not the case, however. What
memory barriers do is ensure an order between sets of operations. If each memory access is an
item in a queue, you can think of a memory barrier as a special queue token. Unlike other memory
accesses, however, the memory controller cannot remove the barrier, or look past it, until it has
completed all previous accesses.
A mutex lock, for example, begins by locking the mutex, and completes by issuing a memory
barrier. The result is that any memory accesses issued while the mutex is locked cannot complete
before other threads can see that the mutex was locked. Similarly, a mutex unlock begins by
issuing a memory barrier and completes by unlocking the mutex, ensuring that memory accesses
issued while the mutex is locked cannot complete after other threads can see that the mutex is
unlocked.
This memory barrier model is the logic behind my description of the Pthreads memory rules.
For each of the rules, we have a "source" event, such as a thread calling pthread_mutex_unlock,
and a "destination" event, such as another thread returning from pthread_mutex_lock. The passage
of "memory view" from the first to the second occurs because of the memory barriers carefully
placed in each.
Even without read/write ordering and memory barriers, it may seem that writes to a single
memory address must be atomic, meaning that another thread will always see either the intact
original value or the intact new value. But that's not always true, either. Most computers have a
natural memory granularity, which depends on the organization of memory and the bus
architecture. Even if the processor naturally reads and writes 8-bit units, memory transfers may
occur in 32- or 64-bit "memory units."
That may mean that 8-bit writes aren't atomic with respect to other memory operations that
overlap the same 32- or 64-bit unit. Most computers write the full memory unit (say, 32 bits) that
contains the data you're modifying. If two threads write different 8-bit values within the same
32-bit memory unit, the result may be that the last thread to write the memory unit specifies the
value of both bytes, overwriting the value supplied by the first writer. Figure 3.8 shows this effect.
FIGURE 3.8 Memory conflict
If a variable crosses the boundary between memory units, which can happen if the machine
supports unaligned memory access, the computer may have to send the data in two bus
transactions. An unaligned 32-bit value, for example, may be sent by writing the two adjacent
32-bit memory units. If either memory unit involved in the transaction is simultaneously written
from another processor, half of the value may be lost. This is called "word tearing," and is shown
in Figure 3.9.
We have finally returned to the advice at the beginning of this section: If you want to write
portable Pthreads code, you will always guarantee correct memory visibility by using the Pthreads
memory visibility rules instead of relying on any assumptions regarding the hardware or compiler
behavior. But now, at the bottom of the section, you have some understanding of why this is true.
For a substantially more in-depth treatment of multiprocessor memory architecture, refer to UNIX
Systems for Modern Architectures [Schimmel, 1994].
Figure 3.10 shows the same sequence as Figure 3.7, but it uses a mutex to ensure the desired
read/write ordering. Figure 3.10 does not show the cache flush steps that are shown in Figure 3.7,
because those steps are no longer relevant. Memory visibility is guaranteed by passing mutex
ownership in steps t+3 and t+4, through the associated memory barriers. That is, when thread 2
has successfully locked the mutex previously unlocked by thread 1, thread 2 is guaranteed to see
memory values "at least as recent" as the values visible to thread 1 at the time it unlocked the
mutex.
Time Thread 1 Thread 2
t lock mutex
(memory barrier)
t+ 1 write "1" to address 1 (cache)
t+2 write "2" to address 2 (cache)
t+3 (memory barrier)
unlock mutex
t+4 lock mutex
(memory barrier)
t+5 read "1" from address 1
t+6 read "2" from address 2
t+7 (memory barrier)
unlock mutex
FIGURE 3.10 Memory ordering with synchronization
4 A few ways to use threads "They were obliged to have him with them," the Mock Turtle said.
"No wise fish would go anywhere without a porpoise."
"Wouldn't it, really?" said Alice, in a tone of great surprise.
"Of course not," said the Mock Turtle. "Why, if a fish came to me,
and told me he was going on a journey, I should say 'With what porpoise?'"
--Lewis Carroll, Alice's Adventures in Wonderland
During the introduction to this book, I mentioned some of the ways you can structure a
threaded solution to a problem. There are infinite variations, but the primary models of threaded
programming are shown in Table 4.1.
Pipeline Each thread repeatedly performs the same operation on a sequence of data sets,
passing each result to another thread for the next step. This is also known as an
"assembly line."
Work crew Each thread performs an operation on its own data. Threads in a work crew may
all perform the same operation, or each a separate operation, but they always
proceed independently.
Client/server A client "contracts" with an independent server for each job. Often the "contract"
is anonymous--a request is made through some interface that queues the work
item.
TABLE 4.1 Thread programming models
All of these models can be combined in arbitrary ways and modified beyond all recognition
to fit individual situations. A step in a pipeline could involve requesting a service from a server
thread, and the server might use a work crew, and one or more workers in the crew might use a
pipeline. Or a parallel search "engine" might initiate several threads, each trying a different search
algorithm.
4.1 Pipeline "I want a clean cup," interrupted the Hatter: "let's all move one place on."
He moved on as he spoke, and the Dormouse followed him: the March Hare
moved into the Dormouse's place, and Alice rather unwillingly took the
place of the March Hare. The Hatter was the only one who got any
advantage from the change; and Alice was a good deal worse off than
before, as the March Hare had just upset the milk-jug into his plate.
--Lewis Carroll, Alice's Adventures in Wonderland
In pipelining, a stream of "data items" is processed serially by an ordered set of threads
(Figure 4.1). Each thread performs a specific operation on each item in sequence, passing the data
on to the next thread in the pipeline.
For example, the data might be a scanned image, and thread A might process an image array,
thread B might search the processed data for a specific set of features, and thread C might collect
the serial stream of search results from thread B into a report. Or each thread might perform a
single step in some sequence of modifications on the data.
The following program, called pipe.c, shows the pieces of a simple pipeline program. Each
thread in the pipeline increments its input value by 1 and passes it to the next thread. The main
program reads a series of "command lines" from stdin. A command line is either a number, which
is fed into the beginning of the pipeline, or the character "=," which causes the program to read the
next result from the end of the pipeline and print it to stdout.
FIGURE 4.1 Pipelining
9-17 Each stage of a pipeline is represented by a variable of type stage_t. stage_t contains a mutex
to synchronize access to the stage. The avail condition variable is used to signal a stage that data is
ready for it to process, and each stage signals its own ready condition variable when it is ready for
new data. The data member is the data passed from the previous stage, thread is the thread
operating this stage, and next is a pointer to the following stage.
23-29 The pipe_t structure describes a pipeline. It provides pointers to the first and last stage of a
pipeline. The first stage, head, represents the first thread in the pipeline. The last stage, tail, is a
special stage_t that has no thread--it is a place to store the final result of the pipeline.
pipe.c part 1 definitions
Part 2 shows pipe_send, a utility function used to start data along a pipeline, and also called
by each stage to pass data to the next stage.
17-23 It begins by waiting on the specified pipeline stage's ready condition variable until it can
accept new data.
28-30 Store the new data value, and then tell the stage that data is available.
pipe.c part 2 pipe_send
Part 3 shows pipe_stage, the start function for each thread in the pipeline. The thread's
argument is a pointer to its stage_t structure.
16-27 The thread loops forever, processing data. Because the mutex is locked outside the loop, the
thread appears to have the pipeline stage's mutex locked all the time. However, it spends most of
its time waiting for new data, on the avail condition variable. Remember that a thread
automatically unlocks the mutex associated with a condition variable, while waiting on that
condition variable. In reality, therefore, the thread spends most of its time with mutex unlocked.
22-26 When given data, the thread increases its own data value by one, and passes the result to the
next stage. The thread then records that the stage no longer has data by clearing the data_ready
flag, and signals the ready condition variable to wake any thread that might be waiting for this
pipeline stage.
pipe.c part 3 pipe_stage
Part 4 shows pipe_create, the function that creates a pipeline. It can create a pipeline of any
number of stages, linking them together in a list.
18-34 For each stage, it allocates a new stage_t structure and initializes the members. Notice that
one additional "stage" is allocated and initialized to hold the final result of the pipeline.
36-37 The link member of the final stage is set to NULL to terminate the list, and the pipeline's tail
is set to point at the final stage. The tail pointer allows pipe_result to easily find the final product
of the pipeline, which is stored into the final stage.
52-59 After all the stage data is initialized, pipe_create creates a thread for each stage. The extra
"final stage" does not get a thread--the termination condition of the for loop is that the current
stage's next link is not NULL, which means that it will not process the final stage.
pipe.c part 4 pipe_create
Part 5 shows pipe_start and pipe_result. The pipe_start function pushes an item of data into
the beginning of the pipeline and then returns immediately without waiting for a result. The
pipe_result function allows the caller to wait for the final result, whenever the result might be
needed.
9-22 The pipe_start function sends data to the first stage of the pipeline. The function increments a
count of "active" items in the pipeline, which allows pipe_result to detect that there are no more
active items to collect, and to return immediately instead of blocking. You would not always want
a pipeline to behave this way--it makes sense for this example because a single thread alternately
"feeds" and "reads" the pipeline, and the application would hang forever if the user inadvertently
reads one more item than had been fed.
28-47 The pipe_result function first checks whether there is an active item in the pipeline. If not, it
returns with a status of 0, after unlocking the pipeline mutex.
49-55 If there is another item in the pipeline, pipe_result locks the tail (final) stage, and waits for it
to receive data. It copies the data and then resets the stage so it can receive the next item of data.
Remember that the final stage does not have a thread, and cannot reset itself.
pipe.c part 5 pipe_start,pipe_result
Part 6 shows the main program that drives the pipeline. It creates a pipeline, and then loops
reading lines from stdin. If the line is a single "=" character, it pulls a result from the pipeline and
prints it. Otherwise, it converts the line to an integer value, which it feeds into the pipeline.
pipe.c part 6 main
4.2 Work crew The twelve jurors were all writing very busily on slates.
"What are they doing?" Alice whispered to the Gryphon.
"They ca'n't have anything to put down yet, before the trial's begun."
"They're putting down their names," the Gryphon whispered in reply,
"for fear they should forget them before the end of the trial."
--Lewis Carroll, Alice's Adventures in Wonderland
In a work crew, data is processed independently by a set of threads (Figure 4.2). A "parallel
decomposition" of a loop generally falls into this category. A set of threads may be created, for
example, each directed to process some set of rows or columns of an array. A single set of data is
split between the threads, and the result is a single (filtered) set of data. Because all the threads in
the work crew, in this model, are performing the same operation on different data, it is often
known as SIMD parallel processing, for "single instruction, multiple data." The original use of
SIMD was in an entirely different form of parallelism, and doesn't literally apply to threads--but
the concept is similar.
The threads in a work crew don't have to use a SIMD model, though. They may perform
entirely different operations on different data. The members of our work crew, for example, each
remove work requests from a shared queue, and do whatever is required by that request. Each
queued request packet could describe a variety of operations--but the common queue and “mission
statement” (to process that queue) make them a "crew" rather than independent worker threads.
This model can be compared to the original definition of MIMD parallel processing, "multiple
instruction, multiple data."
FIGURE 4.2 Work crew
Section 7.2, by the way, shows the development of a more robust and general (and more
complicated) “work queue manager” package. A “work crew” and a “work queue” are related in
much the same way as "invariants" and “critical sections”--it depends on how you look at what's
happening. A work crew is the set of threads that independently processes data, whereas a work
queue is a mechanism by which your code may request that data be processed by anonymous and
independent “agents." So in this section, we develop a "work crew," whereas in Section 7.2 we
develop a more sophisticated “work queue." The focus differs, but the principle is the same.
The following program, called crew.c, shows a simple work crew. Run the program with two
arguments, a string, and a file path. The program will queue the file path to the work crew. A crew
member will determine whether the file path is a file or a directory--if a file, it will search the file
for the string; if a directory, it will use readdir_r to find all directories and regular files within the
directory, and queue each entry as new work. Each file containing the search string will be
reported on stdout.
Part I shows the header files and definitions used by the program.
7 The symbol CREW_SIZE determines how many threads are created for each work crew.
13-17 Each item of work is described by a work_t structure. This structure has a pointer to the next
work item (set to NULL to indicate the end of the list), a pointer to the file path described by the
work item, and a pointer to the string for which the program is searching. As currently constructed,
all work items point to the same search string.
23-27 Each member of a work crew has a worker_t structure. This structure contains the index of
the crew member in the crew vector, the thread identifier of the crew member (thread), and a
pointer to the crew_t structure (crew).
33-41 The crew_t structure describes the work crew state. It records the number of members in the
work crew (crew_size) and an array of worker_t structures (crew). It also has a counter of how
many work items remain to be processed (work_count) and a list of outstanding work items (first
points to the earliest item, and last to the latest). Finally, it contains the various Pthreads
synchronization objects: a mutex to control access, a condition variable (done) to wait for the
work crew to finish a task, and a condition variable on which crew members wait to receive new
work (go).
41-42 The allowed size of a file name and path name may vary depending on the file system to
which the path leads. When a crew is started, the program calculates the allowable file name and
path length for the specified file path by calling pathconf, and stores the values in path_max and
name_max, respectively, for later use.
crew.c part 1 definitions
Part 2 shows worker_routine, the start function for crew threads. The outer loop repeats
processing until the thread is told to terminate.
22-26 This condition variable loop blocks each new crew member until work is made available.
40-43 POSIX is a little ambiguous about the actual size of the struct dirent type. The actual
requirement for readdir_r is that you pass the address of a buffer large enough to contain a struct
dirent with a name member of at least NAME_MAX bytes. To ensure that we have enough space,
allocate a buffer the size of the system's struct dirent plus the maximum size necessary for a file
name on the file system we're using. This may be bigger than necessary, but it surely won't be too
small.
61-65 This wait is a little different. While the work list is empty, wait for more work. The crew
members never terminate--once they're all done with the current assignment, they're ready for a
new assignment. (This example doesn't take advantage of that capability--the process will
terminate once the single search command has completed.)
73-76 Remove the first work item from the queue. If the queue becomes empty, also clear the
pointer to the last entry, crew->last.
81-83 Unlock the work crew mutex, so that the bulk of the crew's work can proceed concurrently.
89 Determine what sort of file we've got in the work item's path string. We use lstat, which will
return information for a symbolic link, rather than stat, which would return information for the file
to which the link pointed. By not following symbolic links, we reduce the amount of work in this
example, and, especially, avoid following links into other file systems where our name_max and
path_max sizes may not be sufficient.
91-95 If the file is a link, report the name, but do nothing else with it. Note that each message
includes the thread's work crew index (mine->index), so that you can easily see "concurrency at
work" within the example.
96-165 If the file is a directory, open it with opendir. Find all entries in the directory by repeatedly
calling readdir_r. Each directory entry is entered as a new work item.
166-206 If the file is a regular file, open it and read all text, looking for the search string. If we find it,
write a message and exit the search loop.
207-218 If the file is of any other type, write a message attempting to identify the type.
232-252 Relock the work crew mutex, and report that another work item is done. If the count reaches
0, then the crew has completed the assignment, and we broadcast to awaken any threads waiting to
issue a new assignment. Note that the work count is decreased only after the work item is fully
processed--the count will never reach 0 if any crew member is still busy (and might queue
additional directory entries).
crew.c part 2 worker_routine
Part 3 shows crew_create, the function used to create a new work crew. This simple example
does not provide a way to destroy a work crew, because that is not necessary--the work crew
would be destroyed only when the main program was prepared to exit, and process exit will
destroy all threads and process data.
12-15 The crew_create function begins by checking the crew_size argument. The size of the crew is
not allowed to exceed the size of the crew array in crew_t. If the requested size is acceptable,
copy it into the structure.
16-31 Start with no work and an empty work queue. Initialize the crew's synchronization objects.
36-43 Then, for each crew member, initialize the member's worker_t data. The index of the member
within the crew array is recorded, and a pointer back to the crew_t. Then the crew member thread
is created, with a pointer to the member's worker_t as its argument.
crew.c part 3 crew_create
Part 4 shows the crew_start function, which is called to assign a new path name and search
string to the work crew. The function is synchronous--that is, after assigning the task it waits for
the crew members to complete the task before returning to the caller. The crew_start function
assumes that the crew_t structure has been previously created by calling crew_create, shown in
part 3, but does not attempt to validate the structure.
Wait for the crew members to finish any previously assigned work. Although crew_start is
synchronous, the crew may be processing a task assigned by another thread. On creation, the
crew's work_count is set to 0, so the first call to crew_start will not need to wait.
28-43 Get the proper values of path_max and name_max for the file system specified by the file
path we’ll be reading. The pathconf function may return a value of -1 without setting errno, if the
requested value for the file system is “unlimited.” To detect this, we need to clear errno before
making the call. If pathconf returns -1 without setting errno, assume reasonable values.
47-48 The values returned by pathconf don't include the terminating null character of a string--so
add one character to both.
49-67 Allocate a work queue entry (work_t) and fill it in. Add it to the end of the request queue.
68-75 We've queued a single work request, so awaken one of the waiting work crew members by
signaling the condition variable. If the attempt fails, free the work request, clear the work queue,
and return with the error.
76-80 Wait for the crew to complete the task. The crew members handle all output, so when they're
done we simply return to the caller.
crew.c part 4 crew_ start
Part 5 shows the initial thread (main) for the little work crew sample.
10-13 The program requires three arguments--the program name, a string for which to search, and a
path name. For example, "crew butenhof -"
15-23 On a Solaris system, call thr_setconcurrency to ensure that at least one LWP (kernel
execution context) is created for each crew member. The program will work without this call, but,
on a uniprocessor, you would not see any concurrency. See Section 5.6.3 for more information on
"many to few" scheduling models, and Section 10.1.3 for information on "set concurrency"
functions.
24-30 Create a work crew, and assign to it the concurrent file search.
crew.c part 5 main
4.3 Client/Server
But the Judge said he never had summed up before;
So the Snark undertook it instead,
And summed it so well that it came to far more
Than the Witnesses ever had said!
--Lewis Carroll, The Hunting of the Snark
In a client/server system, a "client" requests that a "server" perform some operation on a set
of data {Figure 4.3). The server performs the operation independently-the client can either wait for
the server or proceed in parallel and look for the result at a later time when the result is required.
Although it is simplest to have the client wait for the server, that's rarely very useful--it certainly
doesn't provide a speed advantage to the client. On the other hand, it can be an easy way to
manage synchronization for some common resource.
FIGURE 4.3 Client/Server
If a set of threads all need to read input from stdin, it might be confusing for them to each
issue independent prompt-and-read operations. Imagine that two threads each writes its prompt
using printf, and then each reads the response using gets--you would have no way of knowing to
which thread you were responding. If one thread asks "OK to send mail?" and the other asks "OK
to delete root directory?." you'd probably like to know which thread will receive your response. Of
course there are ways to keep the prompt and response "connected" without introducing a server
thread; for example, by using the flockfile and funlockfile functions to lock both stdin and stdout
around the prompt-and-read sequences, but a server thread is more interesting--and certainly more
relevant to this section.
In the following program, server.c, each of four threads will repeatedly read, and then echo,
input lines. When the program is run you should see the threads prompt in varying orders, and
another thread may prompt before the echo. But you'll never see a prompt or an echo between the
prompt and read performed by the "prompt server."
7-9 These symbols define the commands that can be sent to the "prompt server" It can be asked
to read input, write output, or quit.
14-22 The request_t structure defines each request to the server. The outstanding requests are linked
in a list using the next member. The operation member contains one of the request codes (read,
write, or quit). The synchronous member is nonzero if the client wishes to wait for the operation to
be completed (synchronous), or 0 if it does not wish to wait (asynchronous).
27-33 The tty_server_t structure provides the context for the server thread. It has the
synchronization objects (mutex and request), a flag denoting whether the server is running, and a
list of requests that have been made and not yet processed (first and last).
35-37 This program has a single server, and the control structure (tty_server) is statically allocated
and initialized here. The list of requests is empty, and the server is not running. The mutex and
condition variable are statically initialized.
36-43 The main program and client threads coordinate their shutdown using these synchronization
objects (client_mutex and clients_done) rather than using pthread_join.
server.c part 1 definitions
Part 2 shows the server thread function, tty_server_routine. It loops, processing requests
continuously until asked to quit.
25-30 The server waits for a request to appear using the request condition variable.
31-34 Remove the first request from the queue--if the queue is now empty, also clear the pointer to
the last entry {try_server.last}.
43-66 The switch statement performs the requested work, depending on the operation given in the
request packet. REQ_QUIT tells the server to shut down. REQ_READ tells the server to read,
with an optional prompt string. REQ_WRITE tells the server to write a string.
67-79 If a request is marked "synchronous" (synchronous flag is nonzero), the server sets done_flag
and signals the done condition variable. When the request is synchronous, the client is responsible
for freeing the request packet. If the request was asynchronous, the server frees request on
completion.
80-81 If the request was REQ_QUIT, terminate the server thread by breaking out of the while loop,
to the return statement.
server.c part 2 tty_server_routine
Part 3 shows the function that is called to initiate a request to the tty server thread. The caller
specifies the desired operation (REQ_QUIT, REQ_READ, or REQ_WRITE), whether the
operation is synchronous or not (sync), an optional prompt string (prompt) for REQ_READ
operations, and the pointer to a string (input for REQ_WRITE, or a buffer to return the result of an
REQ_READ operation).
16-40 If a tty server thread is not already running, start one. A temporary thread attributes object
(detached_attr) is created, and the detachstate attribute is set to PTHREAD_CREATE_
DETACHED. Thread attributes will be explained later in Section 5.2.3. In this case, we are just
saying that we will not need to use the thread identifier after creation.
45-76 Allocate and initialize a server request (request_t) packet. If the request is synchronous,
initialize the condition variable (done) in the request packet--otherwise the condition variable isn't
used. The new request is linked onto the request queue.
81-83 Wake the server thread to handle the queued request.
88-105 If the request is synchronous, wait for the server to set done_flag and signal the done
condition variable. If the operation is REQ_READ, copy the result string into the output buffer.
Finally, destroy the condition variable, and free the request packet.
server.c part 3 tty_server_request
Part 4 shows the thread start function for the client threads, which repeatedly queue tty
operation requests to the server.
12-22 Read a line through the tty server. If the resulting string is empty, break out of the loop and
terminate. Otherwise, loop four times printing the result string, at one-second intervals. Why four?
It just "mixes things up" a little.
26-31 Decrease the count of client threads, and wake the main thread if this is the last client thread
to terminate.
server.c part 4 client_routine
Part 5 shows the main program for server. c. It creates a set of client threads to utilize the tty
server, and waits for them.
7-15 On a Solaris system, set the concurrency level to the number of client threads by calling
thr_setconcurrency. Because all the client threads will spend some of their time blocked on
condition variables, we don't really need to increase the concurrency level for this
program--however, it will provide less predictable execution behavior.
20-26 Create the client threads.
27-35 This construct is much like pthread_join, except that it completes only when all of the client
threads have terminated. As I have said elsewhere, pthread_join is nothing magical, and there is no
reason to use it to detect thread termination unless it does exactly what you want. Joining multiple
threads in a loop with pthread_join is rarely exactly what you want, and a "multiple join" like that
shown here is easy to construct.
server.c part 5 main
5 Advanced threaded programming "Take some more tea," the March Hare said to Alice, very earnestly.
"I've had nothing yet," Alice replied in an offended tone:
"so I ca’n't take more."
"You mean you ca'n't take less," said the Hatter:
"it's very easy to take more than nothing."
--Lewis Carroll, Alice's Adventures in Wonderland
The Pthreads standard provides many capabilities that aren't needed by many programs. To
keep the sections dealing with synchronization and threads relatively simple, the more advanced
capabilities are collected into this additional section.
Section 5.1 describes a facility to manage initialization of data, particularly within a library,
in a multithreaded environment.
Section 5.2 describes "attributes objects," a way to control various characteristics of your
threads, mutexes, and condition variables when you create them.
Section 5.3 describes cancellation, a way to ask your threads to "go away" when you don't
need them to continue.
Section 5.4 describes thread-specific data, a sort of database mechanism that allows a library
to associate data with individual threads that it encounters and to retrieve that data later.
Section 5.5 describes the Pthreads facilities for realtime scheduling, to help your program
interact with the cold, cruel world in a predictable way.
5.1 One-time initialization
"'Tis the voice of the Jubjub!" he suddenly cried.
(This man, that they used to call "Dunce.")
“As the Bellman would tell you," he added with pride,
"I have uttered that sentiment once."
--Lewis Carroll, The Hunting of the Snark
pthread_once_t once_control = PTHREAD_ONCE_INIT;
int pthread_once (pthread_once_t *once_control, void(*init_routine) (void));
Some things need to be done once and only once, no matter what. When you are initializing
an application, it is often easiest to do all that from main, before calling anything else that might
depend on the initialization--and, in particular before creating any threads that might depend on
having initialized mutexes, created thread-specific data keys, and so forth.
If you are writing a library, you usually don't have that luxury. But you must still be sure that
the necessary initialization has been completed before you can use anything that needs to be
initialized. Statically initialized mutexes can help a lot, but sometimes you may find this "one-time
initialization" feature more convenient.
In traditional sequential programming, one-time initialization is often managed by a boolean
variable. A control variable is statically initialized to 0, and any code that depends on the
initialization can test the variable. If the value is still 0 it can perform the initialization and then set
the variable to 1. Later checks will skip the initialization.
When you are using multiple threads, it is not that easy. If more than one thread executes the
initialization sequence concurrently, two threads may both find initializer to be 0, and both
perform the initialization, which, presumably should have been performed only once. The state of
initialization is a shared invariant that must be protected by a mutex.
You can code your own one-time initialization using a boolean variable and a statically
initialized mutex. In many cases this will be more convenient than pthread_once, and it will
always be more efficient. The main reason for pthread_once is that you were not originally
allowed to statically initialize a mutex. Thus to use a mutex, you had to first call
pthread_mutex_init. You must initialize a mutex only once, so the initialization call must be made
in one-time initialization code. The pthread_once function solved this recursive problem. When
static initialization of mutexes was added to the standard, pthread_once was retained as a
convenience function. If it's convenient, use it, but remember that you don't have to use it.
First, you declare a control variable of type pthread_once_t. The control variable must be
statically initialized using the PTHREAD_ONCE_INIT macro, as shown in the following program,
called once.c. You must also create a function containing the code to perform all initialization that
is to be associated with the control variable. Now, at any time, a thread may call pthread_once,
specifying a pointer to the control variable and a pointer to the associated initialization function.
The pthread_once function first checks the control variable to determine whether the
initialization has already completed. If so, pthread_once simply returns. If initialization has not
yet been started, pthread_once calls the initialization function (with no arguments), and then
records that initialization has been completed. If a thread calls pthread_once while initialization is
in progress in another thread, the calling thread will wait until that other thread completes
initialization, and then return. In other words, when any call to pthread_once returns successfully,
the caller can be certain that all states initialized by the associated initialization function are ready
to go.
13-20 The function once_init_routine initializes the mutex when called--the use of pthread_once
ensures that it will be called exactly one time.
29 The thread function thread_routine calls pthread_once before using mutex, to ensure that it
exists even if it had not already been created by main.
51 The main program also calls pthread_once before using mutex, so that the program will
execute correctly regardless of when thread_routine runs. Notice that, while I normally stress that
all shared data must be initialized before creating any thread that uses it, in this case, the only
critical shared data is really the once_block--it is irrelevant that the mutex is not initialized,
because the use of pthread_once ensures proper synchronization.
once.c
5.2 Attributes objects
The fifth is ambition. It next will be right
To describe each particular batch:
Distinguishing those that have feathers, and bite,
From those that have whiskers, and scratch.
--Lewis Carroll, The Hunting of the Snark
So far, when we created threads, or dynamically initialized mutexes and condition variables,
we have usually used the pointer value NULL as the second argument. That argument is actually a
pointer to an attributes object. The value NULL indicates that Pthreads should assume the default
value for all attributes--just as it does when statically initializing a mutex or condition variable.
An attributes object is an extended argument list provided when you initialize an object. It
allows the main interfaces (for example, pthread_create} to be relatively simple, while allowing
"expert" capability when you need it. Later POSIX standards will be able to add options without
requiring source changes to existing code. In addition to standard attributes provided by Pthreads,
an implementation can provide specialized options without creating nonstandard parameters.
You can think of an attributes object as a private structure. You read or write the "members"
of the structure by calling special functions, rather than by accessing public member names. For
example, you read the stacksize attribute from a thread attributes object by calling
pthread_attr_getstacksize, or write it by calling pthread_attr_setstacksize.
In a simple implementation of Pthreads the type pthread_attr_t might be a typedef struct and
the get and set functions might be macros to read or write members of the variable. Another
implementation might allocate memory when you initialize an attributes object, and it may
implement the get and set operations as real functions that perform validity checking.
Threads, mutexes, and condition variables each have their own special attributes object type.
Respectively, the types are pthread_attr_t, pthread_mutexattr_t, and pthread_condattr_t.
5.2.1 Mutex attributes pthread_mutexattr_t attr;
int pthread_mutexattr_init { pthread_mutexattr_t *attr);
int pthread_mutexattr_destroy (pthread_mutexattr_t *attr);
#ifdef _POSIX_THREAD PROCESS SHARED
int pthread_mutexattr_getpshared (pthread_mutexattr_t� *attr, int *pshared);
int pthread_mutexattr_setpshared (pthread_mutexattr_t *attr, int pshared);
� endif
Pthreads defines the following attributes for mutex creation: pshared, protocol, and
prioceiling. No system is required to implement any of these attributes, however, so check the
system documentation before using them.
You initialize a mutex attributes object by calling pthread_mutexattr_init, specifying a
pointer to a variable of type pthread_mutexattr_t, as in mutex_attr.c, shown next. You use that
attributes object by passing its address to pthread_mutex_init instead of the NULL value we've
been using so far.
If your system provides the _POSIX_THREAD_PROCESS_SHARED option, then it
supports the pshared attribute, which you can set by calling the function pthread_mutexattr_
setpshared. If you set the pshared attribute to the value PTHREAD_PROCESS_SHARED, you
can use the mutex to synchronize threads within separate processes that have access to the
memory where the mutex (pthread_mutex_t) is initialized. The default value for this attribute is
PTHREAD_PROCESS_PRIVATE.
The mutex_attr.c program shows how to set a mutex attributes object to create a mutex using
the pshared attribute. This example uses the default value, PTHREAD_PROCESS_PRIVATE, to
avoid the additional complexity of creating shared memory and forking a process. The other
mutex attributes, protocol and prioceiling, will be discussed later in Section 5.5.5.
mutex_attr.c
5.2.2 Condition variable attributes pthread_condattr_t attr;
int pthread_condattr_init (pthread_condattr_t *attr);
int pthread_condattr_destroy (pthread_condattr_t *attr);
#ifdef _POSIX_THREAD_PROCESS_SHARED
int pthread_condattr_getpshared(pthread_condattr_t *attr, int *pshared);
int pthread_condattr_setpshared (pthread_condattr_t *attr, int pshared);
#endif
Pthreads defines only one attribute for condition variable creation, pshared. No system is
required to implement this attribute, so check the system documentation before using it. You
initialize a condition variable attributes object using pthread_condattr_init, specifying a pointer to
a variable of type pthread_condattr_t, as in cond_attr.c, shown next. You use that attributes object
by passing its address to pthread_cond_init instead of the NULL value we've been using so far.
If your system defines _POSIX_THREAD_PROCESS_SHARED then it supports the
pshared attribute. You set the pshared attribute by calling the function
pthread_condattr_setpshared. If you set the pshared attribute to the value PTHREAD_PROCESS
SHARED, the condition variable can be used by threads in separate processes that have access to
the memory where the condition variable (pthread_cond_t) is initialized. The default value for this
attribute is PTHREAD_PROCESS_PRIVATE.
The cond_attr.c program shows how to set a condition variable attributes object to create a
condition variable using the pshared attribute. This example uses the default value, PTHREAD_
PROCESS_PRIVATE, to avoid the additional complexity of creating shared memory and forking
a process.
cond_attr.c
To make use of a PTHREAD_PROCESS_SHARED condition variable, you must also use a
PTHREAD_PROCESS_SHARED mutex. That's because two threads that synchronize using a
condition variable must also use the same mutex. Waiting for a condition variable automatically
unlocks, and then locks, the associated mutex. So if the mutex isn't also created with
PTHREAD_PROCESS_SHARED, the synchronization won't work.
5.2.3 Thread attributes pthread_attr_t attr;
int pthread_attr_init (pthread_attr_t *attr);
int pthread_attr_destroy (pthread_attr_t *attr);
int pthread_attr_getdetachstate (pthread_attr_t *attr, int *detachstate);
int pthread_attr_setdetachstate (pthread_attr_t *attr, int detachstate);
#ifdef _POSIX_THREAD_ATTR_STACKSISE
int pthread_attr_getstacksize (pthread_attr_t *attr, size_t *stacksize);
int pthreae_attr_setstacksize (pthread_attr_t *attr, size_t stacksize);
#endif
#ifdef _POSIX_THREAD_ATTR_STACKADDR
int pthread_attr_getstarkaddr (pthread_attr_t *attr, void *stackaddr);
int pthread_attr_set-tackaddr (pthread_attr_t *attr, void **stackaddr);
#endif
POSIX defines the following attributes for thread creation: detachstate, stacksize, stackaddr,
scope, inheritsched, schedpolicy, and schedparam. Some systems won't support all of these
attributes, so you need to check the system documentation before using them. You initialize a
thread attributes object using pthread_attr_init, specifying a pointer to a variable of type
pthread_attr_t, as in the program thread_attr.c, shown later. You use the attributes object you've
created by passing its address as the second argument to pthread_create instead of the NULL value
we've been using so far.
All Pthreads systems support the detachstate attribute. The value of this attribute can be
either PTHREAD_CREATE_JOINABLE or PTHREAD_CREATE_DETACHED. By default,
threads are created joinable, which means that the thread identification created by pthread_create
can be used to join with the thread and retrieve its return value, or to cancel it. If you set the
detachstate attribute to PTHREAD_CREATE_DETACHED, the identification of threads created
using that attributes object can't be used. It also means that when the thread terminates, any
resources it used can immediately be reclaimed by the system.
When you create threads that you know you won't need to cancel, or join with, you should
create them detached. Remember that, in many cases, even if you want to know when a thread
terminates, or receive some return value from it, you may not need to use pthread_join. If you
provide your own notification mechanism, for example, using a condition variable, you can still
create your threads detached.
| Setting the size of a stack is not very portable.
If your system defines the symbol _POSIX_THREAD_ATTR_STACKSIZE, then you can
set the stacksize attribute to specify the minimum size for the stack of a thread created using the
attributes object. Most systems will support this option, but you should use it with caution because
stack size isn't portable. The amount of stack space you'll need depends on the calling standards
and data formats used by each system.
Pthreads defines the symbol PTHREAD_STACK_MIN as the minimum stack size required
for a thread: If you really need to specify a stack size, you might be best off calculating your
requirements in terms of the minimum required by the implementation. Or, you could base your
requirements on the default stacksize attribute selected by the implementation--for example, twice
the default, or half the default. The program thread_attr.c shows how to read the default stacksize
attribute value of an initialized attribute by calling pthread_attr_getstacksize.
| Setting the address of a stack is less portable!
If your system defines the symbol _POSIX_THREAD_ATTR_STACKADDR, then you can
set the stackaddr attribute to specify a region of memory to be used as a stack by any thread
created using this attributes object. The stack must be at least as large as PTHREAD_STACK_
MIN. You may need to specify an area of memory with an address that's aligned to some required
granularity. On a machine where the stack grows downward from higher addresses to lower
addresses, the address you specify should be the highest address in the stack, not the lowest. If the
stack grows up, you need to specify the lowest address.
You also need to be aware of whether the machine increments (or decrements) the stack
before or after writing a new value--this determines whether the address you specify should be
"inside" or "outside" the stack you've allocated. The system can't tell whether you allocated
enough space, or specified the right address, so it has to trust you. If you get it wrong, undesirable
things will occur.
Use the stackaddr attribute only with great caution, and beware that it may well be the least
portable aspect of Pthreads. While a reasonable value for the stacksize attribute will probably
work on a wide range of machines, it is little more than a wild coincidence if any particular value
of the stackaddr attribute works on any two machines. Also, you must remember that you can
create only one thread with any value of the stackaddr attribute. If you create two concurrent
threads with the same stackaddr attribute value, the threads will run on the same stack. (That
would be bad.)
The thread_attr.c program that follows shows some of these attributes in action, with proper
conditionalization to avoid using the stacksize attribute if it is not supported by your system. If
stacksize is supported (and it will be on most UNIX systems), the program will print the default
and minimum stack size, and set stacksize to a value twice the minimum. The code also creates the
thread detached, which means no thread can join with it to determine when it completes. Instead,
main exits by calling pthread_exit, which means that the process will terminate when the last
thread exits.
This example does not include the priority scheduling attributes, which are discussed (and
demonstrated) in Section 5.5.2. It also does not demonstrate use of the stackaddr attribute--as I
said, there is no way to use stackaddr in any remotely portable way and, although I have
mentioned it for completeness, I strongly discourage use of stackaddr in any program.
thread_attr.c
5.3 Cancellation "Now, I give you fair warning,"
shouted the Queen, stamping on the ground as she spoke;
"either you or your head must be off,
and that in about half no time! Take your choice!"
The Duchess took her choice, and was gone in a moment.
--Lewis Carroll, Alice's Adventures in Wonderland
int pthread_cancel (pthread_t thread);
int pthread_setcancelstate (int state, int * oldstate);
int pthread_setcanceltype (int type, int *oldstate);
void pthread_testcancel (void);
void pthread_cleanup_push (void (*routine)(void *), void *arg);
void pthread_cleanup_pop (int execute);
Most of the time each thread runs independently, finishes a specific job, and exits on its own.
But sometimes a thread is created to do something that doesn't necessarily need to be finished. The
user might press a CANCEL button to stop a long search operation. Or the thread might be part of
a redundant algorithm and is no longer useful because some other thread succeeded. What do you
do when you just want a thread to go away? That's what the Pthreads cancellation interfaces are
for.
Cancelling a thread is a lot like telling a human to stop something they're doing. Say that one
of the bailing programmers has become maniacally obsessed with reaching land, and refuses to
stop rowing until reaching safety (Figure 5.1). When the boat finally runs up onto the beach, he's
become so fixated that he fails to realize he's done. The other programmers must roughly shake
him, and forcibly remove the oars from his blistered hands to stop him--but clearly he must be
stopped. That's cancellation. Sort of. I can think of other analogies for cancellation within the
bailing programmer story, but I choose to ignore them. Perhaps you can, too.
Cancellation allows you to tell a thread to shut itself down. You don't need it often, but it can
sometimes be extremely useful. Cancellation isn't an arbitrary external termination. It is more like
a polite (though not necessarily "friendly") request. You're most likely to want to cancel a thread
when you've found that something you set it off to accomplish is no longer necessary. You should
never use cancellation unless you really want the target thread to go away. It is a termination
mechanism, not a communication channel. So, why would you want to do that to a thread that you
presumably created for some reason?
An application might use threads to perform long-running operations, perhaps in the
background, while the user continues working. Such operations might include saving a large
document, preparing to print a document, or sorting a large list. Most such interfaces probably will
need to have some way for the user to cancel an operation, whether it is pressing the ESC key or
Ctrl-C, or clicking a stop sign icon on the screen. The thread receiving the user interface cancel
request would then determine that one or more background operations were in progress, and use
pthread_cancel to cancel the appropriate threads.
FIGURE 5.1 Thread cancellation analogy
Often, threads are deployed to "explore" a data set in parallel for some heuristic solution. For
example, solving an equation for a local minimum or maximum. Once you've gotten one answer
that's good enough, the remaining threads may no longer be needed. If so, you can cancel them to
avoid wasting processor time and get on to other work.
Pthreads allows each thread to control its own termination. It can restore program invariants
and unlock mutexes. It can even defer cancellation while it completes some important operation.
For example, when two write operations must both complete if either completes, a cancellation
between the two is not acceptable.
Pthreads supports three cancellation modes, described in Table 5.1, which are encoded as two
binary values called "cancellation state" and "cancellation type." Each essentially can be on or off.
(While that technically gives four modes, one of them is redundant.) As shown in the table,
cancellation state is said to be enabled or disabled, and cancellation type is said to be deferred or
asynchronous.
Mode State Type Meaning
Off disabled may be either Cancellation remains pending until enabled.
Deferred enabled deferred Cancellation occurs at next cancellation point.
Asynchronous enabled asynchronous Cancellation may be processed at any time.
TABLE 5.1 Cancellation states
By default, cancellation is deferred, and can occur only at specific points in the program that
check whether the thread has been requested to terminate, called cancellation points. Most
functions that can wait for an unbounded time should be deferred cancellation points. Deferred
cancellation points include waiting on a condition variable, reading or writing a file, and other
functions where the thread may be blocked for a substantial period of time. There is also a special
function called pthread_testcancel that is nothing but a deferred cancellation point. It will return
immediately if the thread hasn't been asked to terminate, which allows you to turn any of your
functions into cancellation points.
Some systems provide a function to terminate a thread immediately. Although that sounds
useful, it is difficult to use such a function safely. In fact, it is nearly impossible in a normal
modular programming environment. If a thread is terminated with a mutex locked, for example,
the next thread trying to lock that mutex will be stuck waiting forever.
It might seem that the thread system could automatically release the mutex; but most of the
time that's no help. Threads lock mutexes because they're modifying shared data. No other thread
can know what data has been modified or what the thread was trying to change, which makes it
difficult to fix the data. Now the program is broken. When the mutex is left locked, you can
usually tell that something's broken because one or more threads will hang waiting for the mutex.
The only way to recover from terminating a thread with a locked mutex is for the application
to be able to analyze all shared data and repair it to achieve a consistent and correct state. That is
not impossible, and it is worth substantial effort when an application must be fail-safe. However, it
is generally not practical for anything but an embedded system where the application designers
control every bit of shared state in the process. You would have to rebuild not only your own
program or library state, but also the state affected by any library functions that might be called by
the thread (for example, the ANSI C library).
To cancel a thread, you need the thread's identifier, the pthread_t value returned to the creator
by pthread_create or returned to the thread itself by pthread_self. Cancelling a thread is
asynchronous--that is, when the call to pthread_cancel returns, the thread has not necessarily been
canceled, it may have only been notified that a cancel request is pending against it. If you need to
know when the thread has actually terminated, you must join with it by calling pthread_join after
cancelling it.
If the thread had asynchronous cancelability type set, or when the thread next reaches a
deferred cancellation point, the cancel request will be delivered by the system. When that happens,
the system will set the thread's cancelability type to PTHREAD_CANCEL_DEFERRED and the
cancelability state to PTHREAD_CANCEL_DISABLE. That is, the thread can clean up and
terminate without having to worry about being canceled again.
When a function that is a cancellation point detects a pending cancel request, the function
does not return to the caller. The active cleanup handlers will be called, if there are any, and the
thread will terminate. There is no way to "handle" cancellation and continue execution--the thread
must either defer cancellation entirely or terminate. This is analogous to C++ object destructors,
rather than C++ exceptions--the object is allowed to clean up after itself, but it is not allowed to
avoid destruction.
The following program, called cancel.c, shows how to write a thread that responds
"reasonably quickly" to deferred cancellation, by calling pthread_testcancel within a loop.
11-19 The thread function thread_routine loops indefinitely, until canceled, testing periodically for
a pending cancellation request. It minimizes the overhead of calling pthread_testcancel by doing
so only every 1000 iterations (line 17).
27-35 On a Solaris system, set the thread concurrency level to 2, by calling thr_setconcurrency.
Without the call to thr_setconcurrency, this program will hang on Solaris because thread_routine is
"compute bound" and will not block. The main program would never have another chance to run
once thread routine started, and could not call pthread_cancel.
36-54 The main program creates a thread running thread_routine, sleeps for two seconds, and then
cancels the thread. It joins with the thread, and checks the return value, which should be
PTHREAD_CANCELED to indicate that it was canceled, rather than terminated normally.
cancel.c
A thread can disable cancellation around sections of code that need to complete without
interruption, by calling pthread_setcancelstate. For example, if a database update operation takes
two separate write calls, you wouldn't want to complete the first and have the second canceled. If
you request that a thread be canceled while cancellation is disabled, the thread remembers that it
was canceled but won't do anything about it until after cancellation is enabled again. Because
enabling cancellation isn't a cancellation point, you also need to test for a pending cancel request if
you want a cancel processed immediately.
When a thread may be canceled while it holds private resources, such as a locked mutex or
heap storage that won't ever be freed by any other thread, those resources need to be released
when the thread is canceled. If the thread has a mutex locked, it may also need to "repair" shared
data to restore program invariants. Cleanup handlers provide the mechanism to accomplish the
cleanup, somewhat like process atexit handlers. After acquiring a resource, and before any
cancellation points, declare a cleanup handler by calling pthread_cleanup_push. Before releasing
the resource, but after any cancellation points, remove the cleanup handler by calling
pthread_cleanup_pop.
If you don't have a thread's identifier, you can't cancel the thread. That means that, at least
using portable POSIX functions, you can't write an "idle thread killer" that will arbitrarily
terminate threads in the process. You can only cancel threads that you created, or threads for
which the creator (or the thread itself) gave you an identifier. That generally means that
cancellation is restricted to operating within a subsystem.
5.3.1 Deferred cancelability "Deferred cancelability" means that the thread's cancelability type has been set to PTHREAD
_CANCEL_DEFERRED and the thread's cancelability enable has been set to PTHREAD_
CANCEL_ENABLE. The thread will only respond to cancellation requests when it reaches one of
a set of "cancellation points."
The following functions are always cancellation points on any Pthreads system:
pthread_cond_wait fsync sigwaitinfo
pthread_cond_timedwait mq_receive sigsuspend
pthread_join mq_send sigtimedwait
pthread_testcancel msync sleep
sigwait nanosleep system
aio_suspend open tcdrain
close pause wait
creat read waitpid
fcntl (F_SETLCKW) sem_wait write
The following list of functions may be cancellation points. You should write your code so
that it will function correctly if any of these are cancellation points and also so that it will not
break if any of them are not. If you depend upon any particular behavior, you may limit the
portability of your code. You'll have to look at the conformance documentation to find out which,
if any, are cancellation points for the system you are using:
closedir getc_unlocked printf
ctermid getchar putc
fclose getchar_unlocked putc_unlocked
fcntl (except F_SETLCKW) getcwd putchar
fflush getgrgid putchar_unlocked
fgetc getgrgid_r puts
fgets getrtnam readdir
fopen getgrnam_r remove
fprintf getlogin rename
fputc getlogin_r rewind
fputs getpwnam rewinddir
fread getpwnam_r scanf
freopen getpwuid tmpfile
fscanf getpwuid_r tmpname
fseek gets ttyname
ftell lseek ttyname_r
fwrite opendir ungetc
getc perror
Pthreads specifies that any ANSI C or POSIX function not specified in one of the two lists
cannot be a cancellation point. However, your system probably has many additional cancellation
points. That's because few UNIX systems are "POSIX." That is, they support other programming
interfaces as well--such as BSD 4.3, System V Release 4, UNIX95, and so forth. POSIX doesn't
recognize the existence of functions such as select or poll, and therefore it can't say whether or not
they are cancellation points. Yet clearly both are functions that may block for an arbitrary period
of time, and programmers using them with cancellation would reasonably expect them to behave
as cancellation points. X/Open is currently addressing this problem for UNIX98 (X/Open System
Interfaces, Issue 5), by extending the Pthreads list of cancellation points.
Most cancellation points involve I/O operations that may block the thread for an
"unbounded" time. They're cancelable so that the waits can be interrupted. When a thread reaches
a cancellation point the system determines whether a cancel is pending for the current ("target")
thread. A cancel will be pending if another thread has called pthread_cancel for the target thread
since the last time the target thread returned from a cancellation point. If a cancel is pending, the
system will immediately begin calling cleanup functions, and then the thread will terminate.
If no cancel is currently pending, the function will proceed. If another thread requests that the
thread be canceled while the thread is waiting for something (such as I/O) then the wait will be
interrupted and the thread will begin its cancellation cleanup.
If you need to ensure that cancellation can't occur at a particular cancellation point, or during
some sequence of cancellation points, you can temporarily disable cancellation in that region of
code. The following program, called cancel_disable.c, is a variant of cancel.c. The "target" thread
periodically calls sleep, and does not want the call to be cancelable.
23-32 After each cycle of 755 iterations, thread_routine will call sleep to wait a second. (The value
755 is just an arbitrary number that popped into my head. Do arbitrary numbers ever pop into your
head?) Prior to sleeping, thread_routine disables cancellation by setting the cancelability state to
PTHREAD_CANCEL_DISABLE. After sleep returns, it restores the saved cancelability state by
calling pthread_setcancelstate again.
33-35 Just as in cancel.c, test for a pending cancel every 1000 iterations.
cancel_disable.c
5.3.2 Asynchronous cancelability
Asynchronous cancellation is useful because the "target thread" doesn't need to poll for
cancellation requests by using cancellation points. That can be valuable for a thread that runs a
tight compute-bound loop (for example, searching for a prime number factor) where the overhead
of calling pthread_testcancel might be severe.
| Avoid asynchronous cancellation!
| It is difficult to use correctly and is rarely useful.
The problem is that you're limited in what you can do with asynchronous cancellation
enabled. You can't acquire any resources, for example, including locking a mutex. That's because
the cleanup code would have no way to determine whether the mutex had been locked.
Asynchronous cancellation can occur at any hardware instruction. On some computers it may even
be possible to interrupt some instructions in the middle. That makes it really difficult to determine
what the canceled thread was doing.
For example, when you call malloc the system allocates some heap memory for you, stores a
pointer to that memory somewhere (possibly in a hardware register), and then returns to your code,
which probably moves the return value into some local storage for later use. There are lots of
places that malloc might be interrupted by an asynchronous cancel, with varying effects. It might
be interrupted before the memory was allocated. Or it might be interrupted after allocating storage
but before it stored the address for return. Or it might even return to your code, but get interrupted
before the return value could be copied to a local variable. In any of those cases the variable where
your code expects to find a pointer to the allocated memory will be uninitialized. You can't tell
whether the memory really was allocated yet. You can't free the memory, so that memory (if it was
allocated to you) will remain allocated for the life of the program. That's a memory leak, which is
not a desirable feature.
Or when you call pthread_mutex_lock, the system might be interrupted within a function call
either before or after locking the mutex. Again, there's no way for your program to find out,
because the interrupt may have occurred between any two instructions, even within the
pthread_mutex_lock function, which might leave the mutex unusable. If the mutex is locked, the
application will likely end up hanging because it will never be unlocked.
| Call no code with asynchronous cancellation enabled unless you
| wrote it to be async-cancel safe--and even then, think twice!
You are not allowed to call any function that acquires resources while asynchronous
cancellation is enabled. In fact, you should never call any function while asynchronous
cancellation is enabled unless the function is documented as "async-cancel safe." The only
functions required to be async safe by Pthreads are pthread_cancel, pthread_setcancelstate, and
pthread_setcanceltype. (And there is no reason to call pthread_cancel with asynchronous
cancelability enabled.) No other POSIX or ANSI C functions need be async-cancel safe, and you
should never call them with asynchronous cancelability enabled.
Pthreads suggests that all library functions should document whether or not they are
async-cancel safe. However if the description of a function does not specifically say it is
async-cancel safe you should always assume that it is not. The consequences of asynchronous
cancellation in a function that is not async-cancel safe can be severe. And worse, the effects are
sensitive to timing--so a function that appears to be async-cancel safe during experimentation may
in fact cause all sorts of problems later when it ends up being canceled in a slightly different place.
The following program, cancel_async.c, shows the use of asynchronous cancellation in a
compute-bound loop. Use of asynchronous cancellation makes this loop "more responsive" than
the deferred cancellation loop in cancel.c. However, the program would become unreliable if any
function calls were made within the loop, whereas the deferred cancellation version would
continue to function correctly. In most cases, synchronous cancellation is preferable.
24-28 To keep the thread running awhile with something more interesting than an empty loop,
cancel_async.c uses a simple matrix multiply nested loop. The matrixa and matrixb arrays are
initialized with, respectively, their major or minor array index.
34-36 The cancellation type is changed to PTHREAD_CANCEL_ASYNCHRONOUS, allowing
asynchronous cancellation within the matrix multiply loops.
39-44 The thread repeats the matrix multiply until canceled, on each iteration replacing the first
source array (matrixa) with the result of the previous multiplication (matrixc).
66-74 Once again, on a Solaris system, set the thread concurrency level to 2, allowing the main
thread and thread_routine to run concurrently on a uniprocessor. The program will hang without
this step, since user mode threads are not timesliced on Solaris.
cancel_async.c
| Warning: do not let "DCE threads'" habits carry over to Pthreads!
I’ll end this section with a warning. DCE threads, a critical component of the Open Software
Foundation's Distributed Computing Environment, was designed to be independent of the
underlying UNIX kernel. Systems with no thread support at all often emulated "thread
synchronous" I/O in user mode, using nonblocking I/O mode, so that a thread attempting I/O on a
busy file was blocked on a condition variable until a later select or poll showed that the I/O could
complete. DCE listener threads might block indefinitely on a socket read, and it was important to
be able to cancel that read.
When DCE was ported to newer kernels that had thread support, but not Pthreads support, the
user mode I/O wrappers were usually omitted, resulting in a thread blocked within a kernel that
did not support deferred cancellation. Users discovered that, in many cases, these systems
implemented asynchronous cancellation in such a way that, quite by coincidence, a kernel wait
might be canceled "safely" ff the thread switched to asynchronous cancellation immediately
before the kernel call, and switched back to deferred cancellation immediately after. This
observation was publicized in DCE documentation, but it is a very dangerous hack, even on
systems where it seems to work. You should never try this on any Pthreads system! If your system
conforms to POSIX 1003.lc-1995 (or POSIX 1003.1, 1996 edition, or later), it supports deferred
cancellation of, at minimum, kernel functions such as read and write. You do not need
asynchronous cancellation, and using it can be extremely dangerous.
5.3.3 Cleaning up | When you write any library code, design it to handle deferred
| cancellation gracefully. Disable cancellation where it is not
| appropriate, and always use cleanup handlers at cancellation points.
If a section of code needs to restore some state when it is canceled, it must use cleanup
handlers. When a thread is canceled while waiting for a condition variable, it will wake up with
the mutex locked. Before the thread terminates it usually needs to restore invariants, and it always
needs to release the mutex.
Each thread may be considered to have a stack of active cleanup handlers. Cleanup handlers
are added to the stack by calling pthread_cleanup_push, and the most recently added cleanup
handler is removed by calling pthread_cleanup_pop. When the thread is canceled or when it exits
by calling pthread_exit, Pthreads calls each active cleanup handler in turn, beginning with the
most recently added cleanup handler. When all active cleanup handlers have returned, the thread is
terminated.
Pthreads cleanup handlers are designed so that you can often use the cleanup handler even
when the thread wasn't canceled. It is often useful to run the same cleanup function regardless of
whether your code is canceled or completes normally. When pthread_cleanup_pop is called with a
nonzero value, the cleanup handler is executed even if the thread was not canceled.
You cannot push a cleanup handler in one function and pop it in another function. The
pthread_cleanup_push and pthread_cleanup_pop operations may be defined as macros, such that
pthread_cleanup_push contains the opening brace "{" of a block, while pthread_cleanup_pop
contains the matching closing brace "}" of the block. You must always keep this restriction in
mind while using cleanup handlers, if you wish your code to be portable.
The following progranl, cancel_cleanup.c, shows the use of a cleanup handler to release a
mutex when a condition variable wait is canceled.
10-17 The control structure (control) is used by all threads to maintain shared synchronization
objects and invariants. Each thread increases the member counter by one when it starts, and
decreases it at termination. The member busy is used as a dummy condition wait predicate--it is
initialized to 1, and never cleared, which means that the condition wait loops will never terminate
(in this example) until the threads are canceled.
23-34 The function cleanup_handler is installed as the cancellation cleanup handler for each thread.
It is called on normal termination as well as through cancellation, to decrease the count of active
threads and unlock the mutex.
47 The function thread_routine establishes cleanup_handler as the active cancellation cleanup
handler.
54-58 Wait until the control structure's busy member is set to 0, which, in this example, will never
occur. The condition wait loop will exit only when the wait is canceled.
60 Although the condition wait loop in this example will not exit, the function cleans up by
removing the active cleanup handler. The nonzero argument to pthread_cleanup_pop, remember,
means that the cleanup handler will be called even though cancellation did not occur.
In some cases, you may omit "unreachable statements" like this pthread_cleanup_pop call.
However, in this case, your code might not compile without it. The pthread_cleanup_push and
pthread_cleanup_pop macros are special, and may expand to form, respectively, the beginning
and ending of a block. Digital UNIX does this, for example, to implement cancellation on top of
the common structured exception handling provided by the operating system.
cancel_cleanup.c
If one of your threads creates a set of threads to "subcontract" some function, say, a parallel
arithmetic operation, and the "contractor" is canceled while the function is in progress, you
probably won't want to leave the subcontractor threads running. Instead, you could "pass on" the
cancellation to each subcontrator thread, letting them handle their own termination independently.
If you had originally intended to join with the subcontractors, remember that they will
continue to consume some resources until they have been joined or detached. When the contractor
thread cancels them, you should not delay cancellation by joining with the subcontractors. Instead,
you can cancel each thread and immediately detach it using pthread_detach. The subcontractor
resources can then be recycled immediately as they finish, while the contractor can wrap things up
independently.
The following program, cancel_subcontract.c, shows one way to propagate cancellation to
subcontractors.
9-12 The team_t structure defines the state of the team of subcontractor threads. The join_i
member records the index of the last subcontractor with which the contractor had joined, so on
cancellation from within pthread_join, it can cancel the threads it had not yet joined. The workers
member is an array recording the thread identifiers of the subcontractor threads.
18-25 The subcontractor threads are started running the worker_routine function. This function
loops until canceled, calling pthread_testcancel every 1000 iterations.
31-46 The cleanup function is established as the active cleanup handler within the contractor thread.
When the contractor is canceled, cleanup iterates through the remaining (unjoined) subcontractors,
cancelling and detaching each. Note that it does not join the subcontractors--in general, it is not a
good idea to wait in a cleanup handler. The thread, after all, is expected to clean up and terminate,
not to wait around for something to happen. But if your cleanup handler really needs to wait for
something, don't be afraid, it will work just fine.
53-76 The contractor thread is started running thread_routine. This function creates a set of
subcontractors, then joins with each subcontractor. As it joins each thread, it records the current
index within the workers array in the team_t member join_i. The cleanup handler is established
with a pointer to the team structure so that it can determine the last offset and begin cancelling the
remaining subcontractors.
78-104 The main program creates the contractor thread, running thread_routine, and then sleeps for
five seconds. When it wakes up, it cancels the contractor thread, and waits for it to terminate.
cancel_subcontract.c
5.4 Thread-specific data No, I've made up my mind about it: if I'm Mabel, I'll stay down here. It'll be
no use their putting their heads down and saying "Come up again,
dear!" I shall only look up and say "Who am I, then? Tell me that first, and
then, if I like being that person, I'll come up: if not, I'll stay down here till
I'm somebody else."
--Lewis Carroll, Alice's Adventures in Wonderland
When a function in a single threaded program needs to create private data that persists across
calls to that function, the data can be allocated statically in memory. The name's scope can be
limited to the function or file that uses it (static) or it can be made global (extern).
It is not quite that simple when you use threads. All threads within a process share the same
address space, which means that any variable declared as static or extern, or in the process heap,
may be read and written by all threads within the process. That has several important implications
for code that wants to store "persistent" data between a series of function calls within a thread:
� The value in a static or extern variable, or in the heap, will be the value last written by
any thread. In some cases this may be what you want, for example, to maintain the seed
of a pseudorandom number sequence. In other cases, it may not be what you want.
� The only storage a thread has that's truly "private" are processor registers. Even stack
addresses can be shared, although only if the "owner" deliberately exposes an address to
another thread. In any event, neither registers nor "private" stack can replace uses of
persistent static storage in nonthreaded code.
So when you need a private variable, you must first decide whether all threads share the same
value, or whether each thread should have its own value. If they share, then you can use static or
extern data, just as you could in a single threaded program; however, you must synchronize access
to the shared data across multiple threads, usually by adding one or more mutexes.
If each thread needs its own value for a private variable, then you must store all the values
somewhere, and each thread must be able to locate the proper value. In some cases you might be
able to use static data, for example, a table where you can search for a value unique to each thread,
such as the thread's pthread_t. In many interesting cases you cannot predict how many threads
might call the function--imagine you were implementing a thread-safe library that could be called
by arbitrary code, in any number of threads.
The most general solution is to malloc some heap in each thread and store the values there,
but your code will need to be able to find the proper private data in any thread. You could create a
linked list of all the private values, storing the creating thread's identifier (pthread_t) so it could be
found again, but that will be slow if there are many threads. You need to search the list to find the
proper value, and it would be difficult to recover the storage that was allocated by terminated
threads--your function cannot know when a thread terminates.
| New interfaces should not rely on implicit persistent storage!
When you are designing new interfaces, there's a better solution. You should require the
caller to allocate the necessary persistent state, and tell you where it is. There are many advantages
to this model, including, most importantly:
� In many cases, you can avoid internal synchronization using this model, and, in rare
cases where the caller wishes to share the persistent state between threads, the caller
can supply the needed synchronization.
� The caller can instead choose to allocate more than one state buffer for use within a
single thread. The result is several independent sequences of calls to your function
within the same thread, with no conflict.
The problem is that you often need to support implicit persistent states. You may be making an
existing interface thread-safe, and cannot add an argument to the functions, or require that the
caller maintain a new data structure for your benefit. That's where thread-specific data comes in.
Thread-specific data allows each thread to have a separate copy of a variable, as if each
thread has an array of thread-specific data values, which is indexed by a common "key" value.
Imagine that the bailing programmers are wearing their corporate ID badges, clipped to their shirt
pockets (Figure 5.2). While the information is different for each programmer, you can find the
information easily without already knowing which programmer you're examining.
The program creates a key (sort of like posting a corporate regulation that employee
identification badges always be displayed clipped to the left breast pocket of the employee's shirt
or jacket) and each thread can then independently set or get its own value for that key (although
the badge is always on the left pocket, each employee has a unique badge number, and, in most
cases, a unique name). The key is the same for all threads, but each thread can associate its own
independent value with that shared key. Each thread can change its private value for a key at any
time, without affecting the key or any value other threads may have for the key.
FIGURE 5.2 Thread-specific data analogy
5.4.1 Creating thread-specific data
pthread_key_t key;
int pthread_key_create (pthread_key_t *key, void (*destructor)(void *));
int pthread_key_delete (pthread_key_t key);
A thread-specific data key is represented in your program by a variable of type pthread_key_t.
Like most Pthreads types, pthread_key_t is opaque and you should never make any assumptions
about the structure or content. The easiest way to create a thread-specific data key is to call
pthread_key_create before any threads try to use the key, for example early in the program's main
function.
If you need to create a thread-specific data key later, you have to ensure that
pthread_key_create is called only once for each pthread_key_t variable. That's because if you
create a key twice, you are really creating two different keys. The second key will overwrite the
first, which will be lost forever along with the values any threads might have set for the first key.
When you can't add code to main, the easiest way to ensure that a threadspecific data key is
created only once is to use pthread_once, the one-time initialization function, as shown in the
following program, tsd_once.c.
7-10 The tsd_t structure is used to contain per-thread data. Each thread allocates a private tsd_t
structure, and stores a pointer to that structure as its value for the thread-specific data key tsd_key.
The thread_id member holds the thread's identifier (pthread_t), and the string member holds the
pointer to a “name” string for the thread. The variable tsd_key holds the thread-specific data key
used to access the tsd_t structures.
19-27 One-time initialization (pthread_once) is used to ensure that the key tsd_key is created before
the first access.
33-56 The threads begin in the thread start function thread_routine. The argument (arg) is a pointer
to a character string naming the thread. Each thread calls pthread_once to ensure that the
thread-specific data key has been created. The thread then allocates a tsd_t structure, initializes the
thread id member with the thread's identifier, and copies its argument to the string member.
The thread gets the current thread-specific data value by calling pthread_getspecific, and
prints a message using the thread's name. It then sleeps for a few seconds and prints another
message to demonstrate that the thread-specific data value remains the same, even though another
thread has assigned a different tsd_t structure address to the same thread-specific data key.
tsd_once.c
Pthreads allows you to destroy a thread-specific data key when your program no longer needs
it, by calling pthread_key_delete. The Pthreads standard guarantees only 128 thread-specific data
keys at any one time, so it may be useful to destroy a key that you know you aren't using and
won't need again. The actual number of keys supported by your Pthreads system is specified by
the value of the symbol PTHREAD_KEYS MAX defined in <limits. h>.
When you destroy a thread-specific data key, it does not affect the current value of that key in
any thread, not even in the calling thread. That means your code is completely responsible for
freeing any memory that you have associated with the thread-specific data key, in all threads. Of
course, any use of the deleted thread-specific data key (pthread_key_t) results in undefined
behavior.
| Delete thread-specific data keys only when you
| are sure no thread has a value for that key!
|
| Or... don't destroy them at all.
You should never destroy a key while some thread still has a value for that key. Some later
call to pthread_key_create, for example, might reuse the pthread_key_t identifier that had been
assigned to a deleted key. When an existing thread that had set a value for the old key requests the
value of the new key, it will receive the old value. The program will likely react badly to receiving
this incorrect data, so you should never delete a thread-specific data key until you are sure that no
existing threads have a value for that key, for example, by maintaining a "reference count" for the
key, as shown in the program tsd_destructor.c. in Section 5.4.3.
Even better, don't destroy thread-specific data keys. There's rarely any need to do so, and if
you try you will almost certainly run into difficulties. Few programs will require even the
minimum Pthreads limit of 128 thread-specific data keys. Rarely will you use more than a few. In
general, each component that uses thread-specific data will have a small number of keys each
maintaining pointers to data structures that contain related data. It would take a lot of components
to exhaust the available keys!
5.4.2 Using thread-specific data
int pthread_setspecific (pthread_key_t key, const void *value);
void *pthread_getspecific (pthread_key_t key);
You can use the pthread_getspecific function to determine the thread's current value for a key,
or pthread_setspecific to change the current value. Take a look at Section 7.3.1 for ideas on using
thread-specific data to adapt old libraries that rely on static data to be thread-safe.
| A thread-specific data value of NULL means something special to
| Pthreads--do not set a thread-specific data value of NULL unless you
| really mean it,
The initial value for any new key (in all threads) is NULL. Also, Pthreads sets the
thread-specific data value for a key to NULL before calling that key's destructor (passing the
previous value of the key) when a thread terminates.* If your thread-specific data value is the
address of heap storage, for example, and you want to free that storage in your destructor, you
must use the argument passed to the destructor rather than calling pthread_getspecific.
*That is, unfortunately, not what the standard says. This is one of the problems with formal
standards--they say what they say, not what they were intended to say. Somehow, an error crept in,
and the sentence specifying that "the implementation clears the thread-specific data value before
calling the destructor" was deleted. Nobody noticed, and the standard was approved with the error.
So the standard says (by omission) that if you want to write a portable application using
thread-specific data, that will not hang on thread termination, you must call pthread_setspecific
within your destructor function to change the value to NULL. This would be silly, and any serious
implementation of Pthreads will violate the standard in this respect. Of course. the standard will be
fixed, probably by the 1003.1n amendment (assorted corrections to 1003. lc-1995), but that will
take a while.
Pthreads will not call the destructor for a thread-specific data key if the terminating thread
has a value of NULL for that key. NULL is special, meaning "this key has no value." If you ever
use pthread_setspecific to set the value of a threadspecific data key to NULL, you need to
remember that you are not setting the value NULL, but rather stating that the key no longer has a
value in the current thread.
| Destructor functions are called only when the thread terminates,
| not when the value of a thread-specific data key is changed.
Another important thing to remember is that thread-specific data key destructor functions are
not called when you replace an existing value for that key. That is, if you allocate a structure in
heap and assign a pointer to that structure as the value of a thread-specific data key, and then later
allocate a new structure and assign a pointer to that new structure to the same thread-specific data
key, in the same thread, you are responsible for freeing the old structure. Pthreads will not free the
old structure, nor will it call your destructor function with a pointer to the old structure.
5.4.3 Using destructor functions When a thread exits while it has a value defined for some thread-specific data key, you
usually need to do something about it. If your key's value is a pointer to heap memory, you will
need to free the memory to avoid a memory leak each time a thread terminates. Pthreads allows
you to define a destructor function when you create a thread-specific data key. When a thread
terminates with a non-NULL value for a thread-specific data key, the key's destructor (if any) is
called with the current value of the key.
| Thread-specific data destructors are called in "unspecified order."
Pthreads checks all thread-specific data keys in the process when a thread exits, and for each
thread-specific data key with a value that's not NULL, it sets the value to NULL and then calls the
key's destructor function. Going back to our analogy, someone might collect the identity badges of
all programmers by removing whatever is hanging from each programmer's left shirt pocket, safe
in the knowledge that it will always be the programmer's badge. Be careful, because the order in
which destructors are called is undefined. Try to make each destructor as independent as possible.
Thread-specific data destructors can set a new value for the key for which a value is being
destroyed or for any other key. You should never do this directly but it can easily happen indirectly
if you call other functions from your destructor. For example, the ANSI C library's destructors
might be called before yours--and calling an ANSI C function, for example, using fprintf to write a
log message to a file, could cause a new value to be assigned to a thread-specific data key. The
system must recheck the list of thread-specific data values for you after all destructors have been
called.
| If your thread-specific data destructor creates a new thread-specific
| data value, you will get another chance. Maybe too many chances!
The standard requires that a Pthreads implementation may recheck the list some fixed
number of times and then give up. When it gives up, the final threadspecific data value is not
destroyed. If the value is a pointer to heap memory the result may be a memory leak, so be careful.
The <limits. h> header defines _PTHREAD_DESTRUCTOR_ITERATIONS to the number of
times the system will check and the value must be at least 4. Alternately, the system is allowed to
keep checking forever, so a destructor function that always sets thread-specific data values may
cause an infinite loop.
Usually, new thread-specific data values are set within a destructor only when subsystem 1
uses thread-specific data that depends on another independent subsystem 2 that also uses
thread-specific data. Because the order in which destructor functions run is unspecified, the two
may be called in the wrong order. If the subsystem 1 destructor needs to call into subsystem 2, it
may inadvertently result in allocating new thread-specific data for subsystem 2. Although the
subsystem 2 destructor will need to be called again to free the new data, the subsystem 1
thread-specific data remains NULL, so the loop will terminate.
The following program, tsd_destructor.c, demonstrates using threadspecific data destructors
to release memory when a thread terminates. It also keeps track of how many threads are using the
thread-specific data, and deletes the thread-specific data key when the destructor is run for the
final thread. This program is similar in structure to tsd_once.c, from Section 5.3, so only the
relevant differences will be annotated here.
12-14 In addition to the key value (identity_key}, the program maintains a count of threads that are
using the key (identity_key_counter), which is protected by a mutex (identity_key_mutex).
22-42 The function identity_key_destructor is the thread-specific data key's destructor function. It
begins by printing a message so we can observe when it runs in each thread. It frees the storage
used to maintain thread-specific data, the private_t structure. Then it locks the mutex associated
with the threadspecific data key (identity_key_mutex) and decreases the count of threads using the
key. If the count reaches 0, it deletes the key and prints a message.
48-63 The function identity_key_get can be used anywhere (in this example, it is used only once
per thread) to get the value of identity_key for the calling thread. If there is no current value (the
value is NULL), then it allocates a new private_t structure and assigns it to the key for future
reference.
68-78 The function thread_routine is the thread start function used by the exampie. It acquires a
value for the key by calling identity_key_get, and sets the members of the structure. The string
member is set to the thread's argument, creating a global "name" for the thread, which can be used
for printing messages.
80-114 The main program creates the thread-specific data key tsd_key. Notice that, unlike tsd_once.c,
this program does not bother to use pthread_once. As I mentioned in the annotation for that
example, in a main program it is perfectly safe, and more efficient, to create the key inside main,
before creating any threads.
101 The main program initializes the reference counter (identity_key_counter) to 3. It is critical
that you define in advance how many threads will reference a key that will be deleted based on a
reference count, as we intend to do. The counter must be set before any thread using the key can
possibly terminate.
You cannot, for example, code identity_key_get so that it dynamically increases the counter
when it first assigns a thread-specific value for identity_key. That is because one thread might
assign a thread-specific value for identity_key and then terminate before another thread using the
key had a chance to start. If that happened, the first thread's destructor would find no remaining
references to the key, and it would delete the key. Later threads would then fail when trying to set
thread-specific data values.
tsd_destructor.c
5.5 Realtime scheduling "Well, it's no use your talking about waking him," said Tweedledum,
"when you're only one of the things in his dream. You know
very well you're not real."
"I am real!" said Alice, and began to cry.
"You wo'n't make yourself a bit realler by crying," Tweedledee remarked:
"there's nothing to cry about."
--Lewis Carroll, Through the Looking-Glass
Once upon a time, realtime programming was considered an arcane and rare art. Realtime
programmers were doing unusual things, outside of the programming mainstream, like controlling
nuclear reactors or airplane navigational systems. But the POSIX. lb realtime extension defines
realtime as "the ability of the operating system to provide a required level of service in a bounded
response time." What applies to the operating system also applies to your application or library.
"Bounded" response time does not necessarily mean "fast" response, but it does mean
"predictable" response. There must be some way to define a span of time during which a sequence
of operations is guaranteed to complete. A system controlling a nuclear reactor has more strict
response requirements than most programs you will write, and certainly the consequences of
failing to meet the reactor's response requirements are more severe. But a lot of code you write
will need to provide some "required level of service" within some "bounded response time."
Realtime programming just means that the software lives in the real world.
Realtime programming covers such a vast area that it is common to divide it into two
separate categories. "Hard realtime" is the traditional sort most people think of. When your
nuclear reactor will go critical if a fuel rod adjustment is delayed by a microsecond or your
airplane will crash if the navigation system takes a haft second to respond to a wind sheer, that's
hard realtime. Hard realtime is unforgiving, because the required level of service and bounded
response time are defined by physics or something equally unyielding. "Soft realtime" means that
you need to meet your schedule most of the time, but the consequences of failing to meet the
schedule are not severe.
Many systems that interact with humans should be designed according to soft realtime
principles. Although humans react slowly, in computer terms, they're sensitive to response time.
Make your users wait too often while the screen redraws before accepting the next mouse click,
and they'll be annoyed. Nobody likes a "busy cursor"--most people expect response to be at least
predictable, even when it cannot be fast.
Threads are useful for all types of realtime programming, because coding for predictable
response is far easier when you can keep the operations separate. Your "user input function"
doesn't have to wait for your sort operation or for your screen update operation because it executes
independently.
Achieving predictability requires a lot more than just separating operations into different
threads, however. For one thing, you need to make sure that the thread you need to run "soon"
won't be left sitting on a run queue somewhere while another thread uses the processor. Most
systems, by default, will try to distribute resources more or less fairly between threads. That's nice
for a lot of things--but realtime isn't fair. Realtime means carefully giving precedence to the parts
of the program that limit external response time.
5.5.1 POSIX realtime options The POSIX standards are flexible, because they're designed to be useful in a wide range of
environments. In particular, since traditional UNIX systems don't support any form of realtime
scheduling control, all of the tools for controlling realtime response are optional. The fact that a
given implementation of UNIX "conforms to 1003.lc-1995" does not mean you can write
predictable realtime programs.
If the system defines _POSIX_THREAD_PRIORITY_SCHEDULING, it provides support
for assigning realtime scheduling priorities to threads. The POSIX priority scheduling model is a
little more complicated than the traditional UNIX priority model, but the principle is similar.
Priority scheduling allows the programmer to give the system an idea of how important any two
threads are, relative to each other. Whenever more than one thread is ready to execute, the system
will choose the thread with the highest priority.
5.5.2 Scheduling policies and priorities int sched_get_priority_max (int policy);
int sched_get_priority_min (int policy);
int pthread_attr_getinheritsched( const pthread_attr_t *attr, int *inheritsched);
int pthread_attr_setinheritsched( pthread_attr_t *attr, int inheritsched);
int pthread_attr_getschedparam (const pthread_attr_t *attr, struct sched_param *param);
int pthread_attr_setschedparam ( pthread_attr_t *attr, const struct sched_param *param);
int pthread_attr_getschedpolicy (const pthread_attr_t *attr, int *policy);
int pthread_attr_setschedpolicy (pthread_attr_t *attr, int policy);
int pthread_getschedparam (pthread_t thread, int *policy, struct sched_param *param);
int pthread_setschedparam (pthread_t thread; int policy; const struct sched_param *param);
A Pthreads system that supports _POSIX_THREAD_PRIORITY_SCHEDULING must
provide a definition of the struct sched_param structure that includes at least the member
sched_priority. The sched_priority member is the only scheduling parameter used by the standard
Pthreads scheduling policies, SCHED_FIFO and SCHED_RR. The minimum and maximum
priority values (sched_priority member) that are allowed for each scheduling policy can be
determined by calling sched_get_priority_min or sched_get_priority_max, respectively, for the
scheduling policy. Pthreads systems that support additional, nonstandard scheduling policies may
include additional members.
The SCHED_FIFO (first in, first out) policy allows a thread to run until another thread with a
higher priority becomes ready, or until it blocks voluntarily. When a thread with SCHED_FIFO
scheduling policy becomes ready, it begins executing immediately unless a thread with equal or
higher priority is already executing.
The SCHED_RR (round-robin) policy is much the same, except that if a thread with
SCHED_RR policy executes for more than a fixed period of time (the timeslice interval) without
blocking, and another thread with SCHED_RR or SCHED_FIFO policy and the same priority is
ready, the running thread will be preempted so the ready thread can be executed.
When threads with SCHED_FIFO or SCHED_RR policy wait on a condition variable or wait
to lock a mutex, They will be awakened in priority order. That is, if a low-priority SCHED_FIFO
thread and a high-priority SCHED_FIFO thread are both waiting to lock the same mutex, the
high-priority thread will always be unblocked first when the mutex is unlocked.
Pthreads defines the name of an additional scheduling policy, called SCHED_OTHER.
Pthreads, however, says nothing at all regarding what this scheduling policy does. This is an
illustration of an unofficial POSIX philosophy that has been termed "a standard way to be
nonstandard" (or, alternately, "a portable way to be non-portable"). That is, when you use any
implementation of Pthreads that supports the priority scheduling option, you can write a portable
program that creates threads running in SCHED_OTHER policy, but the behavior of that program
is non-portable. (The official explanation of SCHED_OTHER is that it provides a portable way
for a program to declare that it does not need a realtime scheduling policy.)
The SCHED_OTHER policy may be an alias for SCHED_FIFO, or it may be SCHED_RR,
or it may be something entirely different. The real problem with this ambiguity is not that you
don't know what SCHED_OTHER does, but that you have no way of knowing what scheduling
parameters it might require. Because the meaning of SCHED_OTHER is undefined, it does not
necessarily use the sched_priority member of the struct sched_param structure, and it may require
additional, nonstandard members that an implementation may add to the structure. If there's any
point to this, it is simply that SCHED_OTHER is not portable. If you write any code that uses
SCHED_OTHER you should be aware that the code is not portable--you are, by definition,
depending on the SCHED_OTHER of the particular Pthreads implementation for which you wrote
the code.
The schedpolicy and schedparam attributes, set respectively by pthread_attr_setschedpolicy
and pthread_attr_setschedparam, specify the explicit scheduling policy and parameters for the
attributes object. Pthreads does not specify a default value for either of these attributes, which
means that each implementation may choose some "appropriate" value. A realtime operating
system intended for embedded controller applications, for example, might choose to create threads
by default with SCHED_FTFO policy, and, perhaps, some medium-range priority.
Most multi-user operating systems are more likely to use a nonstandard "timeshare"
scheduling policy by default, causing threads to be scheduled more or less as processes have
always been scheduled. The system may, for example, temporarily reduce the priority of "CPU
hogs" so that they cannot prevent other threads from making progress.
One example of a multi-user operating system is Digital UNIX, which supports two
nonstandard timeshare scheduling policies. The foreground policy (SCHED_FG_NP), which is the
default, is used for normal interactive activity, and corresponds to the way non-threaded processes
are scheduled. The background policy (SCHED_BG_NP) can be used for less important support
activities.
| When you set the scheduling policy or priority attributes in an
| attributes object, you must also set the inheritsched attribute!
The inheritsched attribute, which you can set by calling pthread_attr_setinheritsched,
controls whether a thread you create inherits scheduling information from the creating thread, or
uses the explicit scheduling information in the schedpolicy and schedparam attributes. Pthreads
does not specify a default value for inheritsched, either, so if you care about the policy and
scheduling parameters of your thread, you must always set this attribute.
Set the inheritsched attribute to PTHREAD_INHERIT_SCHED to cause a new thread to
inherit the scheduling policy and parameters of the creating thread. Scheduling inheritance is
useful when you're creating "helper" threads that are working on behalf of the creator--it generally
makes sense for them to run at the same policy and priority. Whenever you need to control the
scheduling policy or parameters of a thread you create, you must set the inheritsched attribute to
PTHREAD_EXPLICIT_SCHED.
58-118 The following program, sched_attr.c, shows how to use an attributes object to create a thread
with an explicit scheduling policy and priority. Notice that it uses conditional code to determine
whether the priority scheduling feature of Pthreads is supported at compilation time. It will print a
message if the option is not supported and continue, although the program in that case will not do
much. (It creates a thread with default scheduling behavior, which can only say that it ran.)
Although Solaris 2.5 defines POSIX_THREAD_PRIORITY_SCHEDULING, it does not
support the POSIX realtime scheduling policies, and attempting to set the policy attribute to
SCHED_RR would fail. This program treats Solaris as if it did not define the POSIX_THREAD_
PRIORITY_SCHEDULING option.
sched_attr.c
The next program, sched_thread.c, shows how to modify the realtime scheduling policy and
parameters7or a running thread. When changing the scheduling policy and parameters in a thread
attributes object, remember, you use two separate operations: one to modify the scheduling policy
and the other to modify the scheduling parameters.
You cannot modify the scheduling policy of a running thread separately from the thread's
parameters, because the policy and parameters must always be consistent for scheduling to operate
correctly. Each scheduling policy may have a unique range of valid scheduling priorities, and a
thread cannot operate at a priority that isn't valid for its current policy. To ensure consistency of
the policy and parameters, they are set with a single call.
55 Unlike sched_attr.c, sched_thread.c does not check the compile-time feature macro _POSIX_
THREAD_PRIORITY_SCHEDULING. That means it will probably not compile, and almost
certainly won't run correctly, on a system that does not support the option. There's nothing wrong
with writing a program that way--in fact, that's what you are likely to do most of the time. If you
need priority scheduling, you would document that your application requires the _POSIX_
THREAD_PRIORITY SCHEDULING option, and use it.
57-62 �Solari 2.5, despite defining _POSIX_THREAD_PRIORITY_SCHEDULING, does not
support realtime scheduling policies. For this reason, the ENOSYS from sched_get_priority_min
is handled as a special case.
sched_thread.c
5.5.3 Contention scope and allocation domain
int pthread_attr_getscope (const pthread_attr_t * attr, int *contentionscope);
int pthread_attr_setscope (pthread_attr_t *attr, int contentionscope);
Besides scheduling policy and parameters, two other controls are important in realtime
scheduling. Unless you are writing a realtime application, they probably don't matter. If you are
writing a realtime application, you will need to find out which settings of these controls are
supported by a system.
The first control is called contention scope. It is a description of how your threads compete
for processor resources. System contention scope means that your thread competes for processor
resources against threads outside your process. A high-priority system contention scope thread in
your process can keep system contention scope threads in other processes from running (or vice
versa). Process contention scope means that your threads compete only among themselves.
Usually, process contention scope means that the operating system chooses a process to execute,
possibly using only the traditional UNIX priority, and some additional scheduler within the
process applies the POSIX scheduling rules to determine which thread to execute.
Pthreads provides the thread scope attribute so that you can specify whether each thread you
create should have process or system contention scope. A Pthreads system may choose to support
PTHREAD_SCOPE_PROCESS, PTHREAD_SCOPE_SYSTEM, or both. If you try to create a
thread with a scope that is not supported by the system, pthread_attr_setscope will return
ENOTSUP.
The second control is allocation domain. An allocation domain is the set of processors within
the system for which threads may compete. A system may have one or more allocation domains,
each containing one or more processors. In a uniprocessor system, an allocation domain will
contain only one processor, but you may still have more than one allocation domain. On a
multiprocessor, each allocation domain may contain from one processor to the number of
processors in the system.
There is no Pthreads interface to set a thread's allocation domain. The POSIX.14
(Multiprocessor Profile) working group considered proposing standard interfaces, but the effort
was halted by the prospect of dealing with the wide range of hardware architectures and existing
software interfaces. Despite the lack of a standard, any system supporting multiprocessors will
have interfaces to affect the allocation domain of a thread.
Because there is no standard interface to control allocation domain, there is no way to
describe precisely all the effects of any particular hypothetical situation. Still, you may need to be
concerned about these things if you use a system that supports multiprocessors. A few things to
think about:
1. How do system contention scope threads and process contention scope threads, within
the same allocation domain, interact with each other? They are competing for resources
in some manner, but the behavior is not defined by the standard.
2. If the system supports "overlapping" allocation domains, in other words, if a processor
can appear in more than one allocation domain within the system, and you have one
system contention scope thread in each of two overlapping allocation domains, what
happens?
| System contention scope is predictable.
| Process contention scope is cheap.
On most systems, you will get better performance, and lower cost, by using only process
contention scope. Context switches between system contention scope threads usually require at
least one call into the kernel, and those calls are relatively expensive compared to the cost of
saving and restoring thread state in user mode. Each system contention scope thread will be
permanently associated with one "kernel entity," and the number of kernel entities is usually more
limited than the number of Pthreads threads. Process contention scope threads may share one
kernel entity, or some small number of kernel entities. On a given system configuration, for
example, you may be able to create thousands of process contention scope threads, but only
hundreds of system contention scope threads.
On the other hand, process contention scope gives you no real control over the scheduling
priority of your thread--while a high priority may give it precedence over other threads in the
process, it has no advantage over threads in other processes with lower priority. System contention
scope gives you better predictability by allowing control, often to the extent of being able to make
your thread "more important" than threads running within the operating system kernel.
| System contention scope is less predictable with an allocation domain
| greater than one.
When a thread is assigned to an allocation domain with more than a single processor, the
application can no longer rely on completely predictable scheduling behavior. Both high- and
low-priority threads may run at the same time, for example, because the scheduler will not allow
processors to be idle just because a high-priority thread is running. The uniprocessor behavior
would make little sense on a multiprocessor.
When thread 1 awakens thread 2 by unlocking a mutex, and thread 2 has a higher priority
than thread 1, thread 2 will preempt thread 1 and begin running immediately. However, if thread 1
and thread 2 are running simultaneously in an allocation domain greater than one, and thread 1
awakens thread 3, which has lower priority than thread 1 but higher priority than thread 2, thread 3
may not immediately preempt thread 2. Thread 3 may remain ready until thread 2 blocks.
For some applications, the predictability afforded by guaranteed preemption in the case
outlined in the previous paragraph may be important. In most cases, it is not that important as long
as thread 3 will eventually run. Although POSIX does not require any Pthreads system to
implement this type of "cross processor preemption," you are more likely to find it when you use
system contention scope threads. If predictability is critical, of course, you should be using system
contention scope anyway.
5.5.4 Problems with realtime scheduling
One of the problems of relying on realtime scheduling is that it is not modular. In real
applications you will generally be working with libraries from a variety of sources, and those
libraries may rely on threads for important functions like network communication and resource
management. Now, it may seem reasonable to make "the most important thread" in your library
run with SCHED-FIFO policy and maximum priority. The resulting thread, however, isn't just the
most important thread for your library--it is (or, at least, behaves as) the most important thread in
the entire process, including the main program and any other libraries. Your high-priority thread
may prevent all other libraries, and in some cases even the operating system, from performing
work on which the application relies.
Another problem, which really isn't a problem with priority scheduling, but with the way
many people think about priority scheduling, is that it doesn't do what many people expect. Many
people think that "realtime priority" threads somehow "go faster" than other threads, and that's not
true. Realtime priority threads may actually go slower, because there is more overhead involved in
making all of the required preemption checks at all the right times--especially on a multiprocessor.
A more severe problem with fixed priority scheduling is called priority inversion. Priority
inversion is when a low-priority thread can prevent a high-priority thread from running--a nasty
interaction between scheduling and synchronization. Scheduling rules state that one thread should
run, but synchronization requires that another thread run, so that the priorities of the two threads
appear to be reversed.
Priority inversion occurs when low-priority thread acquires a shared resource (such as a
mutex), and is preempted by a high-priority thread that then blocks on that same resource. With
only two threads, the low-priority thread would then be allowed to run, eventually (we assume)
releasing the mutex. However, if a third thread with a priority between those two is ready to run, it
can prevent the low-priority thread from running. Because the low-priority thread holds the mutex
that the high-priority thread needs, the middle-priority thread is also keeping the higher-priority
thread from running.
There are a number of ways to prevent priority inversion. The simplest is to avoid using
realtime scheduling, but that's not always practical. Pthreads provides several mutex locking
protocols that help avoid priority inversion, priority ceiling and priority inheritance. These are
discussed in Section 5.5.5.
| Most threaded programs do not need realtime scheduling.
A final problem is that priority scheduling isn't completely portable. Pthreads defines the
priority scheduling features under an option, and many implementations that are not primarily
intended for realtime programming may choose not to support the option. Even if the option is
supported, there are many important aspects of priority scheduling that are not covered by the
standard. When you use system contention scope, for example, where your threads may compete
directly against threads within the operating system, setting a high priority on your threads might
prevent kernel I/O drivers from functioning on some systems.
Pthreads does not specify a thread's default scheduling policy or priority, or how the standard
scheduling policies interact with nonstandard policies. So when you set the scheduling policy and
priority of your thread, using "portable" interfaces, the standard provides no way to predict how
that setting will affect any other threads in the process or the system itself.
If you really need priority scheduling, then use it--and be aware that it has special
requirements beyond simply Pthreads. If you need priority scheduling, keep the following in mind
1. Process contention scope is "nicer" than system contention scope, because you will not
prevent a thread in another process, or in the kernel, from running.
2. SCHED_RR is "nicer" than SCHED_FIFO, and slightly more portable, because
SCHED_RR threads will be preempted at intervals to share the available processor time
with other threads at the same priority.
3. Lower priorities for SCHED_FIFO and SCHED_RR policies are nicer than higher
priorities, because you are less likely to interfere with something else that's important.
Unless your code really needs priority scheduling, avoid it. In most cases, introducing
priority scheduling will cause more problems than it will solve.
5.5.5 Priority-aware mutexes
#if defined (_POSIX_THREAD_PRIO_PROTECT)
|| defined (_POSIX_THREAD_PRIO_INDERIT)
int pthread_mutexattr_getprotocol (const pthread_mutexattr_t *attr, int * protocol);
int pthread_mutexattr_setprotrcol (pthread_mutexattr_t *attr, int protocol);
#endif
#ifdef _POSIX_THREAD_PRIO_PROTECT
int pthread_mutexattr_getprioceiling (const pthread_attr_t *attr, int *prioceiling);
int pthread_mutexattr_setprioceiiing (pthread_mutexattr_t *attr, int prioceiling);
int pthread_mutex_getprioceiling (const pthread_mutex_t *mutex, int *prioceiling);
int pthread_mutex_setprioceiling (pthread_mutex_t *mutex,
int prioceiling, int *old_ceiling);
#endif
Pthreads provides several special mutex attributes that can help to avoid priority inversion
deadlocks. Locking, or waiting for, a mutex with one of these attributes may change the priority of
the thread—or the priority of other threads--to ensure that the thread that owns the mutex cannot
be preempted by another thread that needs to lock the same mutex.
These mutex attributes may not be supported by your implementation of Pthreads, because
they are optional features. If your code needs to function with or without these options, you can
conditionally compile references based on the feature test macros _POSIX_THREAD_PRIO
PROTECT or _POSIX_THREAD_PRIO_INHERIT, defined in <unistd. h>, or you can� call
sysconf during program execution to check for _SC_THREAD_PRIO_PROTECT or _SC
THREAD_PRIO_INHERIT.
Once you've created a mutex using one of these attributes, you can lock and unlock the mutex
exactly like any other mutex. As a consequence, you can easily convert any mutex you create by
changing the code that initializes the mutex. (You must call pthread_mutex_init, however, because
you cannot statically initialize a mutex with non-default attributes.)
| "Priority ceiling" protocol means that while a thread owns the mutex, it
| runs at the specified priority.
If your system defines _POSIX_THREAD_PRIO_PROTECT then it supports the protocol
and prioceiling attributes. You set the protocol attribute by calling pthread_mutexattr_setprotocol.
If you set the protocol attribute to the value PTHREAD_PRIO_PROTECT, then you can also
specify the priority ceiling for mutexes created using the attributes object by setting the prioceiling
attribute.
You set the prioceiling attribute by calling the function pthread_mutexattr_setprioceiling.
When any thread locks a mutex defined with such an attributes object, the thread's priority will be
set to the priority ceiling of the mutex, unless the thread's priority is already the same or higher,
Note that locking the mutex in a thread running at a priority above the priority ceiling of the mutex
breaks the protocol, removing the protection against priority inversion.
| "Priority inheritance" means that when a thread waits on a mutex
| owned by a lower-priority thread, the priority of the owner is increased
| to that of the waiter.
If your system defines _POSIX_THREAD_PRIO_INHERIT then it supports the protocol
attribute. If you set the protocol attribute to the value PTHREAD_PRIO_INHERIT, then no thread
holding the mutex can be preempted by another thread with a priority lower than that of any
thread waiting for the mutex. When any thread attempts to lock the mutex while a lower-priority
thread holds the mutex, the priority of the thread currently holding the mutex will be raised to the
priority of the waiter as long as it owns the mutex.
If your system does not define either _POSIX_THREAD_PRIO_PROTECT or _POSIX_
THREAD_PRIO_INHERIT then the protocol attribute may not be defined. The default value of
the protocol attribute (or the effective value if the attribute isn't defined) is POSIX_PRIO_NONE,
which means that thread priorities are not modified by the act of locking (or waiting for) a mutex.
5.5.5.1 Priority ceiling mutexes
The simplest of the two types of "priority aware" mutexes is the priority ceiling (or "priority
protection") protocol (Figure 5.3). When you create a mutex using a priority ceiling, you specify
the highest priority at which a thread will ever be running when it locks the mutex. Any thread
locking that mutex will have its priority automatically raised to that value, which will allow it to
finish with the mutex before it can be preempted by any other thread that might try to lock the
mutex. You can also examine or modify the priority ceiling of a mutex that was created with the
priority ceiling (protect) protocol.
FIGURE 5.3 Priority ceiling mutex operation
A priority ceiling mutex is not useful within a library that can be called by threads you don't
control. If any thread that is running at a priority above the ceiling locks the priority ceiling mutex,
the protocol is broken. This doesn't necessarily guarantee a priority inversion, but it removes all
protection against priority inversion. Since the priority ceiling protocol adds overhead to each
mutex operation compared to a normal "unprotected" mutex, you may have wasted processor time
accomplishing nothing.
Priority ceiling is perfect for an embedded realtime application where the developers control
all synchronization within the system. The priority ceiling can be safely determined when the code
is designed, and you can avoid priority inversion with a relatively small cost in performance
compared to more general solutions. Of course it is always most efficient to avoid priority
inversion, either by avoiding priority scheduling or by using any given mutex only within threads
of equal priority. Equally, of course, these alternatives rarely prove practical when you need them
most.
You can use priority ceiling within almost any main program, even when you don't control
the code in libraries you use. That's because while it is common for threads that call into library
functions to lock library mutexes, it is not common for threads created by a library to call into
application code and lock application mutexes. If you use a library that has "callbacks" into your
code, you must either ensure that those callbacks (and any functions they call) don't use the
priority ceiling mutexes or that no thread in which the callback might be invoked will run at a
priority above the ceiling priority of the mutex.
5.5.5.2 Priority inheritance mutexes The other Pthreads mutex protocol is priority inheritance. In the priority inheritance protocol,
when a thread locks a mutex the thread's priority is controlled through the mutex (Figure 5.4).
When another thread needs to block on that mutex, it looks at the priority of the thread that owns
the mutex. If the thread that owns the mutex has a lower priority than the thread attempting to
block the mutex, the priority of the owner is raised to the priority of the blocking thread.
The priority increase ensures that the thread that has the mutex locked cannot be preempted
unless the waiting thread would also have been preempted--in a sense, the thread owning the
mutex is working on behalf of the higher-priority thread. When the thread unlocks the mutex, the
thread's priority is automatically lowered to its normal priority and the highest-priority waiter is
awakened. If a second thread of even higher priority blocks on the mutex, the thread that has the
mutex blocked will again have its priority increased. The thread will still be returned to its original
priority when the mutex is unlocked.
The priority inheritance protocol is more general and powerful than priority ceiling, but also
more complicated and expensive. If a library package must make use of priority scheduling, and
cannot avoid use of a mutex from threads of different priority, then priority inheritance is the only
currently available solution. If you are writing a main program, and know that none of your
mutexes can be locked by threads created within a library, then priority ceiling will accomplish the
same result as priority inheritance, and with less overhead.
FIGURE 5.4 Priority inheritance mutex operation
5.6 Threads and kernel entities "Two lines? cried the Mock Turtle. "Seals, turtles, salmon, and so on:
then, when you've cleared all the jelly-fish out of the way--"
“That generally takes some time," interrupted the Gryphon.
"--you advance twice--"
"Each with a lobster as a partner!" cried the Gryphon.
--Lewis Carroll, Alice's Adventures in Wonderland
Pthreads deliberately says very little about implementation details. This leaves each vendor
free to make decisions based on the needs of their users and to allow the state of the art to advance
by permitting innovation. The standard places a few essential requirements on the implementation
--enough that you can write strictly conforming POSIX applications* that do useful work with
threads and will be able to run correctly on all conforming implementations of the standard.
* Strictly conforming is used by POSIX to mean something quite specific: a strictly
conforming application is one that does not rely on any options or extensions to the standard and
requires only the specified minimum value for all implementation limits (but will work correctly
with any allowed value).
Any Pthreads implementation must ensure that "system services invoked by one thread do
not suspend other threads" so that you do not need to worry that calling read might block all
threads in the process on some systems. On the other hand, this does not mean that your process
will always have the maximum possible level of concurrency.
Nevertheless, when using a system it is often useful to understand the ways in which the
system may be implemented. When writing ANSI C expressions, for example, it is often helpful to
understand what the code generator, and even the hardware, will do with those expressions. With
that in mind, the following sections describe, briefly, a few of the main variations you're likely to
encounter.
The important terms used in these sections are "Pthreads thread," "kernel entity," and
"processor.” “Pthreads thread" means a thread that you created by calling pthread_create,
represented by an identifier of type pthread_t. These are the threads that you control using
Pthreads interfaces. By "processor," I refer to the physical hardware, the particular thing of which
a "multiprocessor" has more than one.
Most operating systems have at least one additional level of abstraction between "Pthreads
thread" and "processor" and I refer to that as a "kernel entity," because that is the term used by
Pthreads. In some systems, "kernel entity" may be a traditional UNIX process. It may be a Digital
UNIX Mach thread, or a Solaris 2.x LWP, or an IRIX sproc process. The exact meaning of "kernel
entity," and how it interacts with the Pthreads thread, is the crucial difference between the three
models described in the following sections.
5.6.1 Many-to-one (user level)
The many-to-one method is also sometimes called a "library implementation.” In general,
"many-to-one" implementations are designed for operating systems with no support for threads.
Pthreads implementations that run on generic UNIX kernels usually fall into this category--for
example, the classic DCE threads reference implementation, or the SunOS 4.x LWP package (no
relation to the Solaris 2.x LWP, which is a kernel entity).
Many-to-one implementations cannot take advantage of parallelism on a multiprocessor, and
any blocking system service, for example, a call to read, will block all threads in the process.
Some implementations may help you avoid this problem by using features such as UNIX
nonblocking I/O, or POSIX.lb asynchronous I/O, where available. However, these features have
limitations; for example, not all device drivers support nonblocking I/O, and traditional UNIX
disk file system response is usually considered "instantaneous" and will ignore the nonblocking
I/O mode.
Some many-to-one implementations may not be tightly integrated with the ANSI C library's
support functions, and that can cause serious trouble. The stdio functions, for example, might
block the entire process (and all threads) while one thread waits for you to enter a command. Any
many-to-one implementation that conforms to the Pthreads standard, however, has gotten around
these problems, perhaps by including a special version of stdio and other functions.
When you require concurrency but do not need parallelism, a many-to-one implementation
may provide the best thread creation performance, as well as the best context switch performance
for voluntary blocking using mutexes and condition variables. It is fast because the Pthreads
library saves and restores thread context entirely in user mode. You can, for example, create a lot
of threads and block most of them on condition variables (waiting for some external event) very
quickly, without involving the kernel at all.
Figure 5.5 shows the mapping of Pthreads threads (left column) to the kernel entity (middle
column), which is a process, to physical processors (right column). In this case, the process has
four Pthreads threads, labeled "Pthread 1" through "Pthread 4." The Pthreads library schedules the
four threads onto the single process in user mode by swapping register state (SP, general registers,
and so forth). The library may use a timer to preempt a Pthreads thread that runs too long. The
kernel schedules the process onto one of the two physical processors, labeled "processor 1" and
"processor 2." The important characteristics of this model are shown in Table 5.2.
FIGURE 5.5 Many-to-one thread mapping
Advantages Disadvantages
Fastest context switch time. Potentially long latency during system service blocking.
Simple; the implementation may
even be (mostly) portable.*
Single-process applications cannot take advantage of
multiprocessor hardware.
* The DCE threads user-mode scheduler can usually be ported to new operating systems in a few
days, involving primarily new assembly language for the register context switching routines. We
use the motto "Some Assembly Required."
TABLE 5.2 Many-to-one thread scheduling
5.6.2 One-to-one (kernel level)
One-to-one thread mapping is also sometimes called a "kernel thread" implementation. The
Pthreads library assigns each thread to a kernel entity. It generally must use blocking kernel
functions to wait on mutexes and condition variables. While synchronization may occur either
within the kernel or in user mode, thread scheduling occurs within the kernel.
Pthreads threads can take full advantage of multiprocessor hardware in a one-to-one
implementation without any extra effort on your part, for example, separating your code into
multiple processes. When a thread blocks in the kernel, it does not affect other threads any more
than the blocking of a normal UNIX process affects other processes. One thread can even process
a page fault without affecting other threads.
One-to-one implementations suffer from two main problems. The first is that they do not
scale well. That is, each thread in your application is a kernel entity. Because kernel memory is
precious, kernel objects such as processes and threads are often limited by preallocated arrays, and
most implementations will limit the number of threads you can create. It will also limit the number
of threads that can be created on the entire system--so depending on what other processes are
doing, your process may not be able to reach its own limit.
The second problem is that blocking on a mutex and waiting on a condition variable, which
happen frequently in many applications, are substantially more expensive on most one-to-one
implementations, because they require entering the machine's protected kernel mode. Note that
locking a mutex, when it was not already locked, or unlocking a mutex, when there are no waiting
threads, may be no more expensive than on a many-to-one implementation, because on most
systems those functions can be completed in user mode.
A one-to-one implementation can be a good choice for CPU-bound applications, which don't
block very often. Many high-performance parallel applications begin by creating a worker thread
for each physical processor in the system, and once started, the threads run independently for a
substantial time period. Such applications will work well because they do not strain the kernel by
creating a lot of threads, and they don't require a lot of calls into the kernel to block and unblock
their threads.
Figure 5.6 shows the mapping of Pthreads threads (left column) to kernel entities (middle
column) to physical processors (right column). In this case, the process has four Pthreads threads,
labeled "Pthread 1" through "Pthread 4.” Each Pthreads thread is permanently bound to the
corresponding kernel entity. The kernel schedules the four kernel entities (along with those from
other processes) onto the two physical processors, labeled "processor 1" and "processor 2.” The
important characteristics of this model are shown in Table 5.3.
FIGURE 5.6 One-to-one thread mapping
Advantages Disadvantages
Can take advantage of multiprocessor
hardware within a single process.
Relatively slow thread context switch (calls into kernel).
No latency during system service
blocking.
Poor scaling when many threads are used, because each
Pthreads thread takes kernel resources from the system.
TABLE 5.3 One-to-one thread scheduling
5.6.3 Many-to-few (two level)
The many-to-few model tries to merge the advantages of both the many-to-one and
one-to-one models, while avoiding their disadvantages. This model requires cooperation between
the user-level Pthreads library and the kernel. They share scheduling responsibilities and may
communicate information about the threads between each other.
When the Pthreads library needs to switch between two threads, it can do so directly, in user
mode. The new Pthreads thread runs on the same kernel entity without intervention from the
kernel. This gains the performance benefit of many-to-one implementations for the most common
cases, when a thread blocks on a mutex or condition variable, and when a thread terminates.
When the kernel needs to block a thread, to wait for an I/O or other resource it does so. The
kernel may inform the Pthreads library, as in Digital UNIX 4.0, so that the library can preserve
process concurrency by immediately scheduling a new Pthreads thread, somewhat like the original
"scheduler activations" model proposed by the famous University of Washington research
[Anderson, 1991]. Or. the kernel may simply block the kernel entity, in which case it may allow
programmers to increase the number of kernel entities that are allocated to the process, as in
Solaris 2.5 otherwise the process could be stalled when all kernel entities have blocked, even
though other user threads are ready to run.
Many-to-few implementations excel in most real-world applications, because in most
applications, threads perform a mixture of CPU-bound and I/O-bound operations, and block both
in I/O and in Pthreads synchronization. Most applications also create more threads than there are
physical processors, either directly or because an application that creates a few threads also uses a
parallel library that creates a few threads, and so forth.
Figure 5.7 shows the mapping of Pthreads threads (left column) to kernel entities (middle
column) to physical processors (right column). In this case, the process has four Pthreads threads,
labeled "Pthread 1" through "Pthread 4." The Pthreads library creates some number of kernel
entities at initialization (and may create more later). Typically, the library will start with one kernel
entity (labeled "kernel entity 1" and "kernel entity 2") for each physical processor. The kernel
schedules these kernel entities (along with those from other processes) onto the two physical
processors, labeled "processor 1' and "processor 2." The important characteristics of this model are
shown in Table 5.4.
FIGURE 5.7 Many-to-few thread mapping
Advantages Disadvantages
Can take advantage of multiprocessor hardware
within a process.
More complicated than other models.
Most context switches are in user mode (fast). Programmers lose direct control over kernel
entities, since the thread's priority may be
meaningful only in user mode.
Scales well; a process may use one kernel
entity per physical processor, or "a few" more.
Little latency during system service blocking.
TABLE 5.4 Many-to-few thread scheduling
6 POSIX adjusts to threads "Who are you?" said the Caterpillar.
This was not an encouraging opening for a conversation.
Alice replied, rather shyly, "1--1 hardly know, Sir,
just at present--at least I know who I was when I got up this morning, but
I think I must have been changed several times since then."
--Lewis Carroll, Alice's Adventures in Wonderland
Pthreads changes the meaning of a number of traditional POSIX process functions. Most of
the changes are obvious, and you'd probably expect them even if the standard hadn't added
specific wording. When a thread blocks for I/O, for example, only the calling thread blocks, while
other threads in the process can continue to run.
But there's another class of POSIX functions that doesn't extend into the threaded world quite
so unambiguously. For example, when you fork a threaded process, what happens to the threads?
What does exec do in a threaded process? What happens when one of the threads in a threaded
process calls exit?
6.1 fork
| Avoid using fork in a threaded program (if you can)
| unless you intend to exec a new program immediately.
When a threaded process calls fork to create a child process, Pthreads specifies that only the
thread calling fork exists in the child. Although only the calling thread exists on return from fork
in the child process, all other Pthreads states remain as they were at the time of the call to fork. In
the child process, the thread has the same thread state as in the parent. It owns the same mutexes,
has the same value for all thread-specific data keys, and so forth. All mutexes and condition
variables exist, although any threads that were waiting on a synchronization object at the time of
the fork are no longer waiting. (They don't exist in the child process, so how could they be
waiting?)
Pthreads does not "terminate" the other threads in a forked process, as if they exited with
pthread_exit or even as if they were canceled. They simply cease to exist. That is, the threads do
not run thread-specific data destructors or cleanup handlers. This is not a problem if the child
process is about to call exec to run a new program, but if you use fork to clone a threaded program,
beware that you may lose access to memory, especially heap memory stored only as
thread-specific data values.
| The state of mutexes is not affected by a fork. If it was locked in the
| parent it is locked in the child!
If a mutex was locked at the time of the call to fork, then it is still locked in the child.
Because a locked mutex is owned by the thread that locked it, the mutex can be unlocked in the
child only if the thread that locked the mutex was the one that called fork. This is important to
remember--if another thread has a mutex locked when you call fork, you will lose access to that
mutex and any data controlled by that mutex.
Despite the complications, you can fork a child that continues running and even continues to
use Pthreads. You must use fork handlers carefully to protect your mutexes and the shared data
that the mutexes are protecting. Fork handlers are described in Section 6.1.1.
Because thread-specific data destructors and cleanup handlers are not called. you may need to
worry about memory leaks. One possible solution would be to cancel threads created by your
subsystem in the prepare fork handler, and wait for them to terminate before allowing the fork to
continue (by returning), and then create new threads in the parent handler that is called after fork
completes. This could easily become messy, and I am not recommending it as a solution. Instead,
take another look at the warning back at the beginning of this section: Avoid using fork in
threaded code except where the child process will immediately exec a new program.
POSIX specifies a small set of functions that may be called safely from within
signal-catching functions ("async-signal safe" functions), and fork is one of them. However, none
of the POSIX threads functions is async-signal safe (and there are good reasons for this, because
being async-signal safe generally makes a function substantially more expensive). With the
introduction of fork handlers, however, a call to fork is also a call to some set of fork handlers.
The purpose of a fork handler is to allow threaded code to protect synchronization state and
data invariants across a fork, and in most cases that requires locking mutexes. But you cannot lock
mutexes from a signal-catching function. So while it is legal to call fork from within a
signal-catching function, doing so may (beyond the control or knowledge of the caller) require
performing other operations that cannot be performed within a signal-catching function.
This is an inconsistency in the POSIX standard that will need to be fixed. Nobody yet knows
what the eventual solution will be. My advice is to avoid using fork in a signal-catching function.
6.1.1 Fork handlers int pthread_atferk (void (*prepare)(void), void (*parent) (void), void (*child)(void) );
Pthreads added the pthread_atfork "fork handler" mechanism to allow your code to protect
data invariants across fork. This is somewhat analogous to atexit, which allows a program to
perform cleanup when a process terminates. With pthread_atfork you supply three separate
handler addresses. The prepare fork handler is called before the fork takes place in the parent
process. The parent fork handler is called after the fork in the parent process, and the child fork
handler is called after the fork in the child process.
| If you write a subsystem that uses mutexes and does not establish
| fork handlers, then that subsystem will not function correctly in a
| child process after a fork,
Normally a prepare fork handler locks all mutexes used by the associated code (for a library
or an application) in the correct order to prevent deadlocks. The thread calling fork will block in
the prepare fork handler until it has locked all the mutexes. That ensures that no other threads can
have the mutexes locked or be modifying data that the child might need. The parent fork handler
need only unlock all of those mutexes, allowing the parent process and all threads to continue
normally.
The child fork handler may often be the same as the parent fork handler; but sometimes you'll
need to reset the program or library state. For example, if you use "daemon" threads to perform
functions in the background you'll need to either record the fact that those threads no longer exist
or create new threads to perform the same function in the child. You may need to reset counters,
free heap memory, and so forth.
| Your fork handlers are only as good as everyone else's fork handlers.
The system will run all prepare fork handlers declared in the process when any thread calls
fork. If you code your prepare and child fork handlers correctly then, in principle, you will be able
to continue operating in the child process. But what if someone else didn't supply fork handlers or
didn't do it right? The ANSI C library on a threaded system, for example, must use a set of
mutexes to synchronize internal data, such as stdio file streams.
If you use an ANSI C library that doesn't supply fork handlers to prepare those mutexes
properly for a fork, for example, then, sometimes, you may find that your child process hangs
when it calls printf, because another thread in the parent process had the mutex locked when your
thread called fork. There's often nothing you can do about this type of problem except to file a
problem report against the system. These mutexes are usually private to the library, and aren't
visible to your code--you can't lock them in your prepare handler or before calling fork.
The program atfork.c shows the use of fork handlers. When run with no argument, or with a
nonzero argument, the program will install fork handlers. When run with a zero argument, such as
atfork 0, it will not.
With fork handlers installed, the result will be two output lines reporting the result of the fork
call and, in parentheses, the pid of the current process. Without fork handlers, the child process
will be created while the initial thread owns the mutex. Because the initial thread does not exist in
the child, the mutex cannot be unlocked, and the child process will hang--only the parent process
will print its message.
13-25 Function fork_prepare is the prepare handler. This will be called by fork, in the parent
process, before creating the child process. Any state changed by this function, in particular,
mutexes that are locked, will be copied into the child process. The fork_prepare function locks
the program's mutex.
31-42 Function fork_parent is the parent handler. This will be called by fork, in the parent process,
after creating the child process. In general, a parent handler should undo whatever was done in the
prepare handler, so that the parent process can continue normally. The fork_parent function
unlocks the mutex that was locked by fork_prepare.
48-60 Function fork_child is the child handler. This will be called by fork, in the child process. In
most cases, the child handler will need to do whatever was done in the fork_parent handler to
"unlock" the state so that the child can continue. It may also need to perform additional cleanup,
for example, fork_child sets the self_pid variable to the child process's pid as well as unlocking
the process mutex.
65-91 After creating a child process, which will continue executing the thread_routine code, the
thread_routine function locks the mutex. When run with fore handlers, the fork call will be
blocked (when the prepare handler locks the mutex) until the mutex is available. Without fork
handlers, the thread will fork before main unlocks the mutex, and the thread will hang in the child
at this point.
99-106 The main program declares fork handlers unless the program is run with an argument of 0.
108-123 The main program locks the mutex before creating the thread that will fork. It then sleeps for
several seconds, to ensure that the thread will be able to call fork while the mutex is locked, and
then unlocks the mutex. The thread running thread_routine will always succeed in the parent
process, because it will simply block until main releases the lock.
However, without the fork handlers, the child process will be created while the mutex is
locked. The thread (main) that locked the mutex does not exist in the child, and cannot unlock the
mutex in the child process. Mutexes can be unlocked in the child only if they were locked by the
thread that called fork--and fork handlers provide the best way to ensure that.
atfork.c
Now, imagine you are writing a library that manages network server connections, and you
create a thread for each network connection that listens for service requests. In your prepare fork
handler you lock all of the library's mutexes to make sure the child's state is consistent and
recoverable. In your parent fork handler you unlock those mutexes and return. When designing the
child fork handler, you need to decide exactly what a fork means to your library. If you want to
retain all network connections in the child, then you would create a new listener thread for each
connection and record their identifiers in the appropriate data structures before releasing the
mutexes. If you want the child to begin with no open connections, then you would locate the
existing parent connection data structures and free them, closing the associated files that were
propagated by fork.
6.2 exec
The exec function isn't affected much by the presence of threads. The function of exec is to
wipe out the current program context and replace it with a new program. A call to exec
immediately terminates all threads in the process except the thread calling exec. They do not
execute cleanup handlers or thread-specific data destructors--the threads simply cease to exist.
All synchronization objects also vanish, except for pshared mutexes (mutexes created using
the PTHREAD_PROCESS_SHARED attribute value) and pshared condition variables, which
remain usable as long as the shared memory is mapped by some process. You should, however,
unlock any pshared mutexes that the current process may have locked--the system will not unlock
them for you.
6.3 Process exit
In a non-threaded program, an explicit call to the exit function has the same effect as
returning from the program's main function. The process exits. Pthreads adds the pthread_exit
function, which can be used to cause a single thread to exit while the process continues. In a
threaded program, therefore, you call exit when you want the process to exit, or pthread_exit when
you want only the calling thread to exit.
In a threaded program, main is effectively the "thread start function" for the process's initial
thread. Although returning from the start function of any other thread terminates that thread just as
if it had called pthread_exit, returning from main terminates the process. All memory (and threads)
associated with the process evaporate. Threads do not run cleanup handlers or thread-specific data
destructors. Calling exit has the same effect. When you don't want to make use of the initial thread
or make it wait for other threads to complete, you can exit from main by calling pthread_exit
rather than by returning or calling exit. Calling pthread_exit from main will terminate the initial
thread without affecting the other threads in the process, allowing them to continue and complete
normally.
The exit function provides a simple way to shut down the entire process. For example, if a
thread determines that data has been severely corrupted by some error, it may be dangerous to
allow the program to continue to operate on the data. When the program is somehow broken, it
might be dangerous to attempt to shut down the application threads cleanly. In that case, you
might call exit to stop all processing immediately.
6.4 Stdio Pthreads specifies that the ANSI C standard I/O (stdio) functions are thread-safe. Because the
stdio package requires static storage for output buffers and file state, stdio implementations will
use synchronization, such as mutexes or semaphores.
6.4.1 flockfile and funlockfile void flockfile (FILE *file);
int ftrylockfile (FILE *file);
void funlockfile (FILE *file);
In some cases, it is important that a sequence of stdio operations occur in uninterrupted
sequence; for example, a prompt followed by a read from the terminal, or two writes that need to
appear together in the output file even if another thread attempts to write data between the two
stdio calls. Therefore, Pthreads adds a mechanism to lock a file and specifies how file locking
interacts with internal stdio locking. To write a prompt string to stdin and read a response from
stdout without allowing another thread to read from stdin or write to stdout between the two, you
would need to lock both stdin and stdout around the two calls as shown in the following program,
flock.c.
19-20 This is the important part: Two separate calls to flockfile are made, one for each of the two
file streams. To avoid possible deadlock problems within stdio, Pthreads recommends always
locking input streams before output streams, when you must lock both. That's good advice, and
I've taken it by locking stdin before stdout.
29-30 The two calls to funlockfile must, of course, be made in the opposite order. Despite the
specialized call, you are effectively locking mutexes within the stdio library, and you should
respect a consistent lock hierarchy.
flock.c
You can also use the flockfile and funlockfile functions to ensure that a series of writes is not
interrupted by a file access from some other thread. The ftrylockfile function works like
pthread_mutex_trylock in that it attempts to lock the file and, if the file is already locked, returns
an error status instead of blocking.
6.4.2 getchar_unlocked and putchar_unlocked
int getc_unlocked (FILE *stream);
int getchar_unlocked (void);
int putc_unlocked (int c, FILE *stream);
int putchar_unlocked (int c);
ANSI C provides functions to get and put single characters efficiently into stdio buffers. The
functions getchar and putchar operate on stdin and stdout, respectively, and getc and putc can be
used on any stdio file stream. These are traditionally implemented as macros for maximum
performance, directly reading or writing the file stream's data buffer. Pthreads, however, requires
these functions to lock the stdio stream data, to prevent code from accidentally corrupting the stdio
buffers.
The overhead of locking and unlocking mutexes will probably vastly exceed the time spent
performing the character copy, so these functions are no longer high performance. Pthreads could
have defined new functions that provided the locked variety rather than redefining the existing
functions; however, the result would be that existing code would be unsafe for use in threads. The
working group decided that it was preferable to make existing code slower, rather than to make it
incorrect.
Pthreads adds new functions that replace the old high-performance macros with essentially
the same implementation as the traditional macros. The functions getc_unlocked, putc_unlocked,
getchar_unlocked, and putchar_unlocked do not perform any locking, so you must use flockfile
and funlockfile around any sequence of these operations. If you want to read or write a single
character you should usually use the locked variety rather than locking the file stream, calling the
new unlocked get or put function, and then unlocking the file stream.
If you want to perform a sequence of fast character accesses, where you would have
previously used getchar and putchar, you can now use getchar_unlocked and putchar_unlocked.
The following program, putchar.c, shows the difference between using putchar and using a
sequence of putchar_unlocked calls within a file lock.
9-20 When the program is run with a nonzero argument or no argument at all, it creates threads
running the lock_routine function. This function locks the stdout file stream, and then writes its
argument (a string) to stdout one character at a time using putchar_unlocked.
29-37 When the program is run with a zero argument, it creates threads running the unlock_routine
function. This function writes its argument to stdout one character at a time using putchar.
Although putchar is internally synchronized to ensure that the stdio buffer is not corrupted, the
individual characters may appear in any order.
putchar.c
6.5 Thread-safe functions
Although ANSI C and POSIX 1003.1-1990 were not developed with threads in mind, most of
the functions they define can be made thread-safe without changing the external interface. For
example, although malloc and free must be changed to support threads, code calling these
functions need not be aware of the changes. When you call malloc, it locks a mutex (or perhaps
several mutexes) to perform the operation, or may use other equivalent synchronization
mechanisms. But your code just calls malloc as it always has, and it does the same thing as
always.
In two main classes of functions, this is not true:
� Functions that traditionally return pointers to internal static buffers, for example,
asctime. An internal mutex wouldn't help, since the caller will read the formatted time
string some time after the function returns and, therefore, after the mutex has been
unlocked.
� Functions that require static context between a series of calls, for example, strtok, which
stores the current position within the token string in a local static variable. Again, using
a mutex within strtok wouldn't help, because other threads would be able to overwrite
the current location between two calls.
In these cases, Pthreads has defined variants of the existing functions that are thread-safe,
which are designated by the suffix "_r" at the end of the function name. These variants move
context outside the library, under the caller's control. When each thread uses a private buffer or
context, the functions are thread-safe. You can also share context between threads if you want--but
the caller must provide synchronization between the threads. If you want two threads to search a
directory in parallel, you must synchronize their use of the shared struct dirent passed to readdir_r.
A few existing functions, such as ctermid, are already thread-safe as long as certain
restrictions are placed on parameters. These restrictions are noted in the following sections.
6.5.1 User and terminal identification int getlogin_r (char *name, size_t namesize);
char *ctermid (char *s);
int ttyname_r (int fildes, char *name, size_t namesize);
These functions return data to a caller-specified buffer. For getlogin_r, name-size must be at
least LOGIN_NAME_MAX characters. For ttyname_r, name-size must be at least TTY_NAME_
MAX characters. Either function returns a value of 0 on success, or an error number on failure. In
addition to errors that might be returned by getlogin or ttyname, getlogin_r and ttyname_r may
return ERANGE to indicate that the name buffer is too small.
Pthreads requires that when ctermid (which has not changed) is used in a threaded
environment, the s return argument must be specified as a pointer to a character buffer having at
least L_ctermid bytes. It was felt that this restriction was sufficient, without defining a new variant
to also specify the size of the buffer. Program getlogin.c shows how to call these functions. Notice
that these functions do not depend on threads, or <pthread.h>, in any way, and may even be
provided on systems that don't support threads.
getlogin.c
6.5.2 Directory searching
int readdir_r (DIR *dirp, struct dirent *entry, struct dirent **result);
This function performs essentially the same action as readdir. That is, it returns the next
directory entry in the directory stream specified by dirp. The difference is that instead of returning
a pointer to that entry, it copies the entry into the buffer specified by entry. On success, it returns 0
and sets the pointer specified by result to the buffer entry. On reaching the end of the directory
stream, it returns 0 and sets result to NULL. On failure, it returns an error number such as
EBADF.
Refer to program pipe.c, in Section 4.1, for a demonstration of using readdir_r to allow your
threads to search multiple directories concurrently.
6.5.3 String token char *strtok_r (char *s, const char *sep, char **lasts);
This function returns the next token in the string s. Unlike strtok, the context (the current
pointer within the original string) is maintained in lasts, which is specified by the caller, rather
than in a static pointer internal to the function.
In the first call of a series, the argument s gives a pointer to the string. In subsequent calls to
return successive tokens of that string, s must be specified as NULL. The value lasts is set by
strtok_r to maintain the function's position within the string, and on each subsequent call you must
return that same value of lasts. The strtok_r function returns a pointer to the next token, or NULL
when there are no more tokens to be found in the original string.
6.5.4 Time representation
char *asctime_r (const struct tm *tm, char *buf);
char *ctime_r (const time_t *clock, char *buf);
struct tm *gmtime_r (const time_t *clock, struct tm *result);
struct tm *localtime_r (const time_t *clock, struct tm *result);
The output buffers (buf and result) are supplied by the caller, instead of returning a pointer to
static storage internal to the functions. Otherwise, they are identical to the traditional variants. The
asctime_r and ctime_r routines, which return ASCII character representations of a system time,
both require that their buf argument point to a character string of at least 26 bytes.
6.5.5 Random number generation int rand_r (unsigned int *seed);
The seed is maintained in caller-supplied storage (seed) rather than using static storage
internal to the function. The main problem with this interface is that it is not usually practical to
have a single seed shared by all application and library code within a program. As a result, the
application and each library generally have a separate "stream" of random numbers. Thus, a
program converted to use rand_r instead of rand is likely to generate different results, even if no
threads are created. (Creating threads would probably change the order of calls to rand, and
therefore change results anyway.)
6.5.6 Group and user database Group database:
int getgrgid_r (gid_t gid, struct group *grp, char *buffer, size_t bufsize,
struct group **result);
int getgrnam_r (const char *name, struct group *grp, char *buffer, size_t bufsize,
struch group **result);
User database:
int getpwuid_r (uid_t uid, struct passwd *pwd, char *buffer, size_t bufsize,
struct passwd **result);
int getpwnam_r (const char *name, struct passwd *pwd, char *buffer,
size_t bufsize, struct passwd **result);
These functions store a copy of the group or user record (grp or pwd, respectively) for the
specified group or user (gid, uid, or name) in a buffer designated by the arguments buffer and
bufsize. The function return value is in each case either 0 for success, or an error number (such as
ERANGF. when the buffer is too small) to designate an error. If the requested record is not present
in the group or passwd database, the functions may return success but store the value NULL into
the result pointer. If the record is found and the buffer is large enough, result becomes a pointer to
the struct group or struct passwd record within buffer.
he maximum required size for buffer can be determined by calling sysconf with the argument
_SC_GETGR_R_SIZE_MAX (for group data) or with the argument _SC_GETPW_R_SIZE
_MAX (for user data).
6.6 Signals Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!
--Lewis Carroll, Through the Looking-Glass
The history of the Pthreads signal-handling model is the most convoluted and confusing part
of the standard. There were several different viewpoints, and it was difficult to devise a
compromise that would satisfy everyone in the working group (much less the larger and more
diverse balloting group). This isn't surprising, since signals are complicated anyway, and have a
widely divergent history in the industry.
There were two primary conflicting goals:
� First, "signals should be completely compatible with traditional UNIX." That means
signal handlers and masks should remain associated with the process. That makes them
virtually useless with multiple threads, which is as it should be since signals have
complicating semantics that make it difficult for signals and threads to coexist
peacefully. Tasks should be accomplished synchronously using threads rather than
asynchronously using signals.
� Second, "signals should be completely compatible with traditional UNIX." This time,
"compatible" means signal handlers and masks should be completely thread-private.
Most existing UNIX code would then function essentially the same running within a
thread as it had within a process. Code migration would be simplified.
The problem is that the definitions of "compatible" were incompatible. Although many people
involved in the negotiation may not agree with the final result, nearly everyone would agree that
those who devised the compromise did an extraordinarily good job, and that they were quite
courageous to attempt the feat.
| When writing threaded code, treat signals as Jabberwocks—
| curious and potentially dangerous creatures to be
| approached with caution, if at all.
It is always best to avoid using signals in conjunction with threads. At the same time, it is
often not possible or practical to keep them separate. When signals and threads meet, beware. If at
all possible, use only pthread_sigmask to mask signals in the main thread, and sigwait to handle
signals synchronously within a single thread dedicated to that purpose. If you must use sigaction
(or equivalent) to handle synchronous signals (such as SIGSEGV) within threads, be especially
cautious. Do as little work as possible within the signal-catching function.
6.6.1 Signal actions
All signal actions are process-wide. A program must coordinate any use of sigaction between
threads. This is non-modular, but also relatively simple, and signals have never been modular. A
function that dynamically modifies signal actions, for example, to catch or ignore SIGFPE while it
performs floating-point operations, or SIGPIPE while it performs network I/O, will be tricky to
code on a threaded system.
While modifying the process signal action, for a signal number is itself thread-safe, there is
no protection against some other thread setting a new signal action immediately afterward. Even if
the code tries to be "good" by saving the original signal action and restoring it, it may be foiled by
another thread, as shown in Figure 6.1.
Signals that are not "tied" to a specific hardware execution context are delivered to one
arbitrary thread within the process. That means a SIGCHLD raised by a child process termination,
for example, may not be delivered to the thread that created the child. Similarly, a call to kill
results in a signal that may be delivered to any thread.
Thread 1 Thread 2 Comments
sigaction ( SIGFPE )
Generate SIGFPE
Restore action
sigaction ( SIGFPE )
restore action
Thread l's signal action active.
Thread 2's signal action active.
Thread 1 signal is handled by the thread 2 signal
action (but still in the context of thread 1).
Thread 1 restores original signal action.
Thread 2 restores thread l's signal action--
original action is lost.
FIGURE 6.1 Non-modularity of signal actions
The synchronous "hardware context" signals, including SIGFPE, SIGSEGV, and SIGTRAP,
are delivered to the thread that caused the hardware condition, never to another thread.
| You cannot kill a thread by sending it a SIGKILL or stop a thread by
| sending it o SIGSTOP.
Any signal that affected a process still affects the process when multiple threads are active,
which means that sending a SIGKILL to a process or to any specific thread in the process (using
pthread_kil1, which we'll get to in Section 6.6.3) will terminate the process. Sending a SIGSTOP
will cause all threads to stop until a SIGCONT is received. This ensures that existing process
control functions continue to work--otherwise most threads in a process could continue running
when you stopped a command by sending a SIGSTOP. This also applies to the default action of
the other signals, for example, SIGSEGV, if not handled, will terminate the process and generate a
core file--it will not terminate only the thread that generated the SIGSEGV.
What does this mean to a programmer? It has always been common wisdom that library code
should not change signal actions--that this is exclusively the province of the main program. This
philosophy becomes even more wise when you are programming with threads. Signal actions must
always be under the control of a single component, at least, and to assign that responsibility to the
main program makes the most sense in nearly all situations.
6.6.2 Signal masks
int pthread_sigmask (int how, const sigset_t *set, sigset_t (oset);
Each thread has its own private signal mask, which is modified by calling pthread_sigmask.
Pthreads does not specify what sigprocmask does within a threaded process--it may do nothing.
Portable threaded code does not call sigprocmask. A thread can block or unblock signals without
affecting the ability of other threads to handle the signal. This is particularly important for
synchronous signals. It would be awkward if thread A were unable to process a SIGFPF because
thread B was currently processing its own SIGFPE or, even worse, because thread C had blocked
SIGFPE. When a thread is created, it inherits the signal mask of the thread that created it--if you
want a signal to be masked everywhere, mask it first thing in main.
6.6.3 pthread_kill int pthread_kill (pthread_t thread, int sig);
Within a process, one thread can send a signal to a specific thread (including itself) by calling
pthread_kill. When calling pthread_kill, you specify not only the signal number to be delivered,
but also the pthread_t identifier for the thread to which you want the signal sent. You cannot use
pthread_kill to send a signal to a thread in another process, however, because a thread identifier
(pthread_t) is meaningful only within the process that created it.
The signal sent by pthread_kill is handled like any other signal. If the "target" thread has the
signal masked, it will be marked pending against that thread. If the thread is waiting for the signal
in sigwait (covered in Section 6.6.4), the thread will receive the signal. If the thread does not have
the signal masked, and is not blocked in sigwait, the current signal action will be taken.
Remember that, aside from signal-catching functions, signal actions affect the process.
Sending the SIGKILL signal to a specific thread using pthread_kill will kill the process, not just
the specified thread. Use pthread_cancel to get rid of a particular thread (see Section 5.3). Sending
SIGSTOP to a thread will stop all threads in the process until a SIGCONT is sent by some other
process.
The raise function specified by ANSI C has traditionally been mapped to a kill for the current
process. That is, raise (SIGABRT) is usually the same as kill(getpid (), SIGABRT).
With multiple threads, code calling raise is most likely to intend that the signal be sent to the
calling thread, rather than to some arbitrary thread within the process. Pthreads specifies that raise
(SIGABRT) is the same as pthread_kill(pthread_self (), SIGABRT).
The following program, susp.c, uses pthread_kill to implement a portable "suspend and
resume" (or, equivalently, "suspend and continue") capability much like that provided by the
Solaris "UI threads" interfaces thr_suspend and thr_continue.* You call the thd_suspend function
with the pthread_t of a thread, and when the function returns, the specified thread has been
suspended from execution. The thread cannot execute until a later call to thd_continue is made
with the same pthread_t.
* The algorithm (and most of the code) for susp.c was developed by a coworker of mine, Brian
Silver. The code shown here is a simplified version for demonstration purposes.
A request to suspend a thread that is already suspended has no effect. Calling thd_continue a
single time for a suspended thread will cause it to resume execution, even if it had been suspended
by multiple calls to thd_suspend. Calling thd_continue for a thread that is not currently suspended
has no effect.
Suspend and resume are commonly used to solve some problems, for example, multithread
garbage collectors, and may even work sometimes if the programmer is very careful. This
emulation of suspend and resume may therefore be valuable to the few programmers who really
need these functions. Beware, however, that should you suspend a thread while it holds some
resource (such as a mutex), application deadlock can easily result.
6 The symbol ITERATIONS defines how many times the "target" threads will loop. If this
value is set too small, some or all of the threads will terminate before the main thread has been
able to suspend and continue them as it desires. If that happens, the program will fail with an error
message--increase the value of ITERATIONS until the problem goes away.
12 The variable sentinel is used to synchronize between a signal-catching function and another
thread. "Oh?" you may ask, incredulously. This mechanism is not perfect--the suspending thread
(the one calling thd_suspend) waits in a loop, yielding the processor until this sentinel changes
state. The volatile storage attribute ensures that the signal-catching function will write the value to
memory.* Remember, you cannot use a mutex within a signal-catching function.
*A semaphore, as described later in Section 6.6.6, would provide cleaner, and somewhat safer,
synchronization, The thd_suspend would call sem_wait on a semaphore with an initial value of 0,
and the signal-catching function would call sem_post to wake it.
22-40 The suspend_signal_handler function will be established as the signal-catching function for
the "suspend" signal, SIGUSR1. It initializes a signal mask to block all signals except SIGUSR2,
which is the "resume" signal, and then waits for that signal by calling sigsuspend. Just before
suspending itself, it sets the sentinel variable to inform the suspending thread that it is no longer
executing user code for most practical purposes, it is already suspended.
The purpose for this synchronization between the signal-catching function and thd_suspend
is that, to be most useful, the thread calling thd_suspend must be able to know when the target
thread has been successfully suspended. Simply calling pthread_kill is not enough, because the
system might not deliver the signal for a substantial period of time; we need to know when the
signal has been received.
47-51 The resume_signal_handler function will be established as the signal-catching function for
the "resume" signal, SlGUSR1. The function isn't important, since the signal is sent only to
interrupt the call to sigsuspend in suspend_signal_handler.
susp.c partl signal-catchingfunctions
The suspend_init_routine function dynamically initializes the suspend/resume package when
the first call to thd_suspend is made. It is actually called indirectly by pthread_once.
15-16 It allocates an initial array of thread identifiers, which is used to record the identifiers of all
threads that have been suspended. This array is used to ensure that multiple calls to thd_suspend
have no additional effect on the target thread, and that calling thd_continue for a thread that is not
suspended has no effect.
21-35 It sets up signal actions for the SIGUSR1 and SIGUSR2 signals, which will be used,
respectively, to suspend and resume threads.
susp.c part 2 initialization
9-40 The thd_suspend function suspends a thread, and returns when that thread has ceased to
execute user code. It first ensures that the suspend/resume package is initialized by calling
pthread_once. Under protection of a mutex, it searches for the target thread's identifier in the array
of suspended thread identifiers. If the thread is already suspended, thd_suspend returns
successfully.
47-60 Determine whether there is an empty entry in the array of suspended threads and, if not,
realloc the array with an extra entry.
65-78 The sentinel variable is initialized to 0, to detect when the target thread suspension occurs.
The thread is sent a SlGUSR1 signal by calling pthread_kill, and thd_suspend loops, calling
sched_yield to avoid monopolizing a processor, until the target thread responds by setting sentinel.
Finally, the suspended thread's identifier is stored in the array.
susp.c part 3 thd_suspend
23-26 The thd_continue function first checks whether the suspend/resume package has been
initialized (inited is not 0). If it has not been initialized, then no threads are suspended, and
thd_continue returns with success.
33-39 If the specified thread identifier is not found in the array of suspended threads, then it is not
suspended--again, return with success.
45-51 Send the resume signal, SIGUSR2. There's no need to wait--the thread will resume whenever
it can, and the thread calling thd_continue doesn't need to know.
susp.c part 4 thd_continue
2-25 The thread_routine function is the thread start routine for each of the "target" threads created
by the program. It simply loops for a substantial period of time, periodically printing a status
message. On each iteration, it yields to other threads to ensure that the processor time is
apportioned "fairly" across all the threads.
Notice that instead of calling printf, the function formats a message with sprintf and then
displays it on stdout (file descriptor 1) by calling write. This illustrates one of the problems with
using suspend and resume (thd_suspend and thd_continue) for synchronization. Suspend and
resume are scheduling functions, not synchronization functions, and using scheduling and
synchronization controls together can have severe consequences.
| Incautious use of suspend and resume can deadlock your application.
In this case, if a thread were suspended while modifying a stdio stream, all other threads that
tried to modify that stdio stream might block, waiting for a mutex that is locked by the suspended
thread. The write function, on the other hand, is usually a call to the kernel--the kernel is atomic
with respect to signals, and therefore can't be suspended. Use of write, therefore, cannot cause a
deadlock.
In general, you cannot suspend a thread that may possibly hold any resource, if that resource
may be required by some other thread before the suspended thread is resumed. In particular, the
result is a deadlock if the thread that would resume the suspended thread first needs to acquire the
resource. This prohibition includes, especially, mutexes used by libraries you call--such as the
mutexes used by malloc and free, or the mutexes used by stdio.
36-42 Threads are created with an attributes object set to create threads detached, rather than
joinable. The result is that threads will cease to exist as soon as they terminate, rather than
remaining until main calls pthread_join. The pthread_kill function does not necessarily fail if you
attempt to send a signal to a terminated thread (the standard is silent on this point), and you may
be merely setting a pending signal in a thread that will never be able to act on it. If this were to
occur, the thd_suspend routine would hang waiting for the thread to respond. Although
pthread_kill may not fail when sending to a terminated thread, it will fail when sending to a thread
that doesn't exist--so this attribute converts a possible hang, when the program is run with
ITERATTONS set too low, into an abort with an error message.
51-85 The main thread sleeps for two seconds after creating the threads to allow them to reach a
"steady state." It then loops through the first half of the threads, suspending each of them. It waits
an additional two seconds and then resumes each of the threads it had suspended. It waits another
two seconds, suspends each of the remaining threads (the second half), and then after another two
seconds resumes them.
By watching the status messages printed by the individual threads, you can see the pattern of
output change as the threads are suspended and resumed.
susp.c part 5 sampleprogram
6.6.4 sigwait and sigwaitinfo
int sigwait (const siqset_t *set, int *siq);
#ifdef _POSIX_REALTIME_SIGNALS
int sigwaitinfo (const sigset_t *set, siginfo_t *info);
int sigtimedwait (const sigset_t *set; siginfo_t* info, const struct timespec * timeout);
#endif
| Always use sigwait to work with asynchronous signals within threaded
| code.
Pthreads adds a function to allow threaded programs to deal with "asynchronous" signals
synchronously. That is, instead of allowing a signal to interrupt a thread at some arbitrary point, a
thread can choose to receive a signal synchronously. It does this by calling sigwait, or one of
sigwait's siblings.
| The signals for which you sigwait must be masked in the sigwaiting
| thread, and should usually be masked in all threads.
The sigwait function takes a signal set as its argument, and returns a signal number when any
signal in that set occurs. You can create a thread that waits for some signal, for example, SIGINT,
and causes some application activity when it occurs. The non-obvious rule is that the signals for
which you wait must be masked before calling sigwait. In fact, you should ideally mask these
signals in main, at the start of the program. Because signal masks are inherited by threads you
create, all threads will (by default) have the signal masked. This ensures that the signal will never
be delivered to any thread except the one that calls sigwait.
Signals are delivered only once. If two threads are blocked in sigwait, only one of them will
receive a signal that's sent to the process. This means you can't, for example, have two
independent subsystems using sigwait that catch SIGINT. It also means that the signal will not be
caught by sigwait in one thread and also delivered to some signal-catching function in another
thread. That's not so bad, since you couldn't do that in the old non-threaded model either--only one
signal action can be active at a time.
While sigwait, a Pthreads function, reports errors by returning an error number, its siblings,
sigwaitinfo and sigtimedwait, were added to POSIX prior to Pthreads, and use the older errno
mechanism. This is confusing and awkward, and that is unfortunate. The problem is that they deal
with the additional information supplied by the POSIX realtime signals option (<unistd.h> defines
the symbol _POSIX_REALTIME_SIGNALS), and the POSIX realtime amendment, POSIX.lb,
was completed before the Pthreads amendment.
Both sigwaitinfo and sigtimedwait return the realtime signal information, siginfo_t, for
signals received. In addition, sigtimedwait allows the caller to specify that sigtimedwait should
return with the error EAGAIN in the event that none of the selected signals is received within the
specified interval.
The sigwait.c program creates a "sigwait thread" that handles SIGINT.
23-41 The signal_waiter thread repeatedly calls sigwait, waiting for a SIGINT signal. It counts five
occurrences of SZGINT (printing a message each time), and then signals a condition variable on
which main is waiting. At that time, main will exit.
61-65 The main program begins by masking SIGINT. Because all threads inherit their initial signal
mask from their creator, SIGINT will be masked in all threads. This prevents SIGINT from being
delivered at any time except when the signal_waiter thread is blocked in sigwait and ready to
receive the signal.
sigwait.c
6.6.5 SIGEV_THREAD
Some of the functions in the POSIX.lb realtime standard, which provide for asynchronous
notification, allow the programmer to give specific instructions about how that notification is to be
accomplished. For example, when initiating an asynchronous device read or write using aio_read
or aio_write, the programmer specifies a struct aiocb, which contains, among other members, a
struct sigevent. Other functions that accept a struct sigevent include timer_create (which creates a
per-process timer) and sigqueue (which queues a signal to a process).
The struct sigevent structure in POSIX.lb provides a "notification mechanism" that allows the
programmer to specify whether a signal is to be generated, and, if so, what signal number should
be used. Pthreads adds a new notification mechanism called SIGEV_THREAD. This new
notification mechanism causes the signal notification function to be run as if it were the start
routine of a thread.
Pthreads adds several members to the POSIX.lb struct sigevent structure. The new members
are sigev_notify_function, a pointer to a thread start function; and sigev_notify_attributes, a
pointer to a thread attributes object (pthread_attr_t) containing the desired thread creation
attributes. If sigev_notify_attributes is NULL, the notify thread is created as if the detachstate
attribute was set to PTHREAD_CREATE_DETACHED. This avoids a memory leak--in general,
the notify thread's identifier won't be available to any other thread. Furthermore, Pthreads says that
the result of specifying an attributes object that has the detachstate attribute set to PTHREAD_
CREATE_JOINABLE is "undefined." (Most likely, the result will be a memory leak because the
thread cannot be joined--if you are lucky, the system may override your choice and create it
detached anyway.)
The SIGEV_THREAD notification function may not actually be run in a new
thread--Pthreads carefully specifies that it behaves as if it were run in a new thread, just as I did a
few paragraphs ago. The system may, for example, queue SIGEV_THREAD events and call the
start routines, serially, in some internal "server thread." The difference is effectively
indistinguishable to the application. A system that uses a server thread must be very careful about
the attributes specified for the notification thread--for example, scheduling policy and priority,
contention scope, and minimum stack size must all be taken into consideration.
The SIGEV_THREAD feature is not available to any of the "traditional" signal generation
mechanisms, such as setitimer, or for SIGCHLD, SIGINT, and so forth. Those who are
programming using the POSIX.lb "realtime signal" interfaces, including timers and asynchronous
I/O, may find this new capability useful.
The following program, sigev_thread.c, shows how to use the SIGEV_THREAD notification
mechanism for a POSIX.lb timer.
20-37 The function timer_thread is specified as the "notification function" (thread start routine) for
the SIGEV_THREAD timer. The function will be called each time the timer expires. It counts
expirations, and wakes the main thread after five. Notice that, unlike a signal-catching function,
the SIGEV_THREAD notification function can make full use of Pthreads synchronization
operations. This can be a substantial advantage in many situations.
45-51 Unfortunately, neither Solaris 2.5 nor Digital UNIX 4.0 correctly implemented SIGEV_
THREAD. Thus, unlike all other examples in this book, this code will not compile on Solaris 2.5.
This #ifdef block allows the code to compile, and to fail gracefully if the resulting program is run,
with an error message. Although the program will compile on Digital UNIX 4.0, it will not run.
The implementation of SIGEV_THREAD has been fixed in Digital UNIX 4.0D, which should be
available by the time you read this, and it should also be fixed in Solaris 2.6.
56-59 These statements initialize the sigevent structure, which describes how the system should
notify the application when an event occurs. In this case, we are telling it to call timer_thread
when the timer expires, and to use default attributes.
sigev_thread.c
6.6.6 Semaphores: synchronizing with a signal-catching function #ifdef _POSIX_SEMAPHORS
int sem_init (sem_t *sem, int pshared, unsigned int value);
int sem_destroy (sem_t *sem);
int sem_wait (sem_t *sem);
int sem_trymake (sem_t *sem);
int sem_post (sem_t *sem);
int sem_getvalue (sem_t *sem, int *sval);
#endif
Although mutexes and condition variables provide an ideal solution to most synchronization
needs, they cannot meet all needs. One example of this is a need to communicate between a
POSIX signal-catching function and a thread waiting for some asynchronous event. In new code,
it is best to use sigwait or sigwaitinfo rather than relying on a signal-catching function, and this
neatly avoids this problem. However, the use of asynchronous POSIX signal-catching functions is
well established and widespread, and most programmers working with threads and existing code
will probably encounter this situation.
To awaken a thread from a POSIX signal-catching function, you need a mechanism that's
reentrant with respect to POSIX signals (async-signal safe). POSIX provides relatively few of
these functions, and none of the Pthreads functions is included. That's primarily because an
async-signal safe mutex lock operation would be many times slower than one that isn't
async-signal safe. Outside of the kernel, making a function async-signal safe usually requires that
the function mask (block) signals while it runs--and that is expensive.
In case you're curious, here is the full list of POSIX 1003.1-1996 functions that are
async-signal safe (some of these functions exist only when certain POSIX options are defined,
such as _POSIX_ASYNCHRONOUS_IO or _POS IX_TIMERS):
access getoverrun sigismember
aio_error getgroups sigpending
aio_return getpgrp sigprocmask
aio_suspend getpid sigqueue
alarm getppid sigsuspend
cfgetispeed getuid sleep
cfgetospeed kill stat
cfsetispeed link sysconf
cfsetospeed lseek tcdrain
chdir mkdir tcflow
chmod mkfifo tcflush
chown open tcgetattr
clock_gettime pathconf tcgetpgrp
close pause tcsendbreak
creat pipe tcsetattr
dup2 read tcsetpgrp
dup rename time
execle rmdir timer_getoverrun
execve sem_post timer_gettime
exit setgid timer_settime
cntl setpgid times
fdatasync setsid umask
fork setuid uname
fstat sigaction unlink
fsync sigaddset utime
getegid sigdelset wait
geteuid sigemptyset waitpid
getgid sigfillset write
POSIX.lb provides counting semaphores, and most systems that support Pthreads also
support POSIX.lb semaphores. You may notice that the sem_post function, which wakes threads
waiting on a semaphore, appears in the list of async-signal safe functions. If your system supports
POSIX semaphores (<unistd.h> defines the _POSIX_SEMAPHORES option), then Pthreads adds
the ability to use semaphores between threads within a process. That means you can post a
semaphore, from within a POSIX signal-catching function, to wake a thread in the same process
or in another process.
A semaphore is a different kind of synchronization object--it is a little like a mutex, a little
like a condition variable. The differences can make semaphores a little harder to use for many
common tasks, but they make semaphores substantially easier to use for certain specialized
purposes. In particular, semaphores can be posted (unlocked or signaled) from a POSIX
signal-catching function.
| Semaphores are a general synchronization mechanism.
| We just have no reason to use them that way.
I am emphasizing the use of semaphores to pass information from a signal-catching function,
rather than for general use, for a couple of reasons. One reason is that semaphores are part of a
different standard. As I said, most systems that support Pthreads will also support POSIX.lb, but
there is no such requirement anywhere in the standard. So you may well find yourself without
access to semaphores, and you shouldn't feel dependent on them. (Of course, you may also find
yourself with semaphores and without threads--but in that case, you should be reading a different
book.)
Another reason for keeping semaphores here with signals is that, although semaphores are a
completely general synchronization mechanism, it can be more difficult to solve many problems
using semaphores--mutexes and condition variables are simpler. If you've got Pthreads, you only
need semaphores to handle this one specialized function--waking a waiting thread from a
signal-catching function. Just remember that you can use them for other things when they're
convenient and available.
POSIX semaphores contain a count, but no "owner," so although they can be used essentially
as a lock, they can also be used to wait for events. The terminology used in the POSIX semaphore
operations stresses the "wait" behavior rather than the "lock" behavior. Don't be confused by the
names, though; there's no difference between "waiting" on a semaphore and "locking" the
semaphore.
A thread waits on a semaphore (to lock a resource, or wait for an event) by calling sem_wait.
If the semaphore counter is greater than zero, sem_wait decrements the counter and returns
immediately. Otherwise, the thread blocks. A thread can post a semaphore (to unlock a resource,
or awaken a waiter) by calling sem_post. If one or more threads are waiting on the semaphore,
sem_post will wake one waiter (the highest priority, or earliest, waiter). If no threads are waiting,
the semaphore counter is incremented.
The initial value of the semaphore counter is the distinction between a "lock" semaphore and
a "wait" semaphore. By creating a semaphore with an initial count of 1, you allow one thread to
complete a sem_wait operation without blocking--this "locks" the semaphore. By creating a
semaphore with an initial count of 0, you force all threads that call sem_wait to block until some
thread calls sem_post.
The differences in how semaphores work give the semaphore two important advantages over
mutexes and condition variables that may be of use in threaded programs:
1. Unlike mutexes, semaphores have no concept of an "owner." This means that any thread
may release threads blocked on a semaphore, much as if any thread could unlock a mutex
that some thread had locked. (Although this is usually not a good programming model,
there are times when it is handy.)
2. Unlike condition variables, semaphores can be independent of any external state. Condition
variables depend on a shared predicate and a mutex for waiting--semaphores do not.
A semaphore is represented in your program by a variable of type sem_t. You should never
make a copy of a sem_t variable--the result of using a copy of a sem_t variable in the sem_wait,
sem_trywait, sem_post, and sem_destroy functions is undefined. For our purposes, a sem_t
variable is initialized by calling the sem_init function. POSIX.lb provides other ways to create a
"named" semaphore that can be shared between processes without sharing memory, but there is no
need for this capability when using a semaphore within a single process.
Unlike Pthreads functions, the POSIX semaphore functions use errno to report errors. That is,
success is designated by returning the value 0, and errors are designated by returning the value -1
and setting the variable errno to an error code.
If you have a section of code in which you want up to two threads to execute simultaneously
while others wait, you can use a semaphore without any additional state. Initialize the semaphore
to the value 2; then put a sem_wait at the beginning of the code and a sem_post at the end. Two
threads can then wait on the semaphore without blocking, but a third thread will find the
semaphore's counter at 0, and block. As each thread exits the region of code it posts the semaphore,
releasing one waiter (if any) or restoring the counter.
The sem_getvalue function returns the current value of the semaphore counter if there are no
threads waiting. If threads are waiting, sem_getvalue returns a negative number. The absolute
value of that number tells how many threads are waiting on the semaphore. Keep in mind that the
value it returns may already be incorrect--it can change at any time due to the action of some other
thread.
The best use for sem_getvalue is as a way to wake multiple waiters, somewhat like a
condition variable broadcast. Without sem_getvalue, you have no way of knowing how many
threads might be blocked on a semaphore. To "broadcast” a semaphore, you could call
sem_getvalue and sem_post in a loop until sem_getvalue reports that there are no more waiters.
But remember that other threads can call sem_post during this loop, and there is no
synchronization between the various concurrent calls to sero_post and sem_getvalue. You can
easily end up issuing one or more extra calls to sero_post, which will cause the next thread that
calls sem_wait to find a value greater than 0, and return immediately without blocking.
The program below, semaphore_signal.c, uses a semaphore to awaken threads from within a
POSIX signal-catching function. Notice that the sem_init call sets the initial value to 0 so that
each thread calling sem_wait will block. The main program then requests an interval timer, with a
POSIX signal-catching function that will wake one waiting thread by calling sem_post. Each
occurrence of the POSIX timer signal will awaken one waiting thread. The program will exit when
each thread has been awakened five times.
32-35 Notice the code to check for EINTR return status from the sem_wait call. The POSIX timer
signal in this program will always occur while one or more threads are blocked in sem_wait.
When a signal occurs for a process (such as a timer signal), the system may deliver that signal
within the context of any thread within the process. Likely "victims" include threads that the
kernel knows to be waiting, for example, on a semaphore. So there is a fairly good chance that the
sem_wait thread will be chosen, at least sometimes. If that occurs, the call to sem_wait will return
with EINTR. The thread must then retry the call. Treating an EINTR return as "success" would
make it appear that two threads had been awakened by each call to sem_post: the thread that was
interrupted, and the thread that was awakened by the sem_post call.
semaphore_signal.c
7 "Real code" "When we were still little," the Mock Turtle went on at last, more calmly,
though still sobbing a little now and then, "we went to school in the sea.
The master was an old Turtle--we used to call him Tortoise---"
"Why did you call him Tortoise, if he wasn't one?" Alice asked.
"We called him Tortoise because he taught us," said the
Mock Turtle angrily,
--Lewis Carroll, Alice's Adventures in Wonderland
This section builds on most of the earlier sections of the book, but principally on the mutex
and condition variable sections. You should already understand how to create both types of
synchronization objects and how they work. I will demonstrate the design and construction of
barrier and read/write lock synchronization mechanisms that are built from mutexes, condition
variables, and a dash of data. Both barriers and read/write locks are in common use, and have been
proposed for standardization in the near future. I will follow up with a work queue server that lets
you parcel out tasks to a pool of threads.
The purpose of all this is to teach you more about the subtleties of using all these new
threaded programming tools (that is, mutexes, condition variable and threads). The library
packages may be useful to you as is or as templates. Primarily, though, they are here to give me
something to talk about in this section and I have omitted some complication that may be valuable
in real code. The error detection and recovery code, for example, is fairly primitive.
7.1 Extended synchronization
Mutexes and condition variables are flexible and efficient synchronization tools. You can
build just about any form of synchronization you need using those two things. But you shouldn’t
build them from scratch every time you need them. It is nice to start with a general, modular
implementation that doesn’t need to be debugged every time. This section shows some common
and useful tools that you won’t have to redesign every time you write an application that needs
them.
First we’ll build a barrier. The function of a barrier is about what you might guess—it stops
threads. A barrier is initialized to stop a certain number of threads—when the required number of
threads have reached the barrier, all are allowed to continue.
Then we’ll build something called a read/write lock. A read/write lock allows multiple
threads to read data simultaneously, but prevents any thread from modifying data that is being read
or modified by another thread.
7.1.1 Barriers
A barrier is a way to keep the members of a group together. If our intrepid “bailing
programmers” washed up on a deserted island, for example, and they ventured into the jungle to
explore, they would want to remain together, for the illusion of safety in numbers, if for no other
reason (Figure 7.1). Any exploring programmer finding himself very far in front of the others
would therefore wait for them before continuing.
FIGURE 7.1 Barrier analogy
A barrier is usually employed to ensure that all threads cooperating in some parallel
algorithm reach a specific point in that algorithm before any can pass. This is especially common
in code that has been decomposed automatically by creating fine-grained parallelism within
compiled source code. All threads may execute the same code, with threads processing separate
portions of a shared data set (such as an array) in some areas and processing private data in
parallel in other areas. Still other areas must be executed by only one thread, such as setup or
cleanup for the parallel regions. The boundaries between these areas are often implemented using
barriers. Thus, threads completing a matrix computation may wait at a barrier until all have
finished. One may then perform setup for the next parallel segment while the others skip ahead to
another barrier. When the setup thread reaches that barrier, all threads begin the next parallel
region.
Figure 7.2 shows the operation of a barrier being used to synchronize three threads, called
thread 1, thread 2, and thread 3. The figure is a sort of timing diagram, with time increasing from
left to right. Each of the lines beginning at the labels in the upper left designates the behavior of a
specific thread—solid for thread 1, dotted for thread 2, and dashed for thread 3. When the lines
drop within the rounded rectangle, they are interacting with the barrier. If the line drops below the
center line, it shows that the thread is blocked waiting for other threads to reach the barrier. The
line that stops above the center line represents the final thread to reach the barrier, awakening all
waiters.
In this example, thread 1 and then thread 2 wait on the barrier. At a later time, thread 3 waits
on the barrier, finds that the barrier is now full, and awakens all the waiters. All three threads then
return from the barrier wait. The core of a barrier is a counter. The counter is initialized to the
number of threads in the “tour group,” the number of threads that must wait on a barrier before all
the waiters return. I’ll call that the “threshold,” to give it a simple one-word name. When each
thread reaches the barrier, it decreases the counter. If the value hasn’t reached 0, it waits. If the
value has reached 0, it wakes up the waiting threads.
FIGURE 7.2 Barrier operation
Because the counter will be modified by multiple threads, it has to be protected by a mutex.
Because threads will be waiting for some event (a counter value of 0), the barrier needs to have a
condition variable and a predicate expression. When the counter reaches 0 and the barrier drops
open, we need to reset the counter, which means the barrier must store the original threshold.
The obvious predicate is to simply wait for the counter to reach 0, but that complicates the
process of resetting the barrier. When can we reset the count to the original value? We can’t reset it
when the counter reaches 0, because at that point most of the threads are waiting on the condition
variable. The counter must be 0 when they wake up, or they’ll continue waiting. Remember that
condition variable waits occur in loops that retest the predicate.
The best solution is to add a separate variable for the predicate. We will use a “cycle”
variable that is logically inverted each time some thread determines that one cycle of the barrier is
complete. That is, whenever the counter value is reset, before broadcasting the condition variable,
the thread inverts the cycle flag. Threads wait in a loop as long as the cycle flag remains the same
as the value seen on entry, which means that each thread must save the initial value.
The header file barrier.h and the C source file barrier.c demonstrate an implementation of
barriers using standard Pthreads mutexes and condition variables. This is a portable
implementation that is relatively easy to understand. One could, of course, create a much more
efficient implementation for any specific system based on knowledge of non-portable hardware
and operating system characteristics.
6-13 Part 1 shows the structure of a barrier, represented by the type barrier_t. You can see the
mutex (mutex) and the condition variable (cv). The threshold member is the number of threads in
the group, whereas counter is the number of threads that have yet to join the group at the barrier.
And cycle is the flag discussed in the previous paragraph. It is used to ensure that a thread
awakened from a barrier wait will immediately return to the caller, but will block in the barrier if it
calls the wait operation again before all threads have resumed execution.
15 The BARRIER_VALID macro defines a “magic number,” which we store into the valid
member and then check to determine whether an address passed to other barrier interfaces is
“reasonably likely to be” a barrier. This is an easy, quick check that will catch the most common
errors.*
* I always like to define magic numbers using hexadecimal constants that can be pronounced as
English words. For barriers, I invented my own restaurant called the “DB cafe,” or, in C syntax,
0xdbcafe. Many interesting (or at least mildly amusing) English words can be spelled using only
the letters a through f. There are even more possibilities if you allow the digit 1 to stand in for the
letter l. and the digit 0 to stand in for the letter o. (Whether you like the results will depend a lot on
the typeface in which you commonly read your code.)
barrier.h part 1 barrier_t
Part 2 shows definitions and prototypes that allow you to do something with the barrier_t
structure. First, you will want to initialize a new barrier.
4-6 You can initialize a static barrier at compile time by using the macro BARRIER_
INITIALIZER. You can instead dynamically initialize a barrier by calling the function barrier_init.
11-13 Once you have initialized a barrier, you will want to be able to use it. and the main thing to
be done with a barrier is to wait on it. When we’re done with a barrier, it would be nice to be able
to destroy the barrier and reclaim the resources it used. We’ll call these operations barrier_init,
barrier_wait, and barrier_destroy. All the operations need to specify upon which barrier they will
operate. Because barriers are synchronization objects, and contain both a mutex and a condition
variable (neither of which can be copied), we always pass a pointer to a barrier. Only the
initialization operation requires a second parameter, the number of waiters required before the
barrier opens.
To be consistent with Pthreads conventions, the functions all return an integer value,
representing an error number defined in <errno.h>. The value 0 represents success.
barrier.h part 2 interfaces
Now that you know the interface definition, you could write a program using barriers. But
then, the point of this section is not to tell you how to use barriers, but to help improve your
understanding of threaded programming by showing how to build a barrier. The following
examples show the functions provided by barrier.c, to implement the interfaces we’ve just seen in
barrier.h.
Part 1 shows barrier_init, which you would call to dynamically initialize a barrier, for
example, if you allocate a barrier with malloc.
12 Both the counter and threshold are set to the same value. The counter is the “working
counter” and will be reset to threshold for each barrier cycle.
14-16 If mutex initialization fails, barrier_init returns the failing status to the caller.
17-21 If condition variable (cv) initialization fails, barrier_init destroys the mutex it had already
created and returns the failure status—the status of pthread_mutex_destroy is ignored because the
failure to create the condition variable is more important than the failure to destroy the mutex.
22 The barrier is marked valid only after all initialization is complete. This does not completely
guarantee that another thread erroneously trying to wait on that barrier will detect the invalid
barrier rather than failing in some less easily diagnosable manner, but at least it is a token attempt.
barrier.c part 1 barrier_init
Part 2 shows the barrier_destroy function, which destroys the mutex and condition variable
(cv) in the barrier t structure. If we had allocated any additional resources for the barrier, we
would need to release those resources also.
8-9 First check that the barrier appears to be valid, and initialized, by looking at the valid member.
We don’t lock the mutex first, because that will fail, possibly with something nasty like a
segmentation fault, if the mutex has been destroyed or hasn’t been initialized. Because we do not
lock the mutex first, the validation check is not entirely reliable, but it is better than nothing, and
will only fail to detect some race conditions where one thread attempts to destroy a barrier while
another is initializing it, or where two threads attempt to destroy a barrier at nearly the same time.
19-22 If any thread is currently waiting on the barrier, return EBUSY.
24-27 At this point, the barrier is “destroyed”—all that’s left is cleanup. To minimize the chances of
confusing errors should another thread try to wait on the barrier before we’re done, mark the
barrier “not valid” by clearing valid, before changing any other state. Then, unlock the mutex,
since we cannot destroy it while is locked.
33-35 Destroy the mutex and condition variable. If the mutex destruction fails return the status;
otherwise, return the status of the condition variable destruction. Or, to put it another way, return
an error status if either destruction failed otherwise, return success.
barrier.c part 2 barrier_destroy
Finally, part 3 shows the implementation of barrier_wait.
10-11 First we verify that the argument barrier appears to be a valid barrier_t. We perform this
check before locking the mutex, so that barrier_destroy can safely destroy the mutex once it has
cleared the valid member. This is a simple attempt to minimize the damage if one thread attempts
to wait on a barrier while another thread is simultaneously either initializing or destroying that
barrier.
We cannot entirely avoid problems, since without the mutex, barrier_wait has no guarantee
that it will see the correct (up-to-date) value of valid. The valid check may succeed when the
barrier is being made invalid, or fail when the barrier is being made valid. Locking the mutex first
would do no good, because the mutex may not exist if the barrier is not fully initialized, or if it is
being destroyed. This isn’t a problem as long as you use the barrier correctly—that is, you
initialize the barrier before any thread can possibly try to use it, and do not destroy the barrier until
you are sure no thread will try to use it again.
17 Copy the current value of the barrier’s cycle into a local variable. The comparison of our
local cycle against the barrier_t structure’s cycle member becomes our condition wait predicate.
The predicate ensures that all currently waiting threads will return from barrier_wait when the last
waiter broadcasts the condition variable, but that any thread that calls barrier_wait again will wait
for the next broadcast. (This is the “tricky part” of correctly implementing a barrier.)
19-22 Now we decrease counter, which is the number of threads that are required but haven’t yet
waited on the barrier. When counter reaches 0, no more threads are needed—they’re all here and
waiting anxiously to continue to the next attraction. Now all we need to do is tell them to wake up.
We advance to the next cycle, reset the counter, and broadcast the barrier’s condition variable.
28-29 Earlier, I mentioned that a program often needs one thread to perform some cleanup or setup
between parallel regions. Each thread could lock a mutex and check a flag so that only one thread
would perform the setup. However, the setup may not need additional synchronization, for
example, because the other threads will wait at a barrier for the next parallel region, and, in that
case, it would be nice to avoid locking an extra mutex.
The barrier_wait function has this capability built into it. One and only one thread will return
with the special value of -1 while the others return 0. In this particular implementation, the one
that waits last and wakes the others will take the honor, but in principle it is “unspecified” which
thread returns -1. The thread that receives -1 can perform the setup, while others race ahead. If you
do not need the special return status, treat — 1 as another form of success. The proposed POSIX.lj
standard has a similar capability—one (unspecified) thread completing a barrier will return the
status BARRIER_SERIAL_THREAD.
95-� Any threaded code that uses condition variables should always either support deferred
cancellation or disable cancellation. Remember that there are two distinct types of cancellation:
deferred and asynchronous. Code that deals with asynchronous cancellation is rare. In general it
is difficult or impossible to support asynchronous cancellation in any code that acquires resources
(including locking a mutex). Programmers can’t assume any function supports asynchronous
cancellation unless its documentation specifically says so. Therefore we do not need to worry
about asynchronous cancellation.
We could code barrier_wait to deal with deferred cancellation, but that raises difficult
questions. How, for example, will the barrier wait ever be satisfied if one of the threads has been
canceled? And if it won’t be satisfied, what happens to all the other threads that have already
waited (or are about to wait) on that barrier? There are various ways to answer these questions.
One would be for barrier_wait to record the thread identifiers of all threads waiting on the barrier,
and for any thread that’s canceled within the wait to cancel all other waiters.
Or we might handle cancellation by setting some special error flag and broadcasting the
condition variable, and modifying barrier wait to return a special error when awakened in that way.
However, it makes little sense to cancel one thread that’s using a barrier. We’re going to disallow it,
by disabling cancellation prior to the wait, and restoring the previous state of cancellation
afterward. This is the same approach taken by the proposed POSIX.lj standard, by the
way—barrier waits are not cancellation points.
95-103 If there are more threads that haven’t reached the barrier, we need to wait for them. We do
that by waiting on the condition variable until the barrier has advanced to the next cycle—that is,
the barrier’s cycle no longer matches the local copy.
barrier.c part 3 barrier_wait
Finally, barrier_main.c is a simple program that uses barriers. Each thread loops on
calculations within a private array.
35,47 At the beginning and end of each iteration, the threads, running function thread_routine, all
wait on a barrier to synchronize the operation.
56-61 At the end of each iteration, the “lead thread” (the one receiving a -1 result from barrier_wait)
will modify the data of all threads, preparing them for the next iteration. The others go directly to
the top of the loop and wait on the barrier at line 35.
barrier_main.c
7.1.2 Read/write locks
A read/write lock is a lot like a mutex. It is another way to prevent more than one thread from
modifying shared data at the same time. But unlike a mutex it distinguishes between reading data
and writing data. A mutex excludes all other threads, while a read/write lock allows more than one
thread to read the data, as long as none of them needs to change it.
Read/write locks are used to protect information that you need to read frequently but usually
don’t need to modify. For example, when you build a cache of recently accessed information,
many threads may simultaneously examine the cache without conflict. When a thread needs to
update the cache, it must have exclusive access.
When a thread locks a read/write lock, it chooses shared read access or exclusive write access.
A thread that wants read access can’t continue while any thread currently has write access. A
thread trying to gain write access can’t continue when another thread currently has either write
access or read access.
When both readers and writers are waiting for access at the same time, the readers are given
precedence when the write lock is released. Read precedence favors concurrency because it
potentially allows many threads to accomplish work simultaneously. Write precedence on the
other hand would ensure that pending modifications to the shared data are completed before the
data is used. There’s no absolute right or wrong policy, and if you don’t find the implementation
here appropriate for you, it is easy to change.
Figure 7.3 shows the operation of a read/write lock being used to synchronize three threads,
called thread 1, thread 2, and thread 3. The figure is a sort of timing diagram, with time increasing
from left to right. Each of the lines beginning at the labels in the upper left designates the behavior
of a specific thread—solid for thread 1, dotted for thread 2, and dashed for thread 3. When the
lines drop within the rounded rectangle, they are interacting with the read/write lock. If the line
drops below the center line, it shows that the thread has the read/write lock locked, either for
exclusive write or for shared read. Lines that hover above the center line represent threads waiting
for the lock.
FIGURE 7.3 Read/write lock operation
In this example, thread 1 locks the read/write lock for exclusive write. Thread 2 tries to lock
the read/write lock for shared read and, finding it already locked for exclusive write, blocks. When
thread 1 releases the lock, it awakens thread 2, which then succeeds in locking the read/write lock
for shared read. Thread 3 then tries to lock the read/write lock for shared read and, because the
read/write lock is already locked for shared read, it succeeds immediately. Thread 1 then tries to
lock the read/write lock again for exclusive write access, and blocks because the read/write lock is
already locked for read access. When thread 3 unlocks the read/write lock, it cannot awaken thread
1, because there is another reader. Only when thread 2 also unlocks the read/write lock, and the
lock becomes unlocked, can thread 1 be awakened to lock the read/write lock for exclusive write
access.
The header file rwlock.h and the C source file rwlock.c demonstrate an implementation of
read/write locks using standard Pthreads mutexes and condition variables. This is a portable
implementation that is relatively easy to understand. One could, of course, create a much more
efficient implementation for any specific system based on knowledge of non-portable hardware
and operating system characteristics.
The rest of this section shows the details of a read/write lock package. First, rwlock.h
describes the interfaces, and then rwlock.c provides the implementation. Part 1 shows the structure
of a read/write lock, represented by the type rwlock_t.
7-9 Of course, there’s a mutex to serialize access to the structure. We’ll use two separate condition
variables, one to wait for read access (called read) and one to wait for write access (called,
surprisingly, write).
10 The rwlock_t structure has a valid member to easily detect common usage errors, such as
trying to lock a read/write lock that hasn’t been initialized. The member is set to a magic number
when the read/write lock is initialized, just as in barrier_init.
11-12 To enable us to determine whether either condition variable has waiters, we’ll keep a count of
active readers (r_active) and a flag to indicate an active writer (w_active).
13-14 We also keep a count of the number of threads waiting for read access (r_wait) and for write
access (w_wait).
17 Finally, we need a “magic number” for our valid member. (See the footnote in Section 7.1.1
if you missed this part of the barrier example.)
rwlock.h part 1 rwlock_t
We could have saved some space and simplified the code by using a single condition variable,
with readers and writers waiting using separate predicate expressions. We will use one condition
variable for each predicate, because it is more efficient. This is a common trade-off. The main
consideration is that when two predicates share a condition variable, you must always wake them
using pthread_cond_broadcast, which would mean waking all waiters each time the read/write
lock is unlocked.
We keep track of a boolean variable for “writer active,” since there can only be one. There
are also counters for “readers active,” “readers waiting,” and “writers waiting.” We could get by
without counters for readers and writers waiting. All readers are awakened simultaneously using a
broadcast, so it doesn’t matter how many there are. Writers are awakened only if there are no
readers, so we could dispense with keeping track of whether there are any threads waiting to write
(at the cost of an occasional wasted condition variable signal when there are no waiters).
We count the number of threads waiting for read access because the condition variable waits
might be canceled. Without cancellation, we could use a simple flag—“threads are waiting for
read” or “no threads are waiting for read.” Each thread could set it before waiting, and we could
clear it before broadcasting to wake all waiting readers. However, because we can’t count the
threads waiting on a condition variable, we wouldn’t know whether to clear that flag when a
waiting reader was canceled. This information is critical, because if there are no readers waiting
when the read/write lock is unlocked, we must wake a writer—but we cannot wake a writer if
there are waiting readers. A count of waiting readers, which we can decrease when a waiter is
canceled, solves the problem.
The consequences of “getting it wrong” are less important for writers than for readers.
Because we check for readers first, we don’t really need to know whether there are writers. We
could signal a “potential writer” anytime the read/write lock was released with no waiting readers.
But counting waiting writers allows us to avoid a condition variable signal when no threads are
waiting.
Part 2 shows the rest of the definitions and the function prototypes.
4-6 The RWLOCK_INITIALIZER macro allows you to statically initialize a read/write lock.
11-18 Of course, you must also be able to initialize a read/write lock that you cannot allocate
statically, so we provide rwl_init to initialize dynamically, and rwl_destroy to destroy a read/write
lock once you’re done with it. In addition, there are functions to lock and unlock the read/write
lock for either read or write access. You can “try to lock” a read/write lock, either for read or write
access, by calling rwl_readtrylock or rwl_writetrylock., just as you can try to lock a mutex by
calling pthread_mutex_trylock.
rwlock.h part 2 interfaces
The file rwlock.c contains the implementation of read/write locks. The following examples
break down each of the functions used to implement the rwlock.h interfaces.
Part 1 shows rwl_init, which initializes a read/write lock. It initializes the Pthreads
synchronization objects, initializes the counters and flags, and finally sets the valid sentinel to
make the read/write lock recognizable to the other interfaces. If we are unable to initialize the read
condition variable, we destroy the mutex that we’d already created. Similarly, if we are unable to
initialize the write condition variable, we destroy both the mutex and the read condition variable.
rwlock.c part 1 rwl_init
Part 2 shows the rwl_destroy function, which destroys a read/write lock.
8-9 We first try to verify that the read/write lock was properly initialized by checking the valid
member. This is not a complete protection against incorrect usage, but it is cheap, and it will catch
some of the most common errors. See the annotation for barrier.c, part 2, for more about how the
valid member is used.
10-30 Check whether the read/write lock is in use. We look for threads that are using or waiting for
either read or write access. Using two separate if statements makes the test slightly more readable,
though there’s no other benefit.
36-39 As in barrier_destroy, we destroy all Pthreads synchronization objects, and store each status
return. If any of the destruction calls fails, returning a nonzero value, rwl_destroy will return that
status, and if they all succeed it will return 0 for success.
rwlock.c part 2 rwl_destroy
Part 3 shows the code for rwl_readcleanup and rwl_writecleanup, two cancellation cleanup
handlers used in locking the read/write lock for read and write access, respectively. As you may
infer from this, read/write locks, unlike barriers, are cancellation points. When a wait is canceled,
the waiter needs to decrease the count of threads waiting for either a read or write lock, as
appropriate, and unlock the mutex.
rwlock.c part 3 cleanuphandlers
Part 4 shows rwl_readlock, which locks a read/write lock for read access. If a writer is
currently active (w_active is nonzero), we wait for it to broadcast the read condition variable. The
r_wait member counts the number of threads waiting to read. This could be a simple boolean
variable, except for one problem—when a waiter is canceled, we need to know whether there are
any remaining waiters. Maintaining a count makes this easy, since the cleanup handler only needs
to decrease the count.
This is one of the places where the code must be changed to convert our read/write lock from
“reader preference” to “writer preference,” should you choose to do that. To implement writer
preference, a reader must block while there are waiting writers (w_wait > 0), not merely while
there are active writers, as we do here.
15,21 Notice the use of the cleanup handler around the condition wait. Also, notice that we pass the
argument 0 to pthread_cleanup_pop so that the cleanup code is called only if the wait is canceled.
We need to perform slightly different actions when the wait is not canceled. If the wait is not
canceled, we need to increase the count of active readers before unlocking the mutex.
rwlock.c part 4 rwl_readlock
Part 5 shows rwl_readtrylock. This function is nearly identical to rwl_readlock, except that,
instead of waiting for access if a writer is active, it returns EBUSY. It doesn’t need a cleanup
handler, and has no need to increase the count of waiting readers.
This function must also be modified to implement “writer preference” read/write locks, by
returning EBUSY when a writer is waiting, not just when a writer is active.
rwlock.c part 5 rwl_readtrylock
13 Part 6 shows rwl_readunlock. This function essentially reverses the effect of rwl_readlock or
rwl_tryreadlock, by decreasing the count of active readers (r_active).
14-15 If there are no more active readers, and at least one thread is waiting for write access, signal
the write condition variable to unblock one. Note that there is a race here, and whether you should
be concerned about it depends on your notion of what should happen. If another thread that is
interested in read access calls rwl_readlock or rwl_tryreadlock before the awakened writer can run,
the reader may “win,” despite the fact that we just selected a writer.
Because our version of read/write locks has “reader preference,” this is what we usually want
to happen—the writer will determine that it has failed and will resume waiting. (It received a
spurious wakeup.) If the implementation changes to prefer writers, the spurious wakeup will not
occur, because the potential reader would have to block. The waiter we just unblocked cannot
decrease w_wait until it actually claims the lock.
�r lock.c part 6 rwl_readunlock
13 Part 7 shows rwl_writelock. This function is much like rwl_readlock, except for the predicate
condition on the condition variable wait. In part l, I explained that, to convert from “preferred
read” to ‘preferred write,” a potential reader would have to wait until there were no active or
waiting writers, whereas currently it waits only for active writers. The predicate in rwl_writelock
is the converse of that condition. Because we support “preferred read,” in theory, we must wait
here if there are any active or waiting readers. In fact, it is a bit simpler because if there are any
active readers, there cannot be any waiting readers—the whole point of a read/write lock is that
multiple threads can have read access at the same time. On the other hand, we do have to wait if
there are any active writers, because we allow only one writer at a time.
25 Unlike r_active, which is a counter, w_active is treated as a boolean. Or is it a counter?
There’s really no semantic difference, since the value of 1 can be con- sidered a boolean TRUE or
a count of 1—there can be only one active writer at any time.
rwlock.c part 7 rwl_ writelock
Part 8 shows rwl_writetrylock. This function is much like rwl_writelock, except that it
returns EBUSY if the read/wri �te lock is currently in us (either by a reader or by a writer) rather
than waiting for it to become free.
rwlock.c part 8 rwl_writetrylock
Finally, part 9 shows rwl_writeunlock. This function is called by a thread with a write lock, to
release the lock.
13-19 When a writer releases the read/write lock, it is always free; if there are any threads waiting
for access, we must wake one. Because we implement “preferred read” access, we first look for
threads that are waiting for read access. If there are any, we broadcast the read condition variable
to wake them all.
20-26 If there were no waiting readers, but there are one or more waiting writers, wake one of them
by signaling the write condition variable.
To implement a “preferred write” lock, you would reverse the two tests, waking a waiting
writer, if any, before looking for waiting readers.
rwlock.c part 9 rwl_writeunlock
Now that we have all the pieces, rwlock_main.c shows a program that uses read/write locks.
11-17 Each thread is described by a structure of type thread_t. The thread_num member is the
thread’s index within the array of thread t structures. The thread_id member is the pthread_t
(thread identifier) returned by pthread_create when the thread was created. The updates and reads
members are counts of the number of read lock and write lock operations performed by the thread.
The interval member is generated randomly as each thread is created, to determine how many
iterations the thread will read before it performs a write.
22-26 The threads cycle through an array of data_t elements. Each element has a read/write lock, a
data element, and a count of how many times some thread has updated the element.
48-56 The program creates a set of threads running the thread_routine function. Each thread loops
ITERATIONS times, practicing use of the read/write lock. It cycles through the array of data
elements in sequence, resetting the index (element) to 0 when it reaches the end. At intervals
specified by each thread’s interval member, the thread will modify the current data element instead
of reading it. The thread locks the read/write lock for write access, stores its thread_num as the
new data value, and increases the updates counter.
59-73 On all other iterations, thread_routine reads the current data element, locking the read/write
lock for read access. It compares the data value against its thread_num to determine whether it
was the most recent thread to update that data element, and, if so, it increments a counter.
95-103 On Solaris systems, increase the thread concurrency level to generate more interesting
activity. Without time-slicing of user threads, each thread would tend to execute sequentially
otherwise.
rwlock_main.c
7.2 Work queue manager
I’ve already briefly outlined the various models of thread cooperation. These include
pipelines, work crews, client/servers, and so forth. In this section, I present the development of a
“work queue,” a set of threads that accepts work requests from a common queue, processing them
(potentially) in parallel.
The work queue manager could also be considered a work crew manager, depending on your
reference point. If you think of it as a way to feed work to a set of threads, then “work crew”
might be more appropriate. I prefer to think of it as a queue that magically does work for you in
the background, since the presence of the work crew is almost completely invisible to the caller.
When you create the work queue, you can specify the maximum level of parallelism that you
need. The work queue manager interprets that as the maximum number of “engine” threads that it
may create to process your requests. Threads will be started and stopped as required by the
amount of work. A thread that finds nothing to do will wait a short time and then terminate. The
optimal “short time” depends on how expensive it is to create a new thread on your system, the
cost in system resources to keep a thread going that’s not doing anything, and how likely it is that
you’ll need the thread again soon. I’ve chosen two seconds, which is probably much too long.
The header file workq.h and the C source file workq.c demonstrate an implementation of a
work queue manager. Part 1 shows the two structure types used by the work queue package. The
workq_t type is the external representation of a work queue, and the workq_ele_t is an internal
representation of work items that have been queued.
6-9 The workq_ele_t structure is used to maintain a linked list of work items. It has a link
element (called next) and a data value, which is stored when the work item is queued and passed
to the caller's "engine function" with no interpretation.
14-16 Of course, there's a mutex to serialize access to the workq_t, and a condition variable (cv) on
which the engine threads wait for work to be queued.
17 The attr member is a thread attributes object, used when creating new engine threads. The
attributes object could instead have been a static variable within workq.c, but I chose to add a little
memory overhead to each work queue, rather than add the minor complexity of one-time
initialization of a static data item.
18 The first member points to the first item on the work queue. As an optimization to make it
easier to queue new items at the end of the queue, the last member points to the last item on the
queue.
19-24 These members record assorted information about the work queue. The valid member is a
magic number that's set when the work queue is initialized, as we've seen before in barriers and
read/write locks. (In this case, the magic number is the month and year of my daughter's birthday.)
The quit member is a flag that allows the "work queue manager" to tell engine threads to terminate
as soon as the queue is empty. The parallelism member records how many threads the creator
chose to allow the work queue to utilize, counter records the number of threads created, and idle
records the current number of threads that are waiting for work. The engine member is the user's
"engine function," supplied when the work queue was created. As you can see, the engine function
takes an "untyped" (void *) argument, and has no return value.
workq.h part 1 workq_t
Part 2 shows the interfaces we'll create for our work queue. We need to create and destroy
work queue managers, so we'll define workq_init and workq_destroy. Both take a pointer to a
workq_t structure. In addition, the initializer needs the maximum number of threads the manager
is allowed to create to service the queue, and the engine function. Finally, the program needs to be
able to queue work items for processing--we'll call the interface for this workq_add. It takes a
pointer to the workq_t and the argument that should be passed to the engine function.
workq.h part 2 interfaces
The file workq.c contains the implementation of our work queue. The following examples
break down each of the functions used to implement the workq.h interfaces.
Part 1 shows the workq_init function, which initializes a work queue. We create the Pthreads
synchronization objects that we need, and fill in the remaining members.
14-22 Initialize the thread attributes object attr so that the engine threads we create will run
detached. That means we do not need to keep track of their thread identifier values, or worry about
joining with them.
34-40 We're not ready to quit yet (we've hardly started!), so clear the quit flag. The parallelism
member records the maximum number of threads we are allowed to create, which is the
workq_init parameter threads. The counter member will record the current number of active
engine threads, initially 0, and idle will record the number of active threads waiting for more work.
And of course, finally, we set the valid member.
workq.c part 1 workq_init
Part 2 shows the workq_destroy function. The procedure for shutting down a work queue is a
little different than the others we've seen. Remember that the Pthreads mutex and condition
variable destroy function fail, returning EBUSY, when you try to destroy an object that is in use.
We used the same model for barriers and read/write locks. But we cannot do the same for work
queues--the calling program cannot know whether the work queue is in use, because the caller
only queues requests that are processed asynchronously.
The work queue manager will accept a request to shut down at any time, but it will wait for
all existing engine threads to complete their work and terminate. Only when the last work queue
element has been processed and the last engine thread has exited will workq_destroy return
successfully.
24 If the work queue has no threads, either it was never used or all threads have timed out and
shut down since it was last used. That makes things easy, and we can skip all the shutdown
complication.
25-33 If there are engine threads, they are asked to shut down by setting the quit flag in the workq_t
structure and broadcasting the condition variable to awaken any waiting (idle) engine threads.
Each engine thread will eventually run and see this flag. When they see it and find no more work,
they'll shut themselves down.
44-50 The last thread to shut down will wake up the thread that's waiting in workq_destroy, and the
shutdown will complete. Instead of creating a condition variable that's used only to wake up
workq_destroy, the last thread will signal the same condition variable used to inform idle engine
threads of new work. At this point, all waiters have already been awakened by a broadcast, and
they won't wait again because the quit flag is set. Shutdown occurs only once during the life of the
work queue manager, so there's little point to creating a separate condition variable for this
purpose.
workq.c part 2 workq_destroy
Part 3 shows workq_add, which accepts work for the queue manager system.
16-35 It allocates a new work queue element and initializes it from the parameters. It queues the
element, updating the first and last pointers as necessary.
40-45 If there are idle engine threads, which were created but ran out of work, signal the condition
variable to wake one.
46-59 If there are no idle engine threads, and the value of parallelism allows for more, create a new
engine thread. If there are no idle threads and it can't create a new engine thread, workq_add
returns, leaving the new element for the next thread that finishes its current assignment.
workq.c part 3 workq_add
That takes care of all the external interfaces, but we will need one more function, the start
function for the engine threads. The function, shown in part 4, is called workq_server. Although
we could start a thread running the caller's engine with the appropriate argument for each request,
this is more efficient. The workq_server function will dequeue the next request and pass it to the
engine function, then look for new work. It will wait if necessary and shut down only when a
certain period of time passes without any new work appearing, or when told to shut down by
workq_destroy.
Notice that the server begins by locking the work queue mutex, and the "matching" unlock
does not occur until the engine thread is ready to terminate. Despite this, the thread spends most of
its life with the mutex unlocked, either waiting for work in the condition variable wait or within
the caller's engine function.
29-62 When a thread completes the condition wait loop, either there is work to be done or the work
queue is shutting down (wq->quit is nonzero).
67-80 First, we check for work and process the work queue element if there is one. There could still
be work queued when workq_destroy is called, and it must all be processed before any engine
thread terminates.
The user's engine function is called with the mutex unlocked, so that the user's engine can run
a long time, or block, without affecting the execution of other engine threads. That does not
necessarily mean that engine functions can run in parallel--the caller-supplied engine function is
responsible for ensuring whatever synchronization is needed to allow the desired level of
concurrency or parallelism. Ideal engine functions would require little or no synchronization and
would run in parallel.
86-104 When there is no more work and the queue is being shut down, the thread terminates,
awakening workq_destroy if this was the last engine thread to shut down.
110-114 Finally we check whether the engine thread timed out looking for work, which would mean
the engine has waited long enough. If there's still no work to be found, the engine thread exits.
workq.c part 4 workq_server
Finally, workq_main.c is a sample program that uses our work queue manager. Two threads
queue work elements to the work queue in parallel. The engine function is designed to gather
some statistics about engine usage. To accomplish this, it uses thread-specific data. When the
sample run completes, main collects all of the thread-specific data and reports some statistics.
15-19 Each engine thread has an engine_t structure associated with the thread-specific data key
engine_key. The engine function gets the calling thread's value of this key, and if the current value
is NULL, creates a new engine t structure and assigns it to the key. The calls member of engine_t
structure-records the number of calls to the engine function within each thread.
29-37 The thread-specific data key's destructor function, destructor, adds the terminating thread's
engine_t to a list (engine_list_head), where main can find it later to generate the final report.
43-68 The engine function's work is relatively boring. The argument is a pointer to a power_t
structure, containing the members value and power. It uses a trivial loop to multiply value by itself
power times. The result is discarded in this example, and the power_t structure is freed.
73-98 A thread is started, by main, running the thread_routine function. In addition, main calls
thread_routine. The thread_routine function loops for some number of iterations, determined by
the macro ITERATIONS, creating and queuing work queue elements. The value and power
members of the power t structure are determined semi-randomly using rand_r. The function sleeps
for a random period of time, from zero to four seconds, to occasionally allow engine threads to
time out and terminate. Typically when you run this program you would expect to see summary
messages reporting some small number of engine threads, each of which processed some number
of calls--which total 50 calls (25 each from the two threads).
workq_main. c
7.3 But what about existing libraries? "The great art of riding, as I was saying is--
to keep your balance properly. Like this, you know--"
He let go the bridle, and stretched out both his arms to
show Alice what he meant, and this time he fell flat on
his back, right under the horse's feet.
--Lewis Carroll, Through the Looking-Glass
When you create a new library, all it takes is careful design to ensure that the library will be
thread-safe. As you decide what state is needed for the function, you can determine which state
needs to be shared between threads, which state should be managed by the caller through external
context handles, which state can be kept in local variables within a function, and so forth. You can
define the interfaces to the functions to support that state in the most efficient manner. But when
you're modifying an existing library to work with threads, you usually don't have that luxury. And
when you are using someone else's library, you may need simply to "make do."
7.3.1 Modifying libraries to be thread-safe
Many functions rely on static storage across a sequence of calls, for example, strtok or
getpwd. Others depend on returning a pointer to static storage, for example, asctime. This section
points out some techniques that can help when you need to make "legacy" libraries thread-safe,
using some well-known examples in the ANSI C run-time library.
The simplest technique is to assign a mutex to each subsystem. At any call into the subsystem
you lock the mutex; at any exit from the subsystem you unlock the mutex. Because this single
mutex covers the entire subsystem, we often refer to such a mechanism as a "big mutex" (see
Section 3.2.4). The mutex prevents more than one thread from executing within the subsystem at a
time. Note that this fixes only synchronization races, not sequence races (Section 8.1.2 describes
the distinction between the two). The best candidates for this approach are functions that do little
except maintain some internal database. That includes functions such as malloc and free that
manage an internal resource pool but grant limited (or no) external visibility into that pool.
One problem with using the "big mutex" approach is that you have to be careful about your
definition of "subsystem." You need to include all functions that share data or that call each other.
If malloc and free have one mutex while realloc uses another, then you've got a race as soon as
one thread calls realloc while another thread is in malloc or free.
And what if realloc is implemented to call malloc, copy data, and then call free on the old
pointer? The realloc function would lock the heap mutex and call malloc. The malloc function
would immediately try to lock the heap mutex itself, resulting in a deadlock. There are several
ways to solve this. One is to carefully separate each of the external interfaces into an internal
"engine" function that does the actual work and an external entry point that locks the subsystem
mutex and calls the engine. Other entry points within the subsystem that need the same engine
function would call it directly rather than using the normal entry point. That's often the most
efficient solution, but it is also harder to do. Another possibility is to construct a "recursive" mutex
that allows the subsystem to relock its own mutex without deadlock.* Now malloc and free are
allowed to relock the mutex held by realloc? but another thread trying to call any of them will be
blocked until realloc completely unlocks the recursive mutex.
* It is easy to construct a "recursive" mutex using a mutex, a condition variable, the pthread_t
value of the current owner (if any), and a count of the owner's "recursion depth." The depth is 0
when the recursive mutex is not locked, and greater than 0 when it is locked. The mutex protects
access to the depth and owner members, and the condition variable is used to wait for the depth to
become 0, should a thread wish to lock the recursive mutex while another thread has it locked.
Most functions with persistent state require more substantial changes than just a "big mutex,"
especially to avoid altering the interface. The asctime function, for example, returns a pointer to
the character string representation of a binary time. Traditionally, the string is formatted into a
static buffer declared within the asctime function, and the function returns a pointer to that buffer.
Locking a mutex within asctime isn't enough to protect the data. In fact, it is not even
particularly useful. After asctime returns, the mutex has been unlocked. The caller needs to read
the buffer, and there is nothing to prevent another thread from calling asctime (and "corrupting"
the first thread's result) before the first thread has finished reading or copying it. To solve this
problem using a mutex, the caller would need to lock a mutex before calling asctime, and then
unlock it only after it had finished with the data or copied the returned buffer somewhere "safe."
The problem can instead be fixed by recoding asctime to allocate a heap buffer using malloc,
formatting the time string into that buffer, and returning its address. The function can use a
thread-specific data key to keep track of the heap address so that it can be reused on the next call
within that thread. When the thread terminates, a destructor function can free the storage.
It would be more efficient to avoid using malloc and thread-specific data, but that requires
changing the interface to asctime. Pthreads adds a new thread-safe alternative to asctime, called
asctime_r, which requires the caller to pass the address and length of a buffer. The asctime_r
function formats the time string into the caller's buffer. This allows the caller to manage the buffer
in any way that's convenient. It can be on the thread's stack, in heap, or can even be shared
between threads. Although in a way this is "giving up" on the existing function and defining a new
function, it is often the best way (and sometimes the only practical way) to make a function
thread-safe.
7.3.2 Living with legacy libraries Sometimes you have to work with code you didn't write, and can't change. A lot of code is
now being made thread-safe, and most operating systems that support threads can be expected to
supply thread-safe implementations of the common bundled library packages. The "inner circle"
of thread-safe libraries will gradually increase to become the rule rather than the exception as
more application and library developers demand thread-safety.
But inevitably you'll find that a library you need hasn't been made thread-safe, for example,
an older version of the X Windows windowing system, or a database engine, or a simulation
package. And you won't have source code. Of course you'll immediately complain to the supplier
of the library and convince them to make the next version fully thread-safe. But what can you do
until the new version arrives?
If you really need the library, the answer is "use it anyway." There are a number of techniques
you can use, from simple to complex. The appropriate level of complexity required depends
entirely on the library's interface and how (as well as how much) you use the library in your code.
| Make the unsafe library into a server thread.
In some cases, you may find it convenient to restrict use of the library to one thread, making
that thread a “server" for the capabilities provided by the unsafe library. This technique is
commonly applied, for example, when using versions of the X11 protocol client library that are
not thread-safe. The main thread or some other thread created for the purpose processes queued
X11 requests on behalf of other threads. Only the server thread makes calls into the X11 library, so
it does not matter whether X11 is thread-safe.
| Write your own "big mutex" wrappers around the interfaces.
If the function you need has a "thread-safe interface" but not a "thread-safe implementation,"
then you may be able to encapsulate each call inside a wrapper function (or a macro) that locks a
mutex, calls the function, and then unlocks the mutex. This is just an external version of the "big
mutex" approach. By “thread-safe interface" I mean that the function relies on the static state, but
that any data returned to the caller isn't subject to alteration by later calls. For example, malloc fits
that category. The allocation of memory involves static data that needs to be protected, but once a
block has been allocated and returned to a caller, that address (and the memory to which it points)
will not be affected by later calls to malloc. The external "big mutex" is not a good solution for
libraries that may block for substantial periods of time---like X11 or any other network protocol.
While the result may be safe, it will be very inefficient unless you rarely use the library, because
other threads may be locked out for long periods of time while remote operations are taking place.
| Extend the implementation with external state.
A big mutex won't fix a function like asctime that writes data into a static buffer and returns
the address: The returned data must be protected until the caller is finished using it, and the data is
used outside the wrapper. For a function like strtok the data is in use until the entire sequence of
tokens has been parsed. In general, functions that have persistent static data are more difficult to
encapsulate.
A function like as asctime can be encapsulated by creating a wrapper function that locks a
mutex, calls the function, copies the return value into a thread-safe buffer, unlocks the mutex, and
then returns. The thread-safe buffer can be dynamically allocated by the wrapper function using
realloc, for instance. You can require the caller to free the buffer when done, which changes the
interface, or you can make the wrapper keep track of a per-thread buffer using thread-specific
data.
Alternatively, you could invent a new interface that requires the caller to supply a buffer. The
caller can use a stack buffer, or a buffer in heap, or, if properly synchronized (by the caller), it can
share the buffer between threads. Remember that if the wrapper uses thread-specific data to keep
track of a per-thread heap buffer, the wrapper can be made compatible with the original interface.
The other variants require interface changes: The caller must supply different inputs or it must be
aware of the need to free the returned buffer.
A function that keeps persistent state across a sequence of calls is more difficult to
encapsulate neatly. The static data must be protected throughout. The easiest way to do this is
simply to change the caller to lock a mutex before the first call and keep it locked until after the
final call of a sequence. But remember that no other thread can use the function until the mutex is
unlocked. If the caller does a substantial amount of processing between calls, a major processing
bottleneck can occur. Of course, this may also be difficult or impossible to integrate into a simple
wrapper--the wrapper would have to be able to recognize the first and last of any series of calls.
A better, but harder, way is to find some way to encapsulate the function (or a set of related
functions) into a new thread-safe interface. There is no general model for this transformation, and
in many cases it may be impossible. But often you just need to be creative, and possibly apply
some constraints. While the library function may not be easy to encapsulate, you may be able to
encapsulate "special cases" that you use. While strtok, for example, allows you to alter the token
delimiters at each call, most code does not take advantage of this flexibility. Without the
complication of varying delimiters, you could define a new token parsing model on top of strtok
where all tokens in a string are found by a thread-safe setup function and stored where they can be
retrieved one by one without calling strtok again. Thus, while the setup function would lock a
common mutex and serialize access across all threads, the information retrieval function could run
without any serialization.
8 Hints to avoid debugging "Other maps are such shapes, with their islands and capes!
But we've got our brave Captain to thank"
(So the crew would protest) "that he's bought us the best m
A perfect and absolute blank!"
--Lewis Carroll, The Hunting of the Snark
Writing a complicated threaded program is a lot harder than writing a simple synchronous
program, but once you learn the rules it is not much harder than writing a complicated
synchronous program. Writing a threaded program to perform a complicated asynchronous
function will usually be easier than writing the same program using more traditional asynchronous
programming techniques.
The complications begin when you need to debug or analyze your threaded program. That's
not so much because using threads is hard, but rather because the tools for debugging and
analyzing threaded code are less well developed and understood than the programming interfaces.
You may feel as if you are navigating from a blank map. That doesn't mean you can't utilize the
power of threaded programming right now, but it does mean that you need to be careful, and
maybe a little more creative, in avoiding the rocks and shoals of the uncharted waters.
Although this chapter mentions some thread debugging and analysis tools and suggests what
you can accomplish with them, my goal isn't to tell you about tools you can use to solve problems.
Instead, I will describe some of the common problems that you may encounter and impart
something resembling "sage advice" on avoiding those problems before you have to debug
them--or, perhaps more realistically, how to recognize which problems you may be encountering.
| Check your assumptions at the door.
Threaded programming is probably new to you. Asynchronous programming may be new to
you. If so, you'll need to be careful about your assumptions. You've crossed a bridge, and behavior
that's acceptable--or even required--in Synchronous Land can be dangerous across the river in
Asynchronous Land. You can learn the new rules without a lot of trouble, and with practice you'll
probably even feel comfortable with them. But you have to start by being constantly aware that
something's changed.
8.1 Avoiding incorrect code
"For instance, now," she went on, sticking a large piece of plaster on her fin-
get as she spoke, "there's the King's Messenger. He's in prison now,
being punished: and the trial doesn't even begin till next Wednesday:
and of course the crime comes last of all."
"Suppose he never commits the crime?" said Alice.
"That would be all the better, wouldn't it?" the Queen said, as she bound the
plaster round her finger with a bit of ribbon.
--Lewis Carroll, Through the Looking-Glass
Pthreads doesn't provide much assistance in debugging your threaded code. That is not
surprising, since POSIX does not recognize the concept of debugging at all, even in explaining
why the nearly universal SIGTRAP signal is not included in the standard. There is no standard
way to interact with your program or observe its behavior as it runs, although every threaded
system will provide some form of debugging tool. Even in the unlikely event that the developers
of the system had no concern for you, the poor programmer, they needed to debug their own code.
A vendor that provides threads with an operating system will provide at least a basic thread
"observation window" in a debugging utility. You should expect at minimum the ability to display
a list of the running threads and their current state, the state of mutexes and condition variables,
and the stack trace of all threads. You should also be able to set breakpoints in specified threads
and specify a "current thread" to examine registers, variables, and stack traces.
Because implementations of Pthreads are likely to maintain a lot of state in user mode, within
the process, debugging using traditional UNIX mechanisms such as ptrace or the proc file system
can be difficult. A common solution is to provide a special library that is called by the debugger,
which knows how to search through the address space of the process being debugged to find the
state of threads and synchronization objects. Solaris, for example, provides the libthread_db.so
shared library, and Digital UNIX provides libpthreaddebug.so.
A thread package placed on top of an operating system by a third party will not be able to
provide much integration with a debugger. For example, the portable "DCE threads" library
provides a built-in debug command parser that you can invoke from the debugger using the print
or call command to report the state of threads and synchronization objects within the process.*
This limited debugging support is at best inconvenient--you can't analyze thread state within a
core file after a program has failed, and it cannot understand (or report) the symbolic names of
program variables.
* For historical reasons, the function is called cma_debug. Should you find yourself stuck with
DCE threads code, try calling it, and enter the help command for a list of additional commands.
The following sections describe some of the most common classes of threaded programming
errors, with the intention of helping you to avoid these problems while designing, as well as
possibly making it easier to recognize them while debugging.
8.1.1 Avoid relying on "thread inertia" Always, always, remember that threads are asynchronous. That's especially important to keep
in mind when you develop code on uniprocessor systems where threads may be "slightly
synchronous." Nothing happens simultaneously on a uniprocessor, where ready threads are serially
timesliced at relatively predictable intervals. When you create a new thread on a uniprocessor or
unblock a thread waiting for a mutex or condition variable, it cannot run immediately unless it has
a higher priority than the creator or waker.
The same phenomenon may occur even on a multiprocessor, if you have reached the
"concurrency limit" of the process, for example, when you have more ready threads than there are
processors. The creator, or the thread waking another thread, given equal priority, will continue
running until it blocks or until the next timeslice (which may be many nanoseconds away).
This means that the thread that currently has a processor has an advantage. It tends to remain
in motion, exhibiting behavior vaguely akin to physical inertia. As a result, you may get away with
errors that will cause your code to break in mysterious ways when the newly created or awakened
thread is able to run immediately--when there are free processors. The following program,
inertia.c, demonstrates how this phenomenon can disrupt your program.
27-41 The question is one of whether the thread function printer_thread will see the value of
stringPtr that was set before the call to pthread_create, or the value set after the call to
pthread_create. The desired value is "After value." This is a very common class of programming
error. Of course, in most cases the problem is less obvious than in this simple example. Often, the
variable is uninitialized, not set to some benign value, and the result may be data corruption or a
segmentation fault.
39 Now, notice the delay loop. Even on a multiprocessor, this program won't break all the time.
The program will usually be able to change stringPtr before the new thread can begin executing--it
takes time for a newly created thread to get into your code, after all, and the "window of
opportunity" in this particular program is only a few instructions. The loop allows me to
demonstrate the problem by delaying the main thread long enough to give the printer thread time
to start. If you make this loop long enough, you will see the problem even on a uniprocessor, if
main is eventually timesliced.
inertia.c
The way to fix inertia.c is to set the "After value," the one you want the threads to see, before
creating the thread. That's not so hard, is it? There may still be a "Before value," whether it is
uninitialized storage or a value that was previously used for some other purpose, but the thread
you create can't see it. By the memory visibility rules given in Section 3.4, the new thread sees all
memory writes that occurred prior to the call into pthread_create. Always design your code so that
threads aren't started until after all the resources they need have been created and initialized
exactly the way you want the thread to see them.
| Never assume that a thread you create will wait for you.
You can cause yourself as many problems by assuming a thread will run "soon" as by
assuming it won't run "too soon." Creating a thread that relies on "temporary storage" in the
creator thread is almost always a bad idea. I have seen code that creates a series of threads, passing
a pointer to the same local structure to each, changing the structure member values each time. The
problem is that you can't assume threads will start in any specific order. All of those threads may
start after your last creation call, in which case they all get the last value of the data. Or the threads
might start a little bit out of order, so that the first and second thread get the same data, but the
others get what you intended them to get.
Thread inertia is a special case of thread races. Although thread races are covered much more
extensively in Section 8.1.2, thread inertia is a subtle effect, and many people do not recognize it
as a race. So test your code thoroughly on a multiprocessor, if at all possible. Do this as early as
possible during development, and continuously throughout development. Do this despite the fact
that, especially without a perfect threaded debugger, testing on a multiprocessor will be more
difficult than debugging on a uniprocessor. And, of course, you should carefully read the
following section.
8.1.2 Never bet your mortgage on a thread race
A race occurs when two or more threads try to get someplace or do something at the same
time. Only one can win. Which thread wins is determined by a lot of factors, not all of which are
under your control. The outcome may be affected by how many processors are on the system, how
many other processes are running, how much network overhead the system is handling, and other
things like that. That's a nondeterministic race. It probably won't come out the same if you run the
same program twice in a row. You don't want to bet on races like that.*
| When you write threaded code, assume that at any arbitrary point,
| within any statement of your program, each thread may go to deep for
| an unbounded period of time.
* My daughter had this figured out by the time she was three--when she wanted to race, she
told me ahead of time whether my job was to win or lose. There's really no point to leaving these
important things to chance!
Processors may execute your threads at differing rates, depending on processor load,
interrupts, and so forth. Time-slicing on a processor may interrupt a thread at any point for an
unspecified duration. During the time that a thread isn't running, any other thread may run and do
anything that synchronization protocols in your code don't specifically prevent it from doing,
which means that between any two instructions a thread may find an entirely different picture of
memory, with an entirely different set of threads active. The way to protect a thread's view of the
world from surprises is to rely only on explicit synchronization between threads.
Most synchronization problems will probably show up pretty quickly if you're debugging on
a multiprocessor. Threads with insufficient synchronization will compete for the honor of reaching
memory last. It is a minor irony of thread races that the "loser" generally wins because the
memory system will keep the last value written to an address. Sometimes, you won't notice a race
at all. But sometimes you'll get a mystifying wrong result, and sometimes you'll get a
segmentation fault.
Races are usually difficult to diagnose. The problem often won't occur at all on a
uniprocessor system because races require concurrent execution. The level of concurrency on a
uniprocessor, even with time-slicing, is fairly low, and often an unsynchronized sequence of writes
will complete before another thread gets a chance to read the inconsistent data. Even on a
multiprocessor, races may be difficult to reproduce, and they often refuse to reveal themselves to a
debugger. Races depend on the relative timing of thread execution--something a debugger is likely
to change.
Some races have more to do with memory visibility than with synchronization of multiple
writes. Remember the basic rules of memory visibility (see Section 3.4): A thread can always see
changes to memory that were performed by a thread previously running on the same processor. On
a uniprocessor all threads run on the same processor, which makes it difficult to detect memory
visibility problems during debugging. On a multiprocessor, you may see visibility races only when
the threads are scheduled on different processors while executing specific vulnerable sections of
code.
| No ordering exists between threads
| unless you cause ordering.
| Bill Gallmeister's corollary:
| "Threads will run in the most evil order possible."
You don't want to find yourself debugging thread races. You may never see the same outcome
twice. The symptoms will change when you try to debug the code--possibly by masquerading as
an entirely different kind of problem, not just as the same problem in a different place. Even worse,
the problem may never occur at all until a customer runs the code, and then it may fail every time,
but only in the customer's immense, monolithic application, and only after it has been running for
days. It will be running on a secured system with no network access, they will be unable to show
you the proprietary code, and will be unable to reproduce the problem with a simple test program.
| "Scheduling" is not the same as "synchronization."
It may appear at first that setting a thread to the SCHED_FIFO scheduling policy and
maximum priority would allow you to avoid using expensive synchronization mechanisms by
guaranteeing that no other thread can run until the thread blocks itself or lowers its priority. There
are several problems with this, but the main problem is that it won't work on a multiprocessor. The
SCHED_FIFO policy prevents preemption by another thread, but on a multiprocessor other
threads can run without any form of preemption.
Scheduling exists to tell the system how important a specific job (thread) is to your
application so it can schedule the job you need the most. Synchronization exists to tell the system
that no other thread can be allowed into the critical section until the calling thread is done.
In real life, a deterministic race, where the winner is guaranteed from the beginning, isn't
very exciting (except to a three year old). But a deterministic race represents a substantially safer
bet, and that's the kind of race you want to design into your programs. A deterministic race, as you
can guess, isn't much of a race at all. It is more like waiting in line--nice, organized, and
predictable. Excitement is overrated, especially when it comes to debugging complicated threaded
applications.
The simplest form of race is when more than one thread tries to write shared state without
proper synchronization, for example, when two threads increment a shared counter. The two
threads may fetch the same value from memory, increment it independently, and store the same
result into memory; the counter has been incremented by one rather than by two, and both threads
have the same result value.
A slightly more subtle race occurs when one thread is writing some set of shared data while
another thread reads that data. If the reads occur in a different order, or if the reader catches up to
the writer, then the reader may get inconsistent results. For example, one thread increments a
shared array index and then writes data into the array element at that index. Another thread fetches
the shared index before the writer has filled in the entire element and reads that element. The
reader finds inconsistent data because the element hasn't been completely set up yet. It may take
an unexpected code path because of something it sees there or it may follow a bad pointer.
Always design and code assuming that threads are more asynchronous than you can imagine.
Anyone who's written a lot of code knows that computers have little creatures that enjoy annoying
you. Remember that when you code with threads there are lots of them loose at the same time.
Take no chances, make no assumptions. Make sure any shared state is set up and visible before
creating the thread that will use it; or create it using static mutexes or pthread_once. Use a mutex
to ensure that threads can't read inconsistent data. If you must share stack data between threads, be
sure all threads that use the data have terminated before returning from the function that allocated
the storage.
"Sequence races" may occur when you assume some ordering of events, but that ordering
isn't coded into the application. Sequence races can occur even when you carefully apply
synchronization control to ensure data consistency. You can only avoid this kind of race by
ensuring that ordering isn't important, or by adding code that forces everything to happen in the
order it needs to happen.
For example, imagine that three threads share a counter variable. Each will store a private
copy of the current value and increment the shared counter. If the three threads are performing the
same function, and none of them cares which value of the counter they get, then it is enough to
lock a mutex around the fetch and increment operation. The mutex guarantees that each thread
gets a distinct value, and no values are skipped. There's no race because none of the threads cares
who wins.
But if it matters which value each thread receives, that simple code will not do the job. For
example, you might imagine that threads are guaranteed to start in the order in which they are
created, so that the first thread gets the value 1, the second gets the value 2, and so forth. Once in a
while (probably while you're debugging), the threads will get the value you expect, and everything
will work, and at other times, the threads will happen to run in a different order.
There are several ways to solve this. For example, you could assign each of the threads the
proper value to begin with, by incrementing the counter in the thread that creates them and passing
the appropriate value to each thread in a data structure. The best solution, though, is to avoid the
problem by designing the code so that startup order doesn't matter. The more symmetrical your
threads are, and the fewer assumptions they make about their environment, the less chance that
this kind of race will happen.
Races aren't always way down there at the level of memory address references, though. They
can be anywhere. The traditional ANSI C library, for example, allows a number of sequence races
when you use certain functions in an application with multiple threads. The readdir function, for
example, relies on static storage within the function to maintain context across a series of identical
calls to readdir. If one thread calls readdir while another thread is in the middle of a sequence of its
own calls to readdir, the static storage will be overwritten with a new context.
| "Sequence races" can occur even when all your code uses mutexes to
| protect shared data!
This race occurs even if readdir is "thread aware" and locks a mutex to protect the static
storage. It is not a synchronization race, it is a sequence race. Thread A might call readdir to scan
directory/usr/bin, for example, which locks the mutex, returns the first entry, and then unlocks the
mutex. Thread B might then call readdir to scan directory /usr/include, which also locks the mutex,
returns the first entry, and then unlocks the mutex. Now thread A calls readdir again expecting the
second entry in /usr/bin; but instead it gets the second entry in /usr/include. No interface has
behaved improperly, but the end result is wrong. The interface to readdir simply is not appropriate
for use by threads.
That's why Pthreads specifies a set of new reentrant functions, including readdir_r, which has
an additional argument that is used to maintain context across calls. The additional argument
solves the sequence race by avoiding any need for shared data. The call to readdir_r in thread A
returns the first entry from/usr/bin in thread A's buffer, and the-call to readdir r in thread B returns
the first entry from /usr/include in thread B's buffer ... and the second call in thread A returns the
second entry from /usr/bin in thread A's buffer. Refer to pipe.c, in Section 4.1, for a program that
uses readdir_r.
Sequence races can also be found at higher levels of coding. File descriptors in a process, for
example, are shared across all threads. If two threads attempt to getc from the same file, each
character in the file can go to only one thread. Even though getc itself is thread-safe, the sequence
of characters seen by each thread is not deterministic--it depends on the ordering of each thread's
independent calls to getc. They may alternate, each getting every second character throughout the
file. Or one may get 2 or 100 characters in a row and then the other might get 1 character before
being preempted for some reason.
There are a number of ways you can resolve the getc race. You can open the file under two
separate file descriptors and assign one to each thread. In that way, each thread sees every
character, in order. That solves the race by removing the dependency on ordering. Or you can lock
the file across the entire sequence of gets operations in each thread, which solves the race by
enforcing the desired order. The program putchar.c, back in Section 6.4.2, shows a similar
situation.
Usually a program that doesn't care about ordering will run more efficiently than a program
that enforces some particular ordering, first, because enforcing the ordering will always introduce
computational overhead that's not directly related to getting the job done. Remember Amdahl’s
law. "Unordered" programs are more efficient because the greatest power of threaded
programming is that things can happen concurrently, and synchronization prevents concurrency.
Running an application on a multiprocessor system doesn't help much if most processors spend
their time waiting for one to finish something.
8.1.3 Cooperate to avoid deadlocks Like races, deadlocks are the result of synchronization problems in a program. While races
are resource conflicts caused by insufficient synchronization, deadlocks are usually conflicts in the
use of synchronization. A deadlock can happen when any two threads share resources. Essentially
a deadlock occurs when thread A has resource 1 and can't continue until it has resource 2, while
thread B has resource 2 and can't continue until it has resource 1.
The most common type of deadlock in a Pthreads program is mutex deadlock, where both
resources are mutexes. There is one really important advantage of a deadlock over a race: It is
much easier to debug the problem. In a race, the threads do something incorrectly and move on.
The problem shows up sometime later, usually as a side effect. But in a deadlock the threads are
still there waiting, and always will be--if they could go anywhere, it wouldn't be a deadlock. So
when you attach to the process with the debugger or look at a crash dump, you can see what
resources are involved. With a little detective work you can often determine why it happened.
The most likely cause is a resource ordering inconsistency. The study of deadlocks goes way
back to the early days of operating system design. Anyone who's taken computer science courses
has probably run into the classic dining philosophers problem. Some philosophers sit at a round
table with plates of spaghetti; each alternately eats and discusses philosophy. Although no utensils
are required to discuss philosophy, each philosopher requires two forks to eat. The table is set with
a single fork between each pair. The philosophers need to synchronize their eating and discussion
to prevent deadlock. The most obvious form of deadlock is when all philosophers simultaneously
pick up one fork each and refuse to put it down.
There's always a way to make sure that your philosophers can all eat, eventually. For example,
a philosopher can take the fork to her right, and then look to her left. If the fork is available, she
can take it and eat. If not, she should return the fork she's holding to the table and chat awhile.
(That is the mutex backoff strategy discussed in Section 3.2.5.1.) Since the philosophers are all in
a good mood and none has recently published papers severely critical of adjoining colleagues,
those who get to eat will in reasonably short order return both of their forks to the table so that
their colleagues on each side can proceed.
A more reliable (and more sanitary) solution is to skip the spaghetti and serve a dish that can
be eaten with one fork. Mutex deadlocks can't happen if each thread has only one mutex locked at
a time. It is a good idea to avoid calling functions with a mutex locked. First, if that function (or
something it calls) locks another mutex, you could end up with a deadlock. Second, it is a good
idea to lock mutexes for as short a time as possible (remember, locking a mutex prevents another
thread from "eating"--that is, executing---concurrently). Calling printf, though, isn't likely to cause
a deadlock in your code, because you don't lock any ANSI C library mutexes, and the ANSI C
library doesn't lock any of your mutexes. If the call is into your own code, or if you call a library
that may call back into your code, be careful.
If you need to lock more than one mutex at a time, avoid deadlocks by using a strict
hierarchy or a backoff algorithm. The main disadvantage of mutex backoff is that the backoff loop
can run a long time if there are lots of other threads locking the mutexes, even if they do so
without any possibility of a deadlock. The backoff algorithm assumes that other threads may lock
the first mutex after having locked one or more of the other mutexes. If all threads always lock
mutexes in the order they're locked by the backoff loop, then you've got a fixed locking hierarchy
and you don't need the backoff algorithm.
When a program has hung because of a deadlock, you require two important capabilities of
your threaded debugger. First, it allows you to run your program in a mode where mutex
ownership is recorded, and may be displayed using debugger commands. Finding a thread that is
blocked on some mutex while it owns other mutexes is a good indication that you may have a
deadlock. Second, you would like to be able to examine the call stack of threads that own mutexes
to determine why the mutexes have remained locked.
The call stack may not always be sufficient, though. One common cause of a deadlock is that
some thread has returned from a function without unlocking a mutex. In this case, you may need a
more sophisticated tool to trace the synchronization behavior of the program. Such a tool would
allow you to examine the data and determine, for example, that function bad_lock locked a mutex
and failed to unlock that mutex.
8.1.4 Beware of priority inversion
"Priority inversion" is a problem unique to applications (or libraries) that rely on realtime
priority scheduling. Priority inversion involves at least three threads of differing priority. The
differing priorities are important--priority inversion is a conflict between synchronization and
scheduling requirements. Priority inversion allows a low-priority thread to indefinitely prevent a
higher-priority thread from running. The result usually is not a deadlock (though it can be), but it
is always a severe problem. See Section 5.5.4 for more about priority inversion.
Most commonly, a priority inversion results from three threads of differing priority sharing
resources. One example of a priority inversion is when a low-priority thread locks a mutex, and is
preempted by a high-priority thread, which then blocks on the mutex currently locked by the
low-priority thread. Normally, the low-priority thread would resume, allowing it to unlock the
mutex, which would unblock the high-priority thread to continue. However, if a medium-priority
thread was awakened (possibly by some action of the high-priority thread), it might prevent the
lower-priority thread from running. The medium-priority thread (or other threads it awakens) may
indefinitely prevent the low-priority thread from releasing the mutex, so a high-priority thread is
blocked by the action of a lower-priority thread.
If the medium-priority thread blocks, the low-priority thread will be allowed to resume and
release the mutex, at which point operation resumes. Because of this, many priority inversion
deadlocks resolve themselves after a short time. If all priority inversion problems in a program
reliably resolve themselves within a short time, the priority inversion may become a performance
issue rather than a correctness issue. In either case, priority inversion can be a severe problem.
Here are a few ideas to avoid priority inversion:
� Avoid realtime scheduling entirely. That clearly is not practical in many realtime
applications, however.
� Design your threads so that threads of differing priority do not need to use the same
mutexes. This may be impractical, too; many ANSI C functions, for example, use
mutexes.
� Use priority ceiling mutexes (Section 5.5.5.1) or priority inheritance (Section 5.5.5.2).
These are optional features of Pthreads and will not be available everywhere. Also, you
cannot set the mutex priority protocol for mutexes you do not create, including those
used by ANSI C functions.
� Avoid calling functions that may lock mutexes you didn't create in any thread with
elevated priority.
8.1.5 Never share condition variables between predicates
our code will usually be cleaner and more efficient if you avoid using a single condition
variable to manage more than one predicate condition. You should not, for example, define a
single "queue" condition variable that is used to awaken threads waiting for the queue to become
empty and also threads waiting for an element to be added to the queue.
But this isn't just a performance issue (or it would be in another section). If you use
pthread_cond_signal to wake threads waiting on these shared condition variables, the program
may hang with threads waiting on the condition variable and nobody left to wake them up.
Why? Because you can only signal a condition variable when you know that a single thread
needs to be awakened, and that any thread waiting on the condition variable may be chosen. When
multiple predicates share a condition variable, you can never be sure that the awakened thread was
waiting for the predicate you set. If it was not, then it will see a spurious wakeup and wait again.
Your signal has been lost, because no thread waiting for your predicate had a chance to see that it
had changed.
It is not enough for a thread to resignal the condition variable when it gets a spurious wakeup,
either. Threads may not wake up in the order they waited, especially when you use priority
scheduling. "Resignaling" might result in an infinite loop with a few high-priority threads (all with
the wrong predicate) alternately waking each other up.
The best solution, when you really want to share a condition variable between predicates, is
always to use pthread_cond_broadcast. But when you broadcast, all waiting threads wake up to
reevaluate their predicates. You always know that one set or the other cannot proceed--so why
make them all wake up to find out? If 1 thread is waiting for write access, for example, and 100
are waiting for read access, all 101 threads must wake up when the broadcast means that it is now
OK to write, but only the one writer can proceed--the other 100 threads must wait again. The
result of this imprecision is a lot of wasted context switches, and there are more useful ways to
keep your computer busy.
8.1.6 Sharing stacks and related memory corrupters There's nothing wrong with sharing stack memory between threads. That is, it is legal and
sometimes reasonable for a thread to allocate some variable on its own stack and communicate
that address to one or more other threads. A correctly written program can share stack addresses
with no risk at all; however (this may come as a surprise), not every program is written correctly,
even when you want it to be correct. Sharing stack addresses can make small programming errors
catastrophic, and these errors can be very difficult to isolate.
| Returning from the function that allocates shared stack memory, when
| other threads may still use that data, will result in undesirable behavior.
If you share stack memory, you must ensure that it is never possible for the thread that owns
the stack to "pop" that shared memory from the stack until all other threads have forever ceased to
make use of the shared data. Should the owning thread return from a stack frame containing the
data, for example, the owning thread may call another function and thereby reallocate the space
occupied by the shared variable. One or both of the following possible outcomes will eventually
be observed:
1. Data written by another thread will be overwritten with saved register values, a return PC,
or whatever. The shared data has been corrupted.
2. Saved register values, return PC, or whatever will be overwritten by another thread
modifying the shared data. The owning thread's call frame has been corrupted.
Having carefully ensured that there is no possible way for the owning thread to pop the stack
data while other threads are using the shared data, are you safe? Maybe not. We're stretching the
point a little, but remember, we're talking about a programming error--maybe a silly thing like
failing to initialize a pointer variable declared with auto storage class, for example. A pointer to
the shared data must be stored somewhere to be useful---other threads have no other way to find
the proper stack address. At some point, the pointer is likely to appear in various locations on the
stack of every thread that uses the data. None of these pointers will necessarily be erased when the
thread ceases to make use of the stack.
Writes through uninitialized pointers are a common programming error, regardless of threads,
so to some extent this is nothing new or different. However, in the presence of threads and shared
stack data, each thread has an opportunity to corrupt data used by some other thread
asynchronously. The symptoms of that corruption may not appear until some time later, which can
pose a particularly difficult debugging task.
If, in your program, sharing stack data seems convenient, then by all means take advantage of
the capability. But if something unexpected happens during debugging, start by examining the
code that shares stack data particularly carefully. If you routinely use an analysis tool that reports
use of uninitialized variables (such as Third Degree on Digital UNIX), you may not need to worry
about this class of problem--or many others.
8.2 Avoiding performance problems
"Well, in our country," said Alice, still panting a little, "you'd generally
get to somewhere else--if you ran very fast for a long time as we've
been doing."
“A slow sort of country?” said the Queen. "Now, here, you see, it takes all the
running you can do, to keep in the same place. If you want to get some-
where else, you must run at least twice as fast as that!"
--Lewis Carroll, Through the Looking-Glass
Sometimes, once a program works, it is "done." At least, until you want to make it do
something else. In many cases, though, "working" isn't good enough. The program needs to meet
performance goals. Sometimes the performance goals are clear: "must perform so many
transactions in this period of time." Other times, the goals are looser: "must be very fast."
This section gives pointers on determining how fast you're going, what's slowing you up, and
how to tell (maybe) when you're going as fast as you can go. There are some very good tools to
help you, and there will be a lot more as the industry adjusts to supporting eager and outspoken
thread programmers. But there are no portable standards for threaded analysis tools. If your
vendor supports threads, you'll probably find at least a thread-safe version of prof, which is a
nearly universal UNIX tool. Each system will probably require different switches and
environments to use it safely for threads, and the output will differ.
Performance tuning requires more than just answering the traditional question, "How much
time does the application spend in each function?" You have to analyze contention on mutexes, for
example. Mutexes with high contention may need to be split into several mutexes controlling
more specialized data (finer-grain concurrency), which can improve performance by increasing
concurrency. If finer grain mutexes have low contention, combining them may improve
performance by reducing locking overhead.
8.2.1 Beware of concurrent serialization
The ideal parallel code is a set of tasks that is completely compute-bound. They never
synchronize, they never block--they just "think." If you start with a program that calls three
compute-bound functions in series, and change it to create three threads each running one of those
functions, the program will run (nearly) three times faster. At least, it should do so if you're
running on a multiprocessor with at least three CPUs that are, at that moment, allocated for your
use.
The ideal concurrent code is a set of tasks that is completely I/O-bound. They never
synchronize, and do little computation--they just issue I/O requests and wait for them. If you start
with a program that writes chunks of data to three separate files (ideally, on three separate disks,
with separate controllers), and change it to create three threads, each writing one of those chunks
of data, all three I/O operations can progress simultaneously.
But what if you've gone to all that trouble to write a set of compute-bound parallel or
I/O-bound concurrent threads and it turns out that you've just converted a straight-line serialized
program into a multithreaded serialized program? The result will be a slower program that
accomplishes the same result with substantially more overhead. Most likely, that is not what you
intended. How could that have happened?
Let's say that your compute-bound operations call malloc and free in their work. Those
functions modify the static process state, so they need to perform some type of synchronization.
Most likely, they lock a mutex. If your threads run in a loop calling malloc and free, such that a
substantial amount of their total time may be spent within those functions, you may find that
there's very little real parallelism. The threads will spend a lot of time blocked on the mutex while
one thread or another allocates or frees memory.
Similarly, the concurrent I/O threads may be using serialized resources. If the threads
perform "concurrent" I/O using the same stdio FILE stream, for example, they will be locking
mutexes to update the stream's shared buffer. Even if the threads are using separate files, if they
are on the same disk there will be locking within the file system to synchronize the file cache and
so forth. Even when using separate disks, true concurrency may be subject to limitations in the I/O
bus or disk controller subsystems.
The point of all this is that writing a program that uses threads doesn't magically grant
parallelism or even concurrency to your application. When you're analyzing performance, be
aware that your program can be affected by factors that aren't within your control. You may not
even be able to see what's happening in the file system, but what you can't see can hurt you.
8.2.2 Use the right number of mutexes
The first step in making a library thread-safe may be to create a "big mutex" that protects all
entries into the library. If only one thread can execute within the library at a time, then most
functions will be thread-safe. At least, no static data will be corrupted. If the library has no
persistent state that needs to remain consistent across a series of calls, the big mutex may seem to
be enough. Many libraries are left in this state. The standard X11 client library (Xlib) provides
limited support for this big mutex approach to thread-safety, and has for years.
But thread-safety isn't enough anymore--now you want the library to perform well with
threads. In most cases, that will require redesigning the library so that multiple threads can use it
at the same time. The big mutex serializes all operations in the library, so you are getting no
concurrency or parallelization within the library. If use of that library is the primary function of
your threads, the program would run faster with a single thread and no synchronization. That big
mutex in Xlib, remember, keeps all other threads from using any Xlib function until the first
thread has received its response from the server, and that might take quite a while.
Map out your library functions, and determine what operations can reasonably run in parallel.
A common strategy is to create a separate mutex for each data structure, and use those mutexes to
serialize access to the shared data, rather than using the "big mutex" to serialize access to the
library.
With a profiler that supports threads, you can determine that you have too much mutex
activity, by looking for hot spots within calls to pthread_mutex_lock, pthread_mutex_unlock, and
pthread_mutex_trylock. However, this data will not be conclusive, and it may be very difficult to
determine whether the high activity is due to too much mutex contention or too much locking
without contention. You need more specific information on mutex contention and that requires
special tools. Some thread development systems provide detailed visual tracing information that
shows synchronization costs. Others provide "metering" information on individual mutexes to tell
how many times the mutex was locked, and how often threads found the mutex already locked.
8.2.2.1 Too many mutexes will not help
Beware, too, of exchanging a "big" mutex for lots of "tiny" mutexes. You may make matters
worse. Remember, it takes time to lock a mutex, and more time to unlock that mutex. Even if you
increase parallelism by designing a locking hierarchy that has very little contention, your threads
may spend so much time locking and unlocking all those mutexes that they get less real work
done.
Locking a mutex also affects the memory subsystem. In addition to the time you spend
locking and unlocking, you may decrease the efficiency of the memory system by excessive
locking. Locking a mutex, for example, might invalidate a block of cache on all processors. It
might stall all bus activity within some range of physical addresses.
So find out where you really need mutexes. For example, in the previous section I suggested
creating a separate mutex for each data structure. Yet, if two data structures are usually used
together, or if one thread will hardly ever need to use one data structure while another thread is
using the second data structure, the extra mutex may decrease your overall performance.
8.2.3 Never fight over cache lines No modern computer reads data directly from main memory. Memory that is fast enough to
keep up with the computer is too expensive for that to be practical. Instead, data is fetched by the
memory management unit into a fast local cache array. When the computer writes data, that, too,
goes into the local cache array. The modified data may also be written to main memory
immediately or may be "flushed" to memory only when needed.
So if one processor in a multiprocessor system needs to read a value that another processor
has in its cache, there must be some "cache coherency" mechanism to ensure that it can find the
correct data. More importantly, when one processor writes data to some location, all other
processors that have older copies of that location in cache need to copy the new data, or record
that the old data is invalid.
Computer systems commonly cache data in relatively large blocks of 64 or 128 bytes. That
can improve efficiency by optimizing the references to slow main memory. It also means that,
when the same 64- or 128-byte block is cached by multiple processors, and one processor writes
to any part of that block, all processors caching the block must throw away the entire block.
This has serious implications for high-performance parallel computation. If two threads
access different data within the same cache block, no thread will be able to take advantage of the
(fast) cached copy on the processor it is using. Each read will require a new cache fill from main
memory, slowing down the program.
Cache behavior may vary widely even on different computer systems using the same
microprocessor chip. It is not possible to write code that is guaranteed to be optimal on all possible
systems. You can substantially improve your chances, however, by being very careful to align and
separate any performance-critical data used by multiple threads.
You can optimize your code for a particular computer system by determining the cache
characteristics of that system, and designing your code so that no two threads will ever need to
write to the same cache block within performance-critical parallel loops. About the best you can
hope to do without optimizing for a particular system would be to ensure that each thread has a
private, page-aligned, segment of data. It is highly unlikely that any system would use a cache
block as large as a page, because a page includes far too much varied data to provide any
performance advantage in the memory management unit.
9 POSIX threads mini-reference This chapter is a compact reference to the POSIX. lc standard.
9.1 POSIX 1003.1 c-1995 options
Pthreads is intended to address a wide variety of audiences. High-performance
computational programs can use it to support parallel decomposition of loops.
Realtime programs can use it to support concurrent realtime I/O. Database and
network servers can use it to easily support concurrent clients. Business or soft-
ware development programs can use it to take advantage of parallel and concurrent
operations on time-sharing systems.
The Pthreads standard allows you to determine which optional capabilities are
provided by the system, by defining a set of feature- �test macros, which are shox nq
in Table 9.1. Any implementation of Pthreads must inform you whether each
option is supported, by three means:
?By making a formal statement of support in the POSIX Conformance Doc-
ument. You can use this information to help design your application to
work on specific systems.
?By defining compile-time symbolic constants in the <unistd. h> header file.
You can test for these symbolic constants using #ifdef or #ifndef prepro-
cessor conditionals to support a variety of Pthreads systems.
?By returning a positive nonzero value when the sys 給 n?function is called
with the associated s 緎 conf symbol. (This is not usually useful for the
"feature-test" macros that specify whether options are present--if they are
not, the associated interfaces usually are not supplied, and your code will
not link, and may not even compile.)
You might, for example, choose to avoid relying on priority scheduling because
after reading the conformance documents you discovered that three out of the
four systems you wish to support do not provide the feature. Or you might prefer
to use priority inheritance for your mutexes on systems that provide the feature,
but write the code so that it will not try to access the mutex protocol attribute on
systems that do not provide that option. Symbolic constant,
sysconf symbol name
Description
POSIX THREADS
--SC THREADS
POSIX THREAD ATTR STACKSIZE
-- � �SC THREAD AT R ST CKSIZE
POSIX THREAD ATTR STACKADDR
-- � �SC THREAD AT R ST CKADDR
POSIX THREAD PRIORITY SCHEDULING
--SC THREAD PRIORITY SCHEDULING
POSIX THREAD PRIO INHERIT
SC THREAD PaSo INHERIT
POSIX THREAD PRIO PROTECT
�SC THREAD PR O PROTECT
POSIX THREAD PROCESS SHARED
sc THREAD PRSCESS
POSIX THREAD SAFE FUNCTIONS
-- �SC THREAD SA E FUNCTIONS
You can use threads (if your system
doesn't define this, you're out of luck).
You can control the size of a thread's
stack.
You can allocate and control a
thread's stack.
You can use realtime scheduling.
You can create priority inheritance
mutexes.
You can create priority ceiling mutexes.
You can create mutexes and condition
variables that can be shared with an-
other process.
You can use the special "_r" library
functions that provide thread-safe
behavior.
TABLE 9.1 POSIX 1003. lc-1995 options
9.2 POSIX 1003.1c-1995 limits
The Pthreads standard allows you to determine the run-time limits of the sys-
tem that may affect your application, for example, how many threads you can
create, by defining a set of macros, which are shown in Table 9.2. Any implemen-
tation of Pthreads must inform you of its limits, by three means:
?By making a formal statement in the POSIX Conformance Document. You
can use this information to help design your application to work on specific
systems.
?By defining compile-time symbolic constants in the <limits. h> header file.
The symbolic constant may be omitted from <limits. h> when the limit is
at least as large as the required minimum, but cannot be determined at
compile time, for example, if it depends on available memory space. You
can test for these symbolic constants using #ifdef or #ifndef preproces-
sor conditionals.
?By returning a positive nonzero value when the sysconf function is called
with the associated sysconf symbol.
You might, for example, design your application to rely on no more than 64
threads, if the conformance documents showed that three out of the four systemsRun-time
invariant values,
sysconf symbol name
Description
PTHREAD DESTRUCTOR ITERATIONS
�SC THREAD DESTRUC OR ITERATIONS
PTHREAD KEYS MAX
�SC THREAD K YS MAX
PTHREAD STACK MIN
SC THREAD STXCK MIN
PTHREAD THREADS MAX
SC THREAD THREXDS MAX
Maximum number of attempts to
destroy a thread's thread-specific data
on termination (must be at least 4).
Maximum number of thread-specific
data keys available per process (must
be at least 128).
Minimum supported stack size for a
thread.
Maximum number of threads support-
ed per process (must be at least 64).
TABLE 9.2 POSIX 1003.1c-1995 limits
you wish to support do not support additional threads. Or you might prefer to
write conditional code that relies on the value of the PTHREAD THREADS MAX SylTl-
bolic constant (if defined) or call sysconf to determine the lim]- �t at run me.
9.3 POSIX 1003.1c-1995 interfaces
The interfaces are sorted by functional categories: threads, mutexes, and so
forth. Within each category, the interfaces are listed in alphabetical order.
Figure 9.1 describes the format of the entries.
First, the header entry (1) shows the name of the interface. If the interface is
an optional feature of Pthreads, then the name of the feature-test macro for that
?
?
?
┅
┅
pthread_mutexattr_getpshared ....................................... [_POS IX_THREAD_PROCESS_S
HARED ]
int pthread_mutexattr_getpshared (
const pthread_mutexattr_t *attr,
int *pshared);
�Determine whether mutexes crea d with attr can be shared by multiple processes.
References: 3.2, 5.2.1
Headers: <pthread. h>
Errors: [EINVAL] attr invalid.
Hint: pshared mutexes must be allocated in shared memory.
FIGURE 9.1 Mini-reference formatoption is shown at the end of the line, in brackets. The
interface pthread_
mutexattr_getpshared, for example, is an option under the _POSIX_THREAD_
PROCESS SHARED feature.
The prototype entry (2) shows the full C language prototype for the interface,
describing how to call the function, with all argument types.
The description entry (3) gives a brief synopsis of the interface. In this case,
the purpose of the interface is to specify whether mutexes created using the
attributes object can be shared between multiple processes.
Functions with arguments that have symbolic values, like pshared in this
example, will include a table (4) that describes each possible value. The default
value of the argument (the state of a new thread, or the default value of an
attribute in a new attributes object, in this case PTHREAD_PROCESS_PRIVATE) is
indicated by showing the name in bold.
The references entry (5) gives cross-references to the primary sections of this
book that discuss the interface, or other closely related interfaces.
The headers entry (6) shows the header files needed to compile code using the
function. If more than one header is shown, you need all of them.
The errors entry (7) describes each of the possible error numbers returned by
the interface; Because Pthreads distinguishes between mandatory error detection
("if occurs" in POSIX terms) and optional error detection ("if detected" in POSIX
terms), the errors that an interface must report (if they occur) are shown in bold
(see Section 9.3.1 for details on Pthreads errors).
The hint entry (8) gives a single, and inevitably oversimplified, philosophical
comment regarding the interface. Some hints point out common errors in using
the interface; others describe something about the designers' intended use of the
interface, or some fundamental restriction of the interface. In pthread_mutexattr_
getpshared, for example, the hint points out that a mutex created to be "process
shared" must be allocated in shared memory that's accessible by all participating
processes.
9.3.1 Error detection and reporting
The POSIX standard distinguishes carefully between two categories of error:
1. Mandatory ("if occurs") errors involve circumstances beyond the control of
the programmer. These errors must always be detected and reported by the
system using a particular error code. If you cannot create a new thread
because your process lacks sufficient virtual memory, then the implemen-
tation must always tell you. You can't possibly be expected to check
whether there's enough memory before creating the thread--for one thing,
you have no way to know how much memory would be required.
2. Optional ("if detected") errors are problems that are usually your mistake.
You might try to lock a mutex that hadn't been initialized, for example, or
try to unlock a mutex that's locked by another thread. Some systems maynot detect these errors,
but they're still errors in your code, and you ought
to be able to avoid them without help from the system.
While it would be "nice" for the system to detect optional errors and return the
appropriate error number, sometimes it takes a lot of time to check or is difficult
to check reliably. It may be expensive, for example, for the system to determine
the identity of the current thread. Systems may therefore not remember which
thread locked a mutex, and would be unable to detect that the unlock was erro-
neous. It may not make sense to slow down the basic synchronization operations
for correct programs just to make it a little easier to debug incorrect programs.
Systems may provide debugging modes where some or all of the optional
errors are detected. Digital UNIX, for example, provides "error check" mutexes
and a "metered" execution mode, where the ownership of mutexes is always
tracked and optional errors in locking and unlocking mutexes are reported. The
UNIX98 specification includes "error check" mutexes (Section 10.1.2), so they will
soon be available on most UNIX systems.
9.3.2 Use of void* type
ANSI C requires that you be allowed to convert any pointer type to void* and
back, with the result being identical to the original value. However, ANSI C does
not require that all pointer types have the same binary representation. Thus, a
lon9* that you convert to void* in order to pass into a thread's start routine
must always be used as a long*, not as, for example, a char*. In addition, the
result of converting between pointer and integer types is "implementation
defined." Most systems supporting UNIX will allow you to cast an integer value to
void* and back, and to mix pointer types--but be aware that the code may not
work on all systems.
Some other standards, notably the POSIX. lb realtime standard, have solved
the same problem (the need for an argument or structure member that can take
any type value) in different ways. The $駁 event: structure in POSIX. lb, for exam-
ple, includes a member that contains a value to be passed into a signal-catching
function, called sigev_value. Instead of defining sigev_value as a void*, how-
ever, and relying on the programmer to provide proper type casting, the sigev_
value member is a union sigval, containing overlayed int and void* members.
This mechanism avoids the problem of converting between integer and pointer
types, eliminating one of the conflicts with ANSI C guarantees.
9.3.3 Threads
Threads provide concurrency, the ability to have more than one "stream of
execution" within a process at the same time. Each thread has its own hardware
registers and stack. All threads in a process share the full virtual address space,
plus all file descriptors, signal actions, and other process resources. pthread_attr_destroy
int pthread_attr_destroy (
pthread_attr_t * attr );
Destroy a thread attributes object. The object can no longer be used.
References: 2, 5.2.3
Headers: <pthread. h>
Errors: [EINVAL] attr is invalid.
Hint: Does not affect threads created using attr.
pthread_affr_getdetachstate
int pthread_attr_getdetachstate (
const pthread_attr_t *attr,
int *detachstate);
Determine whether threads created with attr will mn detached.
References:
Headers:
Errors:
Hint:
2, 5.2.3
<pthread. h>
[EINVAL] attr is invalid.
You can't join or cancel detached threads.
pthread_attr_getstackaddr ..................................................... [ _POS I X_THREAD_ATTR_S
TACKADDR ]
int pthread_attr_getstackaddr (
const pthread_attr_t *attr,
void **stackaddr );
Determine the address of the stack on which threads created with attr will run.
References: 2, 5.2.3
Headers: <pthread. h>
Errors: [ EINVAL ] attr is invalid.
[ENOSYS] stacksize not supported.
Hint: Create only one thread for each stack address! I
pthread_attr_getstacksize ....................................................... [ POSiX THREAD
ATTR_STACKSIZE ]
int pthread_attr_getstacksize ( - -
const pthread_attr_t *attr,
size_t *stacksize );
Determine the size of the stack on which threads created with attr will run.
References: 2, 5.2.3
Headers: <pthread. h>
Errors: [ EINVAL ] attr invalid.
[ENOSYS] stacksize not supported.
Hint: Use on newly created attributes object to find the default stack size.
pthread_attr_init
int pthread_attr_init (
pthread_attr_t *attr );
Initialize a thread attributes object with default attributes.
References: 2, 5.2.3
Headers: <pthread. h>
�Errors: [ - �NOM M] insufficient memory for attr.
Hint: Use to define thread types.
P;hread_attr_setdetachstate
int pthread_attr_setdetachstate (
pthread_attr_t *attr,
int detachstate);
�Speci whetherthreads created withattrwillrun detached.
References:
Headers:
Errors:
Hint:
2, 5.2.3
<pthread. h>
[ ETNVAL ] attr invalid.
�[ -INVAL] detachstate invalid.
You can't join or cancel detached threads.
pthread_affcsetstackaddr ..................................................... [ _POS I X_THREAD_ATTR_S
TACKADDR ]
int pthread_attr_setstackaddr (
pthread_attr_t * attr,
void * stackaddr );
Threads created with attr will run on the stack starting at stackaddr. Must be at
least PTHREAD_STACK_MIN bytes.
References: 2, 5.2.3
Headers: <pthread. h>
Errors: [ EINVAL ] attr invalid.
�[EI OS 維] stackaddr not supported.
Hint: Create only one thread for each stack address, and be careful of
stack alignment.
pthread_aitr_s ﹖ stacksize ....................................................... [_POS I
X_THREAD_ATTR_STACKS I Z E ]
int pthread_attr_setstacksize (
pthread_attr_t *attr,
size t stacksize);
Threads created with attr will mn on a stack of at least stacksize bytes. Must be
at least PTHREAD_STACK_MIN bytes.
References: 2, 5.2.3
Headers: <pthread. h>
Errors: [ EINVAL] attr or stacksize invalid.
�[EII VAL] stacksize tOO small or too big.
[ENO$1 S] stacksize not supported.
�Hint: Find the default first thread_attr_getstacksize). then increase
by multiplying. Use only if a thread needs more than the default.
pthread_?eole
int pthread_create (
pthread_t
const pthread_attr_t
void
void
*tid,
*attr,
*(*start) (void *),
*arg);
Create a thread running the start function, essentially an asynchronous call to the
function start with argument value arg. The attr argument specifies optional
creation attributes, and the identification of the new thread is returned in tid.
References: 2, 5.2.3
Headers: <pthread. h>
Errors: [ m..IIqVAL ] attr invalid.
�[ EAi AIN] insufficient resources.
Hint: All resources needed by thread must already be initialized. pthread_detach
int pthread_detach (
pthread_t thread );
Detach the thread. Use this to detach the main thread or to "change your mind"
after creating a joinable thread in which you are no longer interested.
References: 2, 5.2.3
Headers: <pthread. h>
Errors: [ EINVAL ] thread is not a joinable thread.
[ESRCH] no thread could be found for ID thread.
Hint: Detached threads cannot be joined or canceled; storage is freed
immediately on termination.
pthread_equal
int pthread_equal (
pthread_t tl,
pthread_t t2 );
Return value 0 if tl and t2 are equal, otherwise return nonzero.
References: 2, 5.2.3
Headers: <pthread. h>
Hint: Compare pthread_self against stored thread identifier.
pthread_exit
int pthread_exit (
void *value_ptr );
Terminate the calling thread, returning the value value_ptr to any joining thread.
References: 2, 5.2.3
Headers: <pthread. h>
Hint: value_ptr is treated as a value, not the address of a value.
pthread_.join
int pthread_join (
pthread_t thread,
void **value_ptr );
Wait for thread to terminate, and return thread's exit value if value_ptr is not
NULL. This also detaches thread on successful completion.
References: 2, 5.2.3
Headers: <pthread. h>
� �Errors: [ .INvaI I thread is not a joinable thread.
�[w. SRCI ] no thread could be found for ID thread.
�[ ED .ADLK ] attempt to join with self.
Hint: Detached threads cannot be joined or canceled. pthread_self
pthread_t pthread_self (void);
Return the calling thread's ID.
References: 2, 5.2.3
Headers: <pthread. h>
Hint: Use to set thread's scheduling parameters.
sched_yield
int sched_yield (void);
Make the calling thread ready, after other ready threads of the same priority, and
select a new thread to run. This can allow cooperating threads of the same priority
to share processor resources more equitably, especially on a uniprocessor. This
function is from P0SIX. lb (realtime extensions), and is declared in <sched.h>. It
reports errors by setting the return value to -1 and storing an error code in errno.
References: 2, 5.2.3
Headers: <sched. h>
Errors: [ENOSYS] sched_yield not supported.
Hint: Use before locking mutex to reduce chances of a timeslice while mu-
tex is locked.
9.3.4 Mutexes
Mutexes provide synchronization, the ability to control how threads share
resources. You use mutexes to prevent multiple threads from modifying shared
data at the same time, and to ensure that a thread can read consistent values for
a set of resources (for example, memory) that may be modified by other threads.
pthread_mutexattr_destroy
int pthread_mutexattr_destroy (
pthread_mutexattr_t *attr );
Destroy a mutex attributes object. The object can no longer be used.
References: 3.2, 5.2.1
Headers: <pthread. h>
Errors: [ EINVAL ] attr invalid.
Hint: Does not affect mutexes created using attr. pthread mutexattr
getpshared .............................................. ?POSiX THREAD PROCESS SHARED]
int pthread_mutexattr_getpshared (
const pthread_mutexattr_t *attr,
int *pshared );
Determine whether mutexes created with attr can be shared by multiple processes.
References:
Headers:
Errors:
Hint:
3.2, 5.2. l
<pthread. h>
[ EINVAL ] attr invalid.
pshared mutexes must be allocated in shared memory.
pthread_mutexattr_init
int pthread_mutexattr_init (
pthread_mutexattr_t *attr );
Initialize a mutex attributes object with default attributes.
References: 3.2, 5.2.1
Headers: <pthread. h>
Errors: [ ENOMEM] insufficient memory for attr.
Hint: Use to define mutex types.
pthread_mutexattr_setpshared ............................................... [_POSIX_THREAD
PROCESS_SHARED ]
int pthread_mutexattr_setpshared (
pthread_mutexattr_t *attr,
int pshared);
Mutexes created withattr canbe shared between processesfit he pthread_mutex_t
variable is allocated in memory shared by the processes.
References:
Headers:
Errors:
Hint:
3.2, 5.2.1
<pthread. h>
[EINVAL] attr or detachstate invalid.
pshared mutexes must be allocated in shared memory. pthread_mutex_destroy
int pthread_mutex_destroy (
pthread_mutex_t
Destroy a mutex that you no longer need.
References: 3.2, 5.2.1
Headers: <pthread. h>
Errors: [ EBUSY ] mutex is in use.
[ EINVAL ] mutex is invalid.
Hint:
*mutex);
Safest after unlocking mutex, when no other threads will lock.
pthread_mutex_init
int pthread_mutex_init (
pthread_mutex_t *mutex,
const pthread_mutexattr_t *attr);
Initialize a mutex. The attr argument specifies optional creation attributes.
References: 3.2, 5.2.1
Headers: <pthread. h>
�Errors: [ .aG&IN] insufficient resources (other than memory).
� �[ .NOM .M] insufficient memory.
� �[ .P .ItM 1 no privilege to perform operation.
[EBUSY] mutex is already initialized.
[ EINVAL ] attr is invalid.
Hint: Use static initialization instead, if possible.
pthread_mutex_lock
int pthread_mutex_lock (
pthread_mutex_t *mutex );
Lock a mutex. If the mutex is currently locked, the calling thread is blocked until
mutex is unlocked. On return, the thread owns the mutex until it calls pthread_
mutex unlock.
References:
Headers:
Errors:
Hint:
3.2, 5.2.1
<pthread. h>
�l IZII VAL] thread priority exceeds mutex priority ceiling.
[ EINVAL ] mutex is invalid.
[ EDEADLK ] calling thread already owns mutex.
Always unlock within the same thread. I pthread_mutex_trylock
int pthread_mutex_trylock (
pthread mutex t
-- -- *mutex );
Lock a mutex. If the mutex is currently locked, returns immediately with EBUSY. 0131-
erwise, calling thread becomes owner until it unlocks.
References: 3.2, 5.2.1
Headers: <pthread. h>
Er �rors: [ .INVAL] thread priority exceeds mutex priority ceiling.
�[ .suse] mutex is already locked.
[ EINVAL ] mutex is invalid.
� �[ .DEAD .K ] calling thread already owns mutex.
Hint: Always unlock within the same thread.
pthread_mutex_unlock
int pthread_mutex_unlock (
pthread mutex t
-- -- *mutex );
Unlock a mutex. The mutex becomes unowned. If any threads are waiting for the
mutex, one is awakened (scheduling policy SCHED FIFO and SCHED RR policy wait-
ers are chosen in priority order, then any �others a e chosen in unspecified order).
References: 3.2, 5.2.1
Headers: <pthread. h>
Errors: [ EINVAL ] mutex is invalid.
[ EPERM ] calling thread does not own mutex.
Hint: Always unlock within the same thread.
9.3.5 Condition variables
Condition variables provide communication, the ability to wait for some shared
resource to reach some desired state, or to signal that it has reached some state
in which another thread may be interested. Each condition variable is closely
associated with a mutex that protects the state of the resource.
� �,;hread_cond ; h_destroy
int pthread_condattr_destroy (
pthread_condattr_t *attr );
Destroy a condition variable attributes object. The object can no longer be used.
References: 3.3, 5.2.2
Headers: <pthread. h>
Errors: [ EINVAL ] attr invalid.
Hint: Does not affect condition variables created using attr.
pthread_condattr_getpshared ............................................... [_POS I X_THREAD_PROCES
S_SHARED ]
int pthread_condattr_getpshared (
const pthread_condattr_t *attr,
int *pshared );
Determine whether condition variables created with attr can be shared by multiple
processes.
[
May I
[
References: 3.3, 5.2.2
Headers: <pthread. h>
Errors: [ EINVAL ] attr invalid.
Hint: pshared condition variables must be allocated in shared memory
and used with pshared mutexes.
pthread_condattr_init
int pthread_condattr_init (
pthread_condattr_t *attr );
Initialize a condition variable attributes object with default attributes.
References: 3.3, 5.2.2
Headers: <pthread. h>
� �Errors: [ .NOI M] insufficient memory for attr.
Hint: Use to define condition variable types.
pthread_condaflr_setpshared ................................................
[_POSIX_THREAD_PROCESS_SHARED]
int pthread_condattr_setpshared (
pthread_condattr_t *attr,
int pshared);
Condition variables created with attr can be shared between processes if the
pthread_cond_t variable is allocated in memory shared by the processes. References:
Headers:
Errors:
Hint:
3.3, 5.2.2
<pthread. h>
[ EINVAL ] attr or detachstate invalid.
pshared condition variables must be allocated in shared memory
and used with pshared mutexes.
pthread_cond_destroy
int pthread_cond_destroy (
pthread_cond_t *cond );
Destroy condition variable cond that you no longer need.
References: 3.3, 5.2.2
Headers: <pthread. h>
Errors: [EBUSY] cond is in use.
[EINVAL] cond is invalid.
Hint: Safest after wakeup from cond, when no other threads will wait.
pthread_cond_init
int pthread_cond_init (
pthread cond t *cond,
�const p hreaLcondattr_t *attr);
Initialize a condition variable cond. The attr argument specifies optional creation
attributes.
References: 3.3, 5.2.2
Headers: <pthread. h>
�Errors: ?l aoalN] insufficient resources (other than memory).
[ = �.Nol I] insufficient memory.
[EBUSY] cond is already initialized.
[ EINVAL ] attr is invalid.
Hint: Use static initialization instead, if possible.
pthread_cond_broadcast
int pthread_cond_broadcast (
pthread_cond_t *cond );
Broadcast condition variable cond, waking all current waiters.
References: 3.3, 5.2.2
Headers: <pthread. h>
Errors: [EINVAL] cond is invalid.
Hint: Use when more than one waiter may respond to predicate change
or if any waiting thread may not be able to respond. pthread_cond_signal
int pthread_cond_signal (
pthread_cond_t *cond );
Signal condition variable cond, waking one waiting thread. If SCHED_FIFO or SCHED_ILR
policy threads are waiting, the highest-priority waiter is awakened. Otherwise, an
unspecified waiter is awakened.
References: 3.3, 5.2.2
Headers: <pthread. h>
Errors: [ EINVAL ] cond is invalid.
Hint: Use when any waiter can respond, and only one need respond. (All
waiters are equal.)
pthread_cond_timedwait
int pthread_cond_timedwait (
pthread_cond_t * cond,
pthread_mutex t *mutex,
�const struct imespec *abstirae);
Wait on condition variable cond, until awakened by a signal or broadcast, or until
the absolute time abst 駌 ae is reached.
References: 3.3, 5.2.2
Headers: <pthread. h>
Errors: [IgTII4 .DOUT] time specified by abstime has passed.
[EINVAL] cond, mutex, or abstime is invalid.
[EINVAL ] different mutexes for concurrent waits.
[EINVAL ] mutex is not owned by calling thread.
Hint: Mutex is always unlocked (before wait) and relocked (after wait)
inside pthread_cond_tiraedwait, even if the wait fails, times out, or
is canceled.
pthread_cond_wait
int pthread_cond_wait (
pthread_cond_t * cond,
pthread_mutex_t *mutex );
Wait on condition variable cond, until awakened by a signal or broadcast.
References: 3.3, 5.2.2
Headers: <pthread. h>
Errors: [EINVAL ] cond or mutex is invalid.
[ EINVAL] different mutexes for concurrent waits.
[ EINVAL] mutex is not owned by calling thread.
Hint: Mutex is always unlocked (before wait) and relocked (after wait) in-
side pthread_cond_wait, even if the wait fails or is canceled. 9.3.6 Cancellation
Cancellation provides a way to request that a thread terminate "gracefully"
when you no longer need it to complete its normal execution. Each thread can
control how and whether cancellation affects it, and can repair the shared state
as it terminates due to cancellation.
pthread_cancel
int pthread_cancel (
pthread_t thread );
Requests that thread be canceled.
References: 5.3
Headers: <pthread. h>
Errors: [ ESRCH ] no thread found corresponding to thread.
Hint: Cancellation is asynchronous. Use pthread_join to wait for termi-
nation of thread if necessary.
pthread_cleanup_pop
void pthread_cleanup_pop (int execute);
Pop the most recently pushed cleanup handlen Invoke the cleanup handler if exe-
cute is nonzero.
References: 5.3
Headers: <pthread. h>
Hint: Specify execute as nonzero to avoid duplication of common cleanup
code.
pthread_cleanup_push
void pthread_cleanup_push (
void (*routine) (void * ) ,
void *arg );
Push a new cleanup handler onto the thread's stack of cleanup handlers. Invoke
the cleanup handler if execute is nonzero. Each cleanup handler pushed onto the
stack is popped and invoked with the argument arg when the thread exits by call-
ing pthread_exit, when the thread acts on a cancellation request, or when the
thread calls pthread_cleanup_pop with a nonzero execute argument.
References: 5.3
Headers: <pthread. h>
Hint: pthread_cleanup_push and pthread_cleanup_pop must be paired
in the same lexical scope. pthread_setcancelstate
int pthread_setcancelstate (
int state,
int *oldstate );
Atomically set the calling thread's cancelability state to state and return the pre-
vious cancelability state at the location referenced by oldstate.
References:
Headers:
Errors:
Hint:
5.3
<pthread. h>
[EINVAL ] state is invalid.
Use to disable cancellation around "atomic" code that includes can-
cellation points.
pthread_setcanceltype
int pthread_setcanceltype (
int type,
int *oldtype );
Atomically set the calling thread's cancelability type to type and return the previ-
ous cancelability type at the location referenced by oldtype.
References:
Headers:
Errors:
Hint:
5.3
<pthread. h>
[ EINVAL ] type is invalid.
Use with caution--most code is not safe for use with asynchronous
cancelability type. pthread_testcancel
void pthread_testcancel (void);
Creates a deferred cancellation point in the calling thread. The call has no effect if
the current cancelability state is PTHREAD_CANCEL_DISABLE.
References: 5.3
Headers: <pthread. h>
Hint: Cancellation is asynchronous. Use pthread_j oin to wait for termi-
nation of thread if necessary.
9.3.7 Thread-specific data
Thread-specific data provides a way to declare variables that have a common
"name" in all threads, but a unique value in each thread. You should consider
using thread-specific data in a threaded program in many cases where a non-
threaded program would use "static" data. When the static data maintains con-
text across a series of calls to some function, for example, the context should
generally be thread-specific. (If not, the static data must be protected by a
mutex.)
pthread_getspecific
void *pthread_getspecific (
pthread_key_t key );
Return the current value of key in the calling thread. If no value has been set for
key in the thread, NULL is returned.
References: 5.4, 7.2, 7.3.1
Headers: <pthread. h>
Errors: The effect of calling pthread_getspecific with an invalid key is un-
defined. No errors are detected.
Hint: Calling pthread_getspecific in a destructor function will return
NULL. Use destructor's argument instead.
pthread_key_create
int pthread_key_create (
pthread_key_t *key,
void (*destructor) (void *));
Create a thread-specific data key visible to all threads. All existing and new threads
have value NULL for key until set using pthread_setspecific. When any thread
with a nOn-NULL value for key terminates, destructor is called with key's current
value for that thread. References:
Headers:
Errors:
Hint:
5.4, 7.2, 7.3.1
<pthread. h>
[!gAGAIN] insufficient resources or PTHREAD KEYS MAX exceeded.
�[ - �NOI }!] insufficient memory to create the key.
Each key (pthread_key_t variable) must be created only once; use
a mutex or pthread_once.
pthread_key_delete
int pthread_key_delete (
pthread_key_t key );
Delete a thread-specific data key. This does not change the value of the thread-
specific data key for any thread and does not run the key's destructor in any thread,
so it should be used with great caution.
References: 5.4
Headers: <pthread. h>
Errors: [ EINVAL ] key is invalid.
Hint: Use only when you know all threads have NULL value.
pthread_setspecific
int pthread_setspecific (
pthread_key_t key,
const void *value);
Associate a thread-specific value within the calling thread for the specified key.
References: 5.4, 7.2, 7.3.1
Headers: <pthread. h>
�Errors: [ ENOM 4] insufficient memory.
[EINVAL] key is invalid.
Hint: If you set a value of NULL, the key's destructor will not be called at
thread termination.
9.3.8 Realtime scheduling
Realtime scheduling provides a predictable response time to important events
within the process. Note that "predictable" does not always mean "fast," and in
many cases realtime scheduling may impose overhead that results in slower exe-
cution. Realtime scheduling is also subject to synchronization problems such as
priority inversion (Sections 5.5.4 and 8.1.4}, although Pthreads provides optional
facilities to address some of these problems. pthread_attr_getinheritsched .......................................
[ _POS IX_THREAD_PRI ORI TY_SCHEDUL ING ]
int pthread_attr_getinheritsched (
const pthread_attr_t *attr,
int *inheritsched);
Determine whether threads created with attr will run using the scheduling policy
and parameters of the creator or those specified in the attributes object. The default
inheritsched is implementation-defined.
PTHREAD
References:
Headers:
Errors:
5.2.3, 5.5
<pthread. h>
� �[ .l OSYS ] priority scheduling is not supported.
[ EINVAL ] attr invalid.
pthread_attr_getschedparam ...................................... [_POS I
X_THREAD_PRIORITY_SCHEDULING ]
int pthread_attr_getschedparam (
const pthread_attr_t *attr,
struct sched_param *param);
Determine the scheduling parameters used by threads created with attr. The default
param is implementation defined.
References: 5.2.3, 5.5
Headers: <pthread. h>
�Errors: [ .N0SYS 1 priority scheduling is not supported.
[ EINVAL ] attr invalid.
pthread_attr_getschedpolicy ....................................... [_POS I X_THREAD_PRI ORI
TY_SCHEDUL ING ]
int pthread_attr_getschedpolicy (
const pthread_attr_t *attr,
int *policy);
Determine the scheduling policy used by threads created with attr. The default
policy is implementation defined. References:
Headers:
Errors:
5.2.3, 5.5
<pthread. h>
[ENOS 維] priority scheduling is not supported.
[ EINVAL ] attr invalid.
�pth[e(ld_(il [_get$ 給 pe ................................................. [_POS i X_THREAD_PRIORI
TY_SCHEDUL ING ]
int pthread_attr_getscope (
const pthread_attr_t *attr,
int *contentionscope );
Determine the contention scope used by threads created with attr. The default is
implementation defined.
�con ent ionscope
� �PTHREAD SCOPE PROCESS ead conten with other
� reads in the process for pro-
cessor resoumes,
PTHREAD SCOPE SYSTEM Thread contends with threads
in all processes for processor
�resotlrc s.
References:
Headers:
Errors:
Hint:
5.2.3, 5.5
<pthread. h>
[ENOS 維 ] priority scheduling is not supported.
[ EINVAL ] attr invalid.
Implementation must support one or both of these, but need not
support both. pthread attr setinheritsched .......................................
[_POSIX_THREAD_PRIORITY_SCHEDULING]
int pthread_attr_setinheritsched (
pthread_attr_t * attr,
int inheritsched );
Specify whether threads created with attr will run using the scheduling policy and
parameters of the creator or those specified in the attributes object. When you
change the scheduling policy or parameters in a thread attributes object, you must
change the inheritsched attribute from PTHREAD_INHERIT_SCHED to PTHREAD_
�EXPLICIT SCHED. q le default is imp]ementation-defmed.
�PTffiREAD_INHER T_SCHED
References:
Headers:
Errors:
5.2.3, 5.5
<pthread. h>
[ENOS 維 l priority scheduling is not supported.
[ EINVAL ] attr or inheritsched invalid.
pthread_attr_setschedparam ...................................... [ _POS I
X_THREAD_PRIORITY_SCHEDULING ]
int pthread_attr_setschedparam (
pthread_attr_t *attr,
const struct sched_param *param);
Specify the scheduling parameters used by threads created with attr. The default
param is implementation defined.
References: 5.2.3, 5.5
Headers: <pthread. h>
Errors: [ ENOSlrS 1 priority scheduling is not supported.
[ EINVAL] attr or param invalid.
[ ENOTSUP ] param set to supported value.
pthread_attr_setschedpolicy ....................................... [ _POS I X_THREAD_PRI ORI
TY_SCHEDULING ]
int pthread_attr_setschedpolicy (
pthread_attr_t *attr,
int policy );
Specify the scheduling policy used by threads created with attr. The default
policy is implementation defined. References:
Headers:
Errors:
5.2.3, 5.5
<pthread. h>
�[ .NOS 維] priority scheduling is not supported.
[ EINVAL ] attr or policy invalid.
[ ENOTSUP] param set to supported value.
�Dthre( d_(litr_sets 給 De .................................................. [ _POS I
X_THREAD_PRIORITY_SCHEDUL ING ]
int pthread_attr_setscope (
pthread_attr_t *attr,
int contentionscope);
�Speci the contention scope used by threads created with attr. The default is
implementation defined.
References:
Headers:
Errors:
Hint:
5.2.3, 5.5
<pthread. h>
�[ .NOS 維 1 priority scheduling is not supported.
[ EINVAL ] attr or contentionscope invalid.
[ ENOTSUP] contentionscope set to supported value.
Implementation must support one or both of these, but need not
support both. pthread_getschedparam ............................................. [ _POS IX_THREAD_PRI
ORI TY_SCHEDUL ING ]
int pthread_getschedparam (
pthread_t thread,
int *policy
struct sched_param *param);
Determine the scheduling policy and parameters (param) currently used by thread.
SCHED_FIFO
SCHED RR
SCHED OTHER
References:
Headers:
Errors:
Hint:
5.2.3, 5.5
<pthread. h>
[ENOS 維] priority scheduling is not supported.
[ ESRCH ] thread does not refer to an existing thread.
Try to avoid dynamically modifying thread scheduling policy and
parameters, if possible.
pthread_mutex_9etprioceilin9 ................................................... [ _POS I X_THREAD_PRI
O_PROTECT ]
int pthread_mutex_getprioceiling (
const pthread_mutex_t *mutex,
int *prioceiling );
Determine the priority ceiling at which threads will run while owningmutex.
References: 3.2, 5.2.1, 5.5.5
Headers: <pthread. h>
Errors: [ w. NOS 維] priority scheduling is not supported.
[ EINVAL ] mutex invalid.
Hint: Protect protocol is inappropriate unless the creator of the mutex
also creates and controls all threads that might lock the mutex.
pthread_mutex_setprioceiling .................................................... [_POS I
X_THREAD_PRIO_PROTECT ]
int pthread_mutex_getprioceiling (
pthread_mutex_t *mutex,
int prioceiling,
int *old_ceiling );
Specify the priority ceiling at which threads will run while owning mutex. Returns
previous priority ceiling for mutex.
References: 3.2, 5.2.1, 5.5.5
Headers: <pthread. h>
�Errors: [ .lqos 緎 ] priority scheduling is not supported.
[EINVAL] mutex invalid, or prioceiling out of range.
[EPERM] no privilege to set prioceiling.
Hint: Protect protocol is inappropriate unless the creator of the mutex
also creates and controls all threads that might lock the mutex.
pthread_mutexattr_getprioceiling ............................................. [_POSIX_THREAD PRIO
PROTECT]
int pthread_mutexattr_getprioceiling (
const pthread_mutexattr_t *attr,
int *prioceiling);
Determine the priority ceiling at which threads will run while owning a mutex cre-
ated with attr.
References: 3.2, 5.2.1, 5.5.5
Headers: <pthread. h>
Errors: [w. NOS 維 ] priority scheduling is not supported.
[ EINVAL ] attr invalid.
Hint: Protect protocol is inappropriate unless the creator of the mutex
also creates and controls all threads that might lock the mutex.
pthread_mutexattr_getprotoco]... [_POSIX_THREAD_PRIO INHERIT POSIX THREAD PRIO
PROTECT]
int pthread_mutexattr_getprotocol (
const pthread_mutexattr_t *attr,
int *protocol);
� �Determine whether mutexes created with attr have p ority ceiling protocol ro-
�tecO, priority inheritance protocol ( heNO, or no priority protocol (nond.
PTHREAD._pR'rO_NOHE
PTHREAD
References:
Headers:
Errors:
Hint:
3.2, 5.2.1, 5.5.5
<pthread. h>
[ENOS 維 ] priority scheduling is not supported.
[ EINVAL ] attr invadid.
Inherit protocol is expensive, and protect protocol is inappropriate
unless the creator of the mutex also creates and controls all threads
that might lock the mutex.
pthread_mutexattr_setprioceiling ...............................................
[_POSIX_THREAD_PRIO_PROTECT ]
int pthread_mutexattr_setprioceiling (
pt hr e ad_mutexat t r_t * attr,
int prioceiling );
Specify the priority ceiling at which threads will run while owning a mutex created
with attr. The value of prioceiling must be a valid priority parameter for the
SCHED FIFO policy.
References: 3.2, 5.2.1, 5.5.5
Headers: <pthread. h>
�Errors: [EI OS 維 l priority scheduling is not supported.
[ EINVAL ] attr or prioceiling invalid.
[EPERM] no permission to set prioceiling.
Hint: Protect protocol is inappropriate unless the creator of the mutex
also creates and controls all threads that might lock the mutex.
pthread_mutexattr_setprotocol .... (_POS I X_THREAD_PR I O_I NHERI T_POS I
X_THREAD_PR I O_PROTECT ]
int pthread__mutexattr_setprotocol (
pt hr e ad_mut exatt r_t * attr,
int protocol );
Specify whether mutexes created with attr have priority ceiling protocol (protect),
priority inheritance protocol (inheriO, or no priority protocol (none). References:
Headers:
Errors:
Hint:
<pthread. h>
[ENOS 維] priority scheduling is not supported.
[ EINVAL ] attr or protocol invalid.
[w. NOTSUP] protocol value is not supported.
Inherit protocol is expensive, and protect protocol is inappropriate
unless the creator of the mutex also creates and controls all threads
that might lock the mutex.
� � �pthre( d_setschedpc rc m .............................................. [_POS IX_THREAD_PRIORI
TY_SCHEDUL ING ]
int pthread_setschedparam (
pthread_t thread,
int policy
const struct sched_param *param);
�Speci the scheduling policyand parameters(param) to be used by thread.
References:
Headers:
Errors:
Hint:
5.5
<pthread. h>
�[ -NOS 維] priority scheduling is not supported.
[ESRCH] thread does not refer to an existing thread.
[EINVAL] policy or param is invalid.
[ ENOTSUP ] policy or param is unsupported value.
[EPERM] no permission to set policy or param.
Try to avoid dynamically modifying thread scheduling policy and
parameters, if possible. sched_9 ﹖ _priorit 綺 max ..............................................................
[_V0$IX_PRZORZT 綺 �SCHEVUL a]
int sched_get_priority_max (
int policy);
Return the maximum integer priority allowed for the specified scheduling policy.
SCHED FIFO
policy
Run thre ad
SCHED RR
SCHED OTHER
References: 5.5.2
Headers:
Errors:
Hint:
Implementation defined (may
�be SCHE FIFO SCHED_RR,
or something else).
<sched.h>
�[ .NOS 維] priority scheduling is not supported.
[EINVAL] policy is invalid.
Priority min and max are integer values--you can compute relative
values, for example, half and quarter points in range.
�sched_Qetpriori _min ................................................................
[_POSIX_PRIORITY_SCHEDULING]
int sched_get_priority_min (
int policy);
�Returnthe minimum integer priorityallowed rthe specified schedulingpolicy.
References;
Headers:
Errors:
Hint:
5.5.2
<sched. h>
[ ENOSYS ] priority scheduling is not supported.
�[ .INVAL] policy is invalid.
Priority min and max are integer values--you can compute relative
values, for example, half and quarter points in range. 9.3.9 Fork handlers
Pthreads provides some new functions to help the new threaded environment
to coexist with the traditional process-based UNIX environment. Creation of a
child process by copying the full address space, for example, causes problems for
threaded applications because the fork call is asynchronous with respect to
other threads in the process.
pthread_atfork
int pthread_atfork (
void (*prepare) (void) ,
void (*parent) (void) ,
void (*child) (void));
Define "fork handlers" that are run when the process creates a child process. Allows
protection of synchronization objects and shared data in the child process (which
is otherwise difficult to control).
References: 6.1.1
Headers: <unistd. h>*
Errors: [ENOMEM] insufficient space to record the handlers.
Hint: All resources needed by child must be protected.
9.3.10 Stdio
Pthreads provides some new functions, and new versions of old functions, to
access ANSI C stdio features safely from a threaded process. For safety reasons,
the old forms of single-character access to stdio buffers have been altered to lock
the file stream, which can decrease performance. You can change old code to
instead lock the file stream manually and, within that locked region, use new
character access operations that do not lock the file stream.
flockfile
void flockfile (
FILE *file);
Increase the lock count for a stdio file stream to gain exclusive access to the file
stream. If the file stream is currently locked by another thread, the calling thread
is blocked until the lock count for the file stream becomes zero. If the calling thread
already owns the file stream lock, the lock count is incremented--an identical num-
ber of calls to funlockfile is required to release the file stream lock.
* Digital UNIX and Solaris both (incorrectly) place the definition in <pthread. h>. The UNIX 98
brand will require that they be fixed. Although most stdio functions, such as printf and fgets, are
thread-safe, you
may sometimes find that it is important that a sequence of printf calls, for exam-
ple, from one thread cannot be separated by calls made from another thread. Also,
a few stdio functions are not thread-safe and can only be used while the file stream
is locked by the caller.
References: 6.4.1
Headers: <stdio. h>
Hint: Use to protect a sequence of stdio operations.
ffrylockfile
int ftrylockfile (
FILE *file);
If the file stream is currently locked by another thread, return a nonzero value. Oth-
erwise, increase the lock count for the file stream, and return the value zero.
References: 6.4.1
Headers: <stdio. h>
Hint: Use to protect a sequence of stdio operations.
funlockfile
void funlockfile (
FILE *file);
Decrease the lock count for a stdio file stream that was previously locked by a cor-
responding call to funlockfile. If the lock count becomes 0, release the lock so
that another thread can lock it.
References: 6.4.1
Headers: <stdio. h>
Hint: Use to protect a sequence of stdio operations.
getc_unlocked
int getc_unlocked (
FILE *file);
Return a single character from the stdio stream file, without locking the file stream.
This operation must only be used while the file stream has been locked by calling
flockfile, or when you know that no other thread may access the file stream con-
currently. Returns E0F for read errors or end-of-file condition.
References: 6.4.2
Headers: <stdio. h>
Hint: Replace old calls to getc to retain fastest access. getchar_unlocked
int getc_unlocked (void);
Return a single character from the stdio stream stdin without locking the file
stream. This operation must only be used while the file stream has been locked by
calling flockfile, or when you know that no other thread may access the file
stream concurrently. Returns E0F for read errors or end-of-file condition.
References: 6.4.2
Headers: <stdio. h>
Hint: Replace old calls to getchar to retain fastest access.
putc_unlocked
int putc_unlocked (
int c,
FILE *file);
Write a single character c (interpreted as an unsigned char) to the stdio stream file
without locking the file stream. This operation must only be used while the file
stream has been locked by calling flockfile, or when you know that no other
thread may access the file stream concurrently. Returns the character or the value
EOF if an error occurred.
References: 6.4.2
Headers: <stdio. h>
Hint: Replace old calls to putc to retain fastest access.
putchar_unlocked
int putchar_unlocked (
int c );
Write a single character c (interpreted as an unsigned char) to the stdio stream
stdout without locking the file stream. This operation must only be used while the
file stream has been locked by calling flockfile, or when you know that no other
thread may access the file stream concurrently. Returns the character or the value
EOF if an error occurred.
References: 6.4.2
Headers: <stdio. h>
Hint: Replace old calls to putchar to retain fastest access.
9.3.11 Thread-safe functions
Thread-safe functions provide improved access to traditional features of
ANSI C and POSIX that cannot be effectively made thread-safe without interface
changes. These routines are designated by the" r" suffix added to the traditional
function name they replace, for example, getlogin_r for getlogin. getlogin_r
int getlogin_r (
char * name,
size t namesize);
Write the user name associated with the current process into the buffer pointed to
by name. The buffer is namesize bytes long, and should have space for the name
and a terminating null character. The maximum size of the login name is r,0GIN_
NAME MAX.
References: 6.5.1
Headers: <unistd. h>
readdir_r
int readdir r (
DIR *dirp,
struct dirent *entry,
struct dirent **result);
Return a pointer (result) to the directory entry at the current position in the direc-
tory stream to which dirp refers. Whereas readdir retains the current position us-
ing a static variable, readdir_r uses the entry parameter, supplied by the caller.
References: 6.5.2
Headers: <sys/types. h>, <dirent. h>
Errors: [ EBADF] dirp is not an open directory stream.
strtok_r
char
*strtok r (
char * s,
const char *sep,
char **lasts);
Return a pointer to the next token in the string s. Whereas strtok retains the current
position within a string using a static variable, strtok_r uses the lasts parameter,
supplied by the caller.
References: 6.5.3
Headers: <string. h>
asctime_r
char *asctime r (
const struct tm*tm,
char *buf);
�Conve the "broken-down"time in the structure pointed to bytm into a string,
which is stored in the buffer pointed to bybuf. The buffer pointed to bybufmustcontain at least 26
bytes. The function returns a pointer to the buffer on success,
or NULL on failure.
References: 6.5.4
Headers: <time. h>
ctime_r
char *ctime r (
const time t *clock,
char *buf );
Convert the calendar time pointed to by clock into a string representing the local
time, which is stored in the buffer pointed to by bur. The buffer pointed to by bur
must contain at least 26 bytes. The function returns a pointer to the buffer on suc-
cess, or NULL on failure.
References: 6.5.4
Headers: <time. h>
gmtime_r
struct tm *gmtime_r (
const time t *clock,
struct tm *result);
Convert the calendar time pointed to by clock into a "broken-down time" expressed
as Coordinated Universal Time (UTC), which is stored in the structure pointed to
by result. The function returns a pointer to the structure on success, 0rNULL on
failure.
References: 6.5.4
Headers: <time. h>
Iocaltime_r
struct tm *localtime r (
const time t *clock,
struct tm *result);
Convert the calendar time pointed to by clock into a "broken-down time" expressed
as local time, which is stored in the structure pointed to by result. The function
returns a pointer to the structure on success, or NULL on failure.
References: 6.5.4
Headers: <time. h>rand_r
int rand r (
unsigned int *seed);
Return the next value in a sequence of pseudorandom integers in the range of 0 to
RAND MAX. Whereas rand uses a static variable to maintain the context between a
series of calls, rand r uses the value pointed to by seed, which is supplied by the
caller.
References: 6.5.5
Headers: <stdlib. h>
I getgrgid_r
getgrgid_r (
gid_t gid,
struct group *group,
char *buffer,
size t bufsize,
struct group **result);
Locate an entry from the group database with a group id matching the gid argu-
ment. The group entry is stored in the memory pointed to by buffer, which con-
tains bufsize bytes, and a pointer to the entry is stored at the address pointed to
by result. The maximum buffer size required can be determined by calling
sysconf with the Sc GETGR R SIZE_MAX parameter.
References: 6.5.6
Headers: <sys/types. h>, <grp. h>
Errors: [ ERANGE ] the specified buffer is too small.
getgrnam_r
int
getgrnam_r (
const char *name,
struct group *group,
char *buffer,
size t bufsize,
struct group **result);
Locate an entry from the group database with a group name matching the name
argument. The group entry is stored in the memory pointed to by buffer, which
contains bufsize bytes, and a pointer to the entry is stored at the address pointed
to by result. The maximum buffer size required can be determined by calling
sysconf with the SC GETGR R SIZE_MAX parameter.
References: 6.5.6
Headers: <sys/types. h>, <grp. h>
Errors: [ ERANGE ] the specified buffer is too small. getpwuid_r
int getpwuid_r (
uid t uid,
struct passwd *pwd,
char *buffer,
size t bufsize,
struct passwd **result);
Locate an entry from the user database with a user id matching the uid argument.
The user entry is stored in the memory pointed to by buffer, which contains
bufsize bytes, and a pointer to the entry is stored at the address pointed to by
result. The maximum buffer size required can be determined by calling sysconf
with the SC GETPW R SIZE_MAX parameter.
References: 6.5.6
Headers: <sys/types. h>, <pwd. h>
Errors: [ ERANGE ] the specified buffer is too small.
getpwnam_r
int getpwnam_r (
const char
struct passwd
char
size t
struct passwd
* name,
* pwd,
*buffer,
bufsize,
**result);
Locate an entry from the user database with a user name matching the name argu-
ment. The user entry is stored in the memory pointed to by buffer, which contains
bufsize bytes, and a pointer to the entry is stored at the address pointed to by
result. The maximum buffer size required can be determined by calling sysconf
with the sc GETPW R SIZE_MAX parameter.
References: 6.5.6
Headers: <sys/types. h>, <pwd. h>
Errors: [ ERANGE ] the specified buffer is too small.
9.3.12 Signals
Pthreads provides functions that extend the POSIX signal model to support
multithreaded processes. A]I threads in a process share the same signal actions.
Each thread has its own pending and blocked signal masks. The process also
has a pending signal mask so that asynchronous signals can pend against the
process when all threads have the signal blocked. In a multithreaded process,
the behavior of sigprocmask is undefined. pthread_kill
int pthread_kill (
pthread_t thread,
int sig);
Request that the signal sig be delivered to thread. If sig is 0, no signal is sent, but
error checking is performed. If the action of the signal is to terminate, stop, or con-
tinue, then the entire process is affected.
References: 6.6.3
Headers: <signal. h>
Errors: [ =.SRCa] no thread corresponding to thread.
[w. INVAL] sig is an invalid signal number.
Hint: To terminate a thread, use cancellation.
pthread_sigmask
int pthread_sigmask (
int how,
const sigset_t *set,
sigset_t *oset);
Control the masking of signals within the calling thread.
how
S lG_UNBLOCK ResUlthag set iS the intersection of
� �the curren set d the ment
SIG SETMASK
References:
Headers:
Errors:
Hint:
6.6.2
<signal.h>
[EINVAL] how is not one of the defined values.
You cannot prevent delivery of asynchronous signals to the process
unless the signal is blocked in all threads.
sigtimedwait
int sigtimedwait (
const sigset_t *set,
siginfo_t * info,
const struct timespec *timeout);
If a signal in set is pending, atomically clear it from the set of pending signals and
return the signal number in the si_signo member of info. The cause of the signalshall be stored in
the Hi_code member. If any value is queued to the selected signal,
return the first queued value in the Hi_value member. If no signal in set is pend-
ing, suspend the calling thread until one or more become pending. If the time in-
terval specified by timeout passes, s igtimedwait will return with the error EAGAIN.
This function returns the signal number--on error, it returns-1 and sets errno to
the appropriate error code.
References: 6.6.4
Headers: <signal. h>
Errors: [EINVAL] set contains an invalid signal number.
lEAGAIN] the timeout interval passed.
[ ENOS 維 ] realtime signals are not supported.
Hint: Use only for asynchronous signal delivery. All signals in set must
be masked in the calling thread, and should usually be masked in
all threads.
sigwait
int sigwait (
const sigset_t *set,
int *sig);
If a signal in set is pending, atomically clear it from the set of pending signals and
return the signal number in the location referenced by sig. If no signal in set is
pending, suspend the calling thread until one or more become pending.
References: 6.6.4
Headers: <signal. h>
Errors: [ EINVAL ] set contains an invalid signal number.
Hint: Use only for asynchronous signal delivery. All signals in set must
be masked in the calling thread, and should usually be masked in
all threads.
sigwaitinfo
int sigwaitinfo (
const sigset_t *set,
siginfo_t *info );
If a signal in set is pending, atomically clear it from the set of pending signals and
return the signal number in the si_signo member of info. The cause of the signal
shall be stored in the si code member. If any value is queued to the selected signal,
�return the first queued alue in the Hi_value member. If no signal in set is pend-
ing, suspend the calling thread until one or more become pending. This function re-
turns the signal number--on error, it returns -1 and sets errno to the appropriate
error code. References:
Headers:
Errors:
Hint:
6.6.4
<signal.h>
[EINVAL] set contains an invalid signal number.
[ ENOS 維 ] realtime signals are not supported.
Use only for asynchronous signal delivery. All signals in set must
be masked in the calling thread, and should usually be masked in
all threads.
9.3.13 Semaphores
Semaphores come from POSIX. lb (POSIX 1003.1b-1993) rather than from
Pthreads. They follow the older UNIX convention for reporting errors. That is, on
failure they return a value of -1 and store the appropriate error number into
errno. All of the semaphore functions require the header file <semaphore. h>.
sem_destroy .................................................................................................... [_POS
IX_SEMAPHORES ]
int sem_destroy (
sem t *sem);
Destroy an unnamed semaphore.
References: 6.6.6
Headers: <semaphore. h>
�Errors: [ .INVAL] value exceeds SEM VALUE MAX.
�[ .NOS 維 ] semaphores are not supported.
[ EBUSY ] threads (or processes) are currently blocked on sem.
sem_init ............................................................................................................ [_POS
IX_SEMAPHORES ]
int sem init (
sem t *sem,
int pshared,
unsigned int value);
Initialize an unnamed semaphore. The initial value of the semaphore counter is
value. If the pshared argument has a nonzero value, the semaphore can be shared
between processes. With a zero value, it can be shared only between threads in the
same process.
References:
Headers:
Errors:
Hint:
6.6.6
<semaphore. h>
[EINVAI,] sero is not a valid semaphore.
[ ENOSPC] a required resource has been exhausted.
[ ENOS 維 ] semaphores are not supported.
�[ EI'IEI I the process lacks appropriate privilege.
Use a value of 1 for a lock, a value of 0 for waiting. s ﹎
_trywai! ..................................................................................................... 淿
�POSIX_SEMAPHOR $ I
int sem__trywait (
sem_t *sem);
Try to wait on a semaphore (or "try to lock" the semaphore). If the semaphore value
is greater than zero, decrease the value by one. If the semaphore value is 0, then
return immediately with the error EAGAIN.
References: 6.6.6
Headers: <semaphore. h>
Errors: [ EAGAIN] the semaphore was already locked.
�[ .INVAL] sem is not a valid semaphore.
[ EINTR ] the function was interrupted by a signal.
�[ -NOS 維 ] semaphores are not supported.
�[ ED .ADLK ] a deadlock condition was detected.
Hint: When the semaphore's initial value was 1, this is a lock operation;
when the initial value was 0, this is a wait operation.
sem_post ...................................................................................................
[_POSIX_SEMAPHORES ]
int sero_post (
sem_t *sem);
Post a wakeup to a semaphore. If there are waiting threads {or processes), one is
awakened. Otherwise the semaphore value is incremented by one.
References: 6.6.6
Headers: <semaphore. h>
�Errors: [ .TNVAL] sem is not a valid semaphore.
�[ -NOSYS ] semaphores are not supported.
Hint: May be used from within a signal-handling function.
sem_wait ............................................................................................. [_POSIX_SEMAPHORES ]
int sem wait (
sem_t *sem);
Wait on a semaphore (or lock the semaphore). If the semaphore value is greater
than zero, decrease the value by one. If the semaphore value is 0, then the calling
thread (or process) is blocked until it can successfully decrease the value or until
interrupted by a signal.
References: 6.6.6
Headers: <semaphore. h>
�Errors: [ -INVAL] sem is not a valid semaphore.
[EINTR] the function was interrupted by a signal.
�[ -N0S? ] semaphores are not supported.
[EDEADLK ] a deadlock condition was detected.
Hint: When the semaphore's initial value was 1, this is a lock operation;
when the initial value was 0, this is a wait operation. 10 Future standardization
Three primary standardization efforts affect Pthreads programmers. X/Open's
XSH5 is a new interface specification that includes POSIX. lb, Pthreads, and a set
of additional thread functions (part of the Aspen fast-track submission). The
POSIX. lj draft standard proposes to add barriers, read/write locks, spinlocks,
and improved support for "relative time" waits on condition variables. The
POSIX. 14 draft standard (a "POSIX Standard Profile") gives direction for manag-
ing the various options of Pthreads in a multiprocessor environment.
10,1 X/Open XSH5 [UNIX98]
Mutex type attribute:
Read/write locks:
� �int p hread_rwlock_init (pthread_rwlock_t * 1ock,
const pthread_rwlockattr_t *attr);
�int pthread_rwlock_destroy (pth ead_rwlock_t *rwlock);
�pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITI IZER;
� �int p hread_rwlock_rdlock (pthread_rwloc t *rWlOck);
int pthread_rwlock_tryrdlock (
�pthread_rwlock_t *rw ock);
�int pthread_rwlock_unlock (pthread_ lock_t *rwlock);
�int pthread_rwlock_wrlock (pth ead_rw!ock_t *rwlock);
int pthread_rwlock_trywrlock (
pthread_rwlock_t *rwlock);
int pthread_rwlockattr_init (
pthread_rwlockattr_t *attr);
int pthread rwlockattr destroy (
int pthread_rwlockattr_getpshared (
const pthread_rwlockattr_t *attr, int *pshared);
int pthread_rwlockattr_setpShared (
pthread_rwlockattr_t *attr, int pshared);
347Parallel I/O:
Miscellaneous:
X/Open, which is part of The Open Group, owns the UNIX trademark and
develops UNIX industry portability specifications and brands. The X/Open brands
include XPG3, XPG4, UNIX93, and UNIX95. UNIX95 is also known as "SPEC 1170"
or the "Single UNIX Specification."
X/Open recently published the X/Open CAE Specification, System Interfaces
and Headers, Issue 5 (also known as XSH5), which is part of the new UNIX98
brand. XSH5 requires conformance to the POSIX. l-1996 standard, which in-
cludes the POSIX. lb and POSIX. lc amendments. The XSH5 specification also
adds a set of extensions to POSIX. This section discusses the XSH5 extensions
that specifically affect threaded programs. You can recognize a system conform-
� �ing to XSH5 by a definition for the _XOPEI _VERSIOI symbol, in <un 駍 �t d. h>, to the
value 500 or higher.
The most valuable contribution of UNIX98 to the threaded programming
industry, however, is possibly the development of a standardized, portable testing
system. A number of complicated issues arise when developing an implementa-
tion of Pthreads, and some subtle aspects of the standard are ambiguous. Such
an industry-wide testing system will require all vendors implementing UNIX98
branded systems to agree on interpretations of Pthreads.
10.1.1 POSIX options for XSH5
Some of the features that are options in the Pthreads standard are required
by XSH5. If your code relies on these Pthreads options, it will work on any sys-
tem conforming to XSH5:
?POSIX THREADS: Threads are supported.
?POSIX THREAD ATTR STACKADDR: The stackaddrattribute is supported.
?POSIX THREAD ATTR STACKSIZE: The stacksize attribute is supported.
?POSIX THREAD PROCESS SHARED: Mutexes, condition variables, and XSH5
read/write locks can be shared between processes. ?_POSIX_THREAD_SAFE_FUNCTIONS:
The Pthreads thread-safe functions are
supported.
Several additional Pthreads options are "bundled" into the XSH5 realtime
threads option group. If your system conforms to XSH5 and supports the XOPEN
REALTIME_THREADS option, then these Pthreads options are also supported:
?_POSIX_THREAD_PRIORITY_SCHEDULING: Realtime priority scheduling is
supported.
?_POSIX_THREAD_PRIO_PROTECT: Priority ceiling mutexes are supported.
?_POSIX_THREAD_PRIO_INHERIT: Priority inheritance mutexes are supported.
10.1.2 Mutex type
The DCE threads package provided an extension that allowed the program-
mer to specify the "kind" of mutex to be created. DCE threads supplied fast,
recursive, and nonrecursive mutex kinds. The XSH5 specification changes the
attribute name from "kind" to "type," renames fast to default, renames nonrecur-
sive to errorcheck, and adds a new type, normal (Table 10.1).
A normal mutex is not allowed to detect deadlock errors--that is, a thread will
hang if it tries to lock a normal mutex that it already owns. The default mutex
type, like the DCEfast mutex,* provides implementation-defined error checking.
That is, default may be mapped to one of the other standard types or may be
something entirely different.
Mutex type Definition
PTHREAD_MUTEX_NORMAL Basic mutex with no specific error checking built
in. Does not report a deadlock error.
PTHREAD_MUTEX_RECURSIVE Allows any thread to lock the mutex "recursively"
--it must unlock an equal number of times to
release the mutex.
PTHREAD MUTEX ERRORCHECK
PTHREAD MUTEX DEFAULT
Detects and reports simple usage errors--an
attempt to unlock a mutex that's not locked by
the calling thread (or that isn't locked at all), or an
attempt to relock a mutex the thread already
owns.
The default mutex type, with very loose semantics
to allow unfettered innovation and experimenta-
tion. May be mapped to any of the other three de-
fined types, or may be something else entirely.
TABLE 10.1 XSH5 mutex types
* DCE threads implemented fast mutexes much like the definition of XSH5 normal mutexes,
with no error checking. This was not, however, specification of intent. As an application developer,
you can use any of the mutex types almost inter-
changeably as long as your code does not depend on the implementation to detect
(or fail to detect) any particular errors. Never write code that counts on an imple-
mentation failing to detect any error. Do not lock a mutex in one thread and
unlock it in another thread, for example, even if you are sure that the error won't
be reported--use a semaphore instead, which has no "ownership" semantics.
All mutexes, regardless of type, are created using pthread_rautex_in 駎, de-
stroyed using pthread_mutex_destroy, and manipulated using pthread_mutex_
lock, pthread_mutex_unlock, and pthread_mutex_trylock.
Normal mutexes will usually be the fastest implementation possible for the
�machine, but vill provide the least error checking.
Recursive mutexes are primarily useful for converting old code where it is dif-
ficult to establish clear boundaries of synchronization, for example, when you
must call a function with a mutex locked and the function you call--or some
function it calls--may need to lock the same mutex. I have never seen a situation
where recursive mutexes were required to solve a problem, but I have seen many
cases where the alternate (and usually "better") solutions were impractical. Such
situations frequently lead developers to create recursive mutexes, and it makes
more sense to have a single implementation available to everyone. (But your code
will usually be easier to follow, and perform better, if you avoid recursire
mutexes.)
Errorcheck mutexes were devised as a debugging tool, although less intrusive
debugging tools (where available) can be more powerful. To use errorcheck
mutexes you must recompile code to turn the debugging feature on and off. It is far
more useful to have an external option to force all mutexes to record debugging
data. You may want to use errorcheck mutexes in final "production" code, of
course, to detect serious problems early, but be aware that errorcheck mutexes
will almost always be much slower than normal mutexes due to the extra state and
checking.
Default mutexes allow each implementation to provide the mutex semantics
the vendor feels will be most useful to the target audience. It may be useful to
make errorcheck mutexes the default, for example, to improve the threaded
debugging environment of a system. Or the vendor may choose to make normal
mutexes the default to give most programs the benefit of any extra speed.
pthread_mutexaflr_geflype
int pthread_mutexattr_gettype (
const pthread_mutexattr_t *attr,
int *type);
�Speci thetype of mutexes createdwith attr. References:
Errors:
Hint:
3.2, 5.2.1, 10.1.2
[EINVAL] type invalid.
[ EINVAL ] attr invalid.
Normal mutexes will usually be fastest; errorcheck mutexes are use-
ful for debugging; recursive mutexes can be useful for making old
interfaces thread-safe.
pthread_mutexattr_settype
int pthread_mutexattr_settype (
pthread_mutexattr_t * attr,
int type );
Determine the type of mutexes created with attr.
type
�PTHREAD_MUTEX_DEFAULT Unspecified e.
�PTHREAD_MUTEX_NORMAL Basic mutex, th no
�checl g.
PTHREAD MUTEX RECURSIVE Thread can relock a mutex it
owns.
�PTHREAD MUTEX ERRORCHECK Checks for us errors.
References: 3.2, 5.2.1, 10.1.2
Errors: [EINVAL] type invalid.
[ EINVAL ] attr invalid.
Hint: Normal mutexes will usually be fastest; errorcheck mutexes are use-
ful for debugging; recursive mutexes can be useful for making old
interfaces thread-safe.
10.1.3 Set concurrency level
When you use Pthreads implementations that schedule user threads onto
some smaller set of kernel entities (see Section 5.6.3), it may be possible to have
ready user threads while all kernel entities allocated to the process are busy. Some
implementations, for example, "lock" a kernel entity to a user thread that
blocks in the kernel, until the blocking condition, for example an I/O request, is
completed. The system will create some reasonable number of kernel execution
entities for the process, but eventually the pool of kernel entities may become
exhausted. The process may be left with threads capable of performing useful
work for the application, but no way to schedule them.
The pthread_setconcurrency function addresses this limitation by allowing
the application to ask for more kernel entities. If the application designer realizes
that 10 out of 15 threads may at any time become blocked in the kernel, and it is
important for those other 5 threads to be able to continue processing, then the
application may request that the kernel supply 15 kernel entities. If it is impor-
tant that at least 1 of those 5 continue, but not that all continue, then the
application could request the more conservative number of 11 kernel entities. Or
if it is OK for all threads to block once in a while, but not often, and you know
that only rarely will more than 6 threads block at any time, the application could
request 7 kernel entities.
The pthread_setconcurrency function is a hint, and implementations may
ignore it or modify the advice. You may use it freely on any system that conforms
to the UNIX98 brand, but many systems will do nothing more than set a value
that is returned by pthread_oetconcurrency. On Digital UNIX, for example,
there is no need to set a fixed concurrency level, because the kernel mode and
user mode schedulers cooperate to ensure that ready user threads cannot be pre-
vented from running by other threads blocked in the kernel.
pthread_getconcurrency
int pthread_getconcurrency ( );
Returns the value set by a previous pthread_setconcurrency call. If there have
been no previous calls to pthread_setconcurrency, returns 0 to indicate that the
implementation is maintaining the concurrency level automatically.
References: 5.6.3, 10.1.3
Errors: none.
Hint: Concurrency level is a hint. It may be ignored by any implementa-
tion, and will be ignored by an implementation that does not need
it to ensure concurrency.
pthread_setconcurrency
int pthread_getconcurrency (int new_level);
Allows the application to inform the threads implementation of its desired mini-
mum concurrency level. The actual level of concurrency resulting from this call is
unspecified. References:
Errors:
Hint:
5.6.3, 10.1.3
[w. INVAL] new_level is negative.
lEAGAIN] new level exceeds a system resource.
Concurrency level is a hint. It may be ignored by any implementa-
tion, and will be ignored by an implementation that does not need
it to ensure concurrency.
10.1.4 Stack guard size
Guard size comes from DCE threads. Most thread implementations add to the
thread's stack a "guard" region, a page or more of protected memory. This pro-
tected page is a safety zone, to prevent a stack overflow in one thread from
corrupting another thread's stack. There are two good reasons for wanting to
control a thread's guard size:
1. It allows an application or library that allocates large data arrays on the
stack to increase the default guard size. For example, if a thread allocates
two pages at once, a single guard page provides little protection against
stack overflows--the thread can corrupt adjoining memory without touch-
ing the protected page.
2. When creating a large number of threads, it may be that the extra page for
each stack can become a severe burden. In addition to the extra page, the
kernel's memory manager has to keep track of the differing protection on
adjoining pages, which may strain system resources. Therefore, you may
sometimes need to ask the system to "trust you" and avoid allocating any
guard pages at all for your threads. You can do this by requesting a guard
size of 0 bytes.
pthread_attr_getguardsize
int pthread_attr_getguardsize (
const pthread_attr_t *attr,
size t *guardsize);
Determine the size of the guard region for the stack on which threads created with
attr will run.
References: 2, 5.2.3
Errors: [ w. INVAL] attr invalid.
Hint: Specify 0 to fit lots of stacks in an address space, or increase default
guardsize for threads that allocate large buffers on the stack. pthread_attr_setguardsize
int pthread_attr_setguardsize (
pthread_attr_t * attr,
size_t guardsize );
Threads created with attr will run on a stack with guardsize bytes protected
against stack overflow. The implementation may round guardsize up to the next
multiple of PAGESIZE. Specifying a value of 0 for guardsize will cause threads
created using the attributes object to run without stack overflow protection.
References: 2, 5.2.3
�Errors: [ .INVAL] guardsize or attr invalid.
Hint: Specify 0 to fit lots of stacks in an address space, or increase default
guardsize for threads that allocate large buffers on the stack.
10.1.5 Parallel I/0
Many high-performance systems, such as database engines, use threads, at
least in part, to gain performance through parallel I/O. Unfortunately, Pthreads
doesn't directly support parallel I/O. That is, two threads can independently
issue I/O operations for files, or even for the same file, but the POSIX file I/0
model places some restrictions on the level of parallelism.
One bottleneck is that the current file position is an attribute of the file descrip-
tor. To read or write data from or to a specific position within a file, a thread must
call lseek to seek to the proper byte offset in the file, and then read or write. If
more than one thread does this at the same time, the first thread might seek, and
then the second thread seek to a different place before the first thread can issue
the read or write operation.
The X/Open pread and pwrite functions offer a solution, by making the seek
and read or write combination atomic. Threads can issue pread or pwrite opera-
tions in parallel, and, in principle, the system can process those I/0 requests
completely in parallel without locking the file descriptor.
pread
size t pread (
int fildes,
void *buf,
size_t nbyte,
off_t off set);
Read nbyte bytes from offset offset in the file opened on file descriptor fildes,
placing the result into buf. The file descriptor's current offset is not affected, allow-
ing multiple pread and/or pwrite operations to proceed in parallel. References:
Errors:
Hint:
none
[EINVAL] offset is negative.
[ EOVERFLOW] attempt to read beyond maximum.
[ENXIO] request outside capabilities of device.
[ESPXPEI file is pipe.
Allows high-performance parallel I/O.
pwrite
size t pwrite (
int fildes,
const void *buf,
size t nbyte,
off t offset);
� Vrite nbyte bytes to offset offset in the file opened on file descAptor fildes, from
buf. The file descAptor's current offset is not affected, allowing multiplepread and/
or pwrite operations to proceed in parallel.
References: none
�Errors: [ .INVAL] offset is negative.
[ESPlPE] file is pipe.
Hint: Allows high-performance parallel I/O.
10.1.6 Cancellation points
Most UNIX systems support a substantial number of interfaces that do not
come from POSIX. The select and poll interfaces, for example, should be
deferred cancellation points. Pthreads did not require these functions to be can-
cellation points, however, because they do not exist within POSIX. 1.
The select and poll functions, however, along with many others, exist in
X/Open. The XSH5 standard includes an expanded list of cancellation points
covering X/Open interfaces.
Additional functions that must be cancellation points in XSH5:
getmsg pread sigpause
getpmsg putmsg usleep
lockf putpmsg wait3
msgrcv pwrite waitid
msgsnd readv writev
poll selectAdditional functions that may be
catclose fsetpos
catgets ftello
catopen ftw
closelog fwprintf
dbm close fwscanf
dbm delete getgrent
dbm fetch getpwent
dbm_nextkey getutxent
dbm_open getutxid
dbm store getutxline
dlcTose getw
dlopen getwc
endgrent getwchar
endpwent iconv close
endutxent iconv_open
fgetwc ioctl
fgetws mkstemp
fputwc nftw
fputws openlog
fseeko pclose
cancellation points in XSH5:
popen
pututxline
putw
putwc
putwchar
readdir r
seekdir
semop
setgrent
setpwent
setutxent
syslog
ungetwc
vfprintf
vfwprintf
vprintf
vwprintf
wprintf
wscanf
10,2 POSIX 1003. lj
Condition variable wait clock:
Barriers:
int barrier_attr_init (barrier_attr_t *attr);
int barrier-attr.destroy (barrier_attr_t *attr);
int barrier_attr_getpshared (
const barrier attr t *attr, int *pShared);
� �int barrier_at r3etp ared (
barrier attr t attr, int pshared);
int barrier_init (barrier t *barrier,
�const barrier_attr_t attr, int count);
int barrier destroy (barrier t barrier);
�in barrier_wait (barrier_t *barrier); �Reader/ iterlocks:
� �int rwlock_attr_init (rw!ock tt t *attr);
int rwlock_attr_destroy (rwlock_attr_t *attr);
int rwlock_attr_getpshared (
const rwlock_attr_t *attr, int *pshared);
int rwlock_attr_setpshared (
rwlock_attr_t *attr, int pshared);
int rwlock init (
int rwlock rlock (rwlock_t *lock);
int rwlockZtimedrlock (rwlock_t *lock,
const struct timespec *timeout);
�int rwlock_tryrlock (rwloc t *lock);
�int rwlock_wlock ( 1ock_t *lock);
�int rwlock timedw ock (rwlock t *lock,
�const truct timespet *timeout);
int rwlock_trywlock (rwlock_t *lock);
�int rwlock_unlock (rw ock_t *lock);
Spinlocks:
Thread abort:
The same POSIX working group that developed POSIX. lb and Pthreads has
developed a new set of extensions for realtime and threaded programming. Most
of the extensions relevant to threads (and to this book) are the result of proposals
developed by the POSIX 1003.14 profile group, which specialized in "tuning" the
existing POSIX standards for multiprocessor systems.
POSIX. lj adds some thread synchronization mechanisms that have been com-
mon in a wide range of multiprocessor and thread programming, but that had been
omitted from the original Pthreads standard. Barriers and spinlocks are primarily
useful for fine-grained parallelism, for example, in systems that automaticallygenerate parallel
code from program loops. Read/write locks are useful in shared
data algorithms where many threads are allowed to read simultaneously, but only
one thread can be allowed to update data.
10.2.1 Barriers
"Barriers" are a form of synchronization most commonly used in parallel
decomposition of loops. They're almost never used except in code designed to run
only on multiprocessor systems. A barrier is a "meeting place" for a group of
associated threads, where each will wait until all have reached the barrier. When
the last one waits on the barrier, all the participating threads are released.
See Section 7.1.1 for details of barrier behavior and for an example showing
how to implement a barrier using standard Pthreads synchronization. (Note that
the behavior of this example is not precisely the same as that proposed by
POSIX. lj.)
10.2.2 Read/write locks
A read/write lock {also sometimes known as "reader/writer lock") allows one
thread to exclusively lock some shared data to write or modify that data, but also
allows multiple threads to simultaneously lock the data for read access. UNIX98
specifies "read/write locks" very similar to POSIX. lj reader/writer locks. Although
X/Open intends that the two specifications will be functionally identical, the
names are different to avoid conflict should the POSIX standard change before
approval.*
If your code relies on a data structure that is frequently referenced, but only
occasionally updated, you should consider using a read/write lock rather than a
mutex to access that data. Most threads will be able to read the data without
waiting; they'll need to block only when some thread is in the process of modify-
ing the data. (Similarly, a thread that desires to write the data will be blocked if
any threads are reading the data.)
See Section 7.1.2 for details of read/write lock behavior and for an example
showing how to implement a read/write lock using standard Pthreads synchroni-
zation. (Note that the behavior of this example is not precisely the same as that
proposed by POSIX. lj.)
*The POSIX working group is considering the possibility of adapting the XSH5 read/write
lock definition and abandoning the original POSIX. lj names, but the decision hasn't yet been
made. 10.2.3 Spinlocks
Spinlocks are much like mutexes. There's been a lot of discussion about
whether it even makes sense to standardize on a spinlock interface--since POSIX
specifies only a source level API, there's very little POSIX. lj says about them that
distinguishes them from mutexes. The essential idea is that a spinlock is the
most primitive and fastest synchronization mechanism available on a given hard-
ware architecture. On some systems, that may be a single "test and set"
instruction--on others, it may be a substantial sequence of "load locked, test,
store conditional, memory barrier" instructions.
The critical distinction is that a thread trying to lock a spinlock does not nec-
essarily block when the spinlock is already held by another thread. The intent is
that the thread will "spin," retrying the lock rapidly until it succeeds in locking
the spinlock. (This is one of the "iffy" spots--on a uniprocessor it had better
block, or it'll do nothing but spin out the rest of its timeslice... or spin to eter-
nity if it isn't timesliced.)
Spinlocks are great for fine-grained parallelism, when the code is intended to
run only on a multiprocessor, carefully tuned to hold the spinlock for only a few
instructions, and getting ultimate performance is more important than sharing
the system resources cordially with other processes. To be effective, a spinlock
must never be locked for as long as it takes to "context switch" from one thread to
another. If it does take as long or longer, you'll get better overall performance by
blocking and allowing some other thread to do useful work.
POSIX. lj contains two sets of spinlock functions: one set with a sp 駈_ prefix,
which allows spinlock synchronization between processes; and the other set with
�a l l;hread_ prefix, allowing spinlock synchronization between threads within a
process. This, you will notice, is very different from the model used for mutexes,
condition variables, and read/write locks, where the same functions were used
and the pshared attribute specifies whether the resulting synchronization object
can be shared between processes.
The rationale for this is that spinlocks are intended to be very fast, and should
not be subject to any possible overhead as a result of needing to decide, at run
time, how to behave. It is, in fact, unlikely that the implementation of spin_lock
and pl:hread_sp 駈_lock will differ on most systems, but the standard allows
them to be different.
10.2.4 Condition variable wait clock
Pthreads condition variables support only "absolute time" timeouts. That is,
the thread specifies that it is willing to wait until "Jan 1 00:00:00 GMT 2001,"
rather than being able to specify that it wants to wait for "1 hour, 10 minutes."
The reason for this is that a condition variable wait is subject to wakeups for var-
ious reasons that are beyond your control or not easy to control. When you wake
early from a "1 hour, 10 minute" wait it is difficult to determine how much of thattime is left. But
when you wake early from the absolute wait, your target time is
still "Jan 1 00:00:00 GMT 2001." (The reasons for early wakeup are discussed in
Section 3.3.2.)
Despite all this excellent reasoning, "relative time" waits are useful. One
important advantage is that absolute system time is subject to external changes.
It might be modified to correct for an inaccurate clock chip, or brought up-to-date
with a network time server, or adjusted for any number of other reasons. Both
relative time waits and absolute time waits remain correct across that adjust-
ment, but a relative time wait expressed as if it were an absolute time wait
cannot. That is, when you want to wait for "1 hour, 10 minutes," but the best you
can do is add that interval to the current clock and wait until that clock time, the
system can't adjust the absolute timeout for you when the system time is
changed.
POSIX. lj addresses this issue as part of a substantial and pervasive "cleanup"
of POSIX time services. The standard (building on top of POSIX. lb, which intro-
�duced the realtime clock ihnctions, and the CLOCK REALTII IE clock) introduces a
ne �w system clock called CLOCK MONOTO IC. This new clock isn't a "relative timer"
in the traditional sense, but it is never decreased, and it is never modified by date
or time changes on the system. It increases at a constant rate. A "relative time"
wait i �s nothing more than taking the current absolute value of the CLOCK MO IOTONIC
clock, adding some fixed offset (4200 seconds for a wait of 1 hour and 10 minutes),
and waiting until that value of the clock is reached.
This is accomplished by adding the condition variable attribute clock. You set
the clock attribute in a thread attributes object using pthread_condattr setclock
�and request the current value by calling pthread_condattr_getclock. he default
value is CLOCK_MONOTONIC, on the assumption that most condition waits are
intervals.
While this assumption may be incorrect, and it may seem to be an incompati-
ble change from Pthreads (and it is, in a way), this was swept under the rug due
to the fact that the timed condition wait function suffered from a problem that
POSIX. lj found to be extremely common through the existing body of POSIX
standards. "Time" in general was only very loosely defined. A timed condition
wait, for example, does not say precisely what the timeout argument means. Only
that "an error is returned if the absolute time specified by abstime passes (that is,
system time equals or exceeds abstime)." The intent is clear--but there are no spe-
cific implementation or usage directives. One might reasonably assume that one
should acquire the current time using clock_gettime (CLOCK_REALTIME, &now}, as
suggested in the associated rationale. However, POSIX "rationale" is little more
than historical commentary, and is not part of the formal standard. Furthermore,
clock_gettime is a part of the optional _P0SIX_TIMERS subset of POSIX. lb, and
therefore may not exist on many systems supporting threads.
POSIX. lj is attempting to "rationalize" all of these loose ends, at least for systems
that implement the eventual POSIX. lj standard. Of course, the CLOCK MONOTONZC
�feature is under an option of its own, and additionally relies on the _POSZX_TI ERSoption, so it
isn't a cure-all. In the absence of these options, there is no clock
attribute, and no way to be sure of relative timeout behavior--or even completely
portable behavior.
10.2.5 Thread abort
The pl:hread_abort: function is essentially fail-safe cancellation. It is used only
when you want to be sure the thread will terminate immediately. The dangerous
aspect of pt;hread_abort; is that the thread does not run cleanup handlers or have
any other opportunity to clean up after itself. That is, if the target thread has a
mutex locked, the thread will terminate with the mutex still locked. Because you
cannot unlock the mutex from another thread, the application must be prepared
to abandon that mutex entirely. Further, it means that any other threads that
might be waiting for the abandoned mutex will continue to wait for the mutex for-
ever unless they are also terminated by calling ptxhread_abort:.
In general, real applications cannot recover from aborting a thread, and you
should never, ever, use pthread_abort:. However, for a certain class of applications
this capability is required. Imagine, for example, a realtime embedded control sys-
tem that cannot shut down and must run reliably across any transient failure in
some algorithm. Should a thread encounter a rare boundary condition bug, and
hang, the application must recover.
In such a system, all wait operations use timeouts, because realtime response
is critical. Should one thread detect that something hasn't happened in a reason-
able time, for example, a navigational thread hasn't received sensor input, it will
notify an "error manager." If the error manager cannot determine why the thread
monitoring the sensor hasn't responded, it will try to recover. It may attempt to
cancel the sensor thread to achieve a safe shutdown, but if the sensor thread fails
to respond to the cancel in a reasonable time, the application must continue any-
way. The error manager would then abort the sensor thread, analyze and correct
any data structures it might have corrupted, create and advertise new mutexes if
necessary, and create a new sensor thread.
10.3 POSIX 1003.14
POSIX. 14 is a different sort of standard, a "POSIX Standard profile." Unlike
Pthreads and POSIX. lj, POSIX. 14 does not add any new capabilities to the POSIX
family. Instead, it attempts to provide some order to the maze of options that
faces implementors and users of POSIX.
The POSIX. 14 specifies which POSIX optional behavior should be considered
"required" for multiprocessor hardware systems. It also raises some of the mini-
mum values defined for various POSIX limits. The POSIX. 14 working group alsodevised
recommendations for additional POSIX interfaces based on the substan-
tial multiprocessing and threading experience of the members. Many of the inter-
faces developed by POSIX. 14 have been included in the POSIX. lj draft standard.
Once POSIX. 14 becomes a standard, in theory, producers of POSIX imple-
mentations will be able to claim conformance to POSIX. 14. And those who wish
to develop multithreaded applications may find it convenient to look for POSIX. 14
conformance rather than simply Pthreads conformance. (It remains to be seen
whether vendors or users will actually do this, since experience with POSIX Stan-
dard Profiles is currently slight.}
The POSIX. 14 working group also tried to address important issues such as
these:
?Providing a way for threaded code to determine the number of active
processors.
?Providing a way for threads to be "bound" onto physicai processors.
?Providing a "processor management" command to control which processors
are used by the system.
Although these capabilities are universally available in all multiprocessor sys-
tems of which the working group was aware, they were dropped from the
standard because of many unresolved issues, including these:
?What good does it do to know how many processors there are, if you cannot
tell how many your code may use at any time? Remember, the information
can change while you are asking for it. What is really needed is a function
asking the question "Would the current process benefit from creation of
another thread?" We don't know how to answer that question, or how to
provide enough information on all reasonable architectures that the appli-
cation can answer it.
?How can we bind a thread to a processor across a wide range of multipro-
cessor architecture? On a nonuniform memory access system, for example,
representing the processors as a uniform array of integer identifiers would
be misleading and useless--binding one thread to processor 0 and another
closely cooperative thread to processor 1 might put them across a relatively
slow communications port rather than on two processors sharing a bank of
memory.
Eventually, some standards organization (possibly POSIX) will need to address
these issues and develop portable interfaces. The folks who attempt this feat may
find that they need to limit the scope of the standard to a field narrower than
"systems on which people may wish to use threads."