International Statistical Review (2010), 78, 2, 316–328 doi:10.1111/j.1751-5823.2010.00118.x

Short Book Reviews Editor: Simo Puntanen

Linear Model Methodology André I. Khuri Chapman & Hall/CRC, 2010, xix + 542 pages, £ 63.99 / US$ 99.95, hardcover ISBN: 978-1-58488-481-1

Table of contents

1. Linear models: some historical perspectives
2. Basic elements of linear algebra
3. Basic concepts in matrix algebra
4. The multivariate normal distribution
5. Quadratic forms in normal variables
6. Full rank linear models
7. Less-than-full-rank linear models
8. Balanced linear models
9. The adequacy of Satterthwaite’s approximation
10. Unbalanced fixed-effects models
11. Unbalanced random and mixed models
12. Additional topics in linear models
13. Generalized linear models

Readership: All readers interested in regression presented with a mix of theory and practice.

The material on which this book is based has been taught in a couple of courses at the University of Florida for about 20 years and the author’s skills and experience in doing this are superbly represented in this fine text. The presentation itself leans more toward the theoretical aspects, but there are numerous exercises that reinforce both the theoretical and the practical aspects of regression. (However, no solutions are provided.) “Chapters 11 and 12 can be particularly helpful to graduate students looking for dissertation topics.” (Preface) This is an excellent, reliable, and comprehensive text.

Norman R. Draper: draper@stat.wisc.edu Department of Statistics, University of Wisconsin – Madison

1300 University Avenue, Madison, WI 53706-1532, USA

Knowledge Discovery for Counterterrorism and Law Enforcement David Skillicorn Chapman & Hall/CRC, 2008, xx + 330 pages, £ 49.99 / US$ 79.95, hardcover ISBN: 978-1-4200-7399-7

Table of contents

1. Introduction
2. Data
3. High-level principles
4. Looking for risk – prediction and anomaly detection
5. Looking for similarity – clustering
6. Looking inside groups – relationship discovery
7. Discovery from public textual data
8. Discovery in private communication
9. Discovering mental and emotional state
10. The bottom line

Readership: Anyone first venturing into knowledge discovery for counterterrorism.

© 2010 The Authors. Journal compilation © 2010 International Statistical Institute. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.

This is a discursive book, outlining all sorts of methods that might be used in counterterrorism and speculating on how they might be employed. There are very few real applied examples and these are only described in brief. I suppose this should not surprise us for this area of application, but it detracts from the text’s interest. There is a whole 30-page chapter on cluster analysis, which has just one artificial example of three people and their height and age. Even if no counterterrorism examples can be used, surely something more stimulating could have been found. The standard result on the large number of false positives that arise in searching information for terrorists properly appears, though the calculation itself is not given (and strangely enough Bayes’ Theorem does not appear at all). The author’s comment on the result is worth quoting in full: “. . . the ACM committee assume that 99.999% accuracies are unattainably remote, despite the fact that defect rates well below this are commonplace in many industrial settings, not by some kind of magic but by working at the process to reduce defects.” As a statistician, this kind of positive thinking leaves me very skeptical.
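The false-positive calculation referred to in this review follows directly from Bayes’ Theorem. As a rough sketch only, the Python below works it through with entirely hypothetical inputs: a population of 300 million, 3,000 genuine targets, and a screening method with the same accuracy in both directions. None of these figures, nor the function name, come from the book.

```python
def posterior_true_positive(prevalence, sensitivity, specificity):
    """P(genuine target | flagged), computed by Bayes' Theorem."""
    true_flag = sensitivity * prevalence                   # flagged and genuine
    false_flag = (1.0 - specificity) * (1.0 - prevalence)  # flagged but innocent
    return true_flag / (true_flag + false_flag)


# Hypothetical inputs, chosen purely for illustration.
population = 300_000_000
targets = 3_000
prevalence = targets / population   # 1 in 100,000
accuracy = 0.99999                  # used as both sensitivity and specificity

# Share of flagged individuals who are genuine targets.
ppv = posterior_true_positive(prevalence, accuracy, accuracy)

# Expected number of innocent people flagged.
expected_false_positives = (1.0 - accuracy) * (population - targets)

print(f"P(target | flagged) = {ppv:.3f}")
print(f"Expected false positives = {expected_false_positives:.0f}")
```

Under these assumed figures about half of those flagged are false positives even at 99.999% accuracy, and at 99% accuracy fewer than one in a thousand flagged individuals would be a genuine target.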

Antony Unwin: unwin@math.uni-augsburg.de Universität Augsburg, Institut für Mathematik

D-86135 Augsburg, Germany

Statistical Methods for Categorical Data Analysis, Second Edition Daniel A. Powers, Yu Xie Emerald Group, 2008, xvii + 317 pages, £ 39.99 / US$ 69.95, hardcover ISBN: 978-0-1237-2562-2

Table of contents

1. Introduction
2. Review of linear regression models
3. Models for binary data
4. Loglinear models for contingency tables
5. Multilevel models for binary data
6. Statistical models for event occurrence
7. Models for ordinal dependent variables
8. Models for nominal dependent variables
A. The matrix approach to regression
B. Maximum likelihood estimation

Readership: Social science researchers.

There are quite a few books on analyzing categorical data. This one has the expressed aim of integrating the transformational approach familiar to statisticians with the latent variable approach “often taken by economists.” It covers a fairly wide range of models in a reasonably successful manner. Though it has a certain amount of mathematics, this is not covered in any great depth. Linear regression is explained in matrix form in an appendix and the Bayes factor is described as “complicated and beyond the scope of this book.” In keeping with the other books in this area, there are disappointingly few graphics, and mosaic plots do not get a mention. In contrast with some of the other books, there are not many motivating examples, and when the examples included are analyzed the results are not discussed in any detail. If they had been (or if graphics had been used) the authors might have noticed the two errors in Table 6.8 on page 191 that I spotted. Although there are no exercises, there is a supporting website, which includes code for the examples using a variety of software packages. The book is now in its second edition, so it has already achieved a certain amount of recognition. With better examples it could get more.

Antony Unwin: unwin@math.uni-augsburg.de Universität Augsburg, Institut für Mathematik

D-86135 Augsburg, Germany

From Finite Sample to Asymptotic Methods in Statistics Pranab K. Sen, Julio M. Singer, Antonio C. Pedroso de Lima Cambridge University Press, 2009, xii + 386 pages, £ 45.00 / US$ 70.00, hardcover ISBN: 978-0-521-87722-0

Table of contents

1. Motivation and basic tools
2. Estimation theory
3. Hypothesis testing
4. Elements of statistical decision theory
5. Stochastic processes: an overview
6. Stochastic convergence and probability inequalities
7. Asymptotic distributions
8. Asymptotic behavior of estimators and tests
9. Categorical data models
10. Regression models
11. Weak convergence and Gaussian processes

Readership: Advanced undergraduate or beginning graduate students in statistics, biostatistics, or applied statistics, academic researchers in statistically oriented fields.

The authors point out in the preface that “. . . , our intent is to provide a broad view of finite-sample statistical methods, to examine their merits and caveats, and to judge how far asymptotic results eliminate some of the detected impasses, providing the basis for sound application of approximate statistical inference in large samples.” The book succeeds admirably in its aim of providing an overview of finite-sample (exact or small-sample) methods, appraising their scope and integrating them with asymptotic (approximate or large-sample) inference. The treatment of the material is application-oriented and yet mathematically rigorous.

In Chapter 1 the authors motivate their approach through a set of illustrative examples ranging from very simple to more complex applications. Also a summary of some basic tools and results (on matrix algebra, real analysis, probability distributions, order statistics, and quantiles) needed in the text is provided. Chapters 2 and 3 lay out the two building blocks of statistical inference, estimation and testing, and in these chapters the authors address the important issues relating to likelihood, sufficiency and invariance, among others. The chapter titles shown above indicate the range of topics covered in the text.

The book is very well written and clear. The overall standard of explanation is very good, and new ideas are accompanied by several worked examples, although one might also have wished for numerical examples with some indication of how the theory performs in practice. There are also a large number of suitable exercises for the reader. In my view, this text can be warmly recommended for lecture courses in asymptotic statistics and courses in statistical inference.

Erkki P. Liski: erkki.liski@uta.fi Department of Mathematics and Statistics FI-33014 University of Tampere, Finland

Steps Towards a Unified Basis for Scientific Models and Methods Inge S. Helland World Scientific, 2010, xviii + 257 pages, £ 56.00 / US$ 75.00, hardcover ISBN: 978-981-4280-85-3

Table of contents

1. The basic elements
2. Statistical theory and practice
3. Statistical inference under symmetry
4. The transition from statistics to quantum theory
5. Quantum mechanics from a statistical basis
6. Further development of quantum mechanics
7. Decisions in statistics
8. Multivariate data analysis and statistics
9. Quantum mechanics and the diversity of concepts
10. Epilogue
A.1. Mathematical aspects of basic statistics
A.2. Transformation groups and group transformations
A.3. Technical aspects of quantum mechanics
A.4. Some aspects of partial least squares regression

Readership: Those interested in the broader aspects of statistical theory and concepts and especially those with a concern with links to quantum theory.

This wide-ranging book aims to address and link broad conceptual issues, in particular in statistical theory and quantum theory. The introductory chapter discusses complementarity in a wide sense and introduces the notion of conceptually defined variables, c-variables. These link with counterfactual variables and latent variables in the sense used in statistical theory, but are intended to be broader. There follows a remarkably clear and compact summary of the theory of statistical inference, limited mainly by a concentration on transformation models. Remarks on a range of more applied issues make an interesting commentary on the more mathematical parts. Chapters 5 and 6 deal with quantum mechanics, starting with a summary account of the conventional approach and then leading to a development from a new set of axioms claimed to have a clearer intuitive content, an aspect which the author considers important. The final chapters cover a wide range of topics, mostly statistical. The writing is lucid. Whether a useful synthesis has been achieved is unclear to this reviewer.

David Cox: david.cox@nuffield.ox.ac.uk Nuffield College, New Road

Oxford, OX1 1NF, UK

A First Course in Probability and Statistics B. L. S. Prakasa Rao World Scientific, 2008, xii + 317 pages, £ 26.00 / US$ 48.00, softcover (also available as hardcover) ISBN: 978-981-283-654-0

Table of contents

1. Why statistics?
2. Probability on discrete sample spaces
3. Discrete probability distributions
4. Continuous probability distributions
5. Multivariate probability distributions
6. Functions of random vectors
7. Approximations to some probability distributions
8. Estimation
9. Interval estimation and testing of hypotheses
10. Linear regression and correlation
Appendix A. References
Appendix B. Answers to selected exercises
Appendix C. Tables

Readership: Undergraduate courses in statistics and probability, mathematics students who are studying probability.

This book assumes that the reader has completed a course on calculus and has a thorough knowledge and understanding of it. The approach is very mathematical, with many proofs included. Although the text is advertised for those doing Social Science and Business Administration, such readers may find the title misleading: the book is certainly suitable for those studying mathematics and statistics but may be difficult for the other subject disciplines.

The book is very comprehensive in its coverage of the topics included and contains a wealth of exercises at the end of each chapter. Solutions to only a selected few of these exercises can be found in the appendix and there are no solutions for Chapters 8–10. For a first course it would have been useful to include more solutions; for example, question 10.1 asks the reader to determine the regression line that best fits five given points.

This is a book that is intended for those of a mathematical mind, who have a background in calculus, and a good grasp of mathematics in general.

Susan Starkings: starkisa@lsbu.ac.uk Centre for Learning Support and Development, London South Bank University

103 Borough Road, London, SE1 0AA, UK

Statistics for Engineers: An Introduction S. J. Morrison Wiley, 2009, xiv + 177 pages, € 46.00 / £ 39.95 / US$ 70.00, hardcover ISBN: 978-0-470-74556-4

Table of contents

1. Nature of variability
2. Basic statistical methods
3. Production
4. Engineering design
5. Research and development
6. Background
7. Quality management
8. Conclusion
Appendix A: Guidelines
Appendix B: Recommended books
Appendix C: Periodicals
Appendix D: Supplementary bibliography
Appendix E: Statistical tables

Readership: Students on or considering courses in engineering.

This book is written by an engineer for an engineering readership and contains practical advice and guidance on the statistical results obtained in a variety of situations. The essence of the text is a broad range of statistical methods relevant to engineering, presented with the minimum of mathematics and the maximum of explanation.

The book is very comprehensive in its coverage of the engineering topics and fully explains the techniques used. The text focuses on the statistical methods that engineers need, how they work, and how to use them safely. Also included is a wealth of relevant references at the end of each chapter as well as those in the appendices. There are no exercises for the reader to attempt, as it is assumed that the lecturer will provide these, but the book is extremely useful for its explanations.

This is a book that is intended for engineering students and can be recommended to any college or university that has students studying this discipline.

Susan Starkings: starkisa@lsbu.ac.uk Centre for Learning Support and Development, London South Bank University

103 Borough Road, London, SE1 0AA, UK

Exploring the Origin, Extent, and Future of Life: Philosophical, Ethical and Theological Perspectives Constance M. Bertka (Editor) Cambridge University Press, 2009, xii + 324 pages, £ 65.00 / US$ 120.00, hardcover ISBN: 978-0-521-86363-6

Table of contents

1. Astrobiology in societal context (Constance M. Bertka)
Part I. Origin of Life
2. Emergence and the experimental pursuit of the origin of life (Robert M. Hazen)
3. From Aristotle to Darwin, to Freeman Dyson: changing definitions of life viewed in historical context (James E. Strick)
4. Philosophical aspects of the origin-of-life problem: the emergence of life and the nature of science (Iris Fry)
5. The origin of terrestrial life: a Christian perspective (Ernan McMullin)
6. The alpha and the omega: reflections on the origin and future of life from the perspective of Christian theology and ethics (Celia Deane-Drummond)
Part II. Extent of Life
7. A biologist’s guide to the Solar System (Lynn J. Rothschild)
8. The quest for habitable worlds and life beyond the Solar System (Carl Pilcher, Jack J. Lissauer)
9. A historical perspective on the extent and search for life (Steven J. Dick)
10. The search for extraterrestrial life: epistemology, ethics, and worldviews (Mark Lupisella)
11. The implications of discovering extraterrestrial life: different searches, different issues (Margaret S. Race)
12. God, evolution, and astrobiology (Cynthia S.W. Crysdale)
Part III. Future of Life
13. Planetary ecosynthesis on Mars: restoration ecology and environmental ethics (Christopher P. McKay)
14. The trouble with intrinsic value: an ethical primer for astrobiology (Kelly C. Smith)
15. God’s preferential option for life: a Christian perspective on astrobiology (Richard O. Randolph)
16. Comparing stories about the origin, extent, and future of life: an Asian religious perspective (Francisca Cho)

Readership: Readers interested in Astrobiology.

I was intrigued to see how statistics was to be used in a book of this title and indeed why it was sent to the ISI for a review. The book is divided into three parts, namely (i) Origin of life, (ii) Extent of life, and (iii) Future of life. The text contains very little statistics; however, what is present shows an interesting use of statistics. I would not suggest that this is a statistics book in any way, shape or form, just that it has some, albeit very little, statistics contained within its pages. So unless one is interested in the area of astrobiology, religion, and ethics or has a philosophical interest, this is not one for you. Having said that, I found the book to be very interesting and refreshingly different from the usual academic books that come my way.

The book was completed with support from the National Aeronautics and Space Administration and the John Templeton Foundation. It is a valuable text for graduate students and researchers with an interest in astrobiology.

Susan Starkings: starkisa@lsbu.ac.uk Centre for Learning Support and Development, London South Bank University

103 Borough Road, London, SE1 0AA, UK

Philosophical Transactions of The Royal Society A, 367 (1906) Theme Issue ‘Statistical Challenges of High-dimensional Data’ David L. Banks, Peter J. Bickel, Iain M. Johnstone, D. Michael Titterington (Editors) Royal Society Publishing, 2009, 236 pages, £ 58.00, softcover ISBN: 978-0-85403-779-7

Table of contents

Introduction (I.M. Johnstone, D.M. Titterington)
Selective inference in complex research (Y. Benjamini, R. Heller, D. Yekutieli)
Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing (D. Donoho, J. Tanner)
On landmark selection and sampling in high-dimensional data analysis (M.-A. Belabbas, P.J. Wolfe)
An overview of recent developments in genomics and associated statistical methods (P.J. Bickel, J.B. Brown, H. Huang, Q. Li)
Cherry-picking for complex data: robust structure discovery (D.L. Banks, L. House, K. Killourhy)
Statistical inference for exploratory data analysis and model diagnostics (A. Buja, D. Cook, H. Hofmann, M. Lawrence, E.-K. Lee, D.F. Swayne, H. Wickham)
Sufficient dimension reduction and prediction in regression (K.P. Adragni, R.D. Cook)
Identifying graph clusters using variational inference and links to covariance parametrization (D. Barber)
Classification of sparse high-dimensional vectors (Yu. I. Ingster, C. Pouet, A.B. Tsybakov)
Feature selection by higher criticism thresholding achieves the optimal phase diagram (D. Donoho, J. Jin)

Readership: A very good book for those who are interested in knowing what is meant by high-dimensional problems, where they are coming from, and how statisticians and computer and information scientists are solving them.

The book under review is a collection of 11 excellent articles reprinted from the Philosophical Transactions of the Royal Society, vol. 367, pages 4235 through 4470. The front cover reminds us that the Philosophical Transactions is the world’s longest running science journal; from the back cover we learn that the Society was founded in 1660. Another great name is associated with these papers: they were prepared as a part of the program Statistical Theory and Methods for Complex High-Dimensional Data at the Isaac Newton Institute for Mathematical Sciences in Cambridge, UK.

For anyone working or wishing to work in this area, or just wishing to learn what is going on in the most happening part of our subject, this is a wonderful book, providing an introduction to and overview of what is now known as well as new emerging ideas, methods, theorems, and conjectures. Topics covered include variable/feature selection, regression, classification, visual exploration and novel visual confirmatory analysis, multiple tests, robust structure hunting, graph clusters, model selection, and sufficient dimension reduction. Among all these exciting theoretical and practical developments, perhaps the most wonderful are the conjectures on and partial verification of phase transitions in high-dimensional multiple testing by Donoho and Tanner (pp. 4273–4294), and the partial verification of optimality of Tukey’s Higher Criticism under phase transition by Donoho and Jin (pp. 4449–4470).

A brief review of the papers follows. In their introduction, which also serves as a survey of high-dimensional problems, Johnstone and Titterington explain that high dimension means a high-dimensional parameter, usually but not always accompanied by a relatively small replication. Such problems defy the old requirement that the number of sampling units should exceed the number of parameters, the more so the better. These problems arise in molecular biology, image processing, communication, and other diverse areas. They are usually solved by a sparsity assumption that only a small number of the many parameters are nonzero. But one does not know which ones are nonzero, so the problem remains even with the sparsity assumption. It turns out that the signals, that is, the nonzero parameters, should be sufficiently large in magnitude to be detected. One could describe this as the first stage of high-dimensional statistics. This stage still continues but a second stage has begun too. People working in this area have begun to worry about what happens when the sparsity assumption fails at least partly, that is, there are fewer signals than what is provided for by assumed levels of sparsity and the signals may be both rare and weak. Then the methods for high dimensions developed in the first stage break down. Donoho and Tanner, in what is perhaps the most stunning article, show that the level of sparsity at which breakdown begins to show up is surprisingly stable over different examples and domains, and relate this to the combinatorial geometry of polytopes. This is very important work still in its infancy; there will be many beautiful as well as useful results.

Since many of these problems originate in molecular biology, in fact in Genomics specifically, Bickel et al. provide a very useful survey of old and very new problems in different subareas of Genomics and the solutions being offered within Classical Statistics, Machine Learning and Bayesian Analysis.

Buja et al. show how visual display, generally considered part of exploratory analysis, can also be used for confirmatory analysis such as testing of hypotheses; this seems a very novel idea. Banks et al. discuss robust structure discovery, which is important since robustness of high-dimensional analysis is rarely studied. Adragni and Cook discuss (sufficient) dimension reduction, which is somewhat like principal components, but is applied to inverse regression and aims at being nonparametric. The idea is due to K.C. Li but has been developed a lot by these authors. Benjamini et al. provide a lovely introduction to the famous Benjamini–Hochberg multiple test and two very useful new techniques to cope with selection bias and problems of multiple tests, for example to discover genes influencing some disease or resistance to it, conducted independently at different sites.
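For readers who have not met it, the Benjamini–Hochberg procedure in its standard step-up form sorts the m p-values, finds the largest rank k with p(k) ≤ kq/m, and rejects the k hypotheses with the smallest p-values, which controls the false discovery rate at level q for independent tests. The following is only a minimal sketch of that textbook procedure, not of the newer techniques in the paper; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Standard Benjamini-Hochberg step-up procedure.

    Returns a list of booleans in the original order of p_values:
    True where the corresponding hypothesis is rejected at FDR level q.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank k whose ordered p-value satisfies p_(k) <= k*q/m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(p_values, q=0.05)
print(decisions)
```

Note that the procedure is step-up: a hypothesis whose own p-value exceeds its threshold is still rejected if a larger p-value further down the ordering meets its threshold.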

This is only a sparse picture of a complex high-dimensional landscape that unfolds in these eleven articles. In the introductory survey of the first article, among other things there is mention of a “brief encounter with Bayesian Statistics.” If the followers of Reverend Bayes are invited to contribute to another volume, I am sure the brief encounter would explode into another set of new concepts like “shotguns,” “horseshoes,” and other quite new and successful principles for variable selection in very complex problems that do not violate the very strict scientific standards laid down by Sir Isaac.

Jayanta K. Ghosh: ghosh@stat.purdue.edu Department of Statistics, Purdue University

West Lafayette, IN 47909, USA

R Through Excel: A Spreadsheet Interface for Statistics, Data Analysis, and Graphics Richard M. Heiberger, Erich Neuwirth Springer, 2009, xxiv + 342 pages, € 54.95 / £ 49.99 / US$ 64.95, softcover ISBN: 978-1-4419-0051-7

Table of contents

1. Getting started
2. Using RExcel and R Commander
3. Getting data into R
4. Normal and t distributions
5. Normal and t workbook
6. t-Tests
7. One-way ANOVA
8. Simple linear regression
9. What is least squares?
10. Multiple regression – two X-variables
11. Polynomial regression
12. Multiple regression – three or more X-variables
13. Contingency tables and the chi-square test
A. Installation of RExcel
B. Nuisances – installation, startup, or execution

Readership: Students, researchers, and others who wish to use R but avoid the command line.

This book is essentially a manual for the RExcel software. RExcel is an add-in to Excel which allows access to the statistical functionality of R including user-contributed packages via the Excel interface. Although the book contains 342 pages, there is limited text. Most commonly a page consists of one or more screenshots showing how to use RExcel. The whole book is reproduced in color, on glossy paper.

Readers are guided through the menu system (which is based on R Commander) to see how to carry out common statistical procedures. The level of statistical understanding required is roughly that of an introductory applied statistics subject at university level (t-tests, one-way ANOVA, multiple regression, contingency tables). A number of workbooks demonstrating various statistical calculations and procedures come with RExcel. These are intended to support statistics courses. A workbook titled Demo Files for the book R through Excel is described in the book. It covers topics such as data formats, normal and t distributions, and linear regression.

Two appendices deal with installation and possible problems.

I found very few errors. Unfortunately, the help system changed with R 2.10, so the help window shown on p. 36 no longer applies. All the author names and dates in the bibliography are duplicated. The index is rather limited, with no entry for “workbook,” for example, when I wanted to look that up.

For anyone wishing to learn RExcel this book would be a useful purchase.

David J. Scott: d.scott@auckland.ac.nz Department of Statistics, The University of Auckland

Private Bag 92019, Auckland 1142, New Zealand

Machine Learning: An Algorithmic Perspective Stephen Marsland Chapman & Hall/CRC, 2009, xvi + 390 pages, £ 38.69 / US$ 62.96, hardcover ISBN: 978-1-4200-6718-7

Table of contents

1. Introduction
2. Linear discriminants
3. The multi-layer perceptron
4. Radial basis functions and splines
5. Support vector machines
6. Learning with trees
7. Decision by committee: ensemble learning
8. Probability and learning
9. Unsupervised learning
10. Dimensionality reduction
11. Optimization and search
12. Evolutionary learning
13. Reinforcement learning
14. Markov Chain Monte Carlo (MCMC) methods
15. Graphical models
16. Python

Readership: Undergraduate computer science and engineering students.

This book is intended to be a practical first introduction to the ideas and methods of machine learning, for those without substantial mathematical machinery to hand. It thus places emphasis on algorithms, rather than the mathematics behind them, and is liberally illustrated with many programming examples, using Python. It includes a basic primer on Python and has an accompanying website.

It has excellent breadth, and is comprehensive in terms of the topics it covers, both in terms of methods (e.g., including neural networks, support vector machines, ensemble methods, tree classifiers, reinforcement learning, stochastic methods, tracking, belief networks, etc.) and in terms of concepts and theory (e.g., dimensionality issues, optimization, etc.).

There is a “further reading” section at the end of each chapter, which is useful, but the references have not been collected together in a list at the end of the book, which can be a disadvantage – something to consider for the second edition.

Overall, I think the author has succeeded in his aim: the book provides an accessible introduction to machine learning. It would be excellent as a first exposure to the subject, and would put the various ideas in context, before moving on to a more elaborate and deep treatment, such as that in Hastie, Tibshirani, and Friedman’s The Elements of Statistical Learning.

This book also includes the first occurrence I have seen in print of a reference to a zettabyte of data (10^21 bytes) – a reference to “all the world’s computers” being estimated to contain almost a zettabyte by 2010.

David J. Hand: d.j.hand@imperial.ac.uk Mathematics Department, Imperial College

London SW7 2AZ, UK

Introduction to Social Statistics: The Logic of Statistical Reasoning Thomas Dietz, Linda Kalof Wiley-Blackwell, 2009, xxxviii + 569 pages, € 32.20 / £ 27.99 / US$ 94.95, hardcover ISBN: 978-1-4051-6902-8

Table of contents

1. An introduction to quantitative analysis
2. Some basic concepts
3. Displaying data one variable at a time
4. Describing data
5. Plotting relationships and conditional distributions
6. Causation and models of causal effects
7. Probability
8. Sampling distributions and inference
9. Using sampling distributions: confidence intervals
10. Using sampling distributions: hypothesis tests
11. The subtle logic of analysis of variance
12. Goodness of fit and models of frequency tables
13. Bivariate regression and correlation
14. Basics of multiple regression
Appendix A. Summary of variables in examples
Appendix B. Mathematics review
Appendix C. Statistical tables

Readership: Undergraduate students in the social sciences.

The book opens with the statement that statistics is hard. While I am not sure that that is the best way to encourage readership and sales, it is certainly a message I endorse, especially for the intended readership of the book. People who are studying statistics as a necessary sideline to their real interests often feel discouraged by the effort needed to master it, and reflect this back on themselves (as in “I am unable to do it”), with the implication that they are inadequate in some way. Instead it is healthier to take the attitude of this book – that the subject matter is intrinsically hard, and one should expect to have to work at it.

Unfortunately, to illustrate the hardness of statistics, the authors quote Persi Diaconis (misspelling his name) saying “Our brains are just not wired to do probability problems very well.” Diaconis was, as he said, speaking of probability (he was discussing the Monty Hall problem), and probability and statistics are very different kinds of beast. One is a branch of mathematics, describing the empirical consequences of given models, and the other is a technology for inferring likely underlying structures from empirical data.

The authors then go on to say “it was not until the 1600s, when Galileo correctly analyzed chances in games based on dice, that people began to understand the probabilities that underpin random processes.” It is true that Galileo did investigate such matters, in the early part of the seventeenth century, but his paper was not published until 1718 and people normally date the start of a formal understanding of probability to the correspondence between Pascal and Fermat in 1654, again about a gambling problem.

The book criticizes the cookbook approach to statistics. This is a criticism with which I agree entirely. However, I think the authors have not moved as far from this approach as they could. One illustration is provided by Chapter 4, which essentially lists various simple descriptive statistics, with very little comparative evaluation. In fact the appropriate statistic to use (e.g., mean or median) depends on the question one is trying to answer, and that fact can be used to develop statistical ideas and tools from a clearly noncookbook orientation.

I liked the description of quantitative data analysis as a craft, and also the notion of learning statistics by apprenticeship. Elsewhere this has been described as a strategy for learning statistical thinking, as opposed to the statistical machinery taught on most courses.

The book contains a wealth of effective and helpful real examples, and includes exercises after each substantive chapter. It has been beautifully produced. I think the pace and presentation are exactly right for the intended audience.

So it has some good things about it. But (in my opinion!) it also has some bad things. The authors appear to regard the law of large numbers and the central limit theorem as the same thing (e.g., p266 “the Law of Large Numbers, which is also known as the Central Limit Theorem”). The words “nonparametric” and “distribution-free” do not appear in the index, and I did not spot them in the body of the book. Surely this is a major omission, as nonparametric methods are widely used in the social sciences. The book refers to the “controversy” over the relationship between levels of measurement and choice of statistical technique. But I would suggest that this controversy has evaporated with the recognition of the distinction between pragmatic and representational aspects of measurement. Of all the sections of the book, I found this discussion the weakest – which is a little disappointing since measurements of both types figure large in the social sciences. The book adopts an entirely frequentist perspective, with just one page mentioning the alternative Bayesian view – but I am afraid I found that description unconvincing since the description of subjective probability was rather confused. Given the huge progress in practical application of Bayesian methods, I think this topic deserved better. Put together, such criticisms mean the book has a rather old-fashioned feel, in terms of statistical methodology.
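The difference between the law of large numbers and the central limit theorem can be illustrated with a short simulation. This is a sketch under stated assumptions, not material from the book: the Exponential(1) distribution, the sample sizes, the number of replications, and the helper names are all chosen here purely for illustration. The law of large numbers says the sample mean settles on the true mean; the central limit theorem says the error, once rescaled by the square root of the sample size, is approximately standard normal.

```python
import random

MU, SIGMA = 1.0, 1.0  # mean and standard deviation of Exponential(1)

def sample_means(n, reps, rng):
    """Return `reps` sample means, each computed from n Exponential(1) draws."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

def summarize(n, reps=1000, seed=0):
    """Return (mean absolute error of the sample mean, share of z-scores in +/-1.96)."""
    rng = random.Random(seed)
    means = sample_means(n, reps, rng)
    # Law of large numbers: the typical distance from MU shrinks as n grows.
    mean_abs_error = sum(abs(m - MU) for m in means) / reps
    # Central limit theorem: sqrt(n) * (mean - MU) / SIGMA is roughly N(0, 1),
    # so about 95% of the rescaled errors should fall within +/-1.96.
    z = [(m - MU) * n ** 0.5 / SIGMA for m in means]
    coverage = sum(abs(v) < 1.96 for v in z) / reps
    return mean_abs_error, coverage

for n in (10, 100, 1000):
    err, cov = summarize(n)
    print(n, round(err, 4), round(cov, 3))
```

The first printed column of results should shrink toward zero as n increases, while the second should stay close to 0.95: two different statements about the same sample mean.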

There is some discussion of missing data, but I would have liked more on this and indeed on data quality in general. I recognize that this is something of a hobby horse of mine but, after all, the first thing students discover when they step out of the classroom and have to apply their hard-won statistical expertise in practice is that the data facing them are not as clean as the data on which they have been practicing. The real world is a messy place, and so are real data.

Overall, as will probably be obvious, I found the book rather frustrating. While there is a lot I like about it, I found the lack of rigor – to the extent of quite often saying things that I would argue are wrong – grating. It might not matter to a student for whom this is the only statistics course they ever study, and who never uses statistics in later life, but presumably the authors hope that many of the readers will go on to use the ideas and tools.

David J. Hand: d.j.hand@imperial.ac.uk Mathematics Department, Imperial College

London SW7 2AZ, UK

Foundations of Factor Analysis, Second Edition Stanley A. Mulaik Chapman & Hall/CRC, 2009, xxiv + 524 pages, £ 39.99 / US$ 79.95, hardcover ISBN: 978-1-4200-9961-4

Table of contents

1. Introduction 9. Other models of factor analysis 2. Mathematical foundations for factor analysis 10. Factor rotation 3. Composite variables and linear transformations 11. Orthogonal analytic rotation 4. Multiple and partial correlations 12. Oblique analytic rotation 5. Multivariate normal distribution 13. Factor scores and factor indeterminacy 6. Fundamental equations of factor analysis 14. Factorial invariance 7. Methods of factor extraction 15. Confirmatory-factor analysis 8. Common-factor analysis

Readership: Researchers and graduate students interested in fundamental issues of factor analysis.

The first sentence of the Preface says it all: “This is a book for those who want or need to get to the bottom of things.” The first edition of this book appeared almost 40 years ago. It began with a rather more mysterious sentence: “When I was nine years old, I dismantled the family alarm clock.”

I must say that I am very happy that the author has taken up the challenge of updating and revising this precious book into a second edition. It will be an important source for decades to come. Although many topics in the first edition remain relevant, there are good grounds for the new edition, mostly due to the development of factor analysis, which has (again) taken huge steps since 1972. The history of factor analysis is very long and rather complicated. Therefore it is quite natural that the usual books do not necessarily help in understanding where all the equations and different procedures actually came from. But this is not a usual book.

As its title suggests, it digs deep into the foundations of factor analysis, shedding light on dozens of questions concerning models, estimation, interpretation, generally applied rules, various algorithms and backgrounds of things, even philosophical and historical notes, etc. All this, together with some jokes here and there, makes reading the book like following a series of enjoyable lectures.

Throughout, the topics are explained clearly, and mathematics is taught as it is needed to understand the derivation of an equation or a procedure. Although there are numerous equations and formulas, there is also a great deal of text explaining them and offering more insight. Overall, the book is worth having nearby if you find yourself facing serious questions like “why?” or “who?” or “how?” related to factor analysis.

Kimmo Vehkalahti: kimmo.vehkalahti@helsinki.fi Department of Social Research

FI-00014 University of Helsinki, Finland
