Plane Answers to Complex Questions

Errata for Fifth Edition


Preface to the Fifth Edition

I prepared the fifth edition of Plane Answers (PA-V) in conjunction with a new edition of Advanced Linear Modeling (ALM-III) (Christensen, 2019). The emphasis in both revisions was to include more material on Statistical Learning. ALM-III has far more changes in it than PA-V. (ALM-III is about 50 per cent longer than ALM-II.) In ALM-III, all but the first three chapters (and Chapter 13) are devoted to dependent data. I regretfully concluded that almost all of the mixed models chapter in PA needed to go into ALM-III. The one exception is that I moved the discussion of BLUP into Chapter 6 of PA-V.

The biggest changes in PA-V are listed below.

Section 1.3 has been restructured to isolate the more difficult parts.
Section 2.9 is a new section on biased estimation and the variance-bias tradeoff.
Subsection 3.2.1 is a short new subsection that introduces the importance of small F statistics.
A new Exercise 3.7b helps establish Fieller's method prior to its application in Exercises 6.9.1, 6.9.2, and 6.9.3.
Section 4.1 contains some cleaner notation for one-way ANOVA computations.
Section 5.1 is a new section containing my overall view of common multiple comparison procedures.
Subsubsection 6.3.3.2 discusses best predictors for loss functions other than squared error.
Section 6.6 now contains the discussion of BLUP.
The section on polynomial regression and one-way ANOVA now contains a table of polynomial contrasts.
Subsection 7.5.3 contains new material on characterizing the interaction space in an unbalanced two-way ANOVA.
Subsection 9.1.1 introduces ACOVA ideas for models with dependent or heteroscedastic data.
I thought about just deleting Section 9.3 but opted for attempting to make it more relevant.
Section 11.2 has some new results on checking whether models qualify as generalized split plot models.
New Subsection 11.2.3 addresses the analysis of (generalized) split plot designs when there is missing data in the subplots.
As mentioned, mixed models got moved to ALM-III because they fit naturally into ALM-III's emphasis on dependent data.
Subsection 12.4.1 includes additional discussion of testing for heteroscedasticity.
Subsection 12.4.2 introduces the Huber-White sandwich estimator.
I changed the order of the chapters on variable selection and collinearity from previous versions of PA in order to smooth the presentation with ALM-III.
The collinearity chapter contains a new Section 13.2 on variance inflation factors.
The singular value decomposition of a matrix X, Theorem 13.3.1, has been generalized and the relationship between ridge regression and principal component regression explicated.
Section 13.6 is a new section on different approaches to estimation. While this stands on its own, it also motivates material in ALM-III.
Section 14.1 now discusses information criteria for model selection as well as cost complexity pruning.
Section 14.2 has examples illustrating issues with larger data sets.
Section 14.3 contains more discussion of variable selection.
Section 14.4 introduces boosting, bagging and the random part of random forests. The application of these subjects is closely related to nonparametric regression as discussed in Chapter 1 of ALM-III.
Appendix B has a number of refinements in the results. I also decided to rename "orthogonal matrices" as "orthonormal matrices" because it is a clearly better name.
Appendix D has a new section on identifiability.

A big part of the effort in producing PA-V was just cleaning the text. After 30 years you would think, by now, I would be happy with it.

While PA is a book on Linear Model Theory, Christensen (2015) illustrates the use of most of the theory presented in this book. There are a number of related topics discussed on my website in various places. These include computer code for the applications book as well as for ALM-III, cf. Rcode and R-ALMIII.

I have quite assiduously avoided doing asymptotic theory in PA, and that remains true in PA-V. There are many sources that discuss asymptotics for linear model theory. The appendix to Christensen and Lin (2015) uses a number of the most important results.

I would like to thank Fletcher Christensen and Joe Cavanaugh, both of whom I have used as sounding boards for years on linear model issues. I thank Mohammad Hattab for numerous suggestions. Since the last edition of the book, Steve Fienberg and Ingram Olkin have both died. They were the subject matter editors for Springer when PA was first published in 1987. I sent the book to a lot of publishers and Steve was the only person who took seriously the efforts of an assistant professor from Montana State University. (Steve had been one of my professors at the University of Minnesota.) He recommended it to Ingram who both liked it a lot and gave me a large number of suggestions (virtually all of which remain in the book). I owe both of them a great debt!

Some people think that Plane Answers is an example of the old maxim, "If all you have is a hammer, everything looks like a nail." I prefer to think that if you have a good enough hammer, almost everything actually is a nail.

Preface to the Fourth Edition

As with the prefaces to the second and third editions, this focuses on changes to the previous edition. The preface to the first edition discusses the core of the book.

Two substantial changes have occurred in Chapter 3. Subsection 3.3.2 uses a simplified method of finding the reduced model and includes some additional discussion of applications. In testing the generalized least squares models of Section 3.8, even though the data may not be independent or homoscedastic, there are conditions under which the standard F statistic (based on those assumptions) still has the standard F distribution under the reduced model. Section 3.8 contains a new subsection examining such conditions.

The major change in the fourth edition has been a more extensive discussion of best prediction and associated ideas of R^2 in Sections 6.3 and 6.4. It also includes a nice result that justifies traditional uses of residual plots. One portion of the new material is viewing best predictors (best linear predictors) as perpendicular projections of the dependent random variable y into the space of random variables that are (linear) functions of the predictor variables x. A new subsection on inner products and perpendicular projections for more general spaces facilitates the discussion. While these ideas were not new to me, their inclusion here was inspired by deLaubenfels (2006).

Section 9.1 has an improved discussion of least squares estimation in ACOVA models. A new Section 9.5 examines Milliken and Graybill's generalization of Tukey's one degree of freedom for nonadditivity test.

A new Section 10.5 considers estimable parameters that can be known with certainty when C(X) ⊄ C(V) in a general Gauss-Markov model. It also contains a relatively simple way to estimate estimable parameters that are not known with certainty. The nastier parts in Sections 10.1-10.4 are those that provide sufficient generality to allow C(X) ⊄ C(V). The approach of Section 10.5 seems more appealing.

In Sections 12.4 and 12.6 the point is now made that ML and REML methods can also be viewed as method of moments or estimating equations procedures.

The biggest change in Chapter 13 is a new title. The plots have been improved and extended. At the end of Section 13.6 some additional references are given on case deletions for correlated data as well as an efficient way of computing case deletion diagnostics for correlated data.

The old Chapter 14 has been divided into two chapters, the first on variable selection and the second on collinearity and alternatives to least squares estimation. Chapter 15 includes a new section on penalized estimation that discusses both ridge and lasso estimation and their relation to Bayesian inference. There is also a new section on orthogonal distance regression that finds a regression line by minimizing orthogonal distances, as opposed to least squares, which minimizes vertical distances.
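The distinction between vertical and orthogonal distances can be sketched for a single predictor. The code below is only an illustration, not taken from the book: the function names and the toy data are hypothetical, and the orthogonal slope uses the standard closed form for a straight line, assuming the centered cross product is nonzero.

```python
import math


def centered_sums(x, y):
    """Return the means and the centered sums of squares and cross products."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return xbar, ybar, sxx, syy, sxy


def least_squares_line(x, y):
    """Intercept and slope minimizing the sum of squared vertical distances."""
    xbar, ybar, sxx, _, sxy = centered_sums(x, y)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def orthogonal_line(x, y):
    """Intercept and slope minimizing the sum of squared perpendicular distances.

    Assumes sxy != 0; otherwise the closed form below divides by zero.
    """
    xbar, ybar, sxx, syy, sxy = centered_sums(x, y)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return ybar - slope * xbar, slope


# Hypothetical toy data scattered about a line.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]

a_ls, b_ls = least_squares_line(x, y)
a_or, b_or = orthogonal_line(x, y)
```

When the points lie exactly on a line the two fits coincide. Otherwise the orthogonal slope is at least as steep in absolute value as the least squares slope, because least squares attributes all of the scatter to y.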

Appendix D now contains a short proof of the claim: If the random vectors x and y are independent, then any vector-valued functions of them, say g(x) and h(y), are also independent.

Another significant change is that I wanted to focus on Fisherian inference, rather than the previous blend of Fisherian and Neyman-Pearson inference. In the interests of continuity and conformity, the differences are soft-pedaled in most of the book. They arise notably in new comments made after presenting the traditional (one-sided) F test in Section 3.2 and in a new Subsection 5.6.1 on multiple comparisons. The Fisherian viewpoint is expanded in Appendix F, which is where it primarily occurred in the previous edition. But the change is most obvious in Appendix E. In all previous editions, Appendix E existed just in case readers did not already know the material. While I still expect most readers to know the "how to" of Appendix E, I no longer expect most to be familiar with the "why" presented there.

Other minor changes are too numerous to mention and, of course, I have corrected all of the typographic errors that have come to my attention. Comments by Jarrett Barber led me to clean up Definition 2.1.1 on identifiability.

My thanks to Fletcher Christensen for general advice and for constructing Figures 10.1 and 10.2. (Little enough to do for putting a roof over his head all those years. :-)

Preface to the Third Edition

The third edition of Plane Answers includes fundamental changes in how some aspects of the theory are handled. Chapter 1 includes a new section that introduces generalized linear models. Primarily, this provides a definition so as to allow comments on how aspects of linear model theory extend to generalized linear models.

For years I have been unhappy with the concept of estimability. Just because you cannot get a linear unbiased estimate of something does not mean you cannot estimate it. For example, it is obvious how to estimate the ratio of two contrasts in an ANOVA: just estimate each one and take their ratio. The real issue is that if the model matrix X is not of full rank, the parameters are not identifiable. Section 2.1 now introduces the concept of identifiability and treats estimability as a special case of identifiability. This change also resulted in some minor changes in Section 2.2.
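The identifiability issue can be seen in a small numerical sketch. The example below is illustrative only and is not from the book: it assumes a hypothetical two-group one-way ANOVA written with an overall mean, E(y_ij) = mu + alpha_i, so that the model matrix X has three columns but rank two.

```python
# Model matrix for two observations in each of two groups.
# Columns correspond to beta = (mu, alpha_1, alpha_2); the first column
# is the sum of the other two, so X is not of full rank.
X = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
]


def mean_vector(X, beta):
    """Return the vector X beta as a plain list."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


# Two different (hypothetical) parameter vectors.
beta_a = [0.0, 5.0, 8.0]  # mu = 0
beta_b = [5.0, 0.0, 3.0]  # mu = 5, both alphas shifted down by 5

# Both give the same mean vector, so the data cannot tell them apart.
same_means = mean_vector(X, beta_a) == mean_vector(X, beta_b)

# The contrast alpha_1 - alpha_2 takes the same value for both vectors,
# whereas mu does not.
contrast_a = beta_a[1] - beta_a[2]
contrast_b = beta_b[1] - beta_b[2]
```

Here `same_means` is true and the two contrasts agree, while the two values of mu differ: the contrast is a function of X beta, but mu alone is not.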

In the second edition, Appendix F presented an alternative approach to dealing with linear parametric constraints. In this edition I have used the new approach in Section 3.3. I think that both the new approach and the old approach have virtues, so I have left a fair amount of the old approach intact.

Chapter 8 contains a new section with a theoretical discussion of models for factorial treatment structures and the introduction of special models for homologous factors. This is closely related to the changes in Section 3.3.

In Chapter 9, reliance on the normal equations has been eliminated from the discussion of estimation in ACOVA models, something I should have done years ago! In the previous editions, Exercise 9.3 has indicated that Section 9.1 should be done with projection operators, not normal equations. I have finally changed it. (Now Exercise 9.3 is to redo Section 9.1 with normal equations.)

Appendix F now discusses the meaning of small F statistics. These can occur because of model lack of fit that exists in an unsuspected location. They can also occur when the mean structure of the model is fine but the covariance structure has been misspecified.

In addition, there are various smaller changes including the correction of typographical errors. Among these are very brief introductions to nonparametric regression and generalized additive models, as well as Bayesian justifications for the mixed model equations and classical ridge regression. I will let you discover the other changes for yourself.

Preface to the Second Edition

The second edition of Plane Answers has many additions and a couple of deletions. New material includes additional illustrative examples in Appendices A and B and Chapters 2 and 3, as well as discussions of Bayesian estimation, near replicate lack of fit tests, testing the independence assumption, testing variance components, the interblock analysis for balanced incomplete block designs, nonestimable constraints, analysis of unreplicated experiments using normal plots, tensors, and properties of Kronecker products and Vec operators. The book contains an improved discussion of the relation between ANOVA and regression, and an improved presentation of general Gauss-Markov models. The primary deletions are the discussions of weighted means and of log-linear models. The material on log-linear models was included in Christensen (1990b), so it became redundant here. Generally, I have tried to clean up the presentation of ideas wherever it seemed obscure to me.

Much of the work on the second edition was done while on sabbatical at the University of Canterbury in Christchurch, New Zealand. I would particularly like to thank John Deely for arranging my sabbatical. Through their comments and criticisms, four people were particularly helpful in constructing this new edition. I would like to thank Wes Johnson, Snehalata Huzurbazar, Ron Butler, and Vance Berger.

Preface to the First Edition

This book was written to rigorously illustrate the practical application of the projective approach to linear models. To some, this may seem contradictory. I contend that it is possible to be both rigorous and illustrative, and that it is possible to use the projective approach in practical applications. Therefore, unlike many other books on linear models, the use of projections and subspaces does not stop after the general theory. They are used wherever I could figure out how to do it. Solving normal equations and using calculus (outside of maximum likelihood theory) are anathema to me. This is because I do not believe that they contribute to the understanding of linear models. I have similar feelings about the use of side conditions. Such topics are mentioned when appropriate and thenceforward avoided like the plague.

On the other side of the coin, I just as strenuously reject teaching linear models with a coordinate free approach. Although Joe Eaton assures me that the issues in complicated problems frequently become clearer when considered free of coordinate systems, my experience is that too many people never make the jump from coordinate free theory back to practical applications. I think that coordinate free theory is better tackled after mastering linear models from some other approach. In particular, I think it would be very easy to pick up the coordinate free approach after learning the material in this book. See Eaton (1983) for an excellent exposition of the coordinate free approach.

By now it should be obvious to the reader that I am not very opinionated on the subject of linear models. In spite of that fact, I have made an effort to identify sections of the book where I express my personal opinions.

Although in recent revisions I have made an effort to cite more of the literature, the book contains comparatively few references. The references are adequate to the needs of the book, but no attempt has been made to survey the literature. This was done for two reasons. First, the book was begun about 10 years ago, right after I finished my Master's degree at the University of Minnesota. At that time I was not aware of much of the literature. The second reason is that this book emphasizes a particular point of view. A survey of the literature would best be done on the literature's own terms. In writing this, I ended up reinventing a lot of wheels. My apologies to anyone whose work I have overlooked.

Using the Book

This book has been extensively revised, and the last five chapters were written at Montana State University. At Montana State we require a year of Linear Models for all of our statistics graduate students. In our three-quarter course, I usually end the first quarter with Chapter 4 or in the middle of Chapter 5. At the end of winter quarter, I have finished Chapter 9. I consider the first nine chapters to be the core material of the book. I go quite slowly because all of our Master's students are required to take the course. For Ph.D. students, I think a one-semester course might be the first nine chapters, and a two-quarter course might have time to add some topics from the remainder of the book.

I view the chapters after 9 as a series of important special topics from which instructors can choose material but which students should have access to even if their course omits them. In our third quarter, I typically cover (at some level) Chapters 11 to 14. The idea behind the special topics is not to provide an exhaustive discussion but rather to give a basic introduction that will also enable readers to move on to more detailed works such as Cook and Weisberg (1982) and Haberman (1974).

Appendices A-E provide required background material. My experience is that the student's greatest stumbling block is linear algebra. I would not dream of teaching out of this book without a thorough review of Appendices A and B.

The main prerequisite for reading this book is a good background in linear algebra. The book also assumes knowledge of mathematical statistics at the level of, say, Lindgren or Hogg and Craig. Although I think a mathematically sophisticated reader could handle this book without having had a course in statistical methods, I think that readers who have had a methods course will get much more out of it.

The exercises in this book are presented in two ways. In the original manuscript, the exercises were incorporated into the text. The original exercises have not been relocated. It has been my practice to assign virtually all of these exercises. At a later date, the editors from Springer-Verlag and I agreed that other instructors might like more options in choosing problems. As a result, a section of additional exercises was added to the end of the first nine chapters and some additional exercises were added to other chapters and appendices. I continue to recommend requiring nearly all of the exercises incorporated in the text. In addition, I think there is much to be learned about linear models by doing, or at least reading, the additional exercises.

Many of the exercises are provided with hints. These are primarily designed so that I can quickly remember how to do them. If they help anyone other than me, so much the better.

Acknowledgments

I am a great believer in books. The vast majority of my knowledge about statistics has been obtained by starting at the beginning of a book and reading until I covered what I had set out to learn. I feel both obligated and privileged to thank the authors of the books from which I first learned about linear models: Daniel and Wood, Draper and Smith, Scheffé, and Searle.

In addition, there are a number of people who have substantially influenced particular parts of this book. Their contributions are too diverse to specify, but I should mention that, in several cases, their influence has been entirely by means of their written work. (Moreover, I suspect that in at least one case, the person in question will be loath to find that his writings have come to such an end as this.) I would like to acknowledge Kit Bingham, Carol Bittinger, Larry Blackwood, Dennis Cook, Somesh Das Gupta, Seymour Geisser, Susan Groshen, Shelby Haberman, David Harville, Cindy Hertzler, Steve Kachman, Kinley Larntz, Dick Lund, Ingram Olkin, S. R. Searle, Anne Torbeyns, Sandy Weisberg, George Zyskind, and all of my students. Three people deserve special recognition for their pains in advising me on the manuscript: Robert Boik, Steve Fienberg, and Wes Johnson.

The typing of the first draft of the manuscript was done by Laura Cranmer and Donna Stickney.

I would like to thank my family: Sharon, Fletch, George, Doris, Gene, and Jim, for their love and support. I would also like to thank my friends from graduate school who helped make those some of the best years of my life.

Finally, there are two people without whom this book would not exist: Frank Martin and Don Berry. Frank because I learned how to think about linear models in a course he taught. This entire book is just an extension of the point of view that I developed in Frank's class. And Don because he was always there ready to help - from teaching my first statistics course to being my thesis adviser and everywhere in between.

Since I have never even met some of these people, it would be most unfair to blame anyone but me for what is contained in the book. (Of course, I will be more than happy to accept any and all praise.) Now that I think about it, there may be one exception to the caveat on blame. If you don't like the diatribe on prediction in Chapter 6, you might save just a smidgen of blame for Seymour (even though he did not see it before publication).

Table of Contents - Fifth Edition

Prefaces

Preface to the Fifth Edition
Preface to the Fourth Edition
Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition

1 Introduction

1.1 Random Matrices and Vectors
1.2 Multivariate Normal Distributions
1.3 Distributions of Quadratic Forms

1.3.1 Results for General Covariance Matrices

1.4 Generalized Linear Models
1.5 Additional Exercises

2 Estimation

2.1 Identifiability and Estimability
2.2 Estimation: Least Squares
2.3 Estimation: Best Linear Unbiased
2.4 Estimation: Maximum Likelihood
2.5 Estimation: Minimum Variance Unbiased
2.6 Sampling Distributions of Estimates
2.7 Generalized Least Squares
2.8 Normal Equations
2.9 Variance-Bias Tradeoff

2.9.1 Estimable functions

2.10 Bayesian Estimation

2.10.1 Distribution Theory

2.11 Additional Exercises

3 Testing

3.1 More About Models
3.2 Testing Models

3.2.1 Small Test Statistics
3.2.2 A Generalized Test Procedure

3.3 Testing Linear Parametric Functions

3.3.1 A Generalized Test Procedure
3.3.2 Testing an Unusual Class of Hypotheses

3.4 Discussion
3.5 Testing Single Degrees of Freedom in a Given Subspace
3.6 Breaking a Sum of Squares into Independent Components

3.6.1 General Theory
3.6.2 Two-Way ANOVA

3.7 Confidence Regions
3.8 Tests for Generalized Least Squares Models

3.8.1 Conditions for Simpler Procedures

3.9 Additional Exercises

4 One-Way ANOVA

4.1 Analysis of Variance
4.2 Estimating and Testing Contrasts
4.3 Additional Exercises

5 Multiple Comparison Techniques

5.1 Basic Ideas
5.2 Scheffé's Method
5.3 Least Significant Difference Method
5.4 Bonferroni Method
5.5 Tukey's Method
5.6 Multiple Range Tests: Newman-Keuls and Duncan
5.7 Summary

5.7.1 Fisher Versus Neyman-Pearson

5.8 Additional Exercises

6 Regression Analysis

6.1 Simple Linear Regression
6.2 Multiple Regression

6.2.1 Partitioned Model
6.2.2 Nonparametric Regression and Generalized Additive Models

6.3 General Prediction Theory

6.3.1 Discussion
6.3.2 General Prediction
6.3.3 Best Prediction

6.3.3.1 Residuals
6.3.3.2 Other loss functions

6.3.4 Best Linear Prediction

6.3.4.1 Relation to Least Squares Estimation
6.3.4.2 Residuals

6.3.5 Inner Products and Orthogonal Projections in General Spaces

6.4 Multiple Correlation

6.4.1 Squared Predictive Correlation

6.5 Partial Correlation Coefficients
6.6 Best Linear Unbiased Prediction
6.7 Testing Lack of Fit

6.7.1 The Traditional Test
6.7.2 Near Replicate Lack of Fit Tests
6.7.3 Partitioning Methods
6.7.4 Nonparametric Methods

6.8 Polynomial Regression and One-Way ANOVA
6.9 Additional Exercises

7 Multifactor Analysis of Variance

7.1 Balanced Two-Way ANOVA Without Interaction

7.1.1 Contrasts

7.2 Balanced Two-Way ANOVA with Interaction

7.2.1 Interaction Contrasts

7.3 Polynomial Regression and the Balanced Two-Way ANOVA
7.4 Two-Way ANOVA with Proportional Numbers
7.5 Two-Way ANOVA with Unequal Numbers: General Case

7.5.1 Without Interaction
7.5.2 Interaction
7.5.3 Characterizing the Interaction Space

7.6 Three or More Way Analyses

7.6.1 Balanced Analyses
7.6.2 Unbalanced Analyses

7.7 Additional Exercises

8 Experimental Design Models

8.1 Completely Randomized Designs
8.2 Randomized Complete Block Designs: Usual Theory
8.3 Latin Square Designs
8.4 Factorial Treatment Structures
8.5 More on Factorial Treatment Structures
8.6 Additional Exercises

9 Analysis of Covariance

9.1 Estimation of Fixed Effects

9.1.1 Generalized Least Squares

9.2 Estimation of Error and Tests of Hypotheses
9.3 Another Adjusted Model and Missing Data
9.4 Balanced Incomplete Block Designs
9.5 Testing a Nonlinear Full Model
9.6 Additional Exercises

10 General Gauss-Markov Models

10.1 BLUEs with an Arbitrary Covariance Matrix
10.2 Geometric Aspects of Estimation
10.3 Hypothesis Testing
10.4 Least Squares Consistent Estimation
10.5 Perfect Estimation and More

11 Split Plot Models

11.1 A Cluster Sampling Model
11.2 Generalized Split Plot Models

11.2.1 Estimation and Testing of Estimable Functions
11.2.2 Testing Models
11.2.3 Unbalanced Subplots

11.3 The Split Plot Design
11.4 Identifying the Appropriate Error

11.4.1 Subsampling
11.4.2 Two-Way ANOVA with Interaction

11.5 Exercise: An Unusual Split Plot Analysis

12 Model Diagnostics

12.1 Leverage

12.1.1 Mahalanobis Distances
12.1.2 Diagonal Elements of the Projection Operator
12.1.3 Examples

12.2 Checking Normality

12.2.1 Other Applications for Normal Plots

12.3 Checking Independence

12.3.1 Serial Correlation

12.4 Heteroscedasticity and Lack of Fit

12.4.1 Heteroscedasticity
12.4.2 Huber-White (Robust) Sandwich Estimator
12.4.3 Lack of Fit
12.4.4 Residual Plots

12.5 Updating Formulae and Predicted Residuals
12.6 Outliers and Influential Observations
12.7 Transformations

13 Collinearity and Alternative Estimates

13.1 Defining Collinearity
13.2 Tolerance and Variance Inflation Factors
13.3 Regression in Canonical Form and on Principal Components

13.3.1 Regression in canonical form
13.3.2 Principal Component Regression
13.3.3 Generalized Inverse Regression

13.4 Classical Ridge Regression

13.4.1 Ridge Applied to Principal Components

13.5 More on Mean Squared Error
13.6 Robust Estimation and Alternative Distance Measures
13.7 Orthogonal Regression

14 Variable Selection

14.1 All Possible Regressions and Best Subset Regression

14.1.1 R^2
14.1.2 Adjusted R^2
14.1.3 Mallows's C_p
14.1.4 Information Criteria: AIC, BIC
14.1.5 Cost complexity pruning

14.2 Stepwise Regression

14.2.1 Traditional Forward Selection
14.2.2 Backward Elimination
14.2.3 Other Methods

14.3 Discussion of Traditional Variable Selection Techniques

14.3.1 R^2
14.3.2 Influential Observations
14.3.3 Exploratory Data Analysis
14.3.4 Multiplicities
14.3.5 Predictive models
14.3.6 Overfitting

14.4 Modern Forward Selection: Boosting, Bagging, and Random Forests

14.4.1 Boosting

14.4.1.1 Alternatives

14.4.2 Bagging

14.4.2.1 A simple example
14.4.2.2 Discussion

14.4.3 Random Forests

Appendix A: Vector Spaces
Appendix B: Matrix Results

B.1 Basic Ideas
B.2 Eigenvalues and Related Results
B.3 Projections
B.4 Miscellaneous Results
B.5 Properties of Kronecker Products and Vec Operators
B.6 Tensors
B.7 Exercises

Appendix C: Some Univariate Distributions
Appendix D: Multivariate Distributions

D.1 Identifiability

Appendix E: Inference for One Parameter

E.1 Testing
E.2 P values
E.3 Confidence Intervals
E.4 Final Comments on Significance Testing

Appendix F: Significantly Insignificant Tests

F.1 Lack of Fit and Small F Statistics
F.2 The Effect of Correlation and Heteroscedasticity on F Statistics

Appendix G: Randomization Theory Models

G.1 Simple Random Sampling
G.2 Completely Randomized Designs
G.3 Randomized Complete Block Designs

References
Index
Author Index


Web design by Ronald Christensen (2007) and Fletcher Christensen (2008)