Defining statistical models; formulæ




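The template for a statistical model is a linear regression model with independent, homoscedastic errors

$$ y_i = \sum_{j=0}^{p} \beta_j x_{ij} + e_i, \qquad e_i \sim \mathrm{NID}(0, \sigma^2), \qquad i = 1, \ldots, n $$

In matrix terms this would be written

$$ y = X\beta + e $$

where $y$ is the response vector, $X$ is the model matrix or design matrix and has columns $x_0, x_1, \ldots, x_p$, the determining variables. Very often $x_0$ will be a column of 1s defining an intercept term.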
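As a rough illustration of the matrix form, the sketch below builds a model matrix by hand, with a leading column of 1s for the intercept, and compares the least-squares solution with the one obtained from a formula. It is written for R, the open-source dialect of S, and is only assumed to carry over to S-PLUS; the data values are invented for the example.

```r
# Hypothetical data: any numeric response y and predictor x would do.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Model matrix with x0 a column of 1s (the intercept) and x1 = x.
X <- cbind(x0 = 1, x1 = x)

# Least-squares estimate of beta from the normal equations X'X beta = X'y.
beta <- solve(crossprod(X), crossprod(X, y))

# The same model specified through a formula; the intercept is implicit.
fit <- lm(y ~ x)

print(beta)
print(coef(fit))
```

The two sets of coefficients should agree up to rounding, since the formula y ~ x generates the same two columns.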

Examples.

Before giving a formal specification, a few examples may usefully set the picture.

Suppose y, x, x0, x1, x2, ... are numeric variables, X is a matrix and A, B, C, ... are factors. Each group of formulæ below specifies the statistical model described in the text that follows it.


y ~ x
y ~ 1 + x
Both imply the same simple linear regression model of y on x. The first has an implicit intercept term, and the second an explicit one.
y ~ -1 + x
y ~ x - 1
Simple linear regression of y on x through the origin (that is, without an intercept term).
log(y) ~ x1 + x2
Multiple regression of the transformed variable, log(y), on x1 and x2 (with an implicit intercept term).
y ~ poly(x,2)
y ~ 1 + x + I(x^2)
Polynomial regression of y on x of degree 2. The first form uses orthogonal polynomials as the basis, and the second uses explicit powers.
y ~ X + poly(x,2)
Multiple regression of y with a model matrix consisting of the matrix X as well as polynomial terms in x to degree 2.
y ~ A
Single classification analysis of variance model of y, with classes determined by A.
y ~ A + x
Single classification analysis of covariance model of y, with classes determined by A, and with covariate x.
y ~ A*B
y ~ A + B + A:B
y ~ B %in% A
y ~ A/B
Two factor non-additive model of y on A and B. The first two specify the same crossed classification and the second two specify the same nested classification. In abstract terms all four specify the same model subspace.
y ~ (A + B + C)^2
y ~ A*B*C - A:B:C
Three factor experiment but with a model containing main effects and two factor interactions only. Both formulæ specify the same model.
y ~ A * x
y ~ A/x
y ~ A/(1 + x) - 1
Separate simple linear regression models of y on x within the levels of A, with different codings. The last form produces explicit estimates of as many different intercepts and slopes as there are levels in A.
y ~ A*B + Error(C)
An experiment with two treatment factors, A and B, and error strata determined by factor C. An example is a split plot experiment, with whole plots (and hence also subplots) determined by factor C.
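A few of the equivalences listed above can be checked directly by inspecting the terms and model matrices that the formulæ generate. The following is a minimal sketch in R (assumed, not verified, to behave the same way in S-PLUS); the data frame d and its values are hypothetical.

```r
# Hypothetical data, invented for illustration only.
d <- data.frame(
  y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8, 7.1, 8.0),
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  A = factor(rep(c("a1", "a2"), each = 4)),
  B = factor(rep(c("b1", "b2"), times = 4))
)

# Implicit and explicit intercepts give the same model matrix.
m1 <- model.matrix(y ~ x, d)
m2 <- model.matrix(y ~ 1 + x, d)

# Removing the intercept leaves the single column for x.
m0 <- model.matrix(y ~ x - 1, d)

# The two crossed specifications expand to the same set of terms.
t1 <- attr(terms(y ~ A * B), "term.labels")
t2 <- attr(terms(y ~ A + B + A:B), "term.labels")

# Main effects plus two-factor interactions, written two ways.
t3 <- attr(terms(y ~ (A + B + C)^2), "term.labels")
t4 <- attr(terms(y ~ A * B * C - A:B:C), "term.labels")

# One intercept column and one slope column for each level of A.
m3 <- model.matrix(y ~ A / (1 + x) - 1, d)
print(colnames(m3))
```

Note that terms() only parses the formula, so the factor C does not need to exist for the term labels to be compared.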

The operator ~ is used to define a model formula in S-PLUS. The form, for an ordinary linear model, is

response ~ op_1 term_1 op_2 term_2 op_3 term_3 ...

where

response
is a vector or matrix (or an expression evaluating to a vector or matrix) defining the response variable(s).
op_i
is an operator, either + or -, implying the inclusion or exclusion of a term in the model (the first is optional).
term_i
is either a vector or matrix expression, or 1; a factor; or a formula expression consisting of factors, vectors or matrices connected by formula operators. In all cases each term defines a collection of columns either to be added to or removed from the model matrix. A 1 stands for an intercept column and is by default included in the model matrix unless explicitly removed.
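The effect of the + and - operators on the set of terms can be seen without fitting anything, by parsing a formula and inspecting the result. This is a small sketch in R (assumed to match S-PLUS behaviour here); the variable names x1, x2 and x3 are placeholders and need not exist.

```r
# The intercept is included by default ...
int_default <- attr(terms(y ~ x1 + x2), "intercept")

# ... and is dropped once it is explicitly subtracted.
int_removed <- attr(terms(y ~ x1 + x2 - 1), "intercept")

# Subtracting a term removes it from the model.
kept <- attr(terms(y ~ x1 + x2 + x3 - x2), "term.labels")

print(int_default)
print(int_removed)
print(kept)
```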

The formula operators are similar in effect to the Wilkinson and Rogers notation used by such programs as Glim and Genstat. One inevitable change is that the operator ``.'' becomes ``:'' since the period is a valid name character in S-PLUS. The notation is summarised in Table 1 (based on Chambers & Hastie, p. 29).

 
Table 1:   Summary of model operator semantics

Note that inside the parentheses that usually enclose function arguments all operators have their normal arithmetic meaning. The function I() is an identity function used only to allow terms in model formulæ to be defined using arithmetic operators.
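The difference that I() makes can be seen by comparing the columns generated with and without it. The sketch below uses R (assumed to agree with S-PLUS on this point) and invented data values.

```r
# Hypothetical data for illustration.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.1, 3.9, 9.2, 15.8, 25.3, 35.9)

# Inside I(), ^ is ordinary exponentiation, so a squared column is created.
with_I <- model.matrix(y ~ x + I(x^2))

# Without I(), ^ is the formula operator for interactions up to order n,
# and x^2 reduces to the single term x.
without_I <- model.matrix(y ~ x + x^2)

# Inside a function call such as log(), + keeps its arithmetic meaning.
in_call <- model.matrix(~ log(x + 1))

print(colnames(with_I))
print(colnames(without_I))
print(colnames(in_call))
```

The second formula is legal but silently fits a straight line, which is why explicit powers need to be wrapped in I().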

Note particularly that the model formulæ specify the columns of the model matrix; the specification of the parameters is implicit. This is not the case in other contexts, for example in fitting nonlinear models.
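The contrast can be sketched by fitting the same relationship both ways. In the linear fit the coefficients are implied by the columns; in the nonlinear fit the parameters appear by name in the formula and need starting values. The example uses R's lm() and nls(), assumed to correspond to the S-PLUS functions of the same names; the data and the parameter names a and b are hypothetical.

```r
# Hypothetical data: roughly y = 2 * exp(0.3 * x), with a small
# deterministic perturbation so the residuals are not exactly zero.
x <- 1:10
y <- 2 * exp(0.3 * x) * (1 + 0.01 * sin(x))

# Linear model: the formula lists columns; the coefficients are implicit.
lin <- lm(log(y) ~ x)

# Nonlinear model: the parameters a and b are written into the formula.
# Starting values are taken from the linear fit on the log scale.
nl <- nls(y ~ a * exp(b * x),
          start = list(a = exp(coef(lin)[[1]]), b = coef(lin)[[2]]))

print(names(coef(lin)))
print(names(coef(nl)))
```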






Erik Moledor
Tue Jan 31 21:02:18 EST 1995