Defining statistical models; formulæ
The template for a statistical model is a linear regression model with
independent, homoscedastic errors

$$y_i = \sum_{j=0}^{p} \beta_j x_{ij} + e_i, \qquad e_i \sim \mathrm{NID}(0, \sigma^2), \qquad i = 1, \ldots, n.$$

In matrix terms this would be written

$$y = X\beta + e$$

where $y$ is the response vector, $X$ is the model matrix or design matrix
and has columns $x_0, x_1, \ldots, x_p$, the determining variables. Very
often $x_0$ will be a column of 1s defining an intercept term.
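The linear model template described above can be sketched numerically. The following is a minimal illustration, assuming R is used in place of S-PLUS (its formula interface follows the same conventions); the simulated data and all variable names are assumptions made for the example. It builds the model matrix for one determining variable plus an intercept, solves the normal equations, and fits the same model through a formula.

```r
# Simulated data: the values and names are illustrative assumptions.
set.seed(1)
n <- 20
x <- runif(n)
y <- 2 + 3 * x + rnorm(n, sd = 0.1)

# Model matrix X: a column of 1s (the intercept) and the column x.
X <- model.matrix(~ x)

# Least squares estimate of beta from the normal equations (X'X) beta = X'y.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# The same estimates obtained through the model formula interface.
fit <- lm(y ~ x)
```

Comparing `beta_hat` with `coef(fit)` should show agreement up to rounding error, since both solve the same least squares problem.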
Examples.
Before giving a formal specification, a few examples may usefully set the
picture.
Suppose y, x, x0, x1, x2, ... are
numeric variables, X is a matrix and A, B, C,
... are factors. Each formula, or group of formulæ, below specifies
the statistical model described in the text that follows it.
y ~ x
y ~ 1 + x
Both imply the same simple linear regression model of y on x. The
first has an implicit intercept term, and the second an explicit one.
y ~ -1 + x
y ~ x - 1
Simple linear regression of y on x through the origin (that is,
without an intercept term).
log(y) ~ x1 + x2
Multiple regression of the transformed variable, log(y), on x1 and x2
(with an implicit intercept term).
y ~ poly(x,2)
y ~ 1 + x + I(x^2)
Polynomial regression of y on x of degree 2. The first form uses
orthogonal polynomials, and the second uses explicit powers, as basis.
y ~ X + poly(x,2)
Multiple regression of y with model matrix consisting of the matrix X as
well as polynomial terms in x to degree 2.
y ~ A
Single classification analysis of variance model of y, with classes
determined by A.
y ~ A + x
Single classification analysis of covariance model of y, with classes
determined by A, and with covariate x.
y ~ A*B
y ~ A + B + A:B
y ~ B %in% A
y ~ A/B
Two factor non-additive model of y on A and B. The first two specify
the same crossed classification and the second two specify the same nested
classification. In abstract terms all four specify the same model
subspace.
y ~ (A + B + C)^2
y ~ A*B*C - A:B:C
Three factor experiment but with a model containing main effects and two
factor interactions only. Both formulæ specify the same model.
y ~ A * x
y ~ A/x
y ~ A/(1 + x) - 1
Separate simple linear regression models of y on x within the levels of
A, with different codings. The last form produces explicit estimates of
as many different intercepts and slopes as there are levels in A.
y ~ A*B + Error(C)
An experiment with two treatment factors, A and B, and error
strata determined by factor C. For example, a split plot experiment
with whole plots (and hence also subplots) determined by factor C.
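Some of the equivalences claimed in the examples above can be checked numerically. The sketch below assumes R rather than S-PLUS, and uses simulated data whose factor levels, sizes and names are purely illustrative. It compares fitted values rather than coefficients, because formulæ that span the same model subspace may still use different codings.

```r
# Simulated data; the factor levels and sizes are illustrative assumptions.
set.seed(2)
d <- expand.grid(A = factor(1:3), B = factor(1:2), rep = 1:4)
d$x <- runif(nrow(d))
d$y <- rnorm(nrow(d))

# Crossed and nested forms use different codings but span the same
# subspace, so their fitted values agree.
fit_crossed <- lm(y ~ A * B, data = d)
fit_nested  <- lm(y ~ A / B, data = d)
same_subspace <- isTRUE(all.equal(fitted(fit_crossed), fitted(fit_nested)))

# Orthogonal polynomials and explicit powers give the same fitted quadratic.
fit_orth <- lm(y ~ poly(x, 2), data = d)
fit_raw  <- lm(y ~ 1 + x + I(x^2), data = d)
same_quadratic <- isTRUE(all.equal(fitted(fit_orth), fitted(fit_raw)))

# A/(1 + x) - 1 yields one intercept and one slope per level of A.
fit_sep <- lm(y ~ A / (1 + x) - 1, data = d)
names(coef(fit_sep))
```

With three levels in `A`, the last fit should report six coefficients: three intercepts and three slopes.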
The operator ~ is used to define a model formula in S-PLUS. The
form, for an ordinary linear model, is
response ~ op_1 term_1 op_2 term_2 op_3 term_3 ...
where
- response
- is a vector or matrix (or expression evaluating to
a vector or matrix) defining the response variable(s).
- op_i
- is an operator, either + or -, implying
the inclusion or exclusion of a term in the model (the first is optional).
- term_i
- is either
- a vector or matrix expression, or 1,
- a factor, or
- a formula expression consisting of factors, vectors or matrices
connected by formula operators.
In all cases each term defines a collection of columns either to be added
to or removed from the model matrix. A 1 stands for an intercept
column and is by default included in the model matrix unless explicitly
removed.
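How terms add columns to, or remove columns from, the model matrix can be seen by inspecting column names. This is a small sketch assuming R with its default treatment contrasts; the data values and variable names are illustrative only.

```r
# Illustrative data; the values and names are assumptions.
x <- c(1.5, 2.0, 3.5, 4.0, 5.5, 6.0)
A <- factor(c("a", "a", "b", "b", "c", "c"))

# The intercept column is included by default ...
cols_default <- colnames(model.matrix(~ x))          # "(Intercept)" "x"

# ... unless it is explicitly removed.
cols_no_int <- colnames(model.matrix(~ x - 1))       # "x"

# A factor term contributes a collection of columns at once.
cols_factor <- colnames(model.matrix(~ A))           # "(Intercept)" "Ab" "Ac"

# The - operator removes a term that was previously included.
cols_removed <- colnames(model.matrix(~ A + x - A))  # "(Intercept)" "x"
```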
The formula operators are similar in effect to the Wilkinson and
Rogers notation used by such programs as Glim and Genstat.
One inevitable change is that the operator ``.'' becomes ``:''
since the period is a valid name character in S-PLUS. The notation is
summarised in Table 1 (based on Chambers & Hastie,
p. 29).
Table 1: Summary of model operator semantics
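The expansions performed by the formula operators can also be inspected symbolically. The following sketch assumes R, where `terms()` expands a formula without needing any data; `expand_terms` is a hypothetical helper name introduced only for this illustration.

```r
# Hypothetical helper: return the expanded term labels of a formula.
# Only the symbolic expansion is performed; no data are required.
expand_terms <- function(f) attr(terms(f), "term.labels")

crossed <- expand_terms(y ~ A * B)              # "A" "B" "A:B"
nested  <- expand_terms(y ~ A / B)              # "A" "A:B"

# Main effects and two factor interactions, written two ways.
power_form    <- expand_terms(y ~ (A + B + C)^2)
subtract_form <- expand_terms(y ~ A * B * C - A:B:C)

# Whether an intercept is present is recorded separately (1 or 0).
has_intercept <- attr(terms(y ~ x - 1), "intercept")
```

Both `power_form` and `subtract_form` should contain the three main effects and the three two factor interactions, and nothing else.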
Note that inside the parentheses that usually enclose function arguments
all operators have their normal arithmetic meaning. The function I() is an identity function used only to allow terms in model formulæ
to be defined using arithmetic operators.
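A short sketch of the difference that I() makes, assuming R and an illustrative numeric vector `x`: outside a function call `^` is read as a formula operator, while inside I(), or inside any other function call such as log(), it is ordinary arithmetic.

```r
# Illustrative data; the values are assumptions.
x <- c(1, 2, 3, 4, 5)

# Without I(), ^ is a formula operator, so x^2 reduces to the term x.
bare_label <- attr(terms(~ x^2), "term.labels")         # "x"

# With I(), the square is computed arithmetically and kept as one term.
insulated_label <- attr(terms(~ I(x^2)), "term.labels") # "I(x^2)"

# Operators inside function arguments are arithmetic as well.
m <- model.matrix(~ I(x^2) + log(x + 1))
squared_column <- unname(m[, "I(x^2)"])
```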
Note particularly that the model formulæ specify the columns of
the model matrix; the specification of the parameters is implicit. This is
not the case in other contexts, for example in fitting nonlinear models.
Erik Moledor
Tue Jan 31 21:02:18 EST 1995