Index vectors. Selecting and modifying subsets of a data set





Subsets of the elements of a vector may be selected by appending to the name of the vector an index vector in square brackets. More generally, any expression that evaluates to a vector may have subsets of its elements similarly selected by appending an index vector in square brackets immediately after the expression.

Such index vectors can be any of four distinct types.

1. A logical vector.
In this case the index vector must be of the same length as the vector from which elements are to be selected. Values corresponding to T in the index vector are selected and those corresponding to F are omitted. For example

y <- x[!is.na(x)]

creates (or re-creates) an object y which will contain the non-missing values of x, in the same order. Note that if x has missing values, y will be shorter than x. Also

(x+1)[(!is.na(x)) & x>0] -> z

creates an object z and places in it the values of the vector x+1 for which the corresponding value in x was both non-missing and positive.

2. A vector of positive integral quantities.
In this case the values in the index vector must lie in the set {1, 2, ..., length(x)}. The corresponding elements of the vector are selected and concatenated, in that order, in the result. The index vector can be of any length and the result is of the same length as the index vector. For example x[6] is the sixth component of x and

x[1:10]

selects the first 10 elements of x (assuming length(x) >= 10). Also

c("x","y")[rep(c(1,2,2,1), times=4)]

(an admittedly unlikely thing to do) produces a character vector of length 16 consisting of "x", "y", "y", "x" repeated four times.

3. A vector of negative integral quantities.
Such an index vector specifies the values to be excluded rather than included. Thus

y <- x[-(1:5)]

gives y all but the first five elements of x.

4. A vector of character strings.
This possibility only applies where an object has a names attribute to identify its components. In this case a subvector of the names vector may be used in the same way as the positive integral labels in item 2 above. For example

lunch <- fruit[c("apple","orange")]

This option is particularly useful in connection with data frames, as we shall see later.
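
The four kinds of index vector described above can be tried side by side in a short session. The following is only a sketch: the contents of x and the values and names given to fruit are made up for illustration and do not come from the text.

```r
# Hypothetical data, chosen only for illustration
x <- c(3, NA, -1, 7, 0, NA, 12)

# 1. Logical index: keep the non-missing values, in their original order
y <- x[!is.na(x)]                    # 3 -1 7 0 12
z <- (x + 1)[(!is.na(x)) & x > 0]    # 4 8 13

# 2. Positive integers: select by position; repeats are allowed
first3 <- x[1:3]                     # 3 NA -1
labels <- c("x", "y")[rep(c(1, 2, 2, 1), times = 4)]   # length 16

# 3. Negative integers: exclude the given positions
rest <- x[-(1:5)]                    # NA 12

# 4. Character strings: select by name (requires a names attribute)
fruit <- c(5, 10, 1, 20)             # assumed values
names(fruit) <- c("orange", "banana", "apple", "peach")
lunch <- fruit[c("apple", "orange")] # elements named apple (1) and orange (5)
```

Note that the result follows the order of the index vector, not the order of the original vector: lunch has "apple" first because it was requested first.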

An indexed expression can also appear on the receiving end of an assignment, in which case the assignment operation is performed only on those elements of the vector. The expression must be of the form vector[index_vector], since having an arbitrary expression in place of the vector name does not make much sense here.

The vector assigned must match the length of the index vector, and in the case of a logical index vector it must again be the same length as the vector it is indexing.

For example

x[is.na(x)] <- 0

replaces any missing values in x by zeros and

y[y<0] <- -y[y<0]

has the same effect as

y <- abs(y)
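
A small session illustrating indexed assignment is sketched below. The data and the names y2 and v are hypothetical, introduced only so the results can be compared.

```r
# Hypothetical data, chosen only for illustration
x <- c(3, NA, -1, 7, NA)
x[is.na(x)] <- 0              # x is now 3 0 -1 7 0

y <- c(-2, 5, -8, 0)
y2 <- y                       # work on a copy so y stays available
y2[y2 < 0] <- -y2[y2 < 0]     # flip the sign of the negative elements only
# y2 now holds the same values as abs(y): 2 5 8 0

# A positive integer index works on the receiving end as well
v <- c(10, 20, 30, 40, 50)
v[c(1, 3)] <- c(-1, -3)       # v is now -1 20 -3 40 50
```

In each case only the selected elements change; the rest of the vector is left as it was.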






Erik Moledor
Tue Jan 31 21:02:18 EST 1995