A specific example




Suppose, for example, we have a sample of 30 tax accountants from all the states and territories of Australia, and their individual state of origin is specified by a character vector of state mnemonics as

> state <- c("tas", "sa",  "qld", "nsw", "nsw", "nt",  "wa",  "wa", 
             "qld", "vic", "nsw", "vic", "qld", "qld", "sa",  "tas", 
             "sa",  "nt",  "wa",  "vic", "qld", "nsw", "nsw", "wa", 
             "sa",  "act", "nsw", "vic", "vic", "act")

For some purposes it is convenient to represent such information by a numeric vector in which each distinct value in the original (in this case, each state label) is represented by a small integer. Such a numeric vector is called a category. At the same time, however, it is important to preserve the correspondence between these new integer labels and the originals. This is done via the levels attribute of the category.

More formally, when a category is formed from such a vector, the sorted unique values in the vector form the levels attribute of the category, and the values in the category are in the set 1, 2, ..., k, where k is the number of unique values. The value at position j in the category is i if the ith sorted unique value occurred at position j of the original vector.

Hence the assignment

stcode <- category(state)

creates a category with values and attributes as follows

> stcode
 [1] 6 5 4 2 2 3 8 8 4 7 2 7 4 4 5 6 5 3 8 7 4 2 2 8 5 1 2 7 7 1
 attr(, "levels"):
 [1] "act" "nsw" "nt"  "qld" "sa"  "tas" "vic" "wa"

Notice that in the case of a character vector, "sorted" means sorted in alphabetical order.
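The construction described above can be reproduced by hand, which may make the roles of the levels and the integer values clearer. The following is only a sketch, written for a current R and using the base functions sort(), unique() and match(); the names lev, codes and recovered are chosen here for illustration and do not appear in the original notes.

```r
# Hand-built version of the category construction (illustrative sketch).
state <- c("tas", "sa",  "qld", "nsw", "nsw", "nt",  "wa",  "wa",
           "qld", "vic", "nsw", "vic", "qld", "qld", "sa",  "tas",
           "sa",  "nt",  "wa",  "vic", "qld", "nsw", "nsw", "wa",
           "sa",  "act", "nsw", "vic", "vic", "act")

# The levels are the sorted unique values of the original vector.
lev <- sort(unique(state))

# Position j receives the value i when state[j] is the i-th level.
codes <- match(state, lev)

# Indexing the levels by the codes recovers the original labels,
# which is why the levels attribute must be kept with the codes.
recovered <- lev[codes]
```

Here codes should hold the same integers as stcode, and recovered should equal state, so no information is lost by the recoding.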

A factor is similarly created using the factor() function:

statef <- factor(state)

The print() function now handles the factor object slightly differently:

> statef
 [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa  
[16] tas sa  nt  wa  vic qld nsw nsw wa  sa  act nsw vic vic act

If we remove the factor class, however, using the function unclass(), it becomes virtually identical to the category:

> unclass(statef)
 [1] 6 5 4 2 2 3 8 8 4 7 2 7 4 4 5 6 5 3 8 7 4 2 2 8 5 1 2 7 7 1
attr(, "levels"):
[1] "act" "nsw" "nt"  "qld" "sa"  "tas" "vic" "wa"
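The two components of the factor can also be extracted separately rather than inspected through unclass(). The sketch below assumes a current R, where levels(), as.integer() and as.character() accept a factor; the behaviour of the 1995 S system described in these notes may differ. The names lev, codes and labels are illustrative only.

```r
# Pulling a factor apart into its levels and integer codes (illustrative sketch).
state <- c("tas", "sa",  "qld", "nsw", "nsw", "nt",  "wa",  "wa",
           "qld", "vic", "nsw", "vic", "qld", "qld", "sa",  "tas",
           "sa",  "nt",  "wa",  "vic", "qld", "nsw", "nsw", "wa",
           "sa",  "act", "nsw", "vic", "vic", "act")
statef <- factor(state)

lev    <- levels(statef)        # the sorted unique state labels
codes  <- as.integer(statef)    # the underlying integer codes
labels <- as.character(statef)  # the labels, as printed for the factor

# The labels are just the levels indexed by the codes.
same <- identical(labels, lev[codes])
```

Printing the factor therefore shows the labels, while the stored values remain the small integers seen after unclass().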



Erik Moledor
Tue Jan 31 21:02:18 EST 1995