Suppose, for example, we have a sample of 30 tax accountants from all the states and territories, and their individual state of origin is specified by a character vector of state mnemonics as
> state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
"qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
"sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
"sa", "act", "nsw", "vic", "vic", "act")
For some purposes it is convenient to represent such information by a numeric vector in which each distinct value of the original (in this case, each state label) is represented by a small integer. Such a numeric vector is called a category. At the same time, however, it is important to preserve the correspondence between these new integer labels and the originals. This is done via the levels attribute of the category.
More formally, when a category is formed from such a vector, the sorted unique values in the vector form the levels attribute of the category, and the values in the category are in the set 1, 2, ..., k, where k is the number of unique values. The value at position j in the category is i if the i-th sorted unique value occurred at position j of the original vector.
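The definition above can be checked directly. The following is a minimal sketch, assuming a current R session: rather than calling category(), which older S implementations provided and which we do not assume is available, it builds the integer codes by hand with sort(), unique() and match(). The names lev and codes are illustrative only.

```r
# Sketch: build category-style integer codes by hand, following the
# formal definition. `lev` and `codes` are illustrative names only.
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
           "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
           "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
           "sa", "act", "nsw", "vic", "vic", "act")

lev <- sort(unique(state))     # the k sorted unique values
codes <- match(state, lev)     # position j gets i if state[j] == lev[i]
attr(codes, "levels") <- lev   # keep the code-to-label correspondence

# The levels attribute lets us recover the original labels exactly.
stopifnot(identical(attr(codes, "levels")[codes], state))
```

Because lev contains each label exactly once, match() returns, for each position j, the index i of the sorted unique value found there, which is exactly the rule stated above.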
Hence the assignment
stcode <- category(state)
creates a category with values and attributes as follows
> stcode
 [1] 6 5 4 2 2 3 8 8 4 7 2 7 4 4 5 6 5 3 8 7 4 2 2 8 5 1 2 7 7 1
attr(, "levels"):
[1] "act" "nsw" "nt" "qld" "sa" "tas" "vic" "wa"
Notice that in the case of a character vector, "sorted" means sorted in alphabetical order.
A factor is similarly created using the factor() function:
statef <- factor(state)
The print() function now handles the factor object slightly differently:
> statef
 [1] tas sa qld nsw nsw nt wa wa qld vic nsw vic qld qld sa
[16] tas sa nt wa vic qld nsw nsw wa sa act nsw vic vic act
If, however, we remove the factor class using the function unclass(), the object becomes virtually identical to the category:
> unclass(statef)
 [1] 6 5 4 2 2 3 8 8 4 7 2 7 4 4 5 6 5 3 8 7 4 2 2 8 5 1 2 7 7 1
attr(, "levels"):
[1] "act" "nsw" "nt" "qld" "sa" "tas" "vic" "wa"
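The correspondence can also be verified programmatically. This sketch, assuming a current R session, checks that the unclassed factor is an integer vector whose codes and levels agree with the formal definition of a category (sorted unique values as levels, codes from match()), and that the accessor functions levels() and as.integer() extract the same two pieces while leaving the factor itself intact. The names u and lev are illustrative only.

```r
# Sketch: confirm that a factor is integer codes plus a levels attribute.
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa",
           "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas",
           "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa",
           "sa", "act", "nsw", "vic", "vic", "act")
statef <- factor(state)

# Strip the "factor" class: what remains is integer codes plus levels.
u <- unclass(statef)
stopifnot(is.integer(u))

# The codes and levels agree with the formal definition of a category.
lev <- sort(unique(state))
stopifnot(identical(attr(u, "levels"), lev))
stopifnot(identical(as.vector(u), match(state, lev)))

# Accessors give the same two pieces without removing the class.
print(levels(statef))      # the sorted state labels
print(as.integer(statef))  # the integer codes
```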