Why is the terminology of labels and levels in factors so weird?

野性不改 2020-11-30 00:32

An example of a non-settable function would be labels. You can only set factor labels when they are created with the factor function. There is no labels<-…

2 Answers
  • 2020-11-30 01:03

    The labels function sounds like the perfect fit for getting the labels of a factor.

    ...but the labels function has nothing to do with factors! It is a generic function for getting something that "labels" an object. For atomic vectors, that means the names. If there are no names, labels returns the element indices coerced to strings, essentially as.character(seq_along(x)).

    ...So that's what you're seeing when you try labels on a factor. The factor is an integer vector without any names, but with a levels attribute.
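    A quick way to see this fallback for yourself is sketched below; the vector and factor values are arbitrary examples, not taken from the question.

```r
# A named vector: labels() returns its names.
named <- c(a = 1, b = 2)
labels(named)

# A factor carries no names, so labels() falls back to the element
# indices as strings, i.e. the same as as.character(seq_along(f)).
f <- factor(c("x", "y", "x"))
labels(f)
levels(f)   # the strings you were probably hoping to get
```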

    A factor has no labels. It only has levels. The labels argument to factor is just a way to match the input against one set of strings (the levels argument) while storing a different set of strings as the actual levels... But to confuse things further, the dput function prints the levels attribute as .Label! I think that is a legacy thing...

    # Translate lower case letters to upper case.
    f <- factor(letters[2:4], letters[1:3], LETTERS[1:3])
    dput(f)
    #structure(c(2L, 3L, NA), .Label = c("A", "B", "C"), class = "factor")
    attributes(f)
    #$levels
    #[1] "A" "B" "C"
    #
    #$class
    #[1] "factor"
    

    However, since labels is a generic function, it would probably be a good idea to define labels.factor as follows (currently there is none). Perhaps something for R core to consider?

    labels.factor <- function(x, ...) as.character(x)
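    To check how such a method would behave, here is a sketch that defines it in the global environment (it is not part of base R). It names the first argument object rather than x, because object is the argument name of the labels() generic, so a call like labels(object = f) dispatches correctly too.

```r
# Hypothetical labels() method for factors, as proposed above;
# the argument is named `object` to match the generic labels(object, ...).
labels.factor <- function(object, ...) as.character(object)

# Same factor as in the dput() example above: codes 2, 3, NA.
f <- factor(letters[2:4], letters[1:3], LETTERS[1:3])
labels(f)   # "B" "C" NA
```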
    
  • 2020-11-30 01:24

    I think the way to think about the difference between labels and levels (ignoring the labels() function that Tommy describes in his answer) is that levels tells R which values to look for in the input (x) and in what order to put the levels of the resulting factor object, while labels changes the values of the levels after the input has been coded as a factor. As Tommy's answer suggests, no part of the factor object returned by factor() is called labels; there are just the levels, which have been adjusted by the labels argument (clear as mud).

    For example:

    > f <- factor(x=c("a","b","c"),levels=c("c","d","e"))
    > f
    [1] <NA> <NA> c  
    Levels: c d e
    > str(f)
    Factor w/ 3 levels "c","d","e": NA NA 1
    

    Because the first two elements of x were not found in levels, the first two elements of f are NA. Because "d" and "e" were included in levels, they show up in the levels of f even though they did not occur in x.
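    The lookup step described above can be mimicked with match(), which returns the position of each input value within levels, or NA when the value is absent. This is a sketch of the idea, not a copy of factor()'s source.

```r
x   <- c("a", "b", "c")
lev <- c("c", "d", "e")

# Position of each input value within the levels; NA when not found.
codes <- match(x, lev)
codes                     # NA NA 1

f <- factor(x, levels = lev)
as.integer(f)             # the same integer codes

# Unused levels are kept, so "d" and "e" show up with zero counts.
table(f)
```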

    Now with labels:

    > f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("C","D","E"))
    > f
    [1] <NA> <NA> C   
    Levels: C D E
    

    After R figures out what should be in the factor, it re-codes the levels. One can of course use this to do brain-frying things such as:

    > f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("a","b","c"))
    > f
    [1] <NA> <NA> a   
    Levels: a b c
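    Lining the input up against the result makes the puzzle easier to see: the level called "a" comes from the input value "c", not from the input value "a". A short sketch:

```r
x <- c("a", "b", "c")
f <- factor(x, levels = c("c", "d", "e"), labels = c("a", "b", "c"))

# Each input value next to the label it ended up with.
data.frame(input = x, result = as.character(f))
```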
    

    Another way to think about labels is that factor(x,levels=L1,labels=L2) is equivalent to

    f <- factor(x,levels=L1)
    levels(f) <- L2
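    The equivalence can be checked with concrete values standing in for L1 and L2; the values below are simply the ones from the earlier example.

```r
x  <- c("a", "b", "c")
L1 <- c("c", "d", "e")   # values to look for, in this order
L2 <- c("C", "D", "E")   # names the levels get afterwards

one_step <- factor(x, levels = L1, labels = L2)

two_step <- factor(x, levels = L1)
levels(two_step) <- L2

identical(one_step, two_step)   # TRUE
```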
    

    I think an appropriately phrased version of this example might be nice for Pat Burns's The R Inferno: there are plenty of factor puzzles in section 8.2, but not this particular one...
