Is the factor() command in R only for categorical variables with a hierarchy of levels?

栀梦 2021-01-07 11:01

I'm quite confused about when to use

factor(educ) or factor(agegroup)
in R. Is it used for categorical ordered data? or can I just use to i
2 Answers
  • 2021-01-07 11:25

    You can flag a factor as ordered by creating it with ordered(x) or with factor(x, ordered=TRUE). The "Details" section of ?factor explains that:

    Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently.

    You can confirm the first part of that quote (that they differ only in their class) by comparing the attributes of these two objects:

    f  <- factor(letters[3:1], levels=letters[3:1])
    of <- ordered(letters[3:1], levels=letters[3:1])
    attributes(f)
    # $levels
    # [1] "c" "b" "a"
    # 
    # $class
    # [1] "factor"
    attributes(of)
    # $levels
    # [1] "c" "b" "a"
    # 
    # $class
    # [1] "ordered" "factor" 
    

    Various factor-handling R functions (the "methods and model-fitting functions" of the second part of that quote) will then use is.ordered() to test for the presence of that "ordered" class indicator, taking it as a directive to treat an ordered factor differently than an unordered one. Here are a couple of examples:

    ## The print method for factors. (Type 'print.factor' to see the function's code)
    print(f)
    # [1] c b a
    # Levels: c b a
    print(of)
    # [1] c b a
    # Levels: c < b < a
    
    ## The contrasts function. (Type 'contrasts' to see the function's code.)
    contrasts(of)
    #                 .L         .Q
    # [1,] -7.071068e-01  0.4082483
    # [2,]  4.350720e-18 -0.8164966
    # [3,]  7.071068e-01  0.4082483
    contrasts(f)
    #   b a
    # c 0 0
    # b 1 0
    # a 0 1
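
    As a rough sketch of how this carries over to model fitting, the snippet below compares the design-matrix columns that model.matrix() builds for the two factors. It assumes R's default contrast settings (contr.treatment for unordered factors, contr.poly for ordered ones), which it sets explicitly; the names cols_f and cols_of are only illustrative.

    ```r
    # Same two factors as above.
    f  <- factor(letters[3:1], levels = letters[3:1])
    of <- ordered(letters[3:1], levels = letters[3:1])

    # Assumption: R's default contrasts. Set them explicitly so the result
    # does not depend on the current session's options.
    old <- options(contrasts = c("contr.treatment", "contr.poly"))

    # Unordered factor: one 0/1 dummy column per non-reference level.
    cols_f  <- colnames(model.matrix(~ f))
    # Ordered factor: polynomial (linear, quadratic) trend columns.
    cols_of <- colnames(model.matrix(~ of))

    options(old)  # restore the previous settings

    cols_f   # "(Intercept)" "fb" "fa"
    cols_of  # "(Intercept)" "of.L" "of.Q"
    ```

    So a model such as lm(y ~ of) would estimate linear and quadratic trend terms across the levels, while lm(y ~ f) would estimate one offset per level relative to the reference level.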
    
  • 2021-01-07 11:40

    I don't really see a clear question here, so perhaps a simple example would suffice as an answer.

    Imagine we have the following data.

    set1 <- c("AA", "B", "BA", "CC", "CA", "AA", "BA", "CC", "CC")
    

    We want to factor this data.

    f.set1 <- factor(set1)
    

    Let's look at the output. Note that R has simply sorted the levels alphabetically, without implying any hierarchy (see the "Levels:" line).

    f.set1
    # [1] AA B  BA CC CA AA BA CC CC
    # Levels: AA B BA CA CC
    is.ordered(f.set1)
    # [1] FALSE
    

    However, calling as.numeric on the factor might fool you into thinking it is hierarchical. as.numeric returns each element's integer code, that is, its position in the levels vector, not a rank. Note that "5" comes before "4" in the output below, and note also the alphabetized output of table(f.set1) (which also happens if you simply run table(set1)).

    as.numeric(f.set1)
    # [1] 1 2 3 5 4 1 3 5 5
    table(f.set1)
    # f.set1
    # AA  B BA CA CC 
    #  2  1  2  1  3 
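
    To make the meaning of those integer codes concrete, here is a small sketch that maps them back to their labels. The names codes and labels_back are only illustrative.

    ```r
    set1   <- c("AA", "B", "BA", "CC", "CA", "AA", "BA", "CC", "CC")
    f.set1 <- factor(set1)

    # Each code is a position in levels(f.set1), not a rank or a magnitude.
    codes <- as.integer(f.set1)

    # Looking the codes up in the levels vector recovers the original labels.
    labels_back <- levels(f.set1)[codes]

    identical(labels_back, set1)           # TRUE
    identical(as.character(f.set1), set1)  # TRUE, the direct way
    ```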
    

    Let's now compare this with what happens when we use the ordered argument along with the levels argument. Using levels plus ordered = TRUE tells R that this categorical data is hierarchical, in the order specified by levels (not alphabetically or in the order in which we entered the data).

    o.set1 <- factor(set1, 
                     levels = c("CA", "BA", "AA", "CC", "B"), 
                     ordered = TRUE)
    

    Even viewing the output shows us hierarchy now.

    o.set1
    # [1] AA B  BA CC CA AA BA CC CC
    # Levels: CA < BA < AA < CC < B
    is.ordered(o.set1)
    # [1] TRUE
    

    The functions as.numeric and table now follow the specified order as well.

    as.numeric(o.set1)
    # [1] 3 5 2 4 1 3 2 4 4
    table(o.set1)
    # o.set1
    # CA BA AA CC  B 
    #  1  2  2  3  1
    

    So, to summarize: factor() by itself creates a non-hierarchical factor whose levels are simply sorted; factor() with the levels and ordered = TRUE arguments creates hierarchical categories.

    Alternatively, use ordered() if you want to create an ordered factor directly. The order of the categories still needs to be specified:

    ordered(set1, levels = c("CA", "BA", "AA", "CC", "B"))
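
    Coming back to the educ variable from the question, here is a minimal sketch of the practical difference. The education values and the names educ.f, educ.o and lev are made up for illustration; they are not from the original post.

    ```r
    # Hypothetical education values, for illustration only.
    educ <- c("college", "primary", "secondary", "college", "primary")
    lev  <- c("primary", "secondary", "college")

    educ.f <- factor(educ, levels = lev)                  # nominal
    educ.o <- factor(educ, levels = lev, ordered = TRUE)  # ordinal

    # Both work as categorical variables, e.g. for counting.
    table(educ.f)

    # Only the ordered factor supports order comparisons.
    at_least_secondary <- educ.o >= "secondary"
    at_least_secondary
    # [1]  TRUE FALSE  TRUE  TRUE FALSE

    # On the unordered factor the same comparison yields NA (with a warning).
    cmp_unordered <- suppressWarnings(educ.f >= "secondary")
    all(is.na(cmp_unordered))
    # [1] TRUE
    ```

    In other words, factor() is appropriate for any categorical variable; add ordered = TRUE (or use ordered()) only when the levels have a meaningful order that you want R to respect.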
    