How to convert from category to numeric in R

一个人的身影 2021-01-13 00:45

Here is my problem:

I have a table with categories and I want to rank them:

category
dog
cat
fish
dog
dog

What I want is to add

3 Answers
  • 2021-01-13 01:22

    Hopefully category is already a factor variable. If not, convert it to a factor:

    category <- as.factor(category)
    

    You can then use the relevel() function to make "dog" level 1, as follows:

    category <- relevel(category, ref = "dog")
    

    and then create a data frame with the following code:

    df <- data.frame(category,as.numeric(category))
    colnames(df) <- c("category","rank")
    

    as.numeric() returns the integer code of each value (the position of its level in levels(category)), which is the rank in your case.
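A minimal, self-contained sketch of the steps above, using the five sample values from the question. It shows that as.factor() orders levels alphabetically (so "dog" would otherwise get code 2), that relevel() returns a new factor which must be assigned back to the variable, and that as.numeric() yields level codes rather than labels. The number-like labels "10" and "5" are made up purely to illustrate that last point.

```r
# Sample values from the question
category <- c("dog", "cat", "fish", "dog", "dog")

# as.factor() sorts the levels alphabetically: "cat", "dog", "fish"
category <- as.factor(category)
levels(category)       # "cat" "dog" "fish"
as.numeric(category)   # 2 1 3 2 2

# relevel() returns a new factor with "dog" as the first level,
# so the result is assigned back to the variable itself
category <- relevel(category, ref = "dog")
levels(category)       # "dog" "cat" "fish"

df <- data.frame(category = category, rank = as.numeric(category))
df$rank                # 1 2 3 1 1

# as.numeric() returns level codes, not labels. With number-like
# labels, go through as.character() to recover the values themselves
f <- factor(c("10", "5", "10"))
as.numeric(f)                 # 1 2 1 (levels sorted as text: "10", "5")
as.numeric(as.character(f))   # 10 5 10
```

The same gotcha applies to any factor whose labels happen to be numbers, so it is worth checking levels() before relying on the codes as ranks.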

  • 2021-01-13 01:24

    I assume that when you write "rank" you really mean ranks. I further assume you want to rank the categories by their number of occurrences.

    cats <- factor(c("dog", "cat", "fish", "dog", "dog"))
    
    #see help("rank") for other possibilities to break ties
    ranks <- rank(-table(cats), ties.method="first")
    
    DF <- data.frame(category=cats, rank=ranks[as.character(cats)])
    
    print(DF)
    #   category rank
    # 1      dog    1
    # 2      cat    2
    # 3     fish    3
    # 4      dog    1
    # 5      dog    1
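The tie-breaking comment in the code above matters once two categories occur equally often. A small sketch with made-up data in which "cat" and "dog" both appear twice: ties.method = "first" breaks the tie by position in the table (alphabetical level order here), while "min" gives tied categories the same rank. The variable names are illustrative only.

```r
# Hypothetical data: "cat" and "dog" each occur twice, "fish" once
cats2 <- factor(c("dog", "cat", "fish", "dog", "cat"))
counts <- table(cats2)   # cat 2, dog 2, fish 1

# Negate the counts so that the most frequent category gets rank 1
first_ranks <- rank(-counts, ties.method = "first")  # cat 1, dog 2, fish 3
min_ranks   <- rank(-counts, ties.method = "min")    # cat 1, dog 1, fish 3

# Map the per-category ranks back onto the rows, as in the answer above
data.frame(category   = cats2,
           rank_first = first_ranks[as.character(cats2)],
           rank_min   = min_ranks[as.character(cats2)])
```

Which method is appropriate depends on whether tied categories should share a rank or be forced into distinct ones.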
    
  • 2021-01-13 01:29

    For the sake of completeness, and because the solution I posted in a comment is an inefficient (and pretty ugly) fix, I'll post an answer too.

    It turned out that the OP's starting point was something like the following:

    x = c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish")
    x = factor(x)
    

    In the end, the OP wanted a manually specified numeric coding of x. As an example, suppose the following mapping is wanted:

    cat -> 1, dog -> 2, fish -> 3, catfish -> 4
    

    So, some alternatives:

    sapply(as.character(x), switch, "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
                                                                    USE.NAMES = FALSE)
    #[1] 1 2 3 2 2 1 3 4
    
    #note: match()'s internal 'do_match' calls 'match_transform', which coerces
    #a factor to character, so as.character(x) is not needed
    #(http://svn.r-project.org/R/trunk/src/main/unique.c)
    match(x, c("cat", "dog", "fish", "catfish"))
    #[1] 1 2 3 2 2 1 3 4
    
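    local({    #just to not change 'x'
    #with a list value, the names give the new levels in order and the values
    #are old levels to merge into them; 1:4 match no old level, so only the
    #order of the names matters here
    levels(x) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
    as.numeric(x)
    })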
    #[1] 1 2 3 2 2 1 3 4
    
    library(fastmatch)
    fmatch(x, c("cat", "dog", "fish", "catfish"))  #a faster alternative to 'match'
    #[1] 1 2 3 2 2 1 3 4
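Two further base-R options for the same manual mapping, sketched here as alternatives (they are not part of the benchmark that follows). A named vector works as a lookup table indexed by the labels, and factor() with an explicit levels argument fixes the level order directly, which is the documented way to obtain the same codes as the levels(x) = list(...) trick. The names codes, by_lookup and by_levels are illustrative.

```r
x <- factor(c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish"))

# 1) Named lookup vector: index it by the character labels
codes <- c(cat = 1, dog = 2, fish = 3, catfish = 4)
by_lookup <- unname(codes[as.character(x)])
by_lookup   # 1 2 3 2 2 1 3 4

# 2) Re-create the factor with the levels in the wanted order;
#    as.integer() then gives each label's position in 'levels'
by_levels <- as.integer(factor(x, levels = c("cat", "dog", "fish", "catfish")))
by_levels   # 1 2 3 2 2 1 3 4
```

Unlike match() or the levels-based approaches, the lookup vector is not limited to the codes 1, 2, 3, ...: any values, e.g. c(cat = 10, dog = 20, ...), can be used.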
    

    And a benchmark on a larger vector:

    X = rep(as.character(x), 1e5)
    X = factor(X)
    f1 = function() sapply(as.character(X), switch,
                "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4, USE.NAMES = FALSE)
    f2 = function() match(X, c("cat", "dog", "fish", "catfish"))
    f3 = function() {
      levels(X) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
      as.numeric(X)
    }
    library(fastmatch)
    f4 = function() fmatch(X, c("cat", "dog", "fish", "catfish"))
    
    library(microbenchmark)
    microbenchmark(f1(), f2(), f3(), f4(), times = 10)
    #Unit: milliseconds
    # expr         min          lq      median         uq       max neval
    # f1() 1745.111666 1816.675337 1961.809102 2107.98236 2896.0291    10
    # f2()   22.043657   22.786647   23.987263   31.45057  111.9600    10
    # f3()   32.704779   32.919150   38.865853   47.67281  134.2988    10
    # f4()    8.814958    8.823309    9.856188   19.66435  104.2827    10
    sum(f1() != f2())
    #[1] 0
    sum(f2() != f3())
    #[1] 0
    sum(f3() != f4())
    #[1] 0
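The sum(... != ...) checks above compare values only. A short sketch, on the small x from earlier, of why identical() would not be a suitable check here: match() returns an integer vector while the sapply()/switch() version returns doubles, so all.equal(), which compares numeric values, is the more appropriate test. The variable names are illustrative.

```r
x <- factor(c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish"))

r_switch <- sapply(as.character(x), switch,
                   "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
                   USE.NAMES = FALSE)
r_match <- match(x, c("cat", "dog", "fish", "catfish"))

typeof(r_switch)                      # "double"
typeof(r_match)                       # "integer"
identical(r_switch, r_match)          # FALSE: same values, different types
isTRUE(all.equal(r_switch, r_match))  # TRUE
```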
    