Here is my problem:
I have a table with categories and I want to rank them:
category
dog
cat
fish
dog
dog
What I want is to add
Hopefully category is a factor variable. If not, convert it to a factor:
category <- as.factor(category)
You could use the relevel function to make "dog" the first level (level 1) as follows:
category <- relevel(category, ref = "dog")
and then create a data frame using the following code:
df <- data.frame(category,as.numeric(category))
colnames(df) <- c("category","rank")
The as.numeric function returns the integer codes of the factor levels, which are the ranks in your case.
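Put together on the sample data from the question, the relevel approach might look like the sketch below. The sample vector is typed in by hand here as an assumption about the data. Note that relevel returns a new factor, so its result is assigned back to category rather than to levels(category).

```r
# Sample data from the question, entered by hand (assumed to be a plain vector).
category <- factor(c("dog", "cat", "fish", "dog", "dog"))

# Move "dog" to the front; the other levels keep their existing order.
category <- relevel(category, ref = "dog")
levels(category)
# [1] "dog"  "cat"  "fish"

# The integer code of each level now acts as the rank.
df <- data.frame(category = category, rank = as.integer(category))
df$rank
# [1] 1 2 3 1 1
```

This only controls which level comes first; the remaining levels stay in their default (alphabetical) order.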
I assume that if you write "ranks" you mean ranks. I further assume you want to rank according to the number of occurrences.
cats <- factor(c("dog", "cat", "fish", "dog", "dog"))
#see help("rank") for other possibilities to break ties
ranks <- rank(-table(cats), ties.method="first")
DF <- data.frame(category=cats, rank=ranks[as.character(cats)])
print(DF)
#   category rank
# 1      dog    1
# 2      cat    2
# 3     fish    3
# 4      dog    1
# 5      dog    1
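If tied categories should share a rank instead of being ordered by first appearance, a different ties.method can be used. The following is a minimal sketch assuming ties.method = "min" is the behaviour wanted; the names counts, ranks_min and DF_min are made up for this illustration.

```r
cats <- factor(c("dog", "cat", "fish", "dog", "dog"))

# Count occurrences and keep them as a plain named vector.
tab <- table(cats)
counts <- setNames(as.vector(tab), names(tab))

# Negate so the most frequent category gets rank 1;
# "min" gives tied categories the same (lowest) rank.
ranks_min <- rank(-counts, ties.method = "min")

# Look up each observation's rank by its category label.
DF_min <- data.frame(category = cats,
                     rank = unname(ranks_min[as.character(cats)]))
DF_min$rank
# [1] 1 2 2 1 1
```

Here "cat" and "fish" each occur once, so both get rank 2, whereas ties.method = "first" separates them.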
Just for the sake of completeness and because the solution I posted in a comment is an inefficient (and pretty ugly) fix, I'll post an answer too.
It turned out that OP's starting setting was something like the following:
x = c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish")
x = factor(x)
In the end, a manually specified numeric coding of x was wanted. As an example, let's suppose that the following mapping is wanted:
cat -> 1, dog -> 2, fish -> 3, catfish -> 4
So, some alternatives:
sapply(as.character(x), switch, "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
       USE.NAMES = FALSE)
#[1] 1 2 3 2 2 1 3 4
match(x, c("cat", "dog", "fish", "catfish")) #note that match's internal 'do_match'
#calls 'match_transform' that coerces
#`factor` to `character`, so no need
#for 'as.character(x)'
#(http://svn.r-project.org/R/trunk/src/main/unique.c)
#[1] 1 2 3 2 2 1 3 4
local({ #just to not change 'x'
levels(x) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
as.numeric(x)
})
#[1] 1 2 3 2 2 1 3 4
library(fastmatch)
fmatch(x, c("cat", "dog", "fish", "catfish")) #a faster alternative to 'match'
#[1] 1 2 3 2 2 1 3 4
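Two further base R variants of the same idea, sketched here under the assumption that the mapping is kept in a named vector (codes is a name made up for this example): indexing the named vector by label, and rebuilding the factor with its levels in the wanted order.

```r
x <- factor(c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish"))

# Hypothetical lookup table: names are the categories, values the wanted codes.
codes <- c(cat = 1L, dog = 2L, fish = 3L, catfish = 4L)

# Index by label. as.character() matters here: indexing with the factor itself
# would use its internal integer codes instead of its labels.
by_lookup <- unname(codes[as.character(x)])

# Rebuild the factor with levels in the wanted order, then take the codes.
by_levels <- as.integer(factor(x, levels = names(codes)))

by_lookup
# [1] 1 2 3 2 2 1 3 4
```

The lookup version also works when the wanted codes are not simply 1, 2, 3, ..., since the values in codes can be anything.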
And a benchmark on a larger vector:
X = rep(as.character(x), 1e5)
X = factor(X)
f1 = function() sapply(as.character(X), switch,
                       "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4, USE.NAMES = FALSE)
f2 = function() match(X, c("cat", "dog", "fish", "catfish"))
f3 = function() {levels(X) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4) ;
                 as.numeric(X)}
library(fastmatch)
f4 = function() fmatch(X, c("cat", "dog", "fish", "catfish"))
library(microbenchmark)
microbenchmark(f1(), f2(), f3(), f4(), times = 10)
#Unit: milliseconds
# expr         min          lq      median         uq       max neval
# f1() 1745.111666 1816.675337 1961.809102 2107.98236 2896.0291    10
# f2()   22.043657   22.786647   23.987263   31.45057  111.9600    10
# f3()   32.704779   32.919150   38.865853   47.67281  134.2988    10
# f4()    8.814958    8.823309    9.856188   19.66435  104.2827    10
sum(f1() != f2())
#[1] 0
sum(f2() != f3())
#[1] 0
sum(f3() != f4())
#[1] 0