I have a dataframe df:
colour   shape
'red'    circle
'blue'   square
'blue'   circle
'green'  sphere
And a double matrix m with named rows/
A rather simple (and fast!) alternative is to use a matrix to index into your matrix:
# Your data
d <- data.frame(color=c('red','blue','blue','green'), shape=c('circle','square','circle','sphere'))
m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
# Create index matrix - each row is a row/col index
i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
# Now use it and add as the id column...
d2 <- cbind(id=m[i], d)
d2
#   id color  shape
# 1  1   red circle
# 2  5  blue square
# 3  2  blue circle
# 4  9 green sphere
The match function is used to find the corresponding numeric index for a particular string.
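As a small illustration of what match returns, here is a minimal sketch using the same row names as the example above (the vector name rn is only for illustration):

```r
# Row names of the lookup matrix, as in the example above
rn <- c('red', 'blue', 'green')

# For each element of the first argument, match() gives the position of
# its first occurrence in the second argument
match(c('red', 'blue', 'blue', 'green'), rn)
# [1] 1 2 2 3

# Values that are not found give NA
match('purple', rn)
# [1] NA
```

An NA in the index matrix yields an NA in the result, so unmatched names show up in the new column instead of being silently mapped to a wrong cell.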
Note that in newer versions of R (2.13 and newer, I think), you can use character strings in the index matrix. Unfortunately, the color and shape columns are typically factors (data.frame created them by default before R 4.0.0), and cbind doesn't like that (it uses the integer codes), so you need to coerce them with as.character:
i <- cbind(as.character(d$color), as.character(d$shape))
...I suspect that using match is more efficient though.
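To make the factor caveat concrete, here is a hedged sketch on the same toy data. It assumes the columns are factors, so stringsAsFactors = TRUE is set explicitly (R 4.0.0 and later keep strings as character by default); the names bad and good are only for illustration.

```r
# Same toy data as above, with factor columns forced explicitly
d <- data.frame(color = c('red', 'blue', 'blue', 'green'),
                shape = c('circle', 'square', 'circle', 'sphere'),
                stringsAsFactors = TRUE)
m <- matrix(1:9, 3, 3, dimnames = list(c('red', 'blue', 'green'),
                                       c('circle', 'square', 'sphere')))

# cbind() on factors keeps only the integer codes. Factor levels are
# sorted alphabetically, so the codes do not line up with the dimnames
# of m and the lookup silently returns the wrong cells
bad <- cbind(d$color, d$shape)
m[bad]
# [1] 3 7 1 5

# Coercing to character first gives a two-column character matrix whose
# entries are matched against the row and column names of m
good <- cbind(as.character(d$color), as.character(d$shape))
m[good]
# [1] 1 5 2 9
```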
EDIT: I measured, and it seems to be about 20% faster to use match:
# Make 1 million rows
d <- d[sample.int(nrow(d), 1e6, TRUE), ]
system.time({
  i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
  d2 <- cbind(id=m[i], d)
}) # 0.46 secs
system.time({
  i <- cbind(as.character(d$color), as.character(d$shape))
  d2 <- cbind(id=m[i], d)
}) # 0.55 secs
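Before comparing timings it is worth confirming that both index matrices select the same cells. A minimal check on the toy data, where i_match and i_char are illustrative names and the timings themselves will vary by machine and R version:

```r
# Toy data and lookup matrix from the example above
d <- data.frame(color = c('red', 'blue', 'blue', 'green'),
                shape = c('circle', 'square', 'circle', 'sphere'))
m <- matrix(1:9, 3, 3, dimnames = list(c('red', 'blue', 'green'),
                                       c('circle', 'square', 'sphere')))

# Numeric index matrix built with match()
i_match <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))

# Character index matrix (as.character is a no-op if the columns are
# already character, and converts them if they are factors)
i_char <- cbind(as.character(d$color), as.character(d$shape))

# Both should pick out exactly the same values
identical(m[i_match], m[i_char])
# [1] TRUE
```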