Populate a new column in a dataframe with a lookup from a double matrix

感动是毒 2021-02-15 20:40

I have a dataframe df:

colour  shape
'red'   circle
'blue'  square
'blue'  circle
'green' sphere

And a double matrix m with named rows/columns.

  •  野的像风
    2021-02-15 21:19

    A rather simple (and fast!) alternative is to use a matrix to index into your matrix:

    # Your data
    d <- data.frame(color=c('red','blue','blue','green'), shape=c('circle','square','circle','sphere'))
    m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
    
    # Create index matrix - each row is a row/col index
    i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
    
    # Now use it and add as the id column...
    d2 <- cbind(id=m[i], d)
    
    d2
    #  id color  shape
    #1  1   red circle
    #2  5  blue square
    #3  2  blue circle
    #4  9 green sphere
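
    To see what m[i] does in isolation: when a matrix is indexed by a two-column numeric matrix, each row of the index matrix is read as one (row, column) pair and exactly one element comes back per row. That is different from m[rows, cols], which returns the whole sub-block. The small sketch below reuses the m from the answer; the particular index values are arbitrary and chosen only for illustration.

    ```r
    # Same 3x3 lookup matrix as in the answer (values 1:9, filled column by column)
    m <- matrix(1:9, 3, 3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))

    # Each row of this two-column matrix is one (row, column) pair: (1,1), (2,2), (3,3)
    idx <- cbind(1:3, 1:3)

    pairs <- m[idx]       # one element per pair: m[1,1], m[2,2], m[3,3]
    block <- m[1:3, 1:3]  # by contrast, [rows, cols] indexing returns the full 3x3 block

    print(pairs)       # 1 5 9
    print(dim(block))  # 3 3

    # A row containing NA (what match() returns for an unknown key) yields NA
    with_na <- m[cbind(c(1, NA), c(1, 2))]
    print(with_na)     # 1 NA
    ```

    So an unmatched colour or shape shows up as NA in the id column rather than raising an error.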
    

    The match function finds the position of each string in rownames(m) (or colnames(m)), i.e. the corresponding numeric row or column index, and returns NA for a string that isn't found.
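
    A quick illustration of match() on its own, using the row names of m from the answer ('purple' is a made-up key that shows the no-match case). match() compares factor labels rather than their integer codes, which is why the match-based index works even when d$color is a factor.

    ```r
    colours <- c('red', 'blue', 'green')   # rownames(m) in the answer

    # Position of each key in colours; NA when the key is absent
    pos <- match(c('blue', 'red', 'purple'), colours)
    print(pos)    # 2 1 NA

    # match() converts factors to character, so it uses the labels...
    f <- factor(c('green', 'blue'))   # levels are sorted: "blue", "green"
    codes <- as.integer(f)            # ...not these internal codes
    hits <- match(f, colours)

    print(codes)  # 2 1
    print(hits)   # 3 2
    ```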

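    Note that in newer versions of R (2.13 and newer, I think), you can also put character strings in the index matrix; they are then matched against the row and column names of m. Unfortunately, the color and shape columns are often factors (data.frame() converted strings to factors by default before R 4.0.0), and cbind doesn't handle those the way you want: it uses their integer codes rather than their labels. So you need to coerce them with as.character: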

    i <- cbind(as.character(d$color), as.character(d$shape))
    

    ...I suspect that using match is more efficient, though.
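
    The factor pitfall can be reproduced directly. The sketch below builds d with stringsAsFactors = TRUE so that it behaves the same on any R version (that argument is the only assumption added here), then compares cbind on the raw factors with cbind on as.character() values.

    ```r
    m <- matrix(1:9, 3, 3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
    d <- data.frame(color=c('red','blue','blue','green'),
                    shape=c('circle','square','circle','sphere'),
                    stringsAsFactors=TRUE)  # assumption: force factors, as R < 4.0.0 did by default

    # Wrong: cbind() replaces factors by their integer codes. Levels are sorted
    # alphabetically, so 'red' becomes 3 and is used as row 3 ('green') of m
    bad <- m[cbind(d$color, d$shape)]

    # Right: character strings are looked up in the dimnames of m
    good <- m[cbind(as.character(d$color), as.character(d$shape))]

    print(bad)   # 3 7 1 5  (wrong ids)
    print(good)  # 1 5 2 9  (same as the id column above)
    ```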

    EDIT: I measured it, and using match seems to be about 20% faster:

    # Make 1 million rows
    d <- d[sample.int(nrow(d), 1e6, TRUE), ]
    
    system.time({
      i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
      d2 <- cbind(id=m[i], d)
    }) # 0.46 secs
    
    
    system.time({
      i <- cbind(as.character(d$color), as.character(d$shape))
      d2 <- cbind(id=m[i], d)
    }) # 0.55 secs
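
    Before reading too much into the timings, it is worth confirming that both index matrices return the same ids. This is a sketch under assumptions: the seed and the 1e5 row count are arbitrary choices (smaller than the 1e6 above to keep it quick), and absolute timings will vary by machine and R version.

    ```r
    set.seed(42)  # assumption: arbitrary seed, only for reproducibility
    m <- matrix(1:9, 3, 3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
    d <- data.frame(color=c('red','blue','blue','green'),
                    shape=c('circle','square','circle','sphere'),
                    stringsAsFactors=TRUE)
    d <- d[sample.int(nrow(d), 1e5, TRUE), ]  # assumption: 1e5 rows instead of 1e6

    # system.time() evaluates the assignment in this environment, so the ids are kept
    t_match <- system.time(
      id_match <- m[cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))]
    )
    t_char <- system.time(
      id_char <- m[cbind(as.character(d$color), as.character(d$shape))]
    )

    # Both approaches must agree before the timings mean anything
    same <- identical(unname(id_match), unname(id_char))
    print(same)
    print(c(match = t_match[["elapsed"]], character = t_char[["elapsed"]]))
    ```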
    
