Populate a new column in a dataframe with a lookup from a double matrix

旧时难觅i 2021-02-15 20:27

I have a dataframe df:

colour  shape
'red'   circle
'blue'  square
'blue'  circle
'green' sphere

And a double matrix m with named rows/

6 Answers
  •  执笔经年
    2021-02-15 20:58

    A rather simple (and fast!) alternative is to use a matrix to index into your matrix:

    # Your data
    d <- data.frame(color=c('red','blue','blue','green'), shape=c('circle','square','circle','sphere'))
    m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
    
    # Create index matrix - each row is a row/col index
    i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
    
    # Now use it and add as the id column...
    d2 <- cbind(id=m[i], d)
    
    d2
    #  id color  shape
    #1  1   red circle
    #2  5  blue square
    #3  2  blue circle
    #4  9 green sphere
    

    The match function returns the position of each string within rownames(m) or colnames(m), which is exactly the numeric row or column index needed.
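    A minimal sketch of the same lookup wrapped in a function. The name lookup_matrix is hypothetical and not part of the original answer. It also shows what happens with a label that is missing from the matrix: match() returns NA for it, and a row of the index matrix that contains NA yields NA in the result.

```r
# Hypothetical helper (not from the original answer): look up m[row, col]
# for paired vectors of row and column labels.
lookup_matrix <- function(m, rows, cols) {
  # Each row of idx is a (row index, column index) pair; unknown labels become NA.
  idx <- cbind(match(rows, rownames(m)), match(cols, colnames(m)))
  m[idx]
}

m <- matrix(1:9, 3, 3,
            dimnames = list(c('red', 'blue', 'green'),
                            c('circle', 'square', 'sphere')))

match(c('blue', 'purple'), rownames(m))
# [1]  2 NA

lookup_matrix(m, c('red', 'blue', 'purple'), c('circle', 'square', 'circle'))
# [1]  1  5 NA
```

    Returning NA for unknown labels (rather than raising an error) means a typo in the data frame shows up as a missing id instead of silently picking a wrong cell.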

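    Note that in newer versions of R (2.13 and newer, I think), you can also use character strings in the index matrix; they are matched against the row and column names. Unfortunately, the color and shape columns are often factors (before R 4.0.0, data.frame() converted strings to factors by default), and cbind doesn't handle that: it uses the integer codes instead of the labels. So you need to coerce them with as.character: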

    i <- cbind(as.character(d$color), as.character(d$shape))
    

    ...I suspect that using match is more efficient, though.
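    A small sketch of the factor pitfall described above, assuming the columns are factors (created explicitly here, since data.frame() no longer does so by default in R 4.0.0 and later). The variable names bad and good are illustrative only.

```r
m <- matrix(1:9, 3, 3,
            dimnames = list(c('red', 'blue', 'green'),
                            c('circle', 'square', 'sphere')))
d <- data.frame(color = factor(c('red', 'blue', 'blue', 'green')),
                shape = factor(c('circle', 'square', 'circle', 'sphere')))

# cbind() on factors keeps only the integer codes, which follow the
# alphabetical level order (blue, green, red), not the order of rownames(m).
bad <- cbind(d$color, d$shape)
m[bad]
# [1] 3 7 1 5   (wrong cells)

# Converting to character first makes R match the labels against dimnames(m).
good <- cbind(as.character(d$color), as.character(d$shape))
m[good]
# [1] 1 5 2 9
```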

    EDIT: I measured it, and using match seems to be about 20% faster:

    # Make 1 million rows
    d <- d[sample.int(nrow(d), 1e6, TRUE), ]
    
    system.time({
      i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
      d2 <- cbind(id=m[i], d)
    }) # 0.46 secs
    
    
    system.time({
      i <- cbind(as.character(d$color), as.character(d$shape))
      d2 <- cbind(id=m[i], d)
    }) # 0.55 secs
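    Before comparing timings it may be worth confirming that both index matrices select the same cells. This check is an addition, not part of the original benchmark; it uses 1e5 rows and an arbitrary seed (both assumptions) to keep it quick and reproducible.

```r
set.seed(1)  # arbitrary seed so the resample is reproducible
m <- matrix(1:9, 3, 3,
            dimnames = list(c('red', 'blue', 'green'),
                            c('circle', 'square', 'sphere')))
d <- data.frame(color = c('red', 'blue', 'blue', 'green'),
                shape = c('circle', 'square', 'circle', 'sphere'))
d <- d[sample.int(nrow(d), 1e5, TRUE), ]

# Numeric index matrix built with match() vs. character index matrix.
i_num <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
i_chr <- cbind(as.character(d$color), as.character(d$shape))

identical(m[i_num], m[i_chr])
# [1] TRUE
```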
    
