Populate a new column in a dataframe with a lookup from a double matrix

后端 未结 6 1946
旧时难觅i
旧时难觅i 2021-02-15 20:27

I have a dataframe df:

colour  shape
\'red\'   circle
\'blue\'  square
\'blue\'  circle
\'green\' sphere

And a double matrix m with named rows/

相关标签:
6条回答
  • 2021-02-15 20:58

    A rather simple (and fast!) alternative is to use a matrix to index into your matrix:

    # Your data
    d <- data.frame(color=c('red','blue','blue','green'), shape=c('circle','square','circle','sphere'))
    m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
    
    # Create index matrix - each row is a row/col index
    i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
    
    # Now use it and add as the id column...
    d2 <- cbind(id=m[i], d)
    
    d2
    #  id color  shape
    #1  1   red circle
    #2  5  blue square
    #3  2  blue circle
    #4  9 green sphere
    

    The match function is used to find the corresponding numeric index for a particular string.

    Note that in newer version of R (2.13 and newer I think), you can use character strings in the index matrix. Unfortunately, the color and shape columns are typically factors, and cbind doesn't like that (it uses the integer codes), so you need to coerce them with as.character:

    i <- cbind(as.character(d$color), as.character(d$shape))
    

    ...I suspect that using match is more efficient though.

    EDIT I measured and it seems to be about 20% faster to use match:

    # Make 1 million rows
    d <- d[sample.int(nrow(d), 1e6, TRUE), ]
    
    system.time({
      i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
      d2 <- cbind(id=m[i], d)
    }) # 0.46 secs
    
    
    system.time({
      i <- cbind(as.character(d$color), as.character(d$shape))
      d2 <- cbind(id=m[i], d)
    }) # 0.55 secs
    
    0 讨论(0)
  • 2021-02-15 20:59

    Another answer Using the reshape2 and plyr (optional just for join) packages.

    require(plyr)
    require(reshape2)
    
    Df <- data.frame(colour = c("red", "blue", "blue", "green"), 
                      shape = c("circle", "square", "circle", "sphere"))
    
    Mat <- matrix(1:9, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")), 
                        nrow = 3)
    
    Df2 <- melt.array(Mat, varnames = c("colour", "shape"))
    
    join(Df, Df2)
    result <- join(Df, Df2)
    
    join(Df, Df2)
    Joining by: colour, shape
      colour  shape value
    1    red circle     1
    2   blue square     5
    3   blue circle     2
    4  green sphere     9
    

    Hope this help

    0 讨论(0)
  • 2021-02-15 21:10

    You could also convert the matrix m to a vector and then match the ID to the colour and shape values:

    df<-data.frame(colour=c("red","blue","blue","green"),
      shape=c("circle","square","circle","sphere"))
    
    
    m<-matrix(1:9,nrow=3,dimnames=list(c("red","blue","green"),
      c("circle","square","sphere")))
    
    
    mVec<-as.vector(m)
    

    The next step matches the colour in df to the appropriate dimname in the m matrix, then adds an integer corresponding to the shape. The result in the index of the m vector with the corresponding ID.

    df$ID<-mVec[match(df$colour, dimnames(m)[[1]]) + (dim(m)[1]*
      (match(df$shape, dimnames(m)[[2]]) - 1))]
    
    0 讨论(0)
  • 2021-02-15 21:19
    #recreating your data
    dat <- read.table(text="colour  shape
    'red'   circle
    'blue'  square
    'blue'  circle
    'green' sphere", header=TRUE)
    
    d2 <- matrix(c(1:9), ncol=3, nrow=3, byrow=TRUE)
    dimnames(d2) <-list(c('circle', 'square', 'sphere'),
    c("red", "blue", "green"))
    d2<-as.table(d2)
    
    #make a list of matching to the row and column names of the look up matrix
    LIST <- list(match(dat[, 2], rownames(d2)), match(dat[, 1], colnames(d2)))
    #use sapply to index the lookup matrix using the row and col values from LIST 
    id <- sapply(seq_along(LIST[[1]]), function(i) d2[LIST[[1]][i], LIST[[2]][i]])
    #put it all back together
    data.frame(id=id, dat)
    
    0 讨论(0)
  • 2021-02-15 21:23

    I think I might win the shortest answer contest here as long as those are character vectors rather than factors which might be more expected unless you made specifid effort to avoid. It really only adds cbind to convert the two df "character" vectors to a two column matrix expected by the [.matrix function that you were very close to success in using. (And it also seems reasonably expressive.)

    # Data construct
    d <- data.frame(color=c('red','blue','blue','green'), 
    shape=c('circle','square','circle','sphere'), stringsAsFactors=FALSE)
     m <- matrix(1:9, 3,3, dimnames=list(c('red','blue','green'), c('circle','square','sphere')))
    # Code:
    
     d$id <- with( d, m [ cbind(color, shape) ] )
     d
      color  shape id
    1   red circle  1
    2  blue square  5
    3  blue circle  2
    4 green sphere  9
    
    0 讨论(0)
  • 2021-02-15 21:25

    merge() is your friend here. To use it, we need an appropriate data frame to merge with containing the stacked version of your ID matrix. I create that as newdf with the code below:

    df <- data.frame(matrix(1:9, ncol = 3))
    colnames(df) <- c("circle","square","sphere")
    rownames(df) <- c("red","blue","green")
    
    newdf <- cbind.data.frame(ID = unlist(df), 
                              expand.grid(colour = rownames(df), 
                                          shape = colnames(df)))
    

    Which results in:

    > newdf
            ID colour  shape
    circle1  1    red circle
    circle2  2   blue circle
    circle3  3  green circle
    square1  4    red square
    square2  5   blue square
    square3  6  green square
    sphere1  7    red sphere
    sphere2  8   blue sphere
    sphere3  9  green sphere
    

    Then with your original data in object df2, defined using

    df2 <- data.frame(colour = c("red","blue","blue","green"),
                      shape = c("circle","square","circle","sphere"))
    

    use merge()

    > merge(newdf, df2, sort = FALSE)
      colour  shape ID
    1    red circle  1
    2   blue circle  2
    3   blue square  5
    4  green sphere  9
    

    You can store that and rearrange the columns if you need that:

    > res <- merge(newdf, df2, sort = FALSE)
    > res <- res[,c(3,1,2)]
    > res
      ID colour  shape
    1  1    red circle
    2  2   blue circle
    3  5   blue square
    4  9  green sphere
    
    0 讨论(0)
提交回复
热议问题