Lookup value from another column that matches with variable

-上瘾入骨i 2021-01-21 01:12

I have a dataframe that looks like:

animal_id   trait_id    sire_id dam_id
    1         25.05        0       0
    2         -46.3        1       2
    3                


        
3 Answers
  • 2021-01-21 01:30

    You can use match: match(col, df$animal_id) returns, for each element of col, the position of its match in animal_id (NA when there is no match, as for a parent id of 0). Those positions can then be used to index trait_id:

    df[c("trait_sire", "trait_dam")] <- 
        lapply(df[c("sire_id", "dam_id")], function(col) df$trait_id[match(col, df$animal_id)])
    
    df
    #  animal_id trait_id sire_id dam_id trait_sire trait_dam
    #1         1    25.05       0      0         NA        NA
    #2         2   -46.30       1      2      25.05    -46.30
    #3         3    41.60       1      2      25.05    -46.30
    #4         4   -42.76       3      4      41.60    -42.76
    #5         5   -10.99       3      4      41.60    -42.76
    #6         6   -49.81       5      4     -10.99    -42.76
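
    To see the intermediate step, here is a minimal sketch that looks at what match returns before it is used as an index. The data frame is rebuilt from the output printed above, and sire_pos is just an illustrative name:

    ```r
    # Data reconstructed from the printed output in this answer.
    df <- data.frame(
      animal_id = 1:6,
      trait_id  = c(25.05, -46.30, 41.60, -42.76, -10.99, -49.81),
      sire_id   = c(0, 1, 1, 3, 3, 5),
      dam_id    = c(0, 2, 2, 4, 4, 4)
    )

    # Position of each sire_id within animal_id; 0 is not an animal_id, so it gives NA.
    sire_pos <- match(df$sire_id, df$animal_id)
    sire_pos
    # [1] NA  1  1  3  3  5

    # Indexing trait_id by those positions pulls each sire's trait.
    # An NA position yields an NA value, which is why row 1 has trait_sire = NA.
    df$trait_id[sire_pos]
    ```

    The last line gives the same values as the trait_sire column above.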
    
  • 2021-01-21 01:39

    You can do this with match (in base R) in a single assignment, without looping over the columns:

    df[c("trait_sire", "trait_dam")] <- 
    cbind(with(df, trait_id[match(sire_id, animal_id)]), 
          with(df, trait_id[match(dam_id, animal_id)]))
    
    #   animal_id trait_id sire_id dam_id trait_sire trait_dam
    # 1         1    25.05       0      0         NA        NA
    # 2         2   -46.30       1      2      25.05    -46.30
    # 3         3    41.60       1      2      25.05    -46.30
    # 4         4   -42.76       3      4      41.60    -42.76
    # 5         5   -10.99       3      4      41.60    -42.76
    # 6         6   -49.81       5      4     -10.99    -42.76
    
  • 2021-01-21 01:50

    With data.table joins...

    library(data.table)
    setDT(DT)  # DT holds the question's data; setDT() converts it to a data.table in place
    
    DT[, trait_sire := 
      .SD[.SD, on=.(animal_id = sire_id), x.trait_id ]
    ]
    
    DT[, trait_dam := 
      .SD[.SD, on=.(animal_id = dam_id), x.trait_id ]
    ]
    
       animal_id trait_id sire_id dam_id trait_sire trait_dam
    1:         1    25.05       0      0         NA        NA
    2:         2   -46.30       1      2      25.05    -46.30
    3:         3    41.60       1      2      25.05    -46.30
    4:         4   -42.76       3      4      41.60    -42.76
    5:         5   -10.99       3      4      41.60    -42.76
    6:         6   -49.81       5      4     -10.99    -42.76
    

    The syntax is x[i, on=, j] where j is some function of the columns. To see how it works, try out DT[DT, on=.(animal_id = dam_id)] and variations. Some notes:

    1. The i.* / x.* syntax helps to distinguish where a column is taken from.
    2. When j is v := expression, the result of the expression is assigned to column v.
    3. The join x[i, ...] uses rows of i to look up rows of x.
    4. The on= syntax is like .(xcol = icol).
    5. Inside j, the table itself can be written as .SD.

    One advantage of this approach over match is that it extends to joins on more than one column, like on = .(xcol = icol, xcol2 = icol2), or even to "non-equi joins" like on = .(xcol < icol). It is also part of a consistent syntax for operating on the table (explained in the package's introductory material), rather than specialized code for one task.
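
    As a variation on the self-join above, the same lookup can be sketched as an "update join" against a small lookup table. This is not the code from this answer, only an alternative under the same x[i, on=, j] rules. The data is rebuilt from the printed output, and traits is a hypothetical name for the lookup table:

    ```r
    library(data.table)

    # Data reconstructed from the printed output; ids are kept as integers
    # so both sides of the join have the same type.
    DT <- data.table(
      animal_id = 1:6,
      trait_id  = c(25.05, -46.30, 41.60, -42.76, -10.99, -49.81),
      sire_id   = c(0L, 1L, 1L, 3L, 3L, 5L),
      dam_id    = c(0L, 2L, 2L, 4L, 4L, 4L)
    )

    # Lookup table: one row per animal with its trait.
    traits <- DT[, .(animal_id, trait_id)]

    # x = DT, i = traits. on = .(xcol = icol) matches DT's parent id to
    # traits' animal_id; i.trait_id is the trait taken from the lookup table.
    # Rows of DT with no match (parent id 0) are left as NA.
    DT[traits, on = .(sire_id = animal_id), trait_sire := i.trait_id]
    DT[traits, on = .(dam_id  = animal_id), trait_dam  := i.trait_id]

    DT
    ```

    Here := modifies DT by reference, so no reassignment is needed.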
