Multiple pairwise differences based on column name patterns

后端 未结 3 1179
悲哀的现实
悲哀的现实 2021-01-15 02:31

I have a data.table, dt:

dt

Id  v1 v2 v3 x1 x2 x3
1   7  1  3  5  6  8
2   1  3  5  6  8  5
3   3  5  6  8  5  1

v1, v2, v3 an

相关标签:
3条回答
  • 2021-01-15 02:41

    Your data looks like it belongs in a long format, for which the calculation you're after would become trivial:

    # reshape
    DT_long = melt(DT, id.vars='Id', measure.vars = patterns(v = '^v', x = '^x'))
    DT_long
    #       Id variable     v     x
    # 1:     1        1     7     5
    # 2:     2        1     1     6
    # 3:     3        1     3     8
    # 4:     1        2     1     6
    # 5:     2        2     3     8
    # 6:     3        2     5     5
    # 7:     1        3     3     8
    # 8:     2        3     5     5
    # 9:     3        3     6     1
    

    Now it's easy:

    DT_long[ , diff := v - x][]
    #       Id variable     v     x  diff
    # 1:     1        1     7     5     2
    # 2:     2        1     1     6    -5
    # 3:     3        1     3     8    -5
    # 4:     1        2     1     6    -5
    # 5:     2        2     3     8    -5
    # 6:     3        2     5     5     0
    # 7:     1        3     3     8    -5
    # 8:     2        3     5     5     0
    # 9:     3        3     6     1     5
    

    You can then use dcast to reshape back to wide, but it's usually worth considering whether keeping the dataset in this long form is better for the whole analysis.

    0 讨论(0)
  • 2021-01-15 02:44

    You can use set in the loop.

    library(data.table)
    
    DT <- fread('Id  v1 v2 v3 x1 x2 x3
    1   7  1  3  5  6  8
    2   1  3  5  6  8  5
    3   3  5  6  8  5  1')
    
    for (i in 1:3) {
      set(DT,j=paste0("Diff_",i),value = DT[[paste0("v",i)]]-DT[[paste0("x",i)]])
    }
    
    DT
    #>    Id v1 v2 v3 x1 x2 x3 Diff_1 Diff_2 Diff_3
    #> 1:  1  7  1  3  5  6  8      2     -5     -5
    #> 2:  2  1  3  5  6  8  5     -5     -5      0
    #> 3:  3  3  5  6  8  5  1     -5      0      5
    

    Created on 2020-05-27 by the reprex package (v0.3.0)

    0 讨论(0)
  • 2021-01-15 03:03

    You could split by whether the column contains x and then take the difference of the resulting data tables.

    new_cols <- 
      do.call('-', split.default(dt[,-1], grepl('x', names(dt)[-1])))
    
    dt[, paste0('diff', seq_along(new_cols)) := new_cols]
    
    dt
    #    Id v1 v2 v3 x1 x2 x3 diff1 diff2 diff3
    # 1:  1  7  1  3  5  6  8     2    -5    -5
    # 2:  2  1  3  5  6  8  5    -5    -5     0
    # 3:  3  3  5  6  8  5  1    -5     0     5
    

    Or using similar logic to the code snippet in the question you could do

    newnames <- paste0("diff",1:3)
    v <- paste0("v",1:3)
    x <- paste0("x",1:3)
    
    dt[, (newnames) := Map('-', mget(v), mget(x))]
    
    dt
    #    Id v1 v2 v3 x1 x2 x3 diff1 diff2 diff3
    # 1:  1  7  1  3  5  6  8     2    -5    -5
    # 2:  2  1  3  5  6  8  5    -5    -5     0
    # 3:  3  3  5  6  8  5  1    -5     0     5
    
    0 讨论(0)
提交回复
热议问题