Calculate the difference between the largest and smallest column for each row

暖寄归人 2021-01-21 23:46

The title is pretty straightforward: how can I calculate the difference between the largest and smallest value across the columns, for each row?

Let's assume this is my data

2 Answers
  • 2021-01-22 00:15

    Here's an attempt using my old favourite max.col with a bit of matrix indexing:

    rw <- seq_len(nrow(dat))
    dat[cbind(rw, max.col(dat))] - dat[cbind(rw, max.col(-dat))]
    #[1] 3 9 3 3
    

    This should be much faster than a row-wise apply on large datasets, as per:

    # 5 million big enough?
    dat <- dat[sample(1:4,5e6,replace=TRUE),]
    
    system.time({
      rw <- seq_len(nrow(dat))
      dat[cbind(rw, max.col(dat))] - dat[cbind(rw, max.col(-dat))]
    })
    #   user  system elapsed 
    #   2.43    0.20    2.63 
    
    system.time({
      apply(X = dat, MARGIN = 1, function(x) diff(range(x)))
    })
    #   user  system elapsed 
    #  94.91    0.17   95.16 
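
    To unpack the max.col indexing used above, here is a minimal sketch on a small example. The question's data is not shown, so the data frame `small` is an assumption: it copies the four rows from the Data section of the other answer, which reproduce the `3 9 3 3` output. The names `small`, `m`, `hi`, `lo` and `row_range` are only illustrative.

    ```r
    # Assumed sample data (copied from the other answer's "Data" section)
    small <- data.frame(a = c(1L, 0L, 3L, 9L), b = c(2L, 3L, 2L, 8L),
                        c = c(3L, 6L, 1L, 7L), d = c(4L, 9L, 4L, 6L))
    m  <- as.matrix(small)     # plain integer matrix, one row per observation
    rw <- seq_len(nrow(m))     # row numbers 1..4

    hi <- max.col(m)           # column position of each row's maximum
    lo <- max.col(-m)          # negating makes each row's minimum the maximum

    # Indexing with a two-column (row, col) matrix picks one cell per row
    row_range <- m[cbind(rw, hi)] - m[cbind(rw, lo)]
    row_range
    #[1] 3 9 3 3
    ```

    By default max.col breaks ties at random (ties.method = "random"). That only changes which of the tied columns is picked, not the value found there, so the difference is unaffected.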
    
  • 2021-01-22 00:18

    1. For each row (using apply with MARGIN = 1), use range to obtain a vector of the minimum and maximum values, then diff to obtain the difference between them:

    apply(X = df, MARGIN = 1, function(x) diff(range(x)))
    #[1] 3 9 3 3
    

    2. If you want a speedier solution, you can use parallel maxima and minima (pmax and pmin):

    do.call(pmax, df) - do.call(pmin, df)
    #[1] 3 9 3 3
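
    As a rough illustration of what the two approaches above do internally, the sketch below works through them by hand on the same four-row df given in the Data section. The intermediate names `x`, `row_max` and `row_min` are only illustrative.

    ```r
    # Same data as in the Data section
    df <- data.frame(a = c(1L, 0L, 3L, 9L), b = c(2L, 3L, 2L, 8L),
                     c = c(3L, 6L, 1L, 7L), d = c(4L, 9L, 4L, 6L))

    # Approach 1, for a single row: range() gives c(min, max), diff() subtracts them
    x <- unlist(df[2, ])       # second row as a vector: 0 3 6 9
    range(x)                   # 0 9
    diff(range(x))             # 9

    # Approach 2: do.call(pmax, df) passes every column as a separate argument,
    # so it amounts to writing the columns out explicitly
    row_max <- pmax(df$a, df$b, df$c, df$d)   # 4 9 4 9
    row_min <- pmin(df$a, df$b, df$c, df$d)   # 1 0 1 6
    row_max - row_min
    #[1] 3 9 3 3
    ```

    Because pmax and pmin work on whole columns at once, they avoid calling an R function once per row, which is the likely reason for the speed difference. One caveat: do.call passes column names as argument names, so a column literally named na.rm would be taken as the na.rm argument rather than as data.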
    


    Data

    df = structure(list(a = c(1L, 0L, 3L, 9L), b = c(2L, 3L, 2L, 8L), 
        c = c(3L, 6L, 1L, 7L), d = c(4L, 9L, 4L, 6L)), .Names = c("a", 
    "b", "c", "d"), class = "data.frame", row.names = c(NA, -4L))
    

    Timings

    dat <- df[sample(1:4,5e6,replace=TRUE),]
    rw <- seq_len(nrow(dat))
    
    system.time({
        apply(X = dat, MARGIN = 1, function(x) diff(range(x)))
    })
    #STILL RUNNING...
    
    system.time({
        rw <- seq_len(nrow(dat))
        dat[cbind(rw, max.col(dat))] - dat[cbind(rw, max.col(-dat))]
    })
    #   user  system elapsed 
    #   3.48    0.11    3.59 
    
    system.time(do.call(pmax, dat) - do.call(pmin, dat))
    #   user  system elapsed 
    #   0.23    0.00    0.26 
    
    identical(do.call(pmax, dat) - do.call(pmin, dat),
          dat[cbind(rw, max.col(dat))] - dat[cbind(rw, max.col(-dat))])
    #[1] TRUE
    