Calculate the difference between the largest and smallest column for each row

Submitted by 折月煮酒 on 2020-07-16 04:50:21

Question


The title is pretty straightforward: how can I calculate, for each row, the difference between the largest and the smallest value across the columns?

Let's assume this is my data:

a b c d
1 2 3 4
0 3 6 9
3 2 1 4
9 8 7 6

For each row, I want the difference between the highest and the lowest value across the columns. The result should look like this:

3
9
3
3

Any help is greatly appreciated!


Answer 1:


1

For each row (using apply with MARGIN = 1), use range to obtain a vector holding the row's minimum and maximum, and then diff to take the difference between those two values:

apply(X = df, MARGIN = 1, function(x) diff(range(x)))
#[1] 3 9 3 3
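To see what the anonymous function does to a single row, here is a minimal sketch that applies range and diff by hand to the second row of the example data (0, 3, 6, 9). The name row2 is only illustrative.

```r
# Second row of the example data: a = 0, b = 3, c = 6, d = 9
row2 <- c(a = 0, b = 3, c = 6, d = 9)

# range() returns a length-2 vector: c(min, max)
r <- range(row2)
print(r)        # [1] 0 9

# diff() on c(min, max) gives max - min
print(diff(r))  # [1] 9
```

Note that apply() first converts the data frame to a matrix, so this assumes all columns are numeric. If rows may contain NA, range(x, na.rm = TRUE) drops missing values before taking the minimum and maximum.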

2

If you want a faster solution, you can use the parallel maxima and minima functions (pmax and pmin):

do.call(pmax, df) - do.call(pmin, df)
#[1] 3 9 3 3
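do.call(pmax, df) works because a data frame is a list of columns, so do.call passes each column to pmax as a separate argument. It is equivalent to writing out pmax(df$a, df$b, df$c, df$d). A minimal sketch, rebuilding the question's data with data.frame() so the snippet is self-contained:

```r
# Same values as the question's example data
df <- data.frame(a = c(1, 0, 3, 9), b = c(2, 3, 2, 8),
                 c = c(3, 6, 1, 7), d = c(4, 9, 4, 6))

# A data frame is a list of columns, so do.call() hands a, b, c, d
# to pmax()/pmin() as four separate vectors
via_do_call <- do.call(pmax, df) - do.call(pmin, df)

# The same computation with the columns written out explicitly
explicit <- pmax(df$a, df$b, df$c, df$d) - pmin(df$a, df$b, df$c, df$d)

print(via_do_call)                       # [1] 3 9 3 3
print(identical(via_do_call, explicit))  # [1] TRUE
```

pmax and pmin also accept na.rm = TRUE; with do.call it can be appended to the argument list, e.g. do.call(pmax, c(df, list(na.rm = TRUE))).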


Data
df = structure(list(a = c(1L, 0L, 3L, 9L), b = c(2L, 3L, 2L, 8L), 
    c = c(3L, 6L, 1L, 7L), d = c(4L, 9L, 4L, 6L)), .Names = c("a", 
"b", "c", "d"), class = "data.frame", row.names = c(NA, -4L))

Timings

dat <- df[sample(1:4,5e6,replace=TRUE),]
rw <- seq_len(nrow(dat))

system.time({
    apply(X = dat, MARGIN = 1, function(x) diff(range(x)))
})
#STILL RUNNING...

system.time({
    rw <- seq_len(nrow(dat))
    dat[cbind(rw, max.col(dat))] - dat[cbind(rw, max.col(-dat))]
})
#   user  system elapsed 
#   3.48    0.11    3.59 

system.time(do.call(pmax, dat) - do.call(pmin, dat))
#   user  system elapsed 
#   0.23    0.00    0.26 

identical(do.call(pmax, dat) - do.call(pmin, dat),
      dat[cbind(rw, max.col(dat))] - dat[cbind(rw, max.col(-dat))])
#[1] TRUE
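On the small example data it is cheap to confirm that all three approaches, including the slow apply version, agree. A minimal sketch, again rebuilding the data with data.frame() for self-containedness:

```r
df <- data.frame(a = c(1, 0, 3, 9), b = c(2, 3, 2, 8),
                 c = c(3, 6, 1, 7), d = c(4, 9, 4, 6))
rw <- seq_len(nrow(df))

# 1. Row-wise apply() with range()/diff()
res_apply <- apply(X = df, MARGIN = 1, function(x) diff(range(x)))

# 2. Parallel maxima/minima
res_pmax <- do.call(pmax, df) - do.call(pmin, df)

# 3. max.col() plus (row, column) matrix indexing
res_maxcol <- df[cbind(rw, max.col(df))] - df[cbind(rw, max.col(-df))]

# check.attributes = FALSE ignores any names apply() might attach
print(isTRUE(all.equal(res_apply, res_pmax, check.attributes = FALSE)))
print(identical(res_pmax, res_maxcol))
```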



Answer 2:


Here's an attempt using my old favourite max.col with a bit of matrix indexing:

# dat holds the example data from the question
rw <- seq_len(nrow(dat))
dat[cbind(rw, max.col(dat))] - dat[cbind(rw, max.col(-dat))]
#[1] 3 9 3 3
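max.col(m) returns, for each row of m, the column index of that row's maximum; negating the data turns it into the index of the minimum. cbind(rw, idx) then builds a two-column (row, column) matrix, and indexing with such a matrix picks exactly one element per row. A minimal sketch on a plain matrix, assuming ties.method = "first" so the chosen column is deterministic (the default, "random", picks among tied columns at random, which does not change the extracted value):

```r
m <- matrix(c(1, 2, 3, 4,
              0, 3, 6, 9,
              3, 2, 1, 4,
              9, 8, 7, 6), nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c", "d")))

rw <- seq_len(nrow(m))

# Column index of the largest / smallest value in each row
max_idx <- max.col(m, ties.method = "first")
min_idx <- max.col(-m, ties.method = "first")
print(max_idx)  # [1] 4 4 4 1
print(min_idx)  # [1] 1 1 3 4

# Each (row, column) pair selects one element, giving max - min per row
result <- m[cbind(rw, max_idx)] - m[cbind(rw, min_idx)]
print(result)   # [1] 3 9 3 3
```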

This should be much faster than apply on large datasets, as these timings show:

# 5 million big enough?
dat <- dat[sample(1:4,5e6,replace=TRUE),]

system.time({
  rw <- seq_len(nrow(dat))
  dat[cbind(rw, max.col(dat))] - dat[cbind(rw, max.col(-dat))]
})
#   user  system elapsed 
#   2.43    0.20    2.63 

system.time({
  apply(X = dat, MARGIN = 1, function(x) diff(range(x)))
})
#   user  system elapsed 
#  94.91    0.17   95.16 


Source: https://stackoverflow.com/questions/43728522/calculate-the-difference-between-the-largest-and-smallest-column-for-each-row
