Mutating multiple columns dynamically while conditioning on specific rows

清歌不尽 2021-02-14 01:47

I know there are several similar questions around here, but none of them seems to address the precise issue I'm having.

set.seed(4)
df = data.frame(
  Key = c(\         


        
1 Answer
  • 2021-02-14 02:27

    With this dplyr command,

    df %>% mutate_at(.vars = vars(cols), .funs = function(x) ifelse(df$Key == "A", 0, x))
    

    you are actually evaluating the condition df$Key == "A" n times, where n is the number of columns selected by vars(cols): the function is called once per column, and each call recomputes the comparison.
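
    The repeated evaluation can be made visible with a small counter. This is only a sketch on made-up toy data: the helper is_a() and the counter n_evals are hypothetical names introduced for illustration, and all_of() is used instead of the bare vars(cols) simply to make the column selection explicit.

```r
library(dplyr)

# Toy data; the column names and values are invented for this illustration.
df <- data.frame(Key = c("A", "B", "A"), X1 = 1:3, X2 = 4:6, X3 = 7:9)
cols <- c("X1", "X3")

# Hypothetical helper: counts how often the condition is computed.
n_evals <- 0
is_a <- function() {
  n_evals <<- n_evals + 1
  df$Key == "A"
}

# Same shape as the command above, with the condition wrapped in is_a().
out <- df %>% mutate_at(vars(all_of(cols)), function(x) ifelse(is_a(), 0, x))

n_evals  # the condition is computed once per selected column
```

    On a handful of columns the cost is small; it grows with the number of selected columns and the number of rows.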

    One workaround is to precompute the rows you want to change:

    idx = which(DF$Key=="A")
    DF %>% mutate_at(.vars = vars(cols), .funs = function(x){x[idx]=0;x})
    

    A cleaner and better way, correctly pointed out by @IceCreamToucan (see comments below), is to use the function replace, while passing it the extra parameters:

    DF %>% mutate_at(.vars = vars(cols), replace, DF$Key == 'A', 0)
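
    As a sanity check, the three variants can be compared on a small data frame. This is a minimal sketch on invented toy data, not the data from the question. The last variant uses across(), which supersedes mutate_at() in dplyr 1.0.0 and later; it is included as an assumed equivalent, not as part of the original answer.

```r
library(dplyr)

# Toy data; names and values are made up for illustration.
DF <- data.frame(Key = c("A", "B", "A", "C"),
                 X1 = c(1, 2, 3, 4), X2 = c(5, 6, 7, 8), X3 = c(9, 10, 11, 12))
cols <- c("X1", "X3")

# 1. ifelse(): the condition is recomputed for every column
r1 <- DF %>% mutate_at(vars(all_of(cols)), function(x) ifelse(DF$Key == "A", 0, x))

# 2. precomputed row index
idx <- which(DF$Key == "A")
r2 <- DF %>% mutate_at(vars(all_of(cols)), function(x) { x[idx] <- 0; x })

# 3. replace(), with the extra arguments passed through mutate_at()
r3 <- DF %>% mutate_at(vars(all_of(cols)), replace, DF$Key == "A", 0)

# 4. across() form (assumes dplyr >= 1.0.0)
r4 <- DF %>% mutate(across(all_of(cols), ~ replace(.x, Key == "A", 0)))

all.equal(r1, r2)
all.equal(r2, r3)
all.equal(r3, r4)
```

    Only the columns named in cols are touched; X2 is left unchanged in every variant.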
    

    We can put all of these approaches to the test. Judging by the timings below, I think dplyr and data.table are comparable.

    #simulate data
    library(dplyr)
    library(data.table)
    set.seed(100)
    Key = sample(LETTERS[1:3],1000000,replace=TRUE)
    DF = data.frame(Key,matrix(runif(1000000*10),nrow=1000000,ncol=10))
    DT = as.data.table(DF)
    
    cols = grep("[35789]", names(DF), value = TRUE)
    #rows to overwrite; needed by the base R benchmark below
    idx = which(DF$Key=="A")
    
    #long method
    system.time(DF %>% mutate_at(.vars = vars(cols), .funs = function(x) ifelse(DF$Key == "A", 0, x)))
    user  system elapsed 
      0.121   0.035   0.156 
    
    #old base R way (note: this assignment modifies DF in place)
    system.time(DF[idx,cols] <- 0)
       user  system elapsed 
      0.085   0.021   0.106 
    
    #dplyr
    # define function
    func = function(){
           idx = which(DF$Key=="A")
           DF %>% mutate_at(.vars = vars(cols), .funs = function(x){x[idx]=0;x})
    }
    system.time(func())
    user  system elapsed 
      0.020   0.006   0.026
    
    #data.table
    system.time(DT[Key=="A", (cols) := 0])
       user  system elapsed 
      0.012   0.001   0.013 
    #replace with dplyr
    system.time(DF %>% mutate_at(.vars = vars(cols), replace, DF$Key == 'A', 0))
    user  system elapsed 
      0.007   0.001   0.008
    