Loop over rows of dataframe applying function with if-statement

攒了一身酷 2020-12-13 02:16

I'm new to R and I'm trying to sum 2 columns of a given dataframe, if both the elements to be summed satisfy a given condition. To make things clear, what I want to do is:

3 answers
  • 2020-12-13 02:45

    This operation doesn't require loops, apply statements or if statements. Vectorised operations and subsetting are all you need:

    t.d <- within(t.d, V4 <- V1 + V3)
    t.d[!(t.d$V1>1 & t.d$V3<9), "V4"] <- 0
    t.d
    
      V1 V2 V3 V4
    1  1  4  7  0
    2  2  5  8 10
    3  3  6  9  0
    

    Why does this work?

    In the first step I create a new column that is the straight sum of columns V1 and V3. I use within as a convenient way of referring to the columns of t.d without having to write t.d$ all the time.

    In the second step I subset all of the rows that don't fulfill your conditions and set V4 for these to 0.
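
    The same two steps can also be spelled out with an explicit logical mask, which makes it easier to see which rows get zeroed. This is only a sketch: the name keep is introduced here for illustration, and the data frame is rebuilt the same way as in the third answer.

    ```r
    # Rebuild the 3x3 example data frame (columns V1, V2, V3).
    t.d <- as.data.frame(matrix(1:9, ncol = 3))

    # Logical mask: TRUE only where both conditions hold.
    keep <- t.d$V1 > 1 & t.d$V3 < 9   # FALSE TRUE FALSE

    # Step 1: unconditional sum of V1 and V3.
    t.d$V4 <- t.d$V1 + t.d$V3

    # Step 2: overwrite the rows that fail the mask with 0.
    t.d$V4[!keep] <- 0
    ```

    Only the second row satisfies both conditions, so it is the only one that keeps its sum.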

  • 2020-12-13 02:50

    ifelse is your friend here:

    t.d$V4 <- ifelse((t.d$V1 > 1) & (t.d$V3 < 9), t.d$V1 + t.d$V3, 0)
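
    If the repeated t.d$ gets noisy, the same call can be wrapped in with(). A minimal sketch, assuming the 3x3 example data frame used elsewhere in this thread:

    ```r
    # Rebuild the 3x3 example data frame (columns V1, V2, V3).
    t.d <- as.data.frame(matrix(1:9, ncol = 3))

    # ifelse() picks V1 + V3 where the test is TRUE and 0 elsewhere,
    # element by element; with() lets us refer to the columns by name.
    t.d$V4 <- with(t.d, ifelse(V1 > 1 & V3 < 9, V1 + V3, 0))
    ```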
    
  • 2020-12-13 03:00

    I'll chip in and provide yet another version. Since you want zero if the condition doesn't match, and TRUE/FALSE are glorified versions of 1/0, simply multiplying by the condition also works:

    t.d<-as.data.frame(matrix(1:9,ncol=3))
    t.d <- within(t.d, V4 <- (V1+V3)*(V1>1 & V3<9))
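
    To see the coercion at work, and one caveat worth knowing, here is a small sketch. The vectors x and cond and the two result names are made up for illustration. The caveat: multiplying by FALSE only gives 0 for finite values, so a non-finite sum is not masked the way ifelse would mask it.

    ```r
    # Logical values are coerced to 1/0 in arithmetic.
    two <- TRUE + TRUE                              # 2
    picked <- c(TRUE, FALSE, TRUE) * c(5, 6, 7)     # 5 0 7

    # Caveat: Inf * FALSE is Inf * 0, which is NaN rather than 0.
    x <- c(2, Inf)
    cond <- c(TRUE, FALSE)
    masked_mult <- x * cond                 # 2 NaN
    masked_ifelse <- ifelse(cond, x, 0)     # 2 0
    ```

    For the all-finite data in this question the two approaches give the same result.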
    

    ...and it happens to be faster than the other solutions ;-)

    t.d <- data.frame(V1=runif(2e7, 1, 2), V2=1:2e7, V3=runif(2e7, 5, 10))
    system.time( within(t.d, V4 <- (V1+V3)*(V1>1 & V3<9)) )         # 3.06 seconds
    system.time( ifelse((t.d$V1>1)&(t.d$V3<9), t.d$V1+ t.d$V3, 0) ) # 5.08 seconds
    system.time( { t.d <- within(t.d, V4 <- V1 + V3); 
                   t.d[!(t.d$V1>1 & t.d$V3<9), "V4"] <- 0 } )       # 4.50 seconds
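
    Timings like these depend on the machine and the R version, so it is worth rerunning them. Below is a smaller sketch that first checks the three approaches agree and then times them. The seed, the reduced size n, the V1 range of 0 to 2 (chosen so the condition is not almost always TRUE) and the by_* names are all assumptions made for this example.

    ```r
    set.seed(1)   # assumed seed, only for reproducibility
    n <- 1e5      # far smaller than the 2e7 rows timed above
    t.d <- data.frame(V1 = runif(n, 0, 2), V2 = seq_len(n), V3 = runif(n, 5, 10))

    # The three approaches from this thread, each producing a plain vector.
    by_mult <- with(t.d, (V1 + V3) * (V1 > 1 & V3 < 9))
    by_ifelse <- with(t.d, ifelse(V1 > 1 & V3 < 9, V1 + V3, 0))
    by_subset <- with(t.d, V1 + V3)
    by_subset[!(t.d$V1 > 1 & t.d$V3 < 9)] <- 0

    # All three should agree before their timings are compared.
    stopifnot(isTRUE(all.equal(by_mult, by_ifelse)),
              isTRUE(all.equal(by_mult, by_subset)))

    # Time two of them; absolute numbers will differ from those quoted above.
    time_mult <- system.time(with(t.d, (V1 + V3) * (V1 > 1 & V3 < 9)))
    time_ifelse <- system.time(with(t.d, ifelse(V1 > 1 & V3 < 9, V1 + V3, 0)))
    ```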
    