r - apply function to each row of a data.table

前端 未结 1 1346
小蘑菇
小蘑菇 2021-02-14 11:49

I\'m looking to use data.table to improve speed for a given function, but I\'m not sure I\'m implementing it the correct way:

Data

相关标签:
1条回答
  • 2021-02-14 12:29

    Having spent the time since asking this question looking into what data.table has to offer, researching data.table joins thanks to @eddi's pointer (for example Rolling join on data.table, and inner join with inequality), I've come up with a solution.

    One of the tricky parts was moving away from the thought of 'apply a function to each row', and redesigning the solution to use joins.

    And, there will no doubt be better ways of programming this, but here's my attempt.

    ## want to find a lkpId for each id, that has the minimum difference between 'thisTime' and 'lkpTime'
    ## and where the lkpId contains both 'thisLocation' and 'finalLocation'
    
    ## find all lookup id's where 'thisLocation' matches 'lookupLocation'
    ## and where thisTime - lkpTime > 0
    setkey(dt, thisLocation)
    setkey(dt_lookup, lkpLocation)
    
    dt_this <- dt[dt_lookup, {
      idx = thisTime - i.lkpTime > 0
      .(id = id[idx],
        lkpId = i.lkpId,
        thisTime = thisTime[idx],
        lkpTime = i.lkpTime)
    },
    by=.EACHI]
    
    ## remove NAs
    dt_this <- dt_this[complete.cases(dt_this)]
    
    ## find all matching 'finalLocation' and 'lookupLocaiton'
    setkey(dt, finalLocation)
    ## inner join (and only return the id columns)
    dt_final <- dt[dt_lookup, nomatch=0, allow.cartesian=TRUE][,.(id, lkpId)]
    
    ## join dt_this to dt_final (as lkpId must have both 'thisLocation' and 'finalLocation')
    setkey(dt_this, id, lkpId)
    setkey(dt_final, id, lkpId)
    
    dt_join <- dt_this[dt_final, nomatch=0]
    
    ## take the combination with the minimum difference between 'thisTime' and 'lkpTime'
    dt_join[,timeDiff := thisTime - lkpTime]
    
    dt_join <- dt_join[ dt_join[order(timeDiff), .I[1], by=id]$V1]  
    
    ## equivalent dplyr code
    # library(dplyr)
    # dt_this <- dt_this %>%
    #   group_by(id) %>%
    #   arrange(timeDiff) %>%
    #   slice(1) %>%
    #   ungroup 
    
    0 讨论(0)
提交回复
热议问题