How to calculate number of occurrences per minute for a large dataset

时光说笑 2021-01-06 02:42

I have a dataset with 500k appointments lasting between 5 and 60 minutes.

tdata <- structure(list(Start = structure(c(1325493000, 1325493600, 1325494200,         


        
3 Answers
  •  再見小時候
    2021-01-06 03:06

    Here's a strategy: order by start time, then interleave the data as start, end, start, end, ... and check whether that vector needs to be reordered. If it doesn't, there are no conflicts; if it does, you can see how many appointments (and, if you like, which ones) conflict with each other.

    # Using Roland's example (requires the data.table package):
    library(data.table)
    
    # The header has one field fewer than the data rows, so the first column becomes row names
    DF <- read.table(text = "Start,End,Location,Room
    1,2012-01-02 08:30:00,2012-01-02 08:40:00,LocationA,RoomA
    2,2012-01-02 08:40:00,2012-01-02 08:50:00,LocationA,RoomA
    3,2012-01-02 08:50:00,2012-01-02 09:55:00,LocationA,RoomA
    4,2012-01-02 09:00:00,2012-01-02 09:10:00,LocationA,RoomA
    5,2012-01-02 09:00:00,2012-01-02 09:10:00,LocationA,RoomB
    6,2012-01-02 09:10:00,2012-01-02 09:20:00,LocationA,RoomB",
                     header = TRUE, sep = ",", stringsAsFactors = FALSE)
    # Start and End stay character strings; ISO 8601 timestamps sort correctly as text
    
    dt <- data.table(DF)
    
    # the conflicting appointments: within each Location/Room, interleave Start/End
    # in start order and keep the appointments whose slots end up out of order
    dt[order(Start),
       .SD[unique((which(order(c(rbind(Start, End))) != 1:(2*.N)) - 1) %/% 2 + 1)],
       by = list(Location, Room)]
    #    Location  Room               Start                 End
    #1: LocationA RoomA 2012-01-02 08:50:00 2012-01-02 09:55:00
    #2: LocationA RoomA 2012-01-02 09:00:00 2012-01-02 09:10:00
    
    # and a speedier version of the above, which avoids constructing the full .SD
    # (.I holds the row numbers of dt, so the outer call subsets dt directly):
    dt[dt[order(Start),
          .I[unique((which(order(c(rbind(Start, End))) != 1:(2*.N)) - 1) %/% 2 + 1)],
          by = list(Location, Room)]$V1]
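
    For readers not using data.table, the same interleave-and-check idea can be sketched in base R for a single room. This is a minimal sketch, not the answerer's code: `find_conflicts` is a hypothetical helper name, and it assumes `Start`/`End` values that compare correctly (POSIXct here) with no missing values. The sample data is RoomA from the example above.

    ```r
    # Hypothetical helper: returns the original indices of the appointments in ONE
    # room whose start/end values fall out of order once interleaved, i.e. overlaps.
    find_conflicts <- function(start, end) {
      o <- order(start)                  # sort appointments by start time
      start <- start[o]
      end <- end[o]
      n <- length(start)
      v <- c(rbind(start, end))          # interleave: start1, end1, start2, end2, ...
      bad <- which(order(v) != seq_len(2 * n))  # slots that move when v is sorted
      o[unique((bad - 1) %/% 2 + 1)]     # slot -> appointment, mapped back to input order
    }

    # RoomA from the example above
    start <- as.POSIXct(c("2012-01-02 08:30:00", "2012-01-02 08:40:00",
                          "2012-01-02 08:50:00", "2012-01-02 09:00:00"), tz = "UTC")
    end   <- as.POSIXct(c("2012-01-02 08:40:00", "2012-01-02 08:50:00",
                          "2012-01-02 09:55:00", "2012-01-02 09:10:00"), tz = "UTC")
    find_conflicts(start, end)
    # [1] 3 4
    ```

    Because `order()` leaves ties in their original order, an appointment that ends exactly when the next one starts is not flagged. For several rooms, the helper could be applied per group, for example via `split()` on Location and Room, mirroring the `by = list(Location, Room)` grouping in the data.table version.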
    

    The formula that maps the out-of-order positions back to appointment indices can probably be simplified; I didn't spend much time on it and just used the first thing that got the job done.
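
    One possible simplification, offered as a sketch rather than something the answer tested: each appointment occupies two consecutive slots in the interleaved vector, so slot k belongs to appointment ceiling(k / 2), which equals (k - 1) %/% 2 + 1 for every positive integer k. A quick check in R:

    ```r
    # Slot k of the interleaved Start/End vector belongs to appointment ceiling(k / 2)
    k <- 1:10
    old <- (k - 1) %/% 2 + 1   # mapping used in the answer
    new <- ceiling(k / 2)      # candidate simpler mapping
    identical(old, new)
    # [1] TRUE
    ```

    With that substitution, the index expression becomes `unique(ceiling(which(order(c(rbind(Start, End))) != 1:(2*.N)) / 2))`.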
