Algorithmic issue: determining “user sessions”

日久生厌 2021-02-20 06:12

I've got a really interesting (at least to me) little problem to solve (and, no, it is not homework). It is equivalent to this: you need to determine "sessions" and "sessions

4 answers
  •  愿得一人
    2021-02-20 06:32

    You are asking for an online algorithm, i.e. one that can calculate a new set of sessions incrementally for each new input time.
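
    For contrast, here is a hypothetical offline baseline that recomputes every session from the full, sorted list of times. It assumes the same rule as the pseudo-code in this answer (two times share a session when they are at most max_inactivity apart); `OfflineSessions`, `Session` and `maxInactivity` are illustrative names. Re-running it after every new time costs O(n log n) per update, which is exactly what an online algorithm avoids, but it is a handy test oracle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: recomputes all sessions from scratch (offline).
final class OfflineSessions {
    // A session is the interval [start, end] of its first and last event time.
    record Session(long start, long end) {}

    // Sort all times, then open a new session whenever the gap to the
    // previous time exceeds maxInactivity.
    static List<Session> compute(long[] times, long maxInactivity) {
        long[] sorted = times.clone();
        Arrays.sort(sorted);
        List<Session> result = new ArrayList<>();
        if (sorted.length == 0) return result;
        long start = sorted[0], end = sorted[0];
        for (int k = 1; k < sorted.length; k++) {
            if (sorted[k] - end <= maxInactivity) {
                end = sorted[k];            // close enough: extend the current session
            } else {
                result.add(new Session(start, end));
                start = end = sorted[k];    // gap too large: start a new session
            }
        }
        result.add(new Session(start, end));
        return result;
    }
}
```

    With maxInactivity = 2, the times {10, 1, 3, 20, 12} yield the sessions [1,3], [10,12] and [20,20].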

    Concerning the choice of data structure for the current set of sessions, you can use a balanced binary search tree. Each session is represented by a pair (start, end) of start time and end time. The nodes of the search tree are ordered by their start time. Since your sessions are separated by more than max_inactivity, no two sessions overlap, so the end times are ordered as well. In other words, ordering by start time already orders the sessions consecutively.
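
    A minimal Java sketch of that ordering, assuming a hypothetical immutable `Session` record with `long` timestamps (the class and method names are illustrative). The comparator looks only at `start`; two sessions with the same start would count as equal, which is harmless here because non-overlapping sessions never share a start time.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class SessionOrderDemo {
    // Hypothetical session type: first and last event time of one session.
    record Session(long start, long end) {}

    // Order sessions by start time only. Because sessions never overlap,
    // this also orders them by end time.
    static TreeSet<Session> newSessionSet() {
        return new TreeSet<>(Comparator.comparingLong(Session::start));
    }

    // Returns the session with the greatest start <= time, or null if none.
    // A probe (time, time) suffices because only start is compared.
    static Session sessionStartingAtOrBefore(TreeSet<Session> sessions, long time) {
        return sessions.floor(new Session(time, time));
    }
}
```

    Looking up time 11 in {[1,5], [10,12], [20,25]} returns [10,12]; looking up time 0 returns null.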

    Here is some pseudo-code for insertion. For notational convenience, we pretend that sessions is an array, though it's actually a binary search tree.

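    insert(time, sessions) = do
        -- sessions[i] or sessions[i+1] may not exist; a missing one fails its test
        i <- find index such that
             sessions[i].start <= time && time < sessions[i+1].start

        if (sessions[i].end + max_inactivity >= time)
            merge  time  into  sessions[i]        -- end := max(end, time)
        else if (time >= sessions[i+1].start - max_inactivity)
            merge  time  into  sessions[i+1]      -- start := time
        else
            insert  (time,time)  into  sessions

        -- extending sessions[i] may bring it within max_inactivity of sessions[i+1]
        if (sessions[i].end + max_inactivity >= sessions[i+1].start)
            merge  sessions[i] and sessions[i+1]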
    

    The merge operation can be implemented by deleting the old session(s) from the binary search tree and inserting the merged one.
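
    Putting the pieces together, here is a minimal Java sketch of insert() on a java.util.TreeSet, assuming `long` timestamps, an inclusive max_inactivity threshold, and illustrative names (`OnlineSessions`, `Session`, `snapshot`). It uses floor() for sessions[i] and higher() for sessions[i+1], so a session starting exactly at time is found only once. Because the records are immutable, each merge removes the old session(s) and adds the combined one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// A sketch of the insert() pseudo-code, not a tuned implementation.
final class OnlineSessions {
    record Session(long start, long end) {}

    private final long maxInactivity;
    // Ordered by start time; sessions never overlap, so starts are unique.
    private final TreeSet<Session> sessions =
            new TreeSet<>(Comparator.comparingLong(Session::start));

    OnlineSessions(long maxInactivity) {
        this.maxInactivity = maxInactivity;
    }

    void insert(long time) {
        Session probe = new Session(time, time);
        Session prev = sessions.floor(probe);   // sessions[i]: start <= time
        Session next = sessions.higher(probe);  // sessions[i+1]: start > time

        Session merged;
        if (prev != null && prev.end() + maxInactivity >= time) {
            // Extend sessions[i]; time may already lie inside it.
            sessions.remove(prev);
            merged = new Session(prev.start(), Math.max(prev.end(), time));
        } else if (next != null && time >= next.start() - maxInactivity) {
            // Pull the start of sessions[i+1] back to time.
            sessions.remove(next);
            merged = new Session(time, next.end());
        } else {
            sessions.add(probe);                // new single-event session
            return;
        }

        // The merged session may now be close enough to its successor.
        Session after = sessions.higher(merged);
        if (after != null && merged.end() + maxInactivity >= after.start()) {
            sessions.remove(after);
            merged = new Session(merged.start(), after.end());
        }
        sessions.add(merged);
    }

    // Current sessions in start-time order.
    List<Session> snapshot() {
        return new ArrayList<>(sessions);
    }
}
```

    For example, `new OnlineSessions(2)` fed the times 10, 1, 3, 20, 12 in that order ends with the sessions [1,3], [10,12] and [20,20]; feeding 1, 5, 3 instead bridges the gap and leaves a single session [1,5].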

    This algorithm takes O(n log m) time, where n is the number of input times and m is the maximum number of sessions, which you said is rather small.
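
    Spelled out, assuming each call to insert performs at most a constant number c of tree operations (a couple of lookups, a few deletions and one insertion), each costing O(log m) on a balanced tree holding at most m sessions:

```latex
T(n) \;\le\; \sum_{k=1}^{n} c \cdot O(\log m) \;=\; O(n \log m)
```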

    Granted, implementing a balanced binary search tree yourself is no easy task, depending on the programming language. The key here is that you have to split the tree according to a key, i.e. find the neighbours of an arbitrary time that is not itself stored in the tree, and not every ready-made library supports that operation. For Java, I would use the TreeSet class with a Comparator on the start time; as said, the element type E is a single session given by its start and end time. Its floor() and higher() methods retrieve the sessions I've denoted sessions[i] and sessions[i+1] in my pseudo-code (ceiling() also works, except that it returns sessions[i] itself when a session starts exactly at time).
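
    To see why the strict comparison `time < sessions[i+1].start` matters, this hypothetical demo queries a two-session TreeSet (sessions [1,5] and [10,12]; all names are illustrative) with floor(), ceiling() and higher(). When time coincides with a stored start, ceiling() returns the same session as floor(), while higher() returns the next one.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class TreeSetLookupDemo {
    record Session(long start, long end) {}

    // Which stored sessions do floor(), ceiling() and higher() return for a
    // probe at `time`? Only start is compared, so the probe's end is irrelevant.
    static Session[] neighbours(TreeSet<Session> sessions, long time) {
        Session probe = new Session(time, time);
        return new Session[] {
            sessions.floor(probe),    // greatest start <= time
            sessions.ceiling(probe),  // smallest start >= time
            sessions.higher(probe)    // smallest start >  time
        };
    }

    // Two sample sessions, ordered by start time.
    static TreeSet<Session> sample() {
        TreeSet<Session> s = new TreeSet<>(Comparator.comparingLong(Session::start));
        s.add(new Session(1, 5));
        s.add(new Session(10, 12));
        return s;
    }
}
```

    For time = 10 the three calls return [10,12], [10,12] and null; for time = 7 they return [1,5], [10,12] and [10,12].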
