Interval tree with added dimension of subset matching?

孤城傲影 2021-01-01 14:54

This is an algorithmic question about a somewhat complex problem. The foundation is this:

A scheduling system based on available slots and reserved slot

3 Answers
  • 2021-01-01 15:02

    The suggested approaches by Arne and tinker were both helpful, but not ultimately sufficient. I came up with a hybrid approach that solves it well enough.

    The main problem is that it's a three-dimensional issue, which is difficult to solve in all dimensions at once. It's not just about matching a time overlap or a tag overlap, it's about matching time slices with tag overlaps. It's simple enough to match slots to other slots based on time and even tags, but it's then pretty complicated to match an already matched availability slot to another reservation at another time. Meaning, this scenario in which one availability can cover two reservations at different times:

    +---------+
    | A, B    |
    +---------+
    xxxxx xxxxx
    x A x x A x
    xxxxx xxxxx
    

    Trying to fit this into constraint based programming requires an incredibly complex relationship of constraints which is hardly manageable. My solution to this was to simplify the problem…

    Removing one dimension

    Instead of solving all dimensions at once, it simplifies the problem enormously to largely remove the dimension of time. I did this by using my existing interval tree and slicing it as needed:

    def __init__(self, slots):
        self.tree = IntervalTree(slots)
    
    def timeslot_is_available(self, start: datetime, end: datetime, attributes: set):
        candidate = Slot(start.timestamp(), end.timestamp(), dict(type=SlotType.RESERVED, attributes=attributes))
        slots = list(self.tree[start.timestamp():end.timestamp()])
        return self.model_is_consistent(slots + [candidate])
    

    To query whether a specific slot is available, I take only the slots relevant at that specific time (self.tree[..:..]), which reduces the complexity of the calculation to a localised subset:

      |      |             +-+ = availability
    +-|------|-+           xxx = reservation
      |  +---|------+
    xx|x  xxx|x
      |  xxxx|
      |      |
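
    As a rough illustration of the slicing step, the sketch below uses a plain list scan in place of the interval tree. `Slot` and `overlapping` are hypothetical stand-ins, not the classes used in this answer; they only mimic what `self.tree[start:end]` is expected to return, assuming half-open intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    begin: int
    end: int
    kind: str              # "available" or "reserved" (hypothetical field)
    attributes: frozenset


def overlapping(slots, begin, end):
    """Return the slots whose half-open range [begin, end) overlaps the query."""
    return [s for s in slots if s.begin < end and begin < s.end]


slots = [
    Slot(0, 10, "available", frozenset("AB")),
    Slot(4, 14, "available", frozenset("A")),
    Slot(0, 3, "reserved", frozenset("A")),
    Slot(5, 8, "reserved", frozenset("A")),
    Slot(20, 30, "available", frozenset("A")),   # far away, never in the slice
]

# Only slots touching the window 2..8 take part in the consistency check.
window = overlapping(slots, 2, 8)
print(len(window), "of", len(slots), "slots are relevant")
```

    Everything outside the window is ignored, which is what keeps the later consistency check small.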
    

    Then I confirm the consistency within that narrow slice:

    @staticmethod
    def model_is_consistent(slots):
        def can_handle(r):
            return lambda a: r.attributes <= a.attributes and a.contains_interval(r)
    
        av = [s for s in slots if s.type == SlotType.AVAILABLE]
        rs = [s for s in slots if s.type == SlotType.RESERVED]
    
        p = Problem()
        p.addConstraint(AllDifferentConstraint())
        p.addVariables(range(len(rs)), av)
    
        for i, r in enumerate(rs):
            p.addConstraint(can_handle(r), (i,))
    
        return p.getSolution() is not None
    

    (I'm omitting some optimisations and other code here.)

    This part is the hybrid approach of Arne's and tinker's suggestions. It uses constraint-based programming to find matching slots, using the matrix algorithm suggested by tinker. Basically: if there's any solution to this problem in which all reservations can be assigned to a different available slot, then this time slice is in a consistent state. Since I'm passing in the desired reservation slot, if the model is still consistent including that slot, this means it's safe to reserve that slot.

    This is still problematic if there are two short reservations assignable to the same availability within this narrow window, but the chances of that are low and the result is merely a false negative for an availability query; false positives would be more problematic.
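
    To make that caveat concrete, here is a small brute-force sketch using only the standard library (it is not the python-constraint model above, and the tuple layout and function names are assumptions for illustration). It requires every reservation to take a distinct availability, so two short reservations at different times inside one availability are rejected even though they would not actually collide.

```python
from itertools import permutations


def fits(reservation, availability):
    """A reservation fits if it lies inside the availability and its tags are a subset."""
    r_begin, r_end, r_tags = reservation
    a_begin, a_end, a_tags = availability
    return a_begin <= r_begin and r_end <= a_end and r_tags <= a_tags


def consistent_all_different(reservations, availabilities):
    """True if every reservation can be given its own, distinct availability."""
    if len(reservations) > len(availabilities):
        return False
    return any(
        all(fits(r, a) for r, a in zip(reservations, chosen))
        for chosen in permutations(availabilities, len(reservations))
    )


availability = (0, 60, {"A", "B"})
early = (0, 15, {"A"})
late = (30, 45, {"A"})

print(consistent_all_different([early], [availability]))        # True
print(consistent_all_different([early, late], [availability]))  # False: a false negative
```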

    Finding available slots

    Finding all available slots is a more complex problem, so again some simplification is necessary. First, it's only possible to query the model for availabilities for a particular set of tags (there's no "give me all globally available slots"), and second, it can only be queried with a particular granularity (desired slot length). This suits me well for my particular use case, in which I just need to offer users a list of slots they can reserve, like 9:15-9:30, 9:30-9:45, etc. This makes the algorithm very simple by reusing the above code:

    def free_slots(self, start: datetime, end: datetime, attributes: set, granularity: timedelta):
        slots = []
        while start < end:
            slot_end = start + granularity
            if self.timeslot_is_available(start, slot_end, attributes):
                slots.append((start, slot_end))
            start += granularity
    
        return slots
    

    In other words, it just goes through all possible slots during the given time interval and literally checks whether that slot is available. It's a bit of a brute-force solution, but works perfectly fine.
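
    A self-contained sketch of the same loop, for experimenting without the rest of the class. The `is_available` predicate is a hypothetical stand-in for `timeslot_is_available`, and unlike the method above this variant only offers windows that fit entirely before `end`.

```python
from datetime import datetime, timedelta


def candidate_windows(start, end, granularity):
    """Yield consecutive (begin, end) windows that fit entirely inside [start, end]."""
    while start + granularity <= end:
        yield start, start + granularity
        start += granularity


def free_slots(start, end, granularity, is_available):
    """Keep the candidate windows that the availability predicate accepts."""
    return [w for w in candidate_windows(start, end, granularity) if is_available(*w)]


# Hypothetical rule for the demo: 9:30-9:45 is already taken.
taken = (datetime(2021, 1, 1, 9, 30), datetime(2021, 1, 1, 9, 45))


def is_available(begin, end):
    return end <= taken[0] or begin >= taken[1]


result = free_slots(
    datetime(2021, 1, 1, 9, 15),
    datetime(2021, 1, 1, 10, 0),
    timedelta(minutes=15),
    is_available,
)
for begin, end in result:
    print(begin.strftime("%H:%M"), "-", end.strftime("%H:%M"))
```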

  • 2021-01-01 15:08

    Your problem can be solved using constraint programming. In python this can be implemented using the python-constraint library.

    First, we need a way to check whether two slots are consistent with each other. The function below returns True when they are: either the slots share no tag, or their timeframes do not overlap. In Python it can be implemented as follows:

    def checkNoOverlap(slot1, slot2):
        # slots that share no tag can never conflict
        shareTags = any(tag in slot2['tags'] for tag in slot1['tags'])
        if not shareTags:
            return True
        # two closed ranges are disjoint only if one ends before the other begins;
        # this also catches the case where slot1 completely contains slot2
        return slot1['end'] < slot2['begin'] or slot2['end'] < slot1['begin']
    

    I was not sure whether you wanted the tags to be completely the same (like {foo: bar} equals {foo: bar}) or only the keys (like {foo: bar} equals {foo: qux}), but you can change that in the function above.
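
    The two interpretations could look roughly like this if the tags are stored as dicts. Both helper names are made up for this sketch; either one could replace the tag test inside checkNoOverlap.

```python
def share_items(tags1, tags2):
    """True if some key maps to the same value in both dicts ({foo: bar} vs {foo: bar})."""
    return any(key in tags2 and tags2[key] == value for key, value in tags1.items())


def share_keys(tags1, tags2):
    """True if the dicts have at least one key in common, whatever the values."""
    return not tags1.keys().isdisjoint(tags2.keys())


same = ({"foo": "bar"}, {"foo": "bar"})
different_value = ({"foo": "bar"}, {"foo": "qux"})

print(share_items(*same), share_keys(*same))                        # True True
print(share_items(*different_value), share_keys(*different_value))  # False True
```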

    Consistency check

    We can use the python-constraint module for the two kinds of functionality you requested.

    The second functionality is the easiest. To implement it, we can use the function isConsistent(set), which takes a list of slots in the provided data structure as input. It feeds all the slots to python-constraint, checks whether the list is consistent (no two slots that must not overlap actually overlap), and returns the result.

    from constraint import Problem

    def isConsistent(set):
        # initialize the python-constraint context
        problem = Problem()
        # add every slot to the context as a variable with a singleton domain
        for i in range(len(set)):
            problem.addVariable(i, [set[i]])
        # add a constraint for each ordered pair of distinct slots
        for i in range(len(set)):
            for j in range(len(set)):
                # we don't want slots to be checked against themselves
                if i == j:
                    continue
                # this constraint uses the checkNoOverlap function
                problem.addConstraint(lambda a, b: checkNoOverlap(a, b), (i, j))
        # getSolutions returns all the possible combinations of domain elements;
        # because all domains are singletons, it returns a list of length 1 (consistent) or 0 (inconsistent)
        return len(problem.getSolutions()) > 0
    

    This function can be called whenever a user wants to add a reservation slot. The input slot can be added to the list of already existing slots and the consistency can be checked. If it is consistent, the new slot can be reserved. Otherwise, the new slot overlaps and should be rejected.
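
    That reservation flow can also be sketched without the solver, using a plain pairwise check. The names `conflicts`, `is_consistent` and `try_reserve` are hypothetical, and the slot dicts assume the same 'begin', 'end' and 'tags' keys as above with inclusive end points.

```python
from itertools import combinations


def conflicts(slot1, slot2):
    """Two slots conflict if they share a tag and their time ranges intersect."""
    share_tags = not set(slot1['tags']).isdisjoint(slot2['tags'])
    overlap = slot1['begin'] <= slot2['end'] and slot2['begin'] <= slot1['end']
    return share_tags and overlap


def is_consistent(slots):
    """True if no pair of slots conflicts."""
    return not any(conflicts(a, b) for a, b in combinations(slots, 2))


def try_reserve(existing, new_slot):
    """Return the extended slot list if the new slot fits, otherwise None."""
    candidate = existing + [new_slot]
    return candidate if is_consistent(candidate) else None


existing = [{'begin': 0, 'end': 10, 'tags': ['a']}]

print(try_reserve(existing, {'begin': 5, 'end': 15, 'tags': ['a']}))   # None: same tag, overlapping
print(try_reserve(existing, {'begin': 5, 'end': 15, 'tags': ['b']}))   # accepted: different tag
print(try_reserve(existing, {'begin': 11, 'end': 20, 'tags': ['a']}))  # accepted: later in time
```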

    Finding available slots

    This problem is a bit trickier. We can use the same functionality as above with a few significant changes. Instead of adding the new slot together with the existing slot, we now want to add all possible slots to the already existing slots. We can then check the consistency of all those possible slots with the reserved slots and ask the constraint system for the combinations that are consistent.

    Because the number of possible slots would be infinite if we didn't put any restrictions on it, we first need to declare some parameters for the program:

    MIN = 149780000 #available time slots can never start earlier than this time
    MAX = 149790000 #available time slots can never start later than this time
    GRANULARITY = 1*60 #possible time slots always differ from each other by at least one minute
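
    For a sense of scale, the sketch below counts how many candidate slots these parameters produce. `count_candidates` is a hypothetical helper that reproduces the two nested `range` loops of the generatePossibleSlots function included later in this answer, without building the slot dicts.

```python
MIN = 149780000
MAX = 149790000
GRANULARITY = 1 * 60


def count_candidates(lo, hi, step):
    """Count the (begin, end) pairs that the nested loops would generate."""
    return sum(len(range(begin + 1, hi, step)) for begin in range(lo, hi - 1, step))


print(count_candidates(MIN, MAX, GRANULARITY))  # 14028
```

    The count grows quadratically with the number of possible start times, so a coarser granularity or a narrower window shrinks the domain handed to the solver considerably.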
    

    We can now continue to the main function. It looks a lot like the consistency check, but instead of the new slot from the user, we now add a variable to discover all available slots.

    def availableSlots(tags, set):
        #same as above
        problem = Problem()
        for i in range(len(set)):
            problem.addVariable(i, [set[i]])
        #add an extra variable for the available slot, with a domain of all possible slots
        problem.addVariable(len(set), generatePossibleSlots(MIN, MAX, GRANULARITY, tags))
        for i in range(len(set) +1):
            for j in range(len(set) +1):
                if i == j:
                    continue
                problem.addConstraint(lambda a, b: checkNoOverlap(a, b), (i, j))
        #extract the available time slots from the solution for clean output
        return filterAvailableSlots(problem.getSolutions())
    

    I use some helper functions to keep the code cleaner. They are included here.

    def filterAvailableSlots(possibleCombinations):
        result = []
        for slots in possibleCombinations:
            for key, slot in slots.items():
                if slot['type'] == 'available':
                    result.append(slot)
    
        return result
    
    def generatePossibleSlots(min, max, granularity, tags):
        possibilities = []
        for i in range(min, max - 1, granularity):
            for j in range(i + 1, max, granularity):
                possibleSlot = {
                                  'type': 'available',
                                  'begin': i,
                                  'end': j,
                                  'tags': tags
                }
                possibilities.append(possibleSlot)
        return tuple(possibilities)
    

    You can now use the function availableSlots(tags, set) with the tags for which you want the available slots and a set of already reserved slots. Note that this function really returns all the consistent possible slots, so no effort is made to find the one of maximum length or to apply other optimizations.
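
    If only the longest options are of interest, one possible post-processing step is to drop every slot that lies inside a larger one. This is a sketch under the assumption that the result is a list of dicts with 'begin' and 'end' keys; `maximal_slots` is not part of the code above, and its pairwise comparison is quadratic in the number of slots.

```python
def maximal_slots(slots):
    """Keep only the slots that are not contained in a different, larger slot."""
    def contained_in(inner, outer):
        same_range = (inner['begin'], inner['end']) == (outer['begin'], outer['end'])
        return (not same_range
                and outer['begin'] <= inner['begin']
                and inner['end'] <= outer['end'])

    return [s for s in slots if not any(contained_in(s, other) for other in slots)]


found = [
    {'begin': 0, 'end': 61},
    {'begin': 0, 'end': 121},
    {'begin': 60, 'end': 121},
    {'begin': 200, 'end': 261},
]

longest = maximal_slots(found)
print([(s['begin'], s['end']) for s in longest])  # [(0, 121), (200, 261)]
```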

    Hope this helps! (I got it to work as you described in my pycharm)

  • 2021-01-01 15:17

    Here's a solution, I'll include all the code below.

    1. Create a table of slots, and a table of reservations

    [image: example tables]

    2. Create a matrix of reservations x slots

    which is populated with true or false values based on whether that reservation-slot combination is possible

    [image: example boolean combinations matrix]

    3. Figure out the best mapping that allows for the most Reservation-Slot Combinations

    Note: my current solution scales poorly with very large arrays, as it loops through all possible permutations of a list whose size equals the number of slots. I've posted another question to see if anyone can find a better way of doing this. However, this solution is accurate and can be optimized.

    Python Code Source

    Part 1

    from IPython.display import display
    import pandas as pd
    import datetime
    
    available_data = [
        ['SlotA', datetime.time(11, 0, 0), datetime.time(12, 30, 0), set(list('ABD'))],
        ['SlotB',datetime.time(12, 0, 0), datetime.time(13, 30, 0), set(list('C'))],
        ['SlotC',datetime.time(12, 0, 0), datetime.time(13, 30, 0), set(list('ABCD'))],
        ['SlotD',datetime.time(12, 0, 0), datetime.time(13, 30, 0), set(list('AD'))],
    ]
    
    reservation_data = [
        ['ReservationA', datetime.time(11, 15, 0), datetime.time(12, 15, 0), set(list('AD'))],
        ['ReservationB', datetime.time(11, 15, 0), datetime.time(12, 15, 0), set(list('A'))],
        ['ReservationC', datetime.time(12, 0, 0), datetime.time(12, 15, 0), set(list('C'))],
        ['ReservationD', datetime.time(12, 0, 0), datetime.time(12, 15, 0), set(list('C'))],
        ['ReservationE', datetime.time(12, 0, 0), datetime.time(12, 15, 0), set(list('D'))]
    ]
    
    reservations = pd.DataFrame(data=reservation_data, columns=['reservations', 'begin', 'end', 'tags']).set_index('reservations')
    slots = pd.DataFrame(data=available_data, columns=['slots', 'begin', 'end', 'tags']).set_index('slots')
    
    display(slots)
    display(reservations)
    

    Part 2

    def is_possible_combination(r):
        return (r['begin'] >= slots['begin']) & (r['end'] <= slots['end']) & (r['tags'] <= slots['tags'])
    
    solution_matrix = reservations.apply(is_possible_combination, axis=1).astype(int)
    display(solution_matrix)
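
    The same matrix can be reproduced without pandas, which may help when checking the logic by hand. The sketch below reuses the example data from Part 1; the dict layout and the `is_possible` helper are assumptions made for this illustration.

```python
from datetime import time

slots = {
    'SlotA': (time(11, 0), time(12, 30), set('ABD')),
    'SlotB': (time(12, 0), time(13, 30), set('C')),
    'SlotC': (time(12, 0), time(13, 30), set('ABCD')),
    'SlotD': (time(12, 0), time(13, 30), set('AD')),
}

reservations = {
    'ReservationA': (time(11, 15), time(12, 15), set('AD')),
    'ReservationB': (time(11, 15), time(12, 15), set('A')),
    'ReservationC': (time(12, 0), time(12, 15), set('C')),
    'ReservationD': (time(12, 0), time(12, 15), set('C')),
    'ReservationE': (time(12, 0), time(12, 15), set('D')),
}


def is_possible(reservation, slot):
    """A reservation fits a slot if it lies inside it and its tags are a subset."""
    r_begin, r_end, r_tags = reservation
    s_begin, s_end, s_tags = slot
    return r_begin >= s_begin and r_end <= s_end and r_tags <= s_tags


# One row per reservation, one column per slot (dicts keep insertion order).
matrix = {
    name: [int(is_possible(r, s)) for s in slots.values()]
    for name, r in reservations.items()
}
for name, row in matrix.items():
    print(name, row)
```

    ReservationA and ReservationB can both only use SlotA, so no assignment can place all five reservations.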
    

    Part 3

    import numpy as np
    from itertools import permutations
    
    # add dummy columns to make the matrix square if it is not
    sqr_matrix = solution_matrix
    if sqr_matrix.shape[0] > sqr_matrix.shape[1]:
        # uhoh, there are more reservations than slots... this can't be good
        for i in range(sqr_matrix.shape[0] - sqr_matrix.shape[1]):
            sqr_matrix.loc[:,'FakeSlot' + str(i)] = [1] * sqr_matrix.shape[0]
    elif sqr_matrix.shape[0] < sqr_matrix.shape[1]:
        # there are more slots than customers, why doesn't anyone like us?
        for i in range(sqr_matrix.shape[1] - sqr_matrix.shape[0]):
            sqr_matrix.loc['FakeCustomer' + str(i)] = [1] * sqr_matrix.shape[1]
    
    # we only want the values now
    A = solution_matrix.values.astype(int)
    
    # make an identity matrix (the perfect map)
    imatrix = np.diag([1]*A.shape[0])
    
    # try every column permutation of the identity matrix until one matches
    n = A.shape[0]

    # this will hold the map that works the best, together with its score
    best_map_so_far = np.zeros(A.shape)
    best_score = -1

    for column_order in permutations(range(n)):
        # this is an identity matrix with the columns swapped according to the permutation
        imatrix = np.zeros(A.shape)
        for row, column in enumerate(column_order):
            imatrix[row, column] = 1

        # is this map better than the previous best?
        score = (imatrix * A).sum()
        if score > best_score:
            best_map_so_far = imatrix
            best_score = score

        # could it be? a perfect map??
        if score == n:
            break

    if best_score != n:
        print('a perfect map was not found')

    # report the best map found, not merely the last permutation tried
    output = pd.DataFrame(A * best_map_so_far, columns=solution_matrix.columns, index=solution_matrix.index, dtype=int)
    display(output)
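
    One way to avoid the factorial loop is to treat the matrix as a bipartite graph and search for a maximum matching with augmenting paths (often called Kuhn's algorithm). This is a swapped-in technique rather than the code above; `max_matching` is a hypothetical helper, and the matrix is the one derived from the example data, without the fake padding columns.

```python
def max_matching(matrix):
    """Maximum bipartite matching via augmenting paths.

    matrix[r][s] is truthy when reservation r may use slot s.
    Returns a dict mapping reservation index -> slot index.
    """
    n_slots = len(matrix[0]) if matrix else 0
    slot_owner = [None] * n_slots   # slot index -> reservation currently holding it

    def try_assign(r, visited):
        for s in range(n_slots):
            if matrix[r][s] and s not in visited:
                visited.add(s)
                # take a free slot, or move its current owner somewhere else
                if slot_owner[s] is None or try_assign(slot_owner[s], visited):
                    slot_owner[s] = r
                    return True
        return False

    for r in range(len(matrix)):
        try_assign(r, set())
    return {r: s for s, r in enumerate(slot_owner) if r is not None}


matrix = [
    [1, 0, 0, 0],   # ReservationA
    [1, 0, 0, 0],   # ReservationB
    [0, 1, 1, 0],   # ReservationC
    [0, 1, 1, 0],   # ReservationD
    [1, 0, 1, 1],   # ReservationE
]

matching = max_matching(matrix)
print(len(matching), "of", len(matrix), "reservations placed")  # 4 of 5
```

    The running time is polynomial in the size of the matrix, so it stays usable where iterating over every permutation does not.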
    