Re-assign column values in a pandas df

傲寒 2021-02-02 14:02

This question is related to rostering or staffing. I'm trying to assign various jobs to individuals (employees). Using the df below,

`[Person]` =          


        
3 answers
  •  猫巷女王i
    2021-02-02 14:32

    Update

    There's a live version of this answer online that you can try for yourself.

    Here's an answer in the form of the allocatePeople function. It's based around precomputing all of the indices where the areas repeat within an hour:

    from collections import Counter
    import numpy as np
    import pandas as pd
    
    def getAssignedPeople(df, areasPerPerson):
        areas = df['Area'].values
        places = df['Place'].values
        times = pd.to_datetime(df['Time']).values
        maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
        assignmentCount = Counter()
        assignedPeople = []
        assignedPlaces = {}
        heldPeople = {}
        heldAreas = {}
        holdAvailable = True
        person = 0
    
        # search for repeated areas. Mark them if the next repeat occurs within an hour
        ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
        holds = np.zeros(areas.size, dtype=bool)
        holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')
    
        for area,place,hold in zip(areas, places, holds):
            if (area, place) in assignedPlaces:
                # this unique (area, place) has already been assigned to someone
                assignedPeople.append(assignedPlaces[(area, place)])
                continue
    
            if assignmentCount[person] >= areasPerPerson:
                # the current person is already assigned to enough areas, move on to the next
                a = heldPeople.pop(person, None)
                heldAreas.pop(a, None)
                person += 1
    
            if area in heldAreas:
                # assign to the person held in this area
                p = heldAreas.pop(area)
                heldPeople.pop(p)
            else:
                # get the first non-held person. If we need to hold in this area, 
                # also make sure the person has at least 2 free assignment slots,
                # though if it's the last person assign to them anyway 
                p = person
                # note: `and` binds tighter than `or`, so the maxPerson check only guards the slot test
                while p in heldPeople or ((hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p == maxPerson):
                    p += 1
    
            assignmentCount.update([p])
            assignedPlaces[(area, place)] = p
            assignedPeople.append(p)
    
            if hold:
                if p==maxPerson:
                    # mark that there are no more people available to perform holds
                    holdAvailable = False
    
                # this area recurs within an hour, mark that the person should be held here
                heldPeople[p] = area
                heldAreas[area] = p
    
        return assignedPeople
    
    def allocatePeople(df, areasPerPerson=3):
        assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
        df = df.copy()
        df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
        return df
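
    To make the repeat-detection step easier to follow, here is a minimal sketch of just that part on a toy input. The five rows and their timestamps are made up for illustration and are not from the question, and the names same, next_repeat and has_repeat are mine; the technique (broadcast comparison, np.triu, np.argmax) is the same one used for ixrep and holds above.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): 'A' recurs after 10 minutes,
# 'B' recurs after 95 minutes, 'C' never recurs.
areas = np.array(['A', 'B', 'A', 'C', 'B'], dtype=object)
times = pd.to_datetime([
    '2021-01-01 08:00', '2021-01-01 08:05', '2021-01-01 08:10',
    '2021-01-01 08:20', '2021-01-01 09:40',
]).values

# same[i, j] is True when rows i and j share an area;
# np.triu(..., k=1) keeps only the pairs with j > i (later rows).
same = np.triu(areas.reshape(-1, 1) == areas, k=1)

# argmax returns the position of the first True in each row, i.e. the
# index of the next repeat. Rows with no later repeat give 0, which is
# unambiguous because a real repeat index is always greater than 0.
next_repeat = np.argmax(same, axis=1)

# Flag a row only when its next repeat falls less than one hour later.
has_repeat = next_repeat.nonzero()
holds = np.zeros(areas.size, dtype=bool)
holds[has_repeat] = (times[next_repeat[has_repeat]] - times[has_repeat]) < np.timedelta64(1, 'h')

print(next_repeat.tolist())  # [2, 4, 0, 0, 0]
print(holds.tolist())        # [True, False, False, False, False]
```

    Only row 0 is flagged: its area comes back 10 minutes later, whereas the repeat of 'B' is 95 minutes away. The comparison builds an n-by-n matrix, so memory grows quadratically with the number of rows.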
    

    Note the use of df['Person'].unique() in allocatePeople. That handles the case where people are repeated in the input. It is assumed that the order of people in the input is the desired order in which those people should be assigned.
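
    As a small illustration of that lookup, the sketch below uses a made-up Person column and a made-up list of zero-based indices standing in for the return value of getAssignedPeople; neither comes from the question.

```python
import pandas as pd

# Hypothetical input column in which people are repeated.
people = pd.Series(['Person 1', 'Person 2', 'Person 1', 'Person 3', 'Person 2'])

# Hypothetical zero-based person indices, one per row.
assigned = [0, 0, 1, 2, 1]

# unique() keeps the order of first appearance, so index 0 is 'Person 1',
# index 1 is 'Person 2', and so on.
names = people.unique()

# Indexing with a list of integers picks one name per row.
result = list(names[assigned])
print(result)  # ['Person 1', 'Person 1', 'Person 2', 'Person 3', 'Person 2']
```

    One consequence worth keeping in mind: if the assignment needs more distinct people than the input column contains, this lookup raises an IndexError.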

    I tested allocatePeople against the OP's example input (example1 and example2) and also against a couple of edge cases I came up with that I think(?) match the OP's desired algorithm:

    ds = dict(
    example1 = ({
        'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],                 
        'Place' : ['House 1','House 2','House 3','House 4','House 5','House 1','House 2','House 3','House 2'],                 
        'Area' : ['A','B','C','D','E','D','E','F','G'],     
        'On' : ['1','2','3','4','5','6','7','8','9'], 
        'Person' : ['Person 1','Person 2','Person 3','Person 4','Person 5','Person 4','Person 5','Person 6','Person 7'],   
        }),
    example2 = ({
        'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','8:40:00','8:42:00','8:45:00','8:50:00'],                 
        'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 1','House 2','House 3'],                 
        'Area' : ['X','X','X','X','X','X','X','X','X'],     
        'On' : ['1','2','3','3','3','3','3','3','3'], 
        'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1'],   
        }),
    
    long_repeats = ({
        'Time' : ['8:03:00','8:17:00','8:20:00','8:25:00','8:30:00','8:31:00','8:35:00','8:45:00','8:50:00'],                 
        'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 3','House 2'],                 
        'Area' : ['A','A','A','A','B','C','C','C','B'],  
        'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 4','Person 4','Person 3'],   
        'On' : ['1','2','3','4','5','6','7','8','9'],                      
        }),
    many_repeats = ({
        'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],                 
        'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 2'],                 
        'Area' : ['A', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'F'],     
        'On' : ['1','2','3','4','5','6','7','8','9'], 
        'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],   
        }),
    large_gap = ({
        'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],                 
        'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 3'],                 
        'Area' : ['A', 'B', 'C', 'D', 'E', 'F', 'D', 'D', 'D'],     
        'On' : ['1','2','3','4','5','6','7','8','9'], 
        'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],   
        }),
    different_times = ({
        'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','09:42:00','09:45:00','09:50:00'],                 
        'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 1'],                 
        'Area' : ['A', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'G'],     
        'On' : ['1','2','3','4','5','6','7','8','9'], 
        'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],   
        })
    )
    
    expectedPeoples = dict(
        example1 = [1,1,1,2,3,2,3,2,3],
        example2 = [1,1,1,1,1,1,1,1,1],
        long_repeats = [1,1,1,2,2,3,3,3,2],
        many_repeats = [1,1,1,2,2,3,3,2,3],
        large_gap = [1,1,1,2,3,3,2,2,3],
        different_times = [1,1,1,2,2,2,3,3,3],
    )
    
    for name,d in ds.items():
        df = pd.DataFrame(d)
        expected = ['Person %d' % i for i in expectedPeoples[name]]
        ap = allocatePeople(df)
    
        print(name, ap, sep='\n', end='\n\n')
        np.testing.assert_array_equal(ap['Person'], expected)
    

    The assert_array_equal statements pass, and the output matches OP's expected output:

    example1
           Time    Place Area On    Person
    0   8:03:00  House 1    A  1  Person 1
    1   8:17:00  House 2    B  2  Person 1
    2   8:20:00  House 3    C  3  Person 1
    3   8:28:00  House 4    D  4  Person 2
    4   8:35:00  House 5    E  5  Person 3
    5  08:40:00  House 1    D  6  Person 2
    6  08:42:00  House 2    E  7  Person 3
    7  08:45:00  House 3    F  8  Person 2
    8  08:50:00  House 2    G  9  Person 3
    
    example2
          Time    Place Area On    Person
    0  8:03:00  House 1    X  1  Person 1
    1  8:17:00  House 2    X  2  Person 1
    2  8:20:00  House 3    X  3  Person 1
    3  8:28:00  House 1    X  3  Person 1
    4  8:35:00  House 2    X  3  Person 1
    5  8:40:00  House 3    X  3  Person 1
    6  8:42:00  House 1    X  3  Person 1
    7  8:45:00  House 2    X  3  Person 1
    8  8:50:00  House 3    X  3  Person 1
    

    The output for my test cases matches my expectations as well:

    long_repeats
          Time    Place Area    Person On
    0  8:03:00  House 1    A  Person 1  1
    1  8:17:00  House 2    A  Person 1  2
    2  8:20:00  House 3    A  Person 1  3
    3  8:25:00  House 4    A  Person 2  4
    4  8:30:00  House 1    B  Person 2  5
    5  8:31:00  House 1    C  Person 3  6
    6  8:35:00  House 2    C  Person 3  7
    7  8:45:00  House 3    C  Person 3  8
    8  8:50:00  House 2    B  Person 2  9
    
    many_repeats
           Time    Place Area On    Person
    0   8:03:00  House 1    A  1  Person 1
    1   8:17:00  House 2    B  2  Person 1
    2   8:20:00  House 3    C  3  Person 1
    3   8:28:00  House 4    D  4  Person 2
    4   8:35:00  House 1    D  5  Person 2
    5  08:40:00  House 1    E  6  Person 3
    6  08:42:00  House 2    E  7  Person 3
    7  08:45:00  House 1    F  8  Person 2
    8  08:50:00  House 2    F  9  Person 3
    
    large_gap
           Time    Place Area On    Person
    0   8:03:00  House 1    A  1  Person 1
    1   8:17:00  House 2    B  2  Person 1
    2   8:20:00  House 3    C  3  Person 1
    3   8:28:00  House 4    D  4  Person 2
    4   8:35:00  House 1    E  5  Person 3
    5  08:40:00  House 1    F  6  Person 3
    6  08:42:00  House 2    D  7  Person 2
    7  08:45:00  House 1    D  8  Person 2
    8  08:50:00  House 3    D  9  Person 3
    
    different_times
           Time    Place Area On    Person
    0   8:03:00  House 1    A  1  Person 1
    1   8:17:00  House 2    B  2  Person 1
    2   8:20:00  House 3    C  3  Person 1
    3   8:28:00  House 4    D  4  Person 2
    4   8:35:00  House 1    D  5  Person 2
    5  08:40:00  House 1    E  6  Person 2
    6  09:42:00  House 2    E  7  Person 3
    7  09:45:00  House 1    F  8  Person 3
    8  09:50:00  House 1    G  9  Person 3
    

    Let me know if it does everything you wanted, or if it still needs some tweaks. I think everyone is eager to see you fulfill your vision.
