Re-assign column values in a pandas df

This question is related to rostering or staffing. I'm trying to assign various jobs to individuals (employees). Using the df below,

`[Person]` =


                      
3 Answers

花落未央, 2021-02-02 14:18

In writing my other answer, I slowly came around to the idea that the OP's algorithm might be easier to implement with an approach that focuses on the jobs (which can be different), instead of the people (which are all the same). Here's a solution that uses the job-centric approach:

import numpy as np
import pandas as pd

def assignJob(job, assignedix, areasPerPerson):
    # First fit: give the whole job to the first person with enough free slots.
    for slots in assignedix:
        if areasPerPerson - len(slots) >= len(job):
            slots.extend(job)
            return True
    # Nobody has room for the job as a whole.
    return False

def allocatePeople(df, areasPerPerson=3):
    areas = df['Area'].values
    times = pd.to_datetime(df['Time']).values
    peopleUniq = df['Person'].unique()
    npeople = int(np.ceil(areas.size / float(areasPerPerson)))

    # search for repeated areas. Mark them if the next repeat occurs within an hour
    ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
    holds = np.zeros(areas.size, dtype=bool)
    holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')

    jobs =[]
    _jobdict = {}
    for i,(area,hold) in enumerate(zip(areas, holds)):
        if hold:
            _jobdict[area] = job = _jobdict.get(area, []) + [i]
            if len(job)==areasPerPerson:
                jobs.append(_jobdict.pop(area))
        elif area in _jobdict:
            jobs.append(_jobdict.pop(area) + [i])
        else:
            jobs.append([i])
    jobs.sort()

    assignedix = [[] for i in range(npeople)]
    for job in jobs:
        if not assignJob(job, assignedix, areasPerPerson):
            # break the job up and try again
            for subjob in ([sj] for sj in job):
                assignJob(subjob, assignedix, areasPerPerson)

    df = df.copy()
    for i,aix in enumerate(assignedix):
        df.loc[aix, 'Person'] = peopleUniq[i]
    return df


This version of allocatePeople has also been extensively tested and passes all of the same checks described in my other answer.

It does have more looping than my other solution, so it is likely to be slightly less efficient (though it'll only matter if your dataframe is very large, say 1e6 rows and up). On the other hand, it is somewhat shorter and, I think, more straightforward and easy to understand.
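
To make the assignment step concrete, here is a minimal, pandas-free sketch of the same first-fit idea. The name `first_fit` is made up for this illustration and is not part of the answer's code. The job list is what I get when tracing the job-building loop by hand on `example1` from the other answer, so treat it as an assumption rather than captured output.

```python
def first_fit(jobs, n_people, capacity=3):
    """Assign each job (a list of row indices) to the first person with room.

    Hypothetical helper for illustration; mirrors assignJob plus the
    split-and-retry fallback in allocatePeople.
    """
    assigned = [[] for _ in range(n_people)]

    def place(job):
        # Return True if some person had enough free slots for the whole job.
        for slots in assigned:
            if capacity - len(slots) >= len(job):
                slots.extend(job)
                return True
        return False

    for job in sorted(jobs):
        if not place(job):
            # No single person can take the whole job: place its rows one by one.
            for ix in job:
                place([ix])
    return assigned


# Jobs traced by hand for example1: areas D and E each repeat within the hour.
jobs = [[0], [1], [2], [3, 5], [4, 6], [7], [8]]
print(first_fit(jobs, n_people=3))
# [[0, 1, 2], [3, 5, 7], [4, 6, 8]]
```

Rows 3, 5, 7 go to the second person and rows 4, 6, 8 to the third, which lines up with the expected `[1,1,1,2,3,2,3,2,3]` for `example1`.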
                                                                        
                                                        
            
            
              
                
鱼传尺愫, 2021-02-02 14:20

OK, before we delve into the logic of the problem, it is worthwhile to do some housekeeping to tidy up the data and bring it into a more useful format:

import pandas as pd

#Create table of unique people
unique_people = df[['Person']].drop_duplicates().sort_values(['Person']).reset_index(drop=True)

#Reformat time column
df['Time'] = pd.to_datetime(df['Time'])


Now, getting to the logic of the problem, it is useful to break it down into stages. First, we want to create individual jobs (with job numbers) based on the 'Area' and the time between rows, i.e. rows in the same area that fall within an hour of each other can share the same job number.

#Assign jobs
df = df.sort_values(['Area','Time']).reset_index(drop=True)
rows = len(df)
df['Job no'] = 0
current_job = 1
df.loc[0,'Job no'] = current_job
for i in range(rows-1):
    prev_row = df.loc[i]
    row = df.loc[i+1]
    # whole hours between consecutive rows; total_seconds() also counts full days
    time_diff = (row['Time'] - prev_row['Time']).total_seconds() // 3600
    # start a new job unless this row is in the same area and under an hour later
    if not ((row['Area'] == prev_row['Area']) and (time_diff == 0)):
        current_job += 1
    df.loc[i+1,'Job no'] = current_job
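
For larger frames, the same numbering rule can probably be expressed without a Python loop. The following is only a sketch under my reading of the rule (a new job starts when the area changes or the gap to the previous row in that area is an hour or more). The function name `number_jobs` and the `max_gap` parameter are invented for this example.

```python
import pandas as pd


def number_jobs(df, max_gap="1h"):
    """Return a copy sorted by Area/Time with a 'Job no' column (hypothetical helper)."""
    out = df.copy()
    out["Time"] = pd.to_datetime(out["Time"])
    out = out.sort_values(["Area", "Time"]).reset_index(drop=True)

    # True on the first row of each area block.
    new_area = out["Area"] != out["Area"].shift()
    # Gap to the previous row within the same area (NaT on the first row).
    gap = out.groupby("Area")["Time"].diff()
    too_late = gap >= pd.Timedelta(max_gap)

    # Each True starts a new job; the running total is the job number.
    out["Job no"] = (new_area | too_late).cumsum()
    return out


demo = pd.DataFrame({
    "Time": ["2021-02-02 08:00", "2021-02-02 08:10",
             "2021-02-02 08:30", "2021-02-02 09:45"],
    "Area": ["A", "B", "A", "A"],
})
print(number_jobs(demo)[["Area", "Time", "Job no"]])
```

In the demo, the two area A rows at 08:00 and 08:30 share job 1, the 09:45 row in A becomes job 2 because it is 75 minutes later, and the area B row becomes job 3.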


With this step now out of the way, it is a simple matter of assigning 'Persons' to individual jobs:

df = df.sort_values(['Job no']).reset_index(drop=True)
df['Person'] = ""
for _, group in df.groupby('Job no'):
    group_size = len(group)
    for person in unique_people['Person']:
        # rows already assigned to this person
        person_count = (df['Person'] == person).sum()
        # give the whole job to the first person with enough of their 3 slots free
        if group_size <= (3 - person_count):
            df.loc[group.index, 'Person'] = person
            break


And finally,

df= df.sort_values(['Time']).reset_index(drop=True)
print(df)


I've attempted to code this in a way that is easier to unpick, so there may well be efficiencies to be made here.  The aim however was to set out the logic used.

This code gives the expected results on both data sets, so I hope it answers your question.
                                                                        
                                                        
            
            
              
                
猫巷女王i, 2021-02-02 14:32

Update

There's a live version of this answer online that you can try for yourself.

Here's an answer in the form of the allocatePeople function. It's based around precomputing all of the indices where the areas repeat within an hour:

from collections import Counter
import numpy as np
import pandas as pd

def getAssignedPeople(df, areasPerPerson):
    areas = df['Area'].values
    places = df['Place'].values
    times = pd.to_datetime(df['Time']).values
    maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
    assignmentCount = Counter()
    assignedPeople = []
    assignedPlaces = {}
    heldPeople = {}
    heldAreas = {}
    holdAvailable = True
    person = 0

    # search for repeated areas. Mark them if the next repeat occurs within an hour
    ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
    holds = np.zeros(areas.size, dtype=bool)
    holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')

    for area,place,hold in zip(areas, places, holds):
        if (area, place) in assignedPlaces:
            # this unique (area, place) has already been assigned to someone
            assignedPeople.append(assignedPlaces[(area, place)])
            continue

        if assignmentCount[person] >= areasPerPerson:
            # the current person is already assigned to enough areas, move on to the next
            a = heldPeople.pop(person, None)
            heldAreas.pop(a, None)
            person += 1

        if area in heldAreas:
            # assign to the person held in this area
            p = heldAreas.pop(area)
            heldPeople.pop(p)
        else:
            # get the first non-held person. If we need to hold in this area, 
            # also make sure the person has at least 2 free assignment slots,
            # though if it's the last person assign to them anyway 
            p = person
            while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
                p += 1

        assignmentCount.update([p])
        assignedPlaces[(area, place)] = p
        assignedPeople.append(p)

        if hold:
            if p==maxPerson:
                # mark that there are no more people available to perform holds
                holdAvailable = False

            # this area recurs within an hour, mark that the person should be held here
            heldPeople[p] = area
            heldAreas[area] = p

    return assignedPeople

def allocatePeople(df, areasPerPerson=3):
    assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
    df = df.copy()
    df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
    return df


Note the use of df['Person'].unique() in allocatePeople. That handles the case where people are repeated in the input. It is assumed that the order of people in the input is the desired order in which those people should be assigned.
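
The precomputation of repeats is the densest part of getAssignedPeople, so here is a small standalone sketch of what it computes. The helper name `next_repeat_within` is hypothetical, and it uses an explicit `any(axis=1)` mask where the answer uses `ixrep.nonzero()`. As far as I can tell the two are equivalent, because a later repeat can never sit at index 0.

```python
import numpy as np


def next_repeat_within(areas, times, window=np.timedelta64(1, "h")):
    """For each row, find the next row with the same area and flag it as a
    hold if that repeat happens within `window` (hypothetical helper)."""
    areas = np.asarray(areas)
    times = np.asarray(times, dtype="datetime64[s]")

    # same[i, j] is True when rows i and j share an area.
    same = areas.reshape(-1, 1) == areas
    # Keep only j > i, i.e. strictly later rows.
    later = np.triu(same, k=1)
    # argmax returns the first True per row, or 0 when the row has none.
    ixrep = np.argmax(later, axis=1)
    has_repeat = later.any(axis=1)

    holds = np.zeros(areas.size, dtype=bool)
    holds[has_repeat] = (times[ixrep[has_repeat]] - times[has_repeat]) < window
    return ixrep, holds


areas = ["A", "B", "A", "B", "A"]
times = ["2021-02-02T08:00", "2021-02-02T08:10", "2021-02-02T08:30",
         "2021-02-02T09:30", "2021-02-02T09:40"]
ixrep, holds = next_repeat_within(areas, times)
print(ixrep.tolist())  # [2, 3, 4, 0, 0]
print(holds.tolist())  # [True, False, False, False, False]
```

Only row 0 is a hold: area A comes back 30 minutes later. Row 1 (area B, back after 80 minutes) and row 2 (area A, back after 70 minutes) repeat too late, and the last two rows never repeat. Note that the comparison matrix is n by n, so memory grows quadratically with the number of rows.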

I tested allocatePeople against the OP's example input (example1 and example2) and also against a couple of edge cases I came up with that I think(?) match the OP's desired algorithm:

ds = dict(
example1 = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 5','House 1','House 2','House 3','House 2'],                 
    'Area' : ['A','B','C','D','E','D','E','F','G'],     
    'On' : ['1','2','3','4','5','6','7','8','9'], 
    'Person' : ['Person 1','Person 2','Person 3','Person 4','Person 5','Person 4','Person 5','Person 6','Person 7'],   
    }),
example2 = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','8:40:00','8:42:00','8:45:00','8:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 1','House 2','House 3','House 1','House 2','House 3'],                 
    'Area' : ['X','X','X','X','X','X','X','X','X'],     
    'On' : ['1','2','3','3','3','3','3','3','3'], 
    'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1'],   
    }),

long_repeats = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:25:00','8:30:00','8:31:00','8:35:00','8:45:00','8:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 3','House 2'],                 
    'Area' : ['A','A','A','A','B','C','C','C','B'],  
    'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 4','Person 4','Person 3'],   
    'On' : ['1','2','3','4','5','6','7','8','9'],                      
    }),
many_repeats = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 2'],                 
    'Area' : ['A', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'F'],     
    'On' : ['1','2','3','4','5','6','7','8','9'], 
    'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],   
    }),
large_gap = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','08:42:00','08:45:00','08:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 3'],                 
    'Area' : ['A', 'B', 'C', 'D', 'E', 'F', 'D', 'D', 'D'],     
    'On' : ['1','2','3','4','5','6','7','8','9'], 
    'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],   
    }),
different_times = ({
    'Time' : ['8:03:00','8:17:00','8:20:00','8:28:00','8:35:00','08:40:00','09:42:00','09:45:00','09:50:00'],                 
    'Place' : ['House 1','House 2','House 3','House 4','House 1','House 1','House 2','House 1','House 1'],                 
    'Area' : ['A', 'B', 'C', 'D', 'D', 'E', 'E', 'F', 'G'],     
    'On' : ['1','2','3','4','5','6','7','8','9'], 
    'Person' : ['Person 1','Person 1','Person 1','Person 2','Person 3','Person 4','Person 3','Person 5','Person 6'],   
    })
)

expectedPeoples = dict(
    example1 = [1,1,1,2,3,2,3,2,3],
    example2 = [1,1,1,1,1,1,1,1,1],
    long_repeats = [1,1,1,2,2,3,3,3,2],
    many_repeats = [1,1,1,2,2,3,3,2,3],
    large_gap = [1,1,1,2,3,3,2,2,3],
    different_times = [1,1,1,2,2,2,3,3,3],
)

for name,d in ds.items():
    df = pd.DataFrame(d)
    expected = ['Person %d' % i for i in expectedPeoples[name]]
    ap = allocatePeople(df)

    print(name, ap, sep='\n', end='\n\n')
    np.testing.assert_array_equal(ap['Person'], expected)


The assert_array_equal statements pass, and the output matches OP's expected output:

example1
       Time    Place Area On    Person
0   8:03:00  House 1    A  1  Person 1
1   8:17:00  House 2    B  2  Person 1
2   8:20:00  House 3    C  3  Person 1
3   8:28:00  House 4    D  4  Person 2
4   8:35:00  House 5    E  5  Person 3
5  08:40:00  House 1    D  6  Person 2
6  08:42:00  House 2    E  7  Person 3
7  08:45:00  House 3    F  8  Person 2
8  08:50:00  House 2    G  9  Person 3

example2
      Time    Place Area On    Person
0  8:03:00  House 1    X  1  Person 1
1  8:17:00  House 2    X  2  Person 1
2  8:20:00  House 3    X  3  Person 1
3  8:28:00  House 1    X  3  Person 1
4  8:35:00  House 2    X  3  Person 1
5  8:40:00  House 3    X  3  Person 1
6  8:42:00  House 1    X  3  Person 1
7  8:45:00  House 2    X  3  Person 1
8  8:50:00  House 3    X  3  Person 1


The output for my test cases matches my expectations as well:

long_repeats
      Time    Place Area    Person On
0  8:03:00  House 1    A  Person 1  1
1  8:17:00  House 2    A  Person 1  2
2  8:20:00  House 3    A  Person 1  3
3  8:25:00  House 4    A  Person 2  4
4  8:30:00  House 1    B  Person 2  5
5  8:31:00  House 1    C  Person 3  6
6  8:35:00  House 2    C  Person 3  7
7  8:45:00  House 3    C  Person 3  8
8  8:50:00  House 2    B  Person 2  9

many_repeats
       Time    Place Area On    Person
0   8:03:00  House 1    A  1  Person 1
1   8:17:00  House 2    B  2  Person 1
2   8:20:00  House 3    C  3  Person 1
3   8:28:00  House 4    D  4  Person 2
4   8:35:00  House 1    D  5  Person 2
5  08:40:00  House 1    E  6  Person 3
6  08:42:00  House 2    E  7  Person 3
7  08:45:00  House 1    F  8  Person 2
8  08:50:00  House 2    F  9  Person 3

large_gap
       Time    Place Area On    Person
0   8:03:00  House 1    A  1  Person 1
1   8:17:00  House 2    B  2  Person 1
2   8:20:00  House 3    C  3  Person 1
3   8:28:00  House 4    D  4  Person 2
4   8:35:00  House 1    E  5  Person 3
5  08:40:00  House 1    F  6  Person 3
6  08:42:00  House 2    D  7  Person 2
7  08:45:00  House 1    D  8  Person 2
8  08:50:00  House 3    D  9  Person 3

different_times
       Time    Place Area On    Person
0   8:03:00  House 1    A  1  Person 1
1   8:17:00  House 2    B  2  Person 1
2   8:20:00  House 3    C  3  Person 1
3   8:28:00  House 4    D  4  Person 2
4   8:35:00  House 1    D  5  Person 2
5  08:40:00  House 1    E  6  Person 2
6  09:42:00  House 2    E  7  Person 3
7  09:45:00  House 1    F  8  Person 3
8  09:50:00  House 1    G  9  Person 3
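
A quick way to sanity-check outputs like these is to count the distinct (Area, Place) pairs each person ends up with, since that appears to be the quantity getAssignedPeople tracks in `assignmentCount`. This is a sketch, not one of the checks from the answer, and `places_per_person` is a name made up for it.

```python
import pandas as pd


def places_per_person(df):
    """Count distinct (Area, Place) pairs per person (hypothetical helper)."""
    pairs = df[["Person", "Area", "Place"]].drop_duplicates()
    return pairs.groupby("Person").size()


# The example1 result shown above, retyped by hand.
result = pd.DataFrame({
    "Place": ["House 1", "House 2", "House 3", "House 4", "House 5",
              "House 1", "House 2", "House 3", "House 2"],
    "Area": ["A", "B", "C", "D", "E", "D", "E", "F", "G"],
    "Person": ["Person 1", "Person 1", "Person 1", "Person 2", "Person 3",
               "Person 2", "Person 3", "Person 2", "Person 3"],
})
counts = places_per_person(result)
print(counts)
print((counts <= 3).all())
```

Counting pairs rather than rows matters for inputs like example2, where one person covers nine rows but only three distinct places.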


Let me know if it does everything you wanted, or if it still needs some tweaks. I think everyone is eager to see you fulfill your vision.
                                                                        
                                                        
            
            
              
                