Re-assign column values in a pandas df


傲寒 2021-02-02 14:02

This question is related to rostering or staffing. I'm trying to assign various jobs to individuals (employees). Using the df below,

`[Person]` =


      
      
        
鱼传尺愫 (original poster) 2021-02-02 14:20
              

            
            
                        
Before getting into the logic of the problem, it is worth doing some housekeeping to tidy up the data and bring it into a more useful format:

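import pandas as pd

# Create a table of unique people (df is the DataFrame from the question)
unique_people = df[['Person']].drop_duplicates().sort_values(['Person']).reset_index(drop=True)

# Convert the time column to datetimes so that differences can be computed
df['Time'] = pd.to_datetime(df['Time'])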
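
The question's table is not reproduced here, so the housekeeping step can be tried on a small made-up frame instead. This is only a sketch: the column names `Time`, `Area` and `Person` follow the answer's code, while the values and the names `sample` and `sample_people` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
sample = pd.DataFrame({
    'Time': ['08:03:00', '08:17:00', '08:20:00', '09:05:00'],
    'Area': ['A', 'B', 'A', 'A'],
    'Person': ['Person 2', 'Person 1', 'Person 2', 'Person 3'],
})

# One row per distinct person, sorted alphabetically.
sample_people = (
    sample[['Person']]
    .drop_duplicates()
    .sort_values('Person')
    .reset_index(drop=True)
)

# Parse the time strings so they can be subtracted from one another.
sample['Time'] = pd.to_datetime(sample['Time'], format='%H:%M:%S')

print(sample_people['Person'].tolist())
print(sample['Time'].dtype)
```

Passing an explicit `format` is a choice made for this sketch; it avoids relying on pandas to guess how time-only strings should be parsed.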


Turning to the logic of the problem, it helps to break it down into stages. First, create individual jobs (with job numbers) based on 'Area' and the time between rows: consecutive rows in the same area that are less than an hour apart share a job number.

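# Assign job numbers
df = df.sort_values(['Area', 'Time']).reset_index(drop=True)
df['Job no'] = 0
current_job = 1
df.loc[0, 'Job no'] = current_job
rows = len(df)  # number of rows in df
for i in range(rows - 1):
    prev_row = df.loc[i]
    row = df.loc[i + 1]
    # Whole hours between consecutive rows; total_seconds() also counts full days
    time_diff = (row['Time'] - prev_row['Time']).total_seconds() // 3600
    # Start a new job unless the area is unchanged and the gap is under an hour
    if not (row['Area'] == prev_row['Area'] and time_diff == 0):
        current_job += 1
    df.loc[i + 1, 'Job no'] = current_job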
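
The same numbering rule can probably be expressed without an explicit loop. The sketch below is an alternative, not the answer's own method: it marks a row as starting a new job when the area changes or when the gap to the previous row is an hour or more, then takes a running total of those marks. The function name `assign_job_numbers` and the `demo` data are hypothetical.

```python
import pandas as pd

def assign_job_numbers(frame):
    """Return a copy sorted by Area and Time with a 1-based 'Job no' column."""
    out = frame.sort_values(['Area', 'Time']).reset_index(drop=True)
    # True where the area differs from the previous row (always True on row 0).
    area_changed = out['Area'] != out['Area'].shift()
    # True where the gap to the previous row is at least one hour.
    gap_too_long = out['Time'].diff() >= pd.Timedelta(hours=1)
    # Each True starts a new job; the cumulative sum turns marks into numbers.
    out['Job no'] = (area_changed | gap_too_long).cumsum()
    return out

# Made-up rows: three in area A (the last one 75 minutes later) and one in B.
demo = pd.DataFrame({
    'Time': pd.to_datetime(['2021-02-02 08:00', '2021-02-02 08:30',
                            '2021-02-02 09:45', '2021-02-02 08:10']),
    'Area': ['A', 'A', 'A', 'B'],
})
print(assign_job_numbers(demo)['Job no'].tolist())  # [1, 1, 2, 3]
```

As in the loop version, only consecutive rows are compared, so a chain of rows each less than an hour apart ends up in one job even if the chain as a whole spans more than an hour.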


With that step out of the way, assigning people to jobs is straightforward: each job goes to the first person who still has room for all of its rows, with each person limited to 3 rows in total:

df = df.sort_values(['Job no']).reset_index(drop=True)
df['Person'] = ""
for job_no, job_rows in df.groupby('Job no'):
    group_size = len(job_rows)
    for person in unique_people['Person']:
        # Rows already assigned to this person
        person_count = (df['Person'] == person).sum()
        # Each person may hold at most 3 rows in total
        if group_size <= (3 - person_count):
            df.loc[job_rows.index, 'Person'] = person
            break
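
For experimenting with the assignment rule in isolation, it can be wrapped in a small function. This is a sketch under assumptions: the name `assign_people`, the `max_rows` parameter (defaulting to the limit of 3 used above) and the `demo` data are all hypothetical, and a dictionary of running counts replaces the repeated lookups in the frame.

```python
import pandas as pd

def assign_people(jobs, people, max_rows=3):
    """Give each job to the first person with enough spare capacity."""
    out = jobs.copy()
    out['Person'] = ''
    used = {person: 0 for person in people}  # rows already given to each person
    # Iterate over the untouched input; 'out' shares its index, so labels match.
    for _, rows in jobs.groupby('Job no'):
        for person in people:
            if used[person] + len(rows) <= max_rows:
                out.loc[rows.index, 'Person'] = person
                used[person] += len(rows)
                break  # a job that fits nobody keeps an empty 'Person'
    return out

# Made-up jobs: two rows each for jobs 1 and 2, one row for job 3.
demo = pd.DataFrame({'Job no': [1, 1, 2, 2, 3]})
result = assign_people(demo, ['Person 1', 'Person 2'])
print(result['Person'].tolist())
# ['Person 1', 'Person 1', 'Person 2', 'Person 2', 'Person 1']
```

Note that this greedy rule never splits a job between people, so a job with more rows than `max_rows` is left unassigned.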


Finally, sort by time and print the result:

df = df.sort_values(['Time']).reset_index(drop=True)
print(df)


I've tried to write this in a way that is easy to follow rather than efficient, so there is probably room for optimisation. The aim was to set out the logic used.

This code gives the expected results on both data sets, so I hope it answers your question.
    
             
                                                        
            
            
              
                