Re-assign column values in a pandas df

傲寒 2021-02-02 14:02

This question is related to rostering or staffing. I'm trying to assign various jobs to individuals (employees). Using the df below,

`[Person]` =          


        
3 Answers
  •  鱼传尺愫
    2021-02-02 14:20

    Before getting into the logic of the problem, it is worth doing some housekeeping to tidy up the data and bring it into a more useful format:

    import pandas as pd

    # Create a table of unique people
    unique_people = df[['Person']].drop_duplicates().sort_values(['Person']).reset_index(drop=True)
    
    # Parse the time column into datetimes
    df['Time'] = pd.to_datetime(df['Time'])
    

    Now, on to the logic. It helps to break the problem down into stages. First, create individual jobs (with job numbers) based on 'Area' and the time between consecutive rows, i.e. rows in the same area that are less than an hour apart share a job number.

    # Assign job numbers
    df = df.sort_values(['Area', 'Time']).reset_index(drop=True)
    df['Job no'] = 0
    current_job = 1
    df.loc[0, 'Job no'] = current_job
    for i in range(len(df) - 1):
        prev_row = df.loc[i]
        row = df.loc[i + 1]
        # Whole hours since the previous row; total_seconds() also counts full days
        time_diff = (row['Time'] - prev_row['Time']).total_seconds() // 3600
        # Start a new job unless the area matches and the gap is under an hour
        if not (row['Area'] == prev_row['Area'] and time_diff == 0):
            current_job += 1
        df.loc[i + 1, 'Job no'] = current_job
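
    The same grouping rule can also be written without an explicit loop. The following is a minimal sketch, not part of the original answer: the sample data is hypothetical, and only the column names 'Time', 'Area' and 'Job no' are taken from the code above. A row opens a new job whenever its area differs from the previous row or the gap to the previous row is an hour or more, and a cumulative sum over those "new job" flags gives the job number.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2019-01-01 08:00", "2019-01-01 08:30", "2019-01-01 10:00",
        "2019-01-01 08:15", "2019-01-01 08:45",
    ]),
    "Area": ["A", "A", "A", "B", "B"],
})

df = df.sort_values(["Area", "Time"]).reset_index(drop=True)

# True where a row has the same area as the row before it (False on the first row).
same_area = df["Area"].eq(df["Area"].shift())

# True where the gap to the previous row is under an hour (NaT compares as False).
within_hour = df["Time"].diff() < pd.Timedelta(hours=1)

# Every row that does not continue the previous job opens a new one.
new_job = ~(same_area & within_hour)
df["Job no"] = new_job.cumsum()

print(df)
```

    As in the loop, rows are compared only with their immediate predecessor, so a chain of rows each less than an hour apart ends up in a single job.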
    

    With this step out of the way, it is a simple matter of assigning people to individual jobs:

    df = df.sort_values(['Job no']).reset_index(drop=True)
    df['Person'] = ""
    for job_no, group in df.groupby('Job no'):
        group_size = len(group)
        # Give the job to the first person who still has room for all of its rows
        for person in unique_people['Person']:
            person_count = (df['Person'] == person).sum()
            # The cap of 3 rows per person is hard-coded here
            if group_size <= (3 - person_count):
                df.loc[group.index, 'Person'] = person
                break
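
    The assignment step is a greedy first-fit: each job goes to the first person who still has room for all of its rows. A minimal sketch of that idea as a standalone function follows. The function name `assign_people`, the `capacity` parameter and the sample names are hypothetical and not part of the original answer; `capacity=3` mirrors the hard-coded 3 above.

```python
import pandas as pd

def assign_people(jobs, people, capacity=3):
    """Greedy first-fit: give each job to the first person with enough room left."""
    load = {person: 0 for person in people}  # rows assigned so far, per person
    assigned = pd.Series("", index=jobs.index, dtype=object)
    for job_no, rows in jobs.groupby("Job no"):
        for person in people:
            if len(rows) <= capacity - load[person]:
                assigned.loc[rows.index] = person
                load[person] += len(rows)
                break
    return assigned

# Hypothetical job table: four jobs of sizes 2, 1, 3 and 1.
jobs = pd.DataFrame({"Job no": [1, 1, 2, 3, 3, 3, 4]})
jobs["Person"] = assign_people(jobs, ["Person 1", "Person 2", "Person 3"])
print(jobs)
```

    Tracking the load in a dictionary avoids re-counting the 'Person' column on every iteration, but the order in which people are tried, and therefore the result, is the same as in the nested loop.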
    

    And finally, restore chronological order:

    df = df.sort_values(['Time']).reset_index(drop=True)
    print(df)
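
    Before relying on the output, it may be worth checking it against the rules. The sketch below uses a hypothetical result frame with the same 'Job no' and 'Person' columns. It counts rows per person, confirms that each job has a single owner, and lists jobs left blank, which happens in the code above when no person has enough room left.

```python
import pandas as pd

# Hypothetical result; job 4 could not be placed and keeps the empty string.
result = pd.DataFrame({
    "Job no": [1, 1, 2, 3, 3, 3, 4],
    "Person": ["Person 1", "Person 1", "Person 1",
               "Person 2", "Person 2", "Person 2", ""],
})

# Rows assigned to each person (blank rows excluded).
rows_per_person = result.loc[result["Person"] != "", "Person"].value_counts()

# Number of distinct owners per job; every job should have exactly one.
people_per_job = result.groupby("Job no")["Person"].nunique()

# Jobs that nobody had room for.
unassigned_jobs = result.loc[result["Person"] == "", "Job no"].unique().tolist()

print(rows_per_person)
print(unassigned_jobs)
```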
    

    I've tried to write this in a way that is easy to follow, so there may well be efficiencies to be made. The aim, however, was to set out the logic used.

    This code gives the expected results on both data sets, so I hope it answers your question.
