This question is related to rostering or staffing. I'm trying to assign various jobs to individuals (employees). Using the df below,
`[Person]` =
In writing my other answer, I slowly came around to the idea that the OP's algorithm might be easier to implement with an approach that focuses on the jobs (which can be different), instead of the people (which are all the same). Here's a solution that uses the job-centric approach:

    import numpy as np
    import pandas as pd

    def assignJob(job, assignedix, areasPerPerson):
        # first fit: give the whole job to the first person with enough free slots
        for i in range(len(assignedix)):
            if (areasPerPerson - len(assignedix[i])) >= len(job):
                assignedix[i].extend(job)
                return True
        else:
            # for/else: only reached if nobody had room for the whole job
            return False

    def allocatePeople(df, areasPerPerson=3):
        areas = df['Area'].values
        times = pd.to_datetime(df['Time']).values
        peopleUniq = df['Person'].unique()
        npeople = int(np.ceil(areas.size / float(areasPerPerson)))

        # search for repeated areas. Mark them if the next repeat occurs within an hour
        # (ixrep[i] is the row of the next repeat of areas[i], or 0 if there is none)
        ixrep = np.argmax(np.triu(areas.reshape(-1, 1) == areas, k=1), axis=1)
        holds = np.zeros(areas.size, dtype=bool)
        holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')

        # group row indices into jobs: runs of held rows for the same area stay together
        jobs = []
        _jobdict = {}
        for i, (area, hold) in enumerate(zip(areas, holds)):
            if hold:
                _jobdict[area] = job = _jobdict.get(area, []) + [i]
                if len(job) == areasPerPerson:
                    jobs.append(_jobdict.pop(area))
            elif area in _jobdict:
                jobs.append(_jobdict.pop(area) + [i])
            else:
                jobs.append([i])
        jobs.sort()

        assignedix = [[] for i in range(npeople)]
        for job in jobs:
            if not assignJob(job, assignedix, areasPerPerson):
                # break the job up and try again
                for subjob in ([sj] for sj in job):
                    assignJob(subjob, assignedix, areasPerPerson)

        # write the result into a copy; the row positions are used as labels here,
        # so this assumes df has the default 0..n-1 index
        df = df.copy()
        for i, aix in enumerate(assignedix):
            df.loc[aix, 'Person'] = peopleUniq[i]
        return df

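The least obvious part of `allocatePeople` is probably the vectorized search for repeated areas, so here is that step on its own. This is only a sketch: the `areas` and `times` values below are made-up sample data (not from the question's dataframe), and the array names simply mirror the ones used in the function.

```python
import numpy as np

# hypothetical sample data: six rows, already sorted by time
areas = np.array(['A', 'B', 'A', 'C', 'B', 'A'])
times = np.array(['2019-01-01T08:00', '2019-01-01T08:10', '2019-01-01T08:30',
                  '2019-01-01T08:45', '2019-01-01T09:00', '2019-01-01T10:00'],
                 dtype='datetime64[m]')

# same[i, j] is True when rows i and j have the same area
same = areas.reshape(-1, 1) == areas

# keep only j > i (strictly later rows), then take the first True in each row;
# argmax returns 0 when a row has no True, and column 0 can never be a
# "later" row, so 0 safely means "no later repeat"
ixrep = np.argmax(np.triu(same, k=1), axis=1)
# ixrep is [2, 4, 5, 0, 0, 0]

# a row is "held" if its next repeat happens less than an hour later
nz = ixrep.nonzero()
holds = np.zeros(areas.size, dtype=bool)
holds[nz] = (times[ixrep[nz]] - times[nz]) < np.timedelta64(1, 'h')
# holds is [True, True, False, False, False, False]
```

Rows 0 and 1 are held because their areas come up again 30 and 50 minutes later. Row 2 is not, since the next `'A'` is 90 minutes away. One thing to keep in mind is that the `same` matrix is n-by-n, so memory use for this step grows quadratically with the number of rows.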
This version of `allocatePeople` has also been extensively tested and passes all of the same checks described in my other answer.
It does have more looping than my other solution, so it is likely to be slightly less efficient (though it'll only matter if your dataframe is very large, say `1e6` rows and up). On the other hand, it is somewhat shorter and, I think, more straightforward and easy to understand.
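To make the job-centric idea a bit more concrete, here is the packing step in isolation on a toy input. Treat it as a sketch rather than a drop-in replacement: `assign_job` and `pack_jobs` are hypothetical names that mirror what `assignJob` and the final loop in `allocatePeople` do, and the `jobs` list is invented for illustration.

```python
def assign_job(job, assigned, capacity):
    """First fit: put the whole job on the first person with enough free slots."""
    for slots in assigned:
        if capacity - len(slots) >= len(job):
            slots.extend(job)
            return True
    return False


def pack_jobs(jobs, npeople, capacity):
    """Assign each job (a list of row indices) to one of npeople people."""
    assigned = [[] for _ in range(npeople)]
    for job in sorted(jobs):
        if not assign_job(job, assigned, capacity):
            # nobody can take the whole job, so hand out its rows one at a time
            for row in job:
                assign_job([row], assigned, capacity)
    return assigned


# three two-row jobs, two people, three rows per person (hypothetical input)
jobs = [[0, 1], [2, 3], [4, 5]]
assigned = pack_jobs(jobs, npeople=2, capacity=3)
# assigned is [[0, 1, 4], [2, 3, 5]]
```

The first two jobs each go to one person intact. The third job no longer fits anywhere as a unit, so it gets split across the two remaining free slots. Because a job is only ever added when there is room for it, no person can end up with more than `capacity` rows.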