In Python how do I find all the missing days in a sorted list of dates?
Sort the list of dates and iterate over it, remembering the previous entry. If the difference between the previous and current entry is more than one day, you have missing days.
Here's one way to implement it:
from datetime import date, timedelta
from itertools import tee, izip
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
b.next()
return izip(a, b)
def missing_dates(dates):
for prev, curr in pairwise(sorted(dates)):
i = prev
while i + timedelta(1) < curr:
i += timedelta(1)
yield i
dates = [ date(2010, 1, 8),
date(2010, 1, 2),
date(2010, 1, 5),
date(2010, 1, 1),
date(2010, 1, 7) ]
for missing in missing_dates(dates):
print missing
Output:
2010-01-03
2010-01-04
2010-01-06
Performance is O(n*log(n)) where n is the number of days in the span when the input is unsorted. As your list is already sorted, it will run in O(n).
A good way of getting this done in Python is as follows. You need not worry about efficiency unless you have dates from multiple years in your list and this code always needs to run as per user interaction and yield output immediately.
Create a function that gives you all the dates from start_date
to end_date
. And use it.
import datetime
def get_dates(start_date, end_date):
span_between_dates = range(end_date - start_date).days
for index in span_between_dates + 1:
# +1 is to make start and end dates inclusive.
yield start_date + datetime.timedelta(index)
my_date_list = ['2017-03-05', '2017-03_07', ...]
# Edit my_date_list as per your requirement.
start_date = min(my_date_list)
end_date = max(my_date_list)
for current_date in get_dates(start_date, end_date)
if date not in my_date_list:
print date
get_dates
function should be defined.
my_other_date_list = [] # your other date range
start_date = min(my_date_list)
end_date = max(my_date_list)
for current_date in get_dates(start_date, end_date)
if (date in my_date_range) and (date in my_other_date_list):
print ('overlapping dates between 2 lists:')
print date
elif (date in my_date_range) and (date not in my_other_date_list):
print ('missing dates:')
print date
using sets
>>> from datetime import date, timedelta
>>> d = [date(2010, 2, 23), date(2010, 2, 24), date(2010, 2, 25),
date(2010, 2, 26), date(2010, 3, 1), date(2010, 3, 2)]
>>> date_set = set(d[0] + timedelta(x) for x in range((d[-1] - d[0]).days))
>>> missing = sorted(date_set - set(d))
>>> missing
[datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
>>>
USING A FOR LOOP
The imports you'll need:
import datetime
from datetime import date, timedelta
Let's say you have a sorted list called dates
with several missing dates in it.
First select the first and last date:
start_date = dates[0]
end_date = dates[len(dates)-1]
Than count the number of days between these two dates:
numdays = (end_date - start_date).days
Than create a new list with all dates between start_date and end_date:
all_dates = []
for x in range (0, (numdays+1)):
all_dates.append(start_date + datetime.timedelta(days = x))
Than check with dates are in all_dates
but not in dates
by using a for loop with range and adding these dates to dates_missing:
dates_missing = []
for i in range (0, len(all_dates)):
if (all_dates[i] not in dates):
dates_missing.append(all_dates[i])
else:
pass
Now you'll have a list called dates_missing
with all the missing dates.
import datetime
DAY = datetime.timedelta(days=1)
# missing dates: a list of [start_date, end)
missing = [(d1+DAY, d2) for d1, d2 in zip(dates, dates[1:]) if (d2 - d1) > DAY]
def date_range(start_date, end, step=DAY):
d = start_date
while d < end:
yield d
d += step
missing_dates = [d for d1, d2 in missing for d in date_range(d1, d2)]
>>> from datetime import datetime, timedelta
>>> date_list = [datetime(2010, 2, 23),datetime(2010, 2, 24),datetime(2010, 2, 25),datetime(2010, 2, 26),datetime(2010, 3, 1),datetime(2010, 3, 2)]
>>>
>>> date_set=set(date_list) # for faster membership tests than list
>>> one_day = timedelta(days=1)
>>>
>>> test_date = date_list[0]
>>> missing_dates=[]
>>> while test_date < date_list[-1]:
... if test_date not in date_set:
... missing_dates.append(test_date)
... test_date += one_day
...
>>> print missing_dates
[datetime.datetime(2010, 2, 27, 0, 0), datetime.datetime(2010, 2, 28, 0, 0)]
This also works for datetime.date
objects, but the OP says the list is datetime.datetime
objects