Split a list of dates into subsets of consecutive dates

Posted by 我们两清 on 2020-08-27 07:16:05

Question


I've got an array of dates that can contain multiple date ranges in it.

dates = [
 '2020-01-01',
 '2020-01-02',
 '2020-01-03',
 '2020-01-06',
 '2020-01-07',
 '2020-01-08'
]

In this example, the list contains two separate ranges of consecutive dates (2020-01-01 to 2020-01-03 and 2020-01-06 to 2020-01-08).

I'm trying to figure out how to loop through this list and find all the consecutive date ranges.

One of the articles I'm looking at (How to detect if dates are consecutive in Python?) seems to have a good approach, but I'm struggling to apply its logic to my use case.


Answer 1:


The more-itertools package has a function called consecutive_groups that does this for you.

Or you can view its source code and copy its approach:

from datetime import datetime
from itertools import groupby
from operator import itemgetter

def consecutive_groups(iterable, ordering=lambda x: x):
    for k, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
        yield map(itemgetter(1), g)

for g in consecutive_groups(dates, lambda x: datetime.strptime(x, '%Y-%m-%d').toordinal()):
    print(list(g))

['2020-01-01', '2020-01-02', '2020-01-03']
['2020-01-06', '2020-01-07', '2020-01-08']
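
The grouping key is the least obvious part of this approach. Within a run of consecutive days, the position in the list and the date's ordinal both increase by one at each step, so their difference stays constant; a gap changes it. A minimal sketch that prints those keys for the question's list (the names `ordinals` and `keys` are only illustrative):

```python
from datetime import datetime

dates = [
    '2020-01-01', '2020-01-02', '2020-01-03',
    '2020-01-06', '2020-01-07', '2020-01-08',
]

# Day number of each date (0001-01-01 is day 1)
ordinals = [datetime.strptime(d, '%Y-%m-%d').toordinal() for d in dates]

# Index minus ordinal: constant inside a consecutive run, different after a gap
keys = [i - n for i, n in enumerate(ordinals)]

for d, k in zip(dates, keys):
    print(d, k)

# The first three dates share one key and the last three share another
print(len(set(keys)))  # 2
```

Note that this relies on the list being sorted and free of duplicates: a repeated date advances the index but not the ordinal, so it would start a new group.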



Answer 2:


This assumes that single-date "ranges" are still represented by 2 dates:

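from datetime import datetime

def makedate(s):
    # Parse a 'YYYY-MM-DD' string into a datetime
    return datetime.strptime(s, "%Y-%m-%d")

def splitIntoRanges(dates):
    # Assumes dates is non-empty and sorted in ascending order
    ranges = []
    start_s = last_s = dates[0]
    last = makedate(start_s)
    for curr_s in dates[1:]:
        curr = makedate(curr_s)
        # A gap of more than one day closes the current range
        if (curr - last).days > 1:
            ranges.append((start_s, last_s))
            start_s = curr_s
        last_s = curr_s
        last = curr
    # Close the final range
    return ranges + [(start_s, last_s)]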
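
Because every range comes back as a (start, end) pair, a date with no neighbours appears as a pair whose two elements are equal. A small sketch of how a caller might display such pairs; `format_range` and the sample `ranges` list are hypothetical and not part of the answer above:

```python
def format_range(start, end):
    """Render a (start, end) pair, collapsing single-date ranges."""
    # A single-date range has identical start and end strings
    if start == end:
        return start
    return f'{start} to {end}'

# Sample data in the shape splitIntoRanges returns: a list of (start, end) string pairs
ranges = [('2020-01-01', '2020-01-03'), ('2020-01-05', '2020-01-05')]

labels = [format_range(s, e) for s, e in ranges]
print(labels)  # ['2020-01-01 to 2020-01-03', '2020-01-05']
```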



Answer 3:


I took a similar, though definitely not as elegant, approach to @Scott's:

from datetime import datetime

ranges = []

# Parse the strings into date objects so they can be subtracted from each other
dates = [datetime.strptime(date, '%Y-%m-%d').date() for date in dates]
start = dates[0]

for i in range(1, len(dates)):
    # A gap of more than one day closes the current range
    if (dates[i] - dates[i - 1]).days > 1:
        end = dates[i - 1]
        ranges.append(f'{start} to {end}')
        start = dates[i]

# Close the final range; this also covers a trailing date with no neighbours
ranges.append(f'{start} to {dates[-1]}')



Answer 4:


I found the key to my solution in a second post and pieced it together.

There are two parts to my issue:

  1. How do I represent a list of dates in an effective manner?

Answer: https://stackoverflow.com/a/9589929/2150673

import datetime

pto = [
    '2020-01-03',
    '2020-01-08',
    '2020-01-02',
    '2020-01-07',
    '2020-01-01',
    '2020-01-06'
]

ordinal_dates = [datetime.datetime.strptime(i, '%Y-%m-%d').toordinal() for i in pto]

  2. Once you have a list of dates in integer representation, you can simply look for consecutive integers, get the lower and upper bounds of each range, and then convert back to yyyy-mm-dd format.

Answer: https://stackoverflow.com/a/48106843

def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
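
To see how this works, it may help to trace the intermediate values on a small input. The sketch below repeats the same steps with made-up numbers: each gap contributes the end of one range and the start of the next, so placing the flattened gaps between the first and last element yields an alternating start, end, start, end sequence.

```python
nums = sorted(set([7, 1, 2, 3, 6, 8]))
print(nums)   # [1, 2, 3, 6, 7, 8]

# Neighbouring pairs that are more than one apart mark a break between ranges
gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
print(gaps)   # [[3, 6]]

# First element, then the flattened gaps, then the last element
flat = nums[:1] + sum(gaps, []) + nums[-1:]
print(flat)   # [1, 3, 6, 8]

# Zipping one iterator with itself consumes two items per tuple
edges = iter(flat)
pairs = list(zip(edges, edges))
print(pairs)  # [(1, 3), (6, 8)]
```

With a single-element input, the first and last slices both return that element, so the result is one pair with equal bounds; with an empty input, both slices are empty and the result is an empty list.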

My complete function:

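import datetime

def get_date_ranges(pto_list: list) -> list:
    # Convert each 'YYYY-MM-DD' string to its integer day number
    pto_dates = [datetime.datetime.strptime(i, '%Y-%m-%d').toordinal() for i in pto_list]
    nums = sorted(set(pto_dates))
    # Neighbouring day numbers more than one apart mark a break between ranges
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    # First day, flattened gaps, last day: an alternating start/end sequence
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    ordinal_ranges = list(zip(edges, edges))
    # Convert the bounds of each range back to 'YYYY-MM-DD' strings
    date_bounds = []
    for start, end in ordinal_ranges:
        date_bounds.append((
            datetime.datetime.fromordinal(start).strftime('%Y-%m-%d'),
            datetime.datetime.fromordinal(end).strftime('%Y-%m-%d')
        ))
    return date_bounds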



Answer 5:


You can find all the consecutive date ranges, append them to a list of lists, and access each range by its index, but I prefer using keys in a dictionary for readability.

Here is how (please read the comments):

from datetime import datetime

dates = [datetime.strptime(d, "%Y-%m-%d") for d in dates]  # parse each string into a datetime
date_ints = [d.toordinal() for d in dates]  # toordinal() returns the day number as an integer, counting 0001-01-01 as day 1
ranges = {}
arange = []
prev = None
j = 1
for d, i in zip(dates, date_ints):  # iterate through the dates together with their integers
    if prev is not None and i - 1 != prev:
        # integers are no longer in sequence: store the finished range and start a new one
        ranges[f'Range{j}'] = tuple(arange)
        arange = []
        j += 1
    arange.append(d.strftime("%Y-%m-%d"))
    prev = i
ranges[f'Range{j}'] = tuple(arange)  # store the last range
print(ranges)
print(ranges['Range1'])  # access a range based on the associated key
print(ranges['Range2'])

outputs:

{'Range1': ('2020-01-01', '2020-01-02', '2020-01-03'), 'Range2': ('2020-01-06', '2020-01-07', '2020-01-08')}
('2020-01-01', '2020-01-02', '2020-01-03')
('2020-01-06', '2020-01-07', '2020-01-08')
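
The list-of-lists alternative mentioned at the start of this answer can be sketched as follows. This is one possible variant rather than the code above; the function name `group_consecutive` is an assumption, and it sorts and de-duplicates the input first.

```python
from datetime import datetime, timedelta

def group_consecutive(date_strings):
    """Group 'YYYY-MM-DD' strings into lists of consecutive days."""
    # Parse, de-duplicate and sort so neighbours can be compared directly
    days = sorted({datetime.strptime(s, '%Y-%m-%d').date() for s in date_strings})
    groups = []
    for day in days:
        # Extend the current group if this day directly follows its last day
        if groups and day - groups[-1][-1] == timedelta(days=1):
            groups[-1].append(day)
        else:
            groups.append([day])
    # Convert back to strings for output
    return [[d.isoformat() for d in group] for group in groups]

dates = ['2020-01-01', '2020-01-02', '2020-01-03',
         '2020-01-06', '2020-01-07', '2020-01-08']
groups = group_consecutive(dates)
print(groups[0])  # ['2020-01-01', '2020-01-02', '2020-01-03']
print(groups[1])  # ['2020-01-06', '2020-01-07', '2020-01-08']
```

Ranges are then accessed by position (`groups[0]`, `groups[1]`) instead of by a key such as 'Range1'.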


Source: https://stackoverflow.com/questions/59774541/split-a-list-of-dates-into-subsets-of-consecutive-dates
