How do I find missing dates in a list of sorted dates?

前端 未结 8 1337
孤街浪徒
孤街浪徒 2021-02-04 04:15

In Python how do I find all the missing days in a sorted list of dates?

相关标签:
8条回答
  • 2021-02-04 04:26

    Sort the list of dates and iterate over it, remembering the previous entry. If the difference between the previous and current entry is more than one day, you have missing days.

    Here's one way to implement it:

    from datetime import date, timedelta
    from itertools import tee, izip
    
    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        b.next()
        return izip(a, b)
    
    def missing_dates(dates):
        for prev, curr in pairwise(sorted(dates)):
            i = prev
            while i + timedelta(1) < curr:
                i += timedelta(1)
                yield i
    
    dates = [ date(2010, 1, 8),
              date(2010, 1, 2),
              date(2010, 1, 5),
              date(2010, 1, 1),
              date(2010, 1, 7) ]
    
    for missing in missing_dates(dates):
        print missing
    

    Output:

    2010-01-03
    2010-01-04
    2010-01-06
    

    Performance is O(n*log(n)) where n is the number of days in the span when the input is unsorted. As your list is already sorted, it will run in O(n).

    0 讨论(0)
  • 2021-02-04 04:26

    A good way of getting this done in Python is as follows. You need not worry about efficiency unless you have dates from multiple years in your list and this code always needs to run as per user interaction and yield output immediately.

    1. Get missing dates from one list (sorted or not)

    Create a function that gives you all the dates from start_date to end_date. And use it.

    import datetime
    
    def get_dates(start_date, end_date):
        span_between_dates = range(end_date - start_date).days
        for index in span_between_dates + 1:
            # +1 is to make start and end dates inclusive.
            yield start_date + datetime.timedelta(index)
    
    my_date_list = ['2017-03-05', '2017-03_07', ...]
    # Edit my_date_list as per your requirement.
    start_date = min(my_date_list)
    end_date = max(my_date_list)
    for current_date in get_dates(start_date, end_date)
        if date not in my_date_list:
            print date
    
    1. Get missing or overlapping dates between two date ranges.

    get_dates function should be defined.

    my_other_date_list = []  # your other date range
    start_date = min(my_date_list)
    end_date = max(my_date_list)
    for current_date in get_dates(start_date, end_date)
        if (date in my_date_range) and (date in my_other_date_list):
            print ('overlapping dates between 2 lists:')
            print date
        elif (date in my_date_range) and (date not in my_other_date_list):
            print ('missing dates:')
            print date
    
    0 讨论(0)
  • 2021-02-04 04:27

    using sets

    >>> from datetime import date, timedelta
    >>> d = [date(2010, 2, 23), date(2010, 2, 24), date(2010, 2, 25),
             date(2010, 2, 26), date(2010, 3, 1), date(2010, 3, 2)]
    >>> date_set = set(d[0] + timedelta(x) for x in range((d[-1] - d[0]).days))
    >>> missing = sorted(date_set - set(d))
    >>> missing
    [datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
    >>> 
    
    0 讨论(0)
  • 2021-02-04 04:30

    USING A FOR LOOP

    The imports you'll need:

    import datetime
    from datetime import date, timedelta
    

    Let's say you have a sorted list called dates with several missing dates in it.

    First select the first and last date:

    start_date = dates[0]
    end_date = dates[len(dates)-1]
    

    Than count the number of days between these two dates:

    numdays = (end_date - start_date).days
    

    Than create a new list with all dates between start_date and end_date:

    all_dates = []
    for x in range (0, (numdays+1)):
    all_dates.append(start_date + datetime.timedelta(days = x))
    

    Than check with dates are in all_dates but not in dates by using a for loop with range and adding these dates to dates_missing:

    dates_missing = []
    
    for i in range (0, len(all_dates)):
       if (all_dates[i] not in dates):
           dates_missing.append(all_dates[i])
       else:
           pass
    

    Now you'll have a list called dates_missing with all the missing dates.

    0 讨论(0)
  • 2021-02-04 04:36
    import datetime
    
    DAY = datetime.timedelta(days=1)
    # missing dates: a list of [start_date, end)
    missing = [(d1+DAY, d2) for d1, d2 in zip(dates, dates[1:]) if (d2 - d1) > DAY]
    
    def date_range(start_date, end, step=DAY):
        d = start_date
        while d < end:
            yield d
            d += step
    
    missing_dates = [d for d1, d2 in missing for d in date_range(d1, d2)]
    
    0 讨论(0)
  • 2021-02-04 04:38
    >>> from datetime import datetime, timedelta
    >>> date_list = [datetime(2010, 2, 23),datetime(2010, 2, 24),datetime(2010, 2, 25),datetime(2010, 2, 26),datetime(2010, 3, 1),datetime(2010, 3, 2)]
    >>> 
    >>> date_set=set(date_list)         # for faster membership tests than list
    >>> one_day = timedelta(days=1)
    >>> 
    >>> test_date = date_list[0]
    >>> missing_dates=[]
    >>> while test_date < date_list[-1]:
    ...     if test_date not in date_set:
    ...         missing_dates.append(test_date)
    ...     test_date += one_day
    ... 
    >>> print missing_dates
    [datetime.datetime(2010, 2, 27, 0, 0), datetime.datetime(2010, 2, 28, 0, 0)]
    

    This also works for datetime.date objects, but the OP says the list is datetime.datetime objects

    0 讨论(0)
提交回复
热议问题