Efficient way to find missing elements in an integer sequence

鱼传尺愫 2020-12-01 04:32

Suppose we have two items missing in a sequence of consecutive integers and the missing elements lie between the first and last elements. I did write a code that does accomp

16 answers
  • 2020-12-01 05:14

    With this code you can find any missing values in a sequence, except for the last number of the sequence. You only need to put your data into an Excel file with a column named "numbers".

    import pandas as pd
    import numpy as np
    
    data = pd.read_excel("numbers.xlsx")
    
    # Sort so that neighbouring rows can be compared
    data_sort=data.sort_values('numbers',ascending=True)
    missing=[]
    
    for i in range (len(data_sort)-1):
        # A difference greater than 1 means at least one value was skipped
        gap=data_sort['numbers'].iloc[i+1]-data_sort['numbers'].iloc[i]
        if gap>1:
            # Collect every value strictly between the two neighbours
            for j in range (1,gap):
                missing.append(data_sort['numbers'].iloc[i]+j)
    print(np.sort(missing))
    
  • 2020-12-01 05:16

    A value is missing wherever the difference between two consecutive numbers is greater than 1:

    >>> L = [10,11,13,14,15,16,17,18,20]
    >>> [x + 1 for x, y in zip(L[:-1], L[1:]) if y - x > 1]
    [12, 19]
    

    Note: this is Python 3. In Python 2, use itertools.izip instead of zip.

    Improved version for more than one value missing in a row:

    >>> import itertools as it
    >>> L = [10,11,14,15,16,17,18,20] # 12, 13 and 19 missing
    >>> [x + diff for x, y in zip(it.islice(L, None, len(L) - 1),
                                  it.islice(L, 1, None)) 
         for diff in range(1, y - x)]
    [12, 13, 19]
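
The same pairwise comparison can be wrapped in a generator, which avoids building slices and yields the missing values lazily. This is only a minimal sketch, assuming the input is sorted and has no duplicates; the name `iter_missing` is made up for this illustration:

```python
def iter_missing(sorted_values):
    """Yield every integer skipped between neighbouring items of a sorted iterable."""
    iterator = iter(sorted_values)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input: nothing can be missing
    for current in iterator:
        # range() is empty when the two neighbours are consecutive
        yield from range(previous + 1, current)
        previous = current


print(list(iter_missing([10, 11, 14, 15, 16, 17, 18, 20])))  # [12, 13, 19]
```

Because it consumes any iterable one item at a time, it should also work on input that is streamed rather than held in a list.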
    
  • 2020-12-01 05:21
    def missing_elements(inlist):
        if len(inlist) <= 1:
            return []
        else:
            if inlist[1]-inlist[0] > 1:
                return [inlist[0]+1] + missing_elements([inlist[0]+1] + inlist[1:])
            else:
                return missing_elements(inlist[1:])
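
The recursion above advances one element per call, so a long list can hit Python's recursion limit. A possible variant, sketched here as a different technique rather than the original answer's method, recurses on halves and skips any half whose value span equals its index span. It assumes a sorted list of distinct integers; `missing_by_halving` is a hypothetical name:

```python
def missing_by_halving(values, lo=0, hi=None):
    """Return the integers missing from a sorted list of distinct integers."""
    if hi is None:
        hi = len(values) - 1
    if hi <= lo:
        return []  # zero or one element: no gap to inspect
    # If the value span equals the index span, nothing is missing in values[lo..hi].
    if values[hi] - values[lo] == hi - lo:
        return []
    if hi - lo == 1:
        # Two neighbours with a gap between them: everything in between is missing.
        return list(range(values[lo] + 1, values[hi]))
    mid = (lo + hi) // 2
    # Both halves share index mid, so the pair straddling the split is not lost.
    return missing_by_halving(values, lo, mid) + missing_by_halving(values, mid, hi)


print(missing_by_halving([10, 11, 13, 14, 15, 16, 17, 18, 20]))  # [12, 19]
```

The recursion depth grows with the logarithm of the list length, and when only a few values are missing most of the list is never inspected.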
    
  • 2020-12-01 05:22

    I stumbled on this looking for a different kind of efficiency -- given a list of unique serial numbers, possibly very sparse, yield the next available serial number, without creating the entire set in memory. (Think of an inventory where items come and go frequently, but some are long-lived.)

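    def get_serial(string_ids, longtail=False):
      # sorted() also works in Python 3, where map() no longer returns a list
      int_list = sorted(map(int, string_ids))
      n = len(int_list)
      # start from the largest known id so longtail also works with a single id
      # (assumption: with no ids at all, numbering continues after 0)
      nextserial = int_list[-1] if int_list else 0
      for i in range(0, n-1):
        # yield every unused number between two neighbouring ids
        nextserial = int_list[i]+1
        while nextserial < int_list[i+1]:
          yield nextserial
          nextserial+=1
      # all gaps are used up: keep counting past the largest id
      while longtail:
        nextserial+=1
        yield nextserial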
    [...]
    def main():
      [...]
      serialgenerator = get_serial(list1, longtail=True)
      while somecondition:
        newserial = next(serialgenerator)
    

    (Input is a list of string representations of integers, yield is an integer, so not completely generic code. longtail provides extrapolation if we run out of range.)

    There's also an answer to a similar question which suggests using a bitarray for efficiently handling a large sequence of integers.

    Some versions of my code used functions from itertools but I ended up abandoning that approach.
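
The bit-array idea mentioned above can be approximated with the standard library. The sketch below uses a `bytearray` as a stand-in (one byte per flag, so it is not the `bitarray` package itself), and `missing_with_flags` is a name invented for this example:

```python
def missing_with_flags(values):
    """Return integers absent between min(values) and max(values), using a flag array."""
    if not values:
        return []
    low, high = min(values), max(values)
    # One byte per candidate integer; a real bit array would use one bit instead.
    seen = bytearray(high - low + 1)
    for value in values:
        seen[value - low] = 1
    return [low + offset for offset, flag in enumerate(seen) if not flag]


print(missing_with_flags([20, 10, 11, 13, 14, 15, 16, 17, 18]))  # [12, 19]
```

Memory grows with the range of the values rather than with their count, so this is likely a poor fit for very sparse serial numbers.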

  • 2020-12-01 05:28

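    Using the scipy library: the sum and the product of the list give two equations in the two missing values x and y (x + y = delta, x*y = over), which fsolve solves numerically. This works only when exactly two values are missing and the full range (1 to 10 here) is known: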

    from scipy.optimize import fsolve
    
    def mullist(a):
        # product of all elements
        mul = 1
        for i in a:
            mul = mul*i
        return mul
    
    a = [1,2,3,4,5,6,9,10]
    s = sum(a)
    so = sum(range(1,11))
    mulo = mullist(range(1,11))
    mul = mullist(a)
    over = mulo//mul  # exact integer division: product of the two missing values
    delta = so -s     # sum of the two missing values
    # y = so - s -x
    # xy = mulo/mul
    def func(x):
        return (so -s -x)*x-over
    
    # fsolve returns an array; take its single element before rounding
    x = int(round(fsolve(func, 0)[0]))
    print(x, delta - x)
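
Since x + y and x*y are both known, the two values are the roots of a quadratic, which can be solved exactly without a numerical solver. A minimal sketch under the same assumptions (exactly two values missing, known range of positive integers); `two_missing` is a hypothetical helper, and `math.prod` needs Python 3.8 or later:

```python
import math


def two_missing(present, low, high):
    """Return the two integers of low..high (inclusive) absent from `present`."""
    full = range(low, high + 1)
    total = sum(full) - sum(present)                 # x + y
    product = math.prod(full) // math.prod(present)  # x * y, exact integer division
    # x and y are the roots of t**2 - total*t + product = 0.
    root = math.isqrt(total * total - 4 * product)
    return (total - root) // 2, (total + root) // 2


print(two_missing([1, 2, 3, 4, 5, 6, 9, 10], 1, 10))  # (7, 8)
```

Everything stays in integer arithmetic, so there is no rounding step, although the products become very large for long ranges.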
    

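    Timing it (note that -s passes the script as setup code, so the timed statement is the default pass; the figure below reflects an empty loop rather than the solver):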

    $ python -mtimeit -s "$(cat with_scipy.py)" 
    
    7 8
    
    100000000 loops, best of 3: 0.0181 usec per loop
    

    Another option is to use sets (the sets module is deprecated since Python 2.6 and gone in Python 3, so the built-in set is used here):

    >>> a = set(range(1,11))
    >>> b = set([1,2,3,4,5,6,9,10])
    >>> a-b
    {8, 7}
    

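    And the timing, measured with the same -s invocation (which times only the default pass statement, not the set difference):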

    Set([8, 7])
    100000000 loops, best of 3: 0.0178 usec per loop
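
To time the work itself, the statement has to be passed as the timed code and only the preparation as setup. A rough sketch with the standard `timeit` module; the variable names and the repetition count are arbitrary choices for this example, and no particular result is implied:

```python
import timeit

setup = "full = set(range(1, 11)); present = [1, 2, 3, 4, 5, 6, 9, 10]"
statement = "full - set(present)"

# number= is how many times the statement runs; setup runs once and is not timed.
seconds = timeit.timeit(statement, setup=setup, number=100_000)
print(f"{seconds / 100_000 * 1e6:.3f} usec per loop")
```

On the command line the equivalent shape is `python -m timeit -s "<setup>" "<statement>"`, with the statement as the final argument.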
    
  • My take was to avoid explicit loops and use set operations instead (the input list must be sorted):

    def find_missing(in_list):
        complete_set = set(range(in_list[0], in_list[-1] + 1))
        return complete_set - set(in_list)
    
    def main():
        sample = [10, 11, 13, 14, 15, 16, 17, 18, 20]
        print(find_missing(sample))
    
    if __name__ == "__main__":
        main()
    
    # => {19, 12}
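
The set version relies on the first and last items being the smallest and largest, and it returns an unordered set. One possible variation, sketched here with the made-up name `find_missing_sorted`, takes the bounds from `min()` and `max()` and sorts the result:

```python
def find_missing_sorted(values):
    """Return the missing integers in ascending order; input order does not matter."""
    if not values:
        return []
    present = set(values)
    # min()/max() replace in_list[0]/in_list[-1], so the input need not be sorted.
    return sorted(set(range(min(present), max(present) + 1)) - present)


print(find_missing_sorted([20, 13, 10, 11, 14, 15, 16, 17, 18]))  # [12, 19]
```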
    