Efficient way to find missing elements in an integer sequence

前端未结


Suppose we have two items missing in a sequence of consecutive integers and the missing elements lie between the first and last elements. I did write a code that does accomp


16 answers

再見小時候

2020-12-01 05:14

With this code you can find any missing values in a sequence, except the last number. You only need to put your data into an Excel file with a column named "numbers".

import pandas as pd
import numpy as np

# Expects an Excel file with a column named "numbers"
data = pd.read_excel("numbers.xlsx")

data_sort = data.sort_values('numbers', ascending=True)
missing = []

# Walk adjacent pairs; a difference greater than 1 means values are missing
for i in range(len(data_sort) - 1):
    low = int(data_sort['numbers'].iloc[i])
    high = int(data_sort['numbers'].iloc[i + 1])
    if high - low > 1:
        # Collect every integer strictly between the two neighbours
        missing.extend(range(low + 1, high))
print(np.sort(missing))


猫巷女王i

2020-12-01 05:16

A value is missing wherever the difference between two consecutive numbers is greater than 1:

>>> L = [10,11,13,14,15,16,17,18,20]
>>> [x + 1 for x, y in zip(L[:-1], L[1:]) if y - x > 1]
[12, 19]

Note: this is Python 3. In Python 2, use itertools.izip.

Improved version for more than one value missing in a row:

>>> import itertools as it
>>> L = [10,11,14,15,16,17,18,20] # 12, 13 and 19 missing
>>> [x + diff for x, y in zip(it.islice(L, None, len(L) - 1),
                              it.islice(L, 1, None)) 
     for diff in range(1, y - x) if diff]
[12, 13, 19]
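
Since `range(1, y - x)` is already empty when there is no gap, the trailing `if diff` filter is not strictly needed. A minimal sketch of the same idea as a reusable generator, assuming the input is already sorted in ascending order (the name `iter_missing` is made up for illustration):

```python
def iter_missing(sorted_values):
    """Yield every integer absent between adjacent items of a sorted list."""
    for low, high in zip(sorted_values, sorted_values[1:]):
        # range() is empty when high - low <= 1, so no extra filter is needed
        yield from range(low + 1, high)


print(list(iter_missing([10, 11, 14, 15, 16, 17, 18, 20])))  # [12, 13, 19]
```

Being a generator, it produces missing values lazily, which may help when the gaps are large.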


走了就别回头了

2020-12-01 05:21

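def missing_elements(inlist):
    # Assumes inlist is sorted in ascending order
    if len(inlist) <= 1:
        return []
    else:
        if inlist[1]-inlist[0] > 1:
            # Gap found: record the next missing value, then recurse with it
            # placed at the front so longer gaps are filled one step at a time
            return [inlist[0]+1] + missing_elements([inlist[0]+1] + inlist[1:])
        else:
            return missing_elements(inlist[1:])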


爱一瞬间的悲伤

2020-12-01 05:22
I stumbled on this looking for a different kind of efficiency -- given a list of unique serial numbers, possibly very sparse, yield the next available serial number, without creating the entire set in memory. (Think of an inventory where items come and go frequently, but some are long-lived.)
```
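def get_serial(string_ids, longtail=False):
  # sorted() returns a list on both Python 2 and 3 (map() is lazy on Python 3)
  int_list = sorted(map(int, string_ids))
  n = len(int_list)
  # Start from the largest known serial (0 for empty input, an assumption)
  # so the longtail loop is defined even when there are no pairs to scan
  nextserial = int_list[-1] if int_list else 0
  for i in range(0, n-1):
    nextserial = int_list[i]+1
    while nextserial < int_list[i+1]:
      yield nextserial
      nextserial+=1
  while longtail:
    nextserial+=1
    yield nextserial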
[...]
def main():
  [...]
  serialgenerator = get_serial(list1, longtail=True)
  while somecondition:
    newserial = next(serialgenerator)
```
(Input is a list of string representations of integers, yield is an integer, so not completely generic code. longtail provides extrapolation if we run out of range.)

There's also an answer to a similar question which suggests using a bitarray for efficiently handling a large sequence of integers.

Some versions of my code used functions from itertools but I ended up abandoning that approach.
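
The bitarray idea mentioned above can be roughly illustrated with the standard library alone. This is only a sketch, not the linked answer's code: it uses a plain `bytearray` as a stand-in for the third-party bitarray package, takes integers rather than strings, and the name `missing_with_bitmap` is hypothetical.

```python
def missing_with_bitmap(values):
    """Yield integers absent from values, between its smallest and largest item."""
    lo, hi = min(values), max(values)
    # One bit per candidate integer in [lo, hi]
    seen = bytearray((hi - lo) // 8 + 1)
    for v in values:
        offset = v - lo
        seen[offset >> 3] |= 1 << (offset & 7)
    for offset in range(hi - lo + 1):
        # A cleared bit means the integer never appeared in the input
        if not seen[offset >> 3] & (1 << (offset & 7)):
            yield lo + offset


print(list(missing_with_bitmap([10, 11, 13, 14, 15, 16, 17, 18, 20])))  # [12, 19]
```

The bitmap costs one bit per integer in the covered range, so it pays off for dense ranges and is a poor fit for very sparse ones.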

既然无缘

2020-12-01 05:28

Using scipy lib:

from scipy.optimize import fsolve

def mullist(a):
    # Product of all elements
    mul = 1
    for i in a:
        mul = mul*i
    return mul

a = [1,2,3,4,5,6,9,10]
s = sum(a)
so = sum(range(1,11))
mulo = mullist(range(1,11))
mul = mullist(a)
over = mulo//mul    # product of the two missing values (exact integer division)
delta = so - s      # sum of the two missing values
# y = so - s -x
# xy = mulo/mul
def func(x):
    return (so -s -x)*x-over

# fsolve returns an array, so take its first element before rounding
x = int(round(fsolve(func, 0)[0]))
print(x, delta - x)
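
The two equations in the comments (x + y from the sums, x * y from the products) make x and y the roots of t^2 - (x + y)t + xy = 0, so they can also be solved exactly with integer arithmetic. A minimal sketch under a few assumptions: exactly two values are missing, the full range contains no zero, and Python 3.8+ is available for `math.prod` and `math.isqrt`. The name `two_missing` is made up for illustration.

```python
import math


def two_missing(seq, lo, hi):
    """Return the two integers missing from seq, which should cover lo..hi."""
    full = range(lo, hi + 1)
    total = sum(full) - sum(seq)                   # x + y
    product = math.prod(full) // math.prod(seq)    # x * y
    # x and y are the roots of t^2 - total*t + product = 0
    root = math.isqrt(total * total - 4 * product)
    return (total - root) // 2, (total + root) // 2


print(two_missing([1, 2, 3, 4, 5, 6, 9, 10], 1, 10))  # (7, 8)
```

This avoids the floating-point solver, although the products still grow very quickly for long sequences.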

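Timing it (note that `-s` passes the script as setup code, so timeit measures its default `pass` statement rather than the script itself):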

$ python -mtimeit -s "$(cat with_scipy.py)" 

7 8

100000000 loops, best of 3: 0.0181 usec per loop

Another option is (using the built-in set type, since the sets module was removed in Python 3):

>>> a = set(range(1,11))
>>> b = set([1,2,3,4,5,6,9,10])
>>> a-b
{8, 7}

And the timing is:

Set([8, 7])
100000000 loops, best of 3: 0.0178 usec per loop
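
To measure the approaches themselves rather than a setup step, the work can be wrapped in functions and passed to `timeit.timeit` as the timed statement. A minimal sketch; the function names and the loop count are illustrative choices, and no particular timings are claimed:

```python
import timeit

FULL = range(1, 11)
SAMPLE = [1, 2, 3, 4, 5, 6, 9, 10]


def by_set_difference():
    # Build both sets and subtract, as in the set-based option above
    return set(FULL) - set(SAMPLE)


def by_membership_scan():
    # Scan the full range and keep whatever is not present
    present = set(SAMPLE)
    return [n for n in FULL if n not in present]


LOOPS = 10_000
for fn in (by_set_difference, by_membership_scan):
    seconds = timeit.timeit(fn, number=LOOPS)
    print(f"{fn.__name__}: {seconds / LOOPS * 1e6:.3f} usec per call")
```

Results depend on the machine and Python version, so the comparison is only meaningful when both functions are timed in the same run.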


不要未来只要你来

2020-12-01 05:29

My take was to use set operations instead of loops:

def find_missing(in_list):
    # Assumes in_list is sorted, so its first and last items bound the range
    complete_set = set(range(in_list[0], in_list[-1] + 1))
    return complete_set - set(in_list)

def main():
    sample = [10, 11, 13, 14, 15, 16, 17, 18, 20]
    print(find_missing(sample))

if __name__ == "__main__":
    main()

# => {19, 12}  (a set, so the display order is not guaranteed)
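
If the input may be unsorted, or a sorted result is wanted, a small variation is possible. This is a sketch rather than part of the answer above, and the name `find_missing_sorted` is hypothetical:

```python
def find_missing_sorted(values):
    """Return the missing integers in ascending order; input order is irrelevant."""
    if not values:
        return []
    present = set(values)
    # min/max replace the first/last lookups, so sorting is not required
    return [n for n in range(min(present), max(present) + 1) if n not in present]


print(find_missing_sorted([20, 10, 11, 13, 14, 15, 16, 17, 18]))  # [12, 19]
```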
