Suppose we have two items missing in a sequence of consecutive integers and the missing elements lie between the first and last elements. I did write a code that does accomp
With this code you can find any missing values in a sequence, except the last number. It in only required to input your data into excel file with column name "numbers".
import pandas as pd
import numpy as np
data = pd.read_excel("numbers.xlsx")
data_sort=data.sort_values('numbers',ascending=True)
index=list(range(len(data_sort)))
data_sort['index']=index
data_sort['index']=data_sort['index']+1
missing=[]
for i in range (len(data_sort)-1):
if data_sort['numbers'].iloc[i+1]-data_sort['numbers'].iloc[i]>1:
gap=data_sort['numbers'].iloc[i+1]-data_sort['numbers'].iloc[i]
numerator=1
for j in range (1,gap):
mis_value=data_sort['numbers'].iloc[i+1]-numerator
missing.append(mis_value)
numerator=numerator+1
print(np.sort(missing))
We found a missing value if the difference between two consecutive numbers is greater than 1
:
>>> L = [10,11,13,14,15,16,17,18,20]
>>> [x + 1 for x, y in zip(L[:-1], L[1:]) if y - x > 1]
[12, 19]
Note: Python 3. In Python 2 use itertools.izip
.
Improved version for more than one value missing in a row:
>>> import itertools as it
>>> L = [10,11,14,15,16,17,18,20] # 12, 13 and 19 missing
>>> [x + diff for x, y in zip(it.islice(L, None, len(L) - 1),
it.islice(L, 1, None))
for diff in range(1, y - x) if diff]
[12, 13, 19]
def missing_elements(inlist):
if len(inlist) <= 1:
return []
else:
if inlist[1]-inlist[0] > 1:
return [inlist[0]+1] + missing_elements([inlist[0]+1] + inlist[1:])
else:
return missing_elements(inlist[1:])
I stumbled on this looking for a different kind of efficiency -- given a list of unique serial numbers, possibly very sparse, yield the next available serial number, without creating the entire set in memory. (Think of an inventory where items come and go frequently, but some are long-lived.)
def get_serial(string_ids, longtail=False):
int_list = map(int, string_ids)
int_list.sort()
n = len(int_list)
for i in range(0, n-1):
nextserial = int_list[i]+1
while nextserial < int_list[i+1]:
yield nextserial
nextserial+=1
while longtail:
nextserial+=1
yield nextserial
[...]
def main():
[...]
serialgenerator = get_serial(list1, longtail=True)
while somecondition:
newserial = next(serialgenerator)
(Input is a list of string representations of integers, yield is an integer, so not completely generic code. longtail provides extrapolation if we run out of range.)
There's also an answer to a similar question which suggests using a bitarray for efficiently handling a large sequence of integers.
Some versions of my code used functions from itertools but I ended up abandoning that approach.
Using scipy
lib:
import math
from scipy.optimize import fsolve
def mullist(a):
mul = 1
for i in a:
mul = mul*i
return mul
a = [1,2,3,4,5,6,9,10]
s = sum(a)
so = sum(range(1,11))
mulo = mullist(range(1,11))
mul = mullist(a)
over = mulo/mul
delta = so -s
# y = so - s -x
# xy = mulo/mul
def func(x):
return (so -s -x)*x-over
print int(round(fsolve(func, 0))), int(round(delta - fsolve(func, 0)))
Timing it:
$ python -mtimeit -s "$(cat with_scipy.py)"
7 8
100000000 loops, best of 3: 0.0181 usec per loop
Other option is:
>>> from sets import Set
>>> a = Set(range(1,11))
>>> b = Set([1,2,3,4,5,6,9,10])
>>> a-b
Set([8, 7])
And the timing is:
Set([8, 7])
100000000 loops, best of 3: 0.0178 usec per loop
My take was to use no loops and set operations:
def find_missing(in_list):
complete_set = set(range(in_list[0], in_list[-1] + 1))
return complete_set - set(in_list)
def main():
sample = [10, 11, 13, 14, 15, 16, 17, 18, 20]
print find_missing(sample)
if __name__ == "__main__":
main()
# => set([19, 12])