Find gaps in a sequence of Strings

执念已碎 2020-12-20 17:45

I have a sequence of strings: 0000001, 0000002, 0000003, ... up to 2 million. They are not contiguous, meaning there are gaps. Say after 0000003 the next str

4 Answers
  • 2020-12-20 18:10

    You could sort the list of ids and then step through it once only:

    def find_gaps(ids):
        """Generate the gaps in the list of ids."""
        j = 1
        for id_i in sorted(ids):
            while True:
                id_j = '%07d' % j
                j += 1
                if id_j >= id_i:
                    break
                yield id_j
    
    >>> list(find_gaps(["0000001", "0000003", "0000006"]))
    ['0000002', '0000004', '0000005']
    

    If the input list is already in order, you can drop the call to sorted (though it does little harm: Python's adaptive mergesort, Timsort, runs in O(n) on a list that is already sorted).
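
    For input that is already in ascending order, the same single pass can be sketched on integers instead of strings. This is only an illustration: the name find_gaps_sorted and the width parameter are assumptions, not part of the answer above, and the max() call is there so that a repeated id does not make the counter skip a value.

```python
def find_gaps_sorted(sorted_ids, width=7):
    """Yield the missing ids from an iterable that is already in ascending order."""
    expected = 1  # ids are assumed to start at 1, as in the question
    for s in sorted_ids:
        current = int(s)
        # Emit every number skipped before the current id.
        for missing in range(expected, current):
            yield '%0*d' % (width, missing)
        # max() keeps a duplicate id from moving the counter backwards.
        expected = max(expected, current + 1)


print(list(find_gaps_sorted(["0000001", "0000003", "0000003", "0000006"])))
```

    Like the generator above, this stops at the largest id present, so it does not report ids missing between that id and 2 million.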

  • 2020-12-20 18:10

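    I would suggest converting the ids to ints for processing and formatting them back into strings on output:

    n = 2000000
    # Create a list of ints (in practice, parse these from your strings)
    foo = list(range(n))
    # Create some gaps for the demonstration
    foo.remove(1)
    foo.remove(50)
    j = 0  # next value we expect to see
    for i in foo:
        # Print every value skipped before i (handles runs of several missing values)
        while j < i:
            print('%07d' % j)
            j += 1
        j = i + 1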
    
  • 2020-12-20 18:12
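    seq = ["0000001", "0000003", "0000006"]  # example data; use the real sequence of strings here
    n = 2000000
    
    # Every id that should exist, minus the ids actually present
    gaps = set(str(i).zfill(7) for i in range(1, n + 1)) - set(seq)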
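
    A set difference has no defined order. If the gaps are needed in ascending order, one option is to sort the result; because every id is zero-padded to the same width, plain string sorting matches numeric order. A minimal sketch, where the function name missing_ids and the width parameter are assumptions made for illustration:

```python
def missing_ids(seq, n, width=7):
    """Return the zero-padded ids in 1..n that are absent from seq, in ascending order."""
    expected = {str(i).zfill(width) for i in range(1, n + 1)}
    # Equal-width, zero-padded strings sort lexicographically in numeric order.
    return sorted(expected - set(seq))


print(missing_ids(["0000001", "0000003", "0000006"], 6))
```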
    
  • 2020-12-20 18:13

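    To store a sequence of 2 million ints you can use a bitarray. Each bit stands for one integer (the integer equal to that bit's index in the bitarray). Example code:

    import bitarray  # third-party package

    def find_gaps(curr_ids, total):
        """Return the ids in 1..total that are missing from curr_ids."""
        gaps = []
        # bitarray is 0-based, so allocate total + 1 bits and leave index 0 unused
        a = bitarray.bitarray(total + 1)
        a.setall(False)
        for sid in curr_ids:
            a[int(sid)] = True
        # total + 1 so that the last id is checked as well
        for i in range(1, total + 1):
            if not a[i]:
                gaps.append('%07d' % i)
        return gaps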
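
    If installing a third-party package is not an option, the same marking idea can be sketched with the standard library's bytearray. Note that this swaps in a different container: it spends one byte per id rather than one bit (about 2 MB for 2 million ids). The name find_gaps_bytearray and the width parameter are assumptions made for illustration.

```python
def find_gaps_bytearray(curr_ids, total, width=7):
    """Return the ids in 1..total that are missing from curr_ids."""
    # bytearray(n) is zero-filled; index 0 is unused because ids start at 1.
    seen = bytearray(total + 1)
    for sid in curr_ids:
        seen[int(sid)] = 1
    return ['%0*d' % (width, i) for i in range(1, total + 1) if not seen[i]]


print(find_gaps_bytearray(["0000001", "0000003", "0000006"], 7))
```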
    