Given a string of a million numbers, return all repeating 3 digit numbers

误落风尘 2020-12-22 15:41

I had an interview with a hedge fund company in New York a few months ago and unfortunately, I did not get the internship offer as a data/software engineer. (They also asked

13 answers
  • 2020-12-22 16:11

    Here is a NumPy implementation of the "consensus" O(n) algorithm: walk through all triplets and bin them as you go. Binning means that on encountering, say, "385", you add one to bin[3, 8, 5], which is an O(1) operation. The bins are arranged in a 10x10x10 cube. Because the binning is fully vectorized, there is no Python-level loop in the counting code.

    import random
    import types
    from timeit import timeit
    
    import numpy as np
    
    def setup_data(n):
        digits = "0123456789"
        return dict(text=''.join(random.choice(digits) for _ in range(n)))
    
    def f_np(text):
        # Get the data into NumPy: ASCII digits -> integers 0..9
        a = np.frombuffer(bytes(text, 'utf8'), dtype=np.uint8) - ord('0')
        # Rolling triplets: a (3, n-2) view whose columns are consecutive windows
        a3 = np.lib.stride_tricks.as_strided(a, (3, a.size - 2), 2 * a.strides)
    
        bins = np.zeros((10, 10, 10), dtype=int)
        # Next line performs O(n) binning; np.add.at is unbuffered,
        # so repeated indices are all counted
        np.add.at(bins, tuple(a3), 1)
        # Filtering is left as an exercise
        return bins.ravel()
    
    def f_py(text):
        counts = [0] * 1000
        for idx in range(len(text) - 2):
            counts[int(text[idx:idx + 3])] += 1
        return counts
    
    for n in (10, 1000, 1000000):
        data = setup_data(n)
        ref = f_np(**data)
        print(f'n = {n}')
        # Benchmark every module-level function whose name starts with f_
        for name, func in list(globals().items()):
            if not name.startswith('f_') or not isinstance(func, types.FunctionType):
                continue
            try:
                assert np.all(ref == func(**data))
                # 10 runs: total seconds * 1000 / 10 = ms per call
                print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                    'f(**data)', globals={'f': func, 'data': data}, number=10) * 100))
            except Exception:
                print("{:16s} apparently crashed".format(name[2:]))
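    Filtering is left as an exercise in the code above. One possible way to finish it is sketched below, assuming the desired output is the zero-padded 3-digit strings that occur more than once. As a swapped-in technique, it uses np.bincount on the flattened bin index 100*d0 + 10*d1 + d2 instead of np.add.at on a cube; both give the same 1000 counts as bins.ravel(). The function name repeated_triplets_np is hypothetical.

```python
import numpy as np


def repeated_triplets_np(text):
    # Fewer than three digits means there is no complete window
    if len(text) < 3:
        return []
    # ASCII digits -> int64 values 0..9 (int64 so 100 * digit cannot overflow)
    a = np.frombuffer(text.encode('ascii'), dtype=np.uint8).astype(np.int64) - ord('0')
    # Flattened bin index of each rolling triplet: 100*d0 + 10*d1 + d2
    idx = 100 * a[:-2] + 10 * a[1:-1] + a[2:]
    # One count per possible 3-digit number 0..999
    counts = np.bincount(idx, minlength=1000)
    # Keep the numbers seen more than once, padded back to 3 digits
    return ['%03d' % k for k in np.flatnonzero(counts > 1)]


print(repeated_triplets_np("1231234"))
```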
    

    Unsurprisingly, NumPy is faster than @Daniel's pure Python solution on large data sets (roughly four times faster at n = 1000000 in this run), while pure Python wins on very small inputs. Sample output:

    # n = 10
    # np                    0.03481400 ms
    # py                    0.00669330 ms
    # n = 1000
    # np                    0.11215360 ms
    # py                    0.34836530 ms
    # n = 1000000
    # np                   82.46765980 ms
    # py                  360.51235450 ms
    
  • 2020-12-22 16:11

    Thinking from the perspective of C:
    
    - You can have a 3-D int array results[10][10][10].
    - Go from index 0 to index n-3, where n is the length of the string (the last 3-digit window starts at n-3).
    - At each index, look at the current digit, the next one, and the one after that.
    - Increment the counter: results[current][next][next_next]++;
    - Print the counts, for example:

    results[1][2][3]
    results[2][3][4]
    results[3][4][5]
    results[4][5][6]
    results[5][6][7]
    results[6][7][8]
    results[7][8][9]
    

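    - It is O(n) time, and no comparisons between digits are involved.
    - You can parallelize it by partitioning the array, counting each partition separately, and handling the matches around the partition boundaries (windows that start in one partition and end in the next).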
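    The steps above translate into Python roughly as follows. This is a sketch that uses nested lists in place of the C array, plus a hypothetical count_partitioned helper that shows one way to handle partition boundaries: each chunk is extended by two digits so that consecutive chunks overlap and every window is counted exactly once. The partitions run one after another here; handing them to separate workers is left out. All names are illustrative, not taken from the answer.

```python
def count_triplets_3d(digits):
    # results[a][b][c] = number of times the window "abc" occurs
    results = [[[0] * 10 for _ in range(10)] for _ in range(10)]
    # Windows start at indices 0 .. n-3, i.e. range(n - 2)
    for i in range(len(digits) - 2):
        a, b, c = (int(ch) for ch in digits[i:i + 3])
        results[a][b][c] += 1
    return results


def count_partitioned(digits, parts):
    # Hypothetical helper: split into roughly `parts` chunks and sum their counts
    n = len(digits)
    step = max(1, -(-n // parts))  # ceiling of n / parts
    total = [[[0] * 10 for _ in range(10)] for _ in range(10)]
    for start in range(0, n - 2, step):
        # This chunk owns windows starting at start .. start+step-1, so it needs
        # two extra digits past its end: consecutive chunks overlap by two digits
        partial = count_triplets_3d(digits[start:start + step + 2])
        for a in range(10):
            for b in range(10):
                for c in range(10):
                    total[a][b][c] += partial[a][b][c]
    return total


text = "1231234"
results = count_triplets_3d(text)
print(results[1][2][3])                       # 2
print(count_partitioned(text, 3) == results)  # True
```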

  • 2020-12-22 16:12

    The simple O(n) solution would be to count each 3-digit number:

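    # text holds the string of digits
    for nr in range(1000):
        # Note: str.count() counts non-overlapping matches only,
        # so e.g. "111" is counted once in "1111", not twice
        cnt = text.count('%03d' % nr)
        if cnt > 1:
            print('%03d is found %d times' % (nr, cnt))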
    

    This would search through all 1 million digits 1000 times.

    Traversing the digits only once:

    counts = [0] * 1000
    # Each window text[idx:idx+3] is a number 0..999, used directly as an index
    for idx in range(len(text)-2):
        counts[int(text[idx:idx+3])] += 1
    
    for nr, cnt in enumerate(counts):
        if cnt > 1:
            print('%03d is found %d times' % (nr, cnt))
    

    Timing shows that traversing the string only once is twice as fast as calling count 1000 times.
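    One caveat with the text.count() approach: Python's str.count() returns the number of non-overlapping occurrences, so self-overlapping numbers such as 111 or 121 can be undercounted. The sketch below demonstrates the difference and, as one possible fix, counts overlapping matches with a regex zero-width lookahead. The helper names are hypothetical.

```python
import re


def count_nonoverlapping(text, nr):
    # str.count resumes scanning *after* each match, so overlaps are skipped
    return text.count('%03d' % nr)


def count_overlapping(text, nr):
    # A zero-width lookahead can match at every start position,
    # so overlapping occurrences are all counted
    return len(re.findall('(?=%03d)' % nr, text))


def count_windows(text, nr):
    # Reference: compare every 3-character window directly
    target = '%03d' % nr
    return sum(1 for i in range(len(text) - 2) if text[i:i + 3] == target)


print(count_nonoverlapping("1111", 111))  # 1
print(count_overlapping("1111", 111))     # 2
print(count_windows("1111", 111))         # 2
```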

  • 2020-12-22 16:12

    Image as answer: [image not available]

    Looks like a sliding window.
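    The image itself is not available, so what follows is only an assumption about what a sliding-window answer might look like: keep the last three digits as a rolling integer, shifting in each new digit and dropping the oldest with "mod 1000", so no substring is sliced or parsed per position. The function name is hypothetical.

```python
def repeated_triplets_rolling(digits):
    counts = [0] * 1000
    window = 0
    for i, ch in enumerate(digits):
        # Shift in the new digit; "% 1000" drops the digit that left the window
        window = (window * 10 + (ord(ch) - ord('0'))) % 1000
        # The window holds a full 3-digit number once three digits have been read
        if i >= 2:
            counts[window] += 1
    # Keep only repeated numbers, zero-padded back to 3 digits
    return {'%03d' % nr: cnt for nr, cnt in enumerate(counts) if cnt > 1}


print(repeated_triplets_rolling("123412341"))  # {'123': 2, '234': 2, '341': 2}
```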

  • 2020-12-22 16:14
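    inputStr = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'
    
    count = {}
    for i in range(len(inputStr) - 2):
        # int() drops leading zeros ("012" -> 12), so keys are re-padded when printing
        subNum = int(inputStr[i:i+3])
        if subNum not in count:
            count[subNum] = 1
        else:
            count[subNum] += 1
    
    # Print only the 3-digit numbers that occur more than once
    print({'%03d' % num: cnt for num, cnt in count.items() if cnt > 1})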
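    The same tally can be written with collections.Counter from the standard library. A minimal sketch, keeping the windows as strings so that a number like 012 is not turned into 12:

```python
from collections import Counter

input_str = '123456123138276237284287434628736482376487234682734682736487263482736487236482634'

# Counter tallies every 3-character window in a single pass
count = Counter(input_str[i:i + 3] for i in range(len(input_str) - 2))
# Keep only the windows that occur more than once
repeats = {num: cnt for num, cnt in count.items() if cnt > 1}
print(repeats)
```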
    
  • 2020-12-22 16:16

    Here is my solution:

    from collections import defaultdict
    string = "103264685134845354863"
    d = defaultdict(int)
    for elt in range(len(string)-2):
        d[string[elt:elt+3]] += 1
    d = {key: d[key] for key in d.keys() if d[key] > 1}
    

    With a bit of creativity in the for loop (for example, an additional lookup list with True/False/None values), you should be able to get rid of the last line, since you only want to create a key in the dict for a triplet that has already been visited once up to that point. Hope it helps :)
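    A sketch of that single-pass idea, using a set of already-visited triplets in place of the True/False/None lookup list (a swapped-in choice); the function name is hypothetical:

```python
def repeated_triplets_single_pass(string):
    seen = set()  # triplets visited at least once so far
    d = {}        # only triplets visited at least twice
    for elt in range(len(string) - 2):
        key = string[elt:elt + 3]
        if key in d:
            d[key] += 1      # third or later occurrence
        elif key in seen:
            d[key] = 2       # second occurrence: create the key now
        else:
            seen.add(key)    # first occurrence: only remember it
    return d


print(repeated_triplets_single_pass("123412341"))  # {'123': 2, '234': 2, '341': 2}
```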
