split a generator/iterable every n items in python (splitEvery)

臣服心动 2020-11-27 16:44

I'm trying to write the Haskell function 'splitEvery' in Python. Here is its definition:

splitEvery :: Int -> [e] -> [[e]]
    @'splitEvery' n@ s


        
13 Answers
  • 2020-11-27 17:36

    This is an answer that works for both list and generator:

    from itertools import count, groupby
    def split_every(size, iterable):
        c = count()
        for k, g in groupby(iterable, lambda x: next(c)//size):
            yield list(g) # or yield g if you want to output a generator
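
    The key function is what does the chunking here: the counter gives each item its running index, and integer division by size maps consecutive indices to the same chunk number. A small check of that idea, assuming Python 3 (the sample inputs are illustrative only):

```python
from itertools import count, groupby

def split_every(size, iterable):
    c = count()
    # next(c) // size is 0 for the first `size` items, 1 for the next `size`, ...
    for _, g in groupby(iterable, lambda _item: next(c) // size):
        yield list(g)

print(list(split_every(3, [1, 2, 3, 4, 5, 6, 7])))   # list input
print(list(split_every(2, (ch for ch in "abcde"))))  # generator input
```

    If you yield g itself instead of list(g), each group has to be consumed before moving on to the next one, because groupby shares the underlying iterator between groups.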
    
  • 2020-11-27 17:36

    Here is how to handle list versus iterator input:

    def isList(L):  # Implement it somehow; returns True or False
    ...
    return (list, lambda x: x)[int(isList(L))](result)
    
  • 2020-11-27 17:37

    If you want a solution that

    • uses generators only (no intermediate lists or tuples),
    • works for very long (or infinite) iterators,
    • works for very large batch sizes,

    this does the trick:

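    def one_batch(first_value, iterator, batch_size):
        yield first_value
        for _ in range(1, batch_size):
            try:
                value = next(iterator)
            except StopIteration:
                return  # Input ran out mid-batch: end this batch early.
            yield value
    
    def batch_iterator(iterator, batch_size):
        iterator = iter(iterator)
        while True:
            try:
                first_value = next(iterator)  # Peek.
            except StopIteration:
                return  # No more batches.
            yield one_batch(first_value, iterator, batch_size)
    

    It works by peeking at the next value in the iterator and passing that as the first value to a generator (one_batch()) that will yield it, along with the rest of the batch.

    The peek step raises StopIteration exactly when the input iterator is exhausted and there are no more batches, which is the right moment for batch_iterator() to finish. Under Python 2 the exception could simply propagate; since Python 3.7 (PEP 479) a StopIteration that escapes a generator body is turned into a RuntimeError, so both generators catch it and return instead. Every batch reads from the same input iterator, so consume each batch fully before requesting the next one.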

    This will process lines from stdin in batches:

    for input_batch in batch_iterator(sys.stdin, 10000):
        for line in input_batch:
            process(line)
        finalise()
    

    I've found this useful for processing lots of data and uploading the results in batches to an external store.
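
    To illustrate the "works for infinite iterators" point, here is a compact sketch of the same peek-then-yield idea, assuming Python 3. It swaps the hand-written inner generator for itertools.chain and itertools.islice; the name lazy_batches is hypothetical and not part of the answer above.

```python
from itertools import chain, count, islice

def lazy_batches(iterable, batch_size):
    """Yield lazy batches; each must be consumed before asking for the next."""
    iterator = iter(iterable)
    # The for loop is the peek: it ends cleanly when the input is exhausted.
    for first_value in iterator:
        yield chain([first_value], islice(iterator, batch_size - 1))

# Nothing is materialised up front, so an infinite input is fine.
batches = lazy_batches(count(), 4)
first = list(next(batches))
second = list(next(batches))
print(first, second)
```

    Because no batch is ever stored, memory use stays constant regardless of batch_size; the price is that an unconsumed batch leaves its items in the input, where the next batch will pick them up.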

  • 2020-11-27 17:38

    I think those questions are almost equal

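    With a small change so that the last group is cropped rather than padded, I think a good solution for the generator case would be:

    from itertools import chain, islice

    def iter_grouper(n, iterable):
        it = iter(iterable)
        # An islice object is always truthy, so test for exhaustion by
        # pulling the first item of each group instead.
        for first in it:
            yield chain([first], islice(it, n - 1))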
    

    For objects that support slicing (lists, strings, tuples), we can do:

    def slice_grouper(n, sequence):
        return [sequence[i:i+n] for i in range(0, len(sequence), n)]
    

    Now it's just a matter of dispatching to the correct function:

    from collections.abc import Sequence

    def grouper(n, iter_or_seq):
        # __getslice__ exists only in Python 2; test for a sequence instead.
        if isinstance(iter_or_seq, Sequence):
            return slice_grouper(n, iter_or_seq)
        return iter_grouper(n, iter_or_seq)
    

    I think you could polish it a little bit more :-)
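
    A self-contained sketch of the dispatch idea, assuming Python 3: sequences are sliced, so each chunk keeps the input's type, and everything else is consumed lazily. To keep it short, the iterator branch here is a swapped-in simplification that yields tuples via the two-argument form of iter(), not the lazy groups described above.

```python
from collections.abc import Sequence
from itertools import islice

def grouper(n, iter_or_seq):
    if isinstance(iter_or_seq, Sequence):
        # Sliceable input: chunks have the same type as the input.
        return [iter_or_seq[i:i + n] for i in range(0, len(iter_or_seq), n)]
    it = iter(iter_or_seq)
    # Other iterables: call tuple(islice(...)) until it returns the empty tuple.
    return iter(lambda: tuple(islice(it, n)), ())

print(grouper(3, "abcdefg"))                     # chunks are strings
print(grouper(2, (1, 2, 3)))                     # chunks are tuples
print(list(grouper(2, (x for x in range(5)))))   # generator input
```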

  • 2020-11-27 17:39

    Why not do it like this? Looks almost like your splitEvery_2 function.

    def splitEveryN(n, it):
        return [it[i:i+n] for i in range(0, len(it), n)]
    

    Actually it only removes the unnecessary step from the slice in your solution. :)
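
    One thing worth checking with this version is what kind of input it accepts: it relies on len() and slicing, so it works for sequences but not for generators. A quick sketch, assuming Python 3 (the sample inputs are illustrative):

```python
def splitEveryN(n, it):
    return [it[i:i+n] for i in range(0, len(it), n)]

print(splitEveryN(4, "Hello world"))  # a str is sliceable, so this works

try:
    splitEveryN(4, (x for x in range(10)))
except TypeError as exc:
    # Generators have neither len() nor slicing.
    print("generator input fails:", exc)
```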

  • 2020-11-27 17:42

    from itertools import islice
    
    def split_every(n, iterable):
        i = iter(iterable)
        piece = list(islice(i, n))
        while piece:
            yield piece
            piece = list(islice(i, n))
    

    Some tests:

    >>> list(split_every(5, range(9)))
    [[0, 1, 2, 3, 4], [5, 6, 7, 8]]
    
    >>> list(split_every(3, (x**2 for x in range(20))))
    [[0, 1, 4], [9, 16, 25], [36, 49, 64], [81, 100, 121], [144, 169, 196], [225, 256, 289], [324, 361]]
    
    >>> [''.join(s) for s in split_every(6, 'Hello world')]
    ['Hello ', 'world']
    
    >>> list(split_every(100, []))
    []
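
    For comparison, Python 3.12 added itertools.batched, which does the same chunking but takes the iterable first and yields tuples rather than lists. A sketch that uses it when available and otherwise falls back to the islice loop above (the fallback assumes Python 3.8+ for the := operator):

```python
from itertools import islice

try:
    from itertools import batched  # available from Python 3.12
except ImportError:
    def batched(iterable, n):
        # Fallback: same islice loop as split_every, but yielding tuples.
        it = iter(iterable)
        while chunk := tuple(islice(it, n)):
            yield chunk

print(list(batched(range(9), 5)))
```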
    