I'm trying to write the Haskell function 'splitEvery' in Python. Here is its definition:

splitEvery :: Int -> [e] -> [[e]]
-- splitEvery n splits a list into length-n pieces; the last piece
-- will be shorter if n does not evenly divide the length of the list.
This is an answer that works for both lists and generators:

    from itertools import count, groupby

    def split_every(size, iterable):
        c = count()
        for k, g in groupby(iterable, lambda x: next(c) // size):
            yield list(g)  # materialize each group; yielding g raw is risky,
                           # since groupby invalidates a group once it advances
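A quick sanity check of that approach (a minimal sketch; the function is repeated so the snippet runs on its own). The `next(c) // size` key assigns consecutive elements the same group number in blocks of `size`, which is what makes `groupby` cut the input into chunks:

```python
from itertools import count, groupby

def split_every(size, iterable):
    c = count()
    for k, g in groupby(iterable, lambda x: next(c) // size):
        yield list(g)  # consume each group before groupby moves on

# Works the same for a list and for a generator:
print(list(split_every(3, [1, 2, 3, 4, 5, 6, 7])))      # [[1, 2, 3], [4, 5, 6], [7]]
print(list(split_every(3, (x * x for x in range(5)))))  # [[0, 1, 4], [9, 16]]
```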
Here is how you can deal with list vs. iterator input:

    def is_list(L):  # implement it somehow - returns True or False
        ...
    # index 0 (not a list) -> identity, index 1 (a list) -> materialize
    return (lambda x: x, list)[int(is_list(L))](result)
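A minimal runnable sketch of that idea, where `is_list` and the generator-based splitter are assumptions filled in for illustration (the answer above leaves them open):

```python
from itertools import islice

def is_list(L):
    # one possible implementation (an assumption, not from the answer)
    return isinstance(L, list)

def split_every(n, iterable):
    def gen():
        it = iter(iterable)
        piece = list(islice(it, n))
        while piece:
            yield piece
            piece = list(islice(it, n))
    result = gen()
    # index 0 (not a list) -> identity, index 1 (a list) -> materialize
    return (lambda x: x, list)[int(is_list(iterable))](result)

print(split_every(2, [1, 2, 3, 4, 5]))       # [[1, 2], [3, 4], [5]]
print(type(split_every(2, iter(range(5)))))  # <class 'generator'>
```

So callers that pass in a list get a list back, while iterator input stays lazy.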
If you want a solution that is fully lazy, so that each batch is itself a generator and nothing is materialized, this does the trick:
    def one_batch(first_value, iterator, batch_size):
        yield first_value
        for _ in range(1, batch_size):
            try:
                yield next(iterator)
            except StopIteration:
                return  # input exhausted mid-batch

    def batch_iterator(iterator, batch_size):
        iterator = iter(iterator)
        while True:
            try:
                first_value = next(iterator)  # Peek.
            except StopIteration:
                return  # no more batches
            yield one_batch(first_value, iterator, batch_size)
It works by peeking at the next value in the iterator and passing it as the first value to a generator (one_batch()) that will yield it along with the rest of the batch. The peek step raises StopIteration exactly when the input iterator is exhausted and there are no more batches, which is the right time for batch_iterator() to stop. Note that since PEP 479 (enforced from Python 3.7), a StopIteration escaping a generator body is converted into a RuntimeError, so the exception must be caught and turned into a plain return rather than allowed to propagate.
This will process lines from stdin in batches (process() and finalise() stand for your own handling code):

    import sys

    for input_batch in batch_iterator(sys.stdin, 10000):
        for line in input_batch:
            process(line)
    finalise()
I've found this useful for processing lots of data and uploading the results in batches to an external store.
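As a rough sketch of that upload pattern, where the store class and its upload() method are made up for illustration, and a simple list-of-lists batcher stands in for batch_iterator so the snippet runs on its own:

```python
from itertools import islice

def batches(iterable, batch_size):
    # list-based variant of the same batching idea, for brevity
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

class FakeStore:
    """Stand-in for an external store; upload() is an assumption."""
    def __init__(self):
        self.calls = []

    def upload(self, records):
        self.calls.append(records)

store = FakeStore()
for batch in batches(range(10), 4):
    store.upload(batch)

print(store.calls)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The point of batching here is that the store sees a bounded number of requests regardless of input size, and memory stays proportional to one batch.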
I think those questions are almost equal. Changing it a little so the last group is cropped rather than padded, I think a good solution for the generator case would be:
    from itertools import islice

    def iter_grouper(n, iterable):
        it = iter(iterable)
        # materialize each slice: a raw islice object is always truthy,
        # which would make the while loop spin forever on an exhausted iterator
        item = list(islice(it, n))
        while item:
            yield item
            item = list(islice(it, n))
For objects that support slicing (lists, strings, tuples), we can do:

    def slice_grouper(n, sequence):
        return [sequence[i:i+n] for i in range(0, len(sequence), n)]
now it's just a matter of dispatching the correct method:

    from collections.abc import Sequence

    def grouper(n, iter_or_seq):
        # __getslice__ is gone in Python 3, so test for a sequence type instead
        if isinstance(iter_or_seq, Sequence):
            return slice_grouper(n, iter_or_seq)
        elif hasattr(iter_or_seq, "__iter__"):
            return iter_grouper(n, iter_or_seq)
I think you could polish it a little bit more :-)
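Put together, a quick check of both dispatch paths (the helper definitions are repeated here so the snippet runs on its own):

```python
from collections.abc import Sequence
from itertools import islice

def iter_grouper(n, iterable):
    it = iter(iterable)
    item = list(islice(it, n))
    while item:
        yield item
        item = list(islice(it, n))

def slice_grouper(n, sequence):
    return [sequence[i:i+n] for i in range(0, len(sequence), n)]

def grouper(n, iter_or_seq):
    # sequences take the eager slicing path, everything else stays lazy
    if isinstance(iter_or_seq, Sequence):
        return slice_grouper(n, iter_or_seq)
    return iter_grouper(n, iter_or_seq)

print(grouper(3, [1, 2, 3, 4, 5, 6, 7]))  # sequence path: [[1, 2, 3], [4, 5, 6], [7]]
print(list(grouper(3, iter(range(7)))))   # iterator path: [[0, 1, 2], [3, 4, 5], [6]]
```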
Why not do it like this? Looks almost like your splitEvery_2 function.

    def splitEveryN(n, it):
        return [it[i:i+n] for i in range(0, len(it), n)]

Actually it only takes away the unnecessary step interval from the slice in your solution. :) Note that this version only works for sequences that support len() and slicing, not for arbitrary generators.
    from itertools import islice

    def split_every(n, iterable):
        i = iter(iterable)
        piece = list(islice(i, n))
        while piece:
            yield piece
            piece = list(islice(i, n))
Some tests:
>>> list(split_every(5, range(9)))
[[0, 1, 2, 3, 4], [5, 6, 7, 8]]
>>> list(split_every(3, (x**2 for x in range(20))))
[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81, 100, 121], [144, 169, 196], [225, 256, 289], [324, 361]]
>>> [''.join(s) for s in split_every(6, 'Hello world')]
['Hello ', 'world']
>>> list(split_every(100, []))
[]