What's the most Pythonic efficient way to iterate over a list in sliding pairs? Here's a related example:
>>> l
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> for x, y in itertools.izip(l, l[1::2]): print x, y
...
a b
b d
c f
this is iteration in pairs, but how can we get iteration over a sliding pair? Meaning iteration over the pairs:
a b
b c
c d
d e
etc.
which is iteration over the pairs, except sliding the pair by 1 element each time rather than by 2 elements. thanks.
How about:
for x, y in itertools.izip(l, l[1:]): print x, y
You can go even simpler. Just zip the list and the list offset by one.
In [4]: zip(l, l[1:])
Out[4]: [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('e', 'f'), ('f', 'g')]
Here is a little generator that I wrote a while back for a similar scenario:
def pairs(items):
items_iter = iter(items)
prev = next(items_iter)
for item in items_iter:
yield prev, item
prev = item
Here's a function for arbitrarily sized sliding windows that works for iterators/generators as well as lists
def sliding(seq, n):
return izip(*starmap(islice, izip(tee(seq, n), count(0), repeat(None))))
Nathan's solution is probably more efficient though.
The timing, as defined by the addition of two subsequent entries in the list, is displayed below and ordered from fastest to slowest.
Gil
In [69]: timeit.repeat("for x,y in itertools.izip(l, l[1::1]): x + y", setup=setup, number=1000)
Out[69]: [1.029047966003418, 0.996290922164917, 0.998831033706665]
Geoff Reedy
In [70]: timeit.repeat("for x,y in sliding(l,2): x+y", setup=setup, number=1000)
Out[70]: [1.2408790588378906, 1.2099130153656006, 1.207326889038086]
Alestanis
In [66]: timeit.repeat("for i in range(0, len(l)-1): l[i] + l[i+1]", setup=setup, number=1000)
Out[66]: [1.3387370109558105, 1.3243639469146729, 1.3245630264282227]
chmullig
In [68]: timeit.repeat("for x,y in zip(l, l[1:]): x+y", setup=setup, number=1000)
Out[68]: [1.4756009578704834, 1.4369518756866455, 1.5067830085754395]
Nathan Villaescusa
In [63]: timeit.repeat("for x,y in pairs(l): x+y", setup=setup, number=1000)
Out[63]: [2.254757881164551, 2.3750967979431152, 2.302199125289917]
sr2222
Notice the reduced repetition number...
In [60]: timeit.repeat("for x,y in SubsequenceIter(l,2): x+y", setup=setup, number=100)
Out[60]: [1.599524974822998, 1.5634570121765137, 1.608154058456421]
The setup code:
setup="""
from itertools import izip, starmap, islice, tee, count, repeat
l = range(10000)
def sliding(seq, n):
return izip(*starmap(islice, izip(tee(seq, n), count(0), repeat(None))))
class SubsequenceIter(object):
def __init__(self, iterable, subsequence_length):
self.iterator = iter(iterable)
self.subsequence_length = subsequence_length
self.subsequence = [0]
def __iter__(self):
return self
def next(self):
self.subsequence.pop(0)
while len(self.subsequence) < self.subsequence_length:
self.subsequence.append(self.iterator.next())
return self.subsequence
def pairs(items):
items_iter = iter(items)
prev = items_iter.next()
for item in items_iter:
yield (prev, item)
prev = item
"""
Not exactly the most efficient, but quite flexible:
class SubsequenceIter(object):
def __init__(self, iterable, subsequence_length):
self.iterator = iter(iterable)
self.subsequence_length = subsequence_length
self.subsequence = [0]
def __iter__(self):
return self
def next(self):
self.subsequence.pop(0)
while len(self.subsequence) < self.subsequence_length:
self.subsequence.append(self.iterator.next())
return self.subsequence
Usage:
for x, y in SubsequenceIter(l, 2):
print x, y
No need for imports, this will work provided a list of objects or a string; anything with var[indexing]
. Tested on python 3.6
# This will create windows with all but 1 overlap
def ngrams_list(a_list, window_size=5, skip_step=1):
return list(zip(*[a_list[i:] for i in range(0, window_size, skip_step)]))
the for loop by itself creates this with a_list
being the alphabet (shown window = 5
, OP would want window=2
:
['ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'BCDEFGHIJKLMNOPQRSTUVWXYZ',
'CDEFGHIJKLMNOPQRSTUVWXYZ',
'DEFGHIJKLMNOPQRSTUVWXYZ',
'EFGHIJKLMNOPQRSTUVWXYZ']
zip(*result_of_for_loop)
will collect all full vertical columns as results. And if you want less than all-but-one overlap:
# You can sample that output to get less overlap:
def sliding_windows_with_overlap(a_list, window_size=5, overlap=2):
zip_output_as_list = ngrams_list(a_list, window_size)])
return zip_output_as_list[::overlap+1]
With overlap=2
it skips the columns starting with B
& C
, and choosing the D
[('A', 'B', 'C', 'D', 'E'),
('D', 'E', 'F', 'G', 'H'),
('G', 'H', 'I', 'J', 'K'),
('J', 'K', 'L', 'M', 'N'),
('M', 'N', 'O', 'P', 'Q'),
('P', 'Q', 'R', 'S', 'T'),
('S', 'T', 'U', 'V', 'W'),
('V', 'W', 'X', 'Y', 'Z')]
EDIT: looks like this is similar to what @chmullig provided, with options
来源:https://stackoverflow.com/questions/13014595/pythonic-iteration-over-sliding-window-pairs-in-list