Python Sliding Window on sentence string

喜夏-厌秋 submitted on 2020-07-03 08:11:06

Question


I'm looking for a sliding-window splitter that breaks a string of words into overlapping windows of N words.

Input: "I love food and I like drink", window size 3

Output: [ "I love food", "love food and", "food and I", "and I like" .....]

All the sliding-window suggestions I've found work on sequences of characters, not on words. Is there something out of the box?


Answer 1:


You can create iterators that start at different offsets and zip them together.

>>> arr = "I love food. blah blah".split()
>>> its = [iter(arr), iter(arr[1:]), iter(arr[2:])]  # Extend this pattern for longer windows
>>> list(zip(*its))  # In Python 3, zip returns a lazy iterator, so wrap it in list()
[('I', 'love', 'food.'), ('love', 'food.', 'blah'), ('food.', 'blah', 'blah')]

In Python 2 you might want to use itertools.izip for long sentences (in Python 3, zip is already lazy and izip no longer exists), or plain old loops (as in the other answers).
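The hard-coded list of three iterators generalizes to any window size n. A minimal Python 3 sketch (the name windows_zip is hypothetical) that uses itertools.islice to start each iterator at a different offset instead of copying slices of the list. It assumes `words` is a re-iterable sequence such as a list; a one-shot iterator would be shared between the islice objects and give wrong windows.

```python
from itertools import islice


def windows_zip(words, n):
    """Return an iterator of n-word tuples using the zip-of-offsets trick.

    Iterator k starts k words into the list; zip stops as soon as the
    shortest iterator (the one with offset n - 1) is exhausted.
    """
    offsets = [islice(words, k, None) for k in range(n)]
    return zip(*offsets)


words = "I love food and I like drink".split()
print(list(windows_zip(words, 3)))
# [('I', 'love', 'food'), ('love', 'food', 'and'), ('food', 'and', 'I'), ('and', 'I', 'like'), ('I', 'like', 'drink')]
```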




Answer 2:


An approach based on slicing the list of words:

def split_on_window(sequence="I love food and I like drink", limit=4):
    results = []
    split_sequence = sequence.split()
    # Number of full windows; range() is empty if limit exceeds the word count.
    iteration_length = len(split_sequence) - (limit - 1)
    max_window_indices = range(iteration_length)
    for index in max_window_indices:
        results.append(split_sequence[index:index + limit])
    return results

Sample Output:

>>> split_on_window("I love food and I like drink", 3)
[['I', 'love', 'food'], ['love', 'food', 'and'], ['food', 'and', 'I'],
 ['and', 'I', 'like'], ['I', 'like', 'drink']]
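The question asks for phrases rather than lists of words. Joining each slice with spaces gives that output directly. A sketch of the same slicing idea as a list comprehension (phrase_windows is a hypothetical name, and n is assumed to be at least 1):

```python
def phrase_windows(sentence, n):
    """Return each run of n consecutive words as a space-joined string."""
    words = sentence.split()
    # There are len(words) - n + 1 full windows; range() is empty if n > len(words).
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(phrase_windows("I love food and I like drink", 3))
# ['I love food', 'love food and', 'food and I', 'and I like', 'I like drink']
```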

Here's an alternative answer inspired by @SuperSaiyan:

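# Python 3: the built-in zip is already lazy (itertools.izip was Python 2 only).

def split_on_window(sequence, limit):
    split_sequence = sequence.split()
    # Iterator k starts k words in; zip stops at the shortest one.
    iterators = [iter(split_sequence[index:]) for index in range(limit)]
    return zip(*iterators)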

Sample Output:

>>> s = "I love food and I like drink"
>>> list(split_on_window(s, 4))
[('I', 'love', 'food', 'and'), ('love', 'food', 'and', 'I'),
 ('food', 'and', 'I', 'like'), ('and', 'I', 'like', 'drink')]
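Both versions above need the whole word list in memory. If the words come from a one-shot iterator (for example, a generator over a large file), a deque with maxlen can hold just the current window. This sketch is along the lines of the sliding_window recipe in the itertools documentation; the function name and the requirement n >= 1 are assumptions:

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n):
    """Yield tuples of n consecutive items from any iterable, in one pass.

    Only the current window is kept in memory, so the input can be a
    generator rather than a list.
    """
    it = iter(iterable)
    # Pre-fill with the first n - 1 items; maxlen makes the deque drop
    # its oldest item automatically when a new one is appended.
    window = deque(islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)
        yield tuple(window)


words = (w for w in "I love food and I like drink".split())  # a one-shot generator
for w in sliding_window(words, 4):
    print(" ".join(w))
# I love food and
# love food and I
# food and I like
# and I like drink
```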

Benchmarks:

Sequence = I love food and I like drink, limit = 3
Repetitions = 1000000
Using subscripting -> 3.8326420784
Using izip -> 5.41380286217 # Modified to return a list for the benchmark.
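To rerun a comparison like this on your own machine, here is a sketch using the standard-library timeit module. The names split_by_slicing, split_by_zip and compare are hypothetical; both approaches are reimplemented for Python 3, so absolute numbers (and possibly the ranking) will differ from the figures above.

```python
import timeit


def split_by_slicing(sequence, limit):
    """Slicing approach: a list of word lists."""
    words = sequence.split()
    return [words[i:i + limit] for i in range(len(words) - limit + 1)]


def split_by_zip(sequence, limit):
    """Zip-of-offsets approach, materialized as a list for a fair comparison."""
    words = sequence.split()
    return list(zip(*(words[i:] for i in range(limit))))


def compare(sequence="I love food and I like drink", limit=3, number=100_000):
    """Return {approach name: total seconds for `number` calls}."""
    return {
        name: timeit.timeit(lambda f=f: f(sequence, limit), number=number)
        for name, f in [("slicing", split_by_slicing), ("zip", split_by_zip)]
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.3f} s")
```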



Answer 3:


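def token_sliding_window(sentence, size):
    # split() with no argument splits on any run of whitespace,
    # so repeated spaces don't produce empty tokens.
    tokens = sentence.split()
    for i in range(len(tokens) - size + 1):
        yield tokens[i:i + size]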
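Because this version is a generator, windows are built only when requested. One edge case: with size 0 the loop yields len(tokens) + 1 empty lists. A sketch that adds a guard for non-positive sizes (token_windows is a hypothetical name) and takes only the first few windows with itertools.islice. Note that, since the body is a generator, the ValueError is raised when iteration starts, not at call time.

```python
from itertools import islice


def token_windows(sentence, size):
    """Lazily yield lists of `size` consecutive words from `sentence`."""
    # Without this guard, size=0 would yield len(tokens) + 1 empty lists.
    if size < 1:
        raise ValueError("size must be a positive integer")
    tokens = sentence.split()
    for i in range(len(tokens) - size + 1):
        yield tokens[i:i + size]


# Only the first two windows are ever built.
first_two = list(islice(token_windows("I love food and I like drink", 3), 2))
print(first_two)  # [['I', 'love', 'food'], ['love', 'food', 'and']]
```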


Source: https://stackoverflow.com/questions/42842884/python-sliding-window-on-sentence-string
