Search in a string and obtain the 2 words before and after the match in Python

伪装坚强ぢ 2021-01-19 15:39

I'm using Python to search for some words (possibly multi-token) in a description (a string).

To do that I'm using a regex like this:

    result = re.search(w         


        
4 Answers
  • 2021-01-19 16:18

    Try this regex: ((?:[a-z,]+\s+){0,2})here is\s+((?:[a-z,]+\s*){0,2})

    Use it with re.findall and the re.IGNORECASE flag set.
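
    As a minimal sketch of how that pattern might be applied: the sample sentence and the names `text` and `context` are assumptions for illustration, not part of the original answer. Each captured group is split on whitespace to get the individual words.

```python
import re

# Pattern from the answer above: up to two words before and after "here is".
pattern = r"((?:[a-z,]+\s+){0,2})here is\s+((?:[a-z,]+\s*){0,2})"

# Assumed sample input, chosen only for illustration.
text = "Parking here is horrible, this shop sucks."

# findall returns one (before, after) tuple of raw strings per match.
matches = re.findall(pattern, text, re.IGNORECASE)

# Split each captured group into a list of words.
context = [(before.split(), after.split()) for before, after in matches]
print(context)
```

    Note that the character class `[a-z,]` only covers letters and commas, so words containing digits or other punctuation would not be captured by this pattern.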


  • 2021-01-19 16:20

    Based on your clarification, this becomes a bit more complicated. The solution below deals with scenarios where the searched pattern may in fact also be in the two preceding or two subsequent words.

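    import re

    line = "Parking here is horrible, here is great here is mediocre here is here is "
    print(line)
    pattern = "here is"
    # Check once that the pattern occurs at all (case-insensitively).
    # Note: str.partition below is case-sensitive.
    r = re.search(pattern, line, re.IGNORECASE)
    output = []
    if r:
        while line:
            # Split off the text before the next occurrence; the rest stays in `line`.
            before, match, line = line.partition(pattern)
            if match:
                if not output:
                    before = before.split()[-2:]
                else:
                    # Re-attach the previous match so it can count as preceding words.
                    before = ' '.join([pattern, before]).split()[-2:]
                after = line.split()[:2]
                output.append((before, after))
    print(output)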
    

    Output from my example would be:

    [(['Parking'], ['horrible,', 'here']), (['is', 'horrible,'], ['great', 'here']), (['is', 'great'], ['mediocre', 'here']), (['is', 'mediocre'], ['here', 'is']), (['here', 'is'], [])]

  • 2021-01-19 16:35

    I would do it like this (edit: added anchors to cover most cases):

    (\S+\s+|^)(\S+\s+|)here is(\s+\S+|)(\s+\S+|$)
    

    This way you always get 4 groups (their surrounding whitespace may need to be trimmed), with the following behavior:

    1. If group 1 is empty, there was no word before (group 2 is empty too)
    2. If group 2 is empty, there was only one word before (group 1)
    3. If group 1 and 2 are not empty, they are the words before in order
    4. If group 3 is empty, there was no word after
    5. If group 4 is empty, there was only one word after
    6. If group 3 and 4 are not empty, they are the words after in order
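
    The group rules above could be turned into code roughly as follows. This is only a sketch: the function name `words_around` and the sample sentences are hypothetical, and the search term is hard-coded in the pattern as in the answer.

```python
import re

# Pattern from the answer above, with "here is" as the fixed search term.
pattern = re.compile(r"(\S+\s+|^)(\S+\s+|)here is(\s+\S+|)(\s+\S+|$)")

def words_around(text):
    """Return (words_before, words_after) for the first match, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    # Every group always participates in the match, so none of them is None;
    # trim the whitespace that the groups capture along with each word.
    g1, g2, g3, g4 = (g.strip() for g in m.groups())
    # Empty groups mean "no word in that position" and are dropped.
    before = [w for w in (g1, g2) if w]
    after = [w for w in (g3, g4) if w]
    return before, after

print(words_around("Parking here is horrible, this shop sucks."))
print(words_around("here is fine"))
```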


  • 2021-01-19 16:40

    How about string operations?

    line = 'Parking here is horrible, this shop sucks.'
    
    before, term, after = line.partition('here is')
    before = before.rsplit(maxsplit=2)[-2:]
    after = after.split(maxsplit=2)[:2]
    

    Result:

    >>> before
    ['Parking']
    >>> after
    ['horrible,', 'this']
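
    One thing to watch with this approach: if the term does not occur, `partition` returns the whole line as `before`, so the last two words of the line would be reported as context. A small wrapper can guard against that. The name `context_words` and the parameter `n` are hypothetical and only serve as an illustration.

```python
def context_words(line, term, n=2):
    """Return up to n words before and after the first occurrence of term.

    Returns None when term is absent. Assumes n >= 1.
    """
    before, match, after = line.partition(term)
    if not match:
        # partition found nothing: `before` holds the whole line, so bail out.
        return None
    return before.split()[-n:], after.split()[:n]

print(context_words('Parking here is horrible, this shop sucks.', 'here is'))
print(context_words('Nothing relevant in this sentence.', 'here is'))
```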
    