Python extract sentence containing word

孤独总比滥情好 2020-12-09 10:52

I am trying to extract all the sentences containing a specified word from a text.

txt = "I like to eat apple. Me too. Let's go buy some apples."
txt = "." + txt


        
6 answers
  • 2020-12-09 11:32
    In [7]: import re
    
    In [8]: txt=".I like to eat apple. Me too. Let's go buy some apples."
    
    In [9]: re.findall(r'([^.]*apple[^.]*)', txt)
    Out[9]: ['I like to eat apple', " Let's go buy some apples"]
    

    But note that @jamylak's split-based solution is faster:

    In [10]: %timeit re.findall(r'([^.]*apple[^.]*)', txt)
    1000000 loops, best of 3: 1.96 us per loop
    
    In [11]: %timeit [s+ '.' for s in txt.split('.') if 'apple' in s]
    1000000 loops, best of 3: 819 ns per loop
    

    The speed difference is smaller, but still noticeable, for larger strings:

    In [24]: txt = txt*10000
    
    In [25]: %timeit re.findall(r'([^.]*apple[^.]*)', txt)
    100 loops, best of 3: 8.49 ms per loop
    
    In [26]: %timeit [s+'.' for s in txt.split('.') if 'apple' in s]
    100 loops, best of 3: 6.35 ms per loop
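To reproduce this comparison outside IPython, a minimal sketch using the standard timeit module might look like this. The function names regex_version and split_version are hypothetical, and the timings you get will depend on your machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

# Same text as in the session above (note the leading dot)
TXT = ".I like to eat apple. Me too. Let's go buy some apples."

def regex_version(txt, word="apple"):
    # Runs of non-dot characters that contain the word; re.escape guards
    # against words that contain regex metacharacters
    return re.findall(r'([^.]*' + re.escape(word) + r'[^.]*)', txt)

def split_version(txt, word="apple"):
    # Split on dots, keep the pieces containing the word, re-append the dot
    return [s + '.' for s in txt.split('.') if word in s]

if __name__ == "__main__":
    runs = 100_000
    for fn in (regex_version, split_version):
        secs = timeit.timeit(lambda: fn(TXT), number=runs)
        print(f"{fn.__name__}: {secs / runs * 1e6:.2f} us per call")
```

Note that the two versions do not return identical strings: the regex version omits the trailing dot, while the split version appends one.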
    
  • 2020-12-09 11:32

    You can use str.split:

    >>> txt="I like to eat apple. Me too. Let's go buy some apples."
    >>> txt.split('. ')
    ['I like to eat apple', 'Me too', "Let's go buy some apples."]
    
    >>> [ t for t in txt.split('. ') if 'apple' in t]
    ['I like to eat apple', "Let's go buy some apples."]
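Splitting on '. ' keeps the final period only on the last sentence and misses sentences that end in ! or ?. A minimal sketch, assuming every sentence ends with ., ! or ? followed by whitespace (abbreviations such as "Mr." would still be split incorrectly), swaps in re.split with a lookbehind so each sentence keeps its own punctuation:

```python
import re

txt = "I like to eat apple. Me too. Let's go buy some apples."

# Split on whitespace that follows ., ! or ?; the lookbehind is zero-width,
# so the punctuation stays attached to the sentence before it
sentences = re.split(r'(?<=[.!?])\s+', txt.strip())
matches = [s for s in sentences if 'apple' in s]
print(matches)  # ['I like to eat apple.', "Let's go buy some apples."]
```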
    
  • 2020-12-09 11:37

    No need for regex:

    >>> txt = "I like to eat apple. Me too. Let's go buy some apples."
    >>> [sentence + '.' for sentence in txt.split('.') if 'apple' in sentence]
    ['I like to eat apple.', " Let's go buy some apples."]
    
  • 2020-12-09 11:41
    In [3]: re.findall(r"([^.]*?apple[^.]*\.)", txt)
    Out[3]: ['I like to eat apple.', " Let's go buy some apples."]
    
  • 2020-12-09 11:43
    r"\."+".+"+"apple"+".+"+"\."
    

    This line is a bit odd; why concatenate so many separate strings? You could just use r'\..+apple.+\.'.

    Anyway, the problem with your regular expression is its greediness. By default, x+ matches x as many times as it possibly can. So your .+ matches as many characters (of any kind) as possible, including dots and apples.

    What you want instead is a non-greedy (lazy) quantifier; you usually get one by appending a ? to the quantifier: .+?.

    This will make you get the following result:

    ['.I like to eat apple. Me too.']
    

    As you can see, you no longer get both apple sentences in a single match, but "Me too." is still included. That is because .+ must match at least one character after apple, so it consumes the . directly after apple, and the match can only end at the next dot, the one after "Me too".
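A small runnable sketch that puts the greedy and lazy versions side by side on the dot-prefixed text from the question; it only illustrates the difference described above and is not a fix yet:

```python
import re

# Text from the question, with a leading dot added
txt = ".I like to eat apple. Me too. Let's go buy some apples."

greedy = re.findall(r'\..+apple.+\.', txt)   # .+ runs as far as it can
lazy = re.findall(r'\..+?apple.+?\.', txt)   # .+? stops as early as it can

print(greedy)  # one match spanning the whole text
print(lazy)    # ['.I like to eat apple. Me too.']
```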

    A working regular expression would be this: r'\.[^.]*?apple[^.]*?\.'

    Here, instead of matching any character, you match only characters that are not dots themselves. Using * instead of + also allows matching no characters at all, which is needed because in the first sentence there are no characters between apple and the following dot. Using that expression results in this:

    ['.I like to eat apple.', ". Let's go buy some apples."]
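One pitfall worth checking with r'\.[^.]*?apple[^.]*?\.': each match consumes the dot that ends its sentence, so when two matching sentences are adjacent, the second no longer starts with an unconsumed dot and is skipped. The sketch below uses a made-up sample text to show this, and a variant that does not require a leading dot (like the lazy pattern in the earlier answer), with strip() removing the leftover leading space:

```python
import re

# Hypothetical text in which two apple sentences are adjacent
txt = ".I ate an apple. Another apple fell. Me too."

# The dot ending the first match is consumed, so the next sentence is missed
consuming = re.findall(r'\.[^.]*?apple[^.]*?\.', txt)

# No leading dot required; strip() removes the space after the previous dot
non_consuming = [s.strip() for s in re.findall(r'[^.]*?apple[^.]*\.', txt)]

print(consuming)      # ['.I ate an apple.']
print(non_consuming)  # ['I ate an apple.', 'Another apple fell.']
```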
    
  • 2020-12-09 11:54

    Strictly speaking, the example in the question extracts sentences containing a substring, not sentences containing a word ("apples" contains "apple" but is a different word). Here is how to solve the "extract sentences containing a word" problem in Python:

    A word can be at the beginning, in the middle, or at the end of a sentence. Not limited to the example in the question, here is a general function that checks whether a word occurs in a sentence:

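    import re

    def searchWordinSentence(word, sentence):
        # The word must be preceded by a space or the start of the sentence,
        # and followed by a space or the end of the sentence
        pattern = re.compile('(^| )' + re.escape(word) + '( |$)')
        return bool(pattern.search(sentence))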
    

    For the example in the question, we can then solve it like this:

    txt = "I like to eat apple. Me too. Let's go buy some apples."
    word = "apple"
    print([t for t in txt.split('. ') if searchWordinSentence(word, t)])
    

    The corresponding output is:

    ['I like to eat apple']
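A space-based pattern does not match a word directly followed by punctuation, such as "apple," in "Is that an apple, or a pear?". A minimal sketch of an alternative, assuming word is a plain word made of letters or digits, uses the regex word boundary \b; the helper name sentences_with_word is hypothetical, and the sentence split on ., ! or ? plus whitespace is an assumption swapped in for split('. '):

```python
import re

def sentences_with_word(word, text):
    # Split into sentences after ., ! or ? followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # \b is a word boundary: 'apple' followed by '.' or ',' still matches,
    # but 'apples' does not, because 's' is a word character
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    return [s for s in sentences if pattern.search(s)]

txt = "I like to eat apple. Me too. Let's go buy some apples."
print(sentences_with_word("apple", txt))  # ['I like to eat apple.']
```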
    