Python: extracting a sentence with a particular word

抹茶落季 2021-01-20 20:42

I have a JSON file containing text like:

dr. goldberg offers everything.parking is good.he\'s nice and easy to talk

How can I extract the sentence that contains a particular word, such as "parking"?

3 answers
  • 2021-01-20 21:13

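    How about parsing the JSON and walking through its values, splitting every string into sentences on "." and returning the first sentence that mentions the word?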

    import json

    def sen_or_none(sentence):
        # Return the sentence if it mentions "parking" (case-insensitive), else None
        return sentence if "parking" in sentence.lower() else None

    def walk(node):
        # Recursively search lists, dicts and strings; return the first match
        if isinstance(node, list):
            for item in node:
                v = walk(item)
                if v:
                    return v
        elif isinstance(node, dict):
            for item in node.values():
                v = walk(item)
                if v:
                    return v
        elif isinstance(node, str):  # Python 3: str replaces basestring
            for sentence in node.split("."):
                v = sen_or_none(sentence)
                if v:
                    return v
        return None

    with open('data.json') as data_file:
        print(walk(json.load(data_file)))
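walk returns only the first matching sentence and hard-codes "parking". A generator variant that yields every match for an arbitrary word might look like the sketch below. iter_sentences and the sample document are hypothetical, and it keeps the same naive split on ".", so a fragment like "dr" is still cut off on its own.

```python
import json

def iter_sentences(node, word):
    """Yield every '.'-separated sentence in a parsed JSON value that mentions word."""
    if isinstance(node, list):
        for item in node:
            yield from iter_sentences(item, word)
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_sentences(value, word)
    elif isinstance(node, str):
        for sentence in node.split("."):
            # Case-insensitive substring check, as in sen_or_none
            if word.lower() in sentence.lower():
                yield sentence.strip()
    # Numbers, booleans and null contain no sentences, so nothing is yielded

# Hypothetical sample document standing in for data.json
doc = json.loads("""{"reviews": [{"text": "dr. goldberg offers everything.parking is good.he's nice and easy to talk"}, {"text": "no parking nearby.great staff"}]}""")
print(list(iter_sentences(doc, "parking")))  # ['parking is good', 'no parking nearby']
```

Because it is a generator, next(iter_sentences(data, word), None) gives the same "first match or None" behaviour as walk.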
    
  • 2021-01-20 21:21

    You can use nltk.tokenize:

    from nltk.tokenize import sent_tokenize, word_tokenize

    # sent_tokenize needs NLTK's Punkt models, installed once via nltk.download()
    with open("test_data.json") as f:
        text = f.read()
    sentences = sent_tokenize(text)
    # All sentences in which "parking" appears as a separate word token
    my_sentence = [sent for sent in sentences if 'parking' in word_tokenize(sent)]
    

    As a complete solution, you can wrap this in a function:

    >>> def sentence_finder(text, word):
    ...     sentences = sent_tokenize(text)
    ...     return [sent for sent in sentences if word in word_tokenize(sent)]
    
    >>> s="dr. goldberg offers everything. parking is good. he's nice and easy to talk"
    >>> sentence_finder(s,'parking')
    ['parking is good.']
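If NLTK and its Punkt data are not available, a rough standard-library approximation of sentence_finder is sketched below. sentence_finder_re is a hypothetical name, and the splitting rule is a naive assumption: every ".", "!" or "?" ends a sentence, so unlike Punkt it will split "dr." off as its own sentence. The \b word-boundary check stands in for the word_tokenize membership test (so "park" does not match inside "parking"), but it is case-insensitive, which word_tokenize is not.

```python
import re

def sentence_finder_re(text, word):
    """Return sentences that contain word as a whole word (case-insensitive).

    Naive splitting: abbreviations such as "dr." are treated as sentence ends.
    """
    # Zero-width split after terminal punctuation (needs Python 3.7+);
    # drop the empty string produced when text ends with punctuation
    sentences = [s for s in re.split(r"(?<=[.!?])\s*", text) if s]
    # \b...\b approximates token matching: "park" will not match "parking"
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

s = "dr. goldberg offers everything.parking is good.he's nice and easy to talk"
print(sentence_finder_re(s, "parking"))  # ['parking is good.']
print(sentence_finder_re(s, "park"))     # []
```

Unlike the substring check in the first answer, this returns nothing for partial words, which is the same trade-off the word_tokenize test makes.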
    
  • 2021-01-20 21:33

    You can use the standard library re module:

    import re
    line = "dr. goldberg offers everything.parking is good.he's nice and easy to talk"
    # Raw string so the backslashes reach the regex engine unchanged
    res = re.search(r"\.?([^\.]*parking[^\.]*)", line)
    if res is not None:
        print(res.group(1))
    

    It prints parking is good (without a trailing dot, because the dots lie outside the captured group).

    The idea is simple: search for a sentence that starts with an optional dot, then consume all non-dot characters, the word parking, and the remaining non-dot characters.

    The question mark makes the leading dot optional, which handles the case where the sentence is at the start of the line.
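The same pattern can be written out with re.VERBOSE so that each part carries its explanation, and re.findall (instead of re.search) returns every matching sentence rather than only the first. This is a sketch: find_sentences is a hypothetical helper, and passing the word through re.escape, so characters like "$" or "." in it are matched literally, is an addition not in the answer.

```python
import re

def find_sentences(text, word):
    """Return every '.'-delimited sentence in text that contains word."""
    pattern = re.compile(
        r"""
        \.?              # optional dot that ends the previous sentence
        (                # capture the sentence itself:
          [^.]*          #   non-dot characters before the word
          {word}         #   the word, escaped so it is matched literally
          [^.]*          #   non-dot characters after the word
        )
        """.format(word=re.escape(word)),
        re.VERBOSE,
    )
    # With one capture group, findall returns group 1 of each non-overlapping match
    return pattern.findall(text)

line = "dr. goldberg offers everything.parking is good.he's nice and easy to talk"
print(find_sentences(line, "parking"))  # ['parking is good']
```

re.escape also escapes spaces, so a multi-word phrase still works under re.VERBOSE, which would otherwise ignore the whitespace.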
