I have a JSON file containing text like:

dr. goldberg offers everything.parking is good.he's nice and easy to talk

How can I extract the sentence that contains a given word, such as "parking"?
How about parsing the string and looking at the values?
import json

def sen_or_none(string):
    # Return the fragment if it mentions "parking", otherwise None.
    return string if "parking" in string.lower() else None

def walk(node):
    # Recursively search lists, dicts and strings for a matching fragment.
    if isinstance(node, list):
        for item in node:
            v = walk(item)
            if v:
                return v
    elif isinstance(node, dict):
        for key, item in node.items():
            v = walk(item)
            if v:
                return v
    elif isinstance(node, str):
        for item in node.split("."):
            v = sen_or_none(item)
            if v:
                return v
    return None

with open('data.json') as data_file:
    print(walk(json.load(data_file)))
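To see the traversal in action without a file on disk, here is a self-contained sketch of the same approach run against json.loads; the nested document below is made up for illustration:

```python
import json

def sen_or_none(text):
    # Return the fragment if it mentions "parking", otherwise None.
    return text if "parking" in text.lower() else None

def walk(node):
    # Recurse into lists and dicts; split strings on "." into fragments.
    if isinstance(node, list):
        for item in node:
            v = walk(item)
            if v:
                return v
    elif isinstance(node, dict):
        for item in node.values():
            v = walk(item)
            if v:
                return v
    elif isinstance(node, str):
        for part in node.split("."):
            v = sen_or_none(part)
            if v:
                return v
    return None

doc = json.loads('{"reviews": [{"text": "dr. goldberg offers everything.parking is good"}]}')
print(walk(doc))  # parking is good
```

The walk returns the first match it finds and None when nothing in the structure mentions the word.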
You can use nltk.tokenize:

from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize

f = open("test_data.json").read()
sentences = sent_tokenize(f)
# all sentences that contain the target word
my_sentence = [sent for sent in sentences if 'parking' in word_tokenize(sent)]

As a complete solution, you can wrap it in a function:
>>> def sentence_finder(text, word):
...     sentences = sent_tokenize(text)
...     return [sent for sent in sentences if word in word_tokenize(sent)]
>>> s="dr. goldberg offers everything. parking is good. he's nice and easy to talk"
>>> sentence_finder(s,'parking')
['parking is good.']
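If pulling in NLTK feels heavy, a rough stdlib-only equivalent of sentence_finder can be sketched with re. Note this naive splitter has no abbreviation handling, so unlike sent_tokenize it will split "dr." into its own sentence:

```python
import re

def sentence_finder(text, word):
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    # Unlike NLTK's sent_tokenize, this mis-splits abbreviations like "dr.".
    sentences = re.split(r'(?<=[.!?])\s+', text)
    return [sent for sent in sentences
            if word.lower() in sent.lower().split()]

s = "dr. goldberg offers everything. parking is good. he's nice and easy to talk"
print(sentence_finder(s, 'parking'))  # ['parking is good.']
```

Good enough for quick filtering, but NLTK's trained tokenizer is the safer choice for real text.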
You can use the standard library re module:

import re

line = "dr. goldberg offers everything.parking is good.he's nice and easy to talk"
res = re.search(r"\.?([^.]*parking[^.]*)", line)
if res is not None:
    print(res.group(1))
It will print parking is good. The idea is simple: the pattern searches for a sentence starting from an optional dot character (.), then consumes all non-dots, the word parking, and the rest of the non-dots. The question mark handles the case where your sentence is at the start of the line.
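The same pattern also works with re.findall when more than one sentence mentions the word; the made-up line below additionally shows the optional leading dot letting a match start at the beginning of the string:

```python
import re

# Hypothetical input: two sentences mention "parking", and the first one
# sits at the very start of the line, where no leading dot is present.
line = "parking is scarce.dr. goldberg offers everything.parking is good."
matches = re.findall(r"\.?([^.]*parking[^.]*)", line)
print(matches)  # ['parking is scarce', 'parking is good']
```

findall returns only the captured group, so the sentence text comes back without its surrounding dots.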