I am performing the following operations on lists of words. I read lines in from a Project Gutenberg text file, split each line on spaces, perform general punctuation substitutions, and then annotate each word.
I suggest working smart here: use nltk (or another NLP toolkit) instead of rolling your own splitting and punctuation handling.
Tokenize words like this:
import nltk
sentence = """At eight o'clock on Thursday morning
Arthur didn't feel very good."""
tokens = nltk.word_tokenize(sentence)
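If you haven't used the tokenizers before you may need nltk.download('punkt') once. On the sentence above, word_tokenize should give roughly the following (sketched from memory; exact tokens can vary slightly between NLTK versions):

print(tokens)
# ['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']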
You may not like the fact that contractions like didn't are split into did and n't. This is actually expected behavior; see Issue 401.
However, TweetTokenizer can help with that:
from nltk.tokenize import TweetTokenizer
tknzr = TweetTokenizer()
tknzr.tokenize("The code didn't work!")
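Here the contraction stays in one piece; the result should look roughly like this (again from memory, so verify locally):

# ['The', 'code', "didn't", 'work', '!']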
If it gets more involved, a RegexpTokenizer could be helpful:
from nltk.tokenize import RegexpTokenizer
s = "Good muffins cost $3.88\nin New York. Please don't buy me\njust one of them."
tokenizer = RegexpTokenizer(r'\w+|\$[\d\.]+|\S+')
tokenizer.tokenize(s)
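With that pattern, word characters, dollar amounts, and any remaining non-space runs each become their own token; the output should look roughly like:

# ['Good', 'muffins', 'cost', '$3.88', 'in', 'New', 'York', '.', 'Please', 'don', "'t", 'buy', 'me', 'just', 'one', 'of', 'them', '.']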
Then it should be much easier to annotate the tokenized words correctly.
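If by annotation you mean something like part-of-speech tags (an assumption on my part; adapt this to whatever annotation scheme you actually need), a minimal sketch would be:

import nltk
# nltk.download('averaged_perceptron_tagger')  # may be needed once
tokens = nltk.word_tokenize("Arthur didn't feel very good.")
tagged = nltk.pos_tag(tokens)  # list of (token, tag) pairs, e.g. ('Arthur', 'NNP')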