NLTK tokenization and contractions


执念已碎 2021-02-19 01:14

I'm tokenizing text with nltk, just sentences fed to wordpunct_tokenizer. This splits contractions (e.g. 'don't' to 'don' + "'" + 't') but I want to keep them as one w

3 Answers

别跟我提以往 (original poster)

2021-02-19 02:02

I've worked with NLTK before on this project. When I did, I found that contractions were useful to consider.

However, I did not write a custom tokenizer; I simply handled the contractions after POS tagging.

I suspect this is not the answer that you are looking for, but I hope it helps somewhat.
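To illustrate the "handle it afterwards" idea, here is a minimal sketch that rejoins contractions once the text has already been split. It is not the code I used on the project. The helper name `merge_contractions` is made up for this example, and it assumes the tokens look like the output described in the question (a word, a lone apostrophe, then another word).

```python
def merge_contractions(tokens):
    """Rejoin word + "'" + word triples, e.g. ['don', "'", 't'] -> ["don't"].

    Hypothetical helper for illustration; not part of NLTK.
    """
    merged = []
    i = 0
    while i < len(tokens):
        # A contraction split by a punctuation-based tokenizer shows up as
        # an alphabetic token, a lone apostrophe, and another alphabetic token.
        if (i + 2 < len(tokens)
                and tokens[i].isalpha()
                and tokens[i + 1] == "'"
                and tokens[i + 2].isalpha()):
            merged.append(tokens[i] + "'" + tokens[i + 2])
            i += 3
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Token list written by hand to match the split described in the question.
tokens = ['I', 'don', "'", 't', 'think', 'it', "'", 's', 'broken', '.']
print(merge_contractions(tokens))
```

One caveat: a token list no longer carries whitespace, so a single-quoted word such as 'hello' directly after another word would also be glued together by this rule. If that matters for your data, it is probably cleaner to tokenize with a pattern that keeps the apostrophe inside the word in the first place, for example by passing a regular expression such as `\w+(?:'\w+)*|[^\w\s]+` to NLTK's `RegexpTokenizer`.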
