nltk tokenization and contractions

执念已碎 2021-02-19 01:14

I'm tokenizing text with nltk, just sentences fed to wordpunct_tokenizer. This splits contractions (e.g. 'don't' to 'don' + "'" + 't') but I want to keep them as one word.

3 answers
  •  别跟我提以往
    2021-02-19 02:02

    I've worked with NLTK before on this project. When I did, I found that contractions were useful to consider.

    However, I did not write a custom tokenizer; I simply handled the contractions after POS tagging.

    I suspect this is not the answer that you are looking for, but I hope it helps somewhat.
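
For what it's worth, here is a minimal sketch of the "handle it afterwards" idea, not the code I actually used. It uses only the standard `re` module with the pattern that NLTK documents for `WordPunctTokenizer` (`\w+|[^\w\s]+`), so it should split text the same way. The names `wordpunct_like` and `merge_contractions` are hypothetical helpers made up for this illustration.

```python
import re

# Pattern documented for NLTK's WordPunctTokenizer: runs of word
# characters, or runs of non-space punctuation.
WORDPUNCT = re.compile(r"\w+|[^\w\s]+")


def wordpunct_like(text):
    """Split text the way a word/punctuation tokenizer would."""
    return WORDPUNCT.findall(text)


def merge_contractions(tokens):
    """Rejoin word + "'" + word triples into one token (hypothetical helper).

    Assumption: any lone apostrophe between two alphanumeric tokens
    belongs to a contraction. The token list carries no whitespace
    information, so quoted words such as 'a' b can be merged by mistake.
    """
    merged = []
    i = 0
    while i < len(tokens):
        is_contraction = (
            i + 2 < len(tokens)
            and tokens[i].isalnum()
            and tokens[i + 1] == "'"
            and tokens[i + 2].isalnum()
        )
        if is_contraction:
            merged.append(tokens[i] + "'" + tokens[i + 2])
            i += 3
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = wordpunct_like("I don't know.")
print(tokens)
print(merge_contractions(tokens))
```

If you would rather not post-process, a pattern such as `r"\w+(?:'\w+)?|[^\w\s]+"` keeps the apostrophe inside the word in the first place. I believe it can be passed to NLTK's `RegexpTokenizer` as well, but I have not checked it against every edge case.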
