ImportError: cannot import name PunktWordTokenizer

爱⌒轻易说出口 posted on 2019-12-07 14:15:53

Question


I was trying to use PunktWordTokenizer, and the import below raised an error.

from nltk.tokenize.punkt import PunktWordTokenizer

It gave the following error message.

Traceback (most recent call last):
  File "file", line 5, in <module>
    from nltk.tokenize.punkt import PunktWordTokenizer
ImportError: cannot import name PunktWordTokenizer

I've checked that nltk is installed and that PunktWordTokenizer is also installed using nltk.download(). I need some help with this.


Answer 1:


There appears to be a regression related to PunktWordTokenizer in 3.0.2. The issue was not present in 3.0.1, so rolling back to that version or earlier makes the import work again.

>>> import nltk
>>> nltk.__version__
'3.0.2'
>>> from nltk.tokenize import PunktWordTokenizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name PunktWordTokenizer

To resolve this, try pip install -U nltk to upgrade your NLTK version.
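
Since the class is only importable on some NLTK versions, code that has to run across versions can probe for it at import time. The following is a minimal sketch, not part of the original answer: the helper name load_tokenizer_class is hypothetical, and falling back to WordPunctTokenizer is an assumption (the two tokenizers do not produce identical tokens).

```python
import importlib


def load_tokenizer_class():
    """Return the first tokenizer class that can be imported, or None.

    Hypothetical helper: tries PunktWordTokenizer first, then falls back
    to WordPunctTokenizer (an assumed substitute, not a drop-in match).
    """
    candidates = [
        ("nltk.tokenize.punkt", "PunktWordTokenizer"),
        ("nltk.tokenize", "WordPunctTokenizer"),
    ]
    for module_name, class_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # nltk itself (or this submodule) is not installed
            continue
        cls = getattr(module, class_name, None)
        if cls is not None:
            return cls
    return None


tokenizer_class = load_tokenizer_class()
if tokenizer_class is None:
    print("No suitable NLTK tokenizer could be imported")
else:
    print("Using", tokenizer_class.__name__)
```

Using getattr instead of a from-import keeps the missing-class case from raising ImportError, so the caller can decide what to do when neither class is available.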




Answer 2:


PunktWordTokenizer was previously exposed to users, but it no longer is. You can use WordPunctTokenizer instead.

from nltk.tokenize import WordPunctTokenizer
WordPunctTokenizer().tokenize("text to tokenize")

The difference is:

PunktWordTokenizer splits on punctuation but keeps it attached to the word, whereas WordPunctTokenizer splits all punctuation into separate tokens.

For example, given the input: This's a test

PunktWordTokenizer: ['This', "'s", 'a', 'test']
WordPunctTokenizer: ['This', "'", 's', 'a', 'test']
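
The WordPunctTokenizer result above can be reproduced without NLTK. The sketch below is a standard-library approximation, assuming the regular expression \w+|[^\w\s]+ that the NLTK documentation gives for WordPunctTokenizer; the function name word_punct_tokenize is made up for illustration and is not an NLTK API.

```python
import re

# Runs of word characters, or runs of characters that are neither
# word characters nor whitespace (i.e. punctuation).
WORD_OR_PUNCT = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokenize(text):
    """Split text into alphanumeric tokens and punctuation tokens."""
    return WORD_OR_PUNCT.findall(text)


print(word_punct_tokenize("This's a test"))
# ['This', "'", 's', 'a', 'test']
```

The apostrophe becomes its own token because it matches the punctuation branch of the pattern, which is why 's does not stay together as it did with PunktWordTokenizer.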


Source: https://stackoverflow.com/questions/44238864/importerror-cannot-import-name-punktwordtokenizer
