Question
I was trying to use PunktWordTokenizer, and the import below raised an error.
from nltk.tokenize.punkt import PunktWordTokenizer
This gave the following error message:
Traceback (most recent call last):
  File "file", line 5, in <module>
    from nltk.tokenize.punkt import PunktWordTokenizer
ImportError: cannot import name PunktWordTokenizer
I've checked that NLTK is installed and that PunktWordTokenizer is also installed via nltk.download(). Any help would be appreciated.
Answer 1:
There appears to be a regression related to PunktWordTokenizer in NLTK 3.0.2. The issue was not present in 3.0.1; rolling back to that version or earlier fixes it.
>>> import nltk
>>> nltk.__version__
'3.0.2'
>>> from nltk.tokenize import PunktWordTokenizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name PunktWordTokenizer
To try to resolve this, run pip install -U nltk to upgrade your NLTK version.
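Before rolling back or upgrading, it can help to confirm whether the installed release exposes the class at all. The following is a minimal sketch, not part of NLTK: has_punkt_word_tokenizer is a hypothetical helper name, and it only assumes that the class, when present, lives in nltk.tokenize.punkt as in the import from the question.

```python
import importlib


def has_punkt_word_tokenizer():
    """Return True if nltk.tokenize.punkt exposes PunktWordTokenizer.

    Hypothetical helper for illustration; it is not an NLTK function.
    """
    try:
        punkt = importlib.import_module("nltk.tokenize.punkt")
    except ImportError:
        # NLTK itself (or this submodule) is not importable.
        return False
    return hasattr(punkt, "PunktWordTokenizer")


if __name__ == "__main__":
    print("PunktWordTokenizer available:", has_punkt_word_tokenizer())
```

If this prints False while nltk.__version__ reports 3.0.2 or later, the ImportError comes from the installed release rather than from a broken installation.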
Answer 2:
PunktWordTokenizer was previously exposed to users but no longer is. You can use WordPunctTokenizer instead.
from nltk.tokenize import WordPunctTokenizer
WordPunctTokenizer().tokenize("text to tokenize")
The difference is:
PunktWordTokenizer splits on punctuation but keeps it attached to the word, whereas WordPunctTokenizer splits every run of punctuation into a separate token.
For example, given Input: This's a test
PunktWordTokenizer: ['This', "'s", 'a', 'test']
WordPunctTokenizer: ['This', "'", 's', 'a', 'test']
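The WordPunctTokenizer result above can be reproduced with the standard library alone, which makes the splitting rule easier to see. This is a rough sketch, assuming the tokenizer's pattern is \w+|[^\w\s]+ (runs of word characters, or runs of non-space punctuation); word_punct_split is a hypothetical name used here for illustration, not an NLTK function.

```python
import re

# Assumed pattern: either a run of word characters or a run of
# characters that are neither word characters nor whitespace.
_WORD_PUNCT_RE = re.compile(r"\w+|[^\w\s]+")


def word_punct_split(text):
    """Approximate WordPunctTokenizer().tokenize(text) using only re."""
    return _WORD_PUNCT_RE.findall(text)


if __name__ == "__main__":
    print(word_punct_split("This's a test"))
    # ['This', "'", 's', 'a', 'test']
```

Because the apostrophe matches the punctuation branch on its own, it becomes a separate token instead of staying attached to the following "s".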
Source: https://stackoverflow.com/questions/44238864/importerror-cannot-import-name-punktwordtokenizer