Question
I started doing text analysis and eventually needed to download corpora, using PyCharm 2019 as my IDE. I'm not sure what the traceback wants me to do, since I already used PyCharm's own library import interface to enable the corpora. Why does an error saying that the corpora are not available to the code keep reappearing?
I imported TextBlob (from textblob import TextBlob) and tried to tokenize one tweet; see the code below:
from textblob import TextBlob
TextBlob(train['tweet'][1]).words
print("\nPRINT TOKENIZATION")  # my own label, so it is clear which result the next line prints
print(TextBlob(train['tweet'][1]).words)
….
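The traceback shows where the missing data is needed: TextBlob(...).words ends up in NLTK's sent_tokenize, which tries to load tokenizers/punkt/english.pickle. A minimal sketch, assuming punkt is simply absent from every directory in the "Searched in" list: look the resource up with nltk.data.find and, only if that raises LookupError, fetch it with nltk.download('punkt'), the call the error message itself recommends. The helper name ensure_nltk_resource is hypothetical; the lookup and download functions are passed in so the logic can be checked without NLTK or network access.

```python
from typing import Callable


def ensure_nltk_resource(
    resource_path: str,
    package_id: str,
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> bool:
    """Download an NLTK data package only when its resource cannot be located.

    Returns True if a download was triggered, False if the resource was already present.
    """
    try:
        # nltk.data.find raises LookupError when no directory on nltk.data.path has the resource
        find(resource_path)
        return False
    except LookupError:
        # Download by package id, as the "Resource punkt not found" message suggests
        download(package_id)
        return True


# Hypothetical usage in the script, before the first TextBlob(...).words call
# (requires nltk and network access):
#   import nltk
#   ensure_nltk_resource("tokenizers/punkt", "punkt", nltk.data.find, nltk.download)
#   print(TextBlob(train['tweet'][1]).words)
```

Calling it once near the top of the script should be enough; on later runs the lookup succeeds and the download is skipped.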
I also tried installing the data through NLTK, with no luck; the downloader failed while downloading 'brown.tei':
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\jcst\AppData\Local\Programs\Python\Python37-32\lib\tkinter\__init__.py", line 1705, in __call__
    return self.func(*args)
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\downloader.py", line 1796, in _download
    return self._download_threaded(*e)
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\downloader.py", line 2082, in _download_threaded
    assert self._download_msg_queue == []
AssertionError

Traceback (most recent call last):
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\decorators.py", line 35, in decorated
    return func(*args, **kwargs)
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\tokenizers.py", line 57, in tokenize
    return nltk.tokenize.sent_tokenize(text)
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\tokenize\__init__.py", line 104, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\data.py", line 870, in load
    opened_resource = _open(resource_url)
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\data.py", line 995, in _open
    return find(path_, path + ['']).open()
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\nltk\data.py", line 701, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/english.pickle

  Searched in:
    - 'C:\Users\jcst/nltk_data'
    - 'C:\Users\jcst\PycharmProjects\TextMining\venv\nltk_data'
    - 'C:\Users\jcst\PycharmProjects\TextMining\venv\share\nltk_data'
    - 'C:\Users\jcst\PycharmProjects\TextMining\venv\lib\nltk_data'
    - 'C:\Users\jcst\AppData\Roaming\nltk_data'
    - 'C:\nltk_data'
    - 'D:\nltk_data'
    - 'E:\nltk_data'
    - ''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:/Users/jcst/PycharmProjects/TextMining/ModuleImportAndTrainFileIntro.py", line 151, in <module>
    TextBlob(train['tweet'][1]).words
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\decorators.py", line 24, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\blob.py", line 649, in words
    return WordList(word_tokenize(self.raw, include_punc=False))
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\tokenizers.py", line 73, in word_tokenize
    for sentence in sent_tokenize(text))
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\base.py", line 64, in itokenize
    return (t for t in self.tokenize(text, *args, **kwargs))
  File "C:\Users\jcst\PycharmProjects\TextMining\venv\lib\site-packages\textblob\decorators.py", line 38, in decorated
    raise MissingCorpusError()
textblob.exceptions.MissingCorpusError:
Looks like you are missing some required data for this feature.
To download the necessary data, simply run
python -m textblob.download_corpora
or use the NLTK downloader to download the missing data: http://nltk.org/data.html
If this doesn't fix the problem, file an issue at https://github.com/sloria/TextBlob/issues.
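TextBlob's message offers two routes, and the sketch below covers both under stated assumptions. The first runs python -m textblob.download_corpora through sys.executable, assuming the goal is to use exactly the interpreter the script runs under (the PyCharm venv in TextMining\venv) rather than whichever python a shell happens to find first. The second downloads named NLTK packages by id, which should bypass the Tkinter downloader window where the AssertionError was raised, optionally into sys.prefix/nltk_data, the directory behind the venv\nltk_data entry in the "Searched in" list. All helper names are hypothetical, 'punkt' and 'brown' are simply the two packages named in this question, and treating a falsy return value as failure assumes nltk.download returns False when a package fails, as recent NLTK releases appear to do.

```python
import os
import subprocess
import sys
from typing import Callable, Iterable, List, Optional


def download_corpora_command(python: str = sys.executable) -> List[str]:
    """Route 1: build `python -m textblob.download_corpora` for a specific interpreter."""
    return [python, "-m", "textblob.download_corpora"]


def run_download_corpora(python: str = sys.executable) -> int:
    """Run route 1 and return the process exit code (needs textblob and network access)."""
    return subprocess.run(download_corpora_command(python)).returncode


def venv_nltk_data_dir(prefix: str = sys.prefix) -> str:
    """Return <prefix>/nltk_data; with the default prefix, NLTK searches it on Windows."""
    return os.path.join(prefix, "nltk_data")


def download_packages(
    package_ids: Iterable[str],
    download: Callable[..., bool],
    download_dir: Optional[str] = None,
) -> List[str]:
    """Route 2: fetch NLTK packages by id and return the ids whose download failed."""
    failed = []
    for package_id in package_ids:
        # An explicit id downloads directly instead of opening the interactive downloader.
        if not download(package_id, download_dir=download_dir, quiet=True):
            failed.append(package_id)
    return failed


# Hypothetical usage from PyCharm's Python console (requires nltk and network access):
#   run_download_corpora()
#   import nltk
#   print(download_packages(["punkt", "brown"], nltk.download, venv_nltk_data_dir()))
```

Pinning the command to sys.executable is the main design choice here: it ties the download to the same environment that later raises the LookupError.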
Source: https://stackoverflow.com/questions/56263066/issues-tokenizing-text