I can read the text into a variable successfully, but when I try to tokenize it I get this strange error:

sentences = nltk.sent_tokenize(sample)
UnicodeDecodeError
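One common source of a UnicodeDecodeError here is handing the tokenizer raw bytes instead of a decoded Unicode string. A minimal sketch of decoding the file up front, using only the standard library (the filename and its contents are placeholders, and the nltk call is left commented out):

```python
import io

# Create a small UTF-8 sample file purely for demonstration.
with io.open("sample.txt", "w", encoding="utf-8") as f:
    f.write(u"Caf\u00e9 culture is nice. So is NLTK.")

# Read it back as Unicode up front. If a UnicodeDecodeError is raised
# here, the file is not actually UTF-8; the tokenizer is not at fault.
with io.open("sample.txt", encoding="utf-8") as f:
    sample = f.read()

# The tokenizer then receives a unicode string rather than raw bytes:
# import nltk
# sentences = nltk.sent_tokenize(sample)
```

`io.open` works the same way on Python 2 and 3, so the sketch does not depend on which interpreter you run.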
In a nutshell, NLTK 3's pos_tag function is broken; the NLTK 2 version works fine, however.
pip uninstall nltk
pip install http://pypi.python.org/packages/source/n/nltk/nltk-2.0.4.tar.gz
On the other hand, that tagger is pretty bad (apparently 'conservatory' is a verb). I wish spaCy worked on Windows.