The NLTK documentation for this integration is rather poor. The steps I followed were:
Download the tagger from http://nlp.stanford.edu/software/stanford-postagger- (the wget below fetches the full archive). In this example I download the tagger into the /content folder:
cd /content
wget https://nlp.stanford.edu/software/stanford-tagger-4.1.0.zip
unzip stanford-tagger-4.1.0.zip
After unzipping, I have a folder stanford-postagger-full-2020-08-06 in /content, so I can use the tagger with:
from nltk.tag.stanford import StanfordPOSTagger
stanford_dir = '/content/stanford-postagger-full-2020-08-06'
modelfile = f'{stanford_dir}/models/spanish-ud.tagger'
jarfile = f'{stanford_dir}/stanford-postagger.jar'
st = StanfordPOSTagger(model_filename=modelfile, path_to_jar=jarfile)
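Note that the Stanford taggers are Java programs, so a JRE has to be installed and on the PATH. A quick sanity check from Python (just a sketch; it assumes the java binary is reachable):
import subprocess
# prints the installed Java version; raises CalledProcessError if java is missing
subprocess.run(["java", "-version"], check=True)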
To check that everything works fine, we can do:
>st.tag(["Juan","Medina","es","un","ingeniero"])
[('Juan', 'PROPN'),
 ('Medina', 'PROPN'),
 ('es', 'AUX'),
 ('un', 'DET'),
 ('ingeniero', 'NOUN')]
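The tagger works on lists of tokens, so for plain text you have to tokenize first. A minimal sketch, assuming NLTK's punkt tokenizer data has been downloaded:
import nltk
nltk.download('punkt')                      # tokenizer data, only needed once
from nltk.tokenize import word_tokenize

tokens = word_tokenize("Juan Medina es un ingeniero", language='spanish')
print(st.tag(tokens))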
For NER, it is necessary to download the NER core and the Spanish models separately:
cd /content
# download the NER core
wget https://nlp.stanford.edu/software/stanford-ner-4.0.0.zip
unzip stanford-ner-4.0.0.zip
# download the Spanish models
wget http://nlp.stanford.edu/software/stanford-spanish-corenlp-2018-02-27-models.jar
unzip stanford-spanish-corenlp-2018-02-27-models.jar -d stanford-spanish
# copy only the necessary files
cp stanford-spanish/edu/stanford/nlp/models/ner/* stanford-ner-4.0.0/classifiers/
rm -rf stanford-spanish stanford-ner-4.0.0.zip stanford-spanish-corenlp-2018-02-27-models.jar
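To confirm that the Spanish classifiers landed in the right place, a quick check (a sketch, using the paths from above):
import os
# the Spanish CRF model used below should be listed here
print(os.listdir('/content/stanford-ner-4.0.0/classifiers'))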
To use it in Python:
from nltk.tag.stanford import StanfordNERTagger
stanford_dir = '/content/stanford-ner-4.0.0'
jarfile = f'{stanford_dir}/stanford-ner.jar'
modelfile = f'{stanford_dir}/classifiers/spanish.ancora.distsim.s512.crf.ser.gz'
st = StanfordNERTagger(model_filename=modelfile, path_to_jar=jarfile)
To check that everything works fine, we can do:
>st.tag(["Juan","Medina","es","un","ingeniero"])
[('Juan', 'PERS'),
 ('Medina', 'PERS'),
 ('es', 'O'),
 ('un', 'O'),
 ('ingeniero', 'O')]
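The NER tagger labels every token on its own, so a multi-word name such as "Juan Medina" comes back as two separate PERS tokens. A small sketch for collapsing consecutive tokens with the same non-O tag into entities (group_entities is a hypothetical helper, not part of NLTK):
from itertools import groupby

def group_entities(tagged):
    # merge runs of identically tagged tokens, dropping the 'O' (outside) tag
    entities = []
    for tag, run in groupby(tagged, key=lambda pair: pair[1]):
        if tag != 'O':
            entities.append((' '.join(token for token, _ in run), tag))
    return entities

print(group_entities(st.tag(["Juan", "Medina", "es", "un", "ingeniero"])))
# [('Juan Medina', 'PERS')]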