Setting NLTK with Stanford NLP (both StanfordNERTagger and StanfordPOSTagger) for Spanish

天涯浪人 2021-02-09 07:54

The NLTK documentation is rather poor on this integration. The steps I followed were:

  • Download http://nlp.stanford.edu/software/stanford-postagger-

3 answers
  •  醉话见心
    2021-02-09 08:10

    Try:

    # StanfordPOSTagger
    from nltk.tag.stanford import StanfordPOSTagger
    stanford_dir = '/home/me/stanford/stanford-postagger-full-2015-04-20/'
    modelfile = stanford_dir + 'models/english-bidirectional-distsim.tagger'
    jarfile = stanford_dir + 'stanford-postagger.jar'
    
    st = StanfordPOSTagger(model_filename=modelfile, path_to_jar=jarfile)
    
    
    # StanfordNERTagger
    from nltk.tag.stanford import StanfordNERTagger
    stanford_dir = '/home/me/stanford/stanford-ner-2015-04-20/'
    jarfile = stanford_dir + 'stanford-ner.jar'
    modelfile = stanford_dir + 'classifiers/english.all.3class.distsim.crf.ser.gz'
    
    # Note: this rebinds st, replacing the POS tagger created above
    st = StanfordNERTagger(model_filename=modelfile, path_to_jar=jarfile)
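
    Both constructors only receive two file paths, so a wrong path is an easy way for this setup to fail. Here is a minimal sketch, assuming the directory layout used above, that builds the two paths and checks that the files exist before they are handed to NLTK. `resolve_stanford_files` is a hypothetical helper name, not part of NLTK.

```python
import os


def resolve_stanford_files(stanford_dir, jar_name, model_relpath):
    """Return (model_path, jar_path), raising if either file is missing.

    Hypothetical helper, not part of NLTK; the arguments mirror the
    path pieces that the snippet above concatenates by hand.
    """
    jar_path = os.path.join(stanford_dir, jar_name)
    model_path = os.path.join(stanford_dir, model_relpath)
    missing = [p for p in (jar_path, model_path) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("Missing Stanford files: " + ", ".join(missing))
    return model_path, jar_path


# Usage sketch (needs NLTK, Java, and the downloaded Stanford files):
#   from nltk.tag.stanford import StanfordPOSTagger
#   modelfile, jarfile = resolve_stanford_files(
#       '/home/me/stanford/stanford-postagger-full-2015-04-20/',
#       'stanford-postagger.jar',
#       'models/english-bidirectional-distsim.tagger')
#   st = StanfordPOSTagger(model_filename=modelfile, path_to_jar=jarfile)
```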
    

    For detailed information on NLTK API with Stanford tools, take a look at: https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software#stanford-tagger-ner-tokenizer-and-parser

    Note: The NLTK APIs are for the individual Stanford tools. If you're using Stanford Core NLP, it's best to follow @dimazest's instructions at http://www.eecs.qmul.ac.uk/~dm303/stanford-dependency-parser-nltk-and-anaconda.html


    EDITED

    As for Spanish NER tagging, I strongly suggest that you use Stanford Core NLP (http://nlp.stanford.edu/software/corenlp.shtml) instead of the Stanford NER package (http://nlp.stanford.edu/software/CRF-NER.shtml), and follow @dimazest's solution for reading the JSON file.

    Alternatively, if you must use the NER package, you can try following the instructions from https://github.com/alvations/nltk_cli (Disclaimer: this repo is not officially affiliated with NLTK). Run the following on the Unix command line:

    cd $HOME
    wget http://nlp.stanford.edu/software/stanford-spanish-corenlp-2015-01-08-models.jar
    unzip stanford-spanish-corenlp-2015-01-08-models.jar -d stanford-spanish
    cp stanford-spanish/edu/stanford/nlp/models/ner/* /home/me/stanford/stanford-ner-2015-04-20/classifiers/
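
    The unzip and copy steps can also be done from Python, since a .jar file is a zip archive. This is a rough sketch using only the standard library; it assumes the jar has already been downloaded. `extract_ner_models` is a hypothetical name, and the default prefix is the folder taken from the `cp` command above.

```python
import os
import shutil
import zipfile


def extract_ner_models(models_jar, classifiers_dir,
                       prefix="edu/stanford/nlp/models/ner/"):
    """Copy the files stored directly under `prefix` in the jar into
    `classifiers_dir` and return the copied file names.

    Hypothetical helper; it mirrors `unzip` followed by `cp .../ner/*`.
    """
    os.makedirs(classifiers_dir, exist_ok=True)
    copied = []
    with zipfile.ZipFile(models_jar) as jar:
        for name in jar.namelist():
            if not name.startswith(prefix):
                continue
            rest = name[len(prefix):]
            # Skip the folder entry itself and anything in subfolders,
            # matching what a non-recursive `cp` would copy.
            if not rest or "/" in rest:
                continue
            target = os.path.join(classifiers_dir, rest)
            with jar.open(name) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            copied.append(rest)
    return copied


# Usage sketch, writing into the directory the Python snippet loads from:
#   extract_ner_models(
#       os.path.expanduser('~/stanford-spanish-corenlp-2015-01-08-models.jar'),
#       '/home/me/stanford/stanford-ner-2015-04-20/classifiers/')
```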
    

    Then in Python:

    # StanfordNERTagger
    from nltk.tag.stanford import StanfordNERTagger
    stanford_dir = '/home/me/stanford/stanford-ner-2015-04-20/'
    jarfile = stanford_dir + 'stanford-ner.jar'
    modelfile = stanford_dir + 'classifiers/spanish.ancora.distsim.s512.crf.ser.gz'
    
    st = StanfordNERTagger(model_filename=modelfile, path_to_jar=jarfile)
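
    Once the tagger is built, `st.tag(tokens)` returns a list of `(token, tag)` pairs, with `O` marking tokens outside any entity. As a small illustration of working with that output, the sketch below merges consecutive tokens that share a tag into entity strings. `group_entities` is a hypothetical helper, and the sample data is made up so the block runs without Java or the model files.

```python
from itertools import groupby


def group_entities(tagged_tokens, outside_tag="O"):
    """Merge runs of consecutive tokens with the same non-outside tag
    into (text, tag) pairs. Hypothetical helper, not part of NLTK."""
    entities = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != outside_tag:
            entities.append((" ".join(tok for tok, _ in run), tag))
    return entities


# Made-up sample in the (token, tag) shape that tag() returns; the label
# strings are illustrative and depend on the model that is loaded.
sample = [("Gabriel", "PERS"), ("García", "PERS"), ("nació", "O"),
          ("en", "O"), ("Colombia", "LUG")]
print(group_entities(sample))
# prints [('Gabriel García', 'PERS'), ('Colombia', 'LUG')]

# With a real tagger (needs Java and the model file):
#   print(group_entities(st.tag('some Spanish sentence'.split())))
```

    One limitation of this approach: two different entities of the same type that sit next to each other are merged into one.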
    
