Number of parameters must be always be even : opennlp

Posted by 给你一囗甜甜゛ on 2019-12-19 10:32:08

Question


I've been trying to use the command-line interface to train my model like this:

opennlp TokenNameFinderTrainer -model en-ner-pincode.bin -iterations 500 \ -lang en -data en-ner-pincode.train -encoding UTF-8

The console output is:

Number of parameters must be always be even
Usage: opennlp TokenNameFinderTrainer[.evalita|.ad|.conll03|.bionlp2004|.conll02|.muc6|.ontonotes|.brat] [-factory factoryName] [-resources resourcesDir] [-type modelType] [-featuregen featuregenFile] [-nameTypes types] [-sequenceCodec codec] [-params paramsFile] -lang language -model modelFile -data sampleData [-encoding charsetName]

It works fine if I don't include the number of iterations. Does anybody know the reason behind this?

Thanks!


Answer 1:


Actually, the issue is this part of the documented usage:

    -params paramsFile
            training parameters file.
    -iterations num
            number of training iterations, ignored if -params is used.
    -cutoff num
            minimal number of times a feature must be seen, ignored if -params is used.

If -params is used, then -iterations and -cutoff are ignored. That is why this message is shown in your case.

Resource Link:

  1. Tokenizer Training : Training Tool

UPDATE:

So please use ChunkerTrainerME instead of TokenNameFinderTrainer.

Your command should look like this:

opennlp ChunkerTrainerME -model en-ner-pincode.bin -iterations 500 \ -lang en -data en-ner-pincode.train -encoding UTF-8

UPDATE2: Converting the data

I will use the Spanish data as a reference, but the same operations apply to Dutch: just remember to change "-lang es" to "-lang nl" and use the correct training files. To convert the data to the OpenNLP format:

$ opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types per > es_corpus_train_persons.txt

Optionally, you can convert the test samples as well.

$ opennlp TokenNameFinderConverter conll02 -data esp.testa -lang es -types per > corpus_testa.txt
$ opennlp TokenNameFinderConverter conll02 -data esp.testb -lang es -types per > corpus_testb.txt

Training with Spanish data

To train the model for the name finder:

\bin\opennlp TokenNameFinderTrainer -lang es -encoding utf8 -iterations 500 -data es_corpus_train_persons.txt -model es_ner_person.bin

UPDATE3: Converting the data (optional)

To convert the data to the OpenNLP format:

$ opennlp TokenNameFinderConverter conll03 -lang en -types per -data eng.train > corpus_train.txt

Optionally, you can convert the test samples as well.

$ opennlp TokenNameFinderConverter conll03 -lang en -types per -data eng.testa > corpus_testa.txt
$ opennlp TokenNameFinderConverter conll03 -lang en -types per -data eng.testb > corpus_testb.txt

Training with English data

You can train the model for the name finder this way:

$ opennlp TokenNameFinderTrainer.conll03 -model en_ner_person.bin -iterations 500 \
                                 -lang en -types per -data eng.train -encoding utf8

If you have converted the data, then you can train the model for the name finder this way:

$ opennlp TokenNameFinderTrainer -model en_ner_person.bin -iterations 500 \
                                 -lang en -data corpus_train.txt -encoding utf8



Answer 2:


From "https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#intro.cli"

Usage: opennlp TokenizerTrainer[.namefinder|.conllx|.pos] [-abbDict path] ...  -model modelFile ...

The general structure of this tool command line includes the obligatory tool name (TokenizerTrainer), the optional format parameters ([.namefinder|.conllx|.pos]), the optional parameters ([-abbDict path] ...), and the obligatory parameters (-model modelFile ...).

So the parameters are either things starting with "." or with "-", and there needs to be an even number of them. There are examples in the documentation that seem to agree with this.
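The even-count rule can be illustrated with a small shell sketch. This only mimics the pairing check and is not OpenNLP's actual implementation; check_pairs is a hypothetical helper, and the quoted '\' stands in for a backslash that reaches the program as its own argument, which is an assumption about how the shell used in the question handled the mid-line "\".

```shell
#!/bin/sh
# Hypothetical helper: reports whether the arguments after the tool name
# can be grouped into "-flag value" pairs (i.e. their count is even).
check_pairs() {
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "odd: $# arguments"
  else
    echo "even: $# arguments"
  fi
}

# Arguments from the question, with the stray backslash passed as a literal token.
check_pairs -model en-ner-pincode.bin -iterations 500 '\' \
            -lang en -data en-ner-pincode.train -encoding UTF-8

# Same arguments without the backslash.
check_pairs -model en-ner-pincode.bin -iterations 500 \
            -lang en -data en-ner-pincode.train -encoding UTF-8
```

Under that assumption the first call sees 11 arguments and the second sees 10, which would be consistent with the error going away once the "-iterations 500 \" portion is dropped from the command.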




Answer 3:


Short answer: -iterations is not a parameter for TokenNameFinderTrainer. You can see this in the question, where the console output lists the recognized parameters.
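Because the usage text in the question lists -params but no -iterations option, one way to set the iteration count is through a training parameters file. The following is a minimal sketch, assuming the Algorithm, Iterations, and Cutoff keys used by OpenNLP training parameter files; the file name ner-params.txt and the values are illustrative and do not come from the question.

```shell
#!/bin/sh
# Write a hypothetical training parameters file (name and values are illustrative).
cat > ner-params.txt <<'EOF'
Algorithm=MAXENT
Iterations=500
Cutoff=5
EOF

# Show what was written.
cat ner-params.txt

# The trainer would then be pointed at the file with -params, for example:
#   opennlp TokenNameFinderTrainer -params ner-params.txt -lang en \
#       -model en-ner-pincode.bin -data en-ner-pincode.train -encoding UTF-8
```

Every flag in the commented command appears in the usage output quoted in the question, so the argument list stays in flag/value pairs.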



Source: https://stackoverflow.com/questions/37383509/number-of-parameters-must-be-always-be-even-opennlp
