问题
I just started using openNLP to recognize names. I am using the model (en-ner-person.bin) that comes with open NLP. I noticed that while it recognizes us, uk, and european names, it fails to recognize Indian or Japanese names. My questions are (1) is there already models available that I can use to recognize foreign names (2) If not, then I believe I will need to generate new models. In that case, is there a copora available that I can use?
回答1:
You can make your own model with your data using an opennlp addon called modelbuilder-addon, if you try it you may be the first one to do so other than me...it's brand new.
it is very new, but it works for me.
You feed it the following:
- a list of "known entities" via a file where each line is a name
- a list of sentences from YOUR data via file where each line is a sentence
- (optionally) a blacklist to remove false positives
you can checkout the addon here
https://svn.apache.org/repos/asf/opennlp/addons/modelbuilder-addon
you can use this to get started
import java.io.File;
import opennlp.addons.modelbuilder.DefaultModelBuilderUtil;
public class ModelBuilderAddonUse {
public static void main(String[] args) {
File fileOfSentences = new File("path to your sentence file");
File fileOfNames = new File("path to your file of person names");
File blackListFile = new File("path to your blacklist file");
File modelOutFile = new File("path to you where the model will be saved");
File annotatedSentencesOutFile = new File("path to your sentence file");
DefaultModelBuilderUtil.generateModel(fileOfSentences, fileOfNames, blackListFile, modelOutFile, annotatedSentencesOutFile, "person", 3);
}
}
the idea is that your known entities (common names in your data) are used to create annotations, and those annotations are used to generate a model, then the model is used to generate more names and annotations etc... the tool will do this as per the "iterations" parameter. You should run it, check your results, any undesirable hits should be added to the blacklist file, and then you can run the training again. I've used this and got pretty good results. If you find problems with it, put in a ticket at OpenNLP.
来源:https://stackoverflow.com/questions/20509678/opennlp-foreign-names-does-not-get-recognized