OpenNLP: foreign names do not get recognized

轻奢々 2020-12-20 21:21

I just started using OpenNLP to recognize names. I am using the model (en-ner-person.bin) that comes with OpenNLP. I noticed that while it recognizes US, UK, and European

1 Answer
  • 2020-12-20 22:01

    You can build your own model from your own data using an OpenNLP addon called modelbuilder-addon. It is brand new, so if you try it you may be the first person other than me to do so, but it works for me.

    You feed it the following:

    • a list of "known entities" via a file where each line is a name
    • a list of sentences from YOUR data via a file where each line is a sentence
    • (optionally) a blacklist to remove false positives
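
    As a rough sketch of preparing those inputs, the snippet below writes each list to a plain text file with one entry per line. The class name ModelBuilderInputs, the helper writeLines, the file names, and the sample names and sentences are all made up for illustration and are not part of the addon. UTF-8 encoding and a one-entry-per-line blacklist are assumptions as well.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of modelbuilder-addon.
class ModelBuilderInputs {

  /** Writes one entry per line, trimming whitespace and skipping blank entries. */
  static void writeLines(Path target, List<String> items) throws IOException {
    List<String> cleaned = items.stream()
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
    Files.write(target, cleaned, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("modelbuilder-inputs");

    // Known entities: one name per line (sample data, invented for this sketch).
    writeLines(dir.resolve("names.txt"),
        List.of("Chidi Okafor", "Nguyen Van An"));

    // Sentences from your own data: one sentence per line.
    writeLines(dir.resolve("sentences.txt"),
        List.of("Yesterday Chidi Okafor met Nguyen Van An in Lagos .",
                "Nobody else attended the meeting ."));

    // Optional blacklist of false positives (one per line is an assumption).
    writeLines(dir.resolve("blacklist.txt"), List.of("Lagos"));

    System.out.println("Input files written to " + dir);
  }
}
```

    The resulting paths can then be passed as the sentence, name, and blacklist files.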

    You can check out the addon here:

    https://svn.apache.org/repos/asf/opennlp/addons/modelbuilder-addon

    You can use this to get started:

    import java.io.File;
    import opennlp.addons.modelbuilder.DefaultModelBuilderUtil;

    public class ModelBuilderAddonUse {

      public static void main(String[] args) {
        // Inputs: sentences (one per line), known names (one per line), and a blacklist of false positives.
        File fileOfSentences = new File("path to your sentence file");
        File fileOfNames = new File("path to your file of person names");
        File blackListFile = new File("path to your blacklist file");
        // Outputs: the trained model and the annotated sentences.
        File modelOutFile = new File("path to where the model will be saved");
        File annotatedSentencesOutFile = new File("path to where the annotated sentences will be saved");

        // "person" is the entity type; 3 is the number of iterations.
        DefaultModelBuilderUtil.generateModel(fileOfSentences, fileOfNames, blackListFile,
            modelOutFile, annotatedSentencesOutFile, "person", 3);
      }
    }
    

    The idea is that your known entities (common names in your data) are used to create annotations, those annotations are used to train a model, and the model is then used to find more names and create more annotations. The tool repeats this cycle as many times as the "iterations" parameter specifies. Run it and check your results; add any undesirable hits to the blacklist file, then run the training again. I've used this and got pretty good results. If you find problems with it, file a ticket with OpenNLP.
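
    To make the first step of that cycle concrete, here is a minimal sketch of how known names could be turned into annotations. It is an illustration only, not the addon's actual code: the class SeedAnnotator and its methods are hypothetical, matching is done naively on whitespace-separated tokens, and the sample names are invented. The `<START:person> ... <END>` markers follow the training format used by the OpenNLP name finder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the seeding step, not part of modelbuilder-addon.
class SeedAnnotator {

  /**
   * Wraps each occurrence of a known name in <START:type> ... <END> markers.
   * Names are matched token by token; blacklisted names are never annotated.
   */
  static String annotate(String sentence, List<String> knownNames,
                         Set<String> blacklist, String type) {
    String[] tokens = sentence.trim().split("\\s+");
    List<String> out = new ArrayList<>();
    int i = 0;
    while (i < tokens.length) {
      int matched = 0; // length in tokens of the longest name starting at i
      for (String name : knownNames) {
        if (name.trim().isEmpty() || blacklist.contains(name)) {
          continue;
        }
        String[] nameTokens = name.trim().split("\\s+");
        if (nameTokens.length > matched && matchesAt(tokens, i, nameTokens)) {
          matched = nameTokens.length;
        }
      }
      if (matched > 0) {
        out.add("<START:" + type + ">");
        out.addAll(Arrays.asList(tokens).subList(i, i + matched));
        out.add("<END>");
        i += matched;
      } else {
        out.add(tokens[i]);
        i++;
      }
    }
    return String.join(" ", out);
  }

  /** Returns true if nameTokens appear in tokens starting at index start. */
  static boolean matchesAt(String[] tokens, int start, String[] nameTokens) {
    if (start + nameTokens.length > tokens.length) {
      return false;
    }
    for (int k = 0; k < nameTokens.length; k++) {
      if (!tokens[start + k].equals(nameTokens[k])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    String annotated = annotate(
        "Yesterday Chidi Okafor met Ngozi in Lagos .",
        List.of("Chidi Okafor", "Ngozi", "Lagos"),
        Set.of("Lagos"), // a false positive moved to the blacklist
        "person");
    // prints: Yesterday <START:person> Chidi Okafor <END> met <START:person> Ngozi <END> in Lagos .
    System.out.println(annotated);
  }
}
```

    In this sketch "Lagos" sits in both the name list and the blacklist, so it is left unannotated, which mirrors the workflow of adding bad hits to the blacklist and rerunning.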
