Training Named Entity in OpenNLP

Posted by 帅比萌擦擦* on 2019-12-08 12:52:32

Question


I want to train a corpus for Indian names:

class NameTraining
{
    public static void TrainNames() throws IOException 
    {
        Charset charset = Charset.forName("UTF-8");         
        FileReader fileReader = new FileReader("train.txt");
        ObjectStream fileStream = new PlainTextByLineStream(fileReader);
        ObjectStream sampleStream = new NameSampleDataStream(fileStream);
        TokenNameFinderModel model = NameFinderME.train("pt-br", "train", sampleStream, Collections.<String, Object>emptyMap());
        NameFinderME nfm = new NameFinderME(model); 
    }

    public static void main(String args[]) throws IOException
    {
        NameTraining det = new NameTraining();
        det.TrainNames();
    }
}

I compile this using the command:

javac -cp $(echo lib/*.jar | tr ' ' ':') NameTraining.java -Xlint:unchecked

However, I get these warnings:

NameTraining.java:35: warning: [unchecked] unchecked conversion
found   : opennlp.tools.util.ObjectStream
required: opennlp.tools.util.ObjectStream<java.lang.String>
        ObjectStream sampleStream = new NameSampleDataStream(fileStream);
                                                             ^
NameTraining.java:36: warning: [unchecked] unchecked conversion
found   : opennlp.tools.util.ObjectStream
required: opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>
        TokenNameFinderModel model = NameFinderME.train("pt-br", "train", sampleStream, Collections.<String, Object>emptyMap());
                                                                          ^
2 warnings

I want to know two things:

  1. Is the above code correct for training, and if so, how do I check the results after training?
  2. What do the warnings mean?

Answer 1:


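Hi, I got training to work with a small data set. The code below trains a person model and then checks it by tokenizing a text file and printing the names the model finds: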

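import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Collections;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.Span;

class NameTraining
{
    public static void TrainNames() throws IOException
    {
        // Read the annotated training file line by line as UTF-8.
        // These typed streams replace the raw ObjectStream declarations from the question.
        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream("/home/yogi.singh/dev/java/nlp/data/en-ner-person.train"), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        // Train an English "person" model and wrap it in a name finder.
        TokenNameFinderModel model = NameFinderME.train("en", "person", sampleStream, Collections.<String, Object>emptyMap());
        NameFinderME nfm = new NameFinderME(model);

        // Load the text to test the model on; the reader is closed automatically.
        String sentence = "";
        try (BufferedReader br = new BufferedReader(new FileReader("/home/yogi.singh/dev/java/nlp/train.txt")))
        {
            StringBuilder sb = new StringBuilder();
            String line = br.readLine();
            while (line != null)
            {
                sb.append(line);
                sb.append('\n');
                line = br.readLine();
            }
            sentence = sb.toString();
        }

        // Load the tokenizer model and split the text into tokens.
        TokenizerModel model1;
        try (InputStream is1 = new FileInputStream("/home/yogi.singh/dev/java/nlp/data/en-token.bin"))
        {
            model1 = new TokenizerModel(is1);
        }
        Tokenizer tokenizer = new TokenizerME(model1);
        String tokens[] = tokenizer.tokenize(sentence);

        for (String a : tokens)
            System.out.println(a);

        // Run the trained model and print each span followed by the tokens it covers.
        Span nameSpans[] = nfm.find(tokens);
        for (Span s : nameSpans)
        {
            System.out.print(s.toString());
            System.out.print(" ");
            for (int index = s.getStart(); index < s.getEnd(); index++)
            {
                System.out.print(tokens[index] + " ");
            }
            System.out.println(" ");
        }
    }
}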
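The final loop above prints each Span followed by the tokens it covers. As a minimal sketch of that indexing, assuming only that a span's start index is inclusive and its end index is exclusive (which is how the loop iterates), the hypothetical helper spanText below joins the covered tokens into one string. It takes plain int indices instead of an OpenNLP Span so that it runs without the library, and the tokens are made up for illustration.

```java
import java.util.Arrays;

public class SpanTextDemo {
    // Hypothetical helper, not part of OpenNLP:
    // joins tokens[start] .. tokens[end - 1] with single spaces.
    public static String spanText(String[] tokens, int start, int end) {
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }

    public static void main(String[] args) {
        // Made-up tokens, standing in for the tokenizer output.
        String[] tokens = {"Yesterday", "Ravi", "Kumar", "visited", "Delhi", "."};

        // A span with start 1 and end 3 covers tokens 1 and 2 only,
        // because the end index is exclusive.
        System.out.println(spanText(tokens, 1, 3));
    }
}
```

With a real Span s, the equivalent call would be spanText(tokens, s.getStart(), s.getEnd()), which gives one string per detected name to compare against the names you expected.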



Answer 2:


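The warnings come from Java generics, not from OpenNLP. Your streams are declared with the raw type ObjectStream, so the compiler cannot verify the element type when you pass them on and reports an unchecked conversion. The code still compiles and runs.

To get rid of the warnings, declare the streams with type parameters: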

ObjectStream<String> fileStream = new PlainTextByLineStream(fileReader);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(fileStream);
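To reproduce the same kind of warning without OpenNLP on the classpath, here is a small sketch in plain Java. LineSource and ListSource are hypothetical stand-ins for ObjectStream and one of its implementations; they are not OpenNLP classes. Compiling with -Xlint:unchecked should flag the call that passes the raw variable and stay silent for the parameterized one.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RawTypeDemo {
    // Hypothetical stand-in for a generic stream type such as ObjectStream<T>.
    public interface LineSource<T> {
        T read(); // returns null when there is nothing left
    }

    // Hypothetical implementation backed by a list.
    public static class ListSource<T> implements LineSource<T> {
        private final Iterator<T> it;

        public ListSource(List<T> items) {
            this.it = items.iterator();
        }

        @Override
        public T read() {
            return it.hasNext() ? it.next() : null;
        }
    }

    // Requires a LineSource<String>, just as the question's
    // NameSampleDataStream constructor requires an ObjectStream<String>.
    public static int countLines(LineSource<String> source) {
        int n = 0;
        while (source.read() != null) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Raw type: the element type is lost, so passing it where
        // LineSource<String> is required is an unchecked conversion.
        LineSource raw = new ListSource<String>(Arrays.asList("a", "b"));
        System.out.println(countLines(raw));

        // Parameterized type: the compiler can check the element type.
        LineSource<String> typed = new ListSource<>(Arrays.asList("a", "b"));
        System.out.println(countLines(typed));
    }
}
```

Both calls behave the same at run time; the only difference is whether the compiler can prove the element type, which is why the fix is purely a matter of adding the type parameters to the declarations.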


Source: https://stackoverflow.com/questions/19397291/training-named-entity-in-opennlp
