Getting output in the desired format using TokenRegex

后端未结

关注

 3  901

执笔经年 2021-01-24 06:31

I am using TokensRegex for rule based entity extraction. It works well but I am having trouble getting my output in the desired format. The following snippet of code gives me an

3条回答

梦毁少年i (楼主)

2021-01-24 07:02

I produced a jar of the latest build a week or so ago. Use that jar available from GitHub.

This sample code will run the rules and apply the appropriate ner tags.

package edu.stanford.nlp.examples;

import edu.stanford.nlp.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;

import java.util.*;


public class TokensRegexExampleTwo {

  public static void main(String[] args) {

    // set up properties
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,tokensregex");
    props.setProperty("tokensregex.rules", "multi-step-per-org.rules");
    props.setProperty("tokensregex.caseInsensitive", "true");

    // set up pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // set up text to annotate
    Annotation annotation = new Annotation("...text to annotate...");

    // annotate text
    pipeline.annotate(annotation);

    // print out found entities
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        System.out.println(token.word() + "\t" + token.ner());
      }
    }
  }
}

0 讨论(0)

查看其它3个回答