Getting output in the desired format using TokensRegex

执笔经年 2021-01-24 06:31

I am using TokensRegex for rule-based entity extraction. It works well, but I am having trouble getting my output in the desired format. The following snippet of code gives me an

3 Answers
  •  面向向阳花
    2021-01-24 07:01

    I managed to get the output in the desired format.

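    import java.util.List;
    import java.util.regex.Pattern;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor;
    import edu.stanford.nlp.ling.tokensregex.Env;
    import edu.stanford.nlp.ling.tokensregex.MatchedExpression;
    import edu.stanford.nlp.ling.tokensregex.NodePattern;
    import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.util.CoreMap;

    // 'text' (the input string) and 'pipeline' (a StanfordCoreNLP instance that
    // runs at least tokenize and ssplit) are assumed to be defined elsewhere.
    Annotation document = new Annotation(text);

    // use the pipeline to annotate the document we created
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

    // Note: the environment settings are configured here in code, not in the rule file.
    Env env = TokenSequencePattern.getNewEnv();
    env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
    env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);

    CoreMapExpressionExtractor<MatchedExpression> extractor = CoreMapExpressionExtractor
          .createExtractorFromFiles(env, "test_degree.rules");

    for (CoreMap sentence : sentences) {
        List<MatchedExpression> matched = extractor.extractExpressions(sentence);
        for (MatchedExpression phrase : matched) {
            // Print out matched text and value
            System.out.println("MATCHED ENTITY: " + phrase.getText() + " VALUE: " + phrase.getValue().get());
        }
    }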
    

    Output:

    MATCHED ENTITY: Technical Skill VALUE: SKILL
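If the printed lines are not the format you need, it can help to separate "collect the matches" from "format them". The sketch below is a stdlib-only illustration under that assumption: `Match` is a hypothetical stand-in for what the loop reads from each `MatchedExpression` (`phrase.getText()` and `phrase.getValue().get()`), and `groupByValue` shows one possible alternative format (matched texts grouped by value). It is not part of the CoreNLP API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MatchFormatter {
    // Hypothetical stand-in for one TokensRegex match; in the real loop these
    // would come from phrase.getText() and phrase.getValue().get().
    static final class Match {
        final String text;
        final String value;

        Match(String text, String value) {
            this.text = text;
            this.value = value;
        }
    }

    // One "MATCHED ENTITY: ... VALUE: ..." line per match, same format as the answer.
    static List<String> toLines(List<Match> matches) {
        List<String> lines = new ArrayList<>();
        for (Match m : matches) {
            lines.add("MATCHED ENTITY: " + m.text + " VALUE: " + m.value);
        }
        return lines;
    }

    // Alternative format: matched texts grouped by value, in first-seen order.
    static Map<String, List<String>> groupByValue(List<Match> matches) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Match m : matches) {
            grouped.computeIfAbsent(m.value, k -> new ArrayList<>()).add(m.text);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Sample data mirroring the output shown in the answer, not real extractor output.
        List<Match> matches = List.of(new Match("Technical Skill", "SKILL"));
        toLines(matches).forEach(System.out::println);   // MATCHED ENTITY: Technical Skill VALUE: SKILL
        System.out.println(groupByValue(matches));       // {SKILL=[Technical Skill]}
    }
}
```

Keeping the formatting in its own method means the extraction loop only has to build the list of matches; switching output formats then does not touch the TokensRegex code.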

    You might want to have a look at my rule file in this question.
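The two `env.setDefault...Flags` calls are what let lowercase rule patterns match "Technical Skill". My understanding (an assumption, not checked against the CoreNLP source) is that `setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE)` is applied when regex patterns such as `/technical/` in the rule file are compiled. The plain `java.util.regex` sketch below only shows what that flag does to a single token; `tokenMatches` is a hypothetical helper, not a TokensRegex method.

```java
import java.util.regex.Pattern;

class CaseFlagDemo {
    // Compile a rule-style regex with or without Pattern.CASE_INSENSITIVE
    // and check whether it matches the whole token.
    static boolean tokenMatches(String regex, String token, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(regex, flags).matcher(token).matches();
    }

    public static void main(String[] args) {
        // A lowercase pattern like /technical/ against the token "Technical".
        System.out.println(tokenMatches("technical", "Technical", false)); // false
        System.out.println(tokenMatches("technical", "Technical", true));  // true
    }
}
```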

    Hope this helps!
