I am using TokensRegex for rule based entity extraction. It works well but I am having trouble getting my output in the desired format. The following snippet of code gives me an
I managed to get output in desired format.
Annotation document = new Annotation();
//use the pipeline to annotate the document we created
pipeline.annotate(document);
List sentences = document.get(SentencesAnnotation.class);
//Note- I doesn't put environment related stuff in rule file.
Env env = TokenSequencePattern.getNewEnv();
env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor
.createExtractorFromFiles(env, "test_degree.rules");
for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
List matched = extractor.extractExpressions(sentence);
for(MatchedExpression phrase : matched){
// Print out matched text and value
System.out.println("MATCHED ENTITY: " + phrase.getText() + " VALUE: " + phrase.getValue().get());
}
}
Output:
MATCHED ENTITY: Technical Skill VALUE: SKILL
You might want to have a look at my rule file in this question.
Hope this helps!