How to create a pipeline of Java NLP components and Ruta scripts?

Submitted by 半腔热情 on 2019-12-07 20:03:11

Question


I'm working on a Maven project that dynamically executes some Ruta scripts to annotate some tags and then processes the output in Java.

Now I want to run NLP components (mostly DKPro) first, pass their output to the Ruta scripts as a pipeline, and then process the result further. How can I achieve this?


Edited:

Below is my new pipeline code:

    AnalysisEngineDescription pipeline = createEngineDescription(createEngineDescription(OpenNlpSegmenter.class),
            createEngineDescription(OpenNlpPosTagger.class),
            AnalysisEngineFactory.createEngineDescription(RutaEngine.class, RutaEngine.PARAM_MAIN_SCRIPT,
                    "com.textjuicer.ruta.date.Author_updated"),
            createEngineDescription(ConsoleWriter.class));

Error:

Not able to resolve type: Reference

    May 25, 2016 6:45:43 PM org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl processAndOutputNewCASes(273)
    SEVERE: Exception occurred
    org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
        at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:563)
        at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:378)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:298)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:568)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:410)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:343)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:170)
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:191)
        at com.textjuicer.ruta.date.ArtifactAnnotator.runNLP(ArtifactAnnotator.java:225)
        at com.textjuicer.ruta.date.ArtifactAnnotator.getAllAnnotations(ArtifactAnnotator.java:70)
        at com.textjuicer.ruta.date.ArtifactAnnotator.main(ArtifactAnnotator.java:38)
    Caused by: java.lang.IllegalArgumentException: Not able to resolve type: Reference
        at org.apache.uima.ruta.expression.type.SimpleTypeExpression.getType(SimpleTypeExpression.java:48)
        at org.apache.uima.ruta.rule.RegExpRule.getGroup2Types(RegExpRule.java:148)
        at org.apache.uima.ruta.rule.RegExpRule.apply(RegExpRule.java:80)
        at org.apache.uima.ruta.RutaScriptBlock.apply(RutaScriptBlock.java:63)
        at org.apache.uima.ruta.RutaModule.apply(RutaModule.java:48)
        at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:561)
        ... 17 more



Answer 1:


You can simply add the Ruta script as an analysis engine at the end of your DKPro pipeline. The exact code depends mainly on how you build and run your pipeline.

Adapted from the uimaFIT documentation:

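    import org.apache.uima.analysis_engine.AnalysisEngineDescription;
    import org.apache.uima.collection.CollectionReaderDescription;
    import org.apache.uima.fit.factory.AnalysisEngineFactory;
    import org.apache.uima.fit.factory.CollectionReaderFactory;
    import org.apache.uima.fit.pipeline.SimplePipeline;
    import org.apache.uima.ruta.engine.RutaEngine;
    // TextReader, Tokenizer and XmiWriter are placeholders: import your actual components

    // your collection reader
    CollectionReaderDescription reader =
      CollectionReaderFactory.createReaderDescription(
        TextReader.class,
        TextReader.PARAM_INPUT, "/home/uimafit/documents");

    // some DKPro Core component
    AnalysisEngineDescription dkpro =
      AnalysisEngineFactory.createEngineDescription(
        Tokenizer.class);

    // the Ruta script, named in package notation without the .ruta extension
    AnalysisEngineDescription ruta =
      AnalysisEngineFactory.createEngineDescription(
        RutaEngine.class,
        RutaEngine.PARAM_MAIN_SCRIPT, "Main");

    // some writer
    AnalysisEngineDescription writer =
      AnalysisEngineFactory.createEngineDescription(
        XmiWriter.class,
        XmiWriter.PARAM_OUTPUT, "/home/uimafit/output");

    SimplePipeline.runPipeline(reader, dkpro, ruta, writer);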

You can create an analysis engine for your Ruta script with the uimaFIT factories, either by specifying the mainScript parameter or by configuring the rules directly with PARAM_RULES. Alternatively, you can create the analysis engine from the XML descriptor of the Ruta script.
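The three ways of creating the Ruta analysis engine can be sketched with uimaFIT's `AnalysisEngineFactory`. This is a minimal sketch assuming uimaFIT 2.x and `ruta-core` are on the classpath; the class name `RutaEngineVariants` and its method names are hypothetical, and the descriptor path is whatever your build produces.

```java
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.ruta.engine.RutaEngine;

// Hypothetical helper class: three ways to build a Ruta engine description.
public class RutaEngineVariants {

    // 1) Refer to a script on the classpath by its package-notation name,
    //    without the ".ruta" extension, e.g. "com.textjuicer.ruta.date.Author_updated".
    public static AnalysisEngineDescription fromMainScript(String scriptName)
            throws ResourceInitializationException {
        return AnalysisEngineFactory.createEngineDescription(
                RutaEngine.class,
                RutaEngine.PARAM_MAIN_SCRIPT, scriptName);
    }

    // 2) Pass the rules directly as a string; no script file is needed.
    public static AnalysisEngineDescription fromRules(String rules)
            throws ResourceInitializationException {
        return AnalysisEngineFactory.createEngineDescription(
                RutaEngine.class,
                RutaEngine.PARAM_RULES, rules);
    }

    // 3) Load the XML engine descriptor generated for the script
    //    (the Ruta tooling often names it <ScriptName>Engine.xml).
    public static AnalysisEngineDescription fromDescriptor(String descriptorPath)
            throws Exception {
        return AnalysisEngineFactory.createEngineDescriptionFromPath(descriptorPath);
    }
}
```

Variants 1 and 2 only record parameter values; the script is loaded when the engine is initialized, so types the script DECLAREs are not added to the description by these calls.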

If the Ruta script declares new types, you either have to create the analysis engine from the XML descriptor, or extend uimaFIT's types.txt file with the type system generated for the script (or include that type system in some other way).
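One way to "include the type system in some other way" is to load the script's generated type system descriptor explicitly, merge it with the types uimaFIT detects via `META-INF/org.apache.uima.fit/types.txt`, and attach the result to the Ruta engine. This is a sketch assuming uimaFIT 2.x; the class and method names are hypothetical, and the descriptor file (for example a generated `Author_updatedTypeSystem.xml` that declares `Reference`) is an assumption about your build output.

```java
import java.io.File;
import java.util.Arrays;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.TypeSystemDescriptionFactory;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.ruta.engine.RutaEngine;
import org.apache.uima.util.CasCreationUtils;

// Hypothetical helper: makes the types DECLAREd in a Ruta script known to uimaFIT.
public class RutaTypeSystemSketch {

    // Merges the auto-detected types (types.txt) with the script's generated type system.
    public static TypeSystemDescription withScriptTypes(String generatedTypeSystemFile)
            throws Exception {
        TypeSystemDescription detected =
                TypeSystemDescriptionFactory.createTypeSystemDescription();
        // The file path is turned into a file: URL and used as the import location.
        TypeSystemDescription scriptTypes =
                TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath(
                        new File(generatedTypeSystemFile).toURI().toString());
        // Merging resolves the import and combines both type systems.
        return CasCreationUtils.mergeTypeSystems(Arrays.asList(detected, scriptTypes));
    }

    // Creates the Ruta engine with the merged type system attached.
    public static AnalysisEngineDescription rutaEngine(String mainScript,
            TypeSystemDescription typeSystem) throws Exception {
        return AnalysisEngineFactory.createEngineDescription(
                RutaEngine.class, typeSystem,
                RutaEngine.PARAM_MAIN_SCRIPT, mainScript);
    }
}
```

The types.txt route avoids this code: each line of that file is a location pattern (e.g. a `classpath*:` pattern) pointing at type system descriptors uimaFIT should pick up automatically.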

If the Ruta script imports and calls other scripts, you need either to use the generated descriptor or to set the corresponding parameters correctly, e.g., additionalScripts. The same is true for imported analysis engines.
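For the uimaFIT route, the parameters for imported scripts can be passed alongside the main script. This is a minimal sketch assuming `RutaEngine.PARAM_ADDITIONAL_SCRIPTS` and `RutaEngine.PARAM_SCRIPT_PATHS` from ruta-core; the class name, method name and argument values are hypothetical, and imported analysis engines need their own engine parameters, which are not shown here.

```java
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.ruta.engine.RutaEngine;

// Hypothetical helper: a main script that IMPORTs/CALLs other scripts.
public class RutaImportsSketch {

    public static AnalysisEngineDescription mainWithImports(String mainScript,
            String[] importedScripts, String[] scriptDirs)
            throws ResourceInitializationException {
        return AnalysisEngineFactory.createEngineDescription(
                RutaEngine.class,
                RutaEngine.PARAM_MAIN_SCRIPT, mainScript,
                // package-notation names of the scripts the main script imports
                RutaEngine.PARAM_ADDITIONAL_SCRIPTS, importedScripts,
                // folders in which Ruta looks for the .ruta files
                RutaEngine.PARAM_SCRIPT_PATHS, scriptDirs);
    }
}
```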

If you import the NLP/DKPro type system in your Ruta script, you can simply write rules that use the DKPro annotations.

(I am a developer of UIMA Ruta)



Source: https://stackoverflow.com/questions/37404738/how-to-create-pipeline-of-java-nlp-and-ruta-scripts
