Reusable version of DKPro Core pipeline

Posted by 让人想犯罪 __ on 2019-12-06 16:31:27

Create a single CAS:

JCas jcas = JCasFactory.createJCas();

Fill the CAS:

jcas.setDocumentText("This is a test");
jcas.setDocumentLanguage("en");

Create the pipeline once (and keep the engine around for further requests) using

AnalysisEngine engine = createEngine(
   createEngineDescription(...),
   createEngineDescription(...),
   ...);

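If you let the engine be created implicitly on every request (for example, by passing engine descriptions to runPipeline instead of an engine), it has to load its models and other resources over and over again.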
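The "create once, keep around" idea can be sketched without UIMA on the classpath. This is only an illustration under assumptions: `EngineHolder` is a hypothetical helper, not part of uimaFIT or DKPro Core, and a plain `String` stands in for the `AnalysisEngine`. The counter shows that the expensive factory runs once no matter how many requests ask for the engine.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical holder that builds an expensive object once and then reuses it. */
class EngineHolder<T> {
    private final Supplier<T> factory;
    private T instance;

    EngineHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Builds the instance on first use; later calls return the same object. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}

class EngineHolderDemo {
    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // The String "engine" is an assumption standing in for a real AnalysisEngine.
        EngineHolder<String> holder = new EngineHolder<>(() -> {
            loads.incrementAndGet(); // stands in for loading models
            return "engine";
        });
        for (int request = 0; request < 3; request++) {
            holder.get();
        }
        System.out.println("model loads: " + loads.get());
    }
}
```

In a real service the supplier would wrap the `createEngine(...)` call shown above, including handling of its checked exception.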

Apply the pipeline to the CAS:

SimplePipeline.runPipeline(jcas, engine);

To speed up processing further, create a pool of CASes and reuse them across requests, because creating a CAS from scratch takes a moment.
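A CAS pool of this kind can be sketched as a small bounded pool on top of a blocking queue. This is a minimal sketch under assumptions: `SimplePool` is a hypothetical helper, not part of UIMA or DKPro Core, and a `StringBuilder` stands in for the CAS so the example runs with the JDK alone. With UIMA, the factory would wrap `JCasFactory.createJCas()` and the reset step would call `jcas.reset()`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool; not part of UIMA or DKPro Core. */
class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Consumer<T> resetter;

    SimplePool(int size, Supplier<T> factory, Consumer<T> resetter) {
        this.idle = new ArrayBlockingQueue<>(size);
        this.resetter = resetter;
        // Pay the creation cost once, up front.
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Blocks until an instance is free. */
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a pooled instance", e);
        }
    }

    /** Clears the instance and makes it available again. */
    void release(T instance) {
        resetter.accept(instance);
        idle.add(instance);
    }

    /** Number of instances currently idle in the pool. */
    int available() {
        return idle.size();
    }
}

class SimplePoolDemo {
    public static void main(String[] args) {
        // StringBuilder stands in for a JCas; setLength(0) stands in for jcas.reset().
        SimplePool<StringBuilder> pool =
                new SimplePool<>(2, StringBuilder::new, sb -> sb.setLength(0));
        StringBuilder cas = pool.borrow();
        try {
            cas.append("This is a test");
            System.out.println("document length: " + cas.length());
        } finally {
            pool.release(cas); // always hand the instance back, even on failure
        }
        System.out.println("idle instances: " + pool.available());
    }
}
```

Resetting on release rather than on borrow means a pooled instance never holds a previous request's document while it sits idle.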

Some components may be thread-safe, others may not. This depends largely on the implementation of the underlying third-party library. In addition, the wrappers in DKPro Core are not explicitly built to be thread-safe. For example, in the default configuration, models are loaded and switched depending on the document language. If you use the same analysis engine instance from multiple threads, this can cause problems.

Here too, you should consider creating a pool of pre-instantiated pipelines. You will need quite a bit of memory though, because each instance loads its own models. There is some experimental functionality to share models between instances of the same component, but it has not been tested much. Keep in mind that third-party tools may also have implemented their models in a non-thread-safe manner. For model sharing in DKPro Core, see this discussion on the mailing list.
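The pipeline pool can be sketched as follows: several worker threads share a small set of engine instances, and each request takes exclusive ownership of one instance for the duration of the call. The names here are assumptions for illustration only. `FakeEngine` is a hypothetical stand-in for a non-thread-safe `AnalysisEngine`, and `EnginePoolSketch` is not part of uimaFIT or DKPro Core.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a non-thread-safe AnalysisEngine. */
class FakeEngine {
    private String current; // per-call state, unsafe to share between threads

    int process(String text) {
        current = text;
        return current.split("\\s+").length; // pretend analysis: count tokens
    }
}

class EnginePoolSketch {
    private final BlockingQueue<FakeEngine> engines;

    EnginePoolSketch(int size) {
        engines = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            engines.add(new FakeEngine()); // a real engine would load its models here
        }
    }

    /** Borrows one engine exclusively, runs it, and always returns it to the pool. */
    int handle(String text) {
        FakeEngine engine;
        try {
            engine = engines.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for an engine", e);
        }
        try {
            return engine.process(text);
        } finally {
            engines.add(engine);
        }
    }

    /** Processes all texts on a thread pool; results keep the input order. */
    static List<Integer> runAll(List<String> texts, int poolSize, int threads) {
        EnginePoolSketch pool = new EnginePoolSketch(poolSize);
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<Integer> task = () -> pool.handle(text);
                futures.add(executor.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Processing failed", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("This is a test", "Another request");
        // Four threads compete for two engines; no engine is ever used by two threads at once.
        System.out.println(runAll(texts, 2, 4));
    }
}
```

The pool size bounds memory use, since each pooled instance holds its own models, while the thread count can be larger because surplus threads simply wait for a free engine.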

Disclosure: I am one of the DKPro Core developers.
