uima

Change text in reusable pipeline in DKPro

你离开我真会死。 submitted on 2019-12-06 15:18:38
Question: This question describes how to reuse a pipeline in DKPro, but if I create only one JCas and then try to change its text, I get the exception org.apache.uima.cas.CASRuntimeException: Data for Sofa feature setLocalSofaData() has already been set. How do I get around this?

Answer 1: The sofa data in a CAS can be set only once. It cannot be modified after it has been set. To reuse a CAS, call the reset() method on it. This clears all annotations and allows you to set the sofa/text

How to create pipeline of java nlp and ruta scripts?

ぃ、小莉子 submitted on 2019-12-06 07:02:23
I'm working on a Maven project that dynamically executes some Ruta scripts to annotate tags and then processes the output in Java. Now I want to run NLP components (mostly DKPro) first, pass their output to the Ruta scripts as a pipeline, and process it further. How can I achieve this? Edited: below is my new script:

AnalysisEngineDescription pipeline = createEngineDescription(
    createEngineDescription(OpenNlpSegmenter.class),
    createEngineDescription(OpenNlpPosTagger.class),
    AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
        RutaEngine.PARAM_MAIN_SCRIPT, "com.textjuicer.ruta.date.Author_updated

UIMA with Spark

心已入冬 submitted on 2019-12-06 02:33:01
As mentioned here, there is some overlap between UIMA and Spark in their distribution infrastructure. I was planning to use UIMA with Spark (I am now moving to uimaFIT). Can anyone tell me what problems we really face when developing UIMA applications with Spark, and what issues we might encounter? (Sorry, I haven't done any research on this.) The main problem is accessing objects, because UIMA tries to re-instantiate objects when running its analysis engines. If the objects hold local references, accessing them from a remote Spark cluster will be a problem. Some RDD functions might not work

Uima Ruta Out of Memory issue in spark context

无人久伴 submitted on 2019-12-05 21:43:25
Question: I'm running a UIMA application on Apache Spark. Millions of pages arrive in batches to be processed by UIMA RUTA for calculation, but sometimes I get an out-of-memory exception. The failure is inconsistent: sometimes it successfully processes 2000 pages, and sometimes it fails on 500 pages.

Application log:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.uima.internal.util.IntArrayUtils.expand_size(IntArrayUtils.java:57)
    at org.apache.uima.internal.util.IntArrayUtils.ensure

Html Annotator, Html Converter in UIMA Ruta

僤鯓⒐⒋嵵緔 submitted on 2019-12-05 21:31:20
Can anyone briefly explain the Html annotator, the Html converter and TEIViewWriter, with some examples? I want to create annotations in the initial view. Awaiting an answer.

Main script:

PACKAGE uima.ruta.example;
SCRIPT uima.ruta.example.Html;
Document{-> EXEC(Html)};
WORDLIST JOURNALNAMELIST='JournalName.txt';
WORDLIST CITYPUBLIST='CITYPUB.txt';
DECLARE JOURNALNAME;
DECLARE CITYPUB;
Document{ -> MARKFAST(JOURNALNAME, JOURNALNAMELIST)};
Document{ -> MARKFAST(CITYPUB, CITYPUBLIST)};
DECLARE Reference;
"<a name=para(.+?)>(.+?)</a>"-> 2=Reference;
DECLARE FirstToken, LastToken;
BLOCK(InRef

Wordlist - UIMA Ruta

大城市里の小女人 submitted on 2019-12-05 21:29:03
I used some city names and publisher names in a wordlist. As I understand it, a wordlist annotates all occurrences of any list item in a document. But I found a problem: the number of occurrences increased or decreased when I changed the order of the entries in the list. For example:

Script:

WORDLIST CITYPUBLIST='CITYPUB.txt';
DECLARE CITYPUB;
Document{ -> MARKFAST(CITYPUB, CITYPUBLIST)};
WORDLIST JournalNameLIST='JournalName.txt';
DECLARE JournalName;
Document{ -> MARKFAST(JournalName, JournalNameLIST)};

Wordlist (CITYPUB.txt): Arlington (VA): National Center for Education in Maternal

Setting feature value to the count of containing annotation in UIMA Ruta

天涯浪子 submitted on 2019-12-05 20:14:06
I've got a RUTA script where all the sentences have been annotated with a Sentence annotation, and various words and phrases have been annotated with their own specific annotations. That all works as expected. Each of those annotations has a feature for the index of the sentence that contains it. So, in a contrived example, given the text "Jack and Jill went up the hill. Jack fell down.", I have a "down" annotation whose sentence index I want to set to 2, indicating that it is in the second sentence. I'm thinking of something like the following, although I know it's not correct. Sentence
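The target of the question can be stated as a small plain-Java sketch. This is not Ruta and does not use the UIMA API; it only illustrates the intended result under the assumption that sentences and the "down" annotation are known as character offsets. The names SentenceIndexSketch, Span and sentenceIndexOf are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class SentenceIndexSketch {

    /** Hypothetical stand-in for an annotation: a half-open character span [begin, end). */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Returns the 1-based index of the first sentence that covers the target, or -1 if none does. */
    static int sentenceIndexOf(Span target, List<Span> sentences) {
        for (int i = 0; i < sentences.size(); i++) {
            Span sentence = sentences.get(i);
            if (sentence.begin <= target.begin && target.end <= sentence.end) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "Jack and Jill went up the hill. Jack fell down.";
        // Sentence offsets are written out by hand for this example text.
        List<Span> sentences = Arrays.asList(new Span(0, 31), new Span(32, 47));
        int begin = text.indexOf("down");
        Span down = new Span(begin, begin + "down".length());
        System.out.println(sentenceIndexOf(down, sentences)); // prints 2
    }
}
```

The value returned for "down" is what the question wants stored in the annotation's sentence-index feature.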

UIMA RUTA - how to do find & replace using regular expression and groups

半城伤御伤魂 submitted on 2019-12-05 19:26:39
RUTA newbie here. I'm processing a document using RUTA and have a lot of normalization to do before I can start annotating. I'm trying to find the best way to do a find and replace of a sequence of characters, using regular expressions and groups, on the original document in RUTA. In essence, I'm trying to do something similar to String.replaceAll in RUTA. For example, in Java: inputString = inputString.replaceAll( "(?i)7\\s*\\(SEVEN\\)", "7"); But I can't figure out a simple way to achieve this in RUTA. Thanks.

It's not simple in general because you cannot change the document text in
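For reference, the Java call quoted in the question can be tried in isolation with nothing but the standard library. The sketch below is plain Java, not Ruta, and does not touch a CAS. The first pattern is the one from the question; the second pattern, the class name and the method names are hypothetical additions that show how a capture group can be reused in the replacement.

```java
import java.util.regex.Pattern;

final class ReplaceAllSketch {

    // Pattern from the question: a literal 7, optional whitespace, then "(SEVEN)" in any letter case.
    private static final Pattern SEVEN = Pattern.compile("(?i)7\\s*\\(SEVEN\\)");

    // Hypothetical generalisation: capture any run of digits so the replacement can refer to it as $1.
    private static final Pattern NUMBER_WITH_WORD = Pattern.compile("(\\d+)\\s*\\([A-Za-z]+\\)");

    static String normalizeSeven(String input) {
        return SEVEN.matcher(input).replaceAll("7");
    }

    static String dropSpelledOutNumber(String input) {
        return NUMBER_WITH_WORD.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(normalizeSeven("Take 7 (seven) tablets"));       // prints "Take 7 tablets"
        System.out.println(dropSpelledOutNumber("Take 12 (TWELVE) drops")); // prints "Take 12 drops"
    }
}
```

Precompiling the patterns avoids recompiling the regular expression on every call, which is what String.replaceAll does internally.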

Change text in reusable pipeline in DKPro

拈花ヽ惹草 submitted on 2019-12-04 21:39:22
This question describes how to reuse a pipeline in DKPro, but if I create only one JCas and then try to change its text, I get the exception org.apache.uima.cas.CASRuntimeException: Data for Sofa feature setLocalSofaData() has already been set. How do I get around this?

The sofa data in a CAS can be set only once. It cannot be modified after it has been set. To reuse a CAS, call the reset() method on it. This clears all annotations and allows you to set the sofa/text again. To build a CAS incrementally, a common strategy is to add annotations to the CAS while adding text to a

How to reconfigure a UIMA Ruta analysis engine (change the parameter values) programmatically?

倾然丶 夕夏残阳落幕 submitted on 2019-12-04 16:09:13
This is a continuation of the question: How to run external Ruta scripts from a Maven project without placing the script or its type system in the classpath? Please guide me in reconfiguring the analysis engine programmatically (by changing the parameter values).

Situation: you have a correct XML descriptor of a UIMA Ruta analysis engine and you want to reconfigure it so that the paths point to the folder of the descriptor (see "java url to file"). The following code illustrates that by changing the parameter values at different stages. Only one stage is required. Which is the correct one for you depends on