ruta

UIMA RUTA - how to do find & replace using regular expression and groups

£可爱£侵袭症+ submitted on 2019-12-22 09:23:21
Question: RUTA newbie here. I'm processing a document using RUTA and have a lot of normalization to do before I can start annotating. I'm trying to find the best way to do a find and replace on sequences of characters in the original document, using regular expressions and groups, in RUTA. In essence, I want something similar to String.replaceAll in RUTA. For example, in Java, inputString = inputString.replaceAll( "(?i)7\\s*\\(SEVEN\\)", "7"); But I can't figure out a simple way to
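For reference, the Java side of that comparison can be sketched with a capture group, so that one pattern covers every number rather than only 7. This is plain java.util.regex, not RUTA; the class name, method name, and the generalized pattern are illustrative assumptions and do not come from the question.

```java
import java.util.regex.Pattern;

public class NormalizeExample {
    // Hypothetical generalization of the question's pattern: a run of digits,
    // optional whitespace, then a parenthesized word such as "(SEVEN)".
    // Group 1 captures the digits so they can be kept in the replacement.
    private static final Pattern NUMBER_WITH_WORD =
            Pattern.compile("(\\d+)\\s*\\([A-Z]+\\)", Pattern.CASE_INSENSITIVE);

    // Replaces every "<digits> (<word>)" occurrence with just the digits.
    static String normalize(String input) {
        return NUMBER_WITH_WORD.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Take 7 (SEVEN) tablets and 2(two) capsules."));
    }
}
```

If a normalization like this is applied to the text before it is set as the document text, any annotation offsets created later refer to the normalized string, not to the original input.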

How to reconfigure uima ruta analysis engine (change the parameter values) programmatically?

北城余情 submitted on 2019-12-14 00:25:23
Question: This is a continuation of the question: How to run external ruta scripts from a maven project without placing the script or its typesystem in the classpath? Please guide me on reconfiguring the analysis engine (by changing the parameter values) programmatically. Answer 1: Situation: you have a correct XML descriptor of a UIMA Ruta analysis engine and you want to reconfigure it so that the paths point to the folder of the descriptor. java url to file The following code illustrates that by changing the

UIMA Ruta: Editor could not be initialized

情到浓时终转凉″ submitted on 2019-12-13 02:22:25
Question: I am new to UIMA Ruta and I am currently trying to get a simple HelloWorld script to run. I followed the instructions here to set up my HelloWorld project. The first error that occurred was java.lang.NoClassDefFoundError: org/slf4j/event/Logger, which I resolved by converting my project to a Maven project and adding the slf4j-api 2.0.0-alpha1 and ruta-core 2.7.0 dependencies to pom.xml. Now my HelloWorld script generates an output file in the output folder. But when I try to open it with the

UIMA Ruta Creating annotation with features separated by some text

 ̄綄美尐妖づ submitted on 2019-12-07 22:29:01
Question: I have some text with annotations created like the following: wewf.werwfwef. wewfwefwwew. wefewefwff AnnotationA asdfawece aefae eafewfaefa aefafe ceaewfae adfcaecae acaeaet aegaegageg caeacdaefa AnnotationB sadaeceaee aef aewfaegg rresf ceeaefaeaeaf adfcaecae acaeaet aegaegageg caeacdaefa AnnotationA adfcaecae acaeaet aegaegageg caeacdaefa adfcaecae acaeaet aegaegageg caeacdaefa AnnotationB adfcaecae acaeaet aegaegageg caeacdaefa adfcaecae acaeaet aegaegageg caeacdaefa I want to create an

How to create pipeline of java nlp and ruta scripts?

半腔热情 submitted on 2019-12-07 20:03:11
Question: I'm working on a Maven project that dynamically executes some Ruta scripts to annotate some tags and processes the output in Java. Now I want to run NLP components (mostly DKPro) first, then pass their output to the Ruta scripts as a pipeline and process it further. How can I achieve this? Edited: Below is my new script: AnalysisEngineDescription pipeline = createEngineDescription(createEngineDescription(OpenNlpSegmenter.class), createEngineDescription(OpenNlpPosTagger.class), AnalysisEngineFactory

How to create pipeline of java nlp and ruta scripts?

ぃ、小莉子 submitted on 2019-12-06 07:02:23
I'm working on a Maven project that dynamically executes some Ruta scripts to annotate some tags and processes the output in Java. Now I want to run NLP components (mostly DKPro) first, then pass their output to the Ruta scripts as a pipeline and process it further. How can I achieve this? Edited: Below is my new script: AnalysisEngineDescription pipeline = createEngineDescription(createEngineDescription(OpenNlpSegmenter.class), createEngineDescription(OpenNlpPosTagger.class), AnalysisEngineFactory.createEngineDescription(RutaEngine.class, RutaEngine.PARAM_MAIN_SCRIPT, "com.textjuicer.ruta.date.Author_updated

Uima Ruta Out of Memory issue in spark context

无人久伴 submitted on 2019-12-05 21:43:25
Question: I'm running a UIMA application on Apache Spark. Millions of pages arrive in batches to be processed by UIMA RUTA for calculation. Sometimes I get an out-of-memory exception, and not consistently: one run successfully processes 2000 pages, while another fails at 500 pages. Application log: Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.uima.internal.util.IntArrayUtils.expand_size(IntArrayUtils.java:57) at org.apache.uima.internal.util.IntArrayUtils.ensure

Html Annotator, Html Converter in Uima Ruta

僤鯓⒐⒋嵵緔 submitted on 2019-12-05 21:31:20
Can anyone briefly explain the Html annotator, Html converter and TEIViewWriter with some examples? I want to create annotations in the initial view. Awaiting an answer. Main Script: PACKAGE uima.ruta.example; SCRIPT uima.ruta.example.Html; Document{-> EXEC(Html)}; WORDLIST JOURNALNAMELIST='JournalName.txt'; WORDLIST CITYPUBLIST='CITYPUB.txt'; DECLARE JOURNALNAME; DECLARE CITYPUB; Document{ -> MARKFAST(JOURNALNAME, JOURNALNAMELIST)}; Document{ -> MARKFAST(CITYPUB, CITYPUBLIST)}; DECLARE Reference; "<a name=para(.+?)>(.+?)</a>"-> 2=Reference; DECLARE FirstToken, LastToken; BLOCK(InRef

Wordlist -uima ruta

大城市里の小女人 submitted on 2019-12-05 21:29:03
I used some CITY names and PUBLISHER names in a Wordlist. As I understand it, a Wordlist annotates all occurrences of any list item in a document. But I found a problem: the number of occurrences increased or decreased when I changed the order of the entries in the list. For example: Script: WORDLIST CITYPUBLIST='CITYPUB.txt'; DECLARE CITYPUB; Document{ -> MARKFAST(CITYPUB, CITYPUBLIST)}; WORDLIST JournalNameLIST='JournalName.txt'; DECLARE JournalName; Document{ -> MARKFAST(JournalName, JournalNameLIST)}; Wordlist (CITYPUB.txt): Arlington (VA): National Center for Education in Maternal

Setting feature value to the count of containing annotation in UIMA Ruta

天涯浪子 submitted on 2019-12-05 20:14:06
I've got a RUTA script where all the sentences have been annotated with a Sentence annotation and various words and phrases have been annotated with their own specific annotations. That all works as expected. Each of those annotations has a feature for the index of the sentence that contains it. So, in a contrived example, given the text Jack and Jill went up the hill. Jack fell down. I have a "down" annotation whose sentence index I want to set to 2, indicating that it is in the second sentence. I'm thinking of something like the following, although I know it's not correct. Sentence
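To make the intended result concrete, here is a minimal sketch in plain Java (not RUTA, and not using the UIMA API) of the value the feature should end up holding: the 1-based position of the sentence whose span covers the annotation. The Span class and the sentenceIndexOf helper are hypothetical names introduced only for this illustration; they mirror the begin/end character offsets that UIMA annotations carry.

```java
import java.util.List;

public class SentenceIndexExample {
    // Hypothetical stand-in for an annotation: half-open character span [begin, end).
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Returns the 1-based index of the first sentence covering the target span,
    // or -1 if no sentence covers it. Sentences are assumed to be in document order.
    static int sentenceIndexOf(List<Span> sentences, Span target) {
        for (int i = 0; i < sentences.size(); i++) {
            Span s = sentences.get(i);
            if (s.begin <= target.begin && target.end <= s.end) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "Jack and Jill went up the hill. Jack fell down.";
        int split = text.indexOf(". ") + 1; // end offset of the first sentence
        List<Span> sentences = List.of(new Span(0, split), new Span(split + 1, text.length()));

        int downBegin = text.indexOf("down");
        Span down = new Span(downBegin, downBegin + "down".length());

        System.out.println(sentenceIndexOf(sentences, down)); // prints 2
    }
}
```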