ruta

Maximum size for a single Wordlist-UIMA RUTA

两盒软妹~` posted on 2021-02-19 02:54:12
Question: What is the maximum size for a wordlist in UIMA Ruta? I want to store a list of country, state and city names.
Answer 1: There is no maximum size for wordlists in UIMA Ruta. The lines of the file are normally transferred into a char-based in-memory tree structure (TRIE). This means that the size is restricted only by the available RAM, and its memory consumption is less than linear. My largest wordlist consisted of about 500k entries, as far as I remember. So a list of country names
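To illustrate why a char-based trie can grow more slowly than the raw word list, here is a minimal Java sketch. It is not UIMA Ruta's actual implementation; the class `WordTrie` and its methods are hypothetical names chosen for this example. Words that share a prefix reuse the same nodes, so only the differing suffixes add new ones.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal character trie. A sketch only, not UIMA Ruta's internal structure. */
class WordTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWordEnd; // true if a stored word ends at this node
    }

    private final Node root = new Node();
    private int nodeCount = 1; // counts the root node as well

    /** Adds a word, creating nodes only for characters not already on the path. */
    public void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
                nodeCount++;
            }
            node = next;
        }
        node.isWordEnd = true;
    }

    /** Returns true only for complete stored words, not for mere prefixes. */
    public boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.isWordEnd;
    }

    public int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        WordTrie trie = new WordTrie();
        trie.add("Australia");
        trie.add("Austria"); // shares the prefix "Austr" with "Australia"
        System.out.println(trie.contains("Austria"));
        System.out.println(trie.contains("Aust"));
        System.out.println(trie.nodeCount());
    }
}
```

The two words have 16 characters in total, but the shared prefix "Austr" is stored only once. A real implementation would likely use a more compact node layout than a `HashMap` per node.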

Why do I get “Editor could not be initialized” error while running UIMA Ruta scripts?

二次信任 posted on 2021-01-29 04:54:54
Question: I often get errors like this while running UIMA Ruta scripts. Why? What can I do to prevent it? Does it depend on my code, or is it related to the Eclipse IDE?
Error: Editor could not be initialized.
org.apache.uima.UIMARuntimeException
at org.apache.uima.util.CasIOUtils.load(CasIOUtils.java:368)
at org.apache.uima.util.CasIOUtils.load(CasIOUtils.java:312)
at org.apache.uima.util.CasIOUtils.load(CasIOUtils.java:193)
at org.apache.uima.util.CasIOUtils.load(CasIOUtils.java:218)
at org.apache

UIMA Ruta, uimaFIT and DKPro: Which versions work together?

时光毁灭记忆、已成空白 posted on 2020-01-25 09:19:05
Question: In the GSCL 2013 Ruta tutorial, the versions of the components in the pom.xml are: uimaj-core: 2.4.2, DKPro components: 1.5.0, ruta-core: 2.1.0. I raised the version numbers step by step and found that version 1.8.0 of the DKPro components introduces the following exception:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.uima.cas.text.AnnotationIndex.withSnapshotIterators()Lorg/apache/uima/cas/FSIndex;
at org.apache.uima.fit.util.FSCollectionFactory

XCASParsingException while trying to deserialize xmi into CAS object

喜欢而已 posted on 2020-01-05 04:07:09
Question: I run Ruta scripts from Java and serialize the resulting CAS object to an XMI file as follows:
FileOutputStream fileOutputStream = new FileOutputStream(outputXmiFile);
XmiCasSerializer.serialize(cas, fileOutputStream);
When I try to convert it back into a CAS object (on another server) as follows:
FileInputStream fileInputStream = new FileInputStream(xmiFile);
XmlCasDeserializer.deserialize(fileInputStream, cas);
I get the following exception: XCASParsingException: Error parsing

Are some extra settings in RUTA script needed to detect annotations with the same begin and end attributes?

半腔热情 posted on 2019-12-24 11:09:02
Question: I have an XMI output from the Tika UIMA Annotator that is passed to a UIMA Ruta script for further processing. I was able to import the corresponding type system and detect any MarkupAnnotations covering some fragment of text. However, the input has some MarkupAnnotations that have the same value for begin and end (so they do not cover any text). Those annotations are not recognized by the RUTA engine. For example, the following rule is not fired: MarkupAnnotation.name=="img" {->MARK

How to run external ruta scripts from a maven project without placing the script or its typesystem in the classpath?

狂风中的少年 posted on 2019-12-23 03:15:57
Question: Until now, I have been running Ruta scripts from a Maven project by creating an AnalysisEngine and a CAS and then running the engine. To do this, I placed all the scripts and descriptor files (Engine & TypeSystem) in the src/main/resources folder of the Maven project. Now I want to place the scripts and TypeSystem files in an external path and pass that path dynamically to the Java code that runs the scripts. Is this possible? If so, how? I simply placed the files (script & descriptor) in an

CPU usage too high while running Ruta Script

六眼飞鱼酱① posted on 2019-12-22 12:44:06
Question: CPU usage is too high while running a Ruta script, so I plan to use a GPU. Do I need to do anything additional to run the script on a GPU machine? Or is there an alternative way to reduce the CPU usage? Sample script:
PACKAGE uima.ruta.example;
ENGINE utils.PlainTextAnnotator;
TYPESYSTEM utils.PlainTextTypeSystem;
WORDLIST EditorMarkerList = 'EditorMarker.txt';
WORDLIST EnglishStopWordList = 'EnglishStopWords.txt';
WORDLIST FirstNameList = 'FirstNames.txt';
WORDLIST

Setting feature value to the count of containing annotation in UIMA Ruta

白昼怎懂夜的黑 posted on 2019-12-22 09:55:54
Question: I've got a RUTA script where all the sentences have been annotated with a Sentence annotation, and various words and phrases have been annotated with their own specific annotations. That all works as expected. Each of those annotations has a feature for the index of the sentence that contains it. So, in a contrived example, given the text "Jack and Jill went up the hill. Jack fell down.", I have a "down" annotation whose sentence index I want to set to 2, indicating that it is in the second

UIMA RUTA - how to do find & replace using regular expression and groups

独自空忆成欢 posted on 2019-12-22 09:24:24
Question: RUTA newbie here. I'm processing a document using RUTA and have a lot of normalization to do before I can start annotating. I'm trying to find the best way to do a find-and-replace on sequences of characters in the original document, using regular expressions and groups, in RUTA. In essence, I'm trying to see how to do something similar to String.replaceAll in RUTA. For example, in Java: inputString = inputString.replaceAll( "(?i)7\\s*\\(SEVEN\\)", "7"); But I can't figure out a simple way to
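For reference, the Java behaviour the question wants to reproduce can be sketched as follows. This is plain Java run outside of Ruta and does not show how to achieve the same inside a Ruta script. The class name `NormalizeExample`, the method names and the generalized pattern in `normalizeAnyNumber` are assumptions added for illustration; only the first pattern comes from the question.

```java
import java.util.regex.Pattern;

class NormalizeExample {
    // Pattern from the question: "7", optional whitespace, "(SEVEN)", case-insensitive.
    private static final Pattern SEVEN = Pattern.compile("(?i)7\\s*\\(SEVEN\\)");

    // Hypothetical generalization using a capturing group:
    // any digits followed by a word in parentheses; group 1 keeps the digits.
    private static final Pattern NUMBER_WITH_WORD =
            Pattern.compile("(\\d+)\\s*\\([A-Za-z]+\\)");

    /** Equivalent to inputString.replaceAll("(?i)7\\s*\\(SEVEN\\)", "7"). */
    static String normalizeSeven(String input) {
        return SEVEN.matcher(input).replaceAll("7");
    }

    /** Replaces e.g. "30 (THIRTY)" with "30" by referring back to group 1. */
    static String normalizeAnyNumber(String input) {
        return NUMBER_WITH_WORD.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(normalizeSeven("within 7 (seven) days"));
        System.out.println(normalizeAnyNumber("within 30 (THIRTY) days"));
    }
}
```

Precompiling the patterns avoids recompiling the regular expression on every call, which matters when many documents are normalized in a loop.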