The ADD action adds all the elements of the passed RutaExpressions to a given list. For example, these expressions could be a string, an integer variable or a list. For a complete overview of UIMA Ruta expressions see Section 2.6, “Expressions”.
ADD(ListVariable,(RutaExpression)+)
Document{->ADD(list, var)};
In this example, the variable 'var' is added to the list 'list'.
The ADDFILTERTYPE action adds its arguments to the list of filtered types, which restrict the visibility of the rules.
ADDFILTERTYPE(TypeExpression(,TypeExpression)*)
Document{->ADDFILTERTYPE(CW)};
After applying this rule, capitalized words are invisible in addition to the previously filtered types.
The ADDRETAINTYPE action adds its arguments to the list of retained types, which extend the visibility of the rules.
ADDRETAINTYPE(TypeExpression(,TypeExpression)*)
Document{->ADDRETAINTYPE(MARKUP)};
After applying this rule, markup is visible in addition to the previously retained types.
The ASSIGN action assigns the value of the passed expression to a variable of the same type.
ASSIGN(BooleanVariable,BooleanExpression)
ASSIGN(NumberVariable,NumberExpression)
ASSIGN(StringVariable,StringExpression)
ASSIGN(TypeVariable,TypeExpression)
Document{->ASSIGN(amount, (amount/2))};
In this example, the value of the variable 'amount' is divided in half.
The CALL action initiates the execution of a different script file or script block. Currently, only complete script files are supported.
CALL(DifferentFile)
CALL(Block)
Document{->CALL(NamedEntities)};
Here, a script 'NamedEntities' for named entity recognition is executed.
The CLEAR action removes all elements of the given list. If the list was initialized as it was declared, then it is reset to its initial value.
CLEAR(ListVariable)
Document{->CLEAR(SomeList)};
This rule clears the list 'SomeList'.
The COLOR action sets the color of an annotation type in the modified view, if the rule has fired. The background color is passed as the second parameter. The font color can be changed by passing a further color as a third parameter. An optional fourth boolean parameter specifies whether the annotations of this type are selected when the modified view is opened. The supported colors are: black, silver, gray, white, maroon, red, purple, fuchsia, green, lime, olive, yellow, navy, blue, aqua, lightblue, lightgreen, orange, pink, salmon, cyan, violet, tan, brown and mediumpurple.
COLOR(TypeExpression,StringExpression(, StringExpression (, BooleanExpression)?)?)
Document{->COLOR(Headline, "red", "green", true)};
This rule colors all Headline annotations in the modified view. Thereby, the background color is set to red, font color is set to green and all 'Headline' annotations are selected when opening the modified view.
The CONFIGURE action can be used to configure the analysis engine of the given namespace (first parameter). The parameters that should be configured with corresponding values are passed as name-value pairs.
CONFIGURE(AnalysisEngine(,StringExpression = Expression)+)
ENGINE utils.HtmlAnnotator;
Document{->CONFIGURE(HtmlAnnotator, "onlyContent" = false)};
This rule changes the value of the configuration parameter “onlyContent” to false and reconfigures the analysis engine.
The CREATE action is similar to the MARK action. It also annotates the matched text fragments with a type annotation, but additionally assigns values to a chosen subset of the type's feature elements.
CREATE(TypeExpression(,NumberExpression)* (,StringExpression = Expression)+)
Paragraph{COUNT(ANY,0,10000,cnt)->CREATE(Headline,"size" = cnt)};
This rule counts the number of tokens of type ANY in a Paragraph annotation and assigns the counted value to the int variable 'cnt'. If the counted number is between 0 and 10000, a Headline annotation is created for this Paragraph. Moreover, the feature named 'size' of Headline is set to the value of 'cnt'.
The DEL action deletes the matched text fragments in the modified view.
DEL
Name{->DEL};
This rule deletes all text fragments that are annotated with a Name annotation.
The DYNAMICANCHORING action turns dynamic anchoring on or off (first parameter) and assigns the anchoring parameters penalty (second parameter) and factor (third parameter).
DYNAMICANCHORING(BooleanExpression (,NumberExpression(,NumberExpression)?)?)
Document{->DYNAMICANCHORING(true)};
This example activates dynamic anchoring.
The EXEC action initiates the execution of a different script file or analysis engine on the complete input document, independent of the matched text and the current filtering settings. If the imported component (DifferentFile) refers to another script file, it is applied on a new representation of the document: the complete text of the original CAS with the default filtering settings of the UIMA Ruta analysis engine. If it refers to an external analysis engine, then it is applied on the complete document. The optional first argument is a string expression, which specifies the view the component should be applied on. The optional last argument is a list of types, which should be reindexed by Ruta (not UIMA itself).
Annotations created by the external analysis engine are not accessible to UIMA Ruta rules in the same script by default. The types of these annotations need to be provided in the type list argument in order to be visible to the Ruta rules.
EXEC((StringExpression,)? DifferentFile(, TypeListExpression)?)
ENGINE NamedEntities;
Document{->EXEC(NamedEntities, {Person, Location})};
Here, an analysis engine for named entity recognition is executed once on the complete document and the annotations of the types Person and Location (and all subtypes) are reindexed in UIMA Ruta. Without this list of types, the annotations are added to the CAS, but cannot be accessed by Ruta rules.
The FILL action fills a chosen subset of the given type's feature elements.
FILL(TypeExpression(,StringExpression = Expression)+)
Headline{COUNT(ANY,0,10000,tokenCount) ->FILL(Headline,"size" = tokenCount)};
Here, the number of tokens within a Headline annotation is counted and stored in variable 'tokenCount'. If the number of tokens is within the interval [0;10000], the FILL action fills the Headline's feature 'size' with the value of 'tokenCount'.
The FILTERTYPE action filters the given types of annotations. They are now ignored by rules. Expressions are not yet supported. This action is related to RETAINTYPE (see Section 2.8.35, “RETAINTYPE”).
The visibility of types is calculated using three lists: A list “default” for the initially filtered types, which is specified in the configuration parameters of the analysis engine, the list “filtered”, which is specified by the FILTERTYPE action, and the list “retained”, which is specified by the RETAINTYPE action. For determining the actual visibility of types, list “filtered” is added to list “default” and then all elements of list “retained” are removed. The annotations of the types in the resulting list are not visible. Please note that the actions FILTERTYPE and RETAINTYPE replace all elements of the respective lists and that RETAINTYPE overrides FILTERTYPE.
FILTERTYPE((TypeExpression(,TypeExpression)*))?
Document{->FILTERTYPE(SW)};
This rule filters all lowercase words (type SW) in the input document. From then on, they are ignored by every rule.
Document{->FILTERTYPE};
Here, the action (without parentheses) specifies that no additional types should be filtered.
The GATHER action creates a complex structure: an annotation with features. The optionally passed indexes (NumberExpressions after the TypeExpression) can be used to create an annotation that spans the matched information of several rule elements. The features are collected using the indexes of the rule elements of the complete rule.
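The visibility calculation described above can be sketched in a few lines of Python. This is only a model of the three-list rule, not Ruta's implementation; the function name `invisible_types` and the example default list are assumptions made for illustration.

```python
def invisible_types(default, filtered, retained):
    """Types hidden from rules: (default + filtered) minus retained."""
    return (set(default) | set(filtered)) - set(retained)


# Hypothetical default list taken from the engine configuration.
default = {"SPACE", "BREAK", "MARKUP"}

# FILTERTYPE(SW) replaces the "filtered" list with {SW}.
print(sorted(invisible_types(default, {"SW"}, set())))
# ['BREAK', 'MARKUP', 'SPACE', 'SW']

# RETAINTYPE(SPACE) replaces the "retained" list; it wins even if
# SPACE is also in the "filtered" list.
print(sorted(invisible_types(default, {"SW", "SPACE"}, {"SPACE"})))
# ['BREAK', 'MARKUP', 'SW']
```

Because retained types are removed last, a type listed in both "filtered" and "retained" stays visible.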
GATHER(TypeExpression(,NumberExpression)* (,StringExpression = NumberExpression)+)
DECLARE Annotation A;
DECLARE Annotation B;
DECLARE Annotation C(Annotation a, Annotation b);
W{REGEXP("A")->MARK(A)};
W{REGEXP("B")->MARK(B)};
A B{-> GATHER(C, 1, 2, "a" = 1, "b" = 2)};
Two annotations A and B are declared and annotated. The last rule creates an annotation C spanning the elements A (index 1 since it is the first rule element) and B (index 2) with its features 'a' set to annotation A (again index 1) and 'b' set to annotation B (again index 2).
The GET action retrieves an element of the given list dependent on a given strategy.
Table 2.3. Currently supported strategies
Strategy | Functionality |
---|---|
dominant | finds the most occurring element |
GET(ListExpression, Variable, StringExpression)
Document{->GET(list, var, "dominant")};
In this example, the element of the list 'list' that occurs most is stored in the variable 'var'.
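As an illustration, the "dominant" strategy could be modeled as follows. This is a sketch, not Ruta's code: the helper name `get_dominant` is hypothetical, and the tie-breaking rule (the element seen first wins) and the result for an empty list are assumptions, since the text does not specify them.

```python
from collections import Counter


def get_dominant(items):
    """Return the most frequently occurring element, or None for an empty list."""
    if not items:
        return None
    # most_common(1) yields a list with one (element, count) pair;
    # among equally frequent elements the first one encountered is returned.
    return Counter(items).most_common(1)[0][0]


print(get_dominant(["en", "de", "en", "fr"]))  # en
```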
The GETFEATURE action stores the value of the matched annotation's feature (first parameter) in the given variable (second parameter).
GETFEATURE(StringExpression, Variable)
Document{->GETFEATURE("language", stringVar)};
In this example, variable 'stringVar' will contain the value of the feature 'language'.
The GETLIST action retrieves a list of types dependent on a given strategy.
Table 2.4. Currently supported strategies
Strategy | Functionality |
---|---|
Types | get all types within the matched annotation |
Types:End | get all types that end at the same offset as the matched annotation |
Types:Begin | get all types that start at the same offset as the matched annotation |
GETLIST(ListVariable, StringExpression)
Document{->GETLIST(list, "Types")};
Here, a list of all types within the document is created and assigned to list variable 'list'.
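The three strategies can be pictured with a small Python model. Annotations are represented as hypothetical `(type, begin, end)` tuples, and reading "within" as "fully covered by the matched span" is an assumption; the sketch does not reproduce how Ruta queries the CAS.

```python
def get_types(annotations, matched, strategy="Types"):
    """Collect type names relative to the matched (begin, end) span."""
    begin, end = matched
    result = []
    for type_name, a_begin, a_end in annotations:
        if strategy == "Types":
            keep = begin <= a_begin and a_end <= end
        elif strategy == "Types:Begin":
            keep = a_begin == begin
        elif strategy == "Types:End":
            keep = a_end == end
        else:
            raise ValueError(f"unsupported strategy: {strategy}")
        # Each type is listed once, in order of first appearance.
        if keep and type_name not in result:
            result.append(type_name)
    return result


annotations = [
    ("Headline", 0, 20),
    ("CW", 0, 5),
    ("SW", 6, 10),
    ("PERIOD", 19, 20),
    ("Paragraph", 0, 50),
]
print(get_types(annotations, (0, 20), "Types"))        # ['Headline', 'CW', 'SW', 'PERIOD']
print(get_types(annotations, (0, 20), "Types:Begin"))  # ['Headline', 'CW', 'Paragraph']
print(get_types(annotations, (0, 20), "Types:End"))    # ['Headline', 'PERIOD']
```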
The GREEDYANCHORING action turns greedy anchoring on or off. If the first parameter is set to true, then start positions already matched by the same rule element will be ignored. This situation occurs mostly for rules that start with a quantifier. The second optional parameter activates greedy anchoring for the complete rule. Later rule matches are only possible after previous matches.
GREEDYANCHORING(BooleanExpression(,BooleanExpression)?)
Document{->GREEDYANCHORING(true, true)};
ANY+;
CW CW;
This example activates greedy anchoring. The second rule will then match only once since the next positions, e.g., the second token, are already covered by the first attempt. The third rule will not match on capitalized words that have already been considered by previous matches of the rule.
The LOG action writes a log message.
LOG(StringExpression)
Document{->LOG("processed")};
This rule writes a log message with the string "processed".
The MARK action is the most important action in the UIMA Ruta system. It creates a new annotation of the given type. The optionally passed indexes (NumberExpressions after the TypeExpression) can be used to create an annotation that spans the matched information of several rule elements.
MARK(TypeExpression(,NumberExpression)*)
Freeline Paragraph{->MARK(ParagraphAfterFreeline,1,2)};
This rule matches on a free line followed by a Paragraph annotation and annotates both in a single ParagraphAfterFreeline annotation. The two numerical expressions at the end of the mark action state that the matched text of the first and the second rule elements are joined to create the boundaries of the new annotation.
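How the indexes determine the boundaries of the new annotation can be sketched like this. The function `joined_span` and the offsets are hypothetical; the sketch only shows the arithmetic of joining rule element matches, assuming each match is a `(begin, end)` pair.

```python
def joined_span(element_spans, indexes):
    """Join the matches of the rule elements named by 1-based indexes."""
    chosen = [element_spans[i - 1] for i in indexes]
    begin = min(b for b, _ in chosen)
    end = max(e for _, e in chosen)
    return (begin, end)


# Freeline matched at offsets 10..11, Paragraph at 11..80 (made-up values).
matches = [(10, 11), (11, 80)]
print(joined_span(matches, [1, 2]))  # (10, 80)
print(joined_span(matches, [2]))     # (11, 80)
```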
The MARKFAST action creates annotations of the given type (first parameter), if an element of the passed list (second parameter) occurs within the window of the matched annotation. Thereby, the created annotation does not cover the whole matched annotation. Instead, it only covers the text of the found occurrence. The third parameter is optional. It defines whether the MARKFAST action should ignore the case; its default value is false. The optional fourth parameter specifies a character threshold for ignoring the case. It is only relevant if the ignore-case value is set to true. The last parameter is set to true by default and specifies whether whitespace in the entries of the dictionary should be ignored. For more information on lists see Section 2.5.3, “Resources”. In addition to external word lists, string list variables can be used.
MARKFAST(TypeExpression,ListExpression(,BooleanExpression (,NumberExpression,(BooleanExpression)?)?)?)
MARKFAST(TypeExpression,StringListExpression(,BooleanExpression (,NumberExpression,(BooleanExpression)?)?)?)
WORDLIST FirstNameList = 'FirstNames.txt';
DECLARE FirstName;
Document{-> MARKFAST(FirstName, FirstNameList, true, 2)};
This rule annotates all first names listed in the list 'FirstNameList' within the document and ignores the case if the length of the word is greater than 2.
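The interplay of the ignore-case flag and the length threshold can be sketched as a simple comparison. The function `entry_matches` is a hypothetical helper, and the sketch ignores the whitespace option and the actual dictionary data structure; it only models the rule that case is ignored for words longer than the threshold.

```python
def entry_matches(entry, text, ignore_case=False, ignore_length=0):
    """Compare a dictionary entry with a text fragment."""
    # Case is only ignored if the fragment is longer than the threshold.
    if ignore_case and len(text) > ignore_length:
        return entry.lower() == text.lower()
    return entry == text


print(entry_matches("Peter", "PETER", ignore_case=True, ignore_length=2))  # True
print(entry_matches("Al", "AL", ignore_case=True, ignore_length=2))        # False
print(entry_matches("Peter", "PETER"))                                     # False
```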
The MARKFIRST action annotates the first token (basic annotation) of the matched annotation with the given type.
MARKFIRST(TypeExpression)
Document{->MARKFIRST(First)};
This rule annotates the first token of the document with the annotation First.
The MARKLAST action annotates the last token of the matched annotation with the given type.
MARKLAST(TypeExpression)
Document{->MARKLAST(Last)};
This rule annotates the last token of the document with the annotation Last.
The MARKONCE action has the same functionality as the MARK action, but creates a new annotation only if no part of the matched annotation is already covered by an annotation of the given type.
MARKONCE(NumberExpression,TypeExpression(,NumberExpression)*)
Freeline Paragraph{->MARKONCE(ParagraphAfterFreeline,1,2)};
This rule matches on a free line followed by a Paragraph and annotates both in a single ParagraphAfterFreeline annotation, if no part is already annotated with a ParagraphAfterFreeline annotation. The two numerical expressions at the end of the MARKONCE action state that the matched text of the first and the second rule elements are joined to create the boundaries of the new annotation.
The MARKSCORE action is similar to the MARK action. It also creates a new annotation of the given type, but only if it does not yet exist. The optionally passed indexes (parameters after the TypeExpression) can be used to create an annotation that spans the matched information of several rule elements. Additionally, a score value (first parameter) is added to the heuristic score value of the annotation. For more information on heuristic scores see Section 2.12, “Heuristic extraction using scoring rules”.
MARKSCORE(NumberExpression,TypeExpression(,NumberExpression)*)
Freeline Paragraph{->MARKSCORE(10,ParagraphAfterFreeline,1,2)};
This rule matches on a free line followed by a paragraph and annotates both in a single ParagraphAfterFreeline annotation. The two number expressions at the end of the MARKSCORE action indicate that the matched text of the first and the second rule elements are joined to create the boundaries of the new annotation. Additionally, the score '10' is added to the heuristic score of this annotation.
The MARKTABLE action creates annotations of the given type (first parameter), if an element of the given column (second parameter) of a passed table (third parameter) occurs within the window of the matched annotation. Thereby, the created annotation does not cover the whole matched annotation. Instead, it only covers the text of the found occurrence. Optionally, the MARKTABLE action is able to assign entries of the given table to features of the created annotation. For more information on tables see Section 2.5.3, “Resources”. Additionally, several configuration parameters are possible. (See example.)
MARKTABLE(TypeExpression, NumberExpression, TableExpression (,BooleanExpression, NumberExpression, StringExpression, NumberExpression)? (,StringExpression = NumberExpression)+)
WORDTABLE TestTable = 'TestTable.csv';
DECLARE Annotation Struct(STRING first);
Document{-> MARKTABLE(Struct, 1, TestTable, true, 4, ".,-", 2, "first" = 2)};
In this example, the whole document is searched for all occurrences of the entries of the first column of the given table 'TestTable'. For each occurrence, an annotation of the type Struct is created and its feature 'first' is filled with the entry of the second column. Moreover, the case of the word is ignored if the length of the word exceeds 4. Additionally, the chars '.', ',' and '-' are ignored, but maximally two of them.
The MATCHEDTEXT action saves the text of the matched annotation in a passed String variable. The optionally passed indexes can be used to match the text of several rule elements.
MATCHEDTEXT(StringVariable(,NumberExpression)*)
Headline Paragraph{->MATCHEDTEXT(stringVariable,1,2)};
The text covered by the Headline (rule element 1) and the Paragraph (rule element 2) annotation is saved in variable 'stringVariable'.
The MERGE action merges a number of given lists. The first parameter defines, if the merge is done as intersection (false) or as union (true). The second parameter is the list variable that will contain the result.
MERGE(BooleanExpression, ListVariable, ListExpression, (ListExpression)+)
Document{->MERGE(false, listVar, list1, list2, list3)};
The elements that occur in all three lists will be placed in the list 'listVar'.
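The two merge modes can be modeled in Python as follows. This is a sketch under assumptions: the function name `merge` is hypothetical, and the text does not say whether the result keeps duplicates or in which order elements appear, so this version drops duplicates and keeps the order of first appearance.

```python
def merge(union, *lists):
    """Merge lists as union (True) or intersection (False)."""
    result = []
    if union:
        for current in lists:
            for item in current:
                if item not in result:
                    result.append(item)
        return result
    first, rest = lists[0], lists[1:]
    for item in first:
        # Keep only elements that occur in every other list as well.
        if item not in result and all(item in other for other in rest):
            result.append(item)
    return result


list1, list2, list3 = ["a", "b", "c"], ["b", "c", "d"], ["c", "b", "e"]
print(merge(False, list1, list2, list3))  # ['b', 'c']
print(merge(True, list1, list2, list3))   # ['a', 'b', 'c', 'd', 'e']
```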
The REMOVE action removes lists or single values from a given list.
REMOVE(ListVariable,(Argument)+)
Document{->REMOVE(list, var)};
In this example, the variable 'var' is removed from the list 'list'.
The REMOVEDUPLICATE action removes all duplicates within a given list.
REMOVEDUPLICATE(ListVariable)
Document{->REMOVEDUPLICATE(list)};
Here, all duplicates within the list 'list' are removed.
The REMOVEFILTERTYPE action removes its arguments from the list of filtered types, which restrict the visibility of the rules.
REMOVEFILTERTYPE(TypeExpression(,TypeExpression)*)
Document{->REMOVEFILTERTYPE(W)};
After applying this rule, words are possibly visible again depending on the current filtering settings.
The REMOVERETAINTYPE action removes its arguments from the list of retained types, which extend the visibility of the rules.
REMOVERETAINTYPE(TypeExpression(,TypeExpression)*)
Document{->REMOVERETAINTYPE(W)};
After applying this rule, words are possibly not visible anymore depending on the current filtering settings.
The REPLACE action replaces the text of all matched annotations with the given StringExpression. It remembers the modification for the matched annotations and shows them in the modified view (see Section 2.13, “Modification”).
REPLACE(StringExpression)
FirstName{->REPLACE("first name")};
This rule replaces all first names with the string 'first name'.
The RETAINTYPE action retains the given types. This means that they are now not ignored by rules. This action is related to FILTERTYPE (see Section 2.8.14, “FILTERTYPE”).
The visibility of types is calculated using three lists: A list “default” for the initially filtered types, which is specified in the configuration parameters of the analysis engine, the list “filtered”, which is specified by the FILTERTYPE action, and the list “retained”, which is specified by the RETAINTYPE action. For determining the actual visibility of types, list “filtered” is added to list “default” and then all elements of list “retained” are removed. The annotations of the types in the resulting list are not visible. Please note that the actions FILTERTYPE and RETAINTYPE replace all elements of the respective lists and that RETAINTYPE overrides FILTERTYPE.
RETAINTYPE((TypeExpression(,TypeExpression)*))?
Document{->RETAINTYPE(SPACE)};
Here, all spaces are retained and can be matched by rules.
Document{->RETAINTYPE};
Here, the action (without parentheses) specifies that no types should be retained.
The SETFEATURE action sets the value of a feature of the matched complex structure.
SETFEATURE(StringExpression,Expression)
Document{->SETFEATURE("language","en")};
Here, the feature 'language' of the input document is set to English.
The SHIFT action can be used to change the offsets of an annotation. The optional number expressions, which point to the rule elements of the rule, specify the new offsets of the annotation. The annotations that will be modified have to start or end at the match of the rule element of the action. This means that the action has to be placed at a matching condition, which will be used to specify the annotations to be changed.
SHIFT(TypeExpression(,NumberExpression)*)
Author{-> SHIFT(Author,1,2)} PM;
In this example, an annotation of the type “Author” is expanded in order to cover the following punctuation mark.
W{STARTSWITH(FS) -> SHIFT(FS, 1, 2)} W+ MARKUP;
In this example, an annotation of the type “FS” that consists mostly of words is shrunk by removing the last MARKUP annotation.
The SPLIT action splits the matched annotation at each occurrence of an annotation of the given type. There are three additional parameters. The second argument specifies whether complete annotations of the given type should be used to split the matched annotation. If it is set to false, then even the boundary of an annotation will cause splitting. The third (addToBegin) and fourth (addToEnd) arguments specify whether the complete splitting annotation will be added to the beginning or end of the split annotation. The latter two are only used if the second argument is set to true. If omitted, the second argument is true and the other two arguments are false by default.
SPLIT(TypeExpression(,BooleanExpression, (BooleanExpression, BooleanExpression)? )?)
Sentence{-> SPLIT(PERIOD, true, false, true)};
In this example, an annotation of the type “Sentence” is split at each occurrence of a period, which is added to the end of the new sentence.
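The effect of addToBegin and addToEnd on the resulting offsets can be illustrated with a small model. It covers only the case where complete annotations are used for splitting; the function `split_span`, the offsets, and the choice to drop empty parts are assumptions and do not describe Ruta's actual implementation.

```python
def split_span(span, splitters, add_to_begin=False, add_to_end=False):
    """Split a (begin, end) span at each splitter (begin, end) span."""
    begin, end = span
    parts = []
    current = begin
    for s_begin, s_end in sorted(splitters):
        # With add_to_end, the splitter closes the part before it.
        part_end = s_end if add_to_end else s_begin
        if part_end > current:
            parts.append((current, part_end))
        # With add_to_begin, the splitter opens the part after it.
        current = s_begin if add_to_begin else s_end
    if end > current:
        parts.append((current, end))
    return parts


# A Sentence covering offsets 0..20 with periods at 9..10 and 19..20.
periods = [(9, 10), (19, 20)]
print(split_span((0, 20), periods, add_to_end=True))  # [(0, 10), (10, 20)]
print(split_span((0, 20), periods))                   # [(0, 9), (10, 19)]
```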
The TRANSFER action creates a new feature structure and adds all compatible features of the matched annotation.
TRANSFER(TypeExpression)
Document{->TRANSFER(LanguageStorage)};
Here, a new feature structure “LanguageStorage” is created and the compatible features of the Document annotation are copied. E.g., if LanguageStorage defined a feature named 'language', then the feature value of the Document annotation is copied.
The TRIE action uses an external multi tree word list to annotate the matched annotation and provides several configuration parameters.
TRIE((String = (TypeExpression|{TypeExpression,StringExpression, Expression}))+,ListExpression,BooleanExpression,NumberExpression, BooleanExpression,NumberExpression,StringExpression)
Document{->TRIE("FirstNames.txt" = FirstName, "Companies.txt" = Company, 'Dictionary.mtwl', true, 4, false, 0, ".,-/")};
Here, the dictionary 'Dictionary.mtwl' that contains word lists for first names and companies is used to annotate the document. The words previously contained in the file 'FirstNames.txt' are annotated with the type FirstName and the words in the file 'Companies.txt' with the type Company. The case of the word is ignored, if the length of the word exceeds 4. The edit distance is deactivated. The cost of an edit operation can currently not be configured by an argument. The last argument additionally defines several chars that will be ignored.
Document{->TRIE("FirstNames.txt" = {A, "a", "first"}, "LastNames.txt" = {B, "b", true}, "CompleteNames.txt" = {C, "c", 6}, list1, true, 4, false, 0, ":")};
Here, the dictionary 'list1' is applied on the document. Matches originating in dictionary 'FirstNames.txt' result in annotations of type A whose feature 'a' is set to 'first'. The other two dictionaries create annotations of type 'B' and 'C' for the corresponding dictionaries with a boolean feature value and an integer feature value.
The TRIM action changes the offsets on the matched annotations by removing annotations, whose types are specified by the given parameters.
TRIM(TypeExpression ( , TypeExpression)*)
TRIM(TypeListExpression)
Keyword{-> TRIM(SPACE)};
This rule removes all spaces at the beginning and at the end of Keyword annotations and thus changes the offsets of the matched annotations.
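The offset change performed by trimming can be sketched as follows. Annotations are modeled as hypothetical `(type, begin, end)` tuples and the function name `trim` is an assumption; the loop simply moves the begin and end inward as long as an annotation of a listed type sits at the boundary.

```python
def trim(span, annotations, trim_types):
    """Shrink a (begin, end) span past boundary annotations of trim_types."""
    begin, end = span
    changed = True
    while changed and begin < end:
        changed = False
        for type_name, a_begin, a_end in annotations:
            # Skip other types and zero-length annotations.
            if type_name not in trim_types or a_begin >= a_end:
                continue
            if a_begin == begin and a_end <= end:
                begin, changed = a_end, True
            elif a_end == end and a_begin >= begin:
                end, changed = a_begin, True
    return (begin, end)


# A Keyword at 4..15 with two leading spaces and one trailing space.
annotations = [("SPACE", 4, 5), ("SPACE", 5, 6), ("SPACE", 14, 15), ("W", 6, 14)]
print(trim((4, 15), annotations, {"SPACE"}))  # (6, 14)
```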
The UNMARK action removes the annotation of the given type overlapping the matched annotation. There are two additional configurations: If additional indexes are given, then the span of the specified rule elements is applied, similar to the MARK action. If instead a boolean is given as an additional argument, then all annotations of the given type are removed that start at the matched position.
UNMARK(AnnotationExpression)
UNMARK(TypeExpression)
UNMARK(TypeExpression (,NumberExpression)*)
UNMARK(TypeExpression, BooleanExpression)
Headline{->UNMARK(Headline)};
Here, the Headline annotation is removed.
CW ANY+? QUESTION{->UNMARK(Headline,1,3)};
Here, all Headline annotations are removed that start with a capitalized word and end with a question mark.
CW{->UNMARK(Headline,true)};
Here, all Headline annotations are removed that start with a capitalized word.
Complex{->UNMARK(Complex.inner)};
Here, the annotation stored in the feature 'inner' will be removed.
The UNMARKALL action removes all the annotations of the given type and all of its descendants overlapping the matched annotation, unless the annotation is of at least one type in the passed list.
UNMARKALL(TypeExpression, TypeListExpression)
Annotation{->UNMARKALL(Annotation, {Headline})};
Here, all annotations except for headlines are removed.