TOKEN_TYPE - the type system token type (e.g. org.apache.uima.fit.examples.type.Token)
SENTENCE_TYPE - the type system sentence type (e.g. org.apache.uima.fit.examples.type.Sentence)
public class TokenBuilder<TOKEN_TYPE extends Annotation,SENTENCE_TYPE extends Annotation> extends Object
Builds token annotations in a JCas.
Constructor and Description |
---|
TokenBuilder(Class<TOKEN_TYPE> tokenClass, Class<SENTENCE_TYPE> sentenceClass): Calls TokenBuilder(Class, Class, String, String) with the last two arguments as null. |
TokenBuilder(Class<TOKEN_TYPE> tokenClass, Class<SENTENCE_TYPE> sentenceClass, String posFeatureName, String stemFeatureName): Instantiates a TokenBuilder with the type system information that the builder needs to build tokens. |
Modifier and Type | Method and Description |
---|---|
void | buildTokens(JCas jCas, String text): Builds white-space delimited tokens from the input text. |
void | buildTokens(JCas jCas, String text, String tokensString) |
void | buildTokens(JCas jCas, String text, String tokensString, String posTagsString) |
void | buildTokens(JCas aJCas, String aText, String aTokensString, String aPosTagsString, String aStemsString): Builds tokens for the given text, tokens, part-of-speech tags, and word stems. |
static <T extends Annotation,S extends Annotation> TokenBuilder<T,S> | create(Class<T> tokenClass, Class<S> sentenceClass): Instantiates a TokenBuilder with the type system information that the builder needs to build tokens. |
void | setPosFeatureName(String posFeatureName): Sets the feature name for the part-of-speech tag for your token type. |
void | setStemFeatureName(String stemFeatureName): Sets the feature name for the stem for your token type. |
public TokenBuilder(Class<TOKEN_TYPE> tokenClass, Class<SENTENCE_TYPE> sentenceClass)
Calls TokenBuilder(Class, Class, String, String) with the last two arguments as null.
public TokenBuilder(Class<TOKEN_TYPE> tokenClass, Class<SENTENCE_TYPE> sentenceClass, String posFeatureName, String stemFeatureName)
Instantiates a TokenBuilder with the type system information that the builder needs to build tokens.
tokenClass - the class of your token type from your type system (e.g. org.apache.uima.fit.type.Token.class)
sentenceClass - the class of your sentence type from your type system (e.g. org.apache.uima.fit.type.Sentence.class)
posFeatureName - the feature name for the part-of-speech tag for your token type. This assumes that the token type has a single string feature in which to store the part-of-speech tag. May be null.
stemFeatureName - the feature name for the stem for your token type. This assumes that the token type has a single string feature in which to store the stem. May be null.
public static <T extends Annotation,S extends Annotation> TokenBuilder<T,S> create(Class<T> tokenClass, Class<S> sentenceClass)
T - the type system token type (e.g. org.apache.uima.fit.examples.type.Token)
S - the type system sentence type (e.g. org.apache.uima.fit.examples.type.Sentence)
tokenClass - the class of your token type from your type system (e.g. org.apache.uima.fit.type.Token)
sentenceClass - the class of your sentence type from your type system (e.g. org.apache.uima.fit.type.Sentence)
public void setPosFeatureName(String posFeatureName)
posFeatureName - the part-of-speech feature name.
public void setStemFeatureName(String stemFeatureName)
stemFeatureName - the stem feature name.
public void buildTokens(JCas jCas, String text) throws UIMAException
jCas - the JCas to add the tokens to
text - the JCas will have its document text set to this.
Throws: UIMAException
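As a rough illustration of what "white-space delimited" means for the method above, the following stand-alone sketch computes begin/end offsets for each run of non-white-space characters. It is not uimaFIT's implementation and does not touch a JCas; the class WhitespaceTokenSpans and its spans method are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of uimaFIT: finds white-space delimited tokens. */
class WhitespaceTokenSpans {

    /** Returns one {begin, end} pair (end exclusive) per run of non-white-space characters. */
    static List<int[]> spans(String text) {
        List<int[]> result = new ArrayList<>();
        int begin = -1; // -1 means "not currently inside a token"
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (begin >= 0) {
                    result.add(new int[] {begin, i}); // token ended just before i
                    begin = -1;
                }
            } else if (begin < 0) {
                begin = i; // first character of a new token
            }
        }
        if (begin >= 0) {
            result.add(new int[] {begin, text.length()}); // token runs to end of text
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "She ran.";
        for (int[] span : spans(text)) {
            System.out.println(span[0] + "-" + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```

For the text "She ran." this yields the spans [0,3) and [4,8), so the period stays attached to "ran." unless a separate tokens string is supplied.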
public void buildTokens(JCas jCas, String text, String tokensString) throws UIMAException
Throws: UIMAException
public void buildTokens(JCas jCas, String text, String tokensString, String posTagsString) throws UIMAException
Throws: UIMAException
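The overload that takes a posTagsString expects one part-of-speech tag per token. The stand-alone sketch below pairs tokens with tags and rejects a count mismatch. The PosTagPairing class, the mismatch check, and the sample tags are assumptions made for illustration, not documented behaviour of TokenBuilder.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of uimaFIT: pairs each token with its tag. */
class PosTagPairing {

    /** Returns {token, tag} pairs; assumes both strings are white-space delimited. */
    static List<String[]> pair(String tokensString, String posTagsString) {
        String[] tokens = tokensString.trim().split("\\s+");
        String[] tags = posTagsString.trim().split("\\s+");
        if (tokens.length != tags.length) {
            // Assumed policy for this sketch: one tag per token, otherwise fail fast.
            throw new IllegalArgumentException(
                    tokens.length + " tokens but " + tags.length + " tags");
        }
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            pairs.add(new String[] {tokens[i], tags[i]});
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample tags only; any tag set with one tag per token would do.
        for (String[] p : pair("She ran .", "PRP VBD .")) {
            System.out.println(p[0] + "/" + p[1]);
        }
    }
}
```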
public void buildTokens(JCas aJCas, String aText, String aTokensString, String aPosTagsString, String aStemsString) throws UIMAException
Builds tokens for the given text, tokens, part-of-speech tags, and word stems.
aJCas - the JCas to add the Token annotations to
aText - the text of the JCas is set to this value. It is therefore generally a good idea to call JCas.reset() before calling this method when passing in the default view.
aTokensString - must contain the same non-white-space characters as the text. White space in the tokensString marks the token boundaries, i.e. the only difference between the text and the tokensString is that the latter may have more white-space characters. For example, if the text is "She ran." then the tokensString might be "She ran ."
aPosTagsString - a space-delimited string of part-of-speech tags, one for each token
aStemsString - a space-delimited string of stems, one for each token
Throws: UIMAException
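The relationship between the text and the tokens string can be sketched as follows: each white-space delimited token is located in the text in order, skipping any white space in the text between tokens. This is a minimal stand-alone sketch under that reading of the parameter description, not uimaFIT's implementation; TokenAlignment and align are hypothetical names, and throwing IllegalArgumentException on a mismatch is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of uimaFIT: maps tokens onto offsets in the text. */
class TokenAlignment {

    /** Returns one {begin, end} pair (end exclusive) in text for each token in tokensString. */
    static List<int[]> align(String text, String tokensString) {
        List<int[]> spans = new ArrayList<>();
        int pos = 0; // current offset into text
        for (String token : tokensString.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // tokensString was blank
            }
            pos = skipWhitespace(text, pos);
            if (!text.startsWith(token, pos)) {
                throw new IllegalArgumentException(
                        "token '" + token + "' does not match text at offset " + pos);
            }
            spans.add(new int[] {pos, pos + token.length()});
            pos += token.length();
        }
        // Any text left over must be white space, otherwise the two strings differ.
        if (skipWhitespace(text, pos) != text.length()) {
            throw new IllegalArgumentException("text has characters not covered by the tokens");
        }
        return spans;
    }

    private static int skipWhitespace(String text, int pos) {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        String text = "She ran.";
        for (int[] span : align(text, "She ran .")) {
            System.out.println(span[0] + "-" + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```

With the text "She ran." and the tokens string "She ran ." the sketch produces the spans [0,3), [4,7) and [7,8), so the period becomes its own token even though the text has no space before it.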
Copyright © 2012–2013 The Apache Software Foundation. All rights reserved.