org.apache.uima.tika
Class FileSystemCollectionReader
java.lang.Object
org.apache.uima.resource.Resource_ImplBase
org.apache.uima.resource.ConfigurableResource_ImplBase
org.apache.uima.collection.CollectionReader_ImplBase
org.apache.uima.tika.FileSystemCollectionReader
- All Implemented Interfaces:
- org.apache.uima.collection.base_cpm.BaseCollectionReader, org.apache.uima.collection.CollectionReader, org.apache.uima.resource.ConfigurableResource, org.apache.uima.resource.Resource
public class FileSystemCollectionReader
- extends org.apache.uima.collection.CollectionReader_ImplBase
A collection reader that reads documents from a directory in the
file system.
Unlike the collection reader in the UIMA examples, this one uses Apache Tika
to extract the text from binary documents and generates annotations that
represent the markup.
Field Summary
static String PARAM_INPUTDIR
- Name of configuration parameter that must be set to the path of a
directory containing input files.
static String PARAM_LANGUAGE
- Name of optional configuration parameter that contains the language of
the documents in the input directory.
Fields inherited from interface org.apache.uima.resource.Resource
PARAM_AGGREGATE_SOFA_MAPPINGS, PARAM_CONFIG_PARAM_SETTINGS, PARAM_PERFORMANCE_TUNING_SETTINGS, PARAM_RESOURCE_MANAGER, PARAM_UIMA_CONTEXT
Methods inherited from class org.apache.uima.collection.CollectionReader_ImplBase
destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
Methods inherited from class org.apache.uima.resource.Resource_ImplBase
getCasManager, getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.uima.resource.ConfigurableResource
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
Methods inherited from interface org.apache.uima.resource.Resource
getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger
PARAM_INPUTDIR
public static final String PARAM_INPUTDIR
- Name of the mandatory configuration parameter that must be set to the
path of a directory containing the input files.
- See Also:
- Constant Field Values
PARAM_LANGUAGE
public static final String PARAM_LANGUAGE
- Name of the optional configuration parameter that contains the language
of the documents in the input directory. If specified, this information is
added to the CAS.
- See Also:
- Constant Field Values
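How the two parameters might be read and checked can be sketched as follows. This is a minimal illustration under stated assumptions, not the reader's actual code: the Map stands in for the UIMA configuration parameter settings, the key strings "InputDirectory" and "Language" are placeholders (the real values are listed under Constant Field Values), and the class ReaderParamsSketch is hypothetical.

```java
import java.io.File;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of org.apache.uima.tika. */
class ReaderParamsSketch {
    // Assumed key names, chosen for illustration only.
    static final String PARAM_INPUTDIR = "InputDirectory";
    static final String PARAM_LANGUAGE = "Language";

    /** Returns the mandatory input directory, or throws if it is unset or not a directory. */
    static File requireInputDir(Map<String, Object> settings) {
        Object value = settings.get(PARAM_INPUTDIR);
        if (!(value instanceof String)) {
            throw new IllegalArgumentException(PARAM_INPUTDIR + " must be set");
        }
        File dir = new File(((String) value).trim());
        if (!dir.isDirectory()) {
            throw new IllegalArgumentException(dir + " is not a directory");
        }
        return dir;
    }

    /** Returns the optional language; empty when the parameter is not set. */
    static Optional<String> optionalLanguage(Map<String, Object> settings) {
        Object value = settings.get(PARAM_LANGUAGE);
        return value instanceof String ? Optional.of((String) value) : Optional.empty();
    }
}
```

Failing early on a missing or invalid directory keeps the error close to the configuration mistake, while an absent language simply yields an empty Optional.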
FileSystemCollectionReader
public FileSystemCollectionReader()
hasNext
public boolean hasNext()
- See Also:
BaseCollectionReader.hasNext()
getNext
public void getNext(org.apache.uima.cas.CAS aCAS)
throws IOException,
org.apache.uima.collection.CollectionException
- Throws:
IOException
org.apache.uima.collection.CollectionException
- See Also:
CollectionReader.getNext(org.apache.uima.cas.CAS)
close
public void close()
throws IOException
- Throws:
IOException
- See Also:
BaseCollectionReader.close()
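The hasNext, getNext and close methods are typically driven in a simple loop. The sketch below shows that calling pattern with stand-in types only: ReaderLoopSketch is a hypothetical class that mimics the three methods, and a StringBuilder takes the place of the CAS. It does not use the UIMA API and says nothing about how this reader fills the CAS.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a collection reader; mirrors only hasNext/getNext/close. */
class ReaderLoopSketch {
    private final List<String> documents;
    private int currentIndex = 0;
    private boolean closed = false;

    ReaderLoopSketch(List<String> documents) {
        this.documents = documents;
    }

    /** True while unread documents remain and the reader is still open. */
    boolean hasNext() {
        return !closed && currentIndex < documents.size();
    }

    /** Writes the next document into the sink (the sink stands in for a CAS). */
    void getNext(StringBuilder sink) {
        sink.append(documents.get(currentIndex++));
    }

    void close() {
        closed = true;
    }

    /** Drains the reader the way a caller would: check, fetch, and always close. */
    static List<String> readAll(ReaderLoopSketch reader) {
        List<String> out = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                StringBuilder sink = new StringBuilder();
                reader.getNext(sink);
                out.add(sink.toString());
            }
        } finally {
            reader.close(); // release resources even if getNext fails
        }
        return out;
    }
}
```

With the real reader, getNext and close can throw IOException (and getNext also CollectionException), so the caller has to handle or declare those exceptions around the same loop.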
getProgress
public org.apache.uima.util.Progress[] getProgress()
- See Also:
BaseCollectionReader.getProgress()
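A progress array usually carries one entry per unit of measure, each with a completed count and a total. The sketch below illustrates that shape with a hypothetical ProgressSketch class standing in for org.apache.uima.util.Progress. Reporting in whole documents with the unit label "entities" is an assumption, not something this page states about the reader.

```java
/** Stand-in for a progress entry: how much is done, out of how much, in which unit. */
class ProgressSketch {
    final long completed;
    final long total;
    final String unit;

    ProgressSketch(long completed, long total, String unit) {
        this.completed = completed;
        this.total = total;
        this.unit = unit;
    }

    /** Builds a one-element array: documents read so far out of all documents. */
    static ProgressSketch[] progressFor(int currentIndex, int totalFiles) {
        return new ProgressSketch[] {
            new ProgressSketch(currentIndex, totalFiles, "entities") // assumed unit label
        };
    }

    /** Completed share as a whole percentage; an empty collection counts as done. */
    int percent() {
        return total == 0 ? 100 : (int) (100 * completed / total);
    }
}
```

A caller can poll such an array between getNext calls to display how far processing has advanced.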
getNumberOfDocuments
public int getNumberOfDocuments()
- Gets the total number of documents that will be returned by this
collection reader. This is not part of the general collection reader
interface.
- Returns:
- the number of documents in the collection
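One way a reader can know the total up front is to enumerate the input directory once and keep the resulting list. The sketch below shows that idea using only java.io. The class InputDirSketch is hypothetical, and the choices to skip subdirectories and to sort by name are assumptions; this page does not say whether the reader recurses or in which order it returns files.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of org.apache.uima.tika. */
class InputDirSketch {
    /** Returns the regular files directly inside dir, sorted by path; subdirectories are skipped. */
    static List<File> listInputFiles(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            // listFiles returns null when dir is not a directory or cannot be read
            throw new IllegalArgumentException(dir + " is not a readable directory");
        }
        Arrays.sort(entries);
        List<File> files = new ArrayList<>();
        for (File entry : entries) {
            if (entry.isFile()) {
                files.add(entry);
            }
        }
        return files;
    }

    /** The document count is then simply the size of the collected list. */
    static int numberOfDocuments(File dir) {
        return listInputFiles(dir).size();
    }
}
```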
initialize
public void initialize()
throws org.apache.uima.resource.ResourceInitializationException
- Overrides:
initialize
in class org.apache.uima.collection.CollectionReader_ImplBase
- Throws:
org.apache.uima.resource.ResourceInitializationException
Copyright © 2006-2011 The Apache Software Foundation. All Rights Reserved.