Package org.semanticdesktop.aperture.extractor.util

Interface Summary
PoiUtil.TextExtractor A TextExtractor is a delegate that extracts the full text from an MS Office document using a POIFSFileSystem.
 

Class Summary
HtmlParserUtil A utility class for HTML parsing using the HTMLParser library.
HtmlParserUtil.ContentExtractor A NodeVisitor specialization that can restart its interpretation of parsing events from the beginning.
PoiUtil Provides Apache POI-specific utility methods for text and metadata extraction.
StringExtractor StringExtractor uses a set of heuristics to extract as much human-readable text as possible from a binary stream.
ThreadedExtractorWrapper A ThreadedExtractorWrapper wraps an Extractor and executes it on a separate thread, bailing out if the wrapped Extractor appears to be hanging.
WPFilterInputStream A FilterInputStream that processes bytes that have a special meaning in WordPerfect documents.
WPStringExtractor A StringExtractor extension optimized for processing WordPerfect document streams.
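
The StringExtractor entry above mentions heuristics for pulling human-readable text out of a binary stream. The summary does not say which heuristics Aperture uses, so the following is only a minimal sketch of one common approach: keep runs of printable ASCII bytes and discard runs that are too short to be meaningful. The class name `PrintableRunSketch`, the method `extract`, and the threshold `MIN_RUN_LENGTH` are hypothetical and are not part of the Aperture API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; not the Aperture StringExtractor implementation.
final class PrintableRunSketch {

    // Assumed threshold: shorter runs are treated as binary noise.
    static final int MIN_RUN_LENGTH = 4;

    /** Collects runs of printable ASCII bytes, joined by single spaces. */
    static String extract(InputStream in) throws IOException {
        StringBuilder result = new StringBuilder();
        StringBuilder run = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b >= 0x20 && b <= 0x7E) {
                run.append((char) b);   // printable ASCII: extend the current run
            } else {
                flush(run, result);     // any other byte ends the run
            }
        }
        flush(run, result);             // handle a run that reaches end of stream
        return result.toString();
    }

    // Appends the run to the result if it is long enough, then resets it.
    private static void flush(StringBuilder run, StringBuilder result) {
        if (run.length() >= MIN_RUN_LENGTH) {
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(run);
        }
        run.setLength(0);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x00, 0x01, 'H', 'e', 'l', 'l', 'o', 0x02,
                       'a', 'b', 0x03, 'w', 'o', 'r', 'l', 'd', (byte) 0xFF};
        // "ab" is shorter than MIN_RUN_LENGTH and is dropped.
        System.out.println(extract(new ByteArrayInputStream(data))); // prints "Hello world"
    }
}
```

A format-aware variant, such as the WordPerfect-specific classes listed above, would presumably preprocess or filter format-specific bytes before a pass like this one; that step is not shown here.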
 

Exception Summary
ThreadedExtractorWrapper.ExtractionAbortedException An exception thrown when the extraction is aborted at the user's request, i.e. when the ThreadedExtractorWrapper.stop() method is called.
ThreadedExtractorWrapper.ExtractionInterruptedException An exception thrown when the underlying extractor hangs.
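
The ThreadedExtractorWrapper entries describe two distinct failure outcomes: an abort requested through stop(), and a bail-out when the wrapped extractor appears to hang. The pattern can be sketched with plain java.util.concurrent, as below. This is not Aperture's implementation: the summary does not state how a hang is detected, so the sketch assumes a fixed timeout, and the names `WatchdogSketch`, `AbortedException`, and `HungException` are hypothetical stand-ins.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names belong to the Aperture API.
final class WatchdogSketch<T> {

    /** Stand-in for the "aborted on request" outcome. */
    static final class AbortedException extends Exception {
        AbortedException(String message) { super(message); }
    }

    /** Stand-in for the "wrapped task appears to hang" outcome. */
    static final class HungException extends Exception {
        HungException(String message) { super(message); }
    }

    private final Callable<T> task;
    private final long timeoutMillis; // assumed hang criterion: a fixed deadline
    private volatile Future<T> future;
    private volatile boolean stopRequested;

    WatchdogSketch(Callable<T> task, long timeoutMillis) {
        this.task = task;
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs the task on a worker thread and waits for its result. */
    T run() throws AbortedException, HungException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread worker = new Thread(r, "watchdog-sketch-worker");
            worker.setDaemon(true); // a stuck worker must not keep the JVM alive
            return worker;
        });
        try {
            Future<T> f = executor.submit(task);
            future = f;
            if (stopRequested) {
                f.cancel(true); // stop() arrived before the task was submitted
            }
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (CancellationException e) {
            throw new AbortedException("stopped on request");
        } catch (TimeoutException e) {
            throw new HungException("no result within " + timeoutMillis + " ms");
        } finally {
            executor.shutdownNow(); // interrupts the worker if it is still running
        }
    }

    /** Requests an abort; safe to call from any thread. */
    void stop() {
        stopRequested = true;
        Future<T> f = future;
        if (f != null) {
            f.cancel(true);
        }
    }
}
```

Keeping the two outcomes as separate exception types lets a caller tell a deliberate stop apart from a misbehaving task, which is the distinction the two exceptions above draw. Note that interrupting the worker only helps if the task responds to interruption; a task stuck in a non-interruptible call keeps running on its daemon thread.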