|
Class Summary |
| HtmlParserUtil |
A utility class for HTML parsing using the HTMLParser library. |
| HtmlParserUtil.ContentExtractor |
A NodeVisitor specialization that can be reset to start interpreting parsing events from scratch. |
| PoiUtil |
Provides Apache POI-specific utility methods for text and metadata extraction. |
| StringExtractor |
StringExtractor uses a set of heuristics to extract as much human-readable text as possible from a binary
stream. |
| ThreadedExtractorWrapper |
A ThreadedExtractorWrapper wraps an Extractor and executes it on a separate thread, bailing out if the
wrapped Extractor appears to be hanging. |
| WPFilterInputStream |
A FilterInputStream that processes bytes that have a special meaning in WordPerfect documents. |
| WPStringExtractor |
A StringExtractor extension optimized for processing WordPerfect document streams. |
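
The idea behind ThreadedExtractorWrapper (run the work on a separate thread and give up if it appears to hang) can be illustrated with a minimal sketch. This is not the class's actual implementation: `TimeoutRunner` and `runWithTimeout` are hypothetical names, and a fixed timeout is an assumed stand-in for whatever hang detection the real wrapper uses.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only; not part of the documented package.
final class TimeoutRunner {

    private TimeoutRunner() {}

    // Runs the task on a worker thread and waits at most timeoutMillis for it.
    // Returns an empty Optional if the task does not finish in time.
    static <T> Optional<T> runWithTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timeout-runner");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // ask the worker to stop by interrupting it
            return Optional.empty();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            // The task itself failed; surface the underlying cause.
            throw new IllegalStateException("Task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A fast task completes; a slow one is abandoned after the timeout.
        System.out.println(runWithTimeout(() -> "done", 1000));
        System.out.println(runWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 100));
    }
}
```

Cancelling the future only interrupts the worker thread. A task that ignores interruption keeps running in the background, which is why the sketch marks the worker as a daemon thread.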