java.lang.Object
  org.semanticdesktop.aperture.extractor.microsoft.util.PoiUtil
public class PoiUtil
Provides Apache POI-specific utility methods for text and metadata extraction.
Some methods use a buffer to be able to reset the InputStream to its start. The buffer size can be altered by giving the "aperture.poiUtil.bufferSize" system property a value holding the number of bytes that the buffer may use.
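The lookup described above can be sketched as follows. This is a minimal illustration, not the PoiUtil implementation: the class name `BufferSizeSketch`, the helper `resolveBufferSize` and the fallback value `ASSUMED_DEFAULT` are hypothetical, and only the property name comes from this documentation.

```java
public class BufferSizeSketch {
    // Property name taken from the class documentation.
    static final String PROPERTY = "aperture.poiUtil.bufferSize";

    // Hypothetical fallback; the default actually used by PoiUtil is not stated here.
    static final int ASSUMED_DEFAULT = 4 * 1024 * 1024;

    /** Reads a positive byte count from a system property, or returns defaultSize. */
    static int resolveBufferSize(String systemProperty, int defaultSize) {
        String value = System.getProperty(systemProperty);
        if (value == null) {
            return defaultSize; // property not set
        }
        try {
            int size = Integer.parseInt(value.trim());
            return size > 0 ? size : defaultSize; // reject zero and negative sizes
        } catch (NumberFormatException e) {
            return defaultSize; // property does not hold a valid number
        }
    }

    public static void main(String[] args) {
        System.setProperty(PROPERTY, "1048576"); // allow a 1 MiB buffer
        System.out.println(resolveBufferSize(PROPERTY, ASSUMED_DEFAULT));
    }
}
```

The same value can also be supplied at JVM startup with `-Daperture.poiUtil.bufferSize=1048576` instead of calling `System.setProperty`.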
| Nested Class Summary | |
|---|---|
| static class | PoiUtil.NonCloseableStream |
| static interface | PoiUtil.TextExtractor - A TextExtractor is a delegate that extracts the full-text from an MS Office document using a POIFSFileSystem. |
| Constructor Summary | |
|---|---|
| PoiUtil() | |
| Method Summary | |
|---|---|
| static InputStream | extractAll(InputStream stream, PoiUtil.TextExtractor textExtractor, RDFContainer container, Logger logger) - Extract full-text and metadata from an MS Office document contained in the specified stream. |
| static void | extractMetadata(org.apache.poi.poifs.filesystem.DirectoryNode dirNode, RDFContainer container) |
| static InputStream | extractMetadata(InputStream stream, boolean resetStream, RDFContainer container) - Extract all metadata from an OLE document. |
| static void | extractMetadata(org.apache.poi.poifs.filesystem.POIFSFileSystem poiFileSystem, RDFContainer container) - Extracts all metadata from the POIFSFileSystem's SummaryInformation and transforms it to RDF statements that are stored in the specified RDFContainer. |
| static int | getBufferSize() - Returns the buffer size to use when buffering the contents of a document. |
| static org.apache.poi.hpsf.DocumentSummaryInformation | getDocumentSummaryInformation(org.apache.poi.poifs.filesystem.POIFSFileSystem poiFileSystem) - Returns the DocumentSummaryInformation holding the document metadata from a POIFSFileSystem. |
| static org.apache.poi.hpsf.SummaryInformation | getSummaryInformation(org.apache.poi.poifs.filesystem.POIFSFileSystem poiFileSystem) - Returns the SummaryInformation holding the document metadata from a POIFSFileSystem. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public PoiUtil()
| Method Detail |
|---|
public static org.apache.poi.hpsf.SummaryInformation getSummaryInformation(org.apache.poi.poifs.filesystem.POIFSFileSystem poiFileSystem)
poiFileSystem - The POI file system to obtain the metadata from.
public static org.apache.poi.hpsf.DocumentSummaryInformation getDocumentSummaryInformation(org.apache.poi.poifs.filesystem.POIFSFileSystem poiFileSystem)
poiFileSystem - The POI file system to obtain the metadata from.
public static InputStream extractMetadata(InputStream stream,
boolean resetStream,
RDFContainer container)
throws IOException
stream - The stream containing the OLE document.
resetStream - Specifies whether the stream should be buffered and reset. The buffer size can be set through the system property described in the class documentation.
container - The RDFContainer to store the metadata in.
IOException - When resetting of the buffer resulted in an IOException.
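The buffer-and-reset behaviour described for `resetStream` can be illustrated with the standard `mark`/`reset` mechanism of `java.io.BufferedInputStream`. This is a sketch under assumptions: whether PoiUtil buffers the stream in exactly this way is not confirmed here, and `ResetStreamSketch` and `readThenReset` are hypothetical names. The eight-byte read stands in for the actual metadata extraction.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResetStreamSketch {
    /**
     * Buffers the stream, consumes a few bytes, then rewinds to the start.
     * The returned stream should be used for any further reading.
     */
    static InputStream readThenReset(InputStream stream, int bufferSize) throws IOException {
        InputStream buffered = new BufferedInputStream(stream, bufferSize);
        buffered.mark(bufferSize);   // remember the start; valid for up to bufferSize bytes
        byte[] head = new byte[8];
        buffered.read(head);         // stand-in for the real extraction work
        buffered.reset();            // throws IOException if the mark is no longer valid
        return buffered;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "example document bytes".getBytes(StandardCharsets.US_ASCII);
        InputStream result = readThenReset(new ByteArrayInputStream(data), 64);
        System.out.println((char) result.read()); // prints "e": reading starts over
    }
}
```

With this approach, reading more than the marked limit can invalidate the mark, in which case `reset()` fails with an `IOException`. That is consistent with the exception documented above and is one reason a larger buffer size may be needed for big documents.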
public static void extractMetadata(org.apache.poi.poifs.filesystem.POIFSFileSystem poiFileSystem,
RDFContainer container)
poiFileSystem - The POI file system to obtain the metadata from.
container - The RDFContainer to store the created RDF statements in.
public static void extractMetadata(org.apache.poi.poifs.filesystem.DirectoryNode dirNode,
RDFContainer container)
public static InputStream extractAll(InputStream stream,
PoiUtil.TextExtractor textExtractor,
RDFContainer container,
Logger logger)
public static int getBufferSize()
systemProperty - The system property that contains the buffer size.
defaultSize - The default buffer size, in case the system property is not set or does not contain a valid value.