org.semanticdesktop.aperture.extractor.poiooxml
Class PoiOOXmlExtractor

java.lang.Object
  extended by org.semanticdesktop.aperture.extractor.poiooxml.PoiOOXmlExtractor
All Implemented Interfaces:
Extractor

public class PoiOOXmlExtractor
extends Object
implements Extractor

An Extractor for Office Open XML (OOXML) files. It uses the functionality introduced in POI 3.6 to process OOXML documents.

Author:
Antoni

Nested Class Summary
static class PoiOOXmlExtractor.Type
           
 
Constructor Summary
PoiOOXmlExtractor(PoiOOXmlExtractor.Type type)
           
 
Method Summary
 void extract(URI id, InputStream stream, Charset charset, String mimeType, RDFContainer container)
          Extracts full-text and metadata from the specified binary stream and stores the extracted information as RDF statements in the specified RDFContainer.
 org.apache.poi.POIXMLDocument getDocument()
           
 boolean isCacheDocument()
           
 void setCacheDocument(boolean cacheDocument)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PoiOOXmlExtractor

public PoiOOXmlExtractor(PoiOOXmlExtractor.Type type)

Method Detail

extract

public void extract(URI id,
                    InputStream stream,
                    Charset charset,
                    String mimeType,
                    RDFContainer container)
             throws ExtractorException
Description copied from interface: Extractor
Extracts full-text and metadata from the specified binary stream and stores the extracted information as RDF statements in the specified RDFContainer. The optionally specified Charset and MIME type can be used to direct how the stream should be parsed.

The specified InputStream is expected to already use some kind of buffering so that the Extractors are not required to internally buffer bytes to improve performance.

Specified by:
extract in interface Extractor
Parameters:
id - the URI identifying the object (e.g. a file or web page) from which the stream was obtained. The generated statements should describe this URI.
stream - the InputStream delivering the raw bytes.
charset - the charset in which the InputStream is encoded (optional).
mimeType - the MIME type of the passed stream (optional).
container - the container in which this Extractor can put its created RDF statements.
Throws:
ExtractorException - in case of any error during the extraction process.
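
Example (illustrative only): a minimal sketch of how a caller might satisfy the buffering expectation stated for extract, using only java.io. The class name BufferedStreamExample and the helper ensureBuffered are hypothetical and not part of Aperture; the actual extract call appears only as a comment because constructing the extractor and an RDFContainer depends on the caller's setup.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedStreamExample {

    // Hypothetical helper (not part of Aperture): returns the stream unchanged
    // if it is already a BufferedInputStream, otherwise wraps it in one.
    static InputStream ensureBuffered(InputStream in) {
        if (in instanceof BufferedInputStream) {
            return in;
        }
        return new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes for a real document: the ZIP local file header
        // signature "PK\3\4" (OOXML files are ZIP packages).
        byte[] data = {0x50, 0x4B, 0x03, 0x04};

        try (InputStream stream = ensureBuffered(new ByteArrayInputStream(data))) {
            // A real caller would pass the buffered stream on at this point:
            // extractor.extract(id, stream, charset, mimeType, container);
            System.out.println("first byte: " + stream.read());
        }
    }
}
```

Wrapping once at the call site keeps the buffering decision with the code that opened the stream, which matches the contract that Extractors are not required to buffer internally.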

isCacheDocument

public boolean isCacheDocument()

setCacheDocument

public void setCacheDocument(boolean cacheDocument)

getDocument

public org.apache.poi.POIXMLDocument getDocument()


Copyright © 2010 Aperture Development Team. All Rights Reserved.