org.semanticdesktop.aperture.extractor.util
Class ThreadedExtractorWrapper

java.lang.Object
  extended by org.semanticdesktop.aperture.extractor.util.ThreadedExtractorWrapper
All Implemented Interfaces:
Extractor

public class ThreadedExtractorWrapper
extends Object
implements Extractor

A ThreadedExtractorWrapper wraps an Extractor and executes it on a separate thread, bailing out if the wrapped Extractor appears to be hanging. The heuristic for detecting a hang is whether the InputStream is accessed regularly. Any Exception thrown by the wrapped Extractor is eventually rethrown by the ThreadedExtractorWrapper.

Furthermore, a ThreadedExtractorWrapper can be asked to stop processing, which causes the InputStream to throw an IOException the next time the wrapped Extractor accesses it. This allows an extraction to be interrupted on user request, for example because it has been processing a single file for a very long time (large PDF documents are especially notorious). This strategy is preferred over interrupting the Thread, which should only be used as a last resort to stop a thread.
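
The two mechanisms described above (watching how recently the stream was read, and failing the next read after a stop request) can be sketched with a stream decorator. This is a minimal illustration that uses only the JDK; the class MonitoredInputStream and its methods requestStop() and idleMillis() are hypothetical names and are not taken from the Aperture sources.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Aperture: records when the stream was last
// read and fails the next read once a stop has been requested.
class MonitoredInputStream extends FilterInputStream {
    private volatile long lastAccess = System.currentTimeMillis();
    private volatile boolean stopRequested = false;

    MonitoredInputStream(InputStream in) {
        super(in);
    }

    /** Asks the stream to throw an IOException on the next read. */
    void requestStop() {
        stopRequested = true;
    }

    /** Milliseconds elapsed since the consumer last read from the stream. */
    long idleMillis() {
        return System.currentTimeMillis() - lastAccess;
    }

    // Called before every read: honours a pending stop request, otherwise
    // updates the last-access timestamp.
    private void touch() throws IOException {
        if (stopRequested) {
            throw new IOException("stop requested");
        }
        lastAccess = System.currentTimeMillis();
    }

    @Override
    public int read() throws IOException {
        touch();
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        touch();
        return super.read(b, off, len);
    }
}
```

A monitoring thread could compare idleMillis() against a limit such as maxIdleReadTime, while a user-facing stop action would only need to call requestStop(); the consumer then fails on its own thread without Thread.interrupt() being involved.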


Nested Class Summary
static class ThreadedExtractorWrapper.ExtractionAbortedException
          An exception that gets thrown if the extraction is aborted per user request, i.e. when the stop() method is called.
static class ThreadedExtractorWrapper.ExtractionInterruptedException
          An exception that gets thrown if the underlying extractor hangs.
 
Field Summary
static long DEFAULT_MAX_IDLE_READ_TIME
          The maximum time between two reads that the wrapped Extractor is allowed to work on the read data before it is considered to be hanging.
static long DEFAULT_MAX_PROCESSING_TIME_PER_MB
          The maximum time per MB of data that the wrapped Extractor is allowed to work on the read data before it is considered to be hanging, in milliseconds.
static long DEFAULT_MINIMUM_MAX_PROCESSING_TIME
          The minimum maximum processing time that the wrapped Extractor is allowed to work on the read data before it is considered to be hanging.
 
Constructor Summary
ThreadedExtractorWrapper(Extractor extractor)
          Creates a new wrapper for the specified Extractor.
ThreadedExtractorWrapper(Extractor extractor, long maxProcessingTimePerMb, long minimumMaxProcessingTime, long maxIdleReadTime)
          Creates a new wrapper for the specified Extractor.
 
Method Summary
 void extract(URI id, InputStream input, Charset charset, String mimeType, RDFContainer result)
          Starts the extraction process using the wrapped Extractor on a separate thread.
 void stop()
          Interrupts processing of the wrapped extractor as soon as possible.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MAX_PROCESSING_TIME_PER_MB

public static final long DEFAULT_MAX_PROCESSING_TIME_PER_MB
The maximum time per MB of data that the wrapped Extractor is allowed to work on the read data before it is considered to be hanging, in milliseconds.

See Also:
Constant Field Values

DEFAULT_MINIMUM_MAX_PROCESSING_TIME

public static final long DEFAULT_MINIMUM_MAX_PROCESSING_TIME
The minimum maximum processing time that the wrapped Extractor is allowed to work on the read data before it is considered to be hanging. This minimum gives a lower bound for very small files.

See Also:
Constant Field Values
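
As an illustration of how a per-MB limit and a lower bound might combine, the sketch below scales the limit with the input size and then applies the minimum. The formula, the class name ProcessingBudget, and the numbers in the usage note are assumptions made for this example; this page does not state how the wrapper actually computes its limit or what the default values are.

```java
// Hypothetical helper, not part of Aperture: one plausible way to derive a
// processing time limit from the input size.
class ProcessingBudget {
    static final long BYTES_PER_MB = 1024L * 1024L;

    /**
     * Returns the allowed processing time in milliseconds: the per-MB limit
     * scaled by the input size, but never less than the given minimum.
     */
    static long maxProcessingTime(long sizeInBytes,
                                  long maxProcessingTimePerMb,
                                  long minimumMaxProcessingTime) {
        long scaled = sizeInBytes * maxProcessingTimePerMb / BYTES_PER_MB;
        return Math.max(minimumMaxProcessingTime, scaled);
    }
}
```

With an assumed limit of 2000 ms per MB and an assumed minimum of 5000 ms, a 10 MB input would be allowed 20000 ms, while a 1 KB input would fall back to the 5000 ms minimum.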

DEFAULT_MAX_IDLE_READ_TIME

public static final long DEFAULT_MAX_IDLE_READ_TIME
The maximum time between two reads that the wrapped Extractor is allowed to work on the read data before it is considered to be hanging.

See Also:
Constant Field Values
Constructor Detail

ThreadedExtractorWrapper

public ThreadedExtractorWrapper(Extractor extractor)
Creates a new wrapper for the specified Extractor. It uses default timeout values. It is equivalent to
 new ThreadedExtractorWrapper(extractor, DEFAULT_MAX_PROCESSING_TIME_PER_MB,
         DEFAULT_MINIMUM_MAX_PROCESSING_TIME, DEFAULT_MAX_IDLE_READ_TIME);
 

Parameters:
extractor - The Extractor to wrap.
See Also:
DEFAULT_MAX_PROCESSING_TIME_PER_MB, DEFAULT_MINIMUM_MAX_PROCESSING_TIME, DEFAULT_MAX_IDLE_READ_TIME

ThreadedExtractorWrapper

public ThreadedExtractorWrapper(Extractor extractor,
                                long maxProcessingTimePerMb,
                                long minimumMaxProcessingTime,
                                long maxIdleReadTime)
Creates a new wrapper for the specified Extractor. It allows the user to customize the timeout values.

Parameters:
extractor - The Extractor to wrap.
maxProcessingTimePerMb - see DEFAULT_MAX_PROCESSING_TIME_PER_MB
minimumMaxProcessingTime - see DEFAULT_MINIMUM_MAX_PROCESSING_TIME
maxIdleReadTime - see DEFAULT_MAX_IDLE_READ_TIME
Method Detail

stop

public void stop()
Interrupts processing of the wrapped extractor as soon as possible.


extract

public void extract(URI id,
                    InputStream input,
                    Charset charset,
                    String mimeType,
                    RDFContainer result)
             throws ExtractorException
Starts the extraction process using the wrapped Extractor on a separate thread. This Thread is interrupted as soon as no progress is reported. In this case a ThreadedExtractorWrapper.ExtractionAbortedException will be thrown.

Specified by:
extract in interface Extractor
Parameters:
id - the URI identifying the object (e.g. a file or web page) from which the stream was obtained. The generated statements should describe this URI.
input - the InputStream delivering the raw bytes.
charset - the charset in which the InputStream is encoded (optional).
mimeType - the MIME type of the passed stream (optional).
result - the container in which this Extractor can put its created RDF statements.
Throws:
ExtractorException - if any problem with the extractor occurs. This is exactly the same Exception instance as the one thrown by the extractor.
ThreadedExtractorWrapper.ExtractionAbortedException - if the extractor wrapper decided that the extractor has stalled and the extraction has been aborted
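
The statement that the caller receives exactly the same Exception instance as the one thrown on the worker thread can be illustrated with a small JDK-only sketch. The names WorkerRunner, Task, and runOnSeparateThread are hypothetical, and the blocking join() is a simplification; a wrapper that watches for stalls would presumably wait with a timeout instead.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of Aperture: runs a task on a worker thread
// and rethrows, on the calling thread, the exact exception the task threw.
class WorkerRunner {

    /** Stand-in for a call to the wrapped extractor. */
    interface Task {
        void run() throws Exception;
    }

    static void runOnSeparateThread(Task task) throws Exception {
        AtomicReference<Exception> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Exception e) {
                failure.set(e); // hand the instance over to the caller
            }
        });
        worker.start();
        worker.join(); // simplification: waits without any timeout
        Exception thrown = failure.get();
        if (thrown != null) {
            throw thrown; // same instance, not a wrapping exception
        }
    }
}
```

Because the original instance is rethrown rather than wrapped, callers can keep their existing catch blocks for the extractor's exception types.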


Copyright © 2010 Aperture Development Team. All Rights Reserved.