java.lang.Object
  org.semanticdesktop.aperture.crawler.base.CrawlerHandlerBase
public class CrawlerHandlerBase
A base implementation of the CrawlerHandler interface. The method implementations are the simplest
possible ones that fulfill the contract. Applications are expected to override the methods they need.
The processBinary(Crawler, DataObject) method is provided as a reference implementation to
show how to use the MIME type detector and the extractors.
Set mimeTypeIdentifier, extractorRegistry and subCrawlerRegistry.
Override getRDFContainerFactory(Crawler, String) to influence the RDF containers used.
Call processBinary(Crawler, DataObject) to extract the contents of binary streams.
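As an illustration of overriding only the methods an application needs, here is a minimal, self-contained sketch. It does not compile against Aperture itself: `Crawler`, `HandlerBaseSketch` and `RemovalTrackingHandler` are hypothetical stand-ins, and the base class is reduced to three no-op callbacks.

```java
import java.util.ArrayList;
import java.util.List;

class OverrideSketch {
    // Hypothetical stand-in for the Crawler type.
    interface Crawler {}

    // Hypothetical, reduced stand-in for CrawlerHandlerBase: every callback is a no-op.
    static class HandlerBaseSketch {
        public void crawlStarted(Crawler crawler) {}
        public void objectNotModified(Crawler crawler, String url) {}
        public void objectRemoved(Crawler crawler, String url) {}
    }

    // The application overrides only the callback it cares about.
    static class RemovalTrackingHandler extends HandlerBaseSketch {
        final List<String> removedUrls = new ArrayList<>();

        @Override
        public void objectRemoved(Crawler crawler, String url) {
            removedUrls.add(url);
        }
    }

    public static void main(String[] args) {
        RemovalTrackingHandler handler = new RemovalTrackingHandler();
        handler.crawlStarted(null);                          // inherited no-op
        handler.objectNotModified(null, "file:/tmp/a.txt");  // inherited no-op
        handler.objectRemoved(null, "file:/tmp/b.txt");      // overridden
        System.out.println(handler.removedUrls);
    }
}
```

With the real base class the same shape applies: extend CrawlerHandlerBase and override the callbacks of interest, leaving the rest at their defaults.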
Field Summary | |
---|---|
protected boolean | extractingContents - Whether binaries should be processed. |
protected ExtractorRegistry | extractorRegistry - Extractor registry, may be set by overriding classes to use processBinary. |
protected MimeTypeIdentifier | mimeTypeIdentifier - MIME type identifier, must be set by overriding classes to use processBinary. |
protected SubCrawlerRegistry | subCrawlerRegistry - Subcrawler registry, may be set by overriding classes to use processBinary. |
Constructor Summary | |
---|---|
CrawlerHandlerBase() - Construct an empty CrawlerHandlerBase. | |
CrawlerHandlerBase(MimeTypeIdentifier mimeTypeIdentifier, ExtractorRegistry extractorRegistry, SubCrawlerRegistry subCrawlerRegistry) - Construct an initialised CrawlerHandlerBase. | |
Method Summary | |
---|---|
void | accessingObject(Crawler crawler, String url) - This implementation does nothing; it is meant to be overridden. |
void | clearFinished(Crawler crawler, ExitCode exitCode) - This implementation does nothing; it is meant to be overridden. |
void | clearingObject(Crawler crawler, String url) - This implementation does nothing; it is meant to be overridden. |
void | clearStarted(Crawler crawler) - This implementation does nothing; it is meant to be overridden. |
void | crawlStarted(Crawler crawler) - This implementation does nothing; it is meant to be overridden. |
void | crawlStopped(Crawler crawler, ExitCode exitCode) - This implementation does nothing; it is meant to be overridden. |
RDFContainerFactory | getRDFContainerFactory(Crawler crawler, String url) - Returns an RDF container factory. |
boolean | isExtractingContents() - Whether binaries should be processed. |
void | objectChanged(Crawler crawler, DataObject object) - This implementation only disposes the data object and does nothing more. |
void | objectNew(Crawler crawler, DataObject object) - This implementation only disposes the data object and does nothing more. |
void | objectNotModified(Crawler crawler, String url) - This implementation does nothing; it is meant to be overridden. |
void | objectRemoved(Crawler crawler, String url) - This implementation does nothing; it is meant to be overridden. |
protected void | processBinary(Crawler crawler, DataObject dataObject) - Default and reference implementation of the handling of objects found in the crawling process: identifies the MIME type and invokes the extractors. |
void | setExtractingContents(boolean extractingContents) - Sets whether binaries should be processed. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected boolean extractingContents
protected MimeTypeIdentifier mimeTypeIdentifier
protected ExtractorRegistry extractorRegistry
protected SubCrawlerRegistry subCrawlerRegistry
Constructor Detail |
---|
public CrawlerHandlerBase()
public CrawlerHandlerBase(MimeTypeIdentifier mimeTypeIdentifier, ExtractorRegistry extractorRegistry, SubCrawlerRegistry subCrawlerRegistry)
mimeTypeIdentifier
- initialised MimeTypeIdentifier
extractorRegistry
- initialised ExtractorRegistry, can be null if binary handling is not needed
subCrawlerRegistry
- initialised SubCrawlerRegistry, can be null if binary handling is not needed
Method Detail |
---|
public RDFContainerFactory getRDFContainerFactory(Crawler crawler, String url)
Returns an RDF container factory whose models come from the RDF2Go.getModelFactory() method.
Each model is separate.
getRDFContainerFactory
in interface CrawlerHandler
crawler
- The requesting Crawler.
url
- The url of the resource that is currently being accessed.
CrawlerHandler.getRDFContainerFactory(Crawler, String)
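To make "each model is separate" concrete, the following sketch shows a factory that hands out an independent container per resource. It is an illustration under stated assumptions, not the Aperture or RDF2Go API: `RDFContainer`, `RDFContainerFactory` and `SeparateModelHandler` are simplified stand-ins, URIs are plain strings, and the "model" is just a list.

```java
import java.util.ArrayList;
import java.util.List;

class ContainerFactorySketch {
    // Hypothetical stand-in for the Crawler type.
    interface Crawler {}

    // Hypothetical stand-in for an RDF container backed by its own model.
    static class RDFContainer {
        final String describedUri;
        final List<String> statements = new ArrayList<>();
        RDFContainer(String describedUri) { this.describedUri = describedUri; }
    }

    // Hypothetical stand-in for the factory returned by the handler.
    interface RDFContainerFactory {
        RDFContainer getRDFContainer(String uri);
    }

    static class SeparateModelHandler {
        public RDFContainerFactory getRDFContainerFactory(Crawler crawler, String url) {
            // Each call on the factory creates a container with fresh, unshared storage.
            return RDFContainer::new;
        }
    }

    public static void main(String[] args) {
        SeparateModelHandler handler = new SeparateModelHandler();
        RDFContainer a = handler.getRDFContainerFactory(null, "file:/a").getRDFContainer("file:/a");
        RDFContainer b = handler.getRDFContainerFactory(null, "file:/b").getRDFContainer("file:/b");
        a.statements.add("title = A");
        System.out.println(a.statements.size() + " statement(s) in a, " + b.statements.size() + " in b");
    }
}
```

An application that wants all metadata in one shared store would override the method to return a factory backed by that store instead.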
public void accessingObject(Crawler crawler, String url)
accessingObject
in interface CrawlerHandler
crawler
- The reporting Crawler.
url
- The url of the resource that is going to be accessed.
CrawlerHandler.accessingObject(Crawler, String)
public void clearFinished(Crawler crawler, ExitCode exitCode)
clearFinished
in interface CrawlerHandler
crawler
- The Crawler concerned.
exitCode
- The status with which the clearing stopped.
CrawlerHandler.clearFinished(Crawler, ExitCode)
public void clearingObject(Crawler crawler, String url)
clearingObject
in interface CrawlerHandler
crawler
- The reporting Crawler.
url
- The url of the resource whose crawl results are being cleared.
CrawlerHandler.clearingObject(Crawler, String)
public void clearStarted(Crawler crawler)
clearStarted
in interface CrawlerHandler
crawler
- The reporting Crawler.
CrawlerHandler.clearStarted(Crawler)
public void crawlStarted(Crawler crawler)
crawlStarted
in interface CrawlerHandler
crawler
- The reporting Crawler.
CrawlerHandler.crawlStarted(Crawler)
public void crawlStopped(Crawler crawler, ExitCode exitCode)
crawlStopped
in interface CrawlerHandler
crawler
- The reporting Crawler.
exitCode
- The status with which the crawling stopped.
CrawlerHandler.crawlStopped(Crawler, ExitCode)
public void objectChanged(Crawler crawler, DataObject object)
objectChanged
in interface CrawlerHandler
crawler
- The reporting Crawler.
object
- The constructed DataObject modeling the changed resource.
CrawlerHandler.objectChanged(Crawler, DataObject)
public void objectNew(Crawler crawler, DataObject object)
objectNew
in interface CrawlerHandler
crawler
- The reporting Crawler.
object
- The constructed DataObject modeling the new resource.
CrawlerHandler.objectNew(Crawler, DataObject)
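Because the default objectNew and objectChanged implementations only dispose the data object, an override that does real work should still make sure disposal happens. The sketch below shows one way to do that with try/finally. All types are hypothetical stand-ins (`DataObject` is reduced to an id and a dispose flag), so this illustrates the pattern rather than the Aperture classes themselves.

```java
import java.util.ArrayList;
import java.util.List;

class DisposeSketch {
    // Hypothetical stand-in for the Crawler type.
    interface Crawler {}

    // Hypothetical stand-in for DataObject, reduced to an id and dispose().
    static class DataObject {
        final String id;
        boolean disposed;
        DataObject(String id) { this.id = id; }
        void dispose() { disposed = true; }
    }

    // Hypothetical stand-in for the base class: only disposes the object.
    static class HandlerBaseSketch {
        public void objectNew(Crawler crawler, DataObject object) {
            object.dispose();
        }
    }

    static class IndexingHandler extends HandlerBaseSketch {
        final List<String> indexed = new ArrayList<>();

        @Override
        public void objectNew(Crawler crawler, DataObject object) {
            try {
                indexed.add(object.id); // application-specific work goes here
            } finally {
                super.objectNew(crawler, object); // keep the default disposal
            }
        }
    }

    public static void main(String[] args) {
        IndexingHandler handler = new IndexingHandler();
        DataObject object = new DataObject("file:/tmp/new.txt");
        handler.objectNew(null, object);
        System.out.println(handler.indexed + " disposed=" + object.disposed);
    }
}
```

The same structure would apply to objectChanged, which has the same default behaviour.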
public void objectNotModified(Crawler crawler, String url)
objectNotModified
in interface CrawlerHandler
crawler
- The reporting Crawler.
url
- The url of the unmodified resource.
CrawlerHandler.objectNotModified(Crawler, String)
public void objectRemoved(Crawler crawler, String url)
objectRemoved
in interface CrawlerHandler
crawler
- The reporting Crawler.
url
- The url that could no longer be found.
CrawlerHandler.objectRemoved(Crawler, String)
protected void processBinary(Crawler crawler, DataObject dataObject) throws IOException, ExtractorException, SubCrawlerException
crawler
- the crawler that reported the dataObject. The crawler will be used to invoke
subcrawlers, if needed. Control then stays within the crawler's thread.
dataObject
- the data object to process.
When the passed DataObject is not a FileDataObject,
nothing will be done.
IOException
- when the stream cannot be read
ExtractorException
- when the extractor fails
SubCrawlerException
- when the extraction of contents using a SubCrawler
failed.
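The flow described for processBinary (skip anything that is not a FileDataObject, identify the MIME type, invoke a matching extractor) can be sketched as follows. This is an approximation under stated assumptions, not the actual implementation: the interfaces are simplified stand-ins, the registry is a plain Map, the 64-byte peek size is arbitrary, the extractor returns text instead of filling an RDF container, and subcrawlers are left out.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

class ProcessBinarySketch {
    // Hypothetical stand-ins; the real Aperture interfaces are richer.
    interface DataObject { String getID(); }
    interface FileDataObject extends DataObject { InputStream getContent(); }
    interface MimeTypeIdentifier { String identify(byte[] firstBytes, String name); }
    interface Extractor { String extract(InputStream stream) throws IOException; }

    static final int PEEK = 64; // bytes handed to the identifier (arbitrary choice)

    // Returns the extracted text, or null when nothing was done.
    static String processBinary(DataObject object, MimeTypeIdentifier identifier,
                                Map<String, Extractor> extractors) throws IOException {
        if (!(object instanceof FileDataObject)) {
            return null; // not a FileDataObject: nothing is done
        }
        InputStream in = new BufferedInputStream(((FileDataObject) object).getContent());
        in.mark(PEEK);
        byte[] head = new byte[PEEK];
        int read = in.readNBytes(head, 0, PEEK);
        in.reset(); // the extractor must see the stream from the start

        String mimeType = identifier.identify(Arrays.copyOf(head, read), object.getID());
        Extractor extractor = (mimeType == null) ? null : extractors.get(mimeType);
        return (extractor == null) ? null : extractor.extract(in);
    }

    // Helper that builds an in-memory FileDataObject for the demo.
    static FileDataObject fileObject(String id, byte[] content) {
        return new FileDataObject() {
            public String getID() { return id; }
            public InputStream getContent() { return new ByteArrayInputStream(content); }
        };
    }

    public static void main(String[] args) throws IOException {
        MimeTypeIdentifier identifier = (head, name) -> name.endsWith(".txt") ? "text/plain" : null;
        Extractor plainText = in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);
        Map<String, Extractor> extractors = Map.of("text/plain", plainText);

        FileDataObject file = fileObject("file:/tmp/a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(processBinary(file, identifier, extractors));

        DataObject folder = () -> "file:/tmp/dir"; // not a FileDataObject
        System.out.println(processBinary(folder, identifier, extractors));
    }
}
```

Marking and resetting the stream lets the identifier peek at the leading bytes without consuming them, which is why the stand-in wraps the content in a BufferedInputStream first.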
public boolean isExtractingContents()
public void setExtractingContents(boolean extractingContents)
extractingContents
- set to true to extract the contents when calling
processBinary(Crawler, DataObject)