java.lang.Object
  org.semanticdesktop.aperture.crawler.base.CrawlerHandlerBase
public class CrawlerHandlerBase
A base implementation of the CrawlerHandler interface. The method implementations are the simplest
possible ones that fulfill the contract. Applications are expected to override the methods they need.
The processBinary(Crawler, DataObject) method is provided as a reference implementation to
show how to use the MIME type detector and the extractors.
To use this class:
- Set mimeTypeIdentifier, extractorRegistry and subCrawlerRegistry.
- Override getRDFContainerFactory(Crawler, String) to influence the RDF containers used.
- Call processBinary(Crawler, DataObject) to extract the contents of binary streams.
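The subclassing pattern described above can be sketched as follows. This is a minimal illustration, not the Aperture implementation: the Aperture jars are not assumed to be on the classpath, so the nested `Crawler`, `DataObject` and `HandlerBase` types are simplified stand-ins that only mimic the shape of the documented API. `IndexingHandler`, its `processed` list and `SimpleDataObject` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch: the nested types are stand-ins, NOT the Aperture classes.
class HandlerUsageSketch {

    interface Crawler {}

    interface DataObject {
        String getID();
        void dispose();
    }

    // Mimics the documented defaults of CrawlerHandlerBase.
    static class HandlerBase {
        protected boolean extractingContents = true;

        // Documented default: dispose the data object and do nothing more.
        public void objectNew(Crawler crawler, DataObject object) {
            object.dispose();
        }

        // Placeholder for processBinary; the documented method identifies the
        // MIME type and invokes extractors.
        protected void processBinary(Crawler crawler, DataObject object) throws Exception {
        }
    }

    // Hypothetical application handler that overrides only objectNew.
    static class IndexingHandler extends HandlerBase {
        final List<String> processed = new ArrayList<>();

        @Override
        public void objectNew(Crawler crawler, DataObject object) {
            try {
                if (extractingContents) {
                    processBinary(crawler, object);
                }
                processed.add(object.getID());
            } catch (Exception e) {
                System.err.println("Extraction failed for " + object.getID() + ": " + e);
            } finally {
                object.dispose(); // like the base class, release the object when done
            }
        }
    }

    // Trivial data object used to exercise the handler.
    static class SimpleDataObject implements DataObject {
        private final String id;
        boolean disposed;

        SimpleDataObject(String id) { this.id = id; }
        public String getID() { return id; }
        public void dispose() { disposed = true; }
    }

    public static void main(String[] args) {
        IndexingHandler handler = new IndexingHandler();
        SimpleDataObject object = new SimpleDataObject("file:///tmp/report.pdf");
        handler.objectNew(null, object);
        System.out.println(handler.processed + " disposed=" + object.disposed);
    }
}
```

The `finally` block mirrors the documented default of objectNew, which disposes the data object, so the object is released even when extraction fails.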
| Field Summary | |
|---|---|
| protected boolean | extractingContents: Should binaries be processed? |
| protected ExtractorRegistry | extractorRegistry: Extractor registry, may be set by overriding classes to use processBinary |
| protected MimeTypeIdentifier | mimeTypeIdentifier: Mime-type identifier, must be set by overriding classes to use processBinary |
| protected SubCrawlerRegistry | subCrawlerRegistry: Subcrawler registry, may be set by overriding classes to use processBinary |
| Constructor Summary | |
|---|---|
| CrawlerHandlerBase(): Construct an empty CrawlerHandlerBase. | |
| CrawlerHandlerBase(MimeTypeIdentifier mimeTypeIdentifier, ExtractorRegistry extractorRegistry, SubCrawlerRegistry subCrawlerRegistry): Construct an initialised CrawlerHandlerBase. | |
| Method Summary | |
|---|---|
| void | accessingObject(Crawler crawler, String url): This method implementation does nothing; it is meant to be overridden. |
| void | clearFinished(Crawler crawler, ExitCode exitCode): This method implementation does nothing; it is meant to be overridden. |
| void | clearingObject(Crawler crawler, String url): This method implementation does nothing; it is meant to be overridden. |
| void | clearStarted(Crawler crawler): This method implementation does nothing; it is meant to be overridden. |
| void | crawlStarted(Crawler crawler): This method implementation does nothing; it is meant to be overridden. |
| void | crawlStopped(Crawler crawler, ExitCode exitCode): This method implementation does nothing; it is meant to be overridden. |
| RDFContainerFactory | getRDFContainerFactory(Crawler crawler, String url): Returns an RDF container factory. |
| boolean | isExtractingContents(): Should binaries be processed? |
| void | objectChanged(Crawler crawler, DataObject object): This method implementation only disposes the data object and does nothing more. |
| void | objectNew(Crawler crawler, DataObject object): This method implementation only disposes the data object and does nothing more. |
| void | objectNotModified(Crawler crawler, String url): This method implementation does nothing; it is meant to be overridden. |
| void | objectRemoved(Crawler crawler, String url): This method implementation does nothing; it is meant to be overridden. |
| protected void | processBinary(Crawler crawler, DataObject dataObject): Default and reference implementation of the handling of objects found in the crawling process: identify the MIME type, then invoke the extractors. |
| void | setExtractingContents(boolean extractingContents): Should binaries be processed? |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected boolean extractingContents
protected MimeTypeIdentifier mimeTypeIdentifier
protected ExtractorRegistry extractorRegistry
protected SubCrawlerRegistry subCrawlerRegistry
| Constructor Detail |
|---|
public CrawlerHandlerBase()
public CrawlerHandlerBase(MimeTypeIdentifier mimeTypeIdentifier,
ExtractorRegistry extractorRegistry,
SubCrawlerRegistry subCrawlerRegistry)
Parameters:
mimeTypeIdentifier - initialised MimeTypeIdentifier
extractorRegistry - initialised ExtractorRegistry, can be null if binary handling is not needed
subCrawlerRegistry - initialised SubCrawlerRegistry, can be null if binary handling is not needed
| Method Detail |
|---|
public RDFContainerFactory getRDFContainerFactory(Crawler crawler,
String url)
Returns an RDF container factory based on the RDF2Go.getModelFactory() method.
Each model is separate.
Specified by:
getRDFContainerFactory in interface CrawlerHandler
Parameters:
crawler - The requesting Crawler.
url - The url of the resource that is currently being accessed.
See Also:
CrawlerHandler.getRDFContainerFactory(Crawler, String)
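Overriding getRDFContainerFactory to influence the containers used might look like the sketch below. It is an illustration under stated assumptions: `RDFContainer`, `RDFContainerFactory` and `HandlerBase` are simplified stand-ins declared inside the block, and their method shapes (a factory taking a `String` URI) are assumptions rather than the Aperture signatures. `TrackingHandler` is a hypothetical subclass that remembers every container handed out so the application can inspect it later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Self-contained sketch: the nested types are stand-ins, NOT the Aperture classes.
class ContainerFactorySketch {

    interface Crawler {}

    // Simplified stand-in for an RDF container.
    static class RDFContainer {
        final String describedUri;
        RDFContainer(String describedUri) { this.describedUri = describedUri; }
    }

    // Simplified stand-in; the parameter type is an assumption.
    interface RDFContainerFactory {
        RDFContainer getRDFContainer(String uri);
    }

    // Stand-in default: every request yields a fresh, separate container.
    static class HandlerBase {
        public RDFContainerFactory getRDFContainerFactory(Crawler crawler, String url) {
            return uri -> new RDFContainer(uri);
        }
    }

    // Hypothetical subclass that wraps the default factory and records its output.
    static class TrackingHandler extends HandlerBase {
        final Map<String, RDFContainer> containers = new LinkedHashMap<>();

        @Override
        public RDFContainerFactory getRDFContainerFactory(Crawler crawler, String url) {
            RDFContainerFactory delegate = super.getRDFContainerFactory(crawler, url);
            return uri -> {
                RDFContainer container = delegate.getRDFContainer(uri);
                containers.put(uri, container); // remember it for later inspection
                return container;
            };
        }
    }

    public static void main(String[] args) {
        TrackingHandler handler = new TrackingHandler();
        String url = "file:///tmp/notes.txt";
        handler.getRDFContainerFactory(null, url).getRDFContainer(url);
        System.out.println(handler.containers.keySet());
    }
}
```

Wrapping the default factory rather than replacing it keeps the "each model is separate" behaviour while adding the bookkeeping.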
public void accessingObject(Crawler crawler,
String url)
Specified by:
accessingObject in interface CrawlerHandler
Parameters:
crawler - The reporting Crawler.
url - The url of the resource that is going to be accessed.
See Also:
CrawlerHandler.accessingObject(Crawler, String)
public void clearFinished(Crawler crawler,
ExitCode exitCode)
Specified by:
clearFinished in interface CrawlerHandler
Parameters:
crawler - The concerning Crawler.
exitCode - The status with which the clearing stopped.
See Also:
CrawlerHandler.clearFinished(Crawler, ExitCode)
public void clearingObject(Crawler crawler,
String url)
Specified by:
clearingObject in interface CrawlerHandler
Parameters:
crawler - The reporting Crawler.
url - The url of the resource whose crawl results are being cleared.
See Also:
CrawlerHandler.clearingObject(Crawler, String)
public void clearStarted(Crawler crawler)
Specified by:
clearStarted in interface CrawlerHandler
Parameters:
crawler - The reporting Crawler.
See Also:
CrawlerHandler.clearStarted(Crawler)
public void crawlStarted(Crawler crawler)
Specified by:
crawlStarted in interface CrawlerHandler
Parameters:
crawler - The reporting Crawler.
See Also:
CrawlerHandler.crawlStarted(Crawler)
public void crawlStopped(Crawler crawler,
ExitCode exitCode)
Specified by:
crawlStopped in interface CrawlerHandler
Parameters:
crawler - The reporting Crawler.
exitCode - The status with which the crawling stopped.
See Also:
CrawlerHandler.crawlStopped(Crawler, ExitCode)
public void objectChanged(Crawler crawler,
DataObject object)
Specified by:
objectChanged in interface CrawlerHandler
Parameters:
crawler - The reporting Crawler.
object - The constructed DataObject modeling the changed resource.
See Also:
CrawlerHandler.objectChanged(Crawler, DataObject)
public void objectNew(Crawler crawler,
DataObject object)
Specified by:
objectNew in interface CrawlerHandler
Parameters:
crawler - The reporting Crawler.
object - The constructed DataObject modeling the new resource.
See Also:
CrawlerHandler.objectNew(Crawler, DataObject)
public void objectNotModified(Crawler crawler,
String url)
Specified by:
objectNotModified in interface CrawlerHandler
Parameters:
crawler - The reporting Crawler.
url - The url of the unmodified resource.
See Also:
CrawlerHandler.objectNotModified(Crawler, String)
public void objectRemoved(Crawler crawler,
String url)
Specified by:
objectRemoved in interface CrawlerHandler
Parameters:
crawler - The reporting Crawler.
url - The url that could no longer be found.
See Also:
CrawlerHandler.objectRemoved(Crawler, String)
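Since the notification methods above are no-ops meant to be overridden, a subclass can pick just the callbacks it cares about. The sketch below tallies notifications for one crawl. It is an illustration only: `Crawler`, `DataObject`, `ExitCode` and `HandlerBase` are stand-ins declared inside the block, the `ExitCode` values are hypothetical, and `StatsHandler` is a made-up name.

```java
// Self-contained sketch: the nested types are stand-ins, NOT the Aperture classes.
class CrawlStatsSketch {

    interface Crawler {}

    interface DataObject {
        void dispose();
    }

    // Hypothetical exit codes, for illustration only.
    enum ExitCode { COMPLETED, STOPPED }

    // Mimics the documented defaults: no-ops, or dispose-only for data objects.
    static class HandlerBase {
        public void crawlStarted(Crawler crawler) {}
        public void crawlStopped(Crawler crawler, ExitCode exitCode) {}
        public void objectNew(Crawler crawler, DataObject object) { object.dispose(); }
        public void objectChanged(Crawler crawler, DataObject object) { object.dispose(); }
        public void objectNotModified(Crawler crawler, String url) {}
        public void objectRemoved(Crawler crawler, String url) {}
    }

    // Hypothetical handler that tallies notifications for one crawl.
    static class StatsHandler extends HandlerBase {
        int added, changed, unmodified, removed;
        ExitCode lastExitCode;

        @Override
        public void crawlStarted(Crawler crawler) {
            added = changed = unmodified = removed = 0;
            lastExitCode = null;
        }

        @Override
        public void objectNew(Crawler crawler, DataObject object) {
            added++;
            super.objectNew(crawler, object); // the default still disposes the object
        }

        @Override
        public void objectChanged(Crawler crawler, DataObject object) {
            changed++;
            super.objectChanged(crawler, object);
        }

        @Override
        public void objectNotModified(Crawler crawler, String url) { unmodified++; }

        @Override
        public void objectRemoved(Crawler crawler, String url) { removed++; }

        @Override
        public void crawlStopped(Crawler crawler, ExitCode exitCode) {
            lastExitCode = exitCode;
            System.out.println(exitCode + ": new=" + added + " changed=" + changed
                    + " unmodified=" + unmodified + " removed=" + removed);
        }
    }

    public static void main(String[] args) {
        StatsHandler handler = new StatsHandler();
        DataObject noop = () -> {}; // data object whose dispose() does nothing
        handler.crawlStarted(null);
        handler.objectNew(null, noop);
        handler.objectChanged(null, noop);
        handler.objectRemoved(null, "file:///tmp/old.txt");
        handler.crawlStopped(null, ExitCode.COMPLETED);
    }
}
```

Delegating to `super` in objectNew and objectChanged keeps the dispose-only default in place, so the counting logic does not have to manage the object's lifetime itself.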
protected void processBinary(Crawler crawler,
DataObject dataObject)
throws IOException,
ExtractorException,
SubCrawlerException
Parameters:
crawler - the crawler that reported the dataObject. The crawler will be used to invoke subcrawlers, if needed. The control then stays within the crawler's thread.
dataObject - the data object to process. When the passed DataObject is not a FileDataObject, nothing will be done.
Throws:
IOException - when the stream cannot be read
ExtractorException - when the extractor fails
SubCrawlerException - when the extraction of contents using a SubCrawler failed.
public boolean isExtractingContents()
public void setExtractingContents(boolean extractingContents)
Parameters:
extractingContents - set to true to extract the contents when calling processBinary(Crawler, DataObject)
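How the extractingContents flag, the FileDataObject check and the checked exceptions of processBinary fit together can be sketched as below. This imitates only the documented contract and is not the Aperture implementation: all nested types are stand-ins, checking the flag inside `processBinary` is an assumption, byte counting stands in for MIME type identification and extraction, and SubCrawlerException is left out for brevity. `SafeHandler`, `InMemoryFile` and `bytesSeen` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Self-contained sketch: the nested types are stand-ins, NOT the Aperture classes.
class ProcessBinarySketch {

    interface Crawler {}

    interface DataObject {
        void dispose();
    }

    interface FileDataObject extends DataObject {
        InputStream getContent() throws IOException;
    }

    // Stand-in for the documented checked exception.
    static class ExtractorException extends Exception {
        ExtractorException(String message) { super(message); }
    }

    static class HandlerBase {
        protected boolean extractingContents = true;
        protected int bytesSeen; // hypothetical stand-in for real extraction results

        public boolean isExtractingContents() { return extractingContents; }

        public void setExtractingContents(boolean extractingContents) {
            this.extractingContents = extractingContents;
        }

        // Imitates the documented contract only; checking the flag here is an assumption.
        protected void processBinary(Crawler crawler, DataObject dataObject)
                throws IOException, ExtractorException {
            if (!extractingContents || !(dataObject instanceof FileDataObject)) {
                return; // nothing is done for non-file objects or when extraction is off
            }
            try (InputStream in = ((FileDataObject) dataObject).getContent()) {
                while (in.read() != -1) {
                    bytesSeen++;
                }
            }
        }
    }

    // Hypothetical handler that keeps the crawl going when extraction fails.
    static class SafeHandler extends HandlerBase {
        int failures;

        public void objectNew(Crawler crawler, DataObject object) {
            try {
                processBinary(crawler, object);
            } catch (IOException | ExtractorException e) {
                failures++; // record the failure and carry on
            } finally {
                object.dispose();
            }
        }
    }

    // In-memory file object used to exercise the handler.
    static class InMemoryFile implements FileDataObject {
        private final byte[] bytes;
        InMemoryFile(byte[] bytes) { this.bytes = bytes; }
        public InputStream getContent() { return new ByteArrayInputStream(bytes); }
        public void dispose() {}
    }

    public static void main(String[] args) {
        SafeHandler handler = new SafeHandler();
        handler.objectNew(null, new InMemoryFile(new byte[] {1, 2, 3}));
        handler.setExtractingContents(false);
        handler.objectNew(null, new InMemoryFile(new byte[] {4, 5}));
        System.out.println("bytes=" + handler.bytesSeen + " failures=" + handler.failures);
    }
}
```

Catching the exceptions per object, rather than letting them propagate, is one possible design choice: a single unreadable file then does not abort the handling of the remaining objects.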