org.semanticdesktop.aperture.crawler
Interface CrawlerHandler

All Known Implementing Classes:
CrawlerHandlerBase

public interface CrawlerHandler

CrawlerHandlers are notified by a Crawler about additions, changes and deletions of resources in a DataSource. Furthermore, they are notified when the Crawler is cleaning up all its crawl results.

CrawlerHandlers are not pure listeners on a Crawler: they are also responsible for supplying, on demand (see getRDFContainerFactory), the RDFContainer in which the Crawler stores the source-specific metadata of a DataObject. It is up to the CrawlerHandler implementor to decide whether a new instance is returned for every DataObject or whether a shared instance is used. The CrawlerHandler is also responsible for any transaction and context management.
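
The choice between a new container per DataObject and a shared container can be sketched as follows. This is a minimal illustration, not Aperture code: Crawler, RDFContainer and RDFContainerFactory are hypothetical stand-ins declared locally, the single-method shape given to RDFContainerFactory is an assumption, and FactoryProvider is a reduced stand-in for CrawlerHandler that keeps only the factory method.

```java
import java.util.ArrayList;
import java.util.List;

public class ContainerStrategySketch {

    // Hypothetical stand-ins; the real Aperture types are richer than this.
    interface Crawler {}
    interface RDFContainer { void add(String statement); }
    // Assumed shape: one method that hands out a container for a url.
    interface RDFContainerFactory { RDFContainer getRDFContainer(String url); }

    // Reduced stand-in for CrawlerHandler: only the factory method.
    interface FactoryProvider {
        RDFContainerFactory getRDFContainerFactory(Crawler crawler, String url);
    }

    // Simple in-memory container used by both strategies.
    static class ListContainer implements RDFContainer {
        final List<String> statements = new ArrayList<>();
        @Override public void add(String statement) { statements.add(statement); }
    }

    // Strategy 1: every data object gets a fresh container.
    static class PerObjectProvider implements FactoryProvider {
        @Override
        public RDFContainerFactory getRDFContainerFactory(Crawler crawler, String url) {
            return requestedUrl -> new ListContainer();
        }
    }

    // Strategy 2: all data objects write into one shared container.
    static class SharedProvider implements FactoryProvider {
        final ListContainer shared = new ListContainer();
        @Override
        public RDFContainerFactory getRDFContainerFactory(Crawler crawler, String url) {
            return requestedUrl -> shared;
        }
    }

    public static void main(String[] args) {
        Crawler crawler = new Crawler() {};

        FactoryProvider perObject = new PerObjectProvider();
        RDFContainer a = perObject.getRDFContainerFactory(crawler, "file:/a").getRDFContainer("file:/a");
        RDFContainer b = perObject.getRDFContainerFactory(crawler, "file:/b").getRDFContainer("file:/b");
        System.out.println("per-object containers are distinct: " + (a != b));

        FactoryProvider shared = new SharedProvider();
        RDFContainer c = shared.getRDFContainerFactory(crawler, "file:/a").getRDFContainer("file:/a");
        RDFContainer d = shared.getRDFContainerFactory(crawler, "file:/b").getRDFContainer("file:/b");
        System.out.println("shared containers are identical: " + (c == d));
    }
}
```

With a shared container the handler itself has to keep the metadata of different DataObjects apart, which is where the context management mentioned above comes in.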


Method Summary
 void accessingObject(Crawler crawler, String url)
          Notification that the Crawler is going to start accessing the specified data object.
 void clearFinished(Crawler crawler, ExitCode exitCode)
          Notification that the Crawler has finished clearing the information about the state of the datasource.
 void clearingObject(Crawler crawler, String url)
          Notification that the Crawler is removing all information it knows about the specified url.
 void clearStarted(Crawler crawler)
          Notification that the specified Crawler has started clearing the information it had about the state of the datasource.
 void crawlStarted(Crawler crawler)
          Notification that the specified Crawler has started crawling its DataSource for DataObjects.
 void crawlStopped(Crawler crawler, ExitCode exitCode)
          Notification that the specified Crawler has stopped crawling its DataSource for DataObjects.
 RDFContainerFactory getRDFContainerFactory(Crawler crawler, String url)
          Returns an RDFContainerFactory that will be used to provide RDFContainers that will hold a DataObject's metadata.
 void objectChanged(Crawler crawler, DataObject object)
          Notification that the Crawler has found a changed resource in the domain it is crawling.
 void objectNew(Crawler crawler, DataObject object)
          Notification that the Crawler has found a new resource in the domain it is crawling.
 void objectNotModified(Crawler crawler, String url)
          Notification that the Crawler has found a resource that has not been modified since the previous crawl.
 void objectRemoved(Crawler crawler, String url)
          Notification that a resource found during a previous crawl can no longer be found.
 

Method Detail

crawlStarted

void crawlStarted(Crawler crawler)
Notification that the specified Crawler has started crawling its DataSource for DataObjects.

Parameters:
crawler - The reporting Crawler.

crawlStopped

void crawlStopped(Crawler crawler,
                  ExitCode exitCode)
Notification that the specified Crawler has stopped crawling its DataSource for DataObjects. The Crawler may have stopped because it completed crawling, because it was requested to stop, or because of a fatal exception.

Parameters:
crawler - The reporting Crawler.
exitCode - The status with which the crawling stopped.
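
Since the handler is responsible for transaction management, one possible use of the exit code is to decide whether to keep the results of a crawl. The sketch below assumes a policy of committing only after a completed crawl; that policy, the record(url) helper and the ExitCode constant names (chosen to mirror the three reasons above) are assumptions, and Crawler and ExitCode are local stand-ins rather than the Aperture types.

```java
import java.util.ArrayList;
import java.util.List;

public class TransactionSketch {

    // Hypothetical stand-ins for the Aperture types.
    interface Crawler {}
    // Constant names are assumptions mirroring the three documented reasons.
    enum ExitCode { COMPLETED, STOPPED_BY_REQUEST, FATAL_ERROR }

    // Reduced handler: only the crawl lifecycle callbacks.
    static class TransactionalHandler {
        final List<String> pending = new ArrayList<>();
        final List<String> committed = new ArrayList<>();

        public void crawlStarted(Crawler crawler) {
            pending.clear(); // begin a new "transaction"
        }

        // Hypothetical helper standing in for objectNew/objectChanged.
        void record(String url) {
            pending.add(url);
        }

        public void crawlStopped(Crawler crawler, ExitCode exitCode) {
            if (exitCode == ExitCode.COMPLETED) {
                committed.addAll(pending); // commit
            }
            pending.clear(); // otherwise roll back; nothing stays pending
        }
    }

    public static void main(String[] args) {
        Crawler crawler = new Crawler() {};
        TransactionalHandler handler = new TransactionalHandler();

        handler.crawlStarted(crawler);
        handler.record("file:/a");
        handler.crawlStopped(crawler, ExitCode.COMPLETED);

        handler.crawlStarted(crawler);
        handler.record("file:/b");
        handler.crawlStopped(crawler, ExitCode.FATAL_ERROR);

        System.out.println("committed: " + handler.committed);
    }
}
```

Other policies are equally plausible, for example keeping partial results when the crawl was stopped on request.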

accessingObject

void accessingObject(Crawler crawler,
                     String url)
Notification that the Crawler is going to start accessing the specified data object.

Parameters:
crawler - The reporting Crawler.
url - The url of the resource that is going to be accessed.

getRDFContainerFactory

RDFContainerFactory getRDFContainerFactory(Crawler crawler,
                                           String url)
Returns an RDFContainerFactory that will be used to provide RDFContainers that will hold a DataObject's metadata.

Parameters:
crawler - The requesting Crawler.
url - The url of the resource that is currently being accessed.
Returns:
an RDFContainerFactory instance.

objectNew

void objectNew(Crawler crawler,
               DataObject object)
Notification that the Crawler has found a new resource in the domain it is crawling.

Parameters:
crawler - The reporting Crawler.
object - The constructed DataObject modeling the new resource.

objectChanged

void objectChanged(Crawler crawler,
                   DataObject object)
Notification that the Crawler has found a changed resource in the domain it is crawling.

Parameters:
crawler - The reporting Crawler.
object - The constructed DataObject modeling the changed resource.

objectNotModified

void objectNotModified(Crawler crawler,
                       String url)
Notification that the Crawler has found a resource that has not been modified since the previous crawl.

Parameters:
crawler - The reporting Crawler.
url - The url of the unmodified resource.

objectRemoved

void objectRemoved(Crawler crawler,
                   String url)
Notification that a resource found during a previous crawl can no longer be found. This may indicate that the resource no longer exists or that it now falls outside the scope of the DataSource.

Parameters:
crawler - The reporting Crawler.
url - The url that could no longer be found.
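
The four per-resource notifications (objectNew, objectChanged, objectNotModified, objectRemoved) can be combined into a simple crawl report. The following is a sketch under stated assumptions: Crawler and DataObject are empty local stand-ins, and the Kind enum and TallyHandler class are hypothetical names introduced for illustration only.

```java
import java.util.EnumMap;
import java.util.Map;

public class ChangeTallySketch {

    // Hypothetical stand-ins for the Aperture types.
    interface Crawler {}
    interface DataObject {}

    // Hypothetical classification of the four per-resource notifications.
    enum Kind { NEW, CHANGED, NOT_MODIFIED, REMOVED }

    // Reduced handler: counts how often each notification arrives.
    static class TallyHandler {
        private final Map<Kind, Integer> counts = new EnumMap<>(Kind.class);

        private void bump(Kind kind) {
            counts.merge(kind, 1, Integer::sum);
        }

        public void objectNew(Crawler crawler, DataObject object) { bump(Kind.NEW); }
        public void objectChanged(Crawler crawler, DataObject object) { bump(Kind.CHANGED); }
        public void objectNotModified(Crawler crawler, String url) { bump(Kind.NOT_MODIFIED); }
        public void objectRemoved(Crawler crawler, String url) { bump(Kind.REMOVED); }

        int count(Kind kind) {
            return counts.getOrDefault(kind, 0);
        }
    }

    public static void main(String[] args) {
        Crawler crawler = new Crawler() {};
        DataObject object = new DataObject() {};
        TallyHandler handler = new TallyHandler();

        handler.objectNew(crawler, object);
        handler.objectNew(crawler, object);
        handler.objectChanged(crawler, object);
        handler.objectRemoved(crawler, "file:/gone");

        for (Kind kind : Kind.values()) {
            System.out.println(kind + ": " + handler.count(kind));
        }
    }
}
```

Note that only objectNew and objectChanged receive a DataObject; the other two notifications carry just the url.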

clearStarted

void clearStarted(Crawler crawler)
Notification that the specified Crawler has started clearing the information it had about the state of the datasource. This is followed by a clearingObject(Crawler, String) call for every known url.

Parameters:
crawler - The reporting Crawler.
See Also:
Crawler.clear()

clearingObject

void clearingObject(Crawler crawler,
                    String url)
Notification that the Crawler is removing all information it knows about the specified url. Note that this means information stored by the crawler (usually in an AccessData instance), not the information in the data source itself.

Parameters:
crawler - The reporting Crawler.
url - The url of the resource whose crawl results are being cleared.
See Also:
Crawler.clear()

clearFinished

void clearFinished(Crawler crawler,
                   ExitCode exitCode)
Notification that the Crawler has finished clearing the information about the state of the datasource.

Parameters:
crawler - The reporting Crawler.
exitCode - The status with which the clearing stopped.
See Also:
Crawler.clear()
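
The order of the three clearing notifications can be illustrated with a small simulation. This is a sketch, not Aperture code: Crawler and ExitCode are local stand-ins (the ExitCode constant names are assumptions), simulateClear is a hypothetical driver that imitates what a Crawler would do during a clear, and dropping the handler's own index entry in clearingObject is one possible reaction rather than a requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ClearSequenceSketch {

    // Hypothetical stand-ins for the Aperture types.
    interface Crawler {}
    enum ExitCode { COMPLETED, STOPPED_BY_REQUEST, FATAL_ERROR }

    // Reduced handler: keeps its own index of urls and a log of callbacks.
    static class IndexHandler {
        final Set<String> index = new LinkedHashSet<>();
        final List<String> log = new ArrayList<>();

        public void clearStarted(Crawler crawler) {
            log.add("clearStarted");
        }

        public void clearingObject(Crawler crawler, String url) {
            index.remove(url); // drop whatever the handler derived for this url
            log.add("clearingObject " + url);
        }

        public void clearFinished(Crawler crawler, ExitCode exitCode) {
            log.add("clearFinished " + exitCode);
        }
    }

    // Hypothetical driver imitating the documented notification order.
    static void simulateClear(Crawler crawler, IndexHandler handler, List<String> knownUrls) {
        handler.clearStarted(crawler);
        for (String url : knownUrls) {
            handler.clearingObject(crawler, url);
        }
        handler.clearFinished(crawler, ExitCode.COMPLETED);
    }

    public static void main(String[] args) {
        Crawler crawler = new Crawler() {};
        IndexHandler handler = new IndexHandler();
        List<String> urls = Arrays.asList("file:/a", "file:/b");
        handler.index.addAll(urls);

        simulateClear(crawler, handler, urls);

        for (String entry : handler.log) {
            System.out.println(entry);
        }
        System.out.println("index empty: " + handler.index.isEmpty());
    }
}
```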


Copyright © 2010 Aperture Development Team. All Rights Reserved.