org.semanticdesktop.aperture.crawler.mbox
Class MboxCrawler

java.lang.Object
  extended by org.semanticdesktop.aperture.crawler.base.CrawlerBase
      extended by org.semanticdesktop.aperture.crawler.mail.AbstractJavaMailCrawler
          extended by org.semanticdesktop.aperture.crawler.mbox.MboxCrawler
All Implemented Interfaces:
DataAccessor, Crawler, DataObjectFactory.PartStreamFactory

public class MboxCrawler
extends AbstractJavaMailCrawler

A crawler implementation for mbox files.


Field Summary
 
Fields inherited from class org.semanticdesktop.aperture.crawler.mail.AbstractJavaMailCrawler
ACCESSED_KEY, baseFolders, currentFolder, currentFolderURI, maxDepth, maximumByteSize, store, SUBFOLDERS_KEY
 
Fields inherited from class org.semanticdesktop.aperture.crawler.base.CrawlerBase
accessData, accessorRegistry, crawlReportFile, source, stopRequested
 
Constructor Summary
MboxCrawler()
           
 
Method Summary
protected  boolean checkIfCurrentFolderHasBeenChanged(AccessData newAccessData)
          Applies source-specific methods to determine if the current folder has been changed since it was last crawled.
protected  void closeConnection()
          It seems that mstor doesn't close the opened folders when Service.close() is invoked.
protected  ExitCode crawlObjects()
          Method called by crawl() that should implement the actual crawling of the DataSource.
protected  void ensureConnectedStore()
          Ensures that the crawler is connected to the underlying mail storage system and can perform the crawl.
protected  String getFolderName(String url)
          Extracts the name of the folder from the data object URI.
protected  URI getFolderURI(javax.mail.Folder folder)
          Returns the URI of the folder, using the URI scheme appropriate for the current crawler.
protected  String getMessageUri(javax.mail.Folder folder, javax.mail.Message message)
          Returns the URI of the message, using the URI scheme appropriate for the current crawler.
protected  void recordCurrentFolderInAccessData(AccessData newAccessData)
          Records source-specific information about the current folder that will enable the crawler to detect on a future crawl whether the folder has been changed.
protected  void retrieveConfigurationData(DataSource dataSource)
          Prepare for accessing the specified DataSource by fetching all properties from it that are required to connect to the mail box.
 
Methods inherited from class org.semanticdesktop.aperture.crawler.mail.AbstractJavaMailCrawler
applySpecificProcessing, checkSubfoldersChanged, crawlFolder, crawlMessages, crawlSingleFolder, crawlSingleMessage, crawlSubFolders, createDataObject, getAllRelatedDataObjects, getCurrentFolderMessageCount, getCurrentFolderObject, getDataObject, getDataObjectByMessageURI, getDataObjectIfModified, getDataObjectOrAllObjects, getMessageByURI, getMessageCount, getMessageFromCurrentFolder, getMessageUid, getPartStream, getSubFoldersString, holdsFolders, holdsMessages, isAcceptable, isRemoved, isTooLarge, reportNotModified, setCurrentFolder
 
Methods inherited from class org.semanticdesktop.aperture.crawler.base.CrawlerBase
clear, clear, crawl, getAccessData, getCrawlerHandler, getCrawlReport, getCrawlReportFile, getDataAccessorRegistry, getDataSource, getRDFContainerFactory, inDomain, isStopRequested, reportAccessingObject, reportDeletedDataObject, reportFatalErrorCause, reportFatalErrorCause, reportFatalErrorCause, reportModifiedDataObject, reportNewDataObject, reportUnmodifiedDataObject, reportUntouched, runSubCrawler, setAccessData, setCrawlerHandler, setCrawlReportFile, setDataAccessorRegistry, setDataSource, stop, storeCrawlReport, touchObject
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MboxCrawler

public MboxCrawler()
Method Detail

crawlObjects

protected ExitCode crawlObjects()
Description copied from class: CrawlerBase
Method called by crawl() that should implement the actual crawling of the DataSource. The return value of this method should indicate whether the scan completed successfully (i.e. it was not interrupted). This method is also expected to update the deprecatedUrls set, because any URLs remaining in this set after the method completes will be reported as removed.

Specified by:
crawlObjects in class CrawlerBase
Returns:
An ExitCode indicating how the crawl procedure terminated.
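
The deprecatedUrls bookkeeping described above can be illustrated with a minimal, self-contained sketch. It does not use any Aperture classes: the class DeprecatedUrlSketch, its crawl method and the urn:example: URLs are all hypothetical names introduced only for illustration. The assumption is that the set starts out holding every URL known from the previous crawl and that each URL seen again is taken out of it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, not an Aperture class: models the deprecated-URL bookkeeping.
class DeprecatedUrlSketch {
    // URLs known from the previous crawl; each URL seen again is removed from this set.
    private final Set<String> deprecatedUrls;

    DeprecatedUrlSketch(Set<String> previouslyKnownUrls) {
        this.deprecatedUrls = new HashSet<>(previouslyKnownUrls);
    }

    // Simulates one crawl pass over the URLs that currently exist in the source.
    // Returns a copy of the URLs that were not encountered, i.e. the removal candidates.
    Set<String> crawl(List<String> urlsFoundNow) {
        for (String url : urlsFoundNow) {
            deprecatedUrls.remove(url);
        }
        return new HashSet<>(deprecatedUrls);
    }
}
```

In this sketch a URL that is new in the current pass simply has no effect on the set, so only objects that disappeared from the source are left over at the end.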

retrieveConfigurationData

protected void retrieveConfigurationData(DataSource dataSource)
Prepare for accessing the specified DataSource by fetching all properties from it that are required to connect to the mail box.

Specified by:
retrieveConfigurationData in class AbstractJavaMailCrawler

ensureConnectedStore

protected void ensureConnectedStore()
                             throws javax.mail.MessagingException
Description copied from class: AbstractJavaMailCrawler
Ensures that the crawler is connected to the underlying mail storage system and can perform the crawl. This method may be called at any time. It should do nothing if a connection is already present and should re-establish the connection if it is not.

Specified by:
ensureConnectedStore in class AbstractJavaMailCrawler
Throws:
javax.mail.MessagingException
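
The "call at any time" contract can be sketched as an idempotent connect. The example below is a simplified illustration under stated assumptions, not the MboxCrawler implementation: FakeStore and ConnectionSketch are hypothetical classes that stand in for the real mail store, and no JavaMail or mstor calls are used.

```java
// Hypothetical stand-in for a mail store connection; not part of Aperture or JavaMail.
class FakeStore {
    private boolean connected;
    int connectCalls; // counts how often a real connection attempt was made

    boolean isConnected() {
        return connected;
    }

    void connect() {
        connected = true;
        connectCalls++;
    }

    void close() {
        connected = false;
    }
}

class ConnectionSketch {
    private FakeStore store;

    // Safe to call at any time: does nothing when connected, reconnects otherwise.
    void ensureConnectedStore() {
        if (store == null) {
            store = new FakeStore();
        }
        if (!store.isConnected()) {
            store.connect();
        }
    }

    FakeStore store() {
        return store;
    }
}
```

Calling ensureConnectedStore() twice in a row triggers only one connection attempt here; a second attempt happens only after the store has been closed.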

closeConnection

protected void closeConnection()
It seems that mstor doesn't close the opened folders when Service.close() is invoked. This is important for the DataAccessor implementation, because when operating as a Crawler, the AbstractJavaMailCrawler.crawlFolder method takes care of this.

Overrides:
closeConnection in class AbstractJavaMailCrawler

recordCurrentFolderInAccessData

protected void recordCurrentFolderInAccessData(AccessData newAccessData)
                                        throws javax.mail.MessagingException
Description copied from class: AbstractJavaMailCrawler
Records source-specific information about the current folder that will enable the crawler to detect on a future crawl whether the folder has been changed.

Specified by:
recordCurrentFolderInAccessData in class AbstractJavaMailCrawler
Parameters:
newAccessData - the access data where the information should be stored
Throws:
javax.mail.MessagingException

checkIfCurrentFolderHasBeenChanged

protected boolean checkIfCurrentFolderHasBeenChanged(AccessData newAccessData)
                                              throws javax.mail.MessagingException
Description copied from class: AbstractJavaMailCrawler
Applies source-specific methods to determine if the current folder has been changed since it was last crawled.

Specified by:
checkIfCurrentFolderHasBeenChanged in class AbstractJavaMailCrawler
Parameters:
newAccessData - the AccessData instance that is to be consulted
Returns:
false if the information stored in the accessData instance indicates that the folder hasn't been changed, true otherwise
Throws:
javax.mail.MessagingException
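
The interplay between recordCurrentFolderInAccessData and checkIfCurrentFolderHasBeenChanged can be sketched as a record-then-compare pattern. This is a minimal illustration only: the page does not say which information MboxCrawler stores, so the fingerprint of file size and modification time is an assumption, and FolderChangeSketch, the map that stands in for AccessData, and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the map stands in for AccessData and is not the Aperture API.
class FolderChangeSketch {
    // Folder URI -> fingerprint recorded during the previous crawl.
    private final Map<String, String> accessData = new HashMap<>();

    // Assumed fingerprint: the criteria MboxCrawler really uses are not documented here.
    static String fingerprint(long sizeInBytes, long lastModifiedMillis) {
        return sizeInBytes + ":" + lastModifiedMillis;
    }

    // Counterpart of the "record" step: remember the folder's current state.
    void recordFolder(String folderUri, long sizeInBytes, long lastModifiedMillis) {
        accessData.put(folderUri, fingerprint(sizeInBytes, lastModifiedMillis));
    }

    // Counterpart of the "check" step: false when the recorded fingerprint matches,
    // true otherwise (including folders that were never recorded).
    boolean hasFolderChanged(String folderUri, long sizeInBytes, long lastModifiedMillis) {
        String recorded = accessData.get(folderUri);
        return !fingerprint(sizeInBytes, lastModifiedMillis).equals(recorded);
    }
}
```

Treating a never-recorded folder as changed is a design choice of this sketch: it makes the first crawl of a folder proceed without a special case.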

getFolderURI

protected URI getFolderURI(javax.mail.Folder folder)
                    throws javax.mail.MessagingException
Description copied from class: AbstractJavaMailCrawler
Returns the URI of the folder, using the URI scheme appropriate for the current crawler.

Specified by:
getFolderURI in class AbstractJavaMailCrawler
Parameters:
folder - the Folder whose URI we'd like to obtain.
Returns:
the uri of the folder
Throws:
javax.mail.MessagingException

getMessageUri

protected String getMessageUri(javax.mail.Folder folder,
                               javax.mail.Message message)
                        throws javax.mail.MessagingException
Description copied from class: AbstractJavaMailCrawler
Returns the URI of the message, using the URI scheme appropriate for the current crawler.

Specified by:
getMessageUri in class AbstractJavaMailCrawler
Parameters:
folder - the folder where the message resides
message - the message itself
Returns:
the uri of the message
Throws:
javax.mail.MessagingException

getFolderName

protected String getFolderName(String url)
                        throws UrlNotFoundException
Description copied from class: AbstractJavaMailCrawler
Extracts the name of the folder from the data object URI. The result should be a string that can be passed to the Store.getFolder(String) method to obtain the corresponding Folder instance which directly contains the data object (message or attachment) with the given url. This method can be called ONLY when all configuration has been read from the DataSource, that is AFTER AbstractJavaMailCrawler.retrieveConfigurationData(DataSource).

Specified by:
getFolderName in class AbstractJavaMailCrawler
Returns:
the folder name
Throws:
UrlNotFoundException - if the given url does not belong to the current Store
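
The idea of mapping a data object URI back to a folder name can be sketched as follows. The URI layout used here (a store prefix, then the folder name, then an optional #fragment) is purely an assumption for illustration; this page does not document the scheme MboxCrawler uses. FolderNameSketch is a hypothetical class, and IllegalArgumentException stands in for UrlNotFoundException so that the example needs no Aperture classes.

```java
// Hypothetical sketch; the URI layout <storePrefix><folderName>#<fragment> is assumed.
class FolderNameSketch {
    // Returns the folder part of the URL, or rejects URLs outside the given store prefix.
    static String getFolderName(String storePrefix, String url) {
        if (!url.startsWith(storePrefix)) {
            // Stand-in for UrlNotFoundException: the URL does not belong to this store.
            throw new IllegalArgumentException("URL does not belong to this store: " + url);
        }
        String rest = url.substring(storePrefix.length());
        int fragmentStart = rest.indexOf('#');
        // Drop the fragment that would identify the message or attachment, if present.
        return fragmentStart < 0 ? rest : rest.substring(0, fragmentStart);
    }
}
```

The store prefix is passed in explicitly in this sketch; in a crawler it would only be known once the configuration has been read, which is why the ordering constraint above matters.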


Copyright © 2010 Aperture Development Team. All Rights Reserved.