org.semanticdesktop.aperture.subcrawler.vcard
Class VcardSubCrawler

java.lang.Object
  extended by org.semanticdesktop.aperture.subcrawler.base.AbstractSubCrawler
      extended by org.semanticdesktop.aperture.subcrawler.vcard.VcardSubCrawler
All Implemented Interfaces:
SubCrawler

public class VcardSubCrawler
extends AbstractSubCrawler

A SubCrawler implementation for vCard documents.

Known issues:

URIs for VCARDS

This crawler uses the following conventions to generate URIs:

  1. If the UID property is present, it is appended to the stream id, preceded by a hash character (#).
  2. If it is not, the contact is serialized to a string and a hash of that string is appended to the stream id.
This guarantees that an unmodified contact will be detected and reported as unmodified. That would not be the case if the URIs were based on other properties, or on consecutive numbers marking the position of the contact in the file, since those numbers change when contacts are added or removed.
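
The two conventions above can be illustrated with a minimal sketch. It is not the actual VcardSubCrawler code: the class VcardUriSketch and the helpers childUri and sha1Hex are hypothetical, plain strings stand in for the URI type, and SHA-1 is only an assumed hash function (the documentation does not say which hash is used, nor whether the hash of the contact is also preceded by a hash character).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of the URI conventions; not part of Aperture.
class VcardUriSketch {

    // Builds a child URI from the stream id and either the UID or a content hash.
    static String childUri(String streamId, String uid, String serializedContact) {
        if (uid != null && !uid.isEmpty()) {
            // Convention 1: the UID, preceded by a hash character.
            return streamId + "#" + uid;
        }
        // Convention 2: a hash of the serialized contact.
        // ASSUMPTION: SHA-1 and the '#' separator are chosen for illustration only.
        return streamId + "#" + sha1Hex(serializedContact);
    }

    // Returns the SHA-1 digest of the text as a lowercase hex string.
    static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] bytes = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        String contact = "BEGIN:VCARD\nFN:Jane Doe\nEND:VCARD";
        // The same unmodified contact yields the same URI on every crawl,
        // regardless of its position in the file.
        System.out.println(childUri("file:/contacts.vcf", "1234", contact));
        System.out.println(childUri("file:/contacts.vcf", null, contact));
    }
}
```

Because the URI depends only on the UID or on the contact's content, adding or removing other contacts in the same file does not change it. A real implementation would also need to escape characters in the UID that are not valid in a URI fragment.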


Constructor Summary
VcardSubCrawler()
           
 
Method Summary
 String getUriPrefix()
          Returns the prefix used when generating uris.
 void stopSubCrawler()
          Stops a running crawl as fast as possible.
 void subCrawl(URI id, InputStream stream, SubCrawlerHandler handler, DataSource dataSource, AccessData accessData, Charset charset, String mimeType, RDFContainer parentMetadata)
          Starts crawling the given stream and reports the encountered DataObjects to the given SubCrawlerHandler.
 
Methods inherited from class org.semanticdesktop.aperture.subcrawler.base.AbstractSubCrawler
createChildUri, getDataObject
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

VcardSubCrawler

public VcardSubCrawler()
Method Detail

subCrawl

public void subCrawl(URI id,
                     InputStream stream,
                     SubCrawlerHandler handler,
                     DataSource dataSource,
                     AccessData accessData,
                     Charset charset,
                     String mimeType,
                     RDFContainer parentMetadata)
              throws SubCrawlerException
Description copied from interface: SubCrawler
Starts crawling the given stream and reports the encountered DataObjects to the given SubCrawlerHandler. If an AccessData instance is passed, it is used to check whether the data objects are to be reported as new, modified, or unmodified. Note that the SubCrawler will not report deleted objects.

Parameters:
id - the URI identifying the object (e.g. a file or web page) from which the stream was obtained. This URI is treated as the URI of the parent object; all objects encountered in the stream are considered to be contained within the parent object. (optional; the implementation may use this URI or the one returned by the RDFContainer.getDescribedUri() method of the parentMetadata)
stream - the stream to be crawled. (obligatory)
handler - The crawler handler that is to receive the notifications from the SubCrawler (obligatory)
dataSource - the data source that will be returned by the DataObject.getDataSource() method of the returned data objects. Some implementations may require that this reference is not null and that it contains some particular information.
accessData - the AccessData used to determine if the encountered objects are to be returned as new, modified, unmodified or deleted. Information about new or modified objects is stored within for use in future crawls. This parameter may be null if this functionality is not desired, in which case all DataObjects will be reported as new. (optional)
charset - the charset in which the input stream is encoded (optional).
mimeType - the MIME type of the passed stream (optional).
parentMetadata - The 'parent' RDFContainer, that will contain the metadata about the top-level entity in the stream. A SubCrawler may (in some cases) limit itself to augmenting the metadata in this RDFContainer without delivering any additional DataObjects. (obligatory)
Throws:
SubCrawlerException - if any of the obligatory parameters is null or if any error occurred during the crawling process
See Also:
SubCrawler.subCrawl(URI, InputStream, SubCrawlerHandler, DataSource, AccessData, Charset, String, RDFContainer)

stopSubCrawler

public void stopSubCrawler()
Description copied from interface: SubCrawler
Stops a running crawl as fast as possible. This method may return before the crawling has actually stopped.
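
One common way to obtain this behavior is a cooperative stop flag that the crawl loop checks between items. The sketch below only illustrates that pattern; it is not taken from VcardSubCrawler, and the class StoppableCrawlSketch with its stop and crawl methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a cooperative stop flag; not part of Aperture.
class StoppableCrawlSketch {

    // volatile so that a stop request made on another thread becomes visible
    // to the crawling thread.
    private volatile boolean stopRequested = false;

    // Returns immediately; the crawl loop notices the request at its next check.
    void stop() {
        stopRequested = true;
    }

    // Reports items one by one until all are processed or a stop is requested.
    // Note: this sketch never resets the flag, so an instance is single-use.
    List<String> crawl(List<String> items) {
        List<String> reported = new ArrayList<>();
        for (String item : items) {
            if (stopRequested) {
                break;
            }
            reported.add(item);
        }
        return reported;
    }
}
```

Since stop() only sets a flag, it can return while the crawling thread is still finishing the item it is working on, which matches the caveat that the method may return before the crawl has actually stopped.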

See Also:
SubCrawler.stopSubCrawler()

getUriPrefix

public String getUriPrefix()
Description copied from class: AbstractSubCrawler
Returns the prefix used when generating URIs. See the documentation for the SubCrawler class for more details.

Specified by:
getUriPrefix in class AbstractSubCrawler
Returns:
the prefix used when generating uris.


Copyright © 2010 Aperture Development Team. All Rights Reserved.