org.semanticdesktop.aperture.websites
Class AbstractTagCrawler

java.lang.Object
  extended by org.semanticdesktop.aperture.crawler.base.CrawlerBase
      extended by org.semanticdesktop.aperture.websites.AbstractTagCrawler
All Implemented Interfaces:
Crawler
Direct Known Subclasses:
DeliciousCrawler

public abstract class AbstractTagCrawler
extends CrawlerBase

Author:
grimnes

Field Summary
 
Fields inherited from class org.semanticdesktop.aperture.crawler.base.CrawlerBase
accessData, accessorRegistry, crawlReportFile, source, stopRequested
 
Constructor Summary
AbstractTagCrawler()
           
 
Method Summary
protected  ExitCode crawlObjects()
          Method called by crawl() that should implement the actual crawling of the DataSource.
protected abstract  List<String> crawlTags(String username, String password)
          Gets a list of the user's tags.
protected  void crawlTheRest(String username, String password)
          Crawls everything other than tags (photos, etc.); implementations must report the results to the crawler handler themselves.
protected  String getShortName(String uri)
          The passed uri identifies something on the web, probably a namespace.
protected  void reportItem(Tag item, List<String> tags)
          Reports a new item to the crawler handler; this assumes that items never change.
 
Methods inherited from class org.semanticdesktop.aperture.crawler.base.CrawlerBase
clear, clear, crawl, getAccessData, getCrawlerHandler, getCrawlReport, getCrawlReportFile, getDataAccessorRegistry, getDataSource, getRDFContainerFactory, inDomain, isStopRequested, reportAccessingObject, reportDeletedDataObject, reportFatalErrorCause, reportFatalErrorCause, reportFatalErrorCause, reportModifiedDataObject, reportNewDataObject, reportUnmodifiedDataObject, reportUntouched, runSubCrawler, setAccessData, setCrawlerHandler, setCrawlReportFile, setDataAccessorRegistry, setDataSource, stop, storeCrawlReport, touchObject
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbstractTagCrawler

public AbstractTagCrawler()
Method Detail

crawlObjects

protected ExitCode crawlObjects()
Description copied from class: CrawlerBase
Method called by crawl() that should implement the actual crawling of the DataSource. The return value should indicate whether the crawl completed successfully (i.e. it was not interrupted). This method is also expected to update the deprecatedUrls set, because any URLs remaining in that set after the method completes will be reported as removed.

Specified by:
crawlObjects in class CrawlerBase
Returns:
An ExitCode indicating how the crawl procedure terminated.
See Also:
CrawlerBase.crawlObjects()

crawlTheRest

protected void crawlTheRest(String username,
                            String password)
                     throws Exception
Crawls everything other than tags (photos, etc.); implementations must report the results to the crawler handler themselves.

Throws:
Exception

reportItem

protected void reportItem(Tag item,
                          List<String> tags)
                   throws UpdateException,
                          UnsupportedEncodingException
Reports a new item to the crawler handler; this assumes that items never change.

Parameters:
item - the item to report
tags - the tags associated with the item
Throws:
UnsupportedEncodingException
UpdateException

crawlTags

protected abstract List<String> crawlTags(String username,
                                          String password)
                                   throws Exception
Gets a list of the user's tags.

Parameters:
username - the name of the user whose tags are retrieved
password - the password for that user's account
Returns:
a list of tags
Throws:
IOException
SAXException
SailUpdateException
ParserConfigurationException
Exception
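
The method signatures suggest a template-method arrangement: a subclass supplies crawlTags and may override crawlTheRest. The following is a minimal, self-contained sketch of that arrangement using a stand-in base class, not the real Aperture types. The names TagCrawlerSketch, FixedTagCrawler and the reported list are hypothetical, and both the order in which the hooks are invoked and the empty default for crawlTheRest are assumptions that this page does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for AbstractTagCrawler (hypothetical; NOT the Aperture class).
abstract class TagCrawlerSketch {
    // Stand-in for the crawler handler: collects whatever gets reported.
    final List<String> reported = new ArrayList<>();

    // Assumed driver: fetch the tags first, then let the subclass crawl the rest.
    // Returns true on success and false if a hook threw, loosely mimicking an ExitCode.
    boolean crawlObjects(String username, String password) {
        try {
            for (String tag : crawlTags(username, password)) {
                reported.add("tag:" + tag);
            }
            crawlTheRest(username, password);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Hook that every subclass must implement.
    protected abstract List<String> crawlTags(String username, String password) throws Exception;

    // Optional hook; the empty default is an assumption of this sketch.
    protected void crawlTheRest(String username, String password) throws Exception {
    }
}

// Example subclass that returns fixed data instead of calling a web service.
class FixedTagCrawler extends TagCrawlerSketch {
    @Override
    protected List<String> crawlTags(String username, String password) {
        return Arrays.asList("java", "rdf");
    }

    @Override
    protected void crawlTheRest(String username, String password) {
        // The subclass reports non-tag items itself.
        reported.add("bookmark:http://example.org/");
    }
}
```

A real subclass such as DeliciousCrawler would contact the remote service in these hooks and report items through reportItem rather than appending to a list.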

getShortName

protected String getShortName(String uri)
The passed URI identifies something on the web, probably a namespace. To shorten it, the URI is parsed for something like a local name: the method returns the substring that follows the last '#' or '/'.

Parameters:
uri - a URI
Returns:
a short name for it, for display.
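
The described behavior can be sketched as follows. This is an illustration of the documented rule (take what follows the last '#' or '/'), not the library's actual implementation; the class ShortNameSketch and the method shortName are hypothetical names, and the handling of a URI without either separator is an assumption.

```java
final class ShortNameSketch {
    // Hypothetical helper mirroring the documented rule for getShortName.
    static String shortName(String uri) {
        // Position of the last '#' or '/', whichever occurs later in the string.
        int cut = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
        // Assumption: if neither separator occurs, cut is -1 and the whole string is returned.
        return uri.substring(cut + 1);
    }

    public static void main(String[] args) {
        System.out.println(shortName("http://example.org/ns#Person")); // Person
        System.out.println(shortName("http://example.org/tags/java")); // java
    }
}
```

Note that under this rule a URI ending in '/' or '#' yields an empty string; whether the real method treats that case specially is not stated here.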


Copyright © 2010 Aperture Development Team. All Rights Reserved.