org.semanticdesktop.aperture.websites
Class AbstractTagCrawler
java.lang.Object
org.semanticdesktop.aperture.crawler.base.CrawlerBase
org.semanticdesktop.aperture.websites.AbstractTagCrawler
- All Implemented Interfaces:
- Crawler
- Direct Known Subclasses:
- DeliciousCrawler
public abstract class AbstractTagCrawler
- extends CrawlerBase
- Author:
- grimnes
Methods inherited from class org.semanticdesktop.aperture.crawler.base.CrawlerBase
clear, clear, crawl, getAccessData, getCrawlerHandler, getCrawlReport, getCrawlReportFile, getDataAccessorRegistry, getDataSource, getRDFContainerFactory, inDomain, isStopRequested, reportAccessingObject, reportDeletedDataObject, reportFatalErrorCause, reportFatalErrorCause, reportFatalErrorCause, reportModifiedDataObject, reportNewDataObject, reportUnmodifiedDataObject, reportUntouched, runSubCrawler, setAccessData, setCrawlerHandler, setCrawlReportFile, setDataAccessorRegistry, setDataSource, stop, storeCrawlReport, touchObject
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
AbstractTagCrawler
public AbstractTagCrawler()
crawlObjects
protected ExitCode crawlObjects()
- Description copied from class:
CrawlerBase
- Method called by crawl() that should implement the actual crawling of the DataSource. The return value
of this method should indicate whether the scanning completed successfully (i.e. it was not
interrupted or otherwise aborted). This method is also expected to update the deprecatedUrls set: any
URLs still remaining in this set after the method completes will be reported as removed.
- Specified by:
crawlObjects
in class CrawlerBase
- Returns:
- An ExitCode indicating how the crawl procedure terminated.
- See Also:
CrawlerBase.crawlObjects()
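The deprecatedUrls bookkeeping described above can be illustrated with plain collections. This is only a sketch of the general pattern, not the CrawlerBase implementation: the class DeprecatedUrlsSketch, the method findRemoved and the example URLs are all hypothetical, and no Aperture types are used.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the "deprecated URLs" pattern; not part of Aperture.
final class DeprecatedUrlsSketch {

    private DeprecatedUrlsSketch() {}

    // Start from every URL known from the previous crawl and strike out each URL
    // encountered during the current crawl. Whatever is left was not seen again
    // and would therefore be reported as removed.
    static Set<String> findRemoved(Set<String> previouslyKnown, List<String> foundNow) {
        Set<String> deprecatedUrls = new LinkedHashSet<>(previouslyKnown);
        for (String url : foundNow) {
            deprecatedUrls.remove(url);
        }
        return deprecatedUrls;
    }

    public static void main(String[] args) {
        Set<String> known = new LinkedHashSet<>(Arrays.asList(
                "http://example.org/a", "http://example.org/b", "http://example.org/c"));
        List<String> found = Arrays.asList("http://example.org/a", "http://example.org/c");
        // Only http://example.org/b was not encountered again.
        System.out.println(findRemoved(known, found));
    }
}
```

In this sketch the caller decides what "reported as removed" means; in the documented class that step is handled after crawlObjects() returns.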
crawlTheRest
protected void crawlTheRest(String username,
String password)
throws Exception
- Crawls any remaining items (photos, etc.).
The implementation must report them to the CrawlerHandler itself.
- Throws:
Exception
reportItem
protected void reportItem(Tag item,
List<String> tags)
throws UpdateException,
UnsupportedEncodingException
- Reports a new item to the CrawlerHandler. This assumes that items never change.
- Parameters:
item
- tags
-
- Throws:
UnsupportedEncodingException
UpdateException
crawlTags
protected abstract List<String> crawlTags(String username,
String password)
throws Exception
- Gets a list of the user's tags
- Parameters:
username
-
- Returns:
- a list of tags
- Throws:
IOException
SAXException
SailUpdateException
ParserConfigurationException
Exception
getShortName
protected String getShortName(String uri)
- The given URI identifies something on the web, probably a namespace. To
shorten it, the URI is parsed for something like a local name. Returns the
substring after the last '#' or '/'.
- Parameters:
uri
- a URI
- Returns:
- a short name for it, for display.
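The shortening rule described for getShortName can be sketched as follows. This is a minimal stand-alone illustration, not the Aperture source: the class ShortNameSketch and the method shortName are hypothetical, and returning the whole input when no separator is found (or when the separator is the last character) is an assumption, since the documentation does not say what happens in those cases.

```java
// Hypothetical stand-alone sketch of the "last string after '#' or '/'" rule.
final class ShortNameSketch {

    private ShortNameSketch() {}

    static String shortName(String uri) {
        // Position of whichever separator occurs last in the string.
        int cut = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
        // Assumption: with no separator, or a trailing separator, keep the full string.
        if (cut < 0 || cut == uri.length() - 1) {
            return uri;
        }
        return uri.substring(cut + 1);
    }

    public static void main(String[] args) {
        System.out.println(shortName("http://www.w3.org/2000/01/rdf-schema#label")); // prints "label"
        System.out.println(shortName("http://xmlns.com/foaf/0.1/Person"));           // prints "Person"
    }
}
```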
Copyright © 2010 Aperture Development Team. All Rights Reserved.