org.semanticdesktop.aperture.hypertext.linkextractor
Interface LinkExtractor

All Known Implementing Classes:
HtmlLinkExtractor

public interface LinkExtractor

A LinkExtractor extracts links from a document, e.g. the anchors inside an HTML document. Implementations are typically MIME type-specific.

The extracted links are returned as Strings rather than URLs, so that links may use any scheme without a URLStreamHandler having to be provided for that scheme.
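To illustrate this design choice, the sketch below checks which link strings can be converted into java.net.URL objects using only the protocol handlers the JRE provides. A link with an unregistered scheme such as `urn:` cannot become a URL without a custom URLStreamHandler, but it is still an ordinary String. The class and method names are hypothetical, and the exact set of built-in handlers depends on the JRE and on any handlers registered at runtime.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Aperture.
class SchemeCheck {

    // Returns true if the link can be turned into a java.net.URL,
    // i.e. the JRE has a URLStreamHandler for its scheme.
    static boolean convertibleToUrl(String link) {
        try {
            URI.create(link).toURL();
            return true;
        } catch (MalformedURLException | IllegalArgumentException e) {
            // MalformedURLException: no handler registered for this scheme.
            // IllegalArgumentException: not a valid absolute URI.
            return false;
        }
    }

    public static void main(String[] args) {
        // Links as a LinkExtractor might return them: plain Strings.
        List<String> links = new ArrayList<>();
        links.add("http://example.org/index.html");
        links.add("urn:isbn:0451450523");
        for (String link : links) {
            System.out.println(link + " -> URL? " + convertibleToUrl(link));
        }
    }
}
```

Both links are equally usable as Strings; only the `http` one survives conversion to a URL on a default JRE.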


Field Summary
static Object BASE_URL_KEY
          Suggested key to use in the params map to indicate the base URL with which relative URLs can be resolved.
static Object INCLUDE_EMBEDDED_RESOURCES_KEY
          Suggested key to use in the params map to indicate that non-navigational links should also be extracted, e.g. embedded images, backgrounds, stylesheets, etc.
 
Method Summary
 List extractLinks(InputStream stream, Map params)
          Extracts all links occurring in the specified stream.
 

Field Detail

BASE_URL_KEY

static final Object BASE_URL_KEY
Suggested key to use in the params map to indicate the base URL with which relative URLs can be resolved. The corresponding value should be a String holding the base URL.
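A minimal sketch of how an implementation might honour this key, assuming relative links are resolved with java.net.URI.resolve. The constant below is a hypothetical stand-in for LinkExtractor.BASE_URL_KEY (its actual value is not documented here), and `resolve` is an illustrative helper, not Aperture API.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class BaseUrlSketch {

    // Hypothetical stand-in for LinkExtractor.BASE_URL_KEY.
    static final Object BASE_URL_KEY = "baseUrl";

    // Resolves a raw link against the base URL String in params, if present.
    // The link is returned unchanged when no base URL is given or parsing fails.
    static String resolve(String rawLink, Map<Object, Object> params) {
        Object base = (params == null) ? null : params.get(BASE_URL_KEY);
        if (!(base instanceof String)) {
            return rawLink; // no base URL: relative links stay relative
        }
        try {
            return URI.create((String) base).resolve(rawLink).toString();
        } catch (IllegalArgumentException e) {
            return rawLink; // malformed base URL or link: keep the original text
        }
    }

    public static void main(String[] args) {
        Map<Object, Object> params = new HashMap<>();
        params.put(BASE_URL_KEY, "http://example.org/docs/index.html");
        System.out.println(resolve("guide.html", params));              // http://example.org/docs/guide.html
        System.out.println(resolve("../img/logo.png", params));         // http://example.org/img/logo.png
        System.out.println(resolve("mailto:info@example.org", params)); // mailto:info@example.org
    }
}
```

Absolute links such as the `mailto:` one pass through unchanged, since URI.resolve returns an already-absolute URI as-is.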


INCLUDE_EMBEDDED_RESOURCES_KEY

static final Object INCLUDE_EMBEDDED_RESOURCES_KEY
Suggested key to use in the params map to indicate that non-navigational links should also be extracted, e.g. embedded images, backgrounds, stylesheets, etc. The corresponding value should be a Boolean.
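The sketch below shows one way an implementation could interpret this flag: anchor href values are always returned, while img and script src values and link href values are only added when the value is Boolean.TRUE. The regular-expression scan is a deliberate simplification (real HTML calls for a proper parser), the tag list is illustrative, and the constant is a hypothetical stand-in for LinkExtractor.INCLUDE_EMBEDDED_RESOURCES_KEY.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmbeddedResourcesSketch {

    // Hypothetical stand-in for LinkExtractor.INCLUDE_EMBEDDED_RESOURCES_KEY.
    static final Object INCLUDE_EMBEDDED_RESOURCES_KEY = "includeEmbeddedResources";

    // Simplified pattern: a known tag followed by a quoted href or src attribute.
    private static final Pattern LINK_ATTR = Pattern.compile(
            "<(a|img|script|link)\\b[^>]*?\\b(href|src)\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    static List<String> extract(String html, Map<Object, Object> params) {
        // A missing or non-Boolean value means "navigational links only".
        boolean includeEmbedded = params != null
                && Boolean.TRUE.equals(params.get(INCLUDE_EMBEDDED_RESOURCES_KEY));
        List<String> links = new ArrayList<>();
        Matcher m = LINK_ATTR.matcher(html);
        while (m.find()) {
            String tag = m.group(1).toLowerCase(Locale.ROOT);
            boolean navigational = tag.equals("a");
            if (navigational || includeEmbedded) {
                links.add(m.group(3)); // matches are visited in document order
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"a.html\">A</a><img src=\"logo.png\">"
                + "<link rel=\"stylesheet\" href=\"style.css\"><a href='b.html'>B</a>";
        Map<Object, Object> params = new HashMap<>();
        System.out.println(extract(html, params)); // [a.html, b.html]
        params.put(INCLUDE_EMBEDDED_RESOURCES_KEY, Boolean.TRUE);
        System.out.println(extract(html, params)); // [a.html, logo.png, style.css, b.html]
    }
}
```

Defaulting to navigational links only when the key is absent keeps the flag opt-in, which matches its description as something that additionally enables embedded resources.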

Method Detail

extractLinks

List extractLinks(InputStream stream,
                  Map params)
                  throws Exception
Extracts all links occurring in the specified stream.

Parameters:
stream - The input stream containing the content from which the links should be extracted, e.g. an HTML document.
params - An optional map of parameters to guide the link extraction process, e.g. using the suggested keys BASE_URL_KEY and INCLUDE_EMBEDDED_RESOURCES_KEY.
Returns:
A List of Strings representing the links, in the order in which they occur in the document.
Throws:
Exception - If an error occurs while processing the document stream.
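A minimal usage sketch based only on the signature documented above. To stay runnable without the Aperture jar, it declares a local stand-in for the interface and a toy implementation, `HrefLinkExtractor` (hypothetical, not the real HtmlLinkExtractor), which scans for double-quoted href attributes and assumes UTF-8 content. The caller passes an InputStream and a params map, casts the raw List elements to String, and handles the broad checked Exception.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local stand-in mirroring the documented raw-typed signature.
interface LinkExtractor {
    List extractLinks(InputStream stream, Map params) throws Exception;
}

// Hypothetical toy implementation: collects double-quoted href values in document order.
class HrefLinkExtractor implements LinkExtractor {
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    @Override
    public List extractLinks(InputStream stream, Map params) throws Exception {
        // Assumes UTF-8; a real extractor would detect the document encoding.
        String text = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(text);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}

class ExtractLinksDemo {
    static List<String> linksOf(String html) throws Exception {
        LinkExtractor extractor = new HrefLinkExtractor();
        Map<Object, Object> params = new HashMap<>(); // no options set
        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            List<?> raw = extractor.extractLinks(in, params);
            List<String> links = new ArrayList<>();
            for (Object o : raw) {
                links.add((String) o); // the contract says the elements are Strings
            }
            return links;
        }
    }

    public static void main(String[] args) {
        try {
            // prints [one.html, two.html]
            System.out.println(linksOf("<p><a href=\"one.html\">1</a> <a href=\"two.html\">2</a></p>"));
        } catch (Exception e) {
            // extractLinks declares a broad checked Exception
            System.err.println("Link extraction failed: " + e);
        }
    }
}
```

Because extractLinks declares plain Exception, callers cannot distinguish I/O failures from parse errors by type alone; catching Exception at the call site, as above, is the simplest option.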


Copyright © 2010 Aperture Development Team. All Rights Reserved.