org.semanticdesktop.aperture.hypertext.linkextractor.html
Class HtmlLinkExtractor

java.lang.Object
  extended by org.semanticdesktop.aperture.hypertext.linkextractor.html.HtmlLinkExtractor
All Implemented Interfaces:
TokenHandler, LinkExtractor

public class HtmlLinkExtractor
extends Object
implements LinkExtractor, TokenHandler

A LinkExtractor implementation that extracts links from HTML documents.


Field Summary
 
Fields inherited from interface org.semanticdesktop.aperture.hypertext.linkextractor.LinkExtractor
BASE_URL_KEY, INCLUDE_EMBEDDED_RESOURCES_KEY
 
Constructor Summary
HtmlLinkExtractor()
           
 
Method Summary
 void attribute(String name)
          Notification of an attribute for the most recently reported element.
 void attribute(String name, String value)
          Notification of an attribute for the most recently reported element.
 void comment(String comment)
          Notification of a comment.
 void docType(String name, String sysId, String fpi, String uri)
          Notification of a document type declaration.
 void endDocument()
          Notification of the end of a document.
 void endOfStartTag()
          Notification of the end of a start tag.
 void endTag(String name)
          Notification of an end tag.
 void error(String message)
          Notification of a detected error.
 List extractLinks(InputStream inputStream, Map params)
          Extracts all links occurring in the specified stream.
 void startDocument()
          Notification of the start of a new document.
 void startOfStartTag(String name)
          Notification of the start of a start tag.
 void text(String text)
          Notification of text.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlLinkExtractor

public HtmlLinkExtractor()
Method Detail

extractLinks

public List extractLinks(InputStream inputStream,
                         Map params)
                  throws IOException
Description copied from interface: LinkExtractor
Extracts all links occurring in the specified stream.

Specified by:
extractLinks in interface LinkExtractor
Parameters:
inputStream - The input stream containing the content from which the links should be extracted, e.g. an HTML document.
params - An optional set of parameters to guide the link extraction process.
Returns:
A List of Strings representing the encountered links in the order in which they were encountered in the document.
Throws:
IOException
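
The call pattern for extractLinks can be sketched as follows. This is a minimal, self-contained illustration, not Aperture's implementation: LinkExtractorSketch is a hypothetical stand-in that mirrors the documented signature (with generics added for type safety), the value of BASE_URL_KEY is a placeholder, and NaiveHrefExtractor is a toy regex-based extractor. The sketch also assumes that the base URL parameter is a String used to resolve relative links, which this page does not confirm. It requires Java 9 or later for InputStream.readAllBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in mirroring the documented LinkExtractor signature.
interface LinkExtractorSketch {
    // Placeholder value; the real constant's value is not given on this page.
    String BASE_URL_KEY = "baseUrl";

    List<String> extractLinks(InputStream inputStream, Map<String, Object> params)
            throws IOException;
}

// Toy extractor: finds double-quoted href values with a regular expression.
final class NaiveHrefExtractor implements LinkExtractorSketch {
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    @Override
    public List<String> extractLinks(InputStream inputStream, Map<String, Object> params)
            throws IOException {
        String html = new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);

        // Assumption: the base URL parameter, when present, is a String.
        Object base = (params == null) ? null : params.get(BASE_URL_KEY);
        URI baseUri = (base instanceof String) ? URI.create((String) base) : null;

        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            String link = matcher.group(1);
            // Resolve relative links against the base URL; absolute links are unchanged.
            links.add(baseUri == null ? link : baseUri.resolve(link).toString());
        }
        return links; // links in document order
    }
}

final class ExtractLinksDemo {
    static List<String> run() {
        String html = "<a href=\"page.html\">one</a> <a href=\"http://other.example/x\">two</a>";

        Map<String, Object> params = new HashMap<>();
        params.put(LinkExtractorSketch.BASE_URL_KEY, "http://example.org/docs/");

        LinkExtractorSketch extractor = new NaiveHrefExtractor();
        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            return extractor.extractLinks(in, params);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String link : run()) {
            System.out.println(link);
        }
    }
}
```

With the real library on the classpath, the same calling code would presumably construct an HtmlLinkExtractor and pass LinkExtractor.BASE_URL_KEY instead of the stand-ins used here.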

startDocument

public void startDocument()
Description copied from interface: TokenHandler
Notification of the start of a new document.

Specified by:
startDocument in interface TokenHandler

endDocument

public void endDocument()
Description copied from interface: TokenHandler
Notification of the end of a document.

Specified by:
endDocument in interface TokenHandler

startOfStartTag

public void startOfStartTag(String name)
Description copied from interface: TokenHandler
Notification of the start of a start tag.

Specified by:
startOfStartTag in interface TokenHandler
Parameters:
name - The tag name.

endOfStartTag

public void endOfStartTag()
Description copied from interface: TokenHandler
Notification of the end of a start tag.

Specified by:
endOfStartTag in interface TokenHandler

endTag

public void endTag(String name)
Description copied from interface: TokenHandler
Notification of an end tag.

Specified by:
endTag in interface TokenHandler
Parameters:
name - The tag name.

attribute

public void attribute(String name)
Description copied from interface: TokenHandler
Notification of an attribute for the most recently reported element. The reported attribute does not have a value.

Specified by:
attribute in interface TokenHandler
Parameters:
name - The name of the attribute.

attribute

public void attribute(String name,
                      String value)
Description copied from interface: TokenHandler
Notification of an attribute for the most recently reported element.

Specified by:
attribute in interface TokenHandler
Parameters:
name - The name of the attribute.
value - The value of the attribute.
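
The TokenHandler callbacks described above lend themselves to event-driven link collection. The sketch below shows one way a handler might gather href values from attribute notifications. It is an illustration under stated assumptions, not the actual HtmlLinkExtractor logic: TokenHandlerSketch is a hypothetical stand-in covering a subset of the documented callbacks, HrefCollector is an invented class, and the event order in the demo is an assumption about how a tokenizer would report a single anchor element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a subset of the documented TokenHandler callbacks.
interface TokenHandlerSketch {
    void startDocument();
    void startOfStartTag(String name);
    void attribute(String name);
    void attribute(String name, String value);
    void endOfStartTag();
    void text(String text);
    void endTag(String name);
    void endDocument();
}

// Collects href values of <a> elements as attribute events arrive.
final class HrefCollector implements TokenHandlerSketch {
    private final List<String> links = new ArrayList<>();
    private String currentTag; // tag most recently reported by startOfStartTag

    @Override public void startDocument() { links.clear(); currentTag = null; }

    @Override public void startOfStartTag(String name) { currentTag = name; }

    // Valueless attribute: carries no link target, so nothing is collected.
    @Override public void attribute(String name) { }

    @Override public void attribute(String name, String value) {
        if ("a".equalsIgnoreCase(currentTag) && "href".equalsIgnoreCase(name)) {
            links.add(value);
        }
    }

    // After the start tag closes, later events no longer belong to it.
    @Override public void endOfStartTag() { currentTag = null; }

    @Override public void text(String text) { }
    @Override public void endTag(String name) { }
    @Override public void endDocument() { }

    List<String> getLinks() { return new ArrayList<>(links); }
}

final class TokenHandlerDemo {
    static List<String> run() {
        HrefCollector handler = new HrefCollector();
        // Assumed event order for: <a href="index.html" download>Home</a>
        handler.startDocument();
        handler.startOfStartTag("a");
        handler.attribute("href", "index.html"); // attribute with a value
        handler.attribute("download");           // attribute without a value
        handler.endOfStartTag();
        handler.text("Home");
        handler.endTag("a");
        handler.endDocument();
        return handler.getLinks();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Tracking the current tag between startOfStartTag and endOfStartTag is what ties each attribute notification to "the most recently reported element".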

text

public void text(String text)
Description copied from interface: TokenHandler
Notification of text.

Specified by:
text in interface TokenHandler
Parameters:
text - The text.

comment

public void comment(String comment)
Description copied from interface: TokenHandler
Notification of a comment.

Specified by:
comment in interface TokenHandler
Parameters:
comment - The comment.

docType

public void docType(String name,
                    String sysId,
                    String fpi,
                    String uri)
Description copied from interface: TokenHandler
Notification of a document type declaration.

Specified by:
docType in interface TokenHandler
Parameters:
name - The type name, e.g. HTML.
sysId - The system id, e.g. PUBLIC or SYSTEM.
fpi - The Formal Public Identifier, e.g. "-//W3C//DTD HTML 4.0 Transitional//EN".
uri - The URL of the DTD, e.g. "http://www.w3.org/TR/REC-html40/loose.dtd".

error

public void error(String message)
Description copied from interface: TokenHandler
Notification of a detected error.

Specified by:
error in interface TokenHandler
Parameters:
message - An error message.


Copyright © 2010 Aperture Development Team. All Rights Reserved.