java.lang.Object
  org.semanticdesktop.aperture.hypertext.linkextractor.html.HtmlLinkExtractor
All Implemented Interfaces: LinkExtractor, TokenHandler
public class HtmlLinkExtractor extends Object
A LinkExtractor implementation that can extract links from HTML documents.
Field Summary |
---|
Fields inherited from interface org.semanticdesktop.aperture.hypertext.linkextractor.LinkExtractor |
---|
BASE_URL_KEY, INCLUDE_EMBEDDED_RESOURCES_KEY |
Constructor Summary | |
---|---|
HtmlLinkExtractor() | |
Method Summary | |
---|---|
void | attribute(String name): Notification of an attribute for the most recently reported element. |
void | attribute(String name, String value): Notification of an attribute for the most recently reported element. |
void | comment(String comment): Notification of comment. |
void | docType(String name, String sysId, String fpi, String uri): Notification of a document type declaration. |
void | endDocument(): Notification of the end of a document. |
void | endOfStartTag(): Notification of the end of a start tag. |
void | endTag(String name): Notification of an end tag. |
void | error(String message): Notification of a detected error. |
List | extractLinks(InputStream inputStream, Map params): Extracts all links occurring in the specified stream. |
void | startDocument(): Notification of the start of a new document. |
void | startOfStartTag(String name): Notification of the start of a start tag. |
void | text(String text): Notification of text. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public HtmlLinkExtractor()
Method Detail |
---|
public List extractLinks(InputStream inputStream, Map params) throws IOException
Description copied from interface: LinkExtractor
Extracts all links occurring in the specified stream.
Specified by:
extractLinks in interface LinkExtractor
Parameters:
inputStream - The input stream containing the content from which the links should be extracted, e.g. an HTML document.
params - An optional set of parameters to guide the link extraction process.
Throws:
IOException
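The call pattern for extractLinks (an input stream plus an optional parameter map, with IOException to handle) can be sketched as follows. The Aperture classes are not used here so that the sketch runs on its own: LinkSource is a hypothetical stand-in with the same method shape as LinkExtractor, RegexLinkSource is a toy regex-based implementation and not how HtmlLinkExtractor works, and the key string "baseUrl" is an assumed value. The sketch also assumes that the base URL parameter is used to resolve relative links, which the page above does not state. With the real library one would presumably instantiate HtmlLinkExtractor and use LinkExtractor.BASE_URL_KEY instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractLinksSketch {

    // Hypothetical stand-in mirroring the shape of LinkExtractor.extractLinks.
    interface LinkSource {
        String BASE_URL_KEY = "baseUrl"; // assumed value, not taken from Aperture

        List<String> extractLinks(InputStream in, Map<String, Object> params) throws IOException;
    }

    // Toy implementation: collects double-quoted href values with a regex.
    static class RegexLinkSource implements LinkSource {
        private static final Pattern HREF =
                Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

        @Override
        public List<String> extractLinks(InputStream in, Map<String, Object> params) throws IOException {
            String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            Object base = (params == null) ? null : params.get(BASE_URL_KEY);
            List<String> links = new ArrayList<>();
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String href = m.group(1);
                // Assumption: a base URL, when given, resolves relative links.
                links.add(base == null ? href : URI.create(base.toString()).resolve(href).toString());
            }
            return links;
        }
    }

    // Builds a small document, passes a base URL parameter, and returns the links.
    static List<String> demo() {
        String html = "<html><body><a href=\"docs/index.html\">Docs</a>"
                + "<a href=\"http://example.org/\">Home</a></body></html>";
        Map<String, Object> params = new HashMap<>();
        params.put(LinkSource.BASE_URL_KEY, "http://example.com/site/");
        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            return new RegexLinkSource().extractLinks(in, params);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String link : demo()) {
            System.out.println(link);
        }
    }
}
```

Run as written, the sketch prints http://example.com/site/docs/index.html followed by http://example.org/, since the absolute link is left unchanged by URI.resolve.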
public void startDocument()
Description copied from interface: TokenHandler
Notification of the start of a new document.
Specified by:
startDocument in interface TokenHandler
public void endDocument()
Description copied from interface: TokenHandler
Notification of the end of a document.
Specified by:
endDocument in interface TokenHandler
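public void startOfStartTag(String name)
Description copied from interface: TokenHandler
Notification of the start of a start tag.
Specified by:
startOfStartTag in interface TokenHandler
Parameters:
name - The tag name.
public void endOfStartTag()
Description copied from interface: TokenHandler
Notification of the end of a start tag.
Specified by:
endOfStartTag in interface TokenHandler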
public void endTag(String name)
Description copied from interface: TokenHandler
Notification of an end tag.
Specified by:
endTag in interface TokenHandler
Parameters:
name - The tag name.
public void attribute(String name)
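Description copied from interface: TokenHandler
Notification of an attribute for the most recently reported element.
Specified by:
attribute in interface TokenHandler
Parameters:
name - The name of the attribute.
public void attribute(String name, String value)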
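Description copied from interface: TokenHandler
Notification of an attribute for the most recently reported element.
Specified by:
attribute in interface TokenHandler
Parameters:
name - The name of the attribute.
value - The value of the attribute.
public void text(String text)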
Description copied from interface: TokenHandler
Notification of text.
Specified by:
text in interface TokenHandler
Parameters:
text - The text.
public void comment(String comment)
Description copied from interface: TokenHandler
Notification of comment.
Specified by:
comment in interface TokenHandler
Parameters:
comment - The comment.
public void docType(String name, String sysId, String fpi, String uri)
Description copied from interface: TokenHandler
Notification of a document type declaration.
Specified by:
docType in interface TokenHandler
Parameters:
name - The type name, e.g. HTML.
sysId - The system id, e.g. PUBLIC or SYSTEM.
fpi - The Formal Public Identifier, e.g. "-//W3C//DTD HTML 4.0 Transitional//EN".
uri - The URL of the DTD, e.g. "http://www.w3.org/TR/REC-html40/loose.dtd".
public void error(String message)
Description copied from interface: TokenHandler
Notification of a detected error.
Specified by:
error in interface TokenHandler
Parameters:
message - An error message.
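How the tag-related callbacks might cooperate to collect links can be sketched as follows. TagEvents is a hypothetical subset of the methods listed above, declared locally so the sketch runs without Aperture, and HrefCollector is an illustration, not the actual logic of HtmlLinkExtractor. The callback order used in demo() (startOfStartTag, then attribute, then endOfStartTag, then endTag) is an assumption inferred from the method descriptions.

```java
import java.util.ArrayList;
import java.util.List;

class TokenCallbackSketch {

    // Hypothetical subset of the TokenHandler callbacks documented above.
    interface TagEvents {
        void startOfStartTag(String name);
        void attribute(String name, String value);
        void endOfStartTag();
        void endTag(String name);
    }

    // Records the href value of every <a> start tag it is notified about.
    static class HrefCollector implements TagEvents {
        final List<String> links = new ArrayList<>();
        private String currentTag; // name of the start tag being reported, or null

        @Override
        public void startOfStartTag(String name) {
            currentTag = name;
        }

        @Override
        public void attribute(String name, String value) {
            // The attribute belongs to the most recently reported element.
            if ("a".equalsIgnoreCase(currentTag) && "href".equalsIgnoreCase(name)) {
                links.add(value);
            }
        }

        @Override
        public void endOfStartTag() {
            currentTag = null;
        }

        @Override
        public void endTag(String name) {
            // Nothing to do for link collection.
        }
    }

    // Replays the assumed event sequence for: <a href="page.html"></a><img src="pic.png">
    static List<String> demo() {
        HrefCollector handler = new HrefCollector();
        handler.startOfStartTag("a");
        handler.attribute("href", "page.html");
        handler.endOfStartTag();
        handler.endTag("a");
        handler.startOfStartTag("img");
        handler.attribute("src", "pic.png");
        handler.endOfStartTag();
        return handler.links;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch only the anchor's href is collected; whether embedded resources such as the img source are also reported by the real class presumably depends on INCLUDE_EMBEDDED_RESOURCES_KEY, which is not modeled here.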