java.lang.Object
  org.semanticdesktop.aperture.hypertext.linkextractor.html.HtmlLinkExtractor

public class HtmlLinkExtractor
extends Object
implements LinkExtractor, TokenHandler
A LinkExtractor implementation that can extract links from HTML documents.
| Field Summary |
|---|
| Fields inherited from interface org.semanticdesktop.aperture.hypertext.linkextractor.LinkExtractor: BASE_URL_KEY, INCLUDE_EMBEDDED_RESOURCES_KEY |
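
The name BASE_URL_KEY suggests that callers can supply a base URL against which relative links are resolved, although this page does not spell out the behavior. As a rough illustration of that resolution step only, the sketch below uses the standard `java.net.URI` class. `BaseUrlResolver` and `resolveAll` are hypothetical names introduced here and are not part of Aperture; how HtmlLinkExtractor itself treats malformed links is not documented on this page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aperture: resolves raw href/src values
// against a base URL, the way a BASE_URL_KEY parameter might be used.
final class BaseUrlResolver {

    private BaseUrlResolver() {}

    /** Resolves each raw link against baseUrl; skips values that are not valid URIs. */
    static List<String> resolveAll(String baseUrl, List<String> rawLinks) {
        URI base = URI.create(baseUrl);
        List<String> resolved = new ArrayList<>();
        for (String raw : rawLinks) {
            try {
                // Absolute links are returned unchanged; relative ones are merged with the base.
                resolved.add(base.resolve(raw.trim()).toString());
            } catch (IllegalArgumentException e) {
                // Malformed link text: skip it rather than abort the whole extraction.
            }
        }
        return resolved;
    }
}
```

For instance, with a base of `http://example.org/docs/index.html`, the relative link `page.html` resolves to `http://example.org/docs/page.html`, while an absolute link such as `http://other.org/x` is left as it is.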
| Constructor Summary |
|---|
| HtmlLinkExtractor() |
| Method Summary | | |
|---|---|---|
| void | attribute(String name) | Notification of an attribute for the most recently reported element. |
| void | attribute(String name, String value) | Notification of an attribute for the most recently reported element. |
| void | comment(String comment) | Notification of a comment. |
| void | docType(String name, String sysId, String fpi, String uri) | Notification of a document type declaration. |
| void | endDocument() | Notification of the end of a document. |
| void | endOfStartTag() | Notification of the end of a start tag. |
| void | endTag(String name) | Notification of an end tag. |
| void | error(String message) | Notification of a detected error. |
| List | extractLinks(InputStream inputStream, Map params) | Extracts all links occurring in the specified stream. |
| void | startDocument() | Notification of the start of a new document. |
| void | startOfStartTag(String name) | Notification of the start of a start tag. |
| void | text(String text) | Notification of text. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public HtmlLinkExtractor()
| Method Detail |
|---|
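public List extractLinks(InputStream inputStream, Map params) throws IOException
Description copied from interface: LinkExtractor
Extracts all links occurring in the specified stream.
Specified by: extractLinks in interface LinkExtractor
Parameters:
inputStream - The input stream containing the content from which the links should be extracted, e.g. an HTML document.
params - An optional set of parameters to guide the link extraction process.
Throws: IOException

public void startDocument()
Description copied from interface: TokenHandler
Notification of the start of a new document.
Specified by: startDocument in interface TokenHandler

public void endDocument()
Description copied from interface: TokenHandler
Notification of the end of a document.
Specified by: endDocument in interface TokenHandler

public void startOfStartTag(String name)
Description copied from interface: TokenHandler
Notification of the start of a start tag.
Specified by: startOfStartTag in interface TokenHandler
Parameters:
name - The tag name.

public void endOfStartTag()
Description copied from interface: TokenHandler
Notification of the end of a start tag.
Specified by: endOfStartTag in interface TokenHandler

public void endTag(String name)
Description copied from interface: TokenHandler
Notification of an end tag.
Specified by: endTag in interface TokenHandler
Parameters:
name - The tag name.

public void attribute(String name)
Description copied from interface: TokenHandler
Notification of an attribute for the most recently reported element.
Specified by: attribute in interface TokenHandler
Parameters:
name - The name of the attribute.
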
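public void attribute(String name, String value)
Description copied from interface: TokenHandler
Notification of an attribute for the most recently reported element.
Specified by: attribute in interface TokenHandler
Parameters:
name - The name of the attribute.
value - The value of the attribute.

public void text(String text)
Description copied from interface: TokenHandler
Notification of text.
Specified by: text in interface TokenHandler
Parameters:
text - The text.

public void comment(String comment)
Description copied from interface: TokenHandler
Notification of a comment.
Specified by: comment in interface TokenHandler
Parameters:
comment - The comment.
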
public void docType(String name, String sysId, String fpi, String uri)
Description copied from interface: TokenHandler
Notification of a document type declaration.
Specified by: docType in interface TokenHandler
Parameters:
name - The type name, e.g. HTML.
sysId - The system id, e.g. PUBLIC or SYSTEM.
fpi - The Formal Public Identifier, e.g. "-//W3C//DTD HTML 4.0 Transitional//EN".
uri - The URL of the DTD, e.g. "http://www.w3.org/TR/REC-html40/loose.dtd".

public void error(String message)
Description copied from interface: TokenHandler
Notification of a detected error.
Specified by: error in interface TokenHandler
Parameters:
message - An error message.
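
The callbacks documented above suggest an event-driven design: a tokenizer reports each start tag, then its attributes, then the end of the start tag, and the handler decides which attribute values count as links. The following is a minimal sketch of that idea and not Aperture's actual implementation. `LinkCollectingHandler`, the `includeEmbedded` flag (an assumed reading of INCLUDE_EMBEDDED_RESOURCES_KEY), and the particular tag and attribute names it checks are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Aperture's implementation: collects link attributes
// from a stream of tag/attribute callbacks shaped like the methods documented here.
final class LinkCollectingHandler {

    // Assumed meaning of INCLUDE_EMBEDDED_RESOURCES_KEY: also report src-style resources.
    private final boolean includeEmbedded;
    private final List<String> links = new ArrayList<>();
    // Most recently reported element, or null when no start tag is open.
    private String currentTag;

    LinkCollectingHandler(boolean includeEmbedded) {
        this.includeEmbedded = includeEmbedded;
    }

    void startOfStartTag(String name) {
        currentTag = name.toLowerCase(Locale.ROOT);
    }

    void attribute(String name, String value) {
        if (currentTag == null || value == null) {
            return; // attribute reported outside a start tag, or without a value
        }
        String attr = name.toLowerCase(Locale.ROOT);
        boolean hyperlink = attr.equals("href")
                && (currentTag.equals("a") || currentTag.equals("area"));
        boolean embedded = attr.equals("src")
                && (currentTag.equals("img") || currentTag.equals("script") || currentTag.equals("iframe"));
        if (hyperlink || (includeEmbedded && embedded)) {
            links.add(value);
        }
    }

    void endOfStartTag() {
        currentTag = null;
    }

    /** Returns a copy of the links collected so far, in document order. */
    List<String> getLinks() {
        return new ArrayList<>(links);
    }
}
```

Tracking the current tag between startOfStartTag and endOfStartTag is what lets the handler interpret an attribute "for the most recently reported element", as the attribute callbacks are described.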