java.lang.Object
  org.semanticdesktop.aperture.subcrawler.SubCrawlerUtil
public class SubCrawlerUtil
A utility class containing some methods useful when working with subcrawlers and subcrawled resources.
Constructor Summary |
---|
SubCrawlerUtil() |
Method Summary | | |
---|---|---|
static URI | createChildUri(URI objectUri, String childPath, String prefix) | Creates a URI for a subcrawled entity. |
static DataObject | getDataObject(URI uri, InputStream stream, DataSource dataSource, Charset charset, String mimeType, RDFContainerFactory containerFactory, SubCrawlerRegistry registry) | Tries to access a DataObject that is hidden in a stream. |
static DataObject | getDataObject(URI parentUri, String path, InputStream stream, DataSource dataSource, Charset charset, String mimeType, RDFContainerFactory factory, String prefix, SubCrawler sc) | |
static URI | getParentObjectUri(URI subCrawledObjectUri) | Returns the URI of the parent data object, from the URI of a subcrawled object. |
static URI | getRootObjectUri(URI subCrawledObjectUri) | Returns the URI of the root object, from the URI of a subcrawled object. |
static String | getSubCrawledObjectPath(URI subCrawledObjectUri) | Returns the path of the subcrawled object within the parent object. |
static String | getSubCrawlerPrefix(URI subCrawledObjectUri) | Returns the subcrawler prefix from the URI of a subcrawled object. |
static boolean | isSubcrawledObjectUri(URI subCrawledObjectUri) | Returns true if the given URI denotes a subcrawled object, false otherwise. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public SubCrawlerUtil()
Method Detail |
---|
public static DataObject getDataObject(URI uri, InputStream stream, DataSource dataSource, Charset charset, String mimeType, RDFContainerFactory containerFactory, SubCrawlerRegistry registry) throws SubCrawlerException, PathNotFoundException, IOException
Tries to access a DataObject that is hidden in a stream. This method can reach the desired object through multiple levels of nesting. For example, for the URI:
"zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml!/#1!/Board+paper.docx"
the method assumes that the given stream points at the root data object, i.e.:
"file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml"
It then applies a MimeSubCrawler to that stream to obtain the first attachment, and afterwards applies the ZipSubCrawler to that attachment to obtain the desired file.
Parameters:
uri - the URI of the subcrawled object
stream - the stream pointing at the root data object of the URI
dataSource - the data source that will be returned from the DataObject.getDataSource() method of the returned object
charset - a charset (optional)
mimeType - the MIME type of the stream (optional)
containerFactory - the factory of RDFContainers
registry - a SubCrawlerRegistry, from which all the necessary SubCrawlerFactories will be obtained
Throws:
SubCrawlerException
PathNotFoundException
IOException
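The multi-level traversal described for this method can be sketched by peeling the outermost layer off the URI until only the root is left, then reversing the order of the peeled layers. This is a hypothetical helper (NestedUriPlan is not an Aperture class): it works on plain strings, assumes the layout prefix:parentUri!path that the examples on this page show, and only computes which subcrawler prefix to apply with which path; it does not open streams or consult a SubCrawlerRegistry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Aperture: computes the root URI and the
// root-first list of (prefix, path) steps encoded in a nested URI.
final class NestedUriPlan {

    // One step: apply the subcrawler registered for 'prefix', then extract 'path'.
    static final class Step {
        final String prefix;
        final String path;

        Step(String prefix, String path) {
            this.prefix = prefix;
            this.path = path;
        }
    }

    final String rootUri;     // the object the input stream is expected to point at
    final List<Step> steps;   // in the order the subcrawlers would be applied

    private NestedUriPlan(String rootUri, List<Step> steps) {
        this.rootUri = rootUri;
        this.steps = steps;
    }

    // Assumes the layout prefix ":" parentUri "!" path, where the path follows the LAST '!'.
    static NestedUriPlan of(String uri) {
        List<Step> steps = new ArrayList<>();
        String current = uri;
        int lastBang;
        while ((lastBang = current.lastIndexOf('!')) >= 0) {
            int firstColon = current.indexOf(':');
            if (firstColon < 0 || firstColon > lastBang) {
                break; // not in the expected layout; treat what is left as the root
            }
            steps.add(new Step(current.substring(0, firstColon), current.substring(lastBang + 1)));
            current = current.substring(firstColon + 1, lastBang); // move on to the parent URI
        }
        Collections.reverse(steps); // the outermost layer is found first but applied last
        return new NestedUriPlan(current, steps);
    }
}
```

For the example URI above, this yields the root file URI, then a "mime" step with path "/#1", then a "zip" step with path "/Board+paper.docx", which is the MimeSubCrawler-then-ZipSubCrawler order the description gives.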
public static URI getRootObjectUri(URI subCrawledObjectUri)
Returns the URI of the root object, from the URI of a subcrawled object. For example, for
"zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml!/86b313dc282850fef1762fb400171750%2540amrapali.com#1!/Board+paper.docx"
this method returns
"file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml"
that is, the portion of the URI that starts at the last scheme part (regex: '\w{2,}:') preceding the first exclamation mark and ends just before that exclamation mark. The regex deliberately requires at least two word characters before the colon, so that a Windows drive name (a single letter followed by a colon, e.g. 'C:') is not mistaken for a URI scheme.
Parameters:
subCrawledObjectUri - the URI of a subcrawled object
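A minimal sketch of the extraction rule just described, using java.util.regex on plain strings rather than the URI type of the real signature. RootUriSketch is a hypothetical name, and returning the input unchanged when it has no '!' and no scheme part is an assumption of this sketch, not documented behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the documented rule, not Aperture code.
final class RootUriSketch {

    // A scheme part: two or more word characters and a colon, so "C:" never matches.
    private static final Pattern SCHEME = Pattern.compile("\\w{2,}:");

    // Returns the text from the last scheme part up to (excluding) the first '!'.
    static String rootObjectUri(String uri) {
        int bang = uri.indexOf('!');
        String head = bang < 0 ? uri : uri.substring(0, bang);
        Matcher m = SCHEME.matcher(head);
        int lastSchemeStart = -1;
        while (m.find()) {
            lastSchemeStart = m.start(); // remember where the most recent scheme part began
        }
        return lastSchemeStart < 0 ? head : head.substring(lastSchemeStart);
    }
}
```

In this sketch, a host:port pair before the first '!' (such as example.com:8080) would also match the pattern and be taken for a scheme.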
public static URI getParentObjectUri(URI subCrawledObjectUri)
Returns the URI of the parent data object, from the URI of a subcrawled object. For example, for
"zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml!/86b313dc282850fef1762fb400171750%2540amrapali.com#1!/Board+paper.docx"
this method returns
"mime:file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml!/86b313dc282850fef1762fb400171750%2540amrapali.com#1"
If the given URI already denotes a root data object (i.e. not a subcrawled data object), this method returns null. For example, given the URI of a normal file:
"file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml"
this method returns null.
Parameters:
subCrawledObjectUri - the URI of the data object
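The relationship between a subcrawled URI and its parent can be sketched with plain string operations: drop the outermost "prefix:" and everything from the last '!' onward. ParentUriSketch is hypothetical; it assumes the prefix:parentUri!path layout visible in the examples and treats any URI without an '!' as a root object, both of which are assumptions rather than documented behavior.

```java
// Hypothetical sketch, not Aperture code.
final class ParentUriSketch {

    static String parentObjectUri(String uri) {
        int lastBang = uri.lastIndexOf('!');
        int firstColon = uri.indexOf(':');
        if (lastBang < 0 || firstColon < 0 || firstColon > lastBang) {
            return null; // assumed to be a root object
        }
        // prefix ":" parent "!" path  ->  parent
        return uri.substring(firstColon + 1, lastBang);
    }
}
```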
public static String getSubCrawlerPrefix(URI subCrawledObjectUri)
Returns the subcrawler prefix from the URI of a subcrawled object, i.e. the prefix of the outermost layer, which names the subcrawler that extracted the object from its immediate parent. For example, for
"zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml!/86b313dc282850fef1762fb400171750%2540amrapali.com#1!/Board+paper.docx"
this method returns "zip".
If the given URI already denotes a root data object (i.e. not a subcrawled data object), this method returns null. For example, given the URI of a normal file:
"file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml"
this method returns null.
Parameters:
subCrawledObjectUri - the URI of the data object
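One way to read the prefix off the URI is to capture the first scheme part only when a second scheme part follows it directly, which rules out a plain "file:/C:/..." URI. PrefixSketch is a hypothetical name; the requirement of an '!' separator and of two leading scheme parts are assumptions of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Aperture code.
final class PrefixSketch {

    // Captures the first scheme part, provided another scheme part follows it directly.
    private static final Pattern LEADING = Pattern.compile("(\\w{2,}):\\w{2,}:");

    static String subCrawlerPrefix(String uri) {
        if (uri.indexOf('!') < 0) {
            return null; // assumed root object: no '!' separator at all
        }
        Matcher m = LEADING.matcher(uri);
        return m.lookingAt() ? m.group(1) : null; // lookingAt anchors the match at index 0
    }
}
```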
public static String getSubCrawledObjectPath(URI subCrawledObjectUri)
Returns the path of the subcrawled object within its immediate parent object. For example, for
"zip:mime:file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml!/86b313dc282850fef1762fb400171750%2540amrapali.com#1!/Board+paper.docx"
this method returns "/Board+paper.docx".
If the given URI already denotes a root data object (i.e. not a subcrawled data object), this method returns null. For example, given the URI of a normal file:
"file:/C:/Users/Chris/Desktop/docx%20problem/Useful%20documents1.eml"
this method returns null.
Parameters:
subCrawledObjectUri - the URI of the data object
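Because the parent part of a nested URI can itself contain '!' separators, the path has to be taken after the last one, not the first. A minimal sketch under that assumption, with PathSketch as a hypothetical name and "no '!' means root object" as an assumed rule:

```java
// Hypothetical sketch, not Aperture code.
final class PathSketch {

    // The parent part may itself contain '!' separators, so split at the LAST one.
    static String subCrawledObjectPath(String uri) {
        int lastBang = uri.lastIndexOf('!');
        return lastBang < 0 ? null : uri.substring(lastBang + 1); // null: assumed root object
    }
}
```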
public static boolean isSubcrawledObjectUri(URI subCrawledObjectUri)
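Returns true if the given URI denotes a subcrawled object, false otherwise.
Parameters:
subCrawledObjectUri - the URI to check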
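The examples on this page suggest a simple heuristic for this check: a subcrawled URI starts with at least two scheme parts (each matching '\w{2,}:', as in getRootObjectUri) and contains an '!' separator later on. The sketch below encodes that heuristic; SubcrawledCheckSketch is hypothetical and the rule is an assumption, not the library's documented test.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Aperture code.
final class SubcrawledCheckSketch {

    // Two or more leading scheme parts, then any non-'!' text, then at least one '!'.
    private static final Pattern NESTED = Pattern.compile("(\\w{2,}:){2,}[^!]*!.*");

    static boolean isSubcrawledObjectUri(String uri) {
        return NESTED.matcher(uri).matches();
    }
}
```

Requiring the '!' keeps multi-part URIs such as "urn:isbn:..." from being reported as subcrawled.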
public static URI createChildUri(URI objectUri, String childPath, String prefix)
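Creates a URI for a subcrawled entity.
Parameters:
objectUri - the URI of the parent data object
childPath - the path of the child object within the parent object
prefix - the subcrawler prefix (e.g. "zip")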
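Judging from the examples on this page, a child URI has the shape prefix:parentUri!path, the inverse of getParentObjectUri, getSubCrawlerPrefix and getSubCrawledObjectPath. The sketch below builds that string; ChildUriSketch is hypothetical, adding a missing leading '/' is an assumption, and it performs no percent-encoding, which the real method may do.

```java
// Hypothetical sketch, not Aperture code.
final class ChildUriSketch {

    // Builds prefix ":" parent "!" path, the layout seen in the examples on this page.
    static String createChildUri(String objectUri, String childPath, String prefix) {
        // Assumption: paths are stored with a leading '/', as in "!/Board+paper.docx".
        String path = childPath.startsWith("/") ? childPath : "/" + childPath;
        return prefix + ":" + objectUri + "!" + path;
    }
}
```

Calling it with the parent URI from the getParentObjectUri example, the path "/Board+paper.docx" and the prefix "zip" reproduces the full example URI used throughout this page.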
public static DataObject getDataObject(URI parentUri, String path, InputStream stream, DataSource dataSource, Charset charset, String mimeType, RDFContainerFactory factory, String prefix, SubCrawler sc) throws SubCrawlerException, PathNotFoundException
Throws:
SubCrawlerException
PathNotFoundException