Project name
From Merriam-Webster Online:
Main Entry: ap·er·ture
Pronunciation: 'ap-&(r)-"chur, -ch&r, -"tyur, -"tur
Function: noun
Etymology: Middle English, from Latin apertura, from apertus, past
participle of aperire to open
1 : an opening or open space : HOLE
2 a : the opening in a photographic lens that admits the light
  b : the diameter of the stop in an optical system that determines the diameter of the bundle of rays traversing the instrument
  c : the diameter of the objective lens or mirror of a telescope
News
November 12, 2007: Aperture 1.0.1-beta released!
This release bears the mark of the Nepomuk Social Semantic Desktop, a major research initiative in which research institutes and commercial companies from around Europe collaborate. Aperture is used as one of the pillars of this next-generation platform, which aims to revolutionize the way people organize and use the data stored on their computers. Input from the Nepomuk community led us to implement a host of new features that make Aperture more useful, more flexible and more powerful.
May 31, 2007: Aperture 2007.1 alpha 4 released!
The entire Aperture framework has been rewritten on top of the RDF2Go framework, which makes it completely independent of the underlying RDF store. Aperture registries and factories can now be registered as services in an OSGi environment, and this infrastructure allows new extraction components to be deployed on the fly.
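To illustrate the registry-and-factory idea, here is a minimal plain-Java sketch, assuming a hypothetical TextExtractor/TextExtractorFactory contract keyed by MIME type. These names are illustrative only and are not Aperture's actual interfaces; in an OSGi deployment the add and remove calls would be driven by service registration events rather than made by hand.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical contract: turns raw bytes into plain text (not Aperture's real API).
interface TextExtractor {
    String extract(byte[] content);
}

// Hypothetical factory: advertises the MIME types it handles and creates extractors.
interface TextExtractorFactory {
    Set<String> supportedMimeTypes();

    TextExtractor create();
}

// Registry mapping MIME types to factories. In an OSGi setting, add/remove
// would be called when factory services appear or disappear.
final class ExtractorRegistry {
    private final Map<String, TextExtractorFactory> byMimeType = new HashMap<>();

    synchronized void add(TextExtractorFactory factory) {
        for (String mimeType : factory.supportedMimeTypes()) {
            byMimeType.put(mimeType, factory);
        }
    }

    synchronized void remove(TextExtractorFactory factory) {
        // Drop every MIME type mapping that points at this factory.
        byMimeType.values().removeIf(f -> f == factory);
    }

    synchronized Optional<TextExtractorFactory> get(String mimeType) {
        return Optional.ofNullable(byMimeType.get(mimeType));
    }
}

// Hypothetical example factory for plain-text files, decoding bytes as UTF-8.
final class PlainTextExtractorFactory implements TextExtractorFactory {
    @Override
    public Set<String> supportedMimeTypes() {
        return Set.of("text/plain");
    }

    @Override
    public TextExtractor create() {
        return content -> new String(content, StandardCharsets.UTF_8);
    }
}
```

Keying factories by MIME type means a newly deployed component only has to announce which types it handles; the registry needs no prior knowledge of it.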
November 2, 2006: Aperture 2006.1 alpha 3 released!
This release adds support for crawling ical calendar files and extends MIME type detection to many more file formats. The tutorials have been extended, and the release includes numerous bug fixes and small improvements.
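MIME type detection of this kind usually starts by comparing a file's first bytes against known signatures ("magic numbers"). The sketch below shows the idea in plain Java; the MagicSniffer class and its small signature table are hypothetical and much simpler than a production detector.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical magic-number sniffer with a small illustrative signature table.
final class MagicSniffer {
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();

    static {
        SIGNATURES.put("application/pdf", ascii("%PDF-"));
        SIGNATURES.put("application/rtf", ascii("{\\rtf"));
        // ZIP container: shared by OpenDocument and newer Office formats.
        SIGNATURES.put("application/zip", new byte[] {0x50, 0x4B, 0x03, 0x04});
        // OLE2 compound document: legacy Word, Excel and PowerPoint files.
        SIGNATURES.put("application/x-ole-storage", new byte[] {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1});
        SIGNATURES.put("text/calendar", ascii("BEGIN:VCALENDAR"));
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Returns the first MIME type whose signature prefixes the given bytes.
    static String detect(byte[] head) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            byte[] sig = entry.getValue();
            if (head.length >= sig.length
                    && Arrays.equals(Arrays.copyOf(head, sig.length), sig)) {
                return entry.getKey();
            }
        }
        return "application/octet-stream"; // unknown: generic binary
    }
}
```

Because the ZIP signature is shared by OpenDocument and newer Office files, a real detector has to look inside the archive (or fall back on the file extension) to tell these formats apart.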
March 6, 2006: Aperture 2006.1 alpha 2 released!
This release adds support for crawling file systems, web sites, IMAP and Outlook mail boxes. Furthermore, the number of supported file formats has increased significantly.
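At its core, crawling a file system means walking a directory tree and handing each file on to MIME detection and extraction. A minimal sketch using only java.nio.file follows; the FileSystemCrawler class and its maxDepth parameter are hypothetical and not part of Aperture.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical crawler: lists every regular file below a root directory,
// descending at most maxDepth levels (the root itself is depth 0).
final class FileSystemCrawler {
    private final int maxDepth;

    FileSystemCrawler(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    List<Path> crawl(Path root) throws IOException {
        List<Path> found = new ArrayList<>();
        // Files.walk must be closed to release directory handles.
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            paths.filter(Files::isRegularFile).sorted().forEach(found::add);
        }
        return found;
    }
}
```

A full crawler would typically also remember modification dates, so that later runs only report new, changed and deleted files.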
Features
- Crawl information systems such as file systems, websites, mail boxes and mail servers
- Extract full-text and metadata from many common file formats
- View files in their native applications
- Ease of use: easy to learn, easy to program against, easy to deploy in industrial projects
- Flexible architecture: can be extended with custom file formats, data sources, etc., with support for deployment on OSGi platforms
- Data exchange based on Semantic Web standards (e.g. RDF and SPARQL)
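As an illustration of RDF-based data exchange, the sketch below serializes extracted metadata as N-Triples (RDF 1.1), a line-based RDF syntax. The NTriplesWriter class is hypothetical; the Dublin Core title property used in the test is a real vocabulary term, while the file URI is only an example. Aperture itself works through RDF2Go, so this sketch only shows what the exchanged data can look like.

```java
import java.util.Map;

// Hypothetical writer: emits one N-Triples statement per metadata property.
final class NTriplesWriter {
    // Quotes a plain literal, escaping the characters N-Triples forbids raw.
    static String literal(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"': sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // subject and predicate are absolute IRIs; object is a plain literal.
    static String triple(String subjectUri, String predicateUri, String objectValue) {
        return "<" + subjectUri + "> <" + predicateUri + "> " + literal(objectValue) + " .";
    }

    // properties: predicate IRI -> literal value, written in iteration order.
    static String toNTriples(String subjectUri, Map<String, String> properties) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            out.append(triple(subjectUri, e.getKey(), e.getValue())).append('\n');
        }
        return out.toString();
    }
}
```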
Supported File Formats
- Plain text
- HTML, XHTML
- XML
- PDF (Portable Document Format)
- RTF (Rich Text Format)
- Microsoft Office: Word, Excel, PowerPoint, Visio, Publisher
- Microsoft Works
- OpenOffice 1.x: Writer, Calc, Impress, Draw
- StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw
- OpenDocument (OpenOffice 2.x, StarOffice 8.x)
- Corel WordPerfect, Quattro, Presentations
- Emails (.eml files)
- ical files
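Metadata extraction from .eml files can begin with the message headers. The sketch below parses header fields up to the blank line that separates them from the body, unfolding continuation lines. EmlHeaderExtractor is a hypothetical name, and the parser is deliberately simplified: it does not decode MIME encoded-words such as =?UTF-8?...?= and it joins repeated fields into one value.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified RFC 5322 header reader (not Aperture's extractor).
final class EmlHeaderExtractor {
    // Returns lower-cased field names mapped to their (unfolded) values.
    static Map<String, String> headers(String message) {
        Map<String, String> result = new LinkedHashMap<>();
        String lastName = null;
        for (String line : message.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                break; // blank line: end of the header section
            }
            if ((line.startsWith(" ") || line.startsWith("\t")) && lastName != null) {
                // Folded line: continuation of the previous field's value.
                result.put(lastName, result.get(lastName) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a well-formed header field; skip it
            }
            lastName = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            // Repeated fields (e.g. Received) are joined with ", " in this sketch.
            result.merge(lastName, value, (a, b) -> a + ", " + b);
        }
        return result;
    }
}
```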
Crawlers
Crawlers support the extraction of information from heterogeneous data sources. At the moment we support the following source types:
- File Systems (local, remote, removable media)
- Websites and intranets
- IMAP e-mail servers
- Microsoft Outlook (alpha)
- Internet Calendar (ical) files
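Website and intranet crawling is commonly done breadth-first: keep a frontier of pages to visit, a set of URIs already seen, and a page limit. The sketch below shows that loop in plain Java, assuming a hypothetical WebCrawler class whose fetch function is supplied by the caller (backed by HTTP in practice, or by an in-memory map for testing). The href regular expression is a simplification; a real crawler would use an HTML parser and respect robots.txt.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical breadth-first crawler restricted to the start page's host.
final class WebCrawler {
    // Naive double-quoted href matcher; stops before any #fragment.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*\"([^\"#]+)", Pattern.CASE_INSENSITIVE);

    private final Function<URI, String> fetch; // returns HTML, or null if unreachable
    private final int maxPages;

    WebCrawler(Function<URI, String> fetch, int maxPages) {
        this.fetch = fetch;
        this.maxPages = maxPages;
    }

    // Returns the pages fetched, in crawl order.
    List<URI> crawl(URI start) {
        Set<URI> seen = new HashSet<>();
        Deque<URI> frontier = new ArrayDeque<>();
        List<URI> visited = new ArrayList<>();
        frontier.add(start);
        seen.add(start);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            URI page = frontier.poll();
            String html = fetch.apply(page);
            visited.add(page);
            if (html == null) {
                continue; // unreachable page: nothing to follow
            }
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                URI link;
                try {
                    link = page.resolve(m.group(1)).normalize();
                } catch (IllegalArgumentException e) {
                    continue; // skip malformed links
                }
                // Stay on the same host and enqueue each URI only once.
                if (Objects.equals(link.getHost(), start.getHost()) && seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

Injecting the fetch function keeps the traversal logic independent of the transport, so the same loop could serve HTTP sites or intranet pages behind another protocol.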
Support
At the moment the project is still in the alpha stage, and we can provide only limited support. If you have any questions about the project, feel free to join the development mailing list and ask us there.
Development
To use Aperture in your own projects, read the wiki for information about requirements and code examples.
If you are interested in contributing, feel free to contact the project admins or join the development mailing list. We are especially interested in new extractors and crawlers, and we welcome other contributions as well.