org.semanticdesktop.aperture.extractor.corel.util
Class WPStringExtractor

java.lang.Object
  extended by org.semanticdesktop.aperture.util.StringExtractor
      extended by org.semanticdesktop.aperture.extractor.corel.util.WPStringExtractor

public class WPStringExtractor
extends StringExtractor

A StringExtractor extension optimized for processing WordPerfect document streams.

This class is exposed as a utility class because other file formats, e.g. Corel Presentations 3.0 files, may also use WordPerfect's file structure.


Field Summary
 
Fields inherited from class org.semanticdesktop.aperture.util.StringExtractor
COMMON_FONT_NAMES
 
Constructor Summary
WPStringExtractor()
           
 
Method Summary
 String extract(InputStream stream)
          Wraps the specified InputStream in a WPFilterInputStream and passes the wrapped stream to the superclass implementation.
protected  boolean isStartLine(String lineLowerCase)
          Determines whether the supplied line indicates the start of the textual contents.
protected  boolean isTextCharacter(int charNumber)
          Checks whether the supplied character is a text character.
protected  boolean isValidLine(String lineLowerCase)
          Determines whether the supplied line should be included in the end result.
 
Methods inherited from class org.semanticdesktop.aperture.util.StringExtractor
isNormalWord, postProcessLine
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WPStringExtractor

public WPStringExtractor()

Method Detail

extract

public String extract(InputStream stream)
               throws IOException
Wraps the specified InputStream in a WPFilterInputStream and passes the wrapped stream to the superclass implementation.

Overrides:
extract in class StringExtractor
Parameters:
stream - The InputStream to read the bytes from. The stream will be fully consumed but not closed.
Returns:
The resulting, heuristically determined text. A String is always returned, although it can be empty.
Throws:
IOException - If reading from the InputStream throws an IOException.
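The wrap-and-delegate behavior of extract() can be sketched as follows. This is a simplified model built on assumptions, not Aperture's code. BaseExtractorSketch is a hypothetical stand-in for StringExtractor, and WPStyleExtractorSketch plays the role of WPStringExtractor. DropHighBytesInputStream is a hypothetical filter that only drops bytes of 0x80 and above, because this page does not describe the actual filtering rules of WPFilterInputStream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a StringExtractor-style base class (not Aperture's code):
// it reads the whole stream and keeps runs of text characters, separated by single spaces.
class BaseExtractorSketch {
    public String extract(InputStream stream) throws IOException {
        StringBuilder out = new StringBuilder();
        int b;
        // Consume the stream fully; closing it is left to the caller.
        while ((b = stream.read()) != -1) {
            if (isTextCharacter(b)) {
                out.append((char) b);
            } else if (out.length() > 0 && out.charAt(out.length() - 1) != ' ') {
                out.append(' '); // collapse any run of non-text bytes into one space
            }
        }
        return out.toString().trim();
    }

    protected boolean isTextCharacter(int c) {
        return Character.isLetter(c) || c == '\'';
    }
}

// Hypothetical filter used only to illustrate the wrapping step. It drops every byte >= 0x80;
// the real WPFilterInputStream's filtering rules are not described on this page.
class DropHighBytesInputStream extends FilterInputStream {
    DropHighBytesInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = super.read();
        } while (b >= 0x80);
        return b; // -1 at end of stream passes through unchanged
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // Route bulk reads through the filtering read() so no byte bypasses the filter.
        int count = 0;
        while (count < len) {
            int b = read();
            if (b == -1) {
                return count == 0 ? -1 : count;
            }
            buf[off + count++] = (byte) b;
        }
        return count;
    }
}

// Mirrors the documented extract(): wrap the stream, then delegate to the superclass.
class WPStyleExtractorSketch extends BaseExtractorSketch {
    @Override
    public String extract(InputStream stream) throws IOException {
        return super.extract(new DropHighBytesInputStream(stream));
    }
}
```

Feeding the Latin-1 bytes of "café 42 ok" through both classes shows the filter at work. The base class returns "café ok", while the wrapping subclass returns "caf ok" because the filter removed the é byte before the superclass saw it. As documented for extract(), the stream is read to the end and left open for the caller to close.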

isTextCharacter

protected boolean isTextCharacter(int charNumber)
Description copied from class: StringExtractor
Checks whether the supplied character is a text character. By default, this method returns true for letters and single quotes.

Overrides:
isTextCharacter in class StringExtractor
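The following is a minimal sketch of the default rule quoted above, and it assumes nothing beyond that rule: letters and single quotes count as text characters. TextCharacterSketch and its words() helper are hypothetical illustrations. WPStringExtractor overrides isTextCharacter with a WordPerfect-specific rule, which this page does not describe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the default rule quoted above (letters and single quotes).
// WPStringExtractor overrides this method; its own rule is not described on this page.
class TextCharacterSketch {
    static boolean isTextCharacter(int charNumber) {
        return Character.isLetter(charNumber) || charNumber == '\'';
    }

    // Illustrative helper: splits a string into maximal runs of text characters.
    static List<String> words(String s) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isTextCharacter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A non-text character ends the current word.
                result.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result;
    }
}
```

For example, words("don't stop 42 now") yields [don't, stop, now]. The apostrophe keeps "don't" intact, and the digits are dropped as non-text characters.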

isStartLine

protected boolean isStartLine(String lineLowerCase)
Description copied from class: StringExtractor
Determines whether the supplied line indicates the start of the textual contents. If this method returns true, all text extracted up to this point is discarded, i.e. text extraction starts again from scratch at the current location in the stream. The specified line is expected to be fully lowercased. The default implementation returns false.

Overrides:
isStartLine in class StringExtractor
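The restart semantics of isStartLine can be modelled with a line-based loop. In this hypothetical sketch, StartLineSketch follows the documented default and always returns false. MarkerStartLineSketch uses an invented "begin text" marker purely for illustration. The description above does not specify whether the start line itself survives the restart, so the sketch assumes that it does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the isStartLine() contract; not Aperture's implementation.
class StartLineSketch {
    // Default behaviour as documented: no line marks a fresh start.
    protected boolean isStartLine(String lineLowerCase) {
        return false;
    }

    List<String> collect(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // The hook receives a fully lowercased line, as the documentation requires.
            if (isStartLine(line.toLowerCase(Locale.ROOT))) {
                // Discard everything collected so far and restart at this point.
                // Assumption: the start line itself is kept as the first line of the new text.
                kept.clear();
            }
            kept.add(line);
        }
        return kept;
    }
}

// Hypothetical subclass: treats a line beginning with "begin text" as the start marker.
class MarkerStartLineSketch extends StartLineSketch {
    @Override
    protected boolean isStartLine(String lineLowerCase) {
        return lineLowerCase.startsWith("begin text");
    }
}
```

Given the lines "font table", "BEGIN TEXT", "Dear reader," and "hello", the default class keeps all four. The marker subclass drops "font table" and keeps the rest, because the uppercase marker still matches once it has been lowercased.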

isValidLine

protected boolean isValidLine(String lineLowerCase)
Description copied from class: StringExtractor
Determines whether the supplied line should be included in the end result. The specified line is expected to be fully lowercased.

Overrides:
isValidLine in class StringExtractor
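Line filtering with isValidLine follows the same hook pattern. Each line is lowercased before it is passed to the hook, and only lines for which the hook returns true reach the result. The rule in ValidLineSketch, which requires at least two letters per line, is hypothetical. This page does not describe the WordPerfect-specific criteria that WPStringExtractor uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the isValidLine() hook; not Aperture's implementation.
class ValidLineSketch {
    // Hypothetical rule: a line is kept only if it contains at least two letters.
    protected boolean isValidLine(String lineLowerCase) {
        int letters = 0;
        for (int i = 0; i < lineLowerCase.length(); i++) {
            if (Character.isLetter(lineLowerCase.charAt(i))) {
                letters++;
            }
        }
        return letters >= 2;
    }

    List<String> filter(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // The hook sees the lowercased form, as documented.
            // Assumption: the original (non-lowercased) line is what ends up in the result.
            if (isValidLine(line.toLowerCase(Locale.ROOT))) {
                result.add(line);
            }
        }
        return result;
    }
}
```

With the lines "Dear Sir", "x", "12 34" and "Regards", only "Dear Sir" and "Regards" are kept.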


Copyright © 2010 Aperture Development Team. All Rights Reserved.