pt.tumba.cluster
Class TeXWordFinder

java.lang.Object
  extended by pt.tumba.cluster.DefaultWordFinder
      extended by pt.tumba.cluster.TeXWordFinder

public class TeXWordFinder
extends DefaultWordFinder

A word finder for TeX and LaTeX documents. It searches the text for sequences of letters and ignores commands and environments, including math environments.


Field Summary
private  boolean IGNORE_COMMENTS
           
static int REG_EXPR
           
private  int regex_user_defined_ignores
           
static int STRING_EXPR
           
private  java.util.HashSet user_defined_ignores
           
 
Fields inherited from class pt.tumba.cluster.DefaultWordFinder
currentSegment, currentWord, currentWordPos, nextWord, nextWordPos, sentenceIterator, startsSentence, text
 
Constructor Summary
TeXWordFinder()
           
TeXWordFinder(java.lang.String inText)
          Creates a new TeXWordFinder object.
 
Method Summary
 void addUserDefinedIgnores(java.util.Collection expressions, int regex)
          Imports a user-defined set of strings or regular expressions to ignore.
private  int ignoreUserDefined(int i)
           
 java.lang.String next()
          Scans the text from the end of the last word and returns the next word as a String.
 void setIgnoreComments(boolean ignore)
           
 
Methods inherited from class pt.tumba.cluster.DefaultWordFinder
current, currentSegment, getText, hasNext, ignore, ignore, ignore, ignore, init, isWordChar, isWordChar, nextSegment, replace, setSentenceIterator, setText, startsSentence, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

IGNORE_COMMENTS

private boolean IGNORE_COMMENTS

user_defined_ignores

private java.util.HashSet user_defined_ignores

regex_user_defined_ignores

private int regex_user_defined_ignores

STRING_EXPR

public static final int STRING_EXPR
See Also:
Constant Field Values

REG_EXPR

public static final int REG_EXPR
See Also:
Constant Field Values

Constructor Detail

TeXWordFinder

public TeXWordFinder(java.lang.String inText)
Creates a new TeXWordFinder object.

Parameters:
inText - the text to search.

TeXWordFinder

public TeXWordFinder()

Method Detail

next

public java.lang.String next()
Scans the text from the end of the last word and returns the next word as a String.

Overrides:
next in class DefaultWordFinder
Returns:
the next word.
Throws:
WordNotFoundException - search string contains no more words.
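
The scanning behaviour described for this class (collect runs of letters, skip commands and math) can be illustrated with a minimal, self-contained sketch. TeXWordSketch and its words method are hypothetical names introduced here for illustration; this is not the pt.tumba.cluster implementation. The sketch assumes that a command is a backslash followed by letters or by a single symbol, that math is delimited by single `$` characters, and that a comment runs from `%` to the end of the line. Unlike the class documented here, it does not skip environment names such as those in \begin{...}.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; not the library's code. */
class TeXWordSketch {

    /** Returns the letter sequences in text, skipping commands and inline math. */
    static List<String> words(String text, boolean ignoreComments) {
        List<String> out = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            char c = text.charAt(i);
            if (c == '\\') {
                // Command: backslash plus a run of letters, or plus one symbol.
                i++;
                if (i < n && Character.isLetter(text.charAt(i))) {
                    while (i < n && Character.isLetter(text.charAt(i))) {
                        i++;
                    }
                } else if (i < n) {
                    i++;
                }
            } else if (c == '$') {
                // Inline math: skip up to and including the closing '$'.
                i++;
                while (i < n && text.charAt(i) != '$') {
                    i++;
                }
                if (i < n) {
                    i++;
                }
            } else if (c == '%' && ignoreComments) {
                // Comment: skip to the end of the current line.
                while (i < n && text.charAt(i) != '\n') {
                    i++;
                }
            } else if (Character.isLetter(c)) {
                // Word: a maximal run of letters.
                int start = i;
                while (i < n && Character.isLetter(text.charAt(i))) {
                    i++;
                }
                out.add(text.substring(start, i));
            } else {
                i++;
            }
        }
        return out;
    }
}
```

With the input `The \emph{quick} fox $x+y$ runs % note` followed by a newline and `fast`, the sketch yields The, quick, fox, runs, fast when comments are ignored, and additionally yields note when they are not.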

addUserDefinedIgnores

public void addUserDefinedIgnores(java.util.Collection expressions,
                                  int regex)
Imports a user-defined set of strings or regular expressions to ignore.

Parameters:
expressions - a collection of Objects whose toString() value is used as the expression. Typically String objects.
regex - an integer specifying the type of expression to use, e.g. REG_EXPR or STRING_EXPR.
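
How the two expression types might differ can be shown with a small, self-contained sketch. Everything in it is an assumption made for illustration: IgnoreSketch, STRING_MODE, REGEX_MODE, addIgnores and skipIgnored are hypothetical names, the flag values are arbitrary (the real constant values are not listed on this page), and the matching strategy is only one plausible reading of the documented behaviour. Plain strings are matched literally, regular expressions are compiled as given, and each entry is taken from the object's toString() value as the parameter description states.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the library's code. */
class IgnoreSketch {
    // Assumed flag values; the library's actual constants may differ.
    static final int STRING_MODE = 0;
    static final int REGEX_MODE = 1;

    private final List<Pattern> patterns = new ArrayList<>();

    /** Registers each expression, using its toString() value as the text. */
    void addIgnores(Collection<?> expressions, int mode) {
        for (Object e : expressions) {
            String s = e.toString();
            // Literal strings are quoted so regex metacharacters lose their meaning.
            patterns.add(mode == REGEX_MODE
                    ? Pattern.compile(s)
                    : Pattern.compile(Pattern.quote(s)));
        }
    }

    /** Returns the index just past an ignored match starting at i, or i if none. */
    int skipIgnored(String text, int i) {
        for (Pattern p : patterns) {
            Matcher m = p.matcher(text).region(i, text.length());
            // lookingAt() anchors the match at the start of the region.
            if (m.lookingAt() && m.end() > i) {
                return m.end();
            }
        }
        return i;
    }
}
```

Normalising both modes to compiled patterns keeps the skip loop uniform; a scanner would call skipIgnored at each position before testing for a word character.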

ignoreUserDefined

private int ignoreUserDefined(int i)

setIgnoreComments

public void setIgnoreComments(boolean ignore)