Mirror of https://github.com/unicode-org/icu.git (synced 2025-04-14 01:11:02 +00:00)
ICU-9101 Updated API docs for SearchIterator and StringSearch. Tried to keep them synchronized with ICU4C API docs as much as possible.
X-SVN-Rev: 35353
Parent: d92c13c285
Commit: 5799b71849
2 changed files with 353 additions and 618 deletions
|
@ -10,118 +10,43 @@ package com.ibm.icu.text;
|
|||
import java.text.CharacterIterator;
|
||||
|
||||
/**
|
||||
* <p>SearchIterator is an abstract base class that defines a protocol
|
||||
* for text searching. Subclasses provide concrete implementations of
|
||||
* various search algorithms. A concrete subclass, StringSearch, is
|
||||
* provided that implements language-sensitive pattern matching based
|
||||
* on the comparison rules defined in a RuleBasedCollator
|
||||
* object. Instances of SearchIterator maintain a current position and
|
||||
* scan over the target text, returning the indices where a match is
|
||||
* found and the length of each match. Generally, the sequence of forward
|
||||
* matches will be equivalent to the sequence of backward matches. One
|
||||
* case where this statement may not hold is when non-overlapping mode
|
||||
* is set on and there are continuous repetitive patterns in the text.
|
||||
* Consider the case searching for pattern "aba" in the text
|
||||
* "ababababa", setting overlapping mode off will produce forward matches
|
||||
* at offsets 0, 4. However when a backwards search is done, the
|
||||
* results will be at offsets 6 and 2.</p>
|
||||
*
|
||||
* <p>If matches searched for have boundary restrictions, BreakIterators
|
||||
* can be used to define the valid boundaries of such a match. Once a
|
||||
* BreakIterator is set, potential matches will be tested against the
|
||||
* BreakIterator to determine if the boundaries are valid and that all
|
||||
* characters in the potential match are equivalent to the pattern
|
||||
* searched for. For example, looking for the pattern "fox" in the text
|
||||
* "foxy fox" will produce match results at offset 0 and 5 with length 3
|
||||
* if no BreakIterators were set. However if a WordBreakIterator is set,
|
||||
* the only match that would be found will be at offset 5. Since
|
||||
* the SearchIterator guarantees that if a BreakIterator is set, all its
|
||||
* matches will match the given pattern exactly, a potential match that
|
||||
* passes the BreakIterator might still not produce a valid match. For
|
||||
* instance the pattern "e" will not be found in the string
|
||||
* "\u00e9" (latin small letter e with acute) if a
|
||||
* CharacterBreakIterator is used. Even though "e" is
|
||||
* a part of the character "\u00e9" and the potential match at
|
||||
* offset 0 length 1 passes the CharacterBreakIterator test, "\u00e9"
|
||||
* is not equivalent to "e", hence the SearchIterator rejects the potential
|
||||
* match. By default, the SearchIterator
|
||||
* does not impose any boundary restriction on the matches; it will
|
||||
* return all results that match the pattern. Illustrating with the
|
||||
* above example, "e" will
|
||||
* be found in the string "\u00e9" if no BreakIterator is
|
||||
* specified.</p>
|
||||
*
|
||||
* <p>SearchIterator also provides a means to handle overlapping
|
||||
* matches via the API setOverlapping(boolean). For example, if
|
||||
* overlapping mode is set, searching for the pattern "abab" in the
|
||||
* text "ababab" will match at positions 0 and 2, whereas if
|
||||
* overlapping is not set, SearchIterator will only match at position
|
||||
* 0. By default, overlapping mode is not set.</p>
|
||||
*
|
||||
* <p>The APIs in SearchIterator are similar to those of other text
|
||||
* iteration classes such as BreakIterator. Using this class, it is
|
||||
* easy to scan through text looking for all occurrences of a
|
||||
* match.</p>
|
||||
* <tt>SearchIterator</tt> is an abstract base class that provides
|
||||
* methods to search for a pattern within a text string. Instances of
|
||||
* <tt>SearchIterator</tt> maintain a current position and scan over the
|
||||
* target text, returning the indices at which the pattern is matched and the length
|
||||
* of each match.
|
||||
* <p>
|
||||
* Example of use:<br>
|
||||
* <pre>
|
||||
* <tt>SearchIterator</tt> defines a protocol for text searching.
|
||||
* Subclasses provide concrete implementations of various search algorithms.
|
||||
* For example, <tt>StringSearch</tt> implements language-sensitive pattern
|
||||
* matching based on the comparison rules defined in a
|
||||
* <tt>RuleBasedCollator</tt> object.
|
||||
* <p>
|
||||
* Other options for searching include using a BreakIterator to restrict
|
||||
* the points at which matches are detected.
|
||||
* <p>
|
||||
* <tt>SearchIterator</tt> provides an API that is similar to that of
|
||||
* other text iteration classes such as <tt>BreakIterator</tt>. Using
|
||||
* this class, it is easy to scan through text looking for all occurrences of
|
||||
* a given pattern. The following example uses a <tt>StringSearch</tt>
|
||||
* object to find all instances of "fox" in the target string. Any other
|
||||
* subclass of <tt>SearchIterator</tt> can be used in an identical
|
||||
* manner.
|
||||
* <pre><code>
|
||||
* String target = "The quick brown fox jumped over the lazy fox";
|
||||
* String pattern = "fox";
|
||||
* SearchIterator iter = new StringSearch(pattern, target);
|
||||
* for (int pos = iter.first(); pos != SearchIterator.DONE;
|
||||
* pos = iter.next()) {
|
||||
* // println matches at offset 16 and 41 with length 3
|
||||
* System.out.println("Found match at " + pos + ", length is "
|
||||
* + iter.getMatchLength());
|
||||
* for (int pos = iter.first(); pos != SearchIterator.DONE;
|
||||
* pos = iter.next()) {
|
||||
* System.out.println("Found match at " + pos +
|
||||
* ", length is " + iter.getMatchLength());
|
||||
* }
|
||||
* target = "ababababa";
|
||||
* pattern = "aba";
|
||||
* iter.setTarget(new StringCharacterIterator(target));
|
||||
* iter.setOverlapping(false);
|
||||
* System.out.println("Overlapping mode set to false");
|
||||
* System.out.println("Forward matches of pattern " + pattern + " in text "
|
||||
* + target + ": ");
|
||||
* for (int pos = iter.first(); pos != SearchIterator.DONE;
|
||||
* pos = iter.next()) {
|
||||
* // println matches at offset 0 and 4 with length 3
|
||||
* System.out.println("offset " + pos + ", length "
|
||||
* + iter.getMatchLength());
|
||||
* }
|
||||
* System.out.println("Backward matches of pattern " + pattern + " in text "
|
||||
* + target + ": ");
|
||||
* for (int pos = iter.last(); pos != SearchIterator.DONE;
|
||||
* pos = iter.previous()) {
|
||||
* // println matches at offset 6 and 2 with length 3
|
||||
* System.out.println("offset " + pos + ", length "
|
||||
* + iter.getMatchLength());
|
||||
* }
|
||||
* System.out.println("Overlapping mode set to true");
|
||||
* System.out.println("Index set to 2");
|
||||
* iter.setIndex(2);
|
||||
* iter.setOverlapping(true);
|
||||
* System.out.println("Forward matches of pattern " + pattern + " in text "
|
||||
* + target + ": ");
|
||||
* for (int pos = iter.first(); pos != SearchIterator.DONE;
|
||||
* pos = iter.next()) {
|
||||
* // println matches at offset 2, 4 and 6 with length 3
|
||||
* System.out.println("offset " + pos + ", length "
|
||||
* + iter.getMatchLength());
|
||||
* }
|
||||
* System.out.println("Index set to 2");
|
||||
* iter.setIndex(2);
|
||||
* System.out.println("Backward matches of pattern " + pattern + " in text "
|
||||
* + target + ": ");
|
||||
* for (int pos = iter.last(); pos != SearchIterator.DONE;
|
||||
* pos = iter.previous()) {
|
||||
* // println matches at offset 0 with length 3
|
||||
* System.out.println("offset " + pos + ", length "
|
||||
* + iter.getMatchLength());
|
||||
* }
|
||||
* </pre>
|
||||
* </p>
|
||||
* </code></pre>
|
||||
*
|
||||
* @author Laura Werner, synwee
|
||||
* @stable ICU 2.0
|
||||
* @see BreakIterator
|
||||
* @see RuleBasedCollator
|
||||
*/
|
||||
public abstract class SearchIterator
|
||||
{
|
||||
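The overlapping-mode behavior described in the class documentation above can be modeled with a short sketch that uses only the Java standard library. This is not ICU code: `OverlapSketch`, `forward`, and `backward` are hypothetical names, and matching here is plain `String.indexOf`/`lastIndexOf` rather than collation-based comparison. Under those assumptions it reproduces the offsets quoted in the documentation for "aba" in "ababababa".

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of overlapping vs. non-overlapping matching (not ICU code). */
final class OverlapSketch {

    /** Start offsets of matches found while scanning left to right. */
    static List<Integer> forward(String text, String pattern, boolean overlap) {
        requireNonEmpty(pattern);
        List<Integer> hits = new ArrayList<>();
        int from = 0;
        while (true) {
            int pos = text.indexOf(pattern, from);
            if (pos < 0) {
                break;
            }
            hits.add(pos);
            // Overlapping: resume one position later; otherwise skip the whole match.
            from = overlap ? pos + 1 : pos + pattern.length();
        }
        return hits;
    }

    /** Start offsets of matches found while scanning right to left. */
    static List<Integer> backward(String text, String pattern, boolean overlap) {
        requireNonEmpty(pattern);
        List<Integer> hits = new ArrayList<>();
        int from = text.length() - pattern.length();
        while (from >= 0) {
            int pos = text.lastIndexOf(pattern, from);
            if (pos < 0) {
                break;
            }
            hits.add(pos);
            // Non-overlapping: the next match must end at or before this match's start.
            from = overlap ? pos - 1 : pos - pattern.length();
        }
        return hits;
    }

    private static void requireNonEmpty(String pattern) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
    }

    public static void main(String[] args) {
        System.out.println(forward("ababababa", "aba", false));  // [0, 4]
        System.out.println(backward("ababababa", "aba", false)); // [6, 2]
        System.out.println(forward("ababab", "abab", true));     // [0, 2]
    }
}
```

With overlapping off, the forward and backward scans report different offsets because each direction consumes a whole match before resuming, which is the asymmetry the documentation calls out.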
|
@ -242,7 +167,7 @@ public abstract class SearchIterator
|
|||
* @stable ICU 2.0
|
||||
*/
|
||||
public static final int DONE = -1;
|
||||
|
||||
|
||||
// public methods -----------------------------------------------------
|
||||
|
||||
// public setters -----------------------------------------------------
|
||||
|
@ -269,38 +194,36 @@ public abstract class SearchIterator
|
|||
search_.setMatchedLength(0);
|
||||
search_.matchedIndex_ = DONE;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* <p>
|
||||
* Determines whether overlapping matches are returned. See the class
|
||||
* documentation for more information about overlapping matches.
|
||||
* </p>
|
||||
* <p>
|
||||
* The default setting of this property is false.
|
||||
* </p>
|
||||
*
|
||||
* @param allowOverlap flag indicator if overlapping matches are allowed
|
||||
* @see #isOverlapping
|
||||
* @stable ICU 2.8
|
||||
*/
|
||||
public void setOverlapping(boolean allowOverlap)
|
||||
{
|
||||
public void setOverlapping(boolean allowOverlap) {
|
||||
search_.isOverlap_ = allowOverlap;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Set the BreakIterator that is used to restrict the points at which
|
||||
* matches are detected.
|
||||
* Using <tt>null</tt> as the parameter is legal; it means that break
|
||||
* detection should not be attempted.
|
||||
* See class documentation for more information.
|
||||
* Set the BreakIterator that will be used to restrict the points
|
||||
* at which matches are detected.
|
||||
*
|
||||
* @param breakiter A BreakIterator that will be used to restrict the
|
||||
* points at which matches are detected.
|
||||
* @see #getBreakIterator
|
||||
* points at which matches are detected. If a match is
|
||||
* found, but the match's start or end index is not a
|
||||
* boundary as determined by the {@link BreakIterator},
|
||||
* the match will be rejected and another will be searched
|
||||
* for. If this parameter is <tt>null</tt>, no break
|
||||
* detection is attempted.
|
||||
* @see BreakIterator
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public void setBreakIterator(BreakIterator breakiter)
|
||||
{
|
||||
public void setBreakIterator(BreakIterator breakiter) {
|
||||
search_.setBreakIter(breakiter);
|
||||
if (search_.breakIter() != null) {
|
||||
// Create a clone of CharacterIterator, so it won't
|
||||
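The boundary rule in the `setBreakIterator` documentation above (a match is rejected when its start or end index is not a boundary) can be illustrated with a standard-library sketch. It assumes the JDK's `java.text.BreakIterator` as a stand-in for `com.ibm.icu.text.BreakIterator` and literal `String.indexOf` matching instead of collation. `BoundarySketch` and `find` are hypothetical names, not part of ICU4J.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Simplified model of break-iterator filtering of matches (not ICU code). */
final class BoundarySketch {

    /**
     * Returns start offsets of literal matches. When breaker is non-null,
     * a match is kept only if both its start and its end are boundaries.
     */
    static List<Integer> find(String text, String pattern, BreakIterator breaker) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        if (breaker != null) {
            breaker.setText(text);
        }
        List<Integer> hits = new ArrayList<>();
        for (int pos = text.indexOf(pattern); pos >= 0; pos = text.indexOf(pattern, pos + 1)) {
            int end = pos + pattern.length();
            // A null breaker means no break detection is attempted.
            if (breaker == null || (breaker.isBoundary(pos) && breaker.isBoundary(end))) {
                hits.add(pos);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "foxy fox";
        System.out.println(find(text, "fox", null));
        System.out.println(find(text, "fox", BreakIterator.getWordInstance(Locale.ROOT)));
    }
}
```

In this model, "fox" at offset 0 of "foxy fox" is dropped under a word break iterator because offset 3 falls inside the word "foxy", while the occurrence at offset 5 is kept.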
|
@ -313,8 +236,9 @@ public abstract class SearchIterator
|
|||
|
||||
/**
|
||||
* Set the target text to be searched. Text iteration will then begin at
|
||||
* the start of the text string. This method is useful if you want to
|
||||
* the start of the text string. This method is useful if you want to
|
||||
* reuse an iterator to search within a different body of text.
|
||||
*
|
||||
* @param text new text iterator to look for match,
|
||||
* @exception IllegalArgumentException thrown when text is null or has
|
||||
* 0 length
|
||||
|
@ -343,128 +267,103 @@ public abstract class SearchIterator
|
|||
}
|
||||
}
|
||||
|
||||
//TODO: We should add APIs below to match ICU4C APIs
|
||||
//TODO: We may add APIs below to match ICU4C APIs
|
||||
// setCanonicalMatch
|
||||
// setElementComparison
|
||||
|
||||
// public getters ----------------------------------------------------
|
||||
|
||||
|
||||
/**
|
||||
* <p>
|
||||
* Returns the index of the most recent match in the target text.
|
||||
* This call returns a valid result only after a successful call to
|
||||
* {@link #first}, {@link #next}, {@link #previous}, or {@link #last}.
|
||||
* Just after construction, or after a searching method returns
|
||||
* <tt>DONE</tt>, this method will return <tt>DONE</tt>.
|
||||
* </p>
|
||||
* <p>
|
||||
* Use <tt>getMatchLength</tt> to get the length of the matched text.
|
||||
* <tt>getMatchedText</tt> will return the subtext in the searched
|
||||
* target text from index getMatchStart() with length getMatchLength().
|
||||
* </p>
|
||||
* @return index to a substring within the text string that is being
|
||||
* searched.
|
||||
* @see #getMatchLength
|
||||
* @see #getMatchedText
|
||||
* @see #first
|
||||
* @see #next
|
||||
* @see #previous
|
||||
* @see #last
|
||||
* @see #DONE
|
||||
* @stable ICU 2.8
|
||||
*/
|
||||
public int getMatchStart()
|
||||
{
|
||||
* Returns the index to the match in the text string that was searched.
|
||||
* This call returns a valid result only after a successful call to
|
||||
* {@link #first}, {@link #next}, {@link #previous}, or {@link #last}.
|
||||
* Just after construction, or after a searching method returns
|
||||
* {@link #DONE}, this method will return {@link #DONE}.
|
||||
* <p>
|
||||
* Use {@link #getMatchLength} to get the matched string length.
|
||||
*
|
||||
* @return index of a substring within the text string that is being
|
||||
* searched.
|
||||
* @see #first
|
||||
* @see #next
|
||||
* @see #previous
|
||||
* @see #last
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public int getMatchStart() {
|
||||
return search_.matchedIndex_;
|
||||
}
|
||||
|
||||
/**
|
||||
* Return the index in the target text at which the iterator is currently
|
||||
* positioned.
|
||||
* If the iteration has gone past the end of the target text, or past
|
||||
* the beginning for a backwards search, {@link #DONE} is returned.
|
||||
* @return index in the target text at which the iterator is currently
|
||||
* positioned.
|
||||
* Return the current index in the text being searched.
|
||||
* If the iteration has gone past the end of the text
|
||||
* (or past the beginning for a backwards search), {@link #DONE}
|
||||
* is returned.
|
||||
*
|
||||
* @return current index in the text being searched.
|
||||
* @stable ICU 2.8
|
||||
* @see #first
|
||||
* @see #next
|
||||
* @see #previous
|
||||
* @see #last
|
||||
* @see #DONE
|
||||
*/
|
||||
public abstract int getIndex();
|
||||
|
||||
|
||||
/**
|
||||
* <p>
|
||||
* Returns the length of the most recent match in the target text.
|
||||
* This call returns a valid result only after a successful
|
||||
* call to {@link #first}, {@link #next}, {@link #previous}, or
|
||||
* {@link #last}.
|
||||
* Just after construction, or after a searching method returns
|
||||
* <tt>DONE</tt>, this method will return 0. See getMatchStart() for
|
||||
* more details.
|
||||
* </p>
|
||||
* @return The length of the most recent match in the target text, or 0 if
|
||||
* there is no match.
|
||||
* @see #getMatchStart
|
||||
* @see #getMatchedText
|
||||
* Returns the length of text in the string which matches the search
|
||||
* pattern. This call returns a valid result only after a successful call
|
||||
* to {@link #first}, {@link #next}, {@link #previous}, or {@link #last}.
|
||||
* Just after construction, or after a searching method returns
|
||||
* {@link #DONE}, this method will return 0.
|
||||
*
|
||||
* @return The length of the match in the target text, or 0 if there
|
||||
* is no match currently.
|
||||
* @see #first
|
||||
* @see #next
|
||||
* @see #previous
|
||||
* @see #last
|
||||
* @see #DONE
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public int getMatchLength()
|
||||
{
|
||||
public int getMatchLength() {
|
||||
return search_.matchedLength();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Returns the BreakIterator that is used to restrict the indexes at which
|
||||
* matches are detected. This will be the same object that was passed to
|
||||
* the constructor or to <code>setBreakIterator</code>.
|
||||
* If the BreakIterator has not been set, <tt>null</tt> will be returned.
|
||||
* See setBreakIterator for more information.
|
||||
* the constructor or to {@link #setBreakIterator}.
|
||||
* If the {@link BreakIterator} has not been set, <tt>null</tt> will be returned.
|
||||
* See {@link #setBreakIterator} for more information.
|
||||
*
|
||||
* @return the BreakIterator set to restrict logic matches
|
||||
* @see #setBreakIterator
|
||||
* @see BreakIterator
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public BreakIterator getBreakIterator()
|
||||
{
|
||||
public BreakIterator getBreakIterator() {
|
||||
return search_.breakIter();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Return the target text that is being searched.
|
||||
* @return target text being searched.
|
||||
* @see #setTarget
|
||||
* Return the string text to be searched.
|
||||
* @return text string to be searched.
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public CharacterIterator getTarget()
|
||||
{
|
||||
public CharacterIterator getTarget() {
|
||||
return search_.text();
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Returns the text that was matched by the most recent call to
|
||||
* {@link #first}, {@link #next}, {@link #previous}, or {@link #last}.
|
||||
* If the iterator is not pointing at a valid match, for instance just
|
||||
* after construction or after <tt>DONE</tt> has been returned, an empty
|
||||
* String will be returned. See getMatchStart for more information.
|
||||
* @see #getMatchStart
|
||||
* @see #getMatchLength
|
||||
* {@link #first}, {@link #next}, {@link #previous}, or {@link #last}.
|
||||
* If the iterator is not pointing at a valid match (e.g. just after
|
||||
* construction or after {@link #DONE} has been returned),
|
||||
* returns an empty string.
|
||||
*
|
||||
* @return the substring in the target text of the most recent match,
|
||||
* or null if there is no match currently.
|
||||
* @see #first
|
||||
* @see #next
|
||||
* @see #previous
|
||||
* @see #last
|
||||
* @see #DONE
|
||||
* @return the substring in the target text of the most recent match
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public String getMatchedText()
|
||||
{
|
||||
public String getMatchedText() {
|
||||
if (search_.matchedLength() > 0) {
|
||||
int limit = search_.matchedIndex_ + search_.matchedLength();
|
||||
StringBuilder result = new StringBuilder(search_.matchedLength());
|
||||
|
@ -481,31 +380,22 @@ public abstract class SearchIterator
|
|||
}
|
||||
|
||||
// miscellaneous public methods -----------------------------------------
|
||||
|
||||
|
||||
/**
|
||||
* Search <b>forwards</b> in the target text for the next valid match,
|
||||
* starting the search from the current iterator position. The iterator is
|
||||
* adjusted so that its current index, as returned by {@link #getIndex},
|
||||
* is the starting position of the match if one was found. If a match is
|
||||
* found, the index of the match is returned, otherwise <tt>DONE</tt> is
|
||||
* returned. If overlapping mode is set, the beginning of the found match
|
||||
* can be before the end of the current match, if any.
|
||||
* @return The starting index of the next forward match after the current
|
||||
* iterator position, or
|
||||
* <tt>DONE</tt> if there are no more matches.
|
||||
* @see #getMatchStart
|
||||
* @see #getMatchLength
|
||||
* @see #getMatchedText
|
||||
* @see #following
|
||||
* @see #preceding
|
||||
* @see #previous
|
||||
* @see #first
|
||||
* @see #last
|
||||
* @see #DONE
|
||||
* Returns the index of the next point at which the text matches the
|
||||
* search pattern, starting from the current position.
|
||||
* The iterator is adjusted so that its current index (as returned by
|
||||
* {@link #getIndex}) is the match position if one was found.
|
||||
* If a match is not found, {@link #DONE} will be returned and
|
||||
* the iterator will be adjusted to a position after the end of the text
|
||||
* string.
|
||||
*
|
||||
* @return The index of the next match after the current position,
|
||||
* or {@link #DONE} if there are no more matches.
|
||||
* @see #getIndex
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public int next()
|
||||
{
|
||||
public int next() {
|
||||
int index = getIndex(); // offset = getOffset() in ICU4C
|
||||
int matchindex = search_.matchedIndex_;
|
||||
int matchlength = search_.matchedLength();
|
||||
|
@ -545,29 +435,19 @@ public abstract class SearchIterator
|
|||
}
|
||||
|
||||
/**
|
||||
* Search <b>backwards</b> in the target text for the next valid match,
|
||||
* starting the search from the current iterator position. The iterator is
|
||||
* adjusted so that its current index, as returned by {@link #getIndex},
|
||||
* is the starting position of the match if one was found. If a match is
|
||||
* found, the index is returned, otherwise <tt>DONE</tt> is returned. If
|
||||
* overlapping mode is set, the end of the found match can be after the
|
||||
* beginning of the previous match, if any.
|
||||
* @return The starting index of the next backwards match after the current
|
||||
* iterator position, or
|
||||
* <tt>DONE</tt> if there are no more matches.
|
||||
* @see #getMatchStart
|
||||
* @see #getMatchLength
|
||||
* @see #getMatchedText
|
||||
* @see #following
|
||||
* @see #preceding
|
||||
* @see #next
|
||||
* @see #first
|
||||
* @see #last
|
||||
* @see #DONE
|
||||
* Returns the index of the previous point at which the string text
|
||||
* matches the search pattern, starting at the current position.
|
||||
* The iterator is adjusted so that its current index (as returned by
|
||||
* {@link #getIndex}) is the match position if one was found.
|
||||
* If a match is not found, {@link #DONE} will be returned and
|
||||
* the iterator will be adjusted to the index {@link #DONE}.
|
||||
*
|
||||
* @return The index of the previous match before the current position,
|
||||
* or {@link #DONE} if there are no more matches.
|
||||
* @see #getIndex
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public int previous()
|
||||
{
|
||||
public int previous() {
|
||||
int index; // offset in ICU4C
|
||||
if (search_.reset_) {
|
||||
index = search_.endIndex(); // m_search_->textLength in ICU4C
|
||||
|
@ -611,34 +491,29 @@ public abstract class SearchIterator
|
|||
|
||||
/**
|
||||
* Return true if the overlapping property has been set.
|
||||
* See setOverlapping(boolean) for more information.
|
||||
* See {@link #setOverlapping(boolean)} for more information.
|
||||
*
|
||||
* @see #setOverlapping
|
||||
* @return true if the overlapping property has been set, false otherwise
|
||||
* @stable ICU 2.8
|
||||
*/
|
||||
public boolean isOverlapping()
|
||||
{
|
||||
public boolean isOverlapping() {
|
||||
return search_.isOverlap_;
|
||||
}
|
||||
|
||||
//TODO: We should add APIs below to match ICU4C APIs
|
||||
//TODO: We may add APIs below to match ICU4C APIs
|
||||
// isCanonicalMatch
|
||||
// getElementComparison
|
||||
|
||||
/**
|
||||
* <p>
|
||||
* Resets the search iteration. All properties will be reset to their
|
||||
* default values.
|
||||
* </p>
|
||||
* <p>
|
||||
* If a forward iteration is initiated, the next search will begin at the
|
||||
* start of the target text. Otherwise, if a backwards iteration is initiated,
|
||||
* the next search will begin at the end of the target text.
|
||||
* </p>
|
||||
* @stable ICU 2.8
|
||||
*/
|
||||
public void reset()
|
||||
{
|
||||
* Resets the iteration.
|
||||
* Search will begin at the start of the text string if a forward
|
||||
* iteration is initiated before a backwards iteration. Otherwise if a
|
||||
* backwards iteration is initiated before a forwards iteration, the
|
||||
* search will begin at the end of the text string.
|
||||
*
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public void reset() {
|
||||
setMatchNotFound();
|
||||
setIndex(search_.beginIndex());
|
||||
search_.isOverlap_ = false;
|
||||
|
@ -647,112 +522,103 @@ public abstract class SearchIterator
|
|||
search_.isForwardSearching_ = true;
|
||||
search_.reset_ = true;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Return the index of the first <b>forward</b> match in the target text.
|
||||
* This method sets the iteration to begin at the start of the
|
||||
* target text and searches forward from there.
|
||||
* @return The index of the first forward match, or <code>DONE</code>
|
||||
* if there are no matches.
|
||||
* @see #getMatchStart
|
||||
* @see #getMatchLength
|
||||
* @see #getMatchedText
|
||||
* @see #following
|
||||
* @see #preceding
|
||||
* @see #next
|
||||
* @see #previous
|
||||
* @see #last
|
||||
* @see #DONE
|
||||
* Returns the first index at which the string text matches the search
|
||||
* pattern. The iterator is adjusted so that its current index (as
|
||||
* returned by {@link #getIndex()}) is the match position if one
|
||||
*
|
||||
* was found.
|
||||
* If a match is not found, {@link #DONE} will be returned and
|
||||
* the iterator will be adjusted to the index {@link #DONE}.
|
||||
* @return The character index of the first match, or
|
||||
* {@link #DONE} if there are no matches.
|
||||
*
|
||||
* @see #getIndex
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public final int first()
|
||||
{
|
||||
public final int first() {
|
||||
int startIdx = search_.beginIndex();
|
||||
setIndex(startIdx);
|
||||
return handleNext(startIdx);
|
||||
}
|
||||
|
||||
/**
|
||||
* Return the index of the first <b>forward</b> match in target text that
|
||||
* is at or after argument <tt>position</tt>.
|
||||
* This method sets the iteration to begin at the specified
|
||||
* position in the target text and searches forward from there.
|
||||
* @return The index of the first forward match, or <code>DONE</code>
|
||||
* if there are no matches.
|
||||
* @see #getMatchStart
|
||||
* @see #getMatchLength
|
||||
* @see #getMatchedText
|
||||
* @see #first
|
||||
* @see #preceding
|
||||
* @see #next
|
||||
* @see #previous
|
||||
* @see #last
|
||||
* @see #DONE
|
||||
* Returns the first index equal to or greater than <tt>position</tt> at which the
|
||||
* string text matches the search pattern. The iterator is adjusted so
|
||||
* that its current index (as returned by {@link #getIndex()}) is the
|
||||
* match position if one was found.
|
||||
* If a match is not found, {@link #DONE} will be returned and the
|
||||
* iterator will be adjusted to the index {@link #DONE}.
|
||||
*
|
||||
* @param position where search is to start from.
|
||||
* @return The character index of the first match following
|
||||
* <tt>position</tt>, or {@link #DONE} if there are no matches.
|
||||
* @throws IndexOutOfBoundsException If position is less than or greater
|
||||
* than the text range for searching.
|
||||
* @see #getIndex
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public final int following(int position)
|
||||
{
|
||||
public final int following(int position) {
|
||||
setIndex(position);
|
||||
return handleNext(position);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Return the index of the first <b>backward</b> match in target text.
|
||||
* This method sets the iteration to begin at the end of the
|
||||
* target text and searches backwards from there.
|
||||
* @return The starting index of the first backward match, or
|
||||
* <code>DONE</code> if there are no matches.
|
||||
* @see #getMatchStart
|
||||
* @see #getMatchLength
|
||||
* @see #getMatchedText
|
||||
* @see #first
|
||||
* @see #preceding
|
||||
* @see #next
|
||||
* @see #previous
|
||||
* @see #following
|
||||
* @see #DONE
|
||||
* Returns the last index in the target text at which it matches the
|
||||
* search pattern. The iterator is adjusted so that its current index
|
||||
* (as returned by {@link #getIndex}) is the match position if one was
|
||||
* found.
|
||||
* If a match is not found, {@link #DONE} will be returned and
|
||||
* the iterator will be adjusted to the index {@link #DONE}.
|
||||
*
|
||||
* @return The index of the first match, or {@link #DONE} if
|
||||
* there are no matches.
|
||||
* @see #getIndex
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public final int last()
|
||||
{
|
||||
public final int last() {
|
||||
int endIdx = search_.endIndex();
|
||||
setIndex(endIdx);
|
||||
return handlePrevious(endIdx);
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Return the index of the first <b>backwards</b> match in target
|
||||
* text that ends at or before argument <tt>position</tt>.
|
||||
* This method sets the iteration to begin at the argument
|
||||
* position index of the target text and searches backwards from there.
|
||||
* @return The starting index of the first backwards match, or
|
||||
* <code>DONE</code>
|
||||
* if there are no matches.
|
||||
* @see #getMatchStart
|
||||
* @see #getMatchLength
|
||||
* @see #getMatchedText
|
||||
* @see #first
|
||||
* @see #following
|
||||
* @see #next
|
||||
* @see #previous
|
||||
* @see #last
|
||||
* @see #DONE
|
||||
* Returns the first index less than <tt>position</tt> at which the string
|
||||
* text matches the search pattern. The iterator is adjusted so that its
|
||||
* current index (as returned by {@link #getIndex}) is the match
|
||||
* position if one was found. If a match is not found,
|
||||
* {@link #DONE} will be returned and the iterator will be
|
||||
* adjusted to the index {@link #DONE}.
|
||||
* <p>
|
||||
* When the overlapping option ({@link #isOverlapping}) is off, the last index of the
|
||||
* result match is always less than <tt>position</tt>.
|
||||
* When the overlapping option is on, the result match may span across
|
||||
* <tt>position</tt>.
|
||||
*
|
||||
* @param position where search is to start from.
|
||||
* @return The character index of the first match preceding
|
||||
* <tt>position</tt>, or {@link #DONE} if there are
|
||||
* no matches.
|
||||
* @throws IndexOutOfBoundsException If position is less than or greater than
|
||||
* the text range for searching
|
||||
* @see #getIndex
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public final int preceding(int position)
|
||||
{
|
||||
public final int preceding(int position) {
|
||||
setIndex(position);
|
||||
return handlePrevious(position);
|
||||
}
|
||||
|
||||
// protected constructor ----------------------------------------------
|
||||
|
||||
|
||||
/**
|
||||
* Protected constructor for use by subclasses.
|
||||
* Initializes the iterator with the argument target text for searching
|
||||
* and sets the BreakIterator.
|
||||
* See class documentation for more details on the use of the target text
|
||||
* and BreakIterator.
|
||||
* and {@link BreakIterator}.
|
||||
*
|
||||
* @param target The target text to be searched.
|
||||
* @param breaker A {@link BreakIterator} that is used to determine the
|
||||
* boundaries of a logical match. This argument can be null.
|
||||
|
@ -790,7 +656,8 @@ public abstract class SearchIterator
|
|||
/**
|
||||
* Sets the length of the most recent match in the target text.
|
||||
* Subclasses' handleNext() and handlePrevious() methods should call this
|
||||
* after they find a match in the target text.
|
||||
* after they find a match in the target text.
|
||||
*
|
||||
* @param length new length to set
|
||||
* @see #handleNext
|
||||
* @see #handlePrevious
|
||||
|
@ -802,50 +669,41 @@ public abstract class SearchIterator
|
|||
}
|
||||
|
||||
/**
|
||||
* Abstract method which subclasses override to provide the mechanism
|
||||
* for finding the next match in the target text. This allows different
|
||||
* subclasses to provide different search algorithms.
|
||||
* <p>
|
||||
* Abstract method that subclasses override to provide the mechanism
|
||||
* for finding the next <b>forwards</b> match in the target text. This
|
||||
* allows different subclasses to provide different search algorithms.
|
||||
* </p>
|
||||
* <p>
|
||||
* If a match is found, this function must call setMatchLength(int) to
|
||||
* set the length of the result match.
|
||||
* The iterator is adjusted so that its current index, as returned by
|
||||
* {@link #getIndex}, is the starting position of the match if one was
|
||||
* found. If a match is not found, <tt>DONE</tt> will be returned.
|
||||
* </p>
|
||||
* @param start index in the target text at which the forwards search
|
||||
* should begin.
|
||||
* @return the starting index of the next forwards match if found, DONE
|
||||
* otherwise
|
||||
* @see #setMatchLength(int)
|
||||
* @see #handlePrevious(int)
|
||||
* @see #DONE
|
||||
* If a match is found, the implementation should return the index at
|
||||
* which the match starts and should call
|
||||
* {@link #setMatchLength} with the number of characters
|
||||
* in the target text that make up the match. If no match is found, the
|
||||
* method should return {@link #DONE}.
|
||||
*
|
||||
* @param start The index in the target text at which the search
|
||||
* should start.
|
||||
* @return index at which the match starts, else if match is not found
|
||||
* {@link #DONE} is returned
|
||||
* @see #setMatchLength
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
protected abstract int handleNext(int start);
|
||||
|
||||
|
||||
/**
|
||||
* Abstract method which subclasses override to provide the mechanism for
|
||||
* finding the previous match in the target text. This allows different
|
||||
* subclasses to provide different search algorithms.
|
||||
* <p>
|
||||
* Abstract method which subclasses override to provide the mechanism
|
||||
* for finding the next <b>backwards</b> match in the target text.
|
||||
* This allows different
|
||||
* subclasses to provide different search algorithms.
|
||||
* </p>
|
||||
* <p>
|
||||
* If a match is found, this function must call setMatchLength(int) to
|
||||
* set the length of the result match.
|
||||
* The iterator is adjusted so that its current index, as returned by
|
||||
* {@link #getIndex}, is the starting position of the match if one was
|
||||
* found. If a match is not found, <tt>DONE</tt> will be returned.
|
||||
* </p>
|
||||
* @param startAt index in the target text at which the backwards search
|
||||
* should begin.
|
||||
* @return the starting index of the next backwards match if found,
|
||||
* DONE otherwise
|
||||
* @see #setMatchLength(int)
|
||||
* @see #handleNext(int)
|
||||
* @see #DONE
|
||||
* If a match is found, the implementation should return the index at
|
||||
* which the match starts and should call
|
||||
* {@link #setMatchLength} with the number of characters
|
||||
* in the target text that make up the match. If no match is found, the
|
||||
* method should return {@link #DONE}.
|
||||
*
|
||||
* @param startAt The index in the target text at which the search
|
||||
* should start.
|
||||
* @return index at which the match starts, else if match is not found
|
||||
* {@link #DONE} is returned
|
||||
* @see #setMatchLength
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
protected abstract int handlePrevious(int startAt);
|
||||
|
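The handleNext/handlePrevious contract documented in this hunk can be sketched with a toy class. This is a minimal illustration, not ICU4J code: `MiniSearchIterator` and `LiteralSearch` are hypothetical names, matching uses plain `String.indexOf` instead of collation, and `handlePrevious` is assumed to look for a match that ends at or before `startAt`.

```java
// Toy stand-in for the documented protocol; NOT the ICU4J SearchIterator.
abstract class MiniSearchIterator {
    static final int DONE = -1;          // assumed sentinel for "no match"
    protected final String target;
    private int matchLength = 0;

    MiniSearchIterator(String target) {
        this.target = target;
    }

    // Subclasses call this after they find (or fail to find) a match.
    protected void setMatchLength(int length) {
        matchLength = length;
    }

    int getMatchLength() {
        return matchLength;
    }

    // Returns the start index of the next forwards match, or DONE.
    protected abstract int handleNext(int start);

    // Returns the start index of the next backwards match, or DONE.
    protected abstract int handlePrevious(int startAt);
}

// Hypothetical subclass that matches a literal pattern.
class LiteralSearch extends MiniSearchIterator {
    private final String pattern;

    LiteralSearch(String pattern, String target) {
        super(target);
        this.pattern = pattern;
    }

    @Override
    protected int handleNext(int start) {
        int index = target.indexOf(pattern, start);
        if (index < 0) {
            setMatchLength(0);
            return DONE;
        }
        setMatchLength(pattern.length());
        return index;
    }

    @Override
    protected int handlePrevious(int startAt) {
        // Assumption: the match must end at or before startAt.
        int index = target.lastIndexOf(pattern, startAt - pattern.length());
        if (index < 0) {
            setMatchLength(0);
            return DONE;
        }
        setMatchLength(pattern.length());
        return index;
    }

    public static void main(String[] args) {
        LiteralSearch search = new LiteralSearch("fox", "the fox and the fox");
        System.out.println(search.handleNext(0));      // 4
        System.out.println(search.getMatchLength());   // 3
        System.out.println(search.handleNext(5));      // 16
        System.out.println(search.handlePrevious(16)); // 4
        System.out.println(search.handleNext(17));     // -1 (DONE)
    }
}
```

The point of the sketch is the division of labour: the return value carries the match start, and the match length travels separately through `setMatchLength`.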
@ -878,16 +736,16 @@ public abstract class SearchIterator
|
|||
*/
|
||||
STANDARD_ELEMENT_COMPARISON,
|
||||
/**
|
||||
* <p>Collation element comparison is modified to effectively provide behavior
|
||||
* between the specified strength and strength - 1.</p>
|
||||
*
|
||||
* <p>Collation elements in the pattern that have the base weight for the specified
|
||||
* Collation element comparison is modified to effectively provide behavior
|
||||
* between the specified strength and strength - 1.
|
||||
* <p>
|
||||
* Collation elements in the pattern that have the base weight for the specified
|
||||
* strength are treated as "wildcards" that match an element with any other
|
||||
* weight at that collation level in the searched text. For example, with a
|
||||
* secondary-strength English collator, a plain 'e' in the pattern will match
|
||||
* a plain e or an e with any diacritic in the searched text, but an e with
|
||||
* diacritic in the pattern will only match an e with the same diacritic in
|
||||
* the searched text.<p>
|
||||
* the searched text.
|
||||
*
|
||||
* @draft ICU 53
|
||||
* @provisional This API might change or be removed in a future release.
|
||||
|
@ -895,16 +753,16 @@ public abstract class SearchIterator
|
|||
PATTERN_BASE_WEIGHT_IS_WILDCARD,
|
||||
|
||||
/**
|
||||
* <p>Collation element comparison is modified to effectively provide behavior
|
||||
* between the specified strength and strength - 1.</p>
|
||||
*
|
||||
* <p>Collation elements in either the pattern or the searched text that have the
|
||||
* Collation element comparison is modified to effectively provide behavior
|
||||
* between the specified strength and strength - 1.
|
||||
* <p>
|
||||
* Collation elements in either the pattern or the searched text that have the
|
||||
* base weight for the specified strength are treated as "wildcards" that match
|
||||
* an element with any other weight at that collation level. For example, with
|
||||
* a secondary-strength English collator, a plain 'e' in the pattern will match
|
||||
* a plain e or an e with any diacritic in the searched text, but an e with
|
||||
* diacritic in the pattern will only match an e with the same diacritic or a
|
||||
* plain e in the searched text.</p>
|
||||
* plain e in the searched text.
|
||||
*
|
||||
* @draft ICU 53
|
||||
* @provisional This API might change or be removed in a future release.
|
||||
|
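The "wildcard" behavior described for these comparison types can be illustrated with a toy model. This is only a sketch of the documented semantics at secondary strength, not how ICU compares collation elements: `BaseWeightWildcardSketch` and its `Mode` enum are hypothetical, diacritics are approximated by the combining marks produced by NFD decomposition, and case is not handled.

```java
import java.text.Normalizer;

// Toy model of base-weight wildcards; NOT the ICU4J implementation.
final class BaseWeightWildcardSketch {
    // Hypothetical modes mirroring the three documented comparison types.
    enum Mode { STANDARD, PATTERN_BASE_IS_WILDCARD, ANY_BASE_IS_WILDCARD }

    // Splits one letter into { base letter, combining marks } using NFD.
    // Assumes a non-empty input holding a single (possibly accented) letter.
    private static String[] split(String letter) {
        String nfd = Normalizer.normalize(letter, Normalizer.Form.NFD);
        return new String[] { nfd.substring(0, 1), nfd.substring(1) };
    }

    static boolean matches(String patternLetter, String textLetter, Mode mode) {
        String[] p = split(patternLetter);
        String[] t = split(textLetter);
        if (!p[0].equals(t[0])) {
            return false;               // different base letters never match
        }
        if (p[1].equals(t[1])) {
            return true;                // identical diacritics always match
        }
        switch (mode) {
            case PATTERN_BASE_IS_WILDCARD:
                return p[1].isEmpty();  // plain letter in the pattern only
            case ANY_BASE_IS_WILDCARD:
                return p[1].isEmpty() || t[1].isEmpty(); // plain on either side
            default:
                return false;           // standard comparison: no wildcards
        }
    }

    public static void main(String[] args) {
        String plain = "e";
        String acute = "\u00e9";
        System.out.println(matches(plain, acute, Mode.PATTERN_BASE_IS_WILDCARD)); // true
        System.out.println(matches(acute, plain, Mode.PATTERN_BASE_IS_WILDCARD)); // false
        System.out.println(matches(acute, plain, Mode.ANY_BASE_IS_WILDCARD));     // true
    }
}
```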
@ -913,9 +771,9 @@ public abstract class SearchIterator
|
|||
}
|
||||
|
||||
/**
|
||||
* <p>Sets the collation element comparison type.</p>
|
||||
*
|
||||
* <p>The default comparison type is {@link ElementComparisonType#STANDARD_ELEMENT_COMPARISON}.</p>
|
||||
* Sets the collation element comparison type.
|
||||
* <p>
|
||||
* The default comparison type is {@link ElementComparisonType#STANDARD_ELEMENT_COMPARISON}.
|
||||
*
|
||||
* @see ElementComparisonType
|
||||
* @see #getElementComparisonType()
|
||||
|
@ -927,7 +785,7 @@ public abstract class SearchIterator
|
|||
}
|
||||
|
||||
/**
|
||||
* <p>Returns the collation element comparison type.</p>
|
||||
* Returns the collation element comparison type.
|
||||
*
|
||||
* @see ElementComparisonType
|
||||
* @see #setElementComparisonType(ElementComparisonType)
|
||||
|
|
|
@ -14,150 +14,111 @@ import com.ibm.icu.util.ICUException;
|
|||
import com.ibm.icu.util.ULocale;
|
||||
|
||||
// Java porting note:
|
||||
// ICU4C implementation contains dead code in many places.
|
||||
//
|
||||
// ICU4C implementation contains dead code in many places.
|
||||
// While porting the ICU4C linear search implementation, this dead code
|
||||
// was not fully ported. The code blocks tagged by "// *** Boyer-Moore ***"
|
||||
// are that dead code, still available in ICU4C.
|
||||
|
||||
//TODO: ICU4C implementation does not seem to handle UCharacterIterator pointing
|
||||
// ICU4C implementation does not seem to handle UCharacterIterator pointing
|
||||
// a fragment of text properly. ICU4J uses CharacterIterator to navigate through
|
||||
// the input text. We need to carefully review the code ported from ICU4C
|
||||
// assuming the start index is 0.
|
||||
|
||||
//TODO: ICU4C implementation initializes pattern.CE and pattern.PCE. It looks
|
||||
// ICU4C implementation initializes pattern.CE and pattern.PCE. It looks
|
||||
// like CE is no longer used, except in a few places checking CELength. It looks like this
|
||||
// is a leftover from the already disabled Boyer-Moore search code. This Java implementation
|
||||
// preserves the code, but we should clean it up later.
|
||||
|
||||
//TODO: We need to update document to remove the term "Boyer-Moore search".
|
||||
|
||||
/**
|
||||
/**
|
||||
*
|
||||
* <tt>StringSearch</tt> is a {@link SearchIterator} that provides
|
||||
* language-sensitive text searching based on the comparison rules defined
|
||||
* in a {@link RuleBasedCollator} object.
|
||||
* StringSearch ensures that language eccentricity can be
|
||||
* handled, e.g. for the German collator, characters ß and SS will be matched
|
||||
* if case is chosen to be ignored.
|
||||
* See the <a href="http://source.icu-project.org/repos/icu/icuhtml/trunk/design/collation/ICU_collation_design.htm">
|
||||
* "ICU Collation Design Document"</a> for more information.
|
||||
* <p>
|
||||
* <code>StringSearch</code> is the concrete subclass of
|
||||
* <code>SearchIterator</code> that provides language-sensitive text searching
|
||||
* based on the comparison rules defined in a {@link RuleBasedCollator} object.
|
||||
* </p>
|
||||
* <p>
|
||||
* <code>StringSearch</code> uses a version of the fast Boyer-Moore search
|
||||
* algorithm that has been adapted to work with the large character set of
|
||||
* Unicode. Refer to
|
||||
* <a href="http://www.icu-project.org/docs/papers/efficient_text_searching_in_java.html">
|
||||
* "Efficient Text Searching in Java"</a>, published in the
|
||||
* <i>Java Report</i> in February 1999, for further information on the
|
||||
* algorithm.
|
||||
* </p>
|
||||
* <p>
|
||||
* Users are also strongly encouraged to read the section on
|
||||
* <a href="http://www.icu-project.org/userguide/searchString.html">
|
||||
* String Search</a> and
|
||||
* <a href="http://www.icu-project.org/userguide/Collate_Intro.html">
|
||||
* Collation</a> in the user guide before attempting to use this class.
|
||||
* </p>
|
||||
* <p>
|
||||
* String searching becomes a little complicated when accents are encountered at
|
||||
* match boundaries. If a match is found and it has preceding or trailing
|
||||
* accents not part of the match, the result returned will include the
|
||||
* preceding accents up to the first base character, if the pattern searched
|
||||
* for starts with an accent. Likewise,
|
||||
* if the pattern ends with an accent, all trailing accents up to the first
|
||||
* base character will be included in the result.
|
||||
* </p>
|
||||
* <p>
|
||||
* For example, if a match is found in target text "a\u0325\u0300" for
|
||||
* the pattern
|
||||
* "a\u0325", the result returned by StringSearch will be the index 0 and
|
||||
* length 3 <0, 3>. If a match is found in the target
|
||||
* "a\u0325\u0300"
|
||||
* for the pattern "\u0300", then the result will be index 1 and length 2
|
||||
* <1, 2>.
|
||||
* </p>
|
||||
* <p>
|
||||
* In the case where the decomposition mode is on for the RuleBasedCollator,
|
||||
* all matches that start or end with an accent will have their results include
|
||||
* preceding or following accents respectively. For example, if pattern "a" is
|
||||
* looked for in the target text "á\u0325", the result will be
|
||||
* index 0 and length 2 <0, 2>.
|
||||
* </p>
|
||||
* <p>
|
||||
* The StringSearch class provides two options to handle accent matching
|
||||
* described below:
|
||||
* </p>
|
||||
* <p>
|
||||
* Let S' be the sub-string of a text string S between the offsets start and
|
||||
* end <start, end>.
|
||||
* <br>
|
||||
* A pattern string P matches a text string S at the offsets <start,
|
||||
* length>
|
||||
* There are 2 match options for selection:<br>
|
||||
* Let S' be the sub-string of a text string S between the offsets start and
|
||||
* end [start, end].
|
||||
* <br>
|
||||
* A pattern string P matches a text string S at the offsets [start, end]
|
||||
* if
|
||||
* <pre>
|
||||
* option 1. P matches some canonical equivalent string of S'. Suppose the
|
||||
* RuleBasedCollator used for searching has a collation strength of
|
||||
* TERTIARY, all accents are non-ignorable. If the pattern
|
||||
* "a\u0300" is searched in the target text
|
||||
* "a\u0325\u0300",
|
||||
* a match will be found, since the target text is canonically
|
||||
* equivalent to "a\u0300\u0325"
|
||||
* option 2. P matches S' and if P starts or ends with a combining mark,
|
||||
* there exists no non-ignorable combining mark before or after S'
|
||||
* in S respectively. Following the example above, the pattern
|
||||
* "a\u0300" will not find a match in "a\u0325\u0300",
|
||||
* since
|
||||
* there exists a non-ignorable accent '\u0325' in the middle of
|
||||
* 'a' and '\u0300'. Even with a target text of
|
||||
* "a\u0300\u0325" a match will not be found because of the
|
||||
* non-ignorable trailing accent \u0325.
|
||||
* option 1. Some canonical equivalent of P matches some canonical equivalent
|
||||
* of S'
|
||||
* option 2. P matches S' and if P starts or ends with a combining mark,
|
||||
* there exists no non-ignorable combining mark before or after S'
|
||||
* in S respectively.
|
||||
* </pre>
|
||||
* Option 2. will be the default mode for dealing with boundary accents unless
|
||||
* specified via the API setCanonical(boolean).
|
||||
* One restriction is to be noted for option 1. Currently there are no
|
||||
* composite characters that consist of a character with combining class > 0
|
||||
* before a character with combining class == 0. However, if such a character
|
||||
* exists in the future, the StringSearch may not work correctly with option 1
|
||||
* when such characters are encountered.
|
||||
* </p>
|
||||
* Option 2. will be the default.
|
||||
* <p>
|
||||
* <tt>SearchIterator</tt> provides APIs to specify the starting position
|
||||
* within the text string to be searched, e.g. <tt>setIndex</tt>,
|
||||
* <tt>preceding</tt> and <tt>following</tt>. Since the starting position will
|
||||
* be set as it is specified, please take note that there are some dangerous
|
||||
* positions which the search may render incorrect results:
|
||||
* This search has APIs similar to that of other text iteration mechanisms
|
||||
* such as the break iterators in {@link BreakIterator}. Using these
|
||||
* APIs, it is easy to scan through text looking for all occurrences of
|
||||
* a given pattern. This search iterator allows changing of direction by
|
||||
* calling a {@link #reset} followed by a {@link #next} or {@link #previous}.
|
||||
* Though a direction change can occur without calling {@link #reset} first,
|
||||
* this operation comes with some speed penalty.
|
||||
* Match results in the forward direction will match the result matches in
|
||||
* the backwards direction in the reverse order.
|
||||
* <p>
|
||||
* {@link SearchIterator} provides APIs to specify the starting position
|
||||
* within the text string to be searched, e.g. {@link SearchIterator#setIndex setIndex},
|
||||
* {@link SearchIterator#preceding preceding} and {@link SearchIterator#following following}. Since the
|
||||
* starting position will be set as it is specified, please take note that
|
||||
* there are some danger points at which the search may render incorrect
|
||||
* results:
|
||||
* <ul>
|
||||
* <li> The midst of a substring that requires decomposition.
|
||||
* <li> The midst of a substring that requires normalization.
|
||||
* <li> If the following match is to be found, the position should not be the
|
||||
* second character which requires to be swapped with the preceding
|
||||
* character. Vice versa, if the preceding match is to be found,
|
||||
* position to search from should not be the first character which
|
||||
* second character which requires to be swapped with the preceding
|
||||
* character. Vice versa, if the preceding match is to be found,
|
||||
* position to search from should not be the first character which
|
||||
* requires to be swapped with the next character. E.g. certain Thai and
|
||||
* Lao characters require swapping.
|
||||
* <li> If a following pattern match is to be found, any position within a
|
||||
* contracting sequence except the first will fail. Vice versa if a
|
||||
* preceding pattern match is to be found, an invalid starting point
|
||||
* <li> If a following pattern match is to be found, any position within a
|
||||
* contracting sequence except the first will fail. Vice versa if a
|
||||
* preceding pattern match is to be found, an invalid starting point
|
||||
* would be any character within a contracting sequence except the last.
|
||||
* </ul>
|
||||
* </p>
|
||||
* <p>
|
||||
* Though collator attributes will be taken into consideration while
|
||||
* performing matches, there are no APIs provided in StringSearch for setting
|
||||
* and getting the attributes. These attributes can be set by getting the
|
||||
* collator from <tt>getCollator</tt> and using the APIs in
|
||||
* <tt>com.ibm.icu.text.Collator</tt>. To update StringSearch to the new
|
||||
* collator attributes, <tt>reset()</tt> or
|
||||
* <tt>setCollator(RuleBasedCollator)</tt> has to be called.
|
||||
* </p>
|
||||
* A {@link BreakIterator} can be used if only matches at logical breaks are desired.
|
||||
* Using a {@link BreakIterator} will only give you results that exactly match the
|
||||
* boundaries given by the {@link BreakIterator}. For instance the pattern "e" will
|
||||
* not be found in the string "\u00e9" if a character break iterator is used.
|
||||
* <p>
|
||||
* Consult the
|
||||
* <a href="http://www.icu-project.org/userguide/searchString.html">
|
||||
* String Search</a> user guide and the <code>SearchIterator</code>
|
||||
* documentation for more information and examples of use.
|
||||
* </p>
|
||||
* Options are provided to handle overlapping matches.
|
||||
* E.g. in English, overlapping matches produce the results 0 and 2
|
||||
* for the pattern "abab" in the text "ababab", whereas mutually
|
||||
* exclusive matches only produce the result of 0.
|
||||
* <p>
|
||||
* This class is not subclassable
|
||||
* Though collator attributes will be taken into consideration while
|
||||
* performing matches, there are no APIs here for setting and getting the
|
||||
* attributes. These attributes can be set by getting the collator
|
||||
* from {@link #getCollator} and using the APIs in {@link RuleBasedCollator}.
|
||||
* Lastly to update <tt>StringSearch</tt> to the new collator attributes,
|
||||
* {@link #reset} has to be called.
|
||||
* <p>
|
||||
* Restriction: <br>
|
||||
* Currently there are no composite characters that consist of a
|
||||
* character with combining class > 0 before a character with combining
|
||||
* class == 0. However, if such a character exists in the future,
|
||||
* <tt>StringSearch</tt> does not guarantee the results for option 1.
|
||||
* <p>
|
||||
* Consult the {@link SearchIterator} documentation for information on
|
||||
* and examples of how to use instances of this class to implement text
|
||||
* searching.
|
||||
* <p>
|
||||
* Note, <tt>StringSearch</tt> is not to be subclassed.
|
||||
* </p>
|
||||
* @see SearchIterator
|
||||
* @see RuleBasedCollator
|
||||
* @author Laura Werner, synwee
|
||||
* @stable ICU 2.0
|
||||
* @since ICU 2.0
|
||||
*/
|
||||
// internal notes: all methods do not guarantee the correct status of the
|
||||
// characteriterator. the caller has to maintain the original index position
|
||||
|
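The canonical-equivalence claim in the class documentation above ("a\u0325\u0300" is canonically equivalent to "a\u0300\u0325") can be checked with the JDK's own `java.text.Normalizer`. This is a minimal sketch that only compares NFD forms; it does not reproduce StringSearch's canonical match mode, and `CanonicalEquivalenceSketch` is a hypothetical name.

```java
import java.text.Normalizer;

// Illustration only: compares strings by their NFD forms.
final class CanonicalEquivalenceSketch {
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Two strings are canonically equivalent when their NFD forms are equal.
    static boolean canonicallyEquivalent(String a, String b) {
        return nfd(a).equals(nfd(b));
    }

    public static void main(String[] args) {
        String target = "a\u0325\u0300";    // a + ring below + grave
        String reordered = "a\u0300\u0325"; // a + grave + ring below
        String pattern = "a\u0300";         // a + grave

        // The two mark orders normalize to the same sequence.
        System.out.println(canonicallyEquivalent(target, reordered)); // true

        // A plain code-unit search does not find the pattern in the target,
        // because the ring below sits between 'a' and the grave accent.
        System.out.println(nfd(target).contains(nfd(pattern)));       // false

        // The reordered, canonically equivalent string does contain it.
        System.out.println(reordered.contains(pattern));              // true
    }
}
```

This is why a search that works only on the stored code unit order cannot find "a\u0300" here, while a search that considers canonical equivalents of the target can.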
@ -165,8 +126,9 @@ import com.ibm.icu.util.ULocale;
|
|||
public final class StringSearch extends SearchIterator {
|
||||
|
||||
/**
|
||||
* DONE is returned by previous() and next() after all valid matches have
|
||||
* been returned, and by first() and last() if there are no matches at all.
|
||||
* DONE is returned by {@link #previous()} and {@link #next()} after all valid matches have
|
||||
* been returned, and by {@link SearchIterator#first() first()} and
|
||||
* {@link SearchIterator#last() last()} if there are no matches at all.
|
||||
* @see #previous
|
||||
* @see #next
|
||||
* @stable ICU 2.0
|
||||
|
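Iterating until DONE interacts with the overlapping option mentioned in the class documentation. The sketch below reproduces the documented offsets with plain literal matching. It is an illustration under stated assumptions, not ICU4J code: `OverlapSketch` is a hypothetical helper, and an empty loop result corresponds to DONE being returned on the first call.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: literal matching with and without overlapping matches.
final class OverlapSketch {
    // Collects match offsets from the start of the text to the end.
    static List<Integer> forward(String pattern, String text, boolean overlapping) {
        List<Integer> offsets = new ArrayList<>();
        int from = 0;
        while (true) {
            int index = text.indexOf(pattern, from);
            if (index < 0) {
                break; // a real iterator would return DONE here
            }
            offsets.add(index);
            // Overlapping: resume one position later; otherwise skip the match.
            from = overlapping ? index + 1 : index + pattern.length();
        }
        return offsets;
    }

    // Collects match offsets from the end of the text to the start.
    static List<Integer> backward(String pattern, String text, boolean overlapping) {
        List<Integer> offsets = new ArrayList<>();
        int from = text.length() - pattern.length();
        while (from >= 0) {
            int index = text.lastIndexOf(pattern, from);
            if (index < 0) {
                break;
            }
            offsets.add(index);
            from = overlapping ? index - 1 : index - pattern.length();
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(forward("abab", "ababab", true));     // [0, 2]
        System.out.println(forward("abab", "ababab", false));    // [0]
        System.out.println(forward("aba", "ababababa", false));  // [0, 4]
        System.out.println(backward("aba", "ababababa", false)); // [6, 2]
    }
}
```

The last two calls show the case where, with overlapping turned off, the backward sequence is not simply the forward sequence reversed.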
@ -198,19 +160,18 @@ public final class StringSearch extends SearchIterator {
|
|||
/**
|
||||
* Initializes the iterator to use the language-specific rules defined in
|
||||
* the argument collator to search for argument pattern in the argument
|
||||
* target text. The argument breakiter is used to define logical matches.
|
||||
* target text. The argument <code>breakiter</code> is used to define logical matches.
|
||||
* See super class documentation for more details on the use of the target
|
||||
* text and BreakIterator.
|
||||
* text and {@link BreakIterator}.
|
||||
* @param pattern text to look for.
|
||||
* @param target target text to search for pattern.
|
||||
* @param collator RuleBasedCollator that defines the language rules
|
||||
* @param collator {@link RuleBasedCollator} that defines the language rules
|
||||
* @param breakiter A {@link BreakIterator} that is used to determine the
|
||||
* boundaries of a logical match. This argument can be null.
|
||||
* @exception IllegalArgumentException thrown when argument target is null,
|
||||
* @throws IllegalArgumentException thrown when argument target is null,
|
||||
* or of length 0
|
||||
* @see BreakIterator
|
||||
* @see RuleBasedCollator
|
||||
* @see SearchIterator
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator,
|
||||
|
@ -259,14 +220,13 @@ public final class StringSearch extends SearchIterator {
|
|||
/**
|
||||
* Initializes the iterator to use the language-specific rules defined in
|
||||
* the argument collator to search for argument pattern in the argument
|
||||
* target text. No BreakIterators are set to test for logical matches.
|
||||
* target text. No {@link BreakIterator}s are set to test for logical matches.
|
||||
* @param pattern text to look for.
|
||||
* @param target target text to search for pattern.
|
||||
* @param collator RuleBasedCollator that defines the language rules
|
||||
* @exception IllegalArgumentException thrown when argument target is null,
|
||||
* @param collator {@link RuleBasedCollator} that defines the language rules
|
||||
* @throws IllegalArgumentException thrown when argument target is null,
|
||||
* or of length 0
|
||||
* @see RuleBasedCollator
|
||||
* @see SearchIterator
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator) {
|
||||
|
@ -277,17 +237,12 @@ public final class StringSearch extends SearchIterator {
|
|||
* Initializes the iterator to use the language-specific rules and
|
||||
* break iterator rules defined in the argument locale to search for
|
||||
* argument pattern in the argument target text.
|
||||
* See super class documentation for more details on the use of the target
|
||||
* text and BreakIterator.
|
||||
* @param pattern text to look for.
|
||||
* @param target target text to search for pattern.
|
||||
* @param locale locale to use for language and break iterator rules
|
||||
* @exception IllegalArgumentException thrown when argument target is null,
|
||||
* @throws IllegalArgumentException thrown when argument target is null,
|
||||
* or of length 0. ClassCastException thrown if the collator for
|
||||
* the specified locale is not a RuleBasedCollator.
|
||||
* @see BreakIterator
|
||||
* @see RuleBasedCollator
|
||||
* @see SearchIterator
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public StringSearch(String pattern, CharacterIterator target, Locale locale) {
|
||||
|
@ -299,11 +254,11 @@ public final class StringSearch extends SearchIterator {
|
|||
* break iterator rules defined in the argument locale to search for
|
||||
* argument pattern in the argument target text.
|
||||
* See super class documentation for more details on the use of the target
|
||||
* text and BreakIterator.
|
||||
* text and {@link BreakIterator}.
|
||||
* @param pattern text to look for.
|
||||
* @param target target text to search for pattern.
|
||||
* @param locale ulocale to use for language and break iterator rules
|
||||
* @exception IllegalArgumentException thrown when argument target is null,
|
||||
* @param locale locale to use for language and break iterator rules
|
||||
* @throws IllegalArgumentException thrown when argument target is null,
|
||||
* or of length 0. ClassCastException thrown if the collator for
|
||||
* the specified locale is not a RuleBasedCollator.
|
||||
* @see BreakIterator
|
||||
|
@ -318,17 +273,12 @@ public final class StringSearch extends SearchIterator {
|
|||
/**
|
||||
* Initializes the iterator to use the language-specific rules and
|
||||
* break iterator rules defined in the default locale to search for
|
||||
* argument pattern in the argument target text.
|
||||
* See super class documentation for more details on the use of the target
|
||||
* text and BreakIterator.
|
||||
* argument pattern in the argument target text.
|
||||
* @param pattern text to look for.
|
||||
* @param target target text to search for pattern.
|
||||
* @exception IllegalArgumentException thrown when argument target is null,
|
||||
* @throws IllegalArgumentException thrown when argument target is null,
|
||||
* or of length 0. ClassCastException thrown if the collator for
|
||||
* the default locale is not a RuleBasedCollator.
|
||||
* @see BreakIterator
|
||||
* @see RuleBasedCollator
|
||||
* @see SearchIterator
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
public StringSearch(String pattern, String target) {
|
||||
|
@ -337,17 +287,14 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* Gets the {@link RuleBasedCollator} used for the language rules.
|
||||
* <p>
|
||||
* Gets the RuleBasedCollator used for the language rules.
|
||||
* Since <tt>StringSearch</tt> depends on the returned {@link RuleBasedCollator}, any
|
||||
* changes to the {@link RuleBasedCollator} result should follow with a call to
|
||||
* either {@link #reset()} or {@link #setCollator(RuleBasedCollator)} to ensure the correct
|
||||
* search behavior.
|
||||
* </p>
|
||||
* <p>
|
||||
* Since StringSearch depends on the returned RuleBasedCollator, any
|
||||
* changes to the RuleBasedCollator result should follow with a call to
|
||||
* either StringSearch.reset() or
|
||||
* StringSearch.setCollator(RuleBasedCollator) to ensure the correct
|
||||
* search behaviour.
|
||||
* </p>
|
||||
* @return RuleBasedCollator used by this StringSearch
|
||||
* @return {@link RuleBasedCollator} used by this <tt>StringSearch</tt>
|
||||
* @see RuleBasedCollator
|
||||
* @see #setCollator
|
||||
* @stable ICU 2.0
|
||||
|
@ -357,15 +304,11 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* Sets the {@link RuleBasedCollator} to be used for language-specific searching.
|
||||
* <p>
|
||||
* Sets the RuleBasedCollator to be used for language-specific searching.
|
||||
* </p>
|
||||
* <p>
|
||||
* This method causes internal data such as Boyer-Moore shift tables
|
||||
* to be recalculated, but the iterator's position is unchanged.
|
||||
* </p>
|
||||
* @param collator to use for this StringSearch
|
||||
* @exception IllegalArgumentException thrown when collator is null
|
||||
* The iterator's position will not be changed by this method.
|
||||
* @param collator to use for this <tt>StringSearch</tt>
|
||||
* @throws IllegalArgumentException thrown when collator is null
|
||||
* @see #getCollator
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
|
@ -390,7 +333,7 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* Returns the pattern for which StringSearch is searching.
|
||||
* Returns the pattern for which <tt>StringSearch</tt> is searching.
|
||||
* @return the pattern searched for
|
||||
* @stable ICU 2.0
|
||||
*/
|
||||
|
@ -399,13 +342,8 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* <p>
|
||||
* Set the pattern to search for.
|
||||
* </p>
|
||||
* <p>
|
||||
* This method causes internal data such as Boyer-Moore shift tables
|
||||
* to be recalculated, but the iterator's position is unchanged.
|
||||
* </p>
|
||||
* The iterator's position will not be changed by this method.
|
||||
* @param pattern for searching
|
||||
* @see #getPattern
|
||||
* @exception IllegalArgumentException thrown if pattern is null or of
|
||||
|
@ -435,10 +373,8 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* <p>
|
||||
* Set the canonical match mode. See class documentation for details.
|
||||
* The default setting for this property is false.
|
||||
* </p>
|
||||
* @param allowCanonical flag indicator if canonical matches are allowed
|
||||
* @see #isCanonical
|
||||
* @stable ICU 2.8
|
||||
|
@ -449,13 +385,7 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* Set the target text to be searched. Text iteration will hence begin at
|
||||
* the start of the text string. This method is useful if you want to
|
||||
* re-use an iterator to search within a different body of text.
|
||||
* @param text new text iterator to look for match,
|
||||
* @exception IllegalArgumentException thrown when text is null or has
|
||||
* 0 length
|
||||
* @see #getTarget
|
||||
* {@inheritDoc}
|
||||
* @stable ICU 2.8
|
||||
*/
|
||||
@Override
|
||||
|
@ -465,12 +395,7 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* Return the index in the target text where the iterator is currently
|
||||
* positioned at.
|
||||
* If the iteration has gone past the end of the target text or past
|
||||
* the beginning for a backwards search, {@link #DONE} is returned.
|
||||
* @return index in the target text where the iterator is currently
|
||||
* positioned at
|
||||
* {@inheritDoc}
|
||||
* @stable ICU 2.8
|
||||
*/
|
||||
@Override
|
||||
|
@ -483,23 +408,7 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* <p>
|
||||
* Sets the position in the target text which the next search will start
|
||||
* from to the argument. This method clears all previous states.
|
||||
* </p>
|
||||
* <p>
|
||||
* This method takes the argument position and sets the position in the
|
||||
* target text accordingly, without checking if position is pointing to a
|
||||
* valid starting point to begin searching.
|
||||
* </p>
|
||||
* <p>
|
||||
* Search positions that may render incorrect results are highlighted in
|
||||
* the class documentation.
|
||||
* </p>
|
||||
* @param position index to start next search from.
|
||||
* @exception IndexOutOfBoundsException thrown if argument position is out
|
||||
* of the target text range.
|
||||
* @see #getIndex
|
||||
* {@inheritDoc}
|
||||
* @stable ICU 2.8
|
||||
*/
|
||||
@Override
|
||||
|
@ -513,19 +422,7 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* <p>
|
||||
* Resets the search iteration. All properties will be reset to the
|
||||
* default value.
|
||||
* </p>
|
||||
* <p>
|
||||
* Search will begin at the start of the target text if a forward iteration
|
||||
* is initiated before a backwards iteration. Otherwise if a
|
||||
* backwards iteration is initiated before a forwards iteration, the search
|
||||
* will begin at the end of the target text.
|
||||
* </p>
|
||||
* <p>
|
||||
* Canonical match option will be reset to false, ie an exact match.
|
||||
* </p>
|
||||
* {@inheritDoc}
|
||||
* @stable ICU 2.8
|
||||
*/
|
||||
@Override
|
||||
|
@ -581,17 +478,7 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* <p>
|
||||
* Concrete method to provide the mechanism
|
||||
* for finding the next <b>forwards</b> match in the target text.
|
||||
* See super class documentation for its use.
|
||||
* </p>
|
||||
* @param position index in the target text at which the forwards search
|
||||
* should begin.
|
||||
* @return the starting index of the next forwards match if found, DONE
|
||||
* otherwise
|
||||
* @see #handlePrevious(int)
|
||||
* @see #DONE
|
||||
* {@inheritDoc}
|
||||
* @stable ICU 2.8
|
||||
*/
|
||||
@Override
|
||||
|
@ -641,17 +528,7 @@ public final class StringSearch extends SearchIterator {
|
|||
}
|
||||
|
||||
/**
|
||||
* <p>
|
||||
* Concrete method to provide the mechanism
|
||||
* for finding the next <b>backwards</b> match in the target text.
|
||||
* See super class documentation for its use.
|
||||
* </p>
|
||||
* @param position index in the target text at which the backwards search
|
||||
* should begin.
|
||||
* @return the starting index of the next backwards match if found, DONE
|
||||
* otherwise
|
||||
* @see #handleNext(int)
|
||||
* @see #DONE
|
||||
* {@inheritDoc}
|
||||
* @stable ICU 2.8
|
||||
*/
|
||||
@Override
|
||||
|
|