ICU-21710 Remove BOYER_MOORE dead code from usearch.cpp

Jeff Genovy 2021-08-12 13:17:52 -07:00
parent 9ddda243d7
commit 6d850be783
5 changed files with 25 additions and 2406 deletions

@@ -268,8 +268,8 @@ Werner's text searching article for more details
(<http://icu-project.org/docs/papers/efficient_text_searching_in_java.html>).
However, implementing collation-based search with the Boyer-Moore method
-while getting correct results is very tricky,
-and ICU no longer uses this method.
+while getting correct results is very tricky, and ICU no longer uses this method
+(as of ICU4C 4.0 and ICU4J 53).
Please see the [String Search Service](./string-search) chapter.

@@ -270,20 +270,24 @@ the following `StringSearch` specific considerations:
### Search Algorithm
-ICU4C releases up to 3.8 used the Boyer-Moore search algorithm in the string
+ICU4C (C/C++) releases up to 3.8 used the Boyer-Moore search algorithm in the string
search service. There were some known issues in these previous releases.
(See ICU tickets [ICU-5024](https://unicode-org.atlassian.net/browse/ICU-5024),
[ICU-5382](https://unicode-org.atlassian.net/browse/ICU-5382),
-[ICU-5420](https://unicode-org.atlassian.net/browse/ICU-5420))
+[ICU-5420](https://unicode-org.atlassian.net/browse/ICU-5420)).
-In ICU4C 4.0, the string
-search service was updated with the simple linear search algorithm, which
-locates a match by shifting a cursor in the target text one by one, and these
-issues were fixed. In ICU4C 4.0.1, the Boyer-Moore search code was reintroduced
-as a separated API set as a technology preview. In a later release, this code was deleted.
+In ICU4C 4.0, the string search service was updated to use a simple linear search
+algorithm, which locates a match by shifting a cursor in the target text one by one,
+and these issues were fixed.
-The Boyer-Moore searching
-algorithm is based on automata or combinatorial properties of strings and
+In ICU4C 4.0.1, the Boyer-Moore search code was reintroduced as a separate API with
+technology preview status. However, in ICU4C 51.1, this was removed.
+(See ICU ticket [ICU-9573](https://unicode-org.atlassian.net/browse/ICU-9573)).
+Similarly, in ICU4J 53 (Java) the Boyer-Moore search algorithm was replaced by the
+simple linear search algorithm, ported from ICU4C. (See ICU ticket [ICU-6288](https://unicode-org.atlassian.net/browse/ICU-6288)).
+The Boyer-Moore search algorithm is based on automata or combinatorial properties of strings and
pre-processes the pattern and known to be much faster than the linear search
when search pattern length is longer. According to performance evaluation
between these two implementations, the Boyer-Moore search is faster than the

@@ -195,7 +195,9 @@ determine whether case and accents are ignored during a search.
#### What algorithm are you using to perform the search?
-StringSearch uses a version of the Boyer-Moore search algorithm that has been
+As of ICU4J 53 / ICU4C 4.0, StringSearch uses a simple linear search algorithm which
+locates a match by shifting a cursor in the target text one by one. Previous
+versions of ICU used a version of the Boyer-Moore search algorithm which was
modified for use with Unicode. Rather than using raw Unicode character values in
its comparisons and shift tables, the algorithm uses collation elements that
have been "hashed" down to a smaller range to make the tables a reasonable size.
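
Reviewer note: the "shifting a cursor in the target text one by one" behavior described in the new text can be sketched as below. This is a minimal illustration under stated assumptions, not ICU's implementation: `linearSearch` is a hypothetical helper (not an ICU API), and plain integers stand in for collation elements. Real collation-based matching has more to consider, for example ignorable elements and match boundaries.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT an ICU API. Returns the index of the first
// position at or after `start` where `pattern` occurs in `target`,
// or -1 if there is no such position. The integers are stand-ins for
// collation elements.
long linearSearch(const std::vector<std::uint32_t>& target,
                  const std::vector<std::uint32_t>& pattern,
                  std::size_t start = 0) {
    if (pattern.empty() || pattern.size() > target.size()) {
        return -1;
    }
    // Last cursor position at which the whole pattern still fits.
    const std::size_t last = target.size() - pattern.size();
    // Shift the cursor through the target one element at a time.
    for (std::size_t cursor = start; cursor <= last; ++cursor) {
        std::size_t i = 0;
        while (i < pattern.size() && target[cursor + i] == pattern[i]) {
            ++i;
        }
        if (i == pattern.size()) {
            return static_cast<long>(cursor);  // full pattern matched here
        }
    }
    return -1;
}
```

Unlike Boyer-Moore, this sketch needs no shift tables and never skips positions, which trades speed on long patterns for simpler correctness reasoning.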

@@ -35,8 +35,9 @@
* See the <a href="http://source.icu-project.org/repos/icu/icuhtml/trunk/design/collation/ICU_collation_design.htm">
* "ICU Collation Design Document"</a> for more information.
* <p>
- * The implementation may use a linear search or a modified form of the Boyer-Moore
- * search; for more information on the latter see
+ * As of ICU4C 4.0 / ICU4J 53, the implementation uses a linear search. In previous versions,
+ * a modified form of the Boyer-Moore searching algorithm was used. For more information
+ * on the modified Boyer-Moore algorithm see
* <a href="http://icu-project.org/docs/papers/efficient_text_searching_in_java.html">
* "Efficient Text Searching in Java"</a>, published in <i>Java Report</i>
* in February, 1999.
@@ -595,8 +596,8 @@ U_CAPI UCollator * U_EXPORT2 usearch_getCollator(
/**
* Sets the collator used for the language rules. User retains the ownership
* of this collator, thus the responsibility of deletion lies with the user.
- * This method causes internal data such as Boyer-Moore shift tables to
- * be recalculated, but the iterator's position is unchanged.
+ * This method causes internal data such as the pattern collation elements
+ * and shift tables to be recalculated, but the iterator's position is unchanged.
* @param strsrch search iterator data struct
* @param collator to be used
* @param status for errors if it occurs
@@ -608,7 +609,7 @@ U_CAPI void U_EXPORT2 usearch_setCollator( UStringSearch *strsrch,
/**
* Sets the pattern used for matching.
- * Internal data like the Boyer Moore table will be recalculated, but the
+ * Internal data like the pattern collation elements will be recalculated, but the
* iterator's position is unchanged.
*
* The UStringSearch retains a pointer to the pattern string. The caller must not

File diff suppressed because it is too large.