java.lang.Object java.text.Collator java.text.RuleBasedCollator
The RuleBasedCollator class is a concrete subclass of Collator that provides a simple, data-driven, table collator. With this class you can create a customized table-based Collator. RuleBasedCollator maps characters to sort keys.
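RuleBasedCollator has the following restrictions for efficiency (other subclasses may be used for more complex languages):
    1. If a special collation rule controlled by a <modifier> is specified, it applies to the whole collator object.
    2. All non-mentioned characters are at the end of the collation order.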
The collation table is composed of a list of collation rules, where each rule is of one of three forms:

    <modifier>
    <relation> <text-argument>
    <reset> <text-argument>

The definitions of the rule elements are as follows:
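Text-Argument: A text-argument is any sequence of characters, excluding special characters (whitespace and rule syntax characters). A special character can be included by putting it in single quotes (e.g. ampersand => '&'). Unquoted whitespace is ignored; e.g. "b c" is treated as "bc".
Modifier:
    '@' : Indicates that accents are sorted backwards, as in French.
Relation:
    '<' : Greater, as a letter difference (primary)
    ';' : Greater, as an accent difference (secondary)
    ',' : Greater, as a case difference (tertiary)
    '=' : Equal
Reset:
    '&' : Indicates that the next rule follows the position to where the reset text-argument would be sorted.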
This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:

    a < b < c
    a < b & b < c
    a < c & a < b

Notice that the order is important, as the subsequent item goes immediately after the text-argument. The following are not equivalent:

    a < b & a < c
    a < c & a < b

Either the text-argument must already be present in the sequence, or some initial substring of the text-argument must be present (e.g. "a < b & ae < e" is valid since "a" is present in the sequence before "ae" is reset). In this latter case, "ae" is not entered and treated as a single character; instead, "e" is sorted as if it were expanded to two characters: "a" followed by an "e". This difference appears in natural languages: in traditional Spanish "ch" is treated as though it contracts to a single character (expressed as "c < ch < d"), while in traditional German a-umlaut is treated as though it expanded to two characters (expressed as "a,A < b,B ... & ae;\u00e4 & AE;\u00c4"). [\u00e4 and \u00c4 are the escape sequences for lowercase and uppercase a-umlaut.]
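The effect of the reset position can be checked with a small program. The following is a minimal sketch, not part of the original documentation: the class name ResetOrderDemo and its helper methods are hypothetical, and each rule fragment is written with the leading "<" that a complete rule string needs.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class ResetOrderDemo {
    // Builds a collator, converting the checked ParseException into an unchecked one.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad rules: " + rules, e);
        }
    }

    // Returns the sign of comparing "b" with "c" under the given rules.
    static int signOfBvsC(String rules) {
        return Integer.signum(build(rules).compare("b", "c"));
    }

    public static void main(String[] args) {
        // Three spellings of the same order a < b < c: "b" sorts before "c".
        System.out.println(signOfBvsC("< a < b < c"));
        System.out.println(signOfBvsC("< a < b & b < c"));
        System.out.println(signOfBvsC("< a < c & a < b"));
        // Here "c" is placed immediately after "a", so it ends up before "b".
        System.out.println(signOfBvsC("< a < b & a < c"));
    }
}
```

The first three calls should agree with each other, while the last one gives the opposite sign because the item following a reset goes immediately after the reset text-argument.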
For ignorable characters, the first rule must start with a relation (the examples we have used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all the text-arguments up to the first "<" are ignorable. For example, ", '-' < a < b" makes "-" an ignorable character (the hyphen is quoted because it is a rule syntax character), as we saw earlier in the word "black-birds". In the samples for different languages, you see that most accents are ignorable.
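A short sketch of an ignorable character in practice follows. It assumes that the hyphen has to be written in single quotes, since it falls in the range of rule syntax characters; the class name IgnorableDemo and the three-letter alphabet are hypothetical and chosen only for illustration.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class IgnorableDemo {
    // Builds a collator, converting the checked ParseException into an unchecked one.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad rules: " + rules, e);
        }
    }

    // Compares two strings at the given strength, with '-' declared before the first "<".
    static int compareAt(int strength, String s, String t) {
        RuleBasedCollator collator = build(", '-' < a < b < c");
        collator.setStrength(strength);
        return collator.compare(s, t);
    }

    public static void main(String[] args) {
        // At PRIMARY strength the ignorable hyphen does not affect the result.
        System.out.println(compareAt(Collator.PRIMARY, "a-b", "ab"));
        // At TERTIARY strength the hyphen can still act as a tie-breaker.
        System.out.println(compareAt(Collator.TERTIARY, "a-b", "ab"));
    }
}
```

Whether an ignorable character should break ties is a matter of the strength chosen with setStrength, not of the rule string itself.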
Normalization and Accents
RuleBasedCollator automatically processes its rule table to
include both pre-composed and combining-character versions of
accented characters. Even if the provided rule string contains only
base characters and separate combining accent characters, the pre-composed
accented characters matching all canonical combinations of characters from
the rule string will be entered in the table.
This allows you to use a RuleBasedCollator to compare accented strings even when the collator is set to NO_DECOMPOSITION. There are two caveats, however. First, if the strings to be collated contain combining sequences that may not be in canonical order, you should set the collator to CANONICAL_DECOMPOSITION or FULL_DECOMPOSITION to enable sorting of combining sequences. Second, if the strings contain characters with compatibility decompositions (such as full-width and half-width forms), you must use FULL_DECOMPOSITION, since the rule tables only include canonical mappings.
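The decomposition mode is set through setDecomposition, which is inherited from Collator. The sketch below is illustrative only: the class name DecompositionDemo and the rule string "< a < b" are assumptions, and it simply compares a pre-composed a-ring with the equivalent combining sequence.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class DecompositionDemo {
    // Builds a collator, converting the checked ParseException into an unchecked one.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad rules: " + rules, e);
        }
    }

    // Compares pre-composed a-ring (U+00E5) with 'a' + combining ring above (U+030A).
    static int compareRingA(int decomposition) {
        RuleBasedCollator collator = build("< a < b");
        collator.setDecomposition(decomposition);
        return collator.compare("\u00E5", "a\u030A");
    }

    public static void main(String[] args) {
        // With canonical decomposition both strings are normalized to the same sequence.
        System.out.println(compareRingA(Collator.CANONICAL_DECOMPOSITION));
        // Without decomposition the result depends on what the rule table contains.
        System.out.println(compareRingA(Collator.NO_DECOMPOSITION));
    }
}
```

NO_DECOMPOSITION is the fastest mode, so it is worth keeping when the input is known to be free of unordered combining sequences and compatibility characters.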
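The following are errors:
    A text-argument contains unquoted punctuation symbols (e.g. "a < b-c < d").
    A relation or reset character is not followed by a text-argument (e.g. "a < ,b").
    A reset where the text-argument (or an initial substring of the text-argument) is not already in the sequence (e.g. "a < b & e < f").
If you produce one of these errors, a RuleBasedCollator throws a ParseException.
Examples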
Simple: "< a < b < c < d"
Norwegian: "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I < j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R < s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z < \u00E6, \u00C6 < \u00F8, \u00D8 < \u00E5 = a\u030A, \u00C5 = A\u030A; aa, AA"
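The Norwegian rules use "," between lowercase and uppercase forms, which makes case a tertiary difference. The following sketch shows how such a difference interacts with the collator strength; the class name StrengthDemo and the two-letter rule string are hypothetical and stand in for the full alphabet.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class StrengthDemo {
    // Builds a collator, converting the checked ParseException into an unchecked one.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad rules: " + rules, e);
        }
    }

    // Returns the sign of comparing s with t at the given strength.
    static int compareAt(int strength, String s, String t) {
        RuleBasedCollator collator = build("< a, A < b, B");
        collator.setStrength(strength);
        return Integer.signum(collator.compare(s, t));
    }

    public static void main(String[] args) {
        // Case is a tertiary difference: visible at TERTIARY, ignored at PRIMARY.
        System.out.println(compareAt(Collator.TERTIARY, "a", "A")); // -1
        System.out.println(compareAt(Collator.PRIMARY, "a", "A"));  // 0
        // Different letters differ at the primary level regardless of case.
        System.out.println(compareAt(Collator.PRIMARY, "A", "b"));  // -1
    }
}
```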
To create a RuleBasedCollator object with specialized rules tailored to your needs, you construct the RuleBasedCollator with the rules contained in a String object. For example:

    String simple = "< a< b< c< d";
    RuleBasedCollator mySimple = new RuleBasedCollator(simple);

Or:

    String Norwegian = "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I" +
                       "< j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R" +
                       "< s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z" +
                       "< \u00E6, \u00C6" +     // Latin letter ae & AE
                       "< \u00F8, \u00D8" +     // Latin letter o & O with stroke
                       "< \u00E5 = a\u030A," +  // Latin letter a with ring above
                       "  \u00C5 = A\u030A;" +  // Latin letter A with ring above
                       "  aa, AA";
    RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);
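Because Collator implements Comparator, a collator built this way can be passed directly to a sort. The following is a minimal sketch under that assumption; the class name CustomSortDemo and the deliberately reversed rule string are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CustomSortDemo {
    // Builds a collator, converting the checked ParseException into an unchecked one.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad rules: " + rules, e);
        }
    }

    // Returns a sorted copy of the words, ordered by a collator built from the rules.
    static List<String> sortWith(String rules, List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(build(rules)); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        // A reversed alphabet: c sorts first, then b, then a.
        List<String> sorted = sortWith("< c < b < a", Arrays.asList("ab", "ba", "cb"));
        System.out.println(sorted); // [cb, ba, ab]
    }
}
```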
A new collation rules string can be created by concatenating rules strings. For example, the rules returned by getRules() could be concatenated to combine multiple RuleBasedCollators.
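As a sketch of this idea, the rules of an existing collator can be read back with getRules() and extended with a reset. The class name ConcatRulesDemo, the helper extend, and the rule strings are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class ConcatRulesDemo {
    // Builds a collator, converting the checked ParseException into an unchecked one.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad rules: " + rules, e);
        }
    }

    // Creates a new collator from the base collator's rules plus an add-on.
    static RuleBasedCollator extend(RuleBasedCollator base, String addOn) {
        return build(base.getRules() + addOn);
    }

    public static void main(String[] args) {
        RuleBasedCollator base = build("< a < b < c");
        // Place "z" immediately after "b", ahead of "c".
        RuleBasedCollator extended = extend(base, " & b < z");
        // In the base collator "z" is not mentioned, so it sorts after "c".
        System.out.println(base.compare("z", "c") > 0);     // true
        System.out.println(extended.compare("z", "c") < 0); // true
    }
}
```

The base collator is left unchanged; the concatenated string always produces a separate collator object.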
The following example demonstrates how to change the order of non-spacing accents:

    // old rule
    String oldRules = "=\u0301;\u0300;\u0302;\u0308"    // main accents
                    + ";\u0327;\u0303;\u0304;\u0305"    // main accents
                    + ";\u0306;\u0307;\u0309;\u030A"    // main accents
                    + ";\u030B;\u030C;\u030D;\u030E"    // main accents
                    + ";\u030F;\u0310;\u0311;\u0312"    // main accents
                    + "< a , A ; ae, AE ; \u00e6 , \u00c6"
                    + "< b , B < c, C < e, E & C < d, D";
    // change the order of accent characters
    String addOn = "& \u0300 ; \u0308 ; \u0302";
    RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
|static final int||CHARINDEX|
|static final int||EXPANDCHARINDEX|
|static final int||CONTRACTCHARINDEX|
|static final int||UNMAPPED|
|Fields inherited from java.text.Collator:|
|PRIMARY, SECONDARY, TERTIARY, IDENTICAL, NO_DECOMPOSITION, CANONICAL_DECOMPOSITION, FULL_DECOMPOSITION, LESS, EQUAL, GREATER|
public RuleBasedCollator(String rules) throws ParseException
RuleBasedCollator(String rules, int decomp) throws ParseException
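Since the public constructor declares ParseException, callers need to handle malformed rule strings. The sketch below wraps the constructor in a hypothetical helper, isValid, in a hypothetical class ParseErrorDemo; the rule strings are illustrative.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class ParseErrorDemo {
    // Returns true if the constructor accepts the rule string, false if it is rejected.
    static boolean isValid(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            // getErrorOffset() reports where the parser gave up, when known.
            System.out.println("Rejected: " + e.getMessage() + " at offset " + e.getErrorOffset());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("< a < b < c < d"));  // well-formed rules
        System.out.println(isValid("< a < b-c < d"));    // unquoted punctuation
        System.out.println(isValid("< a < b & e < f"));  // reset on text not in the sequence
    }
}
```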
|Method from java.text.RuleBasedCollator Summary:|
|clone, compare, equals, getCollationElementIterator, getCollationElementIterator, getCollationKey, getRules, getTables, hashCode|
|Methods from java.text.Collator:|
|clone, compare, compare, equals, equals, getAvailableLocales, getCollationKey, getDecomposition, getInstance, getInstance, getStrength, hashCode, setDecomposition, setStrength|
|Methods from java.lang.Object:|
|clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait|
|Method from java.text.RuleBasedCollator Detail:|
public CollationElementIterator getCollationElementIterator(String source)
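A CollationElementIterator walks the collation elements of a string one at a time. The following sketch collects the primary order of each element; the class name ElementIteratorDemo, the helper primaryOrders, and the rule string are hypothetical.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

class ElementIteratorDemo {
    // Builds a collator, converting the checked ParseException into an unchecked one.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad rules: " + rules, e);
        }
    }

    // Returns the primary order of every collation element in the text.
    static List<Integer> primaryOrders(RuleBasedCollator collator, String text) {
        List<Integer> orders = new ArrayList<>();
        CollationElementIterator it = collator.getCollationElementIterator(text);
        int element;
        // next() returns NULLORDER once the end of the text is reached.
        while ((element = it.next()) != CollationElementIterator.NULLORDER) {
            orders.add(CollationElementIterator.primaryOrder(element));
        }
        return orders;
    }

    public static void main(String[] args) {
        // One element per character here; the primary orders follow the rule order.
        System.out.println(primaryOrders(build("< a < b < c"), "abc"));
    }
}
```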
public synchronized CollationKey getCollationKey(String source)
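When the same strings are compared many times, as in a large sort, collation keys can be computed once and then compared directly. The following is a sketch of that approach; the class name CollationKeyDemo, the helper sortByKeys, and the rule string are hypothetical.

```java
import java.text.CollationKey;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CollationKeyDemo {
    // Builds a collator, converting the checked ParseException into an unchecked one.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad rules: " + rules, e);
        }
    }

    // Sorts the words by their collation keys and returns the sorted source strings.
    static List<String> sortByKeys(RuleBasedCollator collator, List<String> words) {
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word)); // computed once per word
        }
        Collections.sort(keys); // CollationKey implements Comparable<CollationKey>
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = build("< c < b < a");
        System.out.println(sortByKeys(collator, Arrays.asList("ab", "ba", "cb"))); // [cb, ba, ab]
    }
}
```

Keys are only comparable with other keys produced by the same collator.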
public String getRules()
public int hashCode()