The default Unicode sort order is distinctly different from the sort order for the server’s default character set. This separate configuration parameter is a static parameter that requires that you restart your server and reindex the unichar data if it is changed. This sort order is identified using a string parameter, rather than a numeric parameter, to guarantee that the sort order is unique.
Table 9-6 lists the available default Unicode sort orders.
Name |
ID |
Description |
---|---|---|
defaultml |
20 |
Default Unicode multi-lingual ordering |
thaidict |
21 |
Thai dictionary ordering |
iso14651 |
22 |
Ordering as per ISO14651 standard |
utf8bin |
24 |
Ordering for UTF-16 that matches the UTF-8 binary |
binary |
25 |
Binary sort |
altnoacc |
39 |
Alternate accent-insensitive |
altdict |
45 |
Alternate dictionary ordering |
altnocsp |
46 |
Alternate case-insensitive with preference |
scandict |
47 |
Scandinavian dictionary ordering |
scannocp |
48 |
Scandinavian case-insensitive with preference |
bin_utf8 |
50 |
UTF-8 binary sort order |
dict |
51 |
General-purpose dictionary ordering |
nocase |
52 |
General-purpose case-insensitive dictionary ordering |
nocasep |
53 |
General-purpose case-insensitive with preference |
noaccent |
54 |
General-purpose accent-insensitive dictionary ordering |
espdict |
55 |
Spanish dictionary ordering |
espnocs |
56 |
Spanish case-insensitive dictionary ordering |
espnoac |
57 |
Spanish accent-insensitive dictionary ordering |
rusnocs |
59 |
Russian case-insensitive dictionary ordering |
cyrnocs |
64 |
Cyrillic case-insensitive dictionary ordering |
elldict |
65 |
Greek dictionary ordering |
hundict |
69 |
Hungarian dictionary ordering |
hunnoac |
70 |
Hungarian accent-insensitive dictionary ordering |
hunnocs |
71 |
Hungarian case-insensitive dictionary ordering |
turknoac |
73 |
Turkish accent-insensitive dictionary ordering |
Table 9-7 lists the loadable sort orders.
Name |
ID |
Description |
---|---|---|
cp932bin |
129 |
Ordering that matches the binary ordering of CP932 |
dynix |
130 |
Chinese phonetic ordering |
gb3213bn |
137 |
Ordering that matches the binary ordering of GB2312 |
cyrdict |
140 |
Common cyrillic dictionary ordering |
turdict |
155 |
Turkish Dictionary ordering |
euckscbn |
161 |
Ordering that matches the binary ordering of EUCKSC |
gbpinyin |
163 |
Chinese phonetic ordering |
rusdict |
165 |
Russian dictionary ordering |
sjisbin |
179 |
Ordering that matches the binary ordering of SJIS |
eucjisbn |
192 |
Ordering that matches the binary ordering of EUCJIS |
big5bin |
194 |
Ordering that matches the binary ordering of BIG5 |
To view this sort order list in Adaptive Server, use sp_helpsort. See the Reference Manual: Procedures.
You can add sort orders using external files in the $SYBASE/collate/Unicode directory. The names and collation IDs are stored in syscharsets. The names of external Unicode sort orders do not have to be in syscharsets before you can set the default Unicode sort order.
External Unicode sort orders are provided by Sybase. Do not attempt to create external Unicode sort orders.
Sort order associated with Unicode data is completely independent of the sort order associated with traditional character data. All relational expressions involving the Unicode datatypes are performed using the Unicode sort order. This includes mixed-mode expressions involving Unicode and non-Unicode data. For example, in the following query the varchar character constant ‘Mü’ is implicitly cast to unichar and the comparison is performed according to the Unicode sort order:
select * from authors where unicode_name > 'Mü'
The same holds true for all other comparison operators, as well as the concatenation operator “+”, the operator “in”, and the operator “between.” Once again, the goal is to retain compatibility with existing database applications.
Tables joins based on equality (equijoins) deserve special mention. These are generally optimized by the server to take advantage of indexes that defined on the participating columns. When a unichar column is joined with a char column, the latter requires a conversion, and since the character sort order and the Unicode sort order are distinct, the optimizer will ignore the index on the char column.
In Adaptive Server version 12.5.1 and later, when the server’s default character set is configured to UTF-8, you can configure the server's default sort order (for char data) to be any of the above sort orders. Prior to this version, the binary sort order “bin_utf8” (ID=50) was the only well-behaved sort order for UTF-8. Although not required, the sort order for char data in UTF-8 can be selected so that it corresponds with the sort order for unichar.
There is a potential confusion regarding choice of binary sort orders for Unicode. The sort order named “binary” is the most efficient one for unichar data (UTF-16), and is thus the default. This order is based on the Unicode scalar value, meaning that all 32-bit surrogate pairs are placed after all 16-bit Unicode values. The sort order named “utf8bin” is designed to match the order of the default (most efficient) binary order for UTF-8 char data, namely “bin_utf8”. The recommended matching combinations are thus “binary” for unichar and “binary” for UTF-8 char, or “utf8bin” for unichar and “bin_utf8” for UTF-8 char. The former favors unichar efficiency, while the latter favors char efficiency. Avoid using “utf8bin” for UTF-8 char, since it is equivalent to “bin_utf8” but less efficient.