Selecting the default Unicode sort order

The default Unicode sort order is distinctly different from the sort order for the server’s default character set. This separate configuration parameter is a static parameter that requires that you restart your server and reindex the unichar data if it is changed. This sort order is identified using a string parameter, rather than a numeric parameter, to guarantee that the sort order is unique.

Table 7-6 lists the available default Unicode sort orders.

Table 7-6: Default Unicode sort orders

Name

ID

Description

defaulml

20

Default Unicode multilingual ordering

thaidict

21

Thai dictionary ordering

iso14651

22

ISO 14651 ordering

utf8bin

24

Intended only for unichar (UTF-16); matches Unicode UTF-8 binary ordering

binary

25

UTF-16 binary ordering

altnoacc

39

Alternate case- and accent-insensitive dictionary ordering

altdict

45

Alternate dictionary ordering

altnocsp

46

Alternate dictionary sorting with case insensitivity and preference

scandict

47

Scandinavian dictionary ordering

scannocp

48

Scandinavian case-insensitive dictionary ordering with preference

bin_utf8

50

UTF-8 binary sort order

dict

51

English dictionary, meaning:

  • accented characters are adjacent

  • accented characters are distinct

  • uppercase precedes lowercase

nocase

52

Dictionary, case-insensitive. Limited to ASCII only.

nocasep

53

Dictionary, case-insensitive, except in order by, where uppercase precedes lowercase.

noaccent

54

Dictinary, case- and accent-insensitive.

espdict

55

Spanish, dictionary

espnocs

56

Spanish, case insensitive

espnoac

57

Spanish, accent insensitive

rusnocs

59

Russian, case insensitive

cyrnocs

64

Common Cyrillic, case insensitive

elldict

65

Greek, dictionary

hundict

69

Hungarian, dictionary

hunnoac

70

Hungarian, accent insensitive

hunnocs

71

Hungarian, case insensitive

turknoac

73

Turkish, accent insensitive

turknocs

74

Turkish, case insensitive

Table 7-7 lists the loadable sort orders.

Table 7-7: Loadable sort orders

Name

ID

Description

cp932bin

129

Japanese cp932

gb3213bn

137

Chinese gb2312

cyrdict

140

Cyrillic, dictionary

turdict

155

Turkish, dictionary

euckscbn

161

Korean euckcs

gbpinyin

163

Chinese gb2312 pinyin

rusdict

165

Russian, dictionary

sjisbin

179

Japanese, sjis binary

big5bin

194

Chinese b165

To view this sort order list in Adaptive Server, use the sp_helpsort system procedure. See Chapter 1, “System Procedures” in Reference Manual: Procedures for more information on sp_helpsort.

You can add sort orders using external files in the $SYBASE/collate/Unicode directory. The names and collation IDs are stored in SYSCHARSETS. The names of external Unicode sort orders do not have to be in SYSCHARSETS before you can set the default Unicode sort order.

NoteExternal Unicode sort orders are provided by Sybase. Do not attempt to create external Unicode sort orders.

It is important to note that sort order associated with Unicode data is completely independent of the sort order associated with traditional character data. All relational expressions involving the Unicode datatypes will be performed using the Unicode sort order. This includes mixed-mode expressions involving Unicode and non-Unicode data. For example, in the following query the varchar character constant ‘Mü’ will be implicitly cast to unichar and the comparison will be performed according to the Unicode sort order:

select * from authors where unicode_name > 'Mü'

The same holds true for all other comparison operators, as well as the concatenation operator “+”, the operator “in”, and the operator “between.” Once again, the goal is retain compatibility with existing database applications.

Tables joins based on equality (equi-joins) deserve special mention. These are generally optimized by the server to take advantage of indexes that defined on the participating columns. When a unichar column is joined with a char column, the latter will require a conversion, and since the character sort order and the Unicode sort order are distinct, the optimizer will ignore the index on the char column.

In Adaptive Server 12.5.1, when the server’s default character set is configured to UTF-8, you can configure the server's default sort order (for char data) to be any of the above sort orders. Prior to this version, the binary sort order “bin_utf8” (ID=50) was the only well-behaved sort order for UTF-8. Although not required, the sort order for char data in UTF-8 can be selected so that it corresponds with the sort order for unichar.

There is a potential confusion regarding choice of binary sort orders for Unicode. The sort order named “binary” is the most efficient one for unichar data (UTF-16), and is thus the default. This order is based on the Unicode scalar value, meaning that all 32-bit surrogate pairs are placed after all 16-bit Unicode values. The sort order named “utf8bin” is designed to match the order of the default (most efficient) binary order for UTF-8 char data, namely “bin_utf8”. The recommended matching combinations are thus “binary” for unichar and “binary” for UTF-8 char, or “utf8bin” for unichar and “bin_utf8” for UTF-8 char. The former favors unichar efficiency, while the latter favors char efficiency. The use of “utf8bin” for UTF-8 char is to be avoided, since it is equivalent to “bin_utf8” but less efficient.