Creating a custom collating sequence file

This section explains how to create a custom collating sequence file. Before you begin, please read this entire section and familiarize yourself with the collating sequence files included with your Open Client and Open Server products.

“Collating sequence file example” illustrates a collating sequence file.

Appendix B, “External Localization File Syntax,” provides general information about localization file syntax.

To create or change a collating sequence file:

  1. If you plan to use a shipped .srt file as a model, be sure to copy and rename it so you do not overwrite the original file. The new file’s name must include the .srt suffix. In addition, a descriptive name helps to associate the file with the language it supports.
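
The copy-and-rename step can be scripted so that the shipped file is never overwritten. The following is a minimal Python sketch, not part of the product; the function name copy_model_srt and the file names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def copy_model_srt(model: Path, new_name: str) -> Path:
    """Copy a shipped .srt file to a new name in the same directory."""
    if not new_name.endswith(".srt"):
        raise ValueError("the new file name must include the .srt suffix")
    target = model.with_name(new_name)
    if target.exists():
        # Refuse to overwrite any existing file, including the shipped original.
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(model, target)
    return target
```

For example, copy_model_srt(Path("model.srt"), "spanish_custom.srt") leaves model.srt untouched and returns the path of the new copy.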

  2. Determine the values for general entries. Table 6-4 describes these entries:

    Table 6-4: .srt file general entries

    | Entry keyword | Description | Required | Entry value |
    |---|---|---|---|
    | class | The sort order class. Currently, class 1 for 8-bit character sets is the only supported class. | Yes | 0x01 |
    | id | A unique hexadecimal number that identifies the collating sequence. | Yes | For user-defined collating sequences, ID must have a value of 0xC9 through 0xFF. Sybase reserves hexadecimal 0x00 through 0xC8. |
    | menuname | The name of the collating sequence as it is to appear in the sybinit program. | Yes | A string no longer than 64 characters is recommended. sybinit truncates strings to 64 characters. This value is user-defined. |
    | name | The name of the collating sequence. | No | A string no longer than 30 characters. This value is user-defined. |
    | charset | The character set with which this collating sequence file is intended for use. This is also the name of the directory in which this collating sequence file will reside. | Yes | The value must match a character set subdirectory name in the Sybase directory tree. |
    | preference | For sort orders that are not case sensitive, whether to give preference to characters to the left of the equals sign when sorting output generated by a select statement with an order by clause. | No | False – no preference. True – preference for characters to the left of the equals sign. A value of “true” has a greater performance impact than “false.” The default is “true.” |
    | description | Phrase that describes the collating sequence. Stored with the collating sequence. | No | A string no longer than 255 characters. This value is user-defined. |
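
The constraints in Table 6-4 can be checked before the values are typed into the file. The following Python sketch is an illustration only; it assumes the entries have already been collected into a dictionary (class and id as integers, the others as strings), and check_general_entries is a hypothetical helper, not a Sybase utility.

```python
REQUIRED = ("class", "id", "menuname", "charset")
MAX_LENGTH = {"name": 30, "description": 255}


def check_general_entries(entries: dict) -> list:
    """Return a list of problems found in a dict of general entries."""
    problems = []
    for key in REQUIRED:
        if key not in entries:
            problems.append(f"missing required entry: {key}")
    if "class" in entries and entries["class"] != 0x01:
        problems.append("class must be 0x01")
    if "id" in entries and not 0xC9 <= entries["id"] <= 0xFF:
        # 0x00 through 0xC8 are reserved by Sybase.
        problems.append("user-defined id must be 0xC9 through 0xFF")
    for key, limit in MAX_LENGTH.items():
        if len(entries.get(key, "")) > limit:
            problems.append(f"{key} is longer than {limit} characters")
    if len(entries.get("menuname", "")) > 64:
        problems.append("menuname exceeds 64 characters and will be truncated by sybinit")
    return problems
```

An empty list means that the values satisfy the table. A reserved ID such as 0x32 produces a single message about the ID range.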

  3. Determine whether there are any ligatures. A ligature is a single character that is sorted as multiple characters. If there are ligatures:

    • Place the ligature (“lig”) entries together, preceding the “char” entries.

    • Include both the uppercase and lowercase forms of a ligature, if applicable.

    The syntax for a case-sensitive ligature is:

    lig = value, after characters ;case-sensitive sort
    

    where:

    • characters is a string representing the characters after which the ligature will sort.

    • value is the hexadecimal encoding for the ligature character, or the typed or quoted ligature character.

    The syntax for a ligature that is not case sensitive is:

    lig = value1=value2, after characters ;case-insensitive sort
    

    where:

    • value1 and value2 are the hexadecimal encodings for the uppercase and lowercase ligature characters, or the typed or quoted ligature characters.

    • characters is a string representing the characters after which the ligature will sort.

    The following example shows ligature entries in a case-sensitive collating sequence file for ISO 8859-1:

    lig = 0xC6, after AE ;diphthong AE, A with E
    lig = 0xE6, after ae ;diphthong ae, a with e
    char = 0x41,0x61,0xC0,0xE0,0xC1,0xE1,0xC2,0xE2
    ;varieties of letter A
    char = 0x42,0x62 ;B, b
    . . .
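
To make the effect of a ligature entry concrete, the following Python sketch models one reading of “after AE”: the ligature compares as the two characters A and E, and is placed after the plain spelling when the strings are otherwise equal. It is a conceptual illustration only, not how the product implements sorting, and the remaining characters are compared by Python's default code point order rather than by a collating sequence file. The names LIGATURES and sort_key are hypothetical.

```python
# Illustrative ligature table: the ISO 8859-1 AE ligatures and their expansions.
LIGATURES = {"Æ": "AE", "æ": "ae"}


def sort_key(text: str):
    """Expand each ligature and count ligatures so they sort after the plain letters."""
    expanded = "".join(LIGATURES.get(ch, ch) for ch in text)
    ligature_count = sum(ch in LIGATURES for ch in text)
    return (expanded, ligature_count)


words = ["ÆON", "AF", "AEON"]
print(sorted(words, key=sort_key))  # ['AEON', 'ÆON', 'AF']
```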
    
  4. Vertically list all the character entries for the sort order. This vertical list is the primary sort order.

    The syntax for a character entry is:

    char = value
    

    where value is the hexadecimal code set encoding for the character, or the typed or quoted character.

    For example:

    char = 0x41 ;ISO 8859-1 code set.
    
  5. If applicable, add secondary sort order information to the file as follows:

    • For a case-sensitive sort order, put the lowercase variant to the right of the uppercase character (if you want the uppercase character to take precedence). Separate the characters with the list separator character.

    • For a sort order that is not case sensitive, put an equals sign between each uppercase character and its lowercase equivalent (including accented characters).

    • Put a character and its variants in relative order to each other. For example, the French “é” goes to the right of “e.” Make sure these characters are not ligatures or separate primary sort order entries. Separate variants with the list separator character.

    The following example shows secondary sort order information for a Latin alphabet, case-sensitive sort order:

    char = 0x41,0x61,0xC0,0xE0,0xC1,0xE1,0xC2,0xE2,
    0xC3,0xE3,0xC4,0xE4,0xC5,0xE5
    ;A, a, A-grave, a-grave, A-acute, a-acute,
    ;A-circumflex, a-circumflex, A-tilde, a-tilde,
    ;A-diaeresis, a-diaeresis, A-ring, a-ring
    . . .
    char = 0x4E,0x6E,0xD1,0xF1 ;N, n, N-tilde, n-tilde
    . . .
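
The hexadecimal values in entries like these can be generated rather than looked up by hand. The following Python sketch builds one char entry from uppercase and lowercase pairs. It assumes that the list separator is a comma, as in the examples above, and that a case-insensitive entry joins each pair with an equals sign while variants remain comma-separated; hex_code and char_entry are hypothetical helper names.

```python
def hex_code(ch: str, encoding: str = "iso-8859-1") -> str:
    """Return the 0x-prefixed code set encoding of a single-byte character."""
    (byte,) = ch.encode(encoding)  # fails if the character is not one byte
    return f"0x{byte:02X}"


def char_entry(pairs, case_sensitive: bool = True) -> str:
    """Build one char entry from (uppercase, lowercase) pairs, base letter first."""
    inner = "," if case_sensitive else "="
    groups = [inner.join(hex_code(ch) for ch in pair) for pair in pairs]
    return "char = " + ",".join(groups)


n_variants = [("N", "n"), ("Ñ", "ñ")]
print(char_entry(n_variants))                        # char = 0x4E,0x6E,0xD1,0xF1
print(char_entry(n_variants, case_sensitive=False))  # char = 0x4E=0x6E,0xD1=0xF1
```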
    
  6. Determine whether there are any sort doubles. A sort double or digraph is a pair of characters that is sorted as a single character. If there are any sort doubles:

    • List each sort double as a separate “char” entry.

    • For case-sensitive sorting, put all permutations of the sort double in the desired sort order.

    The syntax for a sort double is:

    char = value1value2
    

    where:

    • value1 is the first character in the sort double pair.

    • value2 is the second character in the pair.

    If value1 and value2 are written as hexadecimal numbers, use a leading ‘0x’ with value1 but not with value2. For example:

    char = 0x4348,0x4368,0x6348,0x6368 ;CH,Ch,cH,ch
    

    value1 and value2 can also be typed or quoted characters. For example:

    char = CH, Ch, cH, ch
    

    or

    char = "CH", "Ch", "cH", "ch"
    

    The following example shows the placement of the Spanish sort double “ch” in a case-sensitive .srt file for the iso_1 (ISO 8859-1) character set:

    char = 0x41,0x61,0xC0,0xE0,0xC1,0xE1,0xC2,0xE2
    ;varieties of letter A
    char = 0x42,0x62 ;B, b
    char = 0x43,0x63,0xC7,0xE7 ;C, c, C-cedilla, c-cedilla
    char = 0x4348,0x4368,0x6348,0x6368 ;CH,Ch,cH,ch
    . . .
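
The permutations of a sort double, and the rule of a single leading 0x for the pair, can be generated mechanically. The following Python sketch is illustrative; sort_double_entry is a hypothetical helper, and it lists uppercase forms first to match the example above, which may not be the order you want.

```python
from itertools import product


def sort_double_entry(pair: str, encoding: str = "iso-8859-1") -> str:
    """Build a char entry listing every case permutation of a sort double."""
    first, second = pair
    values = []
    for a, b in product((first.upper(), first.lower()),
                        (second.upper(), second.lower())):
        # One leading 0x, followed by the hexadecimal bytes of both characters.
        values.append("0x" + (a + b).encode(encoding).hex().upper())
    return "char = " + ",".join(values)


print(sort_double_entry("ch"))  # char = 0x4348,0x4368,0x6348,0x6368
```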
    
  7. Include all other characters in the vertical list, such as non-printable characters, characters not on a keyboard, symbols, and characters related to linguistic style. Use “char” or “lig” entries, as appropriate. Be sure to group all “lig” entries together before “char” entries.

    For information on how to write nonalphabetic characters in a collating sequence file, see Table 6-3.

  8. Save the new .srt file in the charset_name subdirectory of the charsets directory, where charset_name is the value of the file’s charset entry.
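
Because the charset entry must match the name of the subdirectory that holds the file, the destination can be derived from that entry and checked before saving. The following Python sketch assumes that the charsets directory sits directly under the Sybase installation root; srt_destination and its parameters are hypothetical names.

```python
from pathlib import Path


def srt_destination(sybase_root: Path, charset: str, file_name: str) -> Path:
    """Return the path where a new .srt file for the given charset belongs."""
    if not file_name.endswith(".srt"):
        raise ValueError("the file name must include the .srt suffix")
    # Assumed layout: <sybase_root>/charsets/<charset>/<file_name>
    charset_dir = sybase_root / "charsets" / charset
    if not charset_dir.is_dir():
        raise FileNotFoundError(f"no character set subdirectory: {charset_dir}")
    return charset_dir / file_name
```

A FileNotFoundError here usually means that the charset value does not match any character set subdirectory name.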

  9. Edit locales file entries, as appropriate, to refer to the new collating sequence file. For more information, see Chapter 5, “Editing the Locales File.”