Glossary

case-sensitive: Describes a collating sequence that distinguishes between uppercase and lowercase characters.

character: A member in a set of elements that represents data in a native language, such as “e,” “ë,” “5,” or “¿.”

character set: A finite set of characters and glyphs that can include letters, ideographs, digits, symbols, and control functions. See also single-byte character set and multibyte character set.

coded character set: A character set in which each character is assigned a numeric code value. Also called a code page.

coded character set conversion: Changing the encoding of characters from one set of numeric codes to another.

When clients and servers use different character sets, coded character set conversion enables them to interpret data the same way.
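As an illustration only, the conversion can be sketched in Python using the standard codecs. The scenario (a server that stores ISO 8859-1 and a client that expects UTF-8) and the helper name convert_charset are assumptions made for this example, not part of any Sybase product.

```python
def convert_charset(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one coded character set to another."""
    text = data.decode(source)   # numeric codes -> abstract characters
    return text.encode(target)   # abstract characters -> new numeric codes


# Hypothetical scenario: the server holds ISO 8859-1, the client expects UTF-8.
server_bytes = "ë".encode("iso-8859-1")
client_bytes = convert_charset(server_bytes, "iso-8859-1", "utf-8")

print(server_bytes.hex())   # eb
print(client_bytes.hex())   # c3ab
```

The character is the same on both sides; only the numeric codes that represent it change.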

collating sequence: The order in which a system sorts text.
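A minimal Python sketch of two collating sequences applied to the same data follows. The word list is invented for the example, and plain code-value ordering stands in for a case-sensitive collating sequence; real collating sequences are usually more elaborate.

```python
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Case-sensitive ordering by code value: uppercase letters sort before lowercase.
binary_order = sorted(words)

# Case-insensitive ordering: fold case before comparing.
# The sort is stable, so words that compare equal keep their input order.
nocase_order = sorted(words, key=str.casefold)

print(binary_order)   # ['Apple', 'Banana', 'apple', 'banana', 'cherry']
print(nocase_order)   # ['Apple', 'apple', 'banana', 'Banana', 'cherry']
```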

digraph: See ligature.

encoding: For character sets, the unique identification of each character with a numeric code.

glyph: The graphic representation of a character. For example, the character “f” can be represented by the glyph “f” or “ƒ.”

ideograph: A character or symbol that represents an idea, such as those used in written Chinese and Japanese.

internationalization: The process of enabling an application to support multiple languages and cultural conventions. An internationalized application uses the language and cultural conventions appropriate to the geographic area in which it is running.

ligature: A single character that is sorted as multiple characters. For example, “Æ” is sorted as “AE,” and “ß” is sorted as “ss.”
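One way to picture this behavior is a sort key that expands each such character before comparison. The Python sketch below assumes a small, illustrative expansion table (EXPANSIONS) and a hypothetical helper (expand_key); an actual collating sequence defines its own expansions.

```python
# Illustrative expansion table; a real collating sequence supplies its own.
EXPANSIONS = {"Æ": "AE", "æ": "ae", "ß": "ss"}


def expand_key(word: str) -> str:
    """Replace each expanding character with the characters it sorts as."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in word)


words = ["Masse", "Maß", "Mast"]

by_code_value = sorted(words)                  # "ß" sorts after every ASCII letter
by_expansion = sorted(words, key=expand_key)   # "Maß" is compared as "Mass"

print(by_code_value)   # ['Masse', 'Mast', 'Maß']
print(by_expansion)    # ['Maß', 'Masse', 'Mast']
```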

locale: 1. A specific geographic or national language region. 2. A collection of information related to a specific geographic or national language region.

locales file: A Sybase-specific file that maps locale names to languages, character sets, and collating sequences. Open Client and Open Server products examine the locales file when loading localization information.

locales structure (CS_LOCALE): A CS-Library structure that is used to define custom localization values in Client-Library and Server-Library applications. The CS-Library routines cs_loc_alloc and cs_loc_drop allocate and drop a locale structure. The CS-Library routine cs_locale loads a locale structure with information.

localization: The process of setting up an application to execute using a specific language and related cultural conventions.

multibyte character set: A character set that includes characters that are encoded using more than one byte, such as EUC JIS and Shift-JIS. A multibyte character set can include characters of varying widths.

single-byte character set: A character set in which all characters are encoded using a single byte.
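The difference between the two kinds of character set can be observed by counting the bytes each character occupies. The following Python sketch uses the standard iso-8859-1 and shift_jis codecs; the helper name byte_widths and the sample strings are assumptions made for illustration.

```python
def byte_widths(text: str, charset: str) -> list:
    """Return the number of bytes each character occupies in the given charset."""
    return [len(ch.encode(charset)) for ch in text]


# Single-byte character set: every character is exactly one byte wide.
single = byte_widths("Aë5", "iso-8859-1")

# Multibyte character set: widths vary from character to character.
multi = byte_widths("A日本", "shift_jis")

print(single)   # [1, 1, 1]
print(multi)    # [1, 2, 2]
```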

sort double: In a collating sequence, a pair of characters that is sorted as a single character. For example, “ch” in Spanish.
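A rough Python sketch of this idea follows. It assumes a simplified alphabet (unaccented lowercase a-z with “ch” placed between “c” and “d”, as in traditional Spanish ordering); the names RANK and collation_key are hypothetical, and characters outside that alphabet simply sort last.

```python
import string

# Simplified alphabet: a-z with the unit "ch" inserted right after "c".
letters = list(string.ascii_lowercase)
letters.insert(letters.index("c") + 1, "ch")
RANK = {unit: position for position, unit in enumerate(letters)}


def collation_key(word: str) -> list:
    """Split a word into sorting units, treating "ch" as a single unit."""
    word = word.lower()
    key, i = [], 0
    while i < len(word):
        unit = "ch" if word.startswith("ch", i) else word[i]
        key.append(RANK.get(unit, len(RANK)))  # unknown characters sort last
        i += len(unit)
    return key


words = ["cuna", "chico", "dama", "cosa"]

print(sorted(words))                      # ['chico', 'cosa', 'cuna', 'dama']
print(sorted(words, key=collation_key))   # ['cosa', 'cuna', 'chico', 'dama']
```

With the pair treated as one unit, every word beginning with “ch” follows all other words beginning with “c”.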

sort order: See collating sequence.

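Unicode: A universal encoded character set, defined by the Unicode Standard. Unicode was originally designed as a 16-bit character set; code values beyond the 16-bit range are represented in UTF-16 using surrogate pairs. Unicode version 1.1 is code-for-code identical to ISO 10646, the international standard universal character set.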

UTF-8: An encoding that is the UCS Transformation Format, 8-bit form. It uses multibyte characters up to 4 bytes long.
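The variable width of UTF-8 can be checked directly with Python's built-in utf-8 codec. The sample characters below are chosen only to illustrate each possible length.

```python
# One sample character for each possible UTF-8 length.
samples = ["A", "ë", "日", "\U0001F600"]

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

print(utf8_lengths)   # [1, 2, 3, 4]
```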

UTF-16: An encoding that is the UCS Transformation Format, 16-bit form. In UTF-16, each code value in the BMP (Basic Multilingual Plane: 0..0xFFFF) is represented by a single 2-byte code unit equal to its UCS-2 value. Code values beyond the BMP are represented using pairs of special codes called surrogate pairs.
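The surrogate pair for a code value beyond the BMP can be computed with simple arithmetic. The Python sketch below uses a hypothetical helper, to_surrogate_pair, and compares its result with the output of Python's built-in utf-16-be codec.

```python
def to_surrogate_pair(code_point: int):
    """Return the (high, low) surrogate pair for a code value beyond the BMP."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code value is not beyond the BMP")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))   # 0xd83d 0xde00

# Cross-check against the built-in codec (big-endian, no byte order mark).
encoded = chr(0x1F600).encode("utf-16-be")
print(encoded.hex())         # d83dde00
```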