Selecting the Character Set for Your Server

Your server stores all character data as numeric codes. For example, the letter “a” is encoded as 97 in decimal. A character set is a specific collection of characters (including alphabetic and numeric characters, symbols, and nonprinting control characters) and their assigned numerical values, or codes.
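The mapping from characters to codes can be inspected in most programming languages. The following is a minimal Python sketch, used here only for illustration (Python is not part of SAP ASE, and `code_of` is a hypothetical helper name):

```python
# Illustrative only: look up the numeric code assigned to a character.
def code_of(ch: str) -> int:
    """Return the numeric code (code point) of a single character."""
    return ord(ch)

# Print the decimal code for a few characters from the Latin alphabet.
for ch in ("a", "A", "0"):
    print(ch, code_of(ch))
```

The letter "a" yields 97, matching the example above.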

A character set generally contains the characters for an alphabet, for example, the Latin alphabet used in the English language, or a script such as Cyrillic used with languages such as Russian, Serbian, and Bulgarian. Character sets that are platform-specific and support a subset of languages, for example, the Western European languages, are called native or national character sets. All character sets that come with SAP ASE, except for Unicode UTF-8, are native character sets.

A script is a writing system: the collection of all the elements that characterize the written form of a human language, for example, Latin, Japanese, or Arabic. Depending on the languages supported by an alphabet or script, a character set can support one or more languages. For example, the Latin alphabet supports the languages of Western Europe (see the table below), whereas the Japanese script supports only one language, Japanese. Therefore, the Group 1 character sets support multiple languages, while many character sets, such as those in Group 101, support only one language in addition to English.

The language or languages covered by a character set are called a language group. A language group can contain many languages or only one language; a native character set is the platform-specific encoding of the characters for the language or languages of a particular language group.

Within a client/server network, you can support data processing in multiple languages if all the languages belong to the same language group (see the table below). For example, if data in the server is encoded in a Group 1 character set, you could have French, German, and Italian data and any of the other Group 1 languages in the same database. However, you cannot store data from another language group in the same database. For example, you cannot store Japanese data with French or German data.
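The restriction to a single language group can be demonstrated outside the server. The sketch below is a Python illustration, not SAP ASE behavior: it assumes Python's `cp1252` codec as a stand-in for the Group 1 character set CP 1252, and `can_store` is a hypothetical helper name.

```python
# Illustrative only: check whether text fits in a given native character set.
def can_store(text: str, charset: str) -> bool:
    """Return True if every character in text has a code in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

GROUP1_CHARSET = "cp1252"  # Python's codec name for CP 1252 (assumption)

# German, French, and Italian text all fit in the same Group 1 character set.
print(can_store("Grüße, café, città", GROUP1_CHARSET))
# Japanese text belongs to another language group and does not fit.
print(can_store("日本語", GROUP1_CHARSET))
```

The first check succeeds and the second fails, because CP 1252 assigns no codes to Japanese characters.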

Unlike the native character sets just described, Unicode is an international character set that supports over 650 of the world’s languages, such as Japanese, Chinese, Russian, French, and German. Unicode allows you to mix different languages from different language groups in the same server, no matter what the platform.
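As a rough illustration of how one encoding can cover several language groups, the following Python sketch encodes mixed-language text as UTF-8 and decodes it again. It is an example only and does not reflect how SAP ASE stores data internally; `utf8_length` is a hypothetical helper name.

```python
# Illustrative only: UTF-8 can represent text from several language groups.
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

mixed = "日本語 Русский Français Deutsch"

# Encoding and decoding returns the original text unchanged.
round_trip = mixed.encode("utf-8").decode("utf-8")
print(round_trip == mixed)

# UTF-8 is a variable-length encoding: different characters need
# different numbers of bytes.
for ch in ("a", "é", "Я", "日"):
    print(ch, utf8_length(ch))
```

Because UTF-8 uses between one and four bytes per character, the same text may occupy more storage than it would in a native character set.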

Since all character sets support the Latin script, and therefore English, a character set always supports at least two languages—English and one other language.

Many languages are supported by more than one character set. The character set you install for a language depends on the client’s platform and operating system.

Supported languages and character sets

| Language group | Languages | Character sets |
|---|---|---|
| Group 1 | Western European: Albanian, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, Swedish | ASCII 8, CP 437, CP 850, CP 860, CP 863, CP 1252, ISO 8859-1, ISO 8859-15, Macintosh Roman, ROMAN8, ROMAN9, ISO-15, CP 858 |
| Group 2 | Eastern European: Croatian, Czech, Estonian, Hungarian, Latvian, Lithuanian, Polish, Romanian, Slovak, Slovene (and English) | CP 852, CP 1250, ISO 8859-2, Macintosh Central European |
| Group 4 | Baltic (and English) | CP 1257 |
| Group 5 | Cyrillic: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian (and English) | CP 855, CP 866, CP 1251, ISO 8859-5, Koi8, Macintosh Cyrillic |
| Group 6 | Arabic (and English) | CP 864, CP 1256, ISO 8859-6 |
| Group 7 | Greek (and English) | CP 869, CP 1253, GREEK8, ISO 8859-7, Macintosh Greek |
| Group 8 | Hebrew (and English) | CP 1255, ISO 8859-8 |
| Group 9 | Turkish (and English) | CP 857, CP 1254, ISO 8859-9, Macintosh Turkish, TURKISH8 |
| Group 101 | Japanese (and English) | CP 932, DEC Kanji, EUC-JIS, Shift-JIS |
| Group 102 | Simplified Chinese (PRC) (and English) | CP 936, EUC-GB, GB18030 |
| Group 103 | Traditional Chinese (ROC) (and English) | Big 5, CP 950, EUC-CNS, Big 5 HKSCS |
| Group 104 | Korean (and English) | EUC-KSC, CP 949 |
| Group 105 | Thai (and English) | CP 874, TIS 620 |
| Group 106 | Vietnamese (and English) | CP 1258 |
| Unicode | Over 650 languages | UTF-8 |

CP 1252 is identical to ISO 8859-1 except for the 0x80–0x9F code points, which are mapped to characters in CP 1252.

CP 950 is identical to Big 5.

Note: The English language is supported by all character sets because the first 128 (decimal) characters of any character set include the Latin alphabet (defined as “ASCII-7”). The characters beyond the first 128 differ between character sets and are used to support the characters in different native languages. For example, code points 0-127 of both CP 932 and CP 874 support English and the Latin alphabet. However, code points 128-255 support Japanese characters in CP 932 and Thai characters in CP 874.
The following character sets support the European currency symbol, the “euro”: CP 1252 (Western Europe); CP 1250 (Eastern Europe); CP 1251 (Cyrillic); CP 1256 (Arabic); CP 1253 (Greek); CP 1255 (Hebrew); CP 1254 (Turkish); CP 874 (Thai); iso15, roman9, and CP 858. Unicode UTF-8 also supports the euro symbol.
Note: iso_1 and ISO 8859-1 are different names for the same character set.
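
The notes above can be checked with a short Python sketch. It is illustrative only: the codec names (`cp932`, `cp874`, `iso8859_15`, and so on) are Python's names for these encodings, assumed here to correspond to the character sets listed above, and `decode_byte` and `supports_euro` are hypothetical helper names.

```python
# Illustrative only: compare how native character sets assign codes.
EURO = "\u20ac"  # the euro currency symbol

def decode_byte(value: int, charset: str) -> str:
    """Decode a single byte value using the given character set."""
    return bytes([value]).decode(charset)

def supports_euro(charset: str) -> bool:
    """Return True if the character set assigns a code to the euro symbol."""
    try:
        EURO.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Code 65 (0x41) lies in the first 128 codes and is "A" in both sets.
print(decode_byte(0x41, "cp932"), decode_byte(0x41, "cp874"))

# Code 161 (0xA1) lies beyond the first 128 codes: it is a Japanese
# half-width punctuation mark in CP 932 and a Thai letter in CP 874.
print(decode_byte(0xA1, "cp932"), decode_byte(0xA1, "cp874"))

# Check euro support for the sets that Python provides codecs for.
for name in ("cp1252", "cp1250", "cp1251", "cp1256", "cp1253",
             "cp1255", "cp1254", "cp874", "iso8859_15", "cp858",
             "utf-8", "iso8859_1"):
    print(name, supports_euro(name))
```

In this sketch ISO 8859-1 reports no euro support while CP 1252 does, which is one visible consequence of the differing 0x80–0x9F code points.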

To mix languages from different language groups you must use Unicode. If your server character set is Unicode, you can support more than 650 languages in a single server and mix languages from any language group.