In your server, all data is stored as numeric codes. For example, the letter “a” is encoded as 97 in decimal. A character set is a specific collection of characters (including alphabetic and numeric characters, symbols, and nonprinting control characters) and their assigned numerical values, or codes.
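To make the idea of character codes concrete, here is a small Python sketch that converts text to its numeric codes and back. It uses Python's built-in codecs as a stand-in for the server's character sets and is not SAP ASE code; the helper names `to_codes` and `from_codes` are hypothetical.

```python
def to_codes(text: str, charset: str) -> list[int]:
    """Encode text in the given character set and return the byte values (the codes)."""
    return list(text.encode(charset))


def from_codes(codes: list[int], charset: str) -> str:
    """Decode a list of byte values back into characters using the given character set."""
    return bytes(codes).decode(charset)


# In ASCII and ASCII-compatible character sets, "a" is stored as the code 97.
print(to_codes("a", "ascii"))             # [97]
print(to_codes("ASE", "ascii"))           # [65, 83, 69]
print(from_codes([97, 98, 99], "ascii"))  # abc
```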
A character set generally contains the characters for an alphabet, for example, the Latin alphabet used in the English language, or a script such as Cyrillic used with languages such as Russian, Serbian, and Bulgarian. Character sets that are platform-specific and support a subset of languages, for example, the Western European languages, are called native or national character sets. All character sets that come with SAP ASE, except for Unicode UTF-8, are native character sets.
A script is a writing system, a collection of all the elements that characterize the written form of a human language—for example, Latin, Japanese, or Arabic. Depending on the languages supported by an alphabet or script, a character set can support one or more languages. For example, the Latin alphabet supports the languages of Western Europe (see table below). On the other hand, the Japanese script supports only one language, Japanese. Therefore, the Group 1 character sets support multiple languages, while many character sets, such as those in Group 101, support only one language.
The language or languages covered by a character set are called a language group. A language group can contain many languages or only one language; a native character set is the platform-specific encoding of the characters for the language or languages of a particular language group.
Within a client/server network, you can support data processing in multiple languages if all the languages belong to the same language group (see the table below). For example, if data in the server is encoded in a Group 1 character set, you could have French, German, and Italian data and any of the other Group 1 languages in the same database. However, you cannot store data from another language group in the same database. For example, you cannot store Japanese data with French or German data.
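The language group rule comes down to whether every character in the data has a code in the server's character set. The Python sketch below checks that with Python's built-in codecs. `can_store` is a hypothetical helper, and `cp1252` and `shift_jis` are Python's codec names for the CP 1252 and Shift-JIS entries in the table, not SAP ASE identifiers. It illustrates the idea only and does not reproduce how the server itself validates data.

```python
def can_store(text: str, charset: str) -> bool:
    """Return True if every character in text has a code in the given character set."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# French, German, and Italian are all Group 1 languages, so CP 1252 can hold them together.
group1_text = "Déjà vu, Grüße, perché"
print(can_store(group1_text, "cp1252"))  # True

# Japanese is in Group 101, so a Group 1 character set has no codes for it.
print(can_store("日本語", "cp1252"))      # False
print(can_store("日本語", "shift_jis"))   # True
```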
Unlike the native character sets just described, Unicode is an international character set that supports over 650 of the world’s languages, such as Japanese, Chinese, Russian, French, and German. Unicode allows you to mix different languages from different language groups in the same server, regardless of platform.
Since all character sets support the Latin script, and therefore English, a character set always supports at least two languages—English and one other language.
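As a rough check of this claim, the Python sketch below encodes an English sentence in one character set per language group and compares the result with plain ASCII. The choice of one character set per group and the `SAMPLE_CHARSETS` name are illustrative assumptions, and Python's codec names stand in for the server's character set names.

```python
# One character set per language group, using Python codec names
# (an illustrative selection; the server's own character set names differ).
SAMPLE_CHARSETS = {
    "Group 1": "cp1252",
    "Group 2": "cp1250",
    "Group 4": "cp1257",
    "Group 5": "cp1251",
    "Group 6": "cp1256",
    "Group 7": "cp1253",
    "Group 8": "cp1255",
    "Group 9": "cp1254",
    "Group 101": "shift_jis",
    "Group 102": "gb18030",
    "Group 103": "big5",
    "Group 104": "euc_kr",
    "Group 105": "tis_620",
    "Group 106": "cp1258",
    "Unicode": "utf-8",
}

english = "The quick brown fox jumps over the lazy dog"

for group, charset in SAMPLE_CHARSETS.items():
    # English letters get the same codes as in ASCII, so the encoded bytes match exactly.
    same = english.encode(charset) == english.encode("ascii")
    print(f"{group}: {charset} -> {same}")
```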
Many languages are supported by more than one character set. The character set you install for a language depends on the client’s platform and operating system.
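Because each platform's character set assigns its own codes, the same character can be stored as different numbers. This Python sketch shows the Russian letter “Я” in four Group 5 character sets: CP 866 (a DOS code page), CP 1251 (a Windows code page), KOI8-R (Python's `koi8_r`, assumed here to correspond to the Koi8 entry in the table), and ISO 8859-5. Python's codec names are stand-ins for the server's names. Reading bytes with the wrong character set produces the wrong character.

```python
# Python codec names for four Cyrillic (Group 5) character sets.
CYRILLIC_CHARSETS = ["cp866", "cp1251", "koi8_r", "iso8859_5"]

# The same letter has a different single-byte code in each character set.
codes = {cs: hex("Я".encode(cs)[0]) for cs in CYRILLIC_CHARSETS}
print(codes)
# {'cp866': '0x9f', 'cp1251': '0xdf', 'koi8_r': '0xf1', 'iso8859_5': '0xcf'}

# Bytes written by a CP 866 client, read back as if they were CP 1251.
raw = "Я".encode("cp866")
print(raw.decode("cp1251") == "Я")  # False: wrong character set, wrong character
print(raw.decode("cp866") == "Я")   # True
```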
| Language group | Languages | Character sets |
|---|---|---|
| Group 1 | Western European: Albanian, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, Swedish | ASCII 8, CP 437, CP 850, CP 860, CP 863, CP 1252, ISO 8859-1, ISO 8859-15, Macintosh Roman, ROMAN8, ROMAN9, ISO-15, CP 858 |
| Group 2 | Eastern European: Croatian, Czech, Estonian, Hungarian, Latvian, Lithuanian, Polish, Romanian, Slovak, Slovene (and English) | CP 852, CP 1250, ISO 8859-2, Macintosh Central European |
| Group 4 | Baltic (and English) | CP 1257 |
| Group 5 | Cyrillic: Bulgarian, Byelorussian, Macedonian, Russian, Serbian, Ukrainian (and English) | CP 855, CP 866, CP 1251, ISO 8859-5, Koi8, Macintosh Cyrillic |
| Group 6 | Arabic (and English) | CP 864, CP 1256, ISO 8859-6 |
| Group 7 | Greek (and English) | CP 869, CP 1253, GREEK8, ISO 8859-7, Macintosh Greek |
| Group 8 | Hebrew (and English) | CP 1255, ISO 8859-8 |
| Group 9 | Turkish (and English) | CP 857, CP 1254, ISO 8859-9, Macintosh Turkish, TURKISH8 |
| Group 101 | Japanese (and English) | CP 932, DEC Kanji, EUC-JIS, Shift-JIS |
| Group 102 | Simplified Chinese (PRC) (and English) | CP 936, EUC-GB, GB18030 |
| Group 103 | Traditional Chinese (ROC) (and English) | Big 5, CP 950, EUC-CNS, Big 5 HKSCS |
| Group 104 | Korean (and English) | EUC-KSC, CP 949 |
| Group 105 | Thai (and English) | CP 874, TIS 620 |
| Group 106 | Vietnamese (and English) | CP 1258 |
| Unicode | Over 650 languages | UTF-8 |

Note: CP 1252 is identical to ISO 8859-1 except for the code points 0x80–0x9F, which are mapped to characters in CP 1252.

Note: CP 950 is identical to Big 5.
Traditional Chinese on the Windows and Solaris platforms
Arabic, Hebrew, Thai, and Russian on the Linux platform
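The Group 1 entry in the table notes that CP 1252 is identical to ISO 8859-1 except for the code points 0x80 through 0x9F. The Python sketch below checks this by decoding every single-byte value with both of Python's codecs (`cp1252` and `latin_1`) and listing the values that differ. `differing_bytes` is a hypothetical helper. Python's `cp1252` codec leaves a few positions in that range undefined, so they are decoded as the replacement character here; they still count as differences.

```python
def differing_bytes(charset_a: str, charset_b: str) -> list[int]:
    """Return every single-byte value that decodes to a different character in two charsets."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" turns bytes a codec leaves undefined into U+FFFD instead of raising.
        if raw.decode(charset_a, errors="replace") != raw.decode(charset_b, errors="replace"):
            diffs.append(value)
    return diffs


diffs = differing_bytes("cp1252", "latin_1")
print(f"{len(diffs)} differing bytes: 0x{diffs[0]:02X}-0x{diffs[-1]:02X}")
# 32 differing bytes: 0x80-0x9F

# 0x80 is the euro sign in CP 1252 but a control character in ISO 8859-1.
print(repr(b"\x80".decode("cp1252")), repr(b"\x80".decode("latin_1")))
# '€' '\x80'
```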
To mix languages from different language groups you must use Unicode. If your server character set is Unicode, you can support more than 650 languages in a single server and mix languages from any language group.
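The following Python sketch illustrates why Unicode is needed for mixed language groups. It takes text that combines German (Group 1), Japanese (Group 101), and Russian (Group 5), shows that a native character set such as CP 1252 cannot encode it, and shows that UTF-8 round-trips it unchanged. Python's codecs stand in for the server's character sets; this is not SAP ASE code.

```python
# Text mixing Group 1 (German), Group 101 (Japanese), and Group 5 (Russian) languages.
mixed = "Grüße 日本語 Привет"

# A native character set such as CP 1252 has no codes for the Japanese or Cyrillic characters.
try:
    mixed.encode("cp1252")
    print("cp1252: ok")
except UnicodeEncodeError as err:
    print(f"cp1252: cannot encode {err.object[err.start]!r}")

# UTF-8 encodes every character and round-trips the text unchanged.
encoded = mixed.encode("utf-8")
print(encoded.decode("utf-8") == mixed)  # True

# UTF-8 is variable-width: ASCII takes 1 byte, Latin accents and Cyrillic 2, common CJK 3.
for ch in ["a", "ü", "П", "日"]:
    print(ch, len(ch.encode("utf-8")))
```

The variable width is why a UTF-8 server can hold all of these languages at once: every character gets a code, at the cost of some characters taking more than one byte.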