Unicode

Unicode is the first character set that enables all the world's languages to be encoded in the same data set. Prior to the introduction of Unicode, if you wanted to store data in, for example, Chinese, you had to choose a character set appropriate for that language—to the exclusion of most other languages. It was either impossible or impractical to mix character sets, and thus diverse languages, in the same data set.

In Adaptive Server version 12.5, Sybase supported Unicode in the form of two new datatypes: unichar and univarchar. These datatypes store data in the UTF-16 encoding of Unicode.

UTF-16 is an encoding wherein Unicode scalar values are represented by a single 16-bit value (or, in rare cases, by a pair of 16-bit values). UTF-16 and UTF-8 are equivalent insofar as either encoding can be used to represent any Unicode character. The choice of UTF-16 datatypes, rather than a UTF-16 server default character set, was intended to promote easy, step-wise migration for existing database applications.
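The single-value and paired cases can be illustrated outside the server. The following is a minimal sketch in Python, not Adaptive Server code; the helper name utf16_code_units is hypothetical and exists only for this illustration. It shows how a scalar value below U+10000 maps to one 16-bit value, while a higher scalar value is split into a pair of 16-bit values (a surrogate pair).

```python
def utf16_code_units(text):
    """Return the list of 16-bit values UTF-16 uses to represent text.

    Hypothetical helper for illustration only; not part of any Sybase API.
    """
    units = []
    for ch in text:
        scalar = ord(ch)
        if scalar < 0x10000:
            # Common case: the scalar value fits in one 16-bit value.
            units.append(scalar)
        else:
            # Rare case: split the value into a high and a low surrogate.
            offset = scalar - 0x10000
            units.append(0xD800 + (offset >> 10))    # high surrogate
            units.append(0xDC00 + (offset & 0x3FF))  # low surrogate
    return units


# "A" (U+0041) needs one 16-bit value; U+1F600 needs a pair.
print([hex(u) for u in utf16_code_units("A")])           # ['0x41']
print([hex(u) for u in utf16_code_units("\U0001F600")])  # ['0xd83d', '0xde00']
```

The result can be cross-checked against Python's built-in "utf-16-be" codec, which produces the same 16-bit values in big-endian byte order.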

Adaptive Server version 12.5.1 supports Unicode literals in SQL queries and a wide range of sort orders for UTF-8.

The character set model used by Adaptive Server is based on a single, configurable, server-wide character set. All data stored in Adaptive Server, using any of the “character” datatypes (char, varchar, nchar, nvarchar, and text), is interpreted as being in this character set. Sort orders are defined using this character set, as are language modules—collections of server messages translated into local languages.

During the connection dialog, a client application declares its native character set and language. If properly configured, the server thereafter attempts to convert any character data between its own character set and that of the client (character data includes any data stored in the database, as well as server messages in the client’s native language). This works well as long as the server’s and client’s character sets are compatible. It does not work well when one character set lacks characters that are defined in the other, as would be the case for the character sets SJIS, used for Japanese, and KOI8, used for Russian and other Cyrillic languages. Such incompatibilities are the reason for Unicode, which can be thought of as a character superset, including definitions for characters in all other character sets.
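The incompatibility described above can be reproduced outside the server. The sketch below is Python, not Adaptive Server code, and rests on two assumptions: that Python's "shift_jis" and "koi8_r" codecs are reasonable stand-ins for the SJIS and KOI8 character sets, and that a failed encode is a fair analogy for a failed conversion. The helper can_encode is hypothetical.

```python
def can_encode(text, charset):
    """Report whether every character in text is defined in charset.

    Hypothetical helper; charset is a Python codec name, assumed here
    to approximate the corresponding server character set.
    """
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


japanese = "\u65e5\u672c\u8a9e"  # the word for "Japanese language"

print(can_encode(japanese, "shift_jis"))  # True: defined in SJIS
print(can_encode(japanese, "koi8_r"))     # False: not defined in KOI8
print(can_encode(japanese, "utf-8"))      # True: Unicode covers it
print(can_encode(japanese, "utf-16"))     # True: Unicode covers it
```

Japanese text has no representation in the Cyrillic character set, so any conversion into it has to fail or substitute characters, whereas either Unicode encoding accepts the text unchanged.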

In Adaptive Server version 12.5, the character set model was not modified. Rather, a separate mechanism was added whereby data could be stored and manipulated in Unicode. The new Unicode datatypes unichar and univarchar are completely independent of the traditional character set model. Clients send and receive Unicode data independently of whatever other character data they send and receive.