About Unicode

Before Unicode was developed, there were many different encoding systems, many of which conflicted with each other. For example, the same number could represent different characters in different encoding systems. Unicode provides a unique number for each character in all supported written languages. For languages that can be written in several scripts, Unicode provides a unique number for each character in each supported script.

For more information about the supported languages and scripts, see the Unicode Web site.

Encoding forms

There are three Unicode encoding forms: UTF-8, UTF-16, and UTF-32. Originally UTF stood for Unicode Transformation Format. The acronym is used now in the names of these encoding forms, which map from a character set definition to the actual code units that represent the data, and to the encoding schemes, which are encoding forms with a specific byte serialization.

UTF-8 uses an unsigned byte sequence of one to four bytes to represent each Unicode character.
UTF-16 uses one or two unsigned 16-bit code units, depending on the range of the scalar value of the character, to represent each Unicode character.
UTF-32 uses a single unsigned 32-bit code unit to represent each Unicode character.

Encoding schemes

An encoding scheme specifies how the bytes in an encoding form are serialized. When you manipulate files, convert blobs and strings, and save DataWindow data in PowerBuilder, you can choose to use ANSI encoding, or one of three Unicode encoding schemes:

UTF-8 serializes a UTF-8 code unit sequence in exactly the same order as the code unit sequence itself.
UTF-16BE serializes a UTF-16 code unit sequence as a byte sequence in big-endian format.
UTF-16LE serializes a UTF-16 code unit sequence as a byte sequence in little-endian format.

UTF-8 is frequently used in Web requests and responses. The big-endian format, where the most significant value in the byte sequence is stored at the lowest storage address, is typically used on UNIX systems. The little-endian format, where the least significant value in the sequence is stored first, is used on Windows.