Datatypes used with Japanese characters

The following datatypes can be used with Japanese characters at the workstation:

The following datatypes can be used with Japanese characters at the mainframe:


Kanji datatypes

Kanji characters always occupy two bytes.


Hankaku Katakana datatypes

Hankaku katakana characters are always represented as single-byte character-type data with datatypes of TDSCHAR or TDSVARYCHAR.


Kanji string lengths

Kanji characters are represented as character-type data at the workstation, and as either character-type or graphic-type data at the mainframe. The length of a Japanese character string depends on which workstation is being used and whether the datatype is graphic or character.

Some character sets use a special indicator or code in character-type strings to announce that the characters that follow are double-byte characters. With kanji, this indicator is called a Shift Out (SO) code. An SO code marks the beginning of a double-byte kanji string. The end of the kanji string is marked by a Shift In (SI) code.

When setting field lengths for Japanese character strings, you must include room for these SO/SI codes.

When sending data from a mainframe to a workstation, you can replace SO/SI codes with blanks by calling the Gateway-Library function TDSETSOI before receiving or sending data.

Graphic datatypes do not use SO/SI codes.

WARNING! When receiving data from a workstation character set that does not use SO/SI codes, IBM_Kanji always inserts SO/SI codes at the beginning and end of each double-byte character string. If the field length is just long enough for the data itself and does not allow for these codes, some of the data is lost. If a field contains mixed single-byte and double-byte data with more than one kanji string, each kanji string has its own SO/SI pair.

At the mainframe, the length of graphic-type strings is counted in double-byte (16-bit) characters. Thus, a string of 10 kanji characters has a length of 10.

At the workstation, the length of kanji character strings is counted in bytes. Thus, a string of 10 kanji characters has a length of 20.
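
The two counting rules can be compared with a short sketch. It assumes the workstation character set is Shift-JIS and uses Python's `shift_jis` codec as a stand-in for it; the variable names are illustrative only.

```python
kanji = "漢" * 10  # a string of 10 kanji characters

# Mainframe graphic-type length: counted in double-byte characters.
graphic_length = len(kanji)

# Workstation length: counted in bytes (assumed Shift-JIS encoding).
workstation_length = len(kanji.encode("shift_jis"))

print(graphic_length, workstation_length)  # prints "10 20"
```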


Hankaku Katakana string lengths

The length of a hankaku katakana string is always represented in bytes, at both the workstation and the mainframe. A hankaku katakana character occupies one byte, except in eucjis.

The eucjis hankaku katakana character set uses an indicator (SS2) in character-type strings to announce that the next byte is occupied by a hankaku katakana character. The SS2 indicator occupies one byte, and the hankaku katakana character itself occupies one byte. As a result, the total length of each eucjis hankaku katakana character is two bytes.
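
A minimal sketch of the difference, assuming Python's `euc_jp` and `shift_jis` codecs behave like the eucjis character set and a single-byte-katakana workstation character set respectively. The SS2 value 0x8E is the one defined by EUC-JP; the variable names are illustrative.

```python
SS2 = 0x8E  # single-shift-2 indicator used by EUC-JP

# Three half-width katakana characters (U+FF71, U+FF72, U+FF73).
kana = "\uff71\uff72\uff73"

sjis_bytes = kana.encode("shift_jis")  # one byte per character
euc_bytes = kana.encode("euc_jp")      # SS2 + one byte per character

print(len(sjis_bytes), len(euc_bytes))  # prints "3 6"

# Every even-numbered byte of the EUC-JP form is the SS2 indicator.
print(all(b == SS2 for b in euc_bytes[0::2]))  # prints "True"
```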