Character set length requirements

Table 2-14 describes how Japanese characters are represented in each supported character set and how that representation affects their length.

Table 2-14: Length requirements in Japanese character sets

| Character set | SBCS or DBCS | Datatype | Length considerations | Example |
|---|---|---|---|---|
| EUC-JIS | DBCS (hankaku katakana) | character | Each 1-byte hankaku katakana character is preceded by a 1-byte SS2 indicator. As a result, each EUC-JIS hankaku katakana character has a length of 2: the SS2 indicator and the hankaku katakana itself. | A string of 4 hankaku katakana occupies 8 bytes and has a length of 8. |
| EUC-JIS | DBCS (kanji) | character | Each kanji character is 2 bytes long and has a length of 2. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 8 bytes and has a length of 8. |
| Shift-JIS | SBCS (hankaku katakana) | character | Each hankaku katakana character is 1 byte long and has a length of 1. Shift-JIS hankaku katakana does not use SS2 indicators. | A string of 4 hankaku katakana occupies 4 bytes and has a length of 4. |
| Shift-JIS | DBCS (kanji) | character | Each kanji character is 2 bytes long and has a length of 2. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 8 bytes and has a length of 8. |
| IBM Kanji | DBCS (kanji) | character | Each kanji character is 2 bytes long and has a length of 2. Each kanji string is preceded by a Shift Out (SO) indicator and followed by a Shift In (SI) indicator, which adds 2 to the length of each kanji string. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 10 bytes and has a length of 10 (8 bytes for the data and 2 bytes for the SO/SI codes). |
| IBM Kanji | DBCS (kanji) | graphic | Each kanji character is a double-byte character and has a length of 1. There are no SO/SI indicators with graphic data. | A string of 4 kanji occupies 8 bytes and has a length of 4. |
| IBM Kanji | SBCS (hankaku katakana) | character | Each hankaku katakana character is 1 byte long and has a length of 1. IBM Kanji hankaku katakana does not use SS2 indicators. | A string of 4 hankaku katakana occupies 4 bytes and has a length of 4. |
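
The examples in Table 2-14 can be checked with a short script. The following is a minimal sketch, not part of the product: it assumes that Python's built-in euc_jp and shift_jis codecs are acceptable stand-ins for the EUC-JIS and Shift-JIS character sets, and it models the IBM Kanji lengths with plain arithmetic rather than a real conversion. The names HANKAKU, KANJI, workstation_length, ibm_kanji_character_length, and ibm_kanji_graphic_length are hypothetical and chosen only for illustration.

```python
# Sample data (assumed for illustration): 4 hankaku katakana and 4 kanji.
HANKAKU = "\uff71\uff72\uff73\uff74"  # half-width katakana a, i, u, e
KANJI = "\u6f22\u5b57\u65e5\u672c"    # four JIS X 0208 kanji


def workstation_length(text: str, charset: str) -> int:
    """Return the number of bytes `text` occupies in a workstation charset.

    `charset` is a Python codec name; "euc_jp" and "shift_jis" are used here
    as assumed equivalents of EUC-JIS and Shift-JIS.
    """
    return len(text.encode(charset))


def ibm_kanji_character_length(kanji_count: int) -> int:
    """Length of one unbroken kanji string held as the character datatype.

    2 bytes per kanji, plus 1 byte for Shift Out and 1 byte for Shift In.
    """
    return 2 * kanji_count + 2


def ibm_kanji_graphic_length(kanji_count: int) -> int:
    """Length of a kanji string held as the graphic datatype.

    Graphic data is counted in double-byte characters and has no SO/SI codes.
    """
    return kanji_count


if __name__ == "__main__":
    for charset in ("euc_jp", "shift_jis"):
        print(charset,
              "hankaku katakana:", workstation_length(HANKAKU, charset),
              "kanji:", workstation_length(KANJI, charset))
    print("IBM Kanji character:", ibm_kanji_character_length(4))
    print("IBM Kanji graphic:", ibm_kanji_graphic_length(4))
```

The codec-based function measures real encoded bytes, so it also shows the SS2 byte (0x8E) that euc_jp places before each hankaku katakana character. The IBM Kanji helpers only restate the table's arithmetic for a single kanji string; a mixed string would carry one SO/SI pair per kanji run.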