Length considerations

When converting between a workstation Japanese character set and a mainframe Japanese character set, you often need to adjust the field length. The adjustment depends on the character sets, datatypes, and language being used.

Character set length requirements

Table 2-14 describes how Japanese characters are represented in supported character sets, and how their lengths are affected.

Table 2-14: Length requirements in Japanese character sets

| Character set | SBCS or DBCS | Datatype | Length considerations | Example |
|---|---|---|---|---|
| EUC-JIS | DBCS (hankaku katakana) | character | Each 1-byte hankaku katakana character is preceded by a 1-byte SS2 indicator. As a result, each eucjis hankaku katakana character has a length of 2: the SS2 indicator and the hankaku katakana itself. | A string of 4 hankaku katakana occupies 8 bytes and has a length of 8. |
| EUC-JIS | DBCS (kanji) | character | Each kanji character is 2 bytes long and has a length of 2. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 8 bytes and has a length of 8. |
| Shift-JIS | SBCS (hankaku katakana) | character | Each hankaku katakana character is 1 byte long and has a length of 1. Shift-JIS hankaku katakana does not use SS2 indicators. | A string of 4 hankaku katakana occupies 4 bytes and has a length of 4. |
| Shift-JIS | DBCS (kanji) | character | Each kanji character is 2 bytes long and has a length of 2. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 8 bytes and has a length of 8. |
| IBM Kanji kanji | DBCS | character | Each kanji character is 2 bytes long and has a length of 2. Each kanji string is preceded by a Shift Out indicator and followed by a Shift In indicator, adding two to the length of each kanji string. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 10 bytes and has a length of 10 (8 bytes for the data and 2 bytes for the SO/SI codes). |
| IBM Kanji kanji | DBCS | graphic | Each kanji character is a double-byte character and has a length of 1. There are no SO/SI indicators with graphic data. | A string of 4 kanji occupies 8 bytes and has a length of 4. |
| IBM Kanji hankaku katakana | SBCS | character | Each hankaku katakana character is 1 byte long and has a length of 1. IBM Kanji hankaku katakana does not use SS2 indicators. | A string of 4 hankaku katakana occupies 4 bytes and has a length of 4. |
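
The byte counts given for the two workstation character sets in Table 2-14 can be checked with a short script. This is only an illustrative sketch: it assumes that Python's standard `euc_jp` and `shift_jis` codecs behave like the EUC-JIS and Shift-JIS character sets described here, and it does not cover IBM Kanji. The sample strings are arbitrary.

```python
# Sketch only: Python's euc_jp and shift_jis codecs are assumed to stand in
# for the workstation EUC-JIS and Shift-JIS character sets.
katakana = "\uff71\uff72\uff73\uff74"   # four hankaku (half-width) katakana
kanji = "\u65e5\u672c\u8a9e\u5b66"      # four kanji

for name, codec in (("EUC-JIS", "euc_jp"), ("Shift-JIS", "shift_jis")):
    # Print the encoded byte length of each four-character sample string.
    print(name, len(katakana.encode(codec)), len(kanji.encode(codec)))

# In EUC-JIS, every hankaku katakana byte is preceded by the SS2 byte (0x8E).
euc_katakana = katakana.encode("euc_jp")
print(euc_katakana[0::2] == b"\x8e" * 4)
```

The EUC-JIS katakana string comes out twice as long as its Shift-JIS equivalent because of the SS2 prefix bytes, while the kanji strings have the same byte length in both character sets.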


Examples of length settings in conversions

Table 2-15 illustrates the length adjustments required for some conversions between workstation and mainframe Japanese character sets, in both directions. Each row assumes a string of four characters.

Table 2-15: Length settings in Japanese character set conversions

| Source character set | Source datatypes | Source length | Target character set | Target datatypes | Target length |
|---|---|---|---|---|---|
| EUCJIS hankaku katakana | character | 8 | IBM Kanji hankaku katakana | character | 4 |
| EUCJIS kanji | character | 8 | IBM Kanji kanji | character | 10 |
| EUCJIS kanji | character | 8 | IBM Kanji kanji | graphic | 4 |
| Shift-JIS hankaku katakana | character | 4 | IBM Kanji hankaku katakana | character | 4 |
| Shift-JIS kanji | character | 8 | IBM Kanji kanji | character | 10 |
| Shift-JIS kanji | character | 8 | IBM Kanji kanji | graphic | 4 |
| IBM Kanji hankaku katakana | character | 4 | EUCJIS hankaku katakana | character | 8 |
| IBM Kanji hankaku katakana | character | 4 | Shift-JIS hankaku katakana | character | 4 |
| IBM Kanji kanji | character | 10 | EUCJIS kanji | character | 8 |
| IBM Kanji kanji | character | 10 | Shift-JIS kanji | character | 8 |
| IBM Kanji kanji | graphic | 4 | EUCJIS kanji | character | 8 |
| IBM Kanji kanji | graphic | 4 | Shift-JIS kanji | character | 8 |
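
The lengths in Table 2-15 follow from the per-character rules in Table 2-14. The sketch below restates those rules as a small Python function. The function name `field_length`, its parameters, and the string labels are hypothetical and are not part of Gateway-Library. It assumes a field that holds only one kind of character, with `kanji_runs` counting the separate kanji strings in an IBM Kanji character field.

```python
def field_length(charset: str, script: str, datatype: str,
                 n_chars: int, kanji_runs: int = 1) -> int:
    """Length of a field holding n_chars characters (hypothetical helper)."""
    if script == "hankaku katakana":
        # EUC-JIS prefixes each hankaku katakana byte with a 1-byte SS2 indicator.
        return 2 * n_chars if charset == "EUC-JIS" else n_chars
    if script == "kanji":
        if charset == "IBM Kanji" and datatype == "graphic":
            # Graphic data: length counts double-byte characters, no SO/SI.
            return n_chars
        length = 2 * n_chars                # two bytes per kanji
        if charset == "IBM Kanji":
            length += 2 * kanji_runs        # SO and SI around each kanji string
        return length
    raise ValueError(f"unknown script: {script}")


# Row 2 of Table 2-15: four EUC-JIS kanji converted to IBM Kanji character data.
source = field_length("EUC-JIS", "kanji", "character", 4)
target = field_length("IBM Kanji", "kanji", "character", 4)
print(source, target)  # 8 10
```

For mixed strings with several kanji runs, pass a larger `kanji_runs` value; each additional run adds two bytes to the IBM Kanji character length.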


Lengths in conversions

Because differences among Japanese character sets can make data longer or shorter after conversion, Gateway-Library includes the TDSETSOI function, which specifies whether SO/SI indicators are stripped or replaced with blanks.

When converting from a character set that uses SO/SI indicators to one that does not (for example, converting CHAR data from IBM Kanji to Shift-JIS kanji), you can use TDSETSOI to specify whether the SO/SI indicators are stripped or whether they are replaced with embedded blanks. When replaced with embedded blanks, the length does not change. When stripped, the length is reduced by two bytes for each kanji string.

If no strip option is set, the JCM automatically strips SO/SI indicators.

When TDSETSOI replaces SO/SI indicators with blanks, the blanks are positioned at the end of the field. For example, in an IBM Kanji CHAR field that contains four kanji, the first byte contains the SO indicator, and the tenth byte contains the SI indicator. After conversion to Shift-JIS kanji, the first eight bytes are occupied by kanji, and the blanks occupy bytes nine and ten.
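
The effect of the two options on field length can be sketched on a raw byte layout. The Python below does not call TDSETSOI or perform any character set conversion. It only models the byte counts described above, using placeholder bytes in place of real kanji code points; the function name `drop_shift_codes` and its parameter are hypothetical.

```python
SO, SI = 0x0E, 0x0F   # Shift Out / Shift In control bytes
BLANK = 0x20          # blank in the workstation character set

def drop_shift_codes(field: bytes, pad_with_blanks: bool) -> bytes:
    """Remove SO/SI bytes; optionally pad with trailing blanks (sketch only)."""
    data = bytes(b for b in field if b not in (SO, SI))
    if pad_with_blanks:
        # Blanks go at the end of the field, so the length is unchanged.
        data = data.ljust(len(field), bytes([BLANK]))
    return data

# A 10-byte field: SO, four 2-byte placeholder "kanji", SI.
ibm_field = bytes([SO]) + b"\x45\x66" * 4 + bytes([SI])

stripped = drop_shift_codes(ibm_field, pad_with_blanks=False)
padded = drop_shift_codes(ibm_field, pad_with_blanks=True)
print(len(ibm_field), len(stripped), len(padded))  # 10 8 10
```

With blank replacement, bytes nine and ten of the result hold the blanks, matching the example above. With stripping, the field shrinks by two bytes for the single kanji string.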

Used carefully, TDSETSOI can minimize the length changes and length calculations needed in Open ServerConnect programs. See TDSETSOI for details.