Length considerations

When converting between a workstation Japanese character set and a mainframe Japanese character set, you frequently need to adjust the length of the data. The adjustment depends on the character sets, datatypes, and language in use.

The following conventions apply throughout this section:

Descriptions of eucjis data also apply to deckanji, with the exception that deckanji does not include hankaku katakana.
Open ServerConnect character datatypes are TDSCHAR and TDSVARYCHAR.
Open ServerConnect graphic datatypes are TDSGRAPHIC and TDSVARYGRAPHIC.
Open ServerConnect datatypes with “VARY” in the name have a two-byte length (“LL”) specification at the beginning of each data field. Do not count these “LL” bytes when calculating the length of the field.
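
The LL rule can be illustrated with a short Python sketch. It assumes that the LL prefix is a big-endian unsigned two-byte integer giving the data length in bytes, as for TDSVARYCHAR data. The helper names `vary_data_length` and `vary_data` are hypothetical and are not part of Gateway-Library.

```python
import struct


def vary_data_length(field: bytes) -> int:
    """Return the data length stored in the two-byte LL prefix of a VARY field.

    Assumption: LL is a big-endian unsigned halfword, as on IBM mainframes.
    The two LL bytes themselves are not counted in the returned length.
    """
    if len(field) < 2:
        raise ValueError("field is too short to hold an LL prefix")
    (ll,) = struct.unpack(">H", field[:2])
    return ll


def vary_data(field: bytes) -> bytes:
    """Return only the data bytes that follow the LL prefix."""
    ll = vary_data_length(field)
    data = field[2:2 + ll]
    if len(data) != ll:
        raise ValueError("LL prefix is larger than the data that follows it")
    return data


# Four Shift-JIS hankaku katakana: 2 LL bytes + 4 data bytes = 6 bytes in all,
# but the length of the field is 4.
field = struct.pack(">H", 4) + "ｱｲｳｴ".encode("shift_jis")
print(len(field), vary_data_length(field))  # 6 4
```

The point of the example is that the buffer holds six bytes while the field length used in length calculations is four.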

Character set length requirements

Table 2-14 describes how Japanese characters are represented in the supported character sets and how each representation affects the length of the data.

**Table 2-14: Length requirements in Japanese character sets**

| Character set | SBCS or DBCS | Datatype | Length considerations | Example |
|---|---|---|---|---|
| EUC-JIS hankaku katakana | DBCS | character | Each 1-byte hankaku katakana character is preceded by a 1-byte SS2 indicator. As a result, each eucjis hankaku katakana character has a length of 2: the SS2 indicator plus the hankaku katakana byte itself. | A string of 4 hankaku katakana occupies 8 bytes and has a length of 8. |
| EUC-JIS kanji | DBCS | character | Each kanji character is 2 bytes long and has a length of 2. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 8 bytes and has a length of 8. |
| Shift-JIS hankaku katakana | SBCS | character | Each hankaku katakana character is 1 byte long and has a length of 1. Shift-JIS hankaku katakana does not use SS2 indicators. | A string of 4 hankaku katakana occupies 4 bytes and has a length of 4. |
| Shift-JIS kanji | DBCS | character | Each kanji character is 2 bytes long and has a length of 2. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 8 bytes and has a length of 8. |
| IBM Kanji kanji | DBCS | character | Each kanji character is 2 bytes long and has a length of 2. Each kanji string is preceded by a Shift Out (SO) indicator and followed by a Shift In (SI) indicator, which adds 2 to the length of each kanji string. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 10 bytes and has a length of 10 (8 bytes for the data and 2 bytes for the SO/SI codes). |
| IBM Kanji kanji | DBCS | graphic | Each kanji character is a double-byte character and has a length of 1. Graphic data has no SO/SI indicators. | A string of 4 kanji occupies 8 bytes and has a length of 4. |
| IBM Kanji hankaku katakana | SBCS | character | Each hankaku katakana character is 1 byte long and has a length of 1. IBM Kanji hankaku katakana does not use SS2 indicators. | A string of 4 hankaku katakana occupies 4 bytes and has a length of 4. |

Examples of length settings in conversions

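Table 2-15 illustrates the length adjustments required for some workstation-to-mainframe and mainframe-to-workstation Japanese character set conversions. For conversions from IBM Kanji character data to EUC-JIS or Shift-JIS kanji, the target lengths assume that the SO/SI indicators are stripped.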

**Table 2-15: Length settings in Japanese character set conversions**

| Source character set | Source datatype | Source length | Target character set | Target datatype | Target length |
|---|---|---|---|---|---|
| EUC-JIS hankaku katakana | character | 8 | IBM Kanji hankaku katakana | character | 4 |
| EUC-JIS kanji | character | 8 | IBM Kanji kanji | character | 10 |
| EUC-JIS kanji | character | 8 | IBM Kanji kanji | graphic | 4 |
| Shift-JIS hankaku katakana | character | 4 | IBM Kanji hankaku katakana | character | 4 |
| Shift-JIS kanji | character | 8 | IBM Kanji kanji | character | 10 |
| Shift-JIS kanji | character | 8 | IBM Kanji kanji | graphic | 4 |
| IBM Kanji hankaku katakana | character | 4 | EUC-JIS hankaku katakana | character | 8 |
| IBM Kanji hankaku katakana | character | 4 | Shift-JIS hankaku katakana | character | 4 |
| IBM Kanji kanji | character | 10 | EUC-JIS kanji | character | 8 |
| IBM Kanji kanji | character | 10 | Shift-JIS kanji | character | 8 |
| IBM Kanji kanji | graphic | 4 | EUC-JIS kanji | character | 8 |
| IBM Kanji kanji | graphic | 4 | Shift-JIS kanji | character | 8 |
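
The rows of Table 2-15 follow from the per-character rules in Table 2-14. The Python sketch below recomputes each row, assuming a string made up entirely of hankaku katakana or of a single run of kanji, and assuming that SO/SI indicators are stripped when IBM Kanji character data is converted to a workstation character set. The `RULES` table and the `target_length` function are hypothetical names used only for this illustration.

```python
# (character set, datatype, kind) -> (length per character, fixed overhead).
# Hypothetical model for strings that are all hankaku katakana or one kanji run.
RULES = {
    ("EUC-JIS", "character", "hankaku"): (2, 0),    # SS2 + 1 byte
    ("EUC-JIS", "character", "kanji"): (2, 0),
    ("Shift-JIS", "character", "hankaku"): (1, 0),
    ("Shift-JIS", "character", "kanji"): (2, 0),
    ("IBM Kanji", "character", "hankaku"): (1, 0),
    ("IBM Kanji", "character", "kanji"): (2, 2),    # SO + SI around the run
    ("IBM Kanji", "graphic", "kanji"): (1, 0),      # length counts characters
}


def target_length(source_length, source, target, kind):
    """Convert a source length to a target length via the character count.

    `source` and `target` are (character set, datatype) pairs. SO/SI are
    assumed to be stripped when leaving IBM Kanji character data.
    """
    per_char, overhead = RULES[(*source, kind)]
    n_chars = (source_length - overhead) // per_char
    per_char, overhead = RULES[(*target, kind)]
    return n_chars * per_char + overhead


EUC_CHAR = ("EUC-JIS", "character")
SJIS_CHAR = ("Shift-JIS", "character")
IBM_CHAR = ("IBM Kanji", "character")
IBM_GRAPHIC = ("IBM Kanji", "graphic")

# Each tuple is one row of Table 2-15:
# (source, kind, source length, target, expected target length).
TABLE_2_15 = [
    (EUC_CHAR, "hankaku", 8, IBM_CHAR, 4),
    (EUC_CHAR, "kanji", 8, IBM_CHAR, 10),
    (EUC_CHAR, "kanji", 8, IBM_GRAPHIC, 4),
    (SJIS_CHAR, "hankaku", 4, IBM_CHAR, 4),
    (SJIS_CHAR, "kanji", 8, IBM_CHAR, 10),
    (SJIS_CHAR, "kanji", 8, IBM_GRAPHIC, 4),
    (IBM_CHAR, "hankaku", 4, EUC_CHAR, 8),
    (IBM_CHAR, "hankaku", 4, SJIS_CHAR, 4),
    (IBM_CHAR, "kanji", 10, EUC_CHAR, 8),
    (IBM_CHAR, "kanji", 10, SJIS_CHAR, 8),
    (IBM_GRAPHIC, "kanji", 4, EUC_CHAR, 8),
    (IBM_GRAPHIC, "kanji", 4, SJIS_CHAR, 8),
]

for source, kind, src_len, target, expected in TABLE_2_15:
    assert target_length(src_len, source, target, kind) == expected
```

Going through the character count keeps each rule in one place: the SO/SI overhead is removed once on the source side and added once on the target side.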

Lengths in conversions

Because conversion between Japanese character sets can make data longer or shorter, Gateway-Library includes the TDSETSOI function, which specifies whether SO/SI indicators are stripped or replaced with blanks (padding).

When converting from a character set that uses SO/SI indicators to one that does not (for example, converting CHAR data from IBM Kanji to Shift-JIS kanji), you can use TDSETSOI to specify whether the SO/SI indicators are stripped or whether they are replaced with embedded blanks. When replaced with embedded blanks, the length does not change. When stripped, the length is reduced by two bytes for each kanji string.

If no option is set with TDSETSOI, the JCM strips SO/SI indicators automatically.
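
The effect of the two choices on length can be sketched in Python for IBM Kanji character data converted to Shift-JIS, where single-byte characters stay 1 byte and kanji stay 2 bytes. The sketch assumes the EBCDIC control codes X'0E' (SO) and X'0F' (SI), uses a placeholder double-byte code rather than a real kanji, and does not call TDSETSOI; the function names are hypothetical.

```python
SO, SI = 0x0E, 0x0F  # EBCDIC Shift Out / Shift In


def kanji_string_count(field: bytes) -> int:
    """Count the SO...SI-delimited kanji strings in an IBM Kanji character field."""
    count, i, in_kanji = 0, 0, False
    while i < len(field):
        byte = field[i]
        if not in_kanji and byte == SO:
            count += 1
            in_kanji = True
            i += 1
        elif in_kanji and byte == SI:
            in_kanji = False
            i += 1
        elif in_kanji:
            i += 2  # skip one double-byte kanji
        else:
            i += 1  # skip one single-byte character
    return count


def shift_jis_length(field: bytes, strip: bool = True) -> int:
    """Length after conversion to Shift-JIS.

    strip=True: SO/SI removed, 2 bytes shorter per kanji string (the default).
    strip=False: SO/SI replaced by blanks, so the length is unchanged.
    """
    if strip:
        return len(field) - 2 * kanji_string_count(field)
    return len(field)


# Two kanji strings separated by EBCDIC 'A' (X'C1').
# X'4541' is a placeholder double-byte code, not a specific kanji.
field = (
    bytes([SO]) + b"\x45\x41" * 2 + bytes([SI])
    + b"\xc1"
    + bytes([SO]) + b"\x45\x41" + bytes([SI])
)
print(len(field), shift_jis_length(field), shift_jis_length(field, strip=False))  # 11 7 11
```

Stripping removes four bytes here because the field contains two kanji strings, each with its own SO/SI pair.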

When TDSETSOI replaces SO/SI indicators with blanks, the blanks are positioned at the end of the field. For example, in an IBM Kanji CHAR field that contains four kanji, the first byte contains the SO indicator, and the tenth byte contains the SI indicator. After conversion to Shift-JIS kanji, the first eight bytes are occupied by kanji, and the blanks occupy bytes nine and ten.
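
The byte positions in this example can be reproduced with a small Python sketch. Because Python has no IBM Kanji codec, the sketch uses a hypothetical two-entry table that maps placeholder IBM Kanji codes to the Shift-JIS encodings of 漢 and 字. The `soi_to_blanks` flag only mimics the blank option described above; it is not a TDSETSOI parameter.

```python
SO, SI = 0x0E, 0x0F  # EBCDIC Shift Out / Shift In

# Hypothetical mapping: the IBM Kanji codes are placeholders, not real code points.
IBM_TO_SJIS = {
    b"\x45\x41": "漢".encode("shift_jis"),
    b"\x45\x42": "字".encode("shift_jis"),
}


def convert_kanji_field(field: bytes, soi_to_blanks: bool) -> bytes:
    """Convert an IBM Kanji character field that holds only kanji strings to Shift-JIS.

    SO/SI bytes are always dropped from the data. With soi_to_blanks=True, the
    result is padded with blanks at the end so the field keeps its length.
    """
    out = bytearray()
    i = 0
    while i < len(field):
        if field[i] in (SO, SI):
            i += 1  # drop the indicator byte
            continue
        out += IBM_TO_SJIS[field[i:i + 2]]
        i += 2
    if soi_to_blanks:
        out = out.ljust(len(field), b" ")  # blanks go at the end of the field
    return bytes(out)


# Four kanji: SO in byte 1, kanji in bytes 2-9, SI in byte 10.
source = bytes([SO]) + b"\x45\x41\x45\x42" * 2 + bytes([SI])
padded = convert_kanji_field(source, soi_to_blanks=True)
print(len(source), len(padded), padded[8:10])  # 10 10 b'  '
```

In the padded result, bytes one through eight hold the four Shift-JIS kanji and bytes nine and ten are blanks; with `soi_to_blanks=False` the result is eight bytes long.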

By judicious use of TDSETSOI, you can minimize the length changes and calculations needed in Open ServerConnect programs. See TDSETSOI for details.