When converting from a workstation Japanese character set to a mainframe Japanese character set, you frequently need to adjust the length. The adjustment depends on which character sets, datatypes, and language are being used.
In this section:

- Descriptions of eucjis data also apply to deckanji, with the exception that deckanji does not include hankaku katakana.
- Open ServerConnect character datatypes are TDSCHAR and TDSVARYCHAR.
- Open ServerConnect graphic datatypes are TDSGRAPHIC and TDSVARYGRAPHIC.
- Open ServerConnect datatypes with “VARY” in the name have a two-byte length (“LL”) specification at the beginning of each data field. Do not count these “LL” bytes when calculating the length of the field.
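
The "LL" rule can be illustrated with a minimal Python sketch that separates the two-byte prefix from the data and reports the length of the data only. The helper name `split_vary_field`, the big-endian byte order, and the assumption that "LL" holds the number of data bytes that follow are illustrative assumptions, not part of Open ServerConnect.

```python
import struct

LL_SIZE = 2  # the two-byte "LL" length specification


def split_vary_field(field: bytes) -> tuple:
    """Return (data_length, data) for a VARY-style field.

    Assumption: LL is an unsigned big-endian 16-bit integer that holds
    the number of data bytes following it (LL itself is not counted).
    """
    if len(field) < LL_SIZE:
        raise ValueError("field is shorter than its LL prefix")
    (data_length,) = struct.unpack(">H", field[:LL_SIZE])
    data = field[LL_SIZE:LL_SIZE + data_length]
    if len(data) != data_length:
        raise ValueError("field holds fewer data bytes than LL states")
    return data_length, data


# Four kanji encoded as Shift-JIS (8 data bytes) behind a 2-byte LL prefix.
payload = "\u65e5\u672c\u8a9e\u5b57".encode("shift_jis")
field = struct.pack(">H", len(payload)) + payload
length, data = split_vary_field(field)
print(len(field), length)  # bytes in the whole field, then length of the data
```

The field occupies 10 bytes in total, but the length used in calculations is 8, because the "LL" bytes are excluded.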
Table 2-14 describes how Japanese characters are represented in supported character sets, and how their lengths are affected.
| Character set | SBCS or DBCS | Datatype | Length considerations | Example |
|---|---|---|---|---|
| EUC-JIS | DBCS (hankaku katakana) | character | Each 1-byte hankaku katakana character is preceded by a 1-byte SS2 indicator. As a result, each eucjis hankaku katakana character has a length of 2: the SS2 indicator and the hankaku katakana itself. | A string of 4 hankaku katakana occupies 8 bytes and has a length of 8. |
| EUC-JIS | DBCS (kanji) | character | Each kanji character is 2 bytes long and has a length of 2. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 8 bytes and has a length of 8. |
| Shift-JIS | SBCS (hankaku katakana) | character | Each hankaku katakana character is 1 byte long and has a length of 1. Shift-JIS hankaku katakana does not use SS2 indicators. | A string of 4 hankaku katakana occupies 4 bytes and has a length of 4. |
| Shift-JIS | DBCS (kanji) | character | Each kanji character is 2 bytes long and has a length of 2. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 8 bytes and has a length of 8. |
| IBM Kanji kanji | DBCS | character | Each kanji character is 2 bytes long and has a length of 2. Each kanji string is preceded by a Shift Out indicator and followed by a Shift In indicator, adding two to the length of each kanji string. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 10 bytes and has a length of 10 (8 bytes for the data and 2 bytes for the SO/SI codes). |
| IBM Kanji kanji | DBCS | graphic | Each kanji character is a double-byte character and has a length of 1. There are no SO/SI indicators with graphic data. | A string of 4 kanji occupies 8 bytes and has a length of 4. |
| IBM Kanji hankaku katakana | SBCS | character | Each hankaku katakana character is 1 byte long and has a length of 1. IBM Kanji hankaku katakana does not use SS2 indicators. | A string of 4 hankaku katakana occupies 4 bytes and has a length of 4. |
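
The lengths in Table 2-14 can be checked with a short Python sketch. The workstation lengths come from Python's standard `euc_jp` and `shift_jis` codecs. Python has no IBM Kanji codec, so the IBM Kanji lengths are estimated arithmetically from the rules in the table. The function names and the test used to classify a character as double-byte (Shift-JIS encodes it in two bytes) are assumptions made for this illustration.

```python
HANKAKU = "\uff71\uff72\uff73\uff74"  # four hankaku (half-width) katakana
KANJI = "\u65e5\u672c\u8a9e\u5b57"    # four kanji


def workstation_length(text: str, codec: str) -> int:
    """Length in bytes of character data in a workstation character set."""
    return len(text.encode(codec))


def ibm_kanji_char_length(text: str) -> int:
    """Estimated length of the text as IBM Kanji character data.

    Assumption: a character is double-byte if Shift-JIS encodes it in
    two bytes. Each unbroken run of double-byte characters is wrapped
    in one Shift Out byte and one Shift In byte.
    """
    length = 0
    in_dbcs = False
    for ch in text:
        is_dbcs = len(ch.encode("shift_jis")) == 2
        if is_dbcs and not in_dbcs:
            length += 1  # Shift Out opens the run
        if in_dbcs and not is_dbcs:
            length += 1  # Shift In closes the run
        length += 2 if is_dbcs else 1
        in_dbcs = is_dbcs
    if in_dbcs:
        length += 1  # Shift In for a run that reaches the end of the text
    return length


def ibm_kanji_graphic_length(text: str) -> int:
    """Graphic data: one length unit per double-byte character, no SO/SI."""
    return len(text)


print(workstation_length(HANKAKU, "euc_jp"))     # 8 (SS2 + katakana, 4 times)
print(workstation_length(HANKAKU, "shift_jis"))  # 4
print(workstation_length(KANJI, "euc_jp"))       # 8
print(workstation_length(KANJI, "shift_jis"))    # 8
print(ibm_kanji_char_length(KANJI))              # 10 (8 data bytes + SO + SI)
print(ibm_kanji_graphic_length(KANJI))           # 4
```

For a mixed string such as two alphabetic characters, two kanji, and one more alphabetic character, `ibm_kanji_char_length` returns 9 while the Shift-JIS length is 7, which is why the table suggests doubling the length as a safe allowance for mixed strings.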
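Table 2-15 illustrates the length adjustments required for some conversions between workstation and mainframe Japanese character sets, in both directions. The lengths shown correspond to the four-character example strings in Table 2-14.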
| Source character set | Source datatypes | Source length | Target character set | Target datatypes | Target length |
|---|---|---|---|---|---|
| EUCJIS hankaku katakana | character | 8 | IBM Kanji hankaku katakana | character | 4 |
| EUCJIS kanji | character | 8 | IBM Kanji kanji | character | 10 |
| EUCJIS kanji | character | 8 | IBM Kanji kanji | graphic | 4 |
| Shift-JIS hankaku katakana | character | 4 | IBM Kanji hankaku katakana | character | 4 |
| Shift-JIS kanji | character | 8 | IBM Kanji kanji | character | 10 |
| Shift-JIS kanji | character | 8 | IBM Kanji kanji | graphic | 4 |
| IBM Kanji hankaku katakana | character | 4 | EUCJIS hankaku katakana | character | 8 |
| IBM Kanji hankaku katakana | character | 4 | Shift-JIS hankaku katakana | character | 4 |
| IBM Kanji kanji | character | 10 | EUCJIS kanji | character | 8 |
| IBM Kanji kanji | character | 10 | Shift-JIS kanji | character | 8 |
| IBM Kanji kanji | graphic | 4 | EUCJIS kanji | character | 8 |
| IBM Kanji kanji | graphic | 4 | Shift-JIS kanji | character | 8 |
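
The adjustments in Table 2-15 follow from the per-character rules in Table 2-14, and the arithmetic can be sketched in Python. The `RULES` dictionary, the function names, and the simplification that the field holds a single unmixed string (so IBM Kanji character data carries exactly one SO/SI pair) are assumptions made for this illustration.

```python
# (character set, kind of text, datatype) ->
#     (length units per character, fixed overhead per string)
RULES = {
    ("EUCJIS", "hankaku katakana", "character"): (2, 0),     # SS2 + katakana
    ("EUCJIS", "kanji", "character"): (2, 0),
    ("Shift-JIS", "hankaku katakana", "character"): (1, 0),
    ("Shift-JIS", "kanji", "character"): (2, 0),
    ("IBM Kanji", "hankaku katakana", "character"): (1, 0),
    ("IBM Kanji", "kanji", "character"): (2, 2),             # + SO and SI
    ("IBM Kanji", "kanji", "graphic"): (1, 0),               # no SO/SI
}


def field_length(charset: str, kind: str, datatype: str, n_chars: int) -> int:
    """Length of n_chars characters of one kind in the given representation."""
    per_char, overhead = RULES[(charset, kind, datatype)]
    return per_char * n_chars + overhead


def converted_length(source: tuple, target: tuple, kind: str,
                     source_length: int) -> int:
    """Target length for a source field of source_length.

    source and target are (character set, datatype) pairs. The field is
    assumed to hold one unmixed string of the given kind.
    """
    per_char, overhead = RULES[(source[0], kind, source[1])]
    n_chars, remainder = divmod(source_length - overhead, per_char)
    if n_chars < 0 or remainder:
        raise ValueError("source length does not fit the source rules")
    return field_length(target[0], kind, target[1], n_chars)


# Two rows of Table 2-15, recomputed.
print(converted_length(("EUCJIS", "character"), ("IBM Kanji", "character"),
                       "kanji", 8))   # 10
print(converted_length(("IBM Kanji", "graphic"), ("Shift-JIS", "character"),
                       "kanji", 4))   # 8
```

Mixed strings need a per-run calculation instead, because each separate kanji string in IBM Kanji character data has its own SO/SI pair.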
Because conversion between Japanese character sets can make data longer or shorter, Gateway-Library includes the TDSETSOI function, which specifies whether SO/SI indicators are stripped or replaced with blanks.
When converting from a character set that uses SO/SI indicators to one that does not (for example, converting CHAR data from IBM Kanji to Shift-JIS kanji), you can use TDSETSOI to specify whether the SO/SI indicators are stripped or whether they are replaced with embedded blanks. When replaced with embedded blanks, the length does not change. When stripped, the length is reduced by two bytes for each kanji string.
If no strip option is set, the JCM automatically strips SO/SI indicators.
When TDSETSOI replaces SO/SI indicators with blanks, the blanks are positioned at the end of the field. For example, in an IBM Kanji CHAR field that contains four kanji, the first byte contains the SO indicator, and the tenth byte contains the SI indicator. After conversion to Shift-JIS kanji, the first eight bytes are occupied by kanji, and the blanks occupy bytes nine and ten.
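
The effect of the two options on field length can be modeled with a short Python sketch. It does not call TDSETSOI or Gateway-Library and performs no character conversion; it only removes the indicator bytes and, optionally, appends the same number of blanks at the end of the field. The function name `remove_so_si`, the placeholder kanji bytes, and the use of 0x20 as the blank in the target character set are assumptions made for this illustration.

```python
SO = 0x0E     # Shift Out indicator
SI = 0x0F     # Shift In indicator
BLANK = 0x20  # assumed blank in the target (workstation) character set


def remove_so_si(data: bytes, pad_with_blanks: bool) -> bytes:
    """Model the length effect of stripping or blank-replacing SO/SI.

    The double-byte data is passed through unchanged; a real conversion
    would also translate each character to the target character set.
    """
    kept = bytes(b for b in data if b not in (SO, SI))
    removed = len(data) - len(kept)
    if pad_with_blanks:
        # Blanks go at the end of the field, so the length is unchanged.
        return kept + bytes([BLANK]) * removed
    return kept


# An IBM Kanji CHAR field with four kanji: SO, 8 data bytes, SI.
# The data bytes are placeholders, not real IBM Kanji code points.
ibm_field = bytes([SO]) + b"\x45\x66" * 4 + bytes([SI])

stripped = remove_so_si(ibm_field, pad_with_blanks=False)
padded = remove_so_si(ibm_field, pad_with_blanks=True)
print(len(ibm_field), len(stripped), len(padded))  # 10 8 10
```

In the padded result, bytes one through eight hold the kanji data and bytes nine and ten hold the blanks, matching the example in the preceding paragraph.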
By judicious use of TDSETSOI, you can minimize the length changes and calculations needed in Open ServerConnect programs. See TDSETSOI for details.
Copyright © 2005. Sybase Inc. All rights reserved.