What is ICU, and when is it needed?

ICU, or International Components for Unicode, is an open source library developed and maintained by IBM. ICU facilitates software internationalization by providing Unicode support. SQL Anywhere implements certain character set conversions and collation operations using ICU.

When is ICU needed on the database server? (all platforms except Windows Mobile)

Ideally, ICU should always be available for use by the database server. The following table specifies when and why ICU is needed:

ICU is needed when...	Notes
Unicode Collation Algorithm (UCA) is used as the collation for the NCHAR or CHAR character set.	UCA requires ICU.
The database character set is not UTF-8, but is a multi-byte character set.	For password conversion from the database character set to UTF-8 (database passwords are stored internally in UTF-8).
The client and database character sets are different, and either of them is multi-byte (including UTF-8). This includes Unicode ODBC, OLE DB, ADO.NET, and SQL Anywhere JDBC applications, regardless of the database character set, when at least one of these clients does not have ICU.	Proper conversion to and from a multi-byte character set requires ICU.
The database character set is not UTF-8 and conversion between CHAR and NCHAR values is required.	The database server requires ICU to convert UTF-8 to another character set.
An embedded SQL client uses an NCHAR character set other than UTF-8.	The database server requires ICU to convert UTF-8 to another character set. The default embedded SQL client NCHAR character set is the same as the initial client CHAR character set. This can be changed using the db_change_nchar_charset function. See db_change_nchar_charset function.
The CSCONVERT or SORTKEY functions are used. For CSCONVERT, this applies when the function converts between character sets that meet the conditions in the third row above.	Character set conversion to and from a multi-byte character set requires ICU. Sortkey generation for many sortkey labels requires UCA, which, in turn, requires ICU. See CSCONVERT function [String] and SORTKEY function [String].
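The rows of the table above can be read as a checklist. The following is a minimal sketch that encodes them in Python, assuming you already know your database and client character sets and which features you use. The `ServerUsage` class, the `server_icu_reasons` function, and the character set labels are hypothetical illustrations, not part of SQL Anywhere; SORTKEY is flagged conservatively because many (not all) sortkey labels require UCA.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical charset labels treated as multi-byte; UTF-8 counts as
# multi-byte, as the table notes. Not an official SQL Anywhere list.
MULTI_BYTE = {"UTF-8", "SHIFT_JIS", "GBK", "BIG5", "EUC-JP", "EUC-KR"}


def is_multi_byte(charset: str) -> bool:
    return charset.upper() in MULTI_BYTE


@dataclass
class ServerUsage:
    db_charset: str
    client_charset: str                       # "UTF-8" for Unicode ODBC/OLE DB/ADO.NET/JDBC clients
    uses_uca: bool = False                    # UCA is the CHAR or NCHAR collation
    char_nchar_conversion: bool = False       # CHAR <-> NCHAR conversions occur
    esql_nchar_charset: Optional[str] = None  # embedded SQL client NCHAR charset, if any
    uses_sortkey: bool = False                # conservative: many sortkey labels need UCA
    uses_csconvert_multibyte: bool = False    # CSCONVERT under the third-row conditions


def server_icu_reasons(u: ServerUsage) -> List[str]:
    """Return short labels for every table row that makes ICU necessary."""
    reasons: List[str] = []
    db_utf8 = u.db_charset.upper() == "UTF-8"
    if u.uses_uca:
        reasons.append("UCA collation")
    if not db_utf8 and is_multi_byte(u.db_charset):
        reasons.append("password conversion to UTF-8")
    if u.client_charset.upper() != u.db_charset.upper() and (
        is_multi_byte(u.client_charset) or is_multi_byte(u.db_charset)
    ):
        reasons.append("multi-byte client/server conversion")
    if not db_utf8 and u.char_nchar_conversion:
        reasons.append("CHAR/NCHAR conversion")
    if u.esql_nchar_charset and u.esql_nchar_charset.upper() != "UTF-8":
        reasons.append("non-UTF-8 embedded SQL NCHAR character set")
    if u.uses_sortkey or u.uses_csconvert_multibyte:
        reasons.append("CSCONVERT or SORTKEY")
    return reasons


# Single-byte database with a Unicode client: ICU is needed for conversion.
print(server_icu_reasons(ServerUsage("windows-1252", "UTF-8")))
# prints ['multi-byte client/server conversion']
```

An empty list means none of the rows apply; per the text above, having ICU available is still the ideal.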

When is ICU needed on the database server? (Windows Mobile)

The following table specifies when and why ICU is needed for Windows Mobile:

ICU is needed when...	Notes
UCA is used as the NCHAR collation or the CHAR collation.	UCA requires ICU.
The SORTKEY function is used.	Sortkey generation for many sortkey labels requires UCA, which, in turn, requires ICU. See SORTKEY function [String].
The CHAR character set does not match the OS character set.	Even if the character sets match, using ICU is recommended because it improves character set conversion if you are using NCHAR, or if the CHAR character set is multi-byte.
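The Windows Mobile table distinguishes cases where ICU is required from cases where it is only recommended. A minimal sketch of that distinction, assuming the same inputs the table mentions; `windows_mobile_icu` and the charset labels are hypothetical names chosen for illustration, and SORTKEY is again treated conservatively.

```python
# Hypothetical set of multi-byte charset labels (UTF-8 included).
MULTI_BYTE = {"UTF-8", "SHIFT_JIS", "GBK", "BIG5", "EUC-JP", "EUC-KR"}


def windows_mobile_icu(*, uses_uca: bool, uses_sortkey: bool,
                       char_charset: str, os_charset: str,
                       uses_nchar: bool) -> str:
    """Classify ICU on Windows Mobile as 'required', 'recommended' or 'optional'."""
    if uses_uca or uses_sortkey:
        return "required"     # UCA needs ICU; many sortkey labels need UCA
    if char_charset.upper() != os_charset.upper():
        return "required"     # CHAR charset differs from the OS charset
    if uses_nchar or char_charset.upper() in MULTI_BYTE:
        return "recommended"  # ICU improves conversion even when charsets match
    return "optional"
```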

Note

If you do not install the ICU library, you must choose either a collation whose character set matches the Windows Mobile character set or the UTF8BIN collation as the CHAR collation when creating your database. Also, you must choose the UTF8BIN collation as the NCHAR collation when creating your database.
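The note above amounts to two checks on the collations you pick when creating a database without ICU. A small sketch, assuming you know the character set of the chosen CHAR collation; `collation_problems_without_icu` is a hypothetical helper, and `1252LATIN1` is used only as an example of a collation whose character set is windows-1252.

```python
from typing import List


def collation_problems_without_icu(char_collation: str, char_collation_charset: str,
                                   nchar_collation: str, os_charset: str) -> List[str]:
    """List violations of the no-ICU collation rules for a new Windows Mobile database."""
    problems: List[str] = []
    # CHAR collation: either its charset matches the OS charset, or it is UTF8BIN.
    char_ok = (char_collation.upper() == "UTF8BIN"
               or char_collation_charset.upper() == os_charset.upper())
    if not char_ok:
        problems.append("CHAR collation must match the OS character set or be UTF8BIN")
    # NCHAR collation: only UTF8BIN is allowed without ICU.
    if nchar_collation.upper() != "UTF8BIN":
        problems.append("NCHAR collation must be UTF8BIN")
    return problems
```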

When can I get correct character set conversion on the database server without ICU?

You can get correct character set conversion without ICU when both the database character set and the client character set are single-byte, and either the sqlany.cvf file is available (all platforms) or the operating system supports the conversion (Windows only). Single-byte to single-byte conversions do not need ICU because sqlany.cvf, or the converters installed on the host operating system, can handle them.
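Why single-byte conversions are simple: every byte in one single-byte character set maps to at most one byte in another, so a 256-entry lookup table is enough. The sketch below builds such a table with Python's standard codecs and applies it with `bytes.translate`. It assumes sqlany.cvf plays a comparable role; its actual format is not described here, and the helper names are hypothetical.

```python
def build_sbcs_table(src: str, dst: str, substitute: int = ord("?")) -> bytes:
    """Build a 256-entry byte map from single-byte codec src to single-byte codec dst."""
    table = bytearray()
    for b in range(256):
        try:
            encoded = bytes([b]).decode(src).encode(dst)
        except UnicodeError:  # byte undefined in src, or character missing in dst
            encoded = b""
        # A single-byte-to-single-byte mapping yields exactly one output byte.
        table.append(encoded[0] if len(encoded) == 1 else substitute)
    return bytes(table)


def convert_sbcs(data: bytes, table: bytes) -> bytes:
    """Convert by pure table lookup, one byte in, one byte out."""
    return data.translate(table)


latin1_to_cp850 = build_sbcs_table("latin-1", "cp850")
print(convert_sbcs("café".encode("latin-1"), latin1_to_cp850))  # prints b'caf\x82'
```

The same approach cannot work for multi-byte character sets: for example, UTF-8 encodes "日" in three bytes and Shift_JIS in two, so the mapping is not byte-for-byte, which is why those conversions need ICU.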

When is ICU needed on the client? (all platforms except Windows Mobile)

For Unicode client applications, you are likely to get better combined client and database server performance when all clients have ICU installed, regardless of the database character set. This is because some of the required conversion activity may be offloaded from the database server to the client, and because fewer conversions are required.

Also, if you are using ODBC on Windows platforms, you must have ICU installed on the client, even for ANSI applications. This is because the driver manager converts ANSI ODBC calls to Unicode ODBC calls.
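The two client-side rules above can be summarized as a small classification. A minimal sketch, assuming hypothetical platform labels such as `"windows"` and `"linux"`; `client_icu` is an illustrative name, and Windows Mobile is rejected because this section excludes it.

```python
def client_icu(*, platform: str, uses_odbc: bool, unicode_app: bool) -> str:
    """Classify client-side ICU as 'required', 'recommended' or 'optional'."""
    if platform == "windows-mobile":
        raise ValueError("Windows Mobile clients are not covered by these rules")
    if platform == "windows" and uses_odbc:
        # The ODBC driver manager converts ANSI calls to Unicode calls.
        return "required"
    if unicode_app:
        # Offloads some conversion work from the server and reduces conversions.
        return "recommended"
    return "optional"  # not required by the rules on this page
```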
