As Published In
Oracle Magazine
September/October 2005

TECHNOLOGY: Globalization


Going Global

By Ron Hardman

Globalization support in Oracle Database 10g Release 2 makes localization easier.

The project's development cycle was nearly complete, and the whole team felt as though it'd been through the ringer. It was six months of late nights, excruciatingly long meetings, rah-rah speeches delivered by Jerry, the company president, and countless cups of coffee. In just two weeks the company's flagship product would ship a major release, and nobody was happier than Scott, the lead Oracle database developer. There was a light at the end of the tunnel.

A rumor began that a huge contract had been signed a few days ago. As was typical with a large customer win, Jerry scheduled a meeting to discuss the deal with the development group, and to give direction regarding the implementation. Jerry dispensed with the pleasantries and quickly jumped into the "most exciting news of the day."

"We closed a deal with one of Japan's largest distribution companies on Tuesday with the agreement that we'll deliver a localized version of our upcoming release," he said, clearly excited about the sale. "We've contracted with a company in Japan to perform the translations, so as soon as you leave this meeting I want each team to confirm that your code is ready for localization. We have one month after the release of the English version to deliver the application to our new customer."

Scott's mind began to race. He'd been with the company since its initial product release and they'd never really discussed distributing the software outside of the country. He was vaguely familiar with Oracle's National Language Support (NLS) in Oracle8i Database, but the company was releasing on Oracle Database 10g Release 2. "Our application wasn't designed for this," he said to a co-worker as they returned to their offices.

The light at the end of the tunnel had just disappeared.

Problem Definition

Scott and his team had their work cut out for them. To pull this off, they needed to do the following: 

  • Find out the globalization strategy for the company and the product.

  • Determine the steps needed for internationalization and implement them.

Scott and his team turned to the globalization documentation for Oracle Database 10g, as well as the Localization Industry Standards Association (LISA).

Globalization. LISA defines globalization (g11n) as follows:

"[It] addresses all of the enterprise issues associated with making a company truly global. For the globalization of products and services, this involves integrating all of the internal and external business functions with marketing, sales, and customer support in the world market."

Scott set up a meeting with Jerry, the product development team, and the department heads so they could all fully understand the g11n strategy for the business and for their product. Once the direction was set, Scott agreed to split the effort with the UI team. They'd focus on changes required for the user interface, relying heavily on the Oracle Globalization Development Kit's Java API. Scott and his team would focus on the database, redesigning it to be locale-independent. They'd meet daily to discuss the progress and any areas of overlap.

The group also decided to limit product installations to a single territory and language. As you'll see, this simplified the design considerably.

Internationalization. LISA defines internationalization (i18n) as: ". . . the process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for redesign. Internationalization takes place at the level of program design and document development."

In the first meeting following Jerry's announcement, Scott and his team walked through the database design, identifying the specific areas they'd need to address for i18n. The team's task of preparing the product's database for localization was divided into four main categories: 

  • Currency

  • Strings

  • Datetime

  • Error handling

Although other applications might need to address different i18n areas, this list covered the major areas for Scott and his team.

Oracle Globalization Support

The term National Language Support (NLS) used to refer to all of Oracle Database's i18n and localization capabilities. It's now a subset of features included in what Oracle calls Globalization Support. Two of the main features added since Scott's Oracle8i Database training are Character semantics. Oracle9i Database introduced character semantics, which changed the way multibyte characters were treated. Instead of doubling or tripling column or variable precision, setting NLS_LENGTH_SEMANTICS = CHAR causes Oracle Database to treat storage of the string 'Today' the same as the Japanese string 'こんにちは'. With this setting, glyphs (characters) are the measure for column and variable precision rather than the bytes required to store the characters. (For more information on character semantics, refer to the links in Next Steps at the end of this article.)

Globalization Development Kit. Oracle Database 10g Release 1 added a Globalization Development Kit (GDK) to speed up i18n and localization for Java and PL/SQL developers. Oracle Database 10g Release 2 improved the feature set included in the UTL_I18N supplied package with new transliteration and locale mapping functions. The GDK includes Java and PL/SQL APIs. The UTL_I18N package performs most of the PL/SQL tasks for the GDK, while the UTL_LMS package gets and formats error messages according to the specified language.

Oracle Database 10g Release 2 also supports Unicode 4.0. This means that every character of every common language in the world is supported by one or more Oracle Database character sets.

Stepping Through Internationalization

Scott decided to take what the team had learned from its research and work through each issue.

Currency. The first category, Currency, had one requirement:

Adhere to internationally recognized ISO currency formats.

Some companies might need to convert currency if they support cross-currency commerce. Since Scott's company decided to allow only a single territory per product installation, this wasn't required.

Scott could format the currency in a few ways. First, he could use the NLS_CURRENCY parameter to set session-specific currency symbols.

The default value for currency symbols is based on the NLS_TERRITORY parameter value. With the session setting NLS_TERRITORY = 'AMERICAN' , the default currency symbol value is the dollar ($).

The NLS_CURRENCY parameter value, if set, overrides the default symbol that corresponds to the NLS_TERRITORY parameter. To test this, Scott executed the following to set the NLS_TERRITORY parameter to AMERICAN and the NLS_CURRENCY parameter to Yen (¥): 

ALTER SESSION SET 
NLS_TERRITORY = 'AMERICA';
ALTER SESSION SET NLS_CURRENCY = '¥';


In iSQL*Plus, he then ran a SELECT on the PRODUCT_PRICE table using the TO_CHAR function and a format mask: 

SELECT TO_CHAR(price, 'L999G999') 
AS "Price"
FROM product_price;
/

Price 
¥23
¥55
¥34


Note: Use iSQL*Plus to make sure that multibyte characters can be displayed; simply set your browser encoding to Unicode (UTF-8).

The format mask begins with an L , indicating that the currency symbol should be included in the results. G is the group separator as determined by the NLS_NUMERIC_CHARACTER parameter.

Scott could have also used the NLS_TERRITORY value to derive the appropriate currency symbol. The value for NLS_TERRITORY and the format mask passed to the TO_CHAR function determine the currency format.

To test this alternative, Scott set NLS_TERRITORY to GERMANY in a new session: 

ALTER SESSION SET 
NLS_TERRITORY = 'GERMANY';


Next, he ran the same SELECT statement, modifying the format mask used with TO_CHAR

SELECT TO_CHAR(price, 'L9999') AS "Price"
FROM product_price;
/

Price 
€23 
€55 
€34 


The format mask includes an L , indicating once again that the currency symbol should be included in the result set. No group separator was used.

Since product installations will support only one territory at a time, letting NLS_TERRITORY determine the appropriate currency was an acceptable solution. Scott decided to store all currency formats in an administration table, and set it for the entire application during installation.

Strings. There were two requirements in the second category, Strings. 

  1. Evaluate the character set recommendation that's currently US7ASCII .

  2. Make sure that multibyte strings don't exceed column/variable precision.

For the first requirement, the US7ASCII character set won't work with multibyte data: Japanese characters are as large as 3 bytes each, so this wasn't acceptable. Scott's team could either select a specific language character set or go with a universal character set. While it's perfectly acceptable to select a character set based on the language used by each customer, using a universal character set helps in the following ways: 

  • If the same character set is used for development, testing, and support at all client sites, character-set-related issues will be caught early and can be easily reproduced.

  • There will be only one set of instructions and no need for individual language recommendations. This is less confusing and makes it easier to support virtually any language without modifying the instructions.

Scott and his team decided to recommend the Unicode AL32UTF8 database character set. (Note that this character set is not a valid choice for the NATIONAL character set.)

The second requirement was related to byte semantics. Oracle Database 10g stores characters according to the number of bytes they require (rather than the number of characters in a string). This causes a problem when five characters that are 3 bytes each are assigned to a variable with a precision of 5. Scott demonstrated this in the following example: 

ALTER SESSION SET 
NLS_LENGTH_SEMANTICS = 'BYTE';
SET SERVEROUTPUT ON
DECLARE
  v_string VARCHAR2(5);
BEGIN
  v_string := 'こんにちは';
  DBMS_OUTPUT.PUT_LINE(v_string);
END;
/  

ERROR at line 1: 
ORA-06502: PL/SQL: numeric or value error: character string buffer too small 
ORA-06512: at line 4


Character semantics, introduced in Oracle9i Database, can be used in a couple of ways. The first is to change the variable declaration, adding CHAR , as in this example: 

v_string VARCHAR2(5 CHAR);


Changing every variable declaration in the production code would take Scott and his team far too long to comb through all the code and make changes. In addition, Oracle-supplied packages can't be modified, so this could still be a problem if built-ins are used. Oracle Database 10g provides a much easier way to resolve this problem using the NLS_LENGTH_SEMANTICS parameter. It can be set to CHAR for just one session or for the entire system. Once it's set, you must recompile all code to use character semantics.

To test this solution, Scott and his team set NLS_LENGTH_SEMANTICS = 'CHAR' for a session: 

ALTER SESSION SET 
NLS_LENGTH_SEMANTICS = 'CHAR';


They re-ran the anonymous block that returned an error earlier to verify that the issue was resolved: 

SET SERVEROUTPUT ON
DECLARE
  v_string VARCHAR2(5);
BEGIN
  v_string := 'こんにちは';
  DBMS_OUTPUT.PUT_LINE(v_string);
END;
/

こんにちは
PL/SQL procedure successfully completed.


Setting NLS_LENGTH_SEMANTICS = 'CHAR' for the database resolved the second issue. Scott added this to the list of installation prerequisites.

Datetime. Scott and his team identified one requirement for datetime:

Provide a way to identify the time zone within a territory for a user.

Identifying time zones is where the GDK for PL/SQL comes in handy. Although Scott had to support only one territory at a time, there could be multiple time zones within that territory. Take the United States, for example. For the NLS_TERRITORY value AMERICA , Oracle Database 10g lists seven time zones, from New York to Hawaii.

The easiest way to make sure datetime is handled correctly is to add a time zone selection to the administration screen for each user. A drop-down list of time zones would be ideal but impractical unless the list could be narrowed down to some extent. The UTL_I18N package that implements the GDK for PL/SQL includes a function called GET_LOCAL_TIME_ZONES . When you enter a territory, it returns a list of time zones.

Scott ran the following anonymous block to retrieve a list of time zones when given a territory: 

SET SERVEROUTPUT ON
DECLARE
  v_array utl_i18n.STRING_ARRAY;
BEGIN
  v_array := utl_i18n.GET_LOCAL_
TIME_ZONES('AMERICA');
  FOR y IN v_array.FIRST..v_array.LAST
  LOOP
    DBMS_OUTPUT.PUT_LINE(v_array(y));
  END LOOP;
END;
/

America/New_York 
America/Indianapolis 
America/Chicago 
America/Denver 
America/Phoenix 
America/Los_Angeles 
America/Anchorage 
America/Honolulu


If, in the future, the application needs to display the datetime with time zone information, it's quite easy to do. Like currency formatting, TO_CHAR is used to format the output. The NLS_DATE_LANGUAGE parameter ensures that the display will be in the desired language; in this test it's in Japanese: 

SELECT TO_CHAR(SYSTIMESTAMP, 
         'yyyy/MM/dd hh:mi:ssxff AM', 
         'NLS_DATE_LANGUAGE = JAPANESE') 
AS "Japanese Timestamp"
FROM dual;
/
Japanese Timestamp 
2005/06/01 10:43:57.625000 午後


Error handling. Error handling is typically the last thing considered in i18n, but it's one of the most important tasks. To drive this point home, Scott asked his team what action should be taken based on the following message:

'行1でエラーが 発 生しまし': 表名が無 効す

Since no one on Scott's team spoke Japanese, they guessed at the error message content based on the error number alone. "That is exactly what someone who doesn't speak English goes through when they see an error message in English," Scott said.

Oracle Database 10g includes message files for many languages. If you need messages for a particular language, you must install them.

During software installation, Oracle Database 10g lets you add or remove languages, as shown in Figure 1. If you need more languages later, you can add them by rerunning the installer and selecting the required languages, or by running the install_dir/install/addLangs.* program.

 

figure 1
Figure 1: Language selection dialog


Scott added Japanese to his list of languages on his test database. To see if the messages would now return in Japanese, he first modified the NLS_LANGUAGE parameter in his session: 

ALTER SESSION SET 
NLS_LANGUAGE = 'JAPANESE';

Then, he ran the following to generate an error:

SET SERVEROUTPUT ON
DECLARE
  v_string VARCHAR2(2);
BEGIN
  v_string := 行1でエラーが 発 生しまし;   
  DBMS_OUTPUT.PUT_LINE(v_string);
EXCEPTION
  WHEN OTHERS
  THEN 
    DBMS_OUTPUT.PUT_LINE(
DBMS_UTILITY.format_error_stack);
END;
/

ORA-06502: PL/SQL:
数値まは値の エラー:
文字 列バッファが小さすぎます が 発 生しまし
_


This certainly helps, but what if you need additional formatting, sometimes in multiple languages? The UTL_LMS package, included in the GDK, can help. UTL_LMS has two functions: one to get the raw message, and one to format it. Scott wanted to report errors in the local language as well as in English so that technical support could quickly troubleshoot problems. He created a quick test—shown in Listing 1—to display error messages in Japanese and English.

Code Listing 1: Displaying messages in English and Japanese 

SET SERVEROUTPUT ON
DECLARE
   v_english_message VARCHAR2(500);
   v_l10n_message VARCHAR2(500);
   v_string VARCHAR2(2);
   v_function PLS_INTEGER;
BEGIN
      v_string:='行1でエラーが 発 生しまし';
EXCEPTION
   WHEN OTHERS
   THEN
      v_function := UTL_LMS.GET_MESSAGE(6502, 'RDBMS', 
                                                    'ORA', 'JAPANESE', 
                                                     v_l10n_message);
      DBMS_OUTPUT.PUT_LINE(UTL_LMS.FORMAT_MESSAGE(v_l10n_message, 
                                                              ': Japanese Version'));
      v_function := UTL_LMS.GET_MESSAGE(6502, 'RDBMS',
                                                    'ORA', 'AMERICAN',
                                                     v_english_message);
      DBMS_OUTPUT.PUT_LINE(UTL_LMS.FORMAT_MESSAGE(v_english_message, 
                                                               ': English Version'));
END;
/


Next Steps


READ
more about globalization

more about Oracle GDK
Oracle Database 10g Globalization Support Guide

This also gave Scott the flexibility to later tie custom messages to Oracle Database 10g's message, just as he added "English Version" to this test.

Localization

If i18n is done correctly, localization (l10n) is straightforward. LISA defines l10n as follows:

"[It] involves taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold."

Since another company was handling localization, aside from being available for bug fixes, Scott and his team were done with the globalization project.

Conclusion

Nearly three weeks have passed since Scott submitted the database changes. Since then, Jerry has announced two more overseas sales, one in South Korea and another in France. Thanks to the team effort involved in making it a global application, the product went straight to l10n for both countries.


Ron Hardman (Ron.Hardman@PeakRetrieval.com) is an application developer at Academy District 20 schools and the founder of Peak Retrieval, LLC. He is coauthor of Oracle Database 10g PL/SQL Programming and Expert Oracle PL/SQL , both from Oracle Press.


Send us your comments