Java Internationalization: An Overview

   
   


Customers expect products to conform to their cultural preferences, especially when it comes to language and data formats. You've probably been involved in creating applications in C, C++, or a 4GL that accommodate those expectations, but do you know how to write great global applications in the Java programming language?

Creating a global application isn't particularly difficult, but it does require you to become familiar with the most common international problems and their solutions. The problems associated with creating an international application are basically the same from one computing environment and language to another. Solutions are roughly equivalent as well, although their implementations obviously differ among the various computing environments and programming languages. This article gives an overview of internationalization topics and concepts in a Java programming environment, and covers the following features available in the Java Development Kit (JDK) 1.1:

  • Locale
  • Resource bundles
  • Character sets
  • Layout managers

Locale

Locales are used throughout the Java class libraries to customize how data is presented and formatted. They affect language choice, collation, calendar usage, date and time formats, number and currency formats, and many other culturally sensitive data representations. If you intend to create international Java applications, you'll definitely use the java.util.Locale class. There's no getting around it; you'll use Locales to create well-behaved, internationalized, multilingual Java applications. So, if you haven't had time to explore all the JDK 1.1 international features yet, you'll get a clearer understanding of the core of the internationalization model, the Locale, as you read and understand the descriptions and examples in this article.

A Locale is a relatively simple object. It identifies a specific language and a geographic region. In fact, the only significant contents of a Locale object are language and country. Although superficially these attributes are not particularly impressive, they represent a very rich and interesting set of information. A Locale object represents the language and cultural preferences of a geographic area. Language is a fairly easy idea to grasp; cultural preferences may not be immediately clear. Dates, time, numbers, and currency are all examples of data that is formatted according to cultural expectations. Cultural preferences are tightly coupled to a geographic area; that's why country is an important element of locale. Together these two elements (language and country) provide a precise context in which information can be presented. Using Locale, you can present information in the language and form that is best understood and appreciated by the user.

Language

A Locale's language is specified by the ISO 639 standard, which describes valid language codes that can be used to construct a Locale object. The following figure lists a few language codes in the standard. Because language is so dependent on geography, a language code might not capture all the nuances of usage in a particular area. For example, Canadian French and Swiss French may use different phrases and terms to mean different things even though basic grammar and vocabulary are the same. For this reason, language is only half of a well-constructed Locale object.

Language Code   Language
en              English
fr              French
zh              Chinese
ja              Japanese

Country

The Locale's country identifier is also specified by an ISO standard, ISO 3166, which describes valid two-letter codes for all countries. ISO 3166 defines these codes in uppercase letters. The following figure lists a few countries that are part of the standard. Although the Locale constructor allows lowercase letters, it promptly converts the code to uppercase to create the correct internal representation. The country code provides more contextual information for a locale and affects a language's usage, word spelling, and collation rules.

Country Code    Country
US              United States
FR              France
CA              Canada

Variant

A variant is an optional extension to a Locale. It identifies a custom Locale that is not possible to create with just language and country codes. Anyone can use a variant to add context for identifying a Locale. The locale en_US represents English (United States), but en_US_CA represents even more information and might identify a locale for English (California, U.S.A.). OS or software vendors can use these variants to create more descriptive Locales for their specific environments.

Usage

Locale-sensitive objects have methods that use a Locale parameter. These objects behave differently depending on Locale, and they often format information for the user in ways that are culturally sensitive. These objects try to accommodate the presentation preferences of the various locales defined in the system. For example, a DateFormat class would format a date differently depending upon locale. Also, text and other user interface (UI) elements can be searched and applied in a locale-sensitive manner. Locale objects are used throughout a properly internationalized Java application; they are used by all other classes that have adaptable behavior or representation based on cultural, language, or geographic preferences.

The Locale class is defined in the java.util package and has several constructors and accessor methods. Each of the following accessor methods returns a String:

  • getLanguage
  • getCountry
  • getVariant
  • toString

The class provides the following constructors:

Locale(String language, String country)
Locale(String language, String country, String variant)

You can use either of the constructors to create a Locale object:

Locale myLocale = new Locale("en", "US");
Locale myVariantLocale = new Locale("en", "US", "VENTURA");

The en represents English, and US is an abbreviation for United States. The second line shows how to create a Locale with an optional variant, which can be used to create a more specific Locale than what's possible with just language and country codes.
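As a rough sketch of how the two constructors and the accessor methods fit together, the following example builds Locales with and without a variant and reads back their components. The class and method names are made up for illustration. Recent JDK releases deprecate these constructors, although they still work, so the sketch suppresses that warning.

```java
import java.util.Locale;

// Hypothetical demo class; the class and method names are illustrative only.
@SuppressWarnings("deprecation") // recent JDKs deprecate these constructors, but they still work
class LocaleConstruction {

    // Language and country only.
    static Locale basic() {
        return new Locale("en", "US");
    }

    // Language, country, and the optional variant.
    static Locale withVariant() {
        return new Locale("en", "US", "VENTURA");
    }

    // A lowercase country code is accepted and converted to uppercase.
    static Locale lowercaseCountry() {
        return new Locale("en", "us");
    }

    public static void main(String[] args) {
        System.out.println(basic().getLanguage());           // language code
        System.out.println(basic().getCountry());            // country code
        System.out.println(withVariant().getVariant());      // variant code
        System.out.println(lowercaseCountry().getCountry()); // normalized country code
    }
}
```

The last line shows the case conversion mentioned in the Country section: the Locale stores US even though us was supplied.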

Although the Java compiler and run-time environment won't complain if you make up your own language and country identifiers, you should use the valid codes defined by ISO standards. By constraining yourself to the ISO definitions, you'll ensure compatibility with other Java applications and coding standards.

Once created, the Locale provides access to its individual components. getLanguage() and getCountry() return the ISO language and country codes that comprise a Locale object. These codes, however, aren't exactly user-friendly. They probably won't mean a lot to your customers, so if you want to display language and country information in the application, you should probably use other methods.

Locale myLocale;
String language;
String country;

myLocale = new Locale(
            "en", "US");
language = myLocale.getLanguage();
country = myLocale.getCountry();
System.out.println(language);
System.out.println(country);

OUTPUT:
en
US

getDisplayLanguage() and getDisplayCountry() will return String objects that are suitable for display to the customer. These methods are locale-sensitive, meaning that you can provide a Locale parameter to ask for a language or country string in a target language.

Locale myLocale = Locale.getDefault();
System.out.println(
  myLocale.getDisplayLanguage());
System.out.println(
  myLocale.getDisplayCountry());
System.out.println(
  myLocale.getDisplayLanguage(Locale.FRENCH));
System.out.println(
  myLocale.getDisplayCountry(Locale.FRENCH));

OUTPUT:
English
United States
anglais
États-Unis

If you don't provide an overriding Locale parameter, both getDisplayLanguage and getDisplayCountry will return their information in the language of the default locale. The Locale class also provides some commonly used Locales as static final constants, such as the Locale.FRENCH constant used above. The following figure shows some examples of locale identifiers and their meanings.

Locale Identifier   Meaning
en_US               English (U.S.)
fr_CA               French (Canadian)
fr_FR               French (France)
ja_JP               Japanese (Japan)
en_US_MAC           English (U.S., Macintosh)
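Several of the identifiers in the figure correspond to constants on the Locale class. The sketch below prints the identifier and an English display name for a few of them. The class name and the describe helper are hypothetical, and the exact display-name wording can vary between JDK releases.

```java
import java.util.Locale;

// Hypothetical demo class; describe() is an illustrative helper.
class LocaleConstants {

    // Pairs a locale identifier with its display name rendered in English.
    static String describe(Locale locale) {
        return locale.toString() + " -> " + locale.getDisplayName(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        Locale[] samples = {
            Locale.US, Locale.CANADA_FRENCH, Locale.FRANCE, Locale.JAPAN
        };
        for (Locale locale : samples) {
            System.out.println(describe(locale));
        }
    }
}
```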

The string representation of a Locale can be created with the following:

String strLocale = myLocale.toString();

The method toString() will return a String in the form <language code>_<country code>[_<variant code>]. In the above example, toString() will return en_US. Notice that an underscore character separates each Locale component.

When the Java Virtual Machine [1] (JVM) starts up, it queries the underlying OS for a default-locale setting. You can discover your default locale programmatically. You can even change the default locale if you want to. Both of these operations are accomplished via static methods within the java.util.Locale class.

myLocale = Locale.getDefault();
System.out.println(myLocale.toString());
Locale.setDefault(Locale.GERMANY);
myLocale = Locale.getDefault();
System.out.println(myLocale.toString());

OUTPUT:
en_US
de_DE

Note: As recently as JDK 1.1.6, Locale.setDefault() causes a security exception in applets, so you might want to avoid this call in applets. As a workaround, instead of relying on the default locale, you can explicitly pass a Locale object to every locale-sensitive object you use. It's inconvenient, but it's a relatively easy fix to implement, especially if you're creating applets. You don't need to worry about the problem in applications, because you have more security rights on the local machine.

There are two additional methods that might interest you: getISO3Language and getISO3Country. When creating Locales, you always use the two-letter ISO codes, but these methods let you retrieve ISO's three-letter codes for the same language and country.
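Here is a small sketch of the three-letter accessors applied to a few predefined Locales. The class name and the iso3 helper are invented for this example.

```java
import java.util.Locale;

// Hypothetical demo class; iso3() is an illustrative helper.
class Iso3Codes {

    // Joins the three-letter language and country codes with an underscore.
    static String iso3(Locale locale) {
        return locale.getISO3Language() + "_" + locale.getISO3Country();
    }

    public static void main(String[] args) {
        System.out.println(Locale.US + " -> " + iso3(Locale.US));
        System.out.println(Locale.FRANCE + " -> " + iso3(Locale.FRANCE));
        System.out.println(Locale.JAPAN + " -> " + iso3(Locale.JAPAN));
    }
}
```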

After declaring the Locale as the core of Java internationalization, it might sound contradictory to say that this class doesn't do a lot on its own. A Locale's power comes from the classes that use it. In a Java application, each locale-sensitive object is responsible for its own locale-dependent behavior. A Locale object doesn't enforce this behavior; it simply acts as an indicator to other objects. Those objects are then responsible for using the Locale appropriately. By design, locale-sensitive classes are independent of each other. That is, the set of supported Locales in one class does not need to be the same as the set in another class. In practice, however, the current JDK 1.1.6 provides support for a single, shared set of locales.

In traditional operating systems and localization models, one locale setting is active at a time. You programmatically set the locale. Thereafter, all locale-sensitive functions use the specified locale selection. The specified locale is active throughout the application as a global locale. It changes when there is another global locale activation via a setlocale or similar call. Java technology, however, treats locales a little differently. A Java application can have multiple locales active at the same time. That is, it's possible to use a French date format and a U.S. number format in the same application. Nothing limits you from creating truly multicultural and multilingual Java applications.
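To illustrate the point about several locales being active at once, the following sketch formats a date with French conventions and a number with U.S. conventions side by side, without touching the default locale. The class and method names are hypothetical, and the formatter is pinned to UTC only so that the result does not depend on the machine's time zone.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; names are illustrative only.
class MixedLocales {

    // Formats a date using French conventions.
    static String frenchDate(Date date) {
        DateFormat fmt = DateFormat.getDateInstance(DateFormat.LONG, Locale.FRANCE);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for repeatable output
        return fmt.format(date);
    }

    // Formats a number using U.S. conventions.
    static String usNumber(double value) {
        return NumberFormat.getInstance(Locale.US).format(value);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1 January 1970, 00:00 UTC
        System.out.println(frenchDate(epoch));
        System.out.println(usNumber(1234.5));
    }
}
```

Because each formatter receives its Locale explicitly, this approach also works where calling Locale.setDefault() is not an option.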

Numbers

What number does 1,234 represent? Of course, the answer depends on locale. In the U.S., this string of digits represents one thousand two hundred thirty-four. However, in France this represents one and two hundred thirty-four one-thousandths. Significant difference? Absolutely! Imagine you're a chemical manufacturer that just received an order for 1,234 kilograms of a certain chemical. Your interpretation of this number will definitely affect your sales quotas for the month.

Numbers are represented differently around the globe. When an application shows a number to the user, it must represent that number in a way that is sensitive to the cultural expectations regarding decimal point symbol, group separators, number of digits after the decimal, and leading zeros.
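To make the ambiguity concrete, the sketch below parses the same string under two locales using java.text.NumberFormat, the class described in the paragraphs that follow. The class name and the parse helper are invented for illustration. The helper wraps the checked ParseException only to keep the example short.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demo class; parse() is an illustrative helper.
class AmbiguousNumber {

    // Interprets the text according to the conventions of the given locale.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        // U.S.: the comma is a group separator.
        System.out.println(parse("1,234", Locale.US));
        // France: the comma is the decimal separator.
        System.out.println(parse("1,234", Locale.FRANCE));
    }
}
```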

The java.text.NumberFormat class performs locale-specific formatting for both general-purpose numbers and currency. To instantiate a NumberFormat object, use the factory method getInstance, which returns a NumberFormat object suitable for your default locale. You can, of course, ask for an object with a specific locale in mind. To specify a locale other than your default, use getInstance(Locale locale).

If you are curious about what locales are supported, you can use the class method getAvailableLocales. This method returns an array of Locales.

Formatting a number couldn't be easier. Call the instance methods format(long number) or format(double number) to produce a String object that's suitable for displaying to the user. Other methods allow you to customize the format by turning various options on or off.
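The following sketch shows the factory method, the format call, and a couple of the customization options. The class and helper names are hypothetical. The number of locales reported by getAvailableLocales depends on the JDK you run.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class; names are illustrative only.
class NumberFormatting {

    // Formats with the locale's default number rules.
    static String format(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    // Customizes the formatter: no grouping, exactly two fraction digits.
    static String formatPlain(double value, Locale locale) {
        NumberFormat fmt = NumberFormat.getInstance(locale);
        fmt.setGroupingUsed(false);
        fmt.setMinimumFractionDigits(2);
        fmt.setMaximumFractionDigits(2);
        return fmt.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234.56, Locale.US));
        System.out.println(format(1234.56, Locale.GERMANY));
        System.out.println(formatPlain(1234.5, Locale.US));
        System.out.println(NumberFormat.getAvailableLocales().length
            + " locales available");
    }
}
```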

Currency

Each locale has its own preferences for currency symbols, negative amount format, leading zeros, group separators, decimal point symbol, and currency symbol position. Currency and numbers have a lot in common. In fact, they even use the same basic format class, NumberFormat, to instantiate new objects.

Although you still use NumberFormat, you call a different factory method to get a currency format object, getCurrencyInstance. This method will return a currency format object for the default locale. You can use this factory method just like you used the number factory method; call getCurrencyInstance(Locale locale) to specify a specific locale. Again, use the format method to produce a user visible String object. The currency formatter will handle all the details of selecting the correct currency symbol, placing that symbol in the string, and applying grouping rules. Also, like the number formatter, you can override several options to customize the format.
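As a minimal sketch, the example below formats one amount with the currency formatters of three locales. The class and helper names are made up. Note that the formatter only changes the presentation; it performs no exchange-rate conversion. The exact symbols and spacing can differ between JDK releases.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class; format() is an illustrative helper.
class CurrencyFormatting {

    // Formats the amount with the currency conventions of the given locale.
    static String format(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        double amount = 1234.56;
        System.out.println(format(amount, Locale.US));
        System.out.println(format(amount, Locale.GERMANY));
        System.out.println(format(amount, Locale.JAPAN));
    }
}
```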

Date

A date helps to uniquely identify a point in time. Like other locale-sensitive structures, dates have many representation details. You must consider long and short date formats as well as date separator symbols. You have to worry about whether the year is displayed before the day and month or after. Again, the Java class libraries accommodate these needs.

The java.text.DateFormat class provides the getDateInstance method that creates a formatter for your default locale. The format method works in the same way as the other format methods covered so far, and applies the specific format rules for your chosen locale.
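The sketch below formats one date in short and long styles for several locales, so you can see the field order and separators change. The class, the helpers, and the sample date are assumptions made for this example, and the time zone is fixed to UTC so the result is repeatable. Whether the short style prints a two-digit or four-digit year depends on the JDK's locale data.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; names and the sample date are illustrative only.
class DateFormatting {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // 25 March 1998 at midnight UTC; day and month differ so field order is visible.
    static Date sampleDate() {
        Calendar cal = new GregorianCalendar(UTC);
        cal.clear();
        cal.set(1998, Calendar.MARCH, 25);
        return cal.getTime();
    }

    // Formats the date in the given style (DateFormat.SHORT, LONG, ...) and locale.
    static String format(Date date, int style, Locale locale) {
        DateFormat fmt = DateFormat.getDateInstance(style, locale);
        fmt.setTimeZone(UTC);
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date date = sampleDate();
        Locale[] locales = { Locale.US, Locale.FRANCE, Locale.JAPAN };
        for (Locale locale : locales) {
            System.out.println(locale + " short: "
                + format(date, DateFormat.SHORT, locale));
            System.out.println(locale + " long:  "
                + format(date, DateFormat.LONG, locale));
        }
    }
}
```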

Calendar

The java.util.Calendar class is closely related to Date, and lets you extract year, month, week, and day information from a Date. You won't instantiate Calendar directly because the class is abstract. Instead, use the factory method Calendar.getInstance to get a calendar object for your locale. The Gregorian style calendar is the only one provided at this time; however, you can create your own by subclassing Calendar.
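Here is a short sketch of obtaining a calendar from the factory method Calendar.getInstance and pulling individual fields out of a Date. The class and helper names are hypothetical, and UTC is used so the fields don't depend on the local time zone. It also shows one locale-dependent calendar property, the first day of the week.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; calendarFor() is an illustrative helper.
class CalendarFields {

    // Returns a calendar for the locale, positioned at the given date.
    static Calendar calendarFor(Locale locale, Date date) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"), locale);
        cal.setTime(date);
        return cal;
    }

    public static void main(String[] args) {
        Calendar cal = calendarFor(Locale.US, new Date(0L)); // 1 January 1970 UTC
        System.out.println("year:  " + cal.get(Calendar.YEAR));
        System.out.println("month: " + cal.get(Calendar.MONTH)); // zero-based
        System.out.println("day:   " + cal.get(Calendar.DAY_OF_MONTH));

        // The first day of the week is a locale-sensitive calendar setting.
        boolean usStartsSunday = calendarFor(Locale.US, new Date(0L))
            .getFirstDayOfWeek() == Calendar.SUNDAY;
        boolean frStartsMonday = calendarFor(Locale.FRANCE, new Date(0L))
            .getFirstDayOfWeek() == Calendar.MONDAY;
        System.out.println("U.S. week starts Sunday:   " + usStartsSunday);
        System.out.println("French week starts Monday: " + frStartsMonday);
    }
}
```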

Resource Bundles

This internationalization feature of the JDK provides a mechanism for separating user interface (UI) elements and other locale-sensitive data from the application logic in a program. Separating locale-sensitive elements from other code allows easy translation. It allows you to create a single code base for an application even though you may provide 30 different language versions. Although you might be predisposed to think of text only, remember that any localizable element is a resource, including buttons, icons, and menus.

The JDK uses resource bundles to isolate localizable elements from the rest of the application. The resource bundle contains either the resource itself or a reference to it. With all resources separated into a bundle, the Java application simply loads the appropriate bundle for the active locale. If the user switches locales, the application just loads a different bundle.

Resource bundle names have two parts: a base name and a locale suffix. For example, suppose you create a resource bundle named MyBundle. Imagine that you have translated MyBundle for two different locales, ja_JP and fr_FR. The original MyBundle will be your default bundle, the one used when others cannot be found, or when no other locale-specific bundles exist. However, in addition to the default bundle, you'll create two more bundles. In the example these bundles would be named MyBundle_ja_JP and MyBundle_fr_FR. The ResourceBundle.getBundle method relies on this naming convention to search for the bundle used for the active locale.
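The naming convention can be sketched as a list of candidate names, ordered from most to least specific. This is a simplified illustration, not the actual getBundle implementation: the real method also considers variants and tries the default locale before falling back to the base bundle. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; a simplified model of the bundle naming convention.
class BundleNames {

    // Lists candidate bundle names from most specific to the default bundle.
    static List<String> candidates(String baseName, Locale locale) {
        List<String> names = new ArrayList<String>();
        String language = locale.getLanguage();
        String country = locale.getCountry();
        if (language.length() > 0 && country.length() > 0) {
            names.add(baseName + "_" + language + "_" + country);
        }
        if (language.length() > 0) {
            names.add(baseName + "_" + language);
        }
        names.add(baseName); // the default bundle comes last
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidates("MyBundle", Locale.JAPAN));
        System.out.println(candidates("MyBundle", Locale.FRANCE));
    }
}
```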

The java.util.ResourceBundle class is abstract, which means you must use a subclass of ResourceBundle. The JDK provides two subclasses: PropertyResourceBundle and ListResourceBundle. If these don't meet your needs, you can create your own subclass of ResourceBundle.

PropertyResourceBundle

The PropertyResourceBundle is the most convenient bundle to use. To use this bundle, create a property file that contains key/value pairs in the form <key>=<value>. Put each key/value pair on its own line, so that a new-line character separates one pair from the next. The following figure shows an example property file for a PropertyResourceBundle.

# MyResource.properties
# <key>=<value>
TEXT_NOT_FOUND=The file could not be found.
TEXT_HELLO=Hello, world!
TEXT_WARNING=There are {0} warnings in the file {1}.
TEXT_INSERT_PAPER=Please insert more paper.
TEXT_DISREGARD=Please disregard the man behind the {0}.

Place these key/value pairs into a file with a .properties extension. For example, you might name the file MyResource.properties. You'd load this bundle by calling ResourceBundle.getBundle("MyResource") and retrieve individual elements with the getString method. The getBundle method first searches for a .class file with the bundle name; if it doesn't find one, it looks for the .properties file.

A PropertyResourceBundle is quite easy to create and use. However, it has one significant limitation. All values are limited to string objects. In other words, you can only place text strings in a PropertyResourceBundle. This may not be important to you, but if it is you must use a different type of bundle. The ListResourceBundle may be more appropriate if you need more complex key/value pairs.
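The {0} and {1} placeholders in the example values are meant to be filled in at run time. The sketch below shows one way to do that with java.text.MessageFormat. To stay self-contained, it reads the key/value pairs from an in-memory string instead of a real MyResource.properties file, using a PropertyResourceBundle constructor that takes a Reader, which newer JDKs provide. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo class; the in-memory text stands in for a .properties file.
class PropertyBundleDemo {
    private static final String PROPERTIES =
        "TEXT_HELLO=Hello, world!\n"
        + "TEXT_WARNING=There are {0} warnings in the file {1}.\n";

    // Loads the bundle from the in-memory property text.
    static ResourceBundle load() {
        try {
            return new PropertyResourceBundle(new StringReader(PROPERTIES));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Fills the {0} and {1} placeholders of TEXT_WARNING.
    static String warning(int count, String fileName) {
        String pattern = load().getString("TEXT_WARNING");
        MessageFormat fmt = new MessageFormat(pattern, Locale.US);
        return fmt.format(new Object[] { Integer.valueOf(count), fileName });
    }

    public static void main(String[] args) {
        System.out.println(load().getString("TEXT_HELLO"));
        System.out.println(warning(3, "report.txt"));
    }
}
```

In a real application you would normally call ResourceBundle.getBundle("MyResource") and let it locate the file for the active locale.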

ListResourceBundle

The ListResourceBundle is a little more complex than PropertyResourceBundle, but offers more features. For example, although a PropertyResourceBundle can only store text, a ListResourceBundle can contain any type of Java object. ListResourceBundle is abstract, so you must subclass it to create a usable class. See the following figure.

Like a PropertyResourceBundle, your ListResourceBundle contains a list of key/value pairs. However, these pairs are arranged as elements in a two-dimensional array of java.lang.Object. Your subclass must provide a single method getContents, as well as an Object array that lists your key/value pairs.

// MyResource.java
import java.util.ListResourceBundle;

public class MyResource 
  extends ListResourceBundle {

    // Hands the key/value pairs to the ResourceBundle lookup code.
    public Object[][] getContents() {
        return contents;
    }

    // Each inner array is one { key, value } pair.
    private static final Object[][] contents = {
      { "TEXT_NOT_FOUND", 
        "The file could not be found." },
      { "TEXT_HELLO", 
        "Hello, world!" },
      { "TEXT_WARNING", 
        "There are {0} warnings in the file {1}." },
      { "TEXT_INSERT_PAPER", 
        "Please insert more paper." },
      { "TEXT_DISREGARD", 
        "Please disregard the man behind the {0}." },
    };
}
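To show the difference from a PropertyResourceBundle, here is a sketch of a ListResourceBundle that stores non-text values alongside text. The class names and the MAX_RETRIES and PAPER_SIZES keys are invented for this example. The bundle is instantiated directly only to keep the sketch in one file; an application would normally go through ResourceBundle.getBundle.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo class; the keys and values are illustrative only.
class ListBundleDemo {

    // A bundle that mixes text with other kinds of objects.
    static class Resources extends ListResourceBundle {
        private static final Object[][] CONTENTS = {
            { "TEXT_HELLO", "Hello, world!" },
            { "MAX_RETRIES", Integer.valueOf(3) },
            { "PAPER_SIZES", new String[] { "Letter", "Legal" } },
        };

        protected Object[][] getContents() {
            return CONTENTS;
        }
    }

    // Direct instantiation, used here only for self-containment.
    static ResourceBundle load() {
        return new Resources();
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load();
        System.out.println(bundle.getString("TEXT_HELLO"));
        Integer retries = (Integer) bundle.getObject("MAX_RETRIES");
        System.out.println(retries);
        String[] sizes = bundle.getStringArray("PAPER_SIZES");
        System.out.println(sizes.length + " paper sizes");
    }
}
```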

Character Sets

The Java language has simplified the storage, manipulation, and representation of characters by using Unicode to represent text. Unicode is a 16-bit character set, which simply means that it can define 2^16 (65,536) characters. Each character is uniquely identified within the set. When using regional character sets, you often had to store the character-set identifier along with the character or stream of characters so that you could distinguish among the different characters with the same code point across the various sets. Using Unicode, you no longer need to worry about overlapping code points.

Although you may be unfamiliar with Unicode, you needn't worry too much about how to use it. It is freely available in Java. If you do nothing at all, your application will use Unicode to represent text. The String class uses Unicode so you don't need to do anything special to get support in strings. However, if you have to maintain legacy data in a regional character set, you can use the numerous character converters that Java technology provides.

Using the character converters, you can convert your Unicode text to a regional character set. You can also convert from a regional character set to Unicode. So, although the Java language uses Unicode, it also allows you to maintain your older data if necessary.
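As a rough sketch of a round trip through a converter, the example below encodes a Unicode string as ISO 8859-1 bytes and decodes it again. It uses the String conversion methods together with the StandardCharsets constants, which are an assumption about your JDK because they were added in releases much later than JDK 1.1. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class CharsetConversion {

    // Unicode text to bytes in the regional ISO 8859-1 (Latin-1) character set.
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // ISO 8859-1 bytes back to Unicode text.
    static String fromLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u00C9tats-Unis"; // "États-Unis", written with an escape
        byte[] latin1 = toLatin1(text);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // The accented letter takes one byte in Latin-1 and two in UTF-8.
        System.out.println("ISO 8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 bytes:      " + utf8.length);
        System.out.println("round trip ok:    " + fromLatin1(latin1).equals(text));
    }
}
```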

Layout Managers

Layout managers are important in an international application because they compensate for two frustrating problems associated with translated user interfaces:

  • Expanding and shrinking text lengths
  • Component positions

First, translated text is often shorter or longer than the original text. Layout managers compensate by expanding and shrinking components to fit the length of the text used for labels.

Second, a layout manager relieves the frustration associated with trying to position components as a result of text length differences. If you usually lay out UI components on an X-Y grid, you have no doubt noticed that those positions must change after translations. However, using a layout manager, you position components relative to each other, not necessarily by hard-coded pixel positions. This means you can write your UI code once and run it anywhere.

The Java Development Kit (JDK) 1.1.6 supplies at least five layout managers, and you can pick up quite a few different ones from the Internet. And of course you can create your own. For more information about layout managers, please see Exploring AWT Layout Managers.

Conclusion

The Java class libraries provide many tools to help you create excellent global applications. By supplying international solutions in the base class libraries, Sun helps developers create reliable, stable products. The solutions are used and tested by everyone who uses the product. Developers are not burdened with the task of solving these problems over and over again.

If you commit to using these features now, you'll save yourself lots of headaches later. In general, these JDK features are easy to use, but more importantly, they are easier to learn and use than to retrofit or fix applications that don't attempt to address the issues at all. If you're interested in updating an existing application for an international audience, take a look at A Checklist for Internationalizing an Existing Program. The best way to learn about the JDK's international features is to use them.

So start writing some code, experiment, and have fun.

_______
1 As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.

John O'Conner teaches software internationalization topics and consults for global development projects. He also enjoys speaking Japanese, playing softball, and spending time with his family.