Customers expect products to conform to their cultural preferences, especially when it comes to language and data formats. You've probably been involved in creating applications in C, C++, or a 4GL that accommodate those expectations, but do you know how to write great global applications in the Java programming language?
Creating a global application isn't particularly difficult, but it does require you to become familiar with the most common international problems and their solutions. The problems associated with creating an international application are basically the same from one computing environment and language to any other. Solutions are roughly equivalent as well, although their implementations obviously differ among the various computing environments and programming languages. This article gives an overview of internationalization topics and concepts in a Java programming environment, and covers the following features available in the Java Development Kit 1.1.
Locale
Locales are used throughout the Java class libraries to customize how data is presented and formatted. They affect language choice, collation, calendar usage, date and time formats, number and currency formats, and many other culturally sensitive data representations. If you intend to create international Java applications, you'll definitely use the java.util.Locale class. There's no getting around it; you'll use Locales to create well-behaved, internationalized, multilingual Java applications. So, if you haven't had time to explore all the JDK 1.1 international features yet, you'll get a clearer understanding of the core of the internationalization model, the Locale, as you read and understand the descriptions and examples in this article.
A Locale is a relatively simple object. It identifies a specific language and a geographic region. In fact, the only significant contents of a Locale object are language and country. Although superficially these attributes are not particularly impressive, they represent a very rich and interesting set of information. A Locale object represents the language and cultural preferences of a geographic area. Language is a fairly easy idea to grasp; cultural preferences may not be immediately clear. Dates, time, numbers, and currency are all examples of data that is formatted according to cultural expectations. Cultural preferences are tightly coupled to a geographic area; that's why country is an important element of locale. Together these two elements (language and country) provide a precise context in which information can be presented. Using Locale, you can present information in the language and form that is best understood and appreciated by the user.
A Locale's language is specified by the ISO 639 standard, which describes valid language codes that can be used to construct a Locale object. The following figure lists a few language codes in the standard. Because language is so dependent on geography, a language code might not capture all the nuances of usage in a particular area. For example, Canadian French and Swiss French may use different phrases and terms to mean different things even though basic grammar and vocabulary are the same. For this reason, language is only half of a well-constructed Locale object.
| Language Code | Language |
| en | English |
| fr | French |
| zh | Chinese |
| ja | Japanese |
The Locale's country identifier is also specified by an ISO standard, ISO 3166, which describes valid two-letter codes for all countries. ISO 3166 defines these codes in uppercase letters. The following figure lists a few countries that are part of the standard. Although the Locale constructor allows lowercase letters, it promptly converts the code to uppercase to create the correct internal representation. The country code provides more contextual information for a locale and affects a language's usage, word spelling, and collation rules.
| Country Code | Country |
| US | United States |
| FR | France |
| CA | Canada |
A variant is an optional extension to a Locale. It identifies a custom Locale that is not possible to create with just language and country codes. Variants can be used by anyone to add additional context for identifying a Locale. The locale en_US represents English (United States), but en_US_CA represents even more information and might identify a locale for English (California, U.S.A.). OS or software vendors can use these variants to create more descriptive Locales for their specific environments.
Locale-sensitive objects have methods that use a Locale parameter. These objects behave differently depending on Locale, and they often format information for the user in ways that are culturally sensitive. These objects try to accommodate the presentation preferences of the various locales defined in the system. For example, a DateFormat object formats a date differently depending on the locale. Also, text and other user interface (UI) elements can be searched and applied in a locale-sensitive manner.
Locale objects are used throughout a properly internationalized Java application; they are used by all other classes that have adaptable behavior or representation based on cultural, language, or geographic preferences.
Locales are defined in the java.util package and have numerous constructor and access methods. Each of the following methods returns a String:
getLanguage
getCountry
getVariant
toString
The Locale class has two constructors:
Locale(String language, String country)
Locale(String language, String country, String variant)
You can use either of the constructors to create a Locale object:
Locale myLocale = new Locale("en", "US");
Locale variantLocale = new Locale("en", "US", "VENTURA");
The en represents English, and US is an abbreviation for United States. The second line shows how to create a Locale with an optional variant, which can be used to create a more specific Locale than what's possible with just language and country codes.
Although the Java compiler and run-time environment won't complain if you make up your own language and country identifiers, you should use the valid codes defined by ISO standards. By constraining yourself to the ISO definitions, you'll ensure compatibility with other Java applications and coding standards.
Once created, the Locale provides access to its individual components. getLanguage() and getCountry() return the ISO language and country codes that comprise a Locale object. These codes, however, aren't exactly user-friendly. They probably won't mean a lot to your customers, so if you want to display language and country information in the application, you should probably use other methods.
Locale myLocale;
String language;
String country;
myLocale = new Locale("en", "US");
language = myLocale.getLanguage();
country = myLocale.getCountry();
System.out.println(language);
System.out.println(country);
OUTPUT:
en
US
getDisplayLanguage() and getDisplayCountry() return String objects that are suitable for display to the customer. These methods are locale-sensitive, meaning that you can provide a Locale parameter to ask for a language or country string in a target language.
Locale myLocale = Locale.getDefault();
System.out.println(myLocale.getDisplayLanguage());
System.out.println(myLocale.getDisplayCountry());
System.out.println(myLocale.getDisplayLanguage(Locale.FRENCH));
System.out.println(myLocale.getDisplayCountry(Locale.FRENCH));
OUTPUT:
English
United States
anglais
États-Unis
The Locale class provides some static final Locales that are commonly used. If you don't provide an overriding Locale parameter, both getDisplayLanguage and getDisplayCountry will return their information in the language of the default locale. Some other examples are provided in the following figure.
| Locale Identifier | Meaning |
| en_US | English (U.S.) |
| fr_CA | French (Canadian) |
| fr_FR | French (France) |
| ja_JP | Japanese (Japan) |
| en_US_MAC | English (U.S., Macintosh) |
The string representation of a Locale can be created with the following:
String strLocale = myLocale.toString();
The method toString() will return a String in the form <language code>_<country code>[_<variant code>]. In the above example, toString() will return en_US. Notice that an underscore character separates each Locale component.
When the Java Virtual Machine (JVM) [1] starts up, it queries the underlying OS for a default-locale setting. You can discover your default locale programmatically. You can even change the default locale if you want to. Both of these operations are accomplished via static methods within the java.util.Locale class.
myLocale = Locale.getDefault();
System.out.println(myLocale.toString());
Locale.setDefault(Locale.GERMANY);
myLocale = Locale.getDefault();
System.out.println(myLocale.toString());
OUTPUT:
en_US
de_DE
Note: As recently as JDK 1.1.6, Locale.setDefault() causes a security exception in applets, so you might want to avoid this call in applets. As a workaround, instead of relying on the default locale, you can explicitly pass a Locale object to every locale-sensitive object you use. It's inconvenient, but it's a relatively easy fix to implement, especially if you're creating applets. You don't need to worry about the problem in applications, because you have more security rights on the local machine.
There are two additional methods that might interest you: getISO3Language and getISO3Country. When creating Locales, you always use the two-letter ISO codes, but you can use these methods to retrieve ISO's three-letter codes for the same language and country.
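As a minimal sketch of the two-letter and three-letter accessors side by side, the following class pairs each code with its three-letter counterpart. The class and method names (Iso3CodesDemo, languageCodes, countryCodes) are made up for illustration; only the Locale methods come from the JDK.

```java
import java.util.Locale;

// Hypothetical demo class; only the Locale calls are JDK API.
class Iso3CodesDemo {
    // Returns "<two-letter language code>/<three-letter language code>".
    static String languageCodes(Locale locale) {
        return locale.getLanguage() + "/" + locale.getISO3Language();
    }

    // Returns "<two-letter country code>/<three-letter country code>".
    static String countryCodes(Locale locale) {
        return locale.getCountry() + "/" + locale.getISO3Country();
    }

    public static void main(String[] args) {
        System.out.println(languageCodes(Locale.US));     // en/eng
        System.out.println(countryCodes(Locale.US));      // US/USA
        System.out.println(languageCodes(Locale.FRANCE)); // fr/fra
        System.out.println(countryCodes(Locale.FRANCE));  // FR/FRA
    }
}
```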
After declaring the Locale as the core of Java internationalization, it might sound contradictory to say that this class doesn't do a lot on its own. A Locale's power comes from the classes that use it. In a Java application, each locale-sensitive object is responsible for its own locale-dependent behavior. A Locale object doesn't enforce this behavior; it simply acts as an indicator to other objects. Those objects are then responsible for using the Locale appropriately. By design, locale-sensitive classes are independent of each other. That is, the set of supported Locales in one class does not need to be the same as the set in another class. In practice, however, the current JDK 1.1.6 provides support for a single, shared set of locales.
In traditional operating systems and localization models, one locale setting is active at a time. You programmatically set the locale. Thereafter, all locale-sensitive functions use the specified locale selection. The specified locale is active throughout the application as a global locale. It changes when there is another global locale activation via a setlocale or similar call. Java technology, however, treats locales a little differently. A Java application can have multiple locales active at the same time. That is, it's possible to use a French date format and a U.S. number format in the same application. Nothing limits you from creating truly multicultural and multilingual Java applications.
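The French-date-plus-U.S.-number case can be sketched roughly as follows. Each formatter receives its own Locale explicitly, so neither depends on the default locale. The class and method names are illustrative assumptions, and the exact French date text depends on the locale data shipped with your Java runtime, so no output is shown for it.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class: two locales in use side by side.
class MixedLocalesDemo {
    // A fixed sample date (15 January 2000) in the default time zone.
    static Date sampleDate() {
        return new GregorianCalendar(2000, Calendar.JANUARY, 15).getTime();
    }

    // Formats the date according to French (France) conventions.
    static String frenchDate(Date date) {
        return DateFormat.getDateInstance(DateFormat.LONG, Locale.FRANCE).format(date);
    }

    // Formats the number according to U.S. English conventions.
    static String usNumber(double value) {
        return NumberFormat.getInstance(Locale.US).format(value);
    }

    public static void main(String[] args) {
        System.out.println(frenchDate(sampleDate())); // French month name and ordering
        System.out.println(usNumber(1234.5));         // 1,234.5
    }
}
```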
What number does 1,234 represent? Of course, the answer depends on locale. In the U.S., this string of digits represents one thousand two hundred thirty-four. However, in France it represents one and two hundred thirty-four thousandths. Significant difference? Absolutely! Imagine you're a chemical manufacturer that just received an order for 1,234 kilograms of a certain chemical. Your interpretation of this number will definitely affect your sales quotas for the month.
Numbers are represented differently around the globe. When an application shows a number to the user, it must represent that number in a way that is sensitive to the cultural expectations regarding decimal point symbol, group separators, number of digits after the decimal, and leading zeros.
The java.text.NumberFormat class performs locale-specific formatting for general-purpose numbers. To instantiate a NumberFormat object, use the factory method getInstance, which returns a NumberFormat object suitable for your default locale. You can, of course, ask for an object with a specific locale in mind. To specify a locale other than your default, use getInstance(Locale locale).
If you are curious about what locales are supported, you can use the class method getAvailableLocales. This method returns an array of Locales.
Formatting a number couldn't be easier. Call the instance methods format(long number) or format(double number) to produce a String object that's suitable for displaying to the user. Other methods allow you to customize the format by turning various options on or off.
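Here is a small sketch of number formatting with an explicit locale, plus the reverse direction (parsing), which shows why the string 1,234 means different things in the U.S. and in France. The class and helper names are hypothetical. The fraction-digit setters are one example of the customization options; parse is a NumberFormat method that this article does not otherwise cover.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demo class for locale-specific number formatting and parsing.
class NumberFormatDemo {
    // Formats a value with exactly two fraction digits for the given locale.
    static String formatTwoDecimals(double value, Locale locale) {
        NumberFormat formatter = NumberFormat.getInstance(locale);
        formatter.setMinimumFractionDigits(2);
        formatter.setMaximumFractionDigits(2);
        return formatter.format(value);
    }

    // Interprets locale-formatted text as a number.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatTwoDecimals(1234.5, Locale.US));      // 1,234.50
        System.out.println(formatTwoDecimals(1234.5, Locale.GERMANY)); // 1.234,50
        System.out.println(parse("1,234", Locale.US));                 // 1234.0
        System.out.println(parse("1,234", Locale.FRANCE));             // 1.234
    }
}
```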
Each locale has its own preferences for currency symbols, negative amount format, leading zeros, group separators, decimal point symbol, and currency symbol position. Currency and numbers have a lot in common. In fact, they even use the same basic format class, NumberFormat, to instantiate new objects.
Although you still use NumberFormat, you call a different factory method to get a currency format object, getCurrencyInstance. This method will return a currency format object for the default locale. You can use this factory method just like you used the number factory method; call getCurrencyInstance(Locale locale) to specify a specific locale. Again, use the format method to produce a user-visible String object. The currency formatter will handle all the details of selecting the correct currency symbol, placing that symbol in the string, and applying grouping rules. Also, like the number formatter, you can override several options to customize the format.
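A brief sketch of the currency factory method with explicit locales follows. The class name is an illustrative assumption. Symbol placement and spacing for non-U.S. locales vary with the runtime's locale data, so only the U.S. result is shown as a comment.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class for locale-specific currency formatting.
class CurrencyFormatDemo {
    // Formats an amount with the currency conventions of the given locale.
    static String formatCurrency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        System.out.println(formatCurrency(1234.56, Locale.US));      // $1,234.56
        System.out.println(formatCurrency(1234.56, Locale.GERMANY)); // euro symbol after the amount
        System.out.println(formatCurrency(1234.56, Locale.JAPAN));   // yen has no fraction digits
    }
}
```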
A date helps to uniquely identify a point in time. Like other locale-sensitive structures, dates have many representation details. You must consider long and short date formats as well as date separator symbols. You have to worry about whether the year is displayed before the day and month or after. Again, the Java class libraries accommodate these needs.
The java.text.DateFormat class provides the getDateInstance method that creates a formatter for your default locale. The format method works in the same way as the other format methods covered so far, and applies the specific format rules for your chosen locale.
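To make the long and short date styles concrete, here is a minimal sketch that formats one date in several styles and locales. The class and helper names are assumptions for illustration. The style constants (DateFormat.SHORT, DateFormat.LONG) and the two-argument getDateInstance are part of java.text.DateFormat.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class for locale-specific date formatting.
class DateFormatDemo {
    // A fixed sample date (15 January 2000) in the default time zone.
    static Date sampleDate() {
        return new GregorianCalendar(2000, Calendar.JANUARY, 15).getTime();
    }

    // Formats the date in the given style (for example DateFormat.LONG) and locale.
    static String format(Date date, int style, Locale locale) {
        return DateFormat.getDateInstance(style, locale).format(date);
    }

    public static void main(String[] args) {
        Date date = sampleDate();
        System.out.println(format(date, DateFormat.LONG, Locale.US));      // January 15, 2000
        System.out.println(format(date, DateFormat.SHORT, Locale.US));     // month first
        System.out.println(format(date, DateFormat.LONG, Locale.GERMANY)); // day first, German month name
        System.out.println(format(date, DateFormat.SHORT, Locale.GERMANY));
    }
}
```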
The java.util.Calendar class is closely related to Date, and lets you extract year, month, week, and day information from a Date. Calendar is abstract, so you don't instantiate it directly. Instead, call the factory method Calendar.getInstance to get a calendar object for your locale. The Gregorian style calendar is the only one provided at this time; however, you can create your own by subclassing Calendar.
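The following sketch pulls calendar fields out of a Date using the Calendar.getInstance factory method from java.util. The class and helper names are hypothetical. It passes Locale.US explicitly so that the results don't depend on the machine's default locale, and it also shows one locale-dependent calendar property, the first day of the week.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class for reading calendar fields from a Date.
class CalendarFieldsDemo {
    // A fixed sample date (15 January 2000) in the default time zone.
    static Date sampleDate() {
        return new GregorianCalendar(2000, Calendar.JANUARY, 15).getTime();
    }

    // Returns { year, month (1-12), day of month } for the given date.
    static int[] yearMonthDay(Date date) {
        Calendar calendar = Calendar.getInstance(Locale.US);
        calendar.setTime(date);
        return new int[] {
            calendar.get(Calendar.YEAR),
            calendar.get(Calendar.MONTH) + 1, // Calendar months are zero-based
            calendar.get(Calendar.DAY_OF_MONTH)
        };
    }

    // Returns the locale's first day of the week as a Calendar constant.
    static int firstDayOfWeek(Locale locale) {
        return Calendar.getInstance(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        int[] ymd = yearMonthDay(sampleDate());
        System.out.println(ymd[0] + "-" + ymd[1] + "-" + ymd[2]);                // 2000-1-15
        System.out.println(firstDayOfWeek(Locale.US) == Calendar.SUNDAY);     // true
        System.out.println(firstDayOfWeek(Locale.FRANCE) == Calendar.MONDAY); // true
    }
}
```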
This internationalization feature of the JDK provides a mechanism for separating user interface (UI) elements and other locale-sensitive data from the application logic in a program. Separating locale-sensitive elements from other code allows easy translation. It allows you to create a single code base for an application even though you may provide 30 different language versions. Although you might be predisposed to think of text only, remember that any localizable element is a resource, including buttons, icons, and menus.
The JDK uses resource bundles to isolate localizable elements from the rest of the application. The resource bundle contains either the resource itself or a reference to it. With all resources separated into a bundle, the Java application simply loads the appropriate bundle for the active locale. If the user switches locales, the application just loads a different bundle.
Resource bundle names have two parts: a base name and a locale suffix. For example, suppose you create a resource bundle named MyBundle. Imagine that you have translated MyBundle for two different locales, ja_JP and fr_FR. The original MyBundle will be your default bundle, the one used when others cannot be found, or when no other locale-specific bundles exist. However, in addition to the default bundle, you'll create two more bundles. In the example these bundles would be named MyBundle_ja_JP and MyBundle_fr_FR. The ResourceBundle.getBundle method relies on this naming convention to search for the bundle used for the active locale.
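To see the naming convention at work, the sketch below lists the bundle names that a lookup for a given locale would consider, from most specific to least specific. It relies on ResourceBundle.Control, which was added in Java SE 6 and so postdates the JDK 1.1 release this article describes; the class and method names here are illustrative. The real getBundle search also tries the default locale before settling on the base bundle, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical demo class: lists candidate bundle names for a locale.
class BundleNamesDemo {
    // Returns candidate names such as MyBundle_ja_JP, MyBundle_ja, MyBundle.
    static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidateNames("MyBundle", Locale.JAPAN));  // [MyBundle_ja_JP, MyBundle_ja, MyBundle]
        System.out.println(candidateNames("MyBundle", Locale.FRANCE)); // [MyBundle_fr_FR, MyBundle_fr, MyBundle]
    }
}
```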
The java.util.ResourceBundle class is abstract, which means you must use a subclass of ResourceBundle. The JDK provides two subclasses: PropertyResourceBundle and ListResourceBundle. If these don't meet your needs, you can create your own subclass of ResourceBundle.
PropertyResourceBundle
The PropertyResourceBundle is the most convenient bundle to use. To use this bundle, create a property file that contains key/value pairs in the form <key>=<value>. Put each key/value pair on its own line of the file. The following figure shows an example property file for a PropertyResourceBundle.
# MyResource.properties
# <key>=<value>
TEXT_NOT_FOUND=The file could not be found.
TEXT_HELLO=Hello, world!
TEXT_WARNING=There are {0} warnings in the file {1}.
TEXT_INSERT_PAPER=Please insert more paper.
TEXT_DISREGARD=Please disregard the man behind the {0}.
Place these key/value pairs into a file with a .properties extension. For example, you might name the file MyResource.properties. You'd load this bundle by calling ResourceBundle.getBundle("MyResource") and retrieve individual elements with the getString method. For each candidate bundle name, getBundle first searches for a .class file and falls back to the .properties file if no class is found.
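As a self-contained sketch of reading values from a property bundle, the class below builds a PropertyResourceBundle from an in-memory string instead of a file on the class path, then fills in the {0} and {1} placeholders with java.text.MessageFormat. The in-memory text stands in for MyResource.properties, and the class and method names are hypothetical. The Reader-based constructor was added in Java SE 6, so it is not part of JDK 1.1; in a real application you would normally call ResourceBundle.getBundle instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo class: a property bundle loaded from in-memory text.
class PropertyBundleDemo {
    // Stand-in for the contents of MyResource.properties.
    static final String PROPERTIES_TEXT =
        "TEXT_HELLO=Hello, world!\n"
        + "TEXT_WARNING=There are {0} warnings in the file {1}.\n";

    // Builds the bundle from the text above.
    static ResourceBundle load() {
        try {
            return new PropertyResourceBundle(new StringReader(PROPERTIES_TEXT));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up the warning pattern and substitutes its two arguments.
    static String warning(ResourceBundle bundle, int count, String fileName) {
        MessageFormat format = new MessageFormat(bundle.getString("TEXT_WARNING"), Locale.US);
        return format.format(new Object[] { Integer.valueOf(count), fileName });
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load();
        System.out.println(bundle.getString("TEXT_HELLO")); // Hello, world!
        System.out.println(warning(bundle, 3, "report.txt"));
        // There are 3 warnings in the file report.txt.
    }
}
```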
A PropertyResourceBundle is quite easy to create and use. However, it has one significant limitation. All values are limited to string objects. In other words, you can only place text strings in a PropertyResourceBundle. This may not be important to you, but if it is you must use a different type of bundle. The ListResourceBundle may be more appropriate if you need more complex key/value pairs.
ListResourceBundle
The ListResourceBundle is a little more complex than PropertyResourceBundle, but offers more features. For example, although a PropertyResourceBundle can only store text, a ListResourceBundle can contain any type of Java object. ListResourceBundle is abstract, so you must subclass it to create a usable class. See the following figure.
Like a PropertyResourceBundle, your ListResourceBundle contains a list of key/value pairs. However, these pairs are arranged as elements in a two-dimensional array of java.lang.Object. Your subclass must provide a single method, getContents, as well as an Object array that lists your key/value pairs.
// MyResource.java
import java.util.ListResourceBundle;

public class MyResource extends ListResourceBundle {
    // Returns the key/value table that backs this bundle.
    public Object[][] getContents() {
        return contents;
    }

    // Each row is a { key, value } pair.
    private static final Object[][] contents = {
        { "TEXT_NOT_FOUND", "The file could not be found." },
        { "TEXT_HELLO", "Hello, world!" },
        { "TEXT_WARNING", "There are {0} warnings in the file {1}." },
        { "TEXT_INSERT_PAPER", "Please insert more paper." },
        { "TEXT_DISREGARD", "Please disregard the man behind the {0}." },
    };
}
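The sketch below shows how a ListResourceBundle might hold a non-string value and how the values are read back with getString and getObject. To stay self-contained it instantiates the bundle directly rather than going through ResourceBundle.getBundle. The class names and the MAX_WARNINGS key are made up for illustration.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo class: a list bundle holding a string and an Integer.
class ListBundleDemo {
    static class SampleResources extends ListResourceBundle {
        private static final Object[][] CONTENTS = {
            { "TEXT_HELLO", "Hello, world!" },
            { "MAX_WARNINGS", Integer.valueOf(25) }, // not possible in a property bundle
        };

        @Override
        protected Object[][] getContents() {
            return CONTENTS;
        }
    }

    // Creates the bundle directly; real code would usually call getBundle.
    static ResourceBundle bundle() {
        return new SampleResources();
    }

    public static void main(String[] args) {
        ResourceBundle resources = bundle();
        String hello = resources.getString("TEXT_HELLO");
        Integer maxWarnings = (Integer) resources.getObject("MAX_WARNINGS");
        System.out.println(hello);       // Hello, world!
        System.out.println(maxWarnings); // 25
    }
}
```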
The Java language has simplified the storage, manipulation, and representation of characters by using Unicode to represent text. Unicode is a 16-bit character set, which simply means that it can define 2^16 (65,536) characters. Each character is uniquely identified within the set. When using regional character sets, you often had to store the character-set identifier along with the character or stream of characters so that you could distinguish among the different characters with the same code point across the various sets. Using Unicode, you no longer need to worry about overlapping code points.
Although you may be unfamiliar with Unicode, you needn't worry too much about how to use it. It is freely available in Java. If you do nothing at all, your application will use Unicode to represent text. The String class uses Unicode so you don't need to do anything special to get support in strings. However, if you have to maintain legacy data in a regional character set, you can use the numerous character converters that Java technology provides.
Using the character converters, you can convert your Unicode text to a regional character set. You can also convert from a regional character set to Unicode. So, although the Java language uses Unicode, it also allows you to maintain your older data if necessary.
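One way to sketch such a round trip is with the String methods that take a character set. The example converts Unicode text to ISO 8859-1 (Latin-1) bytes and back. The class and method names are illustrative; java.nio.charset.StandardCharsets was added in Java SE 7, so on older releases you would pass the encoding name as a string instead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: converting between Unicode strings and Latin-1 bytes.
class CharsetConversionDemo {
    // Encodes Unicode text as ISO 8859-1 bytes.
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decodes ISO 8859-1 bytes back into a Unicode string.
    static String fromLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u00C9tats-Unis"; // "États-Unis", written with a Unicode escape
        byte[] latin1 = toLatin1(text);
        System.out.println(latin1.length);                                // 10
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 11
        System.out.println(fromLatin1(latin1).equals(text));              // true
    }
}
```

Characters that don't exist in the target character set can't survive this kind of conversion, so the round trip is only lossless when the regional set covers all of the text.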
Layout managers are important in an international application because they compensate for two frustrating problems associated with translated user interfaces:
First, translated text is often shorter or longer than the original text. Layout managers expand and shrink component size depending on the length of the text used for labels.
Second, a layout manager relieves the frustration associated with trying to position components as a result of text length differences. If you usually lay out UI components on an X-Y grid, you have no doubt noticed that those positions must change after translations. However, using a layout manager, you position components relative to each other, not necessarily by hard-coded pixel positions. This means you can write your UI code once and run it anywhere.
The Java Development Kit (JDK) 1.1.6 supplies at least five layout managers, and you can pick up quite a few different ones from the Internet. And of course you can create your own. For more information about layout managers, please see Exploring AWT Layout Managers.
The Java class libraries provide many tools to help you create excellent global applications. By supplying international solutions in the base class libraries, Sun helps developers create reliable, stable products. The solutions are used and tested by everyone who uses the product. Developers are not burdened with the task of solving these problems over and over again.
If you commit to using these features now, you'll save yourself lots of headaches later. In general, these JDK features are easy to use, and learning them is far easier than retrofitting or fixing applications that don't attempt to address the issues at all. If you're interested in updating an existing application for an international audience, take a look at A Checklist for Internationalizing an Existing Program. The best way to learn about the JDK's international features is to use them.
So start writing some code, experiment, and have fun.
_______
[1] As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.
John O'Conner teaches software internationalization topics and consults for global development projects. He also enjoys speaking Japanese, playing softball, and spending time with his family.