International Enhancements in Java SE 6

   
By John O'Conner and Naoto Sato, March 2007  


 
 

One important strength of the Java Platform, Standard Edition (Java SE) has always been its internationalization and localization support. The platform continues to evolve, and Java SE 6 gives developers even more control over how they access and use locale-sensitive resources in their applications. Java SE 6 provides the following major enhancements to its internationalization support:

  • Resource control and access
  • Locale-sensitive services
  • Text normalization
  • International domain names
  • Japanese calendars
  • New supported locales

Resource Control and Access

To provide localized resources in applications, programmers should use resource bundles as defined by the java.util.ResourceBundle class. This class initiates the searching and loading of localized resources when you invoke its static getBundle method. The method returns a ResourceBundle instance that is responsible for providing the localized text, images, and other elements for a target locale. A locale is a cultural identifier defined by a language and geographical region.

Although the default algorithms for searching and loading bundles are well defined, Java SE 6 more clearly specifies resource caching and provides you more control over how your applications search and load localized resources. Applications should continue to use ResourceBundle methods to retrieve resources, but new features allow you great flexibility in how and where you store the actual content that ResourceBundle objects provide.

Before Java SE 6, programmers usually stored localized content in either a subclass of ListResourceBundle or as a properties file. Now, however, you can specify different formats for your resource files. You can, for example, create and use an XML-formatted resource file. You might also decide to change the default naming scheme for localized files. This extra control is available from custom ResourceBundle.Control classes that you can implement.

The ResourceBundle.Control class exposes the major steps of the bundle-loading process. Each step has a separate method in the class. You can override those methods to provide customized strategies for searching, loading, and caching resources. Because the Control class defines methods that implement the existing default strategies, you have to implement only the customized functionality that you want for your particular subclass. By providing your own Control subclass to the getBundle method, you control exactly how your application finds and uses localized resources.

Of course, you don't have to create your own Control class. You can always use the predefined, default Control. The default class provides methods that implement the default behavior. In the following example, the getBundle method uses the default Control:

Locale targetLocale = new Locale("fr", "FR"); // French language, French region
ResourceBundle myResources =
    ResourceBundle.getBundle("com.sun.demo.intl.AppResource", targetLocale);
 

If your host's default locale is en_US, the default Control object searches for the following localized AppResource names in this example:

com.sun.demo.intl.AppResource_fr_FR
com.sun.demo.intl.AppResource_fr
com.sun.demo.intl.AppResource_en_US
com.sun.demo.intl.AppResource_en
com.sun.demo.intl.AppResource
 

For each bundle name in the preceding list, the default Control searches for two implementation formats: ResourceBundle subclasses (.class format) and PropertyResourceBundle property files (.properties format). When it finds a resource in either format, it determines the bundle's parent chain and returns the ResourceBundle instance. Notice also that the bundle names use locale-specific suffixes -- for example, fr_FR, fr, and en_US -- to differentiate among the various localized bundles with the same base name, AppResource. Additionally, the default behavior caches bundles. Repeated invocations of getBundle return cached resources if you ask for the same bundle name. The Java platform documentation describes the getBundle method behavior in detail.
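
The name derivation described above can be inspected directly. The following is a minimal sketch, not part of the original sample code: it asks the default Control for the candidate locales of a target locale and converts each one to a bundle name. The class name DefaultSearchOrder and the helper candidateNames are made up for illustration. The sketch covers only the target locale's candidates; the en_US and en names in the preceding list come from the fallback locale, which getBundle consults separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class DefaultSearchOrder {

    // Returns the bundle names that the default Control derives
    // for one target locale, most specific name first.
    public static List<String> candidateNames(String baseName, Locale target) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<String>();
        for (Locale candidate : control.getCandidateLocales(baseName, target)) {
            // The root locale has an empty name, so it maps to the bare base name.
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        // Locale.FRANCE is the predefined constant for fr_FR.
        for (String name : candidateNames("com.sun.demo.intl.AppResource", Locale.FRANCE)) {
            System.out.println(name);
        }
    }
}
```

No bundle is loaded here, so the sketch runs even when none of the AppResource files exist.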

In some situations, you may prefer a different bundle-loading strategy. The next few sections describe scenarios that differ from the default. The scenarios are the following:

  • Properties-only searches
  • Locales as part of the package name
  • Cache controls

Properties-Only Searches

Some bundle-loading strategies don't require a fully customized Control subclass. Instead, use the Control class's static getControl method to enforce some standard options that differ only slightly from the default. For example, if your application uses properties files exclusively, you can avoid the overhead of searching for ResourceBundle subclasses. Instead, you can retrieve a control that searches only for properties files.

Call the Control.getControl method with a List<String> of file formats that should be supported. The predefined string values are java.class and java.properties. Three static final constants each define an unmodifiable list of these format strings:

  • FORMAT_CLASS defines a list containing the string java.class.
  • FORMAT_PROPERTIES defines a list containing the string java.properties.
  • FORMAT_DEFAULT defines a list containing both java.class and java.properties.

Use the Control.FORMAT_PROPERTIES constant to create a Control object that searches for properties files only:

Control propOnlyControl = Control.getControl(Control.FORMAT_PROPERTIES);
ResourceBundle bundle = ResourceBundle.getBundle("com.sun.demo.intl.res.Warnings",
                                                 propOnlyControl);
 

Using the propOnlyControl instance, the getBundle method ignores bundle file names ending in .class, and the method searches only for bundles ending in .properties.
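
To see the difference between the two controls without loading anything, you can ask each one which formats it will search. This is a small illustrative sketch; the class name FormatInspection and the helper formatsOf are assumptions made for the example.

```java
import java.util.List;
import java.util.ResourceBundle.Control;

public class FormatInspection {

    // Asks a Control which formats it searches for the Warnings bundle.
    // No bundle is loaded; the base name is only passed through.
    public static List<String> formatsOf(Control control) {
        return control.getFormats("com.sun.demo.intl.res.Warnings");
    }

    public static void main(String[] args) {
        Control defaultControl = Control.getControl(Control.FORMAT_DEFAULT);
        Control propOnlyControl = Control.getControl(Control.FORMAT_PROPERTIES);
        System.out.println("default:         " + formatsOf(defaultControl));
        System.out.println("properties only: " + formatsOf(propOnlyControl));
    }
}
```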

Locales as Part of the Package Name

Different localizations of the same base bundle name are usually differentiated by locale identifier suffixes. For example, the default or root Warnings bundle is simply named Warnings.properties. However, a French version of that bundle would be named Warnings_fr_FR.properties. Using the default Control, these different bundles would exist together in the same package. But you can change the way that localized bundles are named.

Imagine that you prefer to put different localizations of the same bundle into separately defined subdirectories or packages. You might create the following property files in your file structure:

com/sun/demo/intl/res/root/Warnings.properties
com/sun/demo/intl/res/fr_FR/Warnings.properties
com/sun/demo/intl/res/ja_JP/Warnings.properties
 

To do this, you must define your own Control subclass. The subclass must override the following methods:

  • getFormats
  • toBundleName

Override the getFormats method because your application will use only properties files. Override the toBundleName method because your application will use the specified locale as part of the new bundle's package name rather than append locale-specific suffixes to the bundle base name.

The following code shows a customized Control class that allows bundle searches for .properties files and locale-specific package names instead of bundle-name suffixes.

import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle.Control;

class SubdirControl extends Control {

  // This control provides only properties file formats.
  @Override
  public List<String> getFormats(String baseName) {
    return Control.FORMAT_PROPERTIES;
  }

  // Given a base bundle name and a locale, this
  // method creates a bundle name for a specific locale.
  // In this case, the bundle name uses the locale as a part
  // of the package name, not a bundle-name suffix.
  @Override
  public String toBundleName(String bundleName, Locale locale) {
    StringBuilder localizedBundle = new StringBuilder();
    // Find the base bundle name.
    int nBaseName = bundleName.lastIndexOf('.');
    String baseName = bundleName;
    // Create a new name starting with the package name, if there is one.
    if (nBaseName >= 0) {
      localizedBundle.append(bundleName.substring(0, nBaseName)).append('.');
      baseName = bundleName.substring(nBaseName + 1);
    }
    String strLocale = locale.toString();
    // Now append the locale identification to the package name.
    // The root locale has an empty name, so it maps to "root".
    localizedBundle.append(strLocale.length() > 0 ? strLocale : "root");
    // Now append the base name to the fully qualified package.
    localizedBundle.append('.').append(baseName);
    return localizedBundle.toString();
  }
}
 

The following code shows how to provide the customized Control object to the getBundle method:

String bundleName = "com.sun.demo.intl.res.Warnings";
SubdirControl control = new SubdirControl();
Locale locale = new Locale("fr", "FR");
ResourceBundle bundle = ResourceBundle.getBundle(bundleName, locale, control);
 

If the default locale is en_US, the getBundle method will use the Control object to search the following candidate and fallback bundle names:

com.sun.demo.intl.res.fr_FR.Warnings
com.sun.demo.intl.res.fr.Warnings
com.sun.demo.intl.res.en_US.Warnings
com.sun.demo.intl.res.en.Warnings
com.sun.demo.intl.res.root.Warnings
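
As a rough check of this kind of name mapping, the following sketch runs the candidate locales for fr_FR through a compact stand-in control. PackageLocaleControl and mappedNames are hypothetical names used only for this illustration. The sketch lists only the names derived from the target locale; the en_US and en entries above depend on the host's default locale, which getBundle uses as a fallback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class SubdirNames {

    // Hypothetical stand-in: places the locale in the package path
    // instead of appending it to the bundle name as a suffix.
    static class PackageLocaleControl extends ResourceBundle.Control {
        @Override
        public String toBundleName(String baseName, Locale locale) {
            int dot = baseName.lastIndexOf('.');
            String packagePrefix = dot >= 0 ? baseName.substring(0, dot + 1) : "";
            String simpleName = baseName.substring(dot + 1);
            String localeName = locale.toString();
            String directory = localeName.length() > 0 ? localeName : "root";
            return packagePrefix + directory + "." + simpleName;
        }
    }

    // Maps every candidate locale of the target locale to a bundle name.
    public static List<String> mappedNames(String baseName, Locale target) {
        ResourceBundle.Control control = new PackageLocaleControl();
        List<String> names = new ArrayList<String>();
        for (Locale candidate : control.getCandidateLocales(baseName, target)) {
            names.add(control.toBundleName(baseName, candidate));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : mappedNames("com.sun.demo.intl.res.Warnings", Locale.FRANCE)) {
            System.out.println(name);
        }
    }
}
```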
 

Cache Controls

The default behavior for loading bundles includes a check to determine whether the bundle has already been loaded. However, you can change this cache option. If you simply want to clear the cache before a bundle reload, you can call the clearCache method of the ResourceBundle class:

ResourceBundle.clearCache();
ResourceBundle myBundle = ResourceBundle.getBundle("com.sun.demo.intl.res.Warnings");
 

You can even control a cache's expiration by setting a custom time-to-live value. In your own Control, override the getTimeToLive method to return the millisecond lifetime value for the bundle. Two predefined values exist: TTL_DONT_CACHE and TTL_NO_EXPIRATION_CONTROL.

The default Control returns TTL_NO_EXPIRATION_CONTROL, which means that bundles are cached without any expiration value. The value TTL_DONT_CACHE indicates that the bundle must not be cached at all. If you would like your bundles to expire every four hours to support live updates without restarting your application, for example, you can override the getTimeToLive method like this:

public long getTimeToLive(String baseName, Locale locale) {
  return 4L * 60 * 60 * 1000; // 14,400,000 milliseconds is four hours.
}
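
For context, here is one way the override might sit inside a complete Control subclass. This is a sketch under assumptions: the class name ExpiringControl and the constant FOUR_HOURS_MS are invented for the example. Note that getTimeToLive receives the base name and the locale of the bundle, so a control could return different lifetimes for different bundles.

```java
import java.util.Locale;
import java.util.ResourceBundle;

public class ExpiringControl extends ResourceBundle.Control {

    public static final long FOUR_HOURS_MS = 4L * 60 * 60 * 1000;

    // Every bundle loaded through this control expires after four hours.
    @Override
    public long getTimeToLive(String baseName, Locale locale) {
        if (baseName == null || locale == null) {
            throw new NullPointerException();
        }
        return FOUR_HOURS_MS;
    }

    public static void main(String[] args) {
        ResourceBundle.Control standard =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        // The default control never expires its cached bundles.
        System.out.println(standard.getTimeToLive("Warnings", Locale.ROOT)
                == TTL_NO_EXPIRATION_CONTROL);
        System.out.println(new ExpiringControl().getTimeToLive("Warnings", Locale.ROOT));
    }
}
```

You would pass an instance of such a control to getBundle in the same way as any other custom Control. Once a cached bundle's lifetime has passed, getBundle consults the control's needsReload method to decide whether to load the bundle again.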
 

The Control class offers many options for specifying precise bundle searching and loading. This article presents only a few. Some of the other methods you may override include the following:

  • getCandidateLocales
  • getFallbackLocale
  • newBundle
  • needsReload

See the complete platform documentation for the ResourceBundle.Control class for more details on these and other methods.
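
To illustrate how newBundle lets you change where bundle content comes from, the following sketch serves bundles from an in-memory map instead of class or properties files. Everything specific in it is an assumption made for the example: the class names, the demo.Messages base name, the demo.memory format string, and the sample entries. A real application would more likely read from a database or an XML file at the same point.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.ResourceBundle;

public class InMemoryBundles {

    // Hypothetical content store: bundle name -> key/value pairs.
    static final Map<String, Map<String, String>> STORE =
            new HashMap<String, Map<String, String>>();
    static {
        Map<String, String> root = new HashMap<String, String>();
        root.put("greeting", "Hello");
        root.put("farewell", "Goodbye");
        STORE.put("demo.Messages", root);
        Map<String, String> french = new HashMap<String, String>();
        french.put("greeting", "Bonjour");
        STORE.put("demo.Messages_fr", french);
    }

    static class InMemoryControl extends ResourceBundle.Control {
        @Override
        public List<String> getFormats(String baseName) {
            return Collections.singletonList("demo.memory"); // application-defined format
        }

        @Override
        public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                        ClassLoader loader, boolean reload) {
            final Map<String, String> entries = STORE.get(toBundleName(baseName, locale));
            if (entries == null) {
                return null; // no bundle for this candidate; getBundle tries the next one
            }
            return new ResourceBundle() {
                @Override
                protected Object handleGetObject(String key) {
                    return entries.get(key); // null defers the lookup to the parent bundle
                }

                @Override
                public Enumeration<String> getKeys() {
                    // Simplified: lists only this bundle's own keys.
                    return Collections.enumeration(entries.keySet());
                }
            };
        }

        @Override
        public long getTimeToLive(String baseName, Locale locale) {
            return TTL_DONT_CACHE; // keep these demo bundles out of the shared cache
        }
    }

    public static String lookup(String key, Locale locale) {
        ResourceBundle bundle =
                ResourceBundle.getBundle("demo.Messages", locale, new InMemoryControl());
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        System.out.println(lookup("greeting", Locale.FRANCE));
        System.out.println(lookup("farewell", Locale.FRANCE));
    }
}
```

The calling code still uses getBundle and getString, which is the point of the design: only the Control knows that the content does not come from files. The farewell key is missing from the French entries, so the lookup falls through the parent chain to the root bundle.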

Locale-Sensitive Services

The java.text and java.util packages support more than 100 locales. Although existing locales represent the needs of many geographical regions, the locale-sensitive classes in the Java platform do not yet support some areas. Supporting a locale and its data requires a lot of research, including investigating and confirming date and number formats, country name translations, and sort orders. Sometimes even political influences affect locale data. Unfortunately, it is practically impossible to keep the platform's locale data completely up-to-date at all times, even though customers want and need access to new locale data in the platform.

One solution is to provide new application programming interfaces (APIs) that allow you to use any locale data that you may need for your own application. Java SE 6 provides an interface that developers can use to plug in their own locale data and related services. Fortunately, an active project called the Common Locale Data Repository (CLDR) attempts to track global locale data and maintain it. The Unicode Consortium hosts the project. Using the new Locale-Sensitive Services Service Provider Interface (SPI), you can use this or any other locale data in your application.

To provide locale data and services, you must first decide which functionality you want to provide to the application. You can provide locale data for the following locale-sensitive classes:

  • java.text.BreakIterator
  • java.text.Collator
  • java.text.DateFormat
  • java.text.DateFormatSymbols
  • java.text.DecimalFormatSymbols
  • java.text.NumberFormat
  • java.util.Currency
  • java.util.Locale
  • java.util.TimeZone

Once you decide which functionality you want to provide with your locale, you must implement the corresponding SPI, which resides in either the java.text.spi or the java.util.spi package.

Imagine that you want to provide a DateFormat object for a new locale. You should implement the java.text.spi.DateFormatProvider class. Because java.text.spi.DateFormatProvider is an abstract class, you must extend it and implement the following methods:

  • getAvailableLocales
  • getDateInstance
  • getDateTimeInstance
  • getTimeInstance

Notice that the getAvailableLocales method is inherited from the parent class LocaleServiceProvider, so every SPI provider must implement it to declare its supported locales. The other three methods mirror the factory methods of the corresponding API class. For example, the getDateInstance method also exists in the java.text.DateFormat class.

After implementing the required methods, you must package your service so that you can deploy it with the Java Runtime Environment (JRE). Because the Locale-Sensitive Services SPIs are based on the standard Java Extension Mechanism, you can package them as a JAR file in the JRE extension directory. JREs that use your extension can now provide locale data for previously undefined or unsupported locales.
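
The shape of such a provider might look like the following sketch. It is deliberately simplified and rests on assumptions: the class name DemoDateFormatProvider, the xx_YY locale, and the fixed patterns are all hypothetical, and a real provider would choose a pattern for each requested style from researched locale data.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.text.spi.DateFormatProvider;
import java.util.Locale;

public class DemoDateFormatProvider extends DateFormatProvider {

    // Hypothetical locale that the platform does not support.
    public static final Locale DEMO_LOCALE = new Locale("xx", "YY");

    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[] { DEMO_LOCALE };
    }

    @Override
    public DateFormat getDateInstance(int style, Locale locale) {
        return create("yyyy-MM-dd", locale);
    }

    @Override
    public DateFormat getTimeInstance(int style, Locale locale) {
        return create("HH:mm:ss", locale);
    }

    @Override
    public DateFormat getDateTimeInstance(int dateStyle, int timeStyle, Locale locale) {
        return create("yyyy-MM-dd HH:mm:ss", locale);
    }

    // This sketch ignores the style arguments and uses one pattern per method.
    private DateFormat create(String pattern, Locale locale) {
        if (locale == null) {
            throw new NullPointerException("locale");
        }
        if (!DEMO_LOCALE.equals(locale)) {
            throw new IllegalArgumentException("Unsupported locale: " + locale);
        }
        return new SimpleDateFormat(pattern, Locale.ROOT);
    }
}
```

For the runtime to discover a provider like this one, the JAR file also needs a provider-configuration file named META-INF/services/java.text.spi.DateFormatProvider that contains the fully qualified name of the implementation class.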

Text Normalization

The Unicode Standard allows users to create equivalent text in different ways. For example, the é character, named LATIN SMALL LETTER E WITH ACUTE in the Unicode Standard, has the code point value U+00E9. The base character e and the acute accent mark, ´, are combined into a single code point called a composite or composed character.

However, you can also represent the same visual character by combining the two separate code points for the lowercase letter e and the acute accent. The two code points U+0065 and U+0301 combine to create the same visual and semantic effect, which is the é character. The combining characters are sometimes called a combining sequence. Other characters in the Unicode Standard can combine to create similar effects with different combining sequences and character forms.

Some character sequences are visually different but have the same meaning for most practical purposes. For example, the three-character sequence 1/2 has essentially the same meaning as the single character ½, or U+00BD. Similarly, the character 2 and the superscript character ² are visually different but similar in meaning. These similarities among characters give users many different ways to enter the same text. As you might imagine, text operations such as searching and sorting become quite complicated if you must consider all the various ways to form equivalent text.

The Java platform's java.text.Collator class understands Unicode text forms and normalizes text for accurate comparisons. The normalization process converts text from disparate text forms to a single form that allows accurate text processing. Until Java SE 6, Collator used private APIs to normalize text. However, those APIs are now public in Java SE 6.

Use java.text.Normalizer to normalize text. You might want to normalize text for text-processing operations, serialization, transfer, or database storage. The API has only two static methods: normalize and isNormalized. As you might expect, the normalize method will normalize text into a specific form. The isNormalized method checks whether text is already normalized to a specific form.

The Normalizer.Form enumeration represents each Unicode normalization form:

  • NFD (Normalization Form D)
  • NFC (Normalization Form C)
  • NFKD (Normalization Form KD)
  • NFKC (Normalization Form KC)

NFD is canonical decomposition. The decomposition process converts composed character forms to combining sequences as mapped by the Unicode Standard. For example, the single code point U+00F1 for the ñ character becomes the decomposed combining sequence U+006E U+0303 in NFD. The new sequence contains the common character n followed by a combining tilde, ˜.

NFC is canonical decomposition followed by canonical composition. After canonically decomposing text, the process maps combining sequences into standard composed code points. For example, applying NFC to the sequence U+0065 U+0300 creates just a single code point: U+00E8, or è. NFC is the World Wide Web Consortium's recommended normalization form to transfer and process text on the Internet.

NFKD is compatibility decomposition. After applying canonical decomposition, the process applies a compatibility mapping that transforms some characters to a standard compatible form. Compatibility is determined by a predefined mapping from one character to another character, and the Unicode Standard defines and maintains the mappings. NFKD creates noticeable changes to the TRADE MARK SIGN character, which has code point U+2122. Compatibility decomposition transforms the single code point to the characters TM, which are the common characters for LATIN CAPITAL LETTER T (U+0054) and LATIN CAPITAL LETTER M (U+004D).

NFKC is compatibility decomposition followed by canonical composition. This normalization form tries to create composed characters that are compatible with the original characters. Equivalent compatible characters are defined by the Unicode Standard. If you apply NFKC to the code point U+1E9B, LATIN SMALL LETTER LONG S WITH DOT ABOVE, canonical decomposition alone would create the sequence U+017F U+0307. The compatibility mapping then replaces the long s, U+017F, with the ordinary letter s, U+0073, so the decomposition step yields U+0073 U+0307. Finally, the composition step transforms the sequence to a single composite character, U+1E61.
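
The four forms are easier to compare side by side. The following sketch prints the result of each form for U+1E9B as a list of code point values. The class name NormalizationForms and its helper methods are illustrative assumptions; only Normalizer.normalize and the Normalizer.Form enumeration come from the platform.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class NormalizationForms {

    // Renders each char as U+XXXX so that visually identical strings can be told apart.
    // Iterating over chars is sufficient here because all sample characters
    // are in the Basic Multilingual Plane.
    public static String codePoints(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return out.toString();
    }

    // Normalizes the text to the given form and renders the result.
    public static String normalized(String text, Form form) {
        return codePoints(Normalizer.normalize(text, form));
    }

    public static void main(String[] args) {
        String longSWithDotAbove = "\u1E9B";
        for (Form form : Form.values()) {
            System.out.println(form + ": " + normalized(longSWithDotAbove, form));
        }
    }
}
```

For this character, the canonical forms keep the long s (NFD gives U+017F U+0307 and NFC gives back U+1E9B), whereas the compatibility forms replace it with the ordinary letter s (NFKD gives U+0073 U+0307 and NFKC gives U+1E61).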

The following code sample shows how to use the Normalizer class to transform text to Normalization Form D (NFD):

String strName = "Jos\u00E9"; // using a composed é
String strNFD = Normalizer.normalize(strName, Normalizer.Form.NFD);
 

The resulting string strNFD now contains five code point values: Jose´. These characters have the Unicode values U+004A U+006F U+0073 U+0065 U+0301.

You can also test whether text is in a specific normalization form using the isNormalized method:

boolean bNormalized = Normalizer.isNormalized(strNFD, Normalizer.Form.NFD);
System.out.printf("NFD? %b\n", bNormalized);
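
One common use of these two methods is to compare strings that may have been entered in different but canonically equivalent ways. The following is a minimal sketch of that idea; the class name NormalizedCompare and the helper sameText are made up for the example, and the choice of NFC as the common form is an assumption (any single form would do, as long as both sides use it).

```java
import java.text.Normalizer;

public class NormalizedCompare {

    // Treats two strings as equal when their NFC forms match.
    public static boolean sameText(String a, String b) {
        String left = Normalizer.normalize(a, Normalizer.Form.NFC);
        String right = Normalizer.normalize(b, Normalizer.Form.NFC);
        return left.equals(right);
    }

    public static void main(String[] args) {
        String composed = "Jos\u00E9";    // é as the single code point U+00E9
        String decomposed = "Jose\u0301"; // e followed by COMBINING ACUTE ACCENT
        System.out.println("String.equals:  " + composed.equals(decomposed));
        System.out.println("sameText:       " + sameText(composed, decomposed));
        System.out.println("already in NFC? "
                + Normalizer.isNormalized(decomposed, Normalizer.Form.NFC));
    }
}
```

A plain String.equals call compares code units, so it reports the two spellings as different even though they look identical on screen.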
 
International Domain Names

The Internationalizing Domain Names in Applications (IDNA) standard, defined by RFC 3490, removes the restriction that domain names use only the ASCII character set. With a few restrictions, the full set of characters in the Unicode 3.2 standard is available to define domain names. Unfortunately, Domain Name System (DNS) servers and resolver services are not all fully capable of reliably storing and using non-ASCII characters. The IDNA solution defines a method for representing non-ASCII characters with an encoding that uses only ASCII characters. The result is that DNS and name resolver software continue to function with an ASCII-compatible encoding (ACE), but end users can use internationalized domain names using an expanded set of Unicode characters.

Java SE 6 supports the IDNA standard by providing the java.net.IDN class. This class provides methods for converting a Unicode domain name to an ASCII-compatible name. The available operations are toASCII and toUnicode. Applications should convert domain names to ACE using the toASCII method before submitting the domain names to DNS or resolver services. Applications can use the toUnicode method to create the Unicode text that users should see.

If you enter a non-ASCII domain name into your application, the application should convert the name using the toASCII method before submitting it across the Internet.

// Retrieve the domain name from the user interface.
String strUnicodeName = txtUnicodeName.getText();
// Convert the international domain name to
// an ASCII-compatible encoding.
String strACEName = IDN.toASCII(strUnicodeName);
 

Using the Japanese domain name shown in Figure 1, the conversion stores the text xn--wgv71a119e.jp in the strACEName variable. The new text is the ACE equivalent of the Japanese domain name.

Figure 1. The IDN class creates an encoded equivalent name for DNS and resolver software.
 

The text xn--wgv71a119e doesn't mean anything to most people. It's encoded text, suitable for machine or software consumption. Your applications should use the toUnicode method to convert these ASCII-encoded names into a suitable form that people can typically read and understand. The following code snippet shows how to convert the text back to its original form:

String strACEName = txtACEName.getText();
String strUnicodeName = IDN.toUnicode(strACEName);
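
The two conversions are inverses of each other for a valid name, which the following sketch checks without any user interface. It starts from the ACE string quoted in this article; the class name IdnRoundTrip and the helper roundTrip are assumptions made for the example.

```java
import java.net.IDN;

public class IdnRoundTrip {

    // Converts an ACE name to its Unicode form and back again.
    public static String roundTrip(String aceName) {
        String unicodeName = IDN.toUnicode(aceName);
        return IDN.toASCII(unicodeName);
    }

    public static void main(String[] args) {
        String aceName = "xn--wgv71a119e.jp";
        System.out.println("display form: " + IDN.toUnicode(aceName));
        System.out.println("wire form:    " + roundTrip(aceName));
        // A name that is already plain ASCII passes through unchanged.
        System.out.println("plain ASCII:  " + IDN.toASCII("www.example.com"));
    }
}
```

Keep in mind that toASCII throws an IllegalArgumentException when the input cannot be converted, so code that handles names typed by users may want to catch that exception.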
 
Japanese Calendars

Your customers in Japan frequently use two calendars, the Gregorian calendar and the traditional Japanese Imperial calendar. Although everyone is familiar with the Gregorian calendar, and it may be used more often than not, the Japanese government often uses the Imperial calendar in its forms and documents. The Imperial calendar defines eras based on the reigning period of Japanese emperors.

The Java platform provides calendars by way of the getInstance method of the java.util.Calendar class. You can construct a Japanese Imperial calendar by using the locale ja_JP_JP like this:

Calendar calJapanese = Calendar.getInstance(new Locale("ja", "JP", "JP"));
 

Once you've created the calendar, you can use it to set, retrieve, and manipulate dates using Imperial calendar rules for era and year names.
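
As an illustration of reading a date under Imperial calendar rules, the following sketch converts a Gregorian date to an Imperial year number. The class name ImperialYear and the helper imperialYear are hypothetical, and the sketch fixes the time zone to UTC at midday so that the result does not depend on the host's settings.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class ImperialYear {

    // Returns the year within the Japanese era for a Gregorian date.
    // The month argument uses the Calendar constants, for example Calendar.MARCH.
    public static int imperialYear(int gregorianYear, int month, int day) {
        TimeZone utc = TimeZone.getTimeZone("UTC");

        // Build the instant with an explicit Gregorian calendar.
        Calendar gregorian = new GregorianCalendar(utc, Locale.ROOT);
        gregorian.clear();
        gregorian.set(gregorianYear, month, day, 12, 0, 0);
        Date instant = gregorian.getTime();

        // Read the same instant through the Imperial calendar.
        Calendar imperial = Calendar.getInstance(utc, new Locale("ja", "JP", "JP"));
        imperial.setTime(instant);
        return imperial.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        System.out.println(imperialYear(2007, Calendar.MARCH, 1)); // Heisei 19
    }
}
```

In this calendar, the YEAR field counts years within the current era, and the ERA field identifies the era itself.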

The difference between Gregorian and Imperial calendars is most obvious when you format dates. The java.text.SimpleDateFormat and java.text.DateFormat classes support date formats for the new calendar. Create a formatter and display the current date using code like this:

Date now = new Date();
Locale localeJapanese = new Locale("ja", "JP");
Locale localeImperialJapanese = new Locale("ja", "JP", "JP");
DateFormat dfGregorian = DateFormat.getDateInstance(DateFormat.FULL, localeJapanese);
DateFormat dfImperial = DateFormat.getDateInstance(DateFormat.FULL, localeImperialJapanese);
String strGregorianDate = dfGregorian.format(now);
String strImperialDate = dfImperial.format(now);
txtGregorianDate.setText(strGregorianDate);
txtImperialDate.setText(strImperialDate);
 

When using the ja_JP locale, DateFormat produces a Gregorian date using Japanese characters for year, month, and day terms. When using the ja_JP_JP locale, DateFormat creates a date string using the new Imperial calendar. Figure 2 shows a date in which the Gregorian year 2007 shows as the Imperial year Heisei 19.

Figure 2. Java SE 6 provides support for the Japanese Imperial calendar.
 
New Supported Locales

With Java SE 6, the already long list of supported locales just got longer. The platform now includes new locales that are fully supported by the various locale-sensitive classes. Locale data comes from the increasingly popular CLDR data. Although the platform uses CLDR data for the new locales, pre-existing locales in the platform are unaffected.

Table 1: New Locales Available in Java SE 6

Language                                 Country                    Locale ID
Chinese (Simplified)                     Singapore                  zh_SG
English                                  Malta                      en_MT
English                                  Philippines                en_PH
English                                  Singapore                  en_SG
Greek                                    Cyprus                     el_CY
Indonesian                               Indonesia                  in_ID
Irish                                    Ireland                    ga_IE
Japanese (Japanese Imperial calendar)    Japan                      ja_JP_JP
Malay                                    Malaysia                   ms_MY
Maltese                                  Malta                      mt_MT
Serbian                                  Bosnia and Herzegovina     sr_BA
Serbian                                  Serbia and Montenegro      sr_CS
Spanish                                  United States              es_US
 
Summary

Java SE 6 updates the platform's already extensive internationalization support by giving developers more control over how resources are found and cached. Also, using the Locale-Sensitive Services SPI, you can add locale support that is not already in Java SE 6. The Normalizer class is no longer private. You can use it to normalize text into the four forms defined by the Unicode Standard: NFC, NFD, NFKC, and NFKD. You don't have to limit your application to plain ASCII domain names. The IDN class provides an API to convert non-ASCII domain names to usable ASCII-compatible encodings suitable for interacting with DNS and resolver services. You can now use and format dates using the Japanese Imperial calendar. Finally, more than a dozen new locales are available, and their data comes from the Unicode Consortium's CLDR project. The new CLDR-based locale definitions do not affect existing locales.

About the Authors

John O'Conner is an engineer and writer at Sun Microsystems. He coaches Little League baseball and AYSO soccer teams, which are always populated by at least one of his five children.

Naoto Sato is a Java internationalization engineer in the client software group at Sun Microsystems. Currently, his work is focused on enhancements of locales in the Java platform. Before joining Sun, he worked with the internationalization team at IBM Japan.
