Even Better Java i18n Pluralisation using ICU4J

Last week I wrote about Java’s built-in ChoiceFormat class and the support it provides for pluralisation. It is a very useful class, but as two commenters pointed out (btw… thanks for the feedback!), it doesn’t cater well for all languages, particularly those with more complex plural rules. This led me to investigate further, as I was certain there would be something useful out there. After all, internationalisation is a very common requirement for many applications. After a little digging, I found that the one library that stands out is ICU4J. So who uses it? Well… pretty much everyone!

So for those who have more complex internationalisation requirements, this is an excellent library to use! I generally find that the best way to learn how something works is to see an example, so I’ve used the pluralisation example provided in the comments of my previous post to demonstrate ICU4J. I chose this example for a few reasons: firstly, because someone took the time to ask a question and I want to answer it; secondly, because it is clearly not supported by the JDK ChoiceFormat class; and lastly, because I only know languages with simple pluralisation rules.

I wrote a very basic class that simply prints out a localised message looked up from a ResourceBundle – which is probably the most commonly used approach and therefore familiar to most readers.

import com.ibm.icu.text.MessageFormat;

import java.util.Locale;
import java.util.ResourceBundle;

public class IcuDemo {

    private static final int[] NUMBERS = new int[] {0, 1, 2, 5, 11, 22, 39};

    public static void main(String[] args) {
        // Locale.forLanguageTag avoids the Locale constructor, which newer JDKs deprecate
        printLocalisedMessages("plural", Locale.ENGLISH, Locale.forLanguageTag("pl"));
    }

    private static void printLocalisedMessages(String key, Locale... locales) {
        for (Locale locale : locales) {
            System.out.println(locale.getDisplayLanguage() + ":");
            printLocalisedMessage(key, locale);
        }
    }

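    private static void printLocalisedMessage(String key, Locale locale) {
        // Look up the pattern for this locale from icu_<language>.properties
        ResourceBundle bundle = ResourceBundle.getBundle("icu", locale);
        String pattern = bundle.getString(key);
        // Passing the locale tells ICU4J which language's plural rules to apply
        MessageFormat msgFormat = new MessageFormat(pattern, locale);

        // Format each sample number so that every plural category gets exercised
        for (int i : NUMBERS) {
            System.out.println(msgFormat.format(new Object[] {i}));
        }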

        System.out.println();
    }
}

The code above should be familiar to everyone, as it shouldn’t be all that different from how you’re already doing i18n. However, note that I’ve imported com.ibm.icu.text.MessageFormat instead of the usual java.text.MessageFormat. The really interesting part comes in when we use ICU4J’s “plural” format type, which is shown in the following properties files:

icu.properties:

plural=Undefined

icu_en.properties:

plural={0} {0, plural, one{car}other{cars}}

icu_pl.properties:

plural={0} {0, plural, one{auto}few{auta}many{aut}other{aut}}

I’m sure you’ll immediately notice that I’m not specifying numbers in these patterns, as we did with ChoiceFormat. Instead, I’m simply referring to categories of numbers by predefined mnemonics. This really cool feature is available because a number of language pluralisation rules have already been defined by the Unicode CLDR (Common Locale Data Repository). In particular, we’re using the Language Plural Rules, which are provided in the ICU4J package. To explain how this works, let’s look at the English example and then work our way up to the Polish example.

English has two categories: singular and plural. These are named “one” and “other”, which is fairly straightforward. In terms of plural rule definitions, this means:

one: n is 1
(by implication, every other number falls into the "other" category)

Polish is more complex than this and requires a number of rules to be defined:

one: n is 1
few: n mod 10 in 2..4 and n mod 100 not in 12..14
many: n is not 1 and n mod 10 in 0..1 or n mod 10 in 5..9 or n mod 100 in 12..14
(by implication, every other number falls into the "other" category)
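To make these rules a bit more concrete, here is a rough plain-Java sketch that hand-codes the English and Polish rules exactly as quoted above and classifies the sample numbers from the demo. This is only an illustration for non-negative whole numbers, not how ICU4J implements it; the class and method names (PolishPluralSketch, polishCategory, englishCategory) are made up for this example.

```java
class PolishPluralSketch {

    // English: "one: n is 1", everything else is "other".
    static String englishCategory(long n) {
        return n == 1 ? "one" : "other";
    }

    // Polish rules as quoted in this post; assumes n is a non-negative integer.
    // In the rule syntax "and" binds tighter than "or", hence the grouping below.
    static String polishCategory(long n) {
        long mod10 = n % 10;
        long mod100 = n % 100;
        boolean teens = mod100 >= 12 && mod100 <= 14;

        if (n == 1) {
            return "one";
        }
        if (mod10 >= 2 && mod10 <= 4 && !teens) {
            return "few";
        }
        if ((n != 1 && mod10 <= 1) || (mod10 >= 5 && mod10 <= 9) || teens) {
            return "many";
        }
        return "other";
    }

    public static void main(String[] args) {
        long[] numbers = {0, 1, 2, 5, 11, 22, 39};
        for (long n : numbers) {
            System.out.println(n + ": en=" + englishCategory(n) + ", pl=" + polishCategory(n));
        }
    }
}
```

Note that for whole numbers every value ends up in “one”, “few” or “many”, so with these rules the Polish “other” branch is never reached by the integers used here.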

Having these rules predefined makes our lives a lot easier. All we need to know is which categories of numbers a language uses, and then define a message against each category name using the format “keyword{message}”.

Note: The CLDR points out that the names are just mnemonics and aren’t intended to describe the exact contents of the category, so try not to focus too much on them. They merely provide categorisation by a recognisable name.

The above example only uses the predefined number categories, but we could easily mix this with explicit values if needed. In this case, the explicit values would be checked first for an exact match, and if none was found then the categories would be searched, and failing that the “other” category would be used. Here’s an example of how you can mix the two concepts together:

example={0, plural, =1{one}=5{five}other{#}}

If we formatted the numbers 1 to 5 with this pattern in a loop, the output would be as follows:

one
2
3
4
five
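The lookup order described above (exact value first, then category, then “other”) can be modelled in a few lines of plain Java. This is only a toy model of the selection logic, not ICU4J’s actual code; the class PluralSelectSketch, its methods and the map-based representation of the pattern branches are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.function.IntFunction;

class PluralSelectSketch {

    // Picks the message for n: an explicit "=n" branch wins, then the
    // category keyword for n, and finally the mandatory "other" branch.
    static String select(int n, Map<String, String> branches, IntFunction<String> categoryOf) {
        String message = branches.get("=" + n);
        if (message == null) {
            message = branches.get(categoryOf.apply(n));
        }
        if (message == null) {
            message = branches.get("other");
        }
        // "#" stands for the number being formatted
        return message.replace("#", Integer.toString(n));
    }

    // Mirrors the pattern {0, plural, =1{one}=5{five}other{#}} with English categories.
    static String example(int n) {
        Map<String, String> branches = Map.of("=1", "one", "=5", "five", "other", "#");
        return select(n, branches, i -> i == 1 ? "one" : "other");
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(example(n));
        }
    }
}
```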

Of course, there may be circumstances where the predefined rules don’t do what you want (although, we’re probably talking about exceptional circumstances now). In this case, you can simply define your own set of rules. This can be done using the PluralRules class or by customising the locale data that’s available to ICU4J.
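As a minimal sketch of the PluralRules route, the snippet below builds a rule set from a description string and asks it which category a number falls into. It assumes ICU4J is on the classpath; the rule set itself ("one: n is 1; few: n in 2..4") is invented purely for illustration and is not taken from the CLDR, and the class name CustomRulesSketch is hypothetical.

```java
import com.ibm.icu.text.PluralRules;

class CustomRulesSketch {

    // Illustrative rule set only; anything not matched falls into "other".
    // createRules returns null if the description cannot be parsed.
    static final PluralRules RULES =
            PluralRules.createRules("one: n is 1; few: n in 2..4");

    static String categoryOf(int n) {
        return RULES.select(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 7}) {
            System.out.println(n + " -> " + categoryOf(n));
        }
    }
}
```

The keyword returned by select is what you would then match against the “keyword{message}” branches of a plural pattern.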

I’ve only scratched the surface of what you can do with this library – and pluralisation is only one very small part of what it provides – but I hope this is useful and is able to help get you started using it.


2 Comments on “Even Better Java i18n Pluralisation using ICU4J”

  1. Paweł Dyda says:

    I am sure you know that but for other people’s benefit:
    Beware of MissingResourceException! Always call bundle.getString(key) inside a try-catch block. Otherwise pretty bad things could happen if a translator removes a key or simply translates part of it (trust me, this happens).

    Very interesting article, thanks!

  2. zenbaku says:

    Hi, I’d really appreciate it if you could help me with this. I’m trying to use ICU4J to do some big calculations using its extended BigDecimal class, but I’m unable to import and use it. I already have the .jar file, but I don’t know where to put it or how to call it from my Java files.

    Many thanks in advance!