Freemarker Default Number Formatting

We use Freemarker quite extensively at Betfair, and a question was raised the other day by one of the developers regarding number formatting. He had a numeric value that was being rendered with thousand separators and it wasn’t immediately obvious why. He quite rightly assumed that Freemarker would write the value of the number – unformatted – in the output. After all, this is the default behaviour of pretty much every other language / framework / tool that we use.

Unfortunately, Freemarker sees things differently. And this is where it’s REALLY important to make sure you read the documentation! Freemarker claims to be rendering for a “human audience” by default, but they’ve forgotten that Freemarker is a developer’s tool. Programmers are intelligent enough to know when they want numbers to be formatted. Automatically formatting numbers is not doing us a favour. Unless the programmer asks you to do something… don’t do it!

Scenarios like this can trip up anyone, and could possibly go unnoticed until something breaks in production. Let’s say for example that you were using a number as part of a link that you were rendering in the page. If all your testing used numbers smaller than 1000, you’d never see any problems. It’s only when a value reaches 1000 and picks up its first thousand separator that the issue makes itself known. Our test data covers a large number of scenarios that quickly exposed this issue, but I can definitely think of smaller projects I’ve worked on in the past where this would not have been discovered early enough.
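To make the link scenario concrete, here is a minimal sketch in plain Java rather than Freemarker. It assumes the default rendering behaves like java.text.NumberFormat for an English locale; the class name, method names and the /market/ path are all made up for illustration.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class: the names and the URL path are illustrative only.
final class LinkFormattingDemo {

    // Builds the link from a number formatted for human readers,
    // so grouping separators appear once the value reaches 1000.
    static String humanFormattedLink(long marketId) {
        NumberFormat format = NumberFormat.getNumberInstance(Locale.ENGLISH);
        return "/market/" + format.format(marketId);
    }

    // Builds the link from the raw digits, which is what a URL needs.
    static String rawLink(long marketId) {
        return "/market/" + Long.toString(marketId);
    }

    public static void main(String[] args) {
        System.out.println(humanFormattedLink(999));  // /market/999
        System.out.println(humanFormattedLink(1234)); // /market/1,234
        System.out.println(rawLink(1234));            // /market/1234
    }
}
```

Both methods agree for every id below 1000, which is exactly why a small test data set never notices the difference.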

In fact, I think this default behaviour is dangerous. It breaks the Principle of Least Astonishment (or Principle of Least Surprise, as we call it). It can be overridden through the number_format setting (set on the Configuration in Java, or with the setting directive in a template), but that is easy to miss in the docs, and until someone changes it every template gets the locale-formatted output.

Here’s an example of what happens when you render numbers using Freemarker and how to make sure you get the output you want. The example below assumes an English locale.

<#assign x = 1000>
${x}                 <#-- 1,000 -->
${x?string}          <#-- 1,000 -->
${x?c}               <#-- 1000 -->
${x?string.computer} <#-- 1000 -->

As you can see above, the default rendering of a number is formatted based on locale. If you simply want the raw number for a given value, ask for it explicitly with ?string.computer or its shorthand ?c.
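"Formatted based on locale" also means the separator itself changes from one locale to the next. The sketch below shows this with java.text.NumberFormat, which, as far as I understand, is what the default number format delegates to. It is an analogy in plain Java, not Freemarker's own code, and the class and method names are hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo class comparing locale-aware and raw rendering.
final class LocaleNumberDemo {

    // Roughly what ${x} does: format for a human reader in the given locale.
    static String localised(long value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Roughly what ${x?c} does: plain digits, independent of locale.
    static String raw(long value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(localised(1000, Locale.ENGLISH)); // 1,000
        System.out.println(localised(1000, Locale.GERMANY)); // 1.000
        System.out.println(raw(1000));                       // 1000
    }
}
```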

If you want more information on this, check out the docs.


Java i18n Pluralisation using ChoiceFormat

Betfair’s site is hugely popular all around the world, and obviously needs to provide a fully localised experience for users across different locales. Yesterday I was looking at how best to provide internationalisation (i18n) support within our core platform and came across the really useful ChoiceFormat class. I had seen it before but hadn’t actually explored what it can do until now. It’s an extremely basic class in terms of the functionality it provides, but it is as powerful as it is simple.

I’m sure we all know about Java resource bundles for i18n and have probably used them quite a lot – so I won’t go into any detail on that. I’ll just assume it’s common ground. However, if you haven’t come across java.text.ChoiceFormat, you’re missing out! This is the guts of pluralisation within the i18n stable, and is definitely your friend when you realise that users don’t like messages that say “Updated 1 second(s) ago”. You know it’s 1 second, right? So why not just say that and skip the “(s)” bit? I’m sure many have tried to solve this the hard way by doing the intelligence themselves (defining multiple keys and choosing which to use based on the argument to be passed into it), but this is where ChoiceFormat comes into play.

As per the docs, “The choice is specified with an ascending list of doubles, where each item specifies a half-open interval up to the next item”. So let’s use the example above to show how it would be defined in a resource bundle:

lastUpdated=Updated {0} {0,choice,0#seconds|1#second|1<seconds} ago

This creates a set of contiguous ranges (in ascending order) and finds the best match. I say best match because sometimes there isn’t an exact match. In the example above, negative numbers don’t fall inside any range, but because they’re smaller than the start of the first range, the first choice is selected. The same logic applies to values that are larger than the highest range, which get the last choice (that can’t happen in this example, as the highest range ends at positive infinity).

So let’s run through the scenarios very quickly:

  • If you pass in a negative number, the first choice “seconds” is returned (because it is too small to match anything and the ranges work in ascending order)
  • If you pass in a number between 0 (inclusive) and 1 (exclusive), the first choice matches and “seconds” is returned
  • If you pass in exactly 1, the second choice matches and “second” is returned
  • If you pass in a number greater than 1, the last choice matches and “seconds” is returned
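The scenarios above can be checked with a short sketch. It feeds the same pattern as the lastUpdated entry straight into java.text.MessageFormat rather than loading it from a resource bundle. The class and method names are hypothetical, and the English locale is an assumption to keep the number output predictable.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo class; the pattern is inlined instead of read from a bundle.
final class LastUpdatedDemo {

    // Same text as the lastUpdated bundle entry.
    static final String PATTERN =
            "Updated {0} {0,choice,0#seconds|1#second|1<seconds} ago";

    // Formats the message for the given number of seconds.
    static String lastUpdated(Number seconds) {
        MessageFormat format = new MessageFormat(PATTERN, Locale.ENGLISH);
        return format.format(new Object[] { seconds });
    }

    public static void main(String[] args) {
        System.out.println(lastUpdated(-1)); // Updated -1 seconds ago
        System.out.println(lastUpdated(0));  // Updated 0 seconds ago
        System.out.println(lastUpdated(1));  // Updated 1 second ago
        System.out.println(lastUpdated(2));  // Updated 2 seconds ago
    }
}
```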

You may have noticed that the definition is different for the last range. What this is effectively doing in code is calling ChoiceFormat.nextDouble(1), which returns the smallest double greater than 1, and using that as the start of the range. This is not restricted to the last range; it can be used anywhere. There is a similar ChoiceFormat.previousDouble(double d), which returns the largest double smaller than d.
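Here is a minimal sketch of the same idea built in code, using the ChoiceFormat constructor that takes the limits and the choices as arrays. The class name and the three labels are made up so that each range is distinguishable in the output.

```java
import java.text.ChoiceFormat;

// Hypothetical demo class; the labels are illustrative, not from a real bundle.
final class ChoiceLimitsDemo {

    // Equivalent in spirit to the pattern "0#...|1#...|1<...":
    // the third range starts at the smallest double greater than 1.
    static ChoiceFormat rangesAroundOne() {
        double[] limits = { 0, 1, ChoiceFormat.nextDouble(1) };
        String[] labels = { "less than one", "exactly one", "more than one" };
        return new ChoiceFormat(limits, labels);
    }

    public static void main(String[] args) {
        ChoiceFormat format = rangesAroundOne();
        // The largest double below 1 still falls in the first range.
        System.out.println(format.format(ChoiceFormat.previousDouble(1)));
        // Exactly 1 is the start of the second range.
        System.out.println(format.format(1));
        // The smallest double above 1 is the start of the third range.
        System.out.println(format.format(ChoiceFormat.nextDouble(1)));
    }
}
```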

Nifty! Even more so when you consider that some languages have multiple pluralisations to consider (e.g. Russian). So you can’t reasonably assume that you’re always dealing with a simple singular/plural as we have in English – sometimes there are many different plurals to take into account.


Bug in Java 6 DecimalFormat.format()

While trying out some options for decimal formatting, I came across what appears to be a bug in Java 6. I haven’t yet traced back exactly which version introduced the bug, but I have managed to verify it on both Windows 7 running Java 6 Update 23 & 24 and on Mac OS X Snow Leopard running Java 6 Update 26.

According to the JavaDocs for java.text.DecimalFormat:

If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are all the same as the positive pattern. That means that “#,##0.0#;(#)” produces precisely the same behavior as “#,##0.0#;(#,##0.0#)”.

However, when putting this into practice, the formatter truncates the final character from the output. This is shown in the following JUnit test:

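import static org.junit.Assert.assertEquals;

import java.text.DecimalFormat;

import org.junit.Test;

public class DecimalFormatTest {

  @Test
  public void testDecimalFormat() {
    // assumes a default locale with English-style separators
    double value = -4000d;

    final String expected = "(4,000.00)";
    // full negative subpattern
    final String actualA = new DecimalFormat("#,##0.00;(#,##0.00)").format(value);
    // abbreviated negative subpattern, documented as equivalent
    final String actualB = new DecimalFormat("#,##0.00;(#)").format(value);

    // passes
    assertEquals(expected, actualA);

    // fails - actualB = "(4,000.00"
    assertEquals(expected, actualB);
  }
}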

I have logged this on the Java Bugs Database and will update this post once I have a response from Oracle. But hopefully this helps someone else that has come across the same issue.
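In the meantime, the passing assertion in the test suggests a simple workaround: spell out the negative subpattern in full instead of relying on the abbreviated form. The sketch below does that; the class and method names are hypothetical, and the English symbols are an assumption to keep the separators predictable.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper showing the workaround: a fully specified negative subpattern.
final class AccountingFormatDemo {

    // Formats negatives in parentheses using the full pattern on both sides.
    static String format(double value) {
        DecimalFormat format = new DecimalFormat(
                "#,##0.00;(#,##0.00)",
                DecimalFormatSymbols.getInstance(Locale.ENGLISH));
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(-4000d)); // (4,000.00)
        System.out.println(format(4000d));  // 4,000.00
    }
}
```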

UPDATE: Yes, it is a bug. You can track it here (may take a day or two to appear on the external bug database apparently).