Blog Moved to new Domain

I’m in the process of migrating my blog to a new domain and will no longer be posting at this address. Please check out my new blog to keep up to date with my posts.

Thanks for reading!

Freemarker Default Number Formatting

We use Freemarker quite extensively at Betfair, and a question was raised the other day by one of the developers regarding number formatting. He had a numeric value that was being rendered with thousand separators and it wasn’t immediately obvious why. He quite rightly assumed that Freemarker would write the value of the number – unformatted – in the output. After all, this is the default behaviour of pretty much every other language / framework / tool that we use.

Unfortunately, Freemarker sees things differently. And this is where it’s REALLY important to make sure you read the documentation! Freemarker claims to be rendering for a “human audience” by default, but they’ve forgotten that Freemarker is a developer’s tool. Programmers are intelligent enough to know when they want numbers to be formatted. Automatically formatting numbers is not doing us a favour. Unless the programmer asks you to do something… don’t do it!

Scenarios like this can trip up anyone, and could possibly go unnoticed until something breaks in production. Let’s say for example that you were using a number as part of a link that you were rendering in the page. If all your testing used numbers smaller than 1000, you’d never see any problems. It’s only when you go over 1000 that the issue makes itself known. Our test data covers a large number of scenarios that quickly exposed this issue, but I can definitely think of smaller projects I’ve worked on in the past where this would not have been discovered early enough.

In fact, I think this default behaviour is dangerous. It breaks the Principle of Least Astonishment (or Principle of Least Surprise, as we call it). Not only that, but from what I can tell from the docs, it cannot be overridden. So there’s no way to change the default back to what is sensible.

Here’s an example of what happens when you render numbers using Freemarker and how to make sure you get the output you want. The example below assumes an English locale.

<#assign x = 1000>
${x}                 <#-- 1,000 -->
${x?string}          <#-- 1,000 -->
${x?c}               <#-- 1000 -->
${x?string("0")}     <#-- 1000 -->

As you can see above, the default rendering of a number is formatted according to the locale. If you simply want the raw, unformatted number, you must say so explicitly – either with ?string and an explicit pattern, or with the ?c (“computer audience”) built-in.
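This mirrors the locale-sensitive number formatting that plain Java provides via java.text.NumberFormat; a quick sketch of the contrast (the class name is just for the demo):

```java
import java.text.NumberFormat;
import java.util.Locale;

public class NumberFormatDemo {
    public static void main(String[] args) {
        // Locale-sensitive formatting inserts grouping separators,
        // much like FreeMarker's default rendering does
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.ENGLISH);
        System.out.println(nf.format(1000));        // 1,000

        // The raw, unformatted value
        System.out.println(Integer.toString(1000)); // 1000
    }
}
```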

If you want more information on this, check out the docs.

Even Better Java i18n Pluralisation using ICU4J

Last week I wrote about Java’s built-in ChoiceFormat class and the support it provides for pluralisation. It is a very useful class, but as pointed out by two commenters (btw… thanks for the feedback!) it doesn’t cater well for all languages – particularly those that have more complex rules. This led me to investigate further, as I was certain there would be something useful out there – after all, internationalisation is a very common requirement of a large number of applications. After a little digging, I found that the one library that stands out is ICU4J. So who uses it? Well… pretty much everyone!

So for those that have more complex internationalisation requirements, this is an excellent library to use! I generally find that the best way to find out how something works is to see an example, so I’ve used the pluralisation example provided in the comments of my previous post to demonstrate ICU4J. I chose this example for a few reasons: firstly, because someone took the time to ask a question and I want to answer it; secondly, because it is clearly not supported by the JDK ChoiceFormat class; and lastly, because I only know languages with simple pluralisation rules.

I wrote a very basic class that simply prints out a localised message looked up from a ResourceBundle – which is probably the most commonly used approach and therefore familiar to most readers.


import java.util.Locale;
import java.util.ResourceBundle;

import com.ibm.icu.text.MessageFormat;

public class IcuDemo {

    private static final int[] NUMBERS = new int[] {0, 1, 2, 5, 11, 22, 39};

    public static void main(String[] args) {
        printLocalisedMessages("plural", Locale.ENGLISH, new Locale("pl"));
    }

    private static void printLocalisedMessages(String key, Locale... locales) {
        for (Locale locale : locales) {
            System.out.println(locale.getDisplayLanguage() + ":");
            printLocalisedMessage(key, locale);
        }
    }

    private static void printLocalisedMessage(String key, Locale locale) {
        // Look up the message pattern for this key from the "icu" resource bundle
        ResourceBundle bundle = ResourceBundle.getBundle("icu", locale);
        String pattern = bundle.getString(key);
        // Note: ICU4J's MessageFormat, not java.text.MessageFormat
        MessageFormat msgFormat = new MessageFormat(pattern, locale);

        for (int i : NUMBERS) {
            System.out.println(msgFormat.format(new Object[] {i}));
        }
    }
}


The code above should be familiar to everyone, as it shouldn’t be all that different from how you’re already doing i18n. However, note that I’ve imported com.ibm.icu.text.MessageFormat instead of the usual java.text.MessageFormat. The really interesting part comes in when we use ICU4J’s “plural” format type, which is shown in the following properties files:


icu_en.properties:

plural={0} {0, plural, one{car}other{cars}}

icu_pl.properties:

plural={0} {0, plural, one{auto}few{auta}many{aut}other{aut}}

I’m sure you’ll immediately notice that I’m not specifying numbers in these patterns, as we did with ChoiceFormat. Instead, I’m simply referring to categories of numbers by predefined mnemonics. This really cool feature is available because a number of language pluralisation rules have already been defined by the Unicode CLDR (Common Locale Data Repository). In particular, we’re using the Language Plural Rules, which are provided in the ICU4J package. To explain how this works, let’s look at the English example and then work our way up to the Polish example.

English has two categories – singular/plural. These two categories are named as “one” and “other” – fairly straightforward. What this really means in terms of plural rule definition is:

one: n is 1
(by implication, every other number falls into the "other" category)

Polish is more complex than this and requires a number of rules to be defined:

one: n is 1
few: n mod 10 in 2..4 and n mod 100 not in 12..14
many: n is not 1 and n mod 10 in 0..1 or n mod 10 in 5..9 or n mod 100 in 12..14
(by implication, every other number falls into the "other" category)

Clearly the definition of rules makes our lives a lot easier. All we need to know is which category of numbers we want to provide a pluralisation for, and define the message against that name using the format “keyword{message}”.

Note: The CLDR points out that the names are just mnemonics and aren’t intended to describe the exact contents of the category, so try not to focus too much on them. It’s merely providing categorisation by a recognisable name.

The above example only uses the predefined number categories, but we could easily mix this with explicit values if needed. In this case, the explicit values would be checked first for an exact match, and if none was found then the categories would be searched, and failing that the “other” category would be used. Here’s an example of how you can mix the two concepts together:

example={0, plural, =1{one}=5{five}other{#}}

If we format this pattern with the numbers 1 to 5 in a loop, the output is:

one
2
3
4
five
Of course, there may be circumstances where the predefined rules don’t do what you want (although, we’re probably talking about exceptional circumstances now). In this case, you can simply define your own set of rules. This can be done using the PluralRules class or by customising the locale data that’s available to ICU4J.

I’ve only scratched the surface of what you can do with this library – and pluralisation is only one very small part of what it provides – but I hope this is useful and is able to help get you started using it.

Java i18n Pluralisation using ChoiceFormat

Betfair‘s site is hugely popular all around the world, and obviously needs to provide a fully localised experience for users across different locales. Yesterday I was looking at how best to provide internationalisation (i18n) support within our core platform and came across the really useful ChoiceFormat class. I had seen it before but hadn’t actually explored what it can do until now. It’s an extremely basic class in terms of the functionality it provides, but it is as powerful as it is simple.

I’m sure we all know about Java resource bundles for i18n and have probably used them quite a lot – so I won’t go into any detail on that. I’ll just assume it’s common ground. However, if you haven’t come across java.text.ChoiceFormat, you’re missing out! This is the guts of pluralisation within the i18n stable, and is definitely your friend when you realise that users don’t like messages that say “Updated 1 second(s) ago”. You know it’s 1 second, right? So why not just say that and skip the “(s)” bit? I’m sure many have tried to solve this the hard way by doing the intelligence themselves (defining multiple keys and choosing which to use based on the argument to be passed into it), but this is where ChoiceFormat comes into play.

As per the docs, “The choice is specified with an ascending list of doubles, where each item specifies a half-open interval up to the next item”. So let’s use the example above to show how it would be defined in a resource bundle:

lastUpdated=Updated {0} {0,choice,0#seconds|1#second|1<seconds} ago

What this does is create a set of contiguous ranges (in ascending order) and find the best match. I say best match because sometimes there isn’t an exact match. In the example above, negative numbers don’t match any range – but because they’re smaller than the lowest range, the first choice is selected. The same logic applies to values larger than the highest range (which isn’t possible in this example, as the highest range extends to positive infinity).

So let’s run through the scenarios very quickly:

  • If you pass in a negative number, the first choice “seconds” is returned (because it is too small to match anything and the ranges work in ascending order)
  • If you pass in a number between 0 (inclusive) and 1 (exclusive), the first choice matches and “seconds” is returned
  • If you pass in number 1 (exactly), the second choice matches and “second” is returned
  • If you pass in a number greater than 1, the last choice matches and “seconds” is returned
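These scenarios are easy to verify with java.text.MessageFormat; a minimal sketch with the pattern inlined rather than loaded from a resource bundle (the class name is just for the demo):

```java
import java.text.MessageFormat;

public class ChoiceDemo {
    public static void main(String[] args) {
        // The lastUpdated pattern, inlined here rather than loaded from a bundle
        String pattern = "Updated {0} {0,choice,0#seconds|1#second|1<seconds} ago";
        for (int n : new int[] {0, 1, 2, 45}) {
            System.out.println(MessageFormat.format(pattern, n));
        }
        // Updated 0 seconds ago
        // Updated 1 second ago
        // Updated 2 seconds ago
        // Updated 45 seconds ago
    }
}
```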

You may have noticed that the definition is different for the last range. What this effectively does in code is call ChoiceFormat.nextDouble(1), which returns the smallest double greater than 1; that value is then used as the start of the range. This is not restricted to the last range – it can be used anywhere. There is a similar ChoiceFormat.previousDouble(double d) that is fairly self-explanatory.
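A quick way to see what nextDouble returns (the class name is just for the demo):

```java
import java.text.ChoiceFormat;

public class NextDoubleDemo {
    public static void main(String[] args) {
        // The smallest double strictly greater than 1 - i.e. one ulp above it
        double next = ChoiceFormat.nextDouble(1);
        System.out.println(next > 1);                 // true
        System.out.println(next == Math.nextUp(1.0)); // true
    }
}
```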

Nifty! Even more so when you consider that some languages have multiple pluralisations to consider (e.g. Russian). So you can’t reasonably assume that you’re always dealing with a simple singular/plural as we have in English – sometimes there are many different plurals to take into account.

Bug in Java 6 DecimalFormat.format()

While trying out some options for decimal formatting, I came across what appears to be a bug in Java 6. I haven’t yet traced back exactly which version introduced the bug, but I have managed to verify it on both Windows 7 running Java 6 Update 23 & 24 and on Mac OS X Snow Leopard running Java 6 Update 26.

According to the JavaDocs for java.text.DecimalFormat:

If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are all the same as the positive pattern. That means that “#,##0.0#;(#)” produces precisely the same behavior as “#,##0.0#;(#,##0.0#)”.

However, when putting this into practice, the formatter truncates the final character from the output. This is shown in the following JUnit test:

public void testDecimalFormat() {
  double value = -4000d;

  final String expected = "(4,000.00)";
  final String actualA = new DecimalFormat("#,##0.00;(#,##0.00)").format(value);
  final String actualB = new DecimalFormat("#,##0.00;(#)").format(value);

  // passes
  assertEquals(expected, actualA);

  // fails on the affected versions - actualB = "(4,000.00"
  assertEquals(expected, actualB);
}
I have logged this on the Java Bugs Database and will update this post once I have a response from Oracle. But hopefully this helps someone else that has come across the same issue.

UPDATE: Yes, it is a bug. You can track it here (may take a day or two to appear on the external bug database apparently).

CV Advice

I’ve recently been involved in a number of interviews for our growing team at Betfair, and have been surprised to see some commonly occurring mistakes in the CVs. Given the vast number of free resources online that advise job-seekers on how to write an effective CV, it’s somewhat disappointing to see these issues still plaguing us. This is not intended as a criticism of those that have submitted CVs – in fact, a number of them have been very good! This is simply another one of those free resources that I’m hoping will be helpful to someone.

#1 Think like the interviewer

It’s very easy to get consumed with the stress of updating your CV and forget that someone else needs to read it. You should constantly try to read your CV as though you were the interviewer, and then fix anything that doesn’t make sense or appear to add value.

#2 Your CV is your personal brochure, not your biography

I can’t speak for anyone but myself on this point, but I generally tend to focus on the first two pages of any CV. Those should typically include your skills, education, and most recent work history (your two most recent positions/projects). Everything of value should be on those two pages. It’s not that the interviewer doesn’t care (we really do!) – it’s that we seldom have the time to read an extensive CV. Bullet points are always preferred over paragraphs… this can’t be emphasised enough.

#3 Acronyms… friend and foe

I’ve been amazed at the number of CVs that include “private” acronyms – in other words, acronyms that only people in your own organisation will know. These are typically acronyms for systems that have been worked on, or internal tools that are used. While acronyms can make a CV much easier to read, you should be careful not to assume prior knowledge on the part of the interviewer. If you need to use private acronyms, make sure you introduce them first… and only introduce them if you intend to use them later. You don’t want to give the interviewer a StackOverflowException (sorry, lame programmer humour).

#4 Avoid technology shopping lists

Try to avoid listing an excessive number of technologies on your CV. It might be true that you’ve had some exposure to loads of different technologies, but try to describe those that are most applicable to the position you’re applying for. Admittedly, you may not always know which those are – but if your CV lists more technologies than some of the job postings on JobServe, you should probably cut it down a bit. Again, this is not to say that you should downplay your experience. But it’s highly unlikely that you’re an expert on everything. If you feel it is important to list them all, try to order them for the interviewer. It’s nice to know what you consider your major strengths and what you’ve only dabbled in. It’s great when you see that someone has enough interest in their career that they’ve chosen to learn 10 different languages, but try to let your CV demonstrate both the depth and breadth of your knowledge.

#5 Get someone to proofread it

Every interviewer will be looking for something different, but it’s almost certain that communication skills are high on their lists. With that in mind, you absolutely must get someone to proofread your CV before submitting it. Spelling mistakes, poor grammar, incomplete sentences and poor punctuation are all too common and they detract from creating a good first impression. This is not to say that you will be judged purely on these grounds, but getting them right will certainly help. But you’re a programmer, not an author… right? Wrong! You will be called on to explain your ideas and present your (or your team’s) work at some point, and you need to be able to convey it well. A CV that is poorly constructed and full of mistakes doesn’t inspire confidence in the candidate’s attention to detail. And that same attention to detail is often what helps make a great programmer.

These aren’t really ground-breaking revelations in the art of CV-writing, but I think they’re the common sense advice that we all need from time to time. I hope this has been helpful, and happy hunting.

And while I’m on the topic – if you’re a great developer, passionate about what you do, and keen to join an excellent team… check out the positions we have available!

Common Misconceptions about Web Performance Optimisation

What would you do to improve the performance of a web application?

This is a question I’ve posed to a fair number of Java developers in the past week. A common theme quickly emerged and remained consistent across all the responses, which is that everyone I spoke to immediately began listing improvements to the server-side processing involved in a typical page request. Many of the responses were perfectly valid and often considered “best practice” (as much as I can’t stand that term), but few were likely to have much of a noticeable impact to the user. Some even mentioned the use of caching, but only thought to apply it internally within the server (e.g. database cache) and not to the content within the response.

It seems that the natural assumption of many Java developers is to look for optimisations in their code before trying to understand the entire request-response lifecycle and all the actors involved. According to Steve Souders, “80-90% of the time spent by users waiting for pages to load is spent on the frontend”. If that comes as a shock to you, then I strongly recommend that you read High Performance Web Sites. It is an excellent reminder of the obvious, but often forgotten, reality that your website is accessed by people, using browsers, across networks and domains, via intermediaries (proxies, etc.), over a common protocol (HTTP).

Digging into each of the highlighted items above will give a much better appreciation of the components that act in concert to provide a web experience to the user. If this is the first you’ve heard of web performance optimisation and you want to know more, I recommend the following sites:

Steve Souders’ 14 Rules for Faster-Loading Websites (from his book, High Performance Websites)

Yahoo!’s Best Practices for Speeding Up Your Website
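As a taste of what these guides cover: one of Souders’ rules is to add far-future Expires/Cache-Control headers to static assets, so returning visitors’ browsers can skip re-requesting them entirely. A hypothetical response for a versioned static file might look like this (the values here are purely illustrative):

```
HTTP/1.1 200 OK
Content-Type: application/javascript
Cache-Control: public, max-age=31536000
Expires: Thu, 31 Dec 2037 23:55:55 GMT
```

Because the asset’s filename changes when its content changes, the aggressive caching is safe – no server-side optimisation is involved at all, yet the user-perceived impact can be substantial.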

So next time someone asks you that question, please throw in a few optimisations other than server-side processing.

YAML and the importance of Schema Validation

I’ve been working with YAML over the past few weeks as part of a new configuration framework, and have been quite impressed with the simplicity of the language and how easy it is to learn. I’ve in no way stretched its capabilities, but I did find one serious weakness: it doesn’t have an official schema language for validation. I’m aware of tools like Kwalify that supposedly do the job – but where is the official schema language?

Having used XML for many years, I’ve come to depend on XML Schema to validate my documents, and the idea of using a language without any form of validation is a little worrying. Despite this flaw, we’ve still decided to use it – but it does leave us in a vulnerable position. We now have to create a set of tools to validate our configuration files, which is an unfortunate burden to the team – and something that would’ve been easily achievable in a matter of hours had we chosen to use XML. XML Schema is very well defined and very well known, so there’s no need for me to list the benefits it brings. But YAML has been around since 2004, so it’s quite disappointing that it still lacks such a fundamental tool.
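For illustration, here is roughly what a schema looks like in Kwalify, one of the unofficial tools mentioned above. The keys and types are hypothetical – just a sketch of the shape such a schema takes:

```yaml
# Hypothetical Kwalify schema for a simple site configuration document
type: map
mapping:
  "site-name":
    type: str
    required: yes
  "max-connections":
    type: int
  "features":
    type: seq
    sequence:
      - type: str
```

Something like this covers the basics (types, required keys, sequences), but without an official standard, every team is left choosing – or building – its own validation story.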

So why is validation so important? The answer to that question depends on what you’re doing with the documents, and what the consequences would be if they were invalid. In our case, these configuration files will drive virtually every aspect of our site. If the configuration is invalid, it will directly affect our users… maybe just a few users, or maybe all of them. Neither of those scenarios is acceptable.

This is an area where XML really comes into its own. Both XML Schema and XSLT are powerful additions to the XML toolkit, and they make a very compelling case for using it. Unfortunately, the fact remains that XML is sometimes more verbose than we would like it to be, which is where YAML is the obvious winner.

In the meantime, we’re left with the job of developing our own tools to validate our configuration files. It’s not a big job, but it shouldn’t be necessary. Given a decent validation language, this should be trivial. That raises another issue with building custom tools for validation – we need to test that the validator is validating correctly. Again – not a big task, but it shouldn’t be necessary.

Hopefully the designers of YAML will one day design a schema language!

The “sticky shotgun” retrospective

I’ve been meaning to write this since early last week, but my first two weeks at Betfair have kept me fairly busy. After a nice chilled weekend, I finally have the time to get this down!

So I started a new job at Betfair at the beginning of April and have gradually been introduced to a few different ways of doing things. On my second day, we had a retrospective for the team’s previous sprint – which obviously I didn’t participate in, so I had the benefit of seeing how they do their retrospectives before being actively involved in one. The approach was subtly different from what I’ve done before, and I thought it worthwhile to share with others that work in a Scrum team.

Martin explained how it worked (for the benefit of a few of us that hadn’t done it their way before) and used the term “sticky shotgun” – which I think describes it quite aptly. Here’s how it works…

Everyone is given a marker and a pad of PostIt notes (or “stickies”) and writes down whatever comes to mind from the previous sprint. When they’re done, they all go up and stick them on the whiteboard in the correct zone (hence the term “sticky shotgun”). They use five zones: Start, Stop, Keep, More, and Less.

After the sticky shotgun, the stickies are clustered together into groups of similar issues. As you might expect, there are bound to be a lot of similar issues raised, as everyone has worked on the same user stories within the same team for the last two weeks. Instead of going through each individually, grouping them speeds up the process. To make sure nothing has been misinterpreted, they are always checked with the author to ensure that they’re put together in the correct group. We then briefly discuss the issues and try to identify what needs to be done to solve them; or in the case of the “keep” issues, what was done that makes the item worth keeping.

After the grouping and discussion, everyone is allowed to select their three most important groups. This helps rank the groups in order of importance so the team knows what needs to be addressed first.

This approach seems to provide a few distinct advantages over the way I’ve done retrospectives in the past:

Firstly, it allows a greater degree of participation. Not everyone is comfortable voicing their opinions in a group – particularly when they’re talking about the areas that need improvement. Allowing each person to note their thoughts down independently ensures that their opinions are heard.

Secondly, it speeds up what can be a very time-consuming meeting. It is very likely that a number of people have similar issues. This method allows everyone to voice their opinions without needing hours of everyone’s time. It also clearly identifies those opinions that are shared by a large number of people on the team.

Thirdly, it prevents snowballing. Quite often, one issue sparks off another and another – and before you know it, you’re not really focusing in the right direction. The point of the retrospective is to raise the issues, not to solve them. This approach provides for a relatively short period of problem identification, followed by a short period of problem definition, followed by a final short period of problem prioritisation.

Overall, it was very interesting to see a different slant on retrospectives, as they seem to be the quiet brother/sister in the Scrum family that no one talks about. There are hundreds of discussions over how we write user stories, how we estimate and assign story points to our user stories, how we sell Agile to customers – but I don’t recall seeing as much discussion over retrospectives.

Ultimately, it is the retrospectives that help us to see what we’re doing right or wrong and where we can improve. So it’s definitely a topic worth talking about.

Opposing views on the UML {xor} constraint

During my last lecture week at Oxford, a debate arose over the interpretation of the xor constraint within UML class diagrams. We were divided over two differing opinions, but were unable to reach consensus on which was correct. I’ll put forward the two views and give my opinion on the matter, but would be interested to hear if anyone knows the correct way to interpret this constraint.

This is the class diagram I’ll use as a reference for the opinions explained below. I’ve deliberately used simple names for the classes to avoid misinterpretation, as I’ve found that people often get distracted by the “real world” application of the diagram rather than the semantics of the notation – which is what we’re really trying to understand here.

Opinion A: the xor constraint applies to the association
In this case, we say that the xor constraint applies to the association itself, which means that the multiplicity of the association remains in effect. In other words, one of the following must be true (but not both):

  • EITHER: An object of type A may be associated to zero or more objects of type B
  • OR: An object of type A may be associated to zero or more objects of type C

Opinion B: the xor constraint applies to the link (the association instance)
In this case, we say that the xor constraint applies to the link between the objects, which means that the multiplicity specified on the association is effectively ignored. Regardless of the multiplicity specified on the association, the xor constraint requires that one of the following must be true (but not both):

  • EITHER: An object of type A must be associated to exactly one object of type B
  • OR: An object of type A must be associated to exactly one object of type C

According to UML 2 Toolkit (OMG Press), page 108:

The xor constraint specifies that objects of a class may participate in, at most, one of the associations at a time.

From what I can tell, this appears to support Opinion A above, which is also what I believe. This quote appears to be further reinforced on page 303.

It seems unnecessarily restrictive to apply the xor constraint to the links, as this substantially limits the usefulness of the constraint. In addition, Opinion A allows for the modelling of Opinion B by using the xor constraint with multiplicities of 1. So, given that Opinion A may be used to model a much wider range of possibilities, I don’t see how Opinion B would be very useful. Opinion B also opens the model to miscommunication, as the multiplicities would no longer make sense and should be disallowed when the association is used with the xor constraint.

That said, this is simply my opinion based on what I feel is logical. I don’t actually know for a fact which is correct. I completely understand why holders of Opinion B see it their way – I just don’t see how such strict semantics would be useful in this case.

Does anyone else have a different opinion on this? Or perhaps know the correct answer?