Freemarker Default Number Formatting

We use Freemarker quite extensively at Betfair, and a question was raised the other day by one of the developers regarding number formatting. He had a numeric value that was being rendered with thousand separators and it wasn’t immediately obvious why. He quite rightly assumed that Freemarker would write the value of the number – unformatted – in the output. After all, this is the default behaviour of pretty much every other language / framework / tool that we use.

Unfortunately, Freemarker sees things differently. And this is where it’s REALLY important to make sure you read the documentation! Freemarker claims to be rendering for a “human audience” by default, but they’ve forgotten that Freemarker is a developer’s tool. Programmers are intelligent enough to know when they want numbers to be formatted. Automatically formatting numbers is not doing us a favour. Unless the programmer asks you to do something… don’t do it!

Scenarios like this can trip up anyone, and could possibly go unnoticed until something breaks in production. Let’s say for example that you were using a number as part of a link that you were rendering in the page. If all your testing used numbers smaller than 1000, you’d never see any problems. It’s only when you go over 1000 that the issue makes itself known. Our test data covers a large number of scenarios that quickly exposed this issue, but I can definitely think of smaller projects I’ve worked on in the past where this would not have been discovered early enough.

In fact, I think this default behaviour is dangerous. It breaks the Principle of Least Astonishment (or Principle of Least Surprise, as we call it). The default can be changed through Freemarker’s number_format setting, but you have to know the problem exists before you go looking for that setting, so the out-of-the-box behaviour is still the one that catches people out.

Here’s an example of what happens when you render numbers using Freemarker and how to make sure you get the output you want. The example below assumes an English locale.

<#assign x = 1000>
${x}                 <#-- 1,000 -->
${x?string}          <#-- 1,000 -->
${x?c}               <#-- 1000 -->
${x?string.computer} <#-- 1000 -->

As you can see above, the default rendering of a number is formatted based on locale. If you simply want the raw number, you must explicitly specify that by using ?string.computer or use the shorthand notation ?c.
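
To make the link scenario concrete, here is a minimal plain-Java sketch of the same effect. It assumes that Freemarker’s default rendering behaves like java.text.NumberFormat for the locale (which matches the output shown above); the class name, method names and URL are made up purely for illustration.

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberRenderingSketch {

    // Locale-aware rendering, comparable to what ${x} produces by default.
    static String humanReadable(long value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Raw rendering, comparable to ${x?c}.
    static String computerReadable(long value) {
        return Long.toString(value);
    }

    public static void main(String[] args) {
        long marketId = 1000; // hypothetical identifier used in a link

        // The first link contains a thousands separator, the second does not.
        System.out.println("/market?id=" + humanReadable(marketId, Locale.ENGLISH));
        System.out.println("/market?id=" + computerReadable(marketId));
    }
}
```

With an identifier of 999 both links come out the same, which is exactly why this kind of problem can hide until the numbers grow.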

If you want more information on this, check out the docs.


Even Better Java i18n Pluralisation using ICU4J

Last week I wrote about Java’s built-in ChoiceFormat class and the support it provides for pluralisation. It is a very useful class, but as pointed out by two commenters (btw… thanks for the feedback!) it doesn’t cater well for all languages – particularly those that have more complex rules. This led me to investigate further, as I was certain there would be something useful out there – after all, internationalisation is a very common requirement of a large number of applications. After a little digging, I found that the one library that stands out is ICU4J. So who uses it? Well… pretty much everyone!

So for those that have more complex internationalisation requirements, this is an excellent library to use! I generally find that the best way to find out how something works is to see an example, so I’ve used the pluralisation example provided in the comments of my previous post to demonstrate ICU4J. I chose this example for a few reasons: firstly, because someone took the time to ask a question and I want to answer it; secondly, because it is clearly not supported by the JDK ChoiceFormat class; and lastly, because I only know languages with simple pluralisation rules.

I wrote a very basic class that simply prints out a localised message looked up from a ResourceBundle – which is probably the most commonly used approach and therefore familiar to most readers.

import com.ibm.icu.text.MessageFormat;

import java.util.Locale;
import java.util.ResourceBundle;

public class IcuDemo {

    private static final int[] NUMBERS = new int[] {0, 1, 2, 5, 11, 22, 39};

    public static void main(String[] args) {
        printLocalisedMessages("plural", Locale.ENGLISH, new Locale("pl"));
    }

    private static void printLocalisedMessages(String key, Locale... locales) {
        for (Locale locale : locales) {
            System.out.println(locale.getDisplayLanguage() + ":");
            printLocalisedMessage(key, locale);
        }
    }

    private static void printLocalisedMessage(String key, Locale locale) {
        ResourceBundle bundle = ResourceBundle.getBundle("icu", locale);
        String pattern = bundle.getString(key);
        MessageFormat msgFormat = new MessageFormat(pattern, locale);

        for (int i : NUMBERS) {
            System.out.println(msgFormat.format(new Object[] {i}));
        }

        System.out.println();
    }
}

The code above should be familiar to everyone, as it shouldn’t be all that different from how you’re already doing i18n. However, note that I’ve imported com.ibm.icu.text.MessageFormat instead of the usual java.text.MessageFormat. The really interesting part comes in when we use ICU4J’s “plural” format type, which is shown in the following properties files:

icu.properties:

plural=Undefined

icu_en.properties:

plural={0} {0, plural, one{car}other{cars}}

icu_pl.properties:

plural={0} {0, plural, one{auto}few{auta}many{aut}other{aut}}

I’m sure you’ll immediately notice that I’m not specifying numbers in these patterns, as we did with ChoiceFormat. Instead, I’m simply referring to categories of numbers by predefined mnemonics. This really cool feature is available because a number of language pluralisation rules have already been defined by the Unicode CLDR (Common Locale Data Repository). In particular, we’re using the Language Plural Rules, which are provided in the ICU4J package. To explain how this works, let’s look at the English example and then work our way up to the Polish example.

English has two categories – singular/plural. These two categories are named as “one” and “other” – fairly straightforward. What this really means in terms of plural rule definition is:

one: n is 1
(by implication, every other number falls into the "other" category)

Polish is more complex than this and requires a number of rules to be defined:

one: n is 1
few: n mod 10 in 2..4 and n mod 100 not in 12..14
many: n is not 1 and n mod 10 in 0..1 or n mod 10 in 5..9 or n mod 100 in 12..14
(by implication, every other number falls into the "other" category)
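
To get a feel for what these rules mean, here is a rough plain-Java transcription of the Polish rules listed above. This is only a sketch for non-negative whole numbers and is not how ICU4J implements them; the class and method names are my own, and I’ve assumed that “and” binds more tightly than “or” in the “many” rule.

```java
class PolishPluralSketch {

    // True when value lies in the inclusive range low..high.
    private static boolean in(long value, long low, long high) {
        return value >= low && value <= high;
    }

    // Returns the plural category for a non-negative whole number,
    // following the one/few/many rules listed above.
    static String category(long n) {
        long mod10 = n % 10;
        long mod100 = n % 100;

        if (n == 1) {
            return "one";
        }
        if (in(mod10, 2, 4) && !in(mod100, 12, 14)) {
            return "few";
        }
        if ((n != 1 && in(mod10, 0, 1)) || in(mod10, 5, 9) || in(mod100, 12, 14)) {
            return "many";
        }
        return "other";
    }

    public static void main(String[] args) {
        // Same sample numbers as the IcuDemo class above.
        for (long n : new long[] {0, 1, 2, 5, 11, 22, 39}) {
            System.out.println(n + " -> " + category(n));
        }
    }
}
```

Note that for whole numbers every value lands in “one”, “few” or “many”, so in this sketch the “other” branch is only a safety net.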

Clearly the definition of rules makes our lives a lot easier. All we need to know is which category of numbers we want to provide a pluralisation for, and define the message against that name using the format “keyword{message}”.

Note: The CLDR points out that the names are just mnemonics and aren’t intended to describe the exact contents of the category, so try not to focus too much on them. It’s merely providing categorisation by a recognisable name.

The above example only uses the predefined number categories, but we could easily mix this with explicit values if needed. In this case, the explicit values would be checked first for an exact match, and if none was found then the categories would be searched, and failing that the “other” category would be used. Here’s an example of how you can mix the two concepts together:

example={0, plural, =1{one}=5{five}other{#}}

If we formatted this with the numbers 1 to 5 in a loop, this would be formatted as follows:

one
2
3
4
five
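
The lookup order described above (explicit value first, then category, then “other”) can be sketched in a few lines of plain Java. This is a simplified illustration and not ICU4J’s implementation: the class is hypothetical, it hard-codes the English rule, and it replaces “#” with the plain number rather than a locale-formatted one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PluralSelectionSketch {

    // Simplified English rule: 1 is "one", everything else is "other".
    static String englishCategory(long n) {
        return n == 1 ? "one" : "other";
    }

    // Picks a message: explicit "=N" entries first, then the category
    // keyword, then the mandatory "other" entry.
    static String select(long n, Map<String, String> messages) {
        String message = messages.get("=" + n);
        if (message == null) {
            message = messages.get(englishCategory(n));
        }
        if (message == null) {
            message = messages.get("other");
        }
        // In this sketch '#' simply stands for the number itself.
        return message.replace("#", Long.toString(n));
    }

    // The same choices as the "example" pattern above, as a map.
    static Map<String, String> exampleMessages() {
        Map<String, String> messages = new LinkedHashMap<>();
        messages.put("=1", "one");
        messages.put("=5", "five");
        messages.put("other", "#");
        return messages;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 5; n++) {
            System.out.println(select(n, exampleMessages()));
        }
    }
}
```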

Of course, there may be circumstances where the predefined rules don’t do what you want (although, we’re probably talking about exceptional circumstances now). In this case, you can simply define your own set of rules. This can be done using the PluralRules class or by customising the locale data that’s available to ICU4J.

I’ve only scratched the surface of what you can do with this library – and pluralisation is only one very small part of what it provides – but I hope this is useful and is able to help get you started using it.


Java i18n Pluralisation using ChoiceFormat

Betfair’s site is hugely popular all around the world, and obviously needs to provide a fully localised experience for users across different locales. Yesterday I was looking at how best to provide internationalisation (i18n) support within our core platform and came across the really useful ChoiceFormat class. I had seen it before but hadn’t actually explored what it can do until now. It’s an extremely basic class in terms of the functionality it provides, but it is as powerful as it is simple.

I’m sure we all know about Java resource bundles for i18n and have probably used them quite a lot – so I won’t go into any detail on that. I’ll just assume it’s common ground. However, if you haven’t come across java.text.ChoiceFormat, you’re missing out! This is the guts of pluralisation within the i18n stable, and is definitely your friend when you realise that users don’t like messages that say “Updated 1 second(s) ago”. You know it’s 1 second, right? So why not just say that and skip the “(s)” bit? I’m sure many have tried to solve this the hard way by doing the intelligence themselves (defining multiple keys and choosing which to use based on the argument to be passed into it), but this is where ChoiceFormat comes into play.

As per the docs, “The choice is specified with an ascending list of doubles, where each item specifies a half-open interval up to the next item”. So let’s use the example above to show how it would be defined in a resource bundle:

lastUpdated=Updated {0} {0,choice,0#seconds|1#second|1<seconds} ago

This creates a set of contiguous ranges (in ascending order) and finds the best match. I say best match because sometimes there isn’t an exact match. In the example above, negative numbers don’t match any range – but because they’re smaller than the start of the first range, the first choice is selected. The same logic applies for values that are larger than the highest range (which isn’t possible in this example as the highest range ends at positive infinity).

So let’s run through the scenarios very quickly:

  • If you pass in a negative number, the first choice “seconds” is returned (because it is too small to match anything and the ranges work in ascending order)
  • If you pass in a number between 0 (inclusive) and 1 (exclusive), the first choice matches and “seconds” is returned
  • If you pass in number 1 (exactly), the second choice matches and “second” is returned
  • If you pass in a number greater than 1, the last choice matches and “seconds” is returned
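
The scenarios above are easy to try out with nothing more than the JDK. The sketch below uses the same pattern as the resource bundle entry, but hard-codes it in a hypothetical class instead of loading it from a bundle, and fixes the locale to English so the number itself is rendered predictably.

```java
import java.text.ChoiceFormat;
import java.text.MessageFormat;
import java.util.Locale;

class LastUpdatedSketch {

    // Same pattern as the lastUpdated key above.
    static final String PATTERN =
            "Updated {0} {0,choice,0#seconds|1#second|1<seconds} ago";

    // Formats the whole message for the given number of seconds.
    static String lastUpdated(Number seconds) {
        return new MessageFormat(PATTERN, Locale.ENGLISH)
                .format(new Object[] {seconds});
    }

    // Returns only the word chosen by the choice part of the pattern.
    static String unit(double seconds) {
        return new ChoiceFormat("0#seconds|1#second|1<seconds").format(seconds);
    }

    public static void main(String[] args) {
        // Negative, zero, fractional, exactly one, and greater than one.
        for (double seconds : new double[] {-1, 0, 0.5, 1, 2}) {
            System.out.println(seconds + " -> " + unit(seconds));
        }
        System.out.println(lastUpdated(1));
        System.out.println(lastUpdated(2));
    }
}
```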

You may have noticed that the definition is different for the last range. What this is effectively doing in code is calling ChoiceFormat.nextDouble(1), which returns the smallest double greater than 1, which is then used as the start of the range. This is not restricted to the last range, but can actually be used anywhere. There is a similar ChoiceFormat.previousDouble(double d), which returns the largest double smaller than d.
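
If you prefer to build the format in code rather than from a pattern string, the same ranges can be expressed with the array-based constructor. This is a small sketch with a made-up class name; it should behave like the “0#seconds|1#second|1<seconds” pattern, with nextDouble(1) playing the role of “1<”.

```java
import java.text.ChoiceFormat;

class ChoiceLimitsSketch {

    // Builds the ranges [0, 1), [1, nextDouble(1)) and [nextDouble(1), +infinity).
    static ChoiceFormat secondsFormat() {
        double[] limits = {0.0, 1.0, ChoiceFormat.nextDouble(1.0)};
        String[] formats = {"seconds", "second", "seconds"};
        return new ChoiceFormat(limits, formats);
    }

    public static void main(String[] args) {
        ChoiceFormat format = secondsFormat();

        // Exactly 1 falls into the middle range; the very next double does not.
        System.out.println(format.format(1.0));
        System.out.println(format.format(ChoiceFormat.nextDouble(1.0)));

        // The neighbouring doubles on either side of 1.
        System.out.println(ChoiceFormat.previousDouble(1.0));
        System.out.println(ChoiceFormat.nextDouble(1.0));
    }
}
```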

Nifty! Even more so when you consider that some languages have multiple pluralisations to consider (e.g. Russian). So you can’t reasonably assume that you’re always dealing with a simple singular/plural as we have in English – sometimes there are many different plurals to take into account.


Bug in Java 6 DecimalFormat.format()

While trying out some options for decimal formatting, I came across what appears to be a bug in Java 6. I haven’t yet traced back exactly which version introduced the bug, but I have managed to verify it on both Windows 7 running Java 6 Update 23 & 24 and on Mac OS X Snow Leopard running Java 6 Update 26.

According to the JavaDocs for java.text.DecimalFormat:

If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are all the same as the positive pattern. That means that “#,##0.0#;(#)” produces precisely the same behavior as “#,##0.0#;(#,##0.0#)”.

However, when putting this into practice, the formatter truncates the final character from the output. This is shown in the following JUnit test:

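import static org.junit.Assert.assertEquals;

import java.text.DecimalFormat;

import org.junit.Test;

public class DecimalFormatTest {

  @Test
  public void testDecimalFormat() {
    double value = -4000d;

    final String expected = "(4,000.00)";
    // explicit negative subpattern, fully spelled out
    final String actualA = new DecimalFormat("#,##0.00;(#,##0.00)").format(value);
    // abbreviated negative subpattern, which the JavaDocs say is equivalent
    final String actualB = new DecimalFormat("#,##0.00;(#)").format(value);

    // passes
    assertEquals(expected, actualA);

    // fails - actualB = "(4,000.00"
    assertEquals(expected, actualB);
  }
}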

I have logged this on the Java Bugs Database and will update this post once I have a response from Oracle. But hopefully this helps someone else that has come across the same issue.

UPDATE: Yes, it is a bug. You can track it here (may take a day or two to appear on the external bug database apparently).


Common Misconceptions about Web Performance Optimisation

What would you do to improve the performance of a web application?

This is a question I’ve posed to a fair number of Java developers in the past week. A common theme quickly emerged and remained consistent across all the responses, which is that everyone I spoke to immediately began listing improvements to the server-side processing involved in a typical page request. Many of the responses were perfectly valid and often considered “best practice” (as much as I can’t stand that term), but few were likely to have much of a noticeable impact to the user. Some even mentioned the use of caching, but only thought to apply it internally within the server (e.g. database cache) and not to the content within the response.

It seems that the natural assumption of many Java developers is to look for optimisations in their code before trying to understand the entire request-response lifecycle and all the actors involved. According to Steve Souders, “80-90% of the time spent by users waiting for pages to load is spent on the frontend”. If that comes as a shock to you, then I strongly recommend that you read High Performance Web Sites. It is an excellent reminder of the obvious, but often forgotten, reality that your website is accessed by people, using browsers, across networks and domains, via intermediaries (proxies, etc.), over a common protocol (HTTP).

Digging into each of the items in that last sentence will give a much better appreciation of the components that act in concert to provide a web experience to the user. If this is the first you’ve heard of web performance optimisation and you want to know more, I recommend the following sites:

Steve Souders’ 14 Rules for Faster-Loading Websites (from his book, High Performance Web Sites)

Yahoo!’s Best Practices for Speeding Up Your Website

So next time someone asks you that question, please throw in a few optimisations other than server-side processing.


Certification and Competence

I found it quite interesting reading Martin Fowler’s recent post about the correlation between certification and competence, especially so soon after writing about the changes to some of Oracle’s certifications!

I have to say that I completely agree with his opinion – and I know it’s shared by most (if not all) of my friends. Having earned a number of certifications over the course of my career, I have seen first-hand just how useless so many of them are. I must emphasise that this is not necessarily true of all certifications. However, from my experience, none have proven enough to establish the holder of the certification to be an expert on the subject – and this is really where they fail.

So why did I decide to take on the Java Enterprise Architect certification? Well, I approached my investigation of it with the usual skepticism, but was finally convinced by one major aspect: it is ultimately assessed by a human being! As Fowler points out:

“At the moment the only way you can tell if someone is a good programmer is to find other good programmers to assess their ability.”

I don’t like certifications that are comprised of nothing but MCQs – not enough can be tested in a few multiple choice questions to be able to certify someone as mediocre, let alone an expert! Unless the candidate has had to form an opinion and defend it, you haven’t really tested their ability to apply their knowledge and reasoning. You’ve just asked them to regurgitate simple facts. At best, you’ve tested their ability to recognise a solution to a particular problem – but you haven’t established whether, given a blank sheet of paper, they are able to come to the solution themselves.

I’m hoping that the Java Enterprise Architect certification is able to distinguish the competent from the incompetent, but at this stage I’m not sure. I do know one thing though… adding a course attendance requirement does not strengthen the certification. Unfortunately, I feel this is where the “good money-making opportunity” that Fowler mentions comes into play.


Changes to Oracle Certification for Java Architects

I’ve recently completed the first exam towards the Oracle Certified Master, Java EE 5 Enterprise Architect and had aimed to complete the assignment later this year. However, I just happened to go to the Oracle Certification website where I saw that the rules are changing for a few of their certifications.

Quoted from the announcement on their website:

Beginning August 1, 2011, Java Architect, Java Developer, Solaris System Administrator and Solaris Security Administrator certification path requirements will include a new mandatory course attendance requirement.

I can’t say I’m impressed with how quietly they announced this! I haven’t seen anything on the OTN Java site about it, nor have I seen any attempt to make developers aware of this through other channels. I also have not received any emails from them about the changes to a certification track that I am currently working on. Unless I’ve managed to miss the announcement somewhere, I can only guess that they’re trying to sneak this in quietly to get more money out of us. Perhaps I’m being a bit cynical, but I don’t think it’s too far from the truth, given Oracle’s recent track record with the Java community.

So if you’re busy working towards any of these certifications, I suggest you pick up the pace and get it done by the end of July! And here I thought things were starting to settle down.

Fortunately the certification assignment has a very similar focus to my next assignment at Oxford. If all goes well, I might just manage to get it done in time.


SAP Deployment with Ant

A few months ago, I wrote about our continuous integration system and how I’d hooked everything up to automate as much as possible. One of the key components of this suite of tools was a SAP NetWeaver Deployment Plugin for Maven that I wrote. Since then, I have received a few comments and queries about this plugin and have been asked whether it’s something we’re actively developing. I had intended on releasing it as an open source plugin (and may still do so at some stage), but in the meantime I have set up another option that has proven very useful – and that’s what this post is all about.

This time I’ve provided the documented Ant build file at the end of this article, which you are welcome to download and use for your own development.

How does it work?

Within the standard Maven project object model, you have the ability to define artifact repositories within the <distributionManagement /> section of the pom.xml file. Maven uses these settings to deploy the artifact that was generated during the build process – which, in our case, simply deploys them to Nexus. This makes it easy to reference the artifact in other projects and ensures that it is appropriately shared (either internally or publicly, depending on how you have configured it). It’s up to you to decide how and when you deploy to Nexus, but the point is that it is in a shared repository. This should be done for you as part of your release process if you use the Maven Release Plugin, although it can be manually uploaded too (either via Nexus or via the Maven Install Plugin / Maven Deploy Plugin). Now that we have our artifact in Nexus, we can simply download it whenever we need it. We don’t want to rebuild the sources to produce another copy of the already released artifact – we simply want to use the same compiled artifact that we originally produced.

We used Ant to create a build script – modelled on the sample script provided in the SAP NetWeaver CE 7.1 Developer Edition installation – that would allow us to run a deployment from anywhere to anywhere. The standard script provided by SAP is tied to the directory where it lives – which is not very useful, since we really don’t need a full NWCE installation just to fulfill a few dependencies. So we decided to combine one of the best features of Maven (dependency management) with the simplicity and clarity of Ant, and have produced a build script with no local ties. The next few sections will explain what you need to make this work and how to set up your environment (the script also includes some documentation).

Setup

You’ll obviously need to have Ant installed for this to work. In addition to this, you’ll need to download the Maven Tasks for Ant and copy them into your {ANT_HOME}/lib folder. The next thing you need is your own internal Nexus repository manager. This is technically not required, as you could simply install all dependencies locally – although it’s more realistic that you’ll want to use a shared repository for your team / company.

Once you have your Nexus repo manager configured, you need to deploy the SAP deployment JAR files to the 3rd Party release repository. These are not made available in any public repo, so you’ll need to host them yourself. I’ve outlined the details I used to upload the deployment libraries in our system, but you’re free to use whatever GAV parameters you like (so long as you update the Ant build file as well). You can find all of these files in the SAP installation directory – just look in the sample Ant script (in /usr/sap/LCL/J00/j2ee/deployment/scripts) for the relative file locations.

JAR File                                    Maven Artifact Definition
tc~je~dc_ant.jar                            com.sap.ant:sap-ant-tasks:7.1.1
sap.com~tc~exception~impl.jar               com.sap:tc.exception.impl:7.1.1
sap.com~tc~je~clientlib~impl.jar            com.sap:tc.je.clientlib.impl:7.1.1
sap.com~tc~logging~java~impl.jar            com.sap:tc.logging.java.impl:7.1.1
sap.com~tc~je~deployment14~impl.jar         com.sap:tc.je.deployment14.impl:7.1.1
sap.com~tc~bl~jarsap~impl.jar               com.sap:tc.bl.jarsap.impl:7.1.1
sap.com~tc~sapxmltoolkit~sapxmltoolkit.jar  com.sap:tc.sapxmltoolkit:7.1.1
sap.com~tc~bl~sl~utility~impl.jar           com.sap:tc.bl.sl.utility.impl:7.1.1

Note: I’ve used version 7.1.1 for all of these SAP libraries as I’m running NetWeaver CE 7.1 with Enhancement Pack 1 (more on how to check versions here).

When you run Ant with the build file, it will automatically resolve and download the dependencies via Nexus (using the Maven Tasks for Ant). So all you need to do is specify what you want to deploy / undeploy, and then let Ant & Maven do their thing. If in doubt, simply run Ant specifying the build file and the -projecthelp option.

Good luck, and I hope this has helped you!

Download: SAP Deployment with Ant


Implementing Review Board for Code Reviews

A few days ago I migrated our build server, aptly named Bob (the Builder), to Ubuntu 10.04 LTS and incorporated a new product into the existing suite. The new product is Review Board, which we’ll be using for regular code reviews within our development team. I’ve written previously on the various products we use as part of our continuous integration system, and Review Board has now joined their ranks as a much needed assistant helping us cast a human eye on code quality. We also use Sonar for code quality analysis, but that’s only one part of a very big picture. I’ve seen through experience how code reviews can significantly improve quality, and I know my own personal experience is very well supported by many books and blogs on the subject.

One challenge we have is fully integrating this into our development environment, as some developers work in ABAP and others in Java. The Java development environment neatly integrates with Review Board, but the ABAP side has some interesting obstacles. What we’ve done to overcome this challenge is to write a proof of concept application in ABAP that allows a developer to export ABAP code (in a neat human-readable form) into a Subversion repository. From there, the developer can request a code review using the post-review command line tool. It all works as expected, but it’s still quite a manual, time-consuming process. We hope to develop this further, making it much easier to use, and roll it out to the other development teams.

The next decision we have to make is to choose whether to do pre-commit or post-commit reviews. Both have their strengths and weaknesses, so we’ve decided to try them both out before making a firm decision on how to take it forward.

Only a week or two in, and I’m already seeing the benefits of implementing Review Board! More updates to come as we iron out the creases in our review process…


Thoughts on Joining an Open Source Project

Since starting my career as a software developer back in 2002, I’ve wanted to join an open source project. A few things have held me back from doing this… including inexperience in my early career, lack of available time to commit properly to a project, and the inherent difficulty in finding that one project amongst the thousands out there that I feel strongly about. Over the last few weeks I’ve been looking into this again and have now decided to join OpenMRS. The experience of choosing a project and getting stuck in has prompted me to write about why I did it and hopefully spark some new ideas for other developers wanting to get involved in open source projects.

Why join an open source project?
Everyone has the capacity to get involved in something and help out, and software developers are no exception. I’ve always been involved in building software for business use, which has a lot of challenges and rewards – but it doesn’t really improve anyone’s life in any significant way. My reason for joining an open source project is to contribute to something that really matters… something that might genuinely improve the lives of other people. For this reason, I’ve focused on finding a project that has this as one of its core objectives.

Why OpenMRS?
The answer to this question is pretty simple… I have fairly limited “free” time as it is, so I wanted to make sure I join a project that I believe in. OpenMRS is clearly focused on providing software that can improve the lives of millions of people – particularly in the developing world – so this immediately caught my attention. My original career plan was to study medicine and become a surgeon, but after finishing school I made the decision to study software engineering instead. I don’t regret that decision, but I’ve never lost my interest in medicine, and joining the OpenMRS project gives me the opportunity to work in both.

What’s next?
When joining any new project there is always a lot to learn. Others may have been involved for years, but you have to learn the ropes before you can be of any use to the existing team. So my focus for the next few weeks will be to learn about OpenMRS from every conceivable perspective. What problem is the system trying to solve? How does it work for the user? How has it been designed? How is the data model structured? However, the biggest challenge will be gaining enough understanding of the medical domain for the system to make sense. Very fortunately, the technical foundation is quite familiar territory – so the biggest obstacle is domain knowledge.

I’ll follow up in the coming weeks with some details on how to contribute to an open source project – entirely based on my experience, rather than a definitive list of “do’s and don’ts”. Until then, I’ll be buried in documentation!