YAML and the importance of Schema Validation

I’ve been working with YAML over the past few weeks as part of a new configuration framework, and have been quite impressed with the simplicity of the language and how easy it is to learn. I’ve in no way stretched its capabilities, but I did find that it has a major weakness: there is no official schema language to validate documents against. I’m aware of tools like Kwalify that supposedly do the job – but where is the official schema language?

Having used XML for many years, I’ve come to depend on XML Schema to validate my documents, and the idea of using a language without any form of validation is a little worrying. Despite this flaw, we’ve still decided to use it – but it does leave us in a vulnerable position. We now have to create a set of tools to validate our configuration files, which is an unfortunate burden on the team – and something that could have been done in a matter of hours had we chosen to use XML. XML Schema is very well defined and very well known, so there’s no need for me to list the benefits it brings. But YAML has been around since 2004, so it’s quite disappointing that it still lacks such a fundamental tool.

So why is validation so important? The answer to that question depends on what you’re doing with the documents, and what the consequences would be if they were invalid. In our case, these configuration files will drive virtually every aspect of our site. If the configuration is invalid, it will directly affect our users… maybe just a few users, or maybe all of them. Neither of those scenarios is acceptable.

This is an area where XML really comes into its own. XML Schema and XSLT are both powerful additions to the XML toolkit, and they make a very compelling case for using it. Unfortunately, the fact remains that XML is sometimes more verbose than we would like it to be, which is where YAML is the obvious winner.

In the meantime, we’re left with the job of developing our own tools to validate our configuration files. It’s not a big job, but it shouldn’t be necessary. Given a decent validation language, this should be trivial. That raises another issue with building custom tools for validation – we need to test that the validator is validating correctly. Again – not a big task, but it shouldn’t be necessary.
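
To give a sense of what such a home-grown validator might look like, here is a minimal Python sketch. It assumes the YAML has already been parsed into plain dictionaries, lists and scalars by some YAML library, and the schema, the key names and the validate function are all hypothetical illustrations rather than our actual tooling. The handful of assertions at the end stand in for the tests the validator itself would need.

```python
# Hypothetical schema: each required key mapped to the Python type we expect
# once the YAML document has been parsed into plain Python objects.
SCHEMA = {
    "site_name": str,
    "max_connections": int,
    "features": list,
}


def validate(config, schema=SCHEMA):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(config, dict):
        return ["top level must be a mapping"]

    errors = []
    for key, expected in schema.items():
        if key not in config:
            errors.append(f"missing key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly
        # when the schema asks for an int.
        is_bool_for_int = expected is int and isinstance(value, bool)
        if is_bool_for_int or not isinstance(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )

    # Flag keys the schema does not know about, which are often typos.
    for key in config:
        if key not in schema:
            errors.append(f"unknown key: {key}")
    return errors


# The validator needs its own tests, covering both good and bad input.
good = {"site_name": "example", "max_connections": 10, "features": ["search"]}
assert validate(good) == []

bad = {"site_name": "example", "max_connections": "10", "features": []}
assert validate(bad) == ["max_connections: expected int, got str"]
```

Even at this size, the sketch has to make decisions (what to do with unknown keys, how to treat booleans) that a standard schema language would already have settled.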

Hopefully the designers of YAML will one day design a schema language!