Proprietary Data Formats Are Evil

2009 October 27
by Michael Thaler

Imagine a world in which email is not free.

Imagine a world in which using email requires every user to pay someone–say, Apple–for the privilege of being able to read what others send you, or send them messages in the first place. Imagine a world in which Apple Mail were the only email client, and without Apple Mail, restrictive licenses and technical obfuscation would prevent you from reading or writing correspondence with others.

If that were how email worked, would it be nearly as ubiquitous as it is today?

Then why are so many people willing to store (or worse, distribute) important data in formats that they can only use at the pleasure of a single entity?

Mind you: these people aren’t stupid. They’re certainly not malicious. But they’re victims of one of the greatest caveat emptor tricks of the computer age.

Incidentally, if you’ve ever emailed someone a Word document, I’m talking about you.

Now, I’m no free software zealot. I think free software is amazing–being able to understand, modify and improve the code of a piece of software you possess is wonderful and greatly increases its usefulness. But, I understand that there’s a lot of money in software, be it through the web or the desktop or the cloud, and that there are some things that require too much time and effort to be feasible to produce for someone working in their spare time. And I also understand that a developer whose livelihood depends on sales of said software might be reluctant to release it for everyone else to modify.

But, I submit to you now that it’s the ethical responsibility of any developer to write programs that store data in a human-readable, unobfuscated way.

Even if the program’s interface makes its inner workings perfectly clear. Even if customer service is excellent. Even if a lifetime’s worth of upgrades is included in the purchase.

It is perfectly fair for a software company to charge for a license to use software that they developed. But data produced by the program does not belong to that company–it belongs to the user, and the user should have every right to use it as they see fit, be it by developing their own software to use it, or even importing it into software made by other developers. And in order to do that, files should be stored in plaintext.

Pre-2007 Word documents (and in fact, any file produced by a Microsoft Office program of that era with its default settings) are a perfect example of how not to do this. If you use a plaintext editor like Notepad to view what these documents actually contain, all you see is garbage. Nothing human-readable. The only way to decipher it is by feeding it into Word itself, which, as many people forget, is an expensive program. And one that makes no guarantee of being supported in the future. And one that, for most of the format’s life, came with no public documentation explaining exactly how the data is stored in the file for posterity.

Other office suites such as OpenOffice.org and Google Docs can read Word documents, but only by reverse engineering–an error-prone process that leads to imperfect importing algorithms. Yet this is a format routinely passed around among users for review or perusal, with a tacit assumption that everyone can open it. There’s no such guarantee!

Microsoft isn’t quite as bad as it used to be. In response to disapproval from governments, antitrust suits and intellectual desertion, they’re in the process of migrating users to the new Office Open XML format (you know those .docx files that drive everyone crazy? With a bit of effort, they can even be compatible with older versions of Office!), even going so far as to strong-arm the ISO into making it an actual standard format.

And, as we all know, Microsoft has never been all that good at encouraging users to upgrade software in a timely manner.

This behavior is nonetheless a huge improvement over past practices. Microsoft also recently announced a release of documentation on Outlook’s data storage model, which is undoubtedly a stride in the right direction. But the days of obfuscated data storage should be over. We live in a world where users routinely share data between vastly different systems; conforming to open, documented standards can no longer be considered optional. The Web suffers enough from an incompatible browser holding back its innovation; it depends entirely on interoperability across browsers and platforms. Its success makes restricted, proprietary data formats obsolete.
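To make the difference concrete, here is a rough sketch of what a documented format buys you. A .docx file is a ZIP archive whose main text lives in an XML part named word/document.xml, so a few lines of standard-library Python can pull the text out without Word installed. The function names `docx_paragraphs` and `make_toy_docx` are my own, and the toy file built below is a stripped-down stand-in rather than a complete, valid Word document; real files contain tables, text boxes and other structures that this sketch does not handle carefully.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace of the WordprocessingML elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the text of each paragraph element found in a .docx file.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        xml_bytes = package.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # Each paragraph holds runs, and each run holds <w:t> text nodes.
        pieces = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


def make_toy_docx(lines):
    """Build a minimal in-memory stand-in for a .docx (illustration only)."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(line)}</w:t></w:r></w:p>" for line in lines
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    toy = make_toy_docx(["Dear reader,", "No Word required."])
    for line in docx_paragraphs(toy):
        print(line)
```

Nothing comparable is possible with an old .doc file using only general-purpose tools: there is no archive to unzip and no XML to parse, just an opaque binary blob.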

You may read this post as a cheap shot at Microsoft. I may rail against them a lot, but they’re not the only offenders here. Adobe Flash is another proprietary format that’s wormed its way into Web ubiquity. (The player may be downloadable for free, but who is Adobe accountable to? And once you’ve made your .swf files, how do you break them down into components without Adobe software?)

So please–don’t send me .doc files. You don’t know that I can open them. Use .docx, .odt, .rtf or even .pdf. Just no .docs.
