
Proprietary Data Formats Are Evil

2009 October 27
by Michael Thaler

Imagine a world in which email is not free.

Imagine a world in which using email requires every user to pay someone–say, Apple–for the privilege of reading what others send them, or of sending messages in the first place. Imagine a world in which Apple Mail were the only email client, and in which restrictive licenses and technical obfuscation prevented you from reading or writing correspondence without it.

If that were how email worked, would it be nearly as ubiquitous as it is today?

Then why are so many people willing to store (or worse, distribute) important data in formats that they must trust a single entity for the privilege of using?

Mind you: these people aren’t stupid. They’re certainly not malicious. But they’re victims of one of the greatest caveat emptor tricks of the computer age.

Incidentally, if you’ve ever emailed someone a Word document, I’m talking about you.

Now, I’m no free software zealot. I think free software is amazing–being able to understand, modify and improve the code of a piece of software you possess is wonderful and greatly increases its usefulness. But, I understand that there’s a lot of money in software, be it through the web or the desktop or the cloud, and that there are some things that require too much time and effort to be feasible to produce for someone working in their spare time. And I also understand that a developer whose livelihood depends on sales of said software might be reluctant to release it for everyone else to modify.

But, I submit to you now that it’s the ethical responsibility of any developer to write programs that store data in a human-readable, unobfuscated way.

Even if the program’s interface makes its inner workings perfectly clear. Even if customer service is excellent. Even if a lifetime’s worth of upgrades is included in the purchase.

It is perfectly fair for a software company to charge for a license to use software that they developed. But data produced by the program does not belong to that company–it belongs to the user, and the user should have every right to use it as they see fit, be it by developing their own software to use it, or even importing it into software made by other developers. And in order to do that, files should be stored in plaintext.

Pre-2007 Word documents (and in fact, any file produced by a Microsoft Office program with its default settings) are a perfect example of how not to do this. If you open one of these documents in a plaintext editor like Notepad, all you see is garbage. Nothing human-readable. The only way to decipher it is to feed it into Word itself, which, as many people forget, is an expensive program. And one that comes with no guarantee of being supported in the future. And with no documentation available to explain, for posterity, exactly how the data is stored in the file.
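
You don’t have to take my word for the garbage, either. As a rough sketch in Python (the function name and the category labels are my own inventions for illustration), here’s how a program might guess from a file’s first bytes whether it holds plain text or a binary container. The two signatures are the well-known ones for the OLE2 compound files used by Word 97–2003 documents and for ZIP archives; the rest of the heuristic is an assumption, not a rigorous detector.

```python
# Signature of the OLE2 compound file container used by Word 97-2003 .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature at the start of an ordinary ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_format(data: bytes) -> str:
    """Guess what kind of file the given bytes came from.

    Returns one of the hypothetical labels "ole2", "zip", "text" or "binary".
    """
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so a plaintext editor would show garbage.
        return "binary"
    # Control characters other than tab and newlines also suggest binary data.
    if any(ch < " " and ch not in "\t\r\n" for ch in text):
        return "binary"
    return "text"
```

Calling `sniff_format(open(path, "rb").read(8192))` on a legacy .doc should come back as "ole2", while the same letter saved as .txt comes back as "text".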

Other office suites such as OpenOffice.org and Google Docs can read Word documents, but only by reverse engineering–an error-prone process that leads to imperfect importing algorithms. This is a format that is routinely sent to many different people for review or perusal, with the tacit assumption that everyone can open it. There’s no such guarantee!

Microsoft isn’t quite as bad as it used to be. In response to disapproval from governments, antitrust suits and intellectual desertion, they’re in the process of migrating users to the new Office Open XML format (you know those .docx files that drive everyone crazy? With a bit of effort, they can even be compatible with older versions of Office!), even going so far as to strong-arm the ISO into making it an actual standard format.

And, as we all know, Microsoft has never been all that good at encouraging users to upgrade software in a timely manner.

This behavior is nonetheless a huge improvement over past practices. Microsoft also recently announced a release of documentation on Outlook’s data storage model, which is undoubtedly a stride in the right direction. But the days of obfuscated data storage should be over. We live in a world where users routinely share data between vastly different systems; conforming to open, documented standards can no longer be considered optional. The Web suffers enough from an incompatible browser holding back its innovation; it depends entirely on interoperability across browsers and platforms. Its success makes restricted, proprietary data formats obsolete.
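
To see why a documented format matters in practice: a .docx file is an ordinary ZIP archive whose main body lives in an XML part named `word/document.xml`, so any language with a ZIP library and an XML parser can get at the text without Word. Here’s a minimal sketch in Python using only the standard library. The helper names are hypothetical, and the sample package it builds is a stripped-down stand-in for illustration, not a complete document that Word would accept.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the text of each paragraph in the package's main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


def make_sample_package() -> bytes:
    """Build a tiny in-memory archive containing only word/document.xml."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world.</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>No Word required.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()
```

`extract_paragraphs(make_sample_package())` gives back the two sample paragraphs as plain strings. Nothing comparable is possible with an undocumented binary format unless someone reverse engineers it first.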

You may read this post as a cheap shot at Microsoft. I may rail against them a lot, but they’re not the only offenders here. Adobe Flash is another example that’s wormed its way into Web ubiquity. (The player may be downloadable for free, but who is Adobe accountable to? And once you’ve made your .swf files, how do you break them down into components without Adobe software?)

So please–don’t send me .doc files. You don’t know that I can open them. Use .docx, .odt, .rtf or even .pdf. Just no .docs.


A (non-Exhaustive) List of Things You Can Do With the Internet

2009 October 8
by Michael Thaler

We all take the Internet for granted today, so as a brainstorming exercise I thought I might try to think up a list of things that you can do with the Internet. This is obviously not an exhaustive list, and it’s surely not a list of everything the Internet will ever be able to let you do–I just thought I’d try to contextualize this point in history by pointing out how easy the Internet has made our lives.

On reading it, some food for thought: how much of this is new? How much of it is made easier? How much of it is worth doing?

  • Pay your bills.
  • Download a music album, video game or movie, legally or illegally.
  • Learn the history of the life of Nikola Tesla.
  • Download a free operating system.
  • Collaborate on a calendar.
  • Find out what your friends are doing.
  • Collaborate on a massive trolling effort.
  • Find out how much money is in your bank account, your IRA, or any stock market holdings.
  • Choose from numerous comics to read, without buying a newspaper.
  • Find out what a word means in any language with a web presence.
  • Talk to a friend in real time on the other side of the world, for free.
  • Find out how to get past a tricky part in a video game.
  • Find a job, or network with potential colleagues.
  • Find out what’s happening in real time in Congress, and read commentary from any political perspective in existence.
  • Offer your own commentary, and maybe even convince someone you’re right.
  • Learn how to make a web site or program of your own, using any existing methodology.
  • Buy basically anything that can be legally shipped to you–and a lot of stuff that can’t.
  • Reunite with someone you’ve completely lost contact with.
  • Find porn of anything. (I’m not kidding, and you know it.)
  • Aggregate all the most recent news about everything you care about into a single window.
  • Learn how to program in any programming language.
  • Learn about any topic in music theory.
  • Have an affair. Discreetly.
  • Read the entire text of a public domain book, or have it read to you.
  • Acquire, modify, patch and redistribute the original source code of numerous high-quality pieces of software.
  • Collaborate with many other developers who share your interests and intentions in doing so.
  • Remotely and securely control a machine nowhere near you.
  • Download a screensaver that constantly evolves based on user preferences.
  • Participate in a revolt against an authority figure.
  • Provide your own opinions, and maybe even make money doing it.
  • Develop a new way to make money that no one in the world has ever done before.
  • Get into an incredibly vitriolic argument, and enjoy every minute of it.
  • Sell anything.
  • Let whoever made a piece of software you’re using know if something is wrong with it.
  • Watch TV shows from another country long before they’re ever officially brought to where you are, complete with subtitles.
  • Teach the world how to do something.

Anything else interesting you can think of that I missed?


Researchers reconstruct 3D models of a city from tourist photos

2009 September 30
by Michael Thaler

As a quick early indicator of the sort of results analyzing the Internet can produce, a group of researchers at the University of Washington tried an experiment: reconstruct cities from nothing but images from an Internet search. They succeeded at creating models of landmarks in Rome and Venice, and the entire Old City of Dubrovnik.

To do this, they used a cluster of machines working in parallel to find common data points across the tourist photos returned by a search. Their largest reconstruction, Dubrovnik, was completed in 17.5 hours.

This bears repeating: this was done with nothing but a search on user-submitted photos from Flickr and raw computing power. In other words, they distilled images taken by numerous, unconnected people into a single composite of what they all recorded, automatically determining what matched what across all the photos.
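
To give a flavor of what “determining what matched what” can mean, here’s a toy sketch in Python. It is not the researchers’ pipeline, whose details I’m not reproducing here. It assumes each photo has already been reduced to a list of numeric feature descriptors, and it pairs them up using a nearest-neighbor search with a ratio test, a common way to throw out ambiguous matches. The function name, the 0.8 threshold and the two-number descriptors are all made up for illustration.

```python
import math


def match_descriptors(photo_a, photo_b, ratio=0.8):
    """Pair descriptors in photo_a with their nearest neighbours in photo_b.

    A pair (i, j) is kept only when photo_b[j] is clearly closer to
    photo_a[i] than the second-best candidate (the ratio test).
    """
    matches = []
    for i, desc_a in enumerate(photo_a):
        # Rank every descriptor in photo_b by Euclidean distance to desc_a.
        ranked = sorted(
            (math.dist(desc_a, desc_b), j) for j, desc_b in enumerate(photo_b)
        )
        if len(ranked) < 2:
            continue  # The ratio test needs at least two candidates.
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


# Hypothetical descriptors: the first two points appear in both photos,
# while the third point in photo_a is equally close to two candidates.
photo_a = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
photo_b = [(0.1, 0.0), (10.0, 9.9), (20.0, 20.0)]
matches = match_descriptors(photo_a, photo_b)
```

Here `matches` comes out as `[(0, 0), (1, 1)]`: the two shared points are paired up and the ambiguous third one is discarded. Real systems use far richer descriptors, and the pairing is only the first step before any 3D geometry gets estimated.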

This is incredible research, and it’s an excellent microcosm of how the Internet’s data can be crystallized into something more than the sum of its parts.


Why study the web?

2009 September 27
by Michael Thaler

Short answer: because it’s the future.

We are living in an age of incredible technological advancement. The world is being transformed by it–the way we think, learn, communicate and live has been torn down and rebuilt within the past fifteen years. And yet, no one who lived before the Internet’s ubiquity could possibly have foreseen how far it would reach. Today, living without systems like Wikipedia, AIM, Google, Facebook and Twitter is unthinkable. Within the last fifteen years, the Internet has already transformed public discourse, revolutionized content distribution, and redefined communication for everyone with easy access to it. Businesses race to it, visionaries build upon it, enthusiasts deconstruct it, establishments fear it.

Some take it for granted. Some even think the current Internet developments are a bubble (as they say in the Dismal Science). I don’t. I think the advances the Internet has brought are here to stay, and improve. I believe the Internet is a development that easily surpasses the invention of the printing press in importance, and that it collectively has the potential to become greater than the sum total of all human development that preceded it. And I believe I’m unbelievably lucky to have been born in the earliest generation that gets to define it, allowing me and everyone I know to personally witness its developments, setbacks, controversies and breakthroughs.

So what next? A prominent article about the movers and shakers of the Internet recently implored them to “Never. Stop. Innovating. Never. Never. Never.” Innovation will always continue, undoubtedly to places we can’t possibly fathom now. And I’m unbelievably excited to see it happen.

So that’s what I’ll write about here: how technology, especially the Internet, is changing the world. The future is bearing down on us, and I for one am excited.
