It Ain’t Just RSS, or even HTML

There’s been a lot of discussion about Mark Pilgrim’s latest article on XML.com. See, to pick a few examples, Dorothea, Mark’s follow-up comments, and those of Dare Obasanjo.
I thought I’d comment briefly on Mark’s suggestion that “the ability to parse ill-formed feeds becomes a competitive advantage”. In short, he’s right. And it doesn’t only happen with RSS, or even just XML: I’ve seen it happen with SGML too.


Imagine you’re a company that accepts metadata from multiple sources and processes that metadata using off-the-shelf SGML and XML tools. Of course you strictly validate the incoming material and push back on the producers when things go wrong. This causes much back-and-forth because the production of this data is sometimes outsourced too, meaning there are some disconnects involved: the publisher doesn’t create the markup, and doesn’t have the technical facilities to check what the typesetters produce.
This scenario isn’t all that far from someone using a blogging tool which generates their RSS feeds for them: they have no idea how the hard work is done, and often don’t care.
Now, imagine that you’re muddling along, but another consumer of that material decides not to use markup tools and instead hack-parses everything with Perl and regexps. It gets kind of hard to start pushing back on the publisher when you get responses like “well, DataConsumer B managed to process this OK, what’s wrong with your system?”. In other words, your inability to consume that data can be seen as a failing. And if DataConsumer B is a competitor then you can probably see how things can degrade.
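To make the contrast concrete, here is a minimal sketch in Python (not the Perl of the story) of the two kinds of consumer. The sample feed, the unescaped ampersand and the function names are all made up for illustration; real feeds go wrong in many other ways. The strict consumer uses a conforming XML parser and rejects the document outright, while the hack parser happily pulls a title out of it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical ill-formed feed: the ampersand in the title is not escaped.
FEED = "<rss><channel><item><title>Cats & Dogs</title></item></channel></rss>"


def strict_titles(text):
    """Use a conforming XML parser; raises ET.ParseError on ill-formed input."""
    root = ET.fromstring(text)
    return [title.text for title in root.iter("title")]


def hacked_titles(text):
    """Pull titles out with a regular expression, ignoring well-formedness."""
    return re.findall(r"<title>(.*?)</title>", text, re.DOTALL)


try:
    strict_result = strict_titles(FEED)
except ET.ParseError:
    strict_result = None  # the strict consumer rejects the whole feed

hacked_result = hacked_titles(FEED)  # the hack parser carries on regardless

print("strict consumer:", strict_result)
print("hack parser:", hacked_result)
```

From the publisher’s point of view, the second consumer “works” and the first one doesn’t, which is exactly the conversation described above.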
So, possibly, there’s a dynamic involved here which isn’t limited to “DataConsumer A leans on Publisher B to get them to fix up their act”, but is more like “DataConsumer A co-operates with DataConsumer B to fix up their act, to the benefit of both”. The consumers need to work together and evangelise to each other, not at the publishers, who are arguably following the path of least resistance (which unfortunately leads ever downwards).
Why is it to the benefit of both? Because then both DataConsumers A and B can benefit from using off-the-shelf commodity tools to do the heavy lifting of processing the data. They’re effectively outsourcing that grunt work, which is arguably not their core competency anyway: it’s more likely they’re competing on doing something interesting with that data. And with a higher-quality data feed underneath they can start to build more interesting features on that data. In RSS terms that means doing cool things with UIs (which can definitely be improved) and all of the RSS modules that are appearing.
The above is something I’ve actually dealt with in non-RSS circles. That particular problem was eventually (largely) solved when DataConsumer A bought DataConsumer B. But that’s not an option for most RSS consumers!