This post was originally published in the Talis “Nodalities” blog.
Whilst attending the recent NewsInnovation event I gave a lightning talk about Linked Data. The talk was preceded by an introduction to the Guardian Open Platform, which reviewed their content and data publishing system and some of their plans for future development. This set the scene really well, as I argued that Linked Data was a natural extension of what the Guardian are doing, and in my half of the session I gave a quick overview of Linked Data and its relevance for driving innovation around news reporting. The session was really successful: we had a 25-minute slot and ended up having an interesting discussion about Linked Data, trust, provenance and related issues that ran on for a whole hour. I’m really pleased with how well it went, especially as I only put the slides together on the way to the event!
My short deck of slides is now up on Slideshare, and in the rest of this blog post I’ll briefly summarise the talk.
I opened by speaking about the fundamental idea behind Linked Data: that data be put online in a very fine-grained way. This takes us beyond having stable links for datasets or just articles, and yields web identifiers for the Who, Why, What, Where and When of the content: every person, place, category and event can be identified, annotated and ultimately linked together into a navigable whole. RDF, as the core technology for Linked Data, is very simple to get to grips with: the notion of resources and their connections is something that anyone can intuitively grasp in a few minutes.
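To make the idea of resources and their connections a little more concrete, here is a minimal sketch in plain Python. It is not a real RDF toolkit: it simply models statements as (subject, predicate, object) triples. The `http://example.org/` identifiers, the `mentions`, `about` and `basedIn` properties, and the `match` helper are all made up for illustration.

```python
# All identifiers below are hypothetical, chosen only for illustration.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple: two resources
# (or a resource and a value) joined by a named connection.
triples = {
    (EX + "article/1", EX + "mentions", EX + "person/alice"),
    (EX + "article/1", EX + "about", EX + "place/london"),
    (EX + "article/2", EX + "mentions", EX + "person/alice"),
    (EX + "person/alice", EX + "basedIn", EX + "place/london"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Everything that mentions a specific individual" becomes a pattern match.
articles_mentioning_alice = [
    s for s, _, _ in match(triples, p=EX + "mentions", o=EX + "person/alice")
]
print(articles_mentioning_alice)
```

Because the person, the place and the articles each have their own identifier, the same small set of statements can be navigated from any starting point.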
Readers of this blog will already be aware of the success of the Linked Data movement, and a large and growing amount of data is available for people to use and re-use in their applications. Quality varies considerably across the Linked Data web, but ultimately this is the nature of any web-based system. With the growing engagement from organizations like the BBC, Library of Congress, and the New York Times, the availability of good quality data is only going to increase.
So in what way is Linked Data useful for driving innovation and change in the way that news is created, reported and accessed?
Well, there are some obvious answers around providing new ways to search and discover relevant content, e.g. everything about a specific individual or place. But there are two specific areas where I think Linked Data is important for driving innovation around news. The first is context, the second provenance.
Using Linked Data we can take a mesh of inter-related facts and figures and wrap it in a narrative that can help others understand that information and its relationships. Trends can be observed and reported on; data can be summarized along with a particular perspective. What’s important about Linked Data is that this contextualisation can happen without losing the association between the narrative and the underlying resources: the Who, What, Why, Where and When. Because those links are preserved, the reader has the ability to drill down into the underlying data in order to inspect it for themselves. The reader can also find other narratives that draw on the same set of data, discovering extra context and alternate viewpoints much more easily. This creates a rich fabric for navigating between stories and their referents.
The other aspect is provenance, or more simply: the ability to back-track to the source of some content. If the news were presented as Linked Data then we would be able to explore not just relationships between the content, but also journalists and their affiliations. As readers we’ll be able to gain context not just on the stories, but also on the people that are producing them. Through the ability to drill down into the underlying data, we are presented with the opportunity to confirm conclusions; we can fact-check stories for ourselves. The ability to identify and ignore questionable sources, or to identify stories that are drawn from inaccurate data or analyses, is something that has previously been very hard to do.
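As a rough illustration of what back-tracking to a source might look like, here is a small Python sketch over the same kind of triples. Everything in it is an assumption made for the example: the `derivedFrom`, `writtenBy` and `affiliatedWith` properties, the story, analysis and dataset identifiers, and the `trace_sources` helper are hypothetical, not an existing vocabulary or API.

```python
# Hypothetical identifiers and properties, for illustration only.
EX = "http://example.org/"
DERIVED_FROM = EX + "derivedFrom"
WRITTEN_BY = EX + "writtenBy"
AFFILIATED_WITH = EX + "affiliatedWith"

triples = [
    (EX + "story/crime-rise", DERIVED_FROM, EX + "analysis/crime-trend"),
    (EX + "analysis/crime-trend", DERIVED_FROM, EX + "dataset/police-stats"),
    (EX + "story/crime-rise", WRITTEN_BY, EX + "journalist/bob"),
    (EX + "journalist/bob", AFFILIATED_WITH, EX + "org/daily-example"),
]


def objects(triples, subject, predicate):
    """Return every object linked from subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def trace_sources(triples, resource):
    """Follow derivedFrom links transitively and list each upstream resource once."""
    seen = []
    stack = [resource]
    while stack:
        current = stack.pop()
        for source in objects(triples, current, DERIVED_FROM):
            if source not in seen:  # also guards against cycles
                seen.append(source)
                stack.append(source)
    return seen


story = EX + "story/crime-rise"
sources = trace_sources(triples, story)
affiliations = [
    org
    for author in objects(triples, story, WRITTEN_BY)
    for org in objects(triples, author, AFFILIATED_WITH)
]
print(sources)
print(affiliations)
```

Starting from a story, a reader (or a tool acting for them) can walk back to the analysis and the dataset behind it, and sideways to the journalist and their affiliation, provided those links were published in the first place.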
Issues like context, provenance, and trust are all areas that the Linked Data and semantic web community are actively exploring, and have been for some time. I don’t see any other approaches that are really addressing that space. There is clearly lots of interesting work happening around helping people tell stories with data, and understand the context of news stories (e.g. journalisted), but these are largely disconnected efforts: Linked Data should provide a framework for connecting all that together. IMO, this is an area where Linked Data can add real value in a number of different ways.