Scoping the Bristol City Council data platform

Today I attended the Supplier Engagement Session held by Bristol City Council (#databristol). The event consisted of a series of presentations discussing Bristol’s recent activities around publishing open data and their plans to procure a new open and shared data platform. The event was attended by a mixture of suppliers and also members of the local data community.

The session was intended to set out Bristol’s vision for open and shared data in the city, its commitment to building on its current foundations, and a desire to work with the local community to deliver impact for the city. I was really impressed with what I heard, particularly how rights to reuse and publish data are becoming embedded into council procurements, and the council’s plans to make the platform accessible for the collection and publication of data from local citizens and partner organisations.

The session was also an opportunity for the local community to provide input into the requirements for the platform. This included a short presentation on the findings of a small survey of local data users on their ideas for how the platform might be used.

The council also invited feedback and suggestions for how they should frame the procurement to best allow suppliers of all sizes to engage. Given the ambition and potential scope of the platform it seems likely to me that a consortium approach will be the most successful.

This post provides a slightly expanded version of the feedback I gave at the session.

The requirements

Bristol Council have a very broad set of requirements. They want to:

  • have a platform for publication and use of open data
  • have a platform to allow for the collection and use of shared data, both within the council and with partners
  • enable the publication of citizen-collected open data to support community-led projects
  • use the platform to publish its performance metrics and information
  • explore how such a platform can enable analytics and integration of both open and shared data, both within the council and by the community

This is just my brief summary, so apologies to the organisers if I’m missing essential elements. Don’t take this as a definitive list! As you can see this is an ambitious plan and one that will need some careful specification and design.

There is a risk that this could turn into a very large IT project producing a complex, monolithic platform. I believe that the council also recognise this as a potential risk, hence the request for feedback on how best to approach the procurement. The presentations and Q&A made it clear that the council were setting out a vision to work towards and not a feature list that had to be delivered on day one.

A suggested scoping approach

My personal view is that the procurement ought to be broken down into smaller elements with some well-defined boundaries. This will give more opportunity for SMEs to collaborate to bid for the whole project, allowing each to play to their strengths.

One useful breakdown would be to think of the platform as different components:

  • Publishing – the ability to publish and store datasets with supporting metadata and documentation. The datasets might be tabular, geographic, or a real-time stream. The publishing component could include workflow and curation elements, e.g. tools for checking quality and validity of data. Or those could be ruled as out of scope, restricting the publishing aspect to administration tools and interfaces.
  • Discovery – a catalog or similar tool that supports re-users in finding and accessing data. This is a well-understood piece: a data portal that provides a directory and search interface over the dataset metadata and, optionally, their contents.
  • Reuse – tools and interfaces to support the consumption of the data, e.g. as downloads, APIs, graphs, etc. This is the most complex element as there are many, many ways in which data might be reused.

These would be underpinned by some form of access control that could be used to restrict access to the shared data and appropriate administrative functions.
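To make those boundaries a little more concrete, here is a minimal sketch in Python of how the three components and the access control layer might fit together. Everything in it is an assumption made for illustration: the names `Dataset`, `Platform`, `publish`, `search`, `download` and `can_read` are hypothetical and are not taken from the council's requirements or from any existing product. A real platform would split these responsibilities across separate services rather than one class.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset plus the metadata a catalogue would index (hypothetical model)."""
    identifier: str
    title: str
    description: str
    shared: bool = False  # True = shared (restricted) data, False = open data
    rows: list = field(default_factory=list)


class Platform:
    """Toy in-memory platform showing the component boundaries."""

    def __init__(self):
        self._datasets = {}
        self._grants = {}  # dataset identifier -> set of users allowed to read it

    # Publishing: store a dataset with its supporting metadata.
    def publish(self, dataset, allowed_users=()):
        self._datasets[dataset.identifier] = dataset
        self._grants[dataset.identifier] = set(allowed_users)

    # Access control: open data is readable by anyone, shared data only by grantees.
    def can_read(self, identifier, user=None):
        dataset = self._datasets[identifier]
        return (not dataset.shared) or (user in self._grants[identifier])

    # Discovery: search the metadata, hiding datasets the user cannot read.
    def search(self, term, user=None):
        term = term.lower()
        return [
            d.identifier
            for d in self._datasets.values()
            if self.can_read(d.identifier, user)
            and (term in d.title.lower() or term in d.description.lower())
        ]

    # Reuse: the simplest possible form of consumption, a full download.
    def download(self, identifier, user=None):
        if not self.can_read(identifier, user):
            raise PermissionError(f"{user!r} may not read {identifier!r}")
        return list(self._datasets[identifier].rows)
```

In this sketch an anonymous user can find and download open datasets, while a shared dataset is only visible to the users named when it was published. The point is simply that publishing, discovery and reuse each sit behind a small, separately specifiable interface.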

My feeling is that the first two elements are relatively easily defined and scoped. There are existing products and tools that provide a range of functionality here, even if there’s still scope for innovation.

The reuse aspects are more complex as the requirements vary considerably depending on:

  • who is consuming the data: developers, citizens, analysts, etc. The council will need to be clear about who their primary users are, so that features can be prioritised accordingly.
  • how they wish to consume the data, e.g. analytics tools, workflows, visualisation tools, etc.
  • and how much of the reuse happens in the platform and how much is handled by separate tools

As I’ve written before, depending on your perspective, existing data portals might be under-serving your needs.

The council made it clear that they understood that the future will continue to bring ever greater volumes and richness of data and that their platform needed to be designed for that future. I would suggest that it’s equally important to recognise that the ways in which people reuse data are also rapidly evolving.

We still have a long way to go to improve access to and use of data. We’re still at the start of really exploring what using data on the web means. And there is an increasing variety of ways in which data is being used and processed.

With this in mind, when it comes to specifying the reuse aspects of the platform, I’d recommend focusing on the minimum viable product: some basic visualisation and storytelling tools and a rich set of APIs to enable a variety of other tools and services to interact with the data.

This will allow the council to continue to experiment with its data infrastructure and benefit from ongoing innovations around Big Data, data on the web, etc.
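As a small illustration of what I mean by APIs that let other tools interact with the data, the sketch below serves the same rows as either JSON (for developers) or CSV (for spreadsheet and analytics users). It uses only the Python standard library. The function name `render` and the sample air quality rows are hypothetical, invented for this example rather than drawn from any Bristol dataset or platform.

```python
import csv
import io
import json


def render(rows, fmt="json"):
    """Serialise a list of flat dictionaries as JSON or CSV text."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        if not rows:
            return ""
        buffer = io.StringIO()
        # Column order follows the keys of the first row.
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


# Made-up sample data, for illustration only.
readings = [{"site": "A", "no2": 40}, {"site": "B", "no2": 31}]
print(render(readings, "csv"))
```

Keeping the platform's own contribution this thin, and leaving richer analysis and visualisation to whatever tools consume the API, is one way to keep the reuse component small enough to specify.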