This is the second part of a two-part post looking at the building blocks of data infrastructure. In part one we looked at definitions of data infrastructure, and the first set of building blocks: identifiers, standards and registers. You should read part one first and then jump back in here.
We’re using the example of weather data to help us think through the different elements of a data infrastructure. In our fictional system we have a network of weather stations installed around the world. The stations are feeding weather observations into a common database. We’ve looked at why it’s necessary to identify our stations, the role of standards, and the benefits of building registers to help us manage the system.
Technology
Technology is obviously part of data infrastructure. In part one we have already introduced several types of technology.
The sensors and components that are used to build the weather stations are also technologies.
The data standards that define how we organise and exchange data are technologies.
The protocols that help us transmit data, like WiFi or telecommunications networks, are technologies.
The APIs that are used to submit data to the global database of observations, or which help us retrieve observations from it, are also technologies.
Unfortunately, I often see the mistaken assumption that data infrastructure is only about the technologies we use to help us manage and exchange data.
To use an analogy, this is a bit like focusing on tarmac and kerb stones as the defining characteristics of our road infrastructure. These materials are both important and necessary, but are just parts of a larger system. If we focus only on technology it’s easy to overlook the other, more important building blocks of data infrastructure.
We should be really clear when we are talking about “data infrastructure”, which encompasses all the building blocks we are discussing here, and when we are talking about “infrastructure for data”, which focuses just on the technologies we use to collect and manage data.
Technologies evolve and become obsolete. Over time we might choose to use different technologies in our data infrastructure.
What’s important is choosing technologies that ensure our data infrastructure is as reliable, sustainable and open as possible.
Organisations
Our data infrastructure is taking shape. We now have a system that consists of weather stations installed around the world, reporting local weather observations into a central database. That dataset is the primary data asset that we will be publishing from our data infrastructure.
We’ve explored the various technologies, data standards and some of the other data assets (registers) that enable the collection and publishing of that data.
We’ve not yet considered the organisations that maintain and govern those assets.
The weather stations themselves will be manufactured and installed by many different organisations around the world. Other organisations might offer services to help maintain and calibrate stations after they are installed.
A National Meteorological Service might take on responsibility for maintaining the network of stations within its nation’s borders. The scope of its role will be defined by national legislation and policies. But a commercial organisation might also choose to take on responsibility for running a collection of stations.
In our data infrastructure, the central database of observations will be curated and managed by a single organisation: the (fictional) Global Weather Office. Our Global Weather Office will do more than just manage data assets. It also has a role to play in choosing and defining the data standards that support data collection. And it helps to certify which models of weather station conform to those standards.
Organisations are a key building block of data infrastructure. The organisational models that we choose to govern a data infrastructure, and to take responsibility for its sustainability, are an important part of its design.
The value of the weather observations comes from their use, e.g. as input into predictive models to create weather forecasts and other services. Many organisations will use the observation data provided by our data infrastructure to create a range of products and services, e.g. national weather forecasts, or targeted advice for farmers that is delivered via farm management systems. The data might also be used by researchers, or by environmental policy-makers to inform their work.
Mapping out the ecosystem of organisations that operate and benefit from our data infrastructure will help us to understand the roles and responsibilities of each organisation. It will also help clarify how and where value is being created.
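As a rough illustration of what such a map might look like, here is a minimal Python sketch. The organisation names (other than the fictional Global Weather Office and the National Meteorological Service) and the role labels are hypothetical, chosen only to mirror the examples above. It simply inverts a list of organisations and their roles so we can ask "who does what?".

```python
from collections import defaultdict

# Hypothetical ecosystem map: organisation -> roles it plays.
# Names and role labels are illustrative assumptions, not a real register.
ECOSYSTEM = {
    "Station manufacturer": {"manufacture stations", "install stations"},
    "Maintenance service": {"maintain stations"},
    "National Meteorological Service": {"run station network"},
    "Commercial operator": {"run station network"},
    "Global Weather Office": {
        "curate observations",
        "define standards",
        "certify station models",
    },
    "Forecast provider": {"use observations"},
    "Researcher": {"use observations"},
}


def organisations_by_role(ecosystem):
    """Invert the map so each role lists the organisations that perform it."""
    by_role = defaultdict(set)
    for organisation, roles in ecosystem.items():
        for role in roles:
            by_role[role].add(organisation)
    # Sort names so the output is stable and easy to read.
    return {role: sorted(orgs) for role, orgs in by_role.items()}


if __name__ == "__main__":
    for role, orgs in sorted(organisations_by_role(ECOSYSTEM).items()):
        print(f"{role}: {', '.join(orgs)}")
```

Even a simple table like this can show where a role depends on a single organisation, and where several organisations share a responsibility.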
Guidance and Policies
With so many different organisations operating, governing and benefiting from our data infrastructure we need to think about how they are supported in creating value from it.
To do this we will need to produce a range of guidance and policies, for example:
- Documentation for all of the data assets that helps to put them in context, allowing them to be successfully used to create products and services. This might include notes on how we have collected our data, the standards used, and locations of our stations.
- Recommendations for how data should be processed and interpreted to ensure that weather forecasts that use the data are reliable and safe
- Licences that define how the data assets can be used
- Documentation that describes the data governance processes that are applied to the data assets
- Policies that define how organisations gain access to the data infrastructure, e.g. to start supplying data from new stations
- Policies that decide how, when and where new stations might be added to the global network, to ensure that global coverage is maintained
- Procurement policies that define how stations and the services that relate to them are purchased
- National regulations that apply to manufacture of weather stations, or that set safety standards that apply when they are installed or serviced
Guidance and policies are an important building block that help to shape the ecosystem that supports and benefits from our data infrastructure.
A strong data infrastructure will have policies and governance that will support equitable access to the system. Making infrastructure as open as possible will help to ensure that as many organisations as possible have the opportunity to use the assets it provides, and have equal opportunities to contribute to its operation.
Communities
Why do we collect weather data? We do it to help create weather forecasts, to monitor climate change, and for a whole host of other reasons. We want the data to be used to make decisions.
Many different people and organisations might benefit from the weather data we are providing. A commuter might just want to know whether to take an umbrella to work. A farmer might want help in choosing which crops to plant. Or an engineer planning a difficult construction task may need to know the expected weather conditions.
Outside of the organisations who are directly interacting with our data infrastructure there will be a number of communities, made up of both individuals and organisations, who will benefit from the products and services made with the data assets it provides. Communities are the final building block of our data infrastructure.
These communities will be relying on our data infrastructure to plan their daily lives and activities, and to make business decisions. But they may not realise it. Good infrastructure is boring and reliable.
In his book on the social value of infrastructure, Brett Frischmann refers to infrastructure as “shared means to many ends”. Governing and maintaining infrastructure requires us to recognise this variety of interests and make choices that balance a variety of needs.
The choices we make about who has access to our data infrastructure, and how it will be made sustainable, will be important in ensuring that value can be created from it over the long-term.
Reviewing our building blocks
To summarise, our building blocks of data infrastructure are:
- Technology, of various kinds
- Organisations, who create, maintain, govern and use our infrastructure
- Guidance and Policies that inform its use
- Communities who are affected by it
The building blocks have different sizes. Identifiers are a well-understood technical concept. Organisations, policies and communities are more complex, and perhaps less well-defined.
Understanding their relationships, and how they benefit from being more open, requires us to engage in some systems thinking. By identifying each building block I hope we can start to have deeper conversations about the systems we are building.
Over time we might be able to tease out more specific building blocks. We might be able to identify important organisational roles that occur as repeated patterns across different types of infrastructure. Or specific organisational models that have been found to be successful in creating trusted, sustainable infrastructures. Over time we might also identify key types of policy and guidance that are important elements of ensuring that a data infrastructure is successful. These are research questions that can help us refine our understanding of data as infrastructure.
There are other aspects of data infrastructure which we have not explicitly explored here, for example ethics and trust. This is because ethics is not a building block. It’s a way of working that will enable fairer, more equitable access to data infrastructure by a variety of communities. Ethics should inform every decision we make and every activity we undertake to design, build and maintain our data infrastructure.
Trust is also not a building block. Trust emerges from how we operate and maintain our data infrastructures. Trust is earned, rather than designed into a system.
Help me make this better
I’ve written these posts to help me work through some of my thoughts around data infrastructure. There’s a lot more to be said about the different building blocks. Choosing other examples, e.g. data infrastructure that involves sharing personal data like medical records, might better highlight some different characteristics.
Let me know what you think of this breakdown. Is it useful? Do you think I’ve missed some building blocks? Leave a comment or tweet me your thoughts.
Thanks to Peter Wells and Jeni Tennison for feedback and suggestions that have helped me write these posts.