Cooking up a new approach to supporting purposeful use of data

In my last post I explored how we might better support the use of datasets. To do that I applied the BASEDEF framework to outline the ways in which communities might collaborate to help unlock more value from individual datasets.

But what if we changed our focus from supporting discovery and use of datasets and instead focused on helping people explore specific types of problems or questions?

Our paradigm around data discovery is based on helping people find individual datasets. But unless a dataset has been designed to answer the specific question you have in mind, it’s unlikely to be sufficient. Any non-trivial analysis is likely to need multiple datasets.

We know that data is more useful when it is combined, so why isn’t our approach to discovery based around identifying useful collections of datasets?

A cooking metaphor

To explore this further let’s use a cooking metaphor. I love cooking.

Many cuisines are based on a standard set of elements. Common spices or ingredients that become the base of most dishes. Like a mirepoix, a sofrito, the holy trinity of Cajun cooking, or the mother sauces in French cuisine.

As you learn to cook you come to appreciate how these flavour bases and sauces can be used to create a range of dishes. Add some extra spices and ingredients and you’ve created a complete dish.

Recipes help us consistently recreate these sauces.

A recipe consists of several elements. It will have a set of ingredients and a series of steps to combine them. A good recipe will also include some context. For example some background on the origins of the recipe and descriptions of unusual spices or ingredients. It might provide some things to watch out for during the cooking (“don’t burn the spices”) or suggest substitutions for difficult to source ingredients.

Our current approach to dataset discovery involves trying to document the provenance of an individual ingredient (a dataset) really well. We aren’t helping people combine them together to achieve results.

Efforts to improve dataset metadata, documentation and provenance reporting are important. Projects like the dataset nutrition label are great examples of that. We all want to be ethical, sustainable cooks. To do that we need to make informed choices about our ingredients.

But, to whisk these food metaphors together, nutrition labels are there to help you understand what’s gone into your supermarket pasta sauce. It’s not giving you a recipe to cook it from scratch for yourself. Or an idea of how to use the sauce to make a tasty dish.

Recipes for data-informed problem solving

I think we should think about sharing dataset recipes: instructions for how to mix up a selection of dataset ingredients. What would they consist of?

Firstly, the recipe would need to be based around a specific type of question, problem or challenge. Examples might include:

  • How can I understand air quality in my city?
  • How is deprivation changing in my local area?
  • What are the impacts of COVID-19 in my local authority?

Secondly, a recipe would include a list of datasets that have to be sourced, prepared and combined together to explore the specific problem. For example, if you’re exploring impacts of COVID-19 in your local authority you’re probably going to need:

  • demographic data from the most recent census
  • spatial boundaries to help visualise and present results
  • information about deprivation to help identify vulnerable people

Those three datasets are probably the holy trinity of any local spatial analysis?

Finally, you’re going to need some instructions for how to combine the datasets together. The instructions might identify some tools you need (Excel or QGIS), reference some techniques (Reprojection) and maybe some hints about how to substitute for key ingredients if you can’t get them in your local area (FOI).

The recipe might suggest ways to vary it for different purposes: add a sprinkle of Companies House data to understand your local business community, and a dash of OpenStreetMap to identify greenspaces?

As a time saver maybe you can find some pre-made versions of some of the steps in the recipe?
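
To make the idea a little more concrete, a recipe could be written down as a small structured record. The sketch below is only one possible shape, in Python. The class names, fields and example values are my own assumptions for illustration, not an existing standard or tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ingredient:
    """A dataset needed by the recipe (hypothetical structure)."""
    name: str
    role: str  # why the recipe needs this dataset
    substitutes: List[str] = field(default_factory=list)


@dataclass
class Recipe:
    """A question, the datasets needed to explore it, and how to combine them."""
    question: str
    ingredients: List[Ingredient]
    tools: List[str]
    steps: List[str]

    def shopping_list(self) -> List[str]:
        """Return the names of the datasets that need to be sourced."""
        return [ingredient.name for ingredient in self.ingredients]


# Illustrative values only, based on the COVID-19 example above.
covid_recipe = Recipe(
    question="What are the impacts of COVID-19 in my local authority?",
    ingredients=[
        Ingredient("census demographics", "describe the local population"),
        Ingredient("spatial boundaries", "visualise and present results"),
        Ingredient("deprivation data", "identify vulnerable people",
                   substitutes=["FOI request"]),
    ],
    tools=["Excel", "QGIS"],
    steps=[
        "source and prepare each dataset",
        "reproject the boundaries if needed",
        "join demographic and deprivation data to the boundaries",
    ],
)

print(covid_recipe.shopping_list())
```

Keeping the question, ingredients, tools and steps as separate fields means a recipe can be searched by the problem it addresses, not just by the datasets it uses.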

Examples in the wild

OK, it’s easy to come up with a metaphor and an idea. But would this actually meet a need? There are a few reasons why I’m reasonably confident that dataset recipes could be helpful. Mostly because I can see this same approach re-appearing in some related contexts. For example:

If you have examples then let me know in the comments or on Twitter.

How can dataset recipes help?

I think there’s a whole range of ways in which these types of recipe can be useful.

Data analysis always starts by posing a question. Documenting how datasets can be applied to specific questions will make them easier to find on search engines. It just fits better with what people want to do.

Data discovery is important during periods where there is a sudden influx of new potential users. For example, where datasets have just been published under an open licence and are now available to more people, for a wider range of purposes.

In my experience data analysts and scientists who understand a domain, e.g. population or transport modelling, have built up a tacit understanding of what datasets are most useful in different contexts. They understand the limitations and the process of combining datasets together. This thread from Chris Gale with a recipe about doing spatial analysis using PHE’s COVID-19 data is a perfect example. Documenting and sharing this knowledge can help others to do similar analyses. It’s like a cooking masterclass.

Discovery is also difficult when there is a sudden influx of new data available. Such as during this pandemic. Writing recipes is a good way to share learning across a community.

Documenting useful recipes might help us scale innovation across local areas.

Lastly, we’re still trying to understand which datasets are the most important parts of our local, national and international data infrastructure. We’re currently lacking any real quantitative information about how datasets are combined together. In the same way that recipes can be analysed to create ingredient networks, dataset recipes could be analysed to find out how datasets are being used together. We can then strengthen that infrastructure.
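
As a rough sketch of what that analysis might look like, the Python below counts how often pairs of datasets appear in the same recipe. The recipes and dataset names are made up for illustration. A real analysis would need a published collection of recipes to work from.

```python
from collections import Counter
from itertools import combinations

# Hypothetical recipes: each maps a question to the datasets it combines.
recipes = {
    "covid impacts": ["census", "boundaries", "deprivation"],
    "air quality": ["boundaries", "sensor readings", "census"],
    "local business": ["boundaries", "companies house"],
}


def co_occurrence(recipes):
    """Count how often each pair of datasets appears in the same recipe."""
    pairs = Counter()
    for datasets in recipes.values():
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for pair in combinations(sorted(set(datasets)), 2):
            pairs[pair] += 1
    return pairs


pairs = co_occurrence(recipes)

# How many recipes rely on each individual dataset.
usage = Counter(d for datasets in recipes.values() for d in set(datasets))

print(pairs.most_common(3))
print(usage.most_common(1))
```

Datasets that turn up in many recipes, or in many pairs, are candidates for being treated as shared infrastructure.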

If you’ve built something that helps people publish dataset recipes then send me a link to your app. I’d like to try it.

How can you help support the use of a dataset?

Getting the most value from data, whilst minimising its harmful impacts, is a community activity. Datasets need to be governed and published well. Most of that responsibility falls on the data publisher. Because the choices they make shape data ecosystems.

But other people have a role to play too. Being a good data user means engaging with that process.

Helping others to find data, and find the value in it, feels particularly important at the moment. During the pandemic there are many new datasets becoming available. And there are lots of questions to be answered. Some of them can be answered through better use of data.

So, how can communities work together to support use of data?

There are a lot of different ways to explore that question. But there’s a framework called BASEDEF, created by the open source community, which I find helpful.

BASEDEF stands for Blog, Apply, Suggest, Extend, Document, Evangelize and Fix. It describes the different types of contributions that can support an open source project. It can also be applied to help organise a small team in doing that work. Here’s a handy cheat sheet.

But the framework can also be applied to the task of supporting the use of an openly licensed dataset. Let’s run through the framework with that in mind.


Blog

You can write about a dataset to help others to discover it. You can help explain the potential value of applying the dataset to specific problems. Or perhaps you can see some downsides that others should consider.

Writing about how a dataset has been useful to you, by describing how you’ve successfully applied it in a project, will also help others see its potential value.

Apply

You can show how a dataset can be used, by creating something with it. You might do a detailed analysis of the data, but some simpler contributions can also be helpful.

For example you might create a simple visualisation. Or write and publish some code that illustrates how the dataset can be accessed and used. You could publish a quick demo showing how the dataset can be imported and used in some frequently used tools and platforms.

At the moment everyone is a bit tired of charts and graphs. And I agree with the first principle in the visualisation design principles for the pandemic. But a helpful visualisation can do a range of things. Visualisation can be exploratory rather than explanatory.

A visualisation could support other people in understanding the shape of a dataset, to inform their analysis and interpretation of it. It can help identify outliers, gaps, or highlight some of the richness in the data. I’d recommend making it clear when you’re doing this type of visualisation, rather than trying to derive specific insights.

Suggest

Read the documentation. Download and explore the dataset. Ask questions. Give feedback.

Make suggestions to the publisher about changes they could make to publish the data better. Rather than just offer academic critique, be clear about how suggested changes will support your needs or those of your community.

Extend

The freedoms granted by an open licence allow you to enrich and improve a dataset.

Sometimes the smallest changes can have the most impact. Convert the data into other common or standard formats. Extract data from spreadsheets into CSV files. Convert data published in more complex formats or via APIs into simpler tabular data to make it more accessible to analysts rather than programmers.

Or maybe you can enrich a dataset by adding identifiers that will allow it to be linked to other sources. Do the work of merging with other datasets to bring in more context.

The downside here is that if the original data changes your extended version will get out of date. If you can’t commit to keeping your version up to date, then be sure to share your code and document your methods.

Allow others to repeat the steps you’ve taken. And don’t forget to suggest the improvements to the publisher.
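
As an example of the kind of small conversion that can help, here is a minimal Python sketch that flattens a nested JSON response into a simple CSV table. The JSON structure, field names and values are invented for illustration. A real API will have its own shape, so treat this as a starting point rather than a general-purpose converter.

```python
import csv
import io
import json

# Hypothetical API response: nested JSON that is fine for programmers,
# but awkward to open in a spreadsheet.
api_response = json.loads("""
[
  {"id": "S1", "location": {"name": "High Street", "lat": 51.38, "lon": -2.36},
   "reading": {"no2": 41.5}},
  {"id": "S2", "location": {"name": "Park Road", "lat": 51.39, "lon": -2.35},
   "reading": {"no2": 18.2}}
]
""")


def flatten(record, prefix=""):
    """Turn nested dictionaries into a single level with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


rows = [flatten(record) for record in api_response]

# Write to an in-memory buffer; use open("stations.csv", "w", newline="")
# instead to create a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)

print(buffer.getvalue())
```

Publishing a short script like this alongside the converted file lets others repeat the conversion when the original data changes.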

Document

Write additional documentation to fill in gaps where the publisher has not provided sufficient background or explanation. Explain technical concepts or academic terms to a non-specialist audience.

As a user of the data, you’re able to write that documentation from a perspective that reflects the needs and questions of your specific community and the kinds of questions you need to ask. The original publisher might not have all that context or understand those needs, so this work can be really helpful.

Good documentation can be a finding aid. There are structured ways that you can go about writing documentation, such as this tool for writing civic data guides. (Check out some of the examples).

Evangelise

Email people that might have a need for the data. Tweet about it to a wider community. Highlight it in a presentation. Talk about it over coffee or, these days, over Zoom.

Fix

If the dataset is collaboratively maintained then go ahead and fix errors and omissions. If you’re not confident about making a fix, then submit an error report. In addition to fixing errors you might be able to help verify that data is correct.

If a dataset isn’t collaboratively maintained then, when you find errors, be sure to flag them to the publisher and highlight the issue for others. Or consider publishing an enriched version with fixes applied.


This framework isn’t perfect. The name is a bit clunky for a start. But there’s a couple of things that I like about it.

Firstly, it recognises that not all contributions need to be technical. There’s room for others to use different skills and in different ways.

Secondly, the elements overlap and reinforce one another. Writing documentation and blogging about how you’ve used a dataset helps to evangelise it. Enriching a dataset can help demonstrate in a practical way how a publisher can improve how data is published.

Finally, it serves to highlight some important aspects of community curation which aren’t always well supported in existing data platforms and portals. We can do better here.

If you’re interested in working on adapting this further then I’m happy to chat. It might be useful to have a cheat sheet that supports its application to data and more examples of how to do these different elements well.

How can publishing more data decrease the value of existing data?

Last month I wrote a post looking at how publishing new data might increase the value of existing data. I ended up listing seven different ways including things like improving validation, increasing coverage, supporting the ability to link together datasets, etc.

But that post only looked at half of the issue. What about the opposite? Are there ways in which publishing new data might reduce the value of data that’s already available?

The short answer is: yes there are. But before jumping into that, let’s take a moment to reflect on the language we’re using.

A note on language

The original post was prompted by an economic framing of the value of data. I was exploring how the option value for a dataset might be affected by increasing access to other data. While this post is primarily looking at how option value might be reduced, we need to acknowledge that “value” isn’t the only way to frame this type of question.

We might also ask, “how might increasing access to data increase potential for harms?” As part of a wider debate around the issues of increasing access to data, we need to use more than just economic language. There’s a wealth of good writing about the impacts of data on privacy and society which I’m not going to attempt to precis here.

It’s also important to highlight that “increasing value” and “decreasing value” are relative terms.

Increasing the value of existing datasets will not seem like a positive outcome if your goal is to attempt to capture as much value as possible, rather than benefit a broader ecosystem. Similarly, decreasing value of existing data, e.g. through obfuscation, might be seen as a positive outcome if it results in better privacy or increased personal safety.

Decreasing value of existing data

Having acknowledged that, let’s try and answer the earlier question. In what ways can publishing new data reduce the value we can derive from existing data?

Increased harms leading to retraction and reduced trust

Publishing new data always runs the risk of re-identification and the enabling of unintended inferences. While the impacts of these harms are likely to be most directly felt by both communities and individuals, there are also broader commercial and national security issues. Together, these issues might ultimately reduce the value of the existing data ecosystem in several ways:

  • Existing datasets may need to be retracted, have their scope changed, or have their circulation reduced in order to avoid further harm. Data privacy impact assessments will need to be updated as the contexts in which data is being shared and published change
  • Increased concerns over potential privacy impacts might lead organisations to choose not to increase access to similar or related datasets
  • Increased concerns might also lead communities and individuals to reduce the amount of data they are willing to share with previously trusted sources

This can lead to a reduction in the overall coverage, quality and linking of data across a data ecosystem. It’s likely to be one of the most significant impacts of poorly considered data releases. It can be mitigated through proper impact assessments, consultation and engagement.

Reducing overall quality

Newly published data might be intended to increase coverage, enrich, link, validate or otherwise improve existing data. But it might actually have the opposite effect because it’s of poor quality. I’ve briefly touched on this in a previous post on fictional data.

Publication of poor quality data might be unintended. For example an organisation may just be publishing the data it has to help address an issue, without properly considering or addressing underlying problems with it. Or a researcher may publish data that contains honest mistakes.

But publication of poor quality data might also be deliberate. For example as spam or misinformation intended to “poison the well”.

More subtly, practices like p-hacking and falsification of data, which might be intended to have a short-term direct benefit to the publisher or author, might cause longer-term problems by undermining the use of other datasets.

This is why understanding and documenting the provenance of data, monitoring of retractions, fixes and updates to data, and the ability to link analyses with datasets are all so important.

Creating unnecessary competition or increasing friction

Publishing new datasets containing new observations and data about an area or topic of interest can lead to positive impacts, e.g. by increasing confidence or coverage. But datasets are also competing with one another. The same types of data might be available from different sources, but under different licences, access arrangements, pricing, etc.

This competition isn’t necessarily positive. For example, the data ecosystem might not benefit as much from the network effects that follow from linking data because key datasets are not linked or cannot be used together. Incompatible and competing datasets can add friction across an ecosystem.

Building poor foundations

Data is often published as a means of building stronger data infrastructure for a sector, or to address a specific challenge. But if that data is poorly maintained or is not sustainably funded, then the energy that goes into building the communities, tools and other datasets around that infrastructure might be wasted.

That reduces the value of existing datasets which might otherwise have provided a better foundation to build upon. Or whose quality is dependent on the shared infrastructure. While this issue is similar to that of the previous one about competition, its root causes and impacts are slightly different.

 

As I noted in my earlier post, I don’t think this is an exhaustive list and it can be improved by contributions. Leave a comment if you have any thoughts.

How can publishing more data increase the value of existing data?

There’s lots to love about the “Value of Data” report. Like the fantastic infographic on page 9. I’ll wait while you go and check it out.

Great, isn’t it?

My favourite part about the paper is that it’s taught me a few terms that economists use, but which I hadn’t heard before. Like “Incomplete contracts” which is the uncertainty about how people will behave because of ambiguity in norms, regulations, licensing or other rules. Finally, a name to put to my repeated gripes about licensing!

But it’s the term “option value” that I’ve been mulling over for the last few days. Option value is a measure of our willingness to pay for something even though we’re not currently using it. Data has a large option value, because it’s hard to predict how its value might change in future.

Organisations continue to keep data because of its potential future uses. I’ve written before about data as stored potential.

The report notes that the value of a dataset can change because we might be able to apply new technologies to it. Or think of new questions to ask of it. Or, and this is the interesting part, because we acquire new data that might impact its value.

So, how does increasing access to one dataset affect the value of other datasets?

Moving data along the data spectrum means that increasingly more people will have access to it. That means it can be used by more people, potentially in very different ways than you might expect. Applying Joy’s Law then we might expect some interesting, innovative or just unanticipated uses. (See also: everyone loves a laser.)

But more people using the same data is just extracting additional value from that single dataset. It’s not directly impacting the value of other datasets.

To do that we need to use the new data in some specific ways. So far I’ve come up with seven ways that new data can change the value of existing data.

  1. Comparison. If we have two or more datasets then we can compare them. That will allow us to identify differences, look for similarities, or find correlations. New data can help us discover insights that aren’t otherwise apparent.
  2. Enrichment. New data can enrich an existing dataset by adding new information. It gives us context that we didn’t have access to before, unlocking further uses.
  3. Validation. New data can help us identify and correct errors in existing data.
  4. Linking. A new dataset might help us to merge some existing datasets, allowing us to analyse them in new ways. The new dataset acts like a missing piece in a jigsaw puzzle.
  5. Scaffolding. A new dataset can help us to organise other data. It might also help us collect new data.
  6. Improve Coverage. Adding more data, of the same type, into an existing pool can help us create a larger, aggregated dataset. We end up with a more complete dataset, which opens up more uses. The combined dataset might have better spatial or temporal coverage, be less biased or capture more of the world we want to analyse.
  7. Increase Confidence. If the new data measures something we’ve already recorded, then the repeated measurements can help us to be more confident about the quality of our existing data and analyses. For example, we might pool sensor readings about the weather from multiple weather stations in the same area. Or perform a meta-analysis of a scientific study.
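
To illustrate the linking case, here is a minimal Python sketch in which two existing datasets use different identifiers and a newly published lookup acts as the missing jigsaw piece. All of the identifiers and values are invented, and the `link` function is a hypothetical helper rather than part of any library.

```python
# Hypothetical data: two datasets that describe the same places but use
# different identifiers, plus a newly published lookup that connects them.
air_quality = {"BA1": 41.5, "BA2": 18.2}   # keyed by postcode district
deprivation = {"E01": 0.31, "E02": 0.12}   # keyed by an area code
lookup = {"BA1": "E01", "BA2": "E02"}      # the "missing piece"


def link(left, right, lookup):
    """Join two keyed datasets via a lookup from left keys to right keys."""
    linked = {}
    for left_key, left_value in left.items():
        right_key = lookup.get(left_key)
        # Skip records that the lookup cannot connect.
        if right_key in right:
            linked[left_key] = (left_value, right[right_key])
    return linked


linked = link(air_quality, deprivation, lookup)
print(linked)
```

Without the lookup neither existing dataset changes, but they cannot be analysed together. Publishing it increases the value of both.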

I don’t think this is exhaustive, but it was a useful thought experiment.

A while ago, I outlined ten dataset archetypes. It’s interesting to see how these align with the above uses:

  • A meta-analysis to increase confidence will draw on multiple studies
  • Combining sensor feeds can also help us increase confidence in our observations of the world
  • A register can help us with linking or scaffolding datasets. They can also be used to support validation.
  • Pooling together multiple descriptions or personal records can help us create a database that has improved coverage for a specific application
  • A social graph is often used as scaffolding for other datasets

What would you add to my list of ways in which new data improves the value of existing data? What did I miss?

Three types of agreement that shape your use of data

Whenever you’re accessing, using or sharing data you will be bound by a variety of laws and agreements. I’ve written previously about how data governance is a nested set of rules, processes, legislation and norms.

In this post I wanted to clarify the differences between three types of agreements that will govern your use of data. There are others. But from a data consumer point of view these are most common.

If you’re involved in any kind of data project, then you should have read all of the relevant agreements that relate to data you’re planning to use. So you should know what to look for.

Data Sharing Agreements

Data sharing agreements are usually contracts that will have been signed between the organisations sharing data. They describe how, when, where and for how long data will be shared.

They will include things like the purpose and legal basis for sharing data. They will describe the important security, privacy and other considerations that govern how data will be shared, managed and used. Data sharing agreements might be time-limited. Or they might describe an ongoing arrangement.

When the public and private sector are sharing data, then publishing a register of agreements is one way to increase transparency around how data is being shared.

The ICO Data Sharing Code of Practice has more detail on the kinds of information a data sharing agreement should contain. As does the UK’s Digital Economy Act 2017 code of practice for data sharing. In a recent project the ODI and CABI created a checklist for data sharing agreements.

Data sharing agreements are most useful when organisations, of any kind, are sharing sensitive data. A contract with detailed, binding rules helps everyone be clear on their obligations.

Licences

Licences are a different approach to defining the rules that apply to use of data. A licence describes the ways that data can be used without any of the organisations involved having to enter into a formal agreement.

A licence will describe how you can use some data. It may also place some restrictions on your use (e.g. “non-commercial”) and may spell out some obligations (“please say where you got the data”). So long as you use the data in the described ways, then you don’t need any kind of explicit permission from the publisher. You don’t even have to tell them you’re using it. Although it’s usually a good idea to do that.

Licences remove the need to negotiate and sign agreements. Permission is granted in advance, with a few caveats.

Standard licences make it easier to use data from multiple sources, because everyone is expecting you to follow the same rules. But only if the licences are widely adopted. Where licences don’t align, we end up with unnecessary friction.

Licences aren’t time-limited. They’re perpetual. At least as long as you follow your obligations.

Licences are best used for open and public data. Sometimes people use data sharing agreements when a licence might be a better option. That’s often because organisations know how to do contracts, but are less confident in giving permissions. Especially if they’re concerned about risks.

Sometimes, even if there’s an open licence to use data, a business would still prefer to have an agreement in place. That might be because the licence doesn’t give them the freedoms they want, or they’d like some additional assurances in place around their use of data.

Terms and Conditions

Terms and conditions, or “terms of use”, are a set of rules that describe how you can use a service. Terms and conditions are the things we all ignore when signing up to a website. But if you’re using a data portal, platform or API then you definitely need to have checked the small print. (You have, haven’t you?)

Like a Data Sharing Agreement, a set of terms and conditions is something that you formally agree to. It might be by checking a box rather than signing a document, but it’s still an agreement.

Terms of use will describe the service being offered and the ways in which you can use it. Like licences and data sharing agreements, they will also include some restrictions. For example whether you can build a commercial service with it. Or what you can do with the results.

A good set of terms and conditions will clearly and separately identify those rules that relate to your use of the service (e.g. how often you can use it) from those rules that relate to the data provided to you. Ideally the terms would just refer to a separate licence. The Met Office Data Point terms do this.

A poorly defined set of terms will focus on the service parts but not include enough detail about your rights to use and reuse data. That can happen if the emphasis has been on the terms of use of the service as a product, rather than around the sharing of data.

The terms and conditions for a data service and the rules that relate to the data are two of the important decisions that shape the data ecosystem that service will enable. It’s important to get them right.

Hopefully that’s a helpful primer. Remember, if you’re in any kind of role using data then you need to read the small print. If not, then you’re potentially exposing yourself and others to risks.

When can we expect more from data portability?

We’re at the end of week 5 of 2020, and of the new decade, and I’m on a diet.

I’m back to using MyFitnessPal again. I’ve used it on and off for the last 10 years whenever I’ve decided that now is the time to be more healthy. The sporadic but detailed history of data collection around my weight and eating habits marks out each of the times when this time was going to be the time when I really made a change.

My success has been mixed. But the latest diet is going pretty well, thanks for asking.

This morning the app chose the following feature to highlight as part of its irregular nudges for me to upgrade to premium.

Downloading data about your weight, nutrition and exercise history is a premium feature of the service. This gave me pause for thought for several reasons.

Under UK legislation, and for as long as we maintain data adequacy with the EU, I have a right to data portability. I can request access to any data about me, in a machine-readable format, from any service I happen to be using.

The company that produce MyFitnessPal, Under Armour, do offer me a way to exercise this right. It’s described in their privacy policy, as shown in the following images.

[Image: Note about how to exercise your GDPR rights in MyFitnessPal]
[Image: Data portability in MyFitnessPal]

Rather than enabling this access via an existing product feature, they’ve decided to make me and everyone else request the data directly. Every time I want to use it.

This might be a deliberate decision. They’re following the legislation to the letter. Perhaps it’s a conscious decision to push people towards a premium service, rather than make it easy by default. Their user base is international, so they don’t have to offer this feature to everyone.

Or maybe it’s the legal and product teams not looking at data portability as an opportunity. That’s something that the ODI has previously explored.

I’m hoping to see more exploration of the potential benefits and uses of data portability in 2020.

I think we need to re-frame the discussion away from compliance and on to commercial and consumer benefits. For example, by highlighting how access to data contributes to building ecosystems around services, to help retain and grow a customer base. That is more likely to get traction than a continued focus on compliance and product switching.

MyFitnessPal already connects into an ecosystem of other services. A stronger message around portability might help grow that further.  After all, there are more reasons to monitor what you eat than just weight loss.

Clearer legislation and stronger guidance from organisations like the ICO and industry regulators describing how data portability should be implemented would also help. Wider international adoption of data portability rights wouldn’t hurt either.

There’s also a role for community driven projects to build stronger norms and expectations around data portability. Projects like OpenSchufa demonstrate the positive benefits of coordinated action to build up an aggregated view of donated, personal data.

But I’d also settle for a return to the ethos of the early 2010s, when making data flow between services was the default. Small pieces, loosely joined.

If we want the big platforms to go on a diet, then they’re going to need to give up some of those bytes.

[Paper Review] The Coerciveness of the Primary Key: Infrastructure Problems in Human Services Work

This blog post is a quick review and notes relating to a research paper called: The Coerciveness of the Primary Key: Infrastructure Problems in Human Services Work (PDF available here)

It’s part of my new research notebook to help me collect and share notes on research papers and reports.

Brief summary

This paper explores the impact of data infrastructure, and in particular the use of identifiers and the design of databases, on the delivery of human (public) services. By reviewing the use of identifiers and data in services that support people who are homeless and those affected by AIDS, the authors highlight a number of tensions, showing how the design of data infrastructure and the need to share data with funders and other agencies have an inevitable impact on frontline services.

For example, the need to evidence impact to funders requires the collection of additional personal, legal identifiers. Even when that information is not critical to the delivery of support.

The paper also explores the interplay between the well defined, unforgiving world of database design, and the messy nature of delivering services to individuals. Along the way the authors touch on aspects of identity, identification, and explore different types of identifiers and data collection practices.

The authors draw out a number of infrastructure problems and provide some design provocations for alternate approaches. The three main problems are the immutability of identifiers in database schema, the “hegemony of NOT NULL” (or the need for identification), and the demand for uniqueness across contexts.
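
To make those three problems concrete for myself, here is a minimal sketch using Python's built-in sqlite3 module. The schema is my own illustration and is not taken from the paper: the clients and visits tables, the legal_id column and the sample values are all hypothetical. It shows how a conventional design that uses a legal identifier as the primary key rejects a person with no identifier, rejects a second record that shares an identifier, and resists changing an identifier once other records refer to it.

```python
import sqlite3


def try_sql(conn, sql, params=()):
    """Run one statement; return None on success, or the constraint error text."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: a legal identifier is the primary key for a client,
# and every visit must point at an existing client via that identifier.
conn.executescript("""
CREATE TABLE clients (
    legal_id TEXT NOT NULL PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE visits (
    visit_id   INTEGER PRIMARY KEY,
    legal_id   TEXT NOT NULL REFERENCES clients(legal_id),
    visit_date TEXT NOT NULL
);
""")

# A client who can supply an identifier is recorded without trouble.
try_sql(conn, "INSERT INTO clients VALUES (?, ?)", ("ID-001", "Alex"))
try_sql(conn, "INSERT INTO visits (legal_id, visit_date) VALUES (?, ?)",
        ("ID-001", "2020-01-15"))

# 1. The "hegemony of NOT NULL": no identifier means no record at all.
not_null_error = try_sql(
    conn, "INSERT INTO clients VALUES (?, ?)", (None, "Sam"))

# 2. Demand for uniqueness: a second record with the same identifier is refused.
unique_error = try_sql(
    conn, "INSERT INTO clients VALUES (?, ?)", ("ID-001", "Alexandra"))

# 3. Immutability: the identifier cannot change while visits refer to it.
immutable_error = try_sql(
    conn, "UPDATE clients SET legal_id = ? WHERE legal_id = ?",
    ("ID-002", "ID-001"))

for label, error in [("missing identifier", not_null_error),
                     ("shared identifier", unique_error),
                     ("changed identifier", immutable_error)]:
    print(f"{label}: {error}")
```

In each case the database is doing exactly what it was designed to do. The friction comes from the assumption, baked into the schema, that everyone has one stable identifier that they are able and willing to share.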

Three reasons to read

Here are three reasons why you might want to read this paper:

  1. If, like me, you’re often advocating for use of consistent, open identifiers, then this paper provides a useful perspective of how this approach might create issues or unwanted side effects outside of the simpler world of reference data
  2. If you’re designing digital public services then the design provocations around identifiers and approaches to identification are definitely worth reading. I think there’s some useful reflections about how we capture and manage personal information
  3. If you’re a public policy person and advocating for consistent use of identifiers across agencies, then there’s some important considerations around the policy, privacy and personal impacts of data collection in this paper

Three things I learned

Here are three things that I learned from reading the paper.

  1. In a section on “The Data Work of Human Services Provision”, the authors highlighted three aspects of frontline data collection which I found useful to think about:
    • data compliance work – collecting data purely to support the needs of funders, which might be at odds with the needs of both the people being supported and the service delivery staff
    • data coordination work – which stems from the need to link and aggregate data across agencies and funders to provide coordinated support
    • data confidence work – the need to build a trusted relationship with people, at the front-line, in order to capture valid, useful data
  2. Similarly, the authors tease out four reasons for capturing identifiers, each of which has different motivations, outcomes and approaches to identification:
    • counting clients – a basic need to monitor and evaluate service provision, identification here is only necessary to avoid duplicates when counting
    • developing longitudinal histories – e.g. identifying and tracking support given to a person over time can help service workers to develop understanding and improve support for individuals
    • as a means of accessing services – e.g. helping to identify eligibility for support
    • to coordinate service provision – e.g. sharing information about individuals with other agencies and services, which may also have different approaches to identification and use of identifiers
  3. The design provocations around database design were helpful to highlight some alternate approaches to capturing personal information and the needs of the service vs that of the individual
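
The point about counting clients can be illustrated with a tiny sketch. This is my own example rather than anything from the paper, and it assumes a service hands out its own local identifiers (the local_id field and the visit records here are hypothetical). If the only goal is to avoid double counting, any identifier that is consistent within the service will do, and no legal identifier is needed.

```python
# Hypothetical visit log for a single service. Each visit carries only a
# locally assigned identifier, not a legal name or government-issued number.
visits = [
    {"local_id": "c-17", "date": "2020-01-03"},
    {"local_id": "c-22", "date": "2020-01-03"},
    {"local_id": "c-17", "date": "2020-01-10"},
    {"local_id": "c-31", "date": "2020-01-12"},
    {"local_id": "c-22", "date": "2020-01-19"},
]


def count_clients(visit_log):
    """Count distinct clients by collapsing repeat visits on the local id."""
    return len({visit["local_id"] for visit in visit_log})


print(f"{len(visits)} visits from {count_clients(visits)} distinct clients")
```

The other three reasons in the list ask more of an identifier. Coordinating with other agencies, for instance, needs an identifier that means something outside a single service, and that is where the pressure to collect legal identifiers comes from.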

Thoughts and impressions

As someone who has not been directly involved in the design of digital systems to support human services, I found the perspectives and insight shared in this paper really useful. If you’ve been working in this space for some time, then it may be less insightful.

However I haven’t seen much discussion about good ways to design more humane digital services and, in particular, the databases behind them, so I suspect the paper could do with a wider airing. It’s useful reading alongside things like Falsehoods Programmers Believe About Names and Falsehoods Programmers Believe About Gender.

Why don’t we have a better approach to managing personal information in databases? Are there solutions out there already?

Finally, the paper makes some pointed comments about the role of funders in data ecosystems. Funders are routinely collecting and aggregating data as part of evaluation studies, but this data might also help support service delivery if it were more accessible. It’s interesting to consider the balance between minimising unnecessary collection of data simply to support evaluation versus the potential role of funders as intermediaries that can provide additional support to charities, agencies or other service delivery organisations that may lack the time, funding and capability to do more with that data.