[Paper Review] The Coerciveness of the Primary Key: Infrastructure Problems in Human Services Work

This blog post is a quick review and notes relating to a research paper called: The Coerciveness of the Primary Key: Infrastructure Problems in Human Services Work (PDF available here)

It’s part of my new research notebook to help me collect and share notes on research papers and reports.

Brief summary

This paper explores the impact of data infrastructure, and in particular the use of identifiers and the design of databases, on the delivery of human (public) services. By reviewing the use of identifiers and data in services that support people experiencing homelessness and those affected by AIDS, the authors highlight how the design of data infrastructure, and the need to share data with funders and other agencies, create tensions that inevitably affect frontline services.

For example, the need to evidence impact to funders requires the collection of additional personal, legal identifiers, even when that information is not critical to the delivery of support.

The paper also explores the interplay between the well defined, unforgiving world of database design, and the messy nature of delivering services to individuals. Along the way the authors touch on aspects of identity, identification, and explore different types of identifiers and data collection practices.

The authors draw out a number of infrastructure problems and provide some design provocations for alternate approaches. The three main problems are the immutability of identifiers in database schema, the “hegemony of NOT NULL” (or the need for identification), and the demand for uniqueness across contexts.
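To make the second and third of those problems concrete, here is a minimal sketch that is not taken from the paper. It assumes a hypothetical `client` table, keyed on a hypothetical `legal_id` column, in an in-memory SQLite database. It only illustrates how `NOT NULL` and uniqueness constraints refuse records that frontline staff might still need to keep.

```python
import sqlite3

# In-memory database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE client (
        legal_id TEXT NOT NULL PRIMARY KEY,  -- hypothetical legal identifier
        name     TEXT
    )
    """
)


def try_insert(legal_id, name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO client (legal_id, name) VALUES (?, ?)",
                (legal_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# A person with an identifier is recorded without trouble.
accepted_with_id = try_insert("ID-001", "A. Client")
# A person with no identifier cannot be recorded at all (NOT NULL).
accepted_without_id = try_insert(None, "Person without documents")
# A second record that reuses an identifier is refused (uniqueness).
accepted_duplicate = try_insert("ID-001", "Another person, same ID")

row_count = conn.execute("SELECT COUNT(*) FROM client").fetchone()[0]
print(accepted_with_id, accepted_without_id, accepted_duplicate, row_count)
```

The schema has no way to express "we do not know this yet" or "this person prefers not to say", so those decisions end up being pushed onto the people doing the data entry.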

Three reasons to read

Here’s three reasons why you might want to read this paper:

  1. If, like me, you’re often advocating for use of consistent, open identifiers, then this paper provides a useful perspective on how this approach might create issues or unwanted side effects outside of the simpler world of reference data
  2. If you’re designing digital public services then the design provocations around identifiers and approaches to identification are definitely worth reading. I think there’s some useful reflections about how we capture and manage personal information
  3. If you’re a public policy person and advocating for consistent use of identifiers across agencies, then there are some important considerations around the policy, privacy and personal impacts of data collection in this paper

Three things I learned

Here’s three things that I learned from reading the paper.

  1. In a section on “The Data Work of Human Services Provision”, the authors highlighted three aspects of frontline data collection which I found useful to think about:
    • data compliance work – collecting data purely to support the needs of funders, which might be at odds with the needs of both the people being supported and the service delivery staff
    • data coordination work – which stems from the need to link and aggregate data across agencies and funders to provide coordinated support
    • data confidence work – the need to build a trusted relationship with people, at the front-line, in order to capture valid, useful data
  2. Similarly, the authors tease out four reasons for capturing identifiers, each of which has different motivations, outcomes and approaches to identification:
    • counting clients – a basic need to monitor and evaluate service provision, identification here is only necessary to avoid duplicates when counting
    • developing longitudinal histories – e.g. identifying and tracking support given to a person over time can help service workers to develop understanding and improve support for individuals
    • as a means of accessing services – e.g. helping to identify eligibility for support
    • to coordinate service provision – e.g. sharing information about individuals with other agencies and services, which may also have different approaches to identification and use of identifiers
  3. The design provocations around database design were helpful to highlight some alternate approaches to capturing personal information and the needs of the service vs that of the individual
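As a rough illustration of the “counting clients” case in the list above, the sketch below assumes a hypothetical visit log in which each person is given an opaque, service-local token (`local_id`) rather than a legal identifier. It is an assumption made for illustration, not a design from the paper, and it shows that avoiding duplicates only needs a token that is stable within one service.

```python
# Hypothetical visit log: "local_id" is an opaque token issued by the service,
# not a legal identifier.
visits = [
    {"local_id": "c-17", "service": "meal"},
    {"local_id": "c-17", "service": "shelter"},
    {"local_id": "c-42", "service": "meal"},
    {"local_id": "c-08", "service": "advice"},
    {"local_id": "c-42", "service": "meal"},
]


def count_clients(log):
    """Count distinct clients by collapsing repeat visits with the same token."""
    return len({visit["local_id"] for visit in log})


total_visits = len(visits)
unique_clients = count_clients(visits)
print(f"{total_visits} visits from {unique_clients} distinct clients")
```

The other purposes in the list, such as coordinating provision across agencies, need identifiers that mean something beyond a single service, which is where the pressure to collect legal identifiers comes from.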

Thoughts and impressions

As someone who has not been directly involved in the design of digital systems to support human services, I found the perspectives and insight shared in this paper really useful. If you’ve been working in this space for some time, then it may be less insightful.

However, I haven’t seen much discussion about good ways to design more humane digital services and, in particular, the databases behind them, so I suspect the paper could do with a wider airing. It’s useful reading alongside things like Falsehoods Programmers Believe About Names and Falsehoods Programmers Believe About Gender.

Why don’t we have a better approach to managing personal information in databases? Are there solutions out there already?

Finally, the paper makes some pointed comments about the role of funders in data ecosystems. Funders are routinely collecting and aggregating data as part of evaluation studies, but this data might also help support service delivery if it were more accessible. It’s interesting to consider the balance between minimising unnecessary collection of data simply to support evaluation versus the potential role of funders as intermediaries that can provide additional support to charities, agencies or other service delivery organisations that may lack the time, funding and capability to do more with that data.

 

 

[Paper review] Open data for electricity modeling: Legal aspects

This blog post is a quick review and notes relating to a research paper called: Open data for electricity modeling: Legal aspects.

It’s part of my new research notebook to help me collect and share notes on research papers and reports.

Brief summary

The paper reviews the legal status of publicly available energy data (and some related datasets) in Europe, with a focus on German law. The paper is intended to help identify some of the legal issues relevant to creation of analytical models to support use of energy data, e.g. for capacity planning.

As background, the paper describes the types of data relevant to building these types of model, the relevant aspects of database and copyright law in the EU and the properties of open licences. This background is used to assess some of the key data assets published in the EU and how they are licensed (or not) for reuse.

The paper concludes that the majority of uses of this data to support energy modelling in the EU, whether for research or other purposes, are likely to infringe the rights of the database holders, meaning that users are currently carrying legal risks. The paper notes that in many cases this is likely not the intended outcome.

The paper provides a range of recommendations to address this issue, including the adoption of open licences.

Three reasons to read

Here’s three reasons why you might want to read this paper

  1. It provides a helpful primer on the range of datasets and data types that are used to develop applications in the energy sector in the EU. Useful if you want to know more about the domain
  2. The background information on database rights and related IP law is clearly written and a good introduction to the topic
  3. The paper provides a great case study of how licensing and legal protections apply to data use in a sector. The approach taken could be reused and extended to other areas

Three things I learned

Here’s three things that I learned from reading the paper.

  1. That a database might be covered by copyright (an “original” database) in addition to database rights. But the authors note this doesn’t apply in the case of a typical energy dataset
  2. That individual member states might have their own statutory exemptions to the Database Directive. E.g. in Germany it doesn’t apply to use of data in non-commercial teaching. So there is variation in how it applies.
  3. The discussion on how the Database Directive relates to statutory obligations to publish data was interesting, but highlights that the situation is unclear.

Thoughts and impressions

Great paper that clearly articulates the legal issues relating to publication and use of data in the energy sector in the EU. It’s easy to extrapolate from this work to other use cases in energy and by extension to other sectors.

The paper concludes with a good set of recommendations: the adoption of open licences, the need to clarify rights around data reuse and the role of data institutions in doing that, and how policy makers can push towards a more open ecosystem.

However there’s a suggestion that funders should just mandate open licences when funding academic research. While this is the general trend I see across research funding, in the context of this article it lacks a bit of nuance. The paper clearly indicates that the current status quo is that data users do not have the rights to apply open licences to the data they are publishing and generating. I think funders also need to engage with other policy makers to ensure that upstream provision of data is aligned with an open research agenda. Otherwise we risk perpetuating an unclear landscape of rights and permissions. The authors do note the need to address wider issues, but I think there’s a potential role of research funders in helping to drive change.

Finally, in their review of open licences, the authors recommend a move towards adoption of CC0 (public domain waivers and marks) and CC-BY 4.0. But they don’t address the fact that upstream licensing might limit the choice of how researchers can license downstream data.

Specifically, the authors note the use of OpenStreetMap data to provide infrastructure data. However, depending on your use, you may need to adopt OpenStreetMap’s own licence when republishing data. This can be at odds with a mandate to use other licences, or with restrictive licences used by other data stewards.