I've been reading about different approaches to watermarking AI and the datasets used to train them. This seems to be an active area of research within the machine learning community. But, of the papers I've looked at so far, there hasn't been much discussion of how these techniques might be applied and what groundwork needs … Continue reading How could watermarking AI help build trust?
Category: Data Infrastructure
Why are we still building portals?
The Geospatial Commission have recently published some guidance on Designing Geospatial Data Portals. There's a useful overview in the accompanying blog post. It's good clear guidance that should help anyone building a data portal. It has tips for designing search interfaces, presenting results and dataset metadata. There's very little advice that is specifically relevant to … Continue reading Why are we still building portals?
24 different tabular formats for half-hourly energy data
A couple of months ago I wrote a post that provided some background on the data we use in Energy Sparks. The largest data source comes from gas and electricity meters (consumption) and solar panels (generation). While we're integrating with APIs that allow us to access data from smart meters, for the foreseeable future most … Continue reading 24 different tabular formats for half-hourly energy data
Schema explorers and how they can help guide adoption of common standards
Despite being very different projects, Wikidata and OpenStreetMap have a number of similarities: recurring patterns in how they organise and support the work of their communities. We documented a number of these patterns in the ODI Collaborative Maintenance Guidebook. There were also a number we didn't get time to write up. A further pattern which I … Continue reading Schema explorers and how they can help guide adoption of common standards
Building data validators
This is a post about building tools to validate data. I wanted to share a few reflections based on helping to design and build a few different public and private tools, as well as my experience as a user. I like using data validators to check my homework. I've been using a few different recently … Continue reading Building data validators
Some lessons learned from building standards around Schema.org
OpenActive is a community-led initiative in the sport and physical activity sector in England. Its goal is to help to get people healthier and more active by making it easier for people to find information about activities and events happening in their area. Publishing open data about opportunities to be active is a key part … Continue reading Some lessons learned from building standards around Schema.org
The UK Smart Meter Data Ecosystem
Disclaimer: this blog post is about my understanding of the UK's smart meter data ecosystem and contains some opinions about how it might evolve. These do not in any way reflect those of Energy Sparks of which I am a trustee. This blog post is an introduction to the UK's smart meter data ecosystem. It … Continue reading The UK Smart Meter Data Ecosystem
The importance of tracking dataset retractions and updates
There are lots of recent examples of researchers collecting and releasing datasets which end up raising serious ethical and legal concerns. The IBM facial recognition dataset is just one example that springs to mind. I read an interesting post exploring how facial recognition datasets are being widely used despite being taken down due to ethical … Continue reading The importance of tracking dataset retractions and updates
Four types of innovation around data
Vaughn Tan's The Uncertainty Mindset is one of the most fascinating books I've read this year. It's an exploration of how to build R&D teams drawing on lessons learned in high-end kitchens around the world. I love cooking and I'm interested in creative R&D and what makes high-performing teams work well. I'd strongly recommend it … Continue reading Four types of innovation around data
Increasing inclusion around open standards for data
I read an interesting article this week by Ana Brandusescu, Michael Canares and Silvana Fumega. Called "Open data standards design behind closed doors?" it explores issues of inclusion and equity around the development of "open data standards" (which I'm reading as "open standards for data"). Ana, Michael and Silvana rightly highlight that standards development is … Continue reading Increasing inclusion around open standards for data