How could watermarking AI help build trust?

I've been reading about different approaches to watermarking AI and the datasets used to train them. This seems to be an active area of research within the machine learning community. But, of the papers I've looked at so far, there hasn't been much discussion of how these techniques might be applied and what groundwork needs …

Why are we still building portals?

The Geospatial Commission have recently published some guidance on Designing Geospatial Data Portals. There's a useful overview in the accompanying blog post. It's good, clear guidance that should help anyone building a data portal. It has tips for designing search interfaces, presenting results and dataset metadata. There's very little advice that is specifically relevant to …

24 different tabular formats for half-hourly energy data

A couple of months ago I wrote a post that provided some background on the data we use in Energy Sparks. The largest data source comes from gas and electricity meters (consumption) and solar panels (generation). While we're integrating with APIs that allow us to access data from smart meters, for the foreseeable future most …

Schema explorers and how they can help guide adoption of common standards

Despite being very different projects, Wikidata and OpenStreetMap have a number of similarities: recurring patterns in how they organise and support the work of their communities. We documented a number of these patterns in the ODI Collaborative Maintenance Guidebook. There were also a number we didn't get time to write up. A further pattern which I …

Some lessons learned from building standards around Schema.org

OpenActive is a community-led initiative in the sport and physical activity sector in England. Its goal is to help people get healthier and more active by making it easier for them to find information about activities and events happening in their area. Publishing open data about opportunities to be active is a key part …

The importance of tracking dataset retractions and updates

There are lots of recent examples of researchers collecting and releasing datasets which end up raising serious ethical and legal concerns. The IBM facial recognition dataset is just one example that springs to mind. I read an interesting post exploring how facial recognition datasets are being widely used despite being taken down due to ethical …

Increasing inclusion around open standards for data

I read an interesting article this week by Ana Brandusescu, Michael Canares and Silvana Fumega. Called "Open data standards design behind closed doors?", it explores issues of inclusion and equity around the development of "open data standards" (which I'm reading as "open standards for data"). Ana, Michael and Silvana rightly highlight that standards development is …