Acceptable answers only

It can be hard to comment on a lot of tech news without coming across like Apu taking a bullet for a big tech platform. But a few aspects to the current debate around the new StackOverflow deal with OpenAI have irked me, as reported in TechCrunch and The Register and debated on Mastodon. So … Continue reading Acceptable answers only →

Design organisations not licences

There's an article in the Register this week about Bruce Perens' "Post-Open Zero Cost Licence". In brief, Perens is aiming to try to fix one problem that some people have with open source. Specifically finding a way for maintainers to get paid to continue to develop software. I'm being careful not to write "fix open … Continue reading Design organisations not licences →

A basis for better definitions of “open”

There's been a lot of discussion around what is means to be "open" recently. I think this has largely been driven by issues and concerns around the development and deployment of Large Language Models and claims for at least some of those models to be "open". What does it mean for an LL or other … Continue reading A basis for better definitions of “open” →

What datasets have been classified as Digital Public Goods?

Update: 2024-04-14, I've updated this post with some corrections. See below A couple of years ago I wrote a short series of posts looking at some different approaches for assessing data infrastructure. It includes this post on the Digital Public Goods standard and registry. Digital Public Goods are defined as: open-source software, open data, open … Continue reading What datasets have been classified as Digital Public Goods? →

Confused by SOLID

I keep checking in on the Solid project. But I'm baffled by its lack of functionality. I've written up some of my questions.

It takes about 4588 quasars to help you get around and get paid

I love learning about the data infrastructure that shapes the world we live in. Like all good infrastructure it's usually invisible, because it just works. But there's always something interesting to learn if you dig into the detail. For example, a few years ago when I was researching how geospatial data is accessed, used and … Continue reading It takes about 4588 quasars to help you get around and get paid →

Will AI hamper our ability to crawl the web for useful data?

As websites start to block Common Crawl, and as the project leans in to its role in training LLMs, will it become harder to use data from the web for other purposes?

Increasing consistency of data with FAIR Implementation Profiles

FAIR implementation profiles offer a means to increase consistency around how data is shared.

Norming on Mastodon

Reminding myself that in order to build norms we need to share them. Those "preachy" posts on Mastodon are an essential part of doing that.

Consistency before standards

Before jumping straight into scoping and designing new standards, we should look at other quick wins to increase consistency around how data is published.