Some idle thoughts for a Friday afternoon. I was just taking a look at Source.Plus, a dataset of public domain images for training foundation models. It's a project of Spawning.ai, which is working to build "data governance for generative AI". I have some thoughts on the tools they're building, but that's not what I'm writing … Continue reading What does community-driven data governance look like?
Year: 2024
Comments on “A data for AI taxonomy”
Jack Hardinges and Elena Simperl recently published a taxonomy to describe the data relevant to AI models and systems. Their goal is to help better distinguish between the different types of data relevant to developing, using and monitoring AI models and systems, and thereby add some nuance to … Continue reading Comments on “A data for AI taxonomy”
How to accidentally DDOS yourself
We had some performance issues last week. Entirely of our own making, but not in the usual way. We nearly DDOSed ourselves by sending out emails. We do a lot of analysis in Energy Sparks and, to be honest, some of it needs optimising. Tickets are in the backlog and we are exploring solutions. Anyway, … Continue reading How to accidentally DDOS yourself
Acceptable answers only
It can be hard to comment on a lot of tech news without coming across like Apu taking a bullet for a big tech platform. But a few aspects of the current debate around the new StackOverflow deal with OpenAI have irked me, as reported in TechCrunch and The Register and debated on Mastodon. So … Continue reading Acceptable answers only
Design organisations not licences
There's an article in the Register this week about Bruce Perens' "Post-Open Zero Cost Licence". In brief, Perens is trying to fix one problem that some people have with open source: specifically, finding a way for maintainers to get paid to continue to develop software. I'm being careful not to write "fix open … Continue reading Design organisations not licences
I made a Downpour…game?
I've been playing with Downpour recently. It's a lot of fun. You could explain what Downpour is by comparing it to something like Hypercard. By combining text and images with some basic interactivity, you can create little packages of hypertext that you can publish for anyone to use. You could also explain Downpour as a tool … Continue reading I made a Downpour…game?
Doom WAD Bot
I like to follow bots on social media. Not the ones posting spam, misinformation or trolling replies. The ones that post algorithmic art, content and other fun things that brighten up your timeline. Twitter used to have a great community of bot builders but they destroyed that when they changed the API access. I used … Continue reading Doom WAD Bot
A basis for better definitions of “open”
There's been a lot of discussion around what it means to be "open" recently. I think this has largely been driven by issues and concerns around the development and deployment of Large Language Models and claims for at least some of those models to be "open". What does it mean for an LLM or other … Continue reading A basis for better definitions of “open”
About / Ideas / Now
I think I discovered this project via Steve Messer, so hat tip to him if so. Or even if not, as he's a nice guy. aboutideasnow.com is a neat idea to index personal websites and specifically three pages: /about - which is about how people see themselves and a look at the past /now - … Continue reading About / Ideas / Now
What datasets have been classified as Digital Public Goods?
Update: 2024-04-14, I've updated this post with some corrections. See below. A couple of years ago I wrote a short series of posts looking at some different approaches for assessing data infrastructure. It includes this post on the Digital Public Goods standard and registry. Digital Public Goods are defined as: open-source software, open data, open … Continue reading What datasets have been classified as Digital Public Goods?