Shelley Powers has posted a list of concerns that she has with FOAF as currently defined and used:
Practical RDF: Danger, Will Robinson: Part 2
To summarise, these are:
- Inappropriate use of Myers Briggs property
- What does foaf:knows mean?
- Social implications of foaf:knows
- Publishing data about other people
For more details on each see Shelley’s post.
These set me to wondering how these concerns might be addressed. Here are some thoughts, aimed at promoting discussion only. I’m not suggesting I’ve thought of all the angles. Take ’em as a starter for ten.
Concern 1: Ditch the Myers Briggs property. Personally I’m not wedded to it, so if its uncontextualised use is wrong, then let’s remove it. It’s only in the “testing” phase anyway.
Concern 2: The FOAF spec has quite a lot to say about foaf:knows. What I take away from reading the property description is that it doesn’t imply friendship, merely that one person is stating some form of relationship. It would be easy, and wrong, to infer more from that. But perhaps the property’s name leads one to that confusion. Should we rename the property? Not sure; perhaps the docs just need to be more explicit.
I’m a bit uncomfortable about the “reciprocated interaction” part myself, though. I’m not sure that the reciprocal link can be correctly inferred. I do think that we need to be able to more fully define relationships (e.g. kinship) and that those relationships might not be visible to all.
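To make the reciprocity point concrete, here’s a minimal sketch that treats each foaf:knows statement as a one-way assertion. The people and the `is_reciprocated` helper are made up for illustration; nothing here comes from the FOAF spec itself.

```python
# Each tuple is one published statement: (subject, predicate, object).
# The people named here are hypothetical.
statements = {
    ("alice", "foaf:knows", "bob"),
    ("bob", "foaf:knows", "alice"),
    ("alice", "foaf:knows", "carol"),  # carol has said nothing about alice
}


def is_reciprocated(statements, a, b):
    """Return True only if both directions have actually been asserted."""
    return (
        (a, "foaf:knows", b) in statements
        and (b, "foaf:knows", a) in statements
    )


print(is_reciprocated(statements, "alice", "bob"))
print(is_reciprocated(statements, "alice", "carol"))
```

The second check fails because only one side has made the statement. A tool that quietly added the reverse link would be putting words in carol’s mouth.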
Concern 3: Shelley’s third concern is over the social implications of foaf:knows, and therefore I think (perhaps wrongly, we’ll see) that there’s not a great deal to be done in the FOAF spec or tools about this one, other than perhaps clarifying or protecting statements about relationships as noted above.
Whenever you publish any data on the web, even simply posting to a mailing list with a public archive, you’re releasing information that could be used to profile you. I could build a tool to smush together information about mailboxes from mail archives without FOAF. FOAF just makes this data, and the act of publication, more explicit. FOAF is therefore a showcase for the kind of issues that are going to be increasingly important in the future (now, even); it’s not necessarily a problem for FOAF to solve, IMO.
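As a rough illustration of how little it takes, here’s a sketch of smushing records from different sources on a shared mailbox. The key is computed the way foaf:mbox_sha1sum is defined (the SHA-1 of the mailto: URI); the record layout, the property names, and the example addresses are all assumptions of mine.

```python
import hashlib
from collections import defaultdict


def mbox_sha1sum(email):
    """SHA-1 hex digest of the mailto: URI for an address."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


def smush(records):
    """Merge (email, property, value) records into one profile per mailbox."""
    profiles = defaultdict(lambda: defaultdict(set))
    for email, prop, value in records:
        profiles[mbox_sha1sum(email)][prop].add(value)
    return profiles


# Hypothetical records gathered from three unrelated places.
records = [
    ("jo@example.org", "name", "Jo Example"),         # a FOAF file
    ("jo@example.org", "posted_to", "rdf-dev list"),  # a public mail archive
    ("jo@example.org", "posted_to", "xml-dev list"),  # another archive
    ("sam@example.org", "name", "Sam Example"),
]

profiles = smush(records)
for key, profile in profiles.items():
    print(key[:8], {prop: sorted(values) for prop, values in profile.items()})
```

Note that the FOAF file contributes only one of Jo’s three records; the rest comes straight from the mail archives.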
Here’s another example of how information about your interests and tastes can bleed out onto the net. Take a file-sharing application like Kazaa. This sets up a publicly readable (to other users of the application anyway) directory on your hard disk that contains the files you’ve downloaded. You’re encouraged to keep the files there so that others can share them through the peer-to-peer network. Kazaa stores metadata about each file, e.g. title, author, categories, etc. Have you considered that this directory of files provides a lot of information about you? The music, films, etc. that you’re interested in? It’s simple enough to harvest those directory listings and smush them with IP addresses collected from web site visits.
This is not to divert attention from Shelley’s comments, just to illustrate the often subtle ways that information can creep out. That’s why you need a big moat around your computer.
Concern 4: Publishing information about other people. You shouldn’t do this, for all the reasons that Shelley points out. Take care to only publish data that you own. Point to your friend’s FOAF file, if they have one, to let them make their own statements. In fact, only list them in a foaf:knows if they have a FOAF file. Again, I’m going to suggest that this is a documentation issue to make it clear why it’s bad practice to do this, and hopefully tool authors will take note.
However, one solution to this is already built in to RDF datastores, and that’s the concept of attribution: recording who said what, not just what was said. So the situation isn’t entirely broken even if people persist in this practice.
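Here’s a minimal sketch of what attribution buys you, assuming a store that keeps the source document alongside each statement. The `Quad` type, the URLs, and the `own_document` mapping are hypothetical; a real RDF datastore would have its own way of recording this.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    source: str     # document the statement was read from (who said it)
    subject: str    # who the statement is about
    predicate: str
    obj: str


# Hypothetical harvested statements.
store = [
    Quad("http://example.org/alice/foaf.rdf", "alice", "foaf:name", "Alice"),
    Quad("http://example.org/alice/foaf.rdf", "bob", "foaf:interest", "knitting"),
    Quad("http://example.org/bob/foaf.rdf", "bob", "foaf:name", "Bob"),
]

# Assumed mapping from each person to the document they publish themselves.
own_document = {
    "alice": "http://example.org/alice/foaf.rdf",
    "bob": "http://example.org/bob/foaf.rdf",
}


def who_said(store, subject, predicate):
    """List the sources that made a given kind of statement about someone."""
    return sorted(
        {q.source for q in store if q.subject == subject and q.predicate == predicate}
    )


def self_asserted(store, own_document):
    """Keep only statements that people made about themselves."""
    return [q for q in store if own_document.get(q.subject) == q.source]


print(who_said(store, "bob", "foaf:interest"))
print(len(self_asserted(store, own_document)))
```

With the source recorded, a tool can at least show that the claim about Bob’s interests came from Alice’s file rather than his own, and choose to discount or drop it.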
There you go, that’s my 2p worth.