Reputation data portability

Yesterday I went to the ODI lunchtime lecture on portability of reputation data. It was an interesting discussion that triggered a few thoughts, which I thought I’d share here.

The debate was prompted by a call for evidence from the Department formerly known as BIS around consumer data and account switching:

“The government would like to understand whether the reputation data earned by a user on a particular platform could be used to help them win business or prove their trustworthiness in other contexts. We would also be interested in views on the technical and other challenges that would be associated with making this reputation data portable”

The consultation includes this question:

“What new opportunities or risks for businesses, workers and consumers would be created if they were able to port their reputation and feedback data between platforms?”

It also asks about the barriers that might hinder this type of portability.

One useful way to answer these questions is to break them down into smaller pieces:

  1. Should consumers be able to access data they’ve contributed to a platform?
  2. Should businesses be able to access data about them in a platform, e.g. reputation data such as reviews?
  3. Should businesses and consumers be able to move this data between platforms?
  4. Should it be permitted for that data to be reused by others, e.g. in competing platforms?

The first two questions are about exporting data.

The third and fourth questions are really about portability and data licensing.

I would say that, broadly, the answer to all of these questions is: Yes.

I think consumers and businesses should be able to access this data and, further, that it should be machine-readable open data. They should also be able to access any of their personal data held in the platform, but this isn’t really an area of debate. The new EU GDPR requires platforms to provide you with your data if you request it, although it doesn’t (to my knowledge) require it to be in a machine-readable, reusable form.

I think this also answers the last question: the data should be reusable. However, I expect resistance from platforms, as where this type of data is currently made available it is under non-open terms. For example, via API agreements that prohibit some forms of reuse, such as use in a competing service.

The question on portability is trickier though. While I think that portability is something to aspire to, in practice it is going to be difficult to achieve.

Portability requires more than just creating a data standard to enable export and import of data, or APIs that enable more dynamic synchronisation. I think that’s the easy part.

Portability would also require platforms to agree or converge around how reputation data is collected and calculated. It’s no good moving data from one system to another if they have incompatible definitions. There are many ways in which platforms might differ:

  • They can use a different rating scheme, e.g. 5 stars, 10 stars, or just “likes”
  • They might allow, or require, a text review in addition to a rating
  • They can allow anonymous reviews or require users to make themselves known
  • They can allow anyone to review any service or business (e.g. TripAdvisor, Amazon product reviews), or they can enforce that reviews are only made when there has been evidence of a transaction (e.g. rating a supplier on eBay or Amazon)
  • Related, they might allow both forms of review, but distinguish those that are based on a transaction
  • Or they may not allow explicit reviews at all and measure reputation in some other way (e.g. completing transactions within an expected time period, or number of sales made)
  • …etc, etc
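To make the problem concrete, here’s a minimal sketch in Python of what a naive conversion between two rating schemes might look like. Both platform schemas here are entirely made up:

    # Two hypothetical review records from platforms with different schemes.
    # Neither schema is real; they just illustrate the mismatch.

    review_a = {
        "rating": 4,                # on a 5-star scale
        "text": "Great service",
        "verified_purchase": True,  # tied to an actual transaction
    }

    review_b = {
        "liked": True,              # a binary "like"; no scale at all
        "reviewer": "anonymous",
    }

    def normalise(review):
        """Naively map either scheme onto a 0.0-1.0 score."""
        if "rating" in review:
            return review["rating"] / 5.0
        if "liked" in review:
            # Is a "like" worth 1.0? 0.75? The data alone can't tell us.
            return 1.0 if review["liked"] else 0.0
        return None

    print(normalise(review_a))  # 0.8
    print(normalise(review_b))  # 1.0, but not really comparable with the 0.8

The numbers line up, but the semantics don’t: one score is backed by a verified transaction, while the other is an anonymous binary signal.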

And this is without even getting into the weeds of what users think they are reviewing. For example, are you reviewing the restaurant, the service you received on a specific visit, the menu choice with respect to your preferences, or perhaps even the specific person who delivered that service? I think we’ve all seen examples of all of those variations, even within single platforms.

XKCD has nicely summarised a variety of issues with rating systems in these three cartoons. And we shouldn’t forget the creative ways in which review systems get repurposed.

It’s important to highlight here that this type of variation doesn’t really occur with data like banking transactions, utility bills, etc. I tend to think portability there is much easier to achieve. There is variation, but it is typically around charging models, not the meaning of the data or the method of its collection.

Is all the variation in rating and review schemes warranted? Perhaps not. Some convergence might actually be useful. But these variations are also likely to be key parts of the user experience and functionality of the platform. So I’m personally very wary of restricting product developers from innovating in this area.

In my view rather than focusing on portability, we should be asking for this data to be published as open data. This will then open the possibility for the data to be aggregated and presented across platforms.

Enabling the creation of aggregated reference points for reputation data may be more practical than requiring true portability across platforms. We already have models for this: price comparison sites and credit reference agencies. If this data becomes more open, it seems likely that credit agencies will be the first to benefit from it.
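As a rough sketch of the aggregator model (the platforms and figures below are invented), an aggregator could present each platform’s score side by side with its provenance, rather than forcing everything into a single scheme:

    # Openly published reputation scores for one business, as an
    # aggregator might collect them. All names and figures are invented.

    open_reputation_data = [
        {"platform": "PlatformA", "subject": "Acme Plumbing",
         "scheme": "5-star average", "score": 4.2, "reviews": 130},
        {"platform": "PlatformB", "subject": "Acme Plumbing",
         "scheme": "% positive", "score": 91, "reviews": 54},
    ]

    def summarise(records, subject):
        """Present each platform's score alongside its provenance."""
        return [
            f'{r["platform"]}: {r["score"]} ({r["scheme"]}, {r["reviews"]} reviews)'
            for r in records if r["subject"] == subject
        ]

    for line in summarise(open_reputation_data, "Acme Plumbing"):
        print(line)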

The state of open licensing

I spend a lot of time reading through licences and terms & conditions. Much more so than I thought I would when I first started getting involved with open data. After all, I largely just like making things with data.

But there’s still so much data that is public but not open. Or datasets that are nearly open but which have been wrapped in awkward additional terms. And still plenty of confusion about what open data actually is, as Andy Dickinson highlighted yesterday.

And yet the open data licensing choices really aren’t that hard: you can get the essential choices in a tweet.

Resolving this is just going to take more time, education and patient explanation of the benefits and disadvantages of different licensing models.

But I’ve been wondering about what direction we’re moving in with regards to licensing.

Reducing friction

Since the release of the 4.0 series of Creative Commons licences we’ve had a standard, globally applicable set of terms that allow us to openly licence all forms of creative works and datasets. I really don’t see any reason to continue to use the Open Database Licence, and I would love the maintainers to either clarify the continued role it plays or acknowledge that it’s deprecated and discourage its use.

The UK Open Government Licence (OGL) has spawned a variety of national licences. But, now that it is interchangeable with CC-BY 4.0, its continued existence also seems largely unnecessary. Governments currently without a standard national licence are better off adopting CC-BY 4.0 than creating another fork of the OGL.

There may be good reasons for retaining the OGL and I’d be interested in hearing them if anyone has opinions. But it feels like we might continue to simplify the licensing landscape by planning for it to become obsolete.

I continue to wonder whether I’m becoming an open data pedant. (And maybe I am!) But I feel these are issues that are important to pay attention to, if only to follow evolving best practice.

That said, I’m convinced that any friction around licensing can potentially hamper the reuse of open data. So I think it’s something to recognise and remove wherever possible. The more the commons is used, the more value will be unlocked. And this will help it grow, not just by increasing contribution, but also through increasing investment, so we can have a proper open data infrastructure for society as a whole.

And friction not only hampers reuse, it also slows publication of new data. I know from experience that confusion around appropriate licences is a common area of uncertainty for publishers, especially commercial publishers who are concerned about the risks of adopting open licences rather than using custom terms that are within their comfort zone.

As specific licensing frameworks and model terms and conditions become embedded, they will be harder to remove later. It’s important not to overlook the impacts of bespoke terms.

Evolving practice

It’s interesting to see how, for example, the OpenOpp terms borrow heavily from those of OpenCorporates. As a successful open data business, it’s not surprising that OpenCorporates is being used as an exemplar.

But, in my opinion, the OpenCorporates terms have some niggling issues. Firstly, there is the specific requirement around how attribution must be presented (font sizes, and not just text and a link), coupled with the requirement that anyone re-publishing the data must ensure that downstream users also conform with those requirements. That’s really not dissimilar to the custom attribution requirements that were present in the Ordnance Survey’s original fork of the OGL.

The open data community has campaigned at length to convince governments that they should, at most, require simple attribution statements from re-users of their data. I don’t think it’s a positive move for that same data to begin accumulating new terms and licences within its first few steps into the ecosystem.

That said, the more concerning way in which practice may evolve is by stepping away from open licensing entirely. That goes hand-in-hand with the increasing interest in “data markets” which I’ve encountered from many city-based initiatives. I’ve already written at length about my thoughts on the Copenhagen marketplace and I’m hoping London isn’t going in the same direction.

Elsewhere though, I see promising progress. The scientific research community has long been converging on CC0 (public domain) for its data and CC-BY for its content. CC0 avoids problems with attribution stacking and that community has long had social norms that encourage recognition of sources, without requiring it through a licensing regime.

But that practice isn’t yet so commonplace elsewhere, even though it’s part and parcel of being a good re-user. The visible impact of open data and content is a tide that raises all boats. If you call yourself an open data start-up, you should be able to point proudly to where your data sources are listed on your website.

I also read that the US may be adopting legislation that will ensure that its open government data remains in the public domain. This is fantastic. That change will also clarify that the data is in the public domain internationally. It’s currently unclear whether “public domain” actually means “public domain within the US”. It may be crystal clear to IP and Copyright lawyers but not necessarily to non-experts like myself, which is my general point.

I wonder whether the general trajectory will be, as the EFF recommend, for more open data to be placed into the public domain. That would require a big step forward for many governments, as well as established projects like OpenStreetMap. Large scale licensing changes of that form are tricky to co-ordinate. Realistically I don’t see it happening unless there are major changes to the social norms around data reuse, or we start bumping into compatibility issues between data from different communities.

That’s not entirely unlikely, however. For example, the prevalence of CC-BY and CC-BY-SA style licensing in the commercial and public sectors is at odds with research norms that require raw and derived data to be placed into the public domain under a CC0 waiver. You can’t draw from one well and then add to the other. However, there are bigger issues to address first, as the recent OKCupid data release highlighted.


From services to products

Over the course of my career I’ve done a variety of consulting projects, as both an employee and a freelancer. I’ve helped found and run a small consulting team. And, through leading engineering teams, I’ve gained some experience of designing products and platforms. I’ve been involved in a few discussions, particularly over the last 12 months or so, around how to generate repeatable products off the back of consulting engagements.

I wanted to jot down a few thoughts here based on my own experience and a bit of background reading. I don’t claim to have any special insight or expertise, but the topic is one that I’ve encountered time and again. And as I’m trying to write things down more frequently, I thought I’d share my perspective in the hope that it may be useful to someone wrestling with the same issues.

Please comment if you disagree with anything. I’m learning too.

What are Products and Services?

Let’s start with some definitions.

A service is a bespoke offering that typically involves a high level of expertise. In a consulting business you’re usually selling people or a team who have a particular set of skills that are useful to another organisation. While the expertise and skills being offered are common across projects, the delivery is usually highly bespoke and tailored for the needs of the specific client.

The outcomes of an engagement are also likely to be highly bespoke as you’re delivering to a custom specification. Custom software development, specially designed training packages, and research projects are all examples of services.

A product is a packaged solution to a known problem. A product will be designed to meet a particular need and will usually be designed for a specific audience. Products are often, but not always, software. I’m ignoring manufacturing here.

Products can typically be rapidly delivered as they can be installed or delivered via a well-defined process. While a product may be tailored for a specific client, it is usually very well defined. Product customisation is usually a service in its own right, as is product support.

The Service-Product Spectrum

I think it’s useful to think about services and products as being at opposite ends of a spectrum.

At the service end of the spectrum your offerings:

  • are highly manual, because you’re reliant on expert delivery
  • are difficult to scale, because you need to find people with skills and expertise that are otherwise in short supply
  • have low repeatability, because you’re inevitably dealing with bespoke engagements

At the product end of the spectrum your offerings are:

  • highly automated, because you’re delivering a software product or following a well defined delivery process
  • scalable, because you need fewer (or at least different) skills to deliver the product
  • highly repeatable, because each engagement is well defined, has a clear life-cycle, etc.

Products are a distillation of expertise and skills.

Actually, there’s arguably a stage before service. Let’s call those “capabilities”, to borrow a phrase. These are skills and expertise that you have within your team but which you’ve not yet sold. I think it’s a common mistake to draw up lists of capabilities, rather than services or products.

The best way to test whether your internal capabilities are useful to others is to speak to as many potential customers as possible. And one of the best ways to develop new products is to undertake a number of bespoke engagements with those customers to understand where the opportunities lie for creating a repeatable solution. Many start-ups use consulting engagements as discovery tools.

Why Productise?

There are many obvious reasons why you’d start to productise a service:

  • to allow your business to scale. Consulting businesses can only scale with people; product businesses can scale to the web.
  • to make your engagements more repeatable, so that you can deliver a consistent quality of output
  • to distil learning and expertise in such a way as to support the training and development of junior staff, and grow the team
  • to ensure business continuity, so you’re less reliant on individual consultants
  • to reduce costs, by allowing more junior staff to contribute to some or all of an engagement, with check-lists, standard processes and internal review stages providing the appropriate quality controls
  • to focus on a specific market. Tailoring your service to a specific sector can help target your sales and marketing effort
  • to more easily measure impacts. Products solve problems and, when manifested as software, can be instrumented to collect metrics on usage and hopefully impacts.

Because they have a bounded scope, products are easier to optimise to maximise revenue or impacts. Or both.

A Product Check-list

By my definition above, a product will:

  1. solve a specific well-defined problem
  2. be targeted at a specific customer or audience
  3. be deliverable via a well-documented process, which may be partially or completely automated
  4. be deliverable within a well-defined time scale
  5. be priced according to a tried and tested pricing model

If you can’t meet at least the first three of these criteria then I’d argue that what you have is still a bespoke service. And if you’ve not sold it at all then all you have is a capability or at best an idea.

Products evolve from client engagements.

Approaches to Productisation

Some organisations will be using consulting engagements as a means to identify user needs and/or as a means to fund development of a software product or platform.

But developing a product doesn’t necessarily involve building software, although I think some form of automation is likely to be a component of a more repeatable, productised service.

You might start productising a service simply by documenting your last engagement. The next time you do a similar engagement you can base it on your most successful previous project. As you continue, you’re likely to iterate on that process, distilling it into a check-list or methodology. Ideally the process should start from pre-sales and run through to final delivery.

A lot has already been written about lean product development and the importance of adding metrics (which can include measuring your product process). And also about the care you need to take in extrapolating the needs of early adopters to later customers. I already feel like I’m stating the obvious here when there’s a wealth of existing product development literature, so we’ll skip over that.

But I’ll also note that there’s (of course!) a lot of overlap between what I’m outlining here and the discovery phase of service design. The difference is really just in how you’re being funded.

I’d argue that taking an iterative approach is important even for freelancers or small consulting firms. Even if your end goal isn’t a software product. It’s how you get better at what you do. Retrospectives, ideally involving the client, are another useful technique to adopt from agile practices.

But productisation also takes effort. You can iterate in small steps to improve, but you need to build in the time to do that. Even a small amount of reflection and improvement will pay dividends later.

“The Wizard of the Wash”, an open data parable

The fourth open data parable.

In a time long past, in a land far away, there was once a great fen. A vast, sprawling wetland filled with a richness of plants and criss-crossed with many tiny streams and rivers.

This fertile land was part of a great kingdom ruled by a wise and recently crowned king. The fen was home to a hardy and industrious people who made a living from fishing, cutting peat and gathering the rare herbs that sprouted amongst the verdant grasses.

At the time of this tale the new king was travelling across his lands to learn more about his people. In a certain area of the fen he expected to find a thriving town that had become widely renowned for the skills of its herbalists and fishermen.

Instead he came upon a ramshackle collection of makeshift huts and tents clinging to patches of dry ground. The dejected people living in these shelters had clearly fallen on hard times and were eking out a living on the verges of the fen. Nearby lay the ruins of their settlement. Houses had tumbled haphazardly into the waters. The broken remains were being picked over for materials to build shelters and provide wood for fires.

Speaking to a fisherman, the king asked “What terrible disaster has befallen your village? How have you good people been brought so low?”

While continuing his task of mending a fishing net, the fisherman proceeded to tell the following tale:

“Our town has grown slowly over the years, sire. We live a hard life in the fens, and building on this treacherous land takes great care. For years our people were limited to building on isolated patches of stable ground. Our original village clung tightly to the patches of rock hidden just beneath the surface of these waters.

Until we made our pact with the Wizard of the Wash.

One day the Wizard came to us and demonstrated his great magicks, showing how his powers could be used to drive great wooden piles deep into the peat. Deep enough to reach the bedrock and let us build wherever we wished. We would need only ask the Wizard to create a stable footing and we could build wherever we chose. In return, and to complete our pact, we needed only to collect for him the rarest herbs and plants for his research. An easy task for us, as we have long known the secrets of the fen.

And so for many years we prospered. Each year we planned out where we would build our new houses and workshops, and pointed to where we needed new roads, inns and store houses. And each year the Wizard would oblige us with his magicks. The town spread across the fen and we started to grow rich from trade.

But then things began to change.

In the beginning the Wizard refused to drive new piles in a few places. He explained that he was concerned that the buildings may hinder certain herbs which grew in that area. And we followed his wishes for there were other places to build.

And so this continued. Each year the Wizard would reject some of our plans, or convince us to change them for his own ends. For example, where we had once planned a school, he instead convinced us to build a new dock for his supply boats. Disappointed, we again submitted to his wishes, for we still needed to build and there was still space aplenty elsewhere. As traders we had grown accustomed to compromise.

But then the Wizard began to visit us more frequently, demanding to review our plans in more detail. He objected to certain buildings being extended as they blocked views that he enjoyed. He began to refuse to build in ever more locations, and expressed opinions about how the town should grow.

Once he even required us to dismantle several houses so that we might build a better inn for him to stay in during his visits. He threatened to simply remove the foundations if we didn’t comply. In return he chose to drive in only a few new piles. As a result some families were forced to live in cramped and poorer lodgings. And what choice did we have but to comply?

In these last few years the Wizard became ever more demanding. He argued that the piles were his, had always been his, and that we had only been using them with his permission. If we were unhappy, he argued, we could simply return to building as we had before.

Sire, while these lands are ours and have been for many generations, we had gladly given ourselves over to a petty tyrant. Once the pact had been made it was easier to comply than to resist.

The final disaster happened a few months ago. The Wizard had long been growing old and unwell. One night he passed away whilst staying in our finest inn. And on that night all of his magicks were undone. And so our fine town suddenly fell back into the swamp.

And so, as you see, we were ruined.”

Saddened by the tale, the king realised that here was a people whose needs had long been overlooked, leaving them at the mercy of fickle powers. He resolved to help them rebuild.

On the spot he issued a decree for the Royal Engineers to provide assistance to any town, village or people that required help. His kingdom would be built on firm foundations.

Discussion document: archiving open data

This is a brief post to highlight a short discussion document that I recently published about archiving open data. The document is intended to help gather ideas, suggestions and best practices around archiving open data to the Internet Archive. The goal is to gather together useful guidance that can help encourage archiving and distribution of open data from existing portals, frameworks, etc.

This isn’t an attempt to build a new standard, just to encourage some convergence and activity. At present the guidance recommends building around the Data Package specification, as it is simple and provides a well-defined unit (a zip file) for archiving purposes.
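As a minimal sketch of what that archival unit might look like (the dataset name, file and licence below are placeholders), a short Python script can write a datapackage.json descriptor and bundle it with the data into a single zip:

    # Package a dataset for archiving: a datapackage.json descriptor plus
    # the data file, zipped into a single well-defined unit.
    # The name, title, licence and file below are all placeholders.

    import json
    import zipfile

    descriptor = {
        "name": "example-dataset",
        "title": "An example dataset",
        "licenses": [{"name": "CC0-1.0", "title": "CC0 1.0"}],
        "resources": [{"name": "data", "path": "data.csv", "format": "csv"}],
    }

    with open("datapackage.json", "w") as f:
        json.dump(descriptor, f, indent=2)

    # Bundle the descriptor and the data into one zip: the archival unit.
    with zipfile.ZipFile("example-dataset.zip", "w") as z:
        z.write("datapackage.json")
        z.write("data.csv")  # assumes data.csv already exists alongside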

Archiving data can help build resilience in the open data commons, providing backups of important data resources. This will help deal with:

  • Unexpected system outages that could take down data portals
  • Decisions by publishers to remove data previously published under an open licence, ensuring an original copy remains
  • Decisions by publishers to take down data
  • Services and portals permanently going offline

If you have thoughts or suggestions then feel free to add them to the document. It would particularly benefit from input from those in the archival community and especially those who are already familiar with working with the Internet Archive.

I hope to build a small reference implementation to illustrate the idea and help to archive the data from Bath: Hacked.

What 3 Words? Jog on, mate!

The OpenAddresses.io website notes that “Address data is essential infrastructure”. Geography underpins so much of the data we collect, and that is collected about us, making address registers important parts of national data infrastructure.

In the UK we’ve been wrestling for many years with the fact that our address register is not open. After the decision to sell the register as part of the privatisation of Royal Mail, money has been spent on exploring the creation of an open alternative. But it’s looking positive that we may end up getting a free, open version, albeit at the cost of another £5m.

What3Words is a UK startup that also recognises the importance of address registers. Their website notes that: “Poor addressing costs businesses billions of dollars and hampers the growth and development of entire nations.”

The company has developed an algorithm to assign unique 3 word identifiers to the entire world, creating a global addressing system. The website does a great job of explaining why improving addresses globally is important and highlights the benefits it can bring.
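To illustrate the general idea, and only the idea, since the real algorithm is nothing like this: you can divide the world into a grid and encode each cell’s index using words as digits. A toy Python sketch:

    # A toy illustration of word-based addressing in general -- NOT
    # What3Words' actual algorithm. Divide the world into grid cells,
    # then encode each cell's index in base len(WORDS), words as digits.

    WORDS = ["apple", "banana", "cherry", "damson"]
    # A real system needs tens of thousands of words, so that
    # len(WORDS)**3 exceeds the number of cells; four words are only
    # enough to show the encoding principle.

    def cell_index(lat, lon, cell_size=0.0001):
        """Map a coordinate to an integer identifying its grid cell."""
        row = int((lat + 90) / cell_size)
        col = int((lon + 180) / cell_size)
        cols = int(360 / cell_size)
        return row * cols + col

    def to_words(lat, lon):
        """Encode the cell index as three 'word digits'."""
        n, base = cell_index(lat, lon), len(WORDS)
        return [WORDS[(n // base ** i) % base] for i in range(3)]

    print(to_words(51.5074, -0.1278))  # a deterministic triple of words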

The problem is that What3Words is a proprietary, closed system. The algorithm is patented. The data is closed, with the terms and conditions spelling out in great detail all of the things you can’t do with the system, including:

  • You must not pre-fetch, cache, index, copy, re-utilise, extract or store any what3words Data
  • You may store What3words Data solely for the purpose of improving Your implementation of the API into Your Product provided that such storage: (i) is temporary (and in no event lasts for more than 30 calendar days), (ii) is limited to an amount of What3words Data which is strictly required to improve Your API implementation, (iii) is secure, and (iv) shall in no event enable You or a third party to use the what3words Data outside of Your Products, in any way, or to re-utilise or extract such data
  • For the avoidance of doubt, You must not use any what3words Data (whether accessed from the API or otherwise) for any purposes not expressly permitted under this Agreement, including for Your own use or for distribution, licence or sale to any third-party
  • …etc, etc

These are all characteristics that help to make What3Words a good prospect for investment: all the defensive walls are in place to protect their intellectual property.

But these are also all characteristics that make What3Words completely unsuitable as either a global or national address register. So I was dismayed to read that Mongolia have decided to adopt it as their national register. I’m hoping that this isn’t really the case, and that the story is similar to the apocryphal tales of Honduras’s blockchain-based land registry.

Clearly Mongolia is in need of a better data infrastructure and I can understand why a system like What3Words would be attractive. But I think the closed nature of the platform makes it a poor foundation for future growth. While the service might be great for parcel delivery, address and location information is used in so many other ways.

The licensing restrictions mean that it’s not possible to publish open data to help bring transparency to land ownership, report on crisis mapping, collect and process census or other statistics, and a myriad of other use cases. You can’t even store the data for your own re-use, other than on a temporary basis.

With this in mind, I’d find it hard to recommend What3Words to any organisation collecting and sharing data. The keys to your dataset become tied up with the intellectual property and API licensing of a third party, under terms that can be changed at any time. NGOs and other organisations hoping to publish open data about their activities should approach the service with a great deal of caution.

The fix for all this would be simple: What3Words could publish their data and algorithm under an open licence. I think that’s unlikely though.

Being an idealist I’d like to think that more data startups will start to recognise their role in contributing to a global commons and design products accordingly. And perhaps what we need is not more startup incubators, but institutions that will support the creation of data infrastructure that builds a more open future.

Beyond Publishers and Consumers: Some Example Ecosystems

Yesterday I wrote a post suggesting that we should move beyond publishers and consumers and recognise the presence of a wider variety of roles in the open data ecosystem. I suggested a taxonomy of roles as a starting point for discussion.

In this post I wanted to explore how we can use that taxonomy to help map and understand an ecosystem. Eventually I want to work towards a more complete value network analysis and some supporting diagrams for a few key ecosystems. But I wanted to start with some hopefully simple examples.

As I’ve been looking at it recently I thought I’d start by examining Copenhagen’s open data initiative and their city data marketplace.

What kind of ecosystems do those two programmes support?

The Copenhagen open data ecosystem

The open data ecosystem can support all of the roles I outlined in my taxonomy:

  • Steward: The city of Copenhagen is the steward of all (or the majority of) the datasets that are made available through its data platform, e.g. the location of parking meters
  • Contributor: The contributors to the dataset are the staff and employees of the administration who collect and then publish the data
  • Reuser: Developers or start-ups who are building apps and services, such as I Bike CpH using open data
  • Beneficiary: Residents and visitors to Copenhagen

Examples of the tangible value being exchanged here are:

  • (Steward -> Reuser) The provision of data from the Steward to the Reuser
  • (Reuser -> Beneficiary) The provision of a transport application from the Reuser to the Beneficiary

Examples of the intangible value are:

  • (Contributor -> Steward) The expertise of the Contributors offered to the Steward to help manage the data
  • (Beneficiary -> Reuser) The market insights gained by the Reuser which may be used to create new products
  • (Reuser -> Steward) The insights shared by the Reuser with the Steward into which other datasets might be useful to release or improve
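As a rough sketch of how these exchanges might be captured for a fuller value network analysis, they can be modelled as a small set of directed edges (the roles and labels are just those from the lists above):

    # The tangible and intangible exchanges above, as directed edges.

    exchanges = [
        ("Steward", "Reuser", "data", "tangible"),
        ("Reuser", "Beneficiary", "transport application", "tangible"),
        ("Contributor", "Steward", "expertise", "intangible"),
        ("Beneficiary", "Reuser", "market insights", "intangible"),
        ("Reuser", "Steward", "insights into useful datasets", "intangible"),
    ]

    def flows_from(role):
        """Everything a given role provides to others."""
        return [(to, value, kind)
                for frm, to, value, kind in exchanges if frm == role]

    print(flows_from("Reuser"))
    # [('Beneficiary', 'transport application', 'tangible'),
    #  ('Steward', 'insights into useful datasets', 'intangible')]

Queries over this little graph, such as listing everything the Reuser provides, are the starting point for the kind of diagrams mentioned above.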

In addition, the open licensing of the data enables two additional actors in the ecosystem:

  • Intermediaries: who can link the Copenhagen data with other datasets, enrich it against other sources, or offer value added APIs. Services such as TransportAPI.
  • Aggregators: e.g. services that aggregate data from multiple portals to create specific value-added datasets, e.g. an aggregation of census data

In this case the Intermediaries and Aggregators will be supporting their own community of Reusers and Beneficiaries. This increases the number of ways in which value is exchanged.

The Copenhagen city data marketplace

The ecosystem around the city data marketplace is largely identical to the open data ecosystem. However, there are some important differences:

  • Steward: The city of Copenhagen is not the only Steward; the goal is to allow other organisations to publish their data via the marketplace. The marketplace will be multi-tenant.
  • Intermediary: the marketplace itself has become an intermediary, operated by Hitachi
  • The ecosystem will have a greater variety of Contributors, reflecting the wider variety of organisations contributing to the maintenance of those datasets.
  • Reusers and Beneficiaries will be present as before

In addition, because the marketplace offers paid access to data, there are other forms of value exchange, e.g. exchange of money for services (Reuser -> Intermediary).

But the marketplace explicitly rules out the Intermediary and Aggregator roles. Services like TransportAPI or Geolytix could not build their businesses against the city data marketplace. This is because the terms of use of the market prohibit onward distribution of data and the creation of potentially competitive services.

In an effort to create a more open platform to enable data sharing, the result has been to exclude certain types of value exchange and value-added services. The design of the ecosystem privileges a single Intermediary: in this case Hitachi as operator of the platform.

Time will tell whether this is an issue or not. But my feeling is that limiting certain forms of value creation isn’t a great basis for encouraging innovation.

An alternative approach would have been to design the platform to be part of the digital commons. For example, allowing Stewards the choice of adding data to the platform under an open licence would give space for other Intermediaries and Aggregators to operate.

Let me know if you think this type of analysis is useful!