When I’m discussing business models around open data I regularly refer to a few different examples. Not all of these have well-developed case studies, so I thought I’d start trying to capture them here. In this first write-up I’m going to look at Discogs.
In an attempt to explore a few different aspects of the service I’m going to:
- describe the service: a brief background on Discogs, what it is and how it has developed
- characterise its data ecosystem using the roles outlined in “Beyond Publishers and Consumers”
- review it as a data infrastructure, using the model I outlined in “Sustainable Open Data Curation Projects”, the ODI’s definition and Elinor Ostrom’s principles.
How well that will work I don’t know, but let’s see!
Discogs: the service
Discogs is a crowd-sourced database about music releases: singles, albums, artists, etc. The service was launched in 2000. In 2015 it held data on more than 6.6 million releases. As of today there are 7.7 million releases. That’s 30% growth in 2014-15 and around 16% growth in 2015-16. The 2015 report and this Wikipedia entry contain more details.
The database has been built from the contributions of over 300,000 people. That community has grown about 10% in the last six months alone.
The database has been described as one of the most exhaustive collections of discographical metadata in the world.
The service has been made sustainable through its marketplace, which allows record collectors to buy and sell releases. As of today there are more than 30 million items for sale. A New York Times article from last year explained that the marketplace was generating 80,000 orders a week and was on track to do $100 million in sales, of which Discogs takes an 8% commission.
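As a rough sanity check on the figures quoted so far, here is a minimal sketch of the arithmetic. It assumes the $100 million figure is an annual total and that the weekly order rate holds steady across a 52-week year; neither assumption comes from the sources, and the variable names are purely illustrative.

```python
def pct_growth(old: float, new: float) -> float:
    """Percentage growth from an old value to a new one."""
    return (new - old) / old * 100


# Figures quoted in the text above
releases_2015 = 6.6e6   # "more than 6.6 million releases" in 2015
releases_now = 7.7e6    # "7.7 million releases" at the time of writing
growth = pct_growth(releases_2015, releases_now)

annual_sales = 100e6    # assumption: "$100 million in sales" is per year
commission_rate = 0.08  # the 8% marketplace commission
commission = annual_sales * commission_rate

orders_per_week = 80_000
# Assumption: the weekly order rate is constant over 52 weeks
orders_per_year = orders_per_week * 52
avg_order_value = annual_sales / orders_per_year

print(f"Release growth: {growth:.1f}%")
print(f"Implied commission revenue: ${commission:,.0f}")
print(f"Implied average order value: ${avg_order_value:.2f}")
```

On those assumptions the growth works out at a little under 17%, the commission at around $8 million, and the average order at roughly $24. These are back-of-the-envelope estimates, not reported figures.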
The company has grown from a one-man operation to 47 employees around the world, and the website has 20 million visitors a month and over 3 million registered users. With over 300,000 contributors, that means roughly 10% of registered users, or about 1.5% of monthly visitors, also contribute to the database.
In 2007 Discogs added an API to allow anyone to access the database. Initially the data was made available under a custom data licence which included attribution and no derivatives clauses. The latter encouraged reusers to contribute to the core database, rather than modify it outside of the system. This licence was rapidly dropped (within a few months, as far as I can tell) in favour of a public domain licence. This has subsequently transitioned to a Creative Commons CC0 waiver.
The API has gone through a number of iterations. Over time the requirement to use API keys has been dropped, rate limits have been lifted and since 2008 full data dumps of the catalogue have been available for anyone to download. In short the data has been increasingly open and accessible to anyone that wanted to use it.
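To give a flavour of what reusing the data through the API might look like, here is a minimal sketch. The base URL, the `/releases/{id}` path, the need for a descriptive User-Agent header and the JSON field names are my assumptions about the public API and may have changed, so check the current documentation before relying on them. The sample response is hand-written for illustration and is not real Discogs data.

```python
import json
import urllib.request

API_ROOT = "https://api.discogs.com"  # assumed base URL


def build_release_request(release_id: int,
                          app_name: str = "CaseStudyExample/0.1") -> urllib.request.Request:
    """Prepare (but do not send) a GET request for a single release record."""
    url = f"{API_ROOT}/releases/{release_id}"
    # Assumption: the service expects a descriptive User-Agent header
    return urllib.request.Request(url, headers={"User-Agent": app_name})


def summarise_release(payload: str) -> dict:
    """Extract a few fields from a JSON release document (field names assumed)."""
    data = json.loads(payload)
    return {
        "title": data.get("title"),
        "year": data.get("year"),
        "artists": [artist.get("name") for artist in data.get("artists", [])],
    }


# Hypothetical response body, written by hand for this example
sample = '{"title": "Example Release", "year": 2000, "artists": [{"name": "Example Artist"}]}'
summary = summarise_release(sample)
print(summary)

request = build_release_request(1)
print(request.full_url)
# To actually fetch the record you could call:
# urllib.request.urlopen(request).read().decode("utf-8")
```

The request is only constructed here, not sent, so the sketch runs without network access. For bulk analysis the full data dumps are likely to be a better fit than making many individual API calls.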
Wikipedia lists a number of pieces of music software that use the data. In May 2012 Discogs and The Echo Nest announced a partnership which would see the Discogs database incorporated into Echo Nest’s Rosetta Stone product, which was being sold as a “big data” product to music businesses. It’s unclear to me if there’s an ongoing relationship. But The Echo Nest were acquired by Spotify in 2014 and have a range of customers, so we might expect that the Discogs data is being used regularly as part of their products.
Discogs: the data ecosystem
Looking at the various roles in the Discogs data ecosystem, we can identify:
- Steward: Discogs is a service operated by Zink Media, Inc. They operate the infrastructure and marketplace.
- Contributors: The team of volunteers curating the website as well as the community support and leaders on the Discogs team
- Reusers: The database is used in a number of small music software applications and potentially by other organisations like Echo Nest and their customers. More work is required to understand this aspect
- Aggregator: Echo Nest aggregates data from Discogs and other services, providing value-added services to other organisations on a commercial basis. Echo Nest in turn support additional reusers and applications.
- Beneficiaries: Through the website, the information is consumed by a wide variety of enthusiasts, collectors and music stores. A larger network of individuals and organisations is likely supported through the APIs and aggregators
Discogs: the data infrastructure
To characterise the model we can identify:
- Assets: the core database is available as open data. Most of this is available via the data dumps, although the API also exposes some additional data and functionality, including user lists and marketplace entries. It’s not clear to me how much data is available on the historical pricing in the marketplace. This might not be openly available, in which case it would be classified as shared data available only to the Discogs team.
- Community: the Contributors, Reusers and Aggregators are all outlined above
- Financial Model: the service is made sustainable through the revenue generated from the marketplace transactions. Interestingly, originally the marketplace wasn’t a part of the core service but was added based on user demand. This clearly provided a means for the service to become more sustainable and supported growth of staff and office space.
- Licensing: I wasn’t able to find any details on other partnerships or deals, but the entire data assets of the business are in the public domain. It’s the community around the dataset and the website that has meant that Discogs has continued to grow whilst other efforts have failed
- Incentives: as with any enthusiast driven website, the incentives are around creating and maintaining a freely available, authoritative resource. The marketplace provides a means for record collectors to buy and sell releases, whilst the website itself provides a reference and a resource in support of other commercial activities
Exploring Discogs as a data infrastructure using Ostrom’s principles, we can ask:
- Does the community have clearly defined boundaries?
  - Yes. The community is clearly defined. There are users and there are contributors
  - The mission of Discogs as a whole is clearly articulated
  - The community is public as is the level of their contributions
- Are there rules around how the community resources are used?
  - The data is openly licensed
  - There is clear guidance for contributors including full guidance on cataloguing different types of data
  - There is reasonably good API documentation and a dedicated support forum
- Are there arrangements to support decision making?
- Is there effective monitoring by moderators who are accountable to the community?
  - There is a community support team that provides support
  - The team documents their values and their social contract to the community.
  - The contribution system has evolved over time
  - The site publishes its statistics and there is a transparent annual report on growth of the dataset.
- Are there clear sanctions for abuse or misuse?
  - There is a code of conduct for contributions
  - There is guidance on conflict resolution in the marketplace
  - Accounts can be suspended on a temporary or permanent basis.
- Are there simple easy approaches to conflict resolution?
  - Conflicts and community discussions are all managed via public forums and marketplace order pages
- Is the self-determination of the community recognised more widely?
  - The dataset and community have a clear legal owner who is able to contract services, staff, etc as required
- How does it interact with other initiatives?
  - Not clear, although there is a great deal of transparency and openness on display
While it is hard to assess any community from the outside, the fact that both the marketplace and contributor communities are continuing to grow suggests that these measures are working.
I’ll leave this case study with the following great quote from Discogs’ founder, Kevin Lewandowski:
“See, the thing about a community is that it’s different from a network. A network is like your Facebook group; you cherrypick who you want to live in your circle, and it validates you, but it doesn’t make you grow as easily. A web community, much like a neighborhood community, is made up of people you do not pluck from a roster, and the only way to make order out of it is to communicate and demonstrate democratic growth, which I believe we have done and will continue to do with Discogs in the future.”
If you found this case study interesting and useful, then let me know. It’ll encourage me to do more. I’m particularly interested in your views on the approach I’ve taken to capture the different aspects of the ecosystem, infrastructure, etc.