What is an Open API?

I was reading a document this week that referred to an “Open API”. It occurred to me that I hadn’t really thought about what that term was supposed to mean before. Having looked at the API in question, it turned out it did not mean what I thought it meant. The definition of Open API on Wikipedia and the associated list of Open APIs are also both a bit lacklustre.

We could probably do with being more precise about what we mean by that term, particularly in how it relates to Open Source and Open Data. So far I’ve seen it used in several different ways:

  1. An API that is free for anyone to use: I think it would be clearer to refer to these as “Public APIs”. Some may require authentication, some may only have a limited free tier of usage, but the API is accessible to anyone who wants to use it.
  2. An API that is backed by open data: the data that is extracted by the API is covered by an open licence. A Public API isn’t necessarily backed by Open Data. While it might be free for me to use an API, I may be limited in how I can use the data by API terms and/or a non-open data licence that applies to the data.
  3. An API that is based on an open standard: the data available via an API might not be open, but the means of accessing and querying the data is covered by a specification that has been created by a standards body or has otherwise been openly published, e.g. the specification of the API is covered by an open licence. The important thing here is that the API could be (re-)implemented in an open source or commercial product without infringing on anyone’s rights or intellectual property. The specification of an API that serves open data isn’t necessarily open. A commercial vendor may provide a data publishing service whose API is entirely proprietary.

Personally I think an Open API is one that meets that final definition.

These are important distinctions and I’d encourage you to look at the APIs you’re using or the APIs you’re publishing and consider which category they fall into. APIs built on open source software typically fall into the third category: a reference implementation and API documentation are already in the open. It’s easy to create alternate versions, improve an existing code base, or run a copy of a service.

While the data in a platform may be open, lock-in (whether planned or otherwise) can happen when APIs are proprietary. This limits competition and the ability of both data publishers and consumers to choose other vendors. This is also one reason why APIs shouldn’t be the default for open government data: at some level the raw data should be portable and useful outside of whatever platform the organisation may choose to deploy. Ideally platforms aimed at supporting open government data publishing should be open source or should, at the very least, openly license their API documentation.

It’s about more than the link

To be successful the web sacrificed some of the features of hypertext systems, such as backwards linking and link integrity. One of the great things about the web is that it’s possible to rebuild some of those features, but in a distributed way. Different communities can then address their own requirements.

Link integrity is one of those aspects. In many cases link integrity is not an issue. Some web resources are ephemeral (e.g. pastebin snippets), but others, particularly those used and consumed by scholarly communities, need to be longer lived. CrossRef and other members of the DOI Foundation have for many years been successfully building linking services that attempt to provide persistent links to material referenced in scholarly research.

Yesterday Geoff Bilder published a great piece that describes what CrossRef and others are doing in this area, highlighting the different communities being served and the different features that the services offer. Just because something has a DOI doesn’t necessarily make it reliable, give any guarantees about its quality, or even imply what kind of resource it is; but it may have some guarantees around persistence.

Geoff’s piece highlights some similar concerns that I’ve had recently. I’m particularly concerned that there seems to be some notion that for something to be citeable it must have a DOI. That’s not true. For something to be citeable it just needs to be online, so people can point at it.

There may be other qualities we want the resource to have, e.g. persistence, but if your goal is to share some data, then get it online first, then address the persistence issue. Data and content sharing platforms and services can help there but we need to assess them against different criteria, e.g. whether they are good publishing platforms, and separately whether they can make good claims about persistence and longevity.

Assessing persistence means more than just assessing technical issues; it means understanding the legal and business context of the service. What are its terms of service? Does the service have any kind of long term business plan that means it can make viable claims about the longevity of the links it produces?

I recently came across a service called perma.cc that aims to bring some stability to legal citations. There’s a New York Times article that highlights some of the issues and the goals of the service.

The perma.cc service allows users to create stable links to content. The content that the links refer to is then archived so if the original link doesn’t resolve then users can still get to the archived content.

This isn’t a new idea: bookmarking services often archive bookmarked content to build personal archives; other citation and linking services have offered similar features that handle content going offline.

It’s also not that hard to implement. Creating link aliases is easy. Archiving content is less easy but is easily achievable for well-known formats and common cases: it gets harder if you have to deal with dynamic resources/content, or want to preserve a range of formats for the long term.
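
To make the "alias plus archived copy" idea concrete, here is a minimal sketch in Python. It is not how perma.cc or any other service is actually built: the `LinkArchive` class, its in-memory storage and the `fetch` callable are all hypothetical names introduced for illustration, and a real service would need durable storage and a real HTTP client.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ArchivedLink:
    original_url: str
    snapshot: bytes  # copy of the content taken when the alias was created


class LinkArchive:
    """Hypothetical in-memory alias store (illustration only)."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]]):
        # fetch(url) returns the content, or None if the URL no longer resolves.
        self._fetch = fetch
        self._links: Dict[str, ArchivedLink] = {}

    def create(self, url: str) -> str:
        """Archive the content behind url and return a short alias for it."""
        content = self._fetch(url)
        if content is None:
            raise ValueError(f"cannot archive unreachable URL: {url}")
        alias = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
        self._links[alias] = ArchivedLink(url, content)
        return alias

    def resolve(self, alias: str) -> bytes:
        """Return the live content if available, otherwise the snapshot."""
        link = self._links[alias]
        live = self._fetch(link.original_url)
        return live if live is not None else link.snapshot
```

Passing the fetcher in as a parameter keeps the sketch independent of any particular HTTP library. It also shows where the easy part ends: nothing here deals with dynamic content, format preservation or the right to keep the copy.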

It’s less easy to build stable commercial entities. It’s also tricky dealing with rights issues. Archival organisations often ensure that they have rights to preserve content, e.g. by having agreements with data publishers.

Personally I’m not convinced that perma.cc have nailed that aspect yet. If you look at their terms of service (PDF, 23rd Sept 2013), I think there are some problems:

You may use the service “only for non-commercial scholarly and research purposes that do not infringe or violate anyone’s copyright or other rights”. Defining “non-commercial” use is very tricky; it’s an issue with many open content and data licenses. One might argue that a publisher creating perma.cc links is using it for commercial purposes.

But I find Section 5 “User Submitted Content and Licensing” confusing. For example it seems to suggest that I either have to own the content that I am creating a perma.cc link for, or that I’ve done all the rights clearance on behalf of perma.cc.

I don’t see how that can possibly work in the general case, particularly as you must also grant perma.cc a license to use the content however they wish. If you’re trying to build perma.cc links to 3rd party content, e.g. many of the scenarios described in the New York Times article, then you don’t have any rights to grant them. Even if it’s published under an open content license you may not have all the rights they require.

They also reserve the right to remove any content, and presumably links, that they’re required to remove. From a legal perspective this makes some sense, but I’d be interested to know how that works in practice. For example will the perma.cc link just disappear or will there be any history available?

Perhaps I’m misunderstanding the terms (entirely possible) or the intended users of the service. I’d be interested in hearing any clarifications.

My general point here is not to be overly critical of perma.cc; I’m largely just confused by their terms. My point is that bringing permanence to (parts of) the web isn’t necessarily a technical issue to solve: it’s one that has important legal, social and economic aspects.

Signing up to a service to create links is easy. Longevity is harder to achieve.

Thoughts on Coursera and Online Courses

I recently completed my first online course (or “MOOC”) on Coursera. It was an interesting experience and I wanted to share some thoughts here.

I decided to take an online course for several reasons. Firstly the topic, Astrobiology, was fun and I thought the short course might make an interesting alternative to watching BBC documentaries and US TV box sets. I certainly wasn’t disappointed as the course content was accessible and well-presented. As a biology graduate I found much of the content was fairly entry-level, but it was nevertheless a good refresher in a number of areas. The mix of biology, astronomy, chemistry and geology was really interesting. The course was very well attended, with around 40,000 registrants and 16,000 active students.

The second reason I wanted to try a course was because MOOCs are so popular at the moment. I was curious how well an online course would work, in terms of both content delivery and the social aspects of learning. Many courses are longer and are more rigorously marked and assessed, but the short Astrobiology course looked like it would still offer some useful insights into online learning.

Clearly some of my experiences will be specific to the particular course and Coursera, but I think some of the comments below will generalise to other platforms.

Firstly, the positives:

  • The course material was clear and well presented
  • The course tutors appeared to be engaged and actively participated in discussions
  • The ability to download the video lectures, allowing me to (re)view content whilst travelling was really appreciated. Flexibility around consuming course content seems like an essential feature to me. While the online experience will undoubtedly be richer, I’m guessing that many people are doing these courses in spare time around other activities. With this in mind, video content needs to be available in easily downloadable chunks.
  • The Coursera site itself was on the whole well constructed. It was easy to navigate to the content, tests and the discussions. The service offered timely notifications that new content and assessments had been published
  • Although I didn’t use it myself, the site offered good integration with services like Meetup, allowing students to start their own local groups. This seemed like a good feature, particularly for longer running courses.

However there were a number of areas in which I thought things could be greatly improved:

  • The online discussion forums very quickly became unmanageable. With so many people contributing, across many different threads, it’s hard to separate the signal from the noise. The community had some interesting extremes: people associated with the early NASA programme, through to alien contact and conspiracy theory nut-cases. While those particular extremes are peculiar to this course, I expect other courses may experience similar challenges
  • Related to the above point, the ability to post anonymously in forums led to trolling on a number of occasions. I’m sensitive to privacy, but perhaps pseudonyms may be better than anonymity?
  • The discussions are divorced from the content, e.g. I can’t comment directly on a video; I have to create a new thread for it in a discussion group. I wanted to see something more sophisticated, maybe SoundCloud style annotations on the videos or per-video discussion threads.
  • No integration with wider social networks: there were discussions also happening on Twitter, G+ and Facebook. Maybe it’s better to just integrate those, rather than offer a separate discussion forum?
  • Students consumed content at different rates which meant that some discussions contained “spoilers” for material I hadn’t yet watched. This is largely a side-effect of the discussions happening independently from the content.
  • Coursera offered a course wiki but this seemed useless
  • It wasn’t clear to me during the course what would happen to the discussions after the course ended. Would they be wiped out, preserved, or would later students build on what was there already? Now that it’s finished it looks like each course is instanced and discussions are preserved as an archive. I’m not sure what the right option is there. Starting with a clean slate seems like a good default, but can particularly useful discussions be highlighted in later courses? It seems like the course discussions would be an interesting thing to mine for links and topics, especially for lecturers

There are some interesting challenges with designing this kind of product. Unlike almost every other social application the communities for these courses don’t ramp up over time: they arrive en masse at a particular date and then more or less evaporate overnight.

As a member of that community this makes it very hard to identify which people in the community are worth listening to and who to ignore: all of a sudden I’m surrounded by 16,000 people all talking at once. When things ramp up more slowly, I can build out my social network more easily. Coursera doesn’t have any notion of study groups.

I expect the lecturers must have similar challenges as very quickly they’re faced with a lot of material that they might have to potentially read, review and respond to. This must present challenges when engaging with each new intake.

While a traditional discussion forum might provide the basic infrastructure for enabling the necessary basic communication, MOOC platforms need to have more nuanced social features — for both students and lecturers — to support the community. Features that are sensitive to the sudden growth of the community. I found myself wanting to find out things like:

  • Who is posting on which topics and how frequently?
  • Which commentators are getting up-voted (or down-voted) the most?
  • Which community members are at the same stage in the course as me?
  • Which community members have something to offer on a particular topic, e.g. because of their specific background?
  • What links are people sharing in discussions? Perhaps filtered by users.
  • What courses are my fellow students undertaking next? Are there shared journeys?
  • Is there anyone watching this material at the same time?

Answering all of these requires more than just mining discussions, but it feels like some useful metrics could be derived nevertheless. For example, one common use of the forums was to share additional material, e.g. recent news reports, scientific papers, YouTube videos, etc. That kind of content could either be collected in other ways, e.g. via a shared reading list, or as a list that is automatically surfaced from discussions. I ended up sifting through the forums and creating a reading list on readlists, as well as a YouTube playlist just to see whether others would find them useful (they did).
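
As a rough illustration of surfacing shared links automatically, the sketch below counts how many distinct people posted each URL. It assumes forum posts are available as simple (author, text) pairs, which is my own simplification and not something Coursera provides in this form; the `shared_links` function name is likewise hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Deliberately simple pattern: anything that starts like a web address
# and runs up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def shared_links(posts: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank URLs by the number of distinct authors who shared them."""
    sharers: Dict[str, Set[str]] = {}
    for author, text in posts:
        for url in URL_PATTERN.findall(text):
            # Drop punctuation that trails the link in running text.
            url = url.rstrip(".,;:!?)\"'")
            sharers.setdefault(url, set()).add(author)
    ranked = [(url, len(authors)) for url, authors in sharers.items()]
    # Most widely shared first; ties are broken alphabetically by URL.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Counting distinct authors rather than raw mentions is one way to stop a single enthusiastic poster from dominating the list; filtering by user, as suggested above, would be a small extension of the same dictionary.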

All of these challenges we can see playing out in wider social media, but with a MOOC they’re often compressed into relatively short time spans.

(Perhaps inevitably) I also kept thinking that much of the process of creating, delivering and consuming the content could be improved with better linking and annotation tools. Indeed, do we even need specialised MOOC platforms at all? Why not just place all of the content on services like YouTube, ReadLists, etc.? Isn’t the web our learning infrastructure?

Well I think there is a role for these platforms. The role in certification — these people have taken this course — is clearly going to become more important, for example.

However I think their real value is in marking out a space within which the learning experience takes place: these people are taking this content during this period. The community needs a focal point, even if it’s short-lived.

If everything was just on the web, with no real definition to the course, then that completely dissolves the community experience. By concentrating the community into time-boxed, instanced courses, it creates focus that can enrich the experience. The challenge is balancing unwieldy MOOC “flashmobs” against a more diffused internet community.

Google AppEngine for Personal Web Presence?

Some thinking aloud…
I’ve browsed through the Google App Engine gallery and the applications you can find there at the moment are pretty much what you’d expect: lots of Web 2.0 “share this, share that” sites. These are what you’d expect because firstly they’re the kind of simple application you’d build whilst exploring any new environment. Secondly because they’re exactly the kind of sites that are currently being released every which way you turn.
But for me App Engine is intriguing as it might provide an interesting new perspective on distributing shrink-wrapped packaged software. When Google take the lid off of the number of sign-ups, it’s going to be a simple matter for anyone to have their own App Engine environment. Forget cheap web hosting and the expense and configuration overhead that that entails: just sign up for an App Engine account.
App Engine has the potential to provide an enormous number of people with a well-documented stable environment into which an application can be deployed.
It will be interesting to see if anyone seizes on App Engine as an opportunity to create a simple personal application that combines elements of all of the Web 2.0 favourites: bookmarks, blogging, calendar, photos, travel, and perhaps an OpenID provider. One that makes me the administrator of all of my own data, but doesn’t scrimp on the options for other people to harvest, syndicate and browse what I’m uploading.
At the moment our online identities start out fragmented, because we have to push data into a number of different services. And then we strive for ways to bring that data together and knit it into other sites that we, or our social network, use.
But why not turn this on its head, and seize on App Engine as a way to avoid this early fragmentation and instead start out with a centralized, personal web presence; but one which seamlessly integrates with data in other spaces? The potential is in open data, and services that are built around it. So why aren’t we managing our own open data repositories and letting others offer us services against particular aspects of it?
The App Engine environment doesn’t involve any configuration on behalf of the end user, and I suspect you could probably create an App Engine Deployer using App Engine itself. So sign-up, deployment and upgrades could also be pretty straight-forward. Python seems well suited for creating a simple modular web application that could be extended to cover new areas as users needed.
Instead of using lots of different web applications, we can each have our own modular web application that is intimately linked into the web, and becomes the primary repository for the data you want on the web. Data portability follows from the fact that you’d be the administrator of your own data.
This would also change the nature of the kinds of applications that we’d need elsewhere on the web. Instead of lots of specialist databases, we need more generic services and more community/local/temporary aggregations.

Embracing the Wiki Way: Deploying a Corporate Wiki

This article originally appeared in Freepint Newsletter 210.

Wikis, currently one of the biggest buzzwords in online publishing, helped solve a problem for my company, Ingenta. We needed to share information between the research and engineering departments, and we needed a simple tool to manage our rapidly growing set of references on key research initiatives and topics relevant to Ingenta’s core business area: offering technology services to academic publishers. I had created a wiki for myself to support my research and development role back then, and it seemed natural to expand it into an intranet alternative that allowed Ingenta’s users to edit and contribute to content collectively.

Now, four years later, Ingenta’s wiki is extremely popular. It has grown from one department to the entire company. We have even created wikis to interact with our clients as an easy means of sharing information.

Many companies are exploring the use of wiki environments, pressing them into service behind the firewall as a way to capture knowledge and improve communications within a business. Creating successful social software systems isn’t an exact science. Case studies and experience reports provide essential background when considering the success factors.

Deploying a wiki involves more than just selecting and installing an appropriate software package. They’re quite different beasts to the typical enterprise groupware or intranet application. They eschew rigid notions of hierarchy and permissions, letting users quickly create and shape a knowledge-sharing environment that supports them. Wikis are social software. Creating a wiki environment is as much of an exercise in community building as it is in software installation.

With this in mind, the first section of this article outlines how I introduced the Ingenta wiki. My aim is to present some tips to help other organisations deploy a corporate wiki, and to give them advice on creating a wiki culture.

Establishing need: the Ingenta corporate wiki

Having undergone rapid growth through several acquisitions and a major re-engineering project that resulted in a new platform for our core products, Ingenta needed a way to quickly capture and share knowledge. Turnover of contract staff necessitated a good knowledge-capture environment. The infrastructure to support these needs had not grown as rapidly as the company itself. Information was often in silos created by various teams using different tools and technologies. Grander visions for a corporate groupware solution were still on the horizon, but the engineering department needed something more immediate.

The idea of a wiki environment especially for the engineering team was natural. Already comfortable with web-based environments, they were also capable of installing and maintaining their own wiki. But while their ability to quickly learn the wiki functionality certainly contributed to the rapid success of the experiment, the more critical issue was that the wiki met immediate needs.

The environment worked well across increasingly distributed teams. The barrier to entry to contribute to the wiki was very low; documentation could be added and maintained very easily. Finally, the team already had a need to pass documentation around for review and sign-off. Requesting and incorporating changes became much easier as the wiki captured discussions directly rather than letting them get lost in email. Reviewers could correct text and check revisions using the wiki change history.

Expanding the experiment

The wiki became a formal part of the engineering process after its initial success. All deliverables are now authored directly as wiki pages. Engineers use wiki pages to list current work priorities and capture the requirements for each project and incorporate release and testing documentation. The wiki also links to other internal tools and information sources. For example, release documentation links directly to our web-based bug-tracking system.

The initial growth of the wiki was almost viral. With little evangelism, the tool gradually expanded its user base to the rest of the company. It became natural for other departments, such as product management, to begin using the wiki. Users required little training to get started, since writing a wiki is as easy as writing an email. They also increasingly used the wiki as a daily resource, as the content was already closely aligned to many existing business processes.

Knitting together other sources of information using the wiki proved simple. For example, our shared network folders are web accessible, as are a number of disparate tools and documentation. It was easy to create an intranet page in the wiki and link to these resources, creating a simple resource directory.

Today the wiki is actively used by every department, with the exception of finance. Perhaps I can tempt them away from their spreadsheets with wikiCalc <http://www.softwaregarden.com/wkcalpha/>! A reasonable number of users actively contribute new content and update existing documentation, while a larger group of users simply use it as a reference resource.

We’re now evaluating whether we’ve outgrown our current wiki platform and are looking at possible alternatives.

Choose your wiki

The obvious first step is to select some wiki software to use. The two biggest features I consider essential in a wiki are version tracking and search. Strong search facilities become particularly important once your wiki reaches a certain size.

In all, there’s a huge number of different implementations <http://en.wikipedia.org/wiki/Wiki_software/> to choose from. These range from simple no-frills versions to complete content-management systems. We opted for JSPWiki <http://www.jspwiki.org/>, as it meshed well with our existing technology platform. Another popular wiki is MediaWiki <http://mediawiki.org/>, which currently supports the Wikipedia sites and has an active user community.

There is also an increasing range of enterprise wikis such as Socialtext <http://www.socialtext.com/>, Confluence <http://www.atlassian.com/software/confluence/> and JotSpot <http://www.jot.com/>. Each offers a good range of features and commercial support options. You’ll need to take time to evaluate and experiment with a few different options. Migration between platforms isn’t always easy, as many wikis differ in features and syntax.

Build your community

Next you need to start building your wiki community. Start small. Focus on one or two teams initially. The wiki will need shepherding through its infancy, so nominate someone as a champion to help train staff members and guide them on how to get the best from the environment.

The best training exercise is to simply encourage users to wade in and start writing pages. We initially promoted a ‘sandbox’, or personal homepage, as a safe environment to play with wiki editing. More recently we’ve been encouraging new joiners to create their initial wiki page as part of their induction. This gives them familiarity with the tool from day one.

You’ll find that many users don’t always feel comfortable with editing existing content. Using a sandbox lets them build confidence before embarking on contributing to the main content.

One technique to introduce users to aspects of the wiki syntax and subtly encourage the view that the wiki is a shared environment is to edit someone’s homepage yourself. For example, I might tweak the page to make their email address a hyperlink, or just improve the display of their personal information. Letting them know that anyone can freely edit and tidy the information in a wiki is the most important point for users to grasp. It’s also the one that takes the longest to learn.

Stay relevant

Attempt to find or suggest ways for your initial community to usefully apply the wiki. Ensuring that the wiki meets a need and has relevant content will encourage sustained usage. Here are a few of the different ways that I’ve observed the wiki being used at Ingenta:

  • As a user directory. Most of our staff have a personal homepage including their contact details and current assignments
  • As a personal notebook to capture to-do lists or useful personal notes
  • For recording minutes of meetings. Rather than write up and circulate meeting notes by email, we often now make notes directly into the wiki
  • Managing information on clients, both current and prospective
  • Brainstorming new product features
  • Publishing documentation for both internal users and external clients
  • Capturing technical documentation on our products and services
  • Creating glossaries of terms. Every company and industry has jargon; we often define terms as separate pages in the wiki, enabling links to be added to documentation for clarification.

Avoid attachments

Some die-hard users insist they can’t possibly live without a word processor and say that a means to attach Word documents or spreadsheets to wiki pages is an essential requirement. Attachments are a useful feature for attaching diagrams or additional documentation to a page, but you should discourage overuse of attachments. If the useful content is in an attachment, then it’s not in the wiki and not easily editable. That’s not the wiki way.

Lay down pathways

Initially, we divided our wiki into people and projects. Pages were also introduced for teams and departments. These pages provided a basic organizing principle that became the primary means of navigating through the wiki. A similar structure would work in any corporate wiki.

However, these initial pathways provide more than just navigation. A wiki grows by people adding new links and pages to existing content. Your initial structure provides a cue as to where new content could or should be added. By introducing this structure from the start, you’ll help avoid the wiki deteriorating into a morass of interlinked pages.

As your wiki grows you’ll need to continue to organize it to reflect the needs of users and the growing body of content.

Employ a gardener

‘Wiki gardening’ <http://c2.com/cgi/wiki?WikiGardener> is a phrase used to describe tending a wiki to ensure that it stays fresh and remains navigable. Your wiki will need a gardener. During the early stages of deployment, you’ll manage with just a single ‘WikiMaster’. His or her role will be to lay down some of the initial pathways, tidy up pages and ensure content stays relevant. As the wiki grows, this role will become more than a one-person job. At Ingenta a number of my colleagues quickly embraced the wiki and became good WikiCitizens <http://c2.com/cgi/wiki?WikiCitizen>.

Typical wiki gardening tasks include:

  • Tidying up and re-formatting pages to ensure they’re readable
  • Helping ensure content is up-to-date
  • Checking for orphaned pages that aren’t usefully connected into the main web of pages
  • Breaking up long pages into smaller more manageable and useful chunks
  • Identifying useful content to be contributed
  • Promoting the wiki within their own department or team
  • Renaming pages to better reflect their contents.
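
Some of these tasks can be partly automated. As one hedged example, the Python sketch below finds orphaned pages by walking the link graph outwards from a set of starting pages. It assumes you can export the wiki’s links as a dictionary mapping each page name to the pages it links to; that export format and the `orphaned_pages` function are illustrative assumptions, not features of JSPWiki or any other wiki package.

```python
from typing import Dict, Iterable, List, Set


def orphaned_pages(links: Dict[str, Iterable[str]], roots: Iterable[str]) -> Set[str]:
    """Return pages that cannot be reached by following links from the roots."""
    reachable: Set[str] = set()
    # Only start from root pages that actually exist in the wiki.
    pending: List[str] = [page for page in roots if page in links]
    while pending:
        page = pending.pop()
        if page in reachable:
            continue
        reachable.add(page)
        # Ignore links that point at pages which have not been created yet.
        pending.extend(target for target in links[page] if target in links)
    return set(links) - reachable
```

For example, with `links = {"Main": ["People"], "People": [], "OldNotes": ["People"]}` and `roots = ["Main"]`, the only page reported is `OldNotes`: it links out to the rest of the wiki, but nothing a reader would find from the front page links to it.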

Ideally, your wiki gardeners will emerge naturally, but you can actively recruit them from individual departments. The idea isn’t to delegate maintaining the wiki to a small team of users; it’s more about community building. It’s essential for the user community to take ownership for their own content, and, most importantly, for other people’s content. This is one important difference between a wiki and traditional groupware.

Naming is everything

Naming is important in a wiki. Try to encourage good naming or navigation will suffer. Page names should reflect their content. Avoid use of abbreviations, acronyms, etc. Wikis work very well with CamelCaseNamesLikeThis. Many wiki installations will automatically generate links from CamelCase words to the appropriately named page in the wiki.

With good naming you can write sentences like the following, and they will not only be readable, but also magically gain links to the relevant documentation:

When ConfiguringTheServer don’t forget to DeployTheWidget; if you need a reference read HowToStartTheApplication.

Or, perhaps:

We’re maintaining a list of CurrentClients and CurrentCompetitors. Delivery dates for ForthcomingProducts can be found in the ReleaseSchedule.

Naming conventions are also a good way to indicate that pages are related in some way. For example we often use a project’s name as a prefix for pages, e.g. ProjectNameOverview and ProjectNameReleases, or for user specific pages: LeighDoddsCalendar.
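
For readers curious about how automatic linking of this kind might work, here is a minimal Python sketch. The regular expression, the `link_camel_case` function and the `/wiki/` URL prefix are assumptions made for illustration; actual wiki engines each have their own rules about what counts as a CamelCase word.

```python
import re

# Two or more capitalised parts run together, e.g. ReleaseSchedule.
# Single capitalised words and all-caps acronyms are left alone.
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def link_camel_case(text: str, base_url: str = "/wiki/") -> str:
    """Wrap each CamelCase word in an HTML link to the page of the same name."""

    def to_link(match: "re.Match[str]") -> str:
        name = match.group(0)
        return f'<a href="{base_url}{name}">{name}</a>'

    return CAMEL_CASE.sub(to_link, text)
```

Calling `link_camel_case("Delivery dates can be found in the ReleaseSchedule.")` leaves "Delivery" untouched, because it has only one capitalised part, and turns "ReleaseSchedule" into a link to `/wiki/ReleaseSchedule`.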

Avoiding a wiki explosion

If your wiki starts to become successful and other departments or teams embrace it, you may find yourself faced with a request that users need a wiki for their department only. Just say no!

If you create many small wikis then you inevitably recreate the kind of content silos that you’re undoubtedly trying to replace. Provide guidance on how users can create pages targeted for their own department, perhaps adopting a naming convention as outlined above. Explain that this will be less effective than a single inter-linked knowledge base. For example the content will not be cross-searchable.

Wherever we’ve deployed smaller per-department or per-team wikis they’ve rapidly grown stale, either because there wasn’t enough content or because users were already contributing to another wiki and naturally continued to add content there. In almost all cases we’ve ended up shutting them down.

The only occasion I’ve found when a separate wiki is not only useful but essential is when it’s shared with people outside the firewall. We’ve used a wiki at Ingenta as a way to share documentation with clients. It wouldn’t be appropriate for clients to have access to our main corporate wiki, so a separate installation works better.

Within an organisation, ensuring people share information requires extra work – anywhere from 30 minutes to a whole day. But I know I’m not the only one keeping an interested eye on the RecentChanges page on our wiki to see what’s happening elsewhere in the company.

Hopefully this article has provided some useful pointers that will help you explore the potential of your own corporate wiki. I’ve found it fascinating to see how a wiki environment can facilitate sharing and contribution amongst teams, as well as providing a low-cost and simple way of capturing knowledge within an organisation.

Embracing the Wiki Way

I was recently invited to write an article for the FreePint newsletter. The article “Embracing the Wiki Way: Deploying a Corporate Wiki” is now available. It serves as an update to my blog entry on bootstrapping a corporate wiki with more of an emphasis on tips for users.
If you’re thinking about deploying a corporate wiki then hopefully the article may help you plan your strategy. There are also plenty more case studies available for you to compare with.