At MW08 there were the beginnings of a push amongst the technically oriented for the development of APIs for museum data, especially collections. Driven in part by discussions and early demonstrations of semantic web applications in museums, the conceptual work of Ross Parry, and the presence of Eric Miller and Brian Sletten of Zepheira and Aaron Straup Cope and George Oates of Flickr, MW08 might well be a historic turning point for the sector in terms of data interoperability and experimentation.
Since April there has been a lot of movement, especially in the UK.
The ‘UK alpha tech team’ of Mike Ellis, Frankie Roberto, Fiona Romeo, Jeremy Ottevanger and Mia Ridge is leading the charge, all working on new ways of connecting, extracting and visualising data from the Science Museum, Museum of London and the National Maritime Museum. Together with them and a few other UK commercial sector folk, I’ve been contributing to a strategy wiki around making a case for APIs in museums.
Whilst the tech end of things is (comparatively) straightforward, the strategic case for an API is far more complex to make. As we fiddle, though, others make significant progress.
Already a community project, DBpedia, has taken the content of Wikipedia and made it available as an open database. What this means is that it is now possible to make reasonably complex semantic queries of Wikipedia – something I’m yet to see done on a museum collection. There is a whole range of examples and mini-web applications already built to demonstrate queries like “people born in Paris” or “people influenced by Nietzsche”. More exciting still are the opportunities to combine Wikipedia’s data with other datasets.
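To give a feel for what such a semantic query looks like, here is a minimal sketch in Python that builds the two example queries as SPARQL. It only constructs the query text and a request URL and does not contact any server. The endpoint address and the property names (dbo:birthPlace, dbo:influencedBy) are my assumptions based on DBpedia's current public vocabulary, not something taken from the demos themselves, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint (not named in the post itself).
ENDPOINT = "https://dbpedia.org/sparql"


def build_query(property_uri, resource_uri, limit=20):
    """Return SPARQL text selecting people linked to a resource by a property."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?person WHERE {\n"
        "  ?person a dbo:Person ;\n"
        f"          <{property_uri}> <{resource_uri}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query):
    """Encode the query as a GET request URL (standard SPARQL 'query' parameter)."""
    return ENDPOINT + "?" + urlencode({"query": query})


# "People born in Paris" (property and resource URIs are assumptions).
born_in_paris = build_query(
    "http://dbpedia.org/ontology/birthPlace",
    "http://dbpedia.org/resource/Paris",
)

# "People influenced by Nietzsche" (same pattern, different property).
influenced_by_nietzsche = build_query(
    "http://dbpedia.org/ontology/influencedBy",
    "http://dbpedia.org/resource/Friedrich_Nietzsche",
)

print(born_in_paris)
print(build_request_url(influenced_by_nietzsche))
```

The point is less the syntax than the shape: both questions are the same pattern with a different property plugged in, which is exactly what a collection database with consistent relationships would make possible.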
What should be very obvious is that if Wikipedia’s dataset is made openly available for combining with other datasets then, much as Wikipedia already draws audiences away from museum sites, that dataset, made usable in other ways, will draw even more away. You might well ask why similar complex queries are so hard to make in our own collection databases: “Show me all the artwork influenced by Jackson Pollock”?
On June 19 the MCG’s Museums on the Web UK takes place at the University of Leicester with the theme of “Integrate, federate, aggregate”. There are going to be some lovely presentations there – I expect Fiona Romeo will be demoing some lovely work they’ve been doing and Frankie Roberto will be reprising his highly entertaining MW08 presentation too.
As with the MCGUK07 conference, there will be a mashup day the day before. Last year’s mashup day produced a remarkable number of quick working prototypes drawing on data sources provided by the 24 Hour Museum (now Culture24). This year the data looks like it will be coming from the collection databases of some of the UK nationals.
Already Box UK and Mike Ellis have whipped up a really nice demonstration of data combining – done by scraping the websites of the major museums with a little bit of PHP code. Even better, the site provides XML feeds and I expect that it will be a major source of mashups at MCG UK.
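For anyone planning to build on feeds like these at the mashup day, consuming them takes very little code. The sketch below is only illustrative: I have not reproduced the site's actual feed format, so the element names (items, item, title, source, link) and the sample records are placeholders I made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout with placeholder records; the real feed will differ.
SAMPLE_FEED = """<items>
  <item>
    <title>Placeholder object A</title>
    <source>Museum One</source>
    <link>http://example.org/a</link>
  </item>
  <item>
    <title>Placeholder object B</title>
    <source>Museum Two</source>
    <link>http://example.org/b</link>
  </item>
  <item>
    <title>Placeholder object C</title>
    <source>Museum One</source>
    <link>http://example.org/c</link>
  </item>
</items>"""


def titles_by_source(xml_text):
    """Group item titles by the museum each record came from."""
    grouped = {}
    for item in ET.fromstring(xml_text).iter("item"):
        source = item.findtext("source", default="unknown")
        title = item.findtext("title", default="")
        grouped.setdefault(source, []).append(title)
    return grouped


print(titles_by_source(SAMPLE_FEED))
```

Note that the source field travels with every record, which is what lets a third-party application credit, and link back to, the museum the object came from.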
I like the FAQ that goes along with the site. Especially this –
Q: Doesn’t this take traffic away from the individual sites?
We don’t think so, but not many studies have been done into how “off-site” browsing affects the “in-site” metrics. Already, users will be searching for, consuming, and embedding your images (and other content) via aggregators such as Google Images. This is nothing new.
Also, ask yourself how much of your current traffic derives from users coming to explicitly browse your online collections?
The aim is that by syndicating your content out in a re-usable manner, whilst still retaining information about its source, an increasing number of third-party applications can be built on this data, each addressing specific user needs. As these applications become widely used, they drive traffic to your site that you otherwise wouldn’t have received: “Not everyone who should be looking at collections data knows that they should be looking at collections data”.
I’ve spoken and written about this issue of metrics previously, and these and the control issues need to be sorted out if there is going to be any real traction in the sector.
Unlike the New York Times (which apparently announced an API recently), and the notable commercial examples like Flickr, the museum sector doesn’t have a working (business) model for its collections other than a) exhibitions, b) image sales and possibly c) research services.
Now, back to that semantic query. Wouldn’t it be useful if we could do this – “Play me all the music videos of singles that appear on albums whose record cover art was influenced by Jackson Pollock”? This could, of course, be done by combining the datasets of, say, the Tate, Last.FM, Amazon and YouTube – the missing link being the Tate.
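The query above is really just a chain of lookups across four datasets, and can be sketched as follows. Everything in this sketch is a stand-in: the dictionaries are invented placeholder records rather than real data from any of the services named, and each one simply marks where a call to that service's API would go.

```python
# All records are invented placeholders, not real museum or music data.

# Stand-in for a museum collection API: artist -> cover artworks they influenced.
artworks_influenced_by = {"Jackson Pollock": ["cover-art-1", "cover-art-2"]}

# Stand-in for a retail catalogue: cover artwork -> album.
album_by_cover = {"cover-art-1": "album-1", "cover-art-2": "album-2"}

# Stand-in for a music metadata service: album -> singles.
singles_by_album = {
    "album-1": ["single-1a", "single-1b"],
    "album-2": ["single-2a"],
}

# Stand-in for a video service: single -> music video (not every single has one).
video_by_single = {"single-1a": "video-1a", "single-2a": "video-2a"}


def videos_for_artist(artist):
    """Follow artist -> cover art -> album -> singles -> videos."""
    videos = []
    for cover in artworks_influenced_by.get(artist, []):
        album = album_by_cover.get(cover)
        for single in singles_by_album.get(album, []):
            video = video_by_single.get(single)
            if video is not None:
                videos.append(video)
    return videos


print(videos_for_artist("Jackson Pollock"))
```

Three of the four lookups already exist as public services in some form. Only the first dictionary, the one that knows which artworks were influenced by whom, has no machine-readable source, and that is the museum's part.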