
More on museum datasets, un-comprehensive-ness, data mining

(Another short response post)

Thus far we’ve not had much luck with museum datasets.

Sure, some of us have made our own internal lives easier by developing APIs for our collection datasets, or generated some good PR by releasing them without restrictions. In a few cases enthusiasts have made mobile apps for us, or made some quirky web mashups. These are fine and good.

But the truth is that our data sucks. And by ‘our’ I mean the whole sector.

Earlier in the year, when Cooper-Hewitt released its collection data on GitHub under a Creative Commons Zero license, we were the first in the Smithsonian family to do so. But as PhD researcher Mia Ridge found after spending a week in our offices trying to wrangle it, the data itself was not very good.

As I said at the time of release,

Philosophically, too, the public release of collection metadata asserts, clearly, that such metadata is the raw material on which interpretation through exhibitions, catalogues, public programmes, and experiences are built. On its own, unrefined, it is of minimal ‘value’ except as a tool for discovery. It also helps remind us that collection metadata is not the collection itself.

One of the reasons for releasing the metadata was simply to get past the idea that it was somehow magically ‘valuable’ in its own right. Curators and researchers know this already – they’d never ‘just rely on metadata’, they always insist on ‘seeing the real thing’.

Last week Jasper Visser pointed to one of the recent SIGGRAPH 2012 presentations, which described an algorithm that looks at similarities across millions of Google Street View images to determine ‘what architectural elements of a city made it unique’. I and many others (see Suse Cairns) loved the idea and immediately started to think about how this might work with museum collections – surely something must be hidden amongst those enormous collections that might be revealed with mass digitisation and documentation?

I was a little more interested than most because one of our curators at Cooper-Hewitt had just blogged about a piece of balcony grille from Paris in the collection. In the blogpost the curator wrote about the grille but, as one commenter soon pointed out, didn’t provide a photo of the piece in its original location. Funnily enough, a quick Google search for the street address in Paris from which the grille had been obtained revealed not only Google Street View of the building but also a number of photos of the building on Flickr, accompanied by discussion of the same architectural features that our curator had written about. Whilst Cooper-Hewitt had the ‘object’ and the ‘metadata’, the ‘amateur web’ held all the most interesting context (and discussion).

So then I began thinking about the possibilities for matching all the architectural features from our collections to those in the Google Street View corpus . . .

But the problem with museum collections is that they aren’t comprehensive – even if their data quality were better and everything were digitised.

As far as ‘memory institutions’ go, museum collections are certainly no match for library holdings or archival collections. Museums don’t try to be comprehensive, and at least historically they haven’t been able to even consider being so. Or, as I’ve remarked before, it is telling that the memory institution that ‘acquired’ the Twitter archive was the Library of Congress and not a social history museum.