
OPAC2.0 – latest tag statistics and trends for simple comparison with Steve project

Another paper from the Steve researchers has gone online and is generating interesting discussions. It elaborates on the content of an earlier summary podcast. To be presented at ICHIM07, the paper describes some of the emerging patterns in tagging behaviour across the different interface trials.

Promising results from early prototype analysis showed that users could contribute significant numbers of new terms, reflecting new concepts (Trant, 2006b). Indeed up to 90% of the terms that users contributed were not present in the documentation of the works as provided by the museums even though those works provided by the museums had quite rich records. Significant numbers of new terms provided by more than one or two visitors, revealed that users see, and presumably recall, some details of images that are not explicitly documented by professionals, thus suggesting some scope for using these terms to aid in discovery.

These preliminary results are holding up in our more formal experiments. Of the tags assigned to all works during Term Set 1 (March 27– July 11, 2007), 76.5% (7,973 of 10,418) were not found in museum documentation.

To offer some data from our own work, here are some recent comparative statistics from the Powerhouse Museum’s collection database. We have generated these statistics against all our tags as well as separately for those objects tagged with our ‘bulk tagger’, which draws its audience from readers of this blog and other museum professionals.

Overall tagging behaviour

Basic statistics excluding bulk tags

Total tags added – 6,749
Total unique tag words/phrases – 5,044
Number of objects tagged – 3,980

Top ten most used tags:

value – 35
blue – 34
chinese – 28
glass – 25
price – 24
green – 23
vase – 20
japanese – 20
white – 19
silver – 19

Tag to museum terminology relationships:

Tag terms that pre-exist in object description field – 1,637 (32.5%)
Tag terms that pre-exist anywhere in full object record – 1,911 (37.9%)
Tag terms that match taxonomic object or subject classification – 551 (10.9%)
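
As a rough illustration of the three comparisons above, here is a minimal Python sketch of how a single tag might be checked against an object record. It is not the code behind these statistics: the matching rule (case-insensitive, whole word or phrase), the record layout, the field names `description`, `notes` and `classifications`, and the sample record are all assumptions made up for the example.

```python
import re

def normalise(text):
    """Lower-case the text and reduce it to words separated by single spaces.

    Only ASCII letters and digits are kept, which is a simplification.
    """
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def tag_in_text(tag, text):
    """True if the tag occurs in the text as a whole word or whole phrase."""
    needle = normalise(tag)
    if not needle:
        return False
    return f" {needle} " in f" {normalise(text)} "

def classify_tag(tag, record):
    """Report where a tag matches within one (hypothetical) object record."""
    # Assumption: the "full record" is the description, the notes and the
    # classification terms joined together.
    full_text = " ".join(
        [record["description"], record["notes"], *record["classifications"]]
    )
    terms = {normalise(c) for c in record["classifications"]}
    return {
        "in_description": tag_in_text(tag, record["description"]),
        "in_full_record": tag_in_text(tag, full_text),
        "in_classification": normalise(tag) in terms,
    }

# Invented sample record, for illustration only.
record = {
    "description": "Vase, blue glass, made in China.",
    "notes": "Purchased 1988.",
    "classifications": ["Glassware", "Vases"],
}

for tag in ["Blue", "chinese", "glassware"]:
    print(tag, classify_tag(tag, record))
```

With this strict rule ‘chinese’ does not match a description that only says ‘China’, and ‘vase’ would not match the classification ‘Vases’. A simple match like this is unforgiving, so stemming or a thesaurus lookup would raise the match rates.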

Bulk tagger tagging behaviour (started July 2007)

Basic statistics

Total bulk tags added – 815
Total unique bulk tags – 563
Number of objects bulk tagged – 286

Top ten most used tags:

olympics – 11
australia – 9
costume – 7
floral – 6
flowers – 6
plate – 6
vase – 6
coin – 5
copper – 5
decorative – 5

Tag to museum terminology relationships:

Bulk tag terms that pre-exist in object description field – 267 (47.4%)
Bulk tag terms that pre-exist anywhere in full object record – 316 (56.1%)
Bulk tag terms that match taxonomic object or subject classification – 125 (22.2%)

In the Steve results, 23.5% of terms matched terms already in the object documentation. In the Powerhouse example our figures are significantly higher, especially if we look at the entire object record (not just the basic description), and, interestingly, higher still amongst the bulk taggers (i.e. you, the museum professionals). My initial feeling is that this is because of the comparatively higher level of documentation for object records in the Powerhouse collection compared to the average for an art museum collection. This might indicate that whilst there is some degree of ‘semantic gap’ bridging going on here, there is also a content issue. Interestingly, within our collection there is even a reasonable correlation between object tags and formal thesaurus terms: 10.9% of unique tags, and 22.2% of unique bulk tags, match a taxonomic object or subject classification.
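
For anyone wanting to check the arithmetic behind this comparison, the short Python sketch below recomputes the matched and new shares from the counts quoted in this post. The dataset labels and the `pct` helper are just names chosen for the example. One caveat is an assumption on my part: the Steve figures count tags assigned, while the Powerhouse percentages are taken against unique tag terms, so the two are only loosely comparable.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

# Counts as quoted in the post. "matched" means already present in the
# museum documentation (Steve) or anywhere in the full object record
# (Powerhouse).
datasets = {
    "steve_term_set_1": {"total": 10418, "matched": 10418 - 7973},
    "powerhouse_all": {"total": 5044, "matched": 1911},
    "powerhouse_bulk": {"total": 563, "matched": 316},
}

summary = {
    name: {
        "matched_pct": pct(d["matched"], d["total"]),
        "new_pct": pct(d["total"] - d["matched"], d["total"]),
    }
    for name, d in datasets.items()
}

for name, shares in summary.items():
    print(f"{name}: {shares['matched_pct']}% matched, {shares['new_pct']}% new")
```

On these counts the share of new terms comes out at 76.5% for Steve, 62.1% for all Powerhouse tags, and 43.9% for the bulk tagger.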

One reply on “OPAC2.0 – latest tag statistics and trends for simple comparison with Steve project”

seb,

thanks for such quick posting of comparative data — it’s useful to have a ‘real world’ corollary for our experimental results.

the simple match we reported in the ichim07 paper, of tag to museum documentation, is indeed the most unforgiving comparison. But art museum label copy, what we lovingly call ‘tombstone data’ [no, art is not dead!], is also the data ‘harvested’ in OAI-based metadata projects or made available in Dublin Core based catalogues, both information discovery-led initiatives. So it’s a valid place to start.

it was interesting to see your stats sliced by bulk tagger and the regular tagger. do you get more tags per tagger per work (what we’re calling “velocity” in steve) in the bulk tagger?

we’ve got some other analyses to do with the steve data related to:

controlled vocabularies (AAT and ULAN)
extended object records: not just label copy but notes, audio tours, other interpretive materials
WordNet, to see if we can learn more about tags semantically

this will certainly give us a more nuanced view of the nature of tags.

to come back to the contribution tagging is making, though, even in your most likely to match environment (bulk tags in the full object record) there are almost 45% new tags. that’s significant.

/jt
