In mid-June 2006 the Powerhouse Museum launched its new online collection database. Internally referred to as OPAC2.0, the project put online nearly 62,000 object records and 30,000 images, opening access to these records with intelligent search tools, serendipity tools and the ability for users to self-classify using folksonomies.
In just six weeks visitation to the Museum’s website increased by over 100% (excluding spiders and bots). In the 6 weeks from June 14-July 31 OPAC2.0 on its own received 239,001 visitors (excluding internal museum users) who performed a total of 386,199 successful searches leading to object views (we currently track anonymous data on search terms linked to object views to provide the necessary data for our recommendation engine) and over 1.2 million individual object views.
This post looks at some of the initial trends that are being observed.
1. The ‘long tail’ of collections
The long tail theory, when applied to museums, goes like this: museums have limited space to exhibit their objects, so they gather together what they consider to be the most popular and most important from their collections, put them in showcases with labels, and exhibit them. The rest, or in the Powerhouse Museum’s case, 96% of the collection, sits in a warehouse in storage being preserved for some future time (at not insignificant cost). With the advent of the web, it was thought that these unseen collections could at last be brought to the public gaze – activating the ‘long tail’ of geographically spread niche interests that exist out in the community.
However there are two major obstacles to be overcome.
The first of these, which remains a huge obstacle for many institutions, is that of digitisation. Digitisation is expensive, time consuming, and needs to be justified for reasons other than just being an end in itself. Digitisation policies and procedures, formats and outputs, storage and documentation all differ from organisation to organisation. Digitisation doesn’t just refer to the act of imaging (2D or 3D) but also to digital storing and preserving, collection research and records – which in the Powerhouse Museum’s case date back over 100 years. And what of those objects like computer software that are ‘born digital’?
The second, which we have been working hard to find solutions to, is ‘exposure’. Once a collection is online how does one make sure that it is exposed to all those geographically dispersed niche audiences that ‘long tail’ theories assume want to access it? There have been many noble attempts at federated collection searches across institutions, and at making individual collections available in novel ways. But the main problem has remained that the audiences for these niche collections are usually researchers – and that as a result the raw traffic that these sites receive is quite small and narrow.
Popular objects are the first indicator of the long tail.
[graph of top 200 objects by number of views]
The top 10 most popular objects (as of the date of this post) are –
(the number in brackets represents total views)
1 – (1274) 88/4 Steam locomotive, No. 3830, iron/steel/brass, New South Wales Government Railways, Eveleigh Rai …
2 – (873) 94/129/1 Evening dress, womens, `Chocolate box’, plastic/fabric, Jenny Bannister for Chai, Australia …
3 – (791) 2005/1/1 Evening dress, beaded pink chiffon trimmed with charms, designed by Lisa Ho and made in the …
4 – (788) 88/5 Locomotive, full size, steam, No.1243, metal/glass, Davy and Company, Atlas Engineering Works, …
5 – (754) B1495 Aircraft, flying boat, Catalina, PB2B-2, “Frigate Bird II”, VH-ASA, metal / fabric, Boeing Air …
6 – (606) 95/23/1 Dress, evening, silk / polyester, designed by Jenny Bannister, Melbourne, Victoria, Australi …
7 – (550) 97/208/1 Shoes, pair, womens, ‘Super elevated gillies’, leather/ cork/ silk, Autumn/ Winter collecti …
8 – (471) 92/305 Food safe (bush pantry), wood/ metal, unknown maker, [Queensland], Australia, c. 1925 …
9 – (444) 2005/127/1 Clothing (9), boys, cotton / wool / metal / mother-of-pearl / plastic / paper / cardboard …
10 – (432) 90/816 Aircraft, full-size, helicopter, Bell 206B Jetranger III, “Dick Smith Australian Explorer”, V …
Now from our total object view figures we can determine that even the most popular object – the steam locomotive no 3830 – represents only 0.1% of all views. Because OPAC2.0 has only been online for 7 weeks we are yet to reach a point where ALL possible objects have been viewed at least once – but we are already at 75%.
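As a quick check of that 0.1% figure, the arithmetic can be sketched as below. The total is taken as exactly 1.2 million views, which is an assumption: the figure quoted earlier is ‘over 1.2 million’, so the real share is slightly lower than the one computed here.

```python
# Views of the most popular object (steam locomotive No. 3830), from the list above.
top_object_views = 1274

# Assumption: "over 1.2 million" object views is treated as exactly 1.2 million,
# so the share computed here is a slight overestimate.
total_object_views = 1_200_000

share = top_object_views / total_object_views
print(f"{share:.2%}")  # prints 0.11%
```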
What is particularly interesting is the sheer diversity of objects viewed. Once past the top 10 or 20 objects the curve flattens right out: from the 109th most popular object onwards, the next 46,000 objects each have under 200 views, but have still been viewed at least once.
2. Serendipitous exploration
One of the key elements of OPAC2.0 is its serendipity features. Although only in its most basic first iteration at the moment, almost every object view ‘suggests’ other objects to view. For example viewing a piece of 1940s medical equipment also displays links to other similar equipment. The impact of this is not so pronounced for users that come in via the front door but for users coming in directly to object records via Google or other searches this actively encourages them to stay and look around (at least within their area of interest).
The average number of successful searches per visit is 1.62.
The average number of objects viewed per visit is 5.02.
Contrast this with the single view per visit that objects on our previous ‘packaged collection’ received and the change is particularly marked.
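These averages can be cross-checked against the totals quoted at the top of the post. The sketch below assumes that ‘per visit’ is calculated against the 239,001 visitor figure, since no separate count of visits is given.

```python
# Totals quoted for June 14-July 31.
visitors = 239_001
successful_searches = 386_199

# Assumption: "per visit" averages are computed against the visitor count.
searches_per_visit = successful_searches / visitors
print(round(searches_per_visit, 2))  # prints 1.62

# Working backwards from 5.02 objects per visit gives the implied total
# number of object views, which should land close to 1.2 million.
views_per_visit = 5.02
implied_object_views = views_per_visit * visitors
print(round(implied_object_views))
```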
The serendipity engine currently mines the Museum’s object thesaurus. This allows users to browse other objects that belong to the same object ‘category’ as the object they are looking at. Users can also browse ‘up’ a branch to explore other related categories without leaving the object page.
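A minimal sketch of this kind of thesaurus browsing follows. It is an illustration only, not the OPAC2.0 code (which is written in PHP): the category names, object ids and function names are all hypothetical, and the thesaurus is assumed to be a simple tree in which each category has at most one parent.

```python
# Hypothetical thesaurus: each category maps to its parent (None = top level).
PARENT = {
    "Medical equipment": None,
    "Surgical instruments": "Medical equipment",
    "Diagnostic instruments": "Medical equipment",
}

# Hypothetical objects: object id -> thesaurus category.
OBJECT_CATEGORY = {
    "obj-1": "Surgical instruments",
    "obj-2": "Surgical instruments",
    "obj-3": "Diagnostic instruments",
}


def same_category(object_id):
    """Other objects sharing the category of the object being viewed."""
    category = OBJECT_CATEGORY[object_id]
    return sorted(
        other for other, cat in OBJECT_CATEGORY.items()
        if cat == category and other != object_id
    )


def related_categories(object_id):
    """Browse 'up' one branch: the parent category, then its other children."""
    category = OBJECT_CATEGORY[object_id]
    parent = PARENT[category]
    if parent is None:
        return []
    siblings = sorted(
        cat for cat, p in PARENT.items() if p == parent and cat != category
    )
    return [parent] + siblings


print(same_category("obj-1"))
print(related_categories("obj-1"))
```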
We have recently turned on the displaying of ‘subject’ associations for object views. These subjects are entered at time of acquisition by registrars at the Museum, and added to by curators as research is undertaken. This adds extra metadata to each object, classifying recent acquisitions by a thematic subject term or terms.
For example, 97/278/1 Mirror, glass/metal, part of Narrabri Stellar Interferometer, designed by Robert Hanbury Brown and Richard Twiss, Officine Galileo, Italy, 1960-1961 calls up associations ‘optical astronomy’ and ‘School of Physics, University of Sydney’ – both of which allow further exploration for similar objects.
The next stage which will be implemented on OPAC2.0 in the coming weeks is ‘popularity’-based recommendations. These are currently being tested on Design Hub. Popularity based recommendations work at the search level and make two different recommendations to the user.
The first of these popularity recommendations is ‘similar searches’. Similar searches looks at the search terms entered by the user and compares the results with other terms that have resulted in ‘clicks’ for similar objects. This is very useful in revealing, for example, that the search term ‘chair’ is ‘similar’ to a search for ‘Marc Newson’ or ‘Frank Gehry’.
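One way such a comparison could work is sketched below. This is a guess at the general shape rather than the method actually being tested on Design Hub: it assumes a log of (search term, clicked object) pairs and scores two terms by the overlap of the objects clicked for each (Jaccard similarity). The log entries, object ids and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical anonymous click log: (search term, object clicked from results).
CLICKS = [
    ("chair", "obj-a"),
    ("chair", "obj-b"),
    ("Frank Gehry", "obj-a"),
    ("Marc Newson", "obj-b"),
    ("Marc Newson", "obj-c"),
    ("teapot", "obj-d"),
]


def similar_searches(term, clicks):
    """Rank other terms by how much their clicked objects overlap with `term`."""
    objects_by_term = defaultdict(set)
    for searched, obj in clicks:
        objects_by_term[searched].add(obj)

    target = objects_by_term.get(term, set())
    scores = {}
    for other, objects in objects_by_term.items():
        if other == term:
            continue
        shared = target & objects
        if shared:
            # Jaccard similarity: shared objects / all objects clicked for either term.
            scores[other] = len(shared) / len(target | objects)

    # Highest overlap first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))


print(similar_searches("chair", CLICKS))
```

With this sample log, ‘teapot’ shares no clicked objects with ‘chair’ and so is never suggested.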
The second popularity-based recommendation is that of ‘popular objects’. This is a much simpler lookup which examines the search results and ranks them by past usage for that particular term. Through this method we can cut through a rather meaningless free text search for ‘chair’ and reasonably return the Wiggle Chair as most popular (but not necessarily, by algorithm, ‘most relevant’).
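The lookup described here can be sketched as a simple re-ranking of search results by past clicks for the same term. As before, this is only an illustration under assumptions: the click log, object ids and function name are hypothetical, and the real ranking may weigh usage differently.

```python
from collections import Counter


def popular_objects(term, results, clicks):
    """Re-rank `results` by how often each object was clicked for `term`."""
    counts = Counter(obj for searched, obj in clicks if searched == term)
    # sorted() is stable, so objects with equal counts keep their original
    # (free text relevance) order.
    return sorted(results, key=lambda obj: -counts[obj])


# Hypothetical free text results for 'chair', in the search engine's order.
results = ["chair-1", "wiggle-chair", "chair-2"]

# Hypothetical click log: (search term, object clicked).
clicks = [
    ("chair", "wiggle-chair"),
    ("chair", "wiggle-chair"),
    ("chair", "wiggle-chair"),
    ("chair", "chair-2"),
    ("table", "chair-1"),  # a click for a different term is ignored
]

ranked = popular_objects("chair", results, clicks)
print(ranked)
```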
3. Search terms, folksonomies & Google
What complicates matters considerably is an examination of ‘search terms’ used on the site.
Here are the top 20 terms for July.
88-4, 99-9-1, 98-2, ring, suit, female, 1934, fashion, mineral, nylon, shoe, bowls, human, camera, birmingham, costume, cap, shell, bag, satin
The first three of these correspond directly to 3 objects that are being redirected from our old catalogue – so they can be discounted. The other 17, though, interestingly enough are terms that have been entered as ‘user keywords’ and appear on the home page as part of our tag cloud. Coupled with the serendipity features described earlier, what seems to be happening in this early stage is twofold.
Firstly, Google is picking up the folksonomy keywords and associated objects. It is also picking up every object in our collection – very nicely. We are not quite certain yet as to the impact here but overall our Google-originating traffic has increased by roughly the same margin as overall site visitation. Our ability to replicate traffic patterns ourselves in Google is at best unreliable, although we do know that doing a search for ‘Delta Goodrem dress’ in Google explains the sudden popularity of a particular Goodrem-related object. However, searching for any of the above 17 terms on its own doesn’t return any Powerhouse results in the first few pages.
Secondly, we think that single word search terms will always win out over phrase searching – despite the best efforts of library folk to educate users to make ‘more accurate’ searches. This explains the popularity of these rather odd single words to some degree. The other major factor with single words is that the prominence of the tag cloud encourages first-time users to start their browsing of our collection through those entry points. Certainly usability testing tells us that the average user is more likely to click a word in a big font rather than type something.
OPAC 2.0 was developed by Sebastian Chan and Giv Parvaneh with collection assistance from Lynne McNairn. OPAC 2.0 is a project wholly developed by the Powerhouse Museum using internal staff resources and creativity. It is written in PHP, runs off a large Microsoft SQL database and uses some AJAX for display purposes. The collection database that it searches and displays runs Emu by kEmu as a collection management system.
OPAC 2.0’s success to date rests on the high quality content written and produced over many years by the research and collection staff at the Powerhouse Museum. Without their expertise, there would be no collection to search.