Fresh & New(er)

Henry Jenkins – notes from CCI ‘Creating Value Between Commons and Commerce’ conference, Brisbane, 2008

Post author By Seb Chan
Post date June 28, 2008

I’ve been in Brisbane the last few days – presenting the Powerhouse Museum’s Creative Commons and public domain projects – and also managed to attend one day of the CCI’s conference ‘Creating Value Between Commons and Commerce’. In amongst some truly awful examples of how not to use PowerPoint, there were some interesting presentations and papers.

Here’s the first of a set of notes scribed during the main sessions.

Web metrics

Google Trends does basic comparative metrics

Post author By Seb Chan
Post date June 22, 2008

Google Trends has started to allow domain level searches. This means that you can now pull up rough traffic figures, as calculated by Google, on any top level domain (subdomains like play.powerhousemuseum.com or artgallery.nsw.gov.au won’t work), and compare them to others. This moves Google Trends into territory covered by services like Compete, Quantcast (both US-centric) and, to a lesser extent, Hitwise.

Metadata Semantic Web

Collaborative collective classification – BBC Labs on using Wikipedia as metadata

Post author By Seb Chan
Post date June 14, 2008
1 Comment on Collaborative collective classification – BBC Labs on using Wikipedia as metadata

Chris Sizemore at the BBC’s Radio Labs demonstrates an experiment in automated metadata, much akin to Open Calais.

Sizemore has built a simple web application that uses Wikipedia to disambiguate entities in a block of text and suggest broad categories for the content. Because Wikipedia has broad coverage of topics and deep coverage of specific niches, it can, as Sizemore writes, provide a good enough data source for automated classification in some areas (especially popular culture).

Here’s Sizemore’s methodology –

1. Download entire contents of the English language Wikipedia (careful, that’s a large 4GB+ xml file!)

2. Parse that compressed XML file into individual text files, one per Wikipedia article (and this makes things much bigger, to the tune of 20GB+, so make sure you’ve got the hard drive space cleared)

3. Use a Lucene indexer to create a searchable collection (inc. term vectors) of your new local Wikipedia text files, one Lucene document per Wikipedia article

4. Use Lucene’s ‘MoreLikeThis’ to compare the similarity of a chunk of your own text content to the Wikipedia documents in your new collection

5. Treat the ranked Wikipedia articles returned as suggested categories for your text

Basically what is going on here is that the text you wish to classify is compared to Wikipedia articles, and the articles with the ‘closest match’ in terms of content have their URLs thrown back as potential classification categories.
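To make the ‘closest match’ idea concrete, here is a minimal sketch in Python. It is not Sizemore’s code and it does not use Lucene: it swaps in a plain TF-IDF and cosine similarity ranking as a rough stand-in for what ‘MoreLikeThis’ does. The three tiny ‘articles’, the function names and the sample text are all hypothetical, made up purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for the local Wikipedia article files (title -> text).
ARTICLES = {
    "Fashion design": "fashion design dress designer clothing garment evening gown",
    "Tennis": "tennis racket court player match serve tournament",
    "Astronomy": "astronomy telescope star planet observatory galaxy",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(articles):
    """Return per-article TF-IDF vectors plus the IDF table used to build them."""
    tokens = {title: tokenize(body) for title, body in articles.items()}
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    n_docs = len(articles)
    # Smoothed IDF: terms found in fewer articles get a higher weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = {
        title: {term: count * idf[term] for term, count in Counter(toks).items()}
        for title, toks in tokens.items()
    }
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_categories(text, vectors, idf, top_n=2):
    """Rank article titles by similarity to the text, best match first."""
    counts = Counter(tokenize(text))
    # Terms that never appear in the index carry no signal, so drop them.
    query = {term: count * idf[term] for term, count in counts.items() if term in idf}
    scored = [(title, cosine(query, vec)) for title, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

vectors, idf = build_index(ARTICLES)
suggestions = suggest_categories(
    "A chiffon evening dress by an Australian fashion designer", vectors, idf
)
```

With these toy articles the top suggestion is the ‘Fashion design’ entry, because it is the only one sharing any terms with the sample text. A real run over the full Wikipedia dump would need a proper index, which is exactly the job Lucene does in the methodology above.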

Combine this with Open Calais and there will be some very interesting results across a broad range of text datasets.

As regular readers will know, we’ve been experimenting quite a bit with Open Calais at the Powerhouse with some exciting initial results. We’ve been looking at the potential of Calais in combination with other data sources including Wikipedia/dbPedia/Freebase and we’ll be watching Sizemore’s experiment with interest.

Perhaps my throwaway line in recent presentations that ‘humans should never have to create metadata’ might actually be becoming closer to a reality.

Collection databases Search Web metrics

OPAC2.0 – Examining Delta Goodrem’s dress again / more on search

Post author By Seb Chan
Post date June 14, 2008
2 Comments on OPAC2.0 – Examining Delta Goodrem’s dress again / more on search

The most popular object in our online collection database is still a dress worn by Delta Goodrem.

I’ve previously written about how the popularity of this dress was driven in part by coverage on a number of Delta Goodrem fan forums. But this neglects the criticality of search. Google has always driven traffic to this object and, looking at last month’s analytics, where Google search represented 86% of referrers to the object, the top 5 keywords used to discover this dress were these –

1. lisa ho – 11.24%
2. evening dresses – 4.55%
3. lisa ho dresses – 2.71%
4. formal dress – 2.13%
5. chiffon dress – 1.07%
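For anyone wanting to produce a similar breakdown from their own referrer data, a rough sketch of the tally might look like this. The function name and the sample terms are hypothetical; the list is made up and is not our actual analytics data.

```python
from collections import Counter

def keyword_shares(search_terms):
    """Return (keyword, percent of all searches) pairs, most common first."""
    # Normalise case and whitespace so 'Lisa Ho' and 'lisa ho' count together.
    counts = Counter(term.strip().lower() for term in search_terms)
    total = sum(counts.values())
    return [(kw, round(100 * n / total, 2)) for kw, n in counts.most_common()]

# Hypothetical referrer log: one entry per search-engine visit to the object page.
sample_terms = ["lisa ho", "Lisa Ho", "evening dresses", "lisa ho", "formal dress"]
shares = keyword_shares(sample_terms)
```

Each percentage is simply that keyword’s visits divided by all search visits to the object, which is how the figures above should be read.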

Because of the frequency of the keywords ‘lisa ho’ in the title, description and body text of the object record, and the trusted PageRank of the Powerhouse Museum domain, we rank 11th in Google search results for ‘lisa ho’; 2nd for ‘lisa ho dress’; and 4th for ‘lisa ho dresses’.

Fortunately for us, this external traffic isn’t fleeting. Visitors to this object view almost double the average number of pages viewed by others on our site; and they spend more time on the site too.

Looking at the internal search terms for that same object the results are very different.

1. Australian fashion (also a subject classification)
2. tennis (user tag)
3. lisa ho
4. delta goodrem
5. elegant (user tag)

External search has effectively driven nearly 10 times the traffic of internal users to this object. It has also brought audiences to the object who have very few behavioural similarities to those who search within the context of our own site (internal search). This creates many new challenges in terms of usability and user experience.

Over the entire collection there are pockets of objects for which the difference between internal and external search is not as great; however, this needs much greater data analysis (and may be the subject of a future post or paper).

Search User experience

SEO (search engine optimisation) basics and museums

Post author By Seb Chan
Post date June 14, 2008

One of the most common questions asked over the past few years has been “how do I get the best out of SEO for my museum?”. This comes up in casual conversations and without fail at conferences. We are all becoming increasingly aware of the higher and higher proportion of our traffic coming via search, and that as content on the web grows exponentially the chance of our content lying buried deep in search engine results increases.

Often the problem for museums with search relates to the diversity of their web presence. Other than our brand name, our content, especially that held in collections, is often very diverse and our exhibitions equally so. I’ve previously written about the need to tackle exhibition naming so that at least on the web exhibition titles are more ‘search-friendly’, but this is very tricky to apply to collection and education content.

The news media have taken to rewriting headlines for search, knowing that timeliness and findability are crucial to the success of their content. Scott Gledhill’s fantastic SEO presentation from Web Directions South 2007 is an eye-opening look at how News Limited journalists in Australia are maximising the reach of their articles (link is to a full Slidecast).

Is this possible with museum content?

Should (and can) curators, education staff and marketing staff get a quick dashboard that reports the web performance of the content they are creating? Should (and can) they iterate their content, improving it, guided by real world performance? If museums are ‘slow media’, then is performance-guided content creation even a desirable outcome? (Update: do we really want to get to a situation like this one parodied in Slate?)

Maybe you need to tackle the basics first – getting your key content more visible. So where do you start?

Fortunately there are plenty of great SEO resources on the web and plenty of ways of testing SEO performance for free or very low cost. Last month Web Designers Wall posted a simple introduction to SEO which is worthwhile reading for the very basics. This, along with Scott’s presentation, should provide a good starting point.

Social networking Web metrics

Just how popular is that Facebook application? Artshare and Steve Art Tagger and Developer Analytics

Post author By Seb Chan
Post date June 13, 2008
11 Comments on Just how popular is that Facebook application? Artshare and Steve Art Tagger and Developer Analytics

I’ve been wondering for a long time about the real popularity of Facebook apps that are targeted at specific niche user groups.

Well with Developer Analytics you can find out – without needing to be the actual developer of the Facebook application in question.

With the museum community starting to build useful applications like the Brooklyn’s ArtShare or the Steve Art Tagger, the ability for us all to evaluate the success of these sorts of projects is increasingly important. This is especially the case for cross-institutional projects to which we are all beginning to contribute our content. Are these projects reaching the audiences that we want our content to reach? Where should we focus our energies?

What can you learn from Developer Analytics?

For ArtShare I can quickly see that as of today it has 2,900 installs with 58 average daily users, and I can pull up a popularity graph to see this over time. (Update: Shelley at the Brooklyn says that these stats conflict with the ones she can pull up from within Facebook – see comments below.) I can also compare it with the Steve Art Tagger, which has been up for a few months and has 200 installs but only an average of 2 active daily users. Readers from the libraries world might be interested in taking a look at the statistics for the OCLC’s recently released WorldCat Facebook app.
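One simple way to compare the two is to divide average daily users by installs. This is my own back-of-envelope measure, not something Developer Analytics reports, and the function name is hypothetical. Given the conflicting stats mentioned in the update, treat the inputs as approximate.

```python
def daily_active_ratio(installs, avg_daily_users):
    """Fraction of installs that show up as active users on an average day."""
    if installs <= 0:
        raise ValueError("installs must be positive")
    return avg_daily_users / installs

# Figures quoted above from Developer Analytics.
artshare = daily_active_ratio(2900, 58)
steve_tagger = daily_active_ratio(200, 2)
```

On those numbers ArtShare works out at about 2% of installs active on an average day, and the Steve Art Tagger at about 1%.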

I can also look at which commercial applications are most successful and track trends across, say, the multitude of Flickr-related applications to see which are the most sticky and used.

There are important lessons to be learnt from the other successful Facebook applications which we can draw upon when building our own.

Here’s a chart from the My Flickr application which, being an app with a large-ish userbase, provides significantly more data about users – including the other apps that users of My Flickr use, gender, age and friend demographics. (A side note – the availability of this information to application developers in itself should be of interest to all Facebook users concerned with privacy.)

Head over to Developer Analytics and do some digging of your own.

Search User experience

User experience is all that matters – a reminder about content, search and users

Post author By Seb Chan
Post date June 6, 2008
5 Comments on User experience is all that matters – a reminder about content, search and users

Scott Karp over at Publishing 2.0 has been griping about his experience using his local newspaper website, which just so happens to be the Washington Post. Driven by a desire to find out about power cuts as a result of a storm, Karp was unable to quickly find what he wanted, and thus turned to other websites, finding them through Google.

Imaging Interactive Media Mobile MW2007

Mobile augmented animals – Wellington Zoo

Post author By Seb Chan
Post date June 2, 2008
2 Comments on Mobile augmented animals – Wellington Zoo

One of the really wild things at Museums and the Web 2007 was a demonstration booth from the National Science Museum, Japan. At the booth were a series of paper pop-up dinosaurs. By themselves the dinosaur pop-ups were impressive, but once a consumer grade webcam was pointed at the paper cutouts they came to life as proper 3D models on screen.

The technology was written up in their paper over at Archimuse.

Imaging Web 2.0

Brooklyn joins the Commons, we hit the 500 mark

Post author By Seb Chan
Post date May 30, 2008
1 Comment on Brooklyn joins the Commons, we hit the 500 mark

The Brooklyn Museum have just joined the Commons on Flickr and some of the material they’ve released is spectacular. Amongst the highlights are some amazing lantern slides of Egypt as well as colourised photographs from the Paris Exposition in 1900. Some of the colourised images are quite surreal.

Brooklyn have also released some of them at 3000 pixel and higher resolutions – asking re-users of these images to contact them to tell them whether this extra high resolution is useful. (I immediately thought that it might be fun to Photoshop some Indiana Jones images into some of the Egypt images.)

Flickr is already flagging that there will be many more contributors to the Commons coming very soon and that there will be some new features – an internal Commons search – as well as greater promotion of the Commons across Flickr. The addition of Brooklyn also seems to have solved the problem of the Commons needing a separate account – Brooklyn have sensibly merged their Commons images into their already very successful Flickr presence.

Back at the Powerhouse we’ve just uploaded our 500th image. This latest batch includes some lovely shots of the Sydney Observatory, which is celebrating its 150th anniversary this year. There are also more shots of old Sydney, and the Tyrrell Today group is now starting to fill up with complementary contemporary shots of the Tyrrell locations taken by a diverse range of other Flickr users.

And two other things: if you search Flickr regularly then you will love CompFight. It is a really nifty quick search of Flickr with various options for Creative Commons images and (un)Safe Search that leverages the Flickr API.
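I don’t know how CompFight is built, but a Creative Commons filtered search against the Flickr API could be put together roughly like this. The sketch only builds the request URL for the flickr.photos.search method and makes no network call. The helper name, the placeholder API key and the choice of licence ids 4 and 5 (Attribution and Attribution-ShareAlike, as I understand the numbering) are my assumptions; check flickr.photos.licenses.getInfo for the current list.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"

def build_cc_search_url(api_key, text, license_ids=("4", "5"), safe_search=1):
    """Build a flickr.photos.search URL restricted to the given licence ids."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "text": text,
        # Comma-separated licence ids; the defaults here are an assumption.
        "license": ",".join(license_ids),
        # 1 = safe, 2 = moderate, 3 = restricted.
        "safe_search": safe_search,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)

# "YOUR_API_KEY" is a placeholder; a real key comes from Flickr.
url = build_cc_search_url("YOUR_API_KEY", "sydney observatory")
```

Fetching that URL with a valid key should return a JSON list of matching photos, which a front end can then lay out as a thumbnail grid.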

If you want the more ‘wow’ but far less practical search of Flickr then this 3D globe-style search from Germany, Tag Galaxy, is pretty amusing – especially on a fast connection.
