A practical model for analyzing long tails / Kalevi Kilkki in First Monday

Kalevi Kilkki of Nokia has written an interesting essay, A practical model for analyzing long tails, over at First Monday. For those analysing how visitors dig into their websites and use their collections, this is useful reading.

This essay offers a dozen examples of phenomena, from books to square kilometers, that manifest themselves with a long tail of popularity. The long tail distributions are so similar that there is an obvious opportunity to model them by a single function. The main requirement for the function is that the cumulative distribution should generate a smooth S-shape when the x-axis is logarithmic.
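
To make the S-shape requirement concrete, here is a minimal sketch of one function that satisfies it: a logistic curve in the logarithm of the rank. The function name, the parameter names (half_rank, steepness) and the example values are my own assumptions for illustration; this is not necessarily the exact parameterisation the essay uses.

```python
def cumulative_share(rank, half_rank, steepness=1.0):
    """Share of total volume covered by the `rank` most popular items.

    Illustrative assumption, not taken from the essay:
      half_rank  rank at which half of the total volume is covered
      steepness  how sharply the curve rises around half_rank
    The curve is logistic in log(rank), so it draws an S-shape
    when the x-axis is logarithmic.
    """
    return 1.0 / (1.0 + (half_rank / rank) ** steepness)


if __name__ == "__main__":
    # Ranks spaced evenly on a log axis: 1, 10, 100, ... 1,000,000.
    for exponent in range(7):
        rank = 10 ** exponent
        share = cumulative_share(rank, half_rank=1000)
        bar = "#" * round(share * 40)
        print(f"{rank:>9} {share:6.3f} {bar}")
```

For a collection website, rank would be the position of an object when objects are sorted by views, and the share would be the fraction of all views accounted for by objects up to that rank. Fitting the two parameters to real usage logs is a separate step not shown here.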

As to the accuracy of the model, in many cases there are discrepancies that call for explanations. First, some anomalies could be explained by pure random variations, particularly with the objects with the highest ranks. Secondly, the abrupt end of the tail is often caused by the fact that in reality the size of the object is finite (e.g., one book), while the long tail function continues to eternity with ever smaller objects. Thirdly, the current environment may artificially shorten the tail. For instance, the business model of movie theaters significantly favors the most popular movies compared to an ideal distribution channel that can effectively distribute movies with a small audience. Fourthly, the effect of minorities (e.g., languages other than English) may considerably lengthen the end of the tail but is invisible in the base of the tail. Finally, in some cases there is no apparent explanation for the difference. To explain those unclear cases, we need more studies and better understanding.