Saturday, 5 January 2008

The Long Tail Of Social Networks

The long tail distribution curve of social network usage.

Two things came together this week.

Firstly, I was reading Chris Anderson's excellent book, The Long Tail, which is a must-read if you haven't read it yet. He explains very clearly what effect lower distribution and production costs have had on modern-day retailing and on the related success of e-commerce businesses. One point he makes is that when the cost of distribution is low, the spread of available inventory increases.

Secondly, I came across an excellent site that graphically represents the social graphs of online communities and social networking sites. http://orgnet.com/community.html by Valdis Krebs shows how most networks have a core of heavy, dedicated users, a second group that is loosely connected and a large outer ring of "disconnected nodes, commonly known as lurkers". He notes that communities have various levels of belonging and contribution.

The graphical representation by Valdis Krebs seemed to me to be an indication of a Pareto distribution of usage. That's to say that the highest intensity of usage comes from the smallest proportion of users and that the lowest intensity of usage comes from the highest proportion of users. I had also heard the same pattern described by a friend who was researching her MBA project on online communities.

So I asked myself the question: can I test to see if social network usage has a "Long Tail"?

To see if this was the case, I ran the numbers from my LinkedIn network. (For info: if you are not a member of LinkedIn, it's a free-to-join professional networking site.)

I looked at all of my 227 connections to see how many connections each of them had. From Valdis' observations I expected to see very few "highly connected" users and exponentially more users with fewer connections. What I wanted to see, however, was whether the resulting chart would show a Long Tail distribution curve.

My reasoning was: given that it was free to join, there are zero "costs of distribution" - therefore I would expect to see a "Long Tail".

Sure enough, there were only a few top-end users. On LinkedIn, once you go over 500 connections it displays as "500+". So the top end of the pattern is slightly distorted, in that not all of the 500+ users will have the same number of connections. Some will be in the 500s, but I imagine that one or two might reach 700 or even 900. In total there are only 6 users with 500+ connections (out of 227).

What about the bottom end of the scale? It turns out that just over half are low-usage users. 115 people have 50 connections or fewer.

Here's what it looks like if you break it down:





Connections    People    % of total
500+                6            3%
201-500            12            5%
101-200            40           18%
51-100             54           24%
1-50              115           51%
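
For anyone who wants to check the arithmetic, the percentages can be reproduced with a few lines of Python. This is only a rough sketch: the counts are the ones in the breakdown, while the names `buckets` and `percent_of_total` are made up for illustration.

```python
# Bucket counts copied from the breakdown (label -> number of people).
buckets = {
    "500+": 6,
    "201-500": 12,
    "101-200": 40,
    "51-100": 54,
    "1-50": 115,
}


def percent_of_total(counts):
    """Return each bucket's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


total = sum(buckets.values())
shares = percent_of_total(buckets)

# Print a small plain-text table: label, head count, share of the network.
print(f"{'Connections':<12}{'People':>8}{'% of total':>12}")
for label, n in buckets.items():
    print(f"{label:<12}{n:>8}{shares[label]:>11}%")
print(f"{'Total':<12}{total:>8}")
```

Because each share is rounded to a whole percent, the column adds up to 101% rather than exactly 100%.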

If you plot it on a graph, the "Long Tail" curve is most definitely evident:



The curve extends far out to the right, where there are 8 users with just 1 connection. These are people that I invited to join my network, who accepted, but have done nothing since. They have not added their own connections. Note that there are more people in this category than there are with 500 or more connections at the other end of the scale.
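
To get a feel for how long the tail is without the individual figures, one option is to line the users up from most to fewest connections and see which ranks each bucket covers. The sketch below does that from the bucket counts in the breakdown earlier; `rank_ranges` is a hypothetical helper name, and only the bucket totals are used because the per-person numbers are not listed here.

```python
# Bucket counts from the breakdown, ordered from most to fewest connections.
buckets = [
    ("500+", 6),
    ("201-500", 12),
    ("101-200", 40),
    ("51-100", 54),
    ("1-50", 115),
]


def rank_ranges(ordered_buckets):
    """For each bucket, return (label, first rank, last rank, cumulative %).

    Ranks assume users are sorted from most to fewest connections, so
    rank 1 is the best-connected person in the network.
    """
    total = sum(n for _, n in ordered_buckets)
    rows = []
    start = 1
    for label, n in ordered_buckets:
        end = start + n - 1
        rows.append((label, start, end, round(100 * end / total, 1)))
        start = end + 1
    return rows


rows = rank_ranges(buckets)
for label, first, last, cumulative in rows:
    print(f"{label:<8} ranks {first:>3}-{last:<3}  cumulative {cumulative:>5}%")
```

On these figures the 1-50 bucket alone covers ranks 113 to 227, which is a little over half of the horizontal axis.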

Of course, I have only plotted the data from one network, for one node (me). It would be interesting to know if other networks also follow the same pattern.

Despite my limited analysis, I'm pretty convinced that most online social networks and communities will display similar Pareto distribution curves. This is because there is no barrier to entry. There is no cost to join a network. It's free. Therefore, it's no big deal to join up, try it out and see if you find it useful. There are no costs in extending the tail.

I assume therefore that I'm probably in the top 10 to 15% of users of LinkedIn and therefore most probably one of the "core" users as described by the graph on orgnet.com.

However, if I consider my half-hearted efforts on Facebook, I'm definitely not in the core. I'm probably floating around "loosely connected" in the bottom half of the curve (as are, I expect, the majority of registered users).

Conclusion? Everything so far points to a long tail distribution curve of social network usage.
