Clustering coefficient

From Wikipedia, the free encyclopedia

Example clustering coefficient on an undirected graph for the shaded node i. Black line segments are actual edges connecting neighbors of i, and dotted red segments are missing edges.

The clustering coefficient of a vertex in a graph quantifies how close the vertex and its neighbors are to being a clique (complete graph). Duncan J. Watts and Steven Strogatz introduced the measure in 1998 [1] to determine whether a graph is a small-world network.

Formal definition

A graph G = (V, E) formally consists of a set of vertices V and a set of edges E between them. An edge e_{ij} connects vertex v_i with vertex v_j.

The neighbourhood N_i of a vertex v_i is the set of vertices immediately connected to it by an edge in either direction:

N_i = \{v_j : e_{ij} \in E \lor e_{ji} \in E\}.

The degree k_i of a vertex v_i is the number of vertices |N_i| in its neighbourhood N_i.
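A minimal sketch of the neighbourhood and degree for a directed graph, assuming the graph is stored as a set of ordered pairs (i, j), one per edge e_ij. The neighbourhood collects vertices joined to v_i in at least one direction, which is what the total (in + out) degree used in the next paragraph requires. The function names and the choice to ignore self-loops are assumptions for illustration.

```python
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # directed edge e_ij stored as (i, j)


def neighbourhood(i: Hashable, edges: Set[Edge]) -> Set[Hashable]:
    """N_i: vertices joined to v_i by an edge in either direction (self-loops ignored)."""
    out_nbrs = {j for (a, j) in edges if a == i and j != i}
    in_nbrs = {a for (a, j) in edges if j == i and a != i}
    return out_nbrs | in_nbrs


def degree(i: Hashable, edges: Set[Edge]) -> int:
    """k_i = |N_i|."""
    return len(neighbourhood(i, edges))


edges = {(1, 2), (3, 1), (1, 4)}
print(sorted(neighbourhood(1, edges)))  # [2, 3, 4]
print(degree(1, edges))                 # 3
```

Vertex 3 only points *into* vertex 1, yet it still belongs to N_1; requiring edges in both directions would leave N_1 empty here.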

The clustering coefficient C_i of a vertex v_i is the number of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them. For a directed graph, e_{ij} is distinct from e_{ji}, so for each neighbourhood N_i there are k_i(k_i − 1) links that could exist among the vertices within the neighbourhood (k_i is the total (in + out) degree of the vertex). Thus, the clustering coefficient for directed graphs is given by

C_i = \frac{|\{e_{jk} : v_j, v_k \in N_i,\ e_{jk} \in E\}|}{k_i(k_i-1)}.
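The directed formula can be sketched in Python, again assuming a set of ordered pairs as the edge representation. Because e_jk and e_kj are distinct, the sketch counts *ordered* pairs of neighbours. Returning 0 when k_i < 2 (where the formula divides by zero) is an assumed convention, not part of the definition above; `directed_clustering` is a hypothetical name.

```python
from itertools import permutations
from typing import Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # directed edge e_jk stored as (j, k)


def directed_clustering(i: Hashable, edges: Set[Edge]) -> float:
    # N_i: vertices joined to v_i in either direction (self-loops ignored)
    nbrs = {b for (a, b) in edges if a == i and b != i} | {
        a for (a, b) in edges if b == i and a != i
    }
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: C_i is undefined here, report 0
    # e_jk and e_kj are distinct, so check every ordered pair of neighbours
    links = sum(1 for j, m in permutations(nbrs, 2) if (j, m) in edges)
    return links / (k * (k - 1))


edges = {(1, 2), (1, 3), (1, 4), (2, 3), (3, 2), (2, 4)}
print(directed_clustering(1, edges))  # 0.5
```

Here N_1 = {2, 3, 4}, so 3 · 2 = 6 links are possible and 3 of them ((2, 3), (3, 2), (2, 4)) exist.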

An undirected graph has the property that e_{ij} and e_{ji} are considered identical. Therefore, if a vertex v_i has k_i neighbours, \frac{k_i(k_i-1)}{2} edges could exist among the vertices within the neighbourhood. Thus, the clustering coefficient for undirected graphs can be defined as

C_i = \frac{2|\{e_{jk} : v_j, v_k \in N_i,\ e_{jk} \in E\}|}{k_i(k_i-1)}.
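For undirected graphs, a minimal sketch can use adjacency sets, where storing v in adj[u] and u in adj[v] makes e_ij and e_ji the same edge. Each unordered pair of neighbours is checked once, which matches the k_i(k_i − 1)/2 possible edges. The helpers `build_adj` and `local_clustering` are hypothetical, and returning 0 for k_i < 2 is an assumed convention.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_adj(edge_list: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Undirected adjacency sets: e_ij and e_ji are stored as one edge."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj: Dict[Hashable, Set[Hashable]], i: Hashable) -> float:
    nbrs = adj[i] - {i}  # ignore a self-loop, if any
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for an undefined C_i
    # each unordered pair {v_j, v_m} of neighbours is checked once
    links = sum(1 for j, m in combinations(nbrs, 2) if m in adj[j])
    return 2 * links / (k * (k - 1))


adj = build_adj([(1, 2), (1, 3), (1, 4), (2, 3)])
print(local_clustering(adj, 1))  # 1/3: only the pair {2, 3} of three is linked
```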

Let \lambda_G(v) be the number of triangles on v \in V(G) for an undirected graph G; that is, \lambda_G(v) is the number of subgraphs of G with 3 edges and 3 vertices, one of which is v. Let \tau_G(v) be the number of triples on v \in V(G); that is, \tau_G(v) is the number of subgraphs (not necessarily induced) with 2 edges and 3 vertices, one of which is v, such that v is incident to both edges. Then the clustering coefficient can also be defined as

C_i = \frac{\lambda_G(v_i)}{\tau_G(v_i)}.

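The two preceding definitions agree. Each edge e_{jk} between two neighbours of v_i closes exactly one triangle through v_i, so \lambda_G(v_i) = |\{e_{jk}\}|; and each unordered pair of neighbours yields exactly one triple centred on v_i, so

\tau_G(v_i) = \binom{k_i}{2} = \frac{1}{2}k_i(k_i-1).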
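The equivalence can be checked by brute force. This sketch, assuming a simple undirected graph (no self-loops or repeated edges) stored as adjacency sets, counts λ_G(v) and τ_G(v) directly from their subgraph definitions and asserts that they match the edge-counting quantities. The helper names are hypothetical.

```python
from itertools import combinations
from math import comb


def build_adj(edge_list):
    """Undirected adjacency sets for a simple graph."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_on(adj, v):
    """lambda_G(v): 3-vertex subgraphs {v, a, b} with all three edges present."""
    others = [u for u in adj if u != v]
    return sum(1 for a, b in combinations(others, 2)
               if a in adj[v] and b in adj[v] and b in adj[a])


def triples_on(adj, v):
    """tau_G(v): 3-vertex subgraphs with edges v-a and v-b; a-b may be absent."""
    others = [u for u in adj if u != v]
    return sum(1 for a, b in combinations(others, 2)
               if a in adj[v] and b in adj[v])


adj = build_adj([(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5)])
for v in adj:
    k = len(adj[v])
    linked = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    assert triangles_on(adj, v) == linked     # one triangle per linked neighbour pair
    assert triples_on(adj, v) == comb(k, 2)   # tau_G(v) = C(k, 2)
print(triangles_on(adj, 1), triples_on(adj, 1))  # 2 3
```

For vertex 1, the ratio 2/3 equals 2 · 2 / (3 · 2) from the edge-counting formula.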

These measures equal 1 if every neighbour of v_i is also connected to every other vertex within the neighbourhood, and 0 if no two neighbours of v_i are connected to each other.
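The two extremes can be illustrated with the centre of a complete graph (every neighbour pair linked) and the centre of a star (no neighbour pair linked). This is a self-contained sketch using hypothetical helpers and the same undirected formula as before.

```python
from itertools import combinations


def build_adj(edge_list):
    """Undirected adjacency sets."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, i):
    nbrs = adj[i] - {i}
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for an undefined C_i
    links = sum(1 for j, m in combinations(nbrs, 2) if m in adj[j])
    return 2 * links / (k * (k - 1))


clique = build_adj(combinations(range(5), 2))        # complete graph on 5 vertices
star = build_adj((0, leaf) for leaf in range(1, 5))  # vertex 0 joined to 4 leaves
print(local_clustering(clique, 0))  # 1.0
print(local_clustering(star, 0))    # 0.0
```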

Watts and Strogatz define the clustering coefficient of the whole network as the average of the local clustering coefficients of its n vertices:

\bar{C} = \frac{1}{n}\sum_{i=1}^{n} C_i.
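A minimal sketch of the network average, assuming undirected adjacency sets and counting vertices with fewer than two neighbours as C_i = 0. That convention changes the result, so it is worth stating explicitly. NetworkX's `average_clustering` exposes the same choice through its `count_zeros` argument. The helper names here are hypothetical.

```python
from itertools import combinations


def build_adj(edge_list):
    """Undirected adjacency sets."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, i):
    nbrs = adj[i] - {i}
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: such vertices contribute 0 to the average
    links = sum(1 for j, m in combinations(nbrs, 2) if m in adj[j])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """C-bar = (1/n) * sum of C_i over all n vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


adj = build_adj([(1, 2), (1, 3), (2, 3), (3, 4)])
print(average_clustering(adj))  # (1 + 1 + 1/3 + 0) / 4 = 7/12
```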

A graph is considered small-world if its average clustering coefficient \bar{C} is significantly higher than that of a random graph constructed on the same vertex set, and if the graph has a short mean shortest-path length.
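The comparison can be sketched by computing both quantities for a graph and for a random counterpart. The choice of random model is an assumption: this sketch draws a G(n, m) graph with the same vertex set *and the same number of edges*, using a fixed seed. Path lengths are averaged over connected ordered pairs with breadth-first search. The text gives no thresholds for "significantly higher" or "short", so the sketch only reports numbers. The ring lattice is just a convenient test input, and all helper names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, i):
    nbrs = adj[i] - {i}
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for an undefined C_i
    links = sum(1 for j, m in combinations(nbrs, 2) if m in adj[j])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def mean_shortest_path(adj):
    # BFS from every vertex; average over ordered pairs that are connected
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def random_counterpart(adj, seed=0):
    # assumed model: G(n, m) on the same vertices with the same edge count
    vertices = list(adj)
    m = sum(len(s) for s in adj.values()) // 2
    rng = random.Random(seed)
    rand = {v: set() for v in vertices}
    for u, v in rng.sample(list(combinations(vertices, 2)), m):
        rand[u].add(v)
        rand[v].add(u)
    return rand


n = 20
lattice = build_adj((i, (i + d) % n) for i in range(n) for d in (1, 2))
rand = random_counterpart(lattice)
print(average_clustering(lattice), mean_shortest_path(lattice))  # C = 0.5, path = 55/19
print(average_clustering(rand), mean_shortest_path(rand))
```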

References

  1. D. J. Watts and S. H. Strogatz (June 1998). "Collective dynamics of 'small-world' networks". Nature 393: 440–442. doi:10.1038/30918. http://web.archive.org/web/20070418032327/http://www.tam.cornell.edu/SS_nature_smallworld.pdf.