83

Visualiser et fouiller des réseaux - Méthodes et exemples dans R

  • Upload
    tuxette

  • View
    106

  • Download
    0

Embed Size (px)

DESCRIPTION

AG du PEPI IBIS, 1er avril 2014 Cet exposé introduira la notion de réseaux et les problématiques élémentaires qui y sont généralement associées (visualisation, recherche de sommets importants, recherche de modules). Les notions seront illustrées à l'aide d'exemples utilisant le logiciel R sur un réseau réel.

Citation preview

Page 1: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualiser et fouiller des réseauxMéthodes et exemples dans R

Nathalie Villa-Vialaneix - [email protected]://www.nathalievilla.org

Unité MIA-T, INRA, Toulouse

AG PEPI IBIS - 1er avril 2014

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 1 / 39

Page 2: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Outline

1 Import data

2 Visualization

3 Global characteristics

4 Numerical characteristics calculation

5 Clustering

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 2 / 39

Page 3: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

What is a network/graph? réseau/grapheMathematical object used to model relational data between entities.

A relation between two entities is modeled by an edgearête

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 3 / 39

Page 4: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

What is a network/graph? réseau/grapheMathematical object used to model relational data between entities.

The entities are called the nodes or the vertexes (vertices in British)n÷uds/sommets

A relationbetween two entities is modeled by an edgearête

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 3 / 39

Page 5: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

What is a network/graph? réseau/grapheMathematical object used to model relational data between entities.

A relation between two entities is modeled by an edgearête

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 3 / 39

Page 6: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

(non biological) ExamplesSocial network: nodes: persons - edges: 2 persons are connected(�friends�), as in my facebook network:

In the following: this network will be used to illustrate the presentation.Note: if you want to test the script with your own facebook network, you can extract it

at http://shiny.nathalievilla.org/fbs.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 4 / 39

Page 7: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

(non biological) Examples

Modeling a large corpus of medieval documents

Notarial acts (mostly baux à �ef, moreprecisely, land charters) established in aseigneurie named �Castelnau Montratier�,written between 1250 and 1500, involvingtenants and lords.a

ahttp://graphcomp.univ-tlse2.fr

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 4 / 39

Page 8: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

(non biological) Examples

Modeling a large corpus of medieval documents

• nodes: transactions and individuals(3 918 nodes)

• edges: an individual is directly involvedin a transaction (6 455 edges)

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 4 / 39

Page 9: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

(non biological) Examples

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 4 / 39

Page 10: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Standard issues associated with networks

Inference

Giving data, how to build a graph whose edges represent the direct linksbetween variables?Example: co-expression networks built from microarray data (nodes = genes; edges =signi�cant �direct links� between expressions of two genes)

Graph mining (examples)

1 Network visualization: nodes are not a priori associated to a givenposition. How to represent the network in a meaningful way?

2 Network clustering: identify �communities� (groups of nodes that are

densely connected and share a few links (comparatively) with the other groups)

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 5 / 39

Page 11: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Standard issues associated with networks

Inference

Giving data, how to build a graph whose edges represent the direct linksbetween variables?

Graph mining (examples)

1 Network visualization: nodes are not a priori associated to a givenposition. How to represent the network in a meaningful way?

Random positionsPositions aiming at representingconnected nodes closer

2 Network clustering: identify �communities� (groups of nodes that are

densely connected and share a few links (comparatively) with the other groups)

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 5 / 39

Page 12: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Standard issues associated with networks

Inference

Giving data, how to build a graph whose edges represent the direct linksbetween variables?

Graph mining (examples)

1 Network visualization: nodes are not a priori associated to a givenposition. How to represent the network in a meaningful way?

2 Network clustering: identify �communities� (groups of nodes that are

densely connected and share a few links (comparatively) with the other groups)

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 5 / 39

Page 13: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

More complex relational modelsNodes may be labeled by a factor

... or by a numerical information. [Villa-Vialaneix et al., 2013]Edges may also be labeled (type of the relation) or weighted (strength ofthe relation) or directed (direction of the relation).

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 6 / 39

Page 14: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

More complex relational modelsNodes may be labeled by a factor

... or by a numerical information. [Villa-Vialaneix et al., 2013]

Edges may also be labeled (type of the relation) or weighted (strength ofthe relation) or directed (direction of the relation).

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 6 / 39

Page 15: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

More complex relational modelsNodes may be labeled by a factor

... or by a numerical information. [Villa-Vialaneix et al., 2013]Edges may also be labeled (type of the relation) or weighted (strength ofthe relation) or directed (direction of the relation).

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 6 / 39

Page 16: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Before we start...

Available material

• this slide (on slideshare or on my website, page �seminars�);

• data: my facebook network with two �les: fbnet-el.txt (edge list)and fbnet-name.txt (node names and list)

• script: a R script with all command lines included in this slide,fb-Rscript.R

Outline

Introduce basic concepts on network mining (not inference)Illustrate them with R (required package: igraph) on my facebooknetwork

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 7 / 39

Page 17: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Before we start...

Available material

• this slide (on slideshare or on my website, page �seminars�);

• data: my facebook network with two �les: fbnet-el.txt (edge list)and fbnet-name.txt (node names and list)

• script: a R script with all command lines included in this slide,fb-Rscript.R

Outline

Introduce basic concepts on network mining (not inference)Illustrate them with R (required package: igraph) on my facebooknetwork

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 7 / 39

Page 18: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Outline

1 Import data

2 Visualization

3 Global characteristics

4 Numerical characteristics calculation

5 Clustering

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 8 / 39

Page 19: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Import a graph from an edge list with igraph

edgelist <- as.matrix(read.table("fbnet -el.txt"))

vnames <- read.table("fbnet -name.txt")

vnames <- read.table("fbnet -name.txt", sep=",",

stringsAsFactor=FALSE ,

na.strings="")

The graph is built with:

# with ` graph . edgelist '

fbnet0 <- graph.edgelist(edgelist , directed=FALSE)

fbnet0

# IGRAPH U --- 152 551 --

See also help(graph.edgelist) for more graph constructors (from anadjacency matrix, from an edge list with a data frame describind the nodes,...)

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 9 / 39

Page 20: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Vertexes, vertex attributes

The graph's vertexes are accessed and counted with:

V(fbnet0)

# Vertex sequence :

# [1] 1 2 3 4 5...

vcount(fbnet0)

# [1] 152

Vertexes can be described by attributes:

# add an attribute for vertices

V(fbnet0)$initials <- vnames [,1]

V(fbnet0)$list <- vnames [,2]

fbnet0

# IGRAPH U --- 152 551 --

# + attr : initials (v/c), list (v/c)

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 10 / 39

Page 21: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Edges, edge attributes

The graph's edges are accessed and counted with:

E(fbnet0)

# [1] 11 -- 1

# [2] 41 -- 1

# [3] 52 -- 1

# [4] 69 -- 1

# [5] 74 -- 1

# [6] 75 -- 1

# ...

ecount(fbnet0)

# 551

igraph can also handle edge attributes (and also graph attributes).

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 11 / 39

Page 22: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Settings

Notations

In the following, a graph G = (V ,E ,W ) with:

• V : set of vertexes {x1, . . . , xp};• E : set of edges;

• W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.

Example 2: Medieval network: 10 542 nodes and the largest connectedcomponent contains 10 025 nodes (�giant component� / composantegéante).

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 12 / 39

Page 23: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Settings

Notations

In the following, a graph G = (V ,E ,W ) with:

• V : set of vertexes {x1, . . . , xp};• E : set of edges;

• W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.

The graph is said to be connected/connexe if any node can be reachedfrom any other node by a path/un chemin.

Example 2: Medieval network: 10 542 nodes and the largest connectedcomponent contains 10 025 nodes (�giant component� / composantegéante).

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 12 / 39

Page 24: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Settings

Notations

In the following, a graph G = (V ,E ,W ) with:

• V : set of vertexes {x1, . . . , xp};• E : set of edges;

• W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.

The graph is said to be connected/connexe if any node can be reachedfrom any other node by a path/un chemin.The connected components/composantes connexes of a graph are all itsconnected subgraphs.

Example 2: Medieval network: 10 542 nodes andthe largest connected component contains 10 025 nodes (�giantcomponent� / composante géante).

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 12 / 39

Page 25: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Settings

Notations

In the following, a graph G = (V ,E ,W ) with:

• V : set of vertexes {x1, . . . , xp};• E : set of edges;

• W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.

Example 1: Natty's FB network has 21 connected components with 122vertexes (professional contacts, family and closest friends) or from 1 to 5vertexes (isolated nodes)

Example 2: Medieval network: 10 542 nodes and the largest connectedcomponent contains 10 025 nodes (�giant component� / composantegéante).

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 12 / 39

Page 26: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Settings

Notations

In the following, a graph G = (V ,E ,W ) with:

• V : set of vertexes {x1, . . . , xp};• E : set of edges;

• W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.

Example 2: Medieval network: 10 542 nodes and the largest connectedcomponent contains 10 025 nodes (�giant component� / composantegéante).

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 12 / 39

Page 27: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Connected components

is.connected(fbnet0)

# [1] FALSE

As this network is not connected, the connected components can beextracted:

fb.components <- clusters(fbnet0)

names(fb.components)

# [1] " membership " " csize " " no "

head(fb.components$membership , 10)

# [1] 1 1 2 2 1 1 1 1 3 1

fb.components$csize

# [1] 122 5 1 1 2 1 1 1 2 1

# [11] 1 2 1 1 2 3 1 1 1 1

# [21] 1

fb.components$no

# [1] 21

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 13 / 39

Page 28: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Import data

Largest connected component

fbnet.lcc <- induced.subgraph(fbnet0 ,

fb.components$membership ==

which.max(fb.components$csize))

# main characteristics of the LCC

fbnet.lcc

# IGRAPH U --- 122 535 --

# + attr : initials (v/x)

is.connected(fbnet.lcc)

# [1] TRUE

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 14 / 39

Page 29: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

Outline

1 Import data

2 Visualization

3 Global characteristics

4 Numerical characteristics calculation

5 Clustering

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 15 / 39

Page 30: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

Visualization tools help understand the graphmacro-structurePurpose: How to display the nodes in a meaningful and aesthetic way?

Standard approach: force directed placement algorithms (FDP)algorithmes de forces (e.g., [Fruchterman and Reingold, 1991])

• attractive forces: similar to springs along the edges

• repulsive forces: similar to electric forces between all pairs of vertexes

iterative algorithm until stabilization of the vertex positions.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 16 / 39

Page 31: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

Visualization tools help understand the graphmacro-structurePurpose: How to display the nodes in a meaningful and aesthetic way?Standard approach: force directed placement algorithms (FDP)algorithmes de forces (e.g., [Fruchterman and Reingold, 1991])

• attractive forces: similar to springs along the edges

• repulsive forces: similar to electric forces between all pairs of vertexes

iterative algorithm until stabilization of the vertex positions.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 16 / 39

Page 32: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

Visualization tools help understand the graphmacro-structurePurpose: How to display the nodes in a meaningful and aesthetic way?Standard approach: force directed placement algorithms (FDP)algorithmes de forces (e.g., [Fruchterman and Reingold, 1991])

• attractive forces: similar to springs along the edges

• repulsive forces: similar to electric forces between all pairs of vertexes

iterative algorithm until stabilization of the vertex positions.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 16 / 39

Page 33: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

Visualization tools help understand the graphmacro-structurePurpose: How to display the nodes in a meaningful and aesthetic way?Standard approach: force directed placement algorithms (FDP)algorithmes de forces (e.g., [Fruchterman and Reingold, 1991])

• attractive forces: similar to springs along the edges

• repulsive forces: similar to electric forces between all pairs of vertexes

iterative algorithm until stabilization of the vertex positions.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 16 / 39

Page 34: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

Visualization tools help understand the graphmacro-structurePurpose: How to display the nodes in a meaningful and aesthetic way?Standard approach: force directed placement algorithms (FDP)algorithmes de forces (e.g., [Fruchterman and Reingold, 1991])

• attractive forces: similar to springs along the edges

• repulsive forces: similar to electric forces between all pairs of vertexes

iterative algorithm until stabilization of the vertex positions.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 16 / 39

Page 35: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

Visualization• package igraph1 [Csardi and Nepusz, 2006] (staticrepresentation with useful tools for graph mining)

• free interactive software: Gephi, Tulip,

Cytoscape, ... 2

1http://igraph.sourceforge.net/

2http://gephi.org, http://tulip.labri.fr/TulipDrupal/, http://www.cytoscape.org/

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 17 / 39

Page 36: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

Visualization• package igraph1 [Csardi and Nepusz, 2006] (staticrepresentation with useful tools for graph mining)

• free interactive software: Gephi, Tulip,

Cytoscape, ... 2

1http://igraph.sourceforge.net/

2http://gephi.org, http://tulip.labri.fr/TulipDrupal/, http://www.cytoscape.org/

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 17 / 39

Page 37: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

Network visualization

Di�erent layouts are implemented in igraph to visualize the graph:

plot(fbnet.lcc , layout=layout.random ,

main="random layout", vertex.size=3,

vertex.color="pink", vertex.frame.color="pink",

vertex.label.color="darkred",

edge.color="grey",

vertex.label=V(fbnet.lcc)$initials)

Try also layout.circle, layout.kamada.kawai,layout.fruchterman.reingold... Network are generated with somerandomness. See also help(igraph.plotting) for more information onnetwork visualization

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 18 / 39

Page 38: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 19 / 39

Page 39: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

Plot attributes

igraph integrates a pre-de�ned graph attribute layout:

V(fbnet.lcc)$label <- V(fbnet.lcc)$initials

and label and color node attributes

V(fbnet.lcc)$label <- V(fbnet.lcc)$initials

V(fbnet.lcc)$color <- rainbow(length(unique(

V(fbnet.lcc)$list )))[

as.numeric(factor(V(fbnet.lcc)$list ))]

plot(fbnet.lcc , vertex.size=3,

vertex.label.color="black", edge.color="grey")

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 20 / 39

Page 40: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Visualization

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 21 / 39

Page 41: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Global characteristics

Outline

1 Import data

2 Visualization

3 Global characteristics

4 Numerical characteristics calculation

5 Clustering

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 22 / 39

Page 42: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Global characteristics

Density / Transitivity Densité / TransitivitéDensity: Number of edges divided by the number of pairs of vertexes. Isthe network densely connected?

Transitivity: Number of triangles divided by the number of tripletsconnected by at least two edges. What is the probability that two friends ofmine are also friends?

Examples

Example 1: Natty's FB network

• density ' 4.8%, transitivity ' 56.2%;

• largest connected component: density ' 7.2%, transitivity ' 56.0%.

Example 2: Medieval network (projected network, individuals): density' 0.12%, transitivity ' 6.1%.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 23 / 39

Page 43: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Global characteristics

Density / Transitivity Densité / TransitivitéDensity: Number of edges divided by the number of pairs of vertexes. Isthe network densely connected?

Examples

Example 1: Natty's FB network

• 152 vertexes, 551 edges ⇒ density = 551152×151/2 ' 4.8%;

• largest connected component: 122 vertexes, 535 edges ⇒ density' 7.2%.

Example 2: Medieval network (largest connected component): 10 025vertexes, 17 612 edges ⇒ density ' 0.035%.Projected network (individuals): 3 755 vertexes, 8 315 edges ⇒ density' 0.12%.

Transitivity: Number of triangles divided by the number of tripletsconnected by at least two edges. What is the probability that two friends ofmine are also friends?

Examples

Example 1: Natty's FB network

• density ' 4.8%, transitivity ' 56.2%;

• largest connected component: density ' 7.2%, transitivity ' 56.0%.

Example 2: Medieval network (projected network, individuals): density' 0.12%, transitivity ' 6.1%.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 23 / 39

Page 44: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Global characteristics

Density / Transitivity Densité / TransitivitéDensity: Number of edges divided by the number of pairs of vertexes. Isthe network densely connected?Transitivity: Number of triangles divided by the number of tripletsconnected by at least two edges. What is the probability that two friends ofmine are also friends?

Density is equal to 44×3/2 = 2/3 ; Transitivity is equal to 1/3.

Examples

Example 1: Natty's FB network

• density ' 4.8%, transitivity ' 56.2%;

• largest connected component: density ' 7.2%, transitivity ' 56.0%.

Example 2: Medieval network (projected network, individuals): density' 0.12%, transitivity ' 6.1%.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 23 / 39

Page 45: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Global characteristics

Density / Transitivity Densité / TransitivitéDensity: Number of edges divided by the number of pairs of vertexes. Isthe network densely connected?Transitivity: Number of triangles divided by the number of tripletsconnected by at least two edges. What is the probability that two friends ofmine are also friends?

Examples

Example 1: Natty's FB network

• density ' 4.8%, transitivity ' 56.2%;

• largest connected component: density ' 7.2%, transitivity ' 56.0%.

Example 2: Medieval network (projected network, individuals): density' 0.12%, transitivity ' 6.1%.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 23 / 39

Page 46: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Global characteristics

Density / Transitivity Densité / TransitivitéDensity: Number of edges divided by the number of pairs of vertexes. Isthe network densely connected?Transitivity: Number of triangles divided by the number of tripletsconnected by at least two edges. What is the probability that two friends ofmine are also friends?

Examples

Example 1: Natty's FB network

• density ' 4.8%, transitivity ' 56.2%;

• largest connected component: density ' 7.2%, transitivity ' 56.0%.

Example 2: Medieval network (projected network, individuals): density' 0.12%, transitivity ' 6.1%.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 23 / 39

Page 47: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Global characteristics

Density / Transitivity Densité / TransitivitéDensity: Number of edges divided by the number of pairs of vertexes. Isthe network densely connected?Transitivity: Number of triangles divided by the number of tripletsconnected by at least two edges. What is the probability that two friends ofmine are also friends?

Examples

Example 1: Natty's FB network

• density ' 4.8%, transitivity ' 56.2%;

• largest connected component: density ' 7.2%, transitivity ' 56.0%.

Example 2: Medieval network (projected network, individuals): density' 0.12%, transitivity ' 6.1%.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 23 / 39

Page 48: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Global characteristics

Density / Transitivity Densité / TransitivitéDensity: Number of edges divided by the number of pairs of vertexes. Isthe network densely connected?Transitivity: Number of triangles divided by the number of tripletsconnected by at least two edges. What is the probability that two friends ofmine are also friends?

Examples

Example 1: Natty's FB network

• density ' 4.8%, transitivity ' 56.2%;

• largest connected component: density ' 7.2%, transitivity ' 56.0%.

Example 2: Medieval network (projected network, individuals): density' 0.12%, transitivity ' 6.1%.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 23 / 39

Page 49: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Global characteristics

Global characteristics

graph.density(fbnet.lcc)

# [1] 0.0724834

transitivity(fbnet.lcc)

# [1] 0.5604524

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 24 / 39

Page 50: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Outline

1 Import data

2 Visualization

3 Global characteristics

4 Numerical characteristics calculation

5 Clustering

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 25 / 39

Page 51: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

Vertexes with a high degree are called hubs: measure of the vertexpopularity.

Number of nodes (y-axis) with a given degree (x-axis)

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 52: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

Vertexes with a high degree are called hubs: measure of the vertexpopularity.

Two hubs are students who have been hold back at school and theother two are from my most recent class.

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 53: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

Vertexes with a high degree are called hubs: measure of the vertexpopularity.

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 54: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

The degree distribution is known to �t a power law loi de puissancein most real networks:

1 2 5 10 20 50 100 200 500

15

5050

0

Names

Transactions

This distribution indicates preferential attachement attachementpréférentiel.

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 55: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

The degree distribution is known to �t a power law loi de puissancein most real networks:

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

The orange node's degree is equal to 2, its betweenness to 4.

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 56: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

The degree distribution is known to �t a power law loi de puissancein most real networks:

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 57: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

The degree distribution is known to �t a power law loi de puissancein most real networks:

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 58: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

The degree distribution is known to �t a power law loi de puissancein most real networks:

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 59: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

The degree distribution is known to �t a power law loi de puissancein most real networks:

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 60: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

The degree distribution is known to �t a power law loi de puissancein most real networks:

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 61: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

The degree distribution is known to �t a power law loi de puissancein most real networks:

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).

Vertexes with a high be-tweenness (> 3 000) are2 political �gures.

Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 62: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Extracting important nodes1 vertex degree degré: number of edges adjacent to a given vertex or di =

∑j Wij .

The degree distribution is known to �t a power law loi de puissancein most real networks:

2 vertex betweenness centralité: number of shortest paths between all pairs of

vertexes that pass through the vertex. Betweenness is a centrality measure(vertexes that are likely to disconnect the network if removed).Example 2: In the medieval network: moral persons such as the�Chapter of Cahors� or the �Church of Flaugnac� have a highbetweenness despite a low degree.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 26 / 39

Page 63: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Degree and betweenness

fbnet.degrees <- degree(fbnet.lcc)

summary(fbnet.degrees)

# Min . 1 st Qu . Median Mean 3 rd Qu . Max .

# 1.00 2.00 6.00 8.77 15.00 31.00

fbnet.between <- betweenness(fbnet.lcc)

summary(fbnet.between)

# Min . 1 st Qu . Median Mean 3 rd Qu . Max .

# 0.00 0.00 14.03 301.70 123.10 3439.00

and their distributions:

par(mfrow=c(1,2))

plot(density(fbnet.degrees), lwd=2,

main="Degree distribution", xlab="Degree",

ylab="Density")

plot(density(fbnet.between), lwd=2,

main="Betweenness distribution",

xlab="Betweenness", ylab="Density")AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 27 / 39

Page 64: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Degree and betweenness distribution

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 28 / 39

Page 65: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

Combine visualization and individual characteristics

par(mar=rep(1,4))

# set node attribute `size ' with degree

V(fbnet.lcc)$size <- 2*sqrt(fbnet.degrees)

# set node attribute ` color ' with betweenness

bet.col <- cut(log(fbnet.between +1),10,

labels=FALSE)

V(fbnet.lcc)$color <- heat.colors (10)[11 - bet.col]

plot(fbnet.lcc , main="Degree and betweenness",

vertex.frame.color=heat.colors (10)[ bet.col],

vertex.label=NA, edge.color="grey")

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 29 / 39

Page 66: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Numerical characteristics calculation

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 30 / 39

Page 67: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Outline

1 Import data

2 Visualization

3 Global characteristics

4 Numerical characteristics calculation

5 Clustering

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 31 / 39

Page 68: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Vertex clustering classi�cationCluster vertexes into groups that are densely connected and share a fewlinks (comparatively) with the other groups. Clusters are often calledcommunities communautés (social sciences) or modules modules(biology).

Several clustering methods:• min cut minimization minimizes the number of edges between clusters;• spectral clustering [von Luxburg, 2007] and kernel clustering uses

eigen-decomposition of the Laplacian/Laplacien

Lij =

{−wij if i 6= jdi otherwise

(matrix strongly related to the graph structure);• Generative (Bayesian) models [Zanghi et al., 2008];• Markov clustering simulate a �ow on the graph;• modularity maximization• ... (clustering jungle... see e.g., [Fortunato and Barthélémy, 2007,Schae�er, 2007, Brohée and van Helden, 2006])

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 32 / 39

Page 69: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Vertex clustering classi�cationCluster vertexes into groups that are densely connected and share a fewlinks (comparatively) with the other groups. Clusters are often calledcommunities communautés (social sciences) or modules modules(biology).Example 1: Natty's facebook network

Several clustering methods:• min cut minimization minimizes the number of edges between clusters;• spectral clustering [von Luxburg, 2007] and kernel clustering uses

eigen-decomposition of the Laplacian/Laplacien

Lij =

{−wij if i 6= jdi otherwise

(matrix strongly related to the graph structure);• Generative (Bayesian) models [Zanghi et al., 2008];• Markov clustering simulate a �ow on the graph;• modularity maximization• ... (clustering jungle... see e.g., [Fortunato and Barthélémy, 2007,Schae�er, 2007, Brohée and van Helden, 2006])

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 32 / 39

Page 70: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Vertex clustering classi�cationCluster vertexes into groups that are densely connected and share a fewlinks (comparatively) with the other groups. Clusters are often calledcommunities communautés (social sciences) or modules modules(biology).Example 2: medieval network

Several clustering methods:• min cut minimization minimizes the number of edges between clusters;• spectral clustering [von Luxburg, 2007] and kernel clustering uses

eigen-decomposition of the Laplacian/Laplacien

Lij =

{−wij if i 6= jdi otherwise

(matrix strongly related to the graph structure);• Generative (Bayesian) models [Zanghi et al., 2008];• Markov clustering simulate a �ow on the graph;• modularity maximization• ... (clustering jungle... see e.g., [Fortunato and Barthélémy, 2007,Schae�er, 2007, Brohée and van Helden, 2006])

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 32 / 39

Page 71: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Vertex clustering classi�cationCluster vertexes into groups that are densely connected and share a fewlinks (comparatively) with the other groups. Clusters are often calledcommunities communautés (social sciences) or modules modules(biology).Several clustering methods:• min cut minimization minimizes the number of edges between clusters;• spectral clustering [von Luxburg, 2007] and kernel clustering uses

eigen-decomposition of the Laplacian/Laplacien

Lij =

{−wij if i 6= jdi otherwise

(matrix strongly related to the graph structure);• Generative (Bayesian) models [Zanghi et al., 2008];• Markov clustering simulate a �ow on the graph;• modularity maximization• ... (clustering jungle... see e.g., [Fortunato and Barthélémy, 2007,Schae�er, 2007, Brohée and van Helden, 2006])

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 32 / 39

Page 72: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Find clusters by modularity optimization modularité

The modularity [Newman and Girvan, 2004] of the partition(C1, . . . , CK ) is equal to:

Q(C1, . . . , CK ) =1

2m

K∑k=1

∑xi ,xj∈Ck

(Wij − Pij)

with Pij : weight of a �null model� (graph with the same degree distributionbut no preferential attachment):

Pij =didj2m

with di =12

∑j 6=i Wij .

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 33 / 39

Page 73: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

InterpretationA good clustering should maximize the modularity:• Q ↗ when (xi , xj) are in the same cluster and Wij � Pij

• Q ↘ when (xi , xj) are in two di�erent clusters and Wij � Pij

(m = 20)

Pij = 7.5

Wij = 5⇒Wij − Pij = −2.5di = 15 dj = 20

i and j in the same cluster decreases the modularity

• Modularity• helps separate hubs (6= spectral clustering or min cut criterion);• is not an increasing function of the number of clusters: useful to

choose the relevant number of clusters (with a grid search: severalvalues are tested, the clustering with the highest modularity is kept)but modularity has a small resolution default (see[Fortunato and Barthélémy, 2007])

Main issue: Optimization = NP-complete problem (exhaustive search isnot not usable)Di�erent solutions are provided in[Newman and Girvan, 2004, Blondel et al., 2008,Noack and Rotta, 2009, Rossi and Villa-Vialaneix, 2011] (amongothers) and some of them are implemented in the R package igraph.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 34 / 39

Page 74: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

InterpretationA good clustering should maximize the modularity:• Q ↗ when (xi , xj) are in the same cluster and Wij � Pij

• Q ↘ when (xi , xj) are in two di�erent clusters and Wij � Pij

(m = 20)

Pij = 0.05

Wij = 5⇒Wij − Pij = 4.95di = 1 dj = 2

i and j in the same cluster increases the modularity

• Modularity• helps separate hubs (6= spectral clustering or min cut criterion);• is not an increasing function of the number of clusters: useful to

choose the relevant number of clusters (with a grid search: severalvalues are tested, the clustering with the highest modularity is kept)but modularity has a small resolution default (see[Fortunato and Barthélémy, 2007])

Main issue: Optimization = NP-complete problem (exhaustive search isnot not usable)Di�erent solutions are provided in[Newman and Girvan, 2004, Blondel et al., 2008,Noack and Rotta, 2009, Rossi and Villa-Vialaneix, 2011] (amongothers) and some of them are implemented in the R package igraph.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 34 / 39

Page 75: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

InterpretationA good clustering should maximize the modularity:• Q ↗ when (xi , xj) are in the same cluster and Wij � Pij

• Q ↘ when (xi , xj) are in two di�erent clusters and Wij � Pij

• Modularity• helps separate hubs (6= spectral clustering or min cut criterion);• is not an increasing function of the number of clusters: useful to

choose the relevant number of clusters (with a grid search: severalvalues are tested, the clustering with the highest modularity is kept)but modularity has a small resolution default (see[Fortunato and Barthélémy, 2007])

Main issue: Optimization = NP-complete problem (exhaustive search isnot not usable)Di�erent solutions are provided in[Newman and Girvan, 2004, Blondel et al., 2008,Noack and Rotta, 2009, Rossi and Villa-Vialaneix, 2011] (amongothers) and some of them are implemented in the R package igraph.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 34 / 39

Page 76: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

InterpretationA good clustering should maximize the modularity:• Q ↗ when (xi , xj) are in the same cluster and Wij � Pij

• Q ↘ when (xi , xj) are in two di�erent clusters and Wij � Pij

• Modularity• helps separate hubs (6= spectral clustering or min cut criterion);• is not an increasing function of the number of clusters: useful to

choose the relevant number of clusters (with a grid search: severalvalues are tested, the clustering with the highest modularity is kept)but modularity has a small resolution default (see[Fortunato and Barthélémy, 2007])

Main issue: Optimization = NP-complete problem (exhaustive search isnot not usable)Di�erent solutions are provided in[Newman and Girvan, 2004, Blondel et al., 2008,Noack and Rotta, 2009, Rossi and Villa-Vialaneix, 2011] (amongothers) and some of them are implemented in the R package igraph.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 34 / 39

Page 77: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Node clustering

One of the function to perform node clustering is multilevel.community(probably not the best with large graphs):

fbnet.clusters <- multilevel.community(fb.lcc)

modularity(fbnet.clusters)

# Graph community structure calculated with the multi level algorithm

# Number of communities ( best split ): 7

# Modularity ( best split ): 0.566977

# Membership vector :

# [1] 3 2 2 2 2 3 5 3 5 5 4 3 3 2 ...

table(fbnet.clusters$membership)

# 1 2 3 4 5 6 7

# 2 18 26 32 12 8 24

See help(communities) for more information.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 35 / 39

Page 78: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Combine clustering and visualization

# create a new attribute

V(fbnet.lcc)$community <- fbnet.clusters$membership

fbnet.lcc

# IGRAPH U --- 122 535 --

#+ attr : layout (g/n), initials (v/c), list (v/c),

# label (v/c), color (v/c), community (v/n),

# size (v/n)

Display the clustering:

par(mfrow=c(1,1))

par(mar=rep(1,4))

plot(fbnet.lcc , main="Communities",

vertex.frame.color=

rainbow (9)[ fbnet.clusters$membership],

vertex.color=

rainbow (9)[ fbnet.clusters$membership],

vertex.label=NA, edge.color="grey")AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 36 / 39

Page 79: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Combine clustering and visualization

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 37 / 39

Page 80: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Export the graph

The graphml format can be used to export the graph (it can be read bymost of the graph visualization programs and includes information on nodeand edge attributes)

write.graph(fbnet.lcc , file="fblcc.graphml",

format="graphml")

see help(write.graph) for more information of graph exportation formats

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 38 / 39

Page 81: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

Further references...

on clustering...

• overlapping communities communautés recouvrantes;

• hierarchical clustering [Rossi and Villa-Vialaneix, 2011] provides an approach;

• �organized� clustering (projection on a small dimensional grid) andclustering for visualization [Boulet et al., 2008,

Rossi and Villa-Vialaneix, 2010, Rossi and Villa-Vialaneix, 2011];

• ...

on R

CRAN Task View: gRaphical Models in R

on network inference

you can start with the course �An introduction to network inference andmining�

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 39 / 39

Page 82: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

References

Blondel, V., Guillaume, J., Lambiotte, R., and Lefebvre, E. (2008).

Fast unfolding of communites in large networks.Journal of Statistical Mechanics: Theory and Experiment, P10008:1742�5468.

Boulet, R., Jouve, B., Rossi, F., and Villa, N. (2008).

Batch kernel SOM and related Laplacian methods for social network analysis.Neurocomputing, 71(7-9):1257�1273.

Brohée, S. and van Helden, J. (2006).

Evaluation of clustering algorithms for protein-protein interaction networks.BMC Bioinformatics, 7(488).

Csardi, G. and Nepusz, T. (2006).

The igraph software package for complex network research.InterJournal, Complex Systems.

Fortunato, S. and Barthélémy, M. (2007).

Resolution limit in community detection.In Proceedings of the National Academy of Sciences, volume 104, pages 36�41.doi:10.1073/pnas.0605965104; URL: http://www.pnas.org/content/104/1/36.abstract.

Fruchterman, T. and Reingold, B. (1991).

Graph drawing by force-directed placement.Software, Practice and Experience, 21:1129�1164.

Newman, M. and Girvan, M. (2004).

Finding and evaluating community structure in networks.Physical Review, E, 69:026113.

Noack, A. and Rotta, R. (2009).

Multi-level algorithms for modularity clustering.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 39 / 39

Page 83: Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Clustering

In SEA '09: Proceedings of the 8th International Symposium on Experimental Algorithms, pages 257�268,Berlin, Heidelberg. Springer-Verlag.

Rossi, F. and Villa-Vialaneix, N. (2010).

Optimizing an organized modularity measure for topographic graph clustering: a deterministic annealingapproach.Neurocomputing, 73(7-9):1142�1163.

Rossi, F. and Villa-Vialaneix, N. (2011).

Représentation d'un grand réseau à partir d'une classi�cation hiérarchique de ses sommets.Journal de la Société Française de Statistique, 152(3):34�65.

Schae�er, S. (2007).

Graph clustering.Computer Science Review, 1(1):27�64.

Villa-Vialaneix, N., Liaubet, L., Laurent, T., Cherel, P., Gamot, A., and SanCristobal, M. (2013).

The structure of a gene co-expression network reveals biological functions underlying eQTLs.PLoS ONE, 8(4):e60045.

von Luxburg, U. (2007).

A tutorial on spectral clustering.Statistics and Computing, 17(4):395�416.

Zanghi, H., Ambroise, C., and Miele, V. (2008).

Fast online graph clustering via erdös-rényi mixture.Pattern Recognition, 41:3592�3599.

AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 39 / 39