Upload
juan-david-cruz-gomez
View
357
Download
0
Embed Size (px)
Citation preview
Point of View BasedClustering ofSocio-Semantic NetworksJuan David CRUZ1
Cécile BOTHOREL1
François POULET2JT – FGG2010
Outline
1 IntroductionSocial Networks and Points of ViewSome Previous Work
2 The Point of View of Social Networks
3 Influencing the Community Detection with the Point of ViewPhase 1Phase 2
4 Preliminary Experiments and Results
5 Conclusion
page 2 Cruz,Bothorel,Poulet Clustering by Point of View
MotivationIntroduction I Social Networks and Points of View
� Socio–semantic networks are contains both:• The social graph• Semantic information represented by the features of thevertices and the edges.
� Given this information, how to identify communities derivedfrom the conjoint use of it?
� It is necessary to measure the quality of the partitions foundusing this information from two perspectives:
• The quality of the graph clustering• The quality of the semantic information within thecommunities
page 3 Cruz,Bothorel,Poulet Clustering by Point of View
Social Networks and Points of ViewIntroduction I Social Networks and Points of View
It is possible to obtain different partitions from different pointsof view
page 4 Cruz,Bothorel,Poulet Clustering by Point of View
Quality MeasuresIntroduction I Some Previous Work
Type Objective Examples
Similarity
Reduce the distance betweenthe members of the samegroup while the distance
between groups is increased.
Manhattan L1Euclidean L2Chebyshev L∞
Quality
Increase the number of edgeswithin each community whilethe number of edges betweencommunities is reduced. In
general: index (C) = f (C)+g(C)N(G)
[Gae05]
CoverageConductancePerformanceModularity
page 5 Cruz,Bothorel,Poulet Clustering by Point of View
Graph Clustering AlgorithmsIntroduction I Some Previous Work
Several graph clustering algorithms have been developed, amongothers:� Newman [New01] (Modularity optimization)� Fast unfolding [BGLL08] (Modularity optimization)� Maximal cliques enumeration and kernel generation [DWP+07](Modularity optimization)
� Genetic algorithm for detecting communities in large graphs[LM09] (Fitness function based on modularity)
� Genetic algorithm for detecting overlapped communities[Piz09] (Fitness function based on internal edges vs. outgoingedges)
page 6 Cruz,Bothorel,Poulet Clustering by Point of View
The Representation of a Point of ViewThe Point of View of Social Networks
� The semantic information can be described by a set of features� The point of view is PoVi = si ×V
Point of ViewNodes Feature 1 Feature 2 . . . Feature fNode 1 1 0 . . . 0Node 2 0 1 . . . 1
......
......
Node n 1 0 . . . 1
� The assignation of features to each node in the network
page 7 Cruz,Bothorel,Poulet Clustering by Point of View
General ArchitectureInfluencing the Community Detection with the Point of View
� Guide the community detection algorithm according tosemantic information.
� Use of clustering techniques from different domains.
page 8 Cruz,Bothorel,Poulet Clustering by Point of View
Semantic ClusteringInfluencing the Community Detection with the Point of View I Phase 1
page 9 Cruz,Bothorel,Poulet Clustering by Point of View
Semantic ClusteringInfluencing the Community Detection with the Point of View I Phase 1
� Clustering of the defined point of view: search nodes withsimilar instances of features.
� Use of Self–Organizing Maps (SOM): non–supervised machinelearning method [Koh97].
� The proximity between the input vector (instance) and theweight vector of the network is measured with the Euclideandistance.
� The SOM algorithm will find some number of groups.
page 10 Cruz,Bothorel,Poulet Clustering by Point of View
Weights Assignation and CommunityDetection
Influencing the Community Detection with the Point of View I Phase 2
page 11 Cruz,Bothorel,Poulet Clustering by Point of View
Weights Assignation and CommunityDetection
Influencing the Community Detection with the Point of View I Phase 2
� Given the trained SOM network N and a graph G (V ,E ):� For each e (i , j) ∈ E , the weight will be changed according to:
wij = 1+αd (Nij)δij
where α ≥ 1 is constant parameter, d (Nij), is the distancebetween the node i and the node j in the SOM network andδij = 1 if i , j belong to the same group in the SOM network.
� After the weights are set, a classic graph clustering algorithm(the fast unfolding algorithm [BGLL08]) is used.
page 12 Cruz,Bothorel,Poulet Clustering by Point of View
Experiments Configuration IPreliminary Experiments and Results
� In each experiment three algorithm were compared:• SOM• Fast unfolding• Our method
� Performed in two levels:• The final modularity: to measure the quality of the partition.• The average intra–cluster Euclidean distance: to measure thequality of the semantic clustering.
� The experiment were executed using a graph of 5389 nodesand 27347 edges extracted from a Twitter data set. The initialmodularity of this graph is −2.5192×10−3
page 13 Cruz,Bothorel,Poulet Clustering by Point of View
Experiments Configuration IIPreliminary Experiments and Results
� The experiments were performed using two different points ofview.
� Point of View 1:• Composed of 33 features. Each feature represents a time zonefrom the Twitter data set.
• A feature will be set to 1 if the node has at least one friend inthe time zone represented by the feature.
• Distances vary from 0 to√32
� Point of View 2:• Composed of 4 features representing the messaging profile ofeach user.
• The first feature is set to 1 if the user has more friends thanfollowers.
page 14 Cruz,Bothorel,Poulet Clustering by Point of View
Experiments Configuration IIIPreliminary Experiments and Results
• The next three features indicate the user behavior according tothe number of messages sent: below the mean, between themean plus three standard deviations and, over mean plus threestandard deviations.
• Distances vary from 0 to√3
page 15 Cruz,Bothorel,Poulet Clustering by Point of View
Case Twitter – Point of View 1Preliminary Experiments and Results
Experiment Final Q Avg. Intracluster DistanceSOM Graph −7.5×10−3 0.3697
Graph based clustering 0.5728 1.8091PoV based Clustering 0.5747 1.1947
� The average intracluster distance found by our proposedmethod is less than the average intracluster distance found bythe graph based algorithm.
� The modularity obtained is very similar: the point of view usesinformation associated with the localization of people’s friends.
� The modularity of the graph from the SOM clustering is notvery different from the modularity of the original graph.
� SOM groups are close to the structure of the non–clusteredgraph.
page 16 Cruz,Bothorel,Poulet Clustering by Point of View
Case Twitter – Point of View 2Preliminary Experiments and Results
Experiment Final Q Avg. Intracluster DistanceSOM Graph -0.2991 0
Classic clustering 0.5728 0.7100PoV based Clustering 0.6351 0.5507
� The SOM clustered the nodes into six groups, each oneexpressing one of the possible instances described above. Thisexplains the average distance found.
� Creating a graph from the SOM clustering will produce bettersemantic clusters, however, the modularity is worst than theone from the original graph.
� The SOM groups are totally unrelated with the structure ofthe graph.
� Regarding the modularity and the average intracluster distance,the performance of the PoV based algorithm was better.
page 17 Cruz,Bothorel,Poulet Clustering by Point of View
Conclusion IConclusion
� The classic community detection algorithms do not take intoaccount the semantic information to influence the clusteringprocess.
� Changing the weights according to the results of the semanticclustering, the semantic information is included into thecommunity detection process.
� The two types of informations are merged to find andvisualize a social network from a selected point of view.
page 18 Cruz,Bothorel,Poulet Clustering by Point of View
Conclusion IIConclusion
� Outlook• Make tests over the obtained partitions: rand index, robustnesstests...
• Study the case of overlapping communities.• Include the features of the edges into the point of viewgeneration.
• Development of a visualization algorithm for representing thePoV and the transition between two points of view.
page 19 Cruz,Bothorel,Poulet Clustering by Point of View
nav
Thanks for your attentionAppendix I Appendix
Questions?
page 20 Cruz,Bothorel,Poulet Clustering by Point of View
For Further Reading IAppendix I Appendix I For Further Reading
Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte,and Etienne Lefebvre.Fast unfolding of communities in large networks.Journal of Statistical Mechanics: Theory and Experiment,2008(10):P10008 (12pp), 2008.
Nan Du, Bin Wu, Xin Pei, Bai Wang, and Liutong Xu.Community detection in large-scale social networks.In WebKDD/SNA-KDD ’07: Proceedings of the 9th WebKDDand 1st SNA-KDD 2007 workshop on Web mining and socialnetwork analysis, pages 16–25, New York, NY, USA, 2007.ACM.
page 21 Cruz,Bothorel,Poulet Clustering by Point of View
For Further Reading IIAppendix I Appendix I For Further Reading
Marco Gaetler.Network Analysis: Methodological Foundations, chapterClustering, pages 178 – 215.Springer Berlin / Heidelberg, 2005.
Teuvo Kohonen.Self-Organizing Maps.Springer, 1997.
page 22 Cruz,Bothorel,Poulet Clustering by Point of View
For Further Reading IIIAppendix I Appendix I For Further Reading
Marek Lipczak and Evangelos Milios.Agglomerative genetic algorithm for clustering in socialnetworks.In GECCO ’09: Proceedings of the 11th Annual conference onGenetic and evolutionary computation, pages 1243–1250, NewYork, NY, USA, 2009. ACM.
M. E. Newman.Scientific collaboration networks. ii. shortest paths, weightednetworks, and centrality.Phys Rev E Stat Nonlin Soft Matter Phys, 64, July 2001.
page 23 Cruz,Bothorel,Poulet Clustering by Point of View
For Further Reading IVAppendix I Appendix I For Further Reading
Clara Pizzuti.Overlapped community detection in complex networks.In GECCO ’09: Proceedings of the 11th Annual conference onGenetic and evolutionary computation, pages 859–866, NewYork, NY, USA, 2009. ACM.
page 24 Cruz,Bothorel,Poulet Clustering by Point of View