Transcript
Page 1: Analysing the evolution of social aspects of open source software ecosystems

Analysing the evolution of social aspects of open

source software ecosystemsTom Mens & Mathieu Goeminne

Service de Génie LogicielDépartement des Sciences InformatiquesFaculté des Sciences - Université de Mons

Belgique

Page 2: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu Goeminne

• Study of software evolution

• in particular, the software process and software quality

• Focus on software ecosystems (see next slide)

• Focus on social aspects (see next slide)

• By analysing and visualising historical data of open source projects

• obtained by mining and combining data from different types of repositories

Research topic

Tuesday 7 June 2011, IWSECO workshop, Brussels

Page 3: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Goal

• Analyse the success factors (e.g. popularity, quality) of OSS projects?

• What are good and bad practices of OSS evolution? What lessons can we learn?

• Study how social aspects co-evolve with, and affect the software product and process

• Exploit this knowledge to improve the current practice (e.g., guidelines, tool support)

Page 4: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Software ecosystem

• There are many views on a software ecosystem. Our focus:

• the ecosystem of a single project, containing all artefacts (code, documentation, etc.) and persons involved in using, producing or modifying these artefacts

• the ecosystem of a coherent collection or distribution of individual projects

• Example: GNOME desktop environment for Linux, containing hundreds of applications/project

• The ecosystem of each individual GNOME project can be studied

• The interaction between all GNOME projects and their contributors can be studied

• Other examples: Linux OS distributions (e.g. Debian, Ubuntu)

Page 5: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Social aspects• Which communities (e.g. users, developers) are involved in a

software project?

• relation between project quality and popularity?

• How do communities communicate / interact?

• e.g. how are developers driven by user requests and how does this influence the project evolution?

• How are communities structured?

• relation between community structure and software quality or maintainability?

• How is work distributed among persons?

• relation between distribution of work / responsibility and maintainability?

• Which processes (formal or informal) are used?

Page 6: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons

Why open source?

• Free access to source code, defect data, developer and user communication

• Historical data available in open repositories

• Observable communities

• Observable activities

• Increasing popularity for personal and commercial use

• A huge range of community and software sizes

Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Page 7: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Methodology• Exploit available data from different repositories

• code repositories (e.g. SVN, Git, ...)

• mail repositories (mailing lists)

• bug repositories (bug trackers)

• Select open source projects

• Use Herdsman framework

• Based on FLOSSMetrics data extraction

• Use identity merging tool

• Use of statistical analysis and visualisation

Page 8: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Selecting Projects

• Criteria for selecting projects

• Availability of data from repositories

• Data processable by FLOSSMetrics tools

• CVSAnaly2, MLStats, Bicho

• Size of considered projects: persons involved, code size, activity in each repository

• Age of considered projects

Page 9: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons

Herdsman Framework

Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Page 10: Analysing the evolution of social aspects of open source software ecosystems

Identity Merging

Page 11: Analysing the evolution of social aspects of open source software ecosystems

Comparing Merge Algorithms• Based on a reference merge model• manually created• iterative approach• relying on information contained in different files (COMMITTERS, MAINTAINERS, AUTHORS, NEWS, README)

• Compute, for each algorithm, precision and recall w.r.t. reference model

Page 12: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Comparing Merge Algorithms

• Simple

• Bird (code and mail repositories only)

• based on Levenshtein distance

• Bird, extended for bug repositories

• Improved

• Combining ideas from Bird and Robles

Page 13: Analysing the evolution of social aspects of open source software ecosystems

Comparing Merge AlgorithmsBrasero Evince

Page 14: Analysing the evolution of social aspects of open source software ecosystems

Comparing Merge Algorithms(varying parameter values) - Evince

ImprovedSimple

Bird

Bird extended

Page 15: Analysing the evolution of social aspects of open source software ecosystems

AnalysingActivity Distribution

Page 16: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Analysing Activity distribution

Three different longitudinal studies

1. Distribution of work, for a single project, using coarse-grained data from different repositories

How are developer activities (commits, mails, bug fixes) distributed across repositories? How is activity distributed across developers? How does this evolve over time?

2. Distribution of work, for a single project, using fine-grained data from a single repository

Based on commits of different file types in version repository

3. Distribution of work across different projects belonging to the same community (e.g. GNOME)

Page 17: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionFirst study

• Analyse, for individual Gnome projects, historical data from version repositories, bug trackers and mailing lists

• Focus on 3 types of activities per person: commits, mails, bug report changes

• How are activities distributed: over different persons, over time?

Page 18: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distribution

Brasero

Evince

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

commits

mails

bug report changes

Page 19: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionFirst study

• Evidence of Pareto principle (20/80 rule)?

• Most activity is carried out by a small group of persons

• Typically : 20% do 80% of the job

• Distribution of code activity more unequal than mail activity

Page 20: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionFirst study

• Analyse activity distribution over time

• Use econometrics

• express inequality in a distribution

• aggregation metrics:Gini, Hoover, Theil (normalised)

• Values between 0 and 1

• 0 = perfect equality; 1 = perfect inequality

Page 21: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distribution First study

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Apr-99 Dec-99 Aug-00 Apr-01 Dec-01 Aug-02 Apr-03 Dec-03 Aug-04 Apr-05 Dec-05 Aug-06 Apr-07 Dec-07 Aug-08

Gini Hoover Theil (normalised)

Evolution of aggregation indices for Evince

Page 22: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionFirst study

!"

!#$"

!#%"

!#&"

!#'"

!#("

!#)"

!#*"

!#+"

!#,"

$"

-./0!)" 1230!*" -./0!*" 1230!+" -./0!+" 1230!," -./0!," 1230$!" -./0$!"

.4556/7"

58697"

:;"0".<8=>?7"

!"

!#$"

!#%"

!#&"

!#'"

!#("

!#)"

!#*"

!#+"

!#,"

$"

-./0,," -./0!!" -./0!$" -./0!%" -./0!&" -./0!'" -./0!(" -./0!)" -./0!*" -./0!+" -./0!," -./0$!"

1233456"

37486"

9:;"/<.2/5"1=7>;<6"

Brasero

Evince

Evolution of Gini index

Page 23: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionFirst study

• Identifying core groups

• Display Venn diagrams of most active (top 20) persons, according to each activity type (committing, mailing, bug report changing)

• For each person, show percentage of activity attributable to this person

• Take into account identity merges

Page 24: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distribution

Brasero

Evince

Identifying core groups

Page 25: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionFirst study

• Conclusion

• Activity distributions seem to become more and more unequally distributed

• The Pareto principle is clearly present in studied projects

• For Brasero and Evince, the activity is led by a few persons involved in 2 or 3 of the defined activities

Page 26: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distribution First study

• Future work

• Identify the type of statistical distribution

• Use sliding window approach

• detect impact of personnel turnover: ignore inactive persons, and discover new active persons

• Automate detection of core groups

• Study the evolution of core groups over time

Page 27: Analysing the evolution of social aspects of open source software ecosystems

Activity distributionSecond study

Page 28: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionSecond study

• Analysing fine-grained activities, restricted to source code repository

• Analyse contributors to a project based on types of activity they contribute to

type of activity file typecoding .c, .h, .cc, .pl, .java, .ada,.cpp, .chh, .py

documenting .html, .txt, .ps, .tex, .sgml, .pdftranslation .po, .pot, .mo, .charsetmultimedia .mp3, .ogg, .wav, .au, .mid

...

Page 29: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionSecond study

•Example: Rhythmbox• overlap between 5 activity types

A = code, B = build, C = devel-doc, D = translation

• Visualised using Venn diagramvalues representnumber of personsinvolved in a particularactivity (i.e., havingcommitted at leastone file of this type)

AB C

D

53

5 4

71

15

16

0

6

7

84

19

14

51

19

Page 30: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionSecond study

coding versus translation coding versus developer documentation

translation versus developer documentation

Evince - scatter plots of correlation

Page 31: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionSecond study

Evince - correlation matrixCorrelations with p-value > 0.01 not shown;Medium (>0.5) to strong correlations (>0.8) are colouredEvince document

ationimages i18n ui multimedi

acode meta config build devel.doc other

documentation

1 0,14715687

images 1 0,16954996 0,20213002 0,23079965 0,41266264 0,4596056 0,4757661 0,44918361 0,19411888

i18n 1 0,15575507 0,13010625 0,17912559 0,15815786 0,30906678

ui 1 0,16142962 0,56632106 0,31157703 0,54728747 0,55080516 0,51066149 0,16779495

multimedia 1 0,21869572 0,41933948 0,33361174 0,36632992 0,20712209

code 1 0,30763128 0,8242121 0,90169181 0,87365497 0,4087696

meta 1 0,31914846 0,54909454 0,2434247 0,61723848

config 1 0,89979553 0,95124778 0,43475402

build 1 0,90540444 0,58183399

devel.doc 1 0,41281866

other 1

Page 32: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons

i18n12,6%

code46,2%

config0,8%

build16,1%

devel-doc19%

images1,2%

documentation2,2%

meta1,6%

Tom Mens & Mathieu GoeminneIWSECO 2011, Brussels

Activity distributionSecond study

• Correlation graph for Evince

• Nodes represent relative amount of activity of each type

• Edges represent a statistically significant (p<0.01) high correlation (>0.8) Edges with weak or not statistically significant correlation,

and activity nodes <0,5% are not shown.

Page 33: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons

i18n12,6%

code46,2%

config0,8%

build16,1%

devel-doc19%

images1,2%

documentation2,2%

meta1,6%

Tom Mens & Mathieu GoeminneIWSECO 2011, Brussels

Activity distributionSecond study

• Evince: Interpretation of results

• The activities of building, coding and development documentation are done by the same group of persons

• Translators (i18n) are also involved in development documentation but not in coding

• Other documentation is largely independent from these activities

Page 34: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionSecond study

• Conclusion

• Some activities are done by same group of persons, other activities are largely independent

• Future work

• Study and compare the activity patterns on other projects

• Study how the separation of activities influences the process

• Compare activity distribution of different types of distributions

• E.g. translators have a more equal distribution of work than coders

Page 35: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionThird study

• Study the activity of persons across software projects in the GNOME ecosystem

• Case: Large long-lived GNOME projects using Git

ID Project name Authors Years Files

A Banshee 268 5,9 2700

B Rhythmbox 364 8,9 937

C Tomboy 290 6,6 766

D Evince 381 12,1 699

E Brasero 193 4,2 797

Page 36: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionThird study

• How are activities distributed in different GNOME projects?

• counted in terms of number of files modified

• code files are most frequently committed, followed bybuild, development documentation and translation files

!"#

$!"#

%!"#

&!"#

'!"#

(!"#

)!"#

*!"#

+!"#

,!"#

$!!"#

-./0122# 314516789# :86784# ;<=/>2# -?.02?8# @185A2BB# C12202#

#6DBE62F=.##

#8512?#

#D=##

#F8>D62/5.E8/##

#625.##

#=6.G20##

#>8/HG##

#=$+/##

#F2<2BIF8>##

#7D=BF##

#>8F2##!"

#!!!!"

$!!!!"

%!!!!"

&!!!!"

'!!!!"

()*+,--" .,/0,1234" 53123/" 678*9-" (:)+-:3" ;,30<-==" >,--+-"

"1?=@1-A8)""

"30,-:"

"?8""

"A39?1-*0)@3*""

"1-0)""

"81)B-+""

"93*CB""

"8#D*""

"A-7-=EA39""

"2?8=A""

"93A-""

Page 37: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionThird study

• How are activities distributed in different GNOME projects?

• counted in terms of number of persons involved in modifying files

• most persons are involved in translations,followed by development documentation,followed by code and build

!!"#

$!"#

%!"#

&!"#

'!"#

(!"#

)!"#

*!"#

+!"#

,!"#

#-./0##

#1234/##

#/05046/.-##

#3$+7##

#-.789##

#3:;90<##

#:0=;##

#/.-2:07=;>.7##

#23##

#.=?0@#

#:24>:0/3;##

A;7<?00#

B?C=?:1.D#

E.:1.C#

Page 38: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionThird study

• How many persons are contributing to more than 1 GNOME project?

A

B

C

DE

127

162

77

14826

8

15

61

922

7

24

4

12

4

4

4

61

1

187

11

18

11

1

5

4

21

70

Page 39: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionThird study

• Which types of activities are carried out by persons involved in multiple projects?Mainly translators. Coders tend to stick to one project

A

B

C

DE

96

106

46

1076

1

10

30

09

5

2

0

2

0

0

0

21

1

00

5

0

0

0

0

0

1

0

A

B

C

DE

37

66

32

4923

5

1

21

99

3

24

5

6

3

2

4

61

0

176

7

17

10

1

5

4

21

69

coders translators

Page 40: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Activity distributionThird study

• Work in progress

• Study activities of developers across different projects

• How, when and why do developers contribute to different projects?

• Simultaneously

• Over time

Page 41: Analysing the evolution of social aspects of open source software ecosystems

Université de Mons Tom Mens & Mathieu GoeminneTuesday 7 June 2011, IWSECO workshop, Brussels

Thank you


Recommended