
Order no.: 4015, Year 2009

THESIS / UNIVERSITÉ DE RENNES 1, under the seal of the Université Européenne de Bretagne

for the degree of

DOCTOR OF THE UNIVERSITÉ DE RENNES 1

Specialty: Computer Science

Matisse doctoral school

presented by

Sidney Rosario

prepared at the IRISA research unit, host team: DISTRIBCOM

IFSIC

Qualité de Services dans les compositions des services Web.

Quality of Service issues in compositions of Web Services.

Thesis defended in Rennes on 30 November 2009

before the jury composed of:

Jean-Pierre BANATRE, Professor, Université de Rennes 1 / Chair

Jean-Bernard STEFANI, Research Director, INRIA / Reviewer

Bruno GAUJAL, Research Director, INRIA / Reviewer

William COOK, Professor, University of Texas at Austin / Examiner

Albert BENVENISTE, Research Director, INRIA / Thesis supervisor

Claude JARD, Professor, ENS Cachan, Bretagne / Thesis co-supervisor

Acknowledgements

I sincerely thank my supervisors, Albert and Claude, for their support and guidance. They always found time for me, and solutions to my problems. This work would have been impossible without their support.

I thank Jean-Pierre Banâtre for chairing this jury, and Jean-Bernard Stefani and Bruno Gaujal for agreeing to serve as reviewers and for carefully reading and assessing this work. Thanks to William Cook for attending my defense remotely, at six in the morning!

Thank you to my family for your constant support and love. I love you.

A big thank you to all the members of 'la colloc', past and present, and to their friends, who quickly became my friends, to the members of the DISTRIBCOM team, and to my colleagues at IRISA. You have all made my stay in France a beautiful chapter of my life.

Contents

Table of Contents

1 Introduction in French
1.1 Motivation
1.2 Web services and their orchestrations
1.3 Models for service orchestrations
1.4 QoS in Web service orchestrations
1.4.1 QoS of Web services
1.4.2 SLA negotiation
1.4.3 QoS composition
1.4.4 Monotonicity in orchestrations
1.4.5 QoS monitoring
1.5 Thesis organisation, contributions

2 Introduction
2.1 Motivation
2.2 Web services and their compositions
2.3 Models for service orchestrations
2.3.1 Petri Nets
2.3.2 Process Algebras
2.3.3 Orc
2.3.4 BPEL
2.3.5 Other Models
2.3.6 Our Contribution
2.4 QoS issues in Web service orchestrations
2.4.1 QoS of Web services
2.4.2 SLA Negotiation
2.4.3 QoS composition
2.4.4 QoS-based Orchestration synthesis
2.4.5 Monotonicity in Orchestrations
2.4.6 QoS monitoring
2.5 Thesis organisation, contributions

3 A Net system semantics for Orc
3.1 Motivation
3.2 The orchestration model: Orc
3.2.1 Orc syntax and intuitive semantics
3.2.2 The CarOnLine illustrative example
3.3 Translating Orc into colored Petri nets: principles
3.3.1 Reflecting the Orc programming model
3.3.2 The Coloring mechanism
3.3.3 The marking equivalence
3.4 The detailed translation
3.4.1 Site Calls
3.4.2 Sequential composition
3.4.3 Symmetric parallel composition
3.4.4 Asymmetric parallel composition (where expression)
3.4.5 Expression Definitions
3.4.6 The Main Expression
3.5 Translating the CarOnLine example
3.6 Conclusion and future work

4 Event Structure Semantics of Orc
4.1 Introduction
4.2 Asymmetric Event Structures and Heaps
4.2.1 Asymmetric Event Structures with Labels
4.2.2 Heaps
4.2.3 From Heaps to LAES
4.2.4 Generic Operations on Heaps
4.3 Orc Syntax and Semantics
4.4 Denotations for Orc Expressions
4.4.1 Heaps of Base Expressions
4.4.2 Heaps for the Combinators
4.4.3 Recursive Definitions
4.5 Correctness of Orc heap semantics
4.6 Related Work
4.7 Conclusion

5 Branching Cells for Asymmetric Event Structures
5.1 Event Structures
5.1.1 Pre, Asymmetric and Prime Event Structures
5.1.2 Minimal Asymmetric Conflict, Stopping Prefix
5.2 Recursive Stopping, Branching Cells
5.2.1 Branching Cells
5.2.2 Max-initial decomposition
5.3 Stochastic AES and occurrence probabilities
5.3.1 Stochastic AES
5.3.2 Occurrence of an event
5.3.3 Probability of occurrence
5.4 Conclusion

6 Probabilistic QoS and soft contracts for transaction based Web services orchestrations
6.1 Introduction
6.2 QoS issues in web services and their compositions
6.2.1 Example of an orchestration
6.2.2 QoS Issues for web service Orchestrations
6.2.2.1 Flow may be data dependent
6.2.2.2 Flow may be time dependent
6.2.2.3 Orchestrations may not be "monotonic"
6.2.2.4 Orchestrations face the Open World paradigm
6.2.3 Conclusions drawn from this discussion
6.3 Contract Composition and the TOrQuE tool
6.3.1 How to establish Probabilistic Contracts and how to compose them
6.3.2 The TOrQuE tool
6.3.3 Discussion on criticality
6.4 Experimental Results for Contract Composition: opportunities for overbooking
6.4.1 Approach
6.4.2 Simulation results
6.5 Monitoring
6.6 Experimental Results: Monitoring
6.6.1 Contract of the orchestration
6.6.2 Results
6.7 Related Work
6.8 Conclusion

7 Monotonicity in Service Orchestrations
7.1 Introduction
7.2 Examples for Non-monotonic Orchestrations
7.3 The Orchestration Model: OrchNets
7.3.1 Background on Petri nets and Occurrence nets
7.3.2 Orchestration model: OrchNets
7.3.3 The semantics of OrchNets
7.4 Characterizing monotonicity
7.4.1 Defining and characterizing monotonicity
7.4.2 A structural condition for the monotonicity of workflow nets
7.5 Probabilistic monotonicity
7.5.1 Probabilistic setting: first attempt
7.5.2 Probabilistic setting: second attempt
7.6 Getting Rid of Non-Monotonicity
7.7 Conclusion

8 A Theory of QoS for Web Service Orchestrations
8.1 Introduction
8.2 Our Approach
8.2.1 The CarOnLine motivating example
8.2.2 Summary of our approach for QoS management
8.3 QoS Computing
8.3.1 QoS domains and the algebra of QoS computing for guarantee parameters
8.4 The Orchestration Model: OrchNets
8.4.1 QoS domains
8.4.2 Background on Petri nets and Occurrence nets
8.4.3 OrchNets: formal definition and semantics
8.5 Study of Monotonicity
8.5.1 Probabilistic monotonicity
8.6 Probabilistic contracts and their composition
8.6.1 Probabilistic Contracts
8.6.2 Contract composition
8.7 Probabilistic Contract Monitoring
8.8 Experiments
8.9 Conclusion

9 The Torque Tool
9.1 Torque Architecture
9.2 The Orc interpreter
9.3 Partial Order Computation
9.4 QoS Computation
9.5 Conclusion

A Proofs of Chapter 4
A.1 Proof of Theorem 4.4
A.2 Characteristic property of the Stop operator
A.3 Proof of Lemma 4.6
A.4 Proof of Theorem 4.7
A.5 Proof of Theorem 4.8

B Proofs of Chapter 5
B.1 Proof of Lemma 5.14
B.1.1 Proof of Lemma 5.14
B.1.2 Proof of Theorem 5.19
B.1.3 Proof of Lemma 5.21

C Proofs of Chapter 7
C.1 Proof of Theorem 7.4
C.2 Proof of Theorem 7.5
C.3 Proof of Theorem 7.6
C.4 Proof of Theorem 7.13

D Proofs of Chapter 8
D.1 Study of the contract composition procedure
D.2 Proof of Theorem 8.7
D.3 Proof of Theorem 8.8
D.3.1 Proof of Sufficiency
D.3.2 Proof of Necessity
D.4 Proof of Theorem 8.15

Bibliography

List of figures

Chapter 1

Introduction in French

1.1 Motivation

The World Wide Web is increasingly used as a means of providing services to clients around the world. The Web has transformed the way software is traditionally executed. Instead of being installed and executed locally on a machine, many software applications (or Web services) can now run on remote servers and are accessed by users over the Web. For example, with the "Google docs" services, a user can create and edit documents with a text editor that is invoked from any Web browser.

The availability of Web services gives rise to new distributed applications on the Web. Services offering different functionalities may be separated by large geographical distances and can be composed, yielding different ways of building new services. These composite services can in turn be invoked by other services, forming arbitrarily complex patterns of distributed computation. Web service orchestrations are composite Web services in which a single unit, the orchestrator, controls the order of service calls in the execution of the composite service. In enterprises, Web services and their orchestrations are increasingly the preferred way to integrate applications across the enterprise (Enterprise Application Integration, EAI). They are also commonly used to implement workflows and to build applications that span different enterprises.

In business settings, the non-functional behaviour of Web services, also called their Quality of Service (QoS), is an important aspect of the service. QoS covers a wide range of issues such as the performance, availability, reliability and security offered by the service. The QoS of a service is characterised by the answers to questions such as "how quickly does the service respond to a call?", "for how long is the service available?", "how reliable are its results?", etc. For a commercial entity, the QoS of its service often decides whether a client chooses its services over a similar service from another provider. A service that does not meet its required quality of service can also cause losses to its provider. It is therefore important for the service provider to be able to model and predict the QoS of its service before the service is deployed. Ad hoc solutions, such as over-provisioning resources at run time, can be very costly and are not guaranteed to work.

The importance of QoS underlines the need for a comprehensive framework for managing QoS in Web services and their orchestrations. In this thesis we are interested in QoS management in Web service orchestrations, and we aim to provide a QoS management framework. Such a framework must address a variety of questions:

1. How should the orchestration be modelled, specifying both its functional and non-functional behaviour? To reason about the QoS of the orchestration, we need appropriate models to describe it. These models should be formal and have an unambiguous semantics. A model suitable for QoS analysis must account for both the functional aspects and the QoS of the orchestration. It is important that the model be at the right level of abstraction for analysing orchestrations.

2. How is the QoS of the orchestration related to the QoS of the services it calls? The QoS of the orchestration is clearly influenced by the QoS of the services it calls. For example, a service that takes a long time to respond to a call slows down the responses of the orchestration. Relating the QoS of the services to that of the orchestration is a non-trivial problem. Data and time interact in complex ways, and this can influence the execution and the QoS of the orchestration. It is also important to have a good estimate of the QoS of the orchestration: an overly cautious QoS estimate can lead to over-provisioning of resources, which is expensive and could drive clients to competitors. Conversely, with an overly optimistic estimate, reaching the desired QoS level could become impossible at run time and could lead to paying penalties.

3. How should quality of service be monitored at run time? The QoS of the called services and of the orchestration must be monitored during execution, to ensure that they meet the desired levels. If the monitoring process detects a low level of QoS, the orchestration must take appropriate action. The orchestration may impose penalties on poorly performing services, or decide to reconfigure itself by replacing a service with a "better" service.

Summary of this chapter: This chapter should be read as an introductory tutorial on QoS management in orchestrations. We begin by briefly describing Web services and their compositions in the next section. Section 1.3 surveys some formal models that are relevant to the study of orchestrations and introduces the models used in this thesis. Section 1.4 introduces the problems specific to QoS in Web services and their orchestrations and reviews the different approaches used to address them. The chapter ends with an overview of the rest of the thesis and states its main contributions.

1.2 Web services and their orchestrations

This section briefly presents Web services, their compositions and some of the technologies that underlie them. The term 'Web service' has no widely accepted definition; in its generic sense it describes any software application that can be invoked automatically over a network. Most Web services are Internet applications invoked using standard Web protocols such as HTTP. Every Web service consists of 1) a unique address or identifier of the service (usually its URI), 2) an input document in a standard format (usually an XML document) that contains the client's request, and 3) software that understands the input document and processes it. Web services can be classified by their invocation style, the three most common being:

• RPC-style Web services: These Web services can be seen as the application of Remote Procedure Calls (RPC) over the Web. RPC-style Web services, although fairly common, are not a recommended way of implementing Web services, because the service calls are often tied to the implementation language. The client code is therefore tightly coupled to the server implementation.

• REST-based Web services: These services are increasingly popular. They follow the REST (REpresentational State Transfer) architecture, which was proposed by Roy Fielding [Fie00] as a model for the Web. The interface of these services is limited to four HTTP methods: GET, POST, PUT and DELETE. The full set of operations of the service is implemented by defining different resources on which these four methods can be invoked. REST-based Web services do not need to process XML documents, and lighter formats such as JavaScript Object Notation (JSON) [JSO] can be used.

• SOAP-based Web services: These services are the most popular in industry, and commercial companies have built numerous tools to support their development. SOAP [SOA] is a protocol for Web services based on the exchange of XML messages. A SOAP message consists of a header, used to specify various non-functional information, and a body, which carries the XML message processed by the service. SOAP is extensible: the header can be used to layer additional management protocols on top of the base protocol. Among these are the various WS-* specifications, such as WS-Security for secure message exchange; WS-Reliability for reliable message transfer, that is, ensuring that all parts of the message arrive at the other end in order; and WS-Policy, which makes it possible to specify capabilities and policy requirements.
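The header/body structure of a SOAP message can be illustrated with a minimal Python sketch. It assumes the SOAP 1.1 envelope namespace; the function `build_soap_message` and the element names `MessageId`, `GetQuote` and `CarModel` are hypothetical and chosen for illustration only, not taken from any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_message(header_items, payload):
    """Return a SOAP envelope (as a string) with a Header and a Body.

    header_items: dict of hypothetical non-functional entries (name -> text).
    payload: an ElementTree element carrying the application XML message.
    """
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    # The header carries non-functional information.
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    for name, text in header_items.items():
        ET.SubElement(header, name).text = text
    # The body carries the XML message processed by the service.
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical request: ask a garage service for a quote on a car type.
request = ET.Element("GetQuote")
ET.SubElement(request, "CarModel").text = "deluxe"
message = build_soap_message({"MessageId": "42"}, request)
```

A real WS-* extension would place its own namespaced elements in the header; the plain `MessageId` entry here only marks where such information would go.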

The functional interface of a Web service, which lists its procedure calls and the parameters they require, is often described in an XML language, the Web Service Description Language (WSDL). From the WSDL file of a service, client-side code for calling the service can be generated automatically. Web services can publish their services in a common registry, using the Universal Description, Discovery and Integration (UDDI) specification. Clients can query a UDDI repository to search for Web services. Although WSDL is commonly used to describe service interfaces, the use of UDDI is not widespread.

One of the most interesting aspects of Web services is that they can be composed to create new, value-added Web services. The composed service is called a Web service composition. Web service compositions are generally known as orchestrations or choreographies.

In Web service orchestrations, there is a central unit, the orchestrator, which manages the calls to the different services. This is the most common way of composing Web services. Orchestrations are particularly popular in enterprises, where they can be used to integrate the various applications across the enterprise (EAI). They can also be used to build business-to-business (B2B) applications, for example to implement workflows within organisations.

Choreographies, on the other hand, have no central unit controlling the execution. Control is distributed over the different services, which synchronise from time to time to achieve a common goal. Choreographies can be viewed as a peer-to-peer network of Web services. Choreographies, however, do not appear to be popular on the Web, and there is no commercial tool supporting their development. WS-CDL [KBRL04] is a W3C proposal for describing the interactions in choreographies. We study only Web service orchestrations and, unless explicitly stated otherwise, a Web service composition refers to a Web service orchestration. Further details can be found in the English version.

A typical example of an orchestration, CarOnLine, is shown in Figure 1.1. A client calls CarOnLine with a type of vehicle as input parameter. The orchestration looks for price offers for this car by calling two different Web services, GarageA and GarageB. The calls to these garages are guarded by a Timeout clock. If a garage does not respond within the timeout, its eventual response is ignored. Best Offer is a method local to the orchestration that selects the best offer from the two responses according to some criterion (for example, the price of the car). For this best offer, CarOnLine then looks for credit and insurance offers in parallel. If the car is a deluxe car, the GoldInsure service offers insurance; otherwise InsurePlus and InsureAll are called in parallel, and the offer with the lowest insurance rate is chosen. For the credit offers from AllCredit and AllCreditPlus, the offer with the lowest credit rate is chosen. Finally, the collection of the best price, credit and insurance offers is returned to the client.

Figure 1.1: The CarOnLine orchestration. (Diagram labels: CarOnLine Request; GarageA and GarageB, each with a Timeout; Best Offer; car=deluxe test with yes/no branches; GoldInsure; InsureAll, InsurePlus; AllCredit, AllCreditPlus; min; merge; sync.)
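The control flow of CarOnLine described above can be sketched in Python with `asyncio`. This is only an illustrative sketch under stated assumptions, not the Orc program or the models used in this thesis: the helpers `call` and `guarded`, the timeout value, and all prices and rates are hypothetical stand-ins for remote service calls.

```python
import asyncio

TIMEOUT = 0.05  # seconds; hypothetical guard on each garage call


async def call(result, delay=0.0):
    """Stand-in for a remote service call that answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return result


async def guarded(coro, timeout=TIMEOUT):
    """Return the call's answer, or None if it misses the timeout."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        return None


async def car_online(car):
    # Query both garages in parallel; late answers are ignored.
    offers = await asyncio.gather(
        guarded(call({"garage": "GarageA", "price": 21000})),
        guarded(call({"garage": "GarageB", "price": 19500}, delay=1.0)),
    )
    offers = [o for o in offers if o is not None]
    if not offers:
        return None
    best = min(offers, key=lambda o: o["price"])  # Best Offer (local method)

    async def insurance():
        if car == "deluxe":
            return await call({"insurer": "GoldInsure", "rate": 5.0})
        quotes = await asyncio.gather(
            call({"insurer": "InsurePlus", "rate": 3.2}),
            call({"insurer": "InsureAll", "rate": 2.9}),
        )
        return min(quotes, key=lambda q: q["rate"])

    async def credit():
        quotes = await asyncio.gather(
            call({"lender": "AllCredit", "rate": 4.1}),
            call({"lender": "AllCreditPlus", "rate": 3.8}),
        )
        return min(quotes, key=lambda q: q["rate"])

    # Credit and insurance are sought in parallel, then synchronised.
    best_credit, best_insurance = await asyncio.gather(credit(), insurance())
    return {"offer": best, "credit": best_credit, "insurance": best_insurance}


result = asyncio.run(car_online("compact"))
```

In this run GarageB is made deliberately slower than the timeout, so its cheaper offer is ignored and GarageA's offer is retained. This already hints at how timing can change the outcome of an orchestration.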

1.3 Models for service orchestrations

In this section we review some formalisms that are useful for modelling service orchestrations. Service orchestrations are essentially distributed systems, and existing formal models for distributed systems can be used to model them. The technical details can be found in the English version.

Petri nets. Petri nets [Rei85, Mur89] were introduced by C.A. Petri in the 1960s as a model for describing distributed systems and have been developed considerably over the years. Petri nets have an intuitive graphical representation together with a formal semantics. As a result, they have been widely used in the design and analysis of distributed systems.

Petri nets and their extensions have been used successfully to model workflows and orchestrations. The "Workflow Nets" of van der Aalst [vdA97], used to model workflows, are essentially strongly connected Petri nets. In [RtHvdAM06], the authors identify different workflow patterns and give the semantics of these patterns using Petri nets. Petri nets have also been used to give a formal semantics to orchestration languages such as BPEL, whose semantics is described in natural language, and to verify properties of these orchestrations [HSS05, LMSW06].

Occurrence nets, branching processes and unfoldings are specific forms of Petri nets suited to representing the concurrent executions of any Petri net. We use them extensively in this thesis.

Petri nets have proved to be a very successful formalism for modelling distributed systems, and they have been used to model and/or analyse a wide range of systems such as manufacturing systems, computer networks, biological systems, etc. We list some of the advantages of using Petri nets.

1. Petri nets are an appropriate model for distributed systems: The firing semantics is local to a transition. Multiple transitions in different parts of the net can fire at the same time, and the order of their firing is non-deterministic. This fits well with the notion of a distributed system, in which the "global state" of the system is not specified explicitly but is computed as the sum of the local states of the system.

2. Petri nets benefit from many analysis techniques: A Petri net can be viewed as a bipartite graph, and it can also be represented as a system of linear equations. Consequently, many analysis techniques from graph theory and linear algebra have been used to verify properties of the system.

3. Petri nets are intuitive: Although they are a formal model, the graphical representation of Petri nets is very intuitive, and it is fairly easy for non-specialists to model their system as a Petri net. The causality and concurrency relations appear explicitly in the graph.

4. Petri nets have a partial order semantics: The events in an execution (configuration) of a net appear as a partial order that reflects the causality between events. This is in contrast to the interleaving semantics of automata, where events are totally ordered. Partially ordered executions are a compact way of representing all possible executions, and they have other advantages as well. They are useful in our study of QoS and make it possible, in particular, to compose QoS parameters in order to derive the end-to-end QoS of the orchestration.

5. Petri nets have many extensions: Known under the generic name of high-level Petri nets, there are many extensions of Petri nets for modelling complex systems. For example, colored Petri nets associate values with tokens, which can be modified when a transition fires. This is useful, for instance, for modelling data and the way it is manipulated during an execution. Timed and stochastic Petri nets are extensions that associate delays with transitions; they have found applications in modelling latency in critical systems and in evaluating their performance.
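The local firing rule and the independence of concurrent transitions (items 1 and 4 above) can be made concrete with a small Python sketch. It assumes a plain place/transition net with unit arc weights; the class `PetriNet` and the fork/join example net are hypothetical illustrations, not the OrchNets model used later in the thesis.

```python
from collections import Counter


class PetriNet:
    """Place/transition net: each transition has a preset and a postset."""

    def __init__(self, transitions):
        # transitions: name -> (preset places, postset places), unit arc weights
        self.transitions = transitions

    def enabled(self, marking, t):
        """A transition is enabled when every preset place holds a token."""
        pre, _ = self.transitions[t]
        return all(marking[p] >= 1 for p in pre)

    def fire(self, marking, t):
        """Return the marking reached by firing t; only t's own places change."""
        if not self.enabled(marking, t):
            raise ValueError(f"transition {t!r} is not enabled")
        pre, post = self.transitions[t]
        new = Counter(marking)
        new.subtract(pre)   # consume one token per preset place
        new.update(post)    # produce one token per postset place
        return +new         # drop places left with zero tokens


# Hypothetical net: a fork, two concurrent transitions, then a join.
net = PetriNet({
    "fork": (["p0"], ["p1", "p2"]),
    "a": (["p1"], ["p3"]),
    "b": (["p2"], ["p4"]),
    "join": (["p3", "p4"], ["p5"]),
})
m0 = Counter({"p0": 1})
m1 = net.fire(m0, "fork")
# a and b are concurrent: both interleavings reach the same marking.
m_ab = net.fire(net.fire(m1, "a"), "b")
m_ba = net.fire(net.fire(m1, "b"), "a")
```

Because `a` and `b` touch disjoint places, the two interleavings are indistinguishable. A partial order semantics records only that both follow `fork` and precede `join`.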

The use of Petri nets also has drawbacks, however. We now mention a few of them, some of which are specific to their use in modeling compositions of Web services.

1. Termination is difficult to model: It is often necessary to model the termination of part of a system. For example, on receiving a cancel message from the user of a service, an ongoing computation for that user must be terminated. The firing semantics of Petri nets, which is local to a transition, can make this task difficult. One might need to connect every place of the net to a termination transition, which requires many arcs and often clutters the model.

2. New process instances are not easy to model: In the context of orchestrations, one may need to model the launch of new instances of the same process, for example when new requests arrive at the orchestration. This cannot be done with simple Petri nets, and high-level Petri nets are often needed. Modeling new processes may require colored tokens, together with a mechanism to manage them, which is sometimes nontrivial to model.

3. They are not easily composable: Petri nets are not defined in a recursive, structured way. This is acceptable, for example, for modeling unstructured workflows. As a consequence, however, Petri nets are not directly composable. There has been work to address this problem. Several ways of composing Petri nets have been defined, for example by merging transitions or places that carry the same labels in two nets. The Petri Net Algebra [BDK01] attempts to establish a link between a simple class of Petri nets and a simple process algebra, the Petri Box Calculus (PBC). The basic building blocks of PBC are simple Petri nets, which can be composed using the various PBC operators.

Process algebras. This term denotes a category of models that are very popular for modeling concurrent systems. The most popular process algebras are Communicating Sequential Processes (CSP) [Hoa78] by Tony Hoare, the Calculus of Communicating Systems (CCS) by Robin Milner [Mil80], and an extension of CCS for describing dynamic/mobile systems, the Pi-calculus [Mil99].

Most process algebras represent a concurrent system as a set of processes that communicate with each other by sending messages over channels. The simplest process is an action. As its name suggests, a process algebra has operators for composing processes in different ways to obtain new processes. For example, in the Pi-calculus a process P is defined inductively as follows:

\[
P ::= \pi.P \;\big|\; P + Q \;\big|\; P \mid Q \;\big|\; !P \;\big|\; (\nu x)P \;\big|\; 0
\]

where Q is a process. π.P is a process that performs the action π and then behaves like process P. The action π is a communication action; it can be the reading or the writing of a value on a channel. P + Q is a process that can choose nondeterministically to behave either as process P or as process Q. P | Q executes processes P and Q in parallel, !P stands for an infinite number of instances of P in parallel, and (νx)P ensures that a fresh instance of channel x is created in P. The process 0 is a process that performs no action.
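As an illustration only, the grammar above can be written down as an abstract syntax tree. The Python sketch below is not part of the thesis; the class names (Nil, Prefix, Choice, Par, Repl, New) and the printer show are assumptions chosen for this example, and actions are kept as plain strings rather than structured input/output prefixes.

```python
from dataclasses import dataclass


class Proc:
    """Base class for the process terms of the grammar."""


@dataclass(frozen=True)
class Nil(Proc):  # 0: the process that performs no action
    pass


@dataclass(frozen=True)
class Prefix(Proc):  # pi.P: perform an action, then behave like P
    action: str
    cont: Proc


@dataclass(frozen=True)
class Choice(Proc):  # P + Q: nondeterministic choice
    left: Proc
    right: Proc


@dataclass(frozen=True)
class Par(Proc):  # P | Q: parallel composition
    left: Proc
    right: Proc


@dataclass(frozen=True)
class Repl(Proc):  # !P: unboundedly many copies of P in parallel
    body: Proc


@dataclass(frozen=True)
class New(Proc):  # (nu x)P: a fresh channel x, local to P
    name: str
    body: Proc


def show(p):
    """Render a term in a textual form close to the grammar."""
    if isinstance(p, Nil):
        return "0"
    if isinstance(p, Prefix):
        return f"{p.action}.{show(p.cont)}"
    if isinstance(p, Choice):
        return f"({show(p.left)} + {show(p.right)})"
    if isinstance(p, Par):
        return f"({show(p.left)} | {show(p.right)})"
    if isinstance(p, Repl):
        return f"!{show(p.body)}"
    if isinstance(p, New):
        return f"(nu {p.name}){show(p.body)}"
    raise TypeError(f"not a process term: {p!r}")
```

Each constructor corresponds to exactly one production of the grammar, so a recursive function such as show mirrors the inductive definition.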

Process algebras are useful models of concurrent computation, but they are not really meant to be used as programming languages. The situation can be compared to the λ-calculus, which is a good model of sequential computation but is not really intended to be a programming language. There are concurrent programming languages, such as Pict and Concurrent ML, whose design is inspired by process algebras. The design of many languages for describing compositions of Web services, such as WS-CDL, WSCI and XLANG, is often said to be inspired by process algebras such as the Pi-calculus. There is work on giving a formal semantics to these languages using process algebras. For example, in [SBS04] the authors translate BPEL operations into CCS.

Orc. In this section we briefly present the Orc language [KQM09, KQCM09, MC07, QKCM]. Orc is a process calculus and a programming language based on this calculus. It is useful for modeling and programming distributed applications, and for specifying and executing distributed applications on the Web. An Orc program is an Orc expression, built from basic expressions called sites. Orc expressions are composed using Orc's four combinators. An orchestration drives the execution of an Orc program: it can call sites in parallel, in sequence, and so on; it can wait for their responses; it can terminate computations; and so forth.

We now examine the constructs of the Orc language, i.e. sites and Orc's four combinators.

Sites: Sites are the most elementary expressions in Orc. A site is a computing entity that may be internal or external to the orchestration. For example, a site can be a local function performing simple computations such as addition or subtraction, or a remote service performing complex searches in its databases. A Web service, in particular, can be modeled as a site in Orc.

Orc operators. We now see how to build Orc expressions using the four composition operators that Orc provides. Here f and g denote generic Orc expressions.

1. The parallel operator (f | g): The parallel composition of two Orc expressions f and g is written f | g. At run time, f | g executes both f and g in parallel. There is no direct interaction between f and g. The values published by f | g are an interleaving of the values published by f and g, in the order of their publication.

2. The sequential operator (f >x> g): The sequential composition f >x> g executes f first. If f publishes a value v, a new instance of g is launched in parallel, in which the value of x is bound to v. The publication !v is carried out. Since g is launched in parallel, f continues its execution as before. The values published by f >x> g are the set of values published by the different instances of g. Note that if f publishes no value, then no instance of g is launched, and so f >x> g publishes no value either.

3. The pruning operator (f <x< g): At run time, f <x< g launches f and g in parallel. When g publishes its first value, the evaluation of g is stopped and x takes that published value. Calls in f that have the variable x as a parameter are blocked until g returns its first value. The values published by f <x< g are the set of values published by f. Note that g need not publish a value for f <x< g to publish.

4. The otherwise operator (f ; g): The combinator f ; g executes f first. If f publishes a value, then g is discarded and f continues to execute. However, if f halts without publishing a value, then g is executed. Halting of an expression is defined recursively as follows: 1) A site call halts if it returns a value or if it stops. 2) f | g halts if both f and g halt. 3) f >x> g halts if f halts and all instances of g halt. 4) f <x< g halts if f halts and g has either halted or published a value. If g halts without publishing, then all calls in f that have x as a parameter also halt. 5) f ; g halts if f halts after publishing, or if f halts without publishing and then g halts.

The values published by f ; g are the values published by f if it publishes; otherwise they are the values published by g.
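The behavior of the four combinators can be approximated in a few lines of Python. This is a rough sketch under strong simplifying assumptions, not the Orc semantics used in this thesis: each expression is represented only by the finite list of (time, value) pairs it publishes, halting times are ignored, and in prune every publication of f is assumed to depend on x. All function names are hypothetical.

```python
def by_time(publications):
    """Sort (time, value) pairs by publication time (stable for ties)."""
    return sorted(publications, key=lambda pub: pub[0])


def parallel(f, g):
    """f | g: interleave the publications of f and g in time order."""
    return by_time(f + g)


def sequential(f, g_of):
    """f >x> g: each value v published by f at time t starts g_of(v) at t."""
    out = []
    for t, v in f:
        out.extend((t + dt, w) for dt, w in g_of(v))
    return by_time(out)


def prune(f_of, g):
    """f <x< g: x is bound to the first value published by g.

    Simplification: every publication of f is assumed to wait for x,
    so nothing is published when g publishes nothing.
    """
    if not g:
        return []
    t0, v0 = by_time(g)[0]
    return by_time((t0 + dt, w) for dt, w in f_of(v0))


def otherwise(f, g):
    """f ; g: keep f if it publishes anything, else fall back to g.

    Simplification: the time at which f halts is not modeled.
    """
    return f if f else g
```

For example, with two hypothetical garage sites publishing offers at times 3 and 5, prune applied to their parallel composition keeps only the offer that arrives first.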

Expression definitions. Orc allows expressions to be defined, for modularity. Expression definitions may moreover be recursive. An expression definition has the form

def E(x) = f

where x is a set of parameters and f is an expression that may use these parameters. When E(x) is called, the expression f is executed with the parameters x replaced by the actual parameters at the time of the call. Note that an expression call may return several values, unlike site calls.

Expression calls are non-strict, and some of the parameters in x may be undefined when E is called. However, site calls in E that use these parameters are blocked until the parameters are defined.

Note on the use of Orc in this thesis: In the chapters of this thesis we write f where x :∈ g to mean f <x< g. This notation was used in many earlier papers on Orc [WKCM08, MC07, KCM06]. The otherwise operator was added to Orc recently, and it does not appear in our examples or in our semantics of Orc in chapters 3 and 4.

BPEL. The Business Process Execution Language (BPEL) [Bpe07] is a language for modeling and executing business processes. Although it is not a mathematical model, we present it here because it is the most popular language for describing Web service orchestrations. There is work on giving BPEL a formal semantics, for example by translating it into Petri nets [HSS05, OVvdA+07, LMSW06], process algebras [Fer04] or finite automata [AFFK04, FBS04].


BPEL is a very popular language for specifying orchestrations, particularly in industry. Its set of constructs is fairly rich, so quite complex orchestrations can be specified in BPEL [vdAtHKB03]. The ability to specify abstract processes and then derive detailed, executable refinements from them is useful to the orchestration designer. Many tools are available for specifying and executing BPEL orchestrations, such as ActiveBPEL, IBM WebSphere, Microsoft BizTalk, Oracle BPEL Process Manager, Apache ODE and Sun's Open ESB.

However, since BPEL is a specification language aimed at executing orchestrations on the Web, it is not built on a formal foundation. Its specification is informal and parts of it can be ambiguous. The richness of its constructs makes it a complex language. There are redundancies among its constructs, and different constructs can be used to model the same process. Modeling in XML is rather verbose, which obscures the structure of processes. Commercial tools generally use a graphical language to specify the BPEL program, hiding the XML code from the user. This representation varies from tool to tool, however, since there is no standard for representing a BPEL process.

Our contribution. For our thesis work we looked for a formalism in which to specify orchestrations and analyze their quality of service. We chose Orc because it is an elegant and simple mathematical language, with few primitive constructs, that can express a variety of orchestration patterns. Since a partially ordered execution is useful for our QoS analysis, and since the trace semantics of Orc represents execution events as a total order, we gave a semantics for Orc in terms of colored, dynamic Petri nets [RBHJ06a]. We had developed a simulator for these nets, but we realized that dynamic calls encoded with colored tokens do not have a very efficient implementation, particularly in the presence of recursion.

Consequently, we chose to encode Orc directly into a set of partially ordered events, by giving a denotational semantics for Orc in terms of Asymmetric Event Structures (AES) in [RKB+07b]. This semantics served as a specification for the implementation of partially ordered execution in our tool TorQue for QoS analysis.

We then extended these occurrence nets with colors to represent QoS parameters. We call these nets Orchnets. Orchnets are our formal basis for studying questions such as monotonicity in orchestrations (see section 1.4.4) and for studying how QoS parameters evolve during an execution.

1.4 QoS in Web service orchestrations

In this section we address quality-of-service issues in the management of Web services and their orchestrations. We begin by reviewing definitions and models for the QoS of Web services in section 1.4.1. Section 1.4.2 deals with automated SLA negotiation. Section 1.4.3 defines the QoS composition problem and reviews the QoS composition techniques found in the literature. We briefly discuss the (non-)monotonicity of orchestrations in section 1.4.4. Finally, section 1.4.5 surveys QoS monitoring techniques. We briefly mention the contributions of this thesis in the respective sections.


1.4.1 QoS of Web services

Defining the QoS parameters of a Web service: Quality of service is a term that can mean different things to different communities. In the context of networks, QoS can cover issues such as the delays and bandwidths of network links, or the number of packets lost in transmission. Guaranteeing QoS in the network amounts to having packet routers that implement priority-based protocols such as DiffServ and IntServ. These methods aim to provide “better than best-effort” performance for certain flows deemed critical, by giving packets belonging to these flows higher priority in the routing queues.

When we talk about QoS in Web services, we reason at a higher level, called application-level QoS. A wide range of non-functional properties (QoS parameters) are relevant at this level, and some of them may be application-specific. In [W3c03] the W3C set out to identify the QoS parameters applicable to Web services. We list here some of these parameters, along with a few others that are commonly used in QoS studies of Web services.

• Latency (also known as response time) denotes the time taken by a service to respond to a request. It may include the delay introduced by the network when the Web service is called.

• The throughput of a service is the number of requests that the service can handle in a given time interval.

• Quality of response, or data quality, is a qualitative measure of a response. The exact definition of this parameter depends on the application. For example, for an aggregation service that returns the prices offered by different airlines for an itinerary, the quality of the response could be the number of distinct responses returned to the client. It could also depend on the best price offered to the client.

• Availability is a measure of the time during which the service is up and responding to client requests. It is usually estimated as the ratio between the uptime of a service and the total duration of the window over which it is sampled.

• Reliability represents the ability of the service to perform its required function correctly. It is sometimes also called the successful execution rate.

• Cost, or price, appears frequently in commercial Web services. Usually the client pays a certain price for each invocation of the service.

• The security of a Web service covers various aspects ensuring that the message exchanges between the client and the service are secure.
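As a small illustration of how the quantitative parameters in this list might be estimated from observations, the following Python sketch computes latency, throughput, availability and reliability over one sampling window. The record format, the function name qos_summary and the way uptime is supplied are assumptions made for this example; they do not come from [W3c03].

```python
def qos_summary(records, window_length, uptime):
    """Estimate basic QoS parameters over one sampling window.

    records:       list of (latency_seconds, succeeded) pairs, one per request
    window_length: duration of the sampling window, in seconds
    uptime:        time within the window during which the service was up
    """
    total = len(records)
    if total == 0 or window_length <= 0:
        raise ValueError("need at least one record and a positive window")
    return {
        # Mean response time of the observed requests.
        "mean_latency": sum(latency for latency, _ in records) / total,
        # Requests handled per second over the window.
        "throughput": total / window_length,
        # Ratio of uptime to the total duration of the window.
        "availability": uptime / window_length,
        # Fraction of requests executed successfully.
        "reliability": sum(1 for _, ok in records if ok) / total,
    }
```

Parameters such as quality of response or security are application-specific or qualitative, and are deliberately left out of this sketch.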

A wide range of other QoS parameters can be found in the literature; many of them are variants or combinations of the parameters mentioned above. For example, the reputation of a service is sometimes used as a QoS parameter. The reputation of a service is an aggregate of the feedback from its clients. Such a rating of a service by a client reflects its overall quality of service and is clearly influenced by multiple QoS parameters.


How to specify the QoS of a Web service? A clear and unambiguous specification of the QoS of a service is needed to make QoS evaluation possible for Web services. There are different ways of doing this.

1. By extending service description languages: Since service description languages such as UDDI and WSDL cover only the functional aspects of a service, many proposals have been made to extend these specifications so that the QoS of the service can be described. Examples are Performance-enabled WSDL (P-WSDL) in [DB07] and the UDDI extension (UX) in [ZCL04]. In these formalisms, the QoS of a service is usually modeled as a tuple of QoS parameters, with a value (or a range of values) specified for each parameter.

2. By using QoS contracts: Also referred to as Service Level Agreements (SLAs), contracts are agreements between the provider and the client of a service concerning the QoS of the service. Contracts can state the obligations of both the provider and the client. For example, a contract may say that “provided the client makes at most five requests per second, the provider ensures that these requests are handled in less than 100 milliseconds”. The first part of this clause is an obligation that the client must meet, and the second is an obligation of the provider. A contract may contain several such clauses, which together describe the QoS of the service.

Contracts are generally negotiated off-line, before the service is called by the client. Any QoS management method involving contracts comes with monitoring techniques to check that the obligations in the contract are met. WSLA [KL03] and WS-Agreement [ACD+] are two widely used frameworks for specifying contracts for Web services. We study WSLA in section 1.4.5, where QoS monitoring is considered.
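The example clause quoted above (at most five requests per second from the client, responses in under 100 milliseconds from the provider) can be checked mechanically. The Python sketch below is one possible reading, not a WSLA or WS-Agreement implementation: it assumes that "per second" means any half-open one-second window, and the function names and the three result labels are hypothetical.

```python
def client_respects_rate(arrivals, max_per_second=5):
    """Client obligation: no one-second window holds more than the limit."""
    arrivals = sorted(arrivals)
    start = 0
    for end, t in enumerate(arrivals):
        # Shrink the window until it spans strictly less than one second.
        while t - arrivals[start] >= 1.0:
            start += 1
        if end - start + 1 > max_per_second:
            return False
    return True


def evaluate_clause(requests, max_per_second=5, max_latency_ms=100.0):
    """requests: list of (arrival_time_seconds, response_time_ms) pairs."""
    arrivals = [arrival for arrival, _ in requests]
    if not client_respects_rate(arrivals, max_per_second):
        # The provider's guarantee is conditional on the client obligation.
        return "not_applicable"
    if all(latency < max_latency_ms for _, latency in requests):
        return "satisfied"
    return "violated"
```

Separating the client obligation from the provider obligation makes explicit that a slow response counts as a provider violation only while the client stays within its agreed request rate.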

Our contribution: probabilistic contracts. Most contracts have clauses that are hard, i.e. the value of the QoS parameters is fixed, or the maximum and/or minimum QoS values are specified. Two examples are the clauses “the response time is always less than 5 ms” and “the availability rate of the service is 95%”. In chapter 6 we argue that “hard” contracts do not realistically model the non-functional behavior of services, which is highly variable by nature. We proposed the use of probabilistic contracts, in which the QoS of a service is modeled by a probability distribution over the values of the QoS parameters. We show that probabilistic contracts can help the provider avoid overly pessimistic clauses in its contracts, and allow a certain degree of “overbooking”. To our knowledge, few works in the literature study probabilistic contracts. One exception is [HWTS07], where the QoS parameters are independent, discrete random variables. The parameters considered there are response time, reliability, fidelity and cost.

In our contract-based approach, the orchestrator establishes probabilistic contracts with the services it calls as well as with its own clients. Probabilistic contracts can be obtained in different ways. The called services may specify their QoS behavior as a probability distribution over their QoS parameters. The distribution can also be characterized by a set of quantiles. In some cases, measurements can also be used to estimate the probabilistic contract. For example, when the service is free (like many of Google's Web services), there is no contract with the service. Measurements can also be useful for establishing probabilistic contracts for the underlying network. In most cases, orchestrations have no contracts with the network domains that their messages traverse, and “ping”-style measurements can be useful for estimating the impact of the network on the QoS of the orchestration.
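To illustrate how a probabilistic contract could be characterized by quantiles estimated from measurements, here is a minimal Python sketch. It uses the nearest-rank definition of an empirical quantile, which is one common convention and not necessarily the one adopted in chapter 6; the function names and the default percent levels are assumptions.

```python
def nearest_rank_quantile(samples, percent):
    """Smallest sample such that at least percent% of samples are <= it."""
    if not samples:
        raise ValueError("no measurements")
    if not 0 < percent <= 100:
        raise ValueError("percent must lie in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of percent * n / 100, avoiding floating-point rounding.
    rank = -(-percent * len(ordered) // 100)
    return ordered[rank - 1]


def contract_from_measurements(samples, percents=(50, 90, 99)):
    """Summarize measured response times as a set of quantiles."""
    return {p: nearest_rank_quantile(samples, p) for p in percents}
```

A contract of this form reads as "90% of the responses arrive within the 90th-percentile value", which is weaker but more realistic than a hard upper bound.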

1.4.2 SLA negotiation

An important aspect of QoS management is the process of negotiating SLAs (or contracts).

What is SLA negotiation? An SLA specifies the rights and obligations of the different parties involved in an agreement concerning the service. Generally, these obligations and rights are established at the end of a negotiation process, in which the parties make offers or requests regarding the service. The offers and requirements of each party are often flexible, and the parties can make compromises to reach an agreement.

There can be different reasons for negotiating an SLA. Clearly, the provider of a service and its client, who have flexible offers and requirements regarding the QoS of the service, negotiate to reach an agreement on their respective obligations. SLA negotiation may also take place when entities have limited resources that must be shared among different tasks. For example, the providers of different services may negotiate to reserve part of their bandwidth resources for a streaming application.

SLAs are generally established after negotiation procedures between the parties. Automated negotiation techniques try to simplify the negotiation process by automating it (in whole or in part). Automated SLA negotiation has been studied in the context of networks [Pou07], and algorithms have been proposed, for example, to automate resource reservation in networks for streaming applications.

There have been attempts to model the SLA negotiation procedure as interacting processes. The goal is to build processes that negotiate by communicating with one another and reach an agreement at the end. Most of these approaches treat the negotiation problem as a constraint satisfaction problem. The contracts, or the QoS requirements and guarantees of each process, are expressed as constraints. The variables of the constraints are generally the QoS parameters under negotiation. The negotiation succeeds if the problem has a solution, that is, if there is at least one assignment of the variables that satisfies all the constraints.
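The constraint-satisfaction view of negotiation can be sketched as a brute-force search over small discrete domains. The Python code below is purely illustrative and does not reproduce any of the cited negotiation algorithms; the variables latency_ms and price, their domains and the constraints of both parties are invented for the example.

```python
from itertools import product


def negotiate(domains, constraints):
    """Return every assignment of the variables that satisfies all constraints.

    domains:     dict mapping a variable name to its candidate values
    constraints: list of predicates over an assignment (a dict)
    """
    names = list(domains)
    solutions = []
    for values in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, values))
        if all(constraint(assignment) for constraint in constraints):
            solutions.append(assignment)
    return solutions


# Hypothetical negotiation over two QoS parameters.
domains = {"latency_ms": [50, 100, 200], "price": [1, 2, 3, 4, 5]}
provider = [lambda a: a["latency_ms"] * a["price"] >= 300]  # faster costs more
client = [lambda a: a["latency_ms"] <= 100, lambda a: a["price"] <= 4]

agreements = negotiate(domains, provider + client)
```

The negotiation succeeds when agreements is non-empty. If the client tightened its budget to a price of at most 2, no assignment would satisfy every constraint and the negotiation would fail.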

These automation techniques try to model generic negotiations and are not specifically intended for negotiations in Web service orchestrations. For example, no mention is made of an underlying orchestration. We look at SLA negotiation in the context of orchestrations in the next section.

1.4.3 QoS composition

QoS composition is the process of relating the QoS of the called services to the overall QoS of the orchestration. In a contract-based approach, this process is called contract composition. To do this, the orchestration first negotiates contracts with the services it calls during execution. A contract must be negotiated for each called service. The orchestration can then compose these contracts to obtain an estimate of its own quality of service. This estimate helps the orchestration negotiate contracts with its own clients.


Both analytical and simulation techniques exist for QoS composition. Before examining these techniques, we make a few observations on the nature of orchestrations on the Web, which motivate and influence our choice of approach to QoS composition.

1. The “open world” paradigm: The QoS of the orchestration is mainly influenced by three entities: i) the orchestration server, ii) the called Web services, iii) the underlying networks. The orchestration may have information about its local resources. It cannot, however, expect to have detailed resource models for each of the Web services it calls. The same is true of the underlying network. Web services can be hosted anywhere on the Web, and the orchestration's requests may travel through many different, unknown domains. Moreover, details about the resources of these entities are often confidential and are not disclosed. Nor is it possible for the orchestration to know the nature of the external traffic of the called services and of the underlying network.

2. Data, time and the execution of orchestrations:

Figure 1.2: The CarOnLine orchestration, without timeouts or choices. [Diagram not reproduced; node labels: CarOnLine Request, GarageA, GarageB, min, min, sync, Best Offer.]

Unlike the situation in networks, the data of the request and time can influence the execution of the orchestration. We examine this through two simplified versions of the CarOnLine example of figure 1.1. The first version, in figure 1.2, is simpler than the second, in figure 1.3.

In the orchestration of figure 1.2 the calls to the garages are unguarded, i.e. they have no associated timeout. Two fixed insurance services are called for every type of input car. No choice is made by the orchestration, and each call to the orchestration invokes the same set of services.

The orchestration of figure 1.3 is slightly more complex, because there is a choice depending on the data “car=Deluxe”. The execution here is not fixed and depends on the input data. If the input car is a deluxe car, GoldInsure is called; otherwise InsureAll and InsurePlus are called.

Figure 1.3: The CarOnLine orchestration without timeouts. [Diagram not reproduced; node labels: CarOnLine Request, car=deluxe (yes/no), GoldInsure, merge, GarageA, GarageB, min, min, sync, Best Offer.]

In the CarOnLine orchestration of figure 1.1, in addition to the choice, there are timeouts on the calls to the garages. Since the occurrence of a timeout causes the garage's return value to be ignored, the value of the best offer depends on the timeout settings.

Analytical methods for QoS composition:

Several analytical approaches exist for QoS composition. Queueing networks [Kle75] have been used successfully to model and predict behavior in telecommunication networks. Some of the assumptions of this theory do not hold for service orchestrations.

Stochastic process algebras [HHK02] and stochastic Petri nets (SPN) [MBC+98] are other interesting formalisms that have been used in the performance evaluation of software.

Analytical techniques for QoS composition are attractive because they are fast and their solutions are accurate. We argue, however, that these techniques are not suited to the QoS analysis of general orchestrations, because their QoS models are too simplistic and generally unrealistic. For example, only the simplistic orchestration of figure 1.2 can be analyzed with SPNs, and only when the delays are exponential. When time or data influences the choices in the orchestration, as in the examples of figure 1.3 and figure 1.1, or when the QoS of the services follows a more complex model, these techniques do not work. We chose to use simulation techniques, which allow realistic QoS models to be used. Simulations do not give exact solutions for their input models; instead, the solutions are computed as approximations of the law-of-large-numbers type.

Simulation approaches:

Simulation techniques are free of many of the constraints of analytical techniques. For example, they can use general distributions for delays and can involve data-dependent choices. There has not been much work in the literature that uses simulation to analyze the QoS of Web service orchestrations. A detailed review is given in the English version of this introduction.

In this thesis we give a method for composing the probability distributions of QoS parameters, based on Monte-Carlo simulations. To this end, we give algebraic rules that show how QoS parameters evolve during an execution. The QoS parameters we consider can be multi-dimensional and partially ordered. Starting from contracts negotiated with the called services, we show how to estimate the contract that the orchestration can offer its clients. Our composition technique is coupled with a re-negotiation process and is iterative: if the contract is unacceptable to any of the parties, the clauses of the contracts are changed and the composition procedure is re-executed.
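The idea of Monte-Carlo composition can be sketched for the latency parameter alone. The Python code below is a simplified illustration, not the TorQue tool or the algebraic rules of this thesis: it assumes a hypothetical orchestration in which two garages are called in parallel and synchronized (latency composes by max) and an insurer is then called in sequence (latency composes by sum). The latency distributions are arbitrary choices made for the example.

```python
import random


def nearest_rank(ordered, percent):
    """Empirical quantile of an already sorted sample (nearest-rank rule)."""
    rank = -(-percent * len(ordered) // 100)  # integer ceiling
    return ordered[max(rank, 1) - 1]


def simulate_latency(rng, garage_a, garage_b, insurer):
    """One run: parallel calls compose by max, sequential calls by sum."""
    garages = max(garage_a(rng), garage_b(rng))
    return garages + insurer(rng)


def compose_latency(garage_a, garage_b, insurer, runs=10_000, seed=0):
    """Draw many runs and return the sorted end-to-end latencies."""
    rng = random.Random(seed)
    return sorted(
        simulate_latency(rng, garage_a, garage_b, insurer) for _ in range(runs)
    )


# Assumed service contracts, given here as samplers over latency (seconds).
samples = compose_latency(
    lambda rng: rng.uniform(1.0, 3.0),
    lambda rng: rng.expovariate(1.0),
    lambda rng: rng.uniform(0.5, 1.5),
)
# Quantiles of the composed distribution: a candidate contract for clients.
offer = {p: nearest_rank(samples, p) for p in (50, 90, 99)}
```

If the resulting offer is unacceptable to a party, the samplers can be replaced by re-negotiated ones and the composition run again, which mirrors the iterative procedure described above.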

A related but distinct topic is QoS-based synthesis of orchestrations, formulated as follows. Suppose the structure of the orchestration is known and a (functional) model of the orchestration exists. The services called by the orchestration are, however, not specified in this model. Each service call can be carried out by one service from a set of candidate services, and all candidate services offer the same functionality. The candidate services nevertheless differ in their QoS behavior, and the problem is to instantiate the orchestration with services such that the QoS of the orchestration is “optimal”. A detailed review of the literature on this problem can be found in the English version.

1.4.4 Monotonicity in orchestrations

To our knowledge, all studies on QoS composition in orchestrations ignore the problem of (non-)monotonicity in orchestrations. Intuitively, an orchestration is non-monotonic if “improving” the QoS of some of its services can “worsen” the QoS of the orchestration. In a contract-based approach, where contracts are established between the orchestration, the services and the clients, non-monotonicity of the orchestration is strictly undesirable. A non-monotonic orchestration may violate the contract with its client if a called service performs better than it promised in its contract.
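A tiny example shows how a timeout-guarded choice can make an orchestration non-monotonic with respect to latency. The Python sketch below describes a hypothetical orchestration that is not taken from this thesis: service A is called with a timeout; if A answers in time, its result is passed to a slow service B, otherwise the orchestration waits for the timeout and falls back to a fast service C. All names and numbers are assumptions.

```python
def orchestration_latency(latency_a, timeout=5.0, latency_b=10.0, latency_c=1.0):
    """End-to-end latency of a hypothetical timeout-guarded orchestration."""
    if latency_a <= timeout:
        # A answered in time: its result is processed by the slow service B.
        return latency_a + latency_b
    # A timed out: the orchestration waited for the timeout, then used C.
    return timeout + latency_c


slow_a = orchestration_latency(6.0)  # A misses the timeout
fast_a = orchestration_latency(4.0)  # A "improves" and beats the timeout
```

Here the faster A leads to a larger end-to-end latency (fast_a exceeds slow_a), because the improvement changes which branch of the orchestration is taken.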

The phenomenon of non-monotonicity in general is not new and has been observed, for example, in [CS92]. The authors there give performance bounds for deadlock-free stochastic Petri nets. These bounds hold under a pre-selection policy for resolving conflicts (in a pre-selection policy, transitions in conflict are fired according to a predetermined probability, which does not depend on the delays of the transitions). The authors show that non-monotonicity is possible in this setting.

In this thesis we study the property of non-monotonicity in orchestrations, first restricting our study to the latency parameter (Chapter 7). We formally define the monotonicity of orchestrations using coloured occurrence nets, and we give necessary and sufficient conditions guaranteeing the monotonicity of an orchestration. We also study probabilistic monotonicity, where the QoS of the services are probability distributions. We then extend this analysis to the case of generic QoS parameters (Chapter 8).

1.4.5 QoS monitoring

Any infrastructure for managing QoS in orchestrations should be able to monitor the performance of the services called by the orchestration. These services must be monitored because any violation of their contract can cause the orchestration to violate its own contract with its clients. If a contract violation is detected, the orchestration may decide to reconfigure itself or to impose penalties on the party at fault.

Web Service Level Agreement (WSLA) [KL03] is the most popular framework for monitoring the QoS of Web services. It consists of: 1) the WSLA language for specifying an SLA between the provider of a Web service and its clients; 2) a monitoring architecture for measuring the performance of the service and detecting any violation of the SLA. WSLA is an XML-based language. A WSLA document roughly consists of three components:

1. The "Parties" component contains identification information about the provider, the client and any other parties involved in the monitoring process.

2. The "Service description" component specifies the SLA parameters of the service. Different SLA parameters can be defined as combinations of basic parameters. The exact definition of the basic parameters, and how they are to be measured, are also given in this component.

3. The "Obligations" component defines the guarantees on the SLA parameters, and the actions to be taken if the guarantees are not met.

More details are given in the English version.

In this thesis, since we propose the use of probabilistic contracts, we also propose a technique for monitoring probabilistic contracts and detecting possible violations. Our monitoring technique is based on statistical tests, and makes it possible to define and detect violations in the behaviour of QoS parameters that are random. Our work on contract monitoring is complementary to what WSLA supports: our monitoring technique can be included in the WSLA platform to create a monitoring platform for probabilistic contracts.
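As a rough illustration of what monitoring a probabilistic contract may look like, the sketch below compares the empirical distribution of observed latencies with a contract given as a cumulative distribution function. The statistic (largest shortfall of the empirical CDF at a few checkpoints), the tolerance and the contract values are all assumptions made for this example; this is a simplified stand-in and not necessarily the statistical test developed in the thesis.

```python
import bisect

def empirical_cdf(samples):
    """Return a function x -> fraction of samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n

def violates_contract(samples, contract_cdf, checkpoints, tolerance):
    """Flag a violation when the observed latencies fall short of the
    contract by more than `tolerance` at some checkpoint."""
    observed = empirical_cdf(samples)
    worst_gap = max(contract_cdf(x) - observed(x) for x in checkpoints)
    return worst_gap > tolerance

# Assumed contract: 50% of responses within 1 s, 90% within 2 s, all within 4 s.
def contract_cdf(x):
    steps = [(1.0, 0.5), (2.0, 0.9), (4.0, 1.0)]
    return max((p for bound, p in steps if x >= bound), default=0.0)

good = [0.4, 0.6, 0.8, 0.9, 1.1, 1.3, 1.5, 1.8, 1.9, 3.0]
bad = [1.5, 1.8, 2.2, 2.5, 2.9, 3.1, 3.3, 3.6, 3.8, 3.9]
checkpoints = [1.0, 2.0, 4.0]
good_flag = violates_contract(good, contract_cdf, checkpoints, tolerance=0.15)
bad_flag = violates_contract(bad, contract_cdf, checkpoints, tolerance=0.15)
```

The tolerance absorbs the statistical fluctuation of a small sample; choosing it in a principled way is exactly what a proper statistical test would provide.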


1.5 Organisation of the thesis and contributions

This manuscript is structured around the publications produced during this thesis. We now give an overview of this structure, highlighting the main contributions of each part.

Chapter 3: A Net system semantics for Orc

Web service orchestrations require a mathematical foundation for their development. We start from the Orc formalism proposed by J. Misra et al. at the University of Texas at Austin. Orc is an elegant language that fits the concept of orchestration closely. We translate Orc into a system of coloured Petri nets, a generalisation of Petri nets that makes it possible to handle recursion; this formalism was proposed by Devillers et al. [DK04]. Coloured Petri nets are also useful for analysing the non-functional aspects of orchestrations.

Contributions:

1. Gives a coloured Petri net semantics for Orc expressions.

2. Gives a transformation of an Orc program into a system of finite nets.

3. Shows how colours can be used in a marking equivalence relation to model the invocation of expressions.

4. Demonstrates how these nets can be used for QoS analysis.

Publication: A shortened version of this paper [RBHJ06a], without all the details of the translation, was published at the second International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA) 2006. The full version, as presented here, appears as IRISA internal report no. 1780 [RBHJ06b].


Chapter 4: Event Structure Semantics of Orc

A challenge in the development of large-scale distributed applications is the analysis of the non-functional properties of the system. This analysis requires a formal and precise semantics of the language in which the system is described. Transition systems and trace semantics do not facilitate this kind of analysis. Event structures allow an explicit representation of causality and of the dependencies between events in the execution of a system. However, event structures are difficult to build compositionally, because they cannot represent fragments of a computation. In this paper, we present a partial order semantics based on heaps (a kind of encoding of occurrence nets with read arcs), which naturally represent fragments of an execution. Heaps are then easily translated into asymmetric event structures. The semantics is developed for Orc, an orchestration language in which services are invoked in parallel to achieve a goal, while managing timeouts, exceptions and priority. Orc and this new semantics are used to study quality of service (QoS) in orchestrations.

Contributions:

1. Gives an event structure semantics for Orc.

2. Introduces the notion of heap to represent fragments of a computation, and shows how to extract an asymmetric event structure from it.

3. Gives a denotational semantics in terms of heaps for Orc expressions, including recursive expressions.

4. Establishes the correspondence between the event structure semantics and the SOS semantics of Orc.

Publications: A shortened version of this paper [RKB+07b] was presented at the 4th International Workshop on Web Services and Formal Methods (WS-FM) 2007. The long version presented here is published as INRIA Research Report no. 6221 [RKB+07a].


Chapter 5: Branching cells for Asymmetric Event Structures

In this chapter, we extend to asymmetric event structures (AES) the notion of branching cell introduced for prime event structures in [AB06, AB08]. This requires extending the notion of minimal conflict to AES. These notions are then used to compute the probability of occurrence of an event in stochastic AES under a race policy, when events are assumed to have exponential distributions.

Contributions:

1. Defines the notions of minimal conflict, stopping prefixes and branching cells for asymmetric event structures.

2. Extends the results of [AB06, AB08] on branching cells to asymmetric event structures.

3. Uses these notions to compute the probability of occurrence of an event in a stochastic asymmetric event structure.

Publications: The extension of branching cells to asymmetric event structures is unpublished work. The computation of the probability of occurrence was done as part of the paper "Critical paths in the Partial Order Unfolding of Stochastic Petri Nets" [BHR09], which was presented at the 7th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS) 2009.


Chapter 6: Probabilistic QoS and soft contracts for transaction based Web services orchestrations

Service level agreements (SLAs), or contracts, play an important role in Web services. They define the obligations and rights of the provider of a Web service and of its clients, with respect to both function and quality of service (QoS). For Web service orchestrations, contracts are computed by a process called QoS contract composition, based on the contracts established between the orchestration and the services it calls. These contracts generally take the form of hard guarantees (for example, response time always below 5 ms). Using such hard bounds is not realistic, however, and statistical approaches are needed.

In this paper we propose using probabilistic contracts, which consist of a probability distribution for the QoS parameter; in this paper, we restrict ourselves to latency. We show how to compose probabilistic contracts to derive a contract for the orchestration. Our approach is implemented in the TOrQuE tool. Experiments with TOrQuE show that pessimistic contracts can be avoided and that "overbooking" is possible.

An essential component of SLA management is the continuous monitoring of the performance of the called Web services in order to detect SLA violations. We propose a statistical technique for monitoring probabilistic contracts.

Contributions:

1. Introduces a contract-based approach to QoS management in orchestrations.

2. Proposes the use of probabilistic contracts to model the QoS of services.

3. Shows how to compose these contracts and compute the quality of service of the orchestration.

4. Proposes a monitoring technique suited to probabilistic contracts.

5. Demonstrates that well-founded "overbooking" is possible.

Publications: A first version of this paper [RBHJ07] was presented at the IEEE International Conference on Web Services (ICWS) 2007. The section on monitoring appeared in the mini-conference of the 11th IFIP/IEEE International Symposium on Integrated Network Management (IM) 2009. The version presented here [RBHJ08] appeared in the IEEE Transactions on Services Computing.


Chapter 7: Monotonicity in Service Orchestrations

Web service orchestrations are compositions of different Web services that form a new service. The services called by the orchestration guarantee a certain level of QoS to the orchestrator, generally in the form of contracts. These contracts can then be composed by the orchestrator to derive the contract that it can offer to its own clients. A monotonicity assumption implicit in this approach is: "the better the QoS of the services in the orchestration, the better the QoS of the orchestration."

In some orchestrations, however, monotonicity can be violated: the performance of the orchestration improves when the performance of a service degrades. This is highly undesirable, since it can make the very principle of relying on contracts inconsistent.

In this paper, we formally define monotonicity for orchestrations modelled by coloured occurrence nets, and we characterise the classes of monotonic orchestrations. Contracts can be formulated as hard contracts or be probabilistic; our work covers both cases. We show that very few orchestrations are in fact monotonic, mainly because of the complex interactions between control, data and time. We also provide guidelines for the user to avoid non-monotonicity problems when designing orchestrations.

Contributions:

1. Draws attention to the problem of non-monotonicity in Web service orchestrations.

2. Formalises the notion of non-monotonicity using coloured occurrence nets.

3. Gives necessary and sufficient conditions for the monotonicity of an orchestration.

4. Extends the study to the probabilistic case by defining probabilistic monotonicity, and shows the correspondence between probabilistic monotonicity and the preceding notion of monotonicity.

Publications: A shortened version of this paper [BRBH09] appeared at the 30th International Conference on Application and Theory of Petri Nets and Other Models of Concurrency (ICATPN) 2009. A longer version [BRBH08] is published as INRIA research report no. 6528.


Chapter 8: A Theory of QoS for Web Service Orchestrations

In this paper, we develop a general theory of QoS for Web service orchestrations. To allow multi-dimensional QoS parameters, QoS domains must be partially, rather than totally, ordered. We identify the algebra needed to describe how QoS is transformed when service responses are synchronised, and we represent how a service call contributes to the end-to-end QoS of the orchestration. Contract-based approaches implicitly assume that the better a called service performs, the better the orchestration performs. This property, called monotonicity, is not always satisfied. We give conditions that guarantee monotonicity in orchestrations. We then show how the contracts between the orchestration and the services it calls can be composed, yielding a contract between the orchestration and its clients. To account for the high variability of QoS parameters, we rely on both probabilistic and non-probabilistic approaches. Finally, we propose a slight extension of the Orc language to support QoS management according to our theory.
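To give an intuition of a partially ordered, multi-dimensional QoS domain, the sketch below uses a hypothetical two-dimensional parameter made of a latency and a security level. The choice of dimensions and of the two operators (`then` for the contribution of a call, `sync` for the synchronisation of parallel responses) is an assumption made for illustration; the actual algebra is the one developed in Chapter 8.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QoS:
    latency: float   # smaller is better
    security: int    # larger is better

    def better_or_equal(self, other):
        """Componentwise partial order: at least as good in both dimensions."""
        return self.latency <= other.latency and self.security >= other.security

    def then(self, increment):
        """QoS after a further call: latencies add up, security is the weakest link."""
        return QoS(self.latency + increment.latency,
                   min(self.security, increment.security))

    def sync(self, other):
        """QoS after waiting for two parallel branches: worst value in each dimension."""
        return QoS(max(self.latency, other.latency),
                   min(self.security, other.security))

fast_weak = QoS(latency=1.0, security=1)
slow_strong = QoS(latency=3.0, security=3)

# Neither value dominates the other, so the order is only partial.
comparable = (fast_weak.better_or_equal(slow_strong)
              or slow_strong.better_or_equal(fast_weak))
joined = fast_weak.sync(slow_strong)
end_to_end = joined.then(QoS(latency=0.5, security=2))
```

In this sketch the synchronised value is worse than or equal to each branch, which is the behaviour one expects when the orchestration has to wait for both responses.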

Contributions:

1. Extends and generalises the QoS management of [RBHJ08] to multi-dimensional QoS parameters.

2. Formalises the notion of composite QoS parameter over partially ordered domains. Defines an algebra that describes how QoS parameters evolve during the execution of the orchestration.

3. Proposes a flexible contract composition procedure that takes into account the obligations and guarantees between the pairs (orchestration, called service) and (client, orchestration).

4. Extends the probabilistic monitoring technique to the general case of partially ordered QoS parameters.

5. Proposes some extensions of the Orc language to facilitate quality of service management.

Publications: A first version of this work [RBJ09b] appeared at the IEEE International Conference on Web Services (ICWS) 2009. The extended version [RBJ09a] presented here has been submitted to the International Journal of Web Services Research (JWSR).

Chapter 2

Introduction

2.1 Motivation

The world wide web is increasingly used as a medium for providing services to clients across the globe. The web has transformed the way in which software is traditionally consumed. Instead of being installed and run locally on a machine, many software applications, or Web services, now run on distant servers and are accessed by users over the web. For instance, with the Google Docs services a user can create and manipulate documents with a text editor that is invoked from any standard web browser.

Due to the availability of Web services which can be programmatically invoked, new distributed applications are being created over the web. Services with differing functionalities, possibly separated by large geographical distances, can be composed together in many different ways to form new services. These composed services can in turn be invoked by other services, thus forming arbitrarily complex patterns of distributed computations. Web service orchestrations refer to such composite services where one single unit, the orchestrator, controls the order of the service calls in the execution of the composite service. In business enterprises, Web services and their orchestrations are becoming the preferred way to integrate applications across the enterprise (Enterprise Application Integration, EAI). They are also commonly used to implement workflows, and to build applications which span different enterprises.

In business environments, the non-functional behaviour of a Web service, also called its Quality of Service (QoS), is an important aspect of its behaviour. QoS includes a wide array of issues such as the performance, availability, reliability and security offered by the service. The QoS of a service is characterised by the answers to questions like 'how fast does the service respond to a call?', 'how frequently is the service unavailable?', 'how reliable are its results?', etc. For a business entity, the QoS of its service often decides whether a client chooses its service over a similar service of another provider. A service that does not meet its QoS requirements might also cause losses to its provider. It is thus important for the service provider to be able to model and predict the QoS of its service reasonably well before the service is deployed. Ad-hoc solutions like over-provisioning resources at run time can be quite expensive and are not guaranteed to work.

The importance of QoS underscores the need for a comprehensive framework for managing QoS in Web services and their orchestrations. In this thesis we are interested in QoS management of Web service orchestrations, and we aim to provide a framework for this. A comprehensive QoS management framework for orchestrations needs to answer a variety of questions:

1. How to model the orchestration, specifying both its functional and non-functional behaviour? In order to reason about the QoS of the orchestration, we need appropriate models to describe the orchestration. These should be formal mathematical models with a clear semantics which is free from ambiguity. A model suitable for QoS analysis should consider both the functional and the QoS aspects of the orchestration. It is important that the model is at the right abstraction level for analysing orchestrations.

2. How to relate the QoS of the orchestration to the QoS of the services it calls? The QoS of the orchestration is clearly influenced by the QoS of the services that it calls. For example, a service that takes a long time to respond to a call will in turn slow the orchestration's responses. Relating the QoS of the services and the orchestration is a non-trivial problem. Data and time can intertwine in complex ways and influence the execution flow and the QoS of the orchestration. It is important to have a good estimate of the orchestration's QoS. A QoS estimate which is too conservative would result in over-provisioning of resources, which is expensive and could cause loss of clients to competitors. On the other hand, with an overly optimistic estimate, attaining the desired level of QoS at run-time might become impossible and could lead to paying penalties and loss of clients.

3. How to monitor QoS at run-time? The QoS of the called services and the orchestration needs to be monitored at run-time, to ensure that they meet the desired levels. If the monitoring process detects a low level of QoS, the orchestration will need to take appropriate action. The orchestration could impose penalties on the badly performing services or decide to reconfigure itself, by replacing the service with a 'better' service.

Outline of the chapter: This chapter is intended to serve as an introductory tutorial for QoS management in Web service orchestrations. We start by briefly describing Web services and their compositions in the next section. Section 2.3 surveys some prominent formal models that are relevant to the study of Web service orchestrations, and it introduces the models that are used in this thesis. Section 2.4 introduces the problems specific to QoS in Web services and their orchestrations and surveys the different approaches used to tackle the issue. This chapter ends with an outline of the rest of the thesis and a summary of its principal contributions.

2.2 Web services and their compositions

This section briefly introduces Web services, their compositions and some of the technologies that surround them. The term 'Web service' has no widely accepted definition, but in its generic sense it is used to describe any software application that can be called automatically over a network. Most Web services are applications over the internet that are called using standard Web protocols like HTTP. Any Web service is composed of: 1) a unique address or identifier of the service (usually its URI); 2) an input document in a standard, well-known format (usually an XML document) that contains the client's request; 3) a software application that understands the input document and processes it. Web services can be categorised by their invocation style, the three most common being:


• RPC-style Web services: These Web services can be seen as implementing the traditional Remote Procedure Calls (RPC) over the Web. RPC-style Web services, though quite common, are not a recommended way of implementing Web services, since the calls to the services' operations often map to language-specific constructs. The client's code is thus tightly coupled with the server's implementation.

• REST-based Web services: These kinds of services are becoming increasingly popular. They follow the REST (REpresentational State Transfer) architecture, which was proposed by Roy Fielding [Fie00] as a design model for the web. The interface of these services is restricted to the four HTTP methods: GET, POST, PUT and DELETE. The variety of operations of the service are implemented by defining different resources upon which these four methods can be called. REST-based Web services do not need to process XML documents, and more light-weight formats like JavaScript Object Notation (JSON) [JSO] can be used.

• SOAP-based Web services: These services are the most popular kind of services among industry and commercial vendors, who have built many tools to support their development. SOAP [SOA] is an XML-based message exchange protocol for Web services. A SOAP message consists of a header, which is used to carry various non-functional information, and a body, which carries the XML message to be processed by the service. SOAP is extensible: its header portion can be used to add different management protocols over the basic communication protocol. Amongst these are the various WS-* specifications, like WS-Security for secure message exchange; WS-Reliability to ensure reliable transfer of the message, with all of the message parts reaching the other end in order; and WS-Policy to specify capability and requirement policies.

The functional interface of a Web service, which lists its procedure calls and the required parameters, is often described in the XML-based "Web Service Description Language" (WSDL). From a service's WSDL file, one can automatically generate the client-side code for calling the service. Web services can publish their services in a common registry, using the "Universal Description, Discovery and Integration" (UDDI) standard. Clients can query a UDDI repository to search for Web services. Though WSDL is commonly used to describe the interface of services, public UDDI repositories are not commonly found.

Web service compositions

One of the most interesting aspects of having automatically callable services is that they can be programmed to be composed together to form novel services with added value. The composed service is called a Web service composition. Compositions of Web services are usually thought of in terms of orchestrations and choreographies.

In Web service orchestrations, there is a central entity, the orchestrator, that manages the calls to the different services. This is the most common way to compose Web services. Orchestrations are particularly popular in business enterprises, where they can be used to integrate the diverse applications across the enterprise (EAI). They can also be used to build business-to-business (B2B) applications, for example to implement cross-organisational workflows.

Web service choreographies, on the other hand, do not have one central unit that controls the execution. The control is distributed over different services that synchronise with each other from time to time to realise a common goal. Choreographies can be thought of as a peer-to-peer network of Web services. True choreographies, however, do not seem to be common over the web and there is no real commercial tool support for them. WS-CDL [KBRL04] is a W3C proposal for describing interactions in choreographies. We will only look at Web service orchestrations and, unless stated otherwise, a Web service composition will refer to a Web service orchestration.

A typical orchestration example, called CarOnLine, is shown in Figure 2.1. A client calls the CarOnLine orchestration with a car type as an input parameter. The orchestration searches for price quotes for this car by calling two different Web services, GarageA and GarageB. The calls to these garages are guarded by a timer, Timeout. If a garage does not respond before the timeout, its late response (if any) is ignored. Best Offer is a method local to the orchestration, which selects the best offer from the two responses according to some criterion (for example, the price of the car). For the best offer, the CarOnLine orchestration then finds credit and insurance offers in parallel. If the input car is a deluxe car, then only the GoldInsure service provides insurance for it; otherwise InsurePlus and InsureAll are called in parallel, and the offer with the minimal insurance rate is chosen. Similarly, for the credit offers from AllCredit and AllCreditPlus, the offer with the minimal credit rate is chosen. Finally, the collection of the best price, credit and insurance offers is returned to the client.

Figure 2.1 – The CarOnLine orchestration. [Diagram nodes: CarOnLine Request, GarageA, GarageB, Timeout, Best Offer, car=deluxe (yes/no), GoldInsure, min, merge, sync.]
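The control flow of CarOnLine described above can be sketched as follows. This is a sequential simulation with hypothetical service stubs: the prices, rates, latencies, the timeout value and the behaviour when no garage answers in time are all assumptions made for this example, and a real orchestrator would issue the parallel calls concurrently.

```python
TIMEOUT = 5.0  # assumed timeout guarding each garage call

# Hypothetical stubs: each service returns (offer value, simulated latency).
def garage_a(car): return (20_000, 2.0)
def garage_b(car): return (19_500, 7.0)   # answers after the timeout
def all_credit(price): return (0.045, 1.0)
def all_credit_plus(price): return (0.040, 1.5)
def gold_insure(price): return (0.030, 1.0)
def insure_plus(price): return (0.020, 1.0)
def insure_all(price): return (0.025, 0.5)

def guarded(call, arg, timeout):
    """Return the offer, or None if the reply arrives after the timeout."""
    offer, latency = call(arg)
    return offer if latency <= timeout else None

def car_on_line(car, deluxe):
    quotes = [guarded(g, car, TIMEOUT) for g in (garage_a, garage_b)]
    quotes = [q for q in quotes if q is not None]
    if not quotes:
        return None                      # assumed behaviour: no offer at all
    best_price = min(quotes)             # Best Offer: cheapest quote
    # Credit and insurance are independent of each other (parallel in the figure).
    credit = min(s(best_price)[0] for s in (all_credit, all_credit_plus))
    if deluxe:
        insurance = gold_insure(best_price)[0]
    else:
        insurance = min(s(best_price)[0] for s in (insure_plus, insure_all))
    return {"price": best_price, "credit": credit, "insurance": insurance}

standard = car_on_line("compact", deluxe=False)
luxury = car_on_line("limousine", deluxe=True)
```

With these stubs the quote of GarageB is discarded because it arrives after the timeout, even though it is the cheaper one, which already hints at how timing can influence the data returned by an orchestration.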

In businesses, the 'Business Process Execution Language' (BPEL) [Bpe07] is the most commonly used language to specify and execute orchestrations. BPEL is an XML-based language which includes constructs for composing services in different ways. Orc [MC07] is a process-algebra-based language for specifying and executing orchestrations. Orc can be seen as a web-scripting language, and the Orc interpreter [QKCM] enables specifying and executing mashups involving calls to services over the Web. We will look at BPEL and Orc in greater detail in section 2.3.

Static and Dynamic compositions: Service compositions can roughly be divided into two types: static compositions and dynamic compositions. In static compositions, the structure of the composition and the services called during the composition are fixed. They are useful and sufficient to implement processes whose structure and functionality remain fixed or do not change very often. Static compositions are commonly found inside enterprises, where they integrate applications across the enterprise, or they may span different business entities. In the second case, SLAs are often established between the business entities to guarantee the behaviour of the services of the composition.

Dynamic compositions, as the name suggests, are more flexible. The structure of the composition may change frequently, possibly for every new call to the composition. Each user may have different constraints on the behaviour (both functional and non-functional) of the composition, and a composition which satisfies those constraints is computed for each user. Dynamic compositions are suitable in fast-changing environments where new services may be discovered or services may become unavailable in short time periods.

2.3 Models for service orchestrations

In this section we will review a couple of formalisms that are useful to model service orchestrations. Service orchestrations are essentially distributed systems, and so many of the existing formal models for discrete distributed systems can be used to model them.

2.3.1 Petri Nets

Petri nets [Rei85, Mur89] were introduced by C. A. Petri in the 1960s as a model to describe distributed systems, and were developed significantly over the years. Petri nets have an intuitive graphical representation together with a clear formal semantics. As a result, they have been widely used in system design and analysis.

Petri nets and their extensions have successfully been used to model workflows and orchestrations. The Workflow nets of van der Aalst [vdA97], used to model generic workflows, are essentially strongly connected Petri nets with unique input and output places. In [RtHvdAM06], the authors identify different Workflow patterns and give semantics for these patterns using Petri nets. Petri nets have moreover been used to give semantics to orchestration languages like BPEL, whose semantics is described in natural language, and they have been used for verifying properties of the orchestration [HSS05, LMSW06].

In this section we very briefly introduce Petri nets.

Definition 2.1 A Petri net is a tuple N = (P, T, F, M0) where

• P is a finite set of places, denoted by circles.

• T is a finite set of transitions, denoted by rectangles. Moreover P ∩ T = ∅.

• F is a flow relation such that F ⊆ (P × T ) ∪ (T × P ).

• M0, the initial marking, is a function M0 : P → ℕ, where ℕ is the set of natural numbers including 0.

Figure 2.2 shows a simple Petri net with three places p0, p1, p2 shown as circles and one transition t0 shown as a rectangle. The flow relation is represented by the directed arcs between the places and the transition. The initial marking M0 is such that M0(p0) = 1, M0(p1) = 2 and M0(p2) = 0. This marking can also be represented by the multi-set {p0, p1, p1}.

For a Petri net (P, T, F, M0), the triple N = (P, T, F) is a directed bipartite graph, usually referred to as the net N. The elements of P ∪ T are called the nodes of net N. For a node x of N, the set •x = {y | (y, x) ∈ F} is called the pre-set of x and the set x• = {y | (x, y) ∈ F} is called the post-set of x. For example, in figure 2.2, •t0 = {p0, p1} and p1• = {t0}. The pre-set and post-set of a node x are often referred to as the input and the output nodes of x.

A marking of a Petri net is a mapping from the places P of the net to ℕ. In figure 2.2, the marking is represented by assigning tokens (the small black circles) to the places. A transition t is said to be enabled in a marking if all the input places of t have at least one token in that marking. Formally, t is enabled in marking M if ∀p ∈ •t, M(p) > 0. Clearly t0 is enabled in the marking of figure 2.2.
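The definitions above translate directly into a small data structure. The sketch below encodes the net of figure 2.2 with Python sets and dictionaries; the arc from t0 to p2 is an assumption, since the text only states the input places of t0.

```python
# Net of Figure 2.2. The arc t0 -> p2 is an assumption: the text only
# states that p0 and p1 are the input places of t0.
places = {"p0", "p1", "p2"}
transitions = {"t0"}
flow = {("p0", "t0"), ("p1", "t0"), ("t0", "p2")}
m0 = {"p0": 1, "p1": 2, "p2": 0}   # initial marking M0

def preset(x):
    """Pre-set of node x: all y with an arc (y, x)."""
    return {y for (y, z) in flow if z == x}

def postset(x):
    """Post-set of node x: all y with an arc (x, y)."""
    return {z for (y, z) in flow if y == x}

def enabled(t, marking):
    """t is enabled if every input place holds at least one token."""
    return all(marking[p] > 0 for p in preset(t))

t0_enabled = enabled("t0", m0)
```

Representing the marking as a dictionary from places to token counts mirrors the definition of a marking as a function from P to the natural numbers.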

Figure 2.2 – A Petri net. [Diagram: places p0, p1, p2 and transition t0.]

Figure 2.3 – The net after firing t0. [Diagram: places p0, p1, p2 and transition t0.]

A transition enabled in a marking can fire. The firing of an enabled transition changes the marking of the Petri net. If transition t is enabled in marking M, the firing of t results in the marking M′ defined by

\[
M'(p) =
\begin{cases}
M(p) & \text{if } p \notin {}^{\bullet}t \cup t^{\bullet}, \text{ or } p \in {}^{\bullet}t \cap t^{\bullet} \\
M(p) - 1 & \text{if } p \in {}^{\bullet}t \text{ and } p \notin t^{\bullet} \\
M(p) + 1 & \text{if } p \in t^{\bullet} \text{ and } p \notin {}^{\bullet}t
\end{cases}
\]

The firing and the associated change of marking is denoted by $M \xrightarrow{t} M'$. The new marking resulting from firing transition t0 (enabled in figure 2.2) is shown in figure 2.3. A firing sequence of the Petri net with initial marking M0 is any sequence of transitions σ = t0, t1, . . . , tn such that $M_0 \xrightarrow{t_0} M_1 \xrightarrow{t_1} \dots M_n \xrightarrow{t_n} M_{n+1}$. This is also represented as $M_0 \xrightarrow{\sigma} M_{n+1}$. A marking M is called a reachable marking if $M_0 \xrightarrow{\sigma} M$ for some firing sequence σ. A Petri net is said to be k-bounded if, for every reachable marking M, M(p) ≤ k holds for all p ∈ P. A 1-bounded Petri net is called a safe net.

The semantics of a Petri net expressed as a firing sequence is a linear semantics. This semantics does not give the full information about the causal relationships between the transition firings: some of the transition firings of the net may be independent of each other and there is no inherent order to their firings. A linear execution trace does not reflect the independence relation between firings. This is overcome by giving a partial order semantics for Petri nets, which makes explicit the causal and concurrency relationships between the transition firings. One of the most popular partial order semantics for Petri nets is given by Petri net processes, which themselves are special kinds of nets.
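Returning to the firing rule and the notions of reachability and boundedness defined earlier in this section, the following sketch implements them for the net of figure 2.2. As before, the arc from t0 to p2 is an assumption, and the exhaustive exploration only terminates for bounded nets.

```python
def pre_post(flow):
    """Build pre-set and post-set lookups from a flow relation."""
    def pre(x):
        return {y for (y, z) in flow if z == x}
    def post(x):
        return {z for (y, z) in flow if y == x}
    return pre, post

def fire(t, marking, pre, post):
    """Return the marking reached by firing t (three-case firing rule)."""
    if any(marking[p] <= 0 for p in pre(t)):
        raise ValueError(f"{t} is not enabled")
    new = dict(marking)
    for p in pre(t) - post(t):
        new[p] -= 1          # input place only: one token consumed
    for p in post(t) - pre(t):
        new[p] += 1          # output place only: one token produced
    return new               # places in both sets, or in neither, are unchanged

def reachable_markings(m0, transitions, pre, post):
    """Explore all reachable markings (terminates only for bounded nets)."""
    freeze = lambda m: tuple(sorted(m.items()))
    seen, todo = {freeze(m0)}, [m0]
    while todo:
        m = todo.pop()
        for t in transitions:
            if all(m[p] > 0 for p in pre(t)):
                nxt = fire(t, m, pre, post)
                if freeze(nxt) not in seen:
                    seen.add(freeze(nxt))
                    todo.append(nxt)
    return [dict(m) for m in seen]

flow = {("p0", "t0"), ("p1", "t0"), ("t0", "p2")}   # t0 -> p2 is assumed
pre, post = pre_post(flow)
m0 = {"p0": 1, "p1": 2, "p2": 0}
m1 = fire("t0", m0, pre, post)
bound = max(max(m.values()) for m in reachable_markings(m0, ["t0"], pre, post))
```

Under the stated assumption, firing t0 moves the net from the marking of figure 2.2 to a marking with no token in p0, one in p1 and one in p2; the computed bound shows that this net is 2-bounded but not safe.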

Occurrence nets, Branching Processes and Unfoldings

We will now describe a partial order semantics of Petri nets through processes. For this, we introduce a restricted class of nets called occurrence nets.

Consider a net O = (B, E, G). Let ≺ be the transitive closure of the flow relation G and let ⪯ be the reflexive closure of ≺. The set of causes of a node x of O is given by the set ⌊x⌋ = {y | y ⪯ x}. Two nodes x and y of O are said to be in conflict, denoted by x#y, if there are distinct transitions t, t′ ∈ E such that •t ∩ •t′ ≠ ∅ and t ⪯ x, t′ ⪯ y hold.

Definition 2.2 A net O = (B, E, G) is called an occurrence net if:

1. ⪯ is a partial order.

2. Each place has at most one input transition, i.e. ∀b ∈ B, |•b| ≤ 1.

3. ⪯ is well-founded, i.e. ⌊x⌋ is finite for every node x.

4. no node is in self-conflict, i.e. x#x does not hold for any node x.

For an occurrence net O = (B, E, G), the elements of B and E are referred to as conditions and events respectively. A configuration κ of an occurrence net O is a subnet of O which is causally closed and conflict-free: for any node x of κ, ⌊x⌋ is contained in κ, and no two nodes x and y of κ are such that x#y. Figure 2.4 shows an occurrence net. The shaded part is one configuration of the net.

Figure 2.4 – An occurrence net and one of its configurations (shaded).
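The notions of causes, conflict and configuration can be checked mechanically on a small occurrence net. The sketch below is illustrative only: the function names and the example net (one condition b0 consumed by two competing events e1 and e2) are hypothetical, and conflict is computed between distinct events sharing an input condition.

```python
from itertools import combinations

def causes(x, flow):
    """Return the set of causes of x, i.e. all nodes y with y ⪯ x."""
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        for src, dst in flow:
            if dst == node and src not in seen:
                seen.add(src)
                stack.append(src)
    return seen

def preset(e, flow):
    """Input nodes of e in the flow relation."""
    return {src for src, dst in flow if dst == e}

def in_conflict(x, y, events, flow):
    """x # y: two distinct events with a common input condition precede x and y."""
    cx = causes(x, flow) & events
    cy = causes(y, flow) & events
    return any(t != u and preset(t, flow) & preset(u, flow)
               for t in cx for u in cy)

def is_configuration(nodes, events, flow):
    """A set of nodes is a configuration if it is causally closed and conflict-free."""
    closed = all(causes(x, flow) <= nodes for x in nodes)
    conflict_free = not any(in_conflict(x, y, events, flow)
                            for x, y in combinations(nodes, 2))
    return closed and conflict_free

# Hypothetical occurrence net: b0 enables e1 (producing b1) or e2 (producing b2)
events = {"e1", "e2"}
flow = {("b0", "e1"), ("b0", "e2"), ("e1", "b1"), ("e2", "b2")}
```

Here {b0, e1, b1} is a configuration, whereas {e1, b1} is not causally closed and {b0, e1, e2, b1, b2} contains the conflicting events e1 and e2.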

A set of (partially ordered) executions of a net can be represented by an occurrence net, and is called a branching process of the net. To formally define branching processes we need the notion of a homomorphism between nets. A homomorphism ϕ from a net N = (P, T, F) to a net N′ = (P′, T′, F′) is a map ϕ : (P ∪ T) → (P′ ∪ T′) such that: 1) ϕ(P) ⊆ P′ and ϕ(T) ⊆ T′. 2) For any node x of N, the restriction of ϕ to •x is a bijection between •x and •ϕ(x), and the restriction of ϕ to x• is a bijection between x• and ϕ(x)•. For an occurrence net O = (B, E, G), let min(B) denote the minimal conditions of B with respect to ⪯.

Definition 2.3 A branching process of a Petri net 𝒩 = (N, M0) is a pair ℬ = (O, ϕ) where O = (B, E, G) is an occurrence net and ϕ is a homomorphism from O to N such that:

1. The restriction of ϕ to min(B) is a bijection between min(B) and M0.

2. If two events e, e′ ∈ E are such that •e = •e′ and ϕ(e) = ϕ(e′), then e = e′.


Figure 2.5 – A Petri net (left) and one of its branching processes (right).

Any configuration of the branching process of a net is a process of the net and it corresponds to a partially ordered execution of that net. Figure 2.5 shows a Petri net on the left and one of its branching processes on the right.

The minimal nodes of the branching process ℬ are the conditions min(B), which we sometimes denote as min(ℬ). Branching processes of a Petri net can themselves be ordered by a relation ⊑. For two branching processes ℬ = (O, ϕ) and ℬ′ = (O′, ϕ′), we write ℬ′ ⊑ ℬ if 1) there is an injective homomorphism ψ from ℬ′ into ℬ such that ψ(min(ℬ′)) = min(ℬ), and 2) the composition ϕ ∘ ψ is equivalent to ϕ′. It has been shown in [Eng91] that for any Petri net there exists a unique (up to isomorphism) maximal branching process according to ⊑. This maximal branching process of N is called the unfolding of N. The unfolding of a Petri net is an appropriate partial order representation of all of its possible executions. All possible executions of a Petri net are configurations of its unfolding and vice versa.

Figure 2.6 – The unfolding of the Petri net of figure 2.5.

Figure 2.6 shows the unfolding of the net of figure 2.5. The unfolding of a net is in general an infinite structure, but there is a finite representation of the unfolding [McM95], known as the finite complete prefix, which is sufficient to verify many properties of the net.

Advantages and Disadvantages of Petri nets

Petri nets have turned out to be a very successful and popular formalism for modelling distributed systems, and they have been used to model and/or analyse a wide array of systems such as manufacturing systems, computer networks and biological systems. We give some of the advantages of the use of Petri nets.

1. Petri nets are a suitable model for distributed systems: The execution of a Petri net, defined by the firing semantics, is local to a transition. Multiple transitions at different parts of the net may be enabled at the same time, and the order of their firing is non-deterministic. This corresponds nicely to the idea of a distributed system where the ’global state’ of the system is not explicitly specified, but is derived as the sum of local states of the system.

2. Petri nets have many analysis techniques: A Petri net can be seen as a bipartite graph, and it can be represented as a system of linear equations. As a result, Petri nets have inherited many analysis techniques from graph theory and linear algebra, which can be used to verify properties of the system.

3. Petri nets are intuitive: Although it is a formal model, the graphical representation of Petri nets is very intuitive and it is rather easy to get non-experts to model their systems as Petri nets. The causality and concurrency relationships between transitions appear explicitly in the graph, which might be helpful to the modeler. For example, it is easy to translate an informal idea of a ’process flow’ into a Petri net.

4. Petri nets have a partial order semantics: As a consequence of the local, non-deterministic firing rule of Petri nets, the events in an execution (configuration) appear as a partial order which reflects the causalities between the events. This is opposed to the interleaving semantics of, for example, an automaton, where events are totally ordered. Partially ordered executions are a compact way of representing all possible executions, and they have other advantages too. They are useful in our QoS studies in particular, to compose QoS parameters in order to derive the end-to-end QoS attributes of the orchestration.

5. Petri nets have many extensions: There are many extensions of Petri nets, generically known as high-level Petri nets, to model complex systems. For example, colored Petri nets associate values to tokens, which can be modified on the firing of a transition. This is useful to model data and the way they are manipulated in an execution. Timed and Stochastic Petri nets are extensions of Petri nets which associate delays to transitions and have found applications in modelling time-critical systems and in performance evaluation.
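As an illustration of the linear-algebraic view mentioned in item 2, the firing rule given earlier can be summarised by an incidence matrix. The symbols C and \vec{\sigma} are notation introduced here for illustration only; \vec{\sigma}(t) counts the number of occurrences of transition t in the firing sequence σ.

\[
C(p,t) =
\begin{cases}
+1 & \text{if } p \in t^{\bullet} \text{ and } p \notin {}^{\bullet}t \\
-1 & \text{if } p \in {}^{\bullet}t \text{ and } p \notin t^{\bullet} \\
0 & \text{otherwise}
\end{cases}
\qquad\qquad
M_0 \xrightarrow{\sigma} M \;\Longrightarrow\; M = M_0 + C \cdot \vec{\sigma}
\]

The implication only goes one way: a solution of the equation in non-negative integers is necessary for M to be reachable, but it is not sufficient in general.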

The use of Petri nets has its own set of disadvantages, however. We now mention some of them, a few of which are specific to their use in modelling compositions.

1. Hard to model termination: It is often necessary to model a one-step termination of the execution of a system. For example, on receiving a cancel message from the user of a service, an ongoing computation corresponding to the user has to be terminated. The firing semantics of Petri nets, which is local to a transition, might make this task difficult. One might need to connect all the places of the net to a terminating transition, requiring a lot of arcs, which often clutters the model.

2. Not easy to model new process instances: In the context of orchestrations, one might want to model the launching of new instances of the same process, for example when new queries arrive at the same orchestration. This cannot be done using basic Petri nets, and it usually needs high-level Petri nets. Modelling new processes might require the use of colored tokens, along with a mechanism for managing them, which might be non-trivial to model.


3. Not easily composable: Petri nets, unlike process algebras, are not defined in a recursive, structured way. This may be advantageous, for example in representing unstructured workflows, which can be directly modelled using Petri nets, but whose modelling using process algebras is non-trivial. By not being defined in a recursive and structured way, however, Petri nets are not directly composable. There has been work to address this issue. Many ways of composing Petri nets have been defined, for example by fusing transitions or places having the same labels in two nets. The Petri Net Algebra [BDK01] tries to draw a link between a simple class of Petri nets and a simple process algebra, the Petri Box Calculus (PBC). The basic elements of PBC are simple Petri nets, which can be composed together by the different operators of PBC.

2.3.2 Process Algebras

Process algebras is a term used to designate a class of models which are very popular in modelling concurrent systems. The most popular of these are Communicating Sequential Processes (CSP) [Hoa78] by Tony Hoare, the Calculus of Communicating Systems (CCS) [Mil80] by Robin Milner, and an extension of CCS to describe dynamic/mobile systems, the Pi-calculus [Mil99].

Most process algebras view a concurrent system as a set of processes which communicate by sending messages over channels. The simplest process is a basic action. As the name suggests, a process algebra has operators to compose processes together in different ways to get new processes. For example, in the Pi-calculus a process P is defined recursively as follows:

\[
P \;::=\; \pi.P \;\;\big|\;\; P + Q \;\;\big|\;\; P \mid Q \;\;\big|\;\; {!P} \;\;\big|\;\; (\nu x)P \;\;\big|\;\; 0
\]

where Q is a process. π.P is a process that does action π and then behaves like process P. Action π is a communication action and it can be the reading or writing of a value on a channel. P + Q is a process that can choose non-deterministically to act either as process P or as process Q. P | Q runs processes P and Q in parallel, !P has an infinite number of instances of P running in parallel, and (νx)P ensures that a new instance of channel x is created in P. The process 0 is a process which does not perform any action.

The execution of a process P happens in steps of the form $P \xrightarrow{a} P'$, which says that process P does action a and then behaves like process P′. A sequence of such steps $P \xrightarrow{a} P' \xrightarrow{b} P'' \ldots$ gives a sequence of actions a, b, . . ., a trace of process P. Using the execution semantics, one can define different notions of equivalence between processes. At a basic level two processes can be considered to be equivalent if they have the same set of traces. But there is a richer notion of equivalence called bisimilarity. Intuitively, two processes are said to be bisimilar if every action performed by one process can also be performed by the other process, and the new processes resulting from performing that action are again bisimilar. Two bisimilar processes have the same set of traces, but the converse is not necessarily true. This notion of bisimilarity is also referred to as strong bisimilarity because it differentiates between processes that have different internal (i.e. non-observable) behaviour. A weaker notion called weak bisimilarity does not distinguish between processes which differ only in their internal actions. Weakly bisimilar processes are also said to be observationally equivalent since they would not be distinguishable from the point of view of an external observer.
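The gap between trace equivalence and strong bisimilarity can be seen on a classic pair of processes, a.(b + c) and a.b + a.c. The sketch below is a naive illustration on a finite, acyclic labelled transition system; the dictionary encoding, the state names and both function names are assumptions made for this example.

```python
def traces(lts, state):
    """All finite traces of a finite, acyclic transition system from `state`."""
    result = {()}
    for action, nxt in lts.get(state, []):
        result |= {(action,) + rest for rest in traces(lts, nxt)}
    return result

def bisimilar(lts, p, q):
    """Naive fixpoint computation of strong bisimilarity on a finite system."""
    states = set(lts) | {n for succ in lts.values() for _, n in succ}
    rel = {(s, t) for s in states for t in states}   # start from all pairs

    def simulated(s, t, related):
        # every move of s is answered by an equally labelled move of t
        return all(any(a == b and related(s2, t2) for b, t2 in lts.get(t, []))
                   for a, s2 in lts.get(s, []))

    changed = True
    while changed:
        changed = False
        for s, t in list(rel):
            forth = simulated(s, t, lambda x, y: (x, y) in rel)
            back = simulated(t, s, lambda x, y: (y, x) in rel)
            if not (forth and back):
                rel.discard((s, t))                  # pair cannot be bisimilar
                changed = True
    return (p, q) in rel

# P = a.(b + c) and Q = a.b + a.c, with "0" the inactive process
lts = {
    "P": [("a", "P1")],
    "P1": [("b", "0"), ("c", "0")],
    "Q": [("a", "Q1"), ("a", "Q2")],
    "Q1": [("b", "0")],
    "Q2": [("c", "0")],
}
```

P and Q have the same traces, yet they are not bisimilar: after the action a, P can still choose between b and c, whereas Q has already committed to one of them.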

Process algebras are useful models of concurrent computations, but they are not really meant to be used as programming languages. This situation might be compared to the λ-calculus, which is a good model for sequential computation, but is not really meant to be a programming language. There are concurrent programming languages like Pict and Concurrent ML whose design is inspired by process algebras. The design of many Web service composition languages like WS-CDL, WSCI and XLANG is often said to be inspired by process algebras like the Pi-calculus. There have also been efforts to give formal semantics to these languages using process algebras. For example, in [SBS04] the authors map BPEL constructs to the CCS operators.

2.3.3 Orc

In this section we describe the Orc language. Since our QoS studies use Orc as a language for modelling orchestrations, we will present it in some detail. The material in this section is based on several papers on Orc, see [KQM09, KQCM09, MC07, QKCM].

Orc is a process calculus, and a programming language based on it, which is useful for modelling and programming concurrent, distributed applications. Orc is especially suited for specifying and executing distributed applications over the web. A program in the Orc language is an Orc expression, which consists of basic expressions called sites, which are composed together using four concurrency combinators. An orchestration orchestrates the execution of an Orc program: it can call sites in parallel, in sequence, etc., it can wait for their responses, it can terminate sub-computations, and so on.

We will now look at the constructs of the Orc language, i.e. sites and the four Orc combinators.

Sites: Sites are the most basic expressions in Orc and can represent any computing entity. Sites can be either external or internal to the orchestration. For example, a site can be a locally implemented function performing simple computations like addition or subtraction, or it could be a remote service performing complex query lookups in its databases. A Web service, in particular, can be modeled as a site.

A call to a site (a site call) may require multiple parameters, all of which have to be defined before the call to the site can be made. In other words, site calls are strict. A response to a site call may occur after any amount of time since the call was made. There are multiple ways in which a site responds to a given call: 1) The site responds once, with the return value corresponding to the call. This value is said to be published by the call. 2) The site halts, by explicitly indicating that it will not respond to the call. 3) The site simply does not respond to the call, forever blocking that call.

For example, the call Google(s) will call the Google search service with the query string s, and will return a set of search results for s. The Orc language also defines some fundamental sites that are useful in writing many programs. The call let(x, y..) simply publishes the tuple of values (x, y..). The call if(b) responds with a signal (a value with no information) if b is true, else it does not respond. The call Rtimer(t) returns a signal t time units after the call to it was made. Two special sites commonly used are signal and stop. The former returns a signal immediately when it is called, and the latter simply halts, without returning a value.

The Orc operators

We will now see how to build different Orc expressions using the four composition operators that Orc provides. Here f and g represent generic Orc expressions.

1. Parallel operator (f | g) : The parallel composition of two Orc expressions f and g is written as f | g. On execution, f | g executes both f and g in parallel. There is no direct interaction between f and g. The values published by f | g are an interleaving of the values published by f and g, in the order of their publication.


For example, the expression (Google(s) | Bing(s)) calls the Google and Bing search servers in parallel, with the same input query s. The result is the two sets of search results from the two services (if both the services respond). There may be only one result, or even none, if some of the services do not respond.

2. Sequential operator (f >x> g) : The sequential composition f >x> g first executes f. If a value v is published by f, a new instance of g is launched in parallel in which the value of x is set to v. The publication !v is consumed in the process. Since g is launched in parallel, f continues its execution as before. The values published by f >x> g are the values published by the different instances of g. Note that if f does not publish a value, then no instance of g is launched, and so f >x> g does not publish a value either.

For example, the expression (Google(s) | Bing(s)) >r> Print(r) will print two sets of results, if both Google and Bing respond.

3. Pruning operator (f <x< g) : On execution, f <x< g launches f and g in parallel. When g publishes its first value, the evaluation of g is terminated and the published value is assigned to x. Site calls in f that have the variable x as a parameter are blocked until g returns its first value. The values published by f <x< g are the values published by f. Note that it might not be necessary for g to publish a value for f <x< g to publish.

For example, the expression Print(r) <r< (Google(s) | Bing(s)) will print the first set of results obtained from either Google or Bing.

4. Otherwise operator (f ; g) : The otherwise combinator f ; g first executes f. If f publishes a value, then g is discarded and f continues executing. However, if f completes without publishing a value, then g is activated. The completion of an expression is recursively defined as follows: 1) A site call completes if it returns a value or if it halts. 2) f | g completes if both f and g complete. 3) f >x> g completes if f completes and all instantiations of g complete. 4) f <x< g completes if f completes and g has completed or published a value. If g completes without publishing, then all site calls in f that have x as a parameter also complete. 5) f ; g completes if f completes after publishing, or if f completes without publishing and then g completes.

The values published by f ; g are the values published by f if it publishes any; otherwise they are the values published by g.

For example, the expression (Google(s) | Bing(s)); AOL(s) will call both Google and Bing in parallel. If either of them responds, then the call to AOL is not made. However, if both of them halt, then the left expression completes and so the AOL search engine is called.
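The behaviour of the parallel and pruning operators in the search examples above can be imitated with Python's asyncio. This is only a rough analogy, not an implementation of Orc: the coroutine `site` is a hypothetical stand-in for a remote service that responds exactly once after a fixed delay, and the helper names `parallel` and `prune` are introduced here for illustration.

```python
import asyncio

async def site(name, delay):
    # Hypothetical site: responds once, `delay` seconds after being called
    await asyncio.sleep(delay)
    return f"results from {name}"

async def parallel(*calls):
    """Imitates f | g: collect every response in order of publication."""
    tasks = [asyncio.ensure_future(c) for c in calls]
    published = []
    for finished in asyncio.as_completed(tasks):
        published.append(await finished)
    return published

async def prune(*calls):
    """Imitates the right side of f <x< g: keep the first value, stop the rest."""
    tasks = [asyncio.ensure_future(c) for c in calls]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                                   # terminate the losers
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()

# Google answers faster than Bing in this made-up scenario
both = asyncio.run(parallel(site("Bing", 0.05), site("Google", 0.01)))
first = asyncio.run(prune(site("Bing", 0.2), site("Google", 0.01)))
```

In this scenario `both` lists the Google response before the Bing response, and `first` is the Google response alone, since the slower call is cancelled as soon as a value is available.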

Expression Definitions

Orc allows defining expressions for modularity. Expression definitions can moreover be recursive. An expression definition looks like

def E(x) = f

where x is a set of parameters and f is any expression which may use those parameters. When E(x) is called, the expression f is executed, with the parameters x replaced by the actual parameters at the time of the call. Note that an expression call may return multiple values, unlike site calls.

Calls to expressions are non-strict: some of the parameters in x may be undefined when E is called. However, site calls in E that use those parameters will block until they are defined.


As an example, consider the recursive expression

def Metronome(n) = Rtimer(1000) ≫ (let(n) | Metronome(n + 1))

The call Metronome(0) will then publish the values 0, 1, 2, ... at intervals of 1000 time units.

Programming idioms in Orc

Orc can be used to model different concurrent programming constructs. We will now show how some of these constructs are modeled in Orc. These constructs also appear in many workflow examples.

1. Fork-join: This is a very common concurrency construct, and can be found in many examples. A fork-join evaluates expressions f1, f2, . . . fn in parallel, and waits for all of them to respond before continuing. This construct in Orc is simply written as

(f1, f2, . . . fn)

which is a shorthand for

(( let(x1, x2, . . . xn) <x1< f1) <x2< f2) . . . <xn< fn

2. Parallel-or: The Parallel-or construct does the following: Call two expressions f and g in parallel. f and g each return a boolean value. Return the disjunction of the boolean values as soon as it is defined. This is encoded in Orc as follows:

val b1 = f
val b2 = g
or(b1, b2) | (if(b1) ≫ true) | (if(b2) ≫ true)

which translates to

((or(b1, b2) | (if(b1) ≫ true) | (if(b2) ≫ true)) <b2< g) <b1< f

Note that this expression might return multiple values, if both f and g evaluate to true. To get the first value in such cases, we can embed the whole expression in the right side of a prune operator.

3. Timeouts: A typical construct using timeouts is as follows: Call a site S. If S publishes a value within t time units, then call f. If not, call a backup expression g. In Orc, this would be written as

val b = (S ≫ true) | (Rtimer(t) ≫ false)
if b then f else g

which translates as

((if(b) ≫ f) | (not(b) >nb> if(nb) ≫ g)) <b< (S ≫ true) | (Rtimer(t) ≫ false)

4. Priority: During a computation, one may want to give priority to values obtained from certain expressions over others. For example, call sites M and N. For t time units, publish only the value obtained from M (if any). After t time units, a value from either of them is published as soon as it is available. The computation is terminated when the expression publishes its first value. We write this in Orc as:

val m = M
val n = N
let(m | Rtimer(t) ≫ n)
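For readers more familiar with mainstream languages, the fork-join and timeout idioms above have close counterparts in Python's asyncio. The sketch below is an analogy under simplifying assumptions, not a translation of Orc: `site`, `fork_join`, `with_timeout`, `primary` and `backup` are hypothetical names, each site responds exactly once, and time is measured in seconds.

```python
import asyncio

async def site(value, delay):
    # Hypothetical site that publishes `value` after `delay` seconds
    await asyncio.sleep(delay)
    return value

async def fork_join(*calls):
    """Counterpart of (f1, ..., fn): wait for all responses, publish a tuple."""
    return tuple(await asyncio.gather(*calls))

async def with_timeout(call, t, f, g):
    """Run f if `call` responds within t seconds, otherwise run the backup g."""
    try:
        await asyncio.wait_for(call, timeout=t)
    except asyncio.TimeoutError:
        return await g()
    return await f()

async def primary():
    return "f"

async def backup():
    return "g"

joined = asyncio.run(fork_join(site(1, 0.02), site(2, 0.01)))
in_time = asyncio.run(with_timeout(site("s", 0.01), 0.5, primary, backup))
too_late = asyncio.run(with_timeout(site("s", 0.5), 0.01, primary, backup))
```

The tuple returned by `fork_join` follows the order of the arguments rather than the order of arrival, which matches the positional tuple (x1, . . . xn) built by let in the Orc encoding.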


Semantics of Orc

The semantics of Orc is given in the form of SOS rules which transform the Orc program at each step. The rules are given in figure 2.7.

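\[
\begin{array}{rcl}
f, g \in \mathit{Expression} & ::= & M(p) \;\big|\; E(p) \;\big|\; f \mid g \;\big|\; f >x> g \;\big|\; f <x< g \;\big|\; f \,;\, g \;\big|\; {?k} \\
a \in \mathit{Actions} & ::= & M_k(v) \;\big|\; {?k} \;\big|\; {!v} \;\big|\; \tau \\
p \in \mathit{Actual} & ::= & x \;\big|\; v \\
\mathit{Definition} & ::= & E(x) \triangleq f
\end{array}
\]

\[
\begin{array}{ll}
\dfrac{k \text{ fresh}}{M(v) \xrightarrow{M_k(v)} {?k}} \;\; (\textsc{SiteCall})
&
\dfrac{f \xrightarrow{a} f' \quad a \neq {!v}}{f >x> g \xrightarrow{a} f' >x> g} \;\; (\textsc{Seq1N})
\\[3ex]
{?k} \xrightarrow{!v} \mathit{stop} \;\; (\textsc{SiteRet})
&
\dfrac{f \xrightarrow{!v} f'}{f >x> g \xrightarrow{\tau} (f' >x> g) \mid [v/x].g} \;\; (\textsc{Seq1V})
\\[3ex]
{?k} \xrightarrow{\tau} \mathit{stop} \;\; (\textsc{SiteHalt})
&
\dfrac{f \xrightarrow{a} f'}{f <x< g \xrightarrow{a} f' <x< g} \;\; (\textsc{Asym1N})
\\[3ex]
\dfrac{f \xrightarrow{a} f'}{f \mid g \xrightarrow{a} f' \mid g} \;\; (\textsc{Sym1})
&
\dfrac{g \xrightarrow{!v} g'}{f <x< g \xrightarrow{\tau} [v/x].f} \;\; (\textsc{Asym1V})
\\[3ex]
\dfrac{g \xrightarrow{a} g'}{f \mid g \xrightarrow{a} f \mid g'} \;\; (\textsc{Sym2})
&
\dfrac{g \xrightarrow{a} g' \quad a \neq {!v}}{f <x< g \xrightarrow{a} f <x< g'} \;\; (\textsc{Asym2})
\\[3ex]
\dfrac{f \xrightarrow{a} f' \quad a \neq {!v}}{f \,;\, g \xrightarrow{a} f' \,;\, g} \;\; (\textsc{Or1N})
&
\dfrac{f \xrightarrow{!v} f'}{f \,;\, g \xrightarrow{!v} f'} \;\; (\textsc{Or1V})
\\[3ex]
\mathit{stop} \,;\, g \xrightarrow{\tau} g \;\; (\textsc{Or2})
&
\dfrac{[\![ E(x) \triangleq f ]\!] \in D}{E(p) \xrightarrow{\tau} [p/x].f} \;\; (\textsc{Def})
\end{array}
\]

Figure 2.7 – The syntax (top) and operational semantics (bottom) of Orc.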


Other than these rules, there are five rules that specify how the halting of a sub-expression might cause the overall expression to halt. These are given in figure 2.8.

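\[
\begin{array}{lll}
1.\;\; \mathit{stop} \mid g \xrightarrow{\tau} g
& 2.\;\; f \mid \mathit{stop} \xrightarrow{\tau} f
& 3.\;\; \mathit{stop} >x> g \xrightarrow{\tau} \mathit{stop}
\\[1ex]
4.\;\; f <x< \mathit{stop} \xrightarrow{\tau} [\bot/x]f
& 5.\;\; M(\bot) \xrightarrow{\tau} \mathit{stop}
&
\end{array}
\]

Figure 2.8 – Rules for halt propagation in Orc expressions.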

This semantics is asynchronous, and there is no order imposed on the occurrence of events. This may not reflect the true behaviour of time-based sites like Rtimer, since a call to it cannot be forced to occur at any given instant. There is a timed semantics for Orc which accounts for the evolution of time, which we do not present here. It was published in [WKCM08].

A note on the use of Orc in this thesis: In the chapters of this thesis we write f where x :∈ g to mean f <x< g. This was how the pruning expression was written in many previous Orc papers like [WKCM08, MC07, KCM06]. The otherwise operator was a recent addition to Orc, and it does not feature in our examples and in our semantic studies of Orc in Chapter 3 and Chapter 4.

2.3.4 BPEL

The Business Process Execution Language (BPEL) [Bpe07] is a modelling and execution language for business processes. Though it is not a formal mathematical model, we present it here because it is the most popular language used to describe Web service orchestrations. There have been attempts to give a formal semantics to BPEL, for example by translating it to Petri nets [HSS05, OVvdA+07, LMSW06], process algebras [Fer04] and finite state machines [AFFK04, FBS04].

A BPEL process can be an abstract process or an executable process. An abstract process is meant to be a high-level model of the interactions in the process. It lacks the precise details needed to call the services and process their responses. An executable BPEL process is a complete, detailed specification of the process that can be executed by a BPEL engine.

The elements of a BPEL process are called activities, which can be either primitive or structured. Some of the main primitive activities are: 1) Invoke: to call a service. 2) Receive: to receive a message from an external source. 3) Reply: to respond to a message received. 4) Assign: to assign a value to a variable. 5) Throw: to throw a fault message. 6) Wait: to delay the execution for a certain time or till the occurrence of a certain event. 7) Exit: to terminate a process instance instantly.

The primitive activities can then be grouped into more complex forms using structured activities, which are: 1) Sequence: perform one or more activities sequentially. 2) If: a mutually-exclusive conditional execution. 3) While: loop over an activity. 4) Flow: perform a set of activities in parallel. There is an implicit synchronization at the end of the flow: the flow activity terminates when all the parallel activities terminate. 5) Pick: wait for the occurrence of exactly one event from a set of events, and then perform the activity corresponding to that event. This resembles the pruning operator <x< in Orc. 6) ForEach: execute the activity a certain number of times. The ForEach block may be initialized to execute the activities in sequence or in parallel.

Discussion: BPEL is a very popular language to specify orchestrations, especially in business enterprises. The set of constructs that BPEL provides is quite rich and so rather complex workflow patterns can be specified in BPEL [vdAtHKB03]. The fact that one can specify abstract processes, and then build detailed refinements of them which can be executed, is useful to the orchestration designer. There are many tools available for specifying and executing orchestrations in BPEL, like Active BPEL, IBM’s Websphere, Microsoft’s BizTalk, Oracle’s BPEL Process Manager, Apache’s ODE and Sun’s Open ESB.

However, since BPEL is a specification language which aims to execute orchestrations over the Web, it does not have a strong formal basis. Its specification is informal and parts of it are said to be ambiguous. The rich set of constructs makes it a complex language. There is redundancy amongst its constructs, and different constructs can be used to model the same process. The use of XML makes simple processes quite verbose and hides the structure of the process. Commercial tools typically use a graphical language to specify the BPEL program, hiding the XML details from the user. This representation varies across the tools however, since there is no standardized graphical representation of a BPEL process.

2.3.5 Other Models

There are many other formalisms for specifying and/or executing orchestrations. YAWL [vdAtH05] (Yet Another Workflow Language) is a language that was designed to implement the Workflow Patterns in [vdAtHKB03]. It extends Petri nets with notions such as cancellation, multiple instances and or-join. There are also other standardized languages for specifying service compositions, most notably WSCI [AAF+02] (Web Services Choreography Interface), XPDL [XPD] (XML Process Definition Language) and BPML [BPM] (Business Process Modeling Language). WSCI is an interface description language, which identifies a set of peer services and describes the ways in which they interact. WSCI only describes the interactions, and no concrete implementation of each of the processes is given. XPDL is a language proposed by the Workflow Management Coalition (WfMC), which was aimed to be a generic workflow language designed for interoperability between different languages. BPML is a BPEL-like XML-based language for describing service orchestrations. There is much less support from the industry for BPML, however. BPEL seems to be the most popular commercial language for specifying orchestrations, and many enterprises have commercial BPEL implementations.

2.3.6 Our Contribution

In this thesis we searched for a suitable formalism for specifying orchestrations and for analysing their QoS. We chose Orc because it is an elegant and simple mathematical model with a few primitive constructs, which can express a variety of orchestration patterns. Since a partially ordered view of the execution was useful for our QoS analysis, and since the trace semantics of Orc represents the execution events as a total order, we first gave a (partial order) semantics for Orc in terms of colored and dynamic Petri nets [RBHJ06a]. We developed a simulator for these nets, but we realised that the dynamic calls with the coding of colored tokens did not make a very efficient implementation, especially in cases of recursion.

As a result, we chose to directly encode Orc as a partially ordered set of events, by giving a denotational semantics for Orc in terms of Labelled Asymmetric Event Structures (LAES) in [RKB+07b]. Occurrence nets serve as a model of the execution of the orchestration. This semantics served as a specification for the implementation of the partially ordered execution in our TorQue QoS analysis tool.

We later extended these occurrence nets with colors to represent QoS parameters. We call these nets Orchnets. They serve as a formal basis to study issues like monotonicity in orchestrations (see section 2.4.5), and to study the evolution of QoS parameters in an execution. More details on this are given in Chapter 7 and Chapter 8.

2.4 QoS issues in Web service orchestrations

In this section we introduce and present QoS issues in the management of Web services and their orchestrations. We start by looking at definitions and models for QoS of Web services in section 2.4.1. Section 2.4.2 deals with automated SLA negotiation. Section 2.4.3 defines the QoS composition problem and surveys the different composition techniques in the literature. We briefly introduce the issue of (non) monotonicity of orchestrations in section 2.4.5. Finally, section 2.4.6 looks at QoS monitoring techniques. As and when relevant, we will briefly mention the contributions of this thesis, placing them in the context of the related work.

2.4.1 QoS of Web services

Defining the QoS parameters of a Web service: Quality of Service is a term which may mean different things to people in different communities. In the context of networks, QoS might deal with issues like delays and bandwidths of the network links, or the number of packets lost in transmissions. Supporting QoS in the network usually refers to having packet routers that implement prioritization protocols like DiffServ and IntServ. These techniques attempt to provide a “better than best-effort” performance to certain critical flows, by providing a higher priority in the routing queues for packets belonging to such flows.

When talking about QoS in Web services, we often reason at a level that is higher than network-level QoS, commonly referred to as application-level QoS. There is a wide spectrum of non-functional properties (QoS parameters) that are relevant at this level and some of them can be application-specific. The W3C has aimed in [W3c03] to identify QoS parameters relevant to Web services. We give a list of some of the parameters mentioned there, and some others which commonly appear in QoS studies of Web services.

• The latency (also known as the delay or response time) parameter is used to denote the time taken by a service to respond to a client’s request. This might include the delay introduced by the network when calling a service over the web.

• The throughput of a service is the number of requests that the service is able to process in a given interval of time.

• The quality of response or quality of data is a qualitative measure of how ’good’ the response of a service is. The exact definition of this parameter would depend upon the concrete application. For example, for an aggregation service that returns price quotes from different airlines for a travel route given by the client, the quality of the response could be the number of different quotes returned to the client. The quality of response in this case could also depend on the best price quote that is offered to the client.


• The availability parameter is a measure of the time in which the service is active and responds to requests from clients. It is usually estimated as the ratio of the running time of a service to the total time of the window in which it is sampled.

• The reliability of a service represents the capability of the service to perform its required function correctly. It is at times referred to as its successful execution rate.

• The cost or price parameter appears frequently in commercial web services. Usually, for each invocation of the service, the client pays a certain price.

• The security of a Web service involves different aspects to ensure that the message exchanges between the client and the service are secure.

A wide array of other QoS parameters can be found in the literature, many of which are variants or combinations of the parameters mentioned above. For example, the reputation of a service is sometimes used as a QoS parameter. A service’s reputation is an aggregated value of its clients’ ratings. A client’s rating of a service reflects the service’s overall QoS, and is therefore clearly influenced by multiple QoS parameters.

How to specify the QoS characteristics of a Web service? A prerequisite for any form of QoS support in Web services is the clear and unambiguous specification of the QoS behaviour of the service. There may be different ways to do this.

1. By extending service description languages: Since service description technologies like UDDI and WSDL cover only the functional aspects of the service, many proposals have been made to enhance these specifications to allow the description of the service’s QoS. Examples are the performance-enabled WSDL (P-WSDL) in [DB07] and the UDDI extension (UX) in [ZCL04]. In these formalisms, the QoS of a service is usually modeled as a tuple of QoS parameters, with a value (or an interval of values) specified for each parameter.

2. By using QoS Contracts: Also referred to as Service Level Agreements (SLAs), contracts are agreements made between the provider and the client of a service on the QoS behaviour of that service. Contracts may specify obligations of both the provider and the client of the service. For example, a contract may have a clause which says “provided that the client makes less than five requests per second, the provider assures that these requests are answered within 100 milliseconds”. The first part of this clause is an obligation that the client must respect and the latter is an obligation of the provider. A contract can have multiple clauses like this, all of which together describe the QoS behaviour of the service.

Contracts are typically negotiated offline, before the service is actually invoked by the client. Any QoS management method involving contracts is accompanied by monitoring techniques at runtime, to ensure that the obligations in the contract are met. WSLA [KL03] and WS-Agreement [ACD+] are two popular frameworks for specifying contracts for Web services. We will survey the WSLA framework in section 2.4.6 when we consider QoS monitoring.

Our Contribution: Probabilistic Contracts

Most contracts tend to have clauses which are hard, i.e. the values of the QoS parameters are fixed, or the maximal and/or minimal QoS values are specified. Consider for example the clauses “the response time is always less than 5 msec”, or “service availability is 95%”. We argue in Chapter 6 that hard contracts do not accurately model the QoS behaviour of Web services, which tends to be variable in nature. We instead propose using probabilistic contracts, where the QoS of a service is modeled by a probability distribution over the values of the QoS parameters. We show that probabilistic contracts can help the provider to avoid overly pessimistic clauses in its contracts, and could also allow it to overbook its resources. To our knowledge, there are very few works in the literature which study probabilistic QoS contracts in Web services. An exception is [HWTS07], where the authors model the QoS parameters of Web services as independent, discrete random variables. The parameters they consider are response time, reliability, fidelity and cost.

In our contract-based approach, the orchestrator establishes probabilistic contracts with the services it calls and with its own clients. The probabilistic contracts can be obtained in different ways. The called services may specify their QoS behaviour as a probability distribution of their QoS parameters. The distribution can also be approximately specified by a set of quantiles. In some cases measurements can also be used to derive the probabilistic contract. For example, if the called service is freely available (like many Web services from Google), no contract is established with it, and its probabilistic contract has to be estimated from measurements. Measurements can also be useful to derive the probabilistic contracts of the underlying network. In most cases, orchestrations do not establish contracts with the different network domains that their messages traverse, and so measurements of this kind can be useful to estimate the impact of the network’s QoS on the orchestration.
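As an illustration of how a set of quantiles could be derived from measurements, here is a minimal sketch. The function name `quantile_contract`, the chosen probability levels and the sample values are all hypothetical; they are not taken from the thesis.

```python
import statistics

def quantile_contract(latencies_ms, levels=(0.5, 0.9, 0.95, 0.99)):
    """Approximate a latency distribution by a few empirical quantiles."""
    # With n=100, statistics.quantiles returns the 99 percentile cut points;
    # the "inclusive" method keeps every estimate within the observed range.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {level: cuts[round(level * 100) - 1] for level in levels}

# Hypothetical latency measurements (milliseconds) of a freely available service.
samples = [120, 95, 110, 300, 105, 98, 130, 101, 97, 250, 115, 99]
contract = quantile_contract(samples)
for level, value in sorted(contract.items()):
    print(f"P(latency <= {value:.1f} ms) is approximately {level:.2f}")
```

Each entry reads as "with probability `level`, the latency is at most `value`", which is one possible reading of a contract specified by quantiles. With so few samples the upper quantiles are of course very rough estimates.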

2.4.2 SLA Negotiation

An important aspect of QoS management is the process of SLA (or contract) negotiation.

What is SLA negotiation? An SLA specifies the obligations and rights of the different parties involved in an agreement concerning the service. Typically these obligations and rights are established at the end of a negotiation process, in which the different parties make offers or demands with respect to the service. The offers and demands of each party are usually flexible, and the parties can make compromises to arrive at an agreement.

Why negotiate? There may be different reasons to negotiate SLAs. Clearly a provider of a service and its client, who have flexible offers and demands with respect to the service’s QoS, negotiate to arrive at an agreement on their respective QoS obligations. SLA negotiation can also be done when the entities have limited resources which have to be divided amongst different tasks. For example, different service providers can negotiate to reserve a part of their bandwidth resource for a certain streaming-based application.

Automating SLA Negotiation. SLAs are usually established after (possibly lengthy) negotiation procedures between the different contracting parties. Automated SLA negotiation techniques try to simplify the process of SLA negotiation by automating the entire process, or parts of it. Automated SLA negotiation has been studied in the context of networks [Pou07], where algorithms have been proposed, for example, to automate the reservation of resources in networks for streaming-based applications.

There have been attempts to model the SLA negotiation procedure as interacting processes, or agents. The goal is to build processes that negotiate by communicating with each other, and arrive at an agreement in the end. Many of these approaches model the negotiation problem as a constraint satisfaction problem. The contracts, or the QoS demands and guarantees of each of the processes, are expressed in the form of constraints. The variables of the constraints are usually the QoS parameters under negotiation. The negotiation succeeds if the constraint problem admits a solution, i.e. there is at least one assignment to the variables that satisfies all the constraints.
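The constraint-satisfaction view of negotiation can be sketched as follows. This is only an illustration under stated assumptions: the variable names, the finite domains and the constraints of both parties are hypothetical, and the exhaustive search stands in for whatever solver an actual approach would use.

```python
from itertools import product

# Hypothetical finite domains for the QoS parameters under negotiation.
domains = {
    "latency_ms": [50, 100, 200, 500],
    "price": [1, 2, 5, 10],
}

# Each party contributes constraints over the shared variables (all assumed).
provider_constraints = [
    lambda a: a["latency_ms"] >= 100,                # cannot promise under 100 ms
    lambda a: a["price"] * a["latency_ms"] >= 400,   # faster answers cost more
]
client_constraints = [
    lambda a: a["latency_ms"] <= 200,
    lambda a: a["price"] <= 5,
]

def negotiate(domains, constraints):
    """Return every assignment that satisfies all the constraints."""
    names = list(domains)
    solutions = []
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(check(assignment) for check in constraints):
            solutions.append(assignment)
    return solutions

agreements = negotiate(domains, provider_constraints + client_constraints)
print("negotiation succeeds" if agreements else "negotiation fails", agreements)
```

The negotiation succeeds exactly when `agreements` is non-empty; tightening one party's constraints (for instance, a client accepting only `price <= 1`) empties the solution set.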

For example, in [BS09] the authors consider processes that interact with each other through a central store, to which they add or remove constraints. The constraints they consider are soft, in the sense that a given assignment to the constraint variables has a certain preference value (the usual constraints are a special case where there are only two preference values: accept or reject). The domain of these preference values is taken to be an absorptive semi-ring, where the preference values are partially ordered. The partial order on the preference values defines a corresponding partial order on the soft constraints themselves. Constraints of the processes that are added to the store are composed into one global constraint using the multiplication operator of the semi-ring, which can be seen as a conjunction operator. For each transition that it makes, a process can specify a range of preference values for the solution to the global constraint. If the global constraint does not admit solutions in this range, that process is blocked. Processes can also retract constraints from the store, as a result of which the global constraint gets relaxed. A relaxation of the global constraint can enable a previously blocked process. The negotiation terminates when all the processes reach a final ’success’ state.
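The store mechanism described above can be illustrated with a small sketch. It is not the calculus of [BS09]: as an assumption, it instantiates the semi-ring with the fuzzy semi-ring ([0, 1], max, min), where min plays the role of the multiplication, and it reads "admits solutions in this range" as "the best preference level over all assignments lies in the range". The class `Store` and its method names are hypothetical.

```python
from itertools import product

class Store:
    """Central store of soft constraints over shared variables.

    Assumption: preference values live in the fuzzy semi-ring
    ([0, 1], max, min), so constraints are combined with min.
    """

    def __init__(self, domains):
        self.domains = domains
        self.constraints = {}

    def tell(self, name, constraint):
        """Add a soft constraint: a function from assignment to [0, 1]."""
        self.constraints[name] = constraint

    def retract(self, name):
        """Remove a constraint, which relaxes the global constraint."""
        self.constraints.pop(name, None)

    def best_preference(self):
        """Best preference level of any assignment under the global constraint."""
        names = list(self.domains)
        best = 0.0
        for values in product(*(self.domains[n] for n in names)):
            a = dict(zip(names, values))
            level = min((c(a) for c in self.constraints.values()), default=1.0)
            best = max(best, level)
        return best

    def enabled(self, low, high):
        """A process may move only if the best solution lies in [low, high]."""
        return low <= self.best_preference() <= high

store = Store({"latency_ms": [100, 200, 400]})
store.tell("client", lambda a: {100: 1.0, 200: 0.6, 400: 0.1}[a["latency_ms"]])
store.tell("provider", lambda a: {100: 0.2, 200: 0.7, 400: 1.0}[a["latency_ms"]])
blocked_before = not store.enabled(0.8, 1.0)   # no assignment is good enough
store.retract("provider")                      # relaxing the store ...
enabled_after = store.enabled(0.8, 1.0)        # ... enables the process
```

With both constraints in the store, the best combined preference is reached at 200 ms, which is below the range demanded by the process; retracting the provider's constraint relaxes the global constraint and unblocks it.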

These automation techniques try to model generic negotiations, and are not particularly aimed at negotiations in service orchestrations. For example, there is no mention of any underlying orchestration. We will look at SLA negotiation in orchestrations, in the context of QoS composition, in the following section.

2.4.3 QoS composition

QoS composition is the process of relating the QoS of the services called in the orchestration to the overall QoS of the orchestration. In a contract-based approach, this process is also referred to as contract composition. For this, the orchestration first negotiates contracts (SLAs) with the services it intends to call during the orchestration. For each called service a contract has to be negotiated. The orchestration may then compose these contracts, to get an estimate of its own QoS. This estimate will help the orchestration negotiate contracts with the clients of its own service.

There are both analytical and simulation-based techniques for QoS composition. Before looking at these techniques, we make a few observations about the nature of orchestrations over the web, which motivate and influence our approach to QoS composition.

1. The Open World paradigm: The QoS of the orchestration is mainly influenced by three entities: i) the orchestration server, ii) the web services called, iii) the underlying transport network. The orchestration can have details and models about its local resources. It cannot, however, expect to have such detailed resource models for each of the web services it calls. The same is true of the underlying network. Web services can be hosted anywhere across the globe, and queries to them might traverse the networks of numerous providers with different (and unknown) resources. Moreover, details of the entities’ resources are often confidential and are not disclosed. It is also not possible for the orchestration to know the nature of the external traffic at the called services, and of the underlying network.

2. Data, Time and the execution flow of orchestrations:

Unlike the flow in networks, the data of the query and time can influence the flow of the execution. We look at this through two simplified versions of the CarOnLine example of figure 2.1. The first version in figure 2.9 is simpler than the second version in figure 2.10.

In the orchestration in figure 2.9 the calls to the garages are unguarded, i.e., without an associated timer. There are two fixed insurance services that are called for every type of input car. This orchestration has a fixed execution flow. There are no choices made in the orchestration and so each call to the orchestration invokes the same set of services.

The orchestration in figure 2.10 is slightly more complex since it has a data-dependent choice “car = deluxe”. The flow of the execution here is not fixed and depends on the input data. If the input car is a deluxe car, GoldInsure is called; if not, InsureAll and InsurePlus are called.

Figure 2.9 – CarOnLine orchestration, without timeouts and data-dependent choices in the execution flow.

In the CarOnLine orchestration of figure 2.1, in addition to the data-dependent choice, there are timeouts that guard the calls to the garages. Since the occurrence of a timeout causes the return value of the guarded call to be ignored, the best offer value depends on the value of the timers that guard the garage calls.

Analytical approaches to QoS composition:

Queueing networks: Queueing networks [Kle75], which have successfully been used in modelling and predicting behaviours in telephony networks, have also been used in the performance evaluation of software. Typically a software architecture specification of the system is compiled and transformed into a queueing model which is then analysed. The use of queueing models is attractive since the theory gives analytical solutions to interesting performance measures like the end-to-end response time, the average throughput of the system and measures of the system’s utilisation. There are also a number of tools which support the performance analysis of queueing network based models. There are limitations to using queueing network models however, and some of the assumptions of the theory, for example infinite queue length, do not hold in real-life scenarios.
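To give a flavour of the analytical solutions mentioned above, the following sketch evaluates the standard closed-form steady-state measures of a single M/M/1 queue (Poisson arrivals, exponential service times, one server, infinite queue length). The choice of this particular queue and the numerical rates are assumptions made for illustration only; a queueing network model of a software system would combine many such stations.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Closed-form steady-state measures of an M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    utilisation = arrival_rate / service_rate
    return {
        "utilisation": utilisation,
        # In steady state every accepted request eventually leaves.
        "throughput": arrival_rate,
        # Mean time spent in the system (waiting plus service).
        "mean_response_time": 1.0 / (service_rate - arrival_rate),
        "mean_requests_in_system": utilisation / (1.0 - utilisation),
    }

# Hypothetical rates, in requests per second.
m = mm1_metrics(arrival_rate=8.0, service_rate=10.0)
print(m)
```

The mean number of requests in the system equals the arrival rate multiplied by the mean response time (Little's law), which gives a quick sanity check on the formulas.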

Stochastic Petri nets and Stochastic Process Algebras: Stochastic Process Algebras [HHK02] and Stochastic Petri Nets (SPN) [MBC+98] are other interesting formalisms that have been used in the performance evaluation of software systems. They serve as a combined functional and non-functional model, where delays (usually exponentially distributed) are associated with the actions of the system. From these models, a Markov chain which models the (timed) dynamic behaviour of the system can be derived. The numerical solution of the Markov chain gives a steady-state distribution, i.e. the probability of the system being in any given state. The steady-state distribution can be used to compute different performance measures like the throughput of a transition or the utilisation of system resources.


Figure 2.10 – CarOnLine orchestration, without timeouts (but with data-dependent choices) in the execution flow.

Analytic techniques for composing QoS are interesting in that they are fast and the solutions are precise. We however argue that these techniques are not suitable for QoS analysis of generic orchestrations, since the QoS models that they consider are too simplistic, and are usually unrealistic. For example, when using SPN-based analysis, only the simplistic orchestration in figure 2.9 can be analysed, and even then only when exponential delays are assumed for all the services involved. Whenever time- or data-dependent choices arise in the orchestration, as in the examples of figure 2.10 and figure 2.1, or when the services have a more complex QoS model, these techniques will not work. We choose to do simulation-based analysis instead, which allows the use of realistic QoS models. Simulations do not give exact solutions for their input models; instead, their solutions are approximations whose precision can be tuned.

Simulation approaches:

Simulation-based techniques are free from many of the constraints of the analytical techniques. For example, they can use generic distributions for delay behaviours and can have data-dependent choices in their models. There has not been a lot of work in the literature which uses simulations in the context of Web service orchestrations.

In [CMS+03] the authors use WSFL (Web Service Flow Language), a process composition language proposed by IBM which was a precursor to BPEL, to model web service compositions. The WSFL specification is translated into a simulation model in Java (JSim) which is simulated for performance analysis. The inputs to the JSim model are the service-time distributions for each of the services of the orchestration. For the mutually exclusive choices, the probability of taking a certain branch is specified. The composition is then simulated and statistics like the minimum, maximum and mean of the response times of each of the services are computed. The simulator can also display the number of activities being queued up at any given server of the process. The authors do not seem to compute end-to-end QoS estimates for the orchestration however. The study is focused on the latency parameter, and other possible QoS parameters for Web services, like the ones specified in section 2.4, are not considered.

Web Services Performance Analysis Center (sPAC) [SL05] is a tool that models and simulates Web service orchestrations. The orchestration is modeled as a UML activity diagram which is translated into a simulation model. The authors also make actual invocations to services over the Web to derive performance metrics which are then fed to the simulation models. As in [CMS+03], the authors seem to focus only on the latency parameter. Precise information on the orchestration, simulation and performance models used is not given in the paper.

Mobius [Mob] is a performance evaluation tool developed by the Performance Engineering Research Group at the University of Illinois at Urbana-Champaign. In Mobius, users can model their system under different formalisms like Stochastic Petri Nets, Stochastic Process Algebras, Queueing networks or Markov chains. The actions or steps taken by the model can have a delay behaviour associated with them, which is distributed according to one of the standard probability distributions such as uniform, exponential, beta, geometric, etc. Complex models can be built from smaller models in a bottom-up fashion by combining models in simple pre-defined patterns. For example, two Petri Nets with common places can be composed together into a single net. Mobius supports both analytical techniques and simulation-based solutions for its models. Though not targeted specifically at QoS analysis of orchestrations, the Mobius simulator can be used to do performance evaluation of Web service orchestrations, for example when they are modeled as Stochastic Petri Nets. Unlike sPAC, it cannot invoke Web services and get their performance measures. Mobius is also aimed at latency-specific performance evaluation, and is unable to model and compose multi-dimensional QoS parameters like those mentioned in section 2.4.1.

Our Contribution: In this thesis we give a method for composing randomly distributed QoS parameters, based on simulations. For this, we give algebraic rules that capture how the QoS parameters evolve in an execution. The QoS parameters we consider can be multi-dimensional and partially ordered. Starting from contracts negotiated with the called services, we show how to estimate the contract that the orchestration can make with its clients. Our contract composition technique is coupled with a re-negotiation process and is iterative: in case the demand and guarantee clauses of a contract are not acceptable to any of the parties, we accordingly relax or strengthen the clauses, and re-run the composition procedure.
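The general idea of simulation-based composition can be sketched for the latency parameter alone. This is not the thesis's algorithm: the orchestration structure (garages in parallel, followed by a data-dependent insurance choice), all the delay distributions, the probability of a deluxe car and the function names are hypothetical, and the only composition rules used are "delays add up in sequence" and "a join waits for the slowest branch".

```python
import random
import statistics

def simulate_once(rng, deluxe_probability=0.3):
    """One run of a hypothetical CarOnLine-like orchestration; returns its latency."""
    # Two garages are called in parallel; the join waits for both answers.
    garage_a = rng.lognormvariate(4.0, 0.5)
    garage_b = rng.gammavariate(2.0, 40.0)
    garages = max(garage_a, garage_b)
    # Data-dependent choice: one insurer for deluxe cars, two in parallel otherwise.
    if rng.random() < deluxe_probability:
        insurance = rng.uniform(50.0, 150.0)
    else:
        insurance = max(rng.expovariate(1 / 60.0), rng.expovariate(1 / 80.0))
    # Sequential composition adds the delays.
    return garages + insurance

def estimate_contract(runs=5000, seed=1, levels=(0.5, 0.9, 0.99)):
    """Estimate latency quantiles of the orchestration by Monte Carlo simulation."""
    rng = random.Random(seed)
    latencies = [simulate_once(rng) for _ in range(runs)]
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {level: cuts[round(level * 100) - 1] for level in levels}

contract = estimate_contract()
for level, value in sorted(contract.items()):
    print(f"latency <= {value:.0f} with estimated probability {level:.2f}")
```

The estimated quantiles could serve as a first proposal for the contract offered to the orchestration's clients; increasing `runs` trades computation time for precision, which is the sense in which such approximations are tunable.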

2.4.4 QoS-based Orchestration synthesis

In the past few years a lot of literature on QoS in Web service orchestrations has dealt with the following problem of synthesising orchestrations: Assume that the orchestration’s structure is known, and that there is a (functional) model of the orchestration. The actual services called in the orchestration are however not specified in this model. Each service call can be done by one of possibly many candidate services of that call, and all the candidate services have the same functionality. The candidate services have different QoS behaviours however, and the problem is to instantiate the orchestration with the services such that the QoS of the overall orchestration is ’optimum’.

The above problem is commonly referred to as the Web Service Composition (WSC) problem in the literature [AP07], but it is different in nature from the composition problem that we consider. The WSC problem is an orchestration synthesis problem, where the component services of the orchestration are chosen such that the orchestration’s QoS is optimised. In our QoS composition problem, we start with the assumption that the candidate services are already chosen. Given these services and their QoS behaviour, the QoS composition process tries to derive the orchestration’s QoS behaviour.

It may seem that the QoS composition problem we consider is just a simplistic version of the above orchestration synthesis problem, and so that it is already solved by the many papers in this area. This is not true. While the orchestration synthesis techniques do compose the QoS of the services to determine the orchestration’s QoS, they differ from ours in the QoS models they consider. All these studies consider the QoS of a service to be a tuple of fixed values, one for each QoS parameter. Given these QoS models, simple aggregation rules are sufficient to derive the orchestration’s QoS (which is again a tuple of QoS values). For example, the rules in [AP07] for deriving the orchestration’s QoS are given in Table 2.1.

QoS Parameter        Aggregation Function
Price (p)            $\sum_{s \in Ex} p_s$
Latency (d)          $\sum_{s \in Crit(Ex)} d_s$
Availability (a)     $\prod_{s \in Ex} a_s$
Data Quality (q)     $\min_{s \in Ex} (q_s)$

Table 2.1 – Aggregation rules in [AP07]. $Ex$, an execution of the orchestration, is the set of services $s$ that are called in that execution. (An orchestration has many possible executions due to the presence of choices in it.) For the price parameter $p$, $p_s$ denotes the price value of service $s$ (and similarly for the other QoS parameters). $Crit(Ex)$ is a path of the execution $Ex$ that is “critical” for (or has the maximal) latency.
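The aggregation rules of Table 2.1 are simple enough to be written down directly. In the sketch below, the function `aggregate`, the dictionary layout and all the numerical QoS values are hypothetical; the critical path is assumed to be given rather than computed.

```python
from math import prod

def aggregate(execution, critical_path):
    """Apply the rules of Table 2.1 to one execution Ex.

    execution:     dict mapping a service name to its fixed QoS values
                   (keys: price, latency, availability, quality).
    critical_path: names of the services on Crit(Ex), assumed to be known.
    """
    services = execution.values()
    return {
        "price": sum(s["price"] for s in services),
        "latency": sum(execution[name]["latency"] for name in critical_path),
        "availability": prod(s["availability"] for s in services),
        "quality": min(s["quality"] for s in services),
    }

# Hypothetical execution: two garages called in parallel, then one insurer.
execution = {
    "GarageA": {"price": 2, "latency": 120, "availability": 0.99, "quality": 0.80},
    "GarageB": {"price": 3, "latency": 200, "availability": 0.95, "quality": 0.90},
    "InsureAll": {"price": 1, "latency": 80, "availability": 0.90, "quality": 0.85},
}
qos = aggregate(execution, critical_path=["GarageB", "InsureAll"])
print(qos)
```

GarageA does not contribute to the latency because it runs in parallel with the slower GarageB, but it does contribute to the price, availability and data quality.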

The QoS models we consider are probabilistic and are much richer than a simple tuple of values. The use of such realistic models however has a price: we are not able to use simple rules like those in Table 2.1 to directly derive the orchestration’s QoS.

There have been many proposals to solve the WSC problem, mostly by modeling it as an optimisation problem [ZBN+04, AP07, CAH05] or as a goal-oriented planning problem [LAP06]. We will now survey some of these approaches.

Approaches using Optimisation: In [ZBN+04], the authors model the orchestration as a Statechart, and use integer programming to find the optimum QoS for each possible execution path of the Statechart (multiple execution paths arise from the presence of choices in the Statechart). The QoS of each candidate service is assumed to be a tuple of values corresponding to the parameters (cost, response time, reputation, successful execution rate, availability). The overall QoS of the orchestration for a given execution path is composed from the QoS of the services in the path using simple operations like addition, multiplication and maximum. Thus the orchestration’s QoS is expressed as linear constraints over the QoS of the services in the path. The orchestration’s multi-dimensional QoS is totally ordered using a user-specified utility function, which assigns weights to each QoS parameter. The user can also impose constraints on the QoS of the orchestration, for example that the overall response time should be at most T seconds or that the cost should be at most P units. The solution to the optimisation problem identifies the component services such that all the constraints are satisfied, and the utility value is maximal. Since this approach optimises each execution path individually, the solutions have to be combined in the end to get one single optimal solution for the whole orchestration.
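The selection problem for a single execution path can be illustrated as follows. Note that this sketch replaces integer programming by an exhaustive search, which is only viable for very small instances; the candidate services, their QoS values, the weights and the bounds are all hypothetical, and only two QoS parameters are considered.

```python
from itertools import product

# Hypothetical candidates per task: (name, cost, response_time_s).
candidates = {
    "garage": [("GarageA", 4.0, 2.0), ("GarageB", 2.0, 4.5)],
    "insurance": [("InsureAll", 3.0, 1.0), ("InsurePlus", 1.0, 4.0)],
}

def select(candidates, weights, max_time, max_cost):
    """Pick one candidate per task, maximising a weighted utility under constraints."""
    tasks = list(candidates)
    best, best_utility = None, float("-inf")
    for combo in product(*(candidates[t] for t in tasks)):
        cost = sum(c[1] for c in combo)    # costs add up along the path
        time = sum(c[2] for c in combo)    # tasks assumed sequential: delays add up
        if time > max_time or cost > max_cost:
            continue                       # violates a user-imposed constraint
        # Lower cost and time are better, hence the negated weighted sum.
        utility = -(weights["cost"] * cost + weights["time"] * time)
        if utility > best_utility:
            best = dict(zip(tasks, (c[0] for c in combo)))
            best_utility = utility
    return best, best_utility

choice, utility = select(
    candidates, weights={"cost": 1.0, "time": 1.0}, max_time=7.0, max_cost=6.0
)
print(choice, utility)
```

The weights turn the multi-dimensional QoS into a single totally ordered utility value, as described above; changing them can change which combination is selected. If no combination satisfies the constraints, `select` returns `None` for the choice.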

In [AP07] the authors use a similar approach, modelling the composition problem as a mixed integer linear programming problem. The authors here use orchestrations specified in BPEL, and consider the QoS parameters of price, reputation, response time, availability and data quality. The main difference here is that the optimality criterion is taken to be a weighted average of the QoS of the different execution paths, thus avoiding the problem of optimising each path separately and then merging them in the end. They also allow specifying some additional constraints, like requiring that two distinct tasks in the orchestration must be done by calling the same service. In the case where there are no feasible solutions for the constraints imposed, they also propose a re-negotiation technique. The re-negotiation however requires an exhaustive search to identify solutions which satisfy the maximum number of constraints.

There have also been proposals to use genetic algorithms to solve the QoS optimisation problem. In [CAH05] the authors use a multi-objective evolutionary algorithm called NSGA-II to solve the optimisation problem. Instead of considering a utility function to totally order the multi-dimensional QoS values, the algorithm treats these QoS values as partially ordered. The algorithm thus computes many possible solutions, each one being maximal according to the partial order (a QoS tuple is maximal if no other tuple is at least as good in all its components and strictly better in at least one). Genetic algorithms have the advantage that the optimisation constraints do not have to be linear, and any non-linear function can be used.
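The notion of maximality under a partial order can be made concrete with a small sketch. It only computes the maximal (non-dominated) tuples of a given finite set by pairwise comparison; it is not NSGA-II, which evolves a population of candidate solutions. As an assumption of the sketch, every component is oriented so that larger is better, so price and latency are negated; the tuples themselves are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Convention (assumed): larger is better in every component.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def maximal(tuples):
    """Keep the tuples that no other tuple dominates."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]

# Hypothetical QoS tuples: (-price, -latency_s, availability).
solutions = [
    (-5, -2.0, 0.99),
    (-3, -4.0, 0.95),
    (-6, -2.5, 0.98),   # dominated by the first tuple
    (-3, -4.0, 0.90),   # dominated by the second tuple
]
front = maximal(solutions)
print(front)
```

The two remaining tuples are incomparable: one is cheaper, the other faster and more available. Without a utility function there is no reason to prefer one over the other, which is why several maximal solutions are returned.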

Optimisation-based approaches are useful since they can quickly find solutions to the service selection problem, especially when the number of candidate services is not large. They are also flexible in the sense that constraints can be modified over the course of time, and the optimisation process can be re-run without a huge overhead. These techniques however assume the QoS of a service to be a constant value, and any kind of variation in the QoS of the services might result in the orchestration not meeting its constraints. In particular, these approaches directly suffer from the problem of non-monotonicity of orchestrations that we describe in section 2.4.5.

Approaches based on Planning: Planning-based techniques have been used to solve a related but more generic problem: automatically synthesising orchestration executions that respect certain constraints. Typically the orchestration client specifies the constraints, which can be over both functional and non-functional values. From a knowledge domain or a planning domain (usually a transition system), the planner tries to synthesise executions that meet the constraints, using a set of candidate services that have a subset of the required behaviour. We will look at one paper by Lazovik et al. [LAP06], which captures the generic flavour of many planning-based techniques.

In [LAP06], the authors interleave the process of planning the orchestration with its execution. The planning process has two inputs: 1) A set of constraints defined over certain variables, which typically specify the requirements of the orchestration’s client. These variables can model both functional and QoS parameters of the orchestrations. 2) A non-deterministic planning domain, which is essentially the orchestration modeled as a labelled state transition system. A transition can change the values of the variables. From these two inputs, a planning unit first tries to synthesise an execution path in the transition system which satisfies the client’s constraints. This synthesis is done by a model checking algorithm. However, since the response of the services is not known a priori, as the execution progresses the original constraints might no longer be satisfiable. In this case, a monitoring unit tries to find substitute services, and the planning algorithm is re-run.

Planning-based techniques give interesting solutions for automated composition in orchestrations. They are very flexible, since the orchestration flow is not required to be static and pre-defined, but can dynamically evolve as the execution progresses. However these techniques may not be well suited for the problem of optimally selecting services according to their QoS, since any path that satisfies the constraints, and not necessarily the best one, can be taken. The literature on planning techniques for orchestrations uses transition systems as the orchestration model, which does not faithfully represent the concurrent execution that commonly occurs in orchestrations. Planning techniques with concurrent models do exist, but they are not very common.

Other approaches: In [HWTS07], the authors take a bottom-up approach to solve the QoS composition problem in a probabilistic setting. They model the QoS of the services as a vector with the four parameters response time, reliability, fidelity and cost, where each parameter is assumed to be a discrete random variable. The Probability Mass Function (PMF) of each QoS parameter is known. They consider orchestrations with five combination constructs (sequence, parallel split-join, exclusive choice, discriminator and loops), and give rules to derive the PMF of the aggregate structure from that of its components. For example, for two independent discrete random variables $X$ and $Y$ with domains $Dom(X) = \{x_1, x_2, \ldots, x_m\}$ and $Dom(Y) = \{y_1, y_2, \ldots, y_n\}$ respectively, and PMFs $f_X$ and $f_Y$, their sum $Z = X + Y$ is also a random variable. The domain of $Z$ is $Dom(Z) = \{z_1, z_2, \ldots, z_k\}$, where $z_i \in Dom(Z)$ iff $z_i = x + y$ for some $x \in Dom(X)$ and $y \in Dom(Y)$. The PMF of $Z$ is

$$f_Z(z_i) = \sum_{x + y = z_i} f_X(x) \cdot f_Y(y)$$

This addition rule can be used, for example, to derive the PMF of the response time parameter for the sequence construct. The authors give similar combination rules for the multiplication, maximum, minimum and exclusive selection of random variables. These are used to compose the four QoS parameters in the different constructs they consider. The idea of using probabilities to model QoS parameters is interesting, and in this thesis we advocate using this approach. The use of discrete random variables allows quick computation of the PMF of the orchestration’s QoS. This is not the case when the random variables are continuous, since no such composition techniques exist for generic distributions. However, to get a quick PMF calculation, the authors are obliged to use techniques to reduce the sample space of the random variables, which in turn results in less precise estimates.
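The addition rule above is a discrete convolution and can be sketched in a few lines. The function `pmf_sum` follows the formula for independent variables; `pmf_max` is the standard analogous rule for the maximum of two independent variables and is written here as an illustration, not copied from [HWTS07]. The example PMFs are hypothetical, and exact fractions are used to avoid rounding noise.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_sum(f_x, f_y):
    """PMF of Z = X + Y for independent discrete X and Y (sequence construct)."""
    f_z = defaultdict(Fraction)
    for x, px in f_x.items():
        for y, py in f_y.items():
            f_z[x + y] += px * py
    return dict(f_z)

def pmf_max(f_x, f_y):
    """PMF of max(X, Y) for independent X and Y, e.g. latency of a split-join."""
    f_z = defaultdict(Fraction)
    for x, px in f_x.items():
        for y, py in f_y.items():
            f_z[max(x, y)] += px * py
    return dict(f_z)

# Hypothetical response-time PMFs (in seconds) of two services.
f_x = {1: Fraction(1, 2), 2: Fraction(1, 2)}
f_y = {1: Fraction(1, 4), 3: Fraction(3, 4)}
print(pmf_sum(f_x, f_y))
print(pmf_max(f_x, f_y))
```

The domain of the sum can have up to $m \cdot n$ values, so repeated composition makes the sample space grow quickly; this is the reason for the reduction techniques mentioned above.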

2.4.5 Monotonicity in Orchestrations

To the best of our knowledge, no existing study on QoS composition of orchestrations considers the (non-)monotonic behaviour of orchestrations. Intuitively, an orchestration is non-monotonic if “improving” the QoS of one of its services can “worsen” the overall orchestration’s QoS. In a contract-based approach where contracts are established between the orchestration, the services and the clients, non-monotonicity of the orchestration is highly undesirable. A non-monotonic orchestration could violate the contract with its client if a called service performs better than what it has promised in its contract.

The phenomenon of non-monotonicity in general is not new, and has been observed before, for example in [CS92]. The authors there give performance bounds for deadlock-free stochastic Petri nets with a pre-selection policy for resolving conflicts (in a pre-selection policy, enabled transitions that are in conflict are fired according to a pre-specified ratio, which is independent of the delay of the transitions). They consider the throughput of a transition in steady state, i.e., its average number of firings per time unit, and give an upper bound for a transition’s steady-state throughput. From these bounds, they derive some monotonicity results on the throughput of transitions. For example, they show that lowering the service time of a transition (i.e., making it faster) cannot decrease the throughput bound of the other transitions. They remark that the intuitive idea that “increasing the service time of a transition slows the system” does not hold in general, and give an example of a Petri net with a race policy where it does not hold. This example is shown in Figure 2.11.

Figure 2.11 – A non-monotonic net. The delays of transitions $t_1, \ldots, t_4$ follow an exponential distribution with rates $\lambda_1, \ldots, \lambda_4$ respectively. Under a race policy, increasing $\lambda_1$ (i.e., decreasing $t_1$’s delay) can result in a lower throughput for $t_4$ if $\lambda_2$ is assumed to be significantly smaller than $\lambda_1$ and $\lambda_3$.

In a net with a pre-selection policy, where conflicts are resolved independently of the delay of the transitions, the non-monotonic behaviour in Figure 2.11 does not appear. However, under a race policy, timing and control are coupled, and the phenomenon of non-monotonicity is very real. In orchestration engines where timeouts and other preemption mechanisms are supported, such race policies are implemented, and so orchestrations can be non-monotonic.
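How a timeout can make an orchestration non-monotonic is easiest to see on a toy example. The orchestration below is hypothetical (it is neither the net of Figure 2.11 nor the CarOnLine example), and all delays are made-up deterministic values: a guarded call whose answer, when it arrives in time, triggers a slow follow-up service, while a timeout leads to a quick fallback.

```python
def orchestration_latency(guarded_call_delay, timeout=3.0,
                          followup_delay=10.0, fallback_delay=1.0):
    """Latency of a hypothetical orchestration with a timeout-guarded call.

    If the guarded call answers before the timeout, its answer is used and a
    slow follow-up service is called. Otherwise the answer is ignored and a
    quick fallback is taken as soon as the timer expires.
    """
    if guarded_call_delay < timeout:
        return guarded_call_delay + followup_delay
    return timeout + fallback_delay

slow_service = orchestration_latency(4.0)   # misses the timeout
fast_service = orchestration_latency(2.0)   # the called service "improves"
print(slow_service, fast_service)
```

Lowering the delay of the guarded call from 4 to 2 time units raises the end-to-end latency of the orchestration, because the race between the call and the timer now has a different winner and a different branch is executed.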

Our Contribution: In this thesis we study the property of non-monotonicity in Web service orchestrations, at first restricting our study to the latency parameter (Chapter 7). We formally define monotonicity of orchestrations using colored occurrence nets, and we give necessary and sufficient conditions for an orchestration to satisfy monotonicity. We also study probabilistic monotonicity, when the QoS of the services is given by probability distributions. We then lift the analysis from the case of latency to generic QoS parameters (Chapter 8).

2.4.6 QoS monitoring

Any comprehensive QoS management framework for orchestrations should be able to monitor the performance of the services called in the orchestration. The services need to be monitored since any violation of their contract might cause the orchestration to violate the contract with its clients. If a contract violation is detected, the orchestration might decide to reconfigure itself or impose penalties on the violating party.

Web Service Level Agreement (WSLA) [KL03] is the most popular framework for monitoring the QoS of Web services. It consists of: 1) the WSLA language, to specify an SLA between a provider of a Web service and its consumer; 2) a runtime monitoring architecture, to measure the service’s performance and detect any violations of the SLAs. The WSLA language is an XML-based language which can be flexibly used to specify SLAs. A WSLA document roughly consists of three parts:

1. The Parties section contains identification information about the service provider, the consumer and possibly other third parties involved in the monitoring process.

2. The Service description section specifies the SLA parameters concerning the service. Different SLA parameters may be defined as a combination of basic metrics. The definition of basic metrics, and how they are to be measured, is also given in this section.

3. The Obligations section defines the guarantees involving the SLA parameters, and the actions to be taken if the guarantees are not met.

A typical scenario involving the SLA specification and monitoring is shown in Figure 2.12 [KL02]. At first the customer and the provider build the WSLA document. At runtime, the information in the WSLA document is used to deploy different units. The Measurement unit computes higher-level SLA parameters from the basic metrics, for example the average response time over the calls in a certain time period. The values of these parameters are reported to the Condition Evaluation unit, which evaluates the conditions (the guarantees) specified in the WSLA and checks if any violations have occurred. If the contract has been violated, the Management unit takes appropriate actions. The monitoring units can be deployed both at the customer and the provider sites, and possibly at third-party monitoring sites. The results of the measurements and evaluations at each party can be communicated to the other party to ensure coherence of the monitoring procedure at the two sides.

Figure 2.12 – The WSLA monitoring architecture.
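The division of work between the three units can be sketched as follows. This is a minimal illustration in Python and not the WSLA API: all function names, the choice of the average response time as the SLA parameter, and the returned action strings are assumptions made for the example.

```python
from statistics import mean


def measurement_unit(response_times):
    """Derive a higher-level SLA parameter (here the average response time)
    from basic metrics (the individual response times of a period)."""
    return mean(response_times)


def condition_evaluation_unit(avg_response_time, threshold):
    """Return True if the guarantee 'average response time <= threshold'
    has been violated."""
    return avg_response_time > threshold


def management_unit(violated):
    """Placeholder action; a real deployment might notify the parties or
    apply the penalties agreed in the Obligations section."""
    return "notify-violation" if violated else "no-action"


def monitor(response_times, threshold):
    """Chain the three units for one measurement period."""
    avg = measurement_unit(response_times)
    return management_unit(condition_evaluation_unit(avg, threshold))
```

For instance, `monitor([1.0, 2.0, 3.0], threshold=2.5)` evaluates the guarantee on an average of 2.0 and reports no violation.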

The WSLA language provides a flexible foundation for defining SLA contracts. Instead of defining a fixed set of SLA parameters, the parties are given the flexibility to define their own metrics and parameters. This flexibility can allow us to specify soft contracts in WSLA: the different quantiles of a probability distribution can be specified as different SLA parameters. However, an obvious consequence of defining custom SLA parameters at a low level of abstraction is that there is no easy way to relate SLA parameters across different WSLA documents. The WSLA framework seems to be well suited for offline SLA negotiations between two parties.

In a similar work [SDM02], the authors introduce an XML-based language for specifying SLAs and present an automated and distributed engine for monitoring them. The SLA contains a set of clauses based on measurable quantities which are evaluated at run-time by a Web Service Management Network (WSMN) agent. As in WSLA, they allow measurements to be done at multiple sites (the service providers and consumers in particular) by different WSMN agents, and they define a protocol to exchange the measurement information of the different WSMN agents.

Our Contribution: Since this thesis proposes the use of probabilistic contracts, we also propose a technique to monitor such contracts and to detect violations, if any. Our monitoring technique is based on statistical testing, and makes it possible to define and detect violations in the behaviour of QoS parameters which are randomly distributed. Our work on contract monitoring is complementary to monitoring frameworks like WSLA. We can plug our monitoring techniques into the WSLA platform to build a monitoring framework for probabilistic contracts.
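To give a rough idea of what monitoring a probabilistic contract involves, the following Python sketch checks observed latencies against a contract given as a set of quantiles. It is a deliberate simplification and not the statistical test developed in the thesis: the contract representation, the function names and the fixed `tolerance` (a stand-in for a proper test threshold) are all assumptions.

```python
def empirical_fraction(samples, bound):
    """Fraction of observed values that are at most `bound`."""
    return sum(1 for s in samples if s <= bound) / len(samples)


def violated_quantiles(contract, samples, tolerance=0.05):
    """Return the quantile levels q of the contract that the observations
    fail to meet.

    `contract` maps a level q to a bound b, read as
    'P(latency <= b) >= q'. A level is reported when the empirical
    fraction falls below q by more than `tolerance` (an assumed margin).
    """
    return [q for q, bound in sorted(contract.items())
            if empirical_fraction(samples, bound) < q - tolerance]


# Hypothetical soft contract: half of the calls within 2 s, 90% within 5 s.
contract = {0.5: 2.0, 0.9: 5.0}
good = [1, 1, 1, 2, 2, 3, 3, 4, 5, 6]
bad = [1, 2, 6, 6, 6, 7, 7, 8, 9, 9]
```

With these samples, `violated_quantiles(contract, good)` is empty while `violated_quantiles(contract, bad)` reports both levels.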


2.5 Thesis organisation, contributions

This document is structured around the publications made during this thesis, each chapter roughly corresponding to a published paper. We now give an overview of this structure, outlining the principal contributions of each part.

Chapter 3: A Net system semantics for Orc

Web Services orchestrations require a firm mathematical basis for their development. We start from the Orc formalism proposed by J. Misra and co-workers at the University of Texas at Austin. Orc is small and elegant and captures the essence of orchestrations. We translate Orc into colored Petri net systems, a generalization of Petri nets that can handle recursion; this formalism was recently proposed by Devillers et al. [DK04]. Colored Petri nets are useful in the analysis of non-functional or QoS aspects of orchestrations.

Contributions:

1. Gives a colored Petri net semantics for Orc expressions.

2. Gives a transformation of an Orc program to a system of finite nets.

3. Shows how colors can be used in a marking equivalence relation, to simulate the (possibly recursive) calling of expressions.

4. Demonstrates how these nets can be used for QoS analysis.

Publication: A short version of this paper [RBHJ06a], without all the details on the translation, was published in the Second International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISOLA) 2006. The complete version, as presented here, appears as IRISA Technical Report no. 1780 [RBHJ06b].


Chapter 4: Event Structure Semantics of Orc

One challenge in developing wide-area distributed applications is analyzing the system’s non-functional properties, including timing constraints and internal dependencies that can affect quality of service. Analysis of non-functional properties requires a precise formal semantics for the language in which the system is written; but labelled transition systems and trace semantics, which are commonly used for this purpose, do not facilitate this kind of analysis. Event structures provide an explicit representation of the causal dependencies between events in the execution of a system. But event structures are difficult to construct compositionally, because they cannot easily represent fragments of a computation. In this paper we present a partial-order semantics based on heaps (an explicitly encoded form of occurrence nets with read arcs), which naturally represent fragments of behavior. Heaps are then easily translated into asymmetric event structures. The semantics is developed for Orc, an orchestration language in which concurrent services are invoked to achieve a goal while managing time-outs, exceptions, and priority. Orc, and this new semantics, are being used to study quality of service (QoS) for wide-area orchestrations.

Contributions:

1. Gives an Event Structure semantics for Orc.

2. Introduces the notion of heaps to represent fragments of a computation, and shows how to extract an asymmetric event structure from it.

3. Gives a heap denotation for Orc expressions, including recursive expressions.

4. Establishes the correspondence between the Event structure semantics and the SOS semantics of Orc.

Publication: A short version of this paper [RKB+07b] was presented at the 4th International Workshop on Web Services and Formal Methods (WS-FM) 2007. The longer version presented here is published as INRIA Research Report no. 6221 [RKB+07a].


Chapter 5: Branching Cells for Asymmetric Event Structures

In this chapter, we extend the notion of branching cells introduced for prime event structures in [AB06, AB08] to asymmetric event structures (AES). This involves extending the notion of minimal conflict for AES. These notions are then used to compute the probability of the occurrence of any given event of a stochastic AES under a race policy, when the events are assumed to occur with exponential delays.

Contributions:

1. Defines the notions of minimal conflict, stopping prefix and branching cells for Asymmetric Event Structures.

2. Extends results on branching cells in [AB06, AB08] from prime event structures to asymmetric event structures.

3. Uses the previous notions to calculate the probability of the occurrence of an event in a stochastic asymmetric event structure.

Publication: The extension of branching cells to asymmetric event structures is unpublished work. The computation of the occurrence probability was done as part of the paper “Critical paths in the Partial Order Unfolding of a Stochastic Petri Net” [BHR09], which was presented at the 7th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS) 2009.


Chapter 6: Probabilistic QoS and soft contracts for transaction based Web services orchestrations

Service level agreements (SLAs), or contracts, have an important role in web services. They define the obligations and rights between the provider of a web service and its client, with respect to the function and the Quality of Service (QoS). For Web service orchestrations, contracts are deduced by a process called QoS contract composition, based on contracts established between the orchestration and the called web services. These contracts are typically stated in the form of hard guarantees (e.g., response time always less than 5 msec). Using hard bounds is not realistic, however, and more statistical approaches are needed.

In this paper we propose using soft probabilistic contracts, which consist of a probability distribution for the considered QoS parameter (in this paper, we focus on timing). We show how to compose such contracts, to yield a global probabilistic contract for the orchestration. Our approach is implemented by the TOrQuE tool. Experiments on TOrQuE show that overly pessimistic contracts can be avoided and significant room for safe overbooking exists.
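The idea of composing probabilistic contracts can be illustrated by a small Monte Carlo sketch in Python. It is not the TOrQuE tool: the orchestration (call A, then B and C in parallel, waiting for both), the exponential latency distributions and their means, and all names are assumptions chosen for the example.

```python
import random


def sample_orchestration_latency(rng, samplers):
    """One run of a hypothetical orchestration: call A, then call B and C
    in parallel and wait for both answers."""
    a = samplers["A"](rng)
    b = samplers["B"](rng)
    c = samplers["C"](rng)
    return a + max(b, c)


def quantile(samples, q):
    """Empirical q-quantile (simple order-statistic estimate)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def compose_contract(samplers, quantiles=(0.5, 0.9, 0.99), runs=10_000, seed=0):
    """Estimate the orchestration's soft contract as a set of quantiles."""
    rng = random.Random(seed)
    latencies = [sample_orchestration_latency(rng, samplers) for _ in range(runs)]
    return {q: quantile(latencies, q) for q in quantiles}


# Assumed per-service contracts: exponential latencies with these means (s).
samplers = {
    "A": lambda rng: rng.expovariate(1 / 0.2),
    "B": lambda rng: rng.expovariate(1 / 0.5),
    "C": lambda rng: rng.expovariate(1 / 0.3),
}
contract = compose_contract(samplers)
```

The resulting dictionary maps each quantile level to a latency bound for the whole orchestration, which is the shape of contract that could then be offered to its clients.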

An essential component of SLA management is then the continuous monitoring of the performance of called web services, to check for violations of the agreed SLA. We propose a statistical technique for run-time monitoring of soft contracts.

Contributions:

1. Introduces a contract based approach for QoS management in orchestrations.

2. Proposes the use of soft, probabilistic contracts to model the QoS of services.

3. Shows how to compose these contracts, to derive the orchestration’s QoS.

4. Proposes a monitoring technique, suitable for probabilistic contracts.

5. Demonstrates the possibility of resource overbooking at the site of the orchestration, through experiments.

Publication: A first version of this paper [RBHJ07] was presented at the IEEE International Conference on Web Services (ICWS) 2007. The section on monitoring appeared in the mini-conference of the 11th IFIP/IEEE International Symposium on Integrated Network Management (IM) 2009. The version presented here [RBHJ08] appears in the IEEE Transactions on Services Computing.


Chapter 7: Monotonicity in Service Orchestrations

Web Service orchestrations are compositions of different Web Services to form a new service. The services called during the orchestration guarantee a given Quality of Service (QoS) to the orchestrator, usually in the form of contracts. These contracts can then be used by the orchestrator to deduce the contract it can offer to its own clients, by performing contract composition. An implicit monotonicity assumption in contract based QoS management is: “the better the component services perform, the better the orchestration’s performance will be”.

In some orchestrations, however, monotonicity can be violated, i.e., the performance of the orchestration improves when the performance of a component service degrades. This is highly undesirable since it can render the process of contract composition inconsistent.

In this paper we formally define monotonicity for orchestrations modeled by Colored Occurrence Nets (CO-nets) and we characterize the classes of monotonic orchestrations. Contracts can be formulated as hard, possibly nondeterministic, guarantees, or alternatively as probabilistic guarantees. Our work covers both cases. We show that few orchestrations are indeed monotonic, mostly because of complex interactions between control, data, and timing. We also provide user guidelines to get rid of non-monotonicity when designing orchestrations.
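A violation of monotonicity is easy to reproduce on a toy orchestration with a timeout. The Python sketch below is an illustrative assumption and not an example taken from the paper: service A is guarded by a timeout, a timely answer is forwarded to a slow service B, and a timeout diverts the orchestration to a fast service C.

```python
def orchestration_latency(d_a, d_b, d_c, timeout):
    """End-to-end latency of a hypothetical orchestration.

    If A answers within `timeout`, its answer is forwarded to B;
    otherwise the orchestration gives up on A at the timeout and calls C.
    """
    if d_a <= timeout:
        return d_a + d_b
    return timeout + d_c


# A answers quickly: the slow service B is called afterwards.
fast_a = orchestration_latency(d_a=1.0, d_b=10.0, d_c=1.0, timeout=2.0)
# A degrades and misses the timeout: the fast service C is called instead.
slow_a = orchestration_latency(d_a=3.0, d_b=10.0, d_c=1.0, timeout=2.0)
```

Here the latency of A increases from 1.0 to 3.0 while the latency of the orchestration drops from 11.0 to 3.0, because the timing of A decides which branch of the control flow is taken.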

Contributions:

1. Brings attention to the problem of non-monotonicity in orchestrations of Web services.

2. Formalises the notion of non-monotonicity through colored Occurrence nets.

3. Gives necessary and sufficient conditions (both mathematical and structural) for an orchestration to be monotonic.

4. Extends the study to probabilistic behaviour by defining probabilistic monotonicity, and shows the correspondence between probabilistic monotonicity and the previously defined notion of monotonicity.

Publication: A short version of this paper [BRBH09] appeared in the 30th International Conference on Application and Theory of Petri Nets and Other Models of Concurrency (ICATPN) 2009. A longer version [BRBH08] appears as INRIA Research Report no. 6528.


Chapter 8: A Theory of QoS for Web Service Orchestrations

In this paper we develop a comprehensive theory of QoS for Web service orchestrations. To support multi-dimensional or composite QoS parameters, QoS domains must be partially, not totally, ordered. We identify the algebra needed to capture how QoS values are transformed when synchronising service responses and to represent how a service call contributes to the end-to-end QoS of the orchestration. Contract based approaches implicitly assume that the better a called service performs, the better the orchestration does. This property, called monotonicity, does not always hold, however. We provide conditions ensuring it. Then we show how SLAs or contracts between the orchestration and the services it calls can be composed to derive an SLA or contract between the orchestration and its clients. To account for high variability in measured QoS parameters for existing Web services, we support both probabilistic and non-probabilistic approaches. Finally, we propose a mild extension of the Orc language for service orchestrations to support flexible QoS management according to our theory.
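Why composite QoS domains are only partially ordered can be seen on a two-dimensional example. The Python sketch below is an assumption-laden illustration, not the algebra defined in the paper: it takes a (latency, cost) pair with the product order, and the two composition operators are plausible choices made for the example.

```python
from typing import NamedTuple


class QoS(NamedTuple):
    latency: float  # smaller is better
    cost: float     # smaller is better


def better_or_equal(a, b):
    """Product order: a is at least as good as b in every dimension."""
    return a.latency <= b.latency and a.cost <= b.cost


def comparable(a, b):
    """True if the two QoS values are ordered one way or the other."""
    return better_or_equal(a, b) or better_or_equal(b, a)


def sequence(a, b):
    """Assumed rule for calling one service after another:
    latencies add up and costs add up."""
    return QoS(a.latency + b.latency, a.cost + b.cost)


def synchronise(a, b):
    """Assumed rule for waiting on two parallel responses:
    the slowest response dominates and costs add up."""
    return QoS(max(a.latency, b.latency), a.cost + b.cost)


fast_but_expensive = QoS(latency=1.0, cost=5.0)
slow_but_cheap = QoS(latency=2.0, cost=3.0)
```

The two values above are incomparable: neither is better in both dimensions, so no total order on the domain can be assumed when composing contracts.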

Contributions:

1. Extends and generalises the QoS management framework of [RBHJ08] to multi-dimensional QoS parameters.

2. Formalises the notion of composite QoS parameters and their domains, which can be partially ordered. Defines an associated algebra which models how the QoS parameters evolve during an execution of the orchestration.

3. Proposes a flexible contract composition procedure, relating the obligations and guarantees of the contracts between the pairs (orchestration, called service), to the contract of the (client, orchestration) pair.

4. Extends the probabilistic monitoring technique to the case of generic, partially ordered QoS parameters.

5. Proposes some language extensions to Orc, for supporting QoS management.

Publication: A first version of this work [RBJ09b] appeared in the IEEE International Conference on Web Services (ICWS) 2009. The extended version [RBJ09a] presented here has been accepted for publication in the International Journal of Web Services Research (JWSR).


Chapter 3

A Net system semantics for Orc

Sidney Rosario, Albert Benveniste, Stefan Haar
IRISA/INRIA Rennes, Campus de Beaulieu, Rennes, France.

Claude Jard
Ecole Normale Supérieure de Cachan, Campus de KerLann, Bruz, France.

Abstract

Web Services orchestrations require a firm mathematical basis for their development. We start from the Orc formalism proposed by J. Misra and co-workers at the University of Texas at Austin. Orc is small and elegant and captures the essence of orchestrations. We translate Orc into colored Petri net systems, a generalization of Petri nets that can handle recursion; this formalism was recently proposed by Devillers et al. [DK04]. Colored Petri nets are useful in the analysis of non-functional or QoS aspects of orchestrations.


3.1 Motivation

Web Services (WS) orchestrations and choreographies have been the subject of numerous studies and standardisation actions [Bpe07, KBRL04]. Developing complex orchestrations requires techniques and tools to formally analyse both the functional behavior of an orchestration and its Quality of Service (QoS) characteristics. While there have been numerous works dealing with the functional aspects of Web service orchestrations, very few address the non-functional aspects involved.

Foundational studies on Web transactions and orchestrations are found, e.g., in [Vir04, LZ05], using abstract state machines, process algebras, or variants of the π-calculus. Several semantic studies have been performed for BPEL, e.g., [AFFK05, Rei05]. The studies closest to ours are [Mar04, HSS05, OVvdA+07]; they provide a translation of BPEL into acyclic Petri nets, aiming at property verification.

QoS for orchestrations is an important but delicate issue. It faces the closed/open world paradox: orchestrations are specified as stand-alone “closed” entities. Still, they operate in an open environment, by sharing resources with other orchestrations, other Web Services, and other computing and communication activities. In this respect, orchestrations are just another client of a networked infrastructure.

Some recent work [MA01, SDM02, TGRS04, SL05] has been devoted to QoS for Web Services. This work either considers defining and conceptualizing Service Level Agreements (SLA) or addresses QoS from an experimental viewpoint. We know of no work that provides formal modeling of WS orchestration QoS in a way that compares with what has been developed for the functional aspects. As a consequence, issues of SLA composition such as those mentioned above are not well understood, except for very crude SLA parameterizations.

In this chapter we develop semantic foundations for the functional aspect of WS orchestrations. We briefly show how our model can be extended to accommodate QoS analysis too. We provide a semantic basis for orchestrations allowing for unbounded but finite recursion, and therefore providing a clean treatment of dynamic instantiation. To clarify the issue, we have chosen to analyse the Orc formalism proposed by J. Misra et al. [MC07] to specify orchestrations. The interest of Orc is that it is nicely designed and based on few primitive constructs; it implements the so-called “tree-programming” paradigm, where an initial query can be forwarded in parallel and/or cascade to other sites that will contribute to building the answer; answers to partial sub-queries are then progressively collected and eventually returned to the original caller. This paradigm exactly fits the concept of orchestration; it is more restricted than the model needed to encompass choreographies, where different WS act as peers. Semantic studies for Orc have been developed by Misra et al. [MC07, KCM06]. Our semantics targets Petri net systems [DK04] with colors, allowing for a finite representation of infinite nets resulting from dynamic instantiation.

The advantage of this semantics is that a mild extension of it allows us to capture QoS in a mathematically sound way: the formal correlation mechanism that is provided with token colors allows tracking exceptions, and capturing response time is simply performed by adding one more color.

3.2 The orchestration model: Orc

3.2.1 Orc syntax and intuitive semantics

The syntax of Orc and its intuitive semantics are shown in Table 3.1. Details of the Orc language are found in [MC07, KCM06].


F : Expression name, S : Site, x : Variable, c : Constant, p : Parameter

x :∈ F(p)         Evaluates F(p), assigns 1st value received to x
F(p)              Expression call (new instance thereof)
S(p)              Site call, returns at most 1 value or site identifier
0                 zero expression; returns nothing and stops
1                 one expression; mirrors the result of its previous computation
let(p)            publishes the value of p
if(b)             tests the status of boolean channel b: if true then passes control, otherwise behaves like 0
f | g             Symmetric and concurrent parallel composition: the returns from f and g are interleaved
f > x > g         Sequential composition (x optional): each value returned by f causes a fresh evaluation of g and this value can be passed to g via channel x
f where x :∈ g    Asymmetric parallel composition: the 1st value produced by g is passed to f via channel x

Table 3.1 – The abstract syntax of Orc and its intuitive semantics
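The intuitive semantics of Table 3.1 can be approximated by a small Python sketch in which an expression is a zero-argument callable returning an iterator over the values it publishes. This is a strong simplification made for illustration: it ignores time and real concurrency (parallel composition simply lists the values of f before those of g), in `where` the expression f is not started until g has published a value, and all function names are assumptions.

```python
from itertools import chain, islice


def site(fn, *args):
    """S(p): a site call publishes at most one value
    (a result of None is read here as 'no response')."""
    def run():
        value = fn(*args)
        if value is not None:
            yield value
    return run


def zero():
    """0: publishes nothing and stops."""
    return iter(())


def let(value):
    """let(p): publishes the value of p."""
    return lambda: iter((value,))


def par(f, g):
    """f | g: publishes everything f and g publish
    (sequentialised here: the values of f come first)."""
    return lambda: chain(f(), g())


def seq(f, g):
    """f >x> g: each value x published by f starts a fresh evaluation of g(x)."""
    return lambda: chain.from_iterable(g(x)() for x in f())


def where(f, g):
    """f where x :∈ g: the first value published by g, if any, is bound to x in f."""
    def run():
        for x in islice(g(), 1):
            yield from f(x)()
    return run


doubled = seq(par(let(1), let(2)), lambda x: let(2 * x))
first = where(lambda x: let(x + 10), par(let(5), let(6)))
```

In this sketch `doubled` publishes one value per value of its left operand, while `first` only ever uses the first value published by its right operand, which mirrors the asymmetry between the sequential and the `where` compositions.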

Orc expressions specify orchestrations; they return zero, one, or a stream of values. In contrast, site calls return at most one value. Timeouts are special sites that raise time-based exceptions. The only time referenced in timeouts is the local time attached to the orchestration; no time attached to distant sites is required, which avoids the classical inconsistency problems regarding time caused by distribution. Also, in the Orc model, the only mode for a site call is by invocation; service push cannot be captured in Orc. Misra et al. call this restricted programming model “tree programming”. This paradigm does not exhibit all the difficulties of full-fledged distributed programming and is still adequate for the simpler case of WS orchestrations, but not for choreographies, where orchestrations interact as equal peers. We now present the toy example that will support the rest of our presentation.

3.2.2 The CarOnLine illustrative example

The example is shown in Table 3.2. The service described consists in getting the pair

(BestCarPrice,BestCredit)

from the CarOnLine service. CarOnLine decomposes into the following sequence of operations: 1/ getting, from CarPrice, the BestCarPrice; 2/ the latter is passed as a parameter to the CreditRate service, which returns BestCredit; if CarPrice returns an exception “Fault”, then the query is re-emitted (recursive call). Service CarPrice is a broadcast of the same query to a pool of garages. Each garage of the list may return a price or an exception “Fault”, emitted on timeout. Note that the Rtimer sits on the orchestration site, so that no reference to global time is made. Observe that exceptions are described as part of the orchestration itself; this is the normal way of dealing with exceptions when specifying WS orchestrations.


Client :in let(BestCarPrice,BestCredit)
    where (BestCarPrice,BestCredit) :in CarOnLine

CarOnLine =
    CarPrice >BestCarPrice>
        if(BestCarPrice!=Fault) >>
            CreditRate(BestCarPrice) >BestCredit>
            let(BestCarPrice,BestCredit)
        | if(BestCarPrice=Fault) >> CarOnLine

CarPrice =
    broadcast(GarageList) >values> Min(values)

Min(values) returns Fault if values=Fault and otherwise it returns the minimum among the tuple of received (valid) values; broadcast is defined as follows:

broadcast([]) = let(Fault)
broadcast(g:gs) = mux(u,v) where
    u :in g | Rtimer(timeout) >> let(Fault)
    v :in broadcast(gs)

mux, the “multiplexer”, is a site call used to filter out the “Fault” values due to the timeout; for all valid values x and y, mux is defined as:

mux(x,y) = (x,y)
mux(x,Fault) = x
mux(Fault,y) = y
mux(Fault,Fault) = Fault

Table 3.2 – The CarOnLine Orc program.
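The data flow of CarPrice in Table 3.2 can be mimicked by a short, purely sequential Python sketch. It follows the definitions of mux, broadcast and Min given in the table, but the modelling of a garage as a hypothetical (price, delay) pair, the comparison of the delay with the timeout, and the helper names are assumptions; the recursion of CarOnLine on “Fault” is left out.

```python
FAULT = "Fault"


def mux(x, y):
    """The multiplexer of Table 3.2: filters out Fault values."""
    if x == FAULT and y == FAULT:
        return FAULT
    if y == FAULT:
        return x
    if x == FAULT:
        return y
    return (x, y)


def broadcast(garages, timeout):
    """garages: list of (price, delay) pairs standing in for site calls.
    A garage whose delay exceeds the timeout yields Fault."""
    if not garages:
        return FAULT
    (price, delay), rest = garages[0], garages[1:]
    u = price if delay <= timeout else FAULT
    v = broadcast(rest, timeout)
    return mux(u, v)


def flatten(values):
    """Iterate over the valid prices held in the nested pairs built by mux."""
    if isinstance(values, tuple):
        for v in values:
            yield from flatten(v)
    else:
        yield values


def orc_min(values):
    """Min of Table 3.2: Fault if no valid value was received."""
    if values == FAULT:
        return FAULT
    return min(flatten(values))


def car_price(garages, timeout):
    return orc_min(broadcast(garages, timeout))
```

For example, with the hypothetical garages `[(100, 1), (90, 5), (95, 2)]` and a timeout of 3, the second garage times out and the best remaining price is returned.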


3.3 Translating Orc into colored Petri nets: principles

In this section, we define the semantics of Orc by a finite representation in terms of systems of (colored) Petri net equations. This is a net-based formalism proposed by [DK04] that allows describing unbounded nets in a finite manner.

3.3.1 Reflecting the Orc programming model

The first and most important feature of our translation is that it should reflect the Orc programming model. Accordingly:

• Each Orc expression possesses a single activation point. Therefore, its Petri net translation has a special minimal place that we call the activation place; the start of execution of an expression corresponds to placing a token in the activation place.

• Each Orc expression, on execution, publishes a (possibly empty) set of values. Therefore, its Petri net translation has one return place that is used to store the return values.

• An Orc expression may use parameters for its execution; the Petri net of an Orc expression possesses a set of minimal places for each of these parameters; these places are called parameter places, and they are labeled by their corresponding parameter name.

• Expression termination in Orc is modeled by having a distinct place for an expression, called the power place of the net. This place keeps track of all the active threads during an execution. An expression may evaluate as long as it has corresponding tokens in its power place.

• The translation targets colored Petri nets with the following kinds of arcs:

– Ordinary directed arcs, consuming and producing tokens. Directed arcs create causality.

– Read arcs, requiring the presence of tokens in their source node for their sink transition to fire; read arcs do not consume tokens; therefore, they do not create causality and are compatible with concurrency.

– Reset arcs, which remove all tokens from their anterior place, disabling the subsequent firings of the posterior transitions.
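The behaviour of the three kinds of arcs can be sketched with a small uncolored token game in Python. This is an illustrative simplification, not the translation itself: tokens carry no colors, and the class and place names are assumptions chosen to echo the activation, power and return places described above.

```python
from collections import Counter


class Transition:
    """A transition with ordinary input arcs (consume), read arcs (read),
    reset arcs (reset) and ordinary output arcs (produce)."""

    def __init__(self, consume=(), read=(), reset=(), produce=()):
        self.consume = tuple(consume)
        self.read = tuple(read)
        self.reset = tuple(reset)
        self.produce = tuple(produce)

    def enabled(self, marking):
        marking = Counter(marking)
        needed = Counter(self.consume)
        has_inputs = all(marking[p] >= n for p, n in needed.items())
        has_reads = all(marking[p] >= 1 for p in self.read)  # tested, not consumed
        return has_inputs and has_reads

    def fire(self, marking):
        if not self.enabled(marking):
            raise ValueError("transition is not enabled")
        after = Counter(marking)
        after.subtract(Counter(self.consume))  # ordinary arcs consume tokens
        for place in self.reset:               # reset arcs empty their place
            after[place] = 0
        after.update(self.produce)             # ordinary arcs produce tokens
        return +after                          # drop places left without tokens


# Hypothetical fragment: 'a' activation place, 'power' power place, 'r' return place.
work = Transition(consume=["a"], read=["power"], produce=["r"])
stop = Transition(consume=["r"], reset=["power"])

m0 = Counter({"a": 1, "power": 2})
m1 = work.fire(m0)   # reads 'power' without consuming it
m2 = stop.fire(m1)   # the reset arc removes every token from 'power'
```

After `stop` has fired, `work` can no longer be enabled even if new activation tokens arrive, which is the way a reset arc on the power place disables the subsequent firings.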

Generic form for the Petri net translation of an Orc expression. The generic form of the translated Petri net is shown in Figure 3.1. There could be multiple arcs following the activation and parameter transitions. We show a single outgoing arc for convenience. The doubly directed arc from the power place to f represents a set of arcs from the power place to transitions in f and from transitions in f to the power place. These arcs could be normal, read or even reset arcs, as we shall see later in the translation.

The tokens in the activation place (and the power place) which activate expressions are called control tokens, and the tokens in the parameter places which hold the values of variables are called data tokens. The power place essentially stores all the control tokens involved in the execution, which is later useful in modelling termination. The power place, activation place, parameter places and the return place together are called the interface places of the net.


Figure 3.1 – Generic form for the Petri net translation of an Orc expression. The place labelled a is the activation place. The places labelled x1, x2 . . . xn are the parameter places of the Orc expression, i.e., the expression is of the form f(x1, x2 . . . xn). The double-bordered place in red is the power place of the net. The maximal place labelled r is the return place of the net.

Capturing the Orc activation mechanism. The class of nets we use for the translation are essentially colored Petri nets, equipped with a certain marking equivalence relation. This design choice is inspired by the work in [DK04], which aims to build finite Petri nets for recursive expressions. This is essentially achieved by not aiming to build a single net for the whole Orc program but by representing the system as a collection of nets which may activate each other. This is often referred to as “net equations” or “net systems” [DK04].

Each expression E (and so even the sub-expressions of E) has a unique net corresponding to it in this system of nets. The net for an expression E will be denoted by NE from now on. Note that there may be multiple occurrences of E in an Orc program: an expression that has been defined may be called more than once in different parts of the program. Each of these occurrences of E is called an instance of E. In the translated net, each instance of an expression will be replaced by a high-level box of the form in Figure 3.1. Furthermore, each instance will have a unique label associated with it, called its instance label.

3.3.2 The Coloring mechanism

Colors. Colors for tokens in our translation serve many purposes: they are used to distinguish different activations of the same expression, to match the control and data tokens in a site call, to model termination of expressions, etc. The color of a token can essentially be seen as an (identifier,value) pair. The identifier component is used in matching related tokens (e.g., control and data tokens in a site call) and the value component holds the data carried by the token. We adopt the following conventions for coloring tokens:

• A single Orc program may be activated more than once in general. Each of these activations is distinguished by a component in the token color, which holds a distinct color for each activation of the main expression of the Orc program. This component would be unnecessary if we considered only a single execution of an Orc expression.

• The parallel composition operator ( | ) in Orc enables the creation of a stream of values from a single activation. Since these streams are merged in the return place, we will need to distinguish the tokens of different streams. For this, we append distinct labels to a token’s color component for each parallel branch.

• To distinguish the different activations of the same expression, we append the instance label (which is distinct for each instance) of the calling instance to a component of the token’s color while transferring tokens from the activation place (and parameter places) of the instance of E to the activation place (and parameter places) of NE. The instance label is removed from the token color when the call returns.


• Finally, transitions corresponding to site calls add a color to the token which is the data value returned by the site call.

As a result, we define the color of each token to be a tuple (id,pList,hList,data):

• id is the identifier, i.e., the distinct color added at the start of each different activation;

• pList is the list of colors added by the parallel constructs; initially this list is empty, successive parallel constructs append distinct colors to this list;

• hList is the list of colors added during expression calls, and

• data is the value carried by the token.

The first three components (viz. id, pList and hList) correspond to the identifier part of the (identifier,value) pair of the color mentioned previously. Throughout this section,

c = (i, p, h, v) = (Id,pList,hList,Data) (3.1)

shall denote a generic token color as above. For c as above, we shall denote by ic the i-color of c and so on. For a color c′, write simply (i′, p′, h′, v′) to denote its components. Finally, if c is indexed, e.g., c = c1, c2, . . . , we write i1, i2, . . . , for the corresponding i-color.
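The structure of a token color and the way its components evolve can be sketched in Python as follows. The sketch only restates the conventions above; the helper names (`branch`, `call`, `ret`, `with_data`), the use of tuples for the two lists, and the choice of appending the instance label at the end of hList are assumptions made for illustration.

```python
from typing import Any, NamedTuple, Tuple


class Color(NamedTuple):
    """Token color c = (i, p, h, v) = (Id, pList, hList, Data) as in (3.1)."""
    id: int
    p_list: Tuple[str, ...] = ()
    h_list: Tuple[str, ...] = ()
    data: Any = None


def branch(c, label):
    """A parallel construct appends a distinct label to pList."""
    return c._replace(p_list=c.p_list + (label,))


def call(c, instance_label):
    """Calling the net NE from an instance of E appends its instance label to hList."""
    return c._replace(h_list=c.h_list + (instance_label,))


def ret(c):
    """The instance label is removed when the call returns."""
    return c._replace(h_list=c.h_list[:-1])


def with_data(c, value):
    """A site call transition sets the data value returned by the site."""
    return c._replace(data=value)


c0 = Color(id=1)             # token of the first activation of the program
left = branch(c0, "L")       # token of the left branch of a parallel composition
called = call(left, "E#1")   # token transferred into the net of expression E
```

Keeping the identifier part (id, pList, hList) separate from the data value is what later allows control and data tokens to be matched on the identifier part only.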

The matching relation. Colors will play a central role in controlling the firing of transitions. More precisely, to each transition t of net NE we shall attach a partial function Ft, mapping a tuple of input colors to a tuple of output colors;

we call Ft the firing rule of t. (3.2)

The domain of each firing rule, i.e., the set of allowed input color configurations, will be expressed in terms of a special family of constraints involving the following matching relation defined over pairs of colors:

c ⊑ c′ ⇔ (i = i′) ∧ (h = h′) ∧ (p ∈ prefix(p′)) (3.3)

where prefix(p′) is the set of all prefixes of p′. The explanation for defining the matching relation in this way is the following:

• i = i′: The control and the data tokens should obviously belong to the same initialactivation of the Orc program, which is enforced by this condition.

• h = h′: This condition ensures that the control and data tokens originate from thesame instance. Since there may be more than one instance of the same site call, thiscondition matches control-data tokens corresponding to the same instance.

• p ∈ prefix(p′) : Our coloring mechanism ensures that the pList components ofrelated control-data tokens satisfy this condition. We observe that the binding ofa variable to a value (creation of a data token) can happen either in the sequentialcomposition or in a “where” construct.

As we shall see, in a sequential composition f >x> g, the color of the data token for x is the color of the return token of f (this return token activates g). g may produce multiple control tokens corresponding to a single activation (through branches in the parallel and/or where constructs) which use this same value of x. So the pList component of these control tokens will be the pList component of the data token for x, possibly appended with one or more colors corresponding to the further branching it underwent in g.

In f where x :∈ g, the (i, p, h) components of the data token for x are set to those of the token that activates f. So the control tokens in f which use the value of x will have the pList of the data token for x as a prefix of their pList. (The exact mechanism is detailed later.)

The firing rule Ft of t specifies the matching input tokens and also defines the color of the output tokens.
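The matching relation (3.3) can be illustrated with a minimal Python sketch. The names `Color`, `is_prefix` and `matches` are hypothetical, introduced only for illustration; pList and hList are assumed to be tuples of string labels.

```python
from typing import Any, NamedTuple, Tuple


class Color(NamedTuple):
    """Token color c = (i, p, h, v), following (3.1)."""
    i: int               # Id: identifies the activation of the Orc program
    p: Tuple[str, ...]   # pList: colors added by parallel constructs
    h: Tuple[str, ...]   # hList: colors added during expression calls
    v: Any               # data: value carried by the token


def is_prefix(short: Tuple[str, ...], long: Tuple[str, ...]) -> bool:
    """True if `short` is a (possibly equal) prefix of `long`."""
    return long[:len(short)] == short


def matches(c: Color, c_prime: Color) -> bool:
    """The relation c ⊑ c' of (3.3): same Id, same hList, p a prefix of p'."""
    return c.i == c_prime.i and c.h == c_prime.h and is_prefix(c.p, c_prime.p)


# A data token and a control token that branched once more (color t2).
data = Color(i=1, p=("t1",), h=("l",), v=10)
control = Color(i=1, p=("t1", "t2"), h=("l",), v=None)

print(matches(data, control))   # the data token is ⊑-related to the control token
print(matches(control, data))   # the relation is not symmetric
```

The relation is deliberately asymmetric: a data token matches every control token that descends from it through further parallel branching, but not the converse.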

3.3.3 The marking equivalence

Each Orc program has a main expression which is first called when starting an execution (the translation for the main expression is given later in Section 3.4.6). Each activation of an Orc program places a token in the activation place of the net for the main expression. The execution continues by calling the (sub)expressions of the Orc program. The calling (and return) of expressions is modeled by a marking equivalence relation between an instance of an expression E and its corresponding net NE [DK04]. This relation is essentially a mapping between the interface places of the instance of E and those of NE. Calling an expression E transfers a token from the activation place of the instance of E to the activation place of NE.

In general, there may be more than one instance of a given expression E occurring in an Orc program (e.g., the expression E | E), while there will be only one net NE corresponding to all these instances. The unique instance label of each instance is added to the color of the tokens activating NE to distinguish tokens corresponding to different activations.

Let IE denote an instance of E with instance label l, and let NE be the net corresponding to E. Let $a_{I_E}$ and $a_{N_E}$ denote the activation places of IE and NE respectively, and let PW denote the power place. Let c = (i, p, h, v) be a token of the form described in Section 3.3.2, and let c × P denote the marking with token c in the place P and nothing elsewhere. Then, for a marking M, the calling of expression E from an instance IE is given by the following relation:

$$M + c \times a_{I_E} + c \times PW \;\equiv\; M + (i, p, h.l, v) \times a_{N_E} + (i, p, h.l, v) \times PW$$

Essentially, the instance label of IE is added to the hList component of the token before transferring it to the activation place of NE. Strictly speaking, this addition of the instance label to the token color is only needed when there is more than one instance of the same expression, in order to distinguish the activations corresponding to different instances. If there exists only a single instance of an expression, this addition of the instance label is redundant and can be avoided.

The return of an expression call moves a token from the return place of NE to the return place of the instance from which it was called. The instance label added during the expression call is removed from the token color while transferring it back. If $r_{N_E}$ and $r_{I_E}$ denote the return places of NE and IE respectively, the equivalence relation for the return of an expression call is given by:

$$M + (i, p, h.l, v) \times r_{N_E} + (i, p, h.l, v) \times PW \;\equiv\; M + c \times r_{I_E} + c \times PW$$

where l is the instance label of IE.


Marking equivalence relations also exist between the parameter places of IE and NE. A data token in a parameter place of an instance IE needs to be transferred to its corresponding parameter place in NE. For a parameter x, let $x_{I_E}$ denote the parameter place in the instance IE (with instance label l) and let $x_{N_E}$ denote the parameter place for x in NE. Then, for a token c = (i, p, h, v) we have

$$M + c \times x_{I_E} \;\equiv\; M + (i, p, h.l, v) \times x_{N_E}$$

The only Orc expressions that do not contain instances as sub-expressions are the most basic ones, i.e., site calls. As a result, the execution of a site call is completely defined by standard Petri net firing rules, without any marking equivalence relation.
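The effect of the marking equivalences on token colors can be sketched as follows. This is an illustration under stated assumptions rather than the thesis' implementation: `Color`, `call` and `ret` are hypothetical names, and the sketch models only the change of color, not the places or the marking M.

```python
from typing import Any, NamedTuple, Tuple


class Color(NamedTuple):
    """Token color (i, p, h, v)."""
    i: int
    p: Tuple[str, ...]
    h: Tuple[str, ...]
    v: Any


def call(c: Color, label: str) -> Color:
    """Color seen in N_E when the instance labeled `label` is activated with c."""
    return c._replace(h=c.h + (label,))


def ret(c: Color, label: str) -> Color:
    """Color seen in the calling instance when N_E returns a token of color c."""
    if not c.h or c.h[-1] != label:
        raise ValueError("token was not produced by a call from this instance")
    return c._replace(h=c.h[:-1])


# Two instances (labels l1 and l2) of the same expression share one net N_E.
c = Color(i=1, p=(), h=(), v=None)
from_l1 = call(c, "l1")
from_l2 = call(c, "l2")
print(from_l1, from_l2)                    # the two activations stay distinguishable

# A value produced inside N_E for the call from l1 returns to that instance only.
print(ret(from_l1._replace(v=42), "l1"))
```

The guard in `ret` reflects the role of the label: a token whose hList does not end with the label of an instance cannot be returned to that instance.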

3.4 The detailed translation

3.4.1 Site Calls

Sites are the most basic Orc expressions. Site calls may use a list of parameters, all of whose values have to be defined before the site call can happen.

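[Figure 3.2: net with activation place a, parameter places x1, . . . , xn, transition S and return place r. Firing rule of S: constraint on inputs c1 ⊑ c, . . . , cn ⊑ c; output color c′ = (i, p, h, S(v1, . . . , vn)).]

Figure 3.2 – Petri net translation of a site call S(x1, x2, . . . , xn).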

The Petri net corresponding to a generic site call S(x1, x2, . . . , xn) is shown in Figure 3.2. The place labeled a is the activation place of the net. The places labeled x1 . . . xn are the parameter places of the net. The power place is shown in red and the return place is labeled r.

Every activation of a site call occurs by placing a control token in the activation place a. The call proceeds only if the expression which called the site is “active” (i.e., has not been terminated), which is ensured by the arc from the power place to the transition following a. For each control token, data tokens with matching colors are needed for the transition of the site call S to fire. As mentioned earlier, this appears as a firing rule for S, detailed in the figure.

For example, the constraint c1 ⊑ c shown in the figure states that the token in the place x1 matches the token in place a.

The value component of the return token is set to the value returned by the site call, as shown in the figure. (The values of the variables x1, ..., xn are the value components of the colors of the tokens corresponding to these parameters, and so the site call S(x1, ..., xn) corresponds to S(v1, ..., vn).) The first three components of the token added to the return place are the same as those of the initial control token activating the site call. A copy of the new token is stored in the power place too, as shown.

The arcs from the parameter places to the transition of the site call are special arcs called read arcs. These arcs are required to model the fact that a variable bound to a value may be referenced an unknown (possibly infinite) number of times. Consider the expression M >x> f ≫ S(x), where M and S are site calls and f is a recursive expression which returns an infinite stream of values. The value returned by M is bound to x, which will be used infinitely often (each value returned by f causes a new call S(x)). Thus consuming a data token during a site call would be incorrect. Also, using normal arcs with a loop instead of read arcs would introduce unnecessary sequencing between site calls corresponding to different activations. Read arcs allow us to nicely model the sharing and the concurrent usage of the data tokens.

[Figure 3.3: the net for let(x) before (left) and after (right) the firing of the site call. Before: tc = tp = (i, p.t1.t2, h, 0) and tx = (i, p, h, 10). After: tc = tp = (i, p.t1.t2, h, 10) and tx = (i, p, h, 10).]

Figure 3.3 – Site call example: let(x)

An example of a site call let(x) is shown in Figure 3.3. The arc expressions are omitted for simplicity. The left side shows the system before the site call occurs and the right side shows it after the firing of the site call transition. On the left, the control token tc has a token of the same color in the power place (tp), and the token in the place for the argument x, i.e., tx, is ⊑-related to it. On firing of the site call, the token in the argument place remains unchanged (because of the read arc) while the value component of the control token is set to 10 (let(x) returns the value of x, which is 10 here) and the token is placed in the return place. The token in the power place tp is also changed to this new color.
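The firing of a site call can be mimicked in a few lines of Python. This is only a sketch under simplifying assumptions: the power place is modeled as a set of colors, the parameter places as a list read without being modified (the read arcs), and `fire_site_call` is a hypothetical helper, not a construct of the translation.

```python
from typing import Any, Callable, List, NamedTuple, Optional, Set, Tuple


class Color(NamedTuple):
    """Token color (i, p, h, v)."""
    i: int
    p: Tuple[str, ...]
    h: Tuple[str, ...]
    v: Any


def matches(c: Color, c_prime: Color) -> bool:
    """c ⊑ c': same Id, same hList, and p is a prefix of p'."""
    return c.i == c_prime.i and c.h == c_prime.h and c_prime.p[:len(c.p)] == c.p


def fire_site_call(site: Callable[..., Any], control: Color,
                   power: Set[Color], params: List[Color]) -> Optional[Color]:
    """Fire the site call for `control`; return the return token, or None."""
    if control not in power:
        return None                      # the calling expression was terminated
    if not all(matches(x, control) for x in params):
        return None                      # some parameter has no matching data token
    result = control._replace(v=site(*(x.v for x in params)))
    power.remove(control)                # the power-place copy takes the new color
    power.add(result)
    return result                        # `params` is only read, never consumed


# The let(x) example: the control token carries 0, the data token for x carries 10.
tc = Color(i=1, p=("t1", "t2"), h=(), v=0)
tx = Color(i=1, p=(), h=(), v=10)
power = {tc}
params = [tx]
returned = fire_site_call(lambda x: x, tc, power, params)
print(returned)
print(params)
```

After the call, `returned` carries the value 10 with the (i, p, h) part of `tc`, the power place holds the same new color, and `params` still contains `tx`, so the same binding can serve later site calls.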

Two special kinds of sites defined in Orc are the constant sites 0 and 1. The first represents a site that is never called and the second a site that simply mirrors the result of its previous computation. The Petri net translation for these sites is shown in Figure 3.4. The translations are self-explanatory.

[Figure 3.4: two small nets, each with an activation place a and a return place r, with arcs labeled c.]

Figure 3.4 – Petri net translation of Constant 0 and 1.


3.4.2 Sequential composition

A sequential composition in Orc, f >x> g, first evaluates the expression f; each value returned by f spawns a new thread of execution for g, with all the occurrences of the variable x in g bound to this value.

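[Figure 3.5: left, the instances If and Ig with labels lf and lg; right, the net NE, in which the return place of If is connected to the activation place ag of Ig through the glue transition labeled (x1...xn). Firing rule of the glue transition: F(x1, . . . , xn) = [c′ = (i, p, h, compk(v))].]

Figure 3.5 – Translation for f >(x1, x2, ..., xn)> g.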

The translation for the generic case of sequential composition f >(x1, x2, ..., xn)> g is shown in Figure 3.5. The left part shows the instances for f and g, If and Ig respectively (having unique labels lf and lg, shown on the bottom right of the instance), and the right side shows the net of their sequential composition NE. If and Ig are shown in green on the right side and are linked together with a glue (a transition and two arcs) shown in blue. We assume that the parameter y is common to f and g, the others being distinct. The parameter places for y in If and Ig are linked with a single parameter place for y in NE, as shown.

The activation place for NE is the activation place for If, and so each activation of NE will first call f by the marking equivalence relation described previously. When f adds a token to the return place of If (again defined by the marking equivalence relation), it is transferred to the activation place of Ig, thus resulting in a sequential call to g for every value returned by f. The return place of Ig becomes the return place of NE.

The value passing is carried out by the blue transition labeled (x1, x2, ..., xn). The value component of the token returned by f (the value returned by f) is to be bound to the tuple (x1, x2, ..., xn). The parameter xk used in g corresponds to the kth component of the value returned by f. Therefore the data component of the token entering the parameter place xk is set to compk(v), the kth component of a value v.
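The glue transition can be sketched as a function from the token returned by f to the data tokens placed in the parameter places x1, ..., xn. The sketch assumes that a value returned for n > 1 parameters is a Python tuple of n components; `Color` and `bind_parameters` are hypothetical names.

```python
from typing import Any, List, NamedTuple, Tuple


class Color(NamedTuple):
    """Token color (i, p, h, v)."""
    i: int
    p: Tuple[str, ...]
    h: Tuple[str, ...]
    v: Any


def bind_parameters(returned: Color, n: int) -> List[Color]:
    """Data tokens for x1..xn produced from the token returned by f.

    Token k keeps the (i, p, h) part of the returned token and carries
    comp_k(v), the k-th component of the returned value v.
    """
    values = returned.v if n > 1 else (returned.v,)
    if len(values) != n:
        raise ValueError("returned value does not have n components")
    return [returned._replace(v=values[k]) for k in range(n)]


returned_by_f = Color(i=1, p=("t1",), h=(), v=("Rennes", 35))
x1, x2 = bind_parameters(returned_by_f, 2)
print(x1)
print(x2)
```

Because each data token keeps the (i, p, h) part of the token that activates g, it is ⊑-related to every control token that this activation later produces in g.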

3.4.3 Symmetric parallel composition

Symmetric parallel composition of two Orc expressions f and g creates two separate threads for executing them. The output stream of the composition is the time-based merge of the output streams of f and g.

The translation for the symmetric composition is given in Figure 3.6. The instances for f and g, If and Ig, are shown on the left hand side. The result of their parallel composition is given on the right hand side, the instances being shown in green here. The activation places of If and Ig are linked with the activation place of NE, allowing simultaneous calling of f and g. The power places are fused and the common parameter places (x in this figure) are linked together as in the case of sequential composition. Finally, the return places are fused so that the stream of values returned by the composed expression is the union of the streams returned by If and Ig.

[Figure 3.6: left, the instances If and Ig with labels lf and lg; right, the net NE with a transition T feeding the activation places af and ag. Firing rule: FT = [ct1 = (i, p.t1, h, v), ct2 = (i, p.t2, h, v)].]

Figure 3.6 – Translating f | g.

Colors t1 and t2 (t1 ≠ t2) are added to the pList component of the control tokens entering If and Ig, in order to distinguish the different control tokens of a stream when they merge in the return place of NE. The calls to the nets corresponding to If and Ig, the transfer of parameter values and the return of values from them happen according to the marking equivalence relations. Note that since we merge the return places, f and g will add tokens to the same return place, the return place of NE.

3.4.4 Asymmetric parallel composition (where expression)

The Orc expression f where x :∈ g creates two threads which start evaluating f and g in parallel. When g returns its first value, x is bound to it and the computation of g is terminated. f may use the value of x in its execution: portions of f requiring x for their execution will block until it is defined.

The translation for the generic where expression f where (x1, x2, ..., xn) :∈ g is shown in Figure 3.7. The left side shows the instances for f and g, If and Ig, and the right side shows the net NE of their where composition. The instances of f and g in NE are shown in green, labeled lf and lg respectively. The activation places of If and Ig are linked to the activation place of NE, just as in the parallel composition, to enable their simultaneous activation, and the control tokens entering them are labelled similarly. The common parameter places (y here) are linked as in the previous constructs and the power places are fused as shown. The additional net structure in blue helps in selecting the first value returned by Ig and in realising its termination.

[Figure 3.7: left, the instances If and Ig with labels lf and lg; right, the net NE with transitions T1 and T2, the place i and the parameter place xk. Firing rules: FT1 = [ct1 = (i, p.t1, h, v), ct2 = (i, p.t2, h, v)]; FT2 = [c1 ⊑ c, c1 ⊑′ c2, c′ = (i, remt2(c1), h, v)].]

Figure 3.7 – Translating f where (x1, x2, ..., xn) :∈ g.

To understand the termination mechanism, we note that when Ig returns a value, there might be other executions corresponding to it occurring at that time. All these executions will have a control token associated with them, a copy of which resides in the power place by means of our construction. Terminating Ig thus involves removing all these tokens from the power place, disabling the execution of these computations. This is achieved by the reset arc shown in dashed blue. It removes all the tokens c2 in the power place which satisfy the condition c2 ⊑′ c1, where c1 is the color of the token activating g. The relation ⊑′ is defined as:

c1 ⊑′ c2 ⇔ (i1 = i2) ∧ (p2 ∈ prefix(p1)) ∧ (h2 ∈ prefix(h1))

This differs slightly from the relation ⊑ introduced earlier because here the hList components need not be exactly the same; it suffices if the second hList is a prefix of the first. This ensures that all the expression calls triggered (directly or indirectly) by the initial activation token (c2 in the definition above) are also terminated.

The function remt2(c) appearing on the input arc to the parameter place xk removes the last pList component (which is always t2 for a token in the place i, by construction) from the token color. As a result, the first three components of the data token for xk are the same as those of the token c that activated NE, so the ⊑ relation holds between the data token in the parameter place xk and the control tokens in f corresponding to the activation token c.

The need to distinguish the control tokens arises from the fact that when Ig returns a value and has to be terminated, only the control tokens corresponding to it have to be removed from the power place. The control tokens for If must remain so that it can continue evaluating. This can only be achieved by distinguishing their control tokens.
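The termination of g can be illustrated with a small Python sketch. It assumes that the power place is a set of colors and that the labels t1, t2, t3 and l are arbitrary strings; `descends_from` (for ⊑′), `terminate` (for the reset arc) and `rem_last` (for remt2) are hypothetical names.

```python
from typing import Any, NamedTuple, Set, Tuple


class Color(NamedTuple):
    """Token color (i, p, h, v)."""
    i: int
    p: Tuple[str, ...]
    h: Tuple[str, ...]
    v: Any


def is_prefix(short: Tuple[str, ...], long: Tuple[str, ...]) -> bool:
    return long[:len(short)] == short


def matches(c: Color, c_prime: Color) -> bool:
    """c ⊑ c'."""
    return c.i == c_prime.i and c.h == c_prime.h and is_prefix(c.p, c_prime.p)


def descends_from(c1: Color, c2: Color) -> bool:
    """c1 ⊑' c2: same Id, p2 is a prefix of p1 and h2 is a prefix of h1."""
    return c1.i == c2.i and is_prefix(c2.p, c1.p) and is_prefix(c2.h, c1.h)


def terminate(power: Set[Color], activation: Color) -> Set[Color]:
    """Reset arc: drop every power-place token that descends from `activation`."""
    return {c for c in power if not descends_from(c, activation)}


def rem_last(c: Color) -> Color:
    """Remove the last pList component of a color."""
    return c._replace(p=c.p[:-1])


c_f = Color(1, ("t1",), (), None)            # control token activating f
c_g = Color(1, ("t2",), (), None)            # control token activating g
power = {
    c_f,
    c_g,
    Color(1, ("t2", "t3"), (), None),        # a parallel branch inside g
    Color(1, ("t2",), ("l",), None),         # an expression call made by g
}

power = terminate(power, c_g)                # g has returned its first value
x = rem_last(Color(1, ("t2",), (), 7))       # data token for x, carrying 7
print(power)
print(matches(x, c_f))
```

Only the tokens of g are removed, so f can continue, and the data token for x is ⊑-related to the control token of f.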

3.4.5 Expression Definitions

Each occurrence of an expression call E is replaced by an instance of it (IE), a box of the form shown in Figure 3.1, with a distinct label attached to it. The net NE for that expression is built from the expressions defining E, using the translation rules detailed previously. An expression call is similar to any other call described earlier: when a token appears in the activation place of IE, we transfer it to the activation place of NE using our marking equivalence relation. The parameter passing and the return of values also occur according to our marking equivalence relation.

In recursive expressions, an instance of the expression call will appear in the net for that expression itself. These expressions are treated as any other normal expression, replacing the expression call by an instance with a distinct label. We give a simple example here. Consider the nets in Figure 3.8. They correspond to two different states of execution of an Orc program with main expression E where

E ≜ S ≫ E

S is a site call without any arguments. For simplicity, we have omitted the power place and the arcs from it to the transitions in the net. The labels on the places here denote the tokens that they carry. We have only shown the hList component of the token color. Since E is a recursive expression, the net for it, NE, contains an instance of a call to E. It is however labeled distinctly (t2 ≠ t1) and calls from it to NE can be unambiguously identified.

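[Figure 3.8: two states of the calling instance of E (label t1) and of the net NE for E = S ≫ E, which contains the instance of E labeled t2. Tokens shown: h1 and h2.t1.t2 on the left; h1.t1 and h2.t1 on the right.]

Figure 3.8 – Example showing expression call/return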

In the net on the left, the expression call for E is activated in the calling instance by the token with hList component h1. According to our calling relation for expressions, we append the label of the expression instance (t1) to h1 and move this token to the activation place of the net NE. As a result, the token with color h1.t1 is shown in NE on the right side. A return of an expression call is also shown: the left side has a token h2.t1.t2 in the return place of NE. The label of the last expression call in the hList component (t2 in h2.t1.t2) is removed and the resulting token is placed in the return place of the instance with label t2, as shown on the right side (the token h2.t1 in the return place of the instance labeled t2). Note that it is actually not possible to have a token in the return place of E for this particular example, because NE has only one thread of execution, with a recursive call in it.

Our translation excludes unguarded recursive expressions, where an activation of the net NE causes another activation of it directly by a sequence of marking equivalences. Here the control returns to the activation place instantaneously (e.g., in the expression E ≜ f | E). This is equivalent to not having any site calls between the activation of a recursive call and the next recursive call caused by that activation. Such expressions could spawn infinitely many threads “simultaneously”, which is not representable using our coding rules. Also, from a practical point of view, such expressions would be unimplementable.

3.4.6 The Main Expression

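[Figure 3.9: net with activation place a, transition T1 feeding the activation place af of the instance of f (label lf) and the parameter places x1, . . . , xn, and the termination structure with place i and transition T2. Firing rules: FT1 = [c1 = (i, p, h, x1), . . . , cn = (i, p, h, xn)]; FT2 = [c1 ⊑ c, c1 ⊑′ c2].]

Figure 3.9 – Net for the main expression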

Each Orc program consists of one top level expression, the main expression, which is the starting point of execution of an Orc program. The presence of this main expression can be likened to the presence of a main program in programming languages (expression definitions are the equivalent of functions). This top-level statement of an Orc program is of the form

x :∈ f

and essentially does three things: the evaluation of f, the assignment of the first value returned by f (if any) to x, and finally the termination of the computation of f when this first value is received (a main expression which does not return a value will never be terminated).

The translation for the main expression is shown in Figure 3.9. Each new activation adds a token with a distinct color to the activation place labeled a. The outgoing arcs from the transition following it add a copy of this control token to the activation place af of the instance of f, and to the power place, to enable the execution of f. They also define the parameter values which will be used by f.

The termination mechanism is achieved by the net shown in blue, which is the same as the termination mechanism detailed for the where expression. It selects the first value returned for each new activation of the main expression and terminates its execution at the same time.

3.5 Translating the CarOnLine example

Figures 3.10, 3.12, and 3.11 show the Petri net translations of the three components CarOnLine, CarPrice, and Broadcast, respectively. We show the evolution of the colors of a given token as it traverses the different places of the net.

General comments and conventions. The following notational conventions are used to describe the firing rule of each transition, cf. (3.2). Color c1 decomposes as c1 = (i1, p1, h1, v1), and similarly for other color names. A distinct label is attached to each transition, e.g., T1, T2, . . . Transitions denoting site calls are white; other transitions, added for the purpose of the translation, are black.

For each transition, we give in a separate table the firing rule as a set of constraints on input colors (e.g., v1 = [ ] for T1 and T2 in Figure 3.11) and the equations giving the resulting set of output colors. In these equations, names of colors are local to each transition and are specified in the diagrams; for example, referring to Figure 3.11, transition T3 has c and c1 as input colors and c′ as output color. Since names are local to transitions, they are reused across the diagram without necessarily referring to identical colors.

For each transition, the activation token is systematically denoted by c; the constraint c′ ⊑ c always holds, where c is the color of the activation token and c′ is the color of any input token; to simplify the diagrams, these systematic constraints are not mentioned.

The power place and its related arcs are omitted from Figures 3.10, 3.12, and 3.11. Detailed comments follow for each figure.

Figure 3.10. A simplification has been performed in this figure. The arc leaving the transition BestCarPrice, which is shown as branching into four arcs, actually represents four different arcs with the same arc label (i, p, h, bcp). The parallel construct appends a label ti, i = 1 or 2, to the pList for its two branches.


[Figure 3.10: net for CarOnLine. Node labels recoverable from the figure: CarPrice, BestCarPrice, CreditRate, BestCredit, if, let, CarOnline, and the transitions T1 to T4. Firing rules:
T1: c1 = (i, p.t1, h, v), c2 = (i, p.t2, h, v)
T2: v1 ≠ Fault
T3: v1 = Fault
T4: c′ = (i, p, h, (v1, v2))]

Figure 3.10 – CarOnLine and its firing rules. Note the recursive call of CarOnLine.


[Figure 3.11: net for Broadcast. Node labels recoverable from the figure: GarageList, call(garage), Rtimer(τ), Mux, Fault, let, broadcast, “u :∈”, “v :∈”, parallel, where, and the transitions T1 to T11. Firing rules:
T1: v1 = [ ]
T2: v1 = [ ], c′ = c, c2 = (i1, p1, h1, head(v1)), c3 = (i1, p1, h1, tail(v1))
T3: c′ = (i, p, h, v1)
T4: c1 = (i, p.t1, h, v), c2 = (i, p.t2, h, v), c3 = (i, p.t3, h, v)
T5: c1 = (i, p.t4, h, v), c2 = (i, p.t5, h, v)
T6: c′ = (i, p, h, call(v1))
T7: c′ = (i, p, h, v1)
T8: c′ = (i, p, h, v1)
T9: c′ = (i, p, h, Mux(v1, v2))
T10: c = (i, p.t2, h, v), c′ = (i, p, h, v1)
T11: c = (i, p.t3, h, v), c′ = (i, p, h, v1)]

Figure 3.11 – BroadCast and its firing rules. Note the recursive call of Broadcast with updated parameters.

Figure 3.11. This figure exhibits a “where” expression with its two “:∈” subexpressions located at transitions T10 and T11. These two transitions extract only the first token in case a stream of tokens is generated; this may occur, for example, when an expression is called in the scope of the where (here, Broadcast is re-called). This is achieved by matching token colors: according to (3.3), two tokens can synchronize at the “v :∈” transition iff they possess identical hList colors, i.e., h′ = [ ]; this constraint selects only the first token produced.

[Figure 3.12: net for CarPrice. Node labels recoverable from the figure: GarageList, broadcast, values, Min(values), and the transition T. Firing rule: T: c′ = (i, p, h, min(v1)).]

Figure 3.12 – CarPrice and its firing rules.

3.6 Conclusion and future work

We have formally defined the functional semantics of Web Services orchestrations. This framework relies on an abstract model in the form of colored Petri net systems and encompasses recursion. To keep the presentation clean, we have chosen to build on top of the Orc formalism; however, the same principles would apply to BPEL.

The advantage of this semantics is that a mild extension of it allows us to capture QoS in a mathematically sound way: the formal correlation mechanism provided by token colors allows tracking exceptions, and capturing response time simply requires adding one more color.


Chapter 4

Event Structure Semantics of Orc


David Kitchin, William Cook
Department of Computer Science, University of Texas at Austin, Austin, USA.

Abstract

One challenge in developing wide-area distributed applications is analyzing the system’s non-functional properties, including timing constraints and internal dependencies that can affect quality of service. Analysis of non-functional properties requires a precise formal semantics for the language in which the system is written; but labelled transition systems and trace semantics, which are commonly used for this purpose, do not facilitate this kind of analysis. Event structures provide an explicit representation of the causal dependencies between events in the execution of a system. But event structures are difficult to construct compositionally, because they cannot easily represent fragments of a computation. In this paper we present a partial-order semantics based on heaps (an explicitly encoded form of occurrence nets with read arcs), which naturally represent fragments of behavior. Heaps are then easily translated into asymmetric event structures. The semantics is developed for Orc, an orchestration language in which concurrent services are invoked to achieve a goal while managing time-outs, exceptions, and priority. Orc, and this new semantics, are being used to study quality of service (QoS) for wide area orchestrations.


4.1 Introduction

Orc is a structured language for computation orchestration, in which concurrent services are invoked to achieve a goal while managing time-outs, exceptions, and priority [MC07]. The operational semantics of Orc was first defined as a labeled transition system. A denotational semantics of Orc has also been defined; the denotations are sets of traces, which explicitly represent the observable behavior of an Orc program [KCM06]. For other studies of Orc semantics see [BMT06], where the authors link the Orc language to Petri nets and the join calculus, and [RBHJ06a, RBHJ06b], where Orc expressions are translated to colored Petri net systems. On the other hand, a number of papers have been devoted to the semantics of the most widely used language for orchestration, namely BPEL; see [HSS05, LMSW06, OVvdA+05, vBK05, KvB04] and the tutorial [vBK06]. Still, very little has been done toward obtaining, for orchestration languages, a semantics that is suitable for Quality of Service (QoS) studies.

Analyzing QoS or non-functional properties, like timing constraints derived from the critical path of dependencies, can be quite difficult with either an operational or a denotational trace semantics. The problem is that neither of these semantics exhibits the causality constraints that govern concurrent execution. These causality constraints can be represented explicitly as partial orders over events. With a partial order semantics, analysis and verification of programs are facilitated, and translations between different formalisms can be checked for correctness. Last but not least, partial order representations are crucial for evaluating overall durations of programs: time-consuming actions that run in parallel increase the overall delay less than actions that have to occur sequentially; see [GM99, MB02] for more on this type of dynamics. The partial order semantics is therefore crucial for the QoS analyses of orchestrated services [RBHJ06a].

In this paper we develop a partial order semantics of Orc in terms of asymmetric event structures [BCM01]. An event structure is a set of events with one or more relations that constrain the allowed sequences of events. Asymmetric event structures have an asymmetric conflict relation, a ↗ b, which states that event b cannot precede event a in the same execution. Asymmetric conflict is convenient to express preemption or termination, which is an essential feature needed for wide area computing and offered by Orc. In Orc, an execution A can be preempted at the instant when a particular event e occurs. The preemption of A by e is expressed by imposing a ↗ e for all events a in A, which asserts that no event in A can occur after e. In other words, e terminates the execution A. The asymmetric event structure for an Orc expression is defined in two steps.

The first step is a compositional translation of Orc expressions into mathematical structures called heaps, introduced in Section 4.2.2. Heaps are sets of inductively defined events, following a method originally proposed by Esparza et al. [ERV02] to encode net unfoldings. Heaps are useful for two reasons. First, they provide a concrete representation of asymmetric event structures that is suitable for effective coding of algorithms in software. Second, and more importantly, they can specify fragments of computations that refer to virtual events offered by an execution from another heap. The latter feature proved extremely useful for deriving the heap semantics of Orc structurally.

In the second step, the heap is converted into an asymmetric event structure, which is a recognized semantic domain, equipped with well defined notions of configurations to model partially ordered executions. A correspondence of these asymmetric event structures with the existing sequential trace semantics of Orc is also shown.


4.2 Asymmetric Event Structures and Heaps

In this section we recall the needed background on Asymmetric Event Structures (AES). Then we motivate the need for the new concept of heap and introduce it. Finally, we show how to generate AES from heaps.

4.2.1 Asymmetric Event Structures with Labels

Following [Win86, BCM01], an Asymmetric Event Structure (AES) is a model of computation consisting of a set of events and two associated binary relations, the causality relation ⪯ and the asymmetric conflict relation ↗. If e ⪯ e′ holds for events e and e′, then e must occur before e′ can occur. If e ↗ e′ holds, then the occurrence of e′ preempts the occurrence of e in the future. Thus if both e and e′ occur in an execution, e necessarily happens before e′. In this sense, ↗ can also be seen as a “weak causality” relation.

Formally, an AES is a tuple G = (E, ⪯, ↗), where E is a set of events, and ⪯ and ↗ are the causality and asymmetric conflict binary relations over E, satisfying the following conditions:

1. ⪯ is a partial order, and ⌊e⌋ =def {e′ ∈ E | e′ ⪯ e} is finite;

2. ∀e, e′ ∈ E:

e ≺ e′ ⇒ e ↗ e′    (4.1)

the restriction of ↗ to ⌊e⌋ is acyclic    (4.2)

e #a e′ ⇒ e ↗ e′    (4.3)

where #a is the conflict relation, which relates events that preempt each other. For two events, if e ↗ e′ and e′ ↗ e then e #a e′, and only one of e and e′ can occur in an execution. The conflict relation identifies sets of mutually conflicting events using this recursive definition:

e0 ↗ e1 ↗ . . . ↗ en ↗ e0 ⇒ #a({e0, . . . , en})    (4.4)

[#a(A ∪ {e})] ∧ [e ⪯ e′] ⇒ #a(A ∪ {e′})    (4.5)

The second rule, (4.5), ensures that a conflict with e is inherited by all the events caused by e.

Given an event structure, a configuration is a set of events that obeys the causality and conflict constraints, and so represents a valid execution instance of the event structure. For G = (E, ⪯, ↗) an AES, a configuration of G is a set κ ⊆ E of events such that

1. the restriction of ↗ to κ is well-founded;

2. {e′ ∈ κ | e′ ↗ e} is finite for every e ∈ κ;

3. κ is left-closed with respect to ⪯, i.e., ∀e ∈ κ, e′ ∈ E, e′ ⪯ e implies e′ ∈ κ.
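For a finite AES the three conditions above can be checked mechanically. The following Python sketch is an illustration under stated assumptions: events are hashable values, `causality` holds strict pairs (e′, e) with e′ ≺ e, and `asym_conflict` holds the additional pairs of ↗. In the finite case, condition 2 holds trivially and well-foundedness reduces to acyclicity. The function name `is_configuration` is hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def is_configuration(kappa: Set[Hashable],
                     causality: Iterable[Pair],
                     asym_conflict: Iterable[Pair]) -> bool:
    """Check whether the finite set `kappa` is a configuration."""
    causality = set(causality)
    # Condition 3: kappa is left-closed with respect to causality.
    if any(e in kappa and cause not in kappa for cause, e in causality):
        return False
    # By (4.1), strict causality is included in asymmetric conflict.
    edges = {(a, b) for a, b in causality | set(asym_conflict)
             if a in kappa and b in kappa}
    # Condition 1 (finite case): the restriction of ↗ to kappa is acyclic.
    # Repeatedly remove the events that have no remaining predecessor.
    remaining = set(kappa)
    while remaining:
        sources = {e for e in remaining
                   if not any(b == e and a in remaining for a, b in edges)}
        if not sources:
            return False             # the remaining events contain a cycle
        remaining -= sources
    return True


# Preemption: a ↗ e, so a and e may both occur (a first), or either one alone.
print(is_configuration({"a", "e"}, [], [("a", "e")]))
# Symmetric conflict: a ↗ b and b ↗ a, so a and b cannot both occur.
print(is_configuration({"a", "b"}, [], [("a", "b"), ("b", "a")]))
# Causality: a ≺ b, so b cannot occur without a.
print(is_configuration({"b"}, [("a", "b")], []))
```

The first call accepts the set, while the second and third reject it, matching the reading of ↗ as weak causality and of mutual ↗ as conflict.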

For our coding of Orc, we will need to label the events. Thus we shall consider Labeled AES (LAES), which are tuples of the form G = (E, ⪯, ↗, λ), where λ : E ↦ Λ (Λ is a set of labels) is the (partial) labeling function.


Discussion: from event structures to heaps. Asymmetric event structures allow an event to occur only if its causes have already occurred and it is not prevented by the occurrence of some other event. This yields a simple and elegant mathematical model for complete concurrent systems that, in all its variants, comes equipped with a comprehensive categorical apparatus [BCM01].

Although event structures work well for complete programs, they cannot easily represent fragments of behavior. Such fragments arise naturally when constructing the behavior of a program from the behaviors of the subexpressions in the program, as is the standard practice in denotational semantics. For such formalisms, structural translation of programs to (asymmetric) event structures cannot be directly achieved.

By offering the additional concept of place, Petri nets and their extensions and variants [Win86, BCM01] make structural translation easier. Explicit encoding of places allows one fragment to depend upon resources supplied by another fragment. Other features of wide area languages are not so easily supported by Petri nets; modelling dynamic creation of processes requires non-trivial extensions of nets, such as net systems [DK04]. These extensions require another layer of semantics to specify their executions. Therefore, using such Petri net extensions results in a complex two-stage semantics: from the formalism to, e.g., net systems, and from net systems to their semantic domain. Such a translation was proposed in [RBHJ06a] for Orc, resulting in excessive formalism and complex software coding.

So, a natural idea consists in bypassing the above two-stage approach by directly considering occurrence nets with read arcs. To be more effective and get close to implementation, we decided in addition to use an explicit inductive coding of such occurrence nets, following the technique first proposed by Esparza et al. [ERV02]. This results in the notion of heap described in the next section. The subclass of “effective” heaps translates immediately into asymmetric event structures.

4.2.2 Heaps

Heaps are sets of events coded in a particular form. A heap event is encoded based on the conditions that enable its occurrence. An enabling condition can either be consumed by the event or be read and not consumed. The conditions, in turn, refer to the events that created them. More precisely:

event = ( consume conditions, read conditions, label )
condition = ( cause event, mark ) (4.6)

where

• consume conditions is the set of conditions that are consumed by the event;

• read conditions is the set of conditions that are only read (and not consumed) by the event;

• label is a label (for our use in Orc semantics, it will be the Orc action performed by the event);

• mark is a label to distinguish different conditions created by an event.
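The mutually recursive shape of (4.6) maps naturally onto immutable records. The sketch below is an illustration only: the class names `Condition` and `Event`, the helper `direct_causes`, the use of `None` for a condition with no creating event in the fragment, and the label strings are all assumptions introduced here, not notation from the text.

```python
from typing import FrozenSet, NamedTuple, Optional


class Condition(NamedTuple):
    cause: Optional["Event"]  # None: no creating event in this fragment (assumption)
    mark: str                 # distinguishes conditions created by the same event


class Event(NamedTuple):
    consumed: FrozenSet[Condition]  # conditions consumed by the event
    read: FrozenSet[Condition]      # conditions read but not consumed
    label: str                      # e.g. the Orc action performed


def direct_causes(event):
    """Events that created a condition consumed or read by `event`."""
    return {c.cause for c in event.consumed | event.read if c.cause is not None}


# Hypothetical fragment: a call, its return, and an event that only reads.
start = Condition(cause=None, mark="nil")
call = Event(frozenset({start}), frozenset(), "M_k(v)")
after_call = Condition(cause=call, mark="nil")
ret = Event(frozenset({after_call}), frozenset(), "k?v")
watch = Event(frozenset(), frozenset({after_call}), "watch")
```

Because records are immutable and hashable, an event is fully determined by its enabling conditions and label, which is what makes the encoding inductive.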

We formalize this next.

Definition 4.1 Call heap a tuple (E, C, S, A, M), where:


1. E and S are two sets of events such that E ⊆ S, C is a set of conditions, A is an alphabet of labels, and M is a set of marks.

2. Events e ∈ E have the following form:

e = (•e, e, a) (4.7)

where •e ⊆ C and e ⊆ C are the sets of conditions consumed and read by e, respectively, and a ∈ A is the label of e. We require that •e ∩ e = ∅ and •e ∪ e ≠ ∅.

3. Conditions c ∈ C have the following form:

c = (f, µ) (4.8)

where f ∈ S and µ ∈ M is the mark of condition c.

4. C and S are minimal, for set inclusion, having the above properties. S is called the support of E and C is its set of conditions.

By abuse of notation, we call E alone a heap, and CE will denote the set of conditions associated to E. Throughout this paper, we distinguish a fixed event

⊥ = (∅, ∅, ⋆)

called the dummy event, where label ⋆ means the absence of label. Note that ⊥ cannot belong to a heap; it can, however, belong to the support of a heap. Set E⊥ = E ∪ {⊥}. For an event e of the form (4.7), the set of conditions

•e = •e ∪ e

is called the pre-set of e. We define the set of minimal conditions of a heap E, minConds(E), to be the set

minConds(E) =def {(f, -) | (f, -) ∈ CE , f ∉ E}

Figure 4.2 shows some example heaps (for Orc expressions). The events of the heap are shown in rectangles, labelled by their corresponding Orc actions. The conditions are the circles. An event has incoming directed arcs from the conditions it consumes, and undirected dashed arcs from those that it reads. Outgoing arcs from an event point to conditions that refer to that event. Minimal conditions refer to the ⊥ event, which is not shown. A dashed triangle on top of a minimal condition indicates the label of an external event that the condition depends upon. Examples of external events, which are included in the support of the heap, are e, f1, and f2.

The consume and read conditions within the events of a heap define constraints between events, in the style of an event structure. Given a heap E, we define the following relations between events in E (superscript ∗ denotes transitive closure):

¹E = ⊳∗ where ⊳ = {(f, e) | f• ∩ •e ≠ ∅} ∪ IE (4.9)

where IE is the identity relation on E, and

ր′E = ≺E ∪ {(f, e) | ∃e′ ∈ E⊥, e1 : [(e′, -) ∈ •f ∩ •e1 ∧ e1 ¹E e]}

րE = ր′E ∪ {(e, f) | e #aE f} (4.10)

where event variables e, e1 and f range over E, and the symmetric conflict relation #aE is deduced from ր′E via (4.4, 4.5). The reason for the two-step definition of րE is that ր′E satisfies conditions (4.1, 4.2, 4.4, 4.5), but not necessarily (4.3). The latter is enforced by the second step in the definition, from ր′E to րE. Next, equip E with a labeling map

αE(e) =def a (4.11)

where event e = (•e, e, a). We shall denote by

min(E) = {e ∈ E | ∀f ∈ E : f ¹E e ⇒ f = e} (4.12)

the set of events e ∈ E that are minimal for the relation ¹E. For readability, we omit the subscript E in the sequel. In the send heap in Figure 4.2, e ¹ f1 holds, where e is the event labelled Mk1 or k1?v1. Also, e ր f1 holds for all events e in the heap (except f1).

Definition 4.2 A configuration of a heap E is any finite subset κ of E with the following properties:

1. the restriction of ր to κ is well-founded;

2. {e′ ∈ κ | e′ ր e} is finite for every e ∈ κ;

3. κ is left-closed with respect to ¹, i.e., ∀e ∈ κ, e′ ∈ E, e′ ¹ e implies e′ ∈ κ;

4. for each event e belonging to κ, if f• ∩ •e ≠ ∅ then f ∈ E⊥.

As for AES, heap configurations represent legal executions. By condition 3, condition 4 is equivalent to requiring that f ∈ κ. Conditions 1–3 coincide with those involved in the definition of configurations for AES, see Section 4.2.1. Condition 4 is new. For example, in the send heap of Figure 4.2, any configuration containing event f1 has to include its causal predecessors, i.e., the events labelled Mk1 and k1?v1. Event f2 cannot appear in such a configuration since it is in mutual conflict with f1; thus Condition 1 would be violated.

Let Configs(E) be the set of all configurations of heap E.

4.2.3 From Heaps to LAES

One may expect (E,¹,ր, α) to be an LAES. This is not true in general, as certain axioms may be violated (e.g., the causal relation ¹ may not be antisymmetric, or some events may need external events for their enabling). In this section we show how to extract from any heap E an effective heap which has a direct correspondence with an LAES.

Definition 4.3 Given a heap E, its effective heap G [E] is defined as:

G [E] =def ⋃_{κ ∈ Configs(E)} κ.

G [E] possesses a subset of E as its set of events. G [E] is generated from a heap E by pruning, following Definition 4.2, and this generation is constructive. The introduction of the effective heap G [E] is justified by the following result, where symbols ¹, ր, and α are the restrictions, to G [E], of the relations and map defined in (4.9), (4.10), and (4.11), respectively.

Theorem 4.4 A [E] = (G [E] ,¹,ր, α) is an LAES. Furthermore, G [E] is the maximal subset of events of E that induces an LAES.

Proof Outline. The complete proof is given in Appendix A.1. The first part is proved by using (4.9), (4.10) and Definition 4.2 to show that relations ¹, ր on G [E] satisfy the conditions required for an LAES. The second part is proved by showing that any configuration of a maximal LAES induced by E is contained in Configs(E) and thus in G [E].


Remark: The reader should not confuse the notion of heaps given here with that of [MG99, MB02], where the authors study heaps formed by blocks representing durations of executions in transition systems. Since their heaps are downward causally closed, conflict-free partial orders, they correspond to configurations in our setting, rather than to heaps in the above sense.

4.2.4 Generic Operations on Heaps

We list here a few operations on heaps that are useful for wide area computing. From now on, we specialize marks to being lists, with the usual operations.

• Marking: Marking creates distinct copies of a heap. For a heap E and m a mark, E^m is the heap where symbol m has been appended to the mark µ(c) of each condition c ∈ minConds(E). The recursive definition of events and conditions in E ensures that this operation creates a new instance of E.

• Disjoint Union: The disjoint union of heaps E and F, where left and right are fixed marks, is:

E ⊎ F =def E^left ∪ F^right

• Preemption: For a heap E and F ⊆ E, the preemption of E by F terminates execution of E when any event in F occurs. Formally, stopF (E) is the heap obtained by replacing each event e = (•e, e, a) of E by the following event ϕ(e):

ϕ(e) =def (•e ∪ {(⊥, stop)}, e, a) if e ∈ F,
ϕ(e) =def (•e, e ∪ {(⊥, stop)}, a) if e ∉ F. (4.13)

• Copy: For two heaps E and F, we define copyl(E, F ) to be a copy of E with respect to context heap F. For a mark l, copyl(E, F ) is a fresh heap obtained by changing all minimal conditions (e, µ) ∈ minConds(E) as follows:

(e, µ) ↦ (e, (µ, l)) if (e, µ) ∉ CF,
(e, µ) ↦ (e, µ) if (e, µ) ∈ CF, (4.14)

where CF is the set of associated conditions of the context heap F. Intuitively, events in E may share conditions (and thus are related) with events in the context heap F. The copy of E with respect to context F keeps these conditions intact in the copy to preserve the relations between the copied events and those in F.
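As an illustration of the preemption operation (4.13), the sketch below encodes events as nested tuples `(consumed, read, label)` and conditions as `(cause_event, mark)`. It is a sketch under assumptions: the names `BOTTOM`, `STOP` and `stop`, the tuple encoding, and the example labels are hypothetical, and the recursive rewriting of conditions that refer to replaced events is one reading of "replacing each event", not something the text spells out.

```python
BOTTOM = (frozenset(), frozenset(), "*")  # the dummy event (no label)
STOP = (BOTTOM, "stop")                   # the shared condition (bottom, stop)


def stop(heap, preempting):
    """Return (new_heap, phi), where phi maps each event of `heap` to its copy.

    Events in `preempting` consume the stop condition; all others only read it.
    """
    phi = {}

    def rewrite_event(e):
        if e not in heap:  # the dummy event or an external event: unchanged
            return e
        if e not in phi:
            consumed, read, label = e
            consumed = frozenset(rewrite_cond(c) for c in consumed)
            read = frozenset(rewrite_cond(c) for c in read)
            if e in preempting:
                consumed = consumed | {STOP}
            else:
                read = read | {STOP}
            phi[e] = (consumed, read, label)
        return phi[e]

    def rewrite_cond(c):
        cause, mark = c
        return (rewrite_event(cause), mark)  # keep references inside the new heap

    return {rewrite_event(e) for e in heap}, phi


# Hypothetical heap: a call followed by a publication, plus a second publication.
call_m = (frozenset({(BOTTOM, "left")}), frozenset(), "M_k1")
pub_m = (frozenset({(call_m, "nil")}), frozenset(), "!v1")
pub_n = (frozenset({(BOTTOM, "right")}), frozenset(), "!v2")
heap = {call_m, pub_m, pub_n}
stopped, phi = stop(heap, {pub_m, pub_n})
```

After the rewrite both publication events consume the same stop condition, so at most one of them can occur, while the call event merely reads it and is disabled once either publication has occurred.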

4.3 Orc Syntax and Semantics

The reader is referred to [MC07] for an introduction to and motivation for Orc, as a language for wide area computing. The syntax and operational semantics of Orc, in the form of SOS rules [KCM06], are given in Figure 4.1.

An Orc expression f can perform action a and transform itself into the expression f′, which is denoted by the transition f −a→ f′. The actions A and values V are described by the following grammar:

a ∈ A ::= Mk(v) | k?v | !v | τ | τv

v ∈ V ::= x | vk | v


f, g, h ∈ Expression ::= M(p) | E(p) | f | g | f >x> g | f where x :∈ g | ?k
p ∈ Actual ::= x | v
Definition ::= E(x) ∆ f

(SiteCall)  k fresh  ⟹  M(v) −Mk(v)→ ?k
(SiteRet)   ?k −k?v→ let(v)
(Let)       let(v) −!v→ 0
(Sym1)      f −a→ f′  ⟹  f | g −a→ f′ | g
(Sym2)      g −a→ g′  ⟹  f | g −a→ f | g′
(Seq1N)     f −a→ f′, a ≠ !v  ⟹  f >x> g −a→ f′ >x> g
(Seq1V)     f −!v→ f′  ⟹  f >x> g −τ→ (f′ >x> g) | [v/x].g
(Asym1N)    f −a→ f′  ⟹  f where x :∈ g −a→ f′ where x :∈ g
(Asym1V)    g −!v→ g′  ⟹  f where x :∈ g −τ→ [v/x].f
(Asym2)     g −a→ g′, a ≠ !v  ⟹  f where x :∈ g −a→ f where x :∈ g′
(Def)       ⟦E(x) ∆ f⟧ ∈ D  ⟹  E(p) −τ→ [p/x].f

Figure 4.1 – The Syntax (top) and Operational Semantics (bottom) of Orc. Each rule is written as premises ⟹ conclusion; rules without premises are axioms.

The actions A are the transition labels of the Orc operational semantics, except for the τv action, which is an intermediary action needed for creating heaps. The x are variable names. They are placeholders for the value which will eventually replace that variable in the expression. The return values vk are indexed by call handles. They are placeholders for the values returned from site calls. The ground values v are the constant values, which are always available.

Observe the following. Due to rule (Def), recursive definitions are possible in Orc. Also, rule (Asym1V) exhibits termination of g upon its first publication.

To simplify the translation, we assume that the Orc programs we consider have distinct variable names. This restriction does not reduce the program’s expressivity and can be enforced by a simple syntactic pre-processing step.
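To make the SOS rules concrete, the following sketch enumerates the transitions of a small Orc fragment: 0, let, parallel composition, sequencing and the where combinator. It is an illustration under assumptions, not the thesis implementation: site calls and definitions are left out, expressions are encoded as tagged tuples, variables as `("var", name)`, and the function names `subst` and `steps` are hypothetical. Distinct variable names are assumed, as in the text, so substitution ignores shadowing.

```python
def subst(f, x, v):
    """[v/x].f : replace variable x by value v in expression f."""
    tag = f[0]
    if tag == "zero":
        return f
    if tag == "let":
        return ("let", v) if f[1] == ("var", x) else f
    if tag == "par":
        return ("par", subst(f[1], x, v), subst(f[2], x, v))
    # "seq" and "where" both have the shape (tag, left, bound_variable, right)
    return (tag, subst(f[1], x, v), f[2], subst(f[3], x, v))


def steps(f):
    """All pairs (action, successor) that f can perform in one step."""
    tag = f[0]
    if tag == "zero":
        return []
    if tag == "let":  # (Let); let(x) with x a variable has no transition
        v = f[1]
        is_var = isinstance(v, tuple) and v[0] == "var"
        return [] if is_var else [(("pub", v), ("zero",))]
    if tag == "par":  # (Sym1) and (Sym2)
        _, left, right = f
        return ([(a, ("par", l2, right)) for a, l2 in steps(left)] +
                [(a, ("par", left, r2)) for a, r2 in steps(right)])
    if tag == "seq":  # left >x> g
        _, left, x, g = f
        out = []
        for a, l2 in steps(left):
            if a[0] == "pub":  # (Seq1V): spawn a copy of g with x bound
                out.append((("tau",), ("par", ("seq", l2, x, g), subst(g, x, a[1]))))
            else:              # (Seq1N)
                out.append((a, ("seq", l2, x, g)))
        return out
    if tag == "where":  # left where x :∈ g
        _, left, x, g = f
        out = [(a, ("where", l2, x, g)) for a, l2 in steps(left)]  # (Asym1N)
        for a, g2 in steps(g):
            if a[0] == "pub":  # (Asym1V): bind x and discard the rest of g
                out.append((("tau",), subst(left, x, a[1])))
            else:              # (Asym2)
                out.append((a, ("where", left, x, g2)))
        return out
    raise ValueError(f"unknown expression tag: {tag}")


# let(1) >x> let(x)
expr = ("seq", ("let", 1), "x", ("let", ("var", "x")))
# let(x) where x :∈ (let(1) | let(2))
pruned = ("where", ("let", ("var", "x")), "x", ("par", ("let", 1), ("let", 2)))
```

For `pruned`, each of the two possible publications of the right-hand side yields a τ step to `let(1)` or `let(2)` respectively, and the other branch disappears, which is the termination behaviour of rule (Asym1V).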

4.4 Denotations for Orc Expressions

In this section, we show how to construct the heap of an Orc program, and then its LAES. We begin with further useful operations on heaps that are specific to Orc. Then, we provide the heap semantics of Orc base expressions and operators.

• Free Variables: E(x) is the set of all events in heap E which depend on x.

E(x) = {e ∈ E | ∃e′ ∈ E, e′ ¹E e, α(e′) ∈ {Mk(x), !x, τx}}

Call x a free variable of E if E(x) is nonempty. Let Ē(x) be the set of events in E that do not depend on x: Ē(x) = E − E(x).

• Publication events: !E is the set of publication events of heap E:

!E = {e ∈ E | α(e) = !v}

• Preemption: Stopping E after the first value publication is defined as:

stop(E) =def stop!E(E)

• Send: For a publication event e = (•e, e, !v), define τ(e) to be the event obtained by changing the label of e as follows:

α(τ(e)) = τx if α(e) = !x for some variable x,
α(τ(e)) = τ otherwise. (4.15)

The heap send(E) is the heap E where all the publication events e in E are replaced by τ(e). The publication events are still identifiable by their marks.

• Link: For a heap E, a context heap C, an event f not belonging to E, and a value v,

link(f, v, x, E, C)

is a heap in which variable x is bound to value v after external event f. The context heap C identifies parts of E that are not affected by the variable binding. link(f, v, x, E, C) is the heap resulting from the following operations:

1. Create E′ = copyf (E, C), a new copy of E with respect to context heap C and marked with label f. In making this copy, each event e ∈ E has a unique corresponding event e′ = ϕf (e) ∈ E′.

2. Change all e′ = (•e′, e′, a) ∈ E′ as below, where e = ϕf^−1(e′):

e′ = (•e′ ∪ {(f, e)}, e′, [v/x]a) if e′ ∈ min(E′),
e′ = (•e′, e′, [v/x]a) if e′ ∉ min(E′). (4.16)

The substitution [v/x]a replaces the variable x by v in the action a. If the variable x does not occur in a, the substitution leaves a unchanged. In the heap constructed here, the event f referred to by e′ ∈ min(E′) is not in the heap.

• Receive: We next construct a heap that can receive any value that is published by another heap. If e is a publication event, τ(e) is the event e with its action changed according to (4.15). We define

recvx(E, F, C) = ⋃_{f ∈ !E, α(f) = !v} link(τ(f), v, x, F, C)

Observe that, if !E is empty, this yields recvx(E, F, C) = ∅.

• Pipe: The pipe operator allows G to receive publications from F, subject to a context C that identifies parts of G not affected by the communication.

pipex(F, G, C) = send(F ) ∪ recvx(F, G, C)


4.4.1 Heaps of Base Expressions

For an Orc expression f, [f ] is its heap denotation. In the following, nil is a distinguished symbol indicating the absence of mark.

[0] = ∅

[let(v)] = {({c}, ∅, !v)}, where condition c = (⊥, nil)

[?k] = {e = ({c1}, ∅, k?vk), ({c2}, ∅, !vk)}, where conditions c1 = (⊥, nil), c2 = (e, nil)

[M(v)] = {e = ({c1}, ∅, Mk(v)), f = ({c2}, ∅, k?vk), ({c3}, ∅, !vk)}, where conditions c1 = (⊥, nil), c2 = (e, nil), c3 = (f, nil), and k is fresh

[E(v)] = [[v/x]f ], where E is an expression definition and E(x) ∆ f

4.4.2 Heaps for the Combinators

[f | g] = [f ] ⊎ [g] (4.17)

[f >x> g] = pipex([f ] , [g] , ∅) (4.18)

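[g where x :∈ f ] = pipex(stop(F ), G(x), Ḡ(x)) ∪ Ḡ(x) (4.19)

where F = [f ]^right and G = [g]^left, G(x) is the set of events of G that depend on x, and Ḡ(x) = G − G(x) is the set of events of G that do not depend on x.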

Figure 4.2 gives the intermediary and the final heaps for the Orc expression let(1) ≫ S(x) where x :∈ M | N. Note the two publications f1 and f2 made by the parallel composition M | N. These are made conflicting by the extra (shaded) condition created by the stop operator.

Following Section 4.2.3, we can now translate the heaps associated to Orc expressions into LAES. The LAES of an expression f is [[f ]] = A [ [f ] ].

4.4.3 Recursive Definitions

The treatment of recursive definitions follows that given in [KCM06], except that the denotation of an expression f is the heap [f ] instead of the set of traces 〈f〉. The heap for a recursive Orc definition f ∆ Exp(f) is the limit of a series of increasing approximations 0 ⊑ Exp(0) ⊑ Exp(Exp(0)) ⊑ . . . . To ensure existence of the limit, the least fixpoint of Exp, we show that the Orc combinators are monotonic with respect to ⊑. For F and G two heaps, define

F ≺ G if F ⊆ G and CF ∩ CG−F = ∅ (4.20)

Then for Orc expressions, f ⊑ g if [f ] ≺ [g]. The motivation for having the second condition in (4.20) is that it is needed in the proof of Lemma 4.6 below.

Lemma 4.5 Relation ≺ is a partial order on heaps.

[Figure 4.2: diagram not reproduced. Panels: F = [M | N ], send(stop(F )), G = [let(1) ≫ S(x)] and its parts with respect to x, the recv heap built from stop(F ), and the final heap [let(1) ≫ S(x) where x :∈ M | N].]

Figure 4.2 – Heap Construction Example: The shaded condition is the (⊥, stop) condition introduced by the stop operator. A dashed arrowhead to a minimal condition of the recv heap from an event name states that the condition depends on that external event. The external events here are e and f1, f2 in heaps G(x) and send(stop(F )) respectively. When these heaps are combined in the rightmost heap, these events become internal events.

Proof. Assume F ≺ G ≺ H. So CF ∩ CG−F = ∅ and CG ∩ CH−G = ∅. Writing H − F = (H − G) ∪ (G − F ), we have CF ∩ CH−F = (CF ∩ CH−G) ∪ (CF ∩ CG−F ). The second term is an empty set. Since F ⊆ G, we have CF ⊂ CG. This gives CF ∩ CH−G = ∅, which ensures F ≺ H and proves the lemma.

Lemma 4.6 The Orc combinators are monotonic in both arguments. In particular, given f ⊑ g, then

f | h ⊑ g | h
f >x> h ⊑ g >x> h
h >x> f ⊑ h >x> g
f where x :∈ h ⊑ g where x :∈ h
h where x :∈ f ⊑ h where x :∈ g

Proof : The proof is given in Appendix A.3.

4.5 Correctness of Orc heap semantics

In this section we prove the correctness of the heap semantics for Orc. We show that the LAES [[f ]] of an Orc program f encodes all the possible actions that f may perform while executing. We develop a one-one correspondence between the configurations of [[f ]] and the sequences of actions that f performs. However, instead of directly considering the LAES [[f ]], we establish the correspondence between the heap [f ] and f. We can do this because of the following theorem:

Theorem 4.7 For any Orc expression f , the set of events in [[f ]] is equal to the set ofevents in [f ].

Proof: The theorem can be shown by induction on the structure of the expression f. See Appendix A.4 for the complete proof.

We will need a notion of the future of a heap, i.e. the residual heap after one of its minimal events has occurred. For a heap H and a minimal event e of H, we denote the future of H after the occurrence of e by the heap H \ e, where

H \ e = H − {e′ ∈ H | ∃e∗, e∗ ¹ e′, e∗ ր e}.

Intuitively, the future of H after the occurrence of a minimal event e contains only those events of H that can occur after e has occurred. It is obtained by removing from H all the events that are pre-empted by e, and the causal successors of such events.

In the correspondence theorem below, f −a→ f′ denotes a transition of the expression f to f′ according to the SOS rules of Figure 4.1, where a ∈ {Mk(v), k?v, !v, τ} and v is a constant value. For example, if f = let(x), no such transition f −a→ f′ can occur, since no SOS rule of Figure 4.1 exists for let(x) when x is a variable. However, since the heap of expressions like let(x) can be defined, for the correspondence theorem we will need to distinguish between the events of the heap whose actions represent transitions that can occur according to the SOS rules, and those that do not. This is done by identifying events of the heap that have free variables in their labels. We call an event e whose label α(e) does not have any free variables in it a non-free event. For a non-free event e, α(e) ∈ {Mk(v), k?v, !v, τ}, where v is a constant. The correspondence theorem is now stated as follows:

Theorem 4.8 (Correspondence Theorem) Let f be any Orc expression.

1. If f −a→ f′ then there exists a minimal non-free event e of [f ] such that α(e) = a and [f ] \ e = [f′].

2. For every minimal non-free event e of [f ], there is an expression f′ such that f −α(e)→ f′ and [f ] \ e = [f′].

Proof. The proof is done by structural induction over the Orc expression f. The complete proof is given in Appendix A.5. It is easy to verify that for any Orc expression f and a sequence [f ] \ e1 \ e2 . . ., the set of events {e1, e2, . . .} is a configuration of [f ] (and so of [[f ]] by Theorem 4.7). Theorem 4.8 establishes a correspondence for one step of such a sequence, i.e. between [f ] \ e and a transition f −a→ f′. By recursively applying the theorem we get a direct correspondence between the configurations of [f ] (or [[f ]]) and the sequences of actions, or traces, of f.

4.6 Related Work

Closest to our present study is the work in [RBHJ06a], where Orc expressions are translated to colored Petri net systems [DK04]. Another closely related work is reported in Bruni et al. [BMT06], where the authors link the Orc language to Petri nets and the Join Calculus; it is advocated that the Join Calculus, by offering means to support dynamic creation of names and activities as well as pruning associated with asymmetric conflict, is an adequate formalism for orchestrations. For an approach that focuses on temporal properties without partial orders or performance evaluation, see [DLZ06], where a Timed Automaton semantics of Orc is given and used for verification purposes using the Uppaal tool. On the other hand, a number of papers have been devoted to the Petri net semantics of the most widely used language for orchestration, namely BPEL; see [HSS05, LMSW06, OVvdA+05, vBK05, KvB04] and the tutorial [vBK06].

Our work is unique in that it provides a direct coding of a wide area computing language into asymmetric event structures. This is of immediate use in QoS studies, as the latter build on timed and/or probabilistic enhancements of partial order models [RBHJ06a].

4.7 Conclusion

We have presented a partial order semantics for Orc, a structured orchestration language with support for termination and recursive process instantiation. The semantics uses heaps to encode sets of interrelated events because they simplify manipulation of the fragments of program behavior that arise when analyzing the sub-expressions of a program. These fragments are composed to create effective heaps, from which more traditional asymmetric event structures are derived. We show that the event structure semantics is equivalent to a previous denotational trace semantics.

The heap semantics provides a model of true concurrency and also directly supports analysis of non-functional properties of Orc programs, including critical path and dependency analysis that can affect Quality of Service. A verbatim coding of the Orc heap semantics has been written in Prolog; it takes only two pages of Prolog code.


Chapter 5

Branching Cells for Asymmetric Event Structures

Sidney Rosario
IRISA/INRIA Rennes, Campus de Beaulieu, Rennes, France.

Abstract

In this chapter, we extend the notion of branching cells introduced for prime event structures in [AB06, AB08] to asymmetric event structures (AES). This involves extending the notion of minimal conflict for AES. These notions are then used to compute the probability of the occurrence of any given event of a stochastic AES under a race policy, when the events are assumed to occur with exponential delays.


5.1 Event Structures

Event structures [NPW81] are mathematical models for concurrent systems, consisting of a set of events and a description of the relationships between them. The events of the event structure are partially ordered, and the causal, conflict and concurrency relationships between events are made explicit. Event structures have been shown to be related to other models of concurrency: for example, in [NPW81], the unfoldings of 1-safe Petri nets are shown to have a one-one correspondence with a particular class of event structures called prime event structures.

Various other classes of event structures can be defined by placing different kinds of restrictions on the relationships between the events. In this chapter we study asymmetric event structures, which were used in Chapter 4 to give a denotational semantics to Orc expressions. Asymmetric event structures were defined in [BCM01], where they were shown to have a one-one correspondence with the unfoldings of semi-weighted contextual nets. The derivation of the labelled asymmetric event structures of Orc expressions from their heaps was in fact due to this correspondence. This relationship between asymmetric event structures and contextual nets justifies our study of asymmetric event structures. In the first two chapters of this thesis, we saw that read arcs (and so contextual Petri nets) are especially useful when modelling concurrent reads, pre-emption and termination. In this chapter, we consider asymmetric event structures, and extend the notion of branching cells defined for prime event structures in [AB06, AB08] to asymmetric event structures.

Branching cells have many interesting properties [AB06, AB08]. They capture the choices made in the course of an execution in a dynamic way: every maximal configuration can be seen as a stack of branching cells, the choices in the execution being localised within these cells. In Section 5.3 of this chapter we use the notion of stopping prefix defined in the context of branching cells to compute the probability of the occurrence of an event in a timed asymmetric event structure. In our timed asymmetric event structures, an enabled (minimal) event is assumed to occur with an exponential delay. This computation of the occurrence probability was done as part of the paper “Critical paths in the Partial Order Unfolding of a Stochastic Petri Net” [BHR09], where we studied the probability of an event’s delay being critical for the delay of the overall execution.

5.1.1 Pre, Asymmetric and Prime Event Structures

Definition 5.1 (Pre-AES) A Pre-Asymmetric Event Structure is a tuple (E,≤,ր), where E is a set of events, and ≤ and ր are binary relations on events in E such that:

1. ≤ is a partial order and ⌊e⌋ = {e′ ∈ E | e′ ≤ e} is finite for all e ∈ E.

2. For all e, e′ ∈ E:

(a) e < e′ ⇒ e ր e′

(b) ր⌊e⌋ is acyclic

where < (strict causality) is the irreflexive restriction of ≤. Condition (2b) implies in particular that ր is irreflexive.

Figure 5.1 shows a Pre-AES with five events. The solid arrows represent causal dependency (e1 < e4 and e2 < e5) and the dashed arrows asymmetric conflict (e1 ր e2, e2 ր e1 and e2 ր e3). According to the definition, e1 ր e4 and e2 ր e5 also hold.

Figure 5.1 – A Pre-Asymmetric Event Structure.

The induced conflict relation #E between events of the Pre-AES is defined as:

1. e0 ր e1 ր . . . ր en ր e0 ⇒ #E({e0, e1, . . . , en})

2. If #E(A ∪ {e}) and e < e′, then #E(A ∪ {e′})

In Figure 5.1, since e1 ր e2 ր e1, we have #E{e1, e2}. We write e #E e′ when the conflict is binary, i.e., if #E{e, e′}.

Definition 5.2 (AES) An Asymmetric Event Structure (AES) is a Pre-AES (E,≤,ր) such that for all e, e′ ∈ E, e #E e′ ⇒ e ր e′.

The ր relation of any Pre-AES can be saturated to make it an AES. For example, adding

(e1, e5), (e5, e1), (e2, e4), (e4, e2), (e4, e5), (e5, e4)

to the asymmetric conflict relation of the Pre-AES in Figure 5.1 yields an AES.

A prefix of E is any subset P ⊆ E such that P is downward-closed, i.e., ⌊e⌋ ⊆ P for all e ∈ P.
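The saturation of the Pre-AES of Figure 5.1 can be reproduced mechanically. The sketch below is a simplification under stated assumptions: the function name `saturate` and the pair-set encoding are illustrative, and only binary conflicts that stem from two-event cycles and their inheritance along causality are handled, which is enough for this example but ignores conflicts arising from longer cycles.

```python
def saturate(causes, conflict):
    """Add the pairs needed so that binary conflict implies asymmetric conflict.

    causes:   strict causality as pairs (e1, e2), assumed transitively closed
    conflict: asymmetric conflict as pairs (e1, e2), assumed to include causes
    """
    # Rule 1 restricted to cycles of length two: e, e' preempt each other.
    binary = {frozenset((a, b)) for (a, b) in conflict
              if (b, a) in conflict and a != b}
    # Rule 2: a conflict with e is inherited by every event caused by e.
    changed = True
    while changed:
        changed = False
        for pair in list(binary):
            for e in pair:
                (other,) = pair - {e}
                for (c, c2) in causes:
                    new = frozenset((other, c2))
                    if c == e and len(new) == 2 and new not in binary:
                        binary.add(new)
                        changed = True
    # Binary conflict is symmetric, so both directions are added.
    symmetric = {(a, b) for pair in binary for a in pair for b in pair if a != b}
    return conflict | symmetric


# The Pre-AES of Figure 5.1.
causes = {("e1", "e4"), ("e2", "e5")}
conflict = causes | {("e1", "e2"), ("e2", "e1"), ("e2", "e3")}
added = saturate(causes, conflict) - conflict
```

On this input, `added` contains exactly the six pairs listed in the text.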

Definition 5.3 (AES Configuration) A configuration of AES (E,≤,ր) is a set C ⊆ E of events such that:

1. րC is well-founded (C has no ր-cycles)

2. C is a prefix

3. {e′ ∈ C | e′ ր e} is finite for all e ∈ C

The third condition ensures that for any event in a configuration, there are only a finite number of events that need to occur before it.

Order on Configurations, Maximal Configurations: We define an order ≺E between configurations of an AES (E,≤,ր). For two configurations v, v′ of E, v ≺E v′ if:

1. v ⊂ v′

2. ∄ e ∈ v′ \ v, e′ ∈ v such that e ր e′

v′ is said to extend v. This order is stricter than the set inclusion order defined for prime event structures. A configuration v′ that extends a configuration v cannot contain any event that is preempted by an event in v. Intuitively, a configuration v′ extends a configuration v if v′ can be reached after v has been reached.
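The difference between this order and plain set inclusion shows up on a two-event example. The sketch below is illustrative only: the function name `extends`, the pair-set encoding of the asymmetric conflict and the events `r` and `b` are assumptions, and both arguments are taken to be configurations already.

```python
def extends(v, v_prime, conflict):
    """True if configuration v_prime extends configuration v.

    conflict: pairs (e1, e2) meaning e1 is in asymmetric conflict with e2,
              i.e. the occurrence of e2 preempts e1
    """
    v, v_prime = set(v), set(v_prime)
    if not v < v_prime:  # condition 1: strict inclusion
        return False
    # Condition 2: no newly added event is preempted by an event already in v.
    return not any((e, e2) in conflict for e in v_prime - v for e2 in v)


# Hypothetical AES: b preempts r, so r can only occur before b.
conflict = {("r", "b")}
```

Here `{"r", "b"}` extends `{"r"}` but not `{"b"}`, although it contains both as subsets: once b has occurred, r can no longer be added.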

Definition 5.4 (Compatible Configurations) Two configurations v and v′ are said to be compatible if


1. v ∪ v′ is a configuration.

2. v ¹ v ∪ v′ and v′ ¹ v ∪ v′.

Two events e and e′ are said to be compatible if ⌊e⌋ and ⌊e′⌋ are compatible. Event e is said to be compatible with a configuration v if ⌊e⌋ and v are compatible.

Maximal Configurations: We denote by VE the set of configurations of E and by VE the set of finite configurations of E. Let ¹E denote the reflexive closure of ≺E. (VE ,¹E) is a partial order of configurations. Any chain of these configurations has an upper bound, and so, from Zorn’s Lemma, this partial order has maximal elements, which we call maximal configurations. The set of maximal configurations of E is denoted by ΘE: ∀ω ∈ ΘE , v ∈ VE , ¬(ω ≺E v).

In the sequel, we will omit the subscripts of the relations when the associated event structure is obvious. We will also use only the set of events E to refer to the event structure (E,≤,ր).

Future of a configuration: For a configuration v of AES E, the future of v in E, denoted by Ev, is the subset of E with the nodes:

Ev = {e ∈ E \ v | e is compatible with v}

Examples: In the event structure in Figure 5.2, the future of the configuration {b} consists of the events c and e, since they are the only events compatible with b. In Figure 5.3, the future of the infinite configuration {e1, e2, . . .} is empty. The event e does not appear in this future since {e1, e2, . . .} ∪ {e} is not a configuration.

Figure 5.2 – Future of a configuration: The right AES is the future Ev of the configuration v = {b}.

Figure 5.3 – Future of configuration: The future Ev of the configuration v = {e1, e2, . . .} is empty.

Let configuration u ∈ VE and v ∈ VEu. We define the concatenation of any such u and v as:

u ⊕ v =def u ∪ v

We define the inverse of concatenation:

u ⊖ v =def u \ v

for u ∈ VE , v ∈ VE , v ⊆ u.

5.1.2 Minimal Asymmetric Conflict, Stopping Prefix

For an AES (E,≤,ր), define the binary pure-asymmetric conflict relation as (ր \ <); that is, e and e′ are in pure asymmetric conflict if and only if (e ր e′) ∧ ¬(e < e′).

Definition 5.5 (Minimal Asymmetric Conflict) For an AES (E,≤,ր), two events e, e′ ∈ E are in minimal asymmetric conflict, e րm e′, if:

1. e and e′ are in pure asymmetric conflict, i.e., (e ր e′) ∧ ¬(e < e′)

2. (⌊e⌋ × ⌊e′⌋) ∩ # = {(e, e′)} or ∅

3. ({e} × ⌊e′⌋) ∩ ր = {(e, e′)}

րm is not symmetric in general. If e րm e′ and e′ րm e, then e, e′ are said to be in minimal symmetric conflict or simply in minimal conflict (#m).

Figure 5.4 – Minimal Conflicts in AES. The non-minimal conflicts are the shaded dashed arrows.

Figure 5.5 – Minimal Conflicts in AES.

Definition 5.6 (Stopping Prefix) A subset B ⊆ E is called a stopping prefix if

1. B is a prefix

2. B is closed under րm. That is, for all e ∈ B, the set {e′ ∈ E | (e րm e′) or (e′ րm e)} is contained in B.
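Both conditions of this definition are easy to test on a finite event structure once the minimal asymmetric conflict relation is available. The sketch below is a hedged illustration: the function name `is_stopping_prefix`, the pair-set encodings and the four-event example are assumptions, and the relation րm is taken as a precomputed input rather than derived from Definition 5.5.

```python
def is_stopping_prefix(B, causes, min_conflict):
    """Check whether the finite set B is a stopping prefix.

    causes:       strict causality as pairs (e1, e2), assumed transitively closed
    min_conflict: minimal asymmetric conflict as pairs (e1, e2), assumed given
    """
    B = set(B)
    # Condition 1: B is a prefix, i.e. downward-closed under causality.
    if any(e2 in B and e1 not in B for (e1, e2) in causes):
        return False
    # Condition 2: whenever one end of a minimal conflict is in B, so is the other.
    return all((a in B) == (b in B) for (a, b) in min_conflict)


# Hypothetical structure: a causes c, a and b are in minimal conflict, d is free.
causes = {("a", "c")}
min_conflict = {("a", "b"), ("b", "a")}
```

In this example `{"a", "b"}` and `{"d"}` are stopping prefixes, whereas `{"a"}` is a prefix that fails the closure condition and `{"c"}` is not even a prefix.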

For a stopping prefix B, a configuration v of E is said to be B-stopped if v ∈ ΘB. Any configuration v of E is called a stopped configuration if there is a stopping prefix B such that v ∈ ΘB. We have the following important lemma:


Lemma 5.7 For every stopping prefix B of E and every ω ∈ ΘE,

ω ∩ B ∈ ΘB

Hence every stopping prefix B induces a mapping πB : ΘE → ΘB,

πB(ω) = ω ∩ B

Moreover,

ω ∩ B ¹ ω

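Proof: Call $\omega_B = \omega \cap B$. Clearly $\omega_B$ is a configuration ($B$ is a prefix and the asymmetric conflict relation in $\omega_B$ is the restriction of $\nearrow$ to $B$). We first show that $\omega_B \preceq \omega$, i.e., that

$$\nexists\, e' \in \omega_B,\ e \in \omega \setminus \omega_B :\ e \nearrow e' \tag{5.1}$$

holds. Suppose that (5.1) does not hold. Since $\lfloor e' \rfloor \subseteq \omega_B$ and $e \notin \omega_B$, we have $\neg(e < e')$, so $e$ and $e'$ are in pure asymmetric conflict. Also, since $e$ and $e'$ are in the same configuration $\omega$, $\neg(e \# e')$ and so condition 2 of Definition 5.5 holds. But since $e \notin B$ and $B$ is closed under $\nearrow_m$, $e \nearrow_m e'$ cannot hold, so condition 3 must fail: there exists an event $e'' \in \lfloor e' \rfloor \setminus \{e'\}$ such that $e \nearrow e''$. Since $e'' \in \omega_B$, we can recursively apply the above reasoning to get an infinite chain of distinct events $e' > e'' > \ldots$ in $\omega_B$, which is not possible.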

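We now prove that $\omega_B$ is a maximal configuration of $B$ by contradiction. Suppose that $\omega_B \notin \Theta_B$. Then there exists an event $e^* \in B \setminus \omega_B$ such that $\omega_B \cup \{e^*\}$ is a configuration and $\omega_B \prec \omega_B \cup \{e^*\}$. Consider $\omega \cup \{e^*\}$. We show that it is a configuration:

1. $\omega \cup \{e^*\}$ is a prefix since $\omega$ and $\omega_B \cup \{e^*\}$ are prefixes.

2. We need to show that $\{e \in \omega \cup \{e^*\} \mid e \nearrow e'\}$ is finite for all $e' \in \omega \cup \{e^*\}$. But since $\omega$ and $\omega_B \cup \{e^*\}$ are configurations, we only need to show that $\{e \in \omega \setminus \omega_B \mid e \nearrow e^*\}$ is finite. We show that this set is empty. Consider any $e \nearrow e^*$ where $e \in \omega \setminus \omega_B$. We have $\neg(e < e^*)$ (since $\lfloor e^* \rfloor \subseteq \omega_B \cup \{e^*\}$), and so $e$ and $e^*$ are in pure asymmetric conflict. Since $e \notin B$, either condition 2 or condition 3 (or both) of Definition 5.5 is not satisfied. If condition 3 is not satisfied, this means that $\exists\, e' \in \lfloor e^* \rfloor \setminus \{e^*\}$ such that $e \nearrow e'$. But since $e' \in \omega_B$, this contradicts (5.1). If condition 2 is not satisfied, then $e \# e^*$ holds. Let $e_1 \le e$, $e_2 \le e^*$ be the minimal events of this conflict such that $(\lfloor e_1 \rfloor \times \lfloor e_2 \rfloor) \cap \# = \{(e_1, e_2)\}$. Then both $e_1$ and $e_2$ are in $B$ (since $e^* \in B$). But this would mean that $e_1$ and $e_2$ are also in $\omega_B \cup \{e^*\}$, which contradicts our assumption that $\omega_B \cup \{e^*\}$ is a configuration.

3. To show that $\nearrow_{\omega \cup \{e^*\}}$ is acyclic, we note that $\nearrow_{\omega}$ is acyclic. Any cycle of conflicts in $\omega \cup \{e^*\}$ would thus have $e^*$ in it. But since $\nexists\, e \in \omega \setminus \omega_B$ such that $e \nearrow e^*$, the cycle will have $e \nearrow e^*$ with $e \in \omega_B$. Since $\nearrow_{\omega_B \cup \{e^*\}}$ is acyclic, the cycle will have to have events $e' \in \omega \setminus \omega_B$ and $e'' \in \omega_B$ such that $e' \nearrow e''$. This violates (5.1) and so cannot be true.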

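Thus $\omega \cup \{e^*\}$ is a configuration. We finally show that $\omega \prec \omega \cup \{e^*\}$. Since $\omega_B \prec \omega_B \cup \{e^*\}$, we have that

$$\forall e' \in \omega_B,\ \neg(e^* \nearrow e') \tag{5.2}$$

holds. So we only need to show that $\nexists\, e \in \omega \setminus \omega_B$ such that $e^* \nearrow e$. If such an $e$ exists, then $e^*$ and $e$ are in pure asymmetric conflict and $\neg(e^* \# e)$. But since $e \notin B$, there exists $e' \in \lfloor e \rfloor \setminus \{e\}$ such that $(\{e^*\} \times \lfloor e' \rfloor) \cap \nearrow \;=\; \{(e^*, e')\}$. This $e'$ would have to be in the stopping prefix $B$ (since $e^*$ and $e'$ are in pure asymmetric conflict and $\neg(e^* \# e)$ holds). But this contradicts (5.2). Hence $\omega \prec \omega \cup \{e^*\}$, which contradicts the maximality of $\omega$.

⋄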


The set of stopping prefixes of $E$ forms a (complete) lattice $(\mathcal{B}, \preceq)$. The union $B_1 \cup B_2$ of two stopping prefixes $B_1, B_2 \in \mathcal{B}$ is a stopping prefix in $\mathcal{B}$, and so is their intersection $B_1 \cap B_2$.

The intersection of all stopping prefixes containing an event $e \in E$ gives the unique minimal stopping prefix containing $e$, denoted $B(e)$. Call an event structure locally finite if $B(e)$ is finite for all events $e \in E$. This also means that every finite subset $A$ of $E$ has a finite stopping prefix that contains $A$.

Remark: For a locally finite event structure, a finite stopped configuration is the same as a finitely stopped configuration. In this case, for every finite stopped configuration $v$, there is a finite stopping prefix $B$ for which $v \in \Theta_B$.
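For a finite event structure, the minimal stopping prefix B(e) can be computed as a closure rather than as an intersection. The sketch below assumes that a prefix is a causally closed set, that `less` lists the causality pairs, and that `min_conflict` already holds the minimal asymmetric conflict pairs of E; the function name is hypothetical.

```python
def minimal_stopping_prefix(e, less, min_conflict):
    """Smallest set containing e that is causally closed and closed under
    minimal asymmetric conflict in both directions (Definition 5.6)."""
    prefix, todo = set(), [e]
    while todo:
        x = todo.pop()
        if x in prefix:
            continue
        prefix.add(x)
        # Prefix closure: every cause of x must belong to the prefix.
        todo.extend(y for (y, z) in less if z == x)
        # Closure under minimal asymmetric conflict, in both directions.
        todo.extend(z for (y, z) in min_conflict if y == x)
        todo.extend(y for (y, z) in min_conflict if z == x)
    return prefix


# Example: a < c, b and a are in minimal asymmetric conflict, d is independent.
less = {("a", "c")}
min_conflict = {("b", "a")}
print(sorted(minimal_stopping_prefix("c", less, min_conflict)))  # ['a', 'b', 'c']
print(sorted(minimal_stopping_prefix("d", less, min_conflict)))  # ['d']
```

The worklist terminates only when the closure is finite, which is exactly the local finiteness condition stated above.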

Lemma 5.8 If $B$ is a stopping prefix of $E$ and $v$ is a configuration of $E$, then $B \cap E^v$ is a stopping prefix of $E^v$.

Proof: The minimal asymmetric conflict relation $\nearrow^v_m$ of $E^v$ is the restriction to $E^v$ of the minimal asymmetric conflict relation $\nearrow_m$ of $E$, i.e.,

$$\nearrow^v_m \;=\; \nearrow_m \cap\, (E^v \times E^v) \tag{5.3}$$

The lemma follows from the above equation. ⋄

Proposition 1 If $E$ is locally finite, then for every configuration $v$ of $E$, the future $E^v$ is locally finite.

5.2 Recursive Stopping, Branching Cells

The class of stopped configurations is not closed under concatenation. In general, for a stopped configuration $v$ of $E$ and a stopped configuration $u$ of $E^v$, $v \oplus u$ is not necessarily a stopped configuration of $E$.

As an example, consider the event structure $E$ on the left in Figure 5.6. $\{a, b\}$ is a stopping prefix and so the configuration $\{b\}$ is a stopped configuration of $E$. The events in the future $E^{\{b\}}$ are shown in black in the figure on the right. $\{c\}$ and $\{e\}$ are both stopping prefixes of $E^{\{b\}}$, so $\{e\}$ is a stopped configuration of $E^{\{b\}}$. However, $\{b\} \oplus \{e\}$ is not a stopped configuration of $E$.

Figure 5.6 – Stopped configurations are not closed under concatenation.

Since we will be interested in recursively concatenated stopped configurations of $E$, we define the set of recursively stopped configurations of $E$, $W_E$.

Definition 5.9 (R-Stopped Configurations) The set of recursively stopped (R-stopped) configurations of $E$, $W_E$, is defined as follows:

1. $\emptyset \in W_E$

2. If $u \in W_E$ and $v$ is any finite stopped configuration of $E^u$, then $u \oplus v$ is in $W_E$.

3. $W_E$ is closed under supremum of non-decreasing sequences.

We denote by WE the set of finite R-stopped configurations of E. An R-stopped configuration thus has a decomposition into finite R-stopped configurations. Such a decomposition, however, may not be unique.

Lemma 5.10 Let $B$ be a stopping prefix of $E$ and $v$ be a configuration of $B$. Then,

1. $D$ is a stopping prefix of $B^v$ $\Rightarrow$ $D$ is a stopping prefix of $E^v$.

2. $D$ is a stopping prefix of $E^v$ $\Rightarrow$ $D \cap B$ is a stopping prefix of $B^v$.

Proposition 2

1. If B is a stopping prefix of E, the R-stopped configurations of B are those R-stopped configurations of E contained in B. Moreover,

v ∈ WE ⇒ v ∩ B ∈ WB

2. For every pair u, v of configurations, we have:

u ∈ WE , v ∈ Wu ⇒ u ⊕ v ∈ WE

Definition 5.11 (Initial Stopping Prefix) A non-empty stopping prefix B is an initial stopping prefix of E if the only stopping prefix strictly contained in B is ∅.

Theorem 5.12 If they exist, initial stopping prefixes of E are disjoint. If E is locally finite or pre-regular, every non-empty stopping prefix of E contains an initial stopping prefix.

Proof: The first statement follows from the observation that the intersection of stopping prefixes is itself a stopping prefix: if two distinct initial stopping prefixes intersected, their intersection would be a non-empty stopping prefix strictly contained in at least one of them.

If $E$ is locally finite, any non-empty stopping prefix $B$ contains a finite non-empty stopping prefix (e.g., take $B(e)$ where $e$ is a minimal event in $B$). A finite stopping prefix contains an initial stopping prefix, and so does $B$.

If $E$ is pre-regular, consider a stopping prefix $B$ of $E$. Let $\mathcal{B}^*$ be the family of stopping prefixes of $E$ strictly contained in $B$. Using Zorn's lemma, we can show that $\mathcal{B}^*$ has a (non-empty) minimal element, which would be an initial stopping prefix contained in $B$. This is done by showing that every chain of decreasing prefixes of $B$ has a lower bound in $\mathcal{B}^*$.

Let $I$ be a totally ordered set of indices and $(B_i)_{i \in I}$ be a sequence of prefixes contained in $B$ such that $B_i \subset B_j$ for all $i, j \in I$ with $i > j$. We claim that $C = \bigcap_{i \in I} B_i$ is a lower bound of this chain. Since $C$ is a stopping prefix, it belongs to $\mathcal{B}^*$. To show that $C$ is non-empty, let us assume the contrary. Let $e_1$ be a minimal element of $B_1$. Since $C = \emptyset$, we can construct a sequence of events $(e_n)_{n \in N}$, where $N \subseteq I$, such that $e_n$ is a minimal event in $B_n$ and $e_1, \ldots, e_{n-1} \notin B_n$. The events $(e_n)_{n \in N}$ are all pairwise distinct and minimal, and thus $E$ is not pre-regular. The contradiction implies that $C \neq \emptyset$.

⋄

Proposition 3 If E is locally finite, then every initial stopping prefix of E is finite. If E is pre-regular, initial stopping prefixes of E are finitely many.

Proof: If E is locally finite, every stopping prefix contains a finite stopping prefix and so the first statement follows.

Every initial stopping prefix of E contains a minimal event of E. Since the initial stopping prefixes are disjoint, any event structure having infinitely many initial stopping prefixes will have infinitely many minimal events and so will not be pre-regular.

⋄

5.2.1 Branching Cells

The sequel assumes that the event structure E is locally finite.

Assumption 1 Event structure E is locally finite.

Under this assumption, according to Proposition 1, all the futures $E^v$ are also locally finite.

Definition 5.13 (Branching Cell) A branching cell of $E$ is any initial stopping prefix of $E^v$, where $v$ is a finite R-stopped configuration of $E$ ($v \in W_E$).

The branching cells that are initial stopping prefixes of $E^v$, where $v \in W_E$, are denoted by $\delta_E(v)$ (or $\delta(v)$ for short) and are called the branching cells enabled by $v$.

Proposition 1 and Proposition 3 together with Assumption 1 imply the following:

Proposition 4 Every branching cell of E is finite.

Covering through branching cells: Every R-stopped configuration can be uniquely characterised by a set of branching cells in the following way:

Lemma 5.14 Let $v$ be an R-stopped configuration of $E$. Then there exists a decomposition of $v$, $(v_n)_{0 \le n \le N}$, where $N \le \infty$, and branching cells $(c_n)_{0 < n \le N}$ such that:

1. $c_{n+1}$ is enabled at $v_n$.

2. $v_{n+1} \ominus v_n$ is a maximal configuration of $c_{n+1}$.

The branching cells $c_n$ are pairwise disjoint. Moreover, if there exists another such pair of sequences $(v'_n)_{0 \le n \le N'}$ and $(c'_n)_{0 < n \le N'}$, then

$$\{c_n,\ 0 < n < N\} = \{c'_n,\ 0 < n < N'\}$$

See Appendix B.1 for the proof.

Definition 5.15 The covering $\Delta_E(v)$ (or $\Delta(v)$ for short) of an R-stopped configuration $v$ is defined as the set of branching cells:

$$\Delta_E(v) = \{c_n,\ 0 < n < N\}$$

where $(v_n)_{0 \le n \le N}$ and $(c_n)_{0 < n \le N}$ are the two sequences associated to $v$ in Lemma 5.14.


5.2.2 Max-initial decomposition

Definition 5.16 Let $E$ be a pre-regular event structure. The max-initial stopping prefix of $E$, $B_0(E)$, is the union of all the initial stopping prefixes of $E$. We also take $B_0(\emptyset) = \emptyset$.

Since $E$ is both pre-regular and locally finite, its initial stopping prefixes are finite and finitely many. Therefore, $B_0(E)$ is finite. Moreover, $B_0(E^v)$ is finite for any configuration $v$ of $E$.

Theorem 5.17 Let $E$ be a pre-regular event structure. Then every maximal configuration $\omega$ is R-stopped. A valid decomposition of $\omega$ is given by the sequence $(v_n)_{n \ge 0}$ where:

$$v_0 = \emptyset, \qquad \forall n \ge 0:\ v_{n+1} = v_n \oplus z_{n+1}, \qquad z_{n+1} = \omega \cap B_0(E^{v_n})$$

$(v_n)_{n \ge 0}$ is called the max-initial decomposition of $\omega$.

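Proof: Clearly $v_n \subseteq \omega$. Also $z_{n+1} = (\omega \setminus v_n) \cap B_0(E^{v_n})$. Since $\omega \setminus v_n$ is a maximal configuration of $E^{v_n}$ and since $B_0(E^{v_n})$ is a stopping prefix of $E^{v_n}$, by Lemma 5.7, $z_{n+1}$ is maximal in $B_0(E^{v_n})$. Thus to show that $(v_n)_{n \ge 0}$ is a valid decomposition of $\omega$, we only have to show that:

$$\omega \subseteq \bigcup_{n \ge 0} v_n \tag{5.4}$$

Call $v = \bigcup_{n \ge 0} v_n$.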

Step 1: We claim that (5.4) holds if $E$ is finite. In this case, there is an integer $N$ such that $v = v_N = v_{N+1}$. Then $z_{N+1} = v_{N+1} \ominus v_N = \emptyset$. This implies that $B_0(E^{v_N}) = \emptyset$, and so $E^{v_N} = \emptyset$. This means that $v_N$ is maximal in $E$. Since $\omega \supseteq v_N$, we get $v = v_N = \omega$.

Step 2: Let $B$ be a finite stopping prefix of $E$ and let $\omega_B = \omega \cap B$. Then, by Proposition 2, $(v'_n)_{n \ge 0}$ is a max-initial decomposition of $\omega_B$, where $v'_n = v_n \cap B$.

Step 3: Let $B$ be a finite stopping prefix of $E$ and let $(v'_n)_{n \ge 0}$ be the max-initial decomposition of $\omega_B = \omega \cap B$. Then, we have

$$v = \bigcup_{n \ge 0} v_n \;\supseteq\; \bigcup_{n \ge 0} (v_n \cap B) = \bigcup_{n \ge 0} v'_n = \omega_B$$

Since this holds for any finite stopping prefix $B$ of $E$, we have:

$$v \;\supseteq\; \bigcup_{B \in \mathcal{B}} \omega \cap B \tag{5.5}$$

where $\mathcal{B}$ is the lattice of finite stopping prefixes of $E$. Now consider any $e \in \omega$. Since $E$ is locally finite, there exists a finite stopping prefix $D$ such that $e \in D$. Thus, from (5.5), we have $e \in v$. This holds for any $e \in \omega$ and so we conclude that $\omega \subseteq v$, which is what needed to be proved. ⋄

5.3 Stochastic AES and occurrence probabilities

In this section we equip the events of the AES with (probabilistic) timing properties and we compute the probability for any particular event to occur. We will start by defining stochastic asymmetric event structures. We will then define how an event “occurs” in these structures, and finally we compute the probability of this occurrence.


5.3.1 Stochastic AES

Our stochastic AES are obtained by simply associating stochastic delays to the events of the AES. We do this by assigning an exponential delay distribution to every event of the AES.

Definition 5.18 (Stochastic AES) A stochastic AES is a tuple $(E, \le, \nearrow, \lambda)$ where

1. $(E, \le, \nearrow)$ is an asymmetric event structure.

2. $\lambda : E \to \mathbb{R}^+$ gives the rate parameter of the exponential delay distribution $P_e$ for every event $e \in E$.

Note that our stochastic AES is not to be confused with the Timed Event Structures in [KBL01], where delays merely indicate when an event may occur (but is not forced to).

The delay $\delta(e)$ of an event $e$ follows the exponential distribution $P_e$ with rate parameter $\lambda(e)$. For convenience, we will write $\lambda_e$ to mean $\lambda(e)$. Observe that $\delta(e) \in [0, \infty)$ and the value $\omega = (\delta(e))_{e \in E}$ in the space $\Omega = [0, \infty)^E$ defines a delay for every event $e \in E$. We will see that every such $\omega$ defines a unique maximal configuration $\theta$ of $E$. We make the following assumption:

Assumption 2

1. The measures $(P_e)_{e \in E}$ are pairwise independent.

2. No $P_e$ has atoms: $\forall e \in E : \forall x \in [0, \infty) : P_e(\{x\}) = 0$.

Heights. Let us denote the immediate predecessors of an event $e$ by the set

$${}^{\bullet}e = \{e' \in E \mid e' < e,\ (e' \le e'' < e \Rightarrow e' = e'')\}.$$

We use $\bot$ to denote a dummy initial event and set ${}^{\bullet}e = \{\bot\}$ if $e$ is a minimal event of $E$. The height of an event $e$ for a given $\omega$ is then defined (see, e.g., [MG99]) recursively by

$$H(e, \omega) \triangleq \max_{e' \in {}^{\bullet}e} \{H(e', \omega)\} + \delta(e) \quad \text{and} \quad H(\bot, \omega) = 0; \tag{5.6}$$

A configuration $\kappa$ of $E$ has height

$$H(\kappa, \omega) \triangleq \max_{e \in \kappa} \{H(e, \omega)\}. \tag{5.7}$$

Note that only the causality relation $<$ and the delays $\delta$ are used in the computation of $H(e, \omega)$; the conflicting events have no influence. So $H(e, \omega)$ is defined statically, without referring to a dynamic execution of $E$.

For $\tau \in [0, \infty)$, denote by $E_\tau(\omega) \triangleq \{e \mid H(e, \omega) \le \tau\}$ the random set of those events whose height is bounded by $\tau$.
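The recursion (5.6) translates directly into code for a finite structure. The sketch below is illustrative only: it assumes the immediate predecessors are given as a dictionary `preds` (empty set for minimal events, standing in for ⊥), and the helper names `sample_delays`, `heights` and `events_up_to` are hypothetical.

```python
import random
from functools import lru_cache


def sample_delays(rate, seed=None):
    """Draw one delay vector ω: an independent exponential delay per event."""
    rng = random.Random(seed)
    return {e: rng.expovariate(lam) for e, lam in rate.items()}


def heights(preds, delay):
    """Return H(e, ω) for every event, following the recursion (5.6)."""
    @lru_cache(maxsize=None)
    def h(e):
        # A minimal event only has the dummy predecessor ⊥, whose height is 0.
        return max((h(p) for p in preds[e]), default=0.0) + delay[e]

    return {e: h(e) for e in preds}


def events_up_to(tau, height):
    """E_τ(ω): the events whose height is at most τ."""
    return {e for e, h in height.items() if h <= tau}


# Example with fixed delays: c is caused by both a and b.
preds = {"a": set(), "b": set(), "c": {"a", "b"}}
delay = {"a": 1.0, "b": 2.5, "c": 0.5}
H = heights(preds, delay)
print(H["c"])                        # 3.0
print(sorted(events_up_to(2.5, H)))  # ['a', 'b']
```

Conflicts play no role here, in line with the remark that heights are defined statically from causality and delays alone.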

Theorem 5.19 Under Assumption 2.1, the following properties hold.

1. $H(e, \omega) < \infty$ for all $e \in E$ and almost all $\omega \in \Omega$.

2. $H(e, \omega) \neq H(e', \omega)$ almost surely for any $e, e' \in E$ such that $e \neq e'$.

3. For all $\tau \in [0, \infty)$, the set $E_\tau(\omega)$ is finite for almost all $\omega$.

Proof: See appendix B.1.2.


5.3.2 Occurrence of an event

The occurrence of an event $e$ is defined by an execution policy. We adopt a race policy for the execution of events: the first minimal event whose delay expires occurs. The occurring event will preempt any of its competitors.

To formalise the notion of occurrence in our race policy we define the occurrence predicate $occ(e, \omega)$. This predicate is true if and only if $e$ occurs under $\omega$; i.e., all of the predecessors of $e$ occur under $\omega$, and none of the events that preempt $e$ occur. Formally we have the following recursive definition for $occ(e, \omega)$:

Definition 5.20 (Occurrence predicate) For every $\omega \in \Omega$, $e \in E$,

1. $occ(\bot, \omega) = \text{true}$.

2. $occ(e, \omega) = \text{true}$ iff

$$(\forall e' \in {}^{\bullet}e : occ(e', \omega)) \wedge (\forall e' \in check(e, \omega) : \neg occ(e', \omega)) \tag{5.8}$$

holds, where ${}^{\bullet}e$ is the set of immediate predecessors of $e$ and $check(e, \omega) \triangleq \{e' \mid e \nearrow e' \wedge H(e', \omega) \le H(e, \omega)\}$.

The set $check(e, \omega)$ is the set of events that preempt $e$ and whose height in $\omega$ is no greater than that of $e$. If any of the events in $check(e, \omega)$ occurs under $\omega$, then $e$ cannot occur under $\omega$. For all $e \in E$, define

$$Occ(e) = \{\omega \mid occ(e, \omega)\},$$

the set of $\omega$ under which $e$ occurs. The set $R(\omega) = \{e \in E \mid occ(e, \omega)\}$ gives the set of events that occur under $\omega$. We then have the following lemma:

Lemma 5.21 For almost all ω ∈ Ω, R(ω) is a maximal configuration of E, i.e., R(ω) ∈ ΘE.

Proof: See appendix B.1.3.
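Because occ(e, ω) only refers to events of smaller height, it can be evaluated by scanning the events in increasing height order. The following sketch assumes a finite structure, immediate predecessors in `preds`, asymmetric conflict pairs (e, e′) with e ր e′ in `asym`, and pairwise distinct heights (which holds almost surely by Theorem 5.19). The function name is hypothetical.

```python
def occurring_events(preds, asym, delay):
    """Return R(ω), the set of events that occur under the delay vector ω."""
    height = {}

    def h(e):
        # Height recursion (5.6), memoised in the `height` dictionary.
        if e not in height:
            height[e] = max((h(p) for p in preds[e]), default=0.0) + delay[e]
        return height[e]

    order = sorted(preds, key=h)
    if len(set(height.values())) != len(height):
        raise ValueError("heights must be pairwise distinct")

    occ = {}
    for e in order:
        # check(e, ω): events that preempt e and are not higher than e.
        check = [e2 for (e1, e2) in asym if e1 == e and height[e2] <= height[e]]
        occ[e] = all(occ[p] for p in preds[e]) and not any(occ[e2] for e2 in check)
    return {e for e in order if occ[e]}


# a and b are in symmetric conflict (a ր b and b ր a); c is caused by a.
preds = {"a": set(), "b": set(), "c": {"a"}}
asym = {("a", "b"), ("b", "a")}
print(sorted(occurring_events(preds, asym, {"a": 1.0, "b": 2.0, "c": 0.5})))  # ['a', 'c']
print(sorted(occurring_events(preds, asym, {"a": 1.0, "b": 0.5, "c": 0.5})))  # ['b']
```

In the second run b wins the race, so a is preempted and c, which depends on a, cannot occur either.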

5.3.3 Probability of occurrence

From equation (5.8), we see that the occurrence of an event $e$ under any $\omega$ is determined by $\lfloor e \rfloor$ and the set of events $\{e' \mid e \nearrow e'\}$. In fact, the latter set can be further restricted to events $e'$ that are in minimal conflict $\nearrow_m$ with $e$. It is easy to see in equation (5.8) that if $e \nearrow e'$, $e \nearrow e''$ and $e' < e''$, then $H(e'', \omega)$ (and so $e''$) has no influence on the value of $occ(e, \omega)$. Thus the set of events which completely determine the occurrence of an event $e$ is a prefix of $E$ containing $e$, and which is closed under minimal conflict $\nearrow_m$. This is in fact the minimal stopping prefix $B(e)$ that contains $e$.

We are interested in calculating the probability that event $e$ occurs in an execution of $E$, i.e., we want to compute $P(Occ(e))$. Now

$$Occ(e) = \{\omega \mid occ(e, \omega)\} = \{\omega \mid e \in R(\omega)\}.$$

$Occ(e)$ can be partitioned into equivalence classes of runs in the following way: in any equivalence class $C$, any two runs $\omega_1, \omega_2$ are such that $R(\omega_1) \cap B(e) = R(\omega_2) \cap B(e)$. For any run $\omega \in C$, the set of events $R(\omega) \cap B(e)$ is the same, denoted by $\kappa_C$. Note that since $R(\omega)$ is a maximal configuration of $E$ and since $B(e)$ is a stopping prefix, from Lemma 5.7 we have that $\kappa_C$ is a maximal configuration of $B(e)$. In fact, for every maximal configuration $\kappa_e$ of $B(e)$ that contains $e$, there is an equivalence class $C$ such that $\kappa_C = \kappa_e$. Denote the set of equivalence classes of $Occ(e)$ by $Occ(e)/B(e)$. Thus

$$Occ(e) = \bigcup_{C \in Occ(e)/B(e)} C$$

and so

$$P(Occ(e)) = \sum_{C \in Occ(e)/B(e)} P(C).$$

Let $p(\kappa_C)$ denote each term of this summation. In fact, $p(\kappa_C)$ gives the probability that $\kappa_C$ is the maximal configuration of $B(e)$ in any run $\omega$. Thus

$$P(Occ(e)) = \sum_{\kappa_e \in \Theta_{B(e)}} p(\kappa_e). \tag{5.9}$$

where $\kappa_e$ is a maximal configuration of $B(e)$ that contains $e$. Note that equation (5.9) can only be computed when $B(e)$ is finite, i.e., $E$ is a locally finite event structure. We now need to compute all possible ways in which a maximal configuration can occur in $B(e)$. This can be done by considering the graph of configurations of $E$. This graph is in fact a Markov chain, where the states are the configurations $\kappa$ of $E$, and there is an arc from state $\kappa$ to state $\kappa'$ iff $\kappa' = \kappa \cup \{e\}$ for some event $e$ and $\kappa \prec \kappa'$. Intuitively, the arc represents a step in the execution of $E$, going from configuration $\kappa$ to $\kappa'$ due to the occurrence of event $e$.

The probability of taking the transition from $\kappa$ to $\kappa'$ (i.e., to $\kappa \cup \{e\}$) is equal to the probability that $e$ is the next event to occur after $\kappa$. Due to our race policy, this is equal to the probability that event $e$ has the least delay amongst all such possible 'next' events, that is, the events enabled in configuration $\kappa$. This set of enabled events, $enab(\kappa)$, is exactly the set of minimal events in the future $E^\kappa$. Since the delays of events $e$ are exponentially distributed with a rate parameter $\lambda_e$, the probability $P_{\kappa,\kappa'}$ to go from state $\kappa$ to state $\kappa' = \kappa \cup \{e\}$ is thus given by

$$P_{\kappa,\kappa'} = \frac{\lambda_e}{\sum_{e' \in enab(\kappa)} \lambda_{e'}}.$$

The initial state of the Markov chain is the minimal configuration $\bot$ and the maximal states are the maximal configurations of $E$. Let $prec(\kappa)$ denote the set of immediate predecessor states of $\kappa$ in the graph. We obtain $p(\kappa)$ recursively as:

$$p(\kappa) = \sum_{\kappa^* \in prec(\kappa)} p(\kappa^*) \cdot P_{\kappa^*,\kappa},$$

taking $p(\bot) = 1$.
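The recursion for p(κ) can be evaluated by pushing probability mass forward through the graph of configurations, one event at a time. The sketch below works on the whole of a small finite structure rather than on B(e). It assumes, as a reading of enab(κ), that an event is enabled when all its immediate predecessors are in κ and no event already in κ preempts it. All function names are hypothetical, and exact fractions are used so the result can be checked by hand.

```python
from fractions import Fraction


def enabled(kappa, preds, asym):
    """Assumed reading of enab(κ): events whose causes have all occurred
    and which are not preempted by any event already in κ."""
    return {
        e for e in preds
        if e not in kappa
        and preds[e] <= kappa
        and not any((e, x) in asym for x in kappa)
    }


def maximal_configuration_probabilities(preds, asym, rate):
    """Return {maximal configuration: probability} for the race policy."""
    p = {frozenset(): Fraction(1)}   # p(⊥) = 1
    frontier = [frozenset()]
    maximal = {}
    while frontier:
        next_level = {}
        for kappa in frontier:
            en = enabled(kappa, preds, asym)
            if not en:
                maximal[kappa] = p[kappa]
                continue
            total = sum(rate[e] for e in en)
            for e in en:
                succ = kappa | {e}
                # Transition probability: rate of e over the sum of enabled rates.
                p[succ] = p.get(succ, Fraction(0)) + p[kappa] * Fraction(rate[e]) / total
                next_level[succ] = None
        frontier = list(next_level)
    return maximal


def occurrence_probability(e, maximal):
    """P(Occ(e)): total probability of the maximal configurations containing e."""
    return sum((q for kappa, q in maximal.items() if e in kappa), Fraction(0))


# b preempts a (a ր b), with rates 1 for a and 3 for b.
preds = {"a": set(), "b": set()}
maximal = maximal_configuration_probabilities(preds, {("a", "b")}, {"a": 1, "b": 3})
print(occurrence_probability("a", maximal))  # 1/4
print(occurrence_probability("b", maximal))  # 1
```

Processing configurations level by level is valid because every transition adds exactly one event, so all predecessors of a configuration are handled before it is expanded.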

5.4 Conclusion

Asymmetric event structures (AES) are models of concurrent computation that are useful for modelling pre-emption between events. We have extended the notions of minimal conflict and branching cells to the case of asymmetric event structures. These notions were used in computing the probability of the occurrence of a given event in a timed extension of AES, under a race policy. In [BHR09] we have further investigated the question of computing the probability of an event being critical in an execution.


Chapter 6

Probabilistic QoS and soft contracts for transaction based Web services orchestrations



Abstract

Service level agreements (SLAs), or contracts, have an important role in web services. They define the obligations and rights between the provider of a web service and its client, with respect to the function and the Quality of Service (QoS). For Web service orchestrations, contracts are deduced by a process called QoS contract composition, based on contracts established between the orchestration and the called web services. These contracts are typically stated in the form of hard guarantees (e.g., response time always less than 5 msec). Using hard bounds is not realistic, however, and more statistical approaches are needed.

In this paper we propose using soft probabilistic contracts, which consist of a probability distribution for the considered QoS parameter; in this paper, we focus on timing. We show how to compose such contracts, to yield a global probabilistic contract for the orchestration. Our approach is implemented by the TOrQuE tool. Experiments on TOrQuE show that overly pessimistic contracts can be avoided and significant room for safe overbooking exists.

An essential component of SLA management is then the continuous monitoring of the performance of called web services, to check for violations of the agreed SLA. We propose a statistical technique for run-time monitoring of soft contracts.


6.1 Introduction

Web services and their orchestrations are now considered an infrastructure of choice for managing business processes and workflow activities over the Web infrastructure [vdAvH02]. BPEL [Bpe07] has become the industrial standard for specifying orchestrations. Numerous studies have been devoted to relating BPEL to mathematical formalisms for workflows, such as WorkFlow nets (WFnets) [vdA97], a special subclass of Petri nets, or the pi-calculus [PW05]. This has allowed developing analysis techniques and tools for BPEL [OVvdA+07, AFFK05], including functional aspects of contracts, as well as techniques for workflow mining from logs [vdAvDH+03]. Besides BPEL, the Orc formalism has been proposed to specify orchestrations, by W. Cook and J. Misra at Austin [MC07, KCM06]. Orc is a simple and clean academic language for orchestrations with a rigorous mathematical semantics. For this reason, our study in this paper relies on Orc. Its conclusions and approaches, however, are also applicable to BPEL.

Contract based QoS management When dealing with the management of QoS, contracts, in the form of Service Level Agreements (SLA) [BSC01], specify the commitments of each subcontractor with regard to the orchestration. Standards like Web Service Level Agreement (WSLA) [KL03] proposed by IBM allow for specifying (and monitoring) QoS parameters of web services through contracts. Though there is no such standardization for QoS parameters of web services, most SLAs commonly tend to have QoS parameters which are mild variations of the following: response time (latency); availability; maximum allowed query rate (throughput); and security. In this paper, we focus on response time.

From QoS contracts with sub-contractors, the overall QoS contract between the orchestration and its customers can be established. This process is called contract composition; it will be our first topic in this paper. Then, since contracts cannot rely only on trusting the sub-contractors, monitoring techniques must be developed for the orchestrator to be able to detect possible violation of a contract by a sub-contractor. This will be our second topic.

Hard versus Soft Contracts To the best of our knowledge, with the notable exception of [LSW01, HWTS07], all composition studies consider performance related QoS parameters of contracts in the form of hard bounds. For instance, response times and query throughput are required to be less than a certain fixed value and validity of answers to queries must be guaranteed at all times. When composing contracts, hard composition rules are used such as addition or maximum (for response times), or conjunction (for validity of answers to queries).

Whereas this results in elegant and simple composition rules, we argue that this general approach of using hard bounds does not fit reality well. Figure 6.1 displays a histogram of measured response times for a “StockQuote” web service which returns stock prices of a queried entity [XMe]. These measurements show evidence that the tail of the above distribution cannot be neglected. For example, in this histogram, percentiles of 90%, 95%, and 98% correspond to response times of 6,494 ms, 13,794 ms, and 23,506 ms respectively. Setting hard bounds in terms of response time would amount to selecting, e.g., the 98% percentile of 23,506 ms, leading to an overly pessimistic promise for this service.

Figure 6.1 – Measurement records for response times, for Web service StockQuote (histogram; horizontal axis: delays, from 0 to 3 × 10^4; vertical axis: number of occurrences, from 0 to 400).

In fact, users would find it very natural to “soften” contracts: a contract should promise, e.g., a response time in less than T milliseconds for 95% of the cases, validity in 99% of the cases, accept a throughput not larger than N queries per second for 98% of a time period of M hours, etc. This sounds reasonable but is not used in practice, partly because soft contracts based on a single percentile (e.g., 95% or 99% of the cases) as above lack composition rules. To cope with this difficulty, we propose soft contracts based on probability distributions. As we shall see, such contracts compose well.
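The lack of composition rules for single percentiles can be seen on a small simulation. The sketch below uses synthetic exponential response times (not the StockQuote measurements) for two services called in sequence, and compares the 95% percentile of the end-to-end delay with the sum of the two individual 95% percentiles. The function names and the chosen means are illustrative assumptions.

```python
import math
import random


def percentile(samples, q):
    """Empirical q-quantile (0 < q <= 1) using the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def sequential_percentiles(mean_a, mean_b, q=0.95, n=20000, seed=0):
    """Simulate two independent services called one after the other and
    return the q-percentiles of each service and of their total delay."""
    rng = random.Random(seed)
    a = [rng.expovariate(1.0 / mean_a) for _ in range(n)]
    b = [rng.expovariate(1.0 / mean_b) for _ in range(n)]
    total = [x + y for x, y in zip(a, b)]
    return percentile(a, q), percentile(b, q), percentile(total, q)


q_a, q_b, q_total = sequential_percentiles(mean_a=2.0, mean_b=3.0)
# Adding the two 95% bounds overestimates the 95% bound of the composition.
print(q_total < q_a + q_b)  # True
```

Knowing only the two percentiles is not enough to recover the percentile of the sum, whereas sampling from the full distributions, as done here, yields it directly.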

Soft Probabilistic Contract Composition Having agreed on SLA or contracts with the different sub-contractors, the orchestrator can then attach a probability distribution to the considered QoS parameters. If a combined executable functional-and-QoS model of the orchestration is available, it is then possible to compute the probability distribution of the same QoS parameter for the orchestration.

Such a combined functional-and-QoS model of the orchestration requires enhancing orchestration specifications with QoS attributes seen as random variables. This, however, is by itself not enough in general. More precise information regarding causal links relating events is needed. For example, latencies are added among events that are causally related, not among concurrent events. Thus, we need to make causality, concurrency, and sequencing in the orchestration explicit in a precise way, which amounts to representing orchestrations as partial orders of events. Some mathematical models of orchestrations provide this, e.g., the partial order semantics of WorkFlow nets [vdA97]. Our group has developed a tool, TOrQuE (Tool for Orchestration Quality of Service evaluation), that directly produces executions as partial orders from an Orc program. The results reported here were obtained by this tool.

Soft Probabilistic Contract Monitoring An essential component of SLA management is the run-time monitoring of contracts. SLA monitoring must be continuous to detect possible SLA violations in a timely manner. In case of a violation, the called service may have to incur some agreed penalty. Alternatively, if the service is called by an orchestrator, the orchestrator might consider reconfiguring the orchestration to call an alternative service. The monitoring of probabilistic contracts requires using methods from statistics. We propose using statistical testing to check if the observed performance deviates from the performance promised in the contract.

Organization of the paper In section 6.2.1 we present an example of an orchestration, which is then used to illustrate the primary challenges involved in QoS studies of web services and their compositions. The example is also used in our experiments. In section 6.3, we present our general approach for contract composition and describe the TOrQuE tool supporting it. The simulations on contract composition, which show a potential for overbooking, are given in section 6.4. In section 6.5 we introduce our technique for monitoring soft contracts. The experiments done on monitoring are reported in section 6.6. Section 6.7 gives a survey of the existing literature on QoS-enabled WS composition. Finally, section 6.8 presents conclusions and outlooks.

6.2 QoS issues in web services and their compositions

In this section we will explain the main challenges faced in QoS studies of web services and their compositions. From this we will draw conclusions regarding how QoS studies should be performed for web services orchestrations. This is done with the help of a sample orchestration, CarOnLine, which we will present first. The CarOnLine example, which was developed in the SWAN project [SWA], is also used in our experiments with the TOrQuE tool.

6.2.1 Example of an orchestration

CarOnLine is a composite service for buying cars online, together with credit and insurance. A simplified graphical view of it is shown in Figure 6.2.

Figure 6.2 – A simplified view of the CarOnLine orchestration. The calls to GarageA and GarageB are guarded by a timer that returns a “Fault” message whenever the timeout occurs (this is not shown in the figure). In the discussion in section 6.2.2 regarding “monotonicity”, the test car = deluxe is changed to p ≥ limit.

On receiving a car model as an input query, the CarOnLine service first sends parallel requests to two car dealers (GarageA, GarageB), getting quotations for the car. The calls to each garage are guarded by a timer, which stops waiting for a response once the timeout occurs. If a timeout occurs, the response of the call is a Fault value. The best offer is chosen by the (local) function Mux, which returns the minimum non-faulty value. If both timeouts occur, Mux returns a Fault. Credit and insurance are found in parallel for the best offer. Two banks (AllCredit, AllCreditPlus) are queried for credit rates and the one offering the lower rate is chosen. For insurance, if the car belongs to the deluxe category, any insurance offer by service GoldInsure is accepted. If not, two services (InsurePlus, InsureAll) are called in parallel and the one offering the lower insurance rate is chosen. In the end, the (car-price (p), credit-rate (c), insurance-rate (i)) tuple is returned to the customer.

CarOnLine(car) ≜ CarPrice(car) >p> let(p, c, i)
                     where c :∈ GetCredit(p)
                           i :∈ GetInsur(p, car)

CarPrice(car) ≜ Mux(p1, p2)
                    where p1 :∈ (NetGA ≫ GarageA(car)) | Timer(T)
                          p2 :∈ (NetGB ≫ GarageB(car)) | Timer(T)
                >p> if(p ≠ Fault) ≫ let(p)

GetCredit(p) ≜ Min(c1, c2)
                   where c1 :∈ NetC ≫ AllCredit(p)
                         c2 :∈ NetCP ≫ AllCreditPlus(p)

GetInsur(p, car) ≜ if(car = deluxe) ≫ GoldInsure(p)
                 | ifnot(car = deluxe) ≫ min(ip, ia)
                       where ip :∈ InsurePlus(p)
                             ia :∈ InsureAll(p)

Table 6.1 – CarOnLine in Orc.

The Orc program for CarOnLine is given in Table 6.1. We chose to use Orc because it is an elegant language equipped with formal semantics [KCM06, RKB+07b]. Orc defines three basic operators.

For Orc expressions f, g, “f | g” executes f and g in parallel. “f >x> g” evaluates f first and, for every value returned by f, launches a new instance of g with variable x assigned to this return value; in particular, “f ≫ g” (which is a special case of the former where returned values are not assigned to any variable) causes every value returned by f to create a new instance of g. “f where x :∈ g” executes f and g in parallel. When g returns its first value, x is assigned to this value and the computation of g is terminated. All site calls in f having x as a parameter are blocked until x is defined (i.e., until g returns its first value).
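The behaviour of “where x :∈ g” combined with parallel composition can be imitated with Python's asyncio. The sketch below is only an analogy and not an Orc implementation: it mimics the timer-guarded garage calls of CarPrice with simulated sites, and it does not cover the “>x>” operator, which may launch several instances of g. The names `site`, `first_of`, `guarded_quote`, `mux` and `car_price` are hypothetical.

```python
import asyncio

FAULT = "Fault"


async def site(value, latency):
    """Stand-in for a remote site that answers `value` after `latency` seconds."""
    await asyncio.sleep(latency)
    return value


async def first_of(*calls):
    """Mimic `x :∈ (f | g)`: run the calls in parallel, keep the first value
    that is returned and terminate the remaining computations."""
    tasks = [asyncio.create_task(c) for c in calls]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # If several calls finished together, prefer the earliest in argument order.
    return min(done, key=tasks.index).result()


async def guarded_quote(price, latency, timeout):
    """GarageX(car) | Timer(T): the quote if it arrives before the timer, else Fault."""
    return await first_of(site(price, latency), site(FAULT, timeout))


def mux(p1, p2):
    """Minimum non-faulty value, or Fault when both calls timed out."""
    quotes = [p for p in (p1, p2) if p != FAULT]
    return min(quotes) if quotes else FAULT


async def car_price(quote_a, quote_b, timeout):
    # Both guarded calls run in parallel; mux needs both values to be defined.
    p1, p2 = await asyncio.gather(
        guarded_quote(*quote_a, timeout), guarded_quote(*quote_b, timeout)
    )
    return mux(p1, p2)


# Each quote is a (price, latency in seconds) pair.
print(asyncio.run(car_price((100, 0.01), (90, 0.02), timeout=0.2)))  # 90
print(asyncio.run(car_price((100, 0.01), (90, 0.5), timeout=0.1)))   # 100
```

Cancelling the pending task plays the role of terminating g once its first value has been assigned to x.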

CarPrice calls GarageA and GarageB in parallel for quotations. Calls to these garages are guarded by a timer site Timer, which returns a fault value T time units after the calls are made. The let site simply returns the values of its arguments; sites can only execute when all their parameters are defined and thus can be used to synchronize parallel threads. The value returned by CarPrice (here the variable p) is passed as an argument to GetCredit and GetInsur, which find credit and insurance rates for the price in parallel. The service NetGA in NetGA ≫ GarageA(car) is a dummy service that captures the contribution of the network to the response time of GarageA as perceived by the orchestration. No such call occurs in GetInsur. This is because the orchestration does not enter into contracts with the insurance sites, which are assumed to be freely available. The absence of a contract requires estimating the performance of the insurance sites and of the associated network. This is discussed in the next section.


6.2.2 QoS Issues for web service Orchestrations

With the help of CarOnLine, we now discuss how the QoS issues for service orchestrations differ from traditional QoS studies.

6.2.2.1 Flow may be data dependent

In the GetInsure component of CarOnLine, there are two exclusive ways of getting insurance quotes for a car: either by calling GoldInsure or by calling InsureAll and InsurePlus in parallel. The choice of branch depends on the value of the parameter “car”. In most orchestrations, the execution flow depends on the values of the data parameters, which are unknown a priori. Thus, by changing the execution flow, data values in an orchestration can directly affect its QoS.

6.2.2.2 Flow may be time dependent

In the CarPrice component of CarOnLine, the calls to GarageA and GarageB are guarded by a timer. Depending on whether or not the garages respond before the timeout occurs, the orchestration may take different execution paths, which directly affects its performance. Thus the presence of timers in an orchestration can also alter its control flow.

6.2.2.3 Orchestrations may not be “monotonic”

An implicit assumption in contract based QoS management is: “the better the component services perform, the better the orchestration’s performance will be.” Surprisingly, this property, which we called “monotonicity” [BRBH09], can easily be violated, meaning that the performance of the orchestration may improve when the performance of a component service degrades. This is highly undesirable since it can make the process of contract composition inconsistent. A contract based approach needs monotonicity.

Consider the CarOnLine orchestration of Figure 6.2, but slightly modified. The condition “car = deluxe” for deciding calls to insurance services is changed as follows: if the best price returned by the garages is p, then GoldInsure is called if p ≥ limit, where limit is a certain constant value. If p < limit, InsurePlus and InsureAll are called in parallel. Assume that the credit services AllCredit and AllCreditPlus respond extremely fast (almost 0 time units), so that the response time of the orchestration depends only on the response times of the garage and insurance services. Let the response times of the garage and insurance services GarageA, GarageB, GoldInsure, InsureAll and InsurePlus be δA, δB, δG, δI1 and δI2 respectively. Also assume that the price quotes p of GarageA are always greater than limit and that the price quotes of GarageB are always less than limit. Now, the overall orchestration response time is δO = max(δA, δB) + max(δI1, δI2), assuming that both δA and δB are less than the timeout value T.

Suppose that the performance of GarageB now deteriorates, and it does not respond before the timeout T. GarageA’s price quote is now the best quote. Since we assumed that the quotes of GarageA are always greater than limit, GoldInsure is called and the orchestration’s latency is δO′ = T + δG. When δG ≪ max(δI1, δI2), it is possible that δO′ < δO. In other words, a deterioration in the performance of GarageB could lead to an improvement in the performance of the orchestration.
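The counterexample can be checked numerically. The following Python sketch is only an illustration: the function name `modified_latency` and all delay values are hypothetical, and it assumes, as in the text, negligible credit-service delays and a GarageA that always answers before the timeout.

```python
def modified_latency(d_a, d_b, d_gold, d_i1, d_i2, timeout):
    """Latency of the modified CarOnLine, under the assumptions of the text.

    GarageA always quotes above `limit` and answers before the timeout;
    GarageB always quotes below `limit`; credit services take zero time.
    """
    if d_b < timeout:
        # GarageB's quote is the best (p < limit): InsureAll and InsurePlus
        # are called in parallel after both garages have answered.
        return max(d_a, d_b) + max(d_i1, d_i2)
    # GarageB timed out: GarageA's quote wins (p >= limit), GoldInsure is called.
    return timeout + d_gold


# Illustrative values only (arbitrary time units).
before = modified_latency(d_a=2.0, d_b=3.0, d_gold=1.0, d_i1=6.0, d_i2=7.0, timeout=5.0)
after = modified_latency(d_a=2.0, d_b=9.0, d_gold=1.0, d_i1=6.0, d_i2=7.0, timeout=5.0)
print(before, after)  # 10.0 6.0
```

With these values GarageB becomes slower (3.0 to 9.0) while the orchestration becomes faster (10.0 to 6.0), which is the non-monotonic behavior described above.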

Such a pathological situation does not occur in our original example since the response time of GetInsur depends only on the external parameter car. Once car is fixed, response times behave in a monotonic way. Thus, our example is monotonic.


Of course, it may not be considered fair to compare the different situations on the sole basis of time performance, since they do not return the same data. A call always immediately returning “nothing found” will have the best timing performance, but is clearly not satisfactory from the user’s viewpoint.

Further results regarding monotonicity can be found in [BRBH09]. To conclude on this aspect, we believe that monotonicity should be considered from a broader perspective, taking into account both timing and other QoS parameters, as well as data.

6.2.2.4 Orchestrations face the Open World paradigm

The actors affecting the QoS of a web service orchestration are:

• the orchestration server;

• the web services called by the orchestration;

• the transport network infrastructure.

All these actors contribute to the overall QoS characteristics of the orchestration. Therefore, to be able to offer QoS guarantees, the orchestration needs QoS data from the other two types of actors.

In the context of networks, QoS studies assume knowledge of end-to-end resources and traffic, and use these to predict or estimate end-to-end QoS [FLBT+02]. This can, for example, be used for evaluating the end-to-end performance of streaming services supported by a dedicated cross-domain VPN. This is possible because, once defined and deployed, the considered VPN has knowledge of its own resources and traffic, which is enough to evaluate the QoS offered to the considered streaming service.

For our case of web services orchestrations, however, the situation is different:

• The orchestration has knowledge about the resources of its own server architecture. It knows the traffic it can support, and it can monitor and measure its own ongoing traffic at a given time.

• The resources and extra traffic for each called web service are not known to the orchestration; other users of these sites belong to the “open world” and the orchestration is simply unaware of their existence.

• The resources and extra traffic for the transport network infrastructure are not known to the orchestration; other traffic belongs to the “open world” and the orchestration is simply unaware of it.

Due to the issues discussed above, traditional QoS techniques are not very appropriate when applied to the study of QoS in web services orchestrations. Contracts have emerged as the adequate paradigm for QoS of orchestrations and, more generally, of composite web services in open world contexts.

6.2.3 Conclusions drawn from this discussion

From the above analysis, the following conclusions emerge regarding how QoS studies should be performed for web services orchestrations:

• To ensure consistency of QoS studies, we must only consider monotonic orchestrations, that is, orchestrations such that, if the QoS of some called service improves, then so does that of the orchestration itself. Conditions ensuring monotonicity are found in [BRBH09]. Our CarOnLine example is monotonic.

• Since, for general orchestrations, control flow may be data- and time-dependent, analytical techniques for performance studies (such as those typically used for networks [FLBT+02]) do not apply. One may consider restricting ourselves to finite data types and discrete domains for real-time, but then the computational cost of evaluating the QoS of the orchestration in all configurations may become prohibitive. This is why we chose to rely on simulation techniques. Of course, such simulations must take into account both data and QoS aspects.

• Because of the “open world” paradigm, QoS evaluation cannot rely on a joint model of resources and traffic for the web services called by the orchestrator. The contribution of each called web service to the QoS of the orchestration must then be abstracted in some way. In our open world, this relies on a notion of trust between the partners (the orchestration on one hand, and the called services on the other), formalized as an SLA. An SLA here is a contract about QoS, relating the orchestration to the services it calls. In this approach, the orchestration has no means of being sure that such an SLA is faithfully respected. Therefore, run-time monitoring of such contracts for possible violation is needed.

As advocated in the introduction, we decided to work with soft probabilistic contracts. Then, for the above mentioned reasons, we chose to resort to Monte-Carlo simulations to compose contracts and tune our monitoring algorithms. As this is a first study of this subject, we left aside the issue of implementing efficient Monte-Carlo simulations, e.g., by using importance sampling [SSG97].

In the following sections, we shall study contract composition, i.e., how the orchestration’s contract relates to the contracts established with the different called services, seen as sub-contractors. Then, we shall study contract monitoring, i.e., the monitoring of sub-contractors for possible QoS contract violation.

6.3 Contract Composition and the TOrQuE tool

6.3.1 How to establish Probabilistic Contracts and how to compose them

In general, the orchestration will establish contracts or SLAs with the web services it is calling. A called web service S is referred to as a sub-contractor in the sequel. The contract with S for the considered QoS parameter has the form of a cumulative distribution function

FS(x) = P(δS ≤ x), (6.1)

where δS is the random QoS parameter (here the response time), and x ranges over the domain of this QoS parameter (here R+).1

Regarding transport, different approaches might be considered. In a first “agnostic” approach, the orchestration does not contract regarding transport, because it does not want to know which network domains its traffic may traverse. If QoS information regarding the transport layer is still wanted, it can be coarsely estimated by sending “pings” to the considered site. In another approach, the orchestrator may want to contract with the network service provider (e.g., as part of Virtual Private Network guarantees of service), very much in the way contracts are established with called web services.

1 In practice, FS will be abstracted by either a finite set of quantiles (FS(x1), . . . , FS(xK), for a fixed family x1, . . . , xK of values for the QoS parameters) or a finite set of percentiles (e.g., the set of values y1, . . . , y9 such that FS(y1) = 10%, . . . , FS(y9) = 90%). Such contracts are easily expressible in terms of the WSLA standard [KL03].

Finally, some web services, such as Google, may address huge sets of users and would therefore not enter into a negotiation process with any orchestration. The response-time distribution of such sites can be estimated on the basis of measurements.

To summarize, in designing contracts with its own customers, the orchestration: 1) uses the contracts it has agreed upon with its subcontracting web services, 2) may estimate QoS parameters for other web services it is using, and 3) may estimate QoS parameters for transport.

Based on this approach, we have developed the following Monte-Carlo procedure for QoS contract composition. This procedure is applied at design time:

• Contracts with the called sites have the form of probability distributions for the considered QoS parameters. From these, we draw successive outcomes for the tuples (response to queries, associated QoS parameters). If no contract is available for a given site, we replace the missing probability distribution by empirical estimates of it, based on QoS measurements.

• Using a partial order execution model for the orchestration, we run Monte-Carlo simulations of the orchestration involving independent successive trials for the random latencies, thus deriving empirical estimates for the global QoS parameters of the orchestration.

• Having these empirical estimates, we can properly select quantiles defining soft contracts for the end user.
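The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not the TOrQuE implementation: the function names, the two sites "A" and "B", their distributions, and the parallel-call orchestration are all hypothetical.

```python
import math
import random


def empirical_quantile(samples, p):
    """Smallest sample x such that at least a fraction p of the samples are <= x."""
    ordered = sorted(samples)
    index = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[index]


def simulate_delays(samplers, latency, runs, rng):
    """Independent trials: each one draws a delay per site and composes them."""
    delays = []
    for _ in range(runs):
        draw = {site: sample(rng) for site, sample in samplers.items()}
        delays.append(latency(draw))
    return delays


# Hypothetical inputs: site "A" has a contract (here an exponential law with a
# mean of 500 ms); site "B" has no contract, so measured delays are re-sampled.
measured_b = [310.0, 295.0, 420.0, 350.0, 1200.0, 330.0]
samplers = {
    "A": lambda rng: rng.expovariate(1.0 / 500.0),
    "B": lambda rng: rng.choice(measured_b),
}


def latency(draw):
    """Hypothetical orchestration: A and B are called in parallel, then joined."""
    return max(draw["A"], draw["B"])


delays = simulate_delays(samplers, latency, runs=10_000, rng=random.Random(0))
soft_contract_bound = empirical_quantile(delays, 0.95)  # bound offered for 95% of calls
```

The quantile level (95% here) is the free parameter of the last step; choosing it is a negotiation matter rather than a computation.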

6.3.2 The TOrQuE tool

The TOrQuE (Tool for Orchestration simulation and Quality of service Evaluation) tool implements the above methodology. Its overall architecture is shown in Figure 6.3. The steps involved in the QoS evaluation and the TOrQuE modules that perform them are commented next.

Figure 6.3 – Overall architecture of the TOrQuE tool. [Diagram; recoverable block labels: Time Stamper, measurements, random generator, batch-wise offline processing, SLA Design, Trace Reconstructor.]

The orchestration model. To ease the development of this tool, we decided to replace the (complex) BPEL standard for specifying web services orchestrations by a lightweight formalism called Orc [MC07]. The authors of this formalism have developed a tool [CM] which can animate orchestrations specified in Orc.


Getting QoS enhanced partial order models of executions. This is performed by the “Trace Reconstructor” module. Jointly with the authors of Orc, we have developed an alternative mathematical semantics for Orc in terms of event structures [RKB+07b]. Event structures [BCM01] provide the adequate paradigm for deriving partial order models of Orc executions, in which causality and concurrency relationships between the different events of the orchestration are made explicit. Partially ordered executions can be tagged with QoS parameters which can then be composed. For example, Figure 6.4 shows how the response time of a fork-join pattern is computed from that of its individual events. These max-plus rules are used to combine delays in the partial order. The QoS parameter tagging of the partially ordered executions and their composition is implemented in TOrQuE’s trace reconstructor module (see Figure 6.3). Arbitrary patterns encountered in Orc specifications can be handled by this module.

Figure 6.4 – Deriving response time for a fork-join pattern. The “Fork” and “Join” are the branching and synchronization events, S1 and S2 are two web services called in parallel. δa denotes the time taken for event a to execute. [Diagram: Fork at t1 = δfork; Call S1 at t2 = t1 + δS1; Call S2 at t3 = t1 + δS2; Join at t6 = max(t2, t3) + δjoin.]
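The max-plus rule of Figure 6.4 extends to any partially ordered execution: an event completes at the latest completion time of its causal predecessors plus its own delay. The Python sketch below illustrates this; the function name, the dictionary encoding of the partial order, and the delay values are assumptions made for the example, not the data structures of the Trace Reconstructor.

```python
from graphlib import TopologicalSorter


def completion_times(predecessors, delay):
    """Max-plus composition of delays over a partial order.

    predecessors: maps each event to the events that causally precede it.
    delay: maps each event to its own execution time.
    """
    times = {}
    # static_order() yields every event after all of its predecessors.
    for event in TopologicalSorter(predecessors).static_order():
        start = max((times[p] for p in predecessors.get(event, ())), default=0.0)
        times[event] = start + delay[event]
    return times


# The fork-join pattern of Figure 6.4, with illustrative delays.
predecessors = {"fork": [], "S1": ["fork"], "S2": ["fork"], "join": ["S1", "S2"]}
delay = {"fork": 1.0, "S1": 4.0, "S2": 2.5, "join": 0.5}
times = completion_times(predecessors, delay)
print(times["join"])  # max(1.0 + 4.0, 1.0 + 2.5) + 0.5 = 5.5
```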

Figure 6.5 – A labelled event structure collecting all possible executions of CarOnLine, as generated by our tool. The three dangling arcs from the shaded places are followed by copies of the boxed net. The aim of the figure is to show the partial order structure. Zooming in on the electronic version reveals the detailed labels of the transitions, as generated from the detailed Orc specification. [Diagram; event labels include calls and returns for GarageA, GarageB, MyTimer, Mux, ifnotfault, ifgt, ifle, GoldInsur, InsurPlus, InsurAll, AllCredit, AllCreditPlus and Min, and the publish action (!).]


Figure 6.5 shows a diagram of the event structure corresponding to the CarOnLine program written in Orc. The event structure is generated by our tool and it collects all the possible executions of CarOnLine, taking into account timers and other interactions between data and control. Each execution has the form of a partial order and can be analyzed to derive appropriate QoS parameter composition, for each occurring pattern. Each site call to a service M is translated into three events, the call (M), the call return (?M) and the publish action (!), which adds to the length of the structure. For more information regarding these event structures, the reader is referred to [BRBH09].

Drawing at random, samples of QoS parameters for the called sites. This is performed by the “Time Stamper” module. To perform Monte-Carlo simulations using the Trace Reconstructor, we need to feed it with actual values for the QoS parameters. For the called sites, these values should be representative of the contracts established between them and the orchestration. This is achieved by drawing such parameters at random from the probability distribution specified in each contract.

If no contract is available with a given site, the needed probability distribution may alternatively be estimated from measurements. For example, calling the considered site a certain number of times and recording the response times provides an empirical distribution that can be re-sampled by simple bootstrapping techniques [DH97]. The Time Stamper module supports both techniques: sampling from the contract’s probability distribution or bootstrapping measured values.
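The two sampling modes can be sketched as follows. This is an assumption-laden illustration rather than the Time Stamper itself: the function names are hypothetical, the contract is assumed to be available as an inverse CDF (so that inverse-transform sampling applies), and the measurements and the exponential contract are invented for the example.

```python
import math
import random


def bootstrap_sampler(measurements, rng):
    """Draw delays uniformly, with replacement, from measured values."""
    values = list(measurements)
    if not values:
        raise ValueError("at least one measurement is required")
    return lambda: rng.choice(values)


def contract_sampler(inverse_cdf, rng):
    """Inverse-transform sampling: if U is uniform on [0, 1), inverse_cdf(U) follows F."""
    return lambda: inverse_cdf(rng.random())


rng = random.Random(42)
measured = [812.0, 905.0, 1010.0, 1430.0, 880.0]  # hypothetical measurements (ms)
draw_measured = bootstrap_sampler(measured, rng)
# Hypothetical contract: exponential response time with a mean of 1000 ms.
draw_contract = contract_sampler(lambda u: -1000.0 * math.log(1.0 - u), rng)

stamps = [(draw_measured(), draw_contract()) for _ in range(5)]
```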

Exploiting results from Monte-Carlo simulations to set contracts for the orchestration. This is performed by the “SLA Design Unit”, which is mainly a GUI module that displays simulation logs and histograms or empirical distributions of the QoS parameters and allows selecting appropriate quantiles.

6.3.3 Discussion on criticality

At first sight, not all sites in an orchestration have an equal impact on the QoS of the orchestration. Some sites may be critical, in that a slight degradation/improvement in their performance will directly result in a degradation/improvement in the performance of the overall orchestration. Other sites may not be critical: a degradation in their performance would not affect the performance of the orchestration very much.

To address this in the context of classical timing performance studies, e.g., for scheduling purposes, the notion of critical path was proposed. However, this notion must be revisited under our probabilistic approach.

For instance, consider the example of Figure 6.4. We have t6 = t1 + max(δS1, δS2) + δjoin, so it seems that only the “slowest” of the two sites S1 and S2 matters. This is a wrong intuition, however. Assume that the two sites S1 and S2 behave independently from the probabilistic point of view. Setting δ = max(δS1, δS2), Fi(x) = P(δSi ≤ x), and F(x) = P(δ ≤ x), we have F(x) = F1(x) × F2(x). Next, suppose that the two sites S1 and S2 possess unbounded response times. Thus, for any x > 0 we have 0 < Fi(x) < 1 for i = 1, 2. In this case, since F(x) = F1(x) × F2(x), any change in F1 or F2 will result in a change in F. Thus, both sites S1 and S2 are equally critical, even if, say, F1(x) > F2(x) for every x, meaning that there are good chances that S1 will respond faster. Of course, if F1 and F2 possess disjoint supports, meaning that there exists some separating value xo such that F2(xo) = 0 but F1(xo) = 1, then we know that δS1 < δS2 will hold with probability 1, so that S1 is never on the critical path.
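A small numerical illustration of this argument, in Python, under the assumption (made only for the example) that both sites have exponentially distributed response times; the function names and the means are hypothetical. Degrading only the usually faster site S1 still lowers F(x), and a Monte-Carlo estimate agrees with the product formula.

```python
import math
import random


def exp_cdf(x, mean):
    """CDF of an exponential response time with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


def max_cdf(x, mean1, mean2):
    """CDF of max(delta_S1, delta_S2) for independent sites: F(x) = F1(x) * F2(x)."""
    return exp_cdf(x, mean1) * exp_cdf(x, mean2)


x = 5.0
before = max_cdf(x, mean1=1.0, mean2=5.0)  # S1 is usually much faster than S2
after = max_cdf(x, mean1=1.5, mean2=5.0)   # only the "fast" site S1 degrades

# Monte-Carlo cross-check of the product formula for the first configuration.
rng = random.Random(1)
runs = 20_000
hits = sum(
    max(rng.expovariate(1.0), rng.expovariate(1.0 / 5.0)) <= x for _ in range(runs)
)
estimate = hits / runs
```

Here `after < before`: the probability of answering within x drops although S2, the slower site on average, is unchanged.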


This discussion justifies that all sub-contractors are individually monitored for possible contract violation, as they all have an impact on the overall orchestration QoS in general (see Section 6.5 regarding monitoring).

6.4 Experimental Results for Contract Composition: opportunities for overbooking

In this section we report the results obtained on the composition of contracts, from the simulations of the TOrQuE tool. The results show possibilities for overbooking and validate our approach of using probabilistic contracts.

In orchestrations, exceptions and their handling are frequently part of the orchestration specification itself. In addition, collecting measurement data from existing web services regarding this type of parameter is difficult (actually, in our experiments, no exceptions were observed). For these two reasons, we did not include exceptions in our simulation study.

6.4.1 Approach

Probabilistic contracts for the sites

The sites in the CarOnLine example were not implemented as real services over the Internet. In order to assign realistic delay behavior to these sites during the simulations, we associated their behavior with that of actual web services over the Internet. For this, we measured response times of calls to these actual web services. The recorded response times were used in a bootstrap mode and also to fit distributions which would be sampled during simulations.

We considered six different web services for this purpose [XMe]: StockQuote, which returns stock prices for a queried enterprise; USWeather, which gives the weather forecast of a queried city for a week from the day of the call; CongressMember, which returns the list of the members of the US Congress; Bushism, which returns a random quote of George W. Bush; Caribbean, which returns information related to tourism in the Caribbean; and XMethods, which queries a database of existing web services over the web. We made 20,000 calls to each of these six web services and measured their response times. The calls were made in sequence, a new call being made as soon as the previous call responded. We could roughly group these services into three categories based on their response times:

• Fast: The service Caribbean, with response times in the range 60-100 ms, and the CongressMember service, with response times between 300 and 500 ms.

• Slow: The service StockQuote, which typically responded in between 2 and 8 seconds.

• Moderate: The services USWeather, XMethods and Bushism, with response times in the 800-2000 ms range.

Fitting distributions on measured data

To validate the use of certain families of distributions, we performed their best fit on the measured data. When applied to the measured response times of the six different web services, we observed that T location-scale distributions served as good approximations in most cases. Moreover, Gamma and Log-Logistic distributions [LR05] were also reasonably good fits for the response times. Figure 6.6 shows the results of the fit of a T location-scale distribution on the response times of the service USWeather.

Figure 6.6 – Fitting of a T location-scale distribution on the plot of 20,000 measured delays of the service USWeather.

While the quality of fit is reasonably good, this point is not central to our study. We only see the use of certain families of distributions as an alternative to bootstrap techniques, when measurements are not available. In general, however, we prefer using bootstrapping techniques.

Orchestration Engine Overhead

The events of an orchestration can be seen as being of one of two types: 1) service call events, which are calls to external sites; 2) events internal to the orchestration, implementing its processing and coordination actions. Depending on the relative cost (in terms of execution time) of these events, the following scenarios can be considered:

• Zero delay: The delay due to the internal events is zero (or negligible) when compared to that of the site calls. In this case, the overall delay of the orchestration depends solely on the response times of the services it calls.

• Non-zero delay: The delays of the internal events are non-zero and comparable to the delays of site calls.

Since the performance of our prototype cannot be regarded as representative of that of a real orchestration engine, we considered only the first scenario.

6.4.2 Simulation results

All the measurements and simulations were performed on a 2 GHz Pentium dual core processor with 2 GB of RAM. We consider two cases of simulations, depending on the timeout value T for the calls to the garages (see site Timer(T) in Table 6.1): 1) no timeout (equivalently, T is infinite); 2) T is a finite value, which is less than the maximum response time of a garage.


Case 1: No timeouts

Based on the way delays of site calls are generated, we performed two types of simulations: those in which delays are generated by 1) bootstrapping measured values, 2) sampling a T location-scale distribution previously fitted to measured data.

Bootstrap based Simulations. In these simulations, we associated each service in the CarOnLine example with the delay behavior of one of the six web services mentioned previously. The associations are shown in Table 6.2 and the cumulative distribution functions of the observed response times for each of the called services are shown in Figure 6.7. During any run of CarOnLine, the response time of a call is picked uniformly from the set of 20,000 delay values of its associated site. Since the response times of these services were measured from the client’s side, they include the network’s delay too. We therefore do not consider the explicit delays modeled by the sites NetGA, NetGB, NetC and NetCP, and give each of them zero delay. (If the contracts modeled only the performance from the server’s perspective, without accounting for the network, we could assign delays to each of these sites according to pings sent to the web services.)

Site            Service
GarageA         USWeather
GarageB         Bushism
AllCredit       XMethods
AllCreditPlus   StockQuote
GoldInsure      Caribbean
InsureAll       CongressMembers
InsurePlus      CongressMembers

Table 6.2 – Response time associations for sites in CarOnLine

Figure 6.7 – Cumulative distribution function for the measured delays of the six web services. [Plot: CDF versus delay (msec), from 0 to 14,000 msec; curves for GarageA, GarageB, AllCredit, AllCreditPlus, GoldInsure, InsureAll and InsurePlus.]


Results using hard contracts

Consider the following “hard contract” policy, which is close to the current state of practice. Contracts have the form of a certain quantile, e.g.: “the response time shall not exceed x ms in y% of the cases.”

More precisely, let contracts of the orchestration with a site be of the form

P(δi ≤ Ki) ≥ pi (6.2)

where i = 1, ..., m ranges over the sites involved in the orchestration, δi is the response time of site i, Ki is the promised bound of site i, and pi is the corresponding probability (so that δi ≤ Ki holds in y% of the cases, where y = 100 × pi). Assuming the called sites to be probabilistically independent, what the orchestration can guarantee to its clients is

$$ P(\delta \le K) \;\ge\; \prod_{i=1}^{m} p_i \tag{6.3} $$

where δ is the response time of the orchestration and K is the max-plus combination of the Ki’s, according to the orchestration’s partial ordering of call events.

By setting the delay contracts (maximum delay values) of each of the sites involved in CarOnLine to their 99.2% quantile values, we get the end-to-end orchestration delay bound to be K = 44,243 ms, which can be guaranteed for 94.53% of the cases.
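The probability side of bound (6.3) is a one-line computation. The sketch below (the function name is hypothetical) shows that the 94.53% figure is what one obtains for the seven sites listed in Table 6.2 when each meets its bound with probability 0.992, assuming independence as in the text.

```python
import math


def hard_contract_guarantee(probabilities):
    """Lower bound on P(delta <= K) for independent sites, as in (6.3)."""
    return math.prod(probabilities)


# Seven sites, each contracted at its 99.2% quantile.
guarantee = hard_contract_guarantee([0.992] * 7)
print(f"{100 * guarantee:.2f}%")  # 94.53%
```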

Results using probabilistic soft contracts

We now compare the above results with our approach using probabilistic contracts. To this end, we performed 100,000 runs of the orchestration in the bootstrap mode. The empirical distribution of end-to-end delays of the orchestration is shown in Figure 6.8. The minimum delay observed in this case is 1,511 ms and the maximum is 369,559 ms. The 94.53% delay quantile of this distribution is 23,189 ms, to be compared with the more pessimistic value of 44,243 ms that we obtained using the usual approach.

Figure 6.8 – Empirical distribution of end-to-end orchestration delays for 100,000 simulations in the bootstrapping case. [Plot: CDF versus delay (msec), from 0 to 25,000 msec.]


T Location-Scale Sampling based Simulations. In this mode of simulation, T location-scale distributions are sampled to generate delay values for site calls. The delay values of the six web services were fitted with a T location-scale distribution, giving the estimated µ, σ and ν parameters of the distribution. The pdf for this distribution is:

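$$ p(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sigma\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left[\frac{\nu + \left(\frac{x-\mu}{\sigma}\right)^{2}}{\nu}\right]^{-\left(\frac{\nu+1}{2}\right)} $$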

The association between the sites of CarOnLine and the web services remains unchanged, as given in Table 6.2. The parameter ν of the fitted T location-scale distribution for each of the sites is given in Table 6.3.

Site            Param ν
GarageA         2.678
GarageB         0.835
AllCredit       0.297
AllCreditPlus   2.258
GoldInsure      1.338
InsureAll       0.835
InsurePlus      0.835

Table 6.3 – Parameter ν of the fitted T location-scale distributions
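For reference, the T location-scale density and a way of sampling it can be written with the Python standard library alone. This is a sketch, not the code used for the experiments: the function names are hypothetical, and the values of µ and σ below are invented, since only ν is reported in Table 6.3. Sampling uses the standard representation of a Student t variate as a normal variate divided by the square root of a scaled chi-square variate.

```python
import math
import random


def t_location_scale_pdf(x, mu, sigma, nu):
    """Density of the T location-scale distribution with parameters mu, sigma, nu."""
    z = (x - mu) / sigma
    norm = math.gamma((nu + 1) / 2) / (
        sigma * math.sqrt(nu * math.pi) * math.gamma(nu / 2)
    )
    return norm * ((nu + z * z) / nu) ** (-(nu + 1) / 2)


def t_location_scale_sample(mu, sigma, nu, rng):
    """Draw mu + sigma * T, where T is Student t with nu degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2, 2.0)  # chi-square with nu degrees of freedom
    return mu + sigma * z / math.sqrt(chi2 / nu)


rng = random.Random(7)
# nu is taken from Table 6.3 (GarageA); mu and sigma are hypothetical values in ms.
samples = [t_location_scale_sample(mu=1000.0, sigma=150.0, nu=2.678, rng=rng)
           for _ in range(1000)]
```

Note that this distribution is supported on the whole real line, so a draw can be negative; how such draws should be treated when they stand for delays is not specified in the text and is left open here.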

Results using hard contracts

On setting the delay contracts of each of the sites to their 99.2% quantile values, we get the end-to-end orchestration delay bound to be K = 1,469,539 ms, which can be guaranteed for 94.53% of the cases.

Results using probabilistic soft contracts

As before, we assume zero delay for all the internal orchestration actions and perform 100,000 runs of the orchestration in this configuration. End-to-end orchestration delays from the simulations were recorded. In this case, the 94.53% quantile is found to be 14,658 ms.

The results are summarized in Table 6.4.

Mode              Soft contract          Hard contract
                  94.53% quantile (ms)   94.53% quantile (ms)
BootStrap         23,189                 44,243
T Location Dist   14,658                 1,469,539

Table 6.4 – No Timeout case: Comparison of Delay Quantiles

The time taken for the 100,000 simulations in the bootstrap mode was 37.74 sec and in the T location-scale sampling mode was 42.13 sec.

Case 2: Finite Timeouts

Using hard contracts in orchestrations having timeouts raises difficulties. As an illustration, consider again Figure 6.4. Let K1 and K2 be the two hard bounds (in ms) for response times in the contracts of sites S1 and S2, respectively. Assume that timers are used to guard the two site calls, with timeout occurring at λ ms. Then, clearly, the contract that results for this orchestration entirely depends on the relative position of λ, K1, and K2. If λ > Ki for i = 1, 2, then a timeout is supposed to never occur (unless one of the site contracts is violated). On the other hand, if λ < Ki for i = 1, 2 then, even if the sites respect their contracts, this may at times be seen by the orchestration as a timeout. Clearly, using timers in combination with hard contracts makes little sense.

In contrast, probabilistic soft contracts allow using timers with no contradiction. The reason is that Monte-Carlo simulations have no problem simulating timers and their effect on the distribution of the orchestration response time. As a consequence, we only present the results from our simulations without a comparison to the hard contract based composition.

We again perform simulations in two modes: bootstrap and T location-scale based simulations.

Bootstrap based Simulations. As before, we associated each service in the CarOnLine example with the delay behavior of one of the six web services measured. The associations are the same as before, given in Table 6.2. We now have timeouts for the calls to sites GarageA and GarageB. The 99.2% delay quantiles for these two sites are 3,304 msec and 4,183 msec respectively. We perform simulations with different timeout values: 3,000, 4,000 and 5,000 msec. The results are given in Table 6.5.

T Location-Scale Sampling based Simulations. We maintain the associations of Table 6.2 and perform simulations by sampling the fitted T location-scale distributions. The results of these simulations are summarized in Table 6.5. The average time for 100,000 simulations in the bootstrap mode was 34.29 sec and in the T location-scale sampling mode was 43.75 sec.

Mode              Soft contract          Timeout value T (ms)
                  94.53% quantile (ms)
BootStrap         23,040                 3,000
BootStrap         22,681                 4,000
BootStrap         22,834                 5,000
T Location Dist   13,258                 3,000
T Location Dist   13,364                 4,000
T Location Dist   13,582                 5,000

Table 6.5 – Finite Timeout case: Delay Quantiles

6.5 Monitoring

In this section we describe our technique for monitoring soft contracts. We show how monitoring is done for any contracted service when it is called by the orchestration.

We want to compare the observed performance of a service S to that promised in its soft contract FS. Recall that the soft contract FS is a distribution on the response times of S: FS(x) = P(δS ≤ x). We denote by GS the actual distribution function of S. We say that contract FS is met if

∀x, GS(x) ≥ FS(x) (6.4)


holds. Condition (6.4) expresses that the response time of S is stochastically smaller than the promise FS [KKO77]. Now, we want to perform on-line monitoring, meaning that we want to detect as soon as possible if S starts breaching its contract. To this end, denote by GS,t the actual distribution function of site S at time t. We want to detect as quickly as possible when condition (6.4) gets violated by S, that is, to set a red light at the first time t when the following condition occurs:

$$ \sup_{x} \left( F_S(x) - G_{S,t}(x) \right) > 0, \tag{6.5} $$

which is the negation of condition (6.4).

Unfortunately, the orchestrator does not know GS,t; it can only estimate it by observing S. To this end, let ∆t be a finite set of sample response times of S, collected up to time t; we call it a population. For the time being, we drop the subscript t for notational convenience. For X a set, let |X| denote its cardinality. Then

$$ G_{S,\Delta}(x) \;=_{\mathrm{def}}\; \frac{\left|\{\delta \mid \delta \in \Delta \text{ and } \delta \le x\}\right|}{|\Delta|} \tag{6.6} $$

is the empirical distribution function, defined as the proportion of sample response times no greater than x among population ∆. Then, at first sight, the contract is violated when

$$ \sup_{x \in \mathbb{R}_+} \left( F_S(x) - G_{S,\Delta}(x) \right) > 0 \tag{6.7} $$

occurs. The problem with equation (6.7) is that GS,∆(x) can randomly fluctuate around FS(x), especially when |∆| is small. A solution to this problem is to have a tolerance zone for such deviations.

Our on-line monitoring procedure is then as follows. Decide that site S violated its contract at the first time t (if any) when

\[ \sup_{x \in \mathbb{R}_+} \bigl( F_S(x) - G_{S,\Delta_t}(x) \bigr) \ge \lambda \tag{6.8} \]

occurs, where λ is a small positive parameter which defines the tolerance zone. Reducing λ improves the chances of detecting contract violation earlier (it reduces the detection delay), but it also increases the risk of a false alarm (it increases the false alarm rate), see [BN93]. Thus, tolerance parameter λ has to be tuned in a meaningful way. This is done in an off-line “calibration phase”, performed prior to the monitoring.
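As an illustration only, the empirical distribution function (6.6) and the test (6.8) can be sketched in Python as follows. This is not the implementation used for the experiments. The function names, the representation of the contract as a callable `contract_cdf`, and the restriction of the supremum to a finite `grid` of latencies are assumptions made for the sketch.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return G with G(x) = |{d in samples : d <= x}| / |samples|, as in (6.6)."""
    ordered = sorted(samples)
    size = len(ordered)

    def G(x):
        # bisect_right counts the samples that are <= x
        return bisect_right(ordered, x) / size

    return G


def deviation(contract_cdf, samples, grid):
    """Largest value of F(x) - G(x) over the finite grid (assumed stand-in for the sup)."""
    G = empirical_cdf(samples)
    return max(contract_cdf(x) - G(x) for x in grid)


def violates(contract_cdf, samples, grid, lam):
    """Condition (6.8): the deviation reaches the tolerance lam."""
    return deviation(contract_cdf, samples, grid) >= lam
```

With a hypothetical contract promising F(100) = 0.5 and F(200) = 0.9, the population [50, 150, 250, 300] yields a deviation of about 0.4, which is flagged for any tolerance λ below that value.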

Calibration Phase

As explained in Section 6.3, during contract composition, sample response times are drawn from the contract distribution FS(x) for each service S involved in the orchestration. Suppose the total number of samples drawn for a given service S is M, i.e., the set of sampled delay values for S during the simulation is ∆ = {δ1, . . . , δM}. In the calibration phase, we apply the following bootstrapping method [DH97]:

1. Generate ∆∗ by re-sampling ∆ at random. This means that ∆∗ is a randomly selected subset of ∆, of fixed size |∆∗| = N. According to bootstrapping discipline, N should be smaller than log(M). Using ∆∗, we can produce a bootstrap estimate GS,∆∗(x) of FS(x) using equation (6.6). Denote by Ω the set of such randomly generated ∆∗ ⊆ ∆. In our experiments, we have chosen its cardinality |Ω| to be about 10,000.


2. A false alarm level L (e.g., 5%) during monitoring is agreed between the orchestrator and the service S. Taking GS,∆∗(x) as a population, where ∆∗ ranges over Ω, the tolerance parameter λ is tuned to the smallest value such that

\[ \sup_{x \in X} \bigl( F_S(x) - G_{S,\Delta^*}(x) \bigr) \le \lambda \]

holds for 100 − L percent (e.g., 95%) of the ∆∗ ∈ Ω.

In fact, it is a result due to Kolmogorov ([LR05], sect. 14.2) that, for N large enough, the value of the tolerance parameter λ obtained in this way does not depend on the distribution FS. Yet, to avoid dealing with size issues of N, we prefer calibrating tolerance parameters for each site individually. Clearly, there is room for saving computations at this step.
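The two calibration steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the calibration code of the tool: the names `deviation` and `calibrate`, the `seed` argument, and the finite `grid` over which the supremum is taken are all hypothetical. Re-sampling is done without replacement, following the description of ∆∗ as a randomly selected subset of ∆.

```python
import math
import random


def deviation(contract_cdf, samples, grid):
    """Largest value of F(x) - G(x) over the grid, with G the empirical CDF of samples."""
    size = len(samples)
    return max(contract_cdf(x) - sum(d <= x for d in samples) / size for x in grid)


def calibrate(contract_cdf, samples, grid, window, level_percent=5,
              n_boot=10_000, seed=0):
    """Smallest lam such that deviation <= lam for (100 - level_percent)% of the resamples."""
    rng = random.Random(seed)
    # Step 1: build the set Omega of resamples, each a random subset of fixed size `window`.
    devs = sorted(
        deviation(contract_cdf, rng.sample(samples, window), grid)
        for _ in range(n_boot)
    )
    # Step 2: pick the smallest value covering the agreed share of the resamples.
    needed = math.ceil((100 - level_percent) * n_boot / 100)
    return devs[needed - 1]
```

A smaller false alarm level can only raise the returned tolerance, since the same sorted list of deviations is read at a higher rank.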

Monitoring Phase

Once the tolerance parameter λ is set, monitoring can be done in the following way: suppose the first N responses of service S have latencies {δ1, . . . , δN}. Taking ∆ = {δ1, . . . , δN}, we compute GS,∆(x) and then check whether condition (6.8) holds. When the (N + 1)st delay, δN+1, is recorded, we shift ∆ by one observation, making it {δ2, . . . , δN+1}. We compute GS,∆(x) for this new ∆ and check (6.8) again. This process is repeated for further observed response times, each time shifting ∆ by one observation.² So ∆ is a sliding window of fixed size N. The window size N is the same as the size |∆∗| in the calibration phase.
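The sliding-window procedure can be sketched as follows. This is an illustrative sketch under assumptions, not the monitor used in the experiments: the function name `monitor`, the callable `contract_cdf`, and the finite `grid` of latencies at which the contract is checked are hypothetical.

```python
from collections import deque


def monitor(contract_cdf, latencies, grid, window, lam):
    """Return the 1-based index of the first call at which (6.8) holds, or None."""
    recent = deque(maxlen=window)  # sliding window of the last `window` latencies
    for call, latency in enumerate(latencies, start=1):
        recent.append(latency)  # the oldest observation drops out automatically
        if len(recent) < window:
            continue  # wait until the first full window is available
        dev = max(
            contract_cdf(x) - sum(d <= x for d in recent) / window
            for x in grid
        )
        if dev >= lam:
            return call
    return None
```

For instance, with a hypothetical contract F(100) = 0.5, F(200) = 0.9, a window of 4 and λ = 0.3, eight compliant calls followed by calls of latency 300 raise the alarm at the ninth call.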

Window length N appears as an additional design parameter for the monitoring procedure. N can be entirely decided by the orchestrator and need not be a part of the contract. The rationale for tuning N is as follows: Observe that N is strongly correlated with the detection delay in case of a contract violation. On the one hand, the proportion of breaching data must be large enough in the window ∆ in order for condition (6.8) to get violated. Thus, reducing N contributes to the reduction of detection delay. On the other hand, reducing N increases random fluctuations of GS,∆∗(x) when ∆∗ ranges over Ω, thus resulting in the need for increasing tolerance parameter λ to maintain the agreed false alarm rate, which in turn increases the delay for detecting violation. This results in a tradeoff leading to an optimal choice for N. Anyway, this need not be part of the agreed contract.

6.6 Experimental Results: Monitoring

We now describe the implementation of our monitoring technique and the results obtained. We first discuss the kind of soft contracts we use in the simulations. After this, we present results on the monitoring of contracts, as explained in Section 6.5.

6.6.1 Contract of the orchestration

We take the contract FS of a service S to be a probability distribution of the response time. Expecting a service provider to be able to give a precise probability for every possible value of latency is, however, impractical. So, we take the contract with provider S to be a set of latency quantiles {x1 . . . xk} with the corresponding probabilities {FS(x1) . . . FS(xk)}. Hard contracts are just a special case of our soft contracts, in which only one such quantile exists. We thus require the provider to pass from promising a performance probability for one quantile to promising it for multiple quantiles.

² Actually, we do not need to shift the window by 1; any fixed amount can be used instead, provided that successive windows overlap.

During simulation, two possibilities may be considered when using FS = {FS(x1) . . . FS(xk)} for sampling response times:

• Use FS as it is, by sampling each time one of the quantiles {x1 . . . xk}, in proportion with FS. This would lead to over-pessimistic distributions, however.

• Hypothesize a constant probability density within each quantile, except for the last one, where an exponential distribution is hypothesized. Based on our experiments regarding web services response times, we preferred this second approach.
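The second option can be illustrated by inverse-transform sampling. The sketch below is one possible reading, not the sampler used in the simulations: it interprets "constant density within each quantile" as a uniform density between consecutive quantile points and an exponential tail beyond the last one. The lower bound `lower` of the first interval and the rate `tail_rate` of the tail are not given in the text and are assumptions of this sketch, as are the function names.

```python
import math
import random


def inverse_cdf(u, quantiles, probs, lower, tail_rate):
    """Map u in [0, 1) to a latency.

    quantiles, probs: contract points x1 < ... < xk with F(x1) < ... < F(xk) < 1.
    lower:     assumed smallest possible latency (F(lower) = 0), hypothetical.
    tail_rate: assumed rate of the exponential tail beyond xk, hypothetical.
    """
    xs = [lower] + list(quantiles)
    ps = [0.0] + list(probs)
    for i in range(1, len(xs)):
        if u <= ps[i]:
            # constant density on [xs[i-1], xs[i]]: the CDF is linear there
            frac = (u - ps[i - 1]) / (ps[i] - ps[i - 1])
            return xs[i - 1] + frac * (xs[i] - xs[i - 1])
    # beyond the last quantile: exponential tail carrying the remaining mass
    survival = (1.0 - u) / (1.0 - ps[-1])
    return xs[-1] - math.log(survival) / tail_rate


def sample_latency(rng, quantiles, probs, lower, tail_rate):
    """Draw one latency using the random generator rng."""
    return inverse_cdf(rng.random(), quantiles, probs, lower, tail_rate)
```

A call such as `sample_latency(random.Random(1), [1149, 1229], [0.1, 0.2], 1000, 0.01)` then returns one latency of at least 1000 msec under these assumed parameters.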

While monitoring, we check for violation of condition (6.8) only for the set of quantiles that have been promised by the service S in its contract FS. The set of positive reals R+ in equation (6.8) is thus replaced by the set X = {x1 . . . xk} of latency quantiles promised in the contract.

6.6.2 Results

We ran the CarOnline orchestration and monitored the single service GarageA in isolation, according to Section 6.5. We only show the monitoring of one service, since the process of monitoring is identical for any other service of the orchestration. There was no particular reason for choosing to monitor GarageA; we could have done the same with any other service of CarOnline. The delay behaviors associated with each service of CarOnline remain the same as in Section 6.4.2, given by Table 6.2. The contract of GarageA, i.e., the finite set of quantiles and their corresponding probabilities, is given in the first and second columns of Table 6.6, respectively. These values were derived from the measured response times of the USWeather service. The false alarm level agreed with the orchestrator is 5%, i.e., λ is calibrated so that the tolerance holds for 95% of the bootstrap samples.

Contract Delay       CDF     Experimental Delay
Quantile (msec)              Quantile (msec)

1149                 0.1     1199
1229                 0.2     1279
1310                 0.3     1360
1462                 0.5     1520
1645                 0.7     1745
1905                 0.85    2005
2312                 0.95    2412

Table 6.6 – Contract and experimental distributions of GarageA.

As mentioned at the end of Section 6.5, we need to find a good value for the window length N for the calibration and the monitoring phase (it directly affects the detection delay). For this, we ran the calibration and monitoring on GarageA for three different window lengths: 10, 30 and 50. The violations were detected after 10 to 25 calls (with large variations) when N = 10, after 20 to 30 calls when N = 30, and after 40 to 80 calls when N = 50. N = 30 was preferred to N = 10 because fewer variations were observed in the detection delay, and it is clearly preferable to N = 50, where the detection delay was too large.

With N = 30, the calibration phase (Section 6.5) on this distribution of GarageA gave the tolerance parameter λ equal to 0.167. After the calibration phase, the CarOnline orchestration was run 1000 times as follows: From run 1 to 700, GarageA’s actual performance was exactly the promised distribution. From run 700 to 1000, we slightly deteriorated GarageA’s performance to follow a “slower” distribution. The delay quantiles and their corresponding probabilities of this slower distribution are given in the third and second columns of Table 6.6, respectively.

The result of the monitoring is shown in Figure 6.9. The value of supx∈X (FS(x) − GS,∆∗(x)) is plotted for each call made to GarageA. The horizontal line shows the value of λ, 0.167. The detection occurs around the 747th run, i.e., around 47 calls later.

[Plot: vertical axis "Deviation from Contract" (−0.1 to 0.3); horizontal axis "Simulation Number" (0 to 1000).]

Figure 6.9 – Monitoring of GarageA. The plot shows the deviation from its contract for each run of the simulation. This deviation is supx∈X (FS(x) − GS,∆∗(x)).

The test statistic plotted in Figure 6.9 behaves in a rather noisy way. This suggests that the tradeoff between false alarm rate and detection delay may not be optimal. Monitoring procedure (6.8) could be improved in many respects, however, using the huge background of sequential and non-parametric statistics [BN93]. First, empirical estimate (6.6) for the distribution function GS of S could be improved by using (possibly adaptive) kernel estimators. Second, instead of relying on an estimate based on a sliding window, truly sequential estimates could be used. We have, however, decided to keep to basic statistical techniques, for two reasons: they are easily understandable by non-specialists, and they are robust and easy to tune.

6.7 Related Work

Proposals for such QoS-based compositions are few and no well-accepted standard exists to date. Menascé [Men02] discusses QoS issues in Web services, introducing response times, availability, security and throughput as QoS parameters. He also discusses the need for SLAs and for monitoring them for violations. He does not, however, advocate a specific model to capture the QoS behaviour of a service, or an approach to compose SLAs.

Agarwal et al. [AVMM04] view QoS based composition as an optimization problem in the METEOR-S project. Services have selection criteria which are constraints, for which an optimal solution is found using integer linear programming. Cardoso et al. in [CSM+04] aim to derive QoS parameters for a workflow, given the QoS parameters of its component tasks. Using a graph reduction technique, they repeatedly re-write the workflow, merging different component tasks and also their QoS attributes according to different rules.


Zeng et al. [ZBN+04] use Statecharts to model composite services. An orchestration is taken to be a finite execution path. For each task of the orchestration, a service is selected from a pool of candidate services, using linear programming techniques, such that it optimizes a specific global QoS criterion. In [NKP06], the authors propose using fuzzy distributed constraint satisfaction programming (CSP) techniques for finding the optimal composite service.

Canfora et al. [CPEV05] use Genetic Algorithms for deriving optimal QoS compositions. They use techniques similar to [CSM+04] for modeling QoS of services. Compared to the linear programming method of Cardoso et al. [CSM+04], the genetic algorithm is typically slower on small to moderate size applications, but is more scalable, outperforming linear programming techniques when the number of candidate services increases.

A distinguishing feature of our proposal with respect to the above composition techniques is that we do not consider the QoS parameters of a service to be fixed, hard bound values. We believe that in reality, these parameters exhibit significant variations in their values and are better modeled by a probability distribution. This alternative approach has two advantages. First, it reduces pessimism in contract composition, as we shall see. Second, it allows for “soft” monitoring of contract breaching (an occasional delayed response should not be seen as a breach).

In [CMS+03] the authors use WSFL (Web Service Flow Language), a language proposed by IBM to model web service compositions, and enhance it with the capability to specify QoS attributes. These are then translated into a simulation model in Java (JSim) which can then be simulated for performance analysis. The fundamental difference from our approach is that this approach assumes a "non-open world" scenario, in which the services of the orchestration can be instrumented with measurement code to get information about their performance. This information then seems to be used in queuing based models, to generate queuing and service execution times during simulations. The authors, however, do not give any detailed information about the models and the associated parameters used in the simulations. This approach also requires the orchestrator to be able to control all the load on the external services, which is often unrealistic.

Web service Performance Analysis Center (sPAC) [SL05] is another similar approach for performance evaluation of services and their compositions. The authors use UML diagrams to model a service composition, which is then translated into a simulation model in Java (SimJava). sPAC also generates code to call the services in the composition under a light load, to record the performance of the services. The performance statistics collected are then used in the simulation model to model the performance of the services. However, as in [CMS+03], sPAC also assumes that the services whose performance it evaluates can be instrumented to collect the performance statistics for use in simulations.

The notion of probabilistic QoS has been introduced and developed in [HWTS07] with the ambition to compute an exact formula for the composed QoS, which is only possible for restricted forms of orchestrations without any data dependency. We propose using simulation techniques to analyze the QoS of a composite service. This allows us to use non-trivial distributions as models for performance and also permits analysis of orchestrations whose control flow has data and time related dependencies.

Most of the work in QoS monitoring is dedicated to the design of service monitoring architectures [ZLC07]. Service monitoring needs to be integrated in the infrastructure at large in order to enable detection and routing of the service operational events. We have proposed a framework for probabilistic contracts and shown how they can be composed. For run-time monitoring, this leads directly to the use of statistical testing techniques to detect violation of QoS contracts. Such techniques have already been used in [BASE07] to adapt SLA checkers to the variation of the environment, but in a context of deterministic contracts.

6.8 Conclusion

We have studied soft probabilistic contracts, their composition, and their monitoring, for web services orchestrations. Probabilistic soft contracts have a number of advantages: they compose easily, as shown by our Monte-Carlo based dimensioning tool TOrQuE; they provide opportunity for well-founded overbooking, thus avoiding pessimistic contracts; they allow handling timers as part of the orchestration, a frequent and desirable practice. We stress that our TOrQuE tool can indeed be used for the dimensioning of realistic orchestrations, as the cost of running Monte-Carlo simulation for design space exploration is acceptable. We have also proposed a statistical approach to design monitors for services promising soft contracts for monotonic orchestrations. Our method requires prior calibration of the detection threshold, in order to achieve an agreed false alarm rate.

We plan to extend our work in two directions. The first direction is the real deployment of the method on the Web, based on the Orc run-time environment. More precisely, we are currently working on QoS-enabled extensions of this language.

The second direction is to generalize what we have done on response times to other QoS parameters, addressing the fact that different QoS parameters are often correlated. Indeed, we believe that a large part of the techniques we have developed generalize to other QoS parameters (e.g., availability, reliability, security, and possibly quality of data). In particular, our abstract representation of runs of orchestrations as partial orders of events allows us to combine performance quanta in a flexible way.

Chapter 7

Monotonicity in Service Orchestrations

Anne Bouillard
Ecole Normale Supérieure de Cachan, Campus de KerLann, Bruz, France.


Abstract

Web Service orchestrations are compositions of different Web Services to form a new service. The services called during the orchestration guarantee a given Quality of Service (QoS) to the orchestrator, usually in the form of contracts. These contracts can then be used by the orchestrator to deduce the contract it can offer to its own clients, by performing contract composition. An implicit monotonicity assumption in contract based QoS management is: “the better the component services perform, the better the orchestration’s performance will be”.

In some orchestrations, however, monotonicity can be violated, i.e., the performance of the orchestration improves when the performance of a component service degrades. This is highly undesirable since it can render the process of contract composition inconsistent.

In this paper we formally define monotonicity for orchestrations modelled by Colored Occurrence Nets (CO-nets) and we characterize the classes of monotonic orchestrations. Contracts can be formulated as hard, possibly nondeterministic, guarantees, or alternatively as probabilistic guarantees. Our work covers both cases. We show that few orchestrations are indeed monotonic, mostly because of complex interactions between control, data, and timing. We also provide user guidelines to get rid of non-monotonicity when designing orchestrations.


7.1 Introduction

Web Services and their compositions are being widely used to build distributed applications over the web. Web Service orchestrations are compositions of Web Services to form an aggregate, and usually more complex, Web Service. Different formalisms have been proposed for orchestrating Web Services, the most popular amongst these being the Business Process Execution Language (BPEL) [Bpe07]. Another such formalism is Orc [MC07], a small and elegant language equipped with extensive semantics work [KCM06, RKB+07b, WKCM08]. Various other models have been used either to directly model orchestrations, or as a semantic domain for some formalisms; see for example the Petri Nets based WorkFlow Nets [vdA97].

Though the main focus of the existing models is to capture the functional aspects of services and their compositions, the non-functional aspects, also called Quality of Service (QoS) aspects, also need to be considered. The QoS of a service is characterised by different metrics, called QoS parameters, e.g., latency, availability, throughput, security, etc. QoS management is usually based on the notion of a Service Level Agreement (SLA) or contract, which specifies constraints on the QoS parameters of the service. A typical service contract could be: for 95% of the requests, the response time will be less than 5 ms. The WSLA Standard [KL03] is one such proposition for specifying QoS through SLAs.

In service orchestrations, contracts are agreements made between the orchestrator and the different services called by the orchestrator (also called sub-contractors) which formalise the duties and responsibilities for each of them. The orchestrator can then compose all the contracts with its sub-contractors, to help it propose a contract to its own clients. This process is called contract composition. In [RBHJ07] we introduced the notion of probabilistic contracts to formalise the QoS behaviour of services (the work of [RBHJ07] focused on latency). We showed how these contracts can be composed to get the orchestration’s contract. We also showed that there is room for overbooking the orchestrator’s resources.

Contract based QoS management in orchestrations relies on the implicit assumption that if each sub-contractor meets its contract’s objectives, then so does the orchestrator. Vice-versa, a sub-contractor breaching its contract can cause the orchestrator to breach the contract with its clients. Thus the whole philosophy behind contracts is that the better the sub-contractors behave, the better the overall orchestration will meet its contract. In fact, the authors themselves have developed their past work [RBHJ07] based on this credo . . . until they discovered that this implicit assumption could easily be falsified. Why so?

[Figure: Petri net with transitions M, N, S and T.]

Figure 7.1 – A non-monotonic orchestration

As an example, consider the orchestration modeled by the Petri net in Figure 7.1. Services M and N are first called in parallel. If M responds first, service S is next called and the response of N is ignored. If N responds first, T is called and not S. Let δi denote the response time of site i. Assume the following delay behaviour: δM < δN and δS ≫ δT. Since M responds faster, the end-to-end orchestration delay is d0 = δM + δS. Now let service M behave slightly ’badly’, i.e., delay δM increases and becomes slightly greater


than δN. Now service T is called and the new orchestration delay is d1 = δN + δT. But since δS ≫ δT, d1 is in fact lower than d0. This orchestration is non-monotonic since increasing the latency of one of its components can decrease the end-to-end latency of the orchestration. So, what is the nature of the difficulty?
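The effect can be checked numerically with a small sketch. The function name and the delay values below are hypothetical and chosen only to satisfy δM < δN and δS ≫ δT; the text does not say which branch is taken when δM = δN, so the sketch arbitrarily lets N win in that case.

```python
def orchestration_delay(delta_m, delta_n, delta_s, delta_t):
    """End-to-end delay of the orchestration of Figure 7.1."""
    if delta_m < delta_n:
        # M responds first: S is called, the response of N is ignored
        return delta_m + delta_s
    # N responds first (ties resolved in favour of N, an assumption): T is called
    return delta_n + delta_t


# Hypothetical delays: M fast, S much slower than T.
d0 = orchestration_delay(delta_m=10, delta_n=12, delta_s=100, delta_t=5)
# M degrades slightly and is now slower than N.
d1 = orchestration_delay(delta_m=13, delta_n=12, delta_s=100, delta_t=5)
```

Here d0 = 110 whereas d1 = 17: degrading M by 3 time units shortens the end-to-end delay by 93.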

“Simple” composed Web services are such that QoS aspects do not interfere with functional aspects and do not interfere with each other. Their flow of control is typically rigid and does not involve if-then-else branches. For such cases, latencies will compose gently and will not cause pathologies as shown above. However, as evidenced by the rich constructions offered by BPEL, orchestrations and choreographies can have branching based on data and QoS values, various kinds of exceptions, and timers. With such flexibility, non-monotonicity such as that exhibited by the example of Figure 7.1 can very easily occur.

Lack of monotonicity impairs using contracts for the compositional management of QoS. Surprisingly enough, this fact does not seem to have been noticed in the literature.

In this paper we classify orchestrations based on their monotonic characteristics. We focus on latency, although other aspects of QoS are discussed as well. Section 7.2 informally introduces the notion of monotonicity with examples. In Section 7.3 we recall the definition of Petri nets and introduce our model, OrchNet. A formal definition of monotonicity and a characterisation of monotonic orchestrations is then given in Section 7.4. Section 7.5 extends the notion of monotonicity to nets whose transitions’ delays are probability distributions. Section 7.6 gives a few ideas to avoid the problem of non-monotonicity and Section 7.7 concludes. Proofs of non-trivial results are deferred to the appendix.

7.2 Examples for Non-monotonic Orchestrations

In this section we look at sample orchestrations and illustrate the concept of (non) monotonicity using them.

The Travel Planner orchestration: The orchestration to the left in Figure 7.2 is inspired by [ZBN+04]. A client calls the Travel Planner orchestration with a city he plans to visit along with the dates of his visit. The orchestration looks for a hotel in that city (service HotelA) for those dates and, in parallel, looks for sites of attractions (service Search Attractions) in the city. Once both these tasks are completed, it calculates the maximal distance d between the hotel found and the attraction sites. If this distance is less than a certain threshold ℓ, a bike rental service (Bike Rent) is called to get quotes for a rental bike. If distance d exceeds ℓ, then Car Rent is called to get quotes for a rental car instead. The orchestration to the right in Figure 7.2 is a simplified version of the Travel Planner, in which it is assumed that all hotels returned by HotelA are closer than ℓ to the attraction sites.

[Figure: two nets with nodes HotelA, search attractions, CarRent and BikeRent; the branches of the left net are guarded by d > ℓ and d ≤ ℓ.]

Figure 7.2 – The Travel Planner orchestration (left); a simplified version (right)

This Travel Planner orchestration is monotonic: Increasing (or decreasing) the response time of any of its component services does result in a corresponding increase (or decrease) in the end-to-end latency. Monotonicity holds in this case because increasing (or decreasing) the response time of the services called first does not affect the value returned by these services.

The Travel Planner orchestration – A Modified Version: The presence of timeouts and data-dependent choices in orchestrations can, however, complicate things. Figure 7.3 (left) is a modified version of the Travel Planner example where quotes for hotels are obtained from two services, HotelA and HotelB. Such an extension is quite natural in orchestrations, where a pool of services with similar functionality are queried with the same request. The orchestration selects the best response obtained from the pool, or combines their responses. In this modified Travel Planner example, of the two hotel offers received, the cheaper one is taken. Calls to the hotels are guarded by timers: if only one hotel has replied before a timeout, the response of the other is ignored. The rest of the example is unchanged.

[Figure: two nets with nodes HotelA, HotelB, two Timers, search attractions, CarRent and BikeRent; the branches of the left net are guarded by d > ℓ and d ≤ ℓ.]

Figure 7.3 – The Modified Travel Planner orchestration. By convention, each Timer has priority over the HotelX service it is in conflict with. Left (a), right (b)

Now look at the following scenario: HotelA returns propositions that are usually cheaper than those of HotelB and so HotelA’s propositions are chosen. Let the distance d in this case be greater than ℓ and so service Car Rent is called. If the performance of HotelA now degrades such that it doesn’t reply before a timeout, only HotelB’s response is taken. Say that the maximum distance d in this case is less than ℓ and so service Bike Rent is called. Now if Car Rent takes a significantly greater time to respond compared to Bike Rent, it is possible that the overall latency is shorter in the second case. That is, a degradation in the performance of a service (HotelA here) leads to an improvement in the overall performance of the orchestration.

A solution to this is to make the choice in the Travel Planner orchestration dependent on the orchestration’s client. For example, if we alter this orchestration such that the client specifies at the start of the orchestration whether he wants to rent a car or a bike, the choice is resolved by the client. The exact execution path of the orchestration is known at the start, on receiving the client’s request. This execution path is a partial order, which is monotonic. We could then have input-dependent contracts, e.g., promising a certain response time for a given set of input parameters and promising another response behaviour for a different set of inputs.

The orchestration to the right in Figure 7.3 assumes that HotelA’s propositions are all close to the attraction sites, whereas those of HotelB are all far from them. The net on the left can thus be simplified to the guard-free net on the right.

The examples in Figure 7.3 are non-monotonic due to the presence of choice followed by paths with different performances. In the sequel, we formally characterize the classes of orchestrations that are monotonic, giving both necessary and sufficient conditions for it. The formal material for this is introduced next.

7.3 The Orchestration Model: OrchNets

In this section we present the high level Petri Nets model for orchestrations that we use for our studies, which we call OrchNets. OrchNets are a special form of colored occurrence nets (CO-nets).

We have chosen this mathematical model for the following reasons. From the semantic studies performed for BPEL [OVvdA+07, AFFK05] and Orc [KCM06, RKB+07b], we know that we need to support in an elegant and succinct way the following features: concurrency, rich control patterns including preemption, representing data values, and for some cases even recursion. The first three requirements suggest using colored Petri nets. The last requirement suggests considering extensions of Petri nets with dynamicity. However, in our study we will not be interested in the specification of orchestrations, but rather in their executions. Occurrence nets are concurrent models of executions of Petri nets. As such, they encompass orchestrations involving recursion at no additional cost. The executions of Workflow Nets [vdA97] are also CO-nets.

7.3.1 Background on Petri nets and Occurrence nets

A Petri net is a tuple N = (P, T, F, M0), where: P is a set of places, T is a set of transitions such that P ∩ T = ∅, F ⊆ (P × T) ∪ (T × P) is the flow relation, and M0 : P → ℕ is the initial marking.

The elements in P ∪ T are called the nodes of N and will be denoted by variables such as x. For a node x ∈ P ∪ T, we call •x = {y | (y, x) ∈ F} the preset of x, and x• = {y | (x, y) ∈ F} the postset of x. A marking of the net is a multiset M of places, i.e., a map from P to ℕ. A transition t is enabled in marking M if ∀p ∈ •t, M(p) > 0. This enabled transition can fire, resulting in a new marking M′ = M − •t + t•; this is denoted by M[t〉M′. A marking M is reachable if there exists a sequence of transitions t0, t1, . . . , tn such that M0[t0〉M1[t1〉 . . . [tn〉M. A net is safe if for all reachable markings M, M(P) ⊆ {0, 1}.
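The definitions of preset, postset, enabling and firing can be made concrete with a short sketch. The class and method names are illustrative assumptions; markings are stored as multisets using `collections.Counter`.

```python
from collections import Counter


class PetriNet:
    """Minimal Petri net (P, T, F, M0) with the firing rule M' = M - pre(t) + post(t)."""

    def __init__(self, places, transitions, flow, initial_marking):
        self.places = set(places)
        self.transitions = set(transitions)
        self.flow = set(flow)  # pairs (x, y) of the flow relation F
        self.marking = Counter(initial_marking)  # multiset of places

    def preset(self, x):
        return {y for (y, z) in self.flow if z == x}

    def postset(self, x):
        return {z for (y, z) in self.flow if y == x}

    def enabled(self, t):
        # t is enabled if every place of its preset holds at least one token
        return all(self.marking[p] > 0 for p in self.preset(t))

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        for p in self.preset(t):
            self.marking[p] -= 1
        for p in self.postset(t):
            self.marking[p] += 1


# A two-place net p0 -> t -> p1 with one token in p0.
net = PetriNet({"p0", "p1"}, {"t"}, {("p0", "t"), ("t", "p1")}, {"p0": 1})
net.fire("t")
```

After firing t, the token has moved from p0 to p1 and t is no longer enabled.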

For a net N = (P, T, F, M0) the causality relation < is the transitive closure of the flow relation F. The reflexive closure of < is denoted by ≤. For a node x ∈ P ∪ T, the set of causes of x is ⌊x⌋ = {y ∈ P ∪ T | y ≤ x}. Two nodes x and y are in conflict, denoted by x#y, if there exist distinct transitions t, t′ ∈ T such that t ≤ x, t′ ≤ y and •t ∩ •t′ ≠ ∅. Nodes x and y are said to be concurrent, written as x‖y, if neither (x ≤ y) nor (y ≤ x) nor (x#y). A set of concurrent places (a subset of P) is called a co-set. A cut is a maximal (for set inclusion) co-set.

A configuration of N is a subnet κ of nodes of N such that:

1. κ is causally closed, i.e., if x < x′ and x′ ∈ κ then x ∈ κ

2. κ is conflict-free, i.e., for all nodes x, x′ ∈ κ, ¬(x#x′)

For convenience, we will assume that the maximal nodes (w.r.t. the < relation) in a configuration are places.

A safe net N = (P, T ,F , M0) is called an occurrence net (O-net) iff


1. ¬(x#x) for every x ∈ P ∪ T .

2. ≤ is a partial order and ⌊t⌋ is finite for any t ∈ T .

3. For each place p ∈ P, |•p| ≤ 1.

4. M0 = {p ∈ P | •p = ∅}, i.e., the initial marking is the set of minimal places with respect to ≤.

Occurrence nets are a good model for representing the possible executions of a concurrent system. Unfoldings of a safe Petri net, which collect all the possible executions of the net, are occurrence nets. Unfoldings are defined as follows. For N and N′ two safe nets, a map ϕ : P ∪ T → P′ ∪ T′ is called a morphism of N to N′ if: 1/ ϕ(P) ⊆ P′ and ϕ(T) ⊆ T′, and 2/ for every t ∈ T and t′ = ϕ(t) ∈ T′, •t ∪ {t} ∪ t• is in bijection with •t′ ∪ {t′} ∪ t′• through ϕ. A branching process of a safe net N is a pair (U, ϕ) where U is an occurrence net and ϕ : U → N is a morphism such that 1/ ϕ establishes a bijection between M0 and the minimal places of U, and 2/ •t = •t′ and ϕ(t) = ϕ(t′) together imply t = t′. Branching processes are partially ordered (up to isomorphism) by the prefix order and there exists a unique maximal branching process called the unfolding of N and denoted by UN. The configurations of UN capture the executions of N, seen as partial orders of events. For a configuration κ of an occurrence net N, the future of κ in N, denoted by Nκ, is a sub-net of N with the nodes:

Nκ = {x ∈ N \ κ | ∀x′ ∈ κ, ¬(x#x′)} ∪ max(κ)

where max(κ) is the set of maximal nodes of κ (which are all places by our restriction on configurations).

7.3.2 Orchestration model: OrchNets

We now present the orchestration model that we use for our studies, which we call OrchNets. OrchNets are occurrence nets in which

tokens are equipped with a special attribute, referred to as a color, and consisting of a pair (value, date). (7.1)

Figure 7.4 shows an OrchNet with its dates. Each place is labeled with a date, which is the date of the token on reaching that place. Transitions are labeled with latencies. The tokens in the three minimal places are given initial dates (here, d0, d1, d2). The four named transitions m, n, s and t are labeled with latencies τm, τn, τs and τt respectively, and the two shaded transitions have zero latency.

[Figure: OrchNet with minimal places dated d0, d1, d2; transitions m, n, s, t with latencies τm, τn, τs, τt; two shaded transitions with τ = 0; intermediate dates d0 + τm, d1 + τn, d = max{d2, d0 + τm} and d′ = max{d2, d1 + τn}; overall latency E = d + τs if d < d′, E = d′ + τt if d > d′, nondeterministic otherwise.]

Figure 7.4 – An OrchNet showing the dates of its tokens. The delay of a transition is shown next to it.

The presence of dates in tokens alters the firing semantics. A transition t is enabled at a date when all places in its preset have tokens and its guard evaluates to true (absence of a guard is interpreted as the guard true). Once enabled, transition t takes τt additional time to fire. For example, the shaded transition on the left has all its input tokens at max{d2, d0 + τm} and so it fires at max{d2, d0 + τm} + 0 since it has zero latency. If a transition fires at date d, then the tokens in its postset have the date d. This is shown in the figure, e.g., on the place following the left shaded transition, which has date max{d2, d0 + τm}.

When transitions are in conflict (e.g., the two shaded transitions in Figure 7.4), the transition that actually occurs is governed by a race policy [MBC+98, MBB+89]. If a set of enabled transitions is in conflict, the one with the smallest date of occurrence will fire, preempting the other transitions in conflict with it. In Figure 7.4, the left or the right shaded transition will fire depending on whether d < d′ or d > d′ respectively, with a nondeterministic choice if d = d′. This results in selecting the leftmost or rightmost continuation (firing s or t) accordingly. The resulting overall latency E of the orchestration is shown at the bottom of the figure.
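As an illustration, the race in Figure 7.4 can be sketched in a few lines of Python. This is only a sketch of the formula for E given in the figure; the function name and argument names are ours, and the nondeterministic case d = d′ is modelled by returning both candidate latencies.

```python
def figure_7_4_latency(d0, d1, d2, tau_m, tau_n, tau_s, tau_t):
    """Possible overall latencies E of the OrchNet of Figure 7.4.

    Returns a set with one element when the race has a strict winner,
    and with both candidates when d == d' (nondeterministic choice).
    """
    d_left = max(d2, d0 + tau_m)    # date d of the left shaded transition
    d_right = max(d2, d1 + tau_n)   # date d' of the right shaded transition
    if d_left < d_right:
        return {d_left + tau_s}     # left branch wins the race, s fires
    if d_left > d_right:
        return {d_right + tau_t}    # right branch wins the race, t fires
    return {d_left + tau_s, d_right + tau_t}


# Illustrative values (not taken from the thesis).
fast_m = figure_7_4_latency(0, 0, 0, tau_m=1, tau_n=2, tau_s=5, tau_t=1)
slow_m = figure_7_4_latency(0, 0, 0, tau_m=3, tau_n=2, tau_s=5, tau_t=1)
```

With these illustrative values, increasing τm from 1 to 3 switches the winner of the race and lowers the overall latency from 6 to 3, which is the kind of behaviour that monotonicity is meant to rule out.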

In addition to dates, tokens in OrchNets can have data attributes, which we call values. We have not shown this in Figure 7.4, in order to keep it simple. Values of tokens in the preset of a transition t can be combined by a value function φt attached to t. The resulting value is taken by the token in the postset of t. At this point we are ready to provide the formal definition of OrchNets:

Definition 7.1 (OrchNet) An OrchNet is a tuple N = (N, Φ, T, Tinit) consisting of

• An occurrence net N with token attributes c = (value, date).

• A family Φ = (φt)t∈T of value functions, whose inputs are the values of the transition’s input tokens.

• A family T = (τt)t∈T of latency functions, whose inputs are the values of the transition’s input tokens.

• A family Tinit = (τp)p∈min(P) of initial date functions for the minimal places of N .

In general, value, latency, and initial date functions can be nondeterministic. We introduce a global, invisible, daemon variable ω that resolves this nondeterminism and we denote by Ω its domain. That is, for a given value ω of this daemon, φt(ω), τt(ω), and τp(ω) are all deterministic functions of their respective inputs.

7.3.3 The semantics of OrchNets

We now explain how the presence of dates attached to tokens affects the semantics of OrchNets by adopting the so-called race policy. We first describe how a transition t modifies the attributes of tokens. Let the preset of t have n places whose tokens have (value, date) attributes (v1, d1), ..., (vn, dn). Then all the tokens in the postset of t have the pair (vt, dt) of value and date, where:

vt = φt(v1, ..., vn)

dt = max{d1, ..., dn} + τt(v1, ..., vn) (7.2)
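The firing rule (7.2) can be sketched as follows in Python. The function `fire` and its arguments are hypothetical names introduced for illustration; the value function φt and the latency function τt are passed as ordinary callables, assumed to be already resolved for a given daemon value ω.

```python
def fire(input_tokens, value_fn, latency_fn):
    """Apply rule (7.2) to the tokens in the preset of a transition.

    input_tokens: list of (value, date) pairs (v1, d1), ..., (vn, dn)
    value_fn:     stands for phi_t, maps (v1, ..., vn) to v_t
    latency_fn:   stands for tau_t, maps (v1, ..., vn) to a latency
    Returns the (value, date) pair carried by every token in the postset.
    """
    values = [v for v, _ in input_tokens]
    dates = [d for _, d in input_tokens]
    v_t = value_fn(*values)
    d_t = max(dates) + latency_fn(*values)
    return v_t, d_t


# Two input tokens; the latency depends on the values only, never on the dates.
token = fire([(2, 1), (3, 4)], lambda a, b: a + b, lambda a, b: a * b)
```

Note that the latency function receives only the values, in line with the design choice that latencies do not depend on the dates of the input tokens.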


The race policy for firing transitions is as follows. In any given marking M, let T be the set of transitions that are possibly enabled, i.e. ∀t ∈ T, •t is marked in M and the guard of t (if any) is true. Then the transition t that is actually enabled (the one that really fires) is given by:

t = arg min_{t∈T} dt, where arg min_{x∈X} f(x) = x∗ ∈ X s.t. ∀x′ ∈ X, f(x∗) ≤ f(x′).

If two possibly enabled transitions have the same dt, then the choice of the transition that actually fires is non-deterministic. The race policy has the effect of filtering out configurations of OrchNets as explained now. Let N = (N, Φ, T, Tinit) be a finite OrchNet. For a value ω ∈ Ω for the daemon we can calculate the following dates for every transition t and place p of N:

dp(ω) = τp(ω) if p is minimal, and dp(ω) = ds(ω) where s = •p otherwise
dt(ω) = max{dx(ω) | x ∈ •t} + τt(ω)(v1, ..., vn) (7.3)

where v1, ..., vn are the value components of the tokens in •t as in equation (7.2). If κ is a configuration of N, the future N^κ is the OrchNet (N^κ, Φ_{N^κ}, T_{N^κ}, T′init) where Φ_{N^κ} and T_{N^κ} are the restrictions of Φ and T respectively, to the transitions of N^κ. T′init is the family derived from N according to (7.3): for any minimal place p of N^κ, the initialisation function is given by τ′p(ω) = dp(ω). For a net N with the set of transitions TN, set Tmin(N) = {t ∈ TN | ••t ∩ TN = ∅}. Let min(PN) denote the minimal places of N. Now define κ0(ω) = min(PN) and inductively,

for m > 0 : κm(ω) = κm−1(ω) ∪ {tm} ∪ •tm ∪ tm•, where tm = arg min_{t ∈ Tmin(N^{κm−1(ω)})} dt(ω) (7.4)

Since net N is finite, the above inductive definition terminates in finitely many steps when N^{κm(ω)} = ∅. Let M(ω) be this number of steps. We thus have

min(PN) = κ0(ω) ⊂ κ1(ω) ⊂ · · · ⊂ κM(ω)(ω)

κM(ω)(ω) is a maximal configuration of N that can actually occur according to the race policy, for a given ω ∈ Ω; such actually occurring configurations are generically denoted by

κ(N, ω)

For B, a prefix-closed subset of the nodes of N, define

Eω(B, N) = max{dx(ω) | x ∈ B} (7.5)

If B is a configuration, then Eω(B, N) is the time taken for B to execute (latency of B). The latency of the OrchNet N = (N, Φ, T, Tinit) for a given ω is

Eω(N) = Eω(κ(N, ω), N) (7.6)
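The construction (7.3)–(7.6) can be sketched in Python as a token game driven by the race policy. This is a simplified sketch under several assumptions of ours: the daemon value ω is fixed, so latencies are plain numbers; token values and guards are omitted; every transition has a non-empty preset; and ties are broken by list order rather than nondeterministically. The names `Transition`, `run_race` and `figure_net` are hypothetical. The net encoded at the end is our reading of Figure 7.4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    preset: frozenset
    postset: frozenset
    latency: int  # tau_t, already resolved for the chosen daemon value


def run_race(initial_dates, transitions):
    """Return (fired transition names, latency) of the occurring configuration.

    initial_dates maps each minimal place to its initial date tau_p.
    The marking plays the role of max(kappa_m) in construction (7.4).
    """
    dates = dict(initial_dates)          # d_p for every place reached so far
    marking = set(initial_dates)         # places currently holding a token
    latest = max(initial_dates.values())
    fired = []

    def firing_date(t):                  # d_t as in (7.3)
        return max(dates[p] for p in t.preset) + t.latency

    while True:
        enabled = [t for t in transitions if t.preset <= marking]
        if not enabled:
            return fired, latest         # latency as in (7.5) and (7.6)
        winner = min(enabled, key=firing_date)   # race policy, ties by order
        d = firing_date(winner)
        marking -= winner.preset         # conflicting transitions are preempted
        marking |= winner.postset
        for p in winner.postset:
            dates[p] = d
        latest = max(latest, d)
        fired.append(winner.name)


def figure_net(tau_m, tau_n, tau_s, tau_t):
    """Our encoding of the net of Figure 7.4 (place names are arbitrary)."""
    def tr(name, pre, post, lat):
        return Transition(name, frozenset(pre), frozenset(post), lat)
    return [
        tr("m", {"p0"}, {"a"}, tau_m),
        tr("n", {"p1"}, {"b"}, tau_n),
        tr("left", {"a", "p2"}, {"c"}, 0),
        tr("right", {"b", "p2"}, {"e"}, 0),
        tr("s", {"c"}, {"f"}, tau_s),
        tr("t", {"e"}, {"g"}, tau_t),
    ]


start = {"p0": 0, "p1": 0, "p2": 0}
fired_a, latency_a = run_race(start, figure_net(1, 2, 5, 1))
fired_b, latency_b = run_race(start, figure_net(3, 2, 5, 1))
```

In the first run the left branch wins and s fires; in the second run, where only τm has been increased, the right branch wins and t fires instead.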

Our design choices for the semantics of OrchNets were inspired by the application domain, i.e. compositions of web services. They reflect the following facts:

• Since we focus on latency, (value, date) is the only color needed.


• Orchestrations rarely involve decisions on actions based on absolute dates. Timeouts are an exception, but these can be modelled explicitly, without using dates in guards of transitions. This justifies the fact that guards only have token values as inputs, and not their dates.

• The time needed to perform transitions does not depend on the tuple of dates (d1, ..., dn) when input tokens were created, but it can depend on the data (v1, ..., vn) and computation φ performed on these. This justifies our restriction for output arc expressions.

If it is still wished that control explicitly depends on dates, then dates must be measured and can then be stored as part of the value v.

7.4 Characterizing monotonicity

In this article, we are interested in the total time taken to execute a web-service orchestration. As a consequence, we will consider only orchestrations that terminate in a finite time, i.e., only a finite number of values can be returned.

7.4.1 Defining and characterizing monotonicity

To formalize monotonicity we must specify how latencies and initial dates can vary. As an example, we may want to constrain some pair of transitions to have identical latencies. This can be stated by specifying a legal set of families of latency functions. For example, this legal set may accept any family T = (τt)t∈T such that two given transitions t and t′ possess equal latencies: ∀ω ∈ Ω, τt(ω) = τt′(ω). The same technique can be used for initial dates. Thus, the flexibility in setting latencies or initial dates can be formalized under the notion of pre-OrchNet we introduce next.

Definition 7.2 (pre-OrchNet) Call pre-OrchNet a tuple N = (N, Φ, T, Tinit), where N and Φ are as before, and T and Tinit are sets of families T of latency functions and of families Tinit of initial date functions. Write N ∈ N if N = (N, Φ, T, Tinit) for some T ∈ T and Tinit ∈ Tinit.

For two families T and T′ of latency functions, write

T ≥ T′

to mean that ∀ω ∈ Ω, ∀t ∈ T ⟹ τt(ω) ≥ τ′t(ω), and similarly for Tinit ≥ T′init. For N, N′ ∈ N, write

N ≥ N′ and E(N) ≥ E(N′)

to mean that T ≥ T′ and Tinit ≥ T′init both hold, and Eω(N) ≥ Eω(N′) holds for every ω, respectively.

Definition 7.3 (monotonicity) pre-OrchNet N = (N, Φ, T, Tinit) is called monotonic if, for any two N, N′ ∈ N such that N ≥ N′, we have E(N) ≥ E(N′).

Theorem 7.4 (a global necessary and sufficient condition)


1. The following implies the monotonicity of pre-OrchNet N = (N, Φ, T, Tinit):

∀N ∈ N, ∀ω ∈ Ω, ∀κ ∈ V(N) ⟹ Eω(κ, N) ≥ Eω(κ(N, ω), N) (7.7)

where V(N) denotes the set of all maximal configurations of net N and κ(N, ω) is the maximal configuration of N that actually occurs under the daemon value ω.

2. Conversely, assume that:

(a) Condition (7.7) is violated, and

(b) for any two OrchNets N and N ′ s.t. N ∈ N, then N ′ ≥ N ⇒ N ′ ∈ N.

Then N = (N, Φ, T, Tinit) is not monotonic.

Statement 2 expresses that Condition (7.7) is also necessary provided that it is legal to increase at will latencies or initial dates. Observe that violating Condition (7.7) does not by itself cause non-monotonicity; as a counterexample, consider a case where T is a singleton for which (7.7) is violated: it is nevertheless monotonic.

The orchestration in the left of Figure 7.2 satisfies condition (7.7) of Theorem 7.4 trivially, since for any given ω, there is only one possible maximal configuration. This is because the value of d is fixed for a given ω and only one branch of the two rental services is enabled. The orchestration in the left of Figure 7.3 does not fulfill this condition. Consider an ω for which the actually occurring configuration κ has both the responses of HotelA and HotelB. Say that d > ℓ for κ and Car Rent is called. Now consider another configuration κ′ (under the same ω), obtained by replacing HotelA by Timer. In this case, the response of HotelB is used to calculate d, which may be different from that in configuration κ. This d could be less than ℓ, causing Bike Rent to be called. In this case, the latencies of Car Rent and Bike Rent can be set such that Eω(κ, N) > Eω(κ′, N), violating condition (7.7).

7.4.2 A structural condition for the monotonicity of workflow nets

Workflow nets [vdA97] were proposed as a simple model for workflows. These are Petri nets with a special minimal place i and a special maximal place o. We consider the class of workflow nets that are 1-safe and which have no loops. Further, we require them to be sound [vdA97]. A workflow net W is sound iff:

1. For every marking M reachable from the initial place i, there is a firing sequence leading to the final place o.

2. If a marking M marks the final place o, then no other place in W can be marked in M.

3. There are no dead transitions in W. Starting from the initial place, it is always possible to fire any transition of W.

Workflow nets will be generically denoted by W. We can equip workflow nets with the same attributes as occurrence nets; this yields pre-WFnets W = (W, Φ, T, Tinit). Referring to the end of Section 7.3.1, unfolding W yields an occurrence net that we denote by NW with associated morphism ϕW : NW → W. Here the morphism ϕW maps the two c transitions (and the places in their presets and postsets) in the net on the right to the single c transition (and its preset and postset) in the net on the left. Observe that W and NW possess identical sets of minimal places. Morphism ϕW induces a pre-OrchNet

NW = (NW , ΦW , TW , Tinit)


by attaching to each transition t of NW the value and latency functions attached to ϕW(t) in W.

We shall use the results of the previous section in order to characterize those pre-WFnets whose unfoldings give monotonic pre-OrchNets. Our characterization will be essentially structural in that it does not involve any constraint on latency functions. Under this restricted discipline, the simple structural conditions we shall formulate will also be almost necessary. For this, we recall a notion of cluster [Mur89] on nets. For a net N, a cluster is a (non-empty) minimal set c of places and transitions of N such that ∀t ∈ c, •t ⊆ c and ∀p ∈ c, p• ⊆ c.

Theorem 7.5 (Sufficient Condition) Let W be a WFnet and NW be its unfolding. A sufficient condition for the pre-OrchNet NW = (NW, ΦW, TW, Tinit) to be monotonic is that every cluster c satisfies the following condition:

∀t1, t2 ∈ c, t1 ≠ t2 ⟹ t1• = t2• (7.8)
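Condition (7.8) is purely structural and is easy to check mechanically. The sketch below is one possible way to do so in Python, assuming the net is given by two dictionaries mapping each transition to its preset and postset; the function names are ours. It uses the observation that, by the definition of clusters recalled above, a place belongs to a cluster exactly when the transitions it feeds do, so clusters containing transitions are the connected components of the preset relation.

```python
def clusters(presets):
    """Group transitions into clusters.

    presets: dict mapping each transition to the set of its input places.
    Returns a list of (places, transitions) pairs, one per cluster that
    contains at least one transition.
    """
    consumers = {}                       # place -> transitions it feeds (p•)
    for t, pre in presets.items():
        for p in pre:
            consumers.setdefault(p, set()).add(t)

    seen, result = set(), []
    for start in presets:
        if start in seen:
            continue
        trans, places, stack = set(), set(), [start]
        while stack:                     # closure under •t and p•
            t = stack.pop()
            if t in trans:
                continue
            trans.add(t)
            for p in presets[t]:
                if p not in places:
                    places.add(p)
                    stack.extend(consumers[p])
        seen |= trans
        result.append((frozenset(places), frozenset(trans)))
    return result


def satisfies_condition_7_8(presets, postsets):
    """True iff all transitions of every cluster share the same postset."""
    return all(
        len({frozenset(postsets[t]) for t in trans}) <= 1
        for _, trans in clusters(presets)
    )


# A choice whose two branches immediately rejoin: condition holds.
ok = satisfies_condition_7_8(
    {"t1": {"p"}, "t2": {"p"}}, {"t1": {"q"}, "t2": {"q"}})
# Two conflicting transitions with different continuations: condition fails.
bad = satisfies_condition_7_8(
    {"left": {"a", "p2"}, "right": {"b", "p2"}},
    {"left": {"c"}, "right": {"e"}})
```

The second example mimics the two shaded transitions of Figure 7.4, which share an input place but lead to different continuations.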

Recall that the sufficient condition for monotonicity stated in Theorem 7.4 is “almost necessary” in that, if enough flexibility exists in setting latencies and initial dates, then it is actually necessary. The same holds for the sufficient condition stated in Theorem 7.5 if the workflow net is assumed to be live.

Theorem 7.6 (Necessary Condition) Suppose that the workflow net W is sound. Assume that W ∈ W and W′ ≥ W implies W′ ∈ W, meaning that there is enough flexibility in setting latencies and initial dates. In addition, assume that there is at least one W∗ ∈ W such that there is a daemon value ω∗ for which the latencies of all the transitions are finite. Then the sufficient condition of Theorem 7.5 is also necessary for monotonicity.

Observe that the orchestration in the right of Figure 7.2 satisfies Theorem 7.5, whereas the orchestration in the right of Figure 7.3 does not.

7.5 Probabilistic monotonicity

So far we have considered the case where latencies of transitions are nondeterministic. In previous work [RBHJ07, RBHJ08], on the basis of experiments performed on real Web services, we have advocated the use of probability distributions when modeling the response time of a Web service. Can we adapt our theory to encompass probabilistic latencies?

7.5.1 Probabilistic setting, first attempt

In Definitions 7.1 and 7.2, latency and initial date functions were considered nondeterministic. The first idea is to let them become random instead. This leads to the following straightforward modification of definitions 7.1 and 7.2:

Definition 7.7 (probabilistic OrchNet and pre-OrchNet, 1) Call probabilistic OrchNet a tuple N = (N, Φ, T, Tinit) where Φ = (φt)t∈T, T = (τt)t∈T, and Tinit = (τp)p∈min(P) are independent families of random value functions, latency functions, and initial date functions, respectively.

Call probabilistic pre-OrchNet a tuple N = (N, Φ, T, Tinit), where N and Φ are as before, and T and Tinit are sets of families T of random latency functions and of families Tinit of random initial date functions. Write N ∈ N if N = (N, Φ, T, Tinit) for some T ∈ T and Tinit ∈ Tinit.


We now equip random latencies and initial dates with a probabilistic ordering. If τ is a random latency function, its distribution function is defined by

F (x) = P(τ ≤ x)

where x ∈ R+. Consider the following ordering: random latencies τ and τ ′ satisfy

τ ≥s τ ′ if F (x) ≤ F ′(x) holds ∀x ∈ R+, (7.9)

where F and F′ are the distribution functions of τ and τ′, respectively, with corresponding definition for the probabilistic ordering on initial date functions. Order (7.9) is classical in probability theory, where it is referred to as stochastic dominance or stochastic ordering among random variables [KKO77].
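To make order (7.9) concrete, the following Python sketch checks it on empirical distribution functions built from two finite samples of measured latencies. This is only an illustration under the assumption that the distributions are replaced by their empirical counterparts; the function names are ours. Since empirical distribution functions are right-continuous step functions, it is enough to compare them at the sample points.

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical distribution function F(x) = P(tau <= x) of a sample."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def stochastically_dominates(sample, sample_prime):
    """True iff F(x) <= F'(x) for all x, i.e. tau >=_s tau' as in (7.9)."""
    F, F_prime = ecdf(sample), ecdf(sample_prime)
    points = set(sample) | set(sample_prime)   # all jump points of F and F'
    return all(F(x) <= F_prime(x) for x in points)


slower = [2, 3, 4]        # illustrative latencies, not measured data
faster = [1, 2, 3]
dominates = stochastically_dominates(slower, faster)
```

Stochastic ordering is only a partial order: for the samples [1, 4] and [2, 3], neither dominates the other.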

Using order (7.9), for two families T and T′ of random latency functions, write

T ≥s T′

to mean that ∀t ∈ T ⟹ τt ≥s τ′t, and similarly for Tinit ≥s T′init. For N, N′ ∈ N, write

N ≥s N′

if T ≥s T′ and Tinit ≥s T′init both hold. Finally, the latency Eω(N) of OrchNet N is itself seen as a random variable that we denote by E(N), by removing symbol ω. This allows us to define, for any two N, N′ ∈ N,

E(N) ≥s E(N′)

by requiring that random variables E(N) and E(N′) are stochastically ordered.

Definition 7.8 (probabilistic monotonicity, 1) Probabilistic pre-OrchNet N is called probabilistically monotonic if, for any two N, N′ ∈ N such that N ≥s N′, we have E(N) ≥s E(N′).

It is a classical result on stochastic ordering that, if (X1, ..., Xn) and (Y1, ..., Yn) are independent families of real-valued random variables such that Xi ≥s Yi for every 1 ≤ i ≤ n, then, for any increasing function f : Rn → R, f(X1, ..., Xn) ≥s f(Y1, ..., Yn). Applying this yields that nondeterministic monotonicity in the sense of definition 7.3 implies probabilistic monotonicity in the sense of definition 7.8. Nothing can be said, however, regarding the converse.

In order to derive results in the opposite direction, we shall establish a tighter link between this probabilistic framework and the nondeterministic framework of sections 7.3 and 7.4.

7.5.2 Probabilistic setting: second attempt

Let us restart from the nondeterministic setting of sections 7.3 and 7.4. Focus on definition 7.1 of OrchNets. Equipping the set Ω of all possible values for the daemon with a probability P yields an alternative way to make the latencies and initial dates random. This suggests the following alternative setting for probabilistic monotonicity.

Definition 7.9 (probabilistic OrchNet and pre-OrchNet, 2) Call probabilistic OrchNet a pair (N, P), where N is an OrchNet according to definition 7.1 and P is a probability over the domain Ω of all values for the daemon.

Call probabilistic pre-OrchNet a pair (N, P), where N is a pre-OrchNet according to definition 7.2 and P is a probability over the domain Ω of all values for the daemon.


How can we relate the two definitions 7.7 and 7.9? Consider the following assumption, which will be in force in the sequel:

Assumption 3 For any N ∈ N, τt and τp form an independent family of random variables, for t ranging over the set of all transitions and p ranging over the set of all minimal places of the underlying net.

Let us now start from definition 7.7. For t a generic transition, let (Ωt, Pt) be the set of possible experiments together with associated probability, for random latency τt; and similarly for (Ωp, Pp) and τp. Thanks to assumption 3, setting

Ω = (∏t Ωt) × (∏p Ωp) and P = (∏t Pt) × (∏p Pp), (7.10)

yields the entities of definition 7.9. Can we use this correspondence to further relate probabilistic monotonicity to the notion of monotonicity of sections 7.3 and 7.4? In the nondeterministic framework of section 7.4, definition 7.2, we said that

τ ≥ τ ′ if τ(ω) ≥ τ ′(ω) holds ∀ω ∈ Ω, (7.11)

Clearly, if two random latencies τ and τ′ satisfy condition (7.11), then they also satisfy condition (7.9). That is, ordering (7.11) is stronger than stochastic ordering (7.9). Unfortunately, the converse is not true in general. For example, condition (7.9) may hold while τ and τ′ are two independent random variables, which prevents (7.11) from being satisfied. Nevertheless, the following routine result holds:

Theorem 7.10 If condition (7.9) holds for the two distribution functions F and F′, then there exists a probability space Ω, a probability P over Ω, and two real valued random variables τ and τ′ over Ω, such that:

1. τ and τ ′ possess F and F ′ as respective distribution functions, and

2. condition (7.11) is satisfied by the pair (τ , τ ′) with probability 1.

Proof: Take Ω = [0, 1] and let P be the Lebesgue measure. Setting τ(ω) = inf{x ∈ R+ | F(x) ≥ ω} and τ′(ω) = inf{x ∈ R+ | F′(x) ≥ ω} yields the claim: since F ≤ F′, we have {x | F(x) ≥ ω} ⊆ {x | F′(x) ≥ ω} and hence τ(ω) ≥ τ′(ω) for every ω.
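The construction used in this proof can be sketched in Python for finite discrete distributions. The sketch assumes that a distribution is given as a list of (value, probability) pairs with exact rational probabilities; the function name `quantile` and the two example distributions are ours. The same ω is fed to both generalized inverses, which is what couples τ and τ′ on a common probability space.

```python
from fractions import Fraction


def quantile(dist, omega):
    """Generalized inverse inf{x | F(x) >= omega} for 0 < omega <= 1.

    dist: list of (value, probability) pairs whose probabilities sum to 1.
    """
    cumulative = Fraction(0)
    for x, p in sorted(dist):
        cumulative += p                 # cumulative is F(x) at this point
        if cumulative >= omega:
            return x
    raise ValueError("probabilities must sum to at least omega")


half = Fraction(1, 2)
tau = [(2, half), (4, half)]            # distribution function F
tau_prime = [(1, half), (3, half)]      # distribution function F', with F <= F'

# Sample the daemon on a regular grid of (0, 1] and compare pointwise.
grid = [Fraction(k, 100) for k in range(1, 101)]
pointwise_ordered = all(quantile(tau, w) >= quantile(tau_prime, w) for w in grid)
```

On this example the two coupled variables satisfy the pointwise ordering (7.11) at every grid point, while each keeps its own distribution: exactly half of the grid points give τ(ω) = 2.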

Theorem 7.10 allows reducing the stochastic comparison of real valued random variables to their ordinary comparison as functions defined over the same set of experiments endowed with a same probability. This applies in particular to each random latency function and each random initial date function, when considered in isolation. Thus, when performing construction (7.10) for two OrchNets N and N′, we can take the same pair (Ωt, Pt) to represent both τt and τ′t, and similarly for τp and τ′p. Applying (7.10) implies that both N and N′ are represented using the same pair (Ω, P). This leads naturally to definition 7.9. In addition, applying theorem 7.10 to each transition t and each minimal place p yields that stochastic ordering N ≥s N′ reduces to ordinary ordering N ≥ N′. Observe that this trick does not apply to the overall latencies E(N) and E(N′) of the two OrchNets; the reason for this is that the space of experiments for these two random variables is already fixed (it is Ω) and cannot further be played with as theorem 7.10 requires. Thus we can reformulate probabilistic monotonicity as follows (compare with definition 7.8):


Definition 7.11 (probabilistic monotonicity, 2) Probabilistic pre-OrchNet (N, P) is called probabilistically monotonic if, for any two N, N′ ∈ N such that N ≥ N′, we have E(N) ≥s E(N′).

Note the careful use of ≥ and ≥s. The following two results establish a relation between probabilistic monotonicity and monotonicity:

Theorem 7.12 If pre-OrchNet N is monotonic, then probabilistic pre-OrchNet (N, P) is probabilistically monotonic for any probability P over the set Ω.

This result was already obtained in the first probabilistic setting; it is here a direct consequence of the fact that τ ≥ τ′ implies τ ≥s τ′ if τ and τ′ are two random variables defined over the same probability space. The following converse result completes the landscape and is much less straightforward. It assumes that it is legal to increase at will latencies or initial dates, see theorem 7.4:

Theorem 7.13 Assume condition 2b of theorem 7.4 is satisfied. Then, if probabilistic pre-OrchNet (N,P) is probabilistically monotonic, then it is also monotonic with P-probability 1.

7.6 Getting Rid of Non-Monotonicity

Avoiding Non-Monotonicity. We suggest a few ways in which non-monotonic orchestrations can be made monotonic. These might serve as guidelines to the designer of an orchestration, to avoid building non-monotonic orchestrations.

1. Eliminate Choices. We saw that choices in the execution flow can create non-monotonicity. So if possible, choices in the execution flow should be avoided while designing orchestrations. This seems very restrictive but is not totally unrealistic. For example, in the Travel Planner orchestration of Figure 7.3, if the designer can find a rental service for both cars and bikes, then the two mutually exclusive rental calls can be replaced by a call to that single rental service. This makes the execution flow an event graph and the Travel Planner orchestration monotonic.

2. Balancing out performance of mutually exclusive branches. One way to make an orchestration “more monotonic” is to ensure that all its mutually exclusive branches have similar response times. For example, in the Travel Planner example of Figure 7.3, if the two exclusive services Bike Rent and Car Rent have similar response times, the orchestration is nearly monotonic.

3. Externalising Choices. Choices are of course integral to many execution flows and sometimes simply cannot be removed. A possible way out in this case is to externalise the choice and make it client dependent. This solution has already been discussed in the modified Travel Planner example of Section 7.2.

4. If none of the above works, then a brute force alternative consists in performing the following. Replace the orchestration latency Eω(N) defined in (7.6) by the following pessimistic bound for it (see Theorem 7.4 for the notations):

Fω(N) = max{Eω(κ, N) | κ ∈ V(N)} (7.12)

Then for any net N, and any two OrchNets N and N′ over N, ∀ω

Fω(N) ≥ Eω(N) (7.13)

N ≥ N′ ⇒ Fω(N) ≥ Fω(N′) (7.14)


holds. Therefore, using the pessimistic bound Fω(N) instead of the tight estimate Eω(N) when building the orchestration’s contract with its customer is safe in that: 1) by (7.14), monotonicity of Fω(N) with respect to the underlying OrchNet is guaranteed, and 2) by (7.13), the orchestration will meet its contract if its sub-contractors do so. In turn, this way of composing contracts is pessimistic and should therefore be avoided whenever possible.
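The pessimistic bound (7.12) can be illustrated on the net of Figure 7.4, which has two maximal configurations: one ending with s and one ending with t. The Python sketch below is ours and rests on stated assumptions: latencies are non-negative, both maximal configurations contain m and n, the latency of a configuration is the largest date of its nodes as in (7.5), and a tie d = d′ is resolved arbitrarily in favour of the left branch.

```python
def configuration_latencies(d0, d1, d2, tau_m, tau_n, tau_s, tau_t):
    """Dates d, d' and latencies of the two maximal configurations."""
    after_m, after_n = d0 + tau_m, d1 + tau_n
    d_left = max(d2, after_m)                 # date d
    d_right = max(d2, after_n)                # date d'
    e_left = max(d_left + tau_s, after_n)     # configuration ending with s
    e_right = max(d_right + tau_t, after_m)   # configuration ending with t
    return d_left, d_right, e_left, e_right


def actual_latency(*args):
    """E: latency of the configuration selected by the race policy."""
    d_left, d_right, e_left, e_right = configuration_latencies(*args)
    return e_left if d_left <= d_right else e_right   # tie: left, arbitrarily


def pessimistic_bound(*args):
    """F as in (7.12): maximum over all maximal configurations."""
    _, _, e_left, e_right = configuration_latencies(*args)
    return max(e_left, e_right)


# Illustrative values; only tau_m differs between the two settings.
low = (0, 0, 0, 1, 2, 5, 1)
high = (0, 0, 0, 3, 2, 5, 1)
E_low, E_high = actual_latency(*low), actual_latency(*high)
F_low, F_high = pessimistic_bound(*low), pessimistic_bound(*high)
```

On these values the tight estimate decreases from 6 to 3 when τm is increased, whereas the bound increases from 6 to 8 and stays above the tight estimate in both settings, in line with (7.13) and (7.14).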

Where does monotonicity play a role in the orchestration’s life cycle? We use contracts to abstract the behaviour of the services involved in an orchestration. The orchestration, trusting these contracts, composes them to derive an estimate of its own performance, from which a contract between the orchestration and its customers can be established. Since this relies on trust between the orchestration and its sub-contractors, these contracts will have to be monitored at run-time to make sure that the sub-contractors deliver the promised performance. In case of violation, counter-measures like reconfiguring the orchestration might be taken. The orchestration’s life cycle thus consists of the following phases [RBHJ08]:

1. At design time, establish QoS contracts with the customer by composing QoS contracts from the called services; tune the monitoring algorithms accordingly; design reconfiguration strategy.

2. At run time, run the orchestration; in parallel, monitor the called services for possible QoS contract violation; whenever needed, perform reconfiguration.

Monotonicity plays a critical role at design time. The above pessimistic approach can be used as a backup solution if monotonicity is not satisfied. Monotonicity is, however, not an issue at run time and the orchestration can be taken as such, with no modification. Monitoring of the called services remains unchanged too.

7.7 Conclusion

This paper is a contribution to the fundamentals of contract based QoS management of Web services orchestrations. QoS contracts implicitly assume monotonicity w.r.t. QoS parameters. We focus on one representative QoS parameter, namely response time. We have shown that monotonicity is easily violated in realistic cases. We have formalized monotonicity and have provided necessary and sufficient conditions for it. As we have seen, QoS can very often be traded for Quality of Data: poor quality responses to queries (including exceptions or invalid responses) can often be obtained much faster. This reveals that QoS parameters should not be considered separately, in isolation. We have provided guidelines for getting rid of non-monotonicity.

We see one relevant extension of this work: advanced orchestration languages like Orc [MC07] offer a sophisticated form of preemption that is modelled by contextual nets (with read arcs). Our mathematical results do not consider nets with read arcs. Extending our results to this case would be interesting and useful.


Chapter 8

A Theory of QoS for Web Service Orchestrations

Sidney Rosario, Albert Benveniste
IRISA/INRIA Rennes, Campus de Beaulieu, Rennes, France.

Abstract

In this paper we develop a comprehensive framework for QoS management based on soft probabilistic contracts. Our approach encompasses general QoS parameters, with “response time” as a particular case. We support composite QoS parameters, e.g., combining timing aspects with “quality of data” or security level. We also study contract composition (how to derive QoS contracts for an orchestration from the QoS contracts with its called services), and contract monitoring.


8.1 Introduction

Web services and their orchestrations are now considered an infrastructure of choice for managing business processes and workflow activities over the Web [vdAvH02]. BPEL [Bpe07] has become the industrial standard for specifying orchestrations. Besides BPEL, the Orc formalism [KQM09, KCM06] has been proposed to specify orchestrations, by W. Cook and J. Misra from the University of Texas at Austin. Orc is a simple and clean academic language for orchestrations with a rigorous mathematical semantics. For this reason, our study in this paper relies on Orc. Its conclusions and approaches, however, are also applicable to BPEL.

When dealing with the management of QoS, the commitments of each subcontractor with regard to the orchestration are specified via contracts in the form of Service Level Agreements, SLA [BSC01]. Most SLAs commonly tend to have QoS parameters which are mild variations of the following: response time (latency); availability; maximum allowed query rate (throughput); and security [KL03]. From QoS contracts with sub-contractors, the overall QoS contract between the orchestration and its clients can be established. This process is called contract composition. Then, since contracts cannot only rely on trusting the sub-contractors, monitoring techniques must be developed for the orchestrator to be able to detect possible violations of a contract by a sub-contractor. Finally, upon contract violation, the orchestrator may consider reconfiguring itself, i.e., replacing some called services by alternative, “equivalent” ones (we do not address this last task here).

To the best of our knowledge, with the noticeable exception of [LSW01] and [HWTS07], all composition studies consider performance related QoS parameters of contracts in the form of hard bounds. For instance, response times and query throughput are required to be less than a certain fixed value and validity of answers to queries must be guaranteed at all times. When composing contracts, hard composition rules are used. Typical examples are addition or maximum (for response times), or conjunction (for validity of answers to queries). Whereas this results in elegant and simple composition rules, this general approach by using hard bounds does not fit the reality well and may lead to overly pessimistic promises. Indeed, real measurements of response times for existing Web services reveal that they vary a lot and are better represented through their histogram. Thus we have proposed in [RBHJ08] using soft probabilistic contracts instead. In such contracts, hard bounds are replaced by probabilistic obligations, i.e., a QoS parameter is considered probabilistic and a probability distribution is agreed for it. The obligation is that the called service should behave “no worse” than this agreed distribution regarding this QoS parameter, in a sense that we formalize in this paper.

Adopting a probabilistic approach for QoS has many advantages, but also raises some issues when performing contract composition and contract monitoring. Analytical solutions for deriving the distribution of the composition from the distribution of its components exist for simple cases where the control flow of the composition is not affected by the data values of the queries and their responses, and other timing issues. Queuing network techniques can be used in simple cases like this. More sophisticated stochastic Petri nets can also be used, but they require restricting to exponential distributions. These elegant analytical approaches, however, are not applicable in general to services orchestrations where responses to queries and timing (via timeouts) interfere with the control flow; the CarOnLine example in the next section is an instance of this. We thus need to develop new techniques to perform contract composition and contract monitoring, adapted to probabilistic contracts.

Contributions: In this paper we extend and systematize the approach of [RBHJ08] and [BRBH09] beyond the case of soft probabilistic contracts for Response Time.

Our first contribution consists in proposing a comprehensive approach for Soft Prob-abilistic QoS Contracts encompassing a large class of QoS parameters taking values inpartially ordered domains, together with means to build composite QoS parameters andcontracts and reason about them.

A second contribution consists in a procedure to perform flexible contract composition,which consists in relating the obligations binding the pair client, orchestration, to theobligations binding the different pairs orchestration, called service.

A third (minor) contribution consists in the extension of the contract monitoring tech-nique proposed in [RBHJ08] to our generalized case. This extension turns out to bestraightforward, as we shall see.

Last but not least, we discuss languages features that are useful in making our approacheffective. Not surprisingly, QoS domains must be declared along with their characteristicsallowing to perform contract composition. We also found it useful to introduce a languagefeature that is generic with respect to the various QoS domains and performs a filtering ofresponses from called services or from pools thereof, according to best QoS performance.We illustrate this with the Orc language.

Our whole approach is supported by the TOrQuE tool (Tool for Orchestration Quality of Service Evaluation), from which experimental results for contract composition are derived. The organization of the paper is as follows: Our study is illustrated by the “CarOnLine” example that we present in the next section. Based on this example, we discuss in particular why QoS domains should be partially, not totally, ordered. We then formalise the notion of QoS parameters and their domains, and introduce our orchestration model. We study the monotonicity of orchestrations for generic parameters, for both the non-deterministic and the probabilistic case. We then develop our general framework for flexible QoS management, including the procedure for contract composition. Experiments are reported at the end.

8.2 Our Approach

In this section we outline our approach to QoS management. The corresponding key elements are detailed in subsequent sections. To motivate our approach we first discuss a representative case of an orchestration, the CarOnLine example.

8.2.1 The CarOnLine motivating example

The CarOnLine orchestration is shown in Figure 8.1. In search of a second-hand car, a client calls the orchestration with a car type (small car, family car, SUV, etc.) as the input. The orchestration calls two garages, GarageA and GarageB, in parallel, with the client’s car type as an input parameter. The garages respond with their price quote for that car and the best offer is selected. The calls to the garages are guarded by a Timeout. If only one garage has responded when a timeout occurs, its response is taken as the best offer and any later response of the other garage is simply ignored. If no garage responds before the timeout, then a “Fault” message is returned to the client, indicating an exception. After selecting the best offer for the car, CarOnLine finds insurance and credit offers for this car. For credit offers, two services, AllCredit and AllCreditPlus, are called in parallel and the offer having the best (lowest) interest rate is chosen. The insurance services called depend on the type of car which needs to be insured. If the car requested by the client is of some “deluxe” category, then only one service, GoldInsure, can offer insurance for such cars, and any offer made by it is taken. If the car is not a “deluxe” car, then two services, InsureAll and InsurePlus, are called in parallel and the best insurance, i.e., the one that costs the least amongst the two offers, is selected. In the end, the tuple (price, credit, insurance) is returned to the client.

Figure 8.1 – Schematic representation of the CarOnLine example. Calls to services are shown by rounded rectangles and processing actions internal to the orchestration are shown in bold boxes.

In this example, we regard the tuple (time, price, credit, insurance) as a composite QoS parameter for optimization by the orchestration. The usual practice in dealing with composite parameters consists in synthesizing a single totally ordered parameter by combining the QoS parameters using, e.g., a linear combination with user-selected weights (see, e.g., [ZBN+04]). We think that such combinations are arbitrary and make little sense in most cases, including the present one. Thus we prefer keeping the different QoS parameters as such and ordering them individually. As a consequence, the composite parameter can be only partially, not totally, ordered.
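To make the last point concrete, the following minimal Python sketch compares composite QoS values under the component-wise (product) order. The tuple layout, the function names and the numeric offers are illustrative assumptions, not taken from the thesis; smaller is taken to mean better on every component.

```python
from typing import Tuple

# Hypothetical composite QoS value: (time, price, credit, insurance).
QoS = Tuple[float, float, float, float]


def leq(a: QoS, b: QoS) -> bool:
    """Product order: a is at least as good as b on every component."""
    return all(x <= y for x, y in zip(a, b))


def comparable(a: QoS, b: QoS) -> bool:
    """Two values are comparable if one dominates the other."""
    return leq(a, b) or leq(b, a)


# Made-up offers, for illustration only.
offer_1 = (2.0, 9000.0, 4.5, 600.0)
offer_2 = (3.0, 9500.0, 4.5, 650.0)  # no better than offer_1 on any component
offer_3 = (1.0, 9800.0, 5.0, 550.0)  # faster and cheaper to insure, but pricier
```

Here `offer_1` dominates `offer_2`, whereas `offer_1` and `offer_3` are incomparable: neither is better on all four components, which is exactly why the composite domain is only partially ordered.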

A formal description of CarOnLine using the Orc language is given in Table 8.1. Orc offers three primitive operators; see [KQM09] and [KCM06] for details. For Orc expressions f, g, “f | g” executes f and g in parallel. “f >x> g” evaluates f first and, for every value returned by f, a new instance of g is launched with variable x assigned to this return value; in particular, “f ≫ g” (which is a special case of the former where the returned values are not assigned to any variable) causes every value returned by f to create a new instance of g. “f where x :∈ g” executes f and g in parallel. When g returns its first value, x is assigned to this value and the computation of g is terminated. All site calls in f having x as a parameter are blocked until x is defined (i.e., until g returns its first value).

The operator :∈Q is a new operator, where Q is a static parameter of this operator. Q is a QoS parameter whose domain is a partially ordered set (DQ, ≤); by convention, “best” will refer to a minimal element among a set. The expression “f where x :∈Q g” does not take the first value returned by g as x. Instead it waits for a “best quality” response among all responses from g to that call, irrespective of the time taken to generate them. Since the domain of Q is only partially ordered in general, a best response may not be unique, so a nondeterministic choice is performed in this case. Observe that :∈ is a particular case of :∈Q by taking Q as the latency or response time of the call: in this case there is no need to wait for all the responses from g to get the best one, since the first one received will, by definition, be the best.

Assumptions QoS parameters:
    δ : inter-query time, Dδ = R+

Guarantees QoS parameters:
    d : latency, Dd = R+
    p : car price, Dp = R+
    i : insurance costs, Di = R+
    c : credit rate, Dc = R+

CarOnLine(car) ≜ CarPrice(car) >p> let(p, c, i)
                     where c :∈d GetCredit(p)
                           i :∈d GetInsur(p, car)

BestQ(E1, . . . , En) ≜ let(a) where a :∈Q E1 | E2 . . . | En

CarPrice(car) ≜ Bestp(Bestd(GarageA[d, p](car), RTimer[d](T)),
                      Bestd(GarageB[d, p](car), RTimer[d](T)))
                >p> if(p ≠ Fault) ≫ let(p)

GetCredit(p) ≜ Bestc(AllCredit[d, c](p), AllCreditPlus[d, c](p))

GetInsur(p, car) ≜ if(car = deluxe) ≫ GoldInsure[d, i](p)
                 | if(car ≠ deluxe) ≫ Besti(InsurePlus[d, i](p), InsureAll[d, i](p))

Table 8.1 – CarOnLine in Orc, enhanced with QoS specification.
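The selection performed by :∈Q can be sketched in Python as follows. This is only an illustration of the "wait for all responses, then pick a minimal one" behaviour described above, not the Orc implementation; the names `product_leq`, `minimal_elements` and `best`, as well as the sample responses, are assumptions made for this example.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Offer = Tuple[float, float]  # hypothetical composite QoS value, smaller is better


def product_leq(a: Offer, b: Offer) -> bool:
    """Product order: a is at least as good as b on every component."""
    return all(x <= y for x, y in zip(a, b))


def minimal_elements(values: Sequence[Offer],
                     leq: Callable[[Offer, Offer], bool]) -> List[Offer]:
    """Keep the values that no other value strictly improves upon."""
    def strictly_better(a: Offer, b: Offer) -> bool:
        return leq(a, b) and not leq(b, a)

    return [v for v in values
            if not any(strictly_better(w, v) for w in values)]


def best(values: Sequence[Offer],
         leq: Callable[[Offer, Offer], bool],
         rng: Optional[random.Random] = None) -> Offer:
    """Collect all responses, then pick one best response nondeterministically."""
    rng = rng or random.Random()
    return rng.choice(minimal_elements(values, leq))


# Made-up responses from a pool of services.
responses = [(3.0, 1.0), (1.0, 3.0), (4.0, 4.0)]
candidates = minimal_elements(responses, product_leq)
chosen = best(responses, product_leq, random.Random(0))
```

With these responses there are two incomparable minimal elements, so `best` may return either of them. When the order is total, as for latency alone, the set of minimal elements collapses to the usual minimum.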

8.2.2 Summary of our approach for QoS management

Our objective is to develop the needed framework and tools to support the following tasks in QoS management:

1. Task 1. To give a QoS-enhanced description of the orchestration: this is best illustrated by the Orc specification of the CarOnLine example in Table 8.1. QoS parameters are declared together with their type. The Orc expressions describe how the services are orchestrated. Special operators related to QoS are provided. Such a specification should allow for complex interactions between control, response values of the services called, and values of the QoS parameters. For example, in CarOnLine the QoS parameter tuple (latency, price, credit, insurance) interacts with the value of parameter “car”.

2. Task 2. To specify probabilistic contracts with explicit obligations of the different actors: A contract usually involves two parties and consists of assumptions and guarantees: provided that one party (the client) respects certain assumptions, it is assured certain guarantees from the other party (the provider). For example, the orchestration is a client of the services it calls, and a provider of a service to its own clients. We thus have a contract binding each called service to the orchestration, and a contract binding the orchestration to its client. For the first type of contract, the guarantee involves the QoS of the called services, and for the second type of contract, the guarantee involves the overall QoS of the orchestration. Now, to account for the high variability in performance of Web services, we consider the QoS parameters to be random. So the guarantee part of a contract specifies a worst-case probability distribution for the different QoS parameters affected by the service.¹

3. Task 3. To model how QoS parameters evolve while the query is processed: As an example consider the QoS parameter latency for CarOnLine. By observing how the query travels through the orchestration and knowing the latency of the service calls, we can use max-plus algebraic rules to derive the orchestration’s end-to-end latency for that query. We need to develop a similar algebra and the associated operations for generic QoS parameters, which model how the QoS parameters of a query evolve while being processed by the orchestration. Moreover, the treatment of the QoS parameters for assumptions and guarantees differs. Guarantee QoS parameters (like latency) are associated with individual queries, but assumptions (like inter-query time) are derived from a flow of queries. This task is addressed in detail in the section on “QoS Computing”.

4. Task 4. To perform Monte-Carlo simulations for contract composition: We need a procedure to derive the probabilistic contract between the orchestration and its clients from the probabilistic contracts agreed between the orchestration and the services it calls. As mentioned in the introduction, the analytical techniques for composing probability distributions are not applicable to generic orchestrations where control, data and time may interfere in possibly complex ways. In this case, Monte-Carlo simulation techniques are a powerful alternative. They rely on the mathematically sound basis of statistical inference and the law of large numbers. In such simulations, the computations mentioned in Task 3 are repeated sufficiently many times to derive an empirical estimate of the probability distribution for the overall QoS of the orchestration. This technique is developed in detail in the section on “Probabilistic Contract Composition”.

5. Task 5. To monitor probabilistic contracts: We must monitor on-line whether the services called by the orchestration actually meet the agreed probabilistic contract. The techniques for monitoring probabilistic contracts must be based on statistical testing procedures. We treat this task in the section on “Probabilistic Contract Monitoring”.
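As a rough illustration of Tasks 3 and 4 for latency only, the Python sketch below draws service latencies at random, combines them with max-plus rules along the CarOnLine control flow, and estimates quantiles of the end-to-end latency. Everything numeric here is an assumption made for the example (exponential laws, their means, the timeout value), and the fault case is simplified to "return at the timeout"; this is not the TOrQuE tool.

```python
import random


def sample_latency(rng: random.Random, timeout: float = 5.0,
                   deluxe: bool = False) -> float:
    """Draw one end-to-end latency for a simplified CarOnLine run."""
    # Hypothetical latency laws: exponential with made-up means (in seconds).
    garage_a = rng.expovariate(1 / 2.0)
    garage_b = rng.expovariate(1 / 3.0)
    if min(garage_a, garage_b) > timeout:
        return timeout  # no garage answered in time: fault reported at the timeout
    # Best-price selection waits for both garages, but never beyond the timeout.
    price_stage = min(max(garage_a, garage_b), timeout)
    # Best credit offer: both credit services must have answered.
    credit = max(rng.expovariate(1 / 1.0), rng.expovariate(1 / 1.5))
    if deluxe:
        insurance = rng.expovariate(1 / 2.0)
    else:
        insurance = max(rng.expovariate(1 / 1.0), rng.expovariate(1 / 1.2))
    # Credit and insurance are requested in parallel: sequence adds, parallel takes max.
    return price_stage + max(credit, insurance)


def empirical_quantile(samples, level: float) -> float:
    """Order statistic used as a simple empirical quantile estimate."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(level * len(ordered)))
    return ordered[index]


rng = random.Random(0)  # fixed seed so that the run is reproducible
samples = [sample_latency(rng) for _ in range(20_000)]
q50 = empirical_quantile(samples, 0.50)
q95 = empirical_quantile(samples, 0.95)
```

The list `samples` is an empirical estimate of the latency distribution of the orchestration; quantiles such as `q95` are the kind of figures that could then be offered to the client as a soft probabilistic guarantee.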

The next two sections are devoted to the study of Tasks 3 and 4 mentioned above.

8.3 QoS Computing

In this section we review the algebra needed for reasoning about the evolution of QoS parameters. Since the treatment of the guarantee QoS parameters differs from that of the assumption QoS parameters, we consider each case separately.

8.3.1 QoS domains and the algebra of QoS computing for guarantee parameters

Our discussion on the evolution of the guarantee QoS parameters involves an abstract toy orchestration example, with no concrete meaning attached to it. The example uses Petri net unfoldings or occurrence nets to model the executions of the orchestration seen as a concurrent system. Corresponding formal material will be developed in the next section.

¹ Alternatively, when no such contract can be established (e.g., if the called service is offered by a popular provider such as Google), a distribution can be estimated from measurement records.

An input query to the orchestration is represented by a set of (initial) input tokens, and the processing of the query is modeled by the flow of the tokens in the net. We attach the guarantee QoS parameters to the tokens to model how these parameters evolve when the query is processed. The tokens are thus equipped with a color consisting of a pair

(v, q) = (data, QoS value) (8.1)

Figure 8.2 – A simple example. Only QoS values are mentioned, with no data. Each place is labeled with a QoS value q, which is the q-color of the token if it reaches that place.

Figure 8.2 depicts our toy example. In this figure, pre- and post-sets of a transition t are denoted by •t and t• respectively. The following rules are sufficient to describe the evolution of the QoS parameters of a query while being processed by the orchestration:

• QoS increments are captured by ⊕: When traversing a transition, each token has its QoS value incremented. For example, the leftmost token has initial QoS value q0, which gets incremented as q1 = q0 ⊕ δq1 when traversing transition t1.

• Synchronizing tokens: A transition t is enabled when all places in its preset have tokens. For the transition to fire, these tokens must synchronize, which results in the “worst” QoS value, denoted by the supremum ∨ associated with a given order ≤, where smaller means better. For example, when the two input tokens of t2 get synchronized, the resulting synchronized pair has QoS value q′′0 ∨ q1. This is depicted in figure 8.2 with this QoS value attached to the shaded area.

• Dealing with conflicts, competition policy: Let us first focus on the conflict following place q′0. The QoS alters the usual semantics of the conflict by using a competition policy that is reminiscent of the classical race policy [MBB+89]. The competition between the two conflicting transitions in the post-set is solved by using the order ≤ used for token synchronization. Thus we test whether q′0 ⊕ δq′1 ≤ q′0 ⊕ δq′′1 holds, or the converse. The smallest with respect to ≤ wins the competition; if equality holds, then a nondeterministic choice occurs.

Comparing q′0 ⊕ δq′1 and q′0 ⊕ δq′′1, however, generally requires knowing the two alternatives, which in turn can affect the QoS of the winner, as we shall see for specific QoS domains. This is taken into account by introducing a special operator ¢. If two transitions t and t′ are in competition and would yield tokens with respective QoS values q and q′ in their post-sets, the cost of comparing them to settle the competition alters the QoS value of the winner: assuming the first wins, q is modified and becomes q ¢ q′. For the case of the figure, we get

if (q′0 ⊕ δq′1) ≤ (q′0 ⊕ δq′′1) then t′1 fires and q′1 = (q′0 ⊕ δq′1) ¢ (q′0 ⊕ δq′′1)
if (q′0 ⊕ δq′1) ≥ (q′0 ⊕ δq′′1) then t′′1 fires and q′′1 = (q′0 ⊕ δq′′1) ¢ (q′0 ⊕ δq′1)     (8.2)

Another conflict may occur between t2 and t′2. If t′′1 actually wins the first competition, then t′2 will never be enabled and this second potential conflict does not occur. In this case, t2 fires and we get

q2 = (q′′0 ∨ q1) ⊕ δq2

where the first parenthesis involves the QoS after synchronizing the two input tokens of t2, as shown by the shaded area. If, instead, t′1 wins the first competition, then t′2 will get enabled and this second potential conflict will occur. This conflict is then handled in a way similar to (8.2).
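The three rules above can be traced on the toy example with a few lines of Python. The sketch below instantiates them for latency only, assuming ⊕ is addition, ∨ is max and ¢ leaves the winner unchanged; the numeric values of the initial QoS and of the increments are made up for illustration.

```python
def oplus(q: float, dq: float) -> float:
    """QoS increment when traversing a transition (latency: addition)."""
    return q + dq


def join(*qs: float) -> float:
    """Synchronization of input tokens: the worst (largest) value."""
    return max(qs)


def compete(q: float, others) -> float:
    """Race policy for latency: losing alternatives do not delay the winner."""
    return q


# Hypothetical initial QoS values of the three input places q0, q'0, q''0.
q0, q0_prime, q0_second = 0.0, 0.0, 1.0
# Hypothetical increments of transitions t1, t'1, t''1 and t2.
dq1, dq1_prime, dq1_second, dq2 = 2.0, 3.0, 4.0, 1.0

q1 = oplus(q0, dq1)  # token produced by t1

# Conflict between t'1 and t''1 after place q'0, see (8.2).
cand_t1_prime = oplus(q0_prime, dq1_prime)
cand_t1_second = oplus(q0_prime, dq1_second)
if cand_t1_prime <= cand_t1_second:
    winner, q_winner = "t'1", compete(cand_t1_prime, [cand_t1_second])
else:
    winner, q_winner = "t''1", compete(cand_t1_second, [cand_t1_prime])

# Candidate QoS of the token produced by t2, if t2 fires.
q2 = oplus(join(q0_second, q1), dq2)
```

With these numbers, t′1 wins the first competition with latency 3.0, and the candidate value for t2 is max(1.0, 2.0) + 1.0 = 3.0.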

Some examples of QoS domains

We now instantiate our generic framework by reviewing some examples of QoS domains, with their associated relations and operators ⊕, ≤, and ¢. The reader may wish to revisit the above rules for each instance:

1. Latency or Response Time: The QoS value of a token gives the accumulated latency, or “age”, of the token since it was created when querying the orchestration. The QoS domain here is R+, equipped with ⊕d = + and ≤d = the usual order on R+ (∨ is the usual max operator). Regarding operator ¢d, for the case of latency with race policy, comparing two dates via d1 ≤d d2 does not impact the QoS of the winner: the answer to this predicate is known as soon as one of the two events is seen, i.e., at time min(d1, d2).² Hence, for this case, we take d1 ¢d d2 = d1, i.e., d2 does not affect d1. This is the basic example, which was studied in [BRBH09]. Since no generic QoS was considered in this reference, there was no need for considering ¢.

2. Security level: The QoS value s of a token belongs to ({high, low}, ≤s), with high ≤s low. Each transition has a security level encoded in the same way, and we take ⊕s = ∨s, reflecting that a low security service processing high security data yields a low security response. We now focus on operator ¢s. For this case also, comparing two values via s1 ≤s s2 does not impact the QoS of the winner: QoS values are strictly “owned” by the tokens, and therefore do not interfere when comparing them. Hence, we take again s1 ¢s s2 = s1, i.e., s2 does not affect s1. More generally, Quality of Response, which has several instances in the CarOnLine example, is handled in the same way as security level, by using various domains.

3. Composite QoS, first example: We may also consider a composite QoS parameter consisting of the pair (s, r), where s is a security parameter and r is some quality of response with domain Dr, equipped with ≤r and ¢r. When synchronizing tokens, the product order ≤ = ≤s × ≤r is used, reflecting the fact that a low security level results from synchronizing a low security response with any other response, and the same for quality of response. Alternatively, we may want to prioritize security; this is achieved by taking for ≤ the lexicographic order obtained from the pair (≤s, ≤r) by giving priority to s. For both cases, we take ¢ = (¢s, ¢r).

² This reflects the fact that the minimum of the two dates can be evaluated in a non-strict way.

4. Composite QoS, second example: So far the special operator ¢ did not play any role. We will however need it here, when we consider a composite QoS parameter (s, d), where s and d are as above. We want to give priority to security s, and thus we now take ≤ to be the lexicographic order obtained from the pair (≤s, ≤d) by giving priority to s.

Focus on operator ¢. Consider the marking resulting after firing t1 and t′1 in figure 8.2, enabling t2 and t′2, which are in conflict. Now suppose the QoS value q2 = (q′′0 ∨ q1) ⊕ δq2 of the token in the postset of t2 is equal to q2 = (low, d2). Similarly, suppose that the QoS value q′2 = (q′′0 ∨ q1) ⊕ δq′2 of the token in the postset of t′2 is equal to q′2 = (low, d′2) where d′2 >d d2. From the competition rule, transition t2 wins the conflict and the outgoing token has QoS value q2 = (low, d2).

However, the decision to select t2 can only be made when q′2 is known, that is, at time d′2. The reason for this is that, since at time d2 a token with security level low is seen at the place following t2, it might be that a token with security level high later enters the place following t′2. The latter would win the conflict according to our competition policy, since security level prevails. That the rightmost token indeed has security level low can only be observed at time d′2. Thus it makes little sense to assign q2 = (low, d2) to the outgoing token; it should rather be q2 = (low, d′2). This is why a non-trivial operator ¢ is needed, namely,

(s, d) ¢ (s′, d′) = (s, max(d, d′)) (8.3)
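The composite (security, latency) domain of this fourth example is easy to sketch in Python. The encoding of security levels as integers and the function names are assumptions made for the illustration; the competition operator follows (8.3).

```python
from typing import Tuple

HIGH, LOW = 0, 1  # encodes high ≤s low: the smaller value is the better one

SecLat = Tuple[int, float]  # (security level, latency)


def leq(a: SecLat, b: SecLat) -> bool:
    """Lexicographic order with priority to security (Python tuple order)."""
    return a <= b


def compete(winner: SecLat, loser: SecLat) -> SecLat:
    """Formula (8.3): the winner must wait until the loser's value is known."""
    s, d = winner
    _, d_other = loser
    return (s, max(d, d_other))


# Hypothetical candidates for t2 and t'2, both with low security.
q2 = (LOW, 2.0)
q2_prime = (LOW, 5.0)
outgoing = compete(q2, q2_prime) if leq(q2, q2_prime) else compete(q2_prime, q2)
```

Here t2 wins because its latency is smaller, but the outgoing token carries latency 5.0 rather than 2.0, since the decision could not be taken before the competing token was observed. A late token with high security, such as `(HIGH, 5.0)`, would still beat `(LOW, 2.0)` under this order.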

Evaluation policy

It is worth summarizing, on the example of figure 8.2, the resulting semantics of the net, as enhanced with its QoS attributes. This is shown in figure 8.3. The shaded areas of the figure indicate the sub-nets which will never be activated. The formulas for q1, q2, etc., were given before.

Figure 8.3 – Evaluation steps of the example of figure 8.2.


Contracts: Assumptions versus Guarantees

So far we have only considered QoS offered by the called services and we have shown how to derive from them the QoS offered by the orchestration. In other words, we have only considered the obligations of the called services. But we have ignored the obligations of the agent calling the services, which can be the orchestration querying a service, or the customer querying the orchestration. For example, there is a limit on how many queries the same IP address can submit to some free Web services. Another obligation is that the query must be well formed, although this is hardly considered an issue of QoS. One can also imagine setting restrictions on the size of the parameters of a call to a service.³

Query throughput is directly measured by the called services, provided proper identifiers (for the calling client and considered query) are given. For our convenience, we will represent query throughput via the delay τ between two successive queries. Other QoS-related characteristics of the queries are carried by the tokens entering the orchestration. To account for this, we augment the color (v, q) of (8.1) with a new attribute β characterizing the flow of queries. Thus we collect the pair α = (τ, β) as an assumption to which each query is subject when submitted to a called service, and we denote by A the domain of assumptions.

Since we view QoS as benefiting from guarantees by the orchestration to its clients, and by the called services to the orchestration, characteristics α of the flow of queries will in turn be subject to assumptions. A contract will then be an implication

assumption ⇒ guarantee

expressing that some level of QoS is guaranteed by the orchestration to its client (or by a called service to the orchestration), provided that assumptions are met.

In contrast to QoS values, assumptions are modified by the orchestration only indirectly, as a consequence of control. For example, the time-and-data dependent routing of tokens in the orchestration modifies the time elapsed since the previous query. However, in contrast to guarantees, no control flow decision is made while executing the orchestration based on assumptions.

8.4 The Orchestration Model: OrchNets

In this section we present OrchNets as our model for QoS-enhanced orchestrations. OrchNets are a special form of colored occurrence nets (CO-nets) in which explicit provision is made for QoS management. Occurrence nets are concurrent models of executions of Petri nets. They can support data values, QoS parameters, preemption, and recursion at no additional cost. Note that the executions of Workflow Nets [vdA97] are also CO-nets. We begin with the formal definition of QoS domains, for guarantees and assumptions.

8.4.1 QoS domains

Definition 8.1 (QoS domains for guarantees) A QoS domain for guarantees is a tuple Q = (D, ≤, ⊕, ¢) where:

• (D, ≤) is a partial order that is a complete upper lattice, meaning that every subset S ⊆ D has a unique least upper bound denoted by ∨S. By convention, we interpret the synchronization order ≤ as “better”. Hence operator ∨ amounts to taking the “worst” QoS and is used while synchronizing tokens.

• Operator ⊕ : D × D → D captures how a transition increments the QoS value; it satisfies the following conditions:

1. there exists some neutral element 0 such that ∀q ∈ D, q ⊕ 0 = 0 ⊕ q = q;

2. ⊕ is monotonic:

q1 ≤ q′1 and q2 ≤ q′2 =⇒ (q1 ⊕ q2) ≤ (q′1 ⊕ q′2)

3. ∀q, q′ ∈ D, ∃δq ∈ D such that q ≤ q′ ⊕ δq.

• The competition function ¢ : D × D∗ → D, where D∗ = ⊎_{k=0}^{∞} D^k and D^0 = ∅, maps a pair consisting of 1/ the QoS resulting from the synchronization of the input tokens, and 2/ the tuple of the QoS of other tokens that must be considered when applying competition, to the resulting QoS value. We require the following regarding ¢:

1. q ¢ ε = q where ε denotes the empty tuple, that is, if no competition occurs, then q is not altered;

2. ¢ is monotonic:

q ≤ q′ and q1 ≤ q′1, . . . , qn ≤ q′n =⇒ (q ¢ (q1, . . . , qn)) ≤ (q′ ¢ (q′1, . . . , q′n))

Examples were given in section 8.3.1. The actual size of the second component of competition function ¢ is determined dynamically while executing the net; this is why the domain of ¢ is D × D∗.

³ Observe that the same arises for QoS in the context of networks and IP. In this context, QoS involves parameters that are obligations for the network, e.g., jitter and latency, whereas not exceeding maximal throughput is an obligation for the user of the network.
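The conditions that Definition 8.1 places on ⊕ can be checked by brute force on a small finite domain. The Python sketch below does so for the two-level security domain of section 8.3.1, assuming the integer encoding 0 = high, 1 = low and ⊕ = ∨ = max; the variable names are illustrative only.

```python
from itertools import product

D = [0, 1]   # security domain: 0 = high, 1 = low, so that high ≤ low
NEUTRAL = 0  # candidate neutral element of ⊕


def leq(a: int, b: int) -> bool:
    return a <= b


def oplus(a: int, b: int) -> int:
    """For security, incrementing amounts to taking the worse level."""
    return max(a, b)


# Condition 1: NEUTRAL is neutral for ⊕.
neutral_ok = all(oplus(q, NEUTRAL) == q == oplus(NEUTRAL, q) for q in D)

# Condition 2: ⊕ is monotonic in both arguments.
monotonic_ok = all(
    leq(oplus(q1, q2), oplus(p1, p2))
    for q1, p1, q2, p2 in product(D, repeat=4)
    if leq(q1, p1) and leq(q2, p2)
)

# Condition 3: any q can be dominated from any q' by a suitable increment.
reach_ok = all(
    any(leq(q, oplus(p, dq)) for dq in D)
    for q, p in product(D, repeat=2)
)
```

All three flags evaluate to `True` for this domain. The same exhaustive check applies to any finite QoS domain; for infinite domains such as R+ the conditions have to be argued analytically.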

Definition 8.2 (QoS domains for assumptions) A QoS domain for assumptions is a pair (A, ≤A), where: A = R+ × B, R+ is equipped with its usual order ≤, B is equipped with some partial order ≤B, ≤A is the product order, and (A, ≤A) is a complete lower lattice, meaning that every subset S ⊆ A has a unique greatest lower bound denoted by ∧S.

For t a service, an assumption is a pair α = (τ, β) where τ is the elapsed time since the last token was received, and β collects the other assumptions attached to the tokens. Observe that τ ≤ τ′ must be interpreted as “τ is worse than τ′” from the point of view of t, because this amounts to increasing the load on t. We take the same interpretation for ≤B, and also for the partial order ≤A on A; thus “better” translates as “≥A” for assumptions, which is the converse of guarantees.

If some QoS parameter q of the orchestration is irrelevant to a service it involves, we take the convention that this service acts on tokens with a 0 increment on the value of q. With this convention we can safely assume that the orchestration, all its called services, and all its tokens use the same QoS domain. This assumption will be in force in the sequel.

Before providing the formal definition of OrchNets, we need some background on occurrence nets.


8.4.2 Background on Petri nets and Occurrence nets

We assume that the reader is familiar with the basics of Petri nets [Mur89]. A Petri net is a tuple N = (P, T, F, M0), where: P is a set of places, T is a set of transitions such that P ∩ T = ∅, F ⊆ (P × T) ∪ (T × P) is the flow relation, and M0 : P → N is the initial marking. For x ∈ P ∪ T, we call •x = {y | (y, x) ∈ F} the preset of x, and x• = {y | (x, y) ∈ F} the postset of x. For a net N = (P, T, F, M0) the causality relation ≤ is the transitive and reflexive closure of F. For a node x ∈ P ∪ T, the set of causes of x is ⌊x⌋ = {y ∈ P ∪ T | y ≤ x}. Two nodes x and y are in conflict, denoted by x#y, if there exist distinct transitions t, t′ ∈ T such that t ≤ x, t′ ≤ y and •t ∩ •t′ ≠ ∅. Nodes x and y are said to be concurrent if neither (x ≤ y) nor (y ≤ x) nor (x#y) holds. A configuration of N is a subnet κ of nodes of N such that: 1/ if x < x′ and x′ ∈ κ then x ∈ κ; and, 2/ for all nodes x, x′ ∈ κ, ¬(x#x′). For convenience, we require that the maximal nodes in a configuration are places.
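These notions can be made concrete with a short Python sketch that computes presets, postsets, causes and the conflict relation directly from the flow relation. The three-place net used here is a made-up example, not one of the figures.

```python
# Hypothetical net: place p0 feeds two transitions t1 and t2, which are thus in conflict.
places = {"p0", "p1", "p2"}
transitions = {"t1", "t2"}
flow = {("p0", "t1"), ("p0", "t2"), ("t1", "p1"), ("t2", "p2")}


def preset(x: str) -> set:
    """All nodes y with (y, x) in the flow relation."""
    return {y for (y, z) in flow if z == x}


def postset(x: str) -> set:
    """All nodes y with (x, y) in the flow relation."""
    return {z for (y, z) in flow if y == x}


def causes(x: str) -> set:
    """All nodes y with y ≤ x (reflexive and transitive closure of the flow)."""
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        for y in preset(node):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def in_conflict(x: str, y: str) -> bool:
    """x # y: distinct causal transitions of x and y share an input place."""
    tx = causes(x) & transitions
    ty = causes(y) & transitions
    return any(t != u and bool(preset(t) & preset(u)) for t in tx for u in ty)
```

In this net, `in_conflict("p1", "p2")` holds because t1 and t2 compete for the token in p0, whereas p0 is in conflict with no node.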

Occurrence nets: A Petri net is safe if all its reachable markings M satisfy M(P) ⊆ {0, 1}. A safe net N = (P, T, F, M0) is an occurrence net (O-net) iff

1. ¬(x#x) for every x ∈ P ∪ T ;

2. ≤ is a partial order and ⌊t⌋ is finite for any t ∈ T ;

3. for each place p ∈ P, |•p| ≤ 1;

4. M0 = {p ∈ P | •p = ∅} holds.

Occurrence nets are a good model for representing the possible executions of a concurrent system.

Branching cells: The discussion of the example in section 8.3.1 revealed the need to consider, dynamically while execution progresses, the set of transitions that are both enabled and in conflict with a considered transition. This was studied by S. Abbes and A. Benveniste with the notion of branching cell [AB06], which we recall now. Let N be an occurrence net. Two transitions t, t′ ∈ T are in minimal conflict, written t #m t′, if and only if (⌈t⌉ × ⌈t′⌉) ∩ # = {(t, t′)}, where ⌈t⌉ = {s ∈ T | s ≤ t} is the set of transitions causing t. A prefix M of N is a causally closed subnet of N whose maximal nodes are places; formally, M is closed under operations t → ⌈t⌉ and t → t•. Prefix M is called a stopping prefix if it is closed under minimal conflict: t ∈ M and t′ #m t imply t′ ∈ M. Branching cells of occurrence net N are inductively defined as follows: 1/ every minimal (for prefix relation) stopping prefix of N is a branching cell, and, 2/ let B be any such branching cell and κ any maximal configuration of it, then any branching cell of Nκ is a branching cell of N, where Nκ, the future of κ, is defined by

Nκ = {x ∈ N \ κ | ∀x′ ∈ κ, ¬(x#x′)} ∪ maxPlaces(κ)     (8.4)

where maxPlaces(κ) is the set of maximal nodes of κ (which are all places).⁴ A result regarding branching cells that we will need is that the minimal (for causality order) branching cells of an occurrence net are pairwise concurrent.

⁴ In the example of figure 8.2, transitions t′1 and t′′1, along with their pre- and post-sets, form one of the branching cells of the net.


8.4.3 OrchNets: formal definition and semantics

In this section we assume QoS domains (D, ≤, ⊕, ¢) for guarantees and (A, ≤A) for assumptions.

Definition 8.3 (OrchNet) An OrchNet is a tuple N = (N, V, A, Q, Qinit) consisting of

• A finite occurrence net N with token attributes

c = (v, β, q) = (data, assumption, QoS value)

• A family V = (νt)t∈T of value functions, mapping the data values of the transition’s input tokens to the data value of the transition’s output token.

• A family A = (αt)t∈T of assumptions, where αt = (τt, βt) for each t ∈ T, τt is the time elapsed since the previous token traversed transition t, and βt is the value set for the assumptions by transition t when the considered token traverses it.

• A family Q = (ξt)t∈T of QoS functions, mapping the data values of the transition’s input tokens to a QoS increment.

• A family Qinit = (ξp)p∈min(P) of initial QoS functions for the minimal places of N .

Values, assumptions, and QoS functions can be nondeterministic. We introduce a global, invisible, daemon variable ω that resolves this nondeterminism and we denote by Ω its domain. That is, for ω ∈ Ω, νt(ω), αt(ω), ξt(ω), and ξp(ω) are all deterministic functions of their respective inputs.

Competition Policy

We will now explain how the presence of QoS values attached to tokens affects the semantics of OrchNets. Assumptions play no role in the competition policy; see our discussion in section 8.3.1. Thus, in the following analysis, we can safely ignore α. So, when talking about a “QoS value” in this subsection, we mean a QoS value for guarantees. Accordingly, we will consider that any place p of occurrence net N has a pair (vp, qp) = (data, QoS value) assigned to it, which is the color held by a token reaching that place.

Procedure 1 (competition policy) Let ω ∈ Ω be any value for the daemon. The continuation of any finite configuration κ(ω) is constructed by performing the following steps, where we omit the explicit dependency of κ(ω), νt(ω), and ξt(ω) with respect to ω, for the sake of clarity:

1. Choose nondeterministically a minimal branching cell B in the future of κ.

2. For t any minimal transition of B, compute:

qt = ( ∨_{p′∈•t} qp′ ) ⊕ ξt(vp′ | p′ ∈ •t)     (8.5)

3. Competition step: select nondeterministically a minimal transition t∗ of B such that no other minimal transition t of B exists such that qt < qt∗.

4. Augment κ to κ′ = κ ∪ {t∗} ∪ t∗•, and assign, to every p ∈ t∗•, the pair (v, q), where

v = νt∗(vp′ | p′ ∈ •t∗)
q = qt∗ ¢ (qt | t ∈ B, t minimal, t ≠ t∗)     (8.6)


Observe that the augmented configuration κ′ as well as the pair (v, q) are dependent on ω.
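Steps 2 to 4 of Procedure 1 can be sketched in Python for a single branching cell. The sketch ignores data values and uses the composite (security, latency) domain with lexicographic order; the dictionary encoding of the cell, the helper names, and the extension of (8.3) to several competitors (taking the largest latency among all candidates) are assumptions made for this illustration.

```python
import random
from functools import reduce


def leq(a, b):
    return a <= b  # tuples compare lexicographically: security first, then latency


def join(a, b):
    return max(a, b)  # supremum for the (total) lexicographic order


def oplus(q, inc):
    return (max(q[0], inc[0]), q[1] + inc[1])  # worst security, added latency


def compete(q, others):
    # Assumed generalization of (8.3) to a tuple of competitors.
    return (q[0], max([q[1]] + [o[1] for o in others]))


def fire_cell(cell, place_qos, rng):
    """cell maps each minimal transition to (preset places, QoS increment)."""
    # Step 2: candidate QoS of every minimal transition, as in (8.5).
    cand = {t: oplus(reduce(join, (place_qos[p] for p in pre)), inc)
            for t, (pre, inc) in cell.items()}

    # Step 3: keep transitions that no other candidate strictly beats.
    def strictly_less(a, b):
        return leq(a, b) and not leq(b, a)

    winners = [t for t in cand
               if not any(strictly_less(cand[u], cand[t]) for u in cand if u != t)]
    t_star = rng.choice(sorted(winners))
    # Step 4: the competition function may alter the winner's QoS, as in (8.6).
    others = [cand[u] for u in cand if u != t_star]
    return t_star, compete(cand[t_star], others)


# Hypothetical cell: security 1 = low, 0 = high; latencies are made up.
place_qos = {"a": (1, 1.0), "b": (1, 0.5), "c": (1, 3.0)}
cell = {"t2": (["a", "b"], (0, 1.0)), "t'2": (["c"], (0, 2.0))}
result = fire_cell(cell, place_qos, random.Random(0))
```

With these values t2 has candidate QoS (low, 2.0) and t′2 has (low, 5.0), so t2 wins, but its outgoing token is assigned (low, 5.0) by the competition function.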

Step 4 of the competition policy simplifies for examples 1–3 of section 8.3.1, since q ¢ (q1, . . . , qn) = q in these cases. On the other hand, a non-trivial operator ¢ was needed to address example 4, see formula (8.3).

Since occurrence net N is finite, the competition policy terminates in finitely many steps, when Nκ(ω) = ∅. The total execution thus proceeds by a finite chain of nested configurations: ∅ = κ0(ω) ⊂ κ1(ω) ⊂ · · · ⊂ κn(ω). Hence, κn(ω) is a maximal configuration of N that can actually occur according to the competition policy, for a given ω ∈ Ω; we generically denote it by

κ(N , ω). (8.7)

For example 1 (latency), our competition policy boils down to the classical race policy [MBB+89]. Our competition policy bears some similarity to the “preselection policies” introduced in [MBB+89], except that the continuation is selected based on QoS values in our case, not on random selection. We will also need to compute the QoS for any configuration of N, even if it is not a winner of the competition policy. We do this using procedure 1, but without the competition step:

Procedure 2 (QoS of an arbitrary configuration) Let κmax be any maximal configuration of N and κ ⊆ κmax a prefix of it. With reference to procedure 1, perform the following: step 1 with B any minimal branching cell in κmax \ κ, step 2 with no change, and then step 4 for any t as in step 2. Performing this repeatedly yields the pair (vp, qp) for each place p of κmax.

We are now ready to define what the QoS value of an OrchNet is, thus formalizing what we mean by “the QoS of an orchestration”.

Definition 8.4 (end-to-end QoS) For κ any configuration of occurrence net N, and ω any value for the daemon, the end-to-end QoS of κ is defined as

$$E_\omega(\kappa, \mathcal{N}) = \bigvee_{p \in \mathrm{maxPlaces}(\kappa)} q_p(\omega) \tag{8.8}$$

The end-to-end QoS and loose end-to-end QoS of OrchNet N are given by

$$\text{end-to-end QoS:} \quad E_\omega(\mathcal{N}) = E_\omega(\kappa(\mathcal{N}, \omega), \mathcal{N}) \tag{8.9}$$

$$\text{loose end-to-end QoS:} \quad F_\omega(\mathcal{N}) = \max\{ E_\omega(\kappa, \mathcal{N}) \mid \kappa \in V(N) \} \tag{8.10}$$

where function max picks one of the maximal values in a partially ordered set, κ(N, ω) is defined in (8.7), and V(N) is the set of all maximal configurations of net N.

Observe that Eω(N) ≤ Fω(N) holds and Eω(N) is indeed observed when the orchestration is executed. The reason for considering in addition Fω(N) will be made clear in the next section on monotonicity.

8.5 Study of Monotonicity

In this section we focus on monotonicity of end-to-end QoS for an orchestration. We wish to formalize that a given orchestration is QoS-monotonic: “if any called service performs better, then so will the orchestration”. To simplify the presentation, the following assumption is made throughout the paper:


Assumption 4 QoS functions ξt can be increased at will, independently for each transition t of an OrchNet.

For two families Q and Q′ of QoS functions, write Q′ ≥ Q to mean that

$$\forall \omega \in \Omega,\ \forall t \in T \implies \xi'_t(\omega) \geq \xi_t(\omega) \tag{8.11}$$

and similarly for Q′init ≥ Qinit. For OrchNet N′ = (N, V, Q′, Q′init), write

(i) N′ ≥ N, (ii) E(N′) ≥ E(N), and (iii) F(N′) ≥ F(N)

to mean that (i) Q′ ≥ Q and Q′init ≥ Qinit both hold, (ii) ∀ω ∈ Ω, Eω(N′) ≥ Eω(N) holds, and (iii) ∀ω ∈ Ω, Fω(N′) ≥ Fω(N) holds.

Definition 8.5 (monotonicity) OrchNet N = (N, V, Q, Qinit) is called monotonic (resp. loosely monotonic) if, for any N′ such that N ≥ N′, E(N) ≥ E(N′) (resp. F(N) ≥ F(N′)) holds.

The following immediate result justifies considering loose end-to-end QoS F(N) in addition to end-to-end QoS E(N):

Theorem 8.6 (loose monotonicity) Any OrchNet is loosely monotonic.

Consequently, it is always sound to base contract composition and contract monitoring on loose end-to-end QoS. This, however, has a price, since loose end-to-end QoS is pessimistic compared to (actual) end-to-end QoS. The next theorem gives conditions ensuring monotonicity, i.e., based on tight end-to-end QoS E(N):

Theorem 8.7 (monotonicity: global necessary and sufficient condition) OrchNet N = (N, V, Q, Qinit) is monotonic if and only if:

$$\forall \omega \in \Omega,\ \forall \kappa \in V(N) \implies E_\omega(\kappa, \mathcal{N}) \geq E_\omega(\kappa(\mathcal{N}, \omega), \mathcal{N}) \tag{8.12}$$

where V(N) is the set of all maximal configurations of net N and κ(N, ω) is the actually occurring configuration, defined in (8.7).
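To make condition (8.12) concrete for latency, the sketch below uses a deliberately simplified model that is an assumption of this example, not the OrchNet construction itself: each maximal configuration is a sequential branch of transitions, the branches are in conflict on their first transition, the race policy selects the branch whose first transition is fastest, and the QoS of a branch is the sum of its latencies. The check is performed for a single fixed ω, whereas (8.12) quantifies over all ω. All names are hypothetical.

```python
def branch_latency(branch, xi):
    """Latency of one maximal configuration, modelled as a chain of transitions."""
    return sum(xi[t] for t in branch)

def actual_branch(branches, xi):
    """Race policy: the branch whose first transition is fastest is the one that occurs."""
    return min(branches, key=lambda b: xi[b[0]])

def end_to_end(branches, xi):
    """E: latency of the configuration selected by the competition policy."""
    return branch_latency(actual_branch(branches, xi), xi)

def loose_end_to_end(branches, xi):
    """F: worst latency over all maximal configurations."""
    return max(branch_latency(b, xi) for b in branches)

def satisfies_condition_8_12(branches, xi):
    """Every maximal configuration is at least as slow as the one that actually occurs."""
    e = end_to_end(branches, xi)
    return all(branch_latency(b, xi) >= e for b in branches)

branches = [("t1", "t2"), ("u1", "u2")]
xi = {"t1": 1, "t2": 10, "u1": 2, "u2": 1}
xi_worse = dict(xi, t1=3)  # t1 becomes slower, everything else is unchanged
```

In this example the condition fails for `xi`: slowing down `t1` lets the other branch win the race, and the end-to-end latency drops from 11 to 3, while the loose end-to-end latency grows from 11 to 13, as loose monotonicity requires.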

Condition (8.12) is costly to verify. Thus, a simple structural condition for monotonicity is presented next for workflow nets, a simple model for workflows proposed by [vdA97]. These are Petri nets with a special initial place i and a special final place o. Workflow nets will be generically denoted by W. We consider the class of workflow nets W that are 1-safe and loop free, and are sound, that is:

1. Final place o is reachable from any reachable marking.

2. If a marking marks final place o, then it marks no other place.

3. Starting from i, it is always possible to fire any transition of W .

The executions of W can be represented by its unfolding NW [ERV02], which is a finite occurrence net derived from W. This induces an OrchNet NW = (NW, νW, QW, Qinit) by attaching to each transition t of NW the value and QoS inherited from W through the unfolding. For our structural characterization of monotonicity, we need the known notion of cluster [Mur89] on nets. For a net N, a cluster is a minimal set c of places and transitions of N such that ∀t ∈ c, •t ⊆ c and ∀p ∈ c, p• ⊆ c.

Theorem 8.8 (Structural Condition) Let W be a WFnet and NW be its unfolding. A sufficient condition for the OrchNet NW = (NW, νW, QW, Qinit) to be monotonic is that every cluster c satisfies the following condition: ∀t1, t2 ∈ c, t1 ≠ t2 ⟹ t1• = t2•. If, in addition, the partial order of the QoS values (D, ≤) is such that ∀q ∈ D, ∃q′ ∈ D such that q′ > q, then the above sufficient condition is also necessary.
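The structural condition lends itself to a direct check. The sketch below is illustrative only: the net is assumed to be given as two hypothetical dictionaries, `pre` and `post`, mapping each transition to its set of input and output places, and clusters are computed on the transition side by grouping transitions that are linked through shared input places.

```python
def transition_clusters(pre):
    """Group transitions that are linked, directly or transitively, by a shared input place."""
    clusters = []
    unseen = set(pre)
    while unseen:
        seed = unseen.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            t = frontier.pop()
            for u in list(unseen):
                if pre[t] & pre[u]:  # t and u consume a common place
                    unseen.remove(u)
                    cluster.add(u)
                    frontier.append(u)
        clusters.append(cluster)
    return clusters

def structural_condition(pre, post):
    """True if all transitions of every cluster have the same postset."""
    return all(len({frozenset(post[t]) for t in cluster}) == 1
               for cluster in transition_clusters(pre))

# Hypothetical workflow net: t1 and t2 are in conflict on place i, then t3 follows.
pre = {"t1": {"i"}, "t2": {"i"}, "t3": {"p"}}
post_ok = {"t1": {"p"}, "t2": {"p"}, "t3": {"o"}}
post_bad = {"t1": {"p"}, "t2": {"r"}, "t3": {"o"}}
```

Here `post_ok` satisfies the condition because the two conflicting transitions lead to the same place, whereas `post_bad` does not.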


8.5.1 Probabilistic monotonicity

So far we have considered the case where QoS increments of transitions are nondeterministic. In [RBHJ08] we have advocated the use of probability distributions when modeling the response time of a Web service. Consequently, monotonicity for probabilistic QoS was studied in [BRBH09] for the restricted case of latency. Here we extend the range of this study to the general framework for QoS studies developed in this paper.

As a prerequisite, we need some background on probabilistic ordering, more usually called stochastic ordering [KKO77]. We first recall the classical case when (D, ≤) is a total order: if X is a random variable with values in D, its distribution function is defined by F(x) = P(X ≤ x), where x ∈ D. Consider the following ordering on probability distributions: random variables X and X′ satisfy X ≥s X′ if F(x) ≤ F′(x) holds ∀x ∈ D, where F and F′ are the distribution functions of X and X′, respectively. The intuition is that “there are bigger chances that X′ is smaller than x as compared to X”. This order is classical in probability theory, where it is referred to as stochastic dominance or stochastic ordering among random variables [KKO77].
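For the totally ordered case, the ordering can be checked on empirical samples as in the following sketch. The helper names `ecdf` and `stochastically_geq` are hypothetical; since empirical distribution functions are step functions, comparing them at the observed sample points is enough.

```python
from bisect import bisect_right

def ecdf(sample):
    """Empirical distribution function x -> P(X <= x) of a finite sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def stochastically_geq(xs, ys):
    """X >=_s Y iff F_X(x) <= F_Y(x) for every x (total order case)."""
    f_x, f_y = ecdf(xs), ecdf(ys)
    return all(f_x(x) <= f_y(x) for x in set(xs) | set(ys))

slow = [2, 3, 4]  # hypothetical latencies of X
fast = [1, 2, 3]  # hypothetical latencies of X'
```

With these samples, `stochastically_geq(slow, fast)` holds while the reverse comparison does not.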

To extend stochastic dominance to the case where ≤ is only a partial order, not a total order, we proceed as follows. Consider ideals of D, i.e., subsets I of D that are downward closed: x ∈ I and y ≤ x ⟹ y ∈ I. Examples of ideals are: for R+, the intervals [0, x] for all x; for R+ × R+ equipped with the product order, arbitrary unions of rectangles [0, x] × [0, y]. Now, if X has values in D, we define its distribution function by

F (I) = P(X ∈ I), (8.13)

for I ranging over the set of all ideals of D. For X and X′ two random variables with values in D, with respective distribution functions F and F′, define

$$X \geq_s X' \quad \text{iff} \quad \text{for any ideal } I \text{ of } D,\ F(I) \leq F'(I) \tag{8.14}$$

Probabilistic setting, first attempt

In Definition 8.3, QoS and initial QoS functions were considered nondeterministic. The first idea is to let them become random instead. This leads to the following straightforward modification of definition 8.3:

Definition 8.9 (probabilistic OrchNet 1) A probabilistic OrchNet N is a tuple N = (N, Φ, Q, Qinit) where Φ = (φt)t∈T, Q = (ξt)t∈T, and Qinit = (ξp)p∈min(P) are independent families of random value functions, QoS functions, and initial QoS functions, respectively.

We now equip random QoS increments and initial QoS values with a probabilistic orderingassociated to order ≤.

Using order (8.14), for two families Q = (ξt)t∈T and Q′ = (ξ′t)t∈T of QoS functions, write

Q ≥s Q′

to mean that ∀t ∈ T =⇒ ξt ≥s ξ′t, and similarly for Qinit ≥s Q′init. For N ,N ′ ∈ N, write

N ≥s N ′

if Q ≥s Q′ and Qinit ≥s Q′init both hold. Finally, the latency Eω(N) of OrchNet N is itself seen as a random variable that we denote by E(N), by removing symbol ω. This allows us to define, for any two N, N′ ∈ N,

E(N ) ≥s E(N ′)

by requiring that random variables E(N ) and E(N ′) are stochastically ordered.


Definition 8.10 (probabilistic monotonicity, 1) Probabilistic OrchNet N is called probabilistically monotonic if, for any probabilistic OrchNet N′ such that N ≥s N′, we have E(N) ≥s E(N′).

It is a classical result on stochastic ordering that, if (X1, . . . , Xn) and (Y1, . . . , Yn) are independent families of real-valued random variables such that Xi ≥s Yi for every 1 ≤ i ≤ n, then, for any increasing function f : Rn → R, f(X1, . . . , Xn) ≥s f(Y1, . . . , Yn). Applying this yields that nondeterministic monotonicity in the sense of definition 8.5 implies probabilistic monotonicity in the sense of definition 8.10. Nothing can be said, however, regarding the converse.

In order to derive results in the opposite direction, we shall establish a tighter link between this probabilistic framework and the nondeterministic framework of sections 8.4 and 8.5.

Probabilistic setting: second attempt

Let us restart from the nondeterministic setting of sections 8.4 and 8.5. Focus on definition 8.3 of OrchNets. Equipping the set Ω of all possible values for the daemon with a probability P is a way to make the QoS functions and initial QoS random.

Definition 8.11 (probabilistic OrchNet 2) Call probabilistic OrchNet a pair (N, P), where N is an OrchNet according to definition 8.3 and P is a probability over the domain Ω of all values for the daemon.

How can we relate the two definitions 8.9 and 8.11? Consider the following assumption, which will be in force in the sequel:

Assumption 5 For any OrchNet N, ξt and ξp form an independent family of random variables, for t ranging over the set of all transitions and p ranging over the set of all minimal places of the underlying net.

Let us now start from definition 8.9. For t a generic transition, let (Ωt, Pt) be the set of possible experiments together with associated probability, for random latency ξt; and similarly for (Ωp, Pp) and ξp. Thanks to assumption 5, setting

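$$\Omega = \Big(\prod_t \Omega_t\Big) \times \Big(\prod_p \Omega_p\Big) \quad \text{and} \quad \mathbf{P} = \Big(\prod_t \mathbf{P}_t\Big) \times \Big(\prod_p \mathbf{P}_p\Big), \tag{8.15}$$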

yields the entities of definition 8.11. Can we use this correspondence to further relate probabilistic monotonicity to the notion of monotonicity of sections 8.4 and 8.5? In the nondeterministic framework of section 8.5, we said that

ξ ≥ ξ′ if ξ(ω) ≥ ξ′(ω) holds ∀ω ∈ Ω, (8.16)

Clearly, if two random latencies ξ and ξ′ satisfy condition (8.16), then they also satisfy condition (8.14). That is, ordering (8.16) is stronger than stochastic ordering (8.14). Unfortunately, the converse is not true in general. For example, condition (8.14) may hold while ξ and ξ′ are two independent random variables, which prevents (8.16) from being satisfied. Nonetheless, the following result holds, which will allow us to proceed:

Theorem 8.12 Assume condition (8.14) holds for the two distribution functions F and F′. Then, there exists a probability space Ω, a probability P over Ω, and two real valued random variables ξ and ξ′ over Ω, such that:


1. ξ and ξ′ possess F and F ′ as respective distribution functions, and

2. condition (8.16) is satisfied by the pair (ξ, ξ′) with probability 1.

The proof of this result is immediate if (D, ≤) is a total order. It is, however, highly nontrivial if ≤ is only a partial order. This theorem is indeed part of theorem 1 of [KKO77].⁵
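In the totally ordered case, one standard way to obtain such a pair is the quantile coupling: both variables are built from the same uniform draw through the generalized inverses of their distribution functions. The sketch below illustrates this on empirical samples; the function names and sample values are hypothetical, and nothing here addresses the partially ordered case covered by [KKO77].

```python
import math
import random

def quantile(sample, u):
    """Generalized inverse of the empirical distribution function of `sample` at u in [0, 1)."""
    ordered = sorted(sample)
    index = max(0, math.ceil(u * len(ordered)) - 1)
    return ordered[index]

def coupled_draw(sample_xi, sample_xi_prime, rng):
    """Draw (xi, xi') on a common probability space: one uniform u drives both."""
    u = rng.random()  # the shared experiment omega
    return quantile(sample_xi, u), quantile(sample_xi_prime, u)

# Hypothetical samples with F <= F' pointwise, i.e. xi >=_s xi'.
xi_sample = [2, 3, 4, 10]
xi_prime_sample = [1, 2, 3, 3]
rng = random.Random(1)
pairs = [coupled_draw(xi_sample, xi_prime_sample, rng) for _ in range(1000)]
```

Each coordinate keeps its own marginal distribution, and every coupled pair satisfies ξ ≥ ξ′, which is what condition (8.16) requires.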

Theorem 8.12 allows us to reduce the stochastic comparison of random variables to their ordinary comparison as functions defined over the same set of experiments endowed with the same probability. This applies in particular to each random QoS function and each random initial QoS function, when considered in isolation. Thus, when performing construction (8.15) for two OrchNets N and N′, we can take the same pair (Ωt, Pt) to represent both ξt and ξ′t, and similarly for ξp and ξ′p. Applying (8.15) implies that both N and N′ are represented using the same pair (Ω, P). This leads naturally to definition 8.11.

In addition, applying theorem 8.12 to each transition t and each minimal place p yields that stochastic ordering N ≥s N′ reduces to ordinary ordering N ≥ N′. Observe that this trick does not apply to the overall QoS E(N) and E(N′) of the two OrchNets; the reason for this is that the space of experiments for these two random variables is already fixed (it is Ω) and cannot further be played with as theorem 8.12 requires. Thus we can reformulate probabilistic monotonicity as follows (compare with definition 8.10):

Definition 8.13 (probabilistic monotonicity, 2) Probabilistic OrchNet (N, P) is called probabilistically monotonic if, for any probabilistic OrchNet N′ such that N ≥ N′, we have E(N) ≥s E(N′).

Note the careful use of ≥ and ≥s. The following two results establish a relation between probabilistic monotonicity and monotonicity:

Theorem 8.14 If OrchNet N is monotonic, then probabilistic OrchNet (N, P) is probabilistically monotonic for any probability P over the set Ω.

This result was already obtained in the first probabilistic setting; it is here a direct consequence of the fact that ξ ≥ ξ′ implies ξ ≥s ξ′ if ξ and ξ′ are two random variables defined over the same probability space. The following converse result completes the landscape and is much less straightforward. It assumes that it is legal to increase the QoS values at will, see assumption 4.

Theorem 8.15 If probabilistic OrchNet (N, P) is probabilistically monotonic, it is also monotonic with P-probability 1.

8.6 Probabilistic contracts and their composition

In this section we first study QoS contracts in a probabilistic setting, that is, contracts when QoS parameters are considered random. We then study contract composition, i.e., the process by which the orchestration can derive the contract it can offer to its clients from the contracts agreed with the services it calls.

⁵ Thanks are due to Bernard Delyon, who pointed out this reference to us.


8.6.1 Probabilistic Contracts

Since we decided to consider QoS parameters as random, we can specify them via their cumulative distribution function (or distribution function for short), defined in section 8.5.1.

Definition 8.16 (probabilistic contracts) A contract is a pair (FA, FQ) of cumulative distribution functions, representing the assumed distribution for the random assumption α and the guaranteed distribution for the random QoS function ξ.

What “better” means: When dealing with performance related contracts such as QoS contracts, it is expected that it is valid for an actor to perform better than its agreed contract. At this point it is worth formalizing what it means for an actor to perform better under our probabilistic setting, i.e., when QoS parameters are considered random. To this end, we shall use the notion of stochastic (partial) ordering ≥s, induced by the partial order ≤ defined on a domain D. We say that a called service performs better regarding its QoS parameter Q if its actual distribution function F′ is stochastically smaller than its nominal distribution function F, i.e., F′ ≤s F. We are now ready to formalize what it means for a called service to satisfy its contract in our probabilistic setting.

Definition 8.17 (satisfaction) Pair (A, Q), consisting of an actual random parameter for the assumptions and an actual random value for the QoS, satisfies contract (FA, FQ) if $A \geq^A_s F_A$ and $Q \leq^Q_s F_Q$ both hold,

where ≤A and ≤Q denote the partial orders defined on the domains of A and Q, and $\geq^A_s$ and $\leq^Q_s$ are the corresponding stochastic orders. The reason for using different directions for the orders in assumptions and guarantees is that “better” translates as $\geq^A_s$ for assumptions and $\leq^Q_s$ for guarantees.

To ensure soundness of any contract based approach for QoS, (probabilistic) monotonicity must be assumed, formalizing that, if any of the called services performs better, then as a result the orchestration should also perform better:

Assumption 6 (monotonicity) The considered probabilistic OrchNet is monotonic, meaning that increasing the QoS guarantees for each transition t results in an increase in the end-to-end QoS of the OrchNet, and decreasing the assumptions on the OrchNet results in a decrease of the assumptions for each transition t.

Even if conditions for monotonicity are not satisfied, by using theorem 8.6, it is always sound to base SLA or contracts on the loose end-to-end QoS defined in (8.10). This, of course, will result in being more pessimistic when composing contracts, i.e., when inferring, from the contracts the orchestration has with the services it calls, the QoS it can offer to its clients.

8.6.2 Contract composition

We are now ready to investigate contract composition. Roughly speaking, contract composition consists in inferring, from the contracts the orchestration has with its called services, the overall contract it can offer to its clients. Contract composition is performed on a probabilistic OrchNet N. For the rest of the section, each transition t ∈ T of N will represent a sub-contracted service, called by the orchestration. Each service t will be assigned a probabilistic contract (FA,t, FQ,t), i.e., a pair consisting of a probabilistic QoS guarantee FQ,t, and a probabilistic QoS assumption FA,t that any client of this service must comply with.

If assumptions FA,t are ignored, probabilistic contract composition is straightforward and proceeds as follows:

• Each query to the orchestration generates calls to (a subset of) the services of the orchestration. For each such call to a service t, a value of the guaranteed QoS parameters is drawn from FQ,t. The QoS values of the different services called are then composed to give the end-to-end guaranteed QoS value of the orchestration for that particular query. Computing this end-to-end guaranteed QoS value is achieved by attaching QoS parameters to the tokens traversing the orchestration, as discussed in the previous section.

• Following Monte-Carlo simulation principles, the above step is done repeatedly, randomly drawing values for the QoS parameters of the services called. This results in randomly distributed values for the end-to-end QoS of the orchestration. If the above step is repeated sufficiently many times, then a good empirical estimate of the probability distribution of the end-to-end QoS of the orchestration is obtained.

Such a Monte-Carlo procedure was proposed in [RBHJ08], for the restricted case of latency or response time. Observe that, in this case, end-to-end QoS contracts for the orchestration are derived from the individual contracts between the orchestration and the called sites.
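The Monte-Carlo composition described above can be sketched for latency as follows. This is only an illustration under stated assumptions: the contracts are hypothetical discrete distributions given as (value, probability) pairs, the service names A, B and C are invented, and the orchestration is summarized by a hand-written `compose` function (call A, then B and C in parallel, and wait for both) instead of being derived from an OrchNet.

```python
import random

def draw(distribution, rng):
    """Draw one QoS value from a discrete distribution given as (value, probability) pairs."""
    values, probabilities = zip(*distribution)
    return rng.choices(values, weights=probabilities, k=1)[0]

def simulate_end_to_end(contracts, compose, runs, seed=0):
    """Repeat the query 'runs' times, drawing each service's QoS from its contract."""
    rng = random.Random(seed)
    return [compose({t: draw(d, rng) for t, d in contracts.items()})
            for _ in range(runs)]

def empirical_cdf(samples, x):
    """Fraction of the samples that are at most x."""
    return sum(1 for s in samples if s <= x) / len(samples)

# Hypothetical latency contracts, in milliseconds.
contracts = {
    "A": [(100, 1.0)],
    "B": [(50, 0.5), (200, 0.5)],
    "C": [(80, 1.0)],
}
compose = lambda q: q["A"] + max(q["B"], q["C"])  # A, then B and C in parallel
samples = simulate_end_to_end(contracts, compose, runs=2000)
```

Evaluating `empirical_cdf(samples, x)` over a range of x then gives the estimated end-to-end distribution that the orchestration could offer as its guarantee.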

Unfortunately, this simple composition procedure fails to apply to the general case where both guarantees and assumptions are jointly considered. The reason is that whereas dependencies for guarantees are outward directed (from called sites to orchestration), they are inward directed (from orchestration to called sites) for assumptions. For example, from knowing the delay between two successive queries to the orchestration, the delay between two successive queries to each service called by the orchestration is inferred. The fact that dependencies are directed in opposite ways for assumptions and guarantees causes the failure of the above simple composition procedure. To overcome this we propose a more elaborate two-phase procedure.

Contract composition procedure handling both assumptions and guarantees. The procedure is as follows:

1. Initial conditions: They consist of $F^0_A$, the assumed distribution for the orchestration, and $F^0_{Q,t}$, the guaranteed distribution offered by each t ∈ T.

2. Simulation Phase: it consists of the following steps:

a. Draw successive random calls to the orchestration according to distribution $F^0_A$; each call to the orchestration generates zero, one, or several calls to each transition t of the orchestration;

b. For every such call to a transition t:

– Record the value αt of the assumption of the call to t;

– Draw random QoS increment ξt from distribution $F^0_{Q,t}$;

– Compute the QoS parameters of tokens entering t• using the competition policy and (8.6).

c. For each call to the orchestration, record the resulting end-to-end QoS E(N).

Performing step b for the successive calls to t yields an empirical estimate $F^1_{A,t}$ for the actually occurring assumptions for t. Performing step c for the successive calls to the orchestration yields an empirical estimate $F^1_Q$ of the actual end-to-end QoS of the orchestration.

3. Negotiation Phase: At this point, two cases may occur:

1) For the good case, pair $(F^1_{A,t}, F^0_{Q,t})$ is a contract considered acceptable by every transition t. The orchestration can then propose the contract $(F^0_A, F^1_Q)$ to its client and the procedure terminates at this step.

2) For the bad case, pair $(F^1_{A,t}, F^0_{Q,t})$ is a contract considered not acceptable by t ∈ T′, for some T′ ⊆ T. The guarantees may be too demanding considering the assumptions. In this case we will re-run the above iterative process with new inputs. We have two alternative approaches to do this, depending on which inputs we choose to update:

• In the first approach, we keep $F^0_{Q,t}$ unchanged for every transition t. Then, we update the assumed distribution $F^0_A$ for the orchestration to a distribution $F^1_A$ such that $F^1_A >^A_s F^0_A$, i.e., $F^1_A$ is more favorable than $F^0_A$ for the orchestration. When running the simulation phase using $F^1_A$ instead of $F^0_A$, a new assumed distribution $F^2_{A,t}$ results for t that is more favorable than $F^1_{A,t}$ for transition t. Having sufficiently weakened $F^1_A$ should then yield an assumed distribution $F^2_{A,t}$ such that $(F^2_{A,t}, F^0_{Q,t})$ is now considered acceptable by every transition t.

• In the second approach, we do not change the assumed distributions $F^0_A$ and $F^1_{A,t}$, but we relax the guaranteed distribution $F^0_{Q,t}$ for every t ∈ T′ to $F^1_{Q,t}$. The guaranteed distributions are relaxed till $(F^1_{A,t}, F^1_{Q,t})$ is an acceptable contract for every t ∈ T′.

The convergence of this procedure is proved in appendix D.1.
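The assumption side of the simulation phase (steps a and b) can be sketched as follows. The sketch rests on several assumptions that are not in the text: the assumption on a service is taken to be its inter-query time, calls to the orchestration arrive with exponential inter-query times, and each orchestration call reaches service t independently with a fixed, hypothetical probability. All names are illustrative.

```python
import random

def simulate_assumptions(rate, call_probability, n_calls, seed=0):
    """Record, for each service, the inter-query times it actually experiences."""
    rng = random.Random(seed)
    now = 0.0
    last_seen = {}
    gaps = {t: [] for t in call_probability}
    for _ in range(n_calls):
        now += rng.expovariate(rate)  # step a: next call to the orchestration
        for t, p in call_probability.items():
            if rng.random() < p:  # this call to the orchestration reaches service t
                if t in last_seen:
                    gaps[t].append(now - last_seen[t])  # step b: record alpha_t
                last_seen[t] = now
    return gaps

def throughput(gaps_t):
    """Average number of calls per time unit seen by one service."""
    return len(gaps_t) / sum(gaps_t)

# Hypothetical setting: 5 requests/sec; one service is always called, another half of the time.
gaps = simulate_assumptions(rate=5.0,
                            call_probability={"always": 1.0, "half": 0.5},
                            n_calls=20000)
```

The recorded gaps play the role of the empirical estimate of the assumptions actually occurring at each service, which is then compared with what the service is willing to accept in the negotiation phase.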

8.7 Probabilistic Contract Monitoring

Once contracts have been agreed, they must be monitored by the orchestration for possible violation. Contract monitoring is studied in detail in [RBHJ08] for the case of a single QoS parameter, namely latency. The same technique, however, extends without change to our case. We nevertheless reproduce it here because QoS domains can be partially, not totally, ordered in our case. Monitoring applies to each contracted distribution F individually, where F is the distribution associated to some QoS parameter Q having partially ordered domain D. By monitoring the considered service, the orchestration can get an estimate of the actual distribution of Q, which we call G. The monitoring problem for the orchestration is to decide whether or not G complies with F, where a violation of compliance is characterized by the formula

$$\sup_{x \in D} \big(F(x) - G(x)\big) > 0. \tag{8.17}$$

However, G(x) in (8.17) is not given to the orchestration; it can only be estimated by collecting actual values for QoS parameter Q. To this end, we consider the following basic empirical estimate for G(x):

$$G_\Delta(x) = \frac{|\{ q \in \Delta \mid q \leq x \}|}{|\Delta|} \tag{8.18}$$


where ∆ is a sample of values q for Q collected at run time by the orchestration and |A| is the cardinality of set A. Estimate G∆ converges towards G when the size of ∆ grows to infinity. In practice, successive values for G∆ are updated on-line at run time by collecting in ∆ the values for Q held in a buffer of size N large enough. If ∆τ is the content of the buffer at time τ, we thus get an estimate G∆τ, which we denote by Gτ for simplicity. Then, the indicator in (8.17) is replaced by:

$$\chi_\tau = \sup_{x \in D} \big(F(x) - G_\tau(x)\big) \tag{8.19}$$

At first sight, a violation should be declared at the first instant τ when χτ > 0 occurs. The problem is that estimate Gτ(x) can randomly fluctuate around G(x), especially for N not large enough. Hence, applying the brute force stopping rule χτ > 0 will inevitably result in many false alarms. A counter-measure consists in having a tolerance zone above the critical value 0. This yields the following stopping rule for declaring violation: χτ > λ, where λ > 0 is a design parameter of the procedure, defining the tolerance zone. We do not provide here the details of how monitoring is implemented; the reader is referred to [RBHJ08], section V, for this.
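A minimal sketch of this monitoring rule for a totally ordered parameter such as latency is given below. It is not the monitoring unit of [RBHJ08]: the class name `ContractMonitor`, its parameters, and the restriction of the supremum in (8.19) to a finite grid of evaluation points are assumptions made for illustration.

```python
from collections import deque

class ContractMonitor:
    """Sliding-buffer monitor applying the stopping rule chi_tau > lambda."""

    def __init__(self, contract_cdf, grid, buffer_size, tolerance):
        self.contract_cdf = contract_cdf          # contracted distribution F
        self.grid = list(grid)                    # points x at which F and G are compared
        self.buffer = deque(maxlen=buffer_size)   # sample Delta of size at most N
        self.tolerance = tolerance                # design parameter lambda > 0

    def observe(self, q):
        """Record a measured QoS value; the oldest value is dropped when the buffer is full."""
        self.buffer.append(q)

    def indicator(self):
        """chi_tau: largest value of F(x) - G_tau(x) over the grid, as in (8.19)."""
        n = len(self.buffer)
        def g(x):
            return sum(1 for q in self.buffer if q <= x) / n   # estimate (8.18)
        return max(self.contract_cdf(x) - g(x) for x in self.grid)

    def violated(self):
        return len(self.buffer) > 0 and self.indicator() > self.tolerance

# Hypothetical contract: latency uniformly distributed on [0, 100] ms.
contract = lambda x: min(1.0, max(0.0, x / 100))
monitor = ContractMonitor(contract, grid=range(0, 101, 10), buffer_size=50, tolerance=0.1)
```

Feeding the monitor with fast responses keeps the indicator at or below zero, while a run of slow responses pushes it above the tolerance and triggers the violation flag.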

8.8 Experiments

In our experiments we implement the contract composition procedure described in section 8.6. We use Orc to model the orchestration and to specify the QoS behaviour (contracts) of the services involved in the orchestration. Our tool for performing contract composition is built upon an interpreter of Orc in Java, developed by members of the Orc team at the University of Texas at Austin [QKCM]. We perform our experiments on the CarOnLine example of Table 8.1.

Initial Conditions: As described in section 8.6, the contract composition procedure is carried out in iterative steps. Each iteration requires two inputs. The first input is the initial assumption distribution $F^0_A$ agreed between the orchestration and its clients. We take $F^0_A$ to be the inter-query time distribution of the calls to the orchestration by its clients. We use an exponential distribution to model this inter-query time and the rate parameter is taken to be 5 (i.e., 5 requests/sec).

The second input is the set of guaranteed distributions $F^0_{Q,t}$ offered by each of the contracted services t; here t ranges over GarageA, GarageB, AllCredit, AllCreditPlus, GoldInsure, InsurePlus and InsureAll. Observe in Table 8.1 that most of the services affect more than one QoS parameter. We thus have a probability distribution for each of the QoS parameters that t affects, and we then take t’s guaranteed distribution $F^0_{Q,t}$ to be the product of these distributions. Doing this, we are assuming independence of the different QoS parameters that t affects. Also observe that all the services of CarOnLine affect the latency parameter d. The guaranteed distributions for d, for each of the services of CarOnLine, were derived from measurements made by calling six freely available services over the web. These services were published on the online repository XMethods [XMe]. 20,000 calls were made to each of these services and their response times were measured. The cumulative distribution functions of the measured response times are shown in figure 8.4. The guaranteed distributions for the other QoS parameters affected by the services of CarOnLine are given in Table 8.2.

Figure 8.4 – Cumulative distribution functions of latency, for the services of CarOnLine, and end-to-end latency. The result of the re-negotiation is shown in GarageBNego and CarOnLineNego. [Plot: cumulative distribution function versus delay (msec); recovered legend entries: InsureAll, InsurePlus, CarOnLine, GarageBNego, CarOnLineNego.]

Simulation Phase: With the above initial conditions, we ran the simulation step with 100,000 calls to the orchestration. At each call, the inter-query time for each service was observed and the end-to-end values of the QoS parameters d, l, c, p were observed. The resulting distribution for the assumptions $F^1_{A,t}$ (which is in fact the inter-query arrival time) for each of the services is given in Table 8.3. The orchestration’s end-to-end QoS for the response time is shown by the CarOnLine curve in figure 8.4. The end-to-end values of the other QoS parameters are given in the ’O’ columns in Table 8.2.

Negotiation Phase: Now suppose that for GarageB, the inter-query distribution $F^1_{A,t}$ is too strong for the guarantees $F^0_{Q,t}$ it offers. The contract with GarageB has to be re-negotiated. The first approach to re-negotiation is to relax the assumptions $F^0_A$, which will reduce the assumptions $F^1_{A,t}$ at GarageB. If we adopt the second approach to re-negotiation, we keep the assumptions $F^0_A$ and $F^1_{A,t}$ unchanged, but we sufficiently relax the guarantees of GarageB. The new guarantee for the response times of GarageB is given by the GarageBNego curve in figure 8.4. The guarantees for the other QoS parameters are kept unchanged. The simulation step is now run again with this updated contract of GarageB. The corresponding guaranteed distribution for the orchestration is given by the CarOnLineNego curve in figure 8.4.

8.9 Conclusion

We have presented in this paper the foundations of a theory of QoS for Web service orchestrations. We adopt an approach based on contracts between the orchestrator and its sub-contracted services and its clients, having assumptions and guarantees on the QoS involved. For our approach to be sound, the orchestration needs to be monotonic with respect to the QoS parameters. We have proposed the use of probabilistic contracts, and have shown how they can be composed. We are currently improving our Java implementation to build a stable experimentation platform to support our studies.


l        GA     GB     O
0        0.25   0.2    0.21
1        0.25   0.25   0.25
2        0.25   0.25   0.25
3        0.15   0.25   0.23
4        0.1    0.05   0.06

p        GA     GB     O
10,000   0.15   0.18   0.17
12,000   0.15   0.14   0.14
15,000   0.2    0.23   0.23
20,000   0.25   0.15   0.16
25,000   0.25   0.3    0.3

i        GI     IP     IA     O
100      0.2    0.22   0.2    0.21
300      0.2    0.23   0.25   0.22
350      0.3    0.2    0.15   0.23
500      0.15   0.15   0.25   0.16
800      0.15   0.2    0.15   0.18

c        AC     ACP    O
1        0.25   0.2    0.2
2        0.2    0.2    0.2
3        0.15   0.25   0.24
4        0.2    0.15   0.16
5        0.2    0.2    0.2

Table 8.2 – Probability distribution functions for the QoS parameters l, p, i and c of Table 8.1. The first column of each table gives the QoS parameter and its values. Each of the other columns corresponds to a service of CarOnLine and gives the probability of getting each QoS value for that service. GA, GB, AC, ACP, GI, IP and IA are abbreviations for the services GarageA, GarageB, AllCredit, AllCreditPlus, GoldInsure, InsurePlus and InsureAll respectively. Columns labelled O represent the corresponding QoS value for the end-to-end orchestration.

Site Name    GarageA   GarageB   AllCredit   AllCreditPlus
Throughput   5.028     5.028     5.028       5.028

Site Name    GoldInsure   InsureAll   InsurePlus
Throughput   1.679        3.342       3.342

Table 8.3 – Average throughput for each of the contracted sites.

Chapter 9

The Torque Tool.

In this chapter we give the implementation details of the Torque tool. The Torque tool implements the QoS theory developed in this thesis. Given a model of the orchestration (which includes its QoS behaviour), the tool performs contract composition to estimate the orchestration’s QoS. The tool also has a basic monitoring unit, which implements our statistical monitoring algorithm.

9.1 Torque Architecture

Figure 9.1 – Architecture of the Torque tool. [Diagram; labelled components: Orc Program + QoS Contracts, Compiler, DAGs, Site QoS data structs, Orc Engine, QoS Stamper, QoS Computing Unit, Partial Order Trace Computing Unit, SLA Design.]

The architecture of the Torque tool is shown in figure 9.1. Torque is built over an interpreter for Orc in Java, developed by the Orc team at the University of Texas at Austin. The inputs to the tool are the orchestration modelled as an Orc program, and the QoS behaviour of the sites of the orchestration, which we model as QoS contracts. We briefly describe the units of the tool:


1. Compiler: This unit consists of the Orc compiler and the QoS contract compiler. The Orc compiler translates the input Orc program into a collection of Directed Acyclic Graphs (DAGs) which are then interpreted by the Orc engine. The QoS contract compiler compiles the QoS contract of each site into an internal data structure representing that site’s QoS behaviour.

2. Orc Engine: This is the main unit of the Orc interpreter. The Orc engine executes the Orc program by moving tokens along the DAGs. The nodes in the DAG may be of different types, and a token at a given node is processed by the Orc engine according to the node’s type.

3. Partial Order Trace Computing Unit: This unit computes the partial order of the events in the execution of an Orc program. The execution events arise from the processing of the tokens by the Orc engine. This unit runs together with the Orc engine, adding the events of the execution to the partial order as and when they occur.

4. QoS Computing Unit: This unit computes the QoS of the tokens according to our rules that describe how the QoS parameters evolve in an execution. This computation makes use of the partial orders of the execution events, computed previously.

5. SLA Design: This involves the assimilation and interpretation of the data from the output of the simulation. As a result, we get an estimate of the QoS contract that the orchestration can offer to its clients. The SLA design is done in the MATLAB environment.

In the rest of this chapter we will look at the principal units in greater detail. We start by describing the Orc interpreter developed by the Orc team in section 9.2. From section 9.3 onwards, we will look at the units added by us for QoS management.

9.2 The Orc interpreter

In this section we describe the functioning of the Orc interpreter. We will look at the Orc compiler and the Orc engine. We do not describe this in complete detail; the reference [CM] has a detailed description of the implementation of the Orc interpreter.

Compilation Phase: The Orc compiler converts each Orc program into a set of DAGs. An Orc program is essentially a collection of expression definitions and one goal expression. The compiler generates one DAG for each expression definition and one for the goal expression.

We now outline how the DAG of an Orc expression is built. An expression’s DAG is built recursively by combining the DAGs of its sub-expressions. Figure 9.2 shows the DAGs corresponding to the Orc expressions.

Each DAG has a unique input node and a unique output node. These nodes are used to combine the DAGs of the sub-expressions: in the DAGs of figure 9.2, any node labelled by an Orc expression (f or g) is replaced by the DAG of that expression. For example, for the node labelled f in the DAG of f | g, all the incoming edges of the node are redirected to the input node of the DAG of f, and its outgoing edges are attached to the output node of the DAG of f.

Figure 9.2 – DAGs for the different Orc expressions (f | g, f <x< g, f >x> g and f ; g).

Execution Phase: The Orc expression is executed by moving tokens around the DAGs. Similar to tokens in a colored Petri net, these tokens have different attributes, like the data value carried by the token, a pointer to the environment which holds the bindings of the variables and, in particular, a pointer to the node at which the token resides.

At each step of the execution, a token is picked and processed. A token’s processing may perform actions like calling a site or registering a return value, may modify the token’s attributes, and may even delete tokens. The precise steps involved in processing a token are defined by the node that it points to. For example, the τ node at the input of the DAG of f | g creates two copies of the token being processed. One of these copies points to the left successor and the other to the right successor of the τ node. The assign(x) node binds variable x to the value carried by the token being processed, thus updating its environment. The choke node implements the termination mechanism in f <x< g: the value of the first token that arrives at this node is bound to the variable x and all the tokens corresponding to the ongoing computation of g are deleted.
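The token-processing steps described above can be sketched in a few lines. This is only an illustration in Python, not the code of the Orc engine: the `Node` and `Token` classes, the node kinds `"tau"`, `"assign"` and `"leaf"`, and the `process` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Node:
    kind: str                       # hypothetical node kinds: "tau", "assign", "leaf"
    var: Optional[str] = None       # variable name, only used by "assign" nodes
    successors: list = field(default_factory=list)


@dataclass
class Token:
    node: Node                      # node at which the token resides
    value: Any = None               # data value carried by the token
    env: dict = field(default_factory=dict)   # variable bindings


def process(token):
    """Process one token and return the tokens that replace it."""
    node = token.node
    if node.kind == "tau":
        # Fork: one copy of the token per successor of the tau node.
        return [Token(s, token.value, dict(token.env)) for s in node.successors]
    if node.kind == "assign":
        # Bind the variable to the carried value, then move to the successors.
        env = {**token.env, node.var: token.value}
        return [Token(s, token.value, env) for s in node.successors]
    return []                       # leaf: the token is deleted


left, right = Node("leaf"), Node("leaf")
fork = Node("tau", successors=[left, right])
copies = process(Token(fork, value=7))
bound = process(Token(Node("assign", var="x", successors=[left]), value=7))
```

Here `copies` holds the two tokens created by the τ node, one per successor, and `bound[0].env` carries the binding of x. A choke node would additionally need access to the set of live tokens in order to delete those of g, which this sketch leaves out.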

9.3 Partial Order Computation

This section describes how the Torque tool computes the partial order of the events in an Orc execution. This partially ordered execution is then used in the “QoS Computing unit” to compute the QoS of the overall orchestration.

What are the execution events? The events in the execution of an Orc program are defined by the actions done by the Orc engine while processing the tokens. A tracer interface in the engine defines and records the events that occur during an execution. Some of these events are: 1) The send event, which corresponds to a site call. 2) The receive event, marking the return of a site call. 3) The fork event, when a token is copied, for example at the start of a parallel computation. 4) The store event, when the expression g in f <x< g publishes a value and g is terminated. 5) The publish event, when the Orc program publishes a value. 6) The die event, the deletion of a token, which usually occurs when the computation of the branch corresponding to that token ends.

How are the partial orders of the events computed? The partial order of the events in the execution is computed online as the execution unfolds. To compute the partial order, each token now carries an additional pointer to the latest execution event (say e) that created it. When a new event (say e′) occurs while processing that token, e is added as an immediate predecessor of e′, and the event pointer of the token is updated to e′. An immediate predecessor of an event e′ is a causally maximal event that causes e′ to occur.

An event can have multiple immediate predecessors. This occurs in site calls whose parameter values have been defined by events in parallel computations. Consider the call to M(x) in the expression (S1 ≫ M(x)) <x< S2. Clearly the call to M(x) depends on the return of the call to S1 and can only occur after it. But the call also needs the parameter x to be defined, and this happens when the concurrent call to S2 returns a value. The call to M is thus immediately preceded by the returns from the calls to S1 and S2. To track such dependencies, the structures in the engine storing the actual values of parameters have been enhanced to store the execution event which generated each value. Thus whenever a new event e occurs in the engine, the partial order is updated to include e, with e pointing to all its immediate predecessors.
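The bookkeeping described above can be illustrated with a small sketch. It is written in Python for brevity and is not the tracer of the Orc engine: the `Event` and `Token` classes, the `record` function and the event labels are assumptions made for this example. Following the description of the engine, the dependency of M(x) on the value of x is recorded through the store event kept alongside the value.

```python
class Event:
    def __init__(self, label, preds=()):
        self.label = label
        self.preds = set(preds)       # immediate predecessors of this event


class Token:
    def __init__(self, last_event=None):
        self.last_event = last_event  # latest event that created or moved the token


def record(token, label, extra_preds=()):
    """Add a new event caused by `token` and by the events that produced
    the parameter values it uses (`extra_preds`)."""
    preds = set(extra_preds)
    if token.last_event is not None:
        preds.add(token.last_event)
    event = Event(label, preds)
    token.last_event = event          # the token now points to the new event
    return event


# One possible run of (S1 >> M(x)) <x< S2
root = Token()
fork = record(root, "fork")
t1, t2 = Token(fork), Token(fork)     # both copies point to the fork event
send_s1 = record(t1, "send S1")
recv_s1 = record(t1, "receive S1")
send_s2 = record(t2, "send S2")
recv_s2 = record(t2, "receive S2")
store_x = record(t2, "store x")       # event kept with the value of x
call_m = record(t1, "send M(x)", extra_preds=[store_x])
```

In this run the call to M(x) ends up with two immediate predecessors: the return of S1, reached through the token pointer, and the store event, reached through the structure holding the value of x.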

Figure 9.3 – Building the partial order.

Figure 9.3 shows a series of steps in the execution of the expression (S1 ≫ M(x)) <x< S2. Each step is shown in a box, the left half of which shows the DAG of the expression and the location of tokens in the DAG. The right half shows the events that have occurred in the execution and their causal relationships. The dashed arc from a token shows the execution event that the token points to. At the start there are no execution events and the pointer is null, but in the second step, when the fork event has occurred, both the successor tokens point to this event. In the fifth (last but one) step, the call to S2 has returned a value, which causes the store event to occur. A pointer to this event is kept in the structure storing the value of x, shown by the elliptical box. This information is used to build the complete partial order in the last step.

Figure 9.4 shows a snapshot of a partially ordered execution as computed by the partial order computing unit. The rectangles are the events of the execution and the arrows represent the causal links between them. The site calls are shown in yellow and the publish events are colored black. The execution fragment of figure 9.4 is an if-else evaluation in Orc, where the two branches of a fork event evaluate the condition and its negation.

Since the partial orders are computed online as the execution unfolds, the events in the partial order are the actually occurring events of the execution. In particular, events which are in conflict with these actually occurring events do not appear in this partial order. These conflicting events may appear in another run of the execution, for which the partial order is computed afresh.

The online computation of the partial order also enables us to cleanly handle recursive expressions. The partial order of the events in a recursive expression is theoretically infinite, which is unsuitable for practical implementations. However, in any (terminating) execution, recursive calls are made only to a finite depth. By computing the partial order of events online, only the events that actually occur are considered.

Relationship with the Event Structures

How are the partial orders that are computed here related to the partially ordered events in the event structure semantics of Chapter 4? Three points may be made:

1. Events that appear in the partial order: The event structure is a representation of all the possible events in the execution of an Orc program. An event, in particular, may have other conflicting and pre-empting events in the event structure. The partial order of events built in the interpreter, however, corresponds to one execution of the Orc program. There are no conflicting events here. This can be thought of as being one configuration of the event structure.

2. Correspondence between events: The events occurring in the Orc interpreter do not have a one-to-one correspondence with the events in the Orc semantics. Often multiple execution events in the interpreter correspond to one single event in the Orc semantics. This is because one event in the Orc semantics may require a sequence of actions to be performed by the Orc interpreter. For example, the termination of the right side of a pruning combinator is represented by a single τ event in the Orc semantics. In the interpreter, this translates into performing many steps: setting the variable of the pruning operator to the value carried by the token, stopping all the computations of the right sub-expression and deleting all the corresponding tokens.

3. Recursion: This is related to the first point and has been mentioned before. The event structure of recursive expressions as defined in Chapter 4 is an infinite structure. The partial order of the events in the interpreter is however built online, and so for any terminating execution only a finite number of events are built.


Figure 9.4 – A snapshot of a partially ordered execution.


9.4 QoS Computation

We will first look at how the QoS of the services in the orchestration is modelled, and then see how this information is used in the QoS computing unit to compute the orchestration’s end-to-end QoS.

QoS Contracts: The second input to the Torque tool is the QoS behaviour of the services involved in the orchestration. We model the QoS behaviour of a service as a probabilistic QoS contract. Accordingly, each service of the orchestration has an associated QoS contract, which specifies its QoS behaviour. In practice, this is implemented by having a contract file for each service in the orchestration.

The QoS of the service in a contract file may be specified in different ways: 1) It can be a standard probability distribution like an exponential, gamma or normal distribution. For each case the relevant parameters of the distribution are specified. 2) The QoS can also be specified as a discrete approximation of a continuous distribution, by giving a set of quantiles corresponding to that distribution. 3) The service’s QoS can also be directly modelled as a collection of previously measured QoS values.

QoS Stamper - Sampling QoS values: The QoS compiler compiles the QoS contract files into internal data structures, stored in the QoS stamper unit. At runtime, whenever a site is called, the QoS stamper unit is called to generate a QoS value for the site. The QoS value is generated according to the site’s QoS contract. When the QoS contract is a probability distribution, this distribution is sampled to generate a QoS value. If the QoS is specified as a discrete distribution with a set of quantiles, we assume that the QoS values are uniformly distributed between these quantile values. When the QoS is modelled as a collection of measured values, we use a bootstrap technique to generate the QoS values: each measured value is assumed to have an equal probability of being picked.
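The three sampling modes of the stamper can be sketched as follows. This is a hedged illustration in Python using only the standard `random` module, not the implementation used in Torque; the function names and parameter names are hypothetical, and the quantile case additionally assumes that the quantiles are given at equally spaced probability levels, which the text does not state.

```python
import random


def sample_parametric(rng, kind, **params):
    """Sample a standard distribution (parameter names are assumptions)."""
    if kind == "exponential":
        return rng.expovariate(1.0 / params["mean"])
    if kind == "gamma":
        return rng.gammavariate(params["shape"], params["scale"])
    if kind == "normal":
        return rng.gauss(params["mu"], params["sigma"])
    raise ValueError(f"unknown distribution: {kind}")


def sample_quantiles(rng, quantiles):
    """Sample from a sorted list of quantile values.

    Assumes equally spaced probability levels, so every inter-quantile
    interval is equally likely; values are uniform inside an interval.
    """
    i = rng.randrange(len(quantiles) - 1)
    return rng.uniform(quantiles[i], quantiles[i + 1])


def sample_bootstrap(rng, measurements):
    """Bootstrap: every measured value is equally likely to be picked."""
    return rng.choice(measurements)


rng = random.Random(0)                      # fixed seed for repeatable runs
latency_a = sample_parametric(rng, "exponential", mean=2.0)
latency_b = sample_quantiles(rng, [10.0, 20.0, 40.0, 100.0])
latency_c = sample_bootstrap(rng, [3.0, 5.0, 8.0])
```

Passing the random generator explicitly keeps simulation runs reproducible, which is convenient when comparing contracts obtained from repeated simulations.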

Computing the QoS in an execution: The QoS Computation unit computes the end-to-end QoS of the orchestration in each execution. This unit runs together with the Orc engine and the partial order computing unit, using the partial orders and the information from the QoS contracts to compute the orchestration’s QoS.

To represent the QoS values, the tokens of the engine are enhanced with a set of QoS attributes. This is similar to the way we represent QoS values in colored nets. In fact, a partially ordered execution of the engine, where tokens are enhanced with QoS attributes, can be seen as a maximal configuration of the Orchnet in Chapter 8.

The computation of QoS is done in a similar fashion as described for the case of Orchnets in Chapter 8. Each time the processing of a token creates an execution event, the QoS of that token is updated. We will refer to this updated QoS value as the execution event’s QoS.

Figure 9.5 – QoS Computation of an event.


Now consider, in figure 9.5, the occurrence of a new execution event e. The partial order computing unit calculates its immediate predecessor events e1, e2, . . . , en. Let the QoS of these events be q1, q2, . . . , qn respectively. Let the ∨ operator denote the least upper bound of different values of the QoS parameter, and the ⊕ operator denote the increment operator of the QoS parameter. The QoS of the event e, qe, is then computed as qe = q0 ⊕ δe, where q0 = q1 ∨ q2 ∨ . . . ∨ qn is the synchronization QoS and δe is the QoS increment due to the event e. This corresponds exactly to the QoS computation in the case of Orchnets, as described in Chapter 8.

If event e is a site call, the value of the increment δe is obtained from the QoS stamper, as described previously. All the other events are internal to the orchestration and so their QoS increments are assumed to be zero.
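As an illustration of the rule qe = (q1 ∨ . . . ∨ qn) ⊕ δe, the sketch below instantiates it for one particular QoS parameter. The choices made here are assumptions for the example only: latency as the parameter, `max` as the least upper bound, addition as the increment operator, and 0 as the QoS of an event without predecessors. The function name `event_qos` is hypothetical.

```python
from functools import reduce


def event_qos(pred_qos, increment, lub=max, plus=lambda a, b: a + b, bottom=0.0):
    """Return q_e = (q_1 v ... v q_n) (+) delta_e.

    lub, plus and bottom are assumptions suited to latency:
    least upper bound = max, increment = addition, no predecessor = 0.
    """
    q0 = reduce(lub, pred_qos, bottom)    # synchronization QoS
    return plus(q0, increment)


# A site call whose two immediate predecessors completed at 120 and 300
# time units, and whose sampled latency increment is 50.
q_call = event_qos([120.0, 300.0], 50.0)

# An internal event (zero increment) simply inherits the synchronization QoS.
q_internal = event_qos([120.0, 300.0], 0.0)
```

With these values the site call gets the QoS 350.0, since it has to wait for the slower of its two predecessors. Other QoS parameters would plug in a different `lub` and `plus`.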

SLA Design: The QoS of all the events in the execution is thus progressively computed as and when they occur. At the end of an execution, this computation gives the orchestration’s end-to-end QoS. After performing many such runs we get a collection of estimates of the orchestration’s end-to-end QoS.

In the SLA Design phase, we use the collection of QoS estimates obtained at the end of the simulations to derive a probabilistic contract that the orchestration can offer to its own clients. The collection of QoS values gives an empirical distribution of the QoS parameter. We also compute quantiles of this distribution and/or try to fit standard probability distributions onto the empirical distribution. The SLA Design is done under MATLAB, which provides many libraries for doing this.
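The quantile step of this phase can be sketched as follows. The tool performs it in MATLAB; the Python version below is only a stand-in for illustration, and the function names `empirical_quantile` and `contract` as well as the chosen probability levels are assumptions.

```python
import math


def empirical_quantile(samples, p):
    """Smallest sample value q such that at least a fraction p
    of the samples are less than or equal to q (0 < p <= 1)."""
    ordered = sorted(samples)
    k = max(1, math.ceil(p * len(ordered)))
    return ordered[k - 1]


def contract(samples, levels=(0.5, 1.0)):
    """Tabulate a probabilistic contract as {probability level: QoS bound}."""
    return {p: empirical_quantile(samples, p) for p in levels}


# End-to-end latencies collected over ten simulated runs (made-up values).
runs = [310.0, 290.0, 350.0, 420.0, 300.0, 330.0, 280.0, 500.0, 340.0, 320.0]
sla = contract(runs)
```

Each entry of `sla` reads as "with probability at least p, the response is obtained within the given bound", estimated from the simulated runs.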

9.5 Conclusion

The Torque tool serves as an implementation of our QoS management framework. Using Orc as the modelling language of the orchestration, we enhance it to support QoS description. This is used to perform simulations to derive an estimate of the orchestration’s QoS.

We were however not able to implement all the features that we have proposed in theory. In particular, the competition operator proposed in Chapter 8 is not yet supported by the tool. Support for our entire QoS theory, along with related experimentation, is the object of our current and future study.

Appendix A

Proofs of Chapter 4

A.1 Proof of Theorem 4.4

We show that (G[E], ⪯, ↗, α) satisfies all the conditions for an LAES given in section 4.2.1. Let us first show that ⪯ is a partial order on G[E]. By (4.9) and Condition 4 of Definition 4.2, ⪯ is a preorder on G[E]. It thus remains to show that there exists no non-trivial circuit e1 ⪯ e2 ⪯ . . . ⪯ en ⪯ e1. Let κ be a configuration containing e1. By Condition 3 of Definition 4.2, the circuit e1 ⪯ e2 ⪯ . . . ⪯ en ⪯ e1 must be contained in κ. But, since ≺ ⊆ ↗, we have e1 ↗ e2 ↗ . . . ↗ en ↗ e1, which contradicts Condition 1 of Definition 4.2. This shows that ⪯ is a partial order on G[E]. Also ⌊e⌋ =def {e′ | e′ ⪯ e} is finite, since an infinite ≺ sequence of events would be an infinite ↗ sequence of events, again contradicting Condition 1 of Definition 4.2. The same reasoning shows that ↗⌊e⌋ =def {(e1, e2) | e1, e2 ∈ ⌊e⌋, e1 ↗ e2} is acyclic. This proves the first statement of the theorem.

To prove the second statement, let F ⊆ E be such that (F, ⪯, ↗, α) is an LAES. Denote by κF a generic configuration of this LAES. By the definition of configurations for LAES, any such κF must satisfy Conditions 1–3 of Definition 4.2. In addition, by (4.9)–(4.10), κF must be such that, for each event e belonging to κF, if f• ∩ •e ≠ ∅ then f ∈ F. Since F ⊆ E, this implies that κF also satisfies Condition 4 of Definition 4.2. Hence κF satisfies all conditions of Definition 4.2 for heap configurations. This proves the theorem.

A.2 Characteristic property of the Stop operator

The following result shows that stop is a preemption operator.

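Lemma A.1 Let E be a heap such that (⊥, “stop”) ∉ C_E and let F ⊆ E. Let the bijection ϕ⁻¹ be the inverse of the map ϕ introduced in (4.13) for the definition of stopF(E), i.e., for all e ∈ E,

$$
\varphi^{-1}(e) =
\begin{cases}
\big({}^{\bullet}e - (\bot, \text{“stop”}),\ e,\ \alpha(e)\big) & \text{if } e \in F \\
\big({}^{\bullet}e,\ e - (\bot, \text{“stop”}),\ \alpha(e)\big) & \text{if } e \notin F
\end{cases}
\tag{A.1}
$$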

If κ is a configuration of stopF(E), then the following properties hold:

1. ϕ⁻¹(κ) is a configuration of E.

2. ϕ⁻¹(κ) ∩ F contains at most one event; if ϕ⁻¹(e) is such an event, then ∀f ∈ κ ⇒ ¬[e ≺ f].

Proof. The first statement is immediate, since ϕ⁻¹ removes read and consume conditions from the preset of each event.

To prove the second statement, assume that ϕ⁻¹(κ) ∩ F contains two events e and e′. Since e, e′ ∈ F, the events ϕ(e) and ϕ(e′) have condition (⊥, “stop”) in their consume preconditions set. From (4.10), we have that ϕ(e) ↗ ϕ(e′) and ϕ(e′) ↗ ϕ(e), which implies that they cannot both occur in the same configuration κ. Now let e ∈ ϕ⁻¹(κ) ∩ F for a configuration κ. Following our previous argument, ϕ(e) has the condition (⊥, “stop”) in its consume preconditions set. By definition of stop, all events f in stopF(E), and hence in κ, have (⊥, “stop”) in their preconditions set •f. From (4.10) it follows that f ↗ e, which implies ¬[e ≺ f] since e, f are in the same configuration κ.

A.3 Proof of Lemma 4.6

These conditions are established by examining the corresponding constructions on heaps. The parallel expression f | h is monotonic because ⊎ is monotonic. ⊎ is monotonic because marking El is monotonic and ∪ is monotonic. Marking is monotonic because it is a pointwise function over minConds, a monotonic selection of a subset of its argument events.

Sequential composition f >x> h is monotonic because pipex(E, F, ∅) is monotonic in both E and F. pipex(E, F, ∅) is monotonic if send(F) and recvx(E, F, ∅) are monotonic. send, like marking, is a pointwise function over a monotonic selection !E of events from E. Receive recvx(E, F, ∅) is trivially monotonic in E, because it is a union over the monotonic subset !E, and is monotonic in F if link(e, v, x, F, ∅) is monotonic in F. Linking depends on monotonicity of copyl(F, ∅), a simple pointwise function on events. Linking also applies a pointwise function based on min, a monotonic subset of a heap.

Monotonicity of asymmetric composition f where x :∈ g is more complicated. It depends on monotonicity of G(x) and pipex(stop(F), G(x), Ḡ(x)). The free variable constructs G(x) and Ḡ(x) are pointwise selectors of events, so they are monotonic. stop(E) is also a pointwise function affecting !E, a monotonically increasing subset of E. Finally, there is the question of the monotonicity of pipex(F, G(x), Ḡ(x)). As mentioned above, pipex is monotonic in its first argument, in this case F. Monotonicity of pipex for G depends on monotonicity of link(e, v, x, G(x), Ḡ(x)), which in turn depends on monotonicity of copyl(G(x), Ḡ(x)). Note that copyl(E, F) is not monotonic in its second argument: although ∅ ≺ F, it is easy to see that copyl(E, ∅) ⊀ copyl(E, F) in general. However, we only need monotonicity of the special case where the arguments to copy are the partition G(x), Ḡ(x) of G. Assume G ≺ G′ and set H = G′ − G. We have

$$
\begin{aligned}
\mathit{copy}_l(G'(x), \bar{G}'(x)) &= \mathit{copy}_l(G(x) \cup H(x), \bar{G}'(x)) \\
&= \mathit{copy}_l(G(x), \bar{G}'(x)) \cup \mathit{copy}_l(H(x), \bar{G}'(x)) \\
&= \mathit{copy}_l(G(x), \bar{G}(x) \cup \bar{H}(x)) \cup \mathit{copy}_l(H(x), \bar{G}'(x))
\end{aligned}
\tag{A.2}
$$

By definition of the copy, copyl(G(x), Ḡ(x) ∪ H̄(x)) is obtained by changing all minimal conditions c = (e, µ) ∈ minConds(G(x)) as specified in (4.14). By the second condition of (4.20), we have minConds(G) ∩ C_H = ∅. Thus copyl(G(x), Ḡ(x) ∪ H̄(x)) = copyl(G(x), Ḡ(x)), and thus (A.2) implies that copyl(G(x), Ḡ(x)) ≺ copyl(G′(x), Ḡ′(x)). This finishes the proof of Lemma 4.6.

A.4 Proof of Theorem 4.7


We prove the theorem recursively over the structure of f.

• Base expressions:

If f ∈ {0, let(v), ?k, M(v)}, the theorem is evident. This is because all the events of [f], taken together, form a maximal configuration of [f] and so they belong to [[f]]. Since [[f]] ⊆ [f], we get [f] = [[f]].

• f | g

Since [f | g] = [f]left ∪ [g]right, the events of [f]left and [g]right are disjoint and thus concurrent. A union of any two configurations of [f]left and [g]right gives a configuration of [f | g]. Assuming that the theorem holds for f and g, i.e. [f] = [[f]] and [g] = [[g]], we then have [f | g] = [[f | g]].

• f >x> g

[f >x> g] = send([f]) ∪ recvx([f], [g], ∅). Assuming that [f] = [[f]], we have send([f]) ⊆ [[f >x> g]], since the send operator only renames the events of [f] and so all the events in send([f]) belong to some configuration of it.

The recvx([f], [g], ∅) operation creates a set of copies of [g]. The minimal events of each such copy depend on a maximal event of send([f]). Since all the maximal events of send([f]) belong to some configuration κ of [f >x> g], assuming [g] = [[g]] implies that for every event in the copy of g there is some configuration κ′ of [f >x> g] such that κ ⊆ κ′. Thus recvx([f], [g], ∅) ⊆ [[f >x> g]] too, and we get the desired result.

• f where x :∈ g

[f where x :∈ g] = send(stop([g])) ∪ recvx(stop([g]), [fx], [fx̄]) ∪ [fx̄], where [fx] and [fx̄] represent the G(x) and Ḡ(x) of equation (4.19) respectively. Assume that [g] = [[g]]. Note that the new dependencies introduced by the stop are of a particular nature: every maximal event of stop([g]) preempts every other event of stop([g]). As a result, for any event e ∈ stop([g]), the set {e′ | e′ ⪯ e} is a configuration of stop([g]) and so the effective heap G[stop([g])] = stop([g]). Since send only renames events (and does not change their dependencies), we have that send(stop([g])) ⊆ [[f where x :∈ g]].

Following a similar argument as that of the previous case of f >x> g, we have [fx̄] ⊆ [[f where x :∈ g]] and recvx(stop([g]), [fx], [fx̄]) ⊆ [[f where x :∈ g]], from which we get that [f where x :∈ g] = [[f where x :∈ g]].


We start by proving part 1 of Theorem 4.8.

Proof of part 1 of Theorem 4.8.

If $f \xrightarrow{a} f'$ then there exists a minimal non-free event e of [f] such that α(e) = a and [f] \ e = [f′].

We prove the theorem recursively on the structure of f. In the following, each bullet corresponds to a different structure of f.

• 0

The theorem trivially holds, since no transition can be performed by 0.

• let(v)

There are two cases here, depending on whether v is a variable x or a constant value. In the former case no transition rule exists for let(x) and so the theorem trivially holds. If v is a constant value, from the SOS rules of Figure 4.1, the only possible transition is $\mathit{let}(v) \xrightarrow{!v} 0$. The heap [let(v)] has only one minimal, non-free event e, with α(e) = !v. Moreover [let(v)] \ e = [0], which proves the theorem for this case.

• ?k

The only possible transition here is $?k \xrightarrow{k?vk} \mathit{let}(vk)$. The heap [?k] = {e = (c1, ∅, k?vk), (c2, ∅, !vk)}, where condition c1 = (⊥, nil) and c2 = (e, nil). e is the only minimal, non-free event in [?k], and α(e) = k?vk, the label of the transition. Also [?k] \ e = {(c2, ∅, !vk)} = [let(vk)], which proves the theorem for this case.

• M(v)

This case is similar to the previous ones. If v is a variable x, no transition rule applies and the theorem trivially holds. If v is a constant value, $M(v) \xrightarrow{Mk(v)} ?k$ is the only possible transition. The minimal, non-free event e of [M(v)] is such that α(e) = Mk(v) and [M(v)] \ e = [?k].

• f | g

According to Figure 4.1, there are two possible rules, (Sym1) and (Sym2), for f | g. Since the rules are symmetric, it suffices to prove the theorem for one of them. Suppose that $f \mid g \xrightarrow{a} f' \mid g$. Then (Sym1) implies that $f \xrightarrow{a} f'$. Assuming that the theorem holds for f, there exists a minimal, non-free event e ∈ [f] such that α(e) = a and [f] \ e = [f′].

Now [f | g] = [f]left ∪ [g]right. Consider the event e′ = eleft, obtained by appending the mark left to the conditions in the preset •e. The event e′ ∈ [f]left is called the event corresponding to e in [f]left. Clearly e′ is minimal and non-free in [f]left and so it is minimal and non-free in [f | g] too. Finally, we have

[f | g] \ e′ = ([f]left ∪ [g]right) \ e′
 = ([f]left \ e′) ∪ ([g]right \ e′)
 = ([f]left \ e′) ∪ [g]right (events of [g]right and [f]left are disjoint)
 = [f′]left ∪ [g]right (since [f] \ e = [f′])
 = [f′ | g].

• f >x> g

There are two transitions which apply in this case: (Seq1N) and (Seq1V).


Case 1: (Seq1N)

Here $(f >\!x\!> g) \xrightarrow{a} (f' >\!x\!> g)$, where $f \xrightarrow{a} f'$ and a ≠ !v. Assuming the theorem holds on f, there is a minimal, non-free event e ∈ [f] such that α(e) = a and [f] \ e = [f′]. Now

[f >x> g] = pipex([f], [g], ∅) = send([f]) ∪ recvx([f], [g], ∅).

The event e′ ∈ send([f]) corresponding to the event e ∈ [f] is minimal and non-free in send([f]) and so in [f >x> g]. This is because the send operator only changes the labels of the events and does not change their dependencies. Moreover, since a ≠ !v, the send operator does not change the label of e and thus α(e′) = α(e) = a. It remains to show that [f >x> g] \ e′ = [f′ >x> g]. For this, we note two facts:

1. send([f′]) = send([f] \ e′) = send([f]) \ e′. This is again because the send operator does not change dependencies of the events in [f].

2. recvx([f], [g], ∅) \ e′ = recvx([f′], [g], ∅). To see this, consider

$$
\begin{aligned}
\mathit{recv}_x([f], [g], \emptyset) \setminus e'
&= \Big( \bigcup_{e^* \in\, ![f],\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [g], \emptyset) \Big) \setminus e' \\
&= \bigcup_{e^* \in\, ![f],\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [g], \emptyset) \setminus e'
\end{aligned}
\tag{A.3}
$$

link(τ(e∗), v, x, [g], ∅) is a copy of [g], all of whose events are preceded by τ(e∗). Since e∗ ∈ ![f], τ(e∗) is an event in send([f]) whose label is changed to τ. If τ(e∗) ↗ e′, then link(τ(e∗), v, x, [g], ∅) \ e′ = ∅ since all the events of this heap are preceded by τ(e∗). However, if ¬(τ(e∗) ↗ e′), then link(τ(e∗), v, x, [g], ∅) \ e′ = link(τ(e∗), v, x, [g], ∅), since none of the events in this copy are pre-empted by e′. So we only need to consider τ(e∗) such that ¬(τ(e∗) ↗ e′), i.e. to consider e∗ ∈ ![f] such that ¬(e∗ ↗ e). The set of such e∗ is exactly the set of publication events of [f′] since [f] \ e = [f′]. So,

$$
\begin{aligned}
\mathit{recv}_x([f], [g], \emptyset) \setminus e'
&= \bigcup_{e^* \in\, ![f],\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [g], \emptyset) \setminus e' \\
&= \bigcup_{e^* \in\, ![f'],\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [g], \emptyset) \\
&= \mathit{recv}_x([f'], [g], \emptyset)
\end{aligned}
$$

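From the above two observations, we have

$$
\begin{aligned}
[f >\!x\!> g] \setminus e' &= \mathit{pipe}_x([f], [g], \emptyset) \setminus e' \\
&= (\mathit{send}([f]) \setminus e') \cup (\mathit{recv}_x([f], [g], \emptyset) \setminus e') \\
&= \mathit{send}([f']) \cup (\mathit{recv}_x([f], [g], \emptyset) \setminus e') && \text{(first observation)} \\
&= \mathit{send}([f']) \cup \mathit{recv}_x([f'], [g], \emptyset) && \text{(second observation)} \\
&= [f' >\!x\!> g]
\end{aligned}
$$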

Case 2: (Seq1V)

Here $(f >\!x\!> g) \xrightarrow{\tau} (f' >\!x\!> g) \mid [v/x]g$, where $f \xrightarrow{!v} f'$. Again assuming the theorem holds on f, we have a minimal non-free event e ∈ [f] such that α(e) = !v and [f] \ e = [f′].

Following a similar argument as (Seq1N), the event e′ ∈ send([f]) corresponding to e ∈ [f] is minimal and non-free in [f >x> g]. Since α(e) = !v, the send renames this action to τ in e′. Hence α(e′) = τ, the action of the transition $(f >\!x\!> g) \xrightarrow{\tau} (f' >\!x\!> g) \mid [v/x]g$.

Now consider

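$$
\begin{aligned}
\mathit{recv}_x([f], [g], \emptyset) \setminus e'
&= \bigcup_{e^* \in\, ![f],\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [g], \emptyset) \setminus e' \\
&= \mathit{link}(\tau(e), v, x, [g], \emptyset) \setminus e' \ \cup \bigcup_{e^* \in\, (![f] - e),\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [g], \emptyset) \setminus e' && \text{(since } e \in\, ![f]\text{)} \\
&= [[v/x]g] \ \cup \bigcup_{e^* \in\, (![f] - e),\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [g], \emptyset) \setminus e' && \text{(since } \tau(e) = e'\text{)} \\
&= [[v/x]g] \ \cup \bigcup_{e^* \in\, ![f'],\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [g], \emptyset) \setminus e' && \text{(2nd observation of (Seq1N))} \\
&= [[v/x]g] \cup \mathit{recv}_x([f'], [g], \emptyset)
\end{aligned}
$$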

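So we have

$$
\begin{aligned}
[f >\!x\!> g] \setminus e' &= \mathit{pipe}_x([f], [g], \emptyset) \setminus e' \\
&= (\mathit{send}([f]) \setminus e') \cup (\mathit{recv}_x([f], [g], \emptyset) \setminus e') \\
&= \mathit{send}([f']) \cup (\mathit{recv}_x([f], [g], \emptyset) \setminus e') && \text{(first observation of (Seq1N))} \\
&= \mathit{send}([f']) \cup [[v/x]g] \cup \mathit{recv}_x([f'], [g], \emptyset) \\
&= [f' >\!x\!> g] \cup [[v/x]g] \\
&= [(f' >\!x\!> g) \mid [v/x]g]
\end{aligned}
$$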


f where x :∈ g can transition according to three rules: (Asym1V), (Asym1N) and (Asym2).

Case 1: (Asym2)

Here $(f \text{ where } x :\in g) \xrightarrow{a} (f' \text{ where } x :\in g)$, from $f \xrightarrow{a} f'$. There thus exists a minimal non-free event e ∈ [f] such that α(e) = a and [f] \ e = [f′].

For convenience, we denote the events in [f] that depend on x as [fx], and those that are not dependent on x as [fx̄]. Since e is non-free, e ∈ [fx̄]. Now

$$
\begin{aligned}
[f] \setminus e &= [f'] \\
[f_x] \setminus e \ \cup\ [f_{\bar{x}}] \setminus e &= [f'_x] \cup [f'_{\bar{x}}]
\end{aligned}
$$

Equating the events that depend on x and those that do not, we have [fx] \ e = [f′x] and [fx̄] \ e = [f′x̄].

Now [f where x :∈ g] = pipex(stop(G), F(x), F̄(x)) ∪ F̄(x), where F = [f]right and G = [g]left. The labels left and right are used to ensure that the events of [f] and [g] are disjoint. To simplify the presentation, we assume that [f] and [g] are disjoint. So we will write [f where x :∈ g] = pipex(stop([g]), [fx], [fx̄]) ∪ [fx̄]. Since e is minimal and non-free in [fx̄], it is minimal and non-free in [f where x :∈ g] also, with α(e) = a. Also

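$$
\begin{aligned}
[f \text{ where } x :\in g] \setminus e
&= (\mathit{pipe}_x(\mathit{stop}([g]), [f_x], [f_{\bar{x}}]) \setminus e) \cup ([f_{\bar{x}}] \setminus e) \\
&= (\mathit{pipe}_x(\mathit{stop}([g]), [f_x], [f_{\bar{x}}]) \setminus e) \cup [f'_{\bar{x}}] \\
&= \mathit{send}(\mathit{stop}([g])) \setminus e \ \cup \Big( \bigcup_{e^* \in\, !\mathit{stop}([g]),\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [f_x], [f_{\bar{x}}]) \setminus e \Big) \cup [f'_{\bar{x}}] \\
&= \mathit{send}(\mathit{stop}([g])) \ \cup \Big( \bigcup_{e^* \in\, !\mathit{stop}([g]),\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [f_x], [f_{\bar{x}}]) \setminus e \Big) \cup [f'_{\bar{x}}]
\end{aligned}
$$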

send(stop([g])) \ e = send(stop([g])) since the events of [g] are disjoint from those of [f], to which e belongs. Now consider link(τ(e∗), v, x, [fx], [fx̄]) \ e. Since link(τ(e∗), v, x, [fx], [fx̄]) is a heap which is a copy of [fx], and since the minimal event τ(e∗) is independent of e, the future of the link copy after e is the same as the link copy of the future [fx] \ e = [f′x]. We thus have

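$$
\begin{aligned}
[f \text{ where } x :\in g] \setminus e
&= \mathit{send}(\mathit{stop}([g])) \ \cup \Big( \bigcup_{e^* \in\, !\mathit{stop}([g]),\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [f'_x], [f'_{\bar{x}}]) \Big) \cup [f'_{\bar{x}}] \\
&= \mathit{pipe}_x(\mathit{stop}([g]), [f'_x], [f'_{\bar{x}}]) \cup [f'_{\bar{x}}] \\
&= [f' \text{ where } x :\in g]
\end{aligned}
$$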

Case 2: (Asym1N)

Here $(f \text{ where } x :\in g) \xrightarrow{a} (f \text{ where } x :\in g')$, from $g \xrightarrow{a} g'$, a ≠ !v. There thus exists a minimal, non-free event e ∈ [g] such that α(e) = a and [g] \ e = [g′]. Since e is minimal and non-free in [g], the corresponding event e′ in send(stop([g])) is minimal and non-free in send(stop([g])). Since send(stop([g])) ⊂ [f where x :∈ g], e′ is minimal and non-free in [f where x :∈ g] too. Also α(e′) = a, since e is not a publication event, and so it is not renamed by send.

Finally, to prove that [f where x :∈ g] \ e′ = [f where x :∈ g′], we note that

1. [fx̄] \ e′ = [fx̄]. This is because the events of [f] and [g] are assumed to be disjoint.

2. pipex(stop([g]), [fx], [fx̄]) \ e′ = pipex(stop([g′]), [fx], [fx̄]). This follows a similar argument as that of (Seq1N), which we used to show that pipex([f], [g], ∅) \ e′ = pipex([f′], [g], ∅).

From this, we have

$$
\begin{aligned}
[f \text{ where } x :\in g] \setminus e'
&= (\mathit{pipe}_x(\mathit{stop}([g]), [f_x], [f_{\bar{x}}]) \setminus e') \cup [f_{\bar{x}}] \setminus e' \\
&= \mathit{pipe}_x(\mathit{stop}([g']), [f_x], [f_{\bar{x}}]) \cup [f_{\bar{x}}] \\
&= [f \text{ where } x :\in g']
\end{aligned}
$$


Case 3: (Asym1V)

Here $(f \text{ where } x :\in g) \xrightarrow{\tau} [v/x]f$, from $g \xrightarrow{!v} g'$. There exists a minimal non-free event e ∈ [g] such that α(e) = !v and [g] \ e = [g′]. As in Case 2, there exists a minimal event e′ ∈ send(stop([g])) corresponding to e. e′ is thus also minimal in [f where x :∈ g]. Since e is a publication event, e′ is renamed to τ by the send operator. So e′ is non-free with α(e′) = τ, the action of the transition $(f \text{ where } x :\in g) \xrightarrow{\tau} [v/x]f$. Now

$$
\begin{aligned}
[f \text{ where } x :\in g] \setminus e'
&= (\mathit{pipe}_x(\mathit{stop}([g]), [f_x], [f_{\bar{x}}]) \setminus e') \cup ([f_{\bar{x}}] \setminus e') \\
&= \mathit{send}(\mathit{stop}([g])) \setminus e' \ \cup \Big( \bigcup_{e^* \in\, !\mathit{stop}([g]),\ \alpha(e^*) =\, !v} \mathit{link}(\tau(e^*), v, x, [f_x], [f_{\bar{x}}]) \setminus e' \Big) \cup [f_{\bar{x}}]
\end{aligned}
$$

Consider the first term send(stop([g])) \ e′. Since e is a publication event of [g], the event e′ in stop([g]) preempts all the other events of stop([g]) (see Lemma A.1). As a result send(stop([g])) \ e′ = ∅.

For the same reason, link(τ(e∗), v, x, [fx], [fx̄]) \ e′ = ∅ for all e∗ ∈ !stop([g]) except e∗ = e, in which case τ(e∗) = e′. Therefore the second term of the above equation reduces to link(τ(e), v, x, [fx], [fx̄]), which is simply a copy of [fx], with x replaced by v in the labels. Thus link(τ(e), v, x, [fx], [fx̄]) = [[v/x]fx] and the above equation reduces to

$$
\begin{aligned}
[f \text{ where } x :\in g] \setminus e'
&= \emptyset \cup \mathit{link}(\tau(e), v, x, [f_x], [f_{\bar{x}}]) \cup [f_{\bar{x}}] \\
&= [[v/x]f_x] \cup [f_{\bar{x}}] \\
&= [[v/x]f]
\end{aligned}
$$

Proof of part 2 of Theorem 4.8. For every minimal non-free event e of [f], there is an expression f′ such that $f \xrightarrow{\alpha(e)} f'$ and [f] \ e = [f′].

As before, the proof recurses over the structure of f.

• 0

The theorem holds trivially since there are no minimal non-free events in [0].

• let(v)

When v is a variable there is no minimal and non-free event in [let(v)], so the theorem trivially holds. If v is a constant, then the only minimal, non-free event e in [let(v)] is such that α(e) = !v. Moreover $\mathit{let}(v) \xrightarrow{!v} 0$ and [let(v)] \ e = [0], and so we are done.

• ?k

The only minimal, non-free event e in [?k] is such that α(e) = k?vk. Since $?k \xrightarrow{k?vk} \mathit{let}(vk)$ and [?k] \ e = [let(vk)], the theorem holds.

• M(v)

When v is a variable there is no minimal non-free event in [M(v)] and so the theorem trivially holds. If v is a constant value, the only minimal, non-free event e in [M(v)] is such that α(e) = Mk(v). Again $M(v) \xrightarrow{Mk(v)} ?k$ and [M(v)] \ e = [?k], so the theorem holds.


• f | g

Since [f | g] = [f]left ∪ [g]right, any minimal, non-free event e of [f | g] has a corresponding minimal, non-free event e′ in either [f] or [g]. Let e′ ∈ [f]; the other case is symmetric. By recursively applying the theorem on [f], ∃f′ such that f −α(e′)→ f′ and [f] \ e′ = [f′]. Since α(e) = α(e′), from (Sym1) we have f | g −α(e)→ f′ | g. Finally,

[f | g] \ e = ([f]left \ e) ∪ ([g]right \ e)

= [f′]left ∪ [g]right

= [f′ | g].

• f >x> g

[f >x> g] = send([f]) ∪ recvx([f], [g], ∅). Since all the events in recvx depend on a (renamed) publication event τ(e∗) of send([f]), any minimal, non-free event e of [f >x> g] is also minimal and non-free in send([f]). Thus e has a corresponding minimal, non-free event e′ in [f]. Applying the theorem recursively on f, we have that ∃f′ such that f −α(e′)→ f′ and [f] \ e′ = [f′]. We have to distinguish two cases, when α(e′) ≠ !v and when α(e′) = !v.

Case 1: α(e′) ≠ !v. In this case, send does not rename the label of e′ in e and α(e) = α(e′) ≠ !v. We can thus apply (Seq1N) to derive that f >x> g −α(e)→ f′ >x> g. Following the exact same steps as in the (Seq1N) case in the proof of f >x> g in part 1, we have that [f >x> g] \ e = [f′ >x> g].

Case 2: α(e′) = !v. Since e′ is a publication event in [f], the send changes the label of e and α(e) = τ. From (Seq1V) we have f >x> g −α(e)→ f′ >x> g | [v/x]g. Again from the same steps as in the (Seq1V) case in the proof of f >x> g in part 1, we have that [f >x> g] \ e = [f′ >x> g | [v/x]g].


• f where x :∈ g

[f where x :∈ g] = send(stop([g])) ∪ recvx(stop([g]), [fx], [fx]) ∪ [fx]. Since all the events in the recvx heap depend on maximal events of send(stop([g])), any minimal, non-free event e of [f where x :∈ g] is either in send(stop([g])) or in [fx].

Case 1: e ∈ [send(stop([g]))]. Consider the event e′ ∈ [g] corresponding to e. Since e is minimal and non-free in [send(stop([g]))], e′ is minimal and non-free in [g]. There are two sub-cases here depending on whether α(e′) ≠ !v or α(e′) = !v.

i) α(e′) ≠ !v: Applying the theorem on g, we have that g −α(e′)→ g′ and [g] \ e′ = [g′]. Since e′ is not a publication event, α(e) = α(e′) ≠ !v. From (Asym2) we have

(f where x :∈ g) −α(e)→ (f where x :∈ g′).

The same steps as in the (Asym2) case in part 1 of the proof give [f where x :∈ g] \ e = [f where x :∈ g′].

ii) α(e′) = !v: Applying the theorem on g, we have that g −!v→ g′ and [g] \ e′ = [g′]. Since e′ is a publication event, α(e) = τ. From (Asym1V) we have

(f where x :∈ g) −α(e)→ [v/x]f.

The same steps as in the (Asym1V) case in part 1 of the proof give [f where x :∈ g] \ e = [[v/x]f].

Case 2: e ∈ [fx]. In this case e is minimal and non-free in [f] too. Applying the theorem on f, f −α(e)→ f′ and [f] \ e = [f′]. From (Asym1N) we have

(f where x :∈ g) −α(e)→ (f′ where x :∈ g).

The same steps as in the (Asym1N) case in part 1 of the proof give [f where x :∈ g] \ e = [f′ where x :∈ g].
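The transition rules invoked in this proof can be exercised on a small executable model. The following is a minimal sketch, not the thesis's implementation: it covers only the fragment 0, let(v), f | g and f >x> g, and the class names (Zero, Let, Par, Seq, Var), the label encoding ("pub", v) for !v and "tau" for τ, and the helper names steps and subst are all assumptions introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical mini-AST for the fragment 0, let(v), f | g, f >x> g.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Zero:
    pass

@dataclass(frozen=True)
class Let:
    value: object  # a constant, or a Var while still unbound

@dataclass(frozen=True)
class Par:
    left: object
    right: object

@dataclass(frozen=True)
class Seq:  # left >var> right
    left: object
    var: str
    right: object

def subst(expr, var, value):
    """[value/var]expr: replace free occurrences of var by value."""
    if isinstance(expr, Let):
        return Let(value) if expr.value == Var(var) else expr
    if isinstance(expr, Par):
        return Par(subst(expr.left, var, value), subst(expr.right, var, value))
    if isinstance(expr, Seq):
        left = subst(expr.left, var, value)
        # An inner >var> rebinds var, so its right-hand side is left alone.
        right = expr.right if expr.var == var else subst(expr.right, var, value)
        return Seq(left, expr.var, right)
    return expr  # Zero

def steps(expr):
    """List the (label, successor) pairs enabled in expr."""
    if isinstance(expr, Let):
        # let(v) publishes only when v is a constant.
        if isinstance(expr.value, Var):
            return []
        return [(("pub", expr.value), Zero())]
    if isinstance(expr, Par):
        # (Sym1) and its symmetric counterpart.
        return ([(a, Par(f2, expr.right)) for a, f2 in steps(expr.left)] +
                [(a, Par(expr.left, g2)) for a, g2 in steps(expr.right)])
    if isinstance(expr, Seq):
        out = []
        for a, f2 in steps(expr.left):
            rest = Seq(f2, expr.var, expr.right)
            if isinstance(a, tuple) and a[0] == "pub":
                # (Seq1V): the publication !v is hidden as tau and spawns [v/x]g.
                out.append(("tau", Par(rest, subst(expr.right, expr.var, a[1]))))
            else:
                # (Seq1N): any other action passes through unchanged.
                out.append((a, rest))
        return out
    return []  # 0 has no transition

expr = Seq(Let(1), "x", Let(Var("x")))   # let(1) >x> let(x)
first = steps(expr)
second = steps(first[0][1])
```

On let(1) >x> let(x), the only enabled step is the τ obtained from (Seq1V), after which the spawned copy let(1) publishes 1. This mirrors the two cases distinguished for f >x> g in the proof.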

Appendix B

Proofs of Chapter 5

B.1 Proof of Lemma 5.14

Lemma B.1 Let v be an R-stopped configuration and c be an initial stopping prefix of E. Then either v ∩ c = ∅ or v ∩ c ∈ Θc.

Proof: By Proposition 2, v ∩ B is R-stopped in B for every stopping prefix B and, in particular, when B = c. Since c is an initial stopping prefix, clearly v ∩ c is either empty or maximal in c. ⋄

Lemma B.2 Let u be a configuration of E and let c be an initial stopping prefix of E. If u ∩ c = ∅, then c ∈ δ(u).

Proof: We first show that c ⊆ Eu. For this we observe that the following holds:

∄e1 ∈ u, e2 ∈ c, s.t. (e1e2) or (e2e1) (B.1)

If (B.1) were false, then there would exist events e′1 ∈ ⌊e1⌋, e′2 ∈ ⌊e2⌋ such that (e′1 րm e′2) or (e′2 րm e′1) respectively. Since e′2 ∈ c, this would mean that e′1 ∈ c, contradicting u ∩ c = ∅.

Now, consider any e ∈ c. We show that ⌊e⌋ and u are compatible. Call u′ = u ∪ ⌊e⌋. We show that:

1. u′ is a configuration: u′ is a prefix. Since u and ⌊e⌋ are acyclic, and ր⌊e∗⌋ is acyclic for any e∗ ∈ E, the existence of a cycle in u′ would need events e1 ∈ u, e2 ∈ ⌊e⌋ such that either (e1e2) or (e2e1). This cannot be since (B.1) holds. Also, the set {e′ ∈ u′ | e′ ր e∗} cannot be infinite for any e∗ ∈ u′. Since c is finite, this would imply that there exists an event e∗ ∈ c such that {e′ ∈ u | e′ ր e∗} is infinite, again contradicting (B.1).

2. u ⪯ u′: Suppose not. Then ∃e1 ∈ u, e2 ∈ ⌊e⌋, s.t. e2 ր e1. Since u ∩ c = ∅, ¬(e2 < e1) holds, and so (e2e1) holds, contradicting (B.1).

3. ⌊e⌋ ⪯ u′: If it is false, then ∃e1 ∈ u, e2 ∈ ⌊e⌋ such that e1 ր e2. Since c is a prefix and u ∩ c = ∅, ¬(e1 < e2) holds and so e1e2. Contradiction.

⌊e⌋ and u are thus compatible for any e ∈ c and thus c ⊆ Eu. From Lemma 5.8 we have that c ∩ Eu = c is a stopping prefix. c is an initial stopping prefix of Eu since any other stopping prefix c′ ⊆ c of Eu would also be a stopping prefix in E, contradicting the fact that c is initial in E. ⋄


Definition B.3 A configuration u of E is called a germ if there is an initial stopping prefix B of E such that u ∈ ΘB. A valid decomposition of an R-stopped configuration v of E is called a germ decomposition of v.

Proposition 5 Let B be a stopping prefix of E. Every stopping prefix of B is a stopping prefix of E. If v is a configuration of B, then every stopping prefix of Bv is a stopping prefix of Ev.

Lemma B.4 Every R-stopped configuration of E has a germ decomposition.

Proof: We first prove this for a finite R-stopped configuration v of E. Let v ∈ ΘB where B is a stopping prefix of E. By Proposition 5, it is enough to show that v has a germ decomposition in the event structure B.

We give a method to construct the germ decomposition (vn)n≥0 of v. Set v0 = ∅ and repeat the following steps, starting with n = 0.

• Case vn = v: Stop.

• Otherwise: Let w = v ⊖ vn. Pick any initial stopping prefix cn+1 of Bvn. Call z = cn+1 ∩ v = cn+1 ∩ w. w is maximal in Bvn and so by Lemma 5.7, z ∈ Θcn+1. Set vn+1 = vn ⊕ z and repeat the procedure.

Since the branching cell cn+1 is non-empty, z ∈ Θcn+1 is non-empty, and so |vn+1| > |vn|. Since v is finite and vn ⊆ v, the procedure eventually terminates at a step m when vm = v. (vi)0≤i≤m is a germ decomposition of v.

When the R-stopped configuration v may be infinite, we consider any valid decomposition (vi)0≤i of v. Since each configuration vi+1 ⊖ vi is finitely stopped in Evi, we can do the above construction to get a germ decomposition for every vi+1 ⊖ vi. The concatenation of all these germ decompositions gives a germ decomposition for v. ⋄

Lemma B.5 (First Exchange Lemma) Let v0 be an R-stopped configuration of E. Let ζ be a germ of Ev0 and ξ be a germ of E. Assume that v0 ⊕ ζ and ξ are compatible and set:

v =def v0 ∪ ξ, v′ =def (v0 ⊕ ζ) ∪ ξ, ζ′ =def v′ \ v. (B.2)

Then ζ′ is stopped in Ev.

Proof: Let c be the initial stopping prefix of E such that ξ ∈ Θc. We distinguish two cases:

Case 1: v0 ∩ c ≠ ∅. According to Lemma B.1, ξ′ = v0 ∩ c is a maximal configuration of c. Since ξ and ξ′ are compatible maximal configurations of c, ξ = ξ′. So ξ ⊆ v0. Thus v = v0 and ζ′ = ζ. Since ζ is a germ of Ev0, it is stopped in Ev0 = Ev.

Case 2: v0 ∩ c = ∅. From Lemma B.2, c ∈ δ(v0). Let c′ ∈ δ(v0) be such that ζ ∈ Θc′. c and c′ are two initial stopping prefixes of Ev0 and so from Theorem 5.12, either c = c′ or c ∩ c′ = ∅.

(a) c = c′. Then ζ and ξ are compatible maximal configurations of c, thus ζ = ξ and so ζ′ = ∅, which is stopped in Ev0.

(b) c ∩ c′ = ∅. Then ζ ∩ ξ = ∅ and so ζ′ = ζ. Also, ξ ∩ c′ = ∅ and so from Lemma B.2, c′ ∈ δ(v0 ⊕ ξ). ζ′ is thus a germ of Ev.


⋄

Lemma B.6 (Second Exchange Lemma) Let u and u′ be two R-stopped configurations of E. Assume u and u′ are compatible. Then (u ∪ u′) ⊖ u′ is R-stopped in E^{u′}.

Proof: First consider the case when u′ is a germ of E. From Lemma B.4 we can choose a germ decomposition (u_n)0≤n≤N of u. Consider the sequence (u′_n)0≤n≤N, where u′_0 = ∅ and

u′_n = u′ ∪ u_n,   z_n = u_n ⊖ u_{n−1},   z′_n = u′_n ⊖ u′_{n−1}

for 1 ≤ n ≤ N. Then z′_n = (u′ ∪ u_{n−1} ∪ z_n) \ (u′ ∪ u_{n−1}). By applying Lemma B.5 with v0 = u_{n−1}, ξ = u′ and ζ = z_n, we get that z′_n is R-stopped in E^{u′_{n−1}}. This makes (u′_n)0≤n≤N a valid decomposition of u ∪ u′ in E such that u′_n ⊇ u′ for n ≥ 1. Therefore (u′_{n+1} ⊖ u′)0≤n≤N−1 is a valid decomposition of (u ∪ u′) ⊖ u′ in E^{u′}. This proves the lemma when u′ is a germ of E.

If u′ is any R-stopped configuration, let (v′_n)0≤n≤K be its germ decomposition. Then applying the first part of the proof to v′_1 gives that (u ∪ v′_1) \ v′_1 is R-stopped in E^{v′_1}. Since v′_2 ⊖ v′_1 is a germ of E^{v′_1}, again applying the proof gives that (u ∪ v′_2) \ v′_2 is R-stopped in E^{v′_2}.

Doing this recursively K times gives that (u ∪ u′) \ u′ is R-stopped in E^{u′}. ⋄

Lemma B.7 Let u and u′ be compatible finite R-stopped configurations of E. Let c ∈ δ(u) and c′ ∈ δ(u′). If c ∩ c′ ≠ ∅, then c = c′.

Proof: From Lemma B.4, it suffices to show it when u′ = ∅. Let c ∩ c′ ≠ ∅. Applying Lemma B.1 to u and c′ implies that u ∩ c′ is either ∅ or in Θc′. The latter case cannot occur since then c ⊆ Eu∩c′ and so c ∩ c′ would be empty. Now since u ∩ c′ = ∅, applying Lemma B.2 gives c′ ∈ δ(u). c and c′ are thus initial stopping prefixes of Eu. From Theorem 5.12, initial stopping prefixes are disjoint, and so c = c′. ⋄

Lemma B.8 Let v be a finite R-stopped configuration of E. Define

∆(v) = {c ∈ δ(w) | w ∈ W, w ⊆ v}   (B.3)

Then, for any germ decomposition (vn)0≤n≤N of v we have:

∆(v) = ⋃_{n=0}^{N} δ(vn)   (B.4)

Proof: The definition in equation (B.3) implies that ⋃n δ(vn) ⊆ ∆(v). We need to show the inclusion in the other direction. Let c ∈ ∆(v). So c ∈ δ(u) for a finite stopped configuration u such that u ⊆ v. Applying Lemma B.5, we get that v ⊖ u is R-stopped in Eu. Now applying Lemma B.1 implies that (v ⊖ u) ∩ c is either empty or maximal in Θc.

a) (v ⊖ u) ∩ c = ∅. Lemma B.2 in Eu gives that c is an initial stopping prefix in (Eu)v⊖u = Ev. Therefore c ∈ δ(v) = δ(vn) and so c ∈ ⋃n δ(vn).

b) (v ⊖ u) ∩ c ∈ Θc. Let k be the greatest integer such that vk ∩ c = ∅. Such a k exists since v0 ∩ c = ∅ and vn ∩ c ≠ ∅. Let vk+1 ⊖ vk ∈ Θc′, where c′ is an initial stopping prefix of Evk. c ∩ c′ ≠ ∅ by construction. Since u and vk are compatible, applying Lemma B.7 gives c = c′. Hence c ∈ ⋃n δ(vn).

⋄


B.1.1 Proof of Lemma 5.14

The existence of a germ decomposition for the R-stopped configuration v follows from Lemma B.4. Let (cn)n be the associated sequence of branching cells. Since every configuration vi+1 ⊖ vi is a maximal configuration of ci+1, no event of ci+1 appears in Bvi+1. Thus all the branching cells in the sequence (cn)n are disjoint.

We now show that the set of branching cells C = (cn)n are disjoint. Initially, we assume that v is finite. Consider the set of branching cells ∆(v) defined by (B.3) of Lemma B.8. We have C ⊆ ∆(v). Also, from (B.4), any branching cell c ∈ ∆(v) satisfies c ∈ C if and only if c ∩ v ≠ ∅. So,

C = ∆(v) = ∆(v) \ δ(v)

As is evident from the above equation, the covering δ(v) does not depend on the decomposition of v and so is unique for a given v.

For the generic case, when v is any R-stopped configuration, we show that the following holds:

C = ⋃_{w∈W, w⊆v} ∆(w)   (B.5)

where C = (cn)n∈I is the set of branching cells associated with the decomposition (vn)n∈I of v.

C = {cn, n ∈ I}

= ⋃_{n∈I} {cj, 1 ≤ j ≤ n}

= ⋃_{n∈I} ∆(vn), from the above result, since vn is finite.

This shows the inclusion ⊆ part of (B.5). Now let c ∈ ∆(w) for some w ∈ W such that w ⊆ v. Since w is finite, there exists a finite index i such that w ⊆ vi. Since vi is finite, applying the above theorem implies that there exists an index k ≤ i such that ck = c. So c ∈ C, which shows the inclusion ⊇ of (B.5). ⋄

B.1.2 Proof of Theorem 5.19

1. Obvious.

2. The claim is obvious for e ≺ e′ or e′ ≺ e, so assume neither holds. Let H(e) be the random variable given by (H(e, ω))ω∈Ω, and define analogously H(κ) for any configuration κ. Let x be the configuration

x ≜ ⌊x⌋ \ {x}.

Then A ≜ H(e) − H(e) and A′ ≜ H(e′) − H(e′) are independent of one another and of H(e) and H(e′). In particular, A and A′ are independent of B ≜ H(e′) − H(e), and thus A is independent of A′ + B. Now, for any ω,

H(e, ω) = H(e′, ω) ⇔ A(ω) = A′(ω) + B(ω).

Now, if X and Y are two atomless independent real random variables, then X − Y is also atomless. Then P(X = Y) = PW(X − Y = 0) = 0. Setting X ≜ A and Y ≜ A′ + B, one can conclude.


3. Assume there exist ε > 0 and τ ∈ [0,∞) such that

P{ω : |Eτ(ω)| = ∞} > ε,   (B.6)

and let τε ≜ inf{τ ∈ [0,∞) | (B.6) holds}. If

P{ω : |Eτε(ω)| = ∞} > ε,   (B.7)

then, by construction of τε, there is a positive probability that an infinite number of firings must occur simultaneously at time τε. However, N is safe and finite, therefore only a finite number of transition firings can be simultaneously enabled; from this contradiction, we conclude that

u ≜ P{ω : |Eτε(ω)| = ∞} < ε.

Thus, for every ϵ > 0,

P{|Eτε+ϵ(ω)| = ∞ | |Eτε(ω)| < ∞} > (ε − u)/(1 − u) > 0.

Since N is finite, this implies that there exists some transition t such that

P{t fires infinitely often in [τε, τε + ϵ] | |Eτε(ω)| < ∞} > 0.

But since N is safe, no two occurrences of the same transition are enabled simultaneously. Hence, since the δt(nk, ω) are i.i.d., this implies the existence of a series n1 < n2 < . . . of indexes such that

P{ ∑_{k=1}^{∞} δt(nk, ω) < αt(2^{−k}) } > 0,   (B.8)

where αt(x) is the x-quantile of the distribution of δt. Note that by assumption 2.2, one has that P(δt(nk, ω) = 0) = 0, and therefore αt(x) > 0 for all x > 0. By construction,

∑_{k=1}^{∞} P{δt(nk, ω) < αt(2^{−k})} ≤ ∑_{k=1}^{∞} 2^{−k} = 1;   (B.9)

but then the Borel-Cantelli lemma¹ contradicts (B.8), and we are done.
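The step P(X = Y) = 0 used in statement 2 can be spelled out as follows. This is a sketch of the standard argument rather than the author's own derivation; the symbol P_Y for the law of Y is an assumption introduced here, and only the atomlessness of X is actually needed.

```latex
% Sketch: X, Y independent real random variables, X atomless.
% P_Y denotes the law of Y (notation assumed for this illustration).
\begin{align*}
\mathbf{P}(X = Y)
  &= \int_{\mathbb{R}} \mathbf{P}(X = y)\, \mathbf{P}_Y(dy)
     && \text{(independence of $X$ and $Y$, Fubini)} \\
  &= \int_{\mathbb{R}} 0 \; \mathbf{P}_Y(dy)
     && \text{($X$ has no atom, so $\mathbf{P}(X = y) = 0$ for every $y$)} \\
  &= 0 .
\end{align*}
```

With X = A and Y = A′ + B this gives that the event H(e, ω) = H(e′, ω) has probability zero.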

B.1.3 Proof of Lemma 5.21

Let us first show that R(ω) is a configuration. By definition, e ∈ R(ω) entails e ⊆ R(ω). Since E is well-ordered, this implies that R(ω) is downward closed. Suppose now that ր|R(ω) is not acyclic; then there must exist events e, e1, . . . , en ∈ R(ω) such that

e ր e1 ր . . . ր en ր e.

This implies by definition of R(ω) that

H(e, ω) ≤ H(e1, ω) ≤ . . . ≤ H(en, ω) ≤ H(e, ω),

which has zero probability. Therefore ր|R(ω) must be a.s. acyclic. Finally, suppose the set Ae = {e′ ∈ R(ω) | e′ ր e} is infinite. By construction of Ae, we must have H(e′, ω) ≤ H(e, ω) for all e′ ∈ Ae; from statement 3 of Theorem 5.19, this is a.s. impossible, hence R(ω) is a.s. a configuration of E. For maximality, suppose there exists e ∉ R(ω) such that R(ω) ∪ {e} is a configuration of E and R(ω) ≺E R(ω) ∪ {e}. This implies that

¹ See e.g. Lemma 8.1 in P. Brémaud. Markov Chains. Gibbs Fields, Monte Carlo Simulation, and Queues. Texts in Applied Mathematics 31, Springer 1999.


1. e ⊆ R(ω), and

2. there is no e ∈ R(ω) such that e ր e.

But this implies that occ(e, ω) is true, contradicting the assumption e ∉ R(ω).

Appendix C

Proofs of Chapter 7

C.1 Proof of Theorem 7.4

Proof: We first prove Statement 1. Let N ′ ∈ N be such that N ′ ≥ N . We have:

Eω(κ(N ′, ω),N ′) ≥ Eω(κ(N ′, ω),N ) ≥ Eω(κ(N , ω),N )

where the first inequality follows from the fact that κ(N′, ω) is a conflict-free partial order and N′ ≥ N, and the second inequality follows from (7.7) applied with κ = κ(N′, ω). This proves Statement 1.

We prove statement 2 by contradiction. Let (N , ω, κ†) be a triple violating Condition(7.7), in that

κ† cannot occur, but Eω(κ†,N ) < Eω(κ(N , ω),N ) nevertheless holds.

Now consider the OrchNet net N′ = (N, Φ, T′, Tinit) where the family T′ is the same as T except that in ω, ∀t ∉ κ†, τ′t(ω) > Eω(κ†, N). Clearly N′ ≥ N. But using construction (7.4), it is easy to verify that κ(N′, ω) = κ† and thus

Eω(κ(N ′, ω),N ′) = Eω(κ†,N ′) = Eω(κ†,N ) < Eω(κ(N , ω),N ),

which violates monotonicity. ⋄
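The necessity argument can be replayed on a toy net. The following is a minimal sketch under stated assumptions: the four-transition net (t1 and t2 competing for the initial token, t1 followed by t_star, t2 followed by t3), the latency values, and the function names are hypothetical and chosen only to make the construction concrete; the race policy is modelled simply as the smaller latency winning.

```python
def execution_time(lat):
    """End-to-end latency of the toy net under the race policy.

    Hypothetical net: t1 and t2 compete for the token in the initial place;
    t1 is followed by t_star, t2 is followed by t3.
    """
    if lat["t1"] < lat["t2"]:
        return lat["t1"] + lat["t_star"]
    return lat["t2"] + lat["t3"]

def configuration_times(lat):
    """Latency of each maximal configuration, whether or not it actually occurs."""
    return {
        ("t1", "t_star"): lat["t1"] + lat["t_star"],
        ("t2", "t3"): lat["t2"] + lat["t3"],
    }

base = {"t1": 1, "t2": 2, "t_star": 10, "t3": 1}

# kappa_dagger = (t2, t3) cannot occur in `base` (t1 wins the race),
# yet it is faster than the configuration that does occur.
dagger_time = configuration_times(base)[("t2", "t3")]

# As in the proof: raise the latency of every transition outside kappa_dagger
# above dagger_time (t_star already is), so that kappa_dagger wins.
slower = dict(base, t1=dagger_time + 1)

print(execution_time(base), execution_time(slower))
```

Here `slower` is pointwise no better than `base`, but its end-to-end latency is smaller, which is exactly the violation of monotonicity derived above.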


Proof: Let ϕW be the net morphism mapping NW onto W and let N ∈ N be any OrchNet. We prove that condition 1 of Theorem 7.4 holds for N by induction on the number of transitions in the maximal configuration κ(N, ω) that actually occurs. The base case is when it has only one transition. Clearly this transition has the least latency and any other maximal configuration has a greater execution time.

Induction Hypothesis. Condition 1 of Theorem 7.4 holds for any maximal occurring configuration with m − 1 transitions (m > 1). Formally, for a pre-OrchNet N = (N, Φ, T, Tinit): ∀N ∈ N, ∀ω ∈ Ω, ∀κ ∈ V(N),

Eω(κ, N) ≥ Eω(κ(N, ω), N)   (C.1)

holds if |{t ∈ κ(N, ω)}| ≤ m − 1.

Induction Argument. Consider the OrchNet N, where the actually occurring configuration κ(N, ω) has m transitions. Let κ′ be any other maximal configuration of N. If the transition t in κ(N, ω) with minimal date dt also occurs in κ′, then comparing execution times of κ(N, ω) and κ′ reduces to comparing Eω(κ(N, ω) \ {t}, N^t) and Eω(κ′ \ {t}, N^t). Since κ(N, ω) \ {t} is the actually occurring configuration in the future N^t of transition t, using our induction hypothesis, we have

Eω(κ(N, ω) \ {t}, N^t) ≤ Eω(κ′ \ {t}, N^t)

and so Eω(κ(N, ω), N) ≤ Eω(κ′, N).

If t ∉ κ′ for some κ′, then there must exist another transition t′ such that •t ∩ •t′ ≠ ∅. By the definition of clusters, ϕW(t) and ϕW(t′) must belong to the same cluster c. Hence, t• = t′• follows from condition (7.8) of Theorem 7.5. The futures N^t and N^{t′} thus have identical sets of transitions: they only differ in the initial marking of their places. If Tinit and T′init are the initial markings of these places, Tinit ≤ T′init (since dt ≤ dt′, t• has dates lesser than t′•). Hence

Eω(κ(N, ω), N) = Eω(κ(N, ω) \ {t}, N^t)   (C.2)

and

Eω(κ′, N) = Eω(κ′ \ {t′}, N^{t′}) ≥ Eω(κ′ \ {t′}, N^t)   (C.3)

The inequality holds since N^{t′} ≥ N^t. The induction hypothesis on (C.2) and (C.3) gives Eω(κ(N, ω), N) ≤ Eω(κ′, N). This proves the theorem. ⋄
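The sufficiency argument can be checked by brute force on the smallest net that satisfies the cluster condition. The sketch below is an illustration under assumptions, not part of the proof: the net (t1 and t2 in conflict, both feeding the same place, followed by t3), the integer latency grid, and the function names are hypothetical.

```python
from itertools import product

def execution_time(lat):
    """Race between t1 and t2; both put their token in the same place,
    so t3 follows whichever of them wins.  lat = (tau_t1, tau_t2, tau_t3)."""
    return min(lat[0], lat[1]) + lat[2]

def is_monotonic(values):
    """Exhaustive check over a finite grid: pointwise larger latencies
    never yield a smaller end-to-end latency."""
    grid = list(product(values, repeat=3))
    return all(
        execution_time(a) <= execution_time(b)
        for a in grid
        for b in grid
        if all(x <= y for x, y in zip(a, b))
    )

print(is_monotonic(range(1, 5)))
```

Because the two competing transitions share their post-set, the continuation is the same whichever wins, and the race always selects the faster maximal configuration.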


Proof: We will show that when condition (7.8) of Theorem 7.5 is not satisfied by W, the OrchNets in its induced pre-OrchNet NW can violate condition (7.7) of Theorem 7.4, the necessary condition for monotonicity.

Let cW be any cluster in W that violates condition (7.8) of Theorem 7.5. Consider the unfolding of W, NW, and the associated morphism ϕ : NW ↦ W as introduced before. Since W is sound, all transitions in cW are reachable from the initial place i and so there is a cluster c in NW such that ϕ(c) = cW. There are transitions t1, t2 ∈ c such that •t1 ∩ •t2 ≠ ∅, •ϕ(t1) ∩ •ϕ(t2) ≠ ∅ and ϕ(t1)• ≠ ϕ(t2)•. Call [t] = ⌊t⌋ \ {t} and define K = [t1] ∪ [t2]. We consider the following two cases:

K is a configuration. If so, consider the OrchNet N∗ ∈ NW obtained when transitions of NW (and so W) have latencies as that in W∗. So for the daemon value ω∗, the quantity Eω∗(K, N∗) is some finite value n∗. Now, configuration K can actually occur in an OrchNet N, such that N > N∗, where N is obtained as follows (τ and τ∗ denote the latencies of transitions in N and N∗ respectively): ∀t ∈ K, t′ ∈ NW s.t. •t ∩ •t′ ≠ ∅, set τt′(ω∗) = n∗ + 1 and keep the other latencies unchanged. In this case, for the daemon value ω∗, the latencies of all transitions of N (and so its overall execution time) are finite. Denote by N^K the future of N once configuration K has actually occurred. Both t1 and t2 are minimal and enabled in N^K.

Since ϕ(t1)• ≠ ϕ(t2)•, without loss of generality, we assume that there is a place p ∈ t1• such that ϕ(p) ∈ ϕ(t1)• but ϕ(p) ∉ ϕ(t2)•. Let t∗ be a transition in N^K such that t∗ ∈ p•. Such a transition must exist since p cannot be a maximal place: ϕ(p) cannot be a maximal place in W, which has a unique maximal place. Now consider the OrchNet N′ > N obtained as follows: τ′t1(ω∗) = τt1(ω∗), τ′t2(ω∗) = τt1(ω∗) + 1 and, for all other t ∈ c, τ′t(ω∗) = τ′t2(ω∗) + 1. Set τ′t∗(ω∗) = ∞; for all other transitions of N′, the delays are the same as that in N and thus are finite for ω∗.

t1 has the minimal delay among all transitions in c, and t∗ is in the future of t1. So the delay Eω∗(κ(N′, ω∗), N′) of the actually occurring configuration is infinite. However, any maximal configuration κ which does not include t1 (e.g., when t2 fires instead of t1) will have a finite delay. For such κ we thus have Eω∗(κ(N′, ω∗), N′) > Eω∗(κ, N′) and so N′ violates condition (7.7) of Theorem 7.4.

K is not a configuration. If so, there exist transitions t ∈ [t1] \ [t2], t′ ∈ [t2] \ [t1] such that •t ∩ •t′ ≠ ∅, •ϕ(t) ∩ •ϕ(t′) ≠ ∅ and ϕ(t)• ≠ ϕ(t′)•. The final condition holds since t2 and t1 are not in the causal future of t and t′ respectively. Thus t and t′ belong to the same cluster, which violates condition (7.8) of Theorem 7.5, and we can apply the same reasoning as in the beginning of the proof. Since [t] is finite for any transition t, we will eventually end up with K being a configuration. ⋄

C.4 Proof of Theorem 7.13


Proof: The proof is by contradiction. Assume that N is not monotonic with positive P-probability, i.e.:

there exists a pair (N, N′) of OrchNets such that N ≥ N′ and P{ω ∈ Ω | Eω(N) < Eω(N′)} > 0.   (C.4)

To prove the theorem it is enough to prove that (C.4) implies:

there exist No, N′o ∈ N such that No ≥ N′o, but E(No) ≥s E(N′o) does not hold.   (C.5)

To this end, set No = N and define N′o as follows, where Ωo denotes the set {ω ∈ Ω | Eω(N) < Eω(N′)}:

N′o(ω) = if ω ∈ Ωo then N′(ω) else N(ω)

Note that No ≥ N′o by construction. Also, N′o ≥ N′, whence N′o ∈ N since condition 2b of Theorem 7.4 is satisfied. On the other hand, we have Eω(No) < Eω(N′o) for ω ∈ Ωo, and Eω(No) = Eω(N′o) for ω ∉ Ωo. By (C.4), we have P(Ωo) > 0. Consequently, we get:

[∀ω ∈ Ω ⇒ Eω(No) ≤ Eω(N′o)] and [P{ω ∈ Ω | Eω(No) < Eω(N′o)} > 0]

which implies that E(No) ≥s E(N′o) does not hold. ⋄
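The last implication can be illustrated on a finite daemon space. This is a sketch under an explicit assumption: the stochastic order ≥s is read here as "X ≥s Y iff P(X ≤ x) ≤ P(Y ≤ x) for every x", which may differ in presentation from the definition used in Chapter 7; the sample values and function names are hypothetical, and Ω is taken finite with uniform probability.

```python
from fractions import Fraction

def cdf(samples, x):
    """P(X <= x) for X uniform over the finite list `samples`."""
    return Fraction(sum(1 for s in samples if s <= x), len(samples))

def dominates(xs, ys):
    """True iff X >=_s Y, read here as P(X <= x) <= P(Y <= x) for every x.

    Both distribution functions are step functions that only jump at sample
    values, so checking those points is enough.
    """
    points = sorted(set(xs) | set(ys))
    return all(cdf(xs, x) <= cdf(ys, x) for x in points)

# One entry per daemon value omega (hypothetical numbers).
e_no = [3, 5, 7, 7]        # E_omega(N_o)
e_no_prime = [3, 6, 7, 9]  # E_omega(N'_o): never smaller, strictly larger for two omegas

print(dominates(e_no, e_no_prime), dominates(e_no_prime, e_no))
```

Since E_omega(N_o) ≤ E_omega(N'_o) everywhere with strict inequality on a set of positive probability, E(N_o) ≥s E(N'_o) fails while the reverse ordering holds.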


Appendix D

Proofs of Chapter 8

D.1 Study of the contract composition procedure

To prove the convergence of this iterative procedure we will need the following assumptions:

Assumption 7

1. For any contract (F_A, F_Q), there exists a weaker contract (F′_A, F′_Q), meaning that F′_A ≤^A_s F_A and F′_Q ≥s F_Q, that is acceptable to both parties engaged in the negotiation.

2. The considered probabilistic OrchNet is strictly monotonic w.r.t. the assumptions, meaning that, for any transition t and any given F′_{A,t}, there exists a QoS assumption F_A for the orchestration that generates a QoS assumption F_{A,t} for transition t such that F_{A,t} ≥^A_s F′_{A,t}, i.e., F_{A,t} is better than F′_{A,t} for transition t.

Assumption 7.1 is “societal” because it is an assumption about the behavior of the “agents” that undertake contract negotiation. This assumption is not of a mathematical nature, unlike the second one, which is a strengthening of monotonicity.

Proof of convergence of contract composition procedure: We successively studythe first and the second approach for the negotiation phase.

For the first approach, observing convergence is simple. We observe that using assumption 7.1, we can find an assumption distribution F^2_{A,t} such that (F^2_{A,t}, F^1_{Q,t}) is an acceptable contract for t. Now using assumption 7.2, we can strengthen the assumptions F^1_A sufficiently, such that the assumption that t is subject to is F^2_{A,t}.

For the second approach, use assumption 7.1 to find F^1_{Q,t} ≥s F^0_{Q,t} such that (F^1_{A,t}, F^1_{Q,t}) is now an acceptable contract for transition t. Now, re-running the simulation phase with initial conditions F^0_A and F^1_{Q,t} yields F^1_{A,t} as an assumption for each transition t, and F^2_Q as an updated guaranteed QoS for the orchestration. Since contract (F^1_{A,t}, F^1_{Q,t}) is acceptable to t, the orchestration can offer to its client F^2_Q as a guaranteed QoS. Observe that we do not need the stronger assumption 7.2 for the second approach.


D.2 Proof of Theorem 8.7

Proof: Throughout the proof, we fix an arbitrary value ω for the daemon. We first prove that the condition is sufficient. Let N′ ∈ N be such that N′ ≥ N. Since operators ⊕ and ◁ are both monotonic, see definition 8.1, we have, by procedure 2 and formulas (8.8) and (8.9):

Eω(κ(N ′, ω),N ′) ≥ Eω(κ(N ′, ω),N )

By (8.12) applied with κ = κ(N ′, ω), we get that

Eω(κ(N ′, ω),N ) ≥ Eω(κ(N , ω),N )

holds. This proves the sufficiency of the condition. We prove the necessity part by contradiction. Let (N, ω, κ†) be a triple violating Condition (8.12), in that

κ† cannot occur, but Eω(κ†,N ) ≥ Eω(κ(N , ω),N ) does not hold.

Now consider the OrchNet net N′ = (N, Φ, Q′, Qinit) where the family Q′ is such that, ∀t ∈ κ†, ξ′t(ω) = ξt(ω) holds, and ∀t ∉ κ†, using conditions 1 and 3 for operator ⊕ in definition 8.1 together with the assumption that (D, ≤) is an upper lattice, we can inductively select ξ′t(ω) such that the following two inequalities hold:

⋁_{t∈κ†} qt ≤ (⋁_{p′∈•t} qp′) ⊕ ξ′t(ω)   (D.1)

ξt(ω) ≤ ξ′t(ω)   (D.2)

Condition (D.2) expresses that N′ ≥ N. By procedure 1 defining competition policy, (D.1) implies that configuration κ† can win all competitions arising in step 3 of competition policy, κ(N′, ω) = κ† holds, and thus

Eω(κ(N ′, ω),N ′) = Eω(κ†,N ′) = Eω(κ†,N )

However, Eω(κ†,N ) ≥ Eω(κ(N , ω),N ) does not hold, which violates monotonicity. ⋄


D.3 Proof of Theorem 8.8

We prove the if (sufficiency) and only if (necessity) parts separately.

D.3.1 Proof of Sufficiency

Proof: Let ϕW be the net morphism mapping NW onto W and let N be any OrchNet. We prove that Theorem 8.7 holds for N by induction on the number of transitions in the maximal configuration κ(N, ω) that actually occurs. The base case is when it has only one transition. Clearly this transition has minimal QoS increment and any other maximal configuration has a greater end-to-end QoS value.

Induction Hypothesis. Theorem 8.7 holds for any maximal occurring configuration with m − 1 transitions (m > 1). Formally, for a pre-OrchNet N = (N, Φ, T, Tinit): ∀N ∈ N, ∀ω ∈ Ω, ∀κ ∈ V(N),

Eω(κ, N) ≥ Eω(κ(N, ω), N)   (D.3)

must hold if |{t ∈ κ(N, ω)}| ≤ m − 1.

Induction Argument. Consider the OrchNet N, where the actually occurring configuration κ(N, ω) has m transitions, and let

∅ = κ0(ω) ⊂ κ1(ω) · · · ⊂ κM(ω)(ω) = κ(N, ω)

be the increasing chain of configurations leading to κ(N, ω) under competition policy. Let t be the unique transition such that t ∈ κ1(ω). Let κ′ be any other maximal configuration of N. Then two cases can occur.

• t ∈ κ′: In this case, comparing end-to-end QoS of κ(N, ω) and κ′ reduces to comparing Eω(κ(N, ω) \ {t}, N^t) and Eω(κ′ \ {t}, N^t). Since κ(N, ω) \ {t} is the actually occurring configuration in the future N^t of transition t, using our induction hypothesis,

Eω(κ′ \ {t}, N^t) ≥ Eω(κ(N, ω) \ {t}, N^t)

holds, which implies

Eω(κ′, N) ≥ Eω(κ(N, ω), N)

• t ∉ κ′: Then there must exist another transition t′ such that •t ∩ •t′ ≠ ∅. By the definition of clusters, ϕW(t) and ϕW(t′) must belong to the same cluster c. Hence, t• = t′•. The futures N^t and N^{t′} thus have identical sets of transitions: they only differ in the initial marking of their places. If Qinit and Q′init are the initial QoS values for the futures N^t and N^{t′}, then Qinit ≤ Q′init holds (since ξt ≤ ξt′, t• has QoS lesser than t′• by monotonicity of ⊕). On the other hand,

Eω(κ(N, ω), N) = Eω(κ(N, ω) \ {t}, N^t)   (D.4)

and

Eω(κ′, N) = Eω(κ′ \ {t′}, N^{t′})

Now, since N^{t′} and N^t possess identical underlying nets and N^{t′} ≥ N^t, we get

Eω(κ′ \ {t′}, N^{t′}) ≥ Eω(κ′ \ {t′}, N^t)   (D.5)

Finally, the induction hypothesis on (D.4) and (D.5) together imply Eω(κ′, N) ≥ Eω(κ(N, ω), N).

This proves that Theorem 8.7 holds, which finishes the proof of the sufficiency condition. ⋄

D.3.2 Proof of Necessity

Proof: We will show that when the condition of Theorem 8.8 is not satisfied by W, then NW violates the condition of Theorem 8.7, the necessary condition for monotonicity.

Let cW be any cluster in W that violates the condition of Theorem 8.8. Consider the unfolding of W, NW, and the associated morphism ϕ : NW ↦ W as introduced before. Since W is sound, all transitions in cW are reachable from the initial place i and so there is a cluster c in NW such that ϕ(c) = cW. There are transitions t1, t2 ∈ c such that •t1 ∩ •t2 ≠ ∅, •ϕ(t1) ∩ •ϕ(t2) ≠ ∅ and ϕ(t1)• ≠ ϕ(t2)•. Call [t] = ⌊t⌋ \ {t} and define κ = [t1] ∪ [t2]. We consider the following two cases:


• κ is a configuration. If so, consider the OrchNet N∗ obtained when transitions of NW (and so W) have QoS increments as that in W∗. So for the daemon value ω∗, the quantity Eω∗(κ, N∗) is some finite value q∗. Now, configuration κ can actually occur in an OrchNet N, such that N > N∗, where N is obtained as follows (ξ and ξ∗ denote the QoS increments of transitions in N and N∗ respectively): ∀t ∈ κ, t′ ∈ NW s.t. •t ∩ •t′ ≠ ∅, select ξt′(ω∗) such that ξt′(ω∗) > q∗ and keep the other QoS increments unchanged. In this case, for the daemon value ω∗, the QoS increments of all transitions of N (and so its overall execution time) are finite. Denote by N^κ the future of N once configuration κ has actually occurred. Both t1 and t2 are minimal and enabled in N^κ.

Since ϕ(t1)• ≠ ϕ(t2)•, without loss of generality, we assume that there is a place p ∈ t1• such that ϕ(p) ∈ ϕ(t1)• but ϕ(p) ∉ ϕ(t2)•. Let t∗ be a transition in N^κ such that t∗ ∈ p•. Such a transition must exist since p cannot be a maximal place: ϕ(p) cannot be a maximal place in W, which has a unique maximal place. Now, consider the OrchNet N′ > N obtained as follows: using repeatedly condition 3 for operator ⊕ in definition 8.1, ξ′t1(ω∗) = ξt1(ω∗), ξ′t2(ω∗) ≥ ξt1(ω∗), and, for all other t ∈ c, ξ′t(ω∗) ≥ ξ′t2(ω∗). For all remaining transitions of N′, with the exception of t∗, the QoS increments are the same as that in N and thus are finite for ω∗. Finally, select ξ′t∗(ω∗) such that

ξt1(ω∗) ⊕ ξ′t∗(ω∗) > Q∗   (D.6)

where Q∗ ∈ D will be chosen later; here we used Assumption 4 together with condition 3 for operator ⊕ in definition 8.1. Transition t1 has a minimal QoS increment among all transitions in c. It can therefore win the competition, thus giving rise to an actually occurring configuration κ(N′, ω∗). Select Q∗ equal to the maximal value of the end-to-end QoS of the set K of all maximal configurations κ that do not include t1 (e.g., when t2 fires instead of t1). By (D.6), since t∗ is in the future of t1, we thus have Eω∗(κ(N′, ω∗), N′) ≥ ξt1(ω∗) ⊕ ξ′t∗(ω∗) > Q∗ ≥ Eω∗(κ, N′) for any κ ∈ K and so N′ violates the condition of Theorem 8.7.

• κ is not a configuration. If so, there exist transitions t ∈ [t1] \ [t2], t′ ∈ [t2] \ [t1] such that •t ∩ •t′ ≠ ∅, •ϕ(t) ∩ •ϕ(t′) ≠ ∅ and ϕ(t)• ≠ ϕ(t′)•. The final condition holds since t2 and t1 are not in the causal future of t and t′ respectively. Thus t and t′ belong to the same cluster, which violates the condition of Theorem 8.8, and we can apply the same reasoning as in the beginning of the proof. Since [t] is finite for any transition t, we will eventually end up with κ being a configuration.

⋄


D.4 Proof of Theorem 8.15

Proof: The proof is by contradiction. Assume that (N, P) is not probabilistically monotonic. This implies that N is not monotonic with positive P-probability, i.e.:

there exists a pair (N, N′) of OrchNets such that N ≥ N′ and P{ω ∈ Ω | Eω(N) < Eω(N′)} > 0.   (D.7)

To prove the theorem it is enough to prove that (D.7) implies:

there exist No, N′o ∈ N such that No ≥ N′o, but E(No) ≥s E(N′o) does not hold.   (D.8)

To this end, set No = N and define N′o as follows, where Ωo denotes the set {ω ∈ Ω | Eω(N) < Eω(N′)}:

N′o(ω) = if ω ∈ Ωo then N′(ω) else N(ω)

Note that No ≥ N′o ≥ N′ by construction. On the other hand, we have Eω(No) < Eω(N′o) for ω ∈ Ωo, and Eω(No) = Eω(N′o) for ω ∉ Ωo. By (D.7), we have P(Ωo) > 0. Consequently, we get:

[∀ω ∈ Ω ⇒ Eω(No) ≤ Eω(N′o)] and [P{ω ∈ Ω | Eω(No) < Eω(N′o)} > 0]

which implies that E(No) ≥s E(N′o) does not hold. ⋄


Bibliography

[AAF+02] Assaf Arkin, Sid Askary, Scott Fordin, Wolfgang Jekeli, Kohsuke Kawaguchi,David Orchard, Stefano Pogliani, Karsten Riemer, Susan Struble, Pal T.Nagy, Ivana Trickovic, and Sinisa Zimek. Web Service Choreography Inter-face (WSCI) 1.0. Technical report, 2002.

[AB06] Samy Abbes and Albert Benveniste. True-concurrency probabilistic mod-els: Branching cells and distributed probabilities for event structures. Inf.Comput., 204(2):231–274, 2006.

[AB08] Samy Abbes and Albert Benveniste. True-concurrency probabilistic models:Markov nets and a law of large numbers. Theor. Comput. Sci., 390(2-3):129–170, 2008.

[ACD+] Alain Andrieux, Karl Czajkowski, Asit Dan, Kate Keahey, Heiko Ludwig,Toshiyuki Nakata, Jim Pruyne, John Rofrano, Steve Tuecke, and Ming Xu.Web Services Agreement Specification (WS-Agreement).

[AFFK04] Jesús Arias-Fisteus, Luis Sánchez Fernández, and Carlos Delgado Kloos.Formal verification of bpel4ws business collaborations. In EC-Web, pages76–85, 2004.

[AFFK05] Jesús Arias-Fisteus, Luis Sánchez Fernández, and Carlos Delgado Kloos.Applying model checking to BPEL4WS business collaborations. In SAC,pages 826–830, 2005.

[AP07] Danilo Ardagna and Barbara Pernici. Adaptive Service Composition inFlexible Processes. IEEE Trans. Software Eng., 33(6):369–384, 2007.

[AVMM04] Rohit Aggarwal, Kunal Verma, John A. Miller, and William Milnor. Con-straint driven web service composition in meteor-s. In IEEE SCC, pages23–30, 2004.

[BASE07] Antonia Bertolino, Guglielmo De Angelis, Antonino Sabetta, and Sebas-tian G. Elbaum. Scaling up sla monitoring in pervasive environments. InESSPE, pages 65–68, 2007.

[BCM01] Paolo Baldan, Andrea Corradini, and Ugo Montanari. Contextual Petrinets, Asymmetric Event Structures, and Processes. Inf. Comput., 171(1):1–49, 2001.

[BDK01] Eike Best, Raymond Devillers, and Maciej Koutny. Petri net algebra.Springer-Verlag New York, Inc., New York, NY, USA, 2001.

210 Bibliography

[BHR09] Anne Bouillard, Stefan Haar, and Sidney Rosario. Critical paths in thePartial Order Unfolding of a Stochastic Petri Net. In FORMATS, 2009.

[BMT06] Roberto Bruni, Hernán C. Melgratti, and Emilio Tuosto. Translating OrcFeatures into Petri Nets and the Join Calculus. In WS-FM, pages 123–137,2006.

[BN93] Michèle Basseville and Igor Nikiforov. Detection of Abrupt Changes - Theoryand Application. Prentice-Hall, Inc., April 1993.

[Bpe07] Web Services Business Process Execution Language Version 2.0.OASIS Standard, April 2007. Available at http://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.pdf.

[BPM] Business Process Modeling Language (BPML). www.bpmi.org.

[BRBH08] Anne Bouillard, Sidney Rosario, Albert Benveniste, and Stefan Haar.Monotony in Service Orchestrations. Research Report RR-6528, INRIA,2008.

[BRBH09] Anne Bouillard, Sidney Rosario, Albert Benveniste, and Stefan Haar. Mono-tonicity in Service Orchestrations. In Petri Nets, pages 263–282, 2009.

[BS09] Stefano Bistarelli and Francesco Santini. A Nonmonotonic Soft ConcurrentConstraint Language for SLA Negotiation. Electr. Notes Theor. Comput.Sci., 236:147–162, 2009.

[BSC01] Preeti Bhoj, Sharad Singhal, and Sailesh Chutani. SLA management infederated environments. Computer Networks, 35(1):5–24, 2001.

[CAH05] D.B. Claro, P. Albers, and J.K. Hao. Selecting Web Services for OptimalComposition. In International Conference on Web Services, Workshop onSemantic and Dynamic Web Processes, 2005.

[CM] William Cook and Jayadev Misra. The implementation outline of orc. http://orc.csres.utexas.edu/papers/OrcImpDraft.pdf.

[CMS+03] Senthilanand Chandrasekaran, John A. Miller, Gregory A. Silver, Ismail-cem Budak Arpinar, and Amit P. Sheth. Performance Analysis and Simu-lation of Composite Web Services. Electronic Markets, 13(2), 2003.

[CPEV05] Gerardo Canfora, Massimiliano Di Penta, Raffaele Esposito, andMaria Luisa Villani. QoS-Aware Replanning of Composite Web Services.In ICWS, pages 121–129, 2005.

[CS92] Javier Campos and Manuel Silva. Structural techniques and performancebounds of stochastic petri net models. In Advances in Petri Nets: TheDEMON Project, pages 352–391, 1992.

[CSM+04] Jorge Cardoso, Amit P. Sheth, John A. Miller, Jonathan Arnold, and KrysKochut. Quality of service for workflows and web service processes. J. WebSem., 1(3):281–308, 2004.

Bibliography 211

[DB07] Andrea D’Ambrogio and Paolo Bocciarelli. A model-driven approach todescribe and predict the performance of composite services. In WOSP,pages 78–89, 2007.

[DH97] Anthony Christopher Davison and D. V. Hinkley. Bootstrap Methods andtheir application. Cambridge University Press, 1997.

[DK04] Raymond Devillers and Hanna Klaudel. Solving Petri Net Recursionsthrough Finite Representation. In Proc of IASTED, 2004.

[DLZ06] Jin Song Dong, Yang Liu, Jun Sun 0001, and Xian Zhang. Verification ofcomputation orchestration via timed automata. In ICFEM, pages 226–245,2006.

[Eng91] Joost Engelfriet. Branching Processes of Petri Nets. Acta Inf., 28(6):575–591, 1991.

[ERV02] Javier Esparza, Stefan Römer, and Walter Vogler. An improvementof mcmillan’s unfolding algorithm. Formal Methods in System Design,20(3):285–310, 2002.

[FBS04] Xiang Fu, Tevfik Bultan, and Jianwen Su. Analysis of interacting bpel webservices. In WWW, pages 621–630, 2004.

[Fer04] Andrea Ferrara. Web services: a process algebra approach. In ICSOC, pages242–251, 2004.

[Fie00] Roy Fielding. Architectural Styles and the Design of Network-based Soft-ware Architectures. Phd. Dissertation, 2000. http://www.ics.uci.edu/

~fielding/pubs/dissertation/top.htm.

[FLBT+02] V. Firoiu, J.-Y. Le Boudec, D. Towsley, Z.-L. Zhang, and Jean-YvesLe Boudec. Theories and Models for Internet Quality of Service. Proceedingsof the IEEE, 90(9):1565–1591, 2002.

[GM99] Stéphane Gaubert and Jean Mairesse. Modeling and analysis of timed petrinets using heaps of pieces. IEEE Trans. Aut. Cont, 44:683–697, 1999.

[HHK02] Holger Hermanns, Ulrich Herzog, and Joost-Pieter Katoen. Process algebrafor performance evaluation. Theor. Comput. Sci., 274(1-2):43–87, 2002.

[Hoa78] C. A. R. Hoare. Communicating Sequential Processes. Commun. ACM,21(8):666–677, 1978.

[HSS05] Sebastian Hinz, Karsten Schmidt, and Christian Stahl. Transforming BPELto Petri Nets. In Business Process Management, pages 220–235, 2005.

[HWTS07] San-Yih Hwang, Haojun Wang, Jian Tang, and Jaideep Srivastava. A prob-abilistic approach to modeling and estimating the QoS of web-services-basedworkflows. Inf. Sci., 177(23):5484–5503, 2007.

[JSO] Javascript object notation (json). http://www.json.org/.

[KBL01] J. P. Katoen, C. Baier, and D. Latella. Metric semantics for true concurrentreal time. Theoretical Computer Science, pages 501–542, 2001.

212 Bibliography

[KBRL04] N. Kavantzas, D. Burdett, G. Ritzinger, and Y. Lafon. Web Services Chore-ography Description Language Version 1.0. Technical report, W3C WorkingDraft, October 2004. Available at: http://www.w3.org/TR/ws-cdl-10.

[KCM06] David Kitchin, William R. Cook, and Jayadev Misra. A Language for TaskOrchestration and its Semantic Properties. In Proc. of the Intl. Conf. onConcurrency Theory (CONCUR), 2006.

[KKO77] T. Kamae, U. Krengel, and G.L. O’Brien. Stochastic inequalities on partiallyordered spaces. The Annals of Probability, 5(6):899–912, 1977.

[KL02] Alexander Keller and Heiko Ludwig. The WSLA Framework: Specifying andMonitoring Service Level Agreements for Web Services. Technical report,IBM Research Division, 2002.

[KL03] Alexander Keller and Heiko Ludwig. The WSLA Framework: Specifyingand Monitoring Service Level Agreements for Web Services. J. NetworkSyst. Manage., 11(1), 2003.

[Kle75] L. Kleinrock. Queueing Systems, Volume 1: Theory. Wiley, 1975.

[KQCM09] David Kitchin, Adrian Quark, William R. Cook, and Jayadev Misra. TheOrc Programming Language. In FMOODS/FORTE, pages 1–25, 2009.

[KQM09] David Kitchin, Adrian Quark, and Jayadev Misra. Quicksort: CombiningConcurrency, Recursion, and Mutable Data Structures. Submitted to aFestschrift in honor of Tony Hoare on his 75th birthday, 2009.

[KvB04] Mariya Koshkina and Franck van Breugel. Modelling and verifying webservice orchestration by means of the concurrency workbench. volume 29,pages 1–10, 2004.

[LAP06] Alexander Lazovik, Marco Aiello, and Mike P. Papazoglou. Planning andmonitoring the execution of web service requests. Int. J. on Digital Libraries,6(3):235–246, 2006.

[LMSW06] Niels Lohmann, Peter Massuthe, Christian Stahl, and Daniela Weinberg.Analyzing Interacting BPEL Processes. In Business Process Management,pages 17–32, 2006.

[LR05] E. L. Lehmann and Joesph P. Romano. Testing Statistical Hypothesis.Springer-Verlag New York, LLC, 2005.

[LSW01] Zhen Liu, Mark S. Squillante, and Joel L. Wolf. On maximizing service-level-agreement profits. In ACM Conference on Electronic Commerce, pages213–223, 2001.

[LZ05] Cosimo Laneve and Gianluigi Zavattaro. Foundations of Web Transactions.In FoSSaCS, pages 282–298, 2005.

[MA01] D. A. Menascé and V. A. F. Almeida. Capacity planning for web services :metrics, models, and methods. Prentice Hall, 2001.

[Mar04] Axel Martens. Analysis and Re-Engineering of Web Services. In ICEIS (3),pages 419–426, 2004.

Bibliography 213

[MB02] Jean Mairesse and T. Bousch. Asymptotic height optimization for topicalIFS, Tetris heaps, and the finiteness conjecture. J. Amer. Math. Soc, 15:77–111, 2002.

[MBB+89] Marco Ajmone Marsan, Gianfranco Balbo, Andrea Bobbio, Giovanni Chiola,Gianni Conte, and Aldo Cumani. The effect of execution policies on thesemantics and analysis of stochastic petri nets. IEEE Trans. Software Eng.,15(7):832–846, 1989.

[MBC+98] Marco Ajmone Marsan, Gianfranco Balbo, Gianni Conte, Susanna Do-natelli, and Giuliana Franceschinis. Modelling with Generalized StochasticPetri Nets. SIGMETRICS Performance Evaluation Review, 26(2):2, 1998.

[MC07] Jayadev Misra and William R. Cook. Computation Orchestration. Softwareand System Modeling, 6(1):83–110, 2007.

[McM95] Kenneth L. McMillan. A Technique of State Space Search Based on Unfold-ing. Formal Methods in System Design, 6(1):45–65, 1995.

[Men02] Daniel A. Menascé. QoS Issues in Web Services. IEEE Internet Computing,6(6):72–75, 2002.

[MG99] Jean Mairesse and Stéphane Gaubert. Modeling and Analysis of Timed Petrinets using Heaps of Pieces. IEEE Trans. Autom. Control, 44(4):683–697,1999.

[Mil80] Robin Milner. A Calculus of Communicating Systems, volume 92 of LectureNotes in Computer Science. Springer, 1980.

[Mil99] Robin Milner. Communicating and mobile systems: the π-calculus. Cam-bridge University Press, New York, NY, USA, 1999.

[Mob] The Mobius Project. http://www.mobius.illinois.edu/.

[Mur89] Tadao Murata. Petri Nets: Properties, Analysis and Applications. Proceed-ings of the IEEE, 77(4):541–580, April 1989.

[NKP06] Xuan Thang Nguyen, Ryszard Kowalczyk, and Manh Tan Phan. Modellingand solving qos composition problem using fuzzy discsp. In ICWS, pages55–62, 2006.

[NPW81] Mogens Nielsen, Gordon D. Plotkin, and Glynn Winskel. Petri Nets, EventStructures and Domains, Part I. Theor. Comput. Sci., 13:85–108, 1981.

[OVvdA+05] Chun Ouyang, Eric Verbeek, Wil M. P. van der Aalst, Stephan Breutel,Marlon Dumas, and Arthur H. M. ter Hofstede. WofBPEL: A Tool forAutomated Analysis of BPEL Processes. In ICSOC, pages 484–489, 2005.

[OVvdA+07] Chun Ouyang, Eric Verbeek, Wil M. P. van der Aalst, Stephan Breutel,Marlon Dumas, and Arthur H. M. ter Hofstede. Formal semantics andanalysis of control flow in WS-BPEL. Sci. Comput. Program., 67(2-3):162–198, 2007.

214 Bibliography

[Pou07] Hélia Pouyllau. Algorithmes distribués pour la négociation de contrats deQualité de Service dans les réseaux multi-domaines. Phd. Dissertation,IRISA, December 2007.

[PW05] Frank Puhlmann and Mathias Weske. Using the pi-Calculus for FormalizingWorkflow Patterns. In Business Process Management, pages 153–168, 2005.

[QKCM] Adrian Quark, David Kitchin, William R. Cook, and Jayadev Misra. TheOrc Language Project. http://orc.csres.utexas.edu.

[RBHJ06a] Sidney Rosario, Albert Benveniste, Stefan Haar, and Claude Jard. Founda-tions for Web Services Orchestrations: Functional and QoS Aspects, Jointly.In ISoLA, pages 309–316, 2006.

[RBHJ06b] Sidney Rosario, Albert Benveniste, Stefan Haar, and Claude Jard. Net sys-tems semantics of Web Services Orchestrations modeled in Orc. TechnicalReport 1780, January 2006.

[RBHJ07] Sidney Rosario, Albert Benveniste, Stefan Haar, and Claude Jard. Proba-bilistic QoS and soft contracts for transaction based Web services. In ICWS,pages 126–133, 2007.

[RBHJ08] Sidney Rosario, Albert Benveniste, Stefan Haar, and Claude Jard. Proba-bilistic QoS and Soft Contracts for Transaction based Web Services. Trans-actions on Service Computing, 1(4):187–200, 2008.

[RBJ09a] Sidney Rosario, Albert Benveniste, and Claude Jard. A Theory of QoS forWeb service Orchestrations. International Journal of Web Services Research(JWSR), 2009. Accepted for Publication.

[RBJ09b] Sidney Rosario, Albert Benveniste, and Claude Jard. Flexible ProbabilisticQoS Management of Transaction Based Web Services Orchestrations. InICWS, pages 107–114, 2009.

[Rei85] Wolfgang Reisig. Petri Nets: An Introduction, volume 4 of Monographs inTheoretical Computer Science. An EATCS Series. Springer, 1985.

[Rei05] Wolfgang Reisig. Modeling- and Analysis Techniques for Web Services andBusiness Processes. In FMOODS, pages 243–258, 2005.

[RKB+07a] Sidney Rosario, David Kitchin, Albert Benveniste, William Cook, StefanHaar, and Claude Jard. Event Structure Semantics of Orc. Research ReportRR-6221, INRIA, 2007.

[RKB+07b] Sidney Rosario, David Kitchin, Albert Benveniste, William R. Cook, StefanHaar, and Claude Jard. Event Structure Semantics of Orc. In WS-FM,pages 154–168, 2007.

[RtHvdAM06] N. Russell, A.H.M. ter Hofstede, W.M.P. van der Aalst, and N. Mulyar.Workflow Control-Flow Patterns: A Revised View. Technical report, BPMCenter, 2006.

[SBS04] Gwen Salaün, Lucas Bordeaux, and Marco Schaerf. Describing and Reason-ing on Web Services using Process Algebra. In ICWS, pages 43–, 2004.

Bibliography 215

[SDM02] Akhil Sahai, Anna Durante, and Vijay Machiraju. Towards AutomatedSLA Management for Web Services. Technical Report HPL2001-310 (R.1),Hewlett Packard Laboratories, 2002.

[SL05] Hyung Gi Song and Kangsun Lee. sPAC (Web Services Performance Anal-ysis Center): Performance Analysis and Estimation Tool of Web Services.In Business Process Management, pages 109–119, 2005.

[SOA] The W3C Soap Specification. http://www.w3.org/TR/soap/.

[SSG97] Peter J. Smith, Mansoor Shafi, and Hongsheng Gao. Quick Simulation: AReview of Importance Sampling Techniques in Communications Systems.IEEE Journal on Selected Areas in Communications, 15(4):597–613, 1997.

[SWA] The SWAN project. http://swan.elibel.tm.fr.

[TGRS04] Min Tian, Andreas Gramm, Hartmut Ritter, and Jochen H. Schiller. Effi-cient Selection and Monitoring of QoS-Aware Web Services with the WS-QoS Framework. In Web Intelligence, pages 152–158, 2004.

[vBK05] Franck van Breugel and Mariya Koshkina. Dead-Path-Elimination inBPEL4WS. In ACSD, pages 192–201, 2005.

[vBK06] Franck van Breugel and Mariya Koshkina. Models and verificationof bpel. Technical report, York Univeristy, 2006. Available athttp://www.cse.yorku.ca/franck/research/drafts/tutorial.pdf.

[vdA97] Wil M. P. van der Aalst. Verification of Workflow Nets. In ICATPN, pages407–426, 1997.

[vdAtH05] Wil M. P. van der Aalst and Arthur H. M. ter Hofstede. Yawl: yet anotherworkflow language. Inf. Syst., 30(4):245–275, 2005.

[vdAtHKB03] Wil M. P. van der Aalst, Arthur H. M. ter Hofstede, Bartek Kiepuszewski,and Alistair P. Barros. Workflow Patterns. Distributed and ParallelDatabases, 14(1):5–51, 2003.

[vdAvDH+03] Wil M. P. van der Aalst, Boudewijn F. van Dongen, Joachim Herbst, LauraMaruster, Guido Schimm, and A. J. M. M. Weijters. Workflow mining: Asurvey of issues and approaches. Data Knowl. Eng., 47(2):237–267, 2003.

[vdAvH02] Wil M. P. van der Aalst and Kees M. van Hee. Workflow Management:Models, Methods, and Systems. MIT Press, 2002.

[Vir04] Mirko Viroli. Towards a Formal Foundation to Orchestration Languages.Electr. Notes Theor. Comput. Sci., 105:51–71, 2004.

[W3c03] QoS for Web Services: Requirements and Possible Approaches. W3CWorking Group Note, November 2003. http://www.w3c.or.kr/kr-office/TR/2003/ws-qos.

[Win86] Glynn Winskel. Event Structures. In Advances in Petri Nets, pages 325–392,1986.

216 Bibliography

[WKCM08] Ian Wehrman, David Kitchin, William R. Cook, and Jayadev Misra. Atimed semantics of orc. Theor. Comput. Sci., 402(2-3):234–248, 2008.

[XMe] XMethods. http://www.xmethods.net.

[XPD] The XPDL Standard. WorkFlow Management Coalition. http://www.

wfmc.org/xpdl.html.

[ZBN+04] Liangzhao Zeng, Boualem Benatallah, Anne H. H. Ngu, Marlon Dumas,Jayant Kalagnanam, and Henry Chang. QoS-Aware Middleware for WebServices Composition. IEEE Trans. Software Eng., 30(5):311–327, 2004.

[ZCL04] Chen Zhou, Liang-Tien Chia, and Bu-Sung Lee. Qos-aware and federatedenhancement for uddi. Int. J. Web Service Res., 1(2):58–85, 2004.

[ZLC07] Liangzhao Zeng, Hui Lei, and Henry Chang. Monitoring the qos for webservices. In ICSOC, pages 132–144, 2007.

List of Figures

1.1 The Caronline orchestration.
1.2 Caronline orchestration, without timeouts and choices.
1.3 Caronline orchestration without timeouts.

2.1 The CarOnLine orchestration.
2.2 A Petri net.
2.3 The net after firing t0.
2.4 An occurrence net and one of its configurations (shaded).
2.5 A Petri net and one of its branching processes.
2.6 The unfolding of the Petri net of figure 2.5.
2.7 The Syntax and Operational Semantics of Orc.
2.8 Rules for halt propagation in Orc expressions.
2.9 CarOnLine orchestration without timeouts and data-dependent choices.
2.10 CarOnLine orchestration without timeouts.
2.11 A non-monotonic net.
2.12 The WSLA monitoring architecture.

3.1 Generic Petri net box.
3.2 Petri net translation of a site call S(x1, x2, ..., xn).
3.3 Site Call example: let(x).
3.4 Petri net translation of Constant 0 and 1.
3.5 Translation for f >(x1, x2, ..., xn)> g.
3.6 Translating f | g.
3.7 Translating f where (x1, x2, ..., xn) :∈ g.
3.8 Example showing expression call/return.
3.9 Net for the main expression.
3.10 CarOnLine and its firing rules.
3.11 Broadcast and its firing rules.
3.12 CarPrice and its firing rules.

4.1 The Syntax and Operational Semantics of Orc.
4.2 Heap Construction Example.

5.1 A Pre-Asymmetric Event Structure.
5.2 Future of a configuration 1.
5.3 Future of a configuration 2.
5.4 Minimal Conflicts in AES 1.
5.5 Minimal Conflicts in AES 2.
5.6 Stopped configurations are not closed under concatenation.


6.1 Response times for StockQuote.
6.2 A simplified view of the CarOnLine orchestration.
6.3 Overall architecture of the TOrQuE tool.
6.4 Deriving response time for a fork-join pattern.
6.5 A labelled event structure of CarOnLine.
6.6 T-location fit on measured delays.
6.7 CDF for the measured delays of the six web services.
6.8 Empirical distribution of CarOnline’s latency.
6.9 Monitoring of GarageA.

7.1 A non-monotonic orchestration.
7.2 The Travel Planner orchestration.
7.3 The Modified Travel Planner orchestration.
7.4 An OrchNet showing the dates of its tokens.

8.1 Schematic representation of the CarOnLine example.
8.2 A simple example for QoS computation.
8.3 Evaluation steps of the example of figure 8.2.
8.4 Cumulative distributions of sites in CarOnLine.

9.1 Architecture of the Torque tool.
9.2 DAGs for the different Orc expressions.
9.3 Building the partial order.
9.4 A snapshot of a partially ordered execution.
9.5 QoS Computation of an event.
