Distributed ArchitecturesSo�ware Architecture VO (706.706)
Roman Kern
Institute for Interactive Systems and Data Science, TU Graz
2020-01-15
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 1 / 58
IntroductionWhat is a distributed architecture?
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 2 / 58
Goals
Main goal: ScalabilityIn the optimal case a system scales linearly with the objective
E.g. number of transactions, size of the data, users
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 3 / 58
Solutions
Main solutions1 More e�icient algorithms2 Use faster hardware (e.g. more powerful machines)
I Scale vertically3 Add more machines
I Scale horizontally (scale out)I Parallel computing, distributed computing
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 4 / 58
Distributed Architectures
Parallel computing vs. distributed computingIn parallel computing all component share a common memory, typically threads within asingle programIn distributed computing each component has it own memory
I Typically in distributed computing the individual components are connected over a networkI Dedicated programming languages (or extensions) for parallel computing
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 5 / 58
Distributed Architectures
Distributed architecturesMost complex solution
I Due to the added parallelism of data and processingI And therefore an increased risk of errors
Overall latency will be the the one of the slowest machineI I.e. latency cannot be decreased via distributed solutions
Therefore the architecture needs to be sound
… focus on abstraction and composition
Note: In practice people are o�en biased towards technologies they know (e.g. SQL)
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 6 / 58
Distributed Architectures
http://nighthacks.com/roller/jag/resource/Fallacies.html
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 7 / 58
Distributed Architectures
Di�erent levels of complexityLowest complexity for operations, which can easily be distributed
I If they are independent and short enough be to executed independent from each otherI And if the data can be partitioned into independent parts
Higher degree of complexity for operations, which compute a single result on multiplenodes
I Synchronisation of data access also raises the complexity
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 8 / 58
Distributed Architectures
Additional aspects of complexityComplexity = intrinsic complexity + accidental complexity
I Intrinsic complexity of the problem itself (i.e. how hard is the problem)I Accidental complexity arises from the implementationI Low accidental complexity good for maintenanceI If high accidental complexity, you can never be sure it has been correctly implementedI The risk of errors rises with the complexity (i.e. it scales)
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 9 / 58
Distributed Architectures - Practical Advise
General advice to deal with complexityDesign for failure
I E.g. hardware will fail, human errors (bugs)I Limit their consequences
Push complexity into a single placeI E�ect of bugs will be minimisedI Also called “complexity isolation”
Avoid tricky operationsAvoid to store aggregates
I Prefer to deal with raw dataI Treat the data as immutable
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 10 / 58
Distributed Architectures - Theory
Locality of ReferenceGranularity
I CPU caches to storage position within the data centre
In a distributed context: bring the code to the dataBest practice: position the data in a way it is accessed
I Partition the data according to some criteriaI e.g. Rack awareness of the computing infrastructure
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 11 / 58
Architecture for Simple Applications
Share Nothing ArchitectureNo centralised data storage
Can scale almost infinitely
Used since the beginning of the 80ies, popularised by Google
Only a few systems allow for such an architecture
ImplicationsIf a system requires some sort of shared resources or orchestrated processing, the complexityrises.
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 12 / 58
Distributed Architectures BasicsBuilding blocks of distributed systems
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 13 / 58
Distributed Architectures Basics
Number of issues to address1 Serialisation2 Group membership3 Leader election4 Distributed locks5 Barriers6 Shared resources7 Configuration
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 14 / 58
Distributed Architectures Basics - Serialisation
Group serialisationTransform an object into a byte array (and back)
I Needed to transfer objects between nodes in a distributed environmentI Used to store objects (e.g. in databases)
Should work across programming languages
Therefore serialisation frameworks provide a Schema Definition Language
Examples: Thri�, Protocol Bu�ers, Avro
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 15 / 58
Distributed Architectures Basics - Group Membership
Group membershipWhen a single node comes online…
How does it know where to connect to?
How do the other members know of an added node?
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 16 / 58
Distributed Architectures Basics - Group Membership
⇒ Peer-to-peer architectural styleEach node is client, as well as server
Parts of the bootstrapping mechanism
Dynamic vs. static
Fully dynamic via broadcast/multicast within local area networks (UDP)
Centralised P2P - e.g. central login components/servers
Static lists of group members (needs to be configurable)
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 17 / 58
Distributed Architectures Basics - Leader Election
Leader electionNot all nodes are equal, e.g. centralised components in P2P networks
Single node acts asmaster, others are workersSome nodes have additional responsibilities (supernodes)
Having centralised components makes some functionality easier to implement
E.g. assign work-load
Disadvantage: might lead to a single point of failure
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 18 / 58
Distributed Architectures Basics - Leader Election
⇒ Client-server architectural styleOnce the leader has been elected, it takes over the role of the server
All other group members then act as clients
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 19 / 58
Distributed Architectures Basics - Leader Election
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 20 / 58
Distributed Architectures Basics - Leader Election
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 21 / 58
Distributed Architectures Basics - Leader Election
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 22 / 58
Distributed Architectures Basics - Locks
Distributed locksRestrict access to shared resources to only a single node at a time
E.g. allow only a single node to write to a file
May yield many non-trivial problems, for example deadlocks or race conditions
Distributed locks without central component are very complex to realise
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 23 / 58
Distributed Architectures Basics - Locks
Distributed locks example⇒ Blackboard architectural style
The shared repository is responsible to orchestrate the access to a locks
Notifies waiting nodes once the lock has been li�ed
This functionality is o�en coupled with the elected leader
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 24 / 58
Distributed Architectures Basics - Barriers
BarriersSpecific type of distributed lock
Sychronise multiple nodes
E.g. multiple nodes should wait until a certain state has been reached
Used when a part of the processing can be done in parallel and some parts cannot bedistributed
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 25 / 58
Distributed Architectures Basics - Shared Resources
Shared ResourcesIf all nodes need to be able to access a common data-structure
Read-only vs. read-write
If read-write, the complexity rises due to synchronisation issues
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 26 / 58
Distributed Architectures Basics - Zookeeper
Example: Apache ZookeeperZookeeper is a framework/library
Used by Yahoo!, LinkedIn, Facebook
Initially developed by Yahoo!, Now managed by ApacheFeatures
I Coordination kernelI File-system like APII Synchronisation, Watches, LocksI ConfigurationI Shared data
Alternative approaches: Google Chubby, Microso� Centrifuge
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 27 / 58
Distributed Architectures Basics - DFS
Distributed File SystemsVirtual file system distributed over multiple machines
I Based on a local file system
Same semantics like traditional file systemsI Folders & files
Files are internally split into smaller blocks (e.g. 64MB)
Blocks are redundantly stored on multiple machines
Logic to record, which block is stored on which machine
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 28 / 58
Distributed Architectures Basics - DFS
Examples of Distributed File SystemsGoogle File System (GFS)
I Based on a local file system
Hadoop Distributed File System (HDFS)I Portable, open-source solutionI Consists of (a single) name node and (many) data nodes
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 29 / 58
Distributed Architectures Basics - Sharding
ShardingSplit the data horizontally
Each node in a network may manage a separate chunk of the data
For example in web search engines
Each node is responsible for a number of web-pages
Returns search results from the local collection
All results from all shards are then combined into a single result
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 30 / 58
Distributed Architectures Basics - Sharding
Sharding example
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 31 / 58
Distributed Architectures Basics - Sharding
Sharding - PropertiesNeed redundancy, in case a node goes down
Level of redundancy depends on the data
E.g. if a node with low-tra�ic web-pages goes down, it might not even have an impact onthe quality of the search results (at least on the first page)
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 32 / 58
Distributed Architectures Basics - �ality A�ribute
Anarchic ScalabilityDesign for a large distributed system
Parts of the system are developed independently from each other
Therefore the system need to be designed for malfunctioning or even maliciouslycomponentsThe web is an example where anarchic scalability is one of the most important aspects
I Thus clients are not expected to know all serversI … and servers are not supposed to know all clientsI Another consequence is the link integrity → only one-directional
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 33 / 58
Asynchronous ArchitecturesThe default choice for scalable solutions
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 34 / 58
Asynchronous Architectures - Motivation
Synchronous architecturesEach call terminates a�er the request has been completely processed
E.g. traditional data-centric architectures (databases)
Pro: Easy to use and predictive behaviour
Con: Does not deal well with load (need to plan with the worst case)
Asynchronous architecturesThe call returns before the request has been processed
The processing happens in the background
Con: Non predictive behaviour
Pro: Load can be distributed over time, thus be�er scalability
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 35 / 58
Asynchronous Architectures - Worker
Asynchronous worker architecture
The client issues a call without waiting for the result
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 36 / 58
Asynchronous Architectures - Worker
Asynchronous worker architectureThe client does not wait for the end of the processing
I.e. does not track the result
If the worker fails, all currently processed request will fail
�eue MotivationIntroduce a new component to track the processing of the requests, e.g. a queueing system
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 37 / 58
Asynchronous Architectures - �eue & Worker
Asynchronous queue & worker architecture
The client puts the request into the queue, the workers poll the queue for requests
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 38 / 58
Asynchronous Architectures - �eue & Worker
�eue and worker architectureThe client is decoupled from the worked via a queue component
The queue component is (usually) responsible to track the status of the requests
The size of the queue depends on the current load
If a worker fails, the request will be put back into the queue
Default architecture choice for many distributed systems
Note: O�en, the queue itself might be distributed and might store the requests (e.g. in adatabase)
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 39 / 58
Asynchronous Architectures - �eue & Worker
�eue and worker architecture aspectsThe number of workers/applications may vary
I Single consumer vs. multi consumer queuesI Multiple independent workers that execute the same code vs.I … each worker has a di�erent task
Typically queues are FIFO (first in, first out)I Some queue support items with higher priority
In some configurations the application is responsible to track the statusA typical application may consist of multiple layers of queues and workers
I I.e. the output of a worker is fed as input to another queue
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 40 / 58
Asynchronous Architectures - �eue & Worker
�eue & worker propertiesThe architecture is straightforward, but not simple
Race conditions may occurPartitioning might be hard to implement
I Bad for fault tolerance
Tedious to buildI Much code necessary for serialisation/routing logic/monitoring of queues/etc.
Complex deployment for all workers and queues
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 41 / 58
Asynchronous Architectures - Stream processing
Stream processingThe queue & worker architecture can be used for stream processing
I Not suitable for real-time applications, due to delay from the queue
Continuous stream of incoming requestsI I.e. the queue is perpetually filledI O�en these reflect events in such se�ings
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 42 / 58
Asynchronous Architectures - Stream Processing
Two basic stream processing typesOne-at-a-time
I Each event is processed individually
Micro-batchI Multiple events are combined into a single batch
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 43 / 58
Asynchronous Architectures - Stream Processing
Basic execution semanticsAt-least-once
I Each event is guaranteed to be executedI But might be processed more o�en (thus the same result might be reported multiple times)I Might introduce inaccuracies
At-most-onceI Each event is optionally executed, but no more than once
Exactly-onceI Each event is guaranteed to be executed only once
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 44 / 58
Asynchronous Architectures - Stream Processing
Trade-o� between the stream processing types
One-at-a-time Micro-Batch
Lower Latency XHigher Throughput XAt-least-once semantic X XExactly-once semantic (sometimes) XSimpler programming model X
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 45 / 58
Asynchronous Architectures - Stream Processing
ImplicationsExactly-once can be achieved via strictly ordered processing
I Fully process an event before continuing
More e�icient for micro-batchesI Multiple batches in parallelI Need to store the batch-id of the last successfully processed batch
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 46 / 58
Asynchronous Architectures - Stream Processing
Strength Examples
Batch High throughput Hadoop, SparkOne-at-a-time Low latency StormMicro-batch Tradeo� Trident, Spark
Table: Comparison of the main distributed architectures
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 47 / 58
Lambda ArchitectureBig Data Architecture
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 48 / 58
Lambda Architecture - Motivation
Target scenarioLarge amount of data
I Too big for a single machine→ distributed system
Data is continuously updatedI Mostly just additions, i.e. new data
Majority of operations are read-onlyI E�ectively, queries on the data
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 49 / 58
Lambda Architecture - Motivation
Typical solutionThe typical solution would be a data-centric architecture
Data is stored in a distributed traditional RDBMS or noSQL databases
Updates are wri�en into the database (via transactions)
The queries are computed in a distributed manner, over all the data
As some queries are too slow, they need to be precomputed
… and the precomputed results need to be updated (as soon a new data comes in)
Therefore this can also be called a incremental architecture
The incremental architecture is too complex, mainly because of i) the (distributed) transactionsupport and ii) the complex algorithms to merge the new data.
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 50 / 58
Lambda Architecture - Motivation
Target propertiesRobustness & fault tolerance
Scalability
Generalisation
Extensibility
Ad hoc queries
Minimal maintainance
Debuggability
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 51 / 58
Lambda Architecture - Overview
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 52 / 58
Lambda Architecture - Overview
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 53 / 58
Kappa ArchitectureSimpler Alternative to Lambda Architecture
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 54 / 58
Kappa Architecture - Introduction
8 Rules of Stream Processing (Stonebraker et at., 2005)1 Keep the Data Moving2 �ery using SQL on Streams (StreamSQL)3 Handle Stream Imperfections (Delayed, Missing and Out-of-Order Data)4 Generate Predictable Outcomes5 Integrate Stored and Streaming Data6 Guarantee Data Safety and Availability7 Partition and Scale Applications Automatically8 Process and Respond Instantaneously
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 55 / 58
Kappa Architecture - Motivation
Basic idea behind Kappa ArchitectureThe Lambda Architecture is relatively complex
Most of the problems can be solved via a streaming approach→ treat the batch processing as if it were a stream of data
I Have the framework optimise for the two cases (stream vs. batch)
http://www.kappa-architecture.com/
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 56 / 58
Kappa Architecture - Example
Example of streaming framework: Apache KafkaMessage queueing system
Stores all data for a given timespan, e.g. 2 daysTopics
I Partitions within topicsI Partitions are chosen by the application, i.e. both producers and consumers need to agree on
the partitions
Brocker
Each message within a partition has an o�set
Clients have a unique id and remember their o�set
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 57 / 58
The End
Roman Kern (ISDS, TU Graz) Distributed Architectures 2020-01-15 58 / 58