8/9/2019 lec13-multiprocessors.ppt
Lecture 13: Multiprocessors
http://list.zju.edu.cn/kaibu/comparch
Assignment 4 due June 3
Lab 5 demo due June 10
Quiz June 3
Chapter 5.1–5.4
ILP to TLP
from instruction-level parallelism to thread-level parallelism
MIMD: multiple instruction streams, multiple data streams
Each processor fetches its own instructions
and operates on its own data
Multiprocessors: multiple instruction streams, multiple data streams
computers consisting of tightly coupled processors
Coordination and usage are typically controlled by a single OS
Share memory through a shared address space
Multicore
Single-chip systems with multiple cores
Multi-chip computers
each chip may be a multicore system
Exploiting TLP
two software models
• Parallel processing
the execution of a tightly coupled set of threads collaborating on a single task
• Request-level parallelism
the execution of multiple, relatively independent processes that may originate from one or more users
Outline
• Multiprocessor Architecture
• Centralized Shared-Memory Arch
• Distributed shared memory and directory-based coherence
Multiprocessor Architecture
• According to memory organization and interconnect strategy
• Two classes
symmetric/centralized shared-memory multiprocessors (SMP)
distributed shared memory multiprocessors (DSM)
Centralized shared-memory
eight or fewer cores
Share a single centralized memory
All processors have equal access to
All processors have uniform latency from memory
Uniform memory access (UMA) multiprocessors
Distributed shared memory
more processors
Distributing memory among the nodes
increases bandwidth and reduces local-memory latency
physically distributed memory
NUMA: nonuniform memory access
access time depends on the location of a data word in memory
Disadvantages:
more complex inter-processor communication
more complex software to handle distributed memory
Hurdles of Parallel Processing
• Limited parallelism available in programs
• Relatively high cost of communications
• Limited parallelism affects speedup
• Example
to achieve a speedup of 80 with 100 processors, what fraction of the original computation can be sequential?
Answer: by Amdahl's law
Speedup = 1 / (Fraction_parallel / 100 + (1 - Fraction_parallel)) = 80, so Fraction_parallel = 0.9975
Fraction_seq = 1 - Fraction_parallel = 0.25%
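The Amdahl's law answer can be checked numerically. The following is a minimal sketch, assuming the example's target speedup of 80 on 100 processors; the function names are illustrative and not part of the lecture.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Overall speedup when only parallel_fraction of the work scales with processors."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def required_parallel_fraction(target_speedup: float, processors: int) -> float:
    """Solve Amdahl's law for the parallel fraction that yields target_speedup."""
    # 1/S = (1 - f) + f/N  =>  f = (1 - 1/S) / (1 - 1/N)
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / processors)


f_parallel = required_parallel_fraction(80, 100)
f_sequential = 1.0 - f_parallel
print(f"parallel fraction:   {f_parallel:.4f}")
print(f"sequential fraction: {f_sequential:.2%}")
```

Plugging the computed fraction back into `amdahl_speedup` recovers the target speedup, which is a quick way to confirm the algebra.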
• Limited parallelism available in programs
makes it difficult to achieve good speedups in any parallel processor
in practice, programs often use less than the full complement of the processors when running in parallel mode
• Relatively high cost of communications
involves the large latency of remote access in a parallel processor
Example
app running on a 32-processor MP
200 ns for reference to a remote memory
clock rate 2.0 GHz, base CPI 0.5
Q: how much faster if no communication vs. if 0.2% of instructions involve a remote reference?
Answer
if 0.2% remote ref:
Remote ref cost = 200 ns / 0.5 ns per cycle = 400 cycles
CPI = 0.5 + 0.2% × 400 = 1.3
no comm is 1.3/0.5 = 2.6 times faster
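The arithmetic behind this answer can be reproduced with a short script. This is a sketch under the example's assumptions (200 ns remote latency, 2.0 GHz clock, base CPI 0.5, 0.2% of instructions making a remote reference); `effective_cpi` is a hypothetical helper name.

```python
def effective_cpi(base_cpi: float, remote_fraction: float,
                  remote_latency_ns: float, clock_ghz: float) -> float:
    """CPI once the stall cycles of remote references are added to the base CPI."""
    # A clock of f GHz completes f cycles per nanosecond.
    remote_cycles = remote_latency_ns * clock_ghz
    return base_cpi + remote_fraction * remote_cycles


base_cpi = 0.5
cpi_with_comm = effective_cpi(base_cpi, remote_fraction=0.002,
                              remote_latency_ns=200, clock_ghz=2.0)
slowdown = cpi_with_comm / base_cpi
print(f"CPI with communication: {cpi_with_comm:.1f}")
print(f"no-communication case is {slowdown:.1f} times faster")
```

Even a 0.2% remote-reference rate more than doubles the CPI here, which is why the lecture treats communication latency as a hurdle in its own right.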
Hurdles of Parallel Processing
solutions
• insufficient parallelism
new software algorithms that offer better parallel performance
software systems that maximize the amount of time spent executing with the full complement of processors
• long-latency remote communication
by architecture: caching shared data
by programmer: multithreading, prefetching
Outline
• Multiprocessor Architecture
• Centralized Shared-Memory Arch
• Distributed shared memory and directory-based coherence
Centralized Shared-Memory
Large, multilevel caches reduce memory bandwidth demands
Cache private and shared data
shared data: used by multiple processors
may be replicated in multiple caches to reduce
access latency, required memory bandwidth, contention
w/o additional precautions
different processors can have different values
for the same memory location
Cache Coherence Problem
write-through cache
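The problem can be reproduced with a toy model. The sketch below assumes two private write-through caches with no coherence mechanism at all; the class and variable names are hypothetical and only illustrate how a stale copy arises.

```python
class WriteThroughCache:
    """Private cache that writes through to memory but never hears about other writers."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.lines = {}  # address -> cached value

    def read(self, addr):
        if addr not in self.lines:          # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]             # hit: return the local copy

    def write(self, addr, value):
        self.lines[addr] = value            # update the local copy
        self.memory[addr] = value           # write-through keeps memory current


memory = {"X": 1}
cache_a = WriteThroughCache(memory)
cache_b = WriteThroughCache(memory)

cache_a.read("X")                # A caches X = 1
cache_b.read("X")                # B caches X = 1
cache_a.write("X", 0)            # A and memory now hold 0
stale_value = cache_b.read("X")  # B hits on its old copy

print("memory:", memory["X"], "| B reads:", stale_value)
```

Memory is up to date because of write-through, yet processor B still sees the old value, so two processors hold different values for the same location.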
Cache Coherence Problem
• A memory system is coherent if any read of a data item returns the most recently written value of that data item
• Two critical aspects
coherence: defines what values can be returned by a read
consistency: determines when a written value will be returned by a read
Coherence Property
• A read by processor P to location X that follows a write by P to X, with no writes of X by another processor occurring between the write and the read by P, always returns the value written by P.
preserves program order
• A read by a processor to location X that follows a write by another processor to X returns the written value if the read and the write are sufficiently separated in time and no other writes to X occur between the two accesses.
• Write serialization
two writes to the same location by any two processors are seen in the same order by all processors
Consistency
• When a written value will be seen is important
• For example, if a write of X on one processor precedes a read of X on another processor by a very small time, it may be impossible to ensure that the read returns the value of the data written, since the written data may not even have left the processor at that point
Cache Coherence Protocols
• Directory based
the sharing status of a particular block of physical memory is kept in one location, called the directory
• Snooping
every cache that has a copy of the data from a block of physical memory could track the sharing status of the block
Snooping Coherence Protocol
• Write invalidation protocol
invalidates other copies on a write
exclusive access ensures that no other readable or writable copies of an item exist when the write occurs
write-back cache
• Write update/broadcast protocol
update all cached copies of a data item when that item is written
consumes more bandwidth
Write Invalidation Protocol
• To perform an invalidate, the processor simply acquires bus access and broadcasts the address to be invalidated on the bus
• All processors continuously snoop on the bus, watching the addresses
• The processors check whether the address on the bus is in their cache
if so, the corresponding data in the cache is invalidated.
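The broadcast-and-snoop steps above can be sketched as a small simulation. It assumes a single shared bus and write-through caches to keep the model short; `SnoopingBus` and `SnoopingCache` are hypothetical names, and bus arbitration is not modeled.

```python
class SnoopingBus:
    """Shared bus: every attached cache sees every invalidate broadcast."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, sender, addr):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(addr)


class SnoopingCache:
    def __init__(self, bus: SnoopingBus, memory: dict):
        self.bus = bus
        self.memory = memory
        self.lines = {}  # address -> cached value
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch the current value
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Broadcast the address so other caches drop their copies first.
        self.bus.broadcast_invalidate(self, addr)
        self.lines[addr] = value
        self.memory[addr] = value            # write-through (simplifying assumption)

    def snoop_invalidate(self, addr):
        # Check whether the address on the bus is cached; if so, invalidate it.
        self.lines.pop(addr, None)


bus = SnoopingBus()
memory = {"X": 1}
cache_a = SnoopingCache(bus, memory)
cache_b = SnoopingCache(bus, memory)

cache_a.read("X")
cache_b.read("X")
cache_a.write("X", 0)                    # invalidates B's copy
b_has_copy = "X" in cache_b.lines        # B no longer holds X
value_seen_by_b = cache_b.read("X")      # B misses and refetches
print(b_has_copy, value_seen_by_b)
```

Because B's copy is dropped before A's write completes, B's next read misses and picks up the new value instead of a stale one.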
three block states (MSI protocol)
• Invalid
• Shared
indicates that the block in the private cache is potentially shared
• Modified
indicates that the block has been updated in the private cache
implies that the block is exclusive
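The three states can be written down as a transition table for one cache block. The state diagrams from the slides are not reproduced in this text, so the table below is a sketch of the commonly taught MSI write-invalidate transitions; the event names (`cpu_read`, `bus_write_miss`, and so on) are assumptions chosen for illustration.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


# (current state, event) -> (next state, action taken by this cache or None)
# cpu_* events come from the local processor, bus_* events are snooped.
TRANSITIONS = {
    (State.INVALID, "cpu_read"): (State.SHARED, "place read miss on bus"),
    (State.INVALID, "cpu_write"): (State.MODIFIED, "place write miss on bus"),
    (State.INVALID, "bus_read_miss"): (State.INVALID, None),
    (State.INVALID, "bus_write_miss"): (State.INVALID, None),
    (State.SHARED, "cpu_read"): (State.SHARED, None),
    (State.SHARED, "cpu_write"): (State.MODIFIED, "place invalidate on bus"),
    (State.SHARED, "bus_read_miss"): (State.SHARED, None),
    (State.SHARED, "bus_write_miss"): (State.INVALID, None),
    (State.MODIFIED, "cpu_read"): (State.MODIFIED, None),
    (State.MODIFIED, "cpu_write"): (State.MODIFIED, None),
    (State.MODIFIED, "bus_read_miss"): (State.SHARED, "write back block"),
    (State.MODIFIED, "bus_write_miss"): (State.INVALID, "write back block"),
}


def step(state: State, event: str):
    """Return (next_state, action) for one block in one cache."""
    return TRANSITIONS[(state, event)]


# Walk one block through: local read, local write, then a snooped read miss.
state = State.INVALID
history = []
for event in ["cpu_read", "cpu_write", "bus_read_miss"]:
    state, action = step(state, event)
    history.append((event, state.value, action))
print(history)
```

In this table a snooped invalidate is folded into `bus_write_miss`, since both remove the local copy.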
MSI Extensions
• MOESI
owned: indicates that the associated block is owned by that cache and out-of-date in memory
Modified -> Owned without writing the shared block to memory
increase memory bandwidth
through multiple buses or an interconnection network
and multi-bank cache
Coherence Miss
• True sharing miss
first write by a processor to a shared cache block causes an invalidation to establish ownership of that block
another processor reads a modified word in that cache block
• False sharing miss
a single valid bit per cache block
occurs when a block is invalidated (and a subsequent reference causes a miss) because some word in the block, other than the one being read, is written into
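The difference between the two kinds of miss can be illustrated with a sketch that classifies read references to a single two-word block. The access sequence and the function `classify_reads` are hypothetical, and the rule is deliberately simplified: it classifies only reads, and does not classify the invalidations caused by writes to a shared block.

```python
def classify_reads(accesses, processors=("P1", "P2")):
    """Label each read as a hit, a true sharing miss, or a false sharing miss."""
    valid = {p: True for p in processors}             # block starts shared everywhere
    written_by_others = {p: set() for p in processors}  # words written since p lost its copy
    outcomes = []
    for proc, op, word in accesses:
        if op == "read":
            if valid[proc]:
                outcomes.append((proc, word, "hit"))
            elif word in written_by_others[proc]:
                # The word being read was itself modified elsewhere.
                outcomes.append((proc, word, "true sharing miss"))
            else:
                # Only some other word in the block was modified.
                outcomes.append((proc, word, "false sharing miss"))
        # Any access leaves the requester with a valid, up-to-date copy.
        valid[proc] = True
        written_by_others[proc].clear()
        if op == "write":
            # One valid bit per block: the whole block is invalidated elsewhere.
            for other in processors:
                if other != proc:
                    valid[other] = False
                    written_by_others[other].add(word)
    return outcomes


sequence = [
    ("P1", "write", "x1"),
    ("P2", "read", "x2"),   # x2 untouched, but the block was invalidated
    ("P1", "write", "x1"),
    ("P2", "read", "x1"),   # x1 really was changed by P1
    ("P2", "read", "x2"),   # block is valid again
]
results = classify_reads(sequence)
for proc, word, outcome in results:
    print(proc, "reads", word, "->", outcome)
```

With a valid bit per word instead of per block, the first read in this sequence would not miss at all, which is the sense in which the miss is "false".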
• Example
assume words x1 and x2 are in the same cache block, which is in shared state in the caches of both P1 and P2.
identify each miss as a true sharing miss, a false sharing miss, or a hit.
1. true sharing miss
since x1 was read by P2 and needs to be invalidated from P2
3. false sharing miss
since the block is in shared state, need to invalidate it to write
but P2 read x2 rather than x1
4. false sharing miss
need to invalidate the block
P2 wrote x1 rather than x2
5. true sharing miss
since the value being read was written by P2 (invalid -> shared)
Outline
• Multiprocessor Architecture
• Centralized Shared-Memory Arch
• Distributed shared memory and directory-based coherence
Directory-based
A directory is added to each node.
Each directory tracks the caches that share the memory addresses of the portion of memory in the node.
need not broadcast on every cache miss
Directory-based Cache Coherence Protocol
Common cache states
• Shared
one or more nodes have the block cached, and the value in memory is up to date (as well as in all the caches)
• Uncached
no node has a copy of the cache block
• Modified
exactly one node has a copy of the cache block, and it has written the block, so the memory copy is out of date
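A directory entry for one block can be sketched as a state plus a set of sharers. The class below is an illustrative model, not the lecture's protocol specification: the message names (`fetch`, `invalidate`, `data_reply`) and node numbering are assumptions, and messages are returned as a list rather than sent over a network.

```python
class DirectoryEntry:
    """Directory state for a single memory block at its home node."""

    def __init__(self):
        self.state = "Uncached"
        self.sharers = set()  # nodes holding a copy; exactly one owner when Modified

    def read_miss(self, node):
        messages = []
        if self.state == "Modified":
            (owner,) = self.sharers
            messages.append(("fetch", owner))       # owner supplies the dirty block
        self.sharers.add(node)
        self.state = "Shared"
        messages.append(("data_reply", node))
        return messages

    def write_miss(self, node):
        messages = []
        if self.state == "Modified":
            (owner,) = self.sharers
            messages.append(("fetch_invalidate", owner))
        elif self.state == "Shared":
            # Point-to-point invalidates go only to the recorded sharers.
            for sharer in sorted(self.sharers - {node}):
                messages.append(("invalidate", sharer))
        self.sharers = {node}
        self.state = "Modified"
        messages.append(("data_reply", node))
        return messages

    def write_back(self, node):
        # The owner evicts the block, so memory becomes the only copy.
        if self.state == "Modified" and self.sharers == {node}:
            self.sharers = set()
            self.state = "Uncached"


entry = DirectoryEntry()
entry.read_miss(0)               # Uncached -> Shared, sharers {0}
entry.read_miss(1)               # sharers {0, 1}
msgs_on_write = entry.write_miss(2)
print(entry.state, sorted(entry.sharers), msgs_on_write)
```

Only nodes 0 and 1 receive invalidates in this run, which shows how the sharer set replaces the broadcast used by a snooping protocol.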
Directory Protocol
state transition diagram
for an individual cache block
requests from outside the node in gray
state transition diagram
for the directory
All actions in gray
because they're all externally caused