lec13-multiprocessors.ppt


  • 8/9/2019 lec13-multiprocessors.ppt

    1/69

     

    Lecture 13: Multiprocessors

    Kai Bu, [email protected]

    http://list.zju.edu.cn/kaibu/comparch


    Assignment 4 due June 3

    Lab 5 demo due June 10

    Quiz June 3


    Textbook: Chapter 5.1–5.4


    From ILP to TLP:

    instruction-level parallelism →
    thread-level parallelism


    MIMD: multiple instruction streams,
    multiple data streams

    each processor fetches its own instructions
    and operates on its own data


    Multiprocessors: multiple instruction streams,
    multiple data streams

    computers consisting of tightly coupled processors

    coordination and usage are typically controlled by
    a single OS

    share memory through a shared address space


    Multiprocessors:
    computers consisting of tightly coupled processors

    Multicore:
    single-chip systems with multiple cores

    Multi-chip computers:
    each chip may be a multicore system


    Exploiting TLP

    two software models:

    • Parallel processing
      the execution of a tightly coupled set of
      threads collaborating on a single task

    • Request-level parallelism
      the execution of multiple, relatively
      independent processes that may
      originate from one or more users


    Outline

    • Multiprocessor Architecture

    • Centralized Shared-Memory Arch

    • Distributed shared memory and
      directory-based coherence


    Multiprocessor Architecture

    • according to memory organization and
      interconnect strategy

    • two classes:
      symmetric (centralized shared-memory)
      multiprocessors (SMP);
      distributed shared memory
      multiprocessors (DSM)


    Centralized shared-memory:
    eight or fewer cores


    share a single centralized memory;
    all processors have equal access to it


    all processors have uniform latency from memory:
    uniform memory access (UMA) multiprocessors


    Distributed shared memory:
    more processors,
    physically distributed memory

    distributing memory among the nodes
    increases bandwidth and reduces local-memory latency


    NUMA: nonuniform memory access;
    access time depends on the location of the
    data word in memory


    disadvantages:
    more complex inter-processor communication;
    more complex software to handle distributed memory


    Hurdles of Parallel Processing

    • limited parallelism available in programs

    • relatively high cost of communications



    Hurdles of Parallel Processing

    • limited parallelism affects speedup

    • Example:
      to achieve a speedup of 80 with 100
      processors, what fraction of the original
      computation can be sequential?

      Answer: by Amdahl's law,
      80 = 1 / (Fraction_parallel/100 + (1 − Fraction_parallel))
      Fraction_seq = 1 − Fraction_parallel = 0.25%
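The answer above can be sanity-checked in a few lines of Python (my own snippet, not part of the original slides; variable names are mine):

```python
# Amdahl's law: speedup = 1 / ((1 - f) + f / n),
# where f is the parallelizable fraction and n the processor count.
def amdahl_speedup(f, n):
    return 1.0 / ((1.0 - f) + f / n)

# Solving 80 = 1 / ((1 - f) + f / 100) for f gives
# f = (1 - 1/80) / (1 - 1/100).
n, target = 100, 80
f = (1 - 1 / target) / (1 - 1 / n)
print(f"parallel fraction   = {f:.4%}")              # about 99.75%
print(f"sequential fraction = {1 - f:.4%}")          # about 0.25%
print(f"speedup check       = {amdahl_speedup(f, n):.1f}")  # 80.0
```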


    Hurdles of Parallel Processing

    • limited parallelism available in programs

      makes it difficult to achieve good
      speedups in any parallel processor

      in practice, programs often use less
      than the full complement of the
      processors when running in parallel mode

    • relatively high cost of communications


    Hurdles of Parallel Processing

    • relatively high cost of communications

      involves the large latency of remote
      access in a parallel processor

    • Example:
      an app running on a 32-processor multiprocessor;
      200 ns for a reference to a remote memory;
      clock rate 2.0 GHz; base CPI 0.5

      Q: how much faster if no
      communication vs. if 0.2% of instructions
      involve a remote ref?


    Hurdles of Parallel Processing

    • Example:
      an app running on a 32-processor multiprocessor;
      200 ns for a reference to a remote memory;
      clock rate 2.0 GHz; base CPI 0.5

      Q: how much faster if no
      communication vs. if 0.2% of instructions
      involve a remote ref?

      Answer:
      remote ref cost = 200 ns / 0.5 ns per cycle = 400 cycles
      if 0.2% remote refs, CPI = 0.5 + 0.2% × 400 = 1.3
      no communication is 1.3/0.5 = 2.6 times faster
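The same arithmetic as a short Python sketch (variable names are mine; the constants are those decoded from the slide):

```python
# A 2.0 GHz clock means 0.5 ns per cycle, so a 200 ns remote
# reference costs 200 / 0.5 = 400 cycles.
clock_ghz = 2.0
remote_cycles = 200 * clock_ghz          # 400.0 cycles

base_cpi = 0.5
remote_rate = 0.002                      # 0.2% of instructions

cpi_with_comm = base_cpi + remote_rate * remote_cycles  # 0.5 + 0.8 = 1.3
speedup = cpi_with_comm / base_cpi                      # 2.6x
print(remote_cycles, cpi_with_comm, speedup)
```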


    Hurdles of Parallel Processing

    solutions:

    • insufficient parallelism

      new software algorithms that offer better
      parallel performance;
      software systems that maximize the
      amount of time spent executing with the
      full complement of processors

    • long-latency remote communication

      by architecture: caching shared data
      by programmer: multithreading, prefetching


    Outline

    • Multiprocessor Architecture

    • Centralized Shared-Memory Arch

    • Distributed shared memory and
      directory-based coherence


    Centralized Shared-Memory

    large, multilevel caches
    reduce memory bandwidth demands


    Centralized Shared-Memory

    cache private and shared data



    Centralized Shared-Memory

    shared data: used by multiple processors;
    may be replicated in multiple caches to reduce
    access latency, required memory bandwidth, contention

    without additional precautions,
    different processors can have different values
    for the same memory location


    Cache Coherence Problem

    (figure example: write-through cache)


    Cache Coherence Problem

    • A memory system is coherent if any
      read of a data item returns the most
      recently written value of that data item

    • two critical aspects:

      coherence: defines what values can
      be returned by a read

      consistency: determines when a
      written value will be returned by a read


    Coherence Property

    • A read by processor P to location X that
      follows a write by P to X, with no writes
      of X by another processor occurring
      between the write and the read by P,
      always returns the value written by P.

      → preserves program order


    Coherence Property

    • A read by a processor to location X that
      follows a write by another processor to X
      returns the written value if the read and
      the write are sufficiently separated in time
      and no other writes to X occur between
      the two accesses.


    Coherence Property

    • Write serialization:
      two writes to the same location by any
      two processors are seen in the same
      order by all processors


    Consistency

    • When a written value will be seen is
      important

    • For example, if a write of X on one
      processor precedes a read of X on
      another processor by a very small
      time, it may be impossible to ensure
      that the read returns the value of the
      data written, since the written data may
      not even have left the processor at that point


    Cache Coherence Protocols

    • Directory based:
      the sharing status of a particular block
      of physical memory is kept in one
      location, called the directory

    • Snooping:
      every cache that has a copy of the data
      from a block of physical memory could
      track the sharing status of the block


    Snooping Coherence Protocol

    • Write invalidation protocol:
      invalidates other copies on a write;
      exclusive access ensures that no other
      readable or writable copies of an item
      exist when the write occurs


    Snooping Coherence Protocol

    • Write invalidation protocol:
      invalidates other copies on a write
      (figure example: write-back cache)


    Snooping Coherence Protocol

    • Write update/broadcast protocol:
      update all cached copies of a data item
      when that item is written;
      consumes more bandwidth


    Write Invalidation Protocol

    • To perform an invalidate, the processor
      simply acquires bus access and
      broadcasts the address to be
      invalidated on the bus

    • All processors continuously snoop on
      the bus, watching the addresses

    • The processors check whether the
      address on the bus is in their cache;
      if so, the corresponding data in the
      cache is invalidated


    Write Invalidation Protocol

    three block states (MSI protocol):

    • Invalid

    • Shared
      indicates that the block in the private
      cache is potentially shared

    • Modified
      indicates that the block has been
      updated in the private cache;
      implies that the block is exclusive
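The three states and the invalidate-on-write rule above can be sketched as a toy snooping simulator in Python (illustrative only; the class and method names are mine, not the lecture's):

```python
# Toy snooping write-invalidate (MSI) sketch: per-address states
# 'M' (Modified), 'S' (Shared), 'I' (Invalid).
# The "bus" is simply the list of caches that snoop each other.
class Cache:
    def __init__(self, bus):
        self.state = {}              # addr -> 'M' / 'S' / 'I'
        self.bus = bus
        bus.append(self)

    def read(self, addr):
        if self.state.get(addr, 'I') == 'I':
            # Read miss: a Modified copy elsewhere supplies the data
            # and is downgraded to Shared (write-back not modeled).
            for c in self.bus:
                if c is not self and c.state.get(addr) == 'M':
                    c.state[addr] = 'S'
            self.state[addr] = 'S'

    def write(self, addr):
        # Write: broadcast the address; every other snooping cache
        # invalidates its copy, leaving this cache exclusive.
        for c in self.bus:
            if c is not self:
                c.state[addr] = 'I'
        self.state[addr] = 'M'

bus = []
p1, p2 = Cache(bus), Cache(bus)
p1.read(0x40); p2.read(0x40)   # both caches now hold the block Shared
p1.write(0x40)                 # invalidate broadcast on the bus
print(p1.state[0x40], p2.state[0x40])  # M I
```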


    MSI Extensions

    • MOESI

      Owned: indicates that the associated
      block is owned by that cache and
      out-of-date in memory

      Modified → Owned without writing the
      shared block to memory


    increase memory bandwidth
    through multi-bus and interconnection network
    and multi-bank cache


    Coherence Miss

    • True sharing miss

      first write by a processor to a shared
      cache block causes an invalidation to
      establish ownership of that block;
      another processor then reads a modified
      word in that cache block

    • False sharing miss


    Coherence Miss

    • True sharing miss

    • False sharing miss

      with a single valid bit per cache block:
      occurs when a block is invalidated (and
      a subsequent reference causes a miss)
      because some word in the block, other
      than the one being read, is written into


    Coherence Miss

    • Example:
      assume words x1 and x2 are in the
      same cache block, which is in shared
      state in the caches of both P1 and P2;
      access sequence:
      1. P1 writes x1
      2. P2 reads x2
      3. P1 writes x1
      4. P2 writes x2
      5. P1 reads x2

      identify each miss as a true sharing
      miss, a false sharing miss, or a hit


    Coherence Miss

    • Example:
      1. true sharing miss,
      since x1 was read by P2 and needs to
      be invalidated from P2


    Coherence Miss

    • Example:
      3. false sharing miss:
      since the block is in shared state, need
      to invalidate it to write;
      but P2 read x2 rather than x1


    Coherence Miss

    • Example:
      4. false sharing miss:
      need to invalidate the block,
      but P1 wrote x1 rather than x2


    Coherence Miss

    • Example:
      5. true sharing miss,
      since the value being read was written
      by P2 (invalid → shared)


    Outline

    • Multiprocessor Architecture

    • Centralized Shared-Memory Arch

    • Distributed shared memory and
      directory-based coherence

      a directory is added to each node


    Directory-based:

    each directory tracks the caches that share the
    memory addresses of the portion of memory in
    the node;
    no need to broadcast on every cache miss


    Directory-based Cache Coherence Protocol

    common cache states:

    • Shared
      one or more nodes have the block cached,
      and the value in memory is up to date (as
      well as in all the caches)

    • Uncached
      no node has a copy of the cache block

    • Modified
      exactly one node has a copy of the cache
      block, and it has written the block, so the
      memory copy is out of date
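The three directory states above can be sketched as a minimal per-block directory entry in Python (my own illustrative code; the class and method names are not from the lecture, and write-back of a Modified owner is not modeled):

```python
# Minimal directory entry for one memory block: one of the three
# states above plus the set of sharer node ids.
class DirectoryEntry:
    def __init__(self):
        self.state = 'Uncached'   # 'Uncached' / 'Shared' / 'Modified'
        self.sharers = set()      # ids of nodes holding a copy

    def read_miss(self, node):
        # The reader joins the sharer set; a Modified owner would
        # first write the block back to memory (not modeled here).
        self.state = 'Shared'
        self.sharers.add(node)

    def write_miss(self, node):
        # The writer becomes exclusive owner; all other copies are
        # invalidated, so the sharer set collapses to the writer.
        self.sharers = {node}
        self.state = 'Modified'

d = DirectoryEntry()
d.read_miss(0); d.read_miss(1)
print(d.state, sorted(d.sharers))   # Shared [0, 1]
d.write_miss(1)
print(d.state, sorted(d.sharers))   # Modified [1]
```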


    Directory Protocol

    state transition diagram
    for an individual cache block;
    requests from outside the node in gray


    Directory Protocol

    state transition diagram
    for the directory;
    all actions in gray
    because they're all externally caused
