8/9/2019 Cours Gpgpu
General-Purpose Programming on the GPU
M1 Info 2014/2015
Benoit.Crespin@unilim.fr
12 h of lectures (8 sessions)
18 h of lab sessions (12 sessions)
Assessment: a mini-project (last lab sessions) + a final written exam (documents allowed)
Refs
"OpenCL Programming Guide"
"Heterogeneous Computing with OpenCL"
    http://www.heterogeneouscompute.org/
    http://univ-limoges.cyberlibris.fr/book/88809645
Hands On OpenCL: http://handsonopencl.github.io/
Programmation des systèmes parallèles hétérogènes
    http://www.techniques-ingenieur.fr/base-documentaire/technologies-de-l-information-th9/langages-de-programmation-42304210/programmation-des-systemes-paralleles-heterogenes-h3160/
"CUDA by Example: An Introduction to General-Purpose GPU Programming", Addison Wesley, July 2010
Schedule
1. Introduction to Parallel Programming, GP…
…Architecture
3. Basic OpenCL Examples
4. …
OpenCL?
The Open Computing Language (OpenCL) is a heterogeneous programming framework managed by the nonprofit consortium Khronos Group.
It supports a wide range of levels of parallelism and efficiently maps to homogeneous or heterogeneous, single- or multiple-device systems consisting of CP…
OpenCL 1.0, 1.1, 1.2, ...
OpenCL was initially developed by Apple Inc.
OpenCL 1.0 released with Mac OS X Snow Leopard (2008): version installed in the lab room
OpenCL 1.1 (2010): most examples presented in this course
OpenCL 1.2 (2012)
OpenCL 2.0 (2014): supports an Android extension
OpenCL vs CUDA?
Parallel Programming
Threads and Shared Memory
Message Passing
Communication, Data Sharing and Synchronization
Different grains of Parallelism...
Many-Core Future
Many cores running at lower frequencies are fundamentally more power-efficient
Heterogeneous Platforms... are already here!
General Programming on the GPU
Traditionally, modules are explicitly tied to the components in the heterogeneous platform. For example, graphics software runs on the GP…
Conceptual Foundations of OpenCL
1. Discover the components that make up the heterogeneous system.
2. Probe the characteristics of these components so that the software can adapt to the specific features of different hardware elements.
3. Create the blocks of instructions (kernels) that will run on the platform.
4. Set up and manipulate memory objects involved in the computation.
5. Execute the kernels in the right order and on the right components of the system.
6. Collect the final results.
Platform Model
A device can be a CP…
How a Kernel Executes on an OpenCL Device
1. A kernel is defined on the host.
2. The host program issues a command that submits the kernel for execution on an OpenCL device.
3. When this command is issued by the host, the OpenCL runtime system creates an integer index space. An instance of the kernel executes for each point in this index space, called a work-item. Its coordinates in the index space are the global ID for the work-item.
4. Work-items are organized into work-groups which exactly span the global index space. Work-items are assigned a unique local ID within a work-group so that a single work-item can be uniquely identified by its global ID or by a combination of its local ID and work-group ID.
5. OpenCL only assures that the work-items within a work-group execute concurrently on the processing elements of a single compute unit (and share processor resources on the device).
NDRange
The index space spans an N-dimensioned range of values and thus is called an NDRange (N can be 1, 2 or 3).
Inside an OpenCL program, an NDRange is defined by an integer array of length N specifying the size of the index space in each dimension.
Work-items and work-groups
Global ID (gx, gy) = (6, 5)
Work-group ID (wx, wy) = (1, 1)
Local ID (lx, ly) = (2, 1)
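The relation between the three IDs above can be checked with a few lines of plain C. This is only a CPU-side sketch, not OpenCL API code: the function names are made up for illustration, and the 4x4 work-group size is an assumption chosen because it is the size for which the values (6, 5), (1, 1) and (2, 1) agree.

```c
#include <assert.h>
#include <stddef.h>

/* Global ID along one dimension, rebuilt from the work-group ID and the
 * local ID: g = w * S + l, where S is the work-group size in that
 * dimension. Assumes a zero global offset. Hypothetical helper. */
size_t global_id_from(size_t group_id, size_t local_id, size_t group_size)
{
    assert(local_id < group_size);   /* local IDs range over 0..S-1 */
    return group_id * group_size + local_id;
}

/* Inverse mapping: split a global ID into (work-group ID, local ID). */
void split_global_id(size_t global_id, size_t group_size,
                     size_t *group_id, size_t *local_id)
{
    assert(group_size > 0);
    *group_id = global_id / group_size;
    *local_id = global_id % group_size;
}
```

With an assumed work-group size of 4 in each dimension, `global_id_from(1, 2, 4)` gives gx = 6 and `global_id_from(1, 1, 4)` gives gy = 5, matching the slide.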
Context
Devices: the collection of OpenCL devices to be used by the host
Kernels: the OpenCL functions that run on OpenCL devices
Program objects: the program source code and executables that implement the kernels
Memory objects: a set of objects in memory that are visible to OpenCL devices and contain values that can be operated on by instances of a kernel
Command-Queues
The interaction between the host and the OpenCL devices occurs through commands posted by the host to the command-queue.
These commands wait in the command-queue until they execute on the OpenCL device.
A command-queue is created by the host and attached to a single OpenCL device after the context has been defined.
Kernel execution commands execute a kernel on the processing elements of an OpenCL device.
Memory commands transfer data between the host and different memory objects, move data between memory objects, or map and unmap memory objects from the host address space.
Synchronization commands put constraints on the order in which commands execute.
Memory Model
Host memory: visible only to the host.
Global memory: permits read/write access to all work-items in all work-groups. Reads and writes to global memory may be cached depending on the capabilities of the device.
Constant memory: remains constant during the execution of a kernel (read-only access).
Local memory: local to a work-group. It can be used to allocate variables that are shared by all work-items in that work-group. It may be implemented as dedicated regions of memory on the OpenCL device.
Private memory: this region of memory is private to a work-item. Variables defined in one work-item's private memory are not visible to other work-items.
Summary
Remember this?
1. Discover the components that make up the heterogeneous system.
2. Probe the characteristics of these components so that the software can adapt to the specific features of different hardware elements.
3. ...
platformAndDevices.c
    #include <CL/cl.h>
    #include <...>
    #define CL_CHECK(_expr) \
        do { cl_int _err = _expr; \
            if (_err == CL_…
    …ME, 10240, buffer, N…
    …ME = %s\n", buffer);
    CL_CHECK(clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, 10240, buffer, N…
    …TFORM_EXTENSIONS, 10240, buffer, N…
platformAndDevices.c (ctd)
    printf("=== %d OpenCL device(s) found on platform:\n", platforms_n);
    for (int i=0; i<devices_n; i++) {
        char buffer[10240];
        cl_uint buf_uint;
        cl_ulong buf_ulong;
        printf(" -- %d --\n", i);
        CL_CHECK(clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(buffer), buffer, N…
        …ME = %s\n", buffer);
        CL_CHECK(clGetDeviceInfo(devices[i], CL_DEVICE_VENDOR, sizeof(buffer), buffer, N…
Compiling and running (finally!)
    $> g++ platformAndDevices.c -lOpenCL
    $> …
    … = GeForce 8600 GT
    DEVICE_VENDOR = NVIDIA Corporation
    DEVICE_VERSION = OpenCL 1.0 CUDA
    DRIVER_VERSION = 319.17
    DEVICE_MAX_COMP… CLOCK_FREQ… MEM_SIZE = …
With AMD: g++ platformAndDevices.c -I…
clGetPlatformIDs
    cl_int clGetPlatformIDs (cl_uint num_entries, cl_platform_id * platforms, cl_uint * num_platforms)
This command obtains the list of available platforms
In the case that the argument platforms is N…
Another example:
    errNum = clGetPlatformIDs(0, NULL, &numPlatforms);
    platformIds = (cl_platform_id *)alloca(sizeof(cl_platform_id) * numPlatforms);
    errNum = clGetPlatformIDs(numPlatforms, platformIds, NULL);
clGetPlatformInfo
    cl_int clGetPlatformInfo (cl_platform_id platform, cl_platform_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret)
This command returns specific information about the OpenCL platform: profile, version, name, …
Another example:
    // First call: ask only for the size of the string
    err = clGetPlatformInfo(id, CL_PLATFORM_NAME, 0, NULL, &size);
    char * name = (char *)alloca(sizeof(char) * size);
    // Second call: fill the buffer that was just allocated
    err = clGetPlatformInfo(id, CL_PLATFORM_NAME, size, name, NULL);
clGetDeviceIDs
    cl_int clGetDeviceIDs (cl_platform_id platform, cl_device_type device_type, cl_uint num_entries, cl_device_id *devices, cl_uint *num_devices)
This command obtains the list of available OpenCL devices associated with platform.
device_type:
    CL_DEVICE_TYPE_CPU: OpenCL device that is the host processor.
    CL_DEVICE_TYPE_GPU: OpenCL device that is a GP…
clGetDeviceInfo
    cl_int clGetDeviceInfo (cl_device_id device, cl_device_info param_name, size_t param_value_size, void * param_value, size_t * param_value_size_ret)
This command returns specific information about the OpenCL device.
param_name: CL_DEVICE_TYPE, CL_DEVICE_VENDOR_ID, …
http://my.safaribooksonline.com/book/programming/9780132488006/platforms-contexts-and-devices/ch03lev1sec2
STEP 6: Write host data to device buffers
STEP 7: Create and compile the program
STEP 8: Create the kernel
    ////////// STEP 6 //////////
    // … to the device buffer bufferA
    status = clEnqueueWriteBuffer(cmdQueue, bufferA, CL_FALSE, 0,
        datasize, A, 0, N…
STEP 10: Configure the work-item structure
STEP 11: Enqueue the kernel for execution
    ////////// STEP 9 //////////
    // Associate the input and output buffers with the
    // kernel using clSetKernelArg()
    status  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &bufferA);
    status |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &bufferB);
    status |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &bufferC);
    ////////// STEP 10 //////////
    // Define an index space (global work size) of work
    // items for execution. A workgroup size (local work
    // size) is not required, but can be used.
    size_t globalWorkSize[1];
    // There are 'elements' work-items
    globalWorkSize[0] = elements;
    ////////// STEP 11 //////////
    // Execute the kernel by using
    // clEnqueueNDRangeKernel().
    // 'globalWorkSize' is the 1D dimension of the
    // work-items
    status = clEnqueueNDRangeKernel(cmdQueue, kernel, 1, N…
Memory mapping
Each executing work-item needs to know which individual elements from arrays a and b need to be summed.
This must be a unique value for each work-item and should be derived from the N-D domain specified when queuing the kernel for execution.
The get_global_id(0) call returns the one-dimensional global ID for each work-item.
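To make the mapping concrete, here is a CPU-only sketch in plain C of what the runtime does for a 1D index space: it calls the kernel body once per global ID. The names `vecadd_work_item` and `run_vecadd` are hypothetical and the loop only imitates the NDRange; on a real device the work-items run in parallel and the index comes from get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* Body of a hypothetical vector-addition kernel, written as plain C.
 * `gid` plays the role of get_global_id(0): each work-item handles
 * exactly one element of the arrays. */
static void vecadd_work_item(size_t gid, const float *a, const float *b,
                             float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* CPU stand-in for enqueuing a 1D NDRange of `global_size` work-items:
 * the kernel body is called once for each point of the index space. */
void run_vecadd(size_t global_size, const float *a, const float *b, float *c)
{
    assert(a && b && c);
    for (size_t gid = 0; gid < global_size; ++gid)
        vecadd_work_item(gid, a, b, c);
}
```

Because every work-item writes to a distinct `c[gid]`, the order in which the iterations run does not matter, which is what makes this loop safe to execute in parallel.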
How to check for errors in our kernel: clGetProgramBuildInfo
Should take place after clBuildProgram (step 7)
Get the length of the log string:
    size_t len;
    clGetProgramBuildInfo(program, devices[0], CL_PROGRAM_BUILD_LOG, 0, NULL, &len);
Get the log itself:
    char *buffer = (char *)malloc(len);
    clGetProgramBuildInfo(program, devices[0], CL_PROGRAM_BUILD_LOG, len, buffer, NULL);
Typical results:
    expected ';' after expression
    use of undeclared identifier 'localId'
    error: expected '}'
    ...
Exercises
Test the OpenCL environment in the lab room and on your own machine
Measure the performance of the program that adds two vectors:
    by comparing it with a classic sequential implementation
    by varying the volume of data (scalability)
Simple Matrix Multiplication (sequential code)
    // Iterate over the rows of Matrix A
    for(int i = 0; i < heightA; i++) {
        // Iterate over the columns of
        // Matrix B
        for(int j = 0; j < widthB; j++) {
            C[i][j] = 0;
            // Multiply and accumulate the
            // values in the current row
            // of A and column of B
            for(int k = 0; k < widthA; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    }
How can we retrieve row and column indices from get_global_id()?
Parallel Matrix Multiplication Kernel
    // widthA = heightB for valid matrix multiplication
    __kernel void simpleMultiply(__global float* outputC,
                                 int widthA, int heightA,
                                 int widthB, int heightB,
                                 __global float* inputA,
                                 __global float* inputB) {
        // Get global position in Y direction
        int row = get_global_id(1);
        // Get global position in X direction
        int col = get_global_id(0);
        float sum = 0.0f;
        // Calculate result of one element of Matrix C
        for (int i = 0; i < widthA; i++)
            sum += inputA[row*widthA+i] * inputB[i*widthB+col];
        outputC[row*widthB+col] = sum;
    }
Remember this?
Global ID (gx, gy) = (6, 5)
Work-group ID (wx, wy) = (1, 1)
Local ID (lx, ly) = (2, 1)
Memory mapping
    size_t globalWorkSize[2];
    globalWorkSize[0] = WB;
    globalWorkSize[1] = HA;
    errcode = clEnqueueNDRangeKernel(queue, kernel, 2, N…
clEnqueueNDRangeKernel
    cl_int clEnqueueNDRangeKernel (command_queue, kernel, work_dim, global_work_offset, global_work_size, local_work_size, num_events_in_wait_list, event_wait_list, event)
work_dim: the number of dimensions used to specify the global work-items and work-items in the work-group
global_work_offset: must currently be a N…
Image Rotation Kernel
    __kernel void img_rotate(__global float* dest_data, __global float* src_data,
                             int W, int H,                     // Image Dimensions
                             float sinTheta, float cosTheta )  // Rotation Parameters
    {
    …
Image Rotation Example
See the full example to discover how to:
    Load a BMP image file into a buffer (http://fr.wikipedia.org/wiki/Windows_bitmap)
    Store OpenCL kernels in separate files and load them at runtime
    Code with OpenCL in C++!
Exercises
Implement, first as a classic sequential version and then in OpenCL:
    the multiplication of two matrices
    the image rotation (using the C API)
Measure the performance while varying the volume of data
Function Qualifiers
__kernel or kernel
The following rules apply to kernel functions:
    The return type must be void. If the return type is not void, it will result in a compilation error.
    The function can be executed on a device by enqueuing a command to execute the kernel from the host.
    The function behaves as a regular function if it is called from a kernel function. The only restriction is that a kernel function with variables declared inside the function with the local qualifier cannot be called from another kernel function.
Bad / Good
Bad:
    kernel void my_func_a(global float *src, global float *dst) {
        local float l_var[32];
        …
    }
    kernel void my_func_b(global float *src, global float *dst) {
        // implementation-defined behavior
        my_func_a(src, dst);
    }
Good:
    kernel void my_func_a(global float *src, global float *dst, local float *l_var) {
        …
    }
    kernel void my_func_b(global float *src, global float *dst, local float *l_var) {
        my_func_a(src, dst, l_var);
    }
Address Space Qualifiers
The type qualifier can be global (or __global), local (or __local), constant (or __constant), or private (or __private)
If the type of an object is qualified by an address space name, the object is allocated in the specified address space (if not specified, then the object is allocated in the private address space)
Pointers to the global address space are allowed as arguments to functions (including kernel functions) and variables declared inside functions. Variables declared inside a function cannot be allocated in the global address space.
Legal and illegal
    kernel void my_func(int *p)          // illegal because generic address
                                         // space name for p is private.
    kernel void my_func(private int *p)  // illegal because memory
                                         // pointed to by p is allocated in
                                         // private.
    void my_func(int *p)                 // generic address space name for p is
                                         // private: legal as my_func is not a
                                         // kernel function
    void my_func(private int *p)         // legal as my_func is not a kernel function
    void my_func(global float4 *vA, global float4 *vB) {
        global float4 *p;  // legal
        global float4 a;   // illegal
    }
Constant Address Space
    …[] = { 0, 1, 2, . . . };  // program scope
    // illegal - program scope variables can be allocated only
    // in the constant address space
    global float wtsB[] = { 0, 1, 2, . . . };
    kernel void my_func(constant float4 *vA, constant float4 *vB) {
        constant float4 *p = vA;  // legal
        constant float a;         // illegal: not initialized
        constant float b = 2.0f;  // legal: initialized with a
                                  // compile-time constant
        p[0] = (float4)(1.0f);    // illegal: p cannot be modified
        ...
Local Address Space
A good analogy for local memory is a user-managed cache. It is preferable to read the required data from global memory (which is an order of magnitude slower) once into local memory and then have the work-items read multiple times from local memory.
    kernel void my_func(global float4 *vA, local float4 *l) {
        local float4 *p;               // legal
        local float4 a;                // legal
        a = 1;
        local float4 b = (float4)(0);  // illegal: cannot be initialized
        if (…) {
            local float c;             // illegal: must be allocated at kernel function scope
            …
        }
    }
Image Convolution Example
Sequential convolution
Specific way to handle images in OpenCL 1.0
Convolution kernel
    __kernel void convolution(__read_only image2d_t sourceImage,
                              __write_only image2d_t outputImage,
                              int rows, int cols,
                              __constant float* filter,
                              int filterWidth,
                              sampler_t sampler) {
    …
Results with a 3x3 filter
Iteration 1, Iteration 2, Iteration 3, Iteration …
...
Synchronization with OpenCL
Example of synchronization inside a kernel (local)
    // Host code
    . . .
    cl_mem input = clCreateBuffer(context, CL_MEM_READ_ONLY, 10*sizeof(float), 0, 0);
    cl_mem intermediate = clCreateBuffer(context, CL_MEM_READ_ONLY, 10*sizeof(float), 0, 0);
    cl_mem output = clCreateBuffer(context, CL_MEM_WRITE_ONLY, 10*sizeof(float), 0, 0);
    clEnqueueWriteBuffer(queue, input, CL_TR…
    …rg(kernel, 2, 2*sizeof(float), 0);
    size_t localws[1] = {2};
    size_t globalws[1] = {10};
    clEnqueueNDRangeKernel(queue, kernel, 1, N…
    …ress = (get_local_id(0) + 1) % get_local_size(0);
    …[get_global_id(0)] = lData[get_local_id(0)] + lData[otherAddress];
    }
Example of global synchronization
    // Perform setup of platform, context and create buffers
    . . .
    // Create queue leaving parameters as default so queue is in-order
    queue = clCreateCommandQueue(context, devices[0], 0, 0);
    . . .
    clEnqueueWriteBuffer(queue, bufferA, CL_TR…
Events provide a gateway to a command's history: they contain information detailing when the command was placed in the queue, when it was submitted to the device, and when it started and ended execution.
    cl_int clGetEventProfilingInfo (cl_event event, cl_profiling_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret)
Profiling is enabled when creating a command queue by setting the CL_Q…
If no local dimension is given to clEnqueueNDRangeKernel, then OpenCL decides for the programmer
Otherwise it's safer to choose a power of 2
Now code and test how performance is affected when we use this feature for vector addition or simple matrix multiplication
Optimizing matrix multiplication...
With the previous implementation and order-1000 matrices, one work-item per matrix element results in a million work-items (approx. 511 MFLOPS)
In the next version of the program, each work-item will compute a row of the matrix
The NDRange is changed from a 2D range set to match the dimensions of the C matrix to a 1D range set to the number of rows in the C matrix.
My device has four compute units. Hence for an order-1000 matrix we can set the work-group size to 250 and create four work-groups to cover the full size of the problem.
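The sizing rule above (one work-item per row, one work-group per compute unit) can be written down as a small host-side helper. This is a sketch in plain C with hypothetical names (`range1d`, `rows_range`); it assumes the number of rows divides evenly by the number of compute units and ignores the device's maximum work-group size, which a real program would have to query.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for a 1D launch configuration. */
typedef struct {
    size_t global;  /* total number of work-items (one per row of C) */
    size_t local;   /* work-items per work-group                     */
    size_t groups;  /* resulting number of work-groups               */
} range1d;

/* One work-item per matrix row, one work-group per compute unit.
 * Assumption: `rows` is a non-zero multiple of `compute_units`. */
range1d rows_range(size_t rows, size_t compute_units)
{
    assert(rows > 0 && compute_units > 0 && rows % compute_units == 0);
    range1d r;
    r.global = rows;
    r.local  = rows / compute_units;
    r.groups = r.global / r.local;
    return r;
}
```

For the order-1000 matrix and four compute units of the slide, `rows_range(1000, 4)` yields a global size of 1000, a local size of 250 and 4 work-groups.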
Version 1
    // Optimized matrix multiplication kernel
    // Version 1: each work-item computes one row of C
    __kernel void mmul(const int Mdim, const int Ndim, const int Pdim,
                       __global float* A, __global float* B, __global float* C) {
        int k, j;
        int i = get_global_id(0);
        float tmp;
        if (i < Ndim) {
            for (j = 0; j < Mdim; j++) {
                tmp = 0.0;
                for (k = 0; k < Pdim; k++)
                    tmp += A[i*Ndim+k] * B[k*Pdim+j];
                C[i*Ndim+j] = tmp;
            }
        }
    }
In the host code, launching the kernel becomes:
    size_t globalWorkSize[1], localWorkSize[1];
    localWorkSize[0] = HA/4;
    globalWorkSize[0] = HA;
    clEnqueueNDRangeKernel(clCommandQue, clKernel, 1, N…
Our matrix multiplication kernels up to this point have left all three matrices in global memory. This means the computation streams rows and columns through the memory hierarchy (global to private) repeatedly for each dot product.
We can reduce this memory traffic by recognizing that each work-item reuses the same row of A for each row of C that is computed.
Version 2
    // Optimized matrix multiplication kernel
    // Version 2: the row of A is first copied into private memory
    __kernel void mmul(const int Mdim, const int Ndim, const int Pdim,
                       __global float* A, __global float* B, __global float* C) {
        int k, j;
        int i = get_global_id(0);
        float Awrk[1024];   // Beware, we cannot use preprocessing constants!
        float tmp;
        if (i < Ndim) {
            // Copy values in private memory
            for (k = 0; k < Pdim; k++) Awrk[k] = A[i*Ndim+k];
            for (j = 0; j < Mdim; j++) {
                tmp = 0.0;
                for (k = 0; k < Pdim; k++)
                    tmp += Awrk[k] * B[k*Pdim+j];
                C[i*Ndim+j] = tmp;
            }
        }
    }
72/99
72
"he use of pri%ate memor has a ramati impa t onperforman e 3 8 ; MK-,# But a areful onsi eration sho s that hile ea h or6&item
reuses its o n uni:ue ro of >F all the or6&items in a rouprepeate l stream the same olumns of B
De an re u e the o%erhea of mo%in ata from lo almemor if the or6&items in a or6& roup op the olumns ofthe matri? B into lo al memor efore the start up atin theirro s of C.
Rersion ;
    // Optimized matrix multiplication kernel
    // Version 3: the current column of B is shared through local memory
    __kernel void mmul(const int Mdim, const int Ndim, const int Pdim,
                       __global float* A, __global float* B, __global float* C,
                       __local float* Bwrk) {
        int k, j;
        int i = get_global_id(0);
        // Relative to local space
        int iloc = get_local_id(0), nloc = get_local_size(0);
        float Awrk[1024];
        float tmp;
        if (i < Ndim) {
            for (k = 0; k < Pdim; k++) Awrk[k] = A[i*Ndim+k];
            for (j = 0; j < Mdim; j++) {
                // The work-items of the group copy column j of B together
                for (k = iloc; k < Pdim; k = k+nloc) Bwrk[k] = B[k*Pdim+j];
                barrier(CLK_LOCAL_MEM_FENCE);
                tmp = 0.0;
                for (k = 0; k < Pdim; k++)
                    tmp += Awrk[k] * Bwrk[k];
                C[i*Ndim+j] = tmp;
            }
        }
    }
Before launching the kernel:
    clSetKernelArg(*kernel, 6, sizeof(float)*Pdim, NULL);
The goal is to maximize the amount of work per kernel and optimize memory movement
Optimizing image convolution...
Image support in OpenCL (clCreateImage2D, etc.) provides automatic caching and data access transformations that improve memory system performance, especially on GP…
An optimized convolution kernel can be naturally divided into three sections:
1. The caching of input data from global to local memory
2. Performing the convolution
3. The writing of output data back to global memory
Selecting workgroup sizes and caching data
In OpenCL, work-item creation and algorithm design must be considered simultaneously, especially when local memory is used
The first approach is to create the same number of work-items as there are data elements to be cached in local memory
Each element would simply copy one pixel from global to local memory... and then the work-items representing the border pixels would sit idle during the convolution
Consequently, large filter sizes will not allow many output elements to be computed per workgroup
Selecting workgroup sizes and caching data
The second approach is to create fewer work-items than pixels to be cached, so some work-items will have to copy multiple elements and none will sit idle during the convolution
Selecting an efficient workgroup size requires consideration of the underlying memory architecture:
For the AMD 6970 GP…
For an image with dimensions imageWidth and imageHeight, only (imageWidth-paddingPixels) x (imageHeight-paddingPixels) work-items are needed
Because the image will likely not be an exact multiple of the workgroup size, additional workgroups must be created: they will not be fully utilized, and this must be accounted for in the kernel
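A small host-side sketch in plain C shows how these two remarks combine along one dimension: the number of useful work-items is the image size minus the padding, the launched size is that number rounded up to a multiple of the workgroup size, and the kernel needs a bounds test for the surplus work-items. The helper names are hypothetical, and the values used in the usage note (600 pixels, 6 padding pixels, workgroups of 16) are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Round `value` up to the next multiple of `multiple`. */
static size_t round_up(size_t value, size_t multiple)
{
    assert(multiple > 0);
    size_t rem = value % multiple;
    return rem ? value + (multiple - rem) : value;
}

/* Global work size along one dimension: only
 * (image_dim - padding_pixels) work-items produce output, but the
 * launched size must be a multiple of the workgroup size. */
size_t conv_global_size(size_t image_dim, size_t padding_pixels,
                        size_t group_dim)
{
    assert(image_dim > padding_pixels);
    return round_up(image_dim - padding_pixels, group_dim);
}

/* Test that the kernel has to perform: work-items created only to
 * fill up the last workgroup must not write any output. */
int work_item_is_active(size_t gid, size_t image_dim, size_t padding_pixels)
{
    return gid < image_dim - padding_pixels;
}
```

Under these assumed values, 594 work-items are useful and 608 are launched, so the last 14 work-items of the final workgroup stay inactive.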
Computing ND-Range
    // This function takes a positive integer and rounds it up to
    // the nearest multiple of another provided integer
    unsigned int roundUp(unsigned int value, unsigned int multiple) {
        // Determine how far past the nearest multiple the value is
        unsigned int remainder = value % multiple;
        // Add the difference to make the value a multiple
        if(remainder != 0)
            value += (multiple-remainder);
        return value;
    }
    int filterWidth = 7, paddingPixels = (int)(filterWidth…
The process of copying data from global memory to local memory is often the most error-prone operation when writing a kernel
The work-items first need to determine where in global memory to copy from and then ensure that they do not access a region that is outside of their working area or out of bounds for the image.
Caching Data to Local Memory
    __kernel void convolution(
        __global float* imageIn, __global float* imageOut, __constant float* filter,
        int rows, int cols, int filterWidth,
        __local float* localImage, int localHeight, int localWidth) {
        // Determine the amount of padding for this filter
        int filterRadius = (filterWidth…
Performance on both NVIDIA and AMD GP…
The only requirement is to pad the input data with extra columns so that its width becomes a multiple of the X-dimension of the workgroup
But manually padding a data array on the host can be complicated, time-consuming, and sometimes infeasible
To avoid such tedious data fixup, OpenCL has a command called clEnqueueWriteBufferRect to copy a host array into the middle of a larger device buffer
Other improvements include using vector reads: for example, reading float4 data allows us to come closer to achieving peak memory bandwidth than reading float data
On the AMD Radeon 6970, a significant performance gain is achieved by using vector reads... but a slight performance degradation was seen on NVIDIA GP…
    // Perform the convolution
    if(globalRow < rows-padding && globalCol < cols-padding) {
        // Each work item will filter around its start
        // location (from the filter radius left and up)
        float sum = 0.0f;
        int filterIdx = 0;
        // Not unrolled
        for(int i = localRow; i < localRow+filterWidth; i++) {
            int offset = i*localWidth;
            for(int j = localCol; j < localCol+filterWidth; j++)
                sum += localImage[offset+j] * filter[filterIdx++];
        }
        // Write the data out
        imageOut[(globalRow+filterRadius)*cols + (globalCol+filterRadius)] = sum;
    }
    return;

    // Inner loop unrolled
    for(int i = localRow; i < localRow+filterWidth; i++) {
        int offset = i*localWidth+localCol;
        sum += localImage[offset++] * filter[filterIdx++];
        sum += localImage[offset++] * filter[filterIdx++];
        sum += localImage[offset++] * filter[filterIdx++];
        sum += localImage[offset++] * filter[filterIdx++];
        sum += localImage[offset++] * filter[filterIdx++];
        sum += localImage[offset++] * filter[filterIdx++];
        sum += localImage[offset++] * filter[filterIdx++];
    }
On an AMD Radeon 6970, with a 7x7 filter and a 600x400 image, unrolling the innermost loop provides a 2.4x speedup. In general, this produces a substantial speedup on both AMD and NVIDIA devices.
Parallel data reduction
A reduction is an algorithm that converts a large data set into a smaller data set using an operator on each element
A simple reduction example is to compute the sum of the elements in an array, but it could also be min, max, or keep only positive elements, etc.
    float sum_array(float *a, int No_of_elements) {
        float sum = 0.0f;
        for (int i = 0; i < No_of_elements; i++)
            sum += a[i];
        return sum;
    }
With OpenCL, the common way to parallelize a reduction is to divide the input data set between different workgroups on a GP…
Within a workgroup, the reduction is performed over multiple stages
At each stage, work-items sum an element and its neighbor that is one stride away.
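The staged reduction inside one workgroup can be imitated on the CPU. The sketch below is plain C, not a kernel: the inner loop stands in for the work-items that are active during one stage, and the end of each outer iteration is where a device kernel would place a barrier. The function name is hypothetical and the workgroup size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* CPU imitation of a tree reduction inside one workgroup.
 * `sdata` plays the role of the local-memory array, one slot per
 * work-item. Assumption: wg_size is a power of two. */
float workgroup_reduce_sum(float *sdata, size_t wg_size)
{
    assert(wg_size > 0 && (wg_size & (wg_size - 1)) == 0);
    for (size_t offset = wg_size / 2; offset > 0; offset >>= 1) {
        /* One stage: work-item `lid` adds the element that sits
         * `offset` slots away. Only the first `offset` work-items
         * are active; the others idle. */
        for (size_t lid = 0; lid < offset; ++lid)
            sdata[lid] += sdata[lid + offset];
        /* A device kernel would synchronize the workgroup here. */
    }
    return sdata[0];   /* the result ends up in slot 0 */
}
```

Within a stage, reads touch slots at or above `offset` and writes touch slots below it, so the active work-items never conflict; that is why a single barrier per stage is enough on the device.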
Reduction kernel
    // A simple reduction tree kernel where each work group reduces a set
    // of elements to a single value in local memory and writes the
    // resultant value to global memory.
    __kernel void reduction_kernel(
        unsigned int N,           // number of elements to reduce
        __global float* input, __global float* output,
        __local float* sdata) {
        // Get index into local data array and global array
        unsigned int localId = get_local_id(0), globalId = get_global_id(0);
        unsigned int groupId = get_group_id(0), wgSize = get_local_size(0);
        // Read in data if within bounds
        sdata[localId] = (globalId<N) ? input[globalId] : 0;
        // Synchronize since all data needs to be in local memory and visible to all work items
        barrier(CLK_LOCAL_MEM_FENCE);
        // Each work item adds two elements in parallel. As stride increases, work items remain idle.
        for(int offset = wgSize; offset > 0; offset >>= 1) {
            if (localId < offset && localId + offset < wgSize)
                sdata[localId] += sdata[localId + offset];
            barrier(CLK_LOCAL_MEM_FENCE);
        }
        // Only one work item needs to write out result of the work group's reduction
        barrier(CLK_LOCAL_MEM_FENCE);
        if ( localId == 0 )
            output[groupId] = sdata[0];
    }
Improving reduction performance (see "OpenCL Optimization Case Study: Simple Reductions")
At each step of the reduction tree, the active work-items get sparser and sparser.
This leads to poor SIMD efficiency: we have only about 30% of the work-items active, on average.
Reductions using atomics: operations such as atom_add() can reduce the partial results from each local reduction. But they are limited to the operators and data-types supported by the platform
Two-stage reduction: the input is divided up into chunks large enough to keep all of the processors busy. The final local reduction is performed sequentially, which improves efficiency compared to the full-parallel multi-stage reduction
CPU reduction (min) kernel
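The two-stage idea can be sketched in plain C for a min reduction. This is an illustration under stated assumptions, not the kernel from the slide: the function name `two_stage_min`, the contiguous chunking and the cap of 64 workers are all choices made for the example. Stage 1 imitates the workers that each scan one chunk sequentially; stage 2 combines the few partial results.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define MAX_WORKERS 64   /* arbitrary cap chosen for this sketch */

/* Two-stage min reduction, imitated sequentially on the CPU. */
float two_stage_min(const float *data, size_t n, size_t workers)
{
    assert(data && n > 0 && workers > 0 && workers <= MAX_WORKERS);
    float partial[MAX_WORKERS];
    size_t chunk = (n + workers - 1) / workers;   /* ceil(n / workers) */

    /* Stage 1: every worker reduces its own contiguous chunk with a
     * simple sequential loop (these loops run in parallel on a device). */
    for (size_t w = 0; w < workers; ++w) {
        size_t begin = w * chunk;
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        float acc = FLT_MAX;              /* neutral element for min */
        for (size_t i = begin; i < end; ++i)
            if (data[i] < acc)
                acc = data[i];
        partial[w] = acc;
    }

    /* Stage 2: combine the partial results, one per worker. */
    float result = partial[0];
    for (size_t w = 1; w < workers; ++w)
        if (partial[w] < result)
            result = partial[w];
    return result;
}
```

Workers whose chunk falls past the end of the input simply keep the neutral element `FLT_MAX`, so they do not affect the final minimum.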
The histogram of an image provides a frequency distribution of pixel values in the image.
We have either a single histogram if the luminosity is used as the pixel value or three histograms if the R, G, and B color channel values are used.
The principle of the histogram algorithm is to perform an operation over each pixel of the image:
    for (many input values)
        histogram[value]++;
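The pseudocode above translates directly into a sequential C function. This sketch assumes a single channel stored with 8 bits per pixel, hence 256 bins; the function name and the `NUM_BINS` constant are chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_BINS 256   /* assumption: 8 bits per channel */

/* Sequential histogram of one 8-bit channel: bins[v] counts how many
 * pixels have the value v. */
void histogram_u8(const unsigned char *pixels, size_t n,
                  unsigned int bins[NUM_BINS])
{
    assert(bins && (pixels || n == 0));
    memset(bins, 0, NUM_BINS * sizeof bins[0]);   /* start from zero */
    for (size_t i = 0; i < n; ++i)
        bins[pixels[i]]++;                        /* histogram[value]++ */
}
```

The increment is a read-modify-write on a shared array, which is harmless here but becomes a race as soon as several work-items update the same bin concurrently.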
Sequential Histogram
… bits per channel
Atomic operations
Since many work-items execute in parallel in a workgroup, we cannot guarantee the ordering of read-after-write dependencies on our local histogram bins.
We could reproduce the histogram bins multiple times, but this would require a copy of each bin for each work-item in the group
The alternative solution is to use hardware atomics: any time two threads operate on a shared variable concurrently, and one of those operations performs a write, both threads must use atomic operations
    kernel void histogram_image_rgba(image2d_t img, int num_pixels_per_workitem,
                                     global uint *histogram) {
        int local_size = get_local_size(0) * get_local_size(1);
        int image_width = get_image_width(img);
        int image_height = get_image_height(img);
        // Offset of this workgroup's partial histogram (256 bins x 3 channels)
        int group_indx = 256 * 3 * (get_group_id(1) * get_num_groups(0) + get_group_id(0));
        int x = get_global_id(0);
        int y = get_global_id(1);
        local uint tmp_histogram[256 * 3];
        int tid = get_local_id(1) * get_local_size(0) + get_local_id(0);
        int j = 256 * 3;
        int indx = 0;
        int i, idx;
        …
Thrust