16

CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

.Con epts ofHigh Performan e FEM Simulation

and Realization in the Feast SoftwareS. Turek + Feast GroupM. Altieri, Chr. Be ker, S. Buijssen, S. Kilian, D. Kuzmin, ...Institut f�ur Angewandte Mathematik & NumerikUniversit�at Dortmundhttp://www.featflow.de

1

Page 2: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

Classi� ation of PDE software:For edu ation: MATLAB, et .) `play{around' with Mathemati s, `easy' user interfa e) simple, but robust algorithms, easy appli able) `independent of implementation/language'For resear h: FEAT, UG, DIFFPACK, KARDOS, LIMA, et .) open for numeri al and algorithmi hanges) exible, but robust data stru tures, reusable omponents) `independent of user interfa e'For appli ations:) `optimal play{together of numeri s and implementation') `real life' on�gurations, PC/Workstation/Super omputer) robustness and eÆ ien y`Criti al' example: CFD

2

Page 3: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

`Realisti ' CFD appli ations:� omplex domains, anisotropi meshes in spa e and time� 103 � 105 time steps + 104 � 108 unknowns� optimal ontrol of user{de�ned physi al quantities*Numeri s+ Algorithms+ Implementation+ Hardware platforms+DFG Ben hmark "Channel ow around ylinder"

0.16m1.95m

2.5m

0.1m

0.15m0.45m0.41m

0.41m

(0,0,H)(0,0,0)

(0,H,0)

Inflow plane

Outflow plane

z

U=V=W=0

U=V=W=0

U=V=W=0

0.1m y

x

� laminar 2D + 3D� stationary (Re = 20)� nonstationary (Re = 100)� lift l, drag l ???

3

Page 4: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

Example I in 3D (`Poisson Problem'):� 100 � 100 � 100 grid points =) Problem size: N = 106� Complexity of GE: N 7=3 � 1014 FLOPs100 se on a 1 TFLOP/s omputer� Complexity of (opt.) MG: 1; 000N � 109 FLOPs100 se on a 10 (!!!) MFLOP/s omputer

Example II in 3D (`Poisson Problem'):� 1; 0003 grid points =) Problem size: N = 109� Complexity of GE: N 7=3 � 1021 FLOPs106 se on a 1 P(!!!)FLOP/s omputer� Complexity of (opt.) MG: 1; 000N � 1012 FLOPs1,000 se on a 1 (!!!) GFLOP/s omputer

+Huge improvements by modern Numeri s !!!� Dis retization te hniques� Iterative solvers (Krylov-spa e, MG)4

Page 5: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

Typi al results for `realisti ' appli ations:� STAR-CD (k � �), 500,000 ells (by DaimlerChrysler)� SGI Origin2000 (6 pro essors), 6.5 h (CPU)*`Optimal' multigrid � 1 se /subproblem ???Quantity Experiment Simulation Di�eren eDrag (` w') 0.165 0.317 92 %Lift (` a') -0.083 -0.127 53 %

�MV multipli ation in FEATFLOW (F77 !!!)Computer #Unknowns CM TL STO13,688 22 20 19SUN E450 54,256 17 15 13(� 250 MFLOP/s) 216,032 16 14 6862,144 16 15 4

5

Page 6: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

Sparse Matrix-Ve tor te hniquesDO 10 IROW=1,NDO 10 ICOL=KLD(IROW),KLD(IROW+1)-110 Y(IROW)=Y(IROW)+A(ICOL)*X(KCOL(ICOL))! Standard te hnique in most FEM or FV odes! Storage of `non-zero' matrix elements only (CSR, CSC, ...)! A ess via index ve tors, linked lists, pointers, et

Sparse Banded MV te hniquesK M

! Typi al for FD odes on `tensorprodu t meshes'! Storage of `non-zero' elements in bands/diagonals! MV multipli ation `bandwise' and `window-oriented'! Sparse Banded Blas () FEAST !!!)

6

Page 7: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

Results on lo ally stru tured meshes3D ase N SPARSE SBB-V SBB-C MG-V MG-CDEC 21264 173 164 (150) 446 765 342 500(500 MHz) 333 64 (54) 240 768 233 474`DS20' 653 72 (24) 249 713 196 447IBM RS6K/597 173 86 (81) 179 480 171 368(160 MHz) 333 81 (16) 170 393 152 300`SP2' 653 81 (8) 178 393 150 276INTEL PII 173 29 (28) 56 183 48 136(400 MHz) 333 29 (24) 53 139 47 116`LOW COST' 653 29 (19) 54 125 45 101

Sparse MV te hniques� MFLOP/s rates far away from `Peak Performan e'� Depending on problem size + numbering� `Old' (IBM PWR2) partially faster then `new' (IBM P2SC)� PC partially faster than pro essors in `super omputers' !!!+

`Most adaptive odes should run on PC's'

7

Page 8: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

Sparse Banded Blas MV te hniques� `Super omputing' power gets visible (up to 800 MFLOP/s)�Warning: Hard work !!! (! Sparse Banded Blas)� However: `Optimal' (lo al) multigrid solver possible !!!*1997 National Te hnology Roadmap for Semi ondu torsYear 1997 1999 2003 2006 2009 2012Transistors/ hip 11M 21M 76M 200M 520M 1.4B1 Billion Transistors (� 100 !)10 Ghz lo k rate (� 20 !)+1 PC `faster' than omplete CRAY T3E today !Memory a ess ???Pro essors are very `sensible' super omputers ???

8

Page 9: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

FEAST proje tNumeri s and implementationte hniques adapted to hardware !!!*

Pre ise knowledge of pro essor hara teristi sfor di�erent tasks in MG for FEM spa es*

(Colle tion of) FEAST INDICES+

1) Get the optimum !!!2) Prevent from dramati performan e losses !!!

9

Page 10: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

Examples for performan e losses:Memory management

DOUBLE PRECISION DX(8000000)DO 220 I = 1,ITER(J)220 CALL DAXPY(N(J),ALPHA,DX(1),INCX,DX(1+N(J)+JSTEP),INCX)

0

20

40

60

80

100

120

140

160

180

0 20000 40000 60000 80000 100000 120000JSTEP for N=4225 on SUN ULTRA II

20

25

30

35

40

45

50

55

60

2500 3000 3500 4000 4500 5000 5500 6000 6500 7000 7500JSTEP for N=1050625 on CRAY T3E+Ca he asso iativity ???

10

Page 11: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

Compiler IGNU (EGCS F77) vs. vendor-optimized ompilers ???DEC 21164 PC/LINUX, in Ca he!!!ULTRIX F77: INDEX 59 - MV-C 830 MFLOP/s !!!EGCS F77: INDEX 37 - MV-C 196 MFLOP/s !!!

Compiler IIF90 vs. F77 ompilers ???SUN ULTRA II, in Ca he!!!F77: INDEX 49 - MV-C 404 MFLOP/s !!!F90: INDEX 24 - MV-C 64 MFLOP/s !!!

Programming LanguagesC vs. F77 ompilers (SUN E3500) ???N = 10252 F77 CDAXPY-C 30 14DAXPY-V 16 12DAXPY-I 8 7

N = 652 F77 CDAXPY-C 228 62DAXPY-V 166 42DAXPY-I 109 3611

Page 12: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

`My' on lusions:Everybody an/must (!) test the own omputational performan e!

Most re ent (FEM) odes are NOT slower on PC'sthan on `High Performan e' pro essors!! However: there is `super omputing power' available!!!Compromise between `reusable exibility' and`ma hine-dependent eÆ ien y' is required!Modern Numeri s has to onsider re ent andfuture hardware trends!! (ma ro-oriented) adaptivity on epts !!!! S aRC as improved MG/DD solver !!!! MPSC: Balan e of FLOPs and data a ess+

`Hardware-Oriented Numeri s'12

Page 13: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

Hardware-Oriented Numeri s for PDEs:1) Pat h-oriented adaptivity`Many' (lo al) tensorprodu t grids (S-B Blas)`Few' (lo al) unstru tured grids (Sparse)

2) Generalized MG-DD solver: S aRCFind lo ally `regular' stru tures (eÆ ien y)Re ursive ` lustering' of anisotropies (robustness)`Strong lo al solvers improve global onvergen e !'3) Navier-Stokes solvers: Full Galerkin MPSCDevelop Navier-Stokes solvers with more `arithmeti 'than `memory intensive' operations

`Exploit lo al regularity !!!'13

Page 14: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

Example: Automotive Aerodynami s (DaimlerChrysler)

Parallel `two level S aRC' for thePressure{Poisson problem+Adaptivity via modifying lo altensorprodu t-like meshesNl S aRCg32 0.03hmin � 10�4 64 0.04(`I') 128 0.0432 0.03hmin � 10�8 64 0.04(`II') 128 0.04`I' Nl TGSl `II' TGSl32 0.05 0.05AR � 1 64 0.06 AR � 1 0.06128 0.06 0.0632 0.01 0.01AR � 10 64 0.01 AR � 105 0.01128 0.01 0.01

`Robustness + A ura y'14

Page 15: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

MFLOP/s rates of Sparse Banded BlasNEQ MV-V MV-C TGS-V TGS-C MGJ-V MGJ-C MGT-V MGT-CCray T3E 652 82 391 59 143 96 391 84 281(600 MHz) 2572 72 293 49 118 75 194 70 171`MPP' 10252 47 282 35 86 54 163 51 142Cray T90 652 855 414 51 52 842 450 183 162(440 MHz) 2572 932 887 124 125 912 698 275 257`Ve tor' 10252 939 1091 223 215 930 950 422 418SUN E3500 652 298 520 67 70 262 382 158 188(400 MHz) 2572 271 588 64 69 252 408 153 190`Shared' 10252 51 224 39 65 62 153 57 117AMD K7 652 182 522 110 159 188 444 167 317(700 MHz) 2572 82 283 55 127 87 189 83 175`LINUX' 10252 68 237 51 98 65 131 63 125`EÆ ien y'

15

Page 16: CFD - TU Dortmund...Concepts of High P erformance FEM Sim ulation and Realization in the Feast Soft w are S. T urek + Feast Gr oup M. Altieri, Chr. Bec k er, S. Buijssen, Kilian, D

.High Performan e FEM/FV Simulation:

(International) Cooperation for thedesign + realization of exible software foredu ation, resear h and industrial appli ations !

16