Stochastic modelling of diffusion equations on a parallel machine

Computer Physics Communications 76 (1993) 159-183, North-Holland


M. Mastrangelo, V. Mastrangelo ¹

Equipe de Recherche CNAM-Université Paris VI, "Modélisation Physique et Stochastique", 292 rue Saint Martin, 75141 Paris Cedex 03, France

D. Gassilloud and F. Simon
Institute of Systems Engineering and Informatics, Advanced Computing and 3D Image Processing Laboratories, CEC, JRC-Ispra Site, 21020 Ispra (Va), Italy

Received 23 July 1992; in revised form 30 November 1992

We present the parallelization of the code "MIXAGE" 3D on the T-Node tandem of JRC-Ispra. This code numerically solves parabolic systems of partial differential equations. These equations, which govern many physical, chemical or biological phenomena, describe time-dependent diffusion in heterogeneous media. We mainly use stochastic differential equations associated with the equation $\partial f/\partial t = \nabla(D\nabla f)$. Moreover, we define the evolution operators corresponding to the different physical phenomena. By a process that we call "mixing", we construct the general solution, considering all the physical phenomena simultaneously.

For the implementation of the code "MIXAGE" 3D on the T-Node, we have chosen geometric parallelization. Using a 7 × 7 processor matrix, the CPU time reached with the T-Node is of the same order as with the CRAY2 machine.

1. Basic description of the T-Node/Tandem machine of the JRC-Ispra [3]

The T-Node/Tandem is a massively parallel machine, based on a reconfigurable and modular network of 64 transputers (T800). Each of these basic elements of the T-Node system has its own environment: processor, memory, and links for communication between transputers. The power of the T-Node is thus a function of the number of these elements. The theoretical power of each transputer is 10 Mips and 1.5 Mflops. The topology of the transputer network is defined by the user (see fig. 1).

One of the major features of the T-Node architecture is the possibility for the user to reconfigure the network topology entirely and easily. The T-Node is connected to a host system for communication with the outside world.

1.1. Architecture of the T-Node [9]

In the T-Node system, the communication channels (links) of all the processors are connected to a switching device, which can modify the network topology for the program to be run. This switching device is simply called the "switch"; it allows the user to obtain the optimal network topology for the program he wants to run. One transputer pilots the switch, and is able to set a non-blocking and

Correspondence to: V. Mastrangelo, Equipe de Recherche CNAM-Université Paris VI, "Modélisation Physique et Stochastique", 292 rue Saint Martin, 75141 Paris Cedex 03, France.
¹ Visiting Scientist of CEC, JRC-Ispra Site.

0010-4655/93/$06.00 © 1993 - Elsevier Science Publishers B.V. All rights reserved


Fig. 1. User-defined transputer network topologies.

re-arrangeable network. The switch is also used for communication with the outer environment, such as the host system. The control system of the network is able to partition the network into independent subnetworks: several users thus have access to the resources of the T-Node system. Each transputer is connected to a "control bus system" through a specific component, the "control gate array". This system is based on a controller bus, independent of the links in the network. The transputer which controls this system is the master of the bus, and manages the synchronization in a fast and efficient way. Moreover, this structure offers hardware support for an interactive debugger, without affecting the links (see fig. 2).

The T-Node/Tandem consists of two connected T-Nodes of 32 transputers each. The connection is made directly between the two switches, and there is no need for a higher-level electronic switching device. One of the two controllers is the master of the control bus of the T-Node/Tandem system.

Access to the T-Node/Tandem is through a host machine, a T-4000 (Unix). The switching device of the T-Node/Tandem is commanded by the host machine. Generally speaking, the control operations of the T-Node/Tandem are carried out by the host.

The host moreover stores a specific software environment for the transputers. The connection between the host machine and the T-Node machine is made by means of a transputer called "ROOT", which is also installed on the host machine.

1.2. Operating systems and software environment for the T-Node

Two software environments are available to program the T-Node:
- the C or Fortran 3L language;
- the C or Fortran language under the Helios system.
Only the most commonly used languages are quoted here; others are, for instance, OCCAM, PASCAL, STRAND, ASSEMBLER, ADA, etc. [18].

The first one allows only one user to work on the T-Node. The second one offers the means to work in multi-user mode. In both cases, the host machine must have one or several interfacing cards, each possessing one transputer called "ROOT", so as to permit access to the T-Node for one or several users.

1.3. 3L environment

We have used FORTRAN 3L to develop the parallelization of the "MIXAGE" 3D program (see section 2).


[Figure 2: schematic titled "T.Node Parallel Computers", showing the two T-Nodes, each with its SWITCH, and the Master controller.]

Fig. 2. The T-Node/Tandem.


Fig. 3. One application consisting of three processes running on one processor (left) and three processors (right).

1.3.1. Logical model of programming

The 3L programming model is based on the CSP model (communicating sequential processes) [15]. An application is composed of a set of concurrently running sequential processes; each task must be considered as a black box possessing internal states. Its execution begins at a precise moment and proceeds at its own speed. A task may be composed of concurrent subtasks (parts run in parallel). The only means of communication between tasks is through one or several exchange channels. Each channel links two concurrent tasks by a serial and unidirectional communication path. It is possible to have any number of channels between two tasks. All communication accomplished by means of these channels is synchronized. A synchronization is a mutual waiting of two tasks to exchange information. This exchange mechanism is automated, i.e. the programmer specifies only a write instruction in the transmitting task and a read instruction in the receiving task. This model is the very same as for OCCAM programming [10,11,20].

1.3.2. Physical implementation

The INMOS transputer architecture has been fitted to the OCCAM language so as to implement the notions of tasks and exchange channels [20]. Each task is assigned to a transputer and each channel is mapped onto a physical link. Nevertheless, there is a hardware limitation on these exchanges because the existing generation of transputers has only four physical links. The future transputer T-900 will make it possible to overcome this restriction [14]. At present, the transputer T800 does not permit all the exchange possibilities of the OCCAM and FORTRAN 3L languages to be exploited.

As in sequential machines, it is possible to run several tasks by time-sharing on one transputer. In this way it is possible to simulate an application on one transputer (see fig. 3).

The FORTRAN 3L compiler [12] takes up the sequential FORTRAN 77 language and implements a comprehensive set of procedures for expressing and exploiting parallelism.

The instructions

CALL F77-CHAN-INT-MESSAGE(),
CALL F77-CHAN-OUT-MESSAGE(),

allow a task to read one piece of information from, or write it to, other concurrent tasks [17].
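The channel model described above (unidirectional, synchronized exchange between concurrent tasks) can be sketched in Python. This is only an illustration of the rendezvous idea, not the 3L or OCCAM API: the `Channel` class, the task functions and `run_pipeline` are hypothetical names introduced here, and Python threads stand in for transputer tasks.

```python
import queue
import threading


class Channel:
    """Hypothetical unidirectional channel with synchronized exchange."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)

    def write(self, message):
        self._slot.put(message)  # hand the message over
        self._slot.join()        # wait until the reader has taken it

    def read(self):
        message = self._slot.get()
        self._slot.task_done()   # release the waiting writer
        return message


def producer(out_chan, values):
    """Task 1: sends each value on its output channel."""
    for value in values:
        out_chan.write(value)


def squarer(in_chan, out_chan, count):
    """Task 2: reads a value, squares it, and forwards the result."""
    for _ in range(count):
        out_chan.write(in_chan.read() ** 2)


def run_pipeline(values):
    """Connects the two tasks with channels and collects the results."""
    a, b = Channel(), Channel()
    tasks = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=squarer, args=(a, b, len(values))),
    ]
    for task in tasks:
        task.start()
    results = [b.read() for _ in values]
    for task in tasks:
        task.join()
    return results
```

Calling `run_pipeline([1, 2, 3])` passes each value through both tasks in turn; each `write` returns only after the matching `read`, which mirrors the mutual waiting described in the text.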


2. Stochastic resolution of systems of parabolic partial differential equations and physical applications

We summarize hereafter the theory of the stochastic solution of multigroup diffusion equations developed for nuclear reactor theory and published in refs. [2,4-6].

2.1. Statement of the problem

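Let $\mathbb{R}^d$ ($d \le 3$) be the real d-dimensional space and let $r = (r_1, r_2, \ldots, r_d)$ be the points of this space. We consider an open bounded set $\Omega$ with external boundary $\partial\Omega$. We note $\omega = \Omega \cup \partial\Omega$, where $\partial\Omega \in C^2$ and its curvature is restricted.

$D_g$, $X_{gh}$, $C_i$, $Q_g$, $\mu_{ih}$, with $h, g \in \{1, 2, \ldots, G\}$ and $i \in \{1, 2, \ldots, I\}$, are functions which are defined on $\omega \times [0, +\infty[$. We consider the following system of parabolic partial differential equations:

\[
\frac{1}{v_g}\frac{\partial \phi_g}{\partial t}(r,t) = \nabla\bigl[D_g(r,t)\,\nabla\phi_g(r,t)\bigr] + \sum_{h=1}^{G} X_{gh}(r,t)\,\phi_h(r,t) + \sum_{i=1}^{I} \lambda_i C_i(r,t) + Q_g(r,t), \tag{1}
\]

\[
\frac{\partial C_i}{\partial t}(r,t) = -\lambda_i C_i(r,t) + \sum_{h=1}^{G} \mu_{ih}(r,t)\,\phi_h(r,t), \tag{2}
\]

where the symbols have their usual meaning and

\[
r \in \Omega, \quad t \in [0, +\infty[, \quad h, g \in \{1, 2, \ldots, G\}, \quad i \in \{1, 2, \ldots, I\}.
\]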

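The initial conditions are represented by

\[
\phi_g(r, 0) = \phi_g^{(0)}(r), \qquad \phi_g(r,t)\big|_{\partial\Omega} = 0.
\]

We integrate eq. (2) to eliminate $C_i(r,t)$ from (1):

\[
C_i(r,t) = C_i(r,0)\,\exp(-\lambda_i t) + \int_0^t \exp\bigl(-\lambda_i (t-s)\bigr) \sum_{h=1}^{G} \mu_{ih}(r,s)\,\phi_h(r,s)\,ds
\]

and

\[
\frac{\partial \phi_g}{\partial t}(r,t) = v_g \nabla\bigl[D_g(r,t)\,\nabla\phi_g(r,t)\bigr] + \sum_{h=1}^{G} v_g X_{gh}(r,t)\,\phi_h(r,t) + \sum_{i=1}^{I} a_g^i(r,t)
+ \sum_{i=1}^{I}\sum_{h=1}^{G} \int_0^t \nu_{gh}^i(r,s,t)\,\phi_h(r,s)\,ds + v_g Q_g(r,t), \tag{3}
\]

where

\[
\nu_{gh}^i(r,s,t) = v_g \lambda_i\,\mu_{ih}(r,s)\,\exp\bigl(-\lambda_i (t-s)\bigr), \qquad a_g^i(r,t) = v_g \lambda_i\,C_i(r,0)\,\exp(-\lambda_i t).
\]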

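To solve (3), we study successively the resolution of

\[
\frac{\partial \phi_g^{(1)}}{\partial t}(r,t) = v_g \nabla\bigl[D_g(r,t)\,\nabla\phi_g^{(1)}(r,t)\bigr], \tag{3a}
\]

\[
\frac{\partial \phi_g^{(2)}}{\partial t}(r,t) = \sum_{h=1}^{G} v_g X_{gh}(r,t)\,\phi_h^{(2)}(r,t), \tag{3b}
\]

\[
\frac{\partial \phi_g^{(3)}}{\partial t}(r,t) = \sum_{i=1}^{I} a_g^i(r,t), \tag{3c}
\]

\[
\frac{\partial \phi_g^{(4)}}{\partial t}(r,t) = \sum_{i=1}^{I}\sum_{h=1}^{G} \int_0^t \nu_{gh}^i(r,s,t)\,\phi_h^{(4)}(r,s)\,ds, \tag{3d}
\]

\[
\frac{\partial \phi_g^{(5)}}{\partial t}(r,t) = v_g Q_g(r,t), \quad \text{for } g \in \{1, 2, \ldots, G\}. \tag{3e}
\]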

The theory that we have formulated is valid when the coefficients are sufficiently regular, but at the level of the calculations discontinuities do not modify the proposed scheme. This is due to the discretization of the method. A theory is now being developed to include the problem of discontinuities using skew Brownian motions.

2.2. Solution of partial differential equations (3a)-(3e)

2.2.1. Equation (3a)

We carry out the study of this equation in section 2.4. The notations defined there lead us to put [2,5] (see fig. 5):

\[
\bigl(M^{(1)}(s,t)\,f(r)\bigr)_g = E[f_g(\ldots)],
\]

where $M^{(1)}$ is the operator giving the approximate solution of eq. (3a).

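2.2.2. Equation (3b)

We put $x_{gh}(r,t) = v_g X_{gh}(r,t)$, and eq. (3b) is written

\[
\frac{\partial \phi_g^{(2)}}{\partial t}(r,t) = \sum_{h=1}^{G} x_{gh}(r,t)\,\phi_h^{(2)}(r,t).
\]

We suppose the coefficients $x_{gh}$ are continuous with respect to $t$. We put

\[
\phi^{(2)}(r,t) = \begin{pmatrix} \phi_1^{(2)}(r,t) \\ \phi_2^{(2)}(r,t) \\ \vdots \\ \phi_G^{(2)}(r,t) \end{pmatrix}
\quad\text{and}\quad
x(r,t) = \begin{pmatrix} x_{11}(r,t) & x_{12}(r,t) & \cdots & x_{1G}(r,t) \\ \vdots & & & \vdots \\ x_{G1}(r,t) & x_{G2}(r,t) & \cdots & x_{GG}(r,t) \end{pmatrix}.
\]

With the preceding notations, eq. (3b) becomes

\[
\frac{\partial \phi^{(2)}}{\partial t}(r,t) = x(r,t)\,\phi^{(2)}(r,t).
\]

We search $\phi^{(2)}(r,s,t)$ for $t \ge s$ and $s \in \mathbb{R}_+$, satisfying

\[
\phi^{(2)}(r,s,s) = \phi^{(0)}(r).
\]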

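We define $\exp\bigl[{*}\int_s^t x(r,u)\,du\bigr]$ in the following way:

\[
\exp\Bigl[{*}\!\int_s^t x(r,u)\,du\Bigr] = \lim_{n\to\infty} \exp\bigl[(t-p_n)\,x(r,p_n)\bigr]\,\exp\Bigl[\tfrac{1}{n}\,x\bigl(r,p_n-\tfrac{1}{n}\bigr)\Bigr] \times \cdots \times \exp\Bigl[\tfrac{1}{n}\,x\bigl(r,s+\tfrac{1}{n}\bigr)\Bigr]\,\exp\Bigl[\tfrac{1}{n}\,x(r,s)\Bigr].
\]

We put for all integers $n > 1$,

\[
p_n = \tfrac{1}{n}\,\mathrm{Ent}[n(t-s)] + s,
\]

where $\mathrm{Ent}[n(t-s)]$ is the greatest integer not exceeding $n(t-s)$:

\[
M_n^{(2)}(r,s,t) = \exp\bigl[(t-p_n)\,x(r,p_n)\bigr] \times \cdots \times \exp\Bigl[\tfrac{1}{n}\,x\bigl(r,s+\tfrac{1}{n}\bigr)\Bigr]\,\exp\Bigl[\tfrac{1}{n}\,x(r,s)\Bigr].
\]

In paper [5] the operators $M_n^{(2)}(r,s,t)$ are proved to be differentiable on the right with respect to $t$, and the right derivatives converge, when $n$ tends to infinity, to the t-continuous operator:

\[
x(r,t)\,\exp\Bigl[{*}\!\int_s^t x(r,u)\,du\Bigr].
\]

The function $\phi^{(2)}(r,s,t)$ satisfying eq. (3b) and

\[
\phi^{(2)}(r,s,s) = \phi^{(0)}(r)
\]

is written

\[
\phi^{(2)}(r,s,t) = \exp\Bigl[{*}\!\int_s^t x(r,u)\,du\Bigr]\,\phi^{(0)}(r).
\]

We introduce the operator

\[
M^{(2)}(s,t) = \exp\Bigl[{*}\!\int_s^t x(r,u)\,du\Bigr],
\]

which gives us

\[
\phi^{(2)}(r,s,t) = M^{(2)}(s,t)\,\phi^{(0)}(r) \quad (t \ge s).
\]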
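The finite product that defines the operator for eq. (3b) can be illustrated numerically. The following is a minimal sketch, assuming a fixed point r so that the coefficient matrix depends on time only; the helper names (`mat_mul`, `mat_exp`, `product_exponential`) are introduced here for illustration, and the matrix exponential is a plain truncated Taylor series suited to small matrices only.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_scale(a, factor):
    return [[factor * entry for entry in row] for row in a]


def mat_exp(a, terms=30):
    """Truncated Taylor series of exp(a); adequate for small norms."""
    size = len(a)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_scale(mat_mul(term, a), 1.0 / k)
        result = [[result[i][j] + term[i][j] for j in range(size)]
                  for i in range(size)]
    return result


def product_exponential(x, s, t, n):
    """Finite product with step 1/n; x(u) returns the matrix at time u."""
    steps = math.floor(n * (t - s))        # Ent[n(t - s)]
    p_n = s + steps / n
    size = len(x(s))
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for k in range(steps):
        factor = mat_exp(mat_scale(x(s + k / n), 1.0 / n))
        result = mat_mul(factor, result)   # later times multiply on the left
    last = mat_exp(mat_scale(x(p_n), t - p_n))
    return mat_mul(last, result)
```

For a diagonal (hence commuting) test matrix such as `lambda u: [[u, 0.0], [0.0, 2.0 * u]]`, the product should approach the ordinary exponential of the integrated matrix as n grows, which gives a simple way to check the construction.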

2.2.3. Equation (3c)

We can write eq. (3c)

\[
\frac{\partial \phi_g^{(3)}}{\partial t}(r,t) = \sum_{i=1}^{I} a_g^i(r,t),
\]

where we assume that the functions $a_g^i$ are continuous with respect to time. The solution is given by

\[
\phi_g^{(3)}(r,t) = \phi_g^{(3)}(r,s) + \int_s^t \sum_{i=1}^{I} a_g^i(r,u)\,du.
\]

We introduce the operator $M^{(3)}(s,t)$, for which

\[
\phi_g^{(3)}(r,t) = \bigl(M^{(3)}(s,t)\,f(r)\bigr)_g = f_g(r) + \int_s^t \sum_{i=1}^{I} a_g^i(r,u)\,du,
\]

with the condition

\[
\phi_g^{(3)}(r,s) = f_g(r).
\]

2.2.4. Equation (3d)

Equation (3d) is written

\[
\frac{\partial \phi_g^{(4)}}{\partial t}(r,t) = \sum_{i=1}^{I}\sum_{h=1}^{G} \int_0^t \nu_{gh}^i(r,s,t)\,\phi_h^{(4)}(r,s)\,ds.
\]

If, for instance, $s$ and $t$ are two positive reals and $f$ a bounded function, we define the operator $M^{(4)}(s,t)$ by the following formula:

\[
\bigl(M^{(4)}(s,t)\,f\bigr)_g(r) = f_g(r,s) + (t-s) \sum_{i=1}^{I}\sum_{h=1}^{G} \int_0^s \nu_{gh}^i(r,u,s)\,f_h(r,u)\,du.
\]

It is easy to prove that this operator gives the solution of eq. (3d).

2.2.5. Equation (3e)

We write again eq. (3e)

\[
\frac{\partial \phi_g^{(5)}}{\partial t}(r,t) = v_g Q_g(r,t), \qquad \phi_g^{(5)}(r,s) = f_g(r).
\]

We solve this equation in the same way as eq. (3c), putting

\[
\phi_g^{(5)}(r,t) = \phi_g^{(5)}(r,s) + \int_s^t v_g Q_g(r,u)\,du,
\]

introducing the translation operator defined by

\[
\bigl(M^{(5)}(s,t)\,f\bigr)_g(r) = f_g(r) + \int_s^t v_g Q_g(r,u)\,du.
\]

This operator gives the solution of eq. (3e).


2.3. Mixing the operators associated with eqs. (3a)-(3e) in order to determine the global solution of eq. (3) (see fig. 5)

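Let $s$ and $t$ be two real numbers where $0 \le s \le t$. We put $p_n = s + n^{-1}\,\mathrm{Ent}[n(t-s)]$, where $\mathrm{Ent}[x]$ is the greatest integer not exceeding the positive real number $x$. We define the affine operators

\[
\begin{aligned}
M_n(s,t)\,f(r) = \Bigl\{ & M^{(1)}(p_n,t) \circ M^{(2)}(p_n,t) \circ M^{(3)}(p_n,t) \circ M^{(4)}(p_n,t) \circ M^{(5)}(p_n,t) \\
& \circ M^{(1)}\bigl(p_n - \tfrac{1}{n},\,p_n\bigr) \circ \cdots \circ M^{(5)}\bigl(p_n - \tfrac{1}{n},\,p_n\bigr) \\
& \circ \cdots \circ M^{(1)}\bigl(s + \tfrac{1}{n},\,s + \tfrac{2}{n}\bigr) \circ \cdots \circ M^{(5)}\bigl(s + \tfrac{1}{n},\,s + \tfrac{2}{n}\bigr) \\
& \circ M^{(1)}\bigl(s,\,s + \tfrac{1}{n}\bigr) \circ \cdots \circ M^{(5)}\bigl(s,\,s + \tfrac{1}{n}\bigr)\,f \Bigr\}(r),
\end{aligned}
\]

where $f$ is a function defined on $\Omega \times [0, s]$, valued in $\mathbb{R}^G$.

We suppose that the coefficients of eq. (3), $D_g$, $\nabla D_g$, $X_{gh}$, $\nu_{gh}^i$ and $S_g$, are sufficiently regular, that for any interval $[0, T]$ their norms on $\Omega \times [0, T]$ are bounded, and that $S_g(r,t)$ is null in a neighbourhood of $\partial\Omega \times [0, T]$.

We suppose, also, that $f$ is of class $C^1$ with respect to time and of class $C^2$ with respect to space, with bounded partial derivatives. Then the function $\phi(r,t)$ is defined by

\[
\phi(r,t) = \begin{cases} f(r,t) & \forall t \le s, \\ M(s,t)\,f(r) = \lim_{n\to\infty} M_n(s,t)\,f(r) & \forall t \ge s, \end{cases}
\]

on $\Omega \times [0, +\infty[$; it is of class $C^1$ with respect to time and $C^2$ with respect to space and, on the interval $[s, +\infty[$, it satisfies eq. (3) in a functional space. A detailed proof of the corresponding theorem is given in ref. [5]. $M_n(s,t)f$ satisfies "nearly" the same eq. (3). The difference between the second members is $\sigma(n, r)$. This latter function converges to zero when $n \to \infty$, uniformly on any compact set of $\Omega \times [s, +\infty[$ (see ref. [5]).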
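The "mixing" construction is a composition of the elementary operators over sub-intervals of length 1/n. A minimal sketch on a scalar model problem follows, assuming the toy equation dφ/dt = xφ + q with constant x and q: one operator of the type used for eq. (3b) and one translation operator of the type used for eq. (3e). The function names `mix`, `m2` and `m5` and the numerical values are illustrative assumptions, not taken from the MIXAGE code.

```python
import math


def mix(operators, s, t, n, f):
    """Composes the operators on each sub-interval [s + k/n, s + (k+1)/n].

    `operators` is listed left to right as in the composition formula, so
    the last one in the list acts first on each sub-interval.
    """
    steps = math.floor(n * (t - s))       # Ent[n(t - s)]
    p_n = s + steps / n
    value = f
    for k in range(steps):
        a, b = s + k / n, s + (k + 1) / n
        for op in reversed(operators):
            value = op(a, b, value)
    for op in reversed(operators):        # remaining piece [p_n, t]
        value = op(p_n, t, value)
    return value


X_COEFF = -1.0   # assumed constant coefficient of the linear term
Q_SOURCE = 2.0   # assumed constant source


def m2(a, b, value):
    """Operator for d(phi)/dt = x * phi on [a, b]."""
    return math.exp(X_COEFF * (b - a)) * value


def m5(a, b, value):
    """Translation operator for d(phi)/dt = q on [a, b]."""
    return value + (b - a) * Q_SOURCE


def exact(t, phi0):
    """Closed-form solution of d(phi)/dt = x * phi + q with phi(0) = phi0."""
    growth = math.exp(X_COEFF * t)
    return growth * phi0 + Q_SOURCE * (growth - 1.0) / X_COEFF
```

Comparing `mix([m2, m5], 0.0, 1.0, n, 1.0)` with `exact(1.0, 1.0)` for increasing n shows the mixed solution approaching the exact one, which is the behaviour the limit above expresses.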

2.4. Study of the operator $M^{(1)}$ in the three-dimensional space [4,5] (see fig. 4)

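To construct the operator $M^{(1)}$ associated with eq. (3a), we use the Taylor stochastic formula [1]. After a translation on the real time line, we can suppose that the starting time is $s = 0$. We then consider the parabolic differential equation

\[
\frac{\partial \phi}{\partial t}(r,t) = \sum_{i=1}^{3} \beta_i(r,t)\,\frac{\partial \phi}{\partial r_i}(r,t) + \frac{1}{2}\,a(r,t) \sum_{i=1}^{3} \frac{\partial^2 \phi}{\partial r_i^2}(r,t), \tag{4}
\]

\[
\lim_{t \to 0} \phi(r,t) = f(r),
\]

where $\phi$, $\beta_i$, $a$ and $f$ are functions defined on $\mathbb{R}^3 \times [0, T]$,

\[
\beta_i(r,t) = v\,\frac{\partial D}{\partial r_i}(r,t), \quad i = 1, 2, 3, \qquad a = 2vD(r,t).
\]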


[Figure 4: trajectories of the process starting from $r$ at time $s$, exiting $\Omega$ either after or before the time $t$, together with the successive operators applied on each sub-interval. The function $f$ is defined and continuous on $\bar\Omega = \Omega \cup \partial\Omega$ and is null in a neighbourhood of $\partial\Omega$.]

Fig. 4.


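As $a$ is positive we can put $a = \sigma^2$. We suppose that $\beta$ and $\sigma$ are sufficiently regular, such that $|\beta(r,t)| + |\sigma(r,t)| \le K(1 + |r|)$ and such that their partial derivatives have polynomial growth at infinity. Let us put $X(t) = (X_1(t), X_2(t), X_3(t))$, a standard Brownian motion in $\mathbb{R}^3$, and $Y_r(t)$ the stochastic process that is the solution of the stochastic differential equation associated with eq. (4):

\[
dY(t) = b(Y(t), t)\,dt + \sigma(Y(t), t)\,dX(t), \tag{5}
\]

\[
Y(0) = r,
\]

with $Y(t) = (Y_1(t), Y_2(t), Y_3(t))$ and $b = (b_1, b_2, b_3)$ the drift vector, whose components are the $\beta_i$ of eq. (4).

We note $T$ the exit time of $Y(t)$, such that the process $Y(t)$ stays in the domain $\Omega$ for any time between $0$ and $T$. For each $\varepsilon > 0$ we consider the diffusion $Y^\varepsilon(t) = Y(\varepsilon^2 t)$. Introducing in (5) the Brownian motion $X^\varepsilon(t) = (1/\varepsilon)\,X(\varepsilon^2 t)$, of the same law as $X(t)$, we obtain

\[
dY^\varepsilon(t) = \varepsilon^2 \beta(Y^\varepsilon(t), \varepsilon^2 t)\,dt + \varepsilon\,\sigma(Y^\varepsilon(t), \varepsilon^2 t)\,dX^\varepsilon(t) \tag{6}
\]

or

\[
dY^\varepsilon(t) = B(Y^\varepsilon(t), \varepsilon, t)\,dt + C(Y^\varepsilon(t), \varepsilon, t)\,dX(t), \tag{7}
\]

with

\[
B(y, \varepsilon, t) = \varepsilon^2 \beta(y, \varepsilon^2 t), \qquad C(y, \varepsilon, t) = \varepsilon\,\sigma(y, \varepsilon^2 t).
\]

The diffusion $Y_r^\varepsilon$, solution of (7) conditioned by $Y_r^\varepsilon(0) = r$, is written as a Taylor series in powers of $\varepsilon$. The coefficients are stochastic processes $g_j$. For the fourth order, for example, it is written as

\[
Y_r^\varepsilon(t) = r + \sum_{j=1}^{5} \varepsilon^j g_j(t) + \varepsilon^5 R_5^\varepsilon(t) \quad \text{for } t < T. \tag{8}
\]

The stochastic processes $g_j$ are semi-martingales and the remainder $R_5^\varepsilon$ verifies

\[
\lim_{\varepsilon \to 0} P\Bigl\{ \sup_{s \in [0,t]} |R_5^\varepsilon(s)| \ge \rho;\ t < T \Bigr\} = 0.
\]

This series is unique. We write for each component

\[
Y_{r,i}^\varepsilon(t) - r_i = \sum_{j=1}^{5} \varepsilon^j g_{i,j}(t) + \varepsilon^5 R_{5,i}^\varepsilon(t), \tag{9}
\]

\[
dY_{r,i}^\varepsilon(t) = \sum_{j=1}^{5} \varepsilon^j\,dg_{i,j}(t) + \varepsilon^5\,dR_{5,i}^\varepsilon(t) \quad (i = 1, 2, 3), \tag{10}
\]

where $g_{i,j}$ is the $i$th component of the stochastic process $g_j$. We put:

\[
B_{j,k,m,n}^{i}(y,t) = \frac{1}{j!\,k!\,m!\,n!}\,\frac{\partial^{\,j+k+m+n} B_i}{\partial\varepsilon^j\,\partial y_1^k\,\partial y_2^m\,\partial y_3^n}(y, 0, t),
\]

\[
C_{j,k,m,n}(y,t) = \frac{1}{j!\,k!\,m!\,n!}\,\frac{\partial^{\,j+k+m+n} C}{\partial\varepsilon^j\,\partial y_1^k\,\partial y_2^m\,\partial y_3^n}(y, 0, t).
\]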


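Identifying (10) and (7), written component by component, and replacing the $B^i$ and $C$ by their Taylor expansions, we obtain:

\[
\begin{aligned}
dY_{r,i}^\varepsilon(t) &= \sum_{j=1}^{5} \varepsilon^j\,dg_{i,j}(t) + \varepsilon^5\,dR_{5,i}^\varepsilon(t) \\
&= \Bigl[ \sum_{j+k+m+n \le 5} B_{j,k,m,n}^{i}\,\varepsilon^j \Bigl(\sum_{l=1}^{5} \varepsilon^l g_{1,l}(t)\Bigr)^{k} \Bigl(\sum_{l=1}^{5} \varepsilon^l g_{2,l}(t)\Bigr)^{m} \Bigl(\sum_{l=1}^{5} \varepsilon^l g_{3,l}(t)\Bigr)^{n} + O(\varepsilon^5) \Bigr]\,dt \\
&\quad + \Bigl[ \sum_{j+k+m+n \le 5} C_{j,k,m,n}\,\varepsilon^j \Bigl(\sum_{l=1}^{5} \varepsilon^l g_{1,l}(t)\Bigr)^{k} \Bigl(\sum_{l=1}^{5} \varepsilon^l g_{2,l}(t)\Bigr)^{m} \Bigl(\sum_{l=1}^{5} \varepsilon^l g_{3,l}(t)\Bigr)^{n} + O(\varepsilon^5) \Bigr]\,dX_i(t).
\end{aligned} \tag{11}
\]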

Then, we use the fact that the coefficients of $\varepsilon^j$ ($j = 1, \ldots, 4$) are the same in the right and the left terms to obtain the stochastic differential equations determining the processes $g_{i,j}$ satisfying $g_{i,j}(0) = 0$.

As the domain $\Omega$ is bounded, we must pay attention to the exit of the trajectories out of it. But, using the Lindeberg property (see ref. [5]), we know that, for any $r$ in $\Omega$,

\[
t^{-1} P\bigl(Y_r(t) \in \complement\Omega\bigr) \le t^{-1} P(T < t) \to 0 \quad (t \to 0).
\]

So, we can suppose, in a first approximation, that $T \ge t$ for every trajectory; the trajectories for which this relation is false belong to a set whose probability is small and can be neglected. A good approximation of the solution of (4) (exact if the coefficients are time-independent) is given by

\[
M^{(1)}(0, \varepsilon^2 t)\,f(r) = E\bigl[f(Y_r^\varepsilon(t))\bigr].
\]

The calculation involves the expectations of the processes $g_{i,j}$. They are obtained progressively, and the calculation essentially makes use of the Itô formula, the Fubini theorem and the martingale properties of the stochastic integrals.
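The probabilistic representation of the operator as an expectation over trajectories can be illustrated by direct simulation. The sketch below is not the authors' method, which evaluates the expectation analytically through the Taylor stochastic formula; it is a plain Euler-Maruyama Monte Carlo estimate in one dimension, ignoring exits from the domain as in the first approximation described above. The function name, the path count and the seed are assumptions made for illustration.

```python
import math
import random


def simulate_expectation(f, r, t, beta, sigma,
                         n_paths=20000, n_steps=10, seed=0):
    """Estimates E[f(Y_r(t))] for dY = beta dt + sigma dX (1D, no boundary)."""
    rng = random.Random(seed)
    dt = t / n_steps
    total = 0.0
    for _ in range(n_paths):
        y, time = r, 0.0
        for _ in range(n_steps):
            noise = rng.gauss(0.0, 1.0) * math.sqrt(dt)  # Brownian increment
            y += beta(y, time) * dt + sigma(y, time) * noise
            time += dt
        total += f(y)
    return total / n_paths
```

With a constant diffusion coefficient such that a = 2vD = 1 (so σ = 1 and β = 0) and f(y) = y², the exact expectation is r² + t, so `simulate_expectation(lambda y: y * y, 1.0, 0.1, lambda y, s: 0.0, lambda y, s: 1.0)` should return a value close to 1.1, up to sampling error.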

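The calculation requires the partial derivatives of $f$. We denote

\[
f_{r_i} = \frac{\partial f}{\partial r_i}, \qquad f_{r_i r_j} = \frac{\partial^2 f}{\partial r_i\,\partial r_j}.
\]

The approximate value of $M^{(1)}(s,t)f = E\{f \circ Y_r(t)\}$ is given by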

M”~(r, t)f(r) =f(r) + u~~ B~oofr.(r)+ ~C~000~ f~(r)

i=1

+ U2{~0 ~frt(r) + ~ ~fr1r~(r)~ ~00C~000+ C~00~C1~,1812~13)

i #j

+ ~Lfr~(r) c~00(B~~+ C1OOOC1a~2as)

+ ~ ~ f~(r) (~(B~~+B~~)C~O+ B~ooB~o)I,) = 1


aB°~+ ~ f~(r)(C~0y1+ ~Y2) + ~ ~ f~1(r) ~ B~JLa,2&J3B~IJ~IO+ —~--—(r,0)

i—i i=1 j=1

+ c12~( B~x2~x2a))) } + ~(u5/2), (12)

where $f_{r_k r_j}(r) = (\partial^2/\partial r_k\,\partial r_j) f(r)$, where $\delta$ is the Kronecker symbol and where

= —~(r, 0) + ~ (B~0C1~.,~.28.3+ C~00OC1(2o!X2aX233))

and

3....ç’2 c’2

72 — ‘~1000 ~

i= 1

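Remark 1

In the case where the coefficient of diffusion $D$ is constant, eq. (12) simplifies considerably:

\[
\bigl(M^{(1)}(s, s+u)\,f\bigr)(r) = f(r) + u\bigl[vD\,\bigl(f_{r_1 r_1}(r) + f_{r_2 r_2}(r) + f_{r_3 r_3}(r)\bigr)\bigr]
+ u^2\Bigl[\tfrac{1}{2} v^2 D^2 \bigl(f_{r_1^4}(r) + f_{r_2^4}(r) + f_{r_3^4}(r) + 2 f_{r_1^2 r_2^2}(r) + 2 f_{r_1^2 r_3^2}(r) + 2 f_{r_2^2 r_3^2}(r)\bigr)\Bigr] + O(u^3). \tag{13}
\]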

Remark 2

To deduce the value of $M^{(1)}$ in dimension 1 or 2, the calculation is similar. For example, in dimension 1, we fix $b_2 = b_3 = 0$ and we consider only the first component of the Brownian motion, considering that the second and third components of $C$ are identically null. In all the summations of the formula giving $M^{(1)}$ we consider only the index $i = 1$. For example, in eq. (1), we now have

\[
\beta(r,t) = B_{20} = v_g\,\frac{\partial D_g}{\partial r}(r,t) \quad\text{and}\quad a = C_{10}^2 = 2 v_g D_g(r,t).
\]

Equation (12) is then written

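\[
\begin{aligned}
\bigl(M^{(1)}(s, s+u)\,f\bigr)(r) = f(r) &+ u\bigl[f'(r)\,\beta(r,0) + \tfrac{1}{2} f''(r)\,a(r,0)\bigr] \\
&+ u^2\Bigl[\bigl(\tfrac{1}{2}\beta(r,0)\,\beta'(r,0) + \tfrac{1}{4}\beta''(r,0)\,a(r,0) + \tfrac{1}{2}\beta_t(r,0)\bigr) f'(r) \\
&\qquad + \bigl(\tfrac{1}{2}\beta'(r,0)\,a(r,0) + \tfrac{1}{4}\beta(r,0)\,a'(r,0) + \tfrac{1}{8} a''(r,0)\,a(r,0) + \tfrac{1}{2}\beta^2(r,0) + \tfrac{1}{4} a_t(r,0)\bigr) f''(r) \\
&\qquad + \bigl(\tfrac{1}{2}\beta(r,0)\,a(r,0) + \tfrac{1}{4} a'(r,0)\,a(r,0)\bigr) f^{(3)}(r) + \tfrac{1}{8} a^2(r,0)\,f^{(4)}(r)\Bigr] + O(u^{5/2}),
\end{aligned} \tag{14}
\]

where primes denote derivatives with respect to $r$ and the subscript $t$ a derivative with respect to time.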

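In reality the remainder is of order $O(u^3)$. In the case where $D_g(r,t) = D_g$ is constant on $\Omega \subset \mathbb{R}$, eq. (14) takes the simple form

\[
\bigl(M^{(1)}(s, s+u)\,f\bigr)_g(r) = f_g(r) + u\,v_g D_g\,f_g''(r) + \tfrac{1}{2} u^2 v_g^2 D_g^2\,f_g^{(4)}(r) + O(u^3). \tag{15}
\]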
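As a cross-check on eq. (15), and not as a transcription of the authors' derivation, the constant-coefficient case can be recovered from the standard semigroup expansion of the one-dimensional generator. The symbol $L$ is introduced here for illustration only.

```latex
\begin{aligned}
L &= \beta\,\partial_r + \tfrac{1}{2}\,a\,\partial_r^2,
\qquad
E\bigl[f(Y_r(u))\bigr] = f + u\,Lf + \tfrac{u^2}{2}\,L^2 f + O(u^3)
\quad\text{(time-independent coefficients)},\\[4pt]
\beta = 0,\ a = 2 v_g D_g
&\;\Longrightarrow\;
Lf = v_g D_g\,f'', \qquad L^2 f = v_g^2 D_g^2\,f^{(4)},\\[4pt]
E\bigl[f(Y_r(u))\bigr] &= f + u\,v_g D_g\,f'' + \tfrac{u^2}{2}\,v_g^2 D_g^2\,f^{(4)} + O(u^3).
\end{aligned}
```

The same pattern in three dimensions, with $L = vD\,\Delta$, gives a second-order term $\tfrac{1}{2} v^2 D^2 \Delta^2 f$, whose expansion contains the pure and mixed fourth derivatives that appear in eq. (13).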


3. A test problem for the T-Node: stochastic solution of diffusion equation system [3]

The class of problems treated is that of convection-diffusion equations: heat diffusion or particle diffusion in heterogeneous media. We have chosen as a test problem the code "MIXAGE" 3D, which simulates particle (neutron) diffusion in a three-dimensional heterogeneous medium.

Before presenting the parallelization principle of the code "MIXAGE" 3D on the T-Node/Tandem, we summarize the numerical solution method. It is conditioned by the mathematical modelling used. This is called "constructive" because, at each time step, one constructs the solution of the physical problem by composition of the mathematical operators (three in our example), each associated with a subsystem derived from the initial system, whose numerical resolution is easier (see section 2). On the flowchart (see fig. 5) the different phases of the calculation are presented as the three operators $M^{(1)}$, $M^{(2)}$ and $M^{(3)}$.

3.1. Parallelization principle of "MIXAGE" 3D code and choice of processor network topology

The model of the physical problem possesses an inherent geometrical parallelism [7,8,17,21]. We first developed a sequential "MIXAGE" 3D code (which takes the geometrical parallelism into account). The sequential code is duplicated and the "copies" are distributed over several processors. In other words, the spatial calculation domain in $\mathbb{R}^3$ (a parallelepiped) is divided into subdomains. Each of these is assigned to a processor on which there is a "copy" of the code. Afterwards, one must establish the communications necessary for information exchange between the subdomains. The objective is to adapt, in the best possible way, the processor network configuration to the specific problem treated. This can be done well with a matrix partition of the spatial domain. In the case considered, the spatial domain is a parallelepiped with a square base; the subdomains are also parallelepipeds with a square base, n × n in number. The choice of the larger dimension rests on the nature of the desired results.

With the help of the flowchart in fig. 5 (see also appendix), the various operations can be summarized as follows. The first phase is to distribute the physical and geometric data over the different processors. Then, simultaneously, the exponential of the matrix is calculated on the ROOT processor and the initial functions or vectors for each subdomain are calculated on the different processors. As soon as the processors have finished the initial-condition phase, the resulting exponentials of the matrix are distributed over the network from the ROOT processor to all the processors. Finally, each processor carries out the first iteration and the following ones for the subdomain assigned to it.

We note that when the calculation phase involves three-point second derivatives (operator $M^{(1)}$), data are exchanged with the adjacent processors. The exchange is made by means of logical channels connecting several tasks (at the physical level these correspond to transputer links). For this to be possible, synchronous blocking communications between tasks must be performed. This is achieved automatically by means of transfer instructions inserted in the source program [22].

In this way we obtain the different segments of the resulting vector; these are sent to the operator $M^{(3)}$ in order to start a new iteration. At the end of N iterations, if one wishes to show the corresponding results on the screen of the host machine, the processor dedicated to input-output collects them and transfers them to the ROOT processor during the (N + 1)-th iteration.
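The exchange of boundary data between adjacent subdomains can be sketched as follows. This is a simplified one-dimensional illustration in Python, not the FORTRAN 3L code: each block plays the role of one processor, the "exchange" phase copies the edge value of each neighbour, and the three-point second difference is then applied independently on each block. The function names, the zero boundary value and the assumption that the grid size is a multiple of the number of blocks are choices made for this sketch.

```python
def second_difference_step(values, left, right, coeff):
    """Applies v_i + coeff * (v_{i-1} - 2 v_i + v_{i+1}) on one block."""
    padded = [left] + list(values) + [right]
    return [padded[i] + coeff * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)]


def sequential_step(field, coeff):
    """Reference computation on the whole domain (zero outside the domain)."""
    return second_difference_step(field, 0.0, 0.0, coeff)


def decomposed_step(field, n_blocks, coeff):
    """Same step computed block by block after a neighbour exchange."""
    size = len(field) // n_blocks          # assumes an exact division
    blocks = [field[b * size:(b + 1) * size] for b in range(n_blocks)]
    # Exchange phase: each block receives the edge values of its neighbours.
    halos = []
    for b in range(n_blocks):
        left = blocks[b - 1][-1] if b > 0 else 0.0
        right = blocks[b + 1][0] if b < n_blocks - 1 else 0.0
        halos.append((left, right))
    # Compute phase: every block can now be updated independently.
    updated = [second_difference_step(blocks[b], halos[b][0], halos[b][1], coeff)
               for b in range(n_blocks)]
    return [value for block in updated for value in block]
```

For any field whose length is a multiple of the number of blocks, `decomposed_step` returns the same values as `sequential_step`, which is the property the geometric parallelization relies on.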

3.2. Network topology [19]

As the spatial domain (in $\mathbb{R}^3$) is a parallelepiped with a square base, it is natural, in order to preserve the symmetry of the problem, that the subdomains be of the same class.


[Figure 5: flow chart with the following steps: input of physical and geometrical data; obtention of operator M2; calculus of initial functions; then, for each time step N: calculus of operator M3, calculus of operator M2, calculus of operator M1; output of results.]

Fig. 5. Flow chart of MIXAGE 3D code.


[Figure 6: grid of processors labelled u-p-11 to u-p-77, with the root processor, the processor dedicated to output, and the numbered logical channels between neighbouring processors.]

Fig. 6. Square matrix (7 × 7) processor network topology.

In any fixed axial plane, the base of a subdomain is square. This domain decomposition is a natural one for the problem and ensures that the load is well balanced. Our choice fell on the square matrix (7 × 7) processor network.

In order to have efficient input-output transfers and, consequently, an efficient calculation, it is necessary to provide a processor dedicated to managing them. For the application considered, the output of the results of the three-dimensional calculation is the function along the line of coordinates f(y, ½x, ½z). The processor assigned to the corresponding subdomain must transmit the results to the processor dedicated to input-output. The latter occupies a "principal" position in the processor network (see figs. 6 and 7).

3.3. Presentation of the results

We have considered a cubic spatial domain and have discretized it into (23 × 23 × 23) points. For this test case we have deliberately reduced the volume of the data set so as to limit the handling of the data directory. The objective is to evaluate the computing performance of the T-Node in relation to other machines, especially the CRAY2.

The "MIXAGE" 3D code has been optimised in vectorial mode on a VP 200 (Siemens) and a CRAY2, respectively, by the Computer Science Department of CNRS at Orsay and the "Centre de Calcul Vectoriel pour la Recherche" (CCVR) at Palaiseau.


[Figure 7: exchanges between the data-set file, the root processor, processors u-p-44, u-p-53 and u-p-55, the output processor and the results file. Legend:
LDGE: read-out of geometrical data and dispatching
CEM: calculus of exponential matrix
CFI: calculus of initial functions
LEME: read-out of exponential matrix and dispatching
RDG: receipt of geometrical data
REM: receipt of exponential matrix
CPTi: calculus for time step i (uses EDCDP)
EDCDP: exchange of data for the calculus of partial derivatives
LRPTE: read-out of results for the time step and dispatching]

Fig. 7. Flow chart and details of exchanges between processors.

Note that, in the meantime, in the research group CNAM-University Paris 6 "Modélisation physique et stochastique", we have carried out calculations on realistic configurations and have obtained three-dimensional flux distributions (see fig. 8) *.

3.3.1. Carrying out of test case on the CRAY2 of CCVG

Of the vectorial version of the code "Mixage" 3D [22] there exists an "autotasking" version which runs on the CRAY2 of the "Centre de Calcul Vectoriel de Grenoble (CCVG)", accessible from JRC-Ispra through a dedicated line. The "autotasking" option brings automatic parallelization into play on the CRAY2. It is a logical extension of the "microtasking" option, where the user inserts directives in his program in order to isolate portions of the program which can be performed in parallel on several processors. The parallelism is accomplished essentially at the level of nested "DO" loops; the outermost loop is parallelized and the inner loop is vectorized. In some cases it is difficult for the preprocessor to detect the parallelism; the user can then guide it by adding information with the help of directives [16]. We have used this option, for instance, to inhibit the dependence analysis of variables declared in equivalence. We have performed two calculations, the first with 100 time steps and the second with 1000 time steps. The CPU times are given in table 1. The results have been displayed only for the last time step.

* The computing resources have been allocated to the group by the scientific council of the "Centre de Calcul Vectoriel pour la Recherche" at Palaiseau.


Fig. 8. Thermal flux distribution, plane 2.

Table 1
Execution CPU time (s)

Mode                   100 time steps    1000 time steps
"autotasking" mode     21.2              211.17
vectorial mode         19                212.29

The execution of "Mixage" 3D in the "autotasking" mode brings, in practice, no gain with respect to the vectorial option of the program. This is not due to a particular difficulty in the "autotasking" option; we simply see that the automatic partition of the program into tasks on the four processors is such that one of these is very heavily loaded at the expense of the three others. In other words, the compiler cannot isolate independent parts in the source program. It is necessary to do a "manual" parallelization of the program. It will also be necessary to consider a step analogous to that which has been used for the T-Node, i.e. a "geometric" parallelization similar to the one we have used there; of course, in the first case the memory is common to the four processors of the CRAY2.

In order to make a comparison with the CPU times of the T-Node, it is more advisable to consider only the vectorial version of the "Mixage" 3D program. If we suppose, in addition, that in the next version the multitask percentage is of the order of 99%, the CPU time of the available vectorial version will be divided by a factor which is very near to 4 (maximal theoretical value). The new CPU time would then be 5 s and 58 s, respectively, for 100 and 1000 time steps. Nevertheless, it must be borne in mind that the vectorial version of "Mixage" 3D can still be improved on the CRAY2 **.

** The optimisation of the vectorization has been made with the help of the (very powerful) tools of the V.P. Siemens of the "Computer Science Center" of CNRS at Orsay.


Table 2

Execution                        100 time steps    1000 time steps
Sequential mode                  230.29 s          2270.90 s
Parallel mode, network (3 × 3)   22.084 s          218.59 s
  Efficiency                     94.8%             94.4%
Parallel mode, network (7 × 7)   4.74 s            45.3 s
  Efficiency                     95.3%             98.3%

3.3.2. Performance of test case on the T-Node/Tandem of the JRC-Ispra

We have performed three types of calculations. The first one is in sequential mode on one processor; the second with a network of (3 × 3) processors and the third with a network of (7 × 7) processors. We have obtained the CPU times listed in table 2.

One defines an efficiency rate with the relation:

\[
E = \frac{\text{CPU time in sequential mode}}{\text{CPU time in parallel mode} \times \text{number of processors used}}
\]

where the number of processors used is counted as in the footnote *. The results were displayed at the last time step only. We observed a clear decrease of CPU time as more processors were added. We also note high values of the efficiency rate (about 95% or more). These give evidence of the high degree of parallelization of the application and of the optimal utilization of the hardware resources used. For a (7 × 7) network of processors we obtained CPU times of the same order as on the CRAY2.
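The efficiency values of table 2 can be reproduced from the CPU times, provided the processor count includes the two additional processors mentioned in the footnote (root and input-output processor), i.e. n × n + 2. The sketch below uses the table 2 figures; the function name `efficiency` is introduced here for illustration.

```python
def efficiency(t_sequential, t_parallel, n):
    """Efficiency rate E for an (n x n) network plus two extra processors."""
    processors = n * n + 2   # matrix of processors + root + input-output
    return t_sequential / (t_parallel * processors)


# CPU times from table 2 (seconds).
E_3X3_100 = efficiency(230.29, 22.084, 3)
E_3X3_1000 = efficiency(2270.90, 218.59, 3)
E_7X7_100 = efficiency(230.29, 4.74, 7)
E_7X7_1000 = efficiency(2270.90, 45.3, 7)
```

With 11 processors for the (3 × 3) network and 51 for the (7 × 7) network, these four ratios agree with the percentages listed in table 2 to within rounding; counting only the n × n computing processors would give values above 100% for the (3 × 3) case.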

A delicate point on a distributed-memory machine is the time needed to transfer results to the host machine, relative to the running time on the processors. To assess it, runs have also been made with the results displayed at each time step. The performance is largely preserved, but the efficiency rate decreases as the number of processors increases (7 × 7). This is because the transfer time of the results is no longer negligible compared with the running time: the workload of each processor decreases as the number of processors increases (see tables 3 and 4).
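The effect described above can be quantified from the measured times. The sketch below takes the 100-step CPU times with display at the last time step (table 2) and at each time step (table 3), and attributes their difference to the transfer of results; this attribution, and the name `transfer_share`, are assumptions made for illustration.

```python
def transfer_share(t_display_last, t_display_each):
    """Fraction of the each-step run time attributed to result transfers."""
    return (t_display_each - t_display_last) / t_display_each


# 100 time steps, CPU times in seconds
share_3x3 = transfer_share(22.084, 22.263)
share_7x7 = transfer_share(4.74, 5.512)
print(f"3x3 network: {100 * share_3x3:.1f}% of the run time")
print(f"7x7 network: {100 * share_7x7:.1f}% of the run time")
```

On the 7 × 7 network the transfers account for roughly 14% of the run, against under 1% on the 3 × 3 network, which is consistent with the drop in efficiency.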

Finally, we carried out much longer simulations with a network of (7 × 7) processors. They show that the CPU time varies linearly with the number of time steps (see table 5). We also wanted to check the stability of the method used. This stability cannot be judged from the absolute values of the functions or result vectors, since the simulation is of a dynamical system. However, one can get an idea of it by taking, at a given point in space, the ratio of the fast and thermal flux components.

From table 5, we can see that this ratio remains constant in time to a good approximation.
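Both observations can be checked directly on the table 5 data. The sketch below computes the CPU time per time step and the relative spread of the flux ratio. The values are transcribed from table 5 as best they can be read, so the step counts in particular should be treated as assumptions.

```python
# (CPU time in s, number of time steps, flux ratio at the cube center)
table5 = [
    (543.91, 1e4, 9.9643264),
    (5436.92, 1e5, 9.9643286),
    (10892.15, 2e5, 9.9643280),
    (16336.64, 3e5, 9.9643280),
    (54377.31, 1e6, 9.9643288),
    (217529.13, 4e6, 9.9643292),
    (326709.50, 6e6, 9.9643292),
]

per_step = [cpu / steps for cpu, steps, _ in table5]
ratios = [ratio for _, _, ratio in table5]

# Linear law: the cost per time step should not depend on the run length
spread_time = (max(per_step) - min(per_step)) / min(per_step)
# Stability: the flux ratio should stay constant in time
spread_ratio = (max(ratios) - min(ratios)) / min(ratios)

print(f"CPU time per step: {min(per_step):.4f} to {max(per_step):.4f} s")
print(f"relative spread of time per step: {spread_time:.1e}")
print(f"relative spread of flux ratio: {spread_ratio:.1e}")
```

With these values the cost per time step stays close to 0.054 s over almost three decades of run length, and the flux ratio varies only in its seventh significant digit.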

4. Conclusion

This test case shows that in the present state of development of parallel machines, it is not possible to have a general method of parallelization. However, the T-Node possesses a certain flexibility at this level because the network topology can be adapted to the inherent parallelism of the problem considered.

The running times on the T-Node are of the same order as those of the CRAY2, bearing in mind, however, that in this specific case all the output is collected from one processor only. For a three-dimensional flux output, the calculation results would have had to be collected from the various processors.

* The number of processors used is the (n × n) matrix of processors plus two additional processors (the root and the processor dedicated to input-output; see fig. 6).


Table 3

CPU time for 100 time steps

                                  Display at the last time step    Display at each time step

Sequential mode                   230.286 s                        235.085 s

Parallel mode, network (3 × 3)    22.084 s                         22.263 s
  Efficiency                      94.8%                            96.0%

Parallel mode, network (7 × 7)    4.74 s                           5.512 s
  Efficiency                      95.3%                            83.6%

Table 4

CPU time for 1000 time steps

                                  Display at the last time step    Display at each time step

Sequential mode                   2270.90 s                        2321.74 s

Parallel mode, network (3 × 3)    218.59 s                         218.79 s
  Efficiency                      94.4%                            96.5%

Parallel mode, network (7 × 7)    45.30 s                          51.45 s
  Efficiency                      98.3%                            88.5%

Table 5

CPU time (s)    Number of time steps    Fast/thermal flux ratio at center of the cube

543.91          10^4                    9.9643264
5436.92         10^5                    9.9643286
10892.15        2 × 10^5                9.9643280
16336.64        3 × 10^5                9.9643280
54377.31        10^6                    9.9643288
217529.13       4 × 10^6                9.9643292
326709.50       6 × 10^6                9.9643292

If one wants to recover the results from all the processors, all the information must be routed to the input-output processor, which permits a graphic display. To keep the same level of performance, it would then be necessary to set up a level to which the results are transferred.

Given the present state of the scientific libraries available on parallel distributed-memory machines [23-25], it is in practice necessary to have the entire FORTRAN 77 source program. This is the case for “Mixage” 3D, which, we remark, lends itself very well to parallelization on the T-Node/Tandem.

The Helios operating system, available on the T-Node, is a distributed system which permits a more “comfortable” operation of the T-Node. More precisely, once the tasks of the application and the way in which they communicate have been defined, the system distributes the tasks as well as possible over the processors, whose number and topology need not be known a priori by the user [13]. The number of processors and the network topology (interconnections) can be entirely independent of the application, and in all cases the available resources will be used efficiently.


Acknowledgement

This work has been performed during a period as “Visiting Scientist” of one of the authors. Material for this paper has been drawn from many sources. Consequently, it is difficult to acknowledge all those who worked on it. Nevertheless, particular thanks are due to Dr. R.W. Witty, director of ISEI, and Drs. G. Gasini, J. Larisse and J.C. Grossetie, who have collaborated to a successful conclusion of the work.

Appendix. “MIXAGE” 3D parallel code: calculus on a subdomain

      implicit double precision (a-h,o-z)
      include 'chan.inc'
      integer ci_0,co_0,ci_2,co_2,ci_3,co_3,ci_4,co_4
      logical milieu,not_g,not_d,not_h,not_b
      parameter (NN=2,NPRCUR=6,NAR=21,NXY=3)
      parameter (NXY2=2*NXY,NPT=NAR*NXY*NXY,N4=(NXY+1)/2)
      parameter (IDEB=1,IFIN=NAR+1-NXY,M=(NAR+1)/2,IMIL=M-N4+1)
      parameter (NPX=NAR,NPY=NAR*NXY,NN2=NN*NN)
      parameter (M_NXY2=8*NXY2,M_NN2=8*NN2,M_NAR=8*NAR)
      double precision mer(NN,NN),nusfc1,nusfc2,mu2,mu1
      common x11,x21,x12,x22
      equivalence (x11,mer(1,1))
      common/don_c/mu1,mu2,haut,nusfc1,nusfc2,sac1,sac2,diffc1,diffc2
     !,src12,v1,v2,be1,be2,be3,be4,be5,be6,al1,al2,al3,al4,al5,al6,cdlm
     !,n,n5,npast,n8,id_x,id_y
      dimension don_c(28),bet(6),alamb(6)
      equivalence (mu1,don_c(1)),(be1,bet(1)),(al1,alamb(1))
      dimension dxa(NPT),dxb(NPT),dya(NPT),dyb(NPT),ind(NAR,NXY,NXY)
     !,dza(NPT),dzb(NPT),vpa(NPT),vpb(NPT)
      dimension exa(NPRCUR),exb(NPRCUR),e3(NPRCUR)
      dimension ce1(NPT,NPRCUR)
      dimension w0(NXY2),w1(NXY2),w2(NXY2),w3(NXY2)
      dimension wt0(NXY2),wt1(NXY2),wt2(NXY2),wt3(NXY2)
      dimension wa0(NXY),wb0(NXY),wa1(NXY),wb1(NXY)
     !,wa2(NXY),wb2(NXY),wa3(NXY),wb3(NXY)
      equivalence (w0(1),wa0(1)),(w0(1+NXY),wb0(1))
      equivalence (w1(1),wa1(1)),(w1(1+NXY),wb1(1))
      equivalence (w2(1),wa2(1)),(w2(1+NXY),wb2(1))
      equivalence (w3(1),wa3(1)),(w3(1+NXY),wb3(1))
      dimension wta0(NXY),wtb0(NXY),wta1(NXY),wtb1(NXY)
     !,wta2(NXY),wtb2(NXY),wta3(NXY),wtb3(NXY)
      equivalence (wt0(1),wta0(1)),(wt0(1+NXY),wtb0(1))
      equivalence (wt1(1),wta1(1)),(wt1(1+NXY),wtb1(1))
      equivalence (wt2(1),wta2(1)),(wt2(1+NXY),wtb2(1))
      equivalence (wt3(1),wta3(1)),(wt3(1+NXY),wtb3(1))
      dimension rs1(NAR),rs2(NAR)
      dimension vp1(NAR,NXY,NXY),vp2(NAR,NXY,NXY)
      equivalence (vpa(1),vp1(1,1,1)),(vpb(1),vp2(1,1,1))
      equivalence (rs1(1),vp1(1,N4,N4)),(rs2(1),vp2(1,N4,N4))
      data pi/3.14159265358979d0/
      ci_0=F77_CHAN_IN_PORT(0)
      co_0=F77_CHAN_OUT_PORT(0)
      ci_2=F77_CHAN_IN_PORT(2)
      co_2=F77_CHAN_OUT_PORT(2)
      ci_3=F77_CHAN_IN_PORT(3)
      co_3=F77_CHAN_OUT_PORT(3)
      ci_4=F77_CHAN_IN_PORT(4)
      co_4=F77_CHAN_OUT_PORT(4)
      call F77_CHAN_IN_MESSAGE(224,don_c,ci_4)
      n5=1-n5
C
C Controller of the processor priority (sending or receiving)
C during inter-processor exchanges
C
C If a task is in its sending phase, the neighbouring tasks must
C be in their receiving phase, to avoid blocking the information
C
      numero=n5
C
C X and Z indices of the subdomain relative to the whole domain
C
      idx=id_x
      idy=id_y
C
C Computation of the indices for the next processor
C
      id_x=id_x+NXY
      if (id_x.eq.NAR+1) then
        id_x=1
        id_y=id_y+NXY
        if (id_y.eq.NAR+1) then
          id_y=1
          n5=1-n5
        endif
      endif
C
C Transfer of the data stream to the next processor
C
      call F77_CHAN_OUT_MESSAGE(224,don_c,co_2)
C
C Definition of the global boundary and median-line conditions
C
      not_g=idx.ne.IDEB
      not_d=idx.ne.IFIN
      not_h=idy.ne.IDEB
      not_b=idy.ne.IFIN
      milieu=idx.eq.IMIL.and.idy.eq.IMIL
C
      i3=0
      do 1 k=1,NXY
      do 1 j=1,NXY
      do 1 l=1,NAR
      i3=i3+1
    1 ind(l,j,k)=i3
C
C Initialization of the variables and computation of the initial
C functions for the subdomain assigned to the processor
C
      an=n
      dt=1d0/an
      am=M
      dx=haut/(2.d0*am)
      dxx=dx*dx
      p1=v1*diffc1*dt/dxx
      p2=v2*diffc2*dt/dxx
      do 50 i=1,NPRCUR
      exa(i)=dexp(-alamb(i)*dt)
      exb(i)=(1d0-exa(i))*bet(i)/alamb(i)
      e3(i)=alamb(i)*v1*dt
   50 continue
      bm=pi/(2d0*am)
      cfmu=nusfc1*mu1+nusfc2*mu2
      i3=0
      do 2 k=1,NXY
      ak=dabs(dsin(bm*(k+idy-1)))
      do 2 j=1,NXY
      aj=ak*dabs(dsin(bm*(j+idx-1)))
      do 2 l=1,NAR
      al=aj*dabs(dsin(bm*l))
      i3=i3+1
      vpa(i3)=mu1*al
      vpb(i3)=mu2*al
      do 2 i=1,NPRCUR
      ce1(i3,i)=bet(i)*al*cfmu/alamb(i)
    2 continue
C
C Reception and forwarding of the result of the matrix
C exponential computation
C
      call F77_CHAN_IN_MESSAGE(M_NN2,mer,ci_4)
      call F77_CHAN_OUT_MESSAGE(M_NN2,mer,co_2)
      n10=0
   90 n10=n10+1
C
C Start of the computation for the current time step
C
C Computation of the operator "M3"
C
      do 40 i=1,NPRCUR
      do 40 i3=1,NPT
      ce2=ce1(i3,i)*exa(i)
     !+(nusfc1*vpa(i3)+nusfc2*vpb(i3))*exb(i)
      vpa(i3)=vpa(i3)+e3(i)*(ce1(i3,i)+ce2)/2d0
      ce1(i3,i)=ce2
   40 continue
C
C Computation of the operator "M2"
C
      do 73 i3=1,NPT
      vpai3=vpa(i3)
      vpa(i3)=x11*vpai3+x12*vpb(i3)
   73 vpb(i3)=x21*vpai3+x22*vpb(i3)
C
C Definition of the boundary variables shared by a processor
C with its neighbours
C
      do 20 l=1,NAR
      do 21 j=1,NXY
      if (not_h) then
        wa3(j)=vp1(l,j,1)
        wb3(j)=vp2(l,j,1)
      endif
      if (not_b) then
        wa0(j)=vp1(l,j,NXY)
        wb0(j)=vp2(l,j,NXY)
      endif
      if (not_g) then
        wa1(j)=vp1(l,1,j)
        wb1(j)=vp2(l,1,j)
      endif
      if (not_d) then
        wa2(j)=vp1(l,NXY,j)
        wb2(j)=vp2(l,NXY,j)
      endif
   21 continue
C
C Exchange of the boundary variables between processors
C
      if (numero.eq.1) then
C
C Reception, then sending, of the boundary variables
C
      if (not_b) call F77_CHAN_IN_MESSAGE(M_NXY2,wt0,ci_0)
      if (not_g) call F77_CHAN_IN_MESSAGE(M_NXY2,wt1,ci_4)
      if (not_d) call F77_CHAN_IN_MESSAGE(M_NXY2,wt2,ci_2)
      if (not_h) call F77_CHAN_IN_MESSAGE(M_NXY2,wt3,ci_3)
      if (not_h) call F77_CHAN_OUT_MESSAGE(M_NXY2,w3,co_3)
      if (not_d) call F77_CHAN_OUT_MESSAGE(M_NXY2,w2,co_2)
      if (not_g) call F77_CHAN_OUT_MESSAGE(M_NXY2,w1,co_4)
      if (not_b) call F77_CHAN_OUT_MESSAGE(M_NXY2,w0,co_0)
      else
C
C Sending, then reception, of the boundary variables
C
      if (not_h) call F77_CHAN_OUT_MESSAGE(M_NXY2,w3,co_3)
      if (not_d) call F77_CHAN_OUT_MESSAGE(M_NXY2,w2,co_2)
      if (not_g) call F77_CHAN_OUT_MESSAGE(M_NXY2,w1,co_4)
      if (not_b) call F77_CHAN_OUT_MESSAGE(M_NXY2,w0,co_0)
      if (not_b) call F77_CHAN_IN_MESSAGE(M_NXY2,wt0,ci_0)
      if (not_g) call F77_CHAN_IN_MESSAGE(M_NXY2,wt1,ci_4)
      if (not_d) call F77_CHAN_IN_MESSAGE(M_NXY2,wt2,ci_2)
      if (not_h) call F77_CHAN_IN_MESSAGE(M_NXY2,wt3,ci_3)
      endif
C
C Computation of the 3-point second derivatives at the
C boundaries of the subdomain
C
      do 22 j=1,NXY
      i3=ind(l,1,j)
      if (not_g) then
        dxa(i3)=wta1(j)-2d0*vpa(i3)+vpa(i3+NPX)
        dxb(i3)=wtb1(j)-2d0*vpb(i3)+vpb(i3+NPX)
      else
        dxa(i3)=cdlm-2d0*vpa(i3)+vpa(i3+NPX)
        dxb(i3)=cdlm-2d0*vpb(i3)+vpb(i3+NPX)
      endif
      do 23 k=2,NXY-1
      i3=i3+NPX
      dxa(i3)=vpa(i3-NPX)-2d0*vpa(i3)+vpa(i3+NPX)
   23 dxb(i3)=vpb(i3-NPX)-2d0*vpb(i3)+vpb(i3+NPX)
      i3=i3+NPX
      if (not_d) then
        dxa(i3)=wta2(j)-2d0*vpa(i3)+vpa(i3-NPX)
        dxb(i3)=wtb2(j)-2d0*vpb(i3)+vpb(i3-NPX)
      else
        dxa(i3)=vpa(i3-NPX)-2d0*vpa(i3)+cdlm
        dxb(i3)=vpb(i3-NPX)-2d0*vpb(i3)+cdlm
      endif
      i3=ind(l,j,1)
      if (not_h) then
        dya(i3)=wta3(j)-2d0*vpa(i3)+vpa(i3+NPY)
        dyb(i3)=wtb3(j)-2d0*vpb(i3)+vpb(i3+NPY)
      else
        dya(i3)=cdlm-2d0*vpa(i3)+vpa(i3+NPY)
        dyb(i3)=cdlm-2d0*vpb(i3)+vpb(i3+NPY)
      endif
      do 25 k=2,NXY-1
      i3=i3+NPY
      dya(i3)=vpa(i3-NPY)-2d0*vpa(i3)+vpa(i3+NPY)
   25 dyb(i3)=vpb(i3-NPY)-2d0*vpb(i3)+vpb(i3+NPY)
      i3=i3+NPY
      if (not_b) then
        dya(i3)=wta0(j)-2d0*vpa(i3)+vpa(i3-NPY)
        dyb(i3)=wtb0(j)-2d0*vpb(i3)+vpb(i3-NPY)
      else
        dya(i3)=vpa(i3-NPY)-2d0*vpa(i3)+cdlm
        dyb(i3)=vpb(i3-NPY)-2d0*vpb(i3)+cdlm
      endif
   22 continue
   20 continue
C
C Computation of the 3-point second derivatives inside the
C subdomain
C
      do 24 j=1,NXY
      do 24 k=1,NXY
      i3=ind(1,j,k)
      dza(i3)=cdlm-2d0*vpa(i3)+vpa(i3+1)
      dzb(i3)=cdlm-2d0*vpb(i3)+vpb(i3+1)
      do 26 l=2,NAR-1
      i3=i3+1
      dza(i3)=vpa(i3-1)-2d0*vpa(i3)+vpa(i3+1)
   26 dzb(i3)=vpb(i3-1)-2d0*vpb(i3)+vpb(i3+1)
      i3=i3+1
      dza(i3)=vpa(i3-1)-2d0*vpa(i3)+cdlm
      dzb(i3)=vpb(i3-1)-2d0*vpb(i3)+cdlm
   24 continue
C
C Computation of the operator "M1"
C
      do 48 i3=1,NPT
      vpa(i3)=vpa(i3)+p1*(dxa(i3)+dya(i3)+dza(i3))
      vpb(i3)=vpb(i3)+p2*(dxb(i3)+dyb(i3)+dzb(i3))
   48 continue
C
C Test for the end of the computation
C
      if (n10-npast) 133,133,10
C
C Test whether the result vectors are to be sent at this step
C
  133 if (mod(n10,n8).ne.0) goto 90
C
C Check of the write authorization
C
      if (milieu) then
C
C Transfer of the result vectors
C
        call F77_CHAN_OUT_MESSAGE(M_NAR,rs1,co_2)
        call F77_CHAN_OUT_MESSAGE(M_NAR,rs2,co_2)
      endif
      goto 90
   10 stop
      end
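The geometric parallelization used in the listing can be illustrated on a reduced problem. The sketch below is not a translation of the FORTRAN code: it applies the 3-point second-difference update of operator “M1” to a one-dimensional array, once on the whole domain and once on two subdomains that first exchange one boundary value each, and checks that both give the same result. The boundary value `cdlm` and the coefficient `p` play the same role as in the listing; everything else (function names, test data, a single field in one dimension) is an assumption made for illustration.

```python
def step(values, p, left, right):
    """One explicit update v <- v + p * (3-point second difference)."""
    padded = [left] + values + [right]
    return [
        v + p * (padded[i] - 2.0 * v + padded[i + 2])
        for i, v in enumerate(values)
    ]


def step_decomposed(values, p, cdlm, split):
    """Same update on two subdomains after exchanging one boundary value."""
    sub_a, sub_b = values[:split], values[split:]
    halo_for_a = sub_b[0]   # sent by B, received by A
    halo_for_b = sub_a[-1]  # sent by A, received by B
    new_a = step(sub_a, p, cdlm, halo_for_a)
    new_b = step(sub_b, p, halo_for_b, cdlm)
    return new_a + new_b


field = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
whole = step(field, 0.1, 0.0, 0.0)
decomposed = step_decomposed(field, 0.1, 0.0, 3)
print(whole == decomposed)
```

In the listing the same idea is applied in two directions: each processor copies its boundary values into the arrays w0 to w3, receives those of its neighbours in wt0 to wt3, and only then forms the second differences. The variable numero decides whether a processor receives first or sends first, so that neighbouring processors never wait on each other in the same phase.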

References

[1] T. Azencott, Formule de Taylor Stochastique et Développements Asymptotiques d'Intégrales de Feynman, Lecture Notes No. 921 (Springer, Berlin, 1983).

[2] A. Huard, P. Laigle, V. Mastrangelo, M. Talbi and S. Xhemalce, A method based on a stochastic approach for space dependent nuclear reactor kinetics in one dimension, Comput. Phys. Commun. 46 (1987) 351.

[3] V. Mastrangelo, Stochastic modelisation and parallel computing, invited lecturer, Architecture, Programming environment and application of the supernode network of transputers, EuroCourses, Joint Research Centre Ispra, 4-8 November 1991.

[4] A. Huard, M. Talbi and S. Xhemalce, Solution approchée d'une équation aux dérivées partielles parabolique par une méthode stochastique, C.R. Acad. Sc. Paris, t. 302, Série 5, No. 9 (1986).

[5] M. Mastrangelo and V. Mastrangelo, Stochastic resolution of space-time nuclear reactor kinetics in multigroup diffusion theory, Transport Theor. Stat. Phys. 13 (1984) 533.

[6] P. Laigle, V. Mastrangelo and S. Xhemalce, Résolution stochastique de systèmes d'équations aux dérivées partielles du type parabolique affine et applications physiques, EDF/Bulletin de la Direction des Etudes et Recherches, Série C, No. 4 (1990) 17.

[7] Revue “La Recherche”, Les nouveaux ordinateurs (November 1988).

[8] V. Pierre, Nouvelles architectures d'ordinateurs, processeurs et systèmes d'exploitation, ediTest (1989).

[9] Telmat Informatique, T-Node User Manual (1990).

[10] D. Pountain and D. May, A tutorial introduction to Occam programming (PSP/Professional Books, 1988).

[11] INMOS Limited, OCCAM 2 reference Manual, C.A.R. Hoare Series (Prentice Hall, Englewood Cliffs, NJ, 1989).

[12] 3L Ltd parallel Fortran User Guide, 3L Ltd. (1988).

[13] Perihelion Software Ltd., The Helios operating system (Prentice Hall, Englewood Cliffs, NJ, 1989).

[14] D. Pountain, Virtual channels: the next generation of transputers, Byte (April 1990).

[15] C.A.R. Hoare, Communicating Sequential processes (Prentice-Hall, Englewood Cliffs, NJ, 1985).

[16] CRAY, SN-2088 Autotasking User's Guide (1988).

[17] D. Heidrich, Implementation des langages C, Fortran et Pascal, Parallèles 3L sur une machine MIMD à réseau reconfigurable: supernode, Rapport de DEA Université de Mulhouse (October 1990).

[18] J.J. Dongarra, Overview of current high-performance computers, supercomputing Europe '89 (1989).

[19] R.W. Hockney and C.R. Jesshope, Parallel Computers (Adam Hilger, Bristol, 1981).

[20] INMOS, Transputer development system (Prentice-Hall, Englewood Cliffs, NJ, 1988).

[21] A. Gibbons and W. Rytter, Efficient parallel algorithms (Cambridge Univ. Press, Cambridge, 1988).

[22] V. Mastrangelo et al., Modélisation stochastique et calcul parallèle, JRC-Ispra/CEC, Technical Note No. 1.91.58 (April 1991).

[23] Topexpress Ltd., Mathematical procedure library reference manual.

[24] Topexpress Ltd., Vector library reference manual.

[25] N.A. Software Ltd., Liverpool parallel transputer mathematical library.
