DynaCORE
Dynamically Reconfigurable Coprocessor for Network Processors

Carsten Albrecht, Roman Koch, Christoph Osterloh, Thilo Pionteck, Erik Maehle

Institut für Technische Informatik (Institute of Computer Engineering)
Universität zu Lübeck (University of Lübeck)
Head: Prof. Dr.-Ing. Erik Maehle

DFG-SPP-1148 Final Colloquium
Karlsruhe, September 24th, 2009


Overview

Introduction

System Architecture
▪ Key Components
▪ Internal Interconnect

Runtime-Adaptive Network-on-Chip
▪ Architecture
▪ Buffer Sizes

Fault Tolerance
▪ Fault Scenarios
▪ Stepwise Procedure

Modelling DynaCORE
▪ Principles
▪ DynaCORE Model
▪ Simulation

Runtime Reconfiguration
▪ Point of Reconfiguration
▪ Technical Aspects

Evaluation and Demonstrator

Publications

Summary

Introduction (1/2)

Situation
▪ In-transit packet processing in edge routers

Processing tasks
▪ Header processing
  • Routing
  • Quality of Service
  • Accounting
▪ Payload processing
  • Encryption/decryption
  • Compression
  • Intrusion detection

Introduction (2/2)

DynaCORE Approach

▪ DynaCORE = Dynamically adaptable COprocessor based on Reconfiguration
▪ Reconfigurable hardware accelerator for payload processing
▪ Allows flexible adaptation to changes in the network traffic profile
  → Dynamic partial reconfiguration of the FPGA
▪ Combination of
  • Network processor (e.g. FlexPath NP) → header processing
  • DynaCORE (in Xilinx Virtex-4 FX) → payload processing
▪ Loose coupling
  • Gigabit Ethernet
  • Suitable for various network processors

System Architecture (1/3)

Overview

[Figure: DynaCORE block diagram, divided into a static partition and a dynamic partition. Labels: Interface, Receive Unit, Dispatcher, Transmit Unit, Reconfiguration Manager (HW + SW), Reconfiguration Logic, ICAP, External memory, application-specific Hardware Assists 1 to 4, Type H, Type S, Type 0, Type V.]

System Architecture (2/3)

Components in the Static Partition

I/O Interface

Transmit Unit
▪ Send processed packets back to NP

Receive Unit/Dispatcher
▪ Recognise requested type of processing
▪ Assign packets to suitable hardware assists
▪ Report to reconfiguration manager in case of unassignable packets

Reconfiguration Manager
▪ Implemented as software running on embedded PowerPC
▪ Collect utilisation information from hardware assists, decide when and how to reconfigure
▪ Control actual process of reconfiguration, i.e. send configuration data to reconfiguration logic

Reconfiguration Control Logic
▪ Write configuration data to FPGA-internal configuration access port (ICAP)

Software-based Hardware Assist
▪ Backup processing unit
▪ Utilises additional hard-wired PowerPC cores (UltraController II)
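The dispatcher's three duties (recognise the requested processing type, assign the packet to a suitable hardware assist, report unassignable packets) can be illustrated with a small software sketch. This is not the DynaCORE hardware design: the names `Packet`, `Dispatcher`, `add_assist` and `unassignable`, the string type labels, and the shortest-queue selection rule are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Packet:
    processing_type: str  # requested type of processing (hypothetical label)
    payload: bytes = b""


@dataclass
class Dispatcher:
    # processing type -> input queues of the hardware assists (HAs)
    # currently configured for that type (software stand-in, assumed)
    assists: dict = field(default_factory=lambda: defaultdict(list))
    # processing types reported to the reconfiguration manager
    unassignable: list = field(default_factory=list)

    def add_assist(self, processing_type):
        """Register one HA for a processing type and return its queue."""
        queue = []
        self.assists[processing_type].append(queue)
        return queue

    def dispatch(self, packet):
        """Assign the packet to a suitable HA, or report it as unassignable."""
        queues = self.assists.get(packet.processing_type)
        if not queues:
            self.unassignable.append(packet.processing_type)
            return False
        # Assumed policy: pick the HA with the shortest queue.
        min(queues, key=len).append(packet)
        return True


dispatcher = Dispatcher()
encrypt_queue = dispatcher.add_assist("encrypt")
dispatcher.dispatch(Packet("encrypt", b"data"))   # goes to the encrypt HA
dispatcher.dispatch(Packet("compress", b"data"))  # no HA configured: reported
```

In this sketch the `unassignable` list plays the role of the report to the reconfiguration manager, which could use it as one input when deciding whether to load a different hardware assist.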

System Architecture (3/3)

Components in the Dynamic Partition

Hardware Assists
▪ Actual payload processing modules
▪ Equipped with universal, algorithm-independent interface
▪ Embedded off-the-shelf IP cores

Switches
▪ Forward packets from static partition to HAs and back

Runtime-Adaptive Network-on-Chip (1/2)

CoNoChi = Configurable Network on Chip

▪ NoC architecture for runtime-reconfigurable FPGAs
▪ Virtual cut-through switches with four equal full-duplex links (16 bit)
▪ Low hardware overhead compared to other NoCs
  • Switches not needed for a given arrangement of processing units can be removed from the network → low latency
▪ Support for QoS
▪ Physical and logical addresses
  • Physical addresses: refer to specific switches at specific locations within the NoC topology
  • Logical addresses: refer to processing entities inside hardware modules

[Figure: an interface and a hardware assist attached to the NoC, annotated with their physical and logical addresses.]
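One way to picture the split between logical and physical addresses is a lookup table that is updated when a module is moved. The sketch below is an assumption for illustration only: the slides do not say where CoNoChi performs the translation, and the class `AddressTable`, the string logical addresses, and the `(x, y)` physical coordinates are hypothetical.

```python
class AddressTable:
    """Maps logical addresses (processing entities) to physical
    addresses (switch locations in the NoC topology)."""

    def __init__(self):
        self._physical_by_logical = {}

    def place(self, logical, physical):
        """Record that a processing entity now sits behind a given switch."""
        self._physical_by_logical[logical] = physical

    def resolve(self, logical):
        """Return the physical address currently serving a logical address."""
        return self._physical_by_logical[logical]


table = AddressTable()
table.place("HA_decrypt", (1, 0))       # hypothetical initial placement
before = table.resolve("HA_decrypt")
table.place("HA_decrypt", (2, 1))       # module relocated by reconfiguration
after = table.resolve("HA_decrypt")
```

Senders keep using the logical address `"HA_decrypt"`; only the table entry changes when the module moves.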

Runtime-Adaptive Network-on-Chip (2/2)

Topology Adaptation
▪ Network topology can be adapted at runtime
▪ Coarse-grained tiles
  • Merging/separation of neighbouring tiles
  → Provides space for modules of varying complexity

[Figure: tile layout with interfaces and hardware assists HA 5 and HA 6.]

Fault Tolerance (1/3)

Fault scenarios:
▪ User data
  • Non-permanent fault
  • Huge hardware effort to detect and correct
  • Tolerated by application area
▪ Processing units and infrastructure
  • Device degradation: fault in hardware structure
  • Single-Event Functional Interrupts (SEFIs): bitflip in configuration data
  • Permanent faults → need to be corrected

Approach: combination of
▪ Configuration readback
  • Slow (33 ms for one tile)
  • Does not detect hardware faults
▪ Test packets
  • Do not cover all faults
▪ Alive messages
  • Missing alive message indicates problem

Fault Tolerance (2/3)

Fault detection
▪ Alive messages
▪ Test packets
▪ Periodic configuration readback

Fault localization and correction
▪ Stepwise procedure using test packets
▪ Test against different assumptions
▪ SEU (single-event upset) in control registers → tile reset
▪ SEFI → rewriting the configuration
▪ Permanent hardware fault → reorganization
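Detection by alive messages can be sketched as a simple timeout check. The following is a minimal illustration, not the DynaCORE implementation: the class `AliveMonitor`, the unit names, and the timeout value are hypothetical, and timestamps are passed in explicitly so the example stays deterministic.

```python
class AliveMonitor:
    """Flags units whose last alive message is older than a timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._last_seen = {}

    def alive(self, unit, now):
        """Record an alive message from a unit at time `now`."""
        self._last_seen[unit] = now

    def suspects(self, now):
        """Return the units whose alive message is overdue at time `now`."""
        return sorted(
            unit for unit, seen in self._last_seen.items()
            if now - seen > self.timeout
        )


monitor = AliveMonitor(timeout=1.0)
monitor.alive("switch1", now=0.0)
monitor.alive("switch2", now=0.0)
monitor.alive("switch2", now=1.5)      # switch1 stays silent
overdue = monitor.suspects(now=2.0)
```

A unit listed in `overdue` only indicates a problem; localising and correcting the fault is left to the stepwise procedure with test packets.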

Fault Tolerance (3/3)

Example: no alive message from switch 1

1. Identification of faulty segment
▪ Identify the path under test
  • Known by the reconfiguration manager
▪ Send test packets to all switches along the path under test
▪ If a test packet does not return correctly, the faulty segment has been identified


2. Assumption: SEU in control registers of switches or routing tables
▪ Reset switches in affected section
▪ Send new routing tables
▪ Repeat test


3. Assumption: SEFI
▪ Read back configuration data for each tile and compare with reference
▪ In case of mismatch, reconfigure tile
  • If tile contains a switch, send new routing tables
▪ Repeat test

Otherwise: permanent hardware error → reorganize system

The procedure takes time and does not cover all fault scenarios, yet it is hardware-efficient
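The three steps of the example can be condensed into a short control-flow sketch. It is a simplification under stated assumptions: it handles a single faulty switch rather than a whole section, and the function `localise_and_correct`, its callback parameters, and the `FakeNoC` test double are hypothetical names introduced here, not part of DynaCORE.

```python
def localise_and_correct(path, test, reset, readback_ok, reconfigure):
    """Walk through the stepwise procedure for one path under test.

    path            -- switch ids along the path (known to the manager)
    test(sw)        -- True if a test packet sent to sw returns correctly
    reset(sw)       -- reset sw and send it a new routing table
    readback_ok(sw) -- True if the configuration readback matches the reference
    reconfigure(sw) -- rewrite the tile configuration and routing table
    """
    # Step 1: the first switch whose test packet fails marks the faulty segment.
    faulty = next((sw for sw in path if not test(sw)), None)
    if faulty is None:
        return ("no_fault", None)
    # Step 2: assume an SEU in control registers or routing tables.
    reset(faulty)
    if test(faulty):
        return ("tile_reset", faulty)
    # Step 3: assume a SEFI, i.e. corrupted configuration data.
    if not readback_ok(faulty):
        reconfigure(faulty)
        if test(faulty):
            return ("reconfigured", faulty)
    # Nothing helped: treat as a permanent hardware error.
    return ("reorganise", faulty)


class FakeNoC:
    """Test double: fault maps a switch id to 'seu', 'sefi' or 'permanent'."""

    def __init__(self, fault):
        self.fault = dict(fault)

    def test(self, sw):
        return sw not in self.fault

    def reset(self, sw):
        if self.fault.get(sw) == "seu":
            del self.fault[sw]

    def readback_ok(self, sw):
        return self.fault.get(sw) != "sefi"

    def reconfigure(self, sw):
        if self.fault.get(sw) == "sefi":
            del self.fault[sw]


def run(fault):
    noc = FakeNoC(fault)
    return localise_and_correct(
        [0, 1, 2], noc.test, noc.reset, noc.readback_ok, noc.reconfigure
    )
```

Calling `run({1: "seu"})`, `run({1: "sefi"})` and `run({1: "permanent"})` exercises the three branches in turn, with the cheapest corrective action tried first.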

Modelling DynaCORE (1/4)

Abstract DynaCORE Model

Dynamically Structured Discrete Event-Based System Network (DSDEVN)
▪ Extends the discrete-event based system (DEVS) formalism
▪ States of the controller $\chi$ can again be models
▪ A "simple" DEVS simulator is sufficient for simulating a DSDEVN

DynaCORE Model:

$\mathrm{DSDEVN}_\Delta = \langle X_\Delta, Y_\Delta, \chi, M_\chi \rangle$
▪ $\Delta$ identifies DynaCORE
▪ $X_\Delta$, the valid inputs of the system, and $Y_\Delta$, the outputs of the system: messages received from and sent to the NP
▪ $\chi$: DynaCORE-specific controller
▪ $M_\chi$: model description of the controller (as DEVS)

Modelling DynaCORE (2/4)

Abstract DynaCORE Model

Controller Description as DEVS:

$M_\chi = \langle X_\chi, S_\chi, Y_\chi, \delta^{\mathrm{int}}_\chi, \delta^{\mathrm{ext}}_\chi, \lambda_\chi, \tau_\chi \rangle$
▪ $X_\chi$: set of valid controller inputs
▪ $S_\chi$: controller state space
▪ $Y_\chi$: set of valid controller outputs
▪ $\delta^{\mathrm{int}}_\chi$: state transition function for internal events, including "timeouts"
▪ $\delta^{\mathrm{ext}}_\chi$: state transition function for external events
▪ $\lambda_\chi$: output function
▪ $\tau_\chi$: timeout function (assigns a timeout value to states from $S_\chi$)

Controller States
▪ Include information on system configuration, i.e. configured HAs
▪ Contain, in turn, models of system components active in the respective state
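To make the roles of the transition, output and timeout functions concrete, here is a toy atomic DEVS model with a minimal simulator loop. It is a sketch under assumptions, not the DynaCORE model: the controller has a single HA slot, the state layout `(phase, ha_type, sigma)`, the constant `RECONF_TIME`, and the tie-breaking rule (internal events first) are all invented for illustration, and the dynamic-structure part of the DSDEVN is not modelled.

```python
import math


class ToyController:
    """Atomic DEVS model: state = (phase, configured HA type, time to timeout)."""

    RECONF_TIME = 2.0  # assumed duration of one reconfiguration

    def __init__(self):
        self.state = ("idle", "A", math.inf)

    def tau(self, s):
        """Timeout function: time until the next internal event."""
        return s[2]

    def delta_ext(self, s, elapsed, x):
        """External transition: x is the HA type requested by the input."""
        phase, ha, sigma = s
        if phase == "idle":
            if x != ha:
                return ("reconfiguring", x, self.RECONF_TIME)
            return s
        # Busy: ignore the request but keep the original deadline.
        return (phase, ha, sigma - elapsed)

    def delta_int(self, s):
        """Internal transition: the reconfiguration timeout has expired."""
        return ("idle", s[1], math.inf)

    def output(self, s):
        """Output function, evaluated just before an internal transition."""
        return ("configured", s[1])


def simulate(model, inputs):
    """Run the model on (time, x) inputs sorted by time; return (time, y) outputs."""
    t_last, outputs, pending = 0.0, [], list(inputs)
    while True:
        t_int = t_last + model.tau(model.state)
        t_ext = pending[0][0] if pending else math.inf
        if math.isinf(t_int) and math.isinf(t_ext):
            return outputs
        if t_int <= t_ext:  # assumed tie-break: internal event first
            outputs.append((t_int, model.output(model.state)))
            model.state = model.delta_int(model.state)
            t_last = t_int
        else:
            t, x = pending.pop(0)
            model.state = model.delta_ext(model.state, t - t_last, x)
            t_last = t


trace = simulate(ToyController(), [(1.0, "B"), (2.0, "C"), (5.0, "A")])
```

The request for `"C"` at time 2.0 arrives while the model is still reconfiguring and is ignored, so `trace` contains two outputs: one when `"B"` becomes active and one when `"A"` becomes active.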

Modelling DynaCORE (3/4)

Simulation

Structure of SystemC Simulation Model

[Figure: structure of the SystemC simulation model.]

Simulation Stimulus and Output

[Figure: input data rate, output data rate and reconfiguration events over time. Vertical axis: bandwidth [Mbit/s], 0 to 1400; horizontal axis ticks from 0.0005 to 0.0645.]

▪ Input burst
  • Aggregated traffic composed of four b-modelled packet streams
▪ No packet loss (sufficient buffer sizes)

Modelling DynaCORE (4/4)

Simulation

Influence of Buffer Sizes

[Figure: latency [ms] (8.0 to 13.0) as a function of switch buffer size [#packets] and NoC-interface buffer size [#packets], each from 2 to 128.]

[Figure: data rate, packet loss and latency over buffer size [#packets] from 4 to 128. Left axis: ratio, 0.00 to 1.20; right axis: time [ms], 0.00 to 12.00.]

▪ Low impact of buffer sizes between NoC and HA
▪ Large switch buffers:
  • Only little advantage for latency
  • Increased packet loss in case of reconfiguration

Runtime Reconfiguration (1/3)

Determining the Point of Reconfiguration

Configuration State Space
▪ Three modules
▪ Three types of HA
▪ Possible transitions between configurations
▪ Transition costs (number of HAs to be reconfigured)

[Figure: transition graph over the ten configurations {A A A}, {A A B}, {A A C}, {A B B}, {A B C}, {A C C}, {B B B}, {B B C}, {B C C}, {C C C}; edges are labelled with transition costs of 1, 2 or 3.]
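The configuration state space for three modules and three HA types can be enumerated in a few lines. The sketch below assumes that modules are interchangeable, so a configuration is a multiset of HA types and the transition cost is the number of HAs that differ between two multisets; the slides do not spell out this definition, and the names `configurations` and `transition_cost` are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

HA_TYPES = "ABC"   # three types of hardware assist
MODULES = 3        # three reconfigurable modules


def configurations():
    """All multisets of HA types that fit into the available modules."""
    return ["".join(c) for c in combinations_with_replacement(HA_TYPES, MODULES)]


def transition_cost(src, dst):
    """Number of HAs that must be reconfigured to get from src to dst."""
    kept = sum((Counter(src) & Counter(dst)).values())  # HAs that can stay
    return MODULES - kept


configs = configurations()
costs = {
    (src, dst): transition_cost(src, dst)
    for src in configs for dst in configs if src != dst
}
```

Under this assumption `configs` holds ten configurations, matching the ten states of the figure, and every entry of `costs` is 1, 2 or 3.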

Runtime Reconfiguration (2/3)

Determining the Point of Reconfiguration

Reduced Configuration State Space
▪ Transition cost limited

[Figure: reduced transition graph over the configurations A B C, A B B, A C C, C B C, B B B, B B C, A A A, A B A, A A C, C C C, with labels 1 to 30 in the graph.]

Reconfiguration Trigger
▪ Configurable per-HA utilisation threshold exceeded multiple times in sequence

[Figure: monitoring samples over time relative to the threshold T; the controller state alternates between Sχu and Sχv. Axis labels: time, threshold T, monitoring sample.]
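The trigger condition (threshold exceeded several times in sequence) can be sketched as a counter that resets on every sample at or below the threshold. The class name `ReconfigurationTrigger`, the parameter `required_hits`, and the example values are assumptions for illustration; the slides only state that the threshold is configurable per HA.

```python
class ReconfigurationTrigger:
    """Fires when utilisation exceeds a threshold several times in a row."""

    def __init__(self, threshold, required_hits):
        self.threshold = threshold          # per-HA utilisation threshold
        self.required_hits = required_hits  # consecutive exceedances needed
        self._hits = 0

    def update(self, utilisation):
        """Feed one monitoring sample; return True if reconfiguration is due."""
        if utilisation > self.threshold:
            self._hits += 1
        else:
            self._hits = 0                  # sequence broken, start over
        if self._hits >= self.required_hits:
            self._hits = 0
            return True
        return False


trigger = ReconfigurationTrigger(threshold=0.8, required_hits=3)
samples = [0.9, 0.9, 0.5, 0.9, 0.9, 0.9, 0.9]
decisions = [trigger.update(u) for u in samples]
```

Requiring several consecutive exceedances keeps a single short burst from causing a reconfiguration; in the example only the sixth sample completes a run of three.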

Runtime Reconfiguration (3/3)

Technical Aspects

Merging and Separating Tiles
▪ Changes number and shapes of partially reconfigurable regions
▪ Different sets of bus macros

[Figure: two scenarios for bus macros. Scenario 1: removed bus macro. Scenario 2: static elements in original design as part of hard macro.]

Reconfiguration Speed
▪ Achievable maximum speed dependent on
  • Memory bandwidth
  • Allowable clock ratios between system components
▪ Fraction of theoretically possible speed

Evaluation/Demonstrator (1/2)

Demonstrator Structure

Evaluation/Demonstrator (2/2)

FlexPath and DynaCORE Demonstrator

FlexPath NP
▪ NP with reconfigurable data path
▪ Virtex-4 FX 60

DynaCORE
▪ Reconfigurable processing modules (HAs)
▪ Virtex-4 FX 60

[Figure: demonstrator setup with stimulus generation and analysis/visualisation.]

Publications

[PKA09] Pionteck, T.; Koch, R.; Albrecht, C.; Maehle, E.: A Design Technique for Adapting Number and Boundaries of Reconfigurable Modules at Runtime. International Journal of Reconfigurable Computing, vol. 2009, Article ID 942930, Hindawi Publishing Corporation, New York 2009

[PAK08a] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: Adaptive Communication Architectures for Runtime Reconfigurable System-on-Chips. Parallel Processing Letters, 2008

[AFK09] Albrecht, C.; Foag, J.; Koch, R.; Maehle, E.; Pionteck, T.: DynaCORE – Dynamically Reconfigurable Coprocessor for Network Processors. To appear: Dynamically Reconfigurable Systems Architectures: Design Methods and Applications, Springer, 2009

[AKP09] Albrecht, C.; Koch, R.; Pionteck, T.; Glösekötter, P.: Towards a Flexible Fault-Tolerant System-on-Chip. 22nd International Conference on Architecture of Computing Systems - Workshop Proceedings, 83-90, VDE Verlag GmbH, Berlin 2009

[KAP09] Koch, R.; Albrecht, C.; Pionteck, T.: Adaptive Health Monitoring in a Reconfigurable Network-on-Chip. Workshop on Diagnostic Services in Network-on-Chips (DSNOC), Nice 2009

[AOP08] Albrecht, C.; Osterloh, Ch.; Pionteck, T.; Koch, R.; Maehle, E.: An Application-Oriented Synthetic Network Traffic Generator. European Conference on Modelling and Simulation 2008, 299-305, ECMS, Nicosia, Cyprus 2008

[ARK08] Albrecht, C.; Roß, P.; Koch, R.; Pionteck, T.; Maehle, E.: Performance Analysis of Bus-Based Interconnects for a Run-Time Reconfigurable Co-Processor Platform. PDP 08, 200-205, IEEE Computer Society, Toulouse, France 2008

[AWP08] Albrecht, C.; Werner, M.; Pionteck, T.; Fuchsen, R.; Koch, R.; Maehle, E.: WCET Determination Tool for Embedded Systems Software. SIMUTools08 Proceedings, 1, ICST, Marseille, France 2008

[PAK08] Pionteck, T.; Albrecht, C.; Koch, R.; Brix, T.; Maehle, E.: Design and Simulation of Runtime Reconfigurable Systems. IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS 2008), 2008

[PAK08b] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: Performance and Reliability Monitoring in Network-on-Chips. To appear: Workshop on Diagnostic Services in Network-on-Chips (DSNOC), 2008

[PAK08c] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: On the Design Parameters of Runtime Reconfigurable Systems. Accepted for: International Conference on Field Programmable Logic and Applications (FPL 2008), Heidelberg, Germany 2008

[AKP07] Albrecht, C.; Koch, R.; Pionteck, T.; Maehle, E.: Simulation System for Run-Time Reconfigurable Networks-on-Chip. Proceedings of the 6th EUROSIM Congress on Modelling and Simulation, ARGESIM - ARGE Simulation News, Wiedner Hauptstrasse 8-10, 1040 Vienna 2007

[APK07] Albrecht, C.; Pionteck, T.; Koch, R.; Maehle, E.: Modelling Tile-Based Run-Time Reconfigurable Systems Using SystemC. European Conference on Modelling and Simulation 2007, Prague, Czech Republic 2007

• • •

Summary

DynaCORE-specific aspects:
▪ Interconnect performance analysis
  • Bus versus NoC
  • Based on a formally derived simulation model
▪ Synthetic traffic generator
▪ Performance enhancement compared to software-based systems
▪ Proof of concept by means of demonstrator
  • In cooperation with FlexPath / TU Munich

Universal aspects:
▪ SystemC simulation methodology for runtime-reconfigurable systems
  • SystemC kernel does not need to be adapted
▪ Reconfiguration management
  • Determining point of reconfiguration
▪ NoC for runtime-adaptable systems
▪ Tile-based design methodology for runtime-reconfigurable designs
  • Merging/separating reconfigurable regions