Optimisation conjointe de codes LDPC et de leurs ...€¦ · Viterbi Tanner 1969 1981 1993 V i t e...

Preview:

Citation preview

research & development

Jean-Baptiste Doré

Thèse présentée devant l’INSA de Rennes en vue de l’obtention du doctorat d’Électronique

26 Octobre 2007 à 10H00 – Amphithéâtre de FT R&D Cesson-Sévigné

Optimisation conjointe de codes LDPC et de leurs architectures de décodage et mise en œuvre sur FPGA

2

Foreword

France Telecom R&D UnitsBroadband Wireless Access

• Innovative Radio Interface (RESA/BWA/IRI)• Broadcasting network Cooporation and radio access Mobility (RESA/BWA/BCM)

SupervisorsMarie-Hélène Hamon – R&D engineer at France Telecom R&DPénard Pierre – R&D engineer at France Telecom R&D Ramesh Pyndiah – ENST Bretagne

ContextsPRICE - Internal project

• Prospective Research for Infrastructure and Communication Enhancement VERITY - Internal project

• Validation and Evaluation of Research studies in digITal sYstems

3

Outline

Introduction

Structured LDPC codes

Decoding architectures for LDPC decoders

FPGA implementation of LDPC coder/decoder

Conclusion

4

Introduction

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

■ Context■ LDPC codes■ Decoding LDPC codes■ Encoding LDPC codes■ Codes construction

5

Motivations

Digital communicationsHigh data rates

• Base assumption is now Hundreds of Mbit/s• 1-10 Gbit/s in the future?

Low complexity implementation • Small component size• Low power consumption• Low cost

Physical layerForward Error Correction Scheme

• Close to the theoretical limit

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

▪ ContextLDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction

6

Background history

1948

Shannon

SNR

BER

Theo

retic

al li

mit

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

time

▪ ContextLDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction

7

Background history

1948

1955

Shannon

Elias

1962

Gallager

Convo

lution

alco

des

LDPC

code

s

Viterbi Tanner

1969 1981

Viter

bialg

orith

m

Tann

er co

des

> 3dB

SNRB

ERTh

eore

tical

lim

itSNR

BER

Theo

retic

al li

mit

time

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

▪ ContextLDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction

8

Background history

1948

1955

Shannon

Elias

1962

Gallager

Convo

lution

alco

des

LDPC

code

s

Viterbi Tanner

1969 1981 1993

Viter

bialg

orith

m

Tann

er co

des

Berrou & Glavieux

> 3dB

SNRB

ERTh

eore

tical

lim

itSNR

BER

Theo

retic

al li

mit

Bre

akth

roug

h

Turbo-CodesIterative principle

time

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

▪ ContextLDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction

9

Background history

1948

1955

Shannon

Elias

1962

Gallager

Convo

lution

alco

des

LDPC

code

s

Viterbi Tanner

1969 1981 1993

Viter

bialg

orith

m

Tann

er co

des

Berrou & Glavieux

MacKay

1995

> 3dB

SNRB

ERTh

eore

tical

lim

itSNR

BER

Theo

retic

al li

mit

Bre

akth

roug

h

Re-dis

cove

ry of

LDPC

code

s

1998 2000

Irreg

ular L

DPC co

des

Duo bi

nary

Turb

o-Cod

es

SNR

BER

Theo

retic

al li

mit

Error floor

Threshold effect

Turbo-CodesIterative principle

time

<0.01 dB

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

▪ ContextLDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction

10

Definitions

LDPC codesLow Density Parity Check codesParity check constraints, M parity equations and N bits

ModelingMatrix representation

Graphical definition• Bipartite graph (Tanner graph)

Variable node (VN)

Hidden variable node (HVN)

Check node (CN)

Context

▪ LDPC codesDecoding LDPC codesEncoding LDPC codesCodes construction

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

11

Decoding algorithm

Belief Propagation (BP) AlgorithmGraph based algorithmComputation of messages which are propagated along the edges

• Exchange of extrinsic information

Optimal decoding• No cycle into the code graph

ContextLDPC codes

▪ Decoding LDPC codesEncoding LDPC codesCodes construction

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

12

Problematic of encoding

Encoding LDPC codesUnconstraint parity check matrices

• Encoding through the generator matrices G

ContextLDPC codesDecoding LDPC codes

▪ Encoding LDPC codesCodes construction

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

13

Problematic of encoding

Encoding LDPC codesUnconstraint parity check matrices

• Encoding through the generator matrices G

Constraint parity check matrices• Quasi-cyclic codes

• Upper/Lower triangular matrices– Unconstraint– Strictly or not dual-diagonal structure

Simple encoding in a linear time

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

ContextLDPC codesDecoding LDPC codes

▪ Encoding LDPC codesCodes construction

14

Design of LDPC codes

Objective:Define the position of all the non null elements into the parity check matrix

• Degree distribution optimization (EXIT Chart, Density Evolution)

ContextLDPC codesDecoding LDPC codesEncoding LDPC codes

▪ Codes construction

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

ICV

I VC

check nodes

variable nodes

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

15

Design of LDPC codes

Objective:Define the position of all the non null elements into the parity check matrix

• Degree distribution optimization (EXIT Chart, Density Evolution)

Unconstraint constructionPseudo random constructionProgressive Edge-Growth (PEG) algorithm [Hu01]

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

ICV

I VC

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

ContextLDPC codesDecoding LDPC codesEncoding LDPC codes

▪ Codes construction

16

Design of LDPC codes

Objective:Define the position of all the non null elements into the parity check matrix

• Degree distribution optimization (EXIT Chart, Density Evolution)

Unconstraint constructionPseudo random constructionProgressive Edge-Growth (PEG) algorithm

Structured LDPC codesDual-diagonal structure (RA and IRA codes)Protograph based codes

• Quasi-Cyclic codes

Etc..

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

ICV

I VC

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

ContextLDPC codesDecoding LDPC codesEncoding LDPC codes

▪ Codes construction

17

Problematic

How to define an efficient coding system using LDPC codes?

Structured LDPC codes family

Study the link between architectures and codes design

Optimize jointly codes and architectures

A joint definition of the codes and the encoding/decodingmethods is highly recommended

● IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

ContextLDPC codesDecoding LDPC codesEncoding LDPC codes

▪ Codes construction

18

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

■ Structured LDPC codes design■ Codes analysis■ Decoding structured LDPC codes■ Conclusions

Structured LDPC codes

19

Motivations

Constraints on family of LDPC codes

Good codes have strictly concentrated CN degree distribution [Chung01]

Richardson et al. design rules about degree 2 variables nodes [Richardson01,03]

Simple characterization• Protograph based codes

dual-diagonal structure for H

Parity check matrices designed from permutation matrices

▪ Structured LDPC codes designCodes analysisDecoding Structured LDPC codesConclusions

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

20

Some definitions

Definition of the code consideredThe parity check matrix H (M x N) is divided into two sub matrices Hs(M x K) and Hp(M x M)

Hp is defined to be a dual-diagonal matrix• Stability condition

• No short cycles involving only degree 2 variable nodes

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

▪ Structured LDPC codes designCodes analysisDecoding Structured LDPC codesConclusions

21

Characterization

Matrix Hs of size MxK is constructed with both:Circularly shifted identity matrices of size z x z

• Notation: Iδ , δ≥0, is a right shifted identity matrix by δ positions (modulo z)

Null matrices of size z x z• Notation: Iδ , δ<0, is a null matrix

Hs can be defined by a (m x k) block matrix• Simple characterization

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

▪ Structured LDPC codes designCodes analysisDecoding Structured LDPC codesConclusions

22

Characterization

Matrix Hp of size MxM is a dual-diagonal matrixAvoid low weight codeword requires a new definition of Hp

Quasi-Cyclic Irregular Repeat Accumulate Codes (QC IRA) [Tanner99]

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

▪ Structured LDPC codes designCodes analysisDecoding Structured LDPC codesConclusions

23

Framework

Distances propertiesWhich kind of configurations are critical for performance?

Cycles propertiesHow to detect cycles into the code graph?What is the role of short cycles on decoder behavior?

Definition of a design algorithm for the family of codes studied

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes design

▪ Codes analysisDecoding Structured LDPC codesConclusions

24

Distances properties

Main resultsBased on Return To Zero properties of the dual-diagonal part of H

• Accumulator code

Bound on minimal distance• Influence the choice of parameter m and the smallest variable node degree q

Rules on permutation coefficients• Weight-Spectrum of the codes can be constrained• Avoid the generation of low weight codeword from low weight information word

Equi-repartition of permutation coefficients on [0,z-1] into a column of Hsbut not strictly…

Structured LDPC codes design

▪ Codes analysisDecoding Structured LDPC codesConclusions

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

25

Cycles detection

Detection of cycle and enumeration of the distributionGeometrical approach

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes design

▪ Codes analysisDecoding Structured LDPC codesConclusions

26

Algorithm for code design

Problematic: Find the unknown coefficient which

Maximizes the cycle length Guarantees a minimal cycle length (Target Cycle Length-TCL)

Application to the description of a design algorithm Incremental construction of the codePEG like algorithm

• Based on protograph representation of the code

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes design

▪ Codes analysisDecoding Structured LDPC codesConclusions

27

Additional constraints

Improve design algorithmTarget Cycle Length (TCL) depends on variable node degree

• ACE (Approximate Cycle Extrinsic message) [Tian03]

Avoid low weight codeword and pseudo-codeword (Trapping-set)• Better minimal distance• Better behavior of the BP decoder

Num

bero

f erro

rPseudo codeword

Decoding failure

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes design

▪ Codes analysisDecoding Structured LDPC codesConclusions

28

State of art

LDPC codes decoding algorithmNo a priori information on the code structure

• BP with flooding scheduling

When the structure of the code is known• Explore other decoding strategies

Example: Codes defined by a protographLayered BP decoding

Structured LDPC codes designCodes analysis

▪ Decoding Structured LDPC codesConclusions

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

29

Main idea

What’s about the dual-diagonal structure properties?“Isolate trellis-like sub graphs and locally applying the MAP algorithm is a good scheduling” [Forney01]

Modeling of the considered codesConsider the decoder as the dual of the encoder

Association with layer decoding concept

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes designCodes analysis

▪ Decoding Structured LDPC codesConclusions

30

Illustration of the sequencing

Layered 1

Layered 2

Layered 3

Turbo Layered BP: Scheduling

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes designCodes analysis

▪ Decoding Structured LDPC codesConclusions

31

Illustration of the sequencing

Layered 1

Layered 2

Layered 3

Turbo Layered BP: Scheduling

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes designCodes analysis

▪ Decoding Structured LDPC codesConclusions

32

Illustration of the sequencing

Layered 1

Layered 2

Layered 3

Turbo Layered BP: Scheduling

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes designCodes analysis

▪ Decoding Structured LDPC codesConclusions

33

Illustration of the sequencing

Layered 1

Layered 2

Layered 3

Turbo Layered BP: Scheduling

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes designCodes analysis

▪ Decoding Structured LDPC codesConclusions

34

Simulation results

Comparison LBP/TLBP

Rules on permutation coefficients to reach the best possible convergence

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes designCodes analysis

▪ Decoding Structured LDPC codesConclusions

35

Synthesis

Definition of structured LDPC codesGood performanceSimple encoding (linear time)Simple characterization

Analysis of codes propertiesDistance propertiesCycles properties

Studies on the decoding of structured LDPC codesA priori information on code structure is exploited

QC IRA codes

Codes design algorithm

Turbo Layered BP

Structured LDPC codes designCodes analysisDecoding Structured LDPC codes

▪ Conclusions

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

36

Synthesis

Definition of structured LDPC codesGood performanceSimple encoding (linear time)Simple characterization

Analysis of code propertiesDistance propertiesCycles properties

Studies on the decoding of structured LDPC codesA priori information on code structure is exploited

QC IRA codes

Codes design algorithm

Turbo Layered BP

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes designCodes analysisDecoding Structured LDPC codes

▪ Conclusions

37

Synthesis

Definition of structured LDPC codesGood performanceSimple encoding (linear time)Simple characterization

Analysis of code propertiesDistance propertiesCycles properties

Studies on the decoding of structured LDPC codesA priori information on code structure is exploited at the decoder side

QC IRA codes

Codes design algorithm

Turbo Layered BP

Introduction

● Structured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoderConclusion

Structured LDPC codes designCodes analysisDecoding Structured LDPC codes

▪ Conclusions

38

Decoding architectures for LDPC decoders

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

■ Conception flow■ Architectures for LBP decoding algorithm■ Architectures for TLBP decoding algorithm■ Conclusions

39

Framework

Methodology“Cross stage” design flow

Codes design

Decoding AlgorithmLayered BP

Turbo Layered BP

▪ Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

40

Codes design

Decoding AlgorithmLayered BP

Turbo Layered BP

Architectures

Framework

Methodology“Cross stage” design flow

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

▪ Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions

41

Framework

Methodology“Cross stage” design flow

Codes design

Decoding AlgorithmLayered BP

Turbo Layered BP

ArchitecturesOptimization of complexityOptimization of processors activityOptimization of data rate

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

▪ Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions

42

Framework

Methodology“Cross stage” design flow

Codes design

Decoding AlgorithmLayered BP

Turbo Layered BP

Architectures

A joint design of both code and decoder architecture is highly recommended for the design of an efficient system

Hardware integration

Optimization of complexityOptimization of processors activityOptimization of data rate

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

▪ Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions

43

Efficiency of the decoder

Problematic: Maximize the activity of processors?Layered BP with serial architecture for CNP

z is the size of a shifted identity matrixnp is the number of processors working in parallel

z

np parallel processors

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flow

▪ Architectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions

44

Efficiency of the decoder

Problematic: Maximize the activity of processors?

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flow

▪ Architectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions

45

Configurations studied

np = max zAlready studied in the literature

• WiMAX LDPC codes R=1/2 and 2/3

But• Very complex for large z• Not very efficient when z is not constant

np < max zMotivations

• Optimize the activity of processor• Target a complexity

Goals• Look for design rules on permutation coefficients in order to keep the Layered BP

properties

z=np processors

z > np processors

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flow

▪ Architectures for LBP decoding algorithmArchitectures for TLBP decoding algorithmConclusions

46

Turbo Layered BP: Summary of the work

The decoding of a window can start if all the most up-to-date extrinsic information are available

Various sequencing have been studiedConstraints on the code design

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithm

▪ Architectures for TLBP decoding algorithmConclusions

47

Turbo Layered BP: Summary of the work

Various sequencing have been studiedSerial scheduling (pipelined or not)

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithm

▪ Architectures for TLBP decoding algorithmConclusions

48

Turbo Layered BP: Summary of the work

Various sequencing have been studiedSerial scheduling (pipelined or not)Parallel scheduling (pipelined or not)

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithm

▪ Architectures for TLBP decoding algorithmConclusions

49

Various sequencing have been studiedSerial scheduling (pipelined or not)Parallel scheduling (pipelined or not)

Definition of an efficient multi-rate decoder

Turbo Layered BP: Summary of the work

Coding rate

Check nodedegree

Good codes

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithm

▪ Architectures for TLBP decoding algorithmConclusions

50

J

J

Genericity problematic

Exploit the structure of the parity check matrixProperties of the dual-diagonal structure

Throughput

ComplexitySerial architecture-Low data rate-Very flexible scheme

Parallel architecture-Very high throughput-Bad flexibility (coding rate)

Find a good tradeoff

J

J

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithm

▪ Architectures for TLBP decoding algorithmConclusions

51

Proposed solution

Parallel architecture for SPC processorsJ0 messages are processed in parallel

J0 ones per rows of Hs

J0=2

J0

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithm

▪ Architectures for TLBP decoding algorithmConclusions

52

Proposed solution

Parallel architecture for SPC processorsJ0 messages are processed in parallel

J0 ones per rows of Hs

J0=2

J0

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithm

▪ Architectures for TLBP decoding algorithmConclusions

53

Proposed solution

Parallel architecture for SPC processorsJ0 messages are processed in parallel

J0 ones per rows of Hs

J0=2

J0

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithm

▪ Architectures for TLBP decoding algorithmConclusions

54

Proposed solution

Parallel architecture for SPC processorsJ0 messages are processed in parallel

Extension of the dual-diagonal part Window decoding of the trellis (serial oriented)

• Memory size proportional to the size of the window

Efficient method for TLBP algorithmVery flexible scheme if J0 is well designed

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithm

▪ Architectures for TLBP decoding algorithmConclusions

55

Synthesis

"Architecture driven" approachJoint design of code and decoder architectures

Architecture for Layered BP decoding algorithmModeling of the CNP processorStudy the case of np < max zSome open issues

• flexible permutation network (barrel shifter)

Architecture for Turbo Layered BP decoding algorithmVarious sequencing have been describedProblematic of multi rate decoder

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithm

▪ Conclusions

56

Synthesis

"Architecture driven" approachJoint design of code and decoder architectures

Architectures for Layered BP decoding algorithmModeling of the CNP processorsStudy the case of np < max zSome open issues

• Flexible permutation network (barrel shifter)

Architecture for Turbo Layered BP decoding algorithmVarious sequencing have been describedProblematic of multi rate decoder

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithm

▪ Conclusions

57

Synthesis

"Architecture driven" approachJoint design of code and decoder architectures

Architectures for Layered BP decoding algorithmModeling of the CNP processorsStudy the case of np < max zSome open issues

• Flexible permutation network (barrel shifter)

Architectures for Turbo Layered BP decoding algorithmVarious sequencing have been describedProblematic of multi-rate decoder

IntroductionStructured LDPC codes

● Decoding architectures for LDPC decodersFPGA implementation of LDPC coder/decoderConclusion

Conception flowArchitectures for LBP decoding algorithmArchitectures for TLBP decoding algorithm

▪ Conclusions

58

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

FPGA implementation of LDPC coder/decoder

■ Implementation options■ Quantization■ Complexity considerations■ Simulation results■ Conclusion

59

Options

Turbo Layered BP algorithmWith and without pipeline

2 decoding processors• p = 2• Duplication of the buffers

Double input buffers• Optimize the processor activity• Optimize the decoding throughput

Memory banks organization• Avoid simultaneous access• Exploit code structure

▪ Implementation optionsQuantizationComplexity considerationsSimulation resultsConclusions

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

60

Options

2 decoding processors• p = 2• Duplication of the buffers

Double input memories• Optimize the processor activity• Optimize the decoding throughput

Memory banks organization• Avoid simultaneous access• Exploit code structure

Turbo Layered BP algorithmWith and without pipeline

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

▪ Implementation optionsQuantizationComplexity considerationsSimulation resultsConclusions

61

Problematic

Continuous to discrete domainInfluence the performance

• Lower granularity• Introduction of erasures

Influence the complexity of the decoder• Size of the memories• Size of internal data path• Complexity of the basic operators (+, - , < …)

What is the good trade off between performance and complexity?

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation options

▪ QuantizationComplexity considerationsSimulation resultsConclusions

62

Problematic

Continuous to discrete domainInfluence the performance

• Lower granularity• Introduction of erasures

Influence the complexity of the decoder• Size of the memories• Size of internal data path• Complexity of the basic operators (+, - , < …)

What is the good trade off between performance and complexity?

Input data• Quantization on 4 bits is a good trade off

Internal data path• Various methods have been studied

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation options

▪ QuantizationComplexity considerationsSimulation resultsConclusions

63

FPGA integration

FIFO yp18%

Memory Banks30%

Decoding processors

52%

Control4%

FIFO yp3%

Memory Banks43%

Decoding processors

50%

Memory

Logic cells

ALTERA STRATIX EP1S80 - C6

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantization

▪ Complexity considerationsSimulation resultsConclusions

64

FPGA integration

FIFO yp18%

Memory Banks30%

Decoding processors

52%

Memory

ALTERA STRATIX EP1S80 - C6

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantization

▪ Complexity considerationsSimulation resultsConclusions

65

FPGA integration

FIFO yp18%

Memory Banks30%

Decoding processors

52%

Control4%

FIFO yp3%

Memory Banks43%

Decoding processors

50%

Memory

Logic cells

ALTERA STRATIX EP1S80 - C6

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantization

▪ Complexity considerationsSimulation resultsConclusions

66

FPGA integration

FIFO yp18%

Memory Banks30%

Decoding processors

52%

Control4%

FIFO yp3%

Memory Banks43%

Decoding processors

50%

Memory

Logic cells

Drawback of a double input buffer

ALTERA STRATIX EP1S80 - C6

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantization

▪ Complexity considerationsSimulation resultsConclusions

67

Simulation context

FPGA Hardware simulation chainALTERA Stratix EPS80-C6

Source QPSK Mapping

LDPC Encoder

AWGN Channel

BER Analyzer

QPSKDe mapping

Multi-Rate LDPC

Decoder

4

2x16

Source• PRBS 20

AWGN Channel• Box Muller algorithm

LDPC decoder/coder• 4 loaded codes

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

68

Simulation context

FPGA Hardware simulation chainALTERA Stratix EPS80-C6

Source QPSK Mapping

LDPC Encoder

AWGN Channel

BER Analyzer

QPSKDe mapping

Multi-Rate LDPC

Decoder

4

2x16

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

69

Simulation context

FPGA Hardware simulation chainALTERA Stratix EPS80-C6

Source QPSK Mapping

LDPC Encoder

AWGN Channel

BER Analyzer

QPSKDe mapping

Multi-Rate LDPC

Decoder

4

2x16

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

70

Simulation context

FPGA Hardware simulation chainALTERA Stratix EPS80-C6

■ QNX real time OS● Automatic measures● Real time performance curves

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

71

Applications

Broadcast contextLarge block size (≈16 kbits)

ParametersAWGN, QPSKQc = 4 bits (+/-7)15 it TLBP

No early error floor

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

FER (dot line)

BER (continuous line)

72

Applications

Broadcast contextLarge block size (≈16 kbits)

ParametersAWGN, QPSKQc = 4 bits (+/-7)15 it TLBP

No early error floor

Validation of both• Code design algorithm• Quantization strategy

Quasi Error Free

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

FER (dot line)

BER (continuous line)

73

Duo binary Turbo-Codes vs LDPC

ObjectivesTry to do a fair comparison

• 8 states duo binary Turbo-codes• LDPC codes studied

Difficulties of comparisonsUsually context are different

• Coding size, coding rate, coding structure, Target performance

Implementation choices• Architectures, Quantizations• FPGA (Altera or Virtex), mm² on x µm for ASIC

Proposed schemeSimilar context (Valentinno)

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

74

Duo binary Turbo-Codes vs LDPC

0.2- 0.4 dB

PerformanceDBTC

• 10 iterations • Max Log Map

LDPC• 20 iterations• TLBP

ContextAWGN, QPSKQc = 4 bits (+/-7)

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

75

Duo binary Turbo-Codes vs LDPC

0.2- 0.4 dB

PerformanceDBTC

• 10 iterations • Max Log Map

LDPC• 20 iterations• TLBP

ContextAWGN, QPSKQc = 4 bits (+/-7)

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

76

Implementation aspectsFPGA – ALTERA Stratix EP1S80F C6

Decoding throughput• At same data rates

x 1.33 x 1.34

Duo binary Turbo-Codes vs LDPC

Duo binary TC

LDPC studied

pipeline

Without pipeline

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

77

Implementation aspectsFPGA – ALTERA Stratix EP1S80F C6

Decoding throughput• At same data rates

Duo binary Turbo-Codes vs LDPC

x 1.33 x 1.34

Duo binary TC

LDPC studied

pipeline

Without pipeline

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

78

Duo binary Turbo-Codes vs LDPC

PerformanceDuo binary TC have a very good decoding threshold

• Even for small size

Proposed LDPC outperforms TC at low error rate

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

79

Duo binary Turbo-Codes vs LDPC

PerformanceDuo binary TC have a very good decoding threshold

• Even for small size• ….. For large coding size, the gap is reduced

Proposed LDPC outperforms TC at low error rate• ….. It is possible to design DBTC with better behavior at low error rate (3D TC,16 states)

ComplexityDTC outperforms LDPC codes studied…. however

• Maturity of the work and the architecture

A FEC technology should be considered into a global systemBe locally the best do not mean be the best globally

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

80

Duo binary Turbo-Codes vs LDPC

PerformanceDuo binary TC have a very good decoding threshold

• Even for small size• ….. For large coding size, the gap is reduced

Proposed LDPC outperforms TC at low error rate• ….. It is possible to design DBTC with better behavior at low error rate (3D TC,16 states)

ComplexityDBTC outperforms LDPC codes studied…. however

• Maturity of the work and the architecture

A FEC technology should be considered into a global systemBe locally the best do not mean be the best globally

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

81

Duo binary Turbo-Codes vs LDPC

PerformanceDuo binary TC have a very good decoding threshold

• Even for small size• ….. For large coding size, the gap is reduced

Proposed LDPC outperforms TC at low error rate• ….. It is possible to design DBTC with better behavior at low error rate (3D TC,16 states)

ComplexityDBTC outperforms LDPC codes studied…. however

• Maturity of the work and architectures

A FEC technology should be considered into a global systemBe locally the best do not mean be the best globally

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

82

Duo binary Turbo-Codes vs LDPC

PerformanceDuo binary TC have a very good decoding threshold

• Even for small size• ….. For large coding size, the gap is reduced

Proposed LDPC outperforms TC at low error rate• ….. It is possible to design DBTC with better behavior at low error rate (3D TC,16 states)

ComplexityDBTC outperforms LDPC codes studied…. however

• Maturity of the work and architectures

A FEC technology should be considered into a global system,according to the target application

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerations

▪ Simulation resultsConclusions

83

Synthesis

Integration of the decoders into a FPGADefinition of the computational unitsProof of concept: FPGA integration

Study of the quantization effectsInfluence of the quantization of channel observationsQuantization of internal data path

ApplicationsAnalysis of architectures proposed for different contextsComparison with duo binary Turbo-CodesLow error rate behavior: Track and analyze

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerationsSimulation results

▪ Conclusions

84

Synthesis

Integration of the decoders into a FPGADefinition of the computational unitsProof of concept: FPGA integration

Study of the quantization effectsInfluence of the quantization of channel observationsQuantization of internal data path

ApplicationsAnalysis of architectures proposed for different contextsComparison with duo binary Turbo-CodesLow error rate behavior: Track and analyze

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerationsSimulation results

▪ Conclusions

85

Synthesis

Integration of the decoders into a FPGADefinition of the computational unitsProof of concept: FPGA integration

Study of the quantization effectsInfluence of the quantization of channel observationsQuantization of internal data path

ApplicationsAnalysis of architectures proposed for different contextsComparison with duo binary Turbo-CodesLow error rate behavior: Track and analyze

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders

● FPGA implementation of LDPC coder/decoderConclusion

Implementation optionsQuantizationComplexity considerationsSimulation results

▪ Conclusions

86

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

Conclusions

■ General conclusion■ Future prospects■ Discussion

87

Analysis of QC IRA codesConstraints on permutation coefficients

Definition of a new algorithm for the design of codes

Joint studies on code design and decoding algorithm definition

Contributions

▪ General conclusionFuture prospectsDiscussion

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

88

Analysis of QC IRA codes

Decoding architectures for LDPC codesLayered BP algorithm

Turbo Layered BP algorithm

Contributions

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

▪ General conclusionFuture prospectsDiscussion

89

Analysis of QC IRA codes

Decoding architectures for LDPC codes

FPGA implementation of LDPC decodersStudy of quantization effects

Definition of computational units

Complexity analysis of the proposed architectures

Contributions

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

▪ General conclusionFuture prospectsDiscussion

90

Analysis of QC IRA codes

Decoding architectures for LDPC codes

FPGA implementation of LDPC decoders

Disseminations7 international conferences1 journal submission (under review)5 patents

Contributions

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

▪ General conclusionFuture prospectsDiscussion

91

Perspectives

Extension of the workIntegration of the hardware decoder into a realistic context

• Realistic channel• Integration into a whole communication system

Turbo Layered BP decoding• Application to other parity check matrix structures

– Modified dual-diagonal matrix (WiMax)

Theoretical aspectsAnalysis the behavior of the sequencing proposedHow to improve the convergence threshold of the codes?

• Practical aspect (Finite length)

Implementation issuesExplore the problematic of flexible ultra-parallelized LDPC decoder

• Very high throughput decoding

General conclusion

▪ Future prospectsDiscussion

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

92

Perspectives

Extension of the workIntegration of the hardware decoder into a realistic context

• Realistic channel• Integration into a whole communication system

Turbo Layered BP decoding• Application to other parity check matrix structures

– Modified dual-diagonal matrix (WiMax)

Theoretical aspectsAnalysis the behavior of the sequencing proposedHow to improve the convergence threshold of the codes?

• Practical aspect (Finite length)

Implementation issuesExplore the problematic of flexible ultra-parallelized LDPC decoder

• Very high throughput decoding

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

General conclusion

▪ Future prospectsDiscussion

93

Perspectives

Extension of the workIntegration of the hardware decoder into a realistic context

• Realistic channel• Integration into a whole communication system

Turbo Layered BP decoding• Application to other parity check matrix structures

– Modified dual-diagonal matrix (WiMax)

Theoretical aspectsAnalysis the behavior of the sequencing proposedHow to improve the convergence threshold of the codes?

• Practical aspect (Finite length)

Implementation issuesExplore the problematic of flexible ultra-parallelized LDPC decoder

• Very high throughput decoding

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

General conclusion

▪ Future prospectsDiscussion

94

Questions and answers

Optimisation conjointe de codes LDPC et de leursarchitectures de décodage et mise en œuvre sur FPGA

Jean-Baptiste Doré

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

General conclusion

▪ Future prospectsDiscussion

95

Bibliography

[Tanner99] - R. M. Tanner, “On quasi-cyclic repeat-accumulate codes”, in Proc. of. the 37th Allerton Conference, 1999.

[Forney01] - G. D. Forney, “Codes on graphs : Normal realizations”, IEEE Transactions on Information Theory, Feb 2001.

[Chung01] - S.-Y. Chung, T. Richardson, and R. Urbanke, “Analysis of sum-product decoding of low-density parity-check codes using a gaussian approximation”, IEEE Transactions on Information Theory, vol. 47, Feb 2001.

[Hu01] - X.-Y. Hu, E. Eleftheriou, and D.-M. Arnold, “Progressive edge-growth tanner graphs”, IEEE Global Telecommunications Conference, vol. 2, Nov 2001.

[Richardson01] - T. Richardson, A. Shokrollahi, and R. Urbanke, “Design of capacity approaching irregular low-density parity-check codes”, IEEE Transactions on Information Theory, vol. 47, Feb 2001.

[Tian03] - T. Tian, C. Jones, J. D. Villasenor, and R. D. Wesel, “Construction of irregular LDPC codes with low error floors”, IEEE International Conference on Communications, June 2003.

[Richardson03] - T. Richardson, “Error floors of LDPC codes”, 41st Annual AllertonConferenceon Communications, Control, and Computing, Oct 2003.

96

Annexes

General conclusionFuture prospectsDiscussion

▪ Annexes

Annexes

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

97

Outline

Simplification of BP Illustration of the design of codesPerformance example: effects of TCLBP/LBP/TLBPCNP modeling: case of Min-Sum approximation

IntroductionStructured LDPC codesDecoding architectures for LDPC decoders FPGA implementation of LDPC coder/decoder

● Conclusion

General conclusionFuture prospectsDiscussion

▪ Annexes

98

Simplification of BP algorithm

Practical implementation of BP algorithmApproximation of the function f(x)Limit the number of different edges messages

performance

complexity

99

Algorithm for codes design

Description of the algorithmTarget: > 6 length cyclez = 8Mask considered:

Incremental construction

100

Algorithm for codes design

101

Algorithm for codes design

Expansion of the protograph to detect the configurations

102

Algorithm for codes design

103

Algorithm for codes design

104

Algorithm for codes design

105

Simulation results

BERBLER

TCL>6 – 8 length-cycle

Algorithm parameterization: some resultsAvoid low length cycle involving low degree variable nodes

106

Simulation results

TCL depends on variable node degree

BERBLER

Algorithm parameterization: some resultsAvoid low length cycle involving low degree variable nodes

107

BP/LBP/TLBP

Comparison BP/LBP

Good convergence for LBP15-25 iterations

108

BP/LBP/TLBP

Comparison LBP/TLBP

Good convergence for TLBP10-20 iterations

109

About CNP processors: Modeling for Min-Sum algorithm (I)

Serial implementation of CNP Processorsdc cycles to compute mvc

dc cycles to compute Av

110

About CNP processors: Modeling for Min-Sum algorithm (II)

111

About CNP processors: Modeling for Min-Sum algorithm (II)

112

About CNP processors: Modeling for Min-Sum algorithm (II)

113

About CNP processors: Modeling for Min-Sum algorithm (II)

114

About CNP processors: Modeling for Min-Sum algorithm (II)

115

About CNP processors: Modeling for Min-Sum algorithm (III)

Min-Sum algorithmComparison of the input mvc

• Two smallest messages

It can be done during the forward step• The backward step is not required

but…• dc cycles are required for the computation of mvc

• At least dc cycles are required for the computation of the min (pipeline)• dc cycles are required to re-estimate Av

116

About CNP processors: Modeling for Min-Sum algorithm (IV)

Min-Sum algorithm

The proposed model is valid but parameter ε depends on the algorithm

117

Recommended