15
12/02/2004 France Telecom R&D Fast convergence project Nicolas DUBOIS FTR&D/DAC [email protected] Benoît FONDEVIOLE FTR&D/DAC [email protected] Nicolas MICHEL FTR&D/DAC [email protected]

Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

12/02/2004

France Telecom R&D

Fast convergence project

Nicolas DUBOIS FTR&D/DAC [email protected]ît FONDEVIOLE FTR&D/DAC [email protected]

Nicolas MICHEL FTR&D/DAC [email protected]

Page 2: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D2 - 12/02/2004France Telecom R&D

Agenda

What is the network context ?

How can we improve the scheduled maintenance ?

How can we improve the IS-IS convergence ?

Our roadmap

Page 3: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D3 - 12/02/2004France Telecom R&D

Network contextServices

Migration of a lot of new services on IP networks: Voice over IP, video over IPVPN with SLA (Service Level Agreement)

New applications are :Sensitive in term of performanceCritical for the business

EquipmentsMany of them with IP functions :

Switch (ATM, GE, etc.)BAS et NAS (mobility) …

A lot of different vendors within France Telecom networksRouters : Cisco, Juniper, Redback, Cosine, etc.

Networks4 major Ases (3215, 5511, 25186, Equant IGN)With various topologies and services

Page 4: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D4 - 12/02/2004France Telecom R&D

Network context : Hot issuesProblem with L2TP tunnels:

L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions are reset. We need less than 5s as L2TP timer is around 8s.

Problem with MPLS/VPN:Traffic from VPN customers is very sensitive in term of convergence because :

Video applications over INTRANET are growing fast There are a lot of real time applications : cartography apps, financial apps, etc.. SNA interconnections with thousands of usersDeployment of "thin clients" ( e.g. Citrix )

Service Convergence should be under the second

Two improvement areas :Improve the scheduled maintenance process so that :

L2TP session get rerouted before the maintenance operationsMPLS-VPN stability is enhanced (no traffic loss, no BGP session reset)

Improve the IS-IS convergence in the IP backbone (PE-P and P-P) so that :Unexpected traffic disruption become as short as possible

Page 5: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D5 - 12/02/2004France Telecom R&D

Improve the scheduled work (1/2)Taking into account the scheduled work is a priority. Most convergence cases arerelated to scheduled work.

Example 1: Study comparing IS-IS activities on maintenance hours compared todaytime. IS-IS changes are much more frequent during maintenance hours (study over 1 month)

Example 2: Study analyzing all failure and maintenance tickets and the related rootcauses: Logiciel

9% Matériel 13%

Exploitation 1%

Indéterminé1%

Software

Hardware

unknown

Human errors

TP 76%Maintenance

Page 6: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D6 - 12/02/2004France Telecom R&D

Improve the scheduled Maintenance (2/2)We can anticipate the rerouting for a scheduled maintenance. That is of course not possible in case of an unexpected failure event.

Exemple of an improved engineering rule in case of a router reload (software upgrade).

Before the reload :Step 1 : Set overload bit in the IS-IS configurationStep 2 : Depending on the router, modify some metrics (next-hop or loopback, etc.)Step 3 : Wait and verify that expected rerouting of the traffic as occured.Step 4 : Reload the router

After the reload :Step 5 : Don't forget to normalize the router, Step 6 : Verify that nominal routing is again working

Not always efficient, can become very complex.

We need a safe, simple, efficient solution to allow scheduled work without lossof packets in order to reduce operational costs and disruptions

Maybe dedicated reload feature combined with Graceful restart and Non Stop Forwarding ?Maybe other protocol changes in ISIS and BGP ?

Page 7: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D7 - 12/02/2004France Telecom R&D

Improve the convergence time

Time

Break

T0

Detection

T1 T2

ReceptionPropagation and trigger

Computation

T3

RIB update

Distributed update

T4

FIB updateConvergence Time

Time

State of routers

Analysis of the convergence time:1. Detection Time2. IS-IS Flooding and SPF triggering3. SPF Computation and RIB update4. FIB computation and distribution to the Line cards

Method to improve the convergence time:1. Understand the IS-IS activity2. Enhance IS-IS protocol behavior3. Improve our test tools

Customer traffic

4. Verify and measure on the real network the improvement

Page 8: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D8 - 12/02/2004France Telecom R&D

Principle for monitoring IS-IS

PE

PE

P P

P

OperationalNetwork

Back Office

LocalDatabase

Web Server

RawformatAnalyzed data

Database Server Probe

FT R&D

MRTGreplay

Agilent replay

Lab Network

Customized Auditdocument

Principle: Snoop an IS-IS session between two IS-IS RoutersThe probe receives and stores all LSP flooded in the networkThe database server stores and analyses the LSPThe web server displays statistics

Page 9: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D9 - 12/02/2004France Telecom R&D

Understand the network behavior Audit network behavior to prove IS-IS optimization feasability withSceptre:

Inter-LSP arrival :

Number of SPF and PRC computations per day :

General evolution of IS-IS activity (here over one month) :

Repartition of the interval

between LSPs (ms) Repartition of the interval

between non-refresh LSPs (ms)

Page 10: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D10 - 12/02/2004France Telecom R&D

Today possible IS-IS Optimization

Optimizing IS-IS reactions to failuresDetection

No possible changes in IS-IS hello : too dangerousPOS timer optimization (depending of layer 2 protection timer)

LSP generation /floodingSmall backoff LSP gen interval ( avoids spacing too much LSPs)

SPF and PRC triggering and computationSmall back off spf interval

IS-IS before IS-IS optimized Failure detection 2000 ms <= 5 ms Delay before flooding LSP 33 ms <= 5 ms Delay before SPF computation 5500 ms 1 ms to max time (avg 20ms) SPF computation time & RIB update

40 to 250 ms 40 to 250 ms

Page 11: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D11 - 12/02/2004France Telecom R&D

Tools to mesure convergence delay testLab tests with real hardware are also necessary and complementary to control plane auditing and monitoring.

In order to evaluate equipment performanceIn order to check that new equipments can effectively forward and reroute the traffic

Example of measurementsLine card FIB feeding speed can only be measured in the lab.LSP generation mechanisms evolution depending upon versionSPF triggering and computation speedSPF vs. Incremental SPF in realistic network conditions

RT for IS-IS& IP traffic

.x+2

MRT for BGP

C72VC

C12C C12F

C12EMgmt:0/0:.114 Mgmt:0/0:.132

.45 .2 .1

.69 .70

.33.81

.34

.82 .6

.5.10

X.80/30

X.0/30

X.8/30

X.32/30

X.68/30

X.4/30

Lo1:Y.13/32

5

200

2005

5

RT for IS-IS& IP traffic

.x+2

MRT for BGP

Web Server

RawformatAnalyzed data

Database Server

Use of Database toUpdate ISIS & BGP simulators

Page 12: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D12 - 12/02/2004France Telecom R&D

IS-IS convergence : FIB update

Temps de rétablissement du trafic

0

20 000

40 000

60 000

80 000

100 000

120 000

140 000

0 2 000 4 000 6 000 8 000 10 000 12 000 14 000 16 000 18 000Temps (ms)

Nb

préf

ixes

rét

ablis

FIB update : 25 us / prefixe

T1: link failure

T2: FIB Update of first IPv4 prefix

T3 : FIB Update of last IPv4 prefix

In some network situations for some equipments, the principal problem is the FIBupdate :

This time depends on the failure (ASBR, link in the backbone) and the number of prefixes that should be modified inside each line card.

After a change of output interface without change of BGP nexthop in NC1 (decision base on local pref)

Num

ber o

f upd

ated

pre

fixes

INTERNETFull-Routing

ASBR2ASBR1

NR1 NR1

NC1 NC2

5

30

25

1010

10 10

INTERNETcustomer

Nom

inal path for the FR

Time in msec

IS-ISconvergenceand FIBupdatefor the FR

The solution is : Two stages lookup , CISCO introduce s this feature in 12.0.27S

Backup path for the FR

Page 13: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D13 - 12/02/2004France Telecom R&D

Updating a lot of prefixesBut in other failure case like :

Temps d'établissement du trafic

y = 4,85 x

0

20 000

40 000

60 000

80 000

100 000

120 000

140 000

0 5 000 10 000 15 000 20 000 25 000 30 000Temps (ms)

Nb

préf

ixes

éta

blis

Time to update the FIB for all BGP prefixes (with a new next-hop). After an ASBR failure.

Time in msec

Num

ber o

f upd

ated

pre

fixes

Established prefixes over time

INTERNETFull-Routing

ASBR2ASBR1

NR1 NR1

NC1 NC2

5

30

25

1010

10 10

INTERNETcustomer

Nom

inal path for the FR

Backup path fot the FR

BGPconvergenceand FIBupdatefor the FR

What is the hardware, software solution ? (maybe just a best network design ? ;-)

Page 14: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D14 - 12/02/2004France Telecom R&D

Solution for the FIB Update issueSome possible solutions

Hardware improvement: Two stages lookup at the edge or in the core;Engineering changes :

Reduce the number of IS-IS and BGP prefixes (for example : Partial Routing with default route)Limit the number of next-hop changes.

Partial routing performance (Case of the collect network)It is not a perfect solution, but it provides a good improvement to the convergence in the core It can stabilize the network by reducing the amount of BGP updates

0

5 000 0

10 000 0

15 000 0

20 000 0

25 000 0

30 000 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

day

num

ber

of u

pdat

es p

er d

ay

BGP full-routing partial-routing

Page 15: Présentation du document Date · L2TP is used for INTERNET aggregation. The IP backbone is fully securized. But during link or node failure and scheduled work a lot of L2TP sessions

- D15 - 12/02/2004France Telecom R&D

SummaryFor Internet services

IS-IS optimizations are sufficient to fulfill the service requirements with routers that have two stages look-up. FT will deploy IS-IS fast convergence.BFD will be a good solution to improve the detection time on GigaEthernet

For MPLS-VPN servicesSome additionnal tests are needed to detail failure cases and discuss the benefits of subsecond IS-IS convergence. There is a strong interaction with LDP.

OutlookInteroperability : Test the end-to-end results with different routers (CISCO – JUNIPER)Need to consider Fast Reroute to decrease the convergence time for new services ? A problem for updating a lot of prefixes in the FIB remains (in some special failure)Improve the Inter-AS convergence time