Posts Tagged ‘sub-second convergence’

Jun
02

This goal of this post is brief discussion of main factors controlling fast convergence in OSPF-based networks. Network convergence is a term that is sometimes used under various interpretations. Before we discuss the optimization procedures for OSPF, we define network convergence as the process of synchronizing network forwarding tables after a topology change. Network is said to be converged when none of forwarding tables are changing for “some reasonable” amount of time. This “some” amount of time could be defined as some interval, based on the expected maximum time to stabilize after a single topology change. Network convergence based on native IGP mechanisms is also known as network restoration, since it heals the lost connections. Network mechanisms for traffic protection such as ECMP, MPLS FRR or IP FRR offering different approach to failure handling are outside the scope of this article. We are further taking multicast routing fast recovery out of the scope as well, even though this process is tied to IGP re-convergence.

It is interesting to notice that IGP-based “restoration” techniques have one (more or less) important problem. During the time of re-convergence, temporary micro-loops may exist in the topology due to inconsistency of FIB (forwarding) tables of different routers. This behavior is fundamental to link-state algorithms, as routers closer to failure tend to update their forwarding database before the other routers. The only popular routing protocol that lacks this property is EIGRP, which is loop-free at any moment during re-convergence, thanks to the explicit termination of the diffusing computations. For the link state-protocols, there are some enhancements to the FIB update procedures that allow avoiding such micro-loops with link-state routing, described in the document [ORDERED-FIB].

Even though we are mainly concerned with OSPF, ISIS will be mentioned in the discussion as well. It should be noted that compared to IS-IS, OSPF provides less “knobs” for convergence optimization. The main reason is probably the fact that ISIS is being developed and supported by a separate team of developers, more geared towards the ISPs where fast convergence is a critical competitive factor. The common optimization principles, however, are the same for both protocols, and during the conversation will point out at the features that OSPF lacks while IS-IS has for tuning. Finally, we start our discussion with a formula, which is further explained in the text:

Convergence = Failure_Detection_Time + Event_Propagation_Time + SPF_Run_Time + RIB_FIB_Update_Time

The formula reflects the fact that convergence time for a link-state protocol is sum of the following components:

  • Time to detect the network failure, e.g. interface down condition.
  • Time to propagate the event, i.e. flood the LSA across the topology.
  • Time to perform SPF calculations on all routers upon reception of the new information.
  • Time to update the forwarding tables for all routers in the area.

Continue Reading

Tags: , , , , ,

Categories

CCIE Bloggers