The purpose of event dampening is to reduce the effect of oscillations on routing systems. In general, periodic processes that affect the routing system as a whole should have a period no shorter than the system convergence time (relaxation time). Otherwise, the system will never stabilize and will be constantly updating its state. In reality, complex systems have multiple periodic processes running at the same time, which results in harmonic interference between processes and a complex process spectrum. Considering such behavior is outside the scope of this paper. What we want to do is find optimal settings to filter high-frequency events from the routing system. In our particular case, events are interface flaps occurring periodically. *We want to make sure that oscillations with period T or less are not reported to the routing system*. Here **T** is found empirically, based on the observed or estimated convergence time, as suggested above.

Event dampening uses an exponential back-off algorithm to suppress event reporting to the upper-level protocols. Every time an interface flaps (goes down, to be accurate), a penalty value of **P** is added to the interface penalty counter. If at some point the accumulated penalty exceeds the “suppress” value of **S**, the interface is placed in the suppress state and further link events are not reported to the upper protocol modules. At all times, the interface penalty counter decays exponentially according to the formula **P(t)=P(0)*2^(-t/H)**, where **H** is the half-life setting for the process. As soon as the accumulated penalty decays to the lower boundary **R**, the reuse value, the interface is unsuppressed and further changes are again reported to the upper-level protocols.
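The mechanism described above can be sketched in a few lines of Python. This is only a toy model of the penalty counter as this post describes it, not Cisco's implementation: the class name `DampenedInterface` and its methods are hypothetical, the maximum suppress time is ignored, and the suppress check uses `>=` to match the inequality used in the derivation.

```python
class DampenedInterface:
    """Toy model of the dampening penalty counter (hypothetical names)."""

    def __init__(self, half_life, reuse, suppress, penalty=1000.0):
        self.half_life = half_life  # H, in seconds
        self.reuse = reuse          # R, reuse value
        self.suppress = suppress    # S, suppress value
        self.penalty = penalty      # P, added on every down event
        self.value = 0.0            # accumulated penalty
        self.last_update = 0.0      # time of the last update, in seconds
        self.suppressed = False

    def _decay_to(self, now):
        # Apply P(t) = P(0) * 2^(-t/H) for the time elapsed since the last update.
        elapsed = now - self.last_update
        self.value *= 2.0 ** (-elapsed / self.half_life)
        self.last_update = now
        # Unsuppress once the penalty has decayed to the reuse value.
        if self.suppressed and self.value <= self.reuse:
            self.suppressed = False

    def link_down(self, now):
        """Register a flap (down event) at time `now`."""
        self._decay_to(now)
        self.value += self.penalty
        if self.value >= self.suppress:
            self.suppressed = True

    def is_suppressed(self, now):
        """Decay the counter up to `now` and return the suppress state."""
        self._decay_to(now)
        return self.suppressed


# Two flaps, 10 seconds apart, with H=10, R=375, S=1500.
intf = DampenedInterface(half_life=10, reuse=375, suppress=1500)
intf.link_down(0)
print(intf.suppressed)         # False: penalty is 1000
intf.link_down(10)
print(intf.suppressed)         # True: penalty is 500 + 1000 = 1500
print(intf.is_suppressed(25))  # True: penalty is still above 375
print(intf.is_suppressed(35))  # False: penalty has decayed below 375
```

The model decays the counter lazily, only when an event or a query arrives, which is enough to follow the arithmetic by hand.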

What we want to find is the lowest value of **H** that suppresses all harmonic oscillation processes with a period of **T** or lower, but does not suppress longer-period processes. For example, we want to block all oscillations happening every 5 seconds or more often, but report interface flaps happening every 6 seconds or less often. Look at the figure above and consider two flap events separated by the time period **T**. At the moment of the second flap, the suppress condition is **P*2^(-T/H)+P >= S**. Here the left side is the penalty accumulated at the moment of the second flap, assuming the penalty was zero before the first flap. Rearranging to **2^(-T/H) >= (S-P)/P** and taking the base-2 logarithm of both sides (which requires P < S < 2P), we find that **H >= T/log2(P/(S-P))**. If we could make P/(S-P)=2, the formula would be greatly simplified. In Cisco’s implementation, **P** (penalty) is fixed at 1000, and by setting S=1500 we get **1000/(1500-1000)=2**. Therefore, if we select **S=1500, P=1000**, our condition becomes **H >= T**. Since we are looking for the minimal value of **H**, we can set **H=T**. Seeded with these values, the event dampening filter will reject all oscillating processes with a period of **T** or shorter. However, there is one more parameter left to find: **R**, the reuse value.
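For readers who want to check the algebra numerically, here is a small Python sketch of the half-life bound. The function name `min_half_life` is made up for this illustration, and it covers only the two-flap condition discussed above.

```python
import math


def min_half_life(period, penalty=1000.0, suppress=1500.0):
    """Smallest H for which two flaps `period` seconds apart reach `suppress`.

    Steps, starting from the suppress condition at the second flap:
        P * 2^(-T/H) + P >= S
        2^(-T/H)         >= (S - P) / P
        T / H            <= log2(P / (S - P))
        H                >= T / log2(P / (S - P))
    The last step needs P < S < 2P so that the logarithm is positive.
    """
    if not penalty < suppress < 2 * penalty:
        raise ValueError("the two-flap condition needs P < S < 2P")
    return period / math.log2(penalty / (suppress - penalty))


print(min_half_life(10))                 # 10.0, since log2(1000/500) = 1
print(min_half_life(10, suppress=1250))  # 5.0, since log2(1000/250) = 2
```

With S=1500 the logarithm equals 1, which is why the bound collapses to H >= T.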

We may apply the following logic here. If we observe no further events for a duration of **2xT** after the last flap, we may assume that the periodic process has stopped. Therefore, we may unblock the interface after **2xT** seconds. The reuse value can be found by taking the penalty accumulated after the second flap and decaying it for **2xT** more seconds: **(P*2^(-T/H)+P)*2^(-2T/H) <= R**. Since we set **H=T**, the left side becomes (P/2+P)/4, which gives **R >= 3/8*P = 375**. At this point we have all the parameters needed to apply optimal event dampening settings based on the cut-off period for oscillating processes. Here is a sample configuration for **T=10** seconds. Notice the last parameter, known as the maximum suppress time: the maximum time that the interface can be kept in the suppress state. Since our goal is to hold the interface suppressed for at least **2xT** seconds, the maximum suppress time is twice the half-life value.
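The reuse calculation can be checked the same way. The sketch below is illustrative only; `reuse_threshold` and its `quiet_periods` parameter are hypothetical names for the quantities used in the paragraph above.

```python
def reuse_threshold(period, half_life, penalty=1000.0, quiet_periods=2):
    """Penalty left `quiet_periods * period` seconds after the second flap."""
    # Penalty right after the second flap: P * 2^(-T/H) + P
    after_second_flap = penalty * 2.0 ** (-period / half_life) + penalty
    # Decay it for the quiet interval: multiply by 2^(-2T/H) when quiet_periods=2
    return after_second_flap * 2.0 ** (-quiet_periods * period / half_life)


print(reuse_threshold(period=10, half_life=10))  # 375.0, i.e. 3/8 * P
```

Because H=T, the result does not depend on the actual value of T: any period gives 375 as long as the half-life is set equal to it.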

    R3:
    interface FastEthernet 0/0
     dampening 10 375 1500 20
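To produce the same command for a different cut-off period, the recipe from this post (H=T, R=375, S=1500, maximum suppress time 2xT) can be wrapped in a small helper. This is a convenience sketch, not a Cisco tool: `dampening_command` is a hypothetical name, and it does not check the value ranges your platform accepts, so verify those against the command reference.

```python
def dampening_command(period):
    """Build the interface-level command for cut-off period `period` (seconds)."""
    if not isinstance(period, int) or period < 1:
        raise ValueError("period must be a positive whole number of seconds")
    half_life = period         # H = T
    reuse = 375                # R = 3/8 * P with P = 1000
    suppress = 1500            # S chosen so that P / (S - P) = 2
    max_suppress = 2 * period  # hold the interface down for at most 2 x T
    return f"dampening {half_life} {reuse} {suppress} {max_suppress}"


print(dampening_command(10))  # dampening 10 375 1500 20
```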

Lastly, a few words on figuring out the convergence time for your network. To begin with, we only consider IGPs in this discussion. Dampening in BGP is more complicated, due to the scale of the routing system involved. The general consensus nowadays is that using dampening in BGP may do more harm than good, due to cascading withdrawal messages. Next, for the IGPs, you are generally concerned with a single fault domain, which in a properly designed network is bounded to one IGP area (or EIGRP query scope zone). Convergence time for a single area depends on the following factors:

- Area size – impacts routing database sizes, affects LSA/Query propagation time and SPF runtime.
- Weakest (in terms of CPU/Memory) router in the area – this is the last router to complete SPF computations.
- RIB/FIB sizes – a significant amount of time is spent updating RIB/FIB tables after IGP re-convergence. Again, this depends on the area size.

To summarize, the main factor is the area size and the number of links in the area (which normally follows a power law based on the number of nodes). However, knowing this fact does not give us a formula for the convergence time. In most cases, you should rely on empirical evidence to obtain it. Starting with one to two seconds could be reasonable, but you should scale this value by a factor of two or three to account for multiple oscillations that may run in the network concurrently. Still, once again, there is no magical formula for this – this is what network engineers and designers are for!

##### About Petr Lapukhov, 4xCCIE/CCDE:

Petr Lapukhov's career in IT began in 1988 with a focus on computer programming, and progressed into networking with his first exposure to Novell NetWare in 1991. Initially involved with Kazan State University's campus network support and UNIX system administration, he went on to become a networking consultant, taking part in many network deployment projects. Petr currently has over 12 years of experience working in the Cisco networking field, and is the only person in the world to have obtained four CCIEs in under two years, passing each on his first attempt. Petr is an exceptional case in that he has been working with all of the technologies covered in his four CCIE tracks (R&S, Security, SP, and Voice) on a daily basis for many years. When not actively teaching classes, he is developing self-paced products, studying for the CCDE Practical & the CCIE Storage Lab Exam, and completing his PhD in Applied Mathematics.


### 5 Responses to “Optimizing IP Event Dampening”

Petr,

Thanks a lot for the article.

Hi Petr,

What is the significant difference between BGP dampening and IP dampening? I can see the terminology seems to be the same.

Also, the formula we use for BGP dampening:

Max penalty = reuse-limit *2^(maximum suppress time/half time)

Does it make any sense here? It would be helpful if you showed the steps to simplify the formula posted in your blog.

Thanks

Ronnie

[...] design network for redundancy to avoid full database reloads upon single link restoration. See [OPT-DAMPENING] for information on tuning the IP Event Dampening parameters. Lastly, egress queueing may result in [...]

I am not stupid, but I feel like I have to be a mad scientist to understand the math behind this. I passed 5 semesters of Calculus and this still doesn't make sense to me. Any good websites out there with a “for dummies” example? Meanwhile, I’ll go over this a few more times and see if it sinks in.