Jan
03

Continuing my review of titles from Petr’s excellent CCDE reading list for his upcoming LIVE and ONLINE CCDE Bootcamps, here are further notes to keep in mind regarding EIGRP.

About the Protocol

  • The algorithm used for this advanced Distance Vector protocol is the Diffusing Update Algorithm.
  • As we discussed at length in this post, the metric is based upon Bandwidth and Delay values.
  • For updates, EIGRP uses Update and Query packets that are sent to a multicast address.
  • Split horizon and DUAL form the basis of loop prevention for EIGRP.
  • EIGRP is a classless routing protocol that is capable of Variable Length Subnet Masking.
  • Automatic summarization is on by default, but summarization and filtering can be accomplished anywhere inside the network.

Neighbor Adjacencies

EIGRP forms "neighbor relationships" as a key part of its operation. Hello packets are used to establish and maintain these relationships. The hold time dictates how long a router waits without hearing from a neighbor before declaring that neighbor unreachable and removing the topology information learned from it. The hold timer is reset when any packet is received from the neighbor, not just a Hello packet.

EIGRP uses the network type in order to dictate default Hello and Hold Time values:

  • For all point-to-point links - the default Hello is 5 seconds and the default Hold is 15 seconds
  • For all links with a bandwidth over 1 Mbps - the defaults are also 5 and 15 seconds, respectively
  • For all multi-point links with a bandwidth of 1 Mbps or less - the default Hello is 60 seconds and the default Hold is 180 seconds

Interestingly, these values are carried in the Hello packets themselves and do not need to match in order for an adjacency to form (unlike OSPF).
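The liveness rule above can be sketched in a few lines of Python. This is a simplified model purely for illustration - the timer table and class names are mine, not IOS internals:

```python
import time

# Defaults from the bullets above (hello, hold) in seconds - illustrative table
DEFAULT_TIMERS = {
    "point-to-point": (5, 15),
    "high-speed-multipoint": (5, 15),   # bandwidth above 1 Mbps
    "low-speed-multipoint": (60, 180),  # bandwidth of 1 Mbps or less
}

class Neighbor:
    """Tracks EIGRP neighbor liveness: ANY received packet resets the hold timer."""
    def __init__(self, hold_time):
        self.hold_time = hold_time
        self.last_heard = time.monotonic()

    def packet_received(self):
        # Hello, Update, Query, ... all count, not just Hellos
        self.last_heard = time.monotonic()

    def is_alive(self, now=None):
        now = now if now is not None else time.monotonic()
        return (now - self.last_heard) < self.hold_time
```

When `is_alive()` goes false, the router tears down the adjacency and flushes the topology information learned from that neighbor, as described above.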

Reliable Transport

By default, EIGRP sends updates and other information to multicast 224.0.0.10 and the associated multicast MAC address of 01-00-5E-00-00-0A.

For multicast packets that need to be reliably delivered, EIGRP waits for an RTO (retransmission timeout) before beginning a recovery action. This RTO value is based on the SRTT (smooth round-trip time) for the neighbor. Both values can be seen in the output of the show ip eigrp neighbors command.

If the router sends out a reliable packet and does not receive an Acknowledgement from a neighbor, the router informs that neighbor to stop listening to multicast until it is told otherwise. The local router then begins unicasting the update information. Once the router begins unicasting, it will retransmit up to 16 times or until the Hold timer expires, whichever takes longer. It will then reset the neighbor and declare a Retransmission Limit Exceeded error.

Note that not all EIGRP packets follow this reliable routine - Updates, Queries, and Replies do. Hellos and Acknowledgements are examples of packets that are not sent reliably.
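The give-up rule from the previous paragraph can be modeled as a tiny predicate. This is an illustrative sketch of the described behavior, not IOS code - the function name is my own:

```python
def should_keep_retrying(attempts, elapsed_s, hold_time_s, max_attempts=16):
    """After falling back to unicast, the router keeps retransmitting until it
    has BOTH sent 16 retransmissions AND outlived the hold timer - i.e. it
    tries "16 times or the expiration of the Hold timer, whichever is greater".
    Once this returns False, the neighbor is reset and a Retransmission Limit
    Exceeded error is declared."""
    return attempts < max_attempts or elapsed_s < hold_time_s
```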

Dec
30

To start my reading from Petr's excellent CCDE reading list for his upcoming LIVE and ONLINE CCDE Bootcamps, I decided to start with:
EIGRP for IP: Basic Operation and Configuration by Russ White and Alvaro Retana
I was able to grab an Amazon Kindle version for about $9, and EIGRP has always been one of my favorite protocols.
The text dives right into none other than the composite metric of EIGRP, and it brought a smile to my face as I thought about all of the misconceptions I had regarding this topic early on in my Cisco studies. Let us review some key points regarding this metric and hopefully put some of your own misconceptions to rest.

  • While we are taught from CCNA days that the EIGRP metric consists of 5 possible components - Bandwidth, Delay, Load, Reliability, and MTU - a look at the actual formula for the metric computation reveals that MTU is not part of the metric at all. Why have we been taught this, then? Cisco indicates that MTU is used as a tie-breaker in situations that might require it. The actual formula is metric = 256 * [K1*BW + (K2*BW)/(256 - Load) + K3*Delay] * [K5/(K4 + Reliability)], where the final term applies only when K5 is non-zero.
  • Notice from the formula that the K values (constants) control which components of the metric are actually considered. By default, K1 and K3 are set to 1 to ensure that Bandwidth and Delay are used in the calculation. If you wanted to make Bandwidth twice as significant in the calculation, you could set K1 to 2, as an example. The metric weights command is used for this manipulation. Note that it starts with a TOS parameter that should always be set to 0 - Cisco never fully implemented this functionality.
  • The Bandwidth that affects the metric is taken from the bandwidth command used in interface configuration mode. If you do not provide this value, the Cisco router selects a default based on the interface type.
  • The Delay value that affects the metric is taken from the delay command used in interface configuration mode. The default depends on the interface hardware type, e.g. it is lower for Ethernet but higher for Serial interfaces. Note how the Delay parameter allows you to influence EIGRP path decisions without manipulating the Bandwidth value. This is nice since other mechanisms may rely heavily on the bandwidth setting, e.g. EIGRP bandwidth pacing or absolute QoS reservation values for CBWFQ.
  • The actual metric value for a prefix is derived from the SUM of the delay values in the path and the LOWEST bandwidth value along the path. This is yet another reason to prefer Delay manipulation, with its more predictable effects, when changing EIGRP path preference.
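Putting those bullets together, the classic (non-wide) composite metric can be sketched as follows. This is a simplified model for illustration only; recent IOS releases with wide metrics scale things differently:

```python
def eigrp_metric(min_bw_kbps, delay_sum_tens_usec,
                 k1=1, k2=0, k3=1, k4=0, k5=0, load=1, reliability=255):
    """Classic EIGRP composite metric.

    min_bw_kbps         - LOWEST bandwidth along the path, in kbps
    delay_sum_tens_usec - SUM of interface delays, in tens of microseconds
    """
    bw = 10**7 // min_bw_kbps
    metric = k1 * bw + (k2 * bw) // (256 - load) + k3 * delay_sum_tens_usec
    if k5 != 0:  # the reliability term applies only when K5 is non-zero
        metric = metric * k5 // (k4 + reliability)
    return metric * 256

# A loopback (delay 5000 usec = 500 tens) reached across FastEthernet
# (bandwidth 100000 kbps, delay 100 usec = 10 tens):
print(eigrp_metric(100000, 510))  # -> 156160
```

With the default K values this collapses to 256 * (bandwidth term + delay term), which is why only Bandwidth and Delay matter out of the box.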

In the next post on the EIGRP metric, we will examine this at the actual command line, and discuss EIGRP load balancing options. Thanks for reading!

Sep
02

Abstract

This publication briefly covers the use of third-party next-hops in the OSPF, RIP, EIGRP and BGP routing protocols. Common concepts are introduced and protocol-specific implementations are discussed. A basic understanding of routing protocol operation is required before reading this blog post.

Overview

The third-party next-hop concept applies only to distance-vector protocols, or to the parts of link-state protocols that exhibit distance-vector behavior. The idea is that a distance-vector update carries an explicit next-hop value, which is used by the receiving side, as opposed to the "implicit" next-hop calculated as the sending router's address - the source address in the IP header carrying the routing update. Such an "explicit" next-hop is called a "third-party" next-hop IP address, since it allows pointing to a next-hop other than the advertising router. Intuitively, this is only possible if the advertising and receiving routers are on a shared segment, though the "shared segment" concept could be generalized and abstracted. All popular distance-vector protocols support third-party next-hops - RIPv2, EIGRP, OSPF and BGP all carry an explicit next-hop value. Look at the figure below - it illustrates a situation where two different distance-vector protocols run on a shared segment, but neither runs on all routers attached to the segment. The protocols "overlap" at a "pivotal" router, and redistribution is used to provide inter-protocol route exchange.

third-party-nh-generic

Per default distance-vector protocol behavior, traffic going from one routing domain into another has to cross the "pivotal" router where the two domains overlap (R3 in our case) - as opposed to going directly to the closest next-hop on the shared segment. The reason for this is that there is no direct "native" update exchange between the hops running different routing protocols. In situations like this, it is beneficial to rewrite the next-hop IP address to point toward the "optimum" exit point, using the "pivotal" router's knowledge of both routing protocols.

OSPF is somewhat special with respect to the third-party next-hop implementation. It supports a third-party next-hop in Type-5/7 LSAs (External Routing Information LSA and NSSA External LSA). These LSAs are processed in a "distance-vector manner" by every receiving router. By default, the LSA is assumed to advertise an external prefix "connected" to the advertising router. However, if the Forwarding Address (FA) field is non-zero, the address in this field is used to calculate the forwarding information, instead of forwarding toward the advertising router. The Forwarding Address is always present in Type-7 LSAs, for the reason illustrated in the figure below:

third-party-nh-ospf-nssa-fa

Since there could be multiple ABRs in an NSSA area, only one is elected to perform 7-to-5 LSA translation - otherwise the routing information would loop back into the area, unless manual filtering is implemented on the ABRs (which is prone to errors). The translating ABR is elected based on the highest Router-ID and may not be on the optimum path toward the advertising ASBR. Therefore, the forwarding address allows other routers to find a more optimal path, based on inter-area routing information.

EIGRP

We start with the scenario where we redistribute RIP into EIGRP.

third-party-nh-rip2eigrp

Notice that EIGRP will not insert the third-party next-hop until you apply the command no ip next-hop-self eigrp on R3's connection to the shared segment. Look at the routing table output prior to applying the no ip next-hop-self eigrp command.

R1#show  ip route eigrp 
140.1.0.0/16 is variably subnetted, 2 subnets, 2 masks
D EX 140.1.2.2/32
[170/2560002816] via 140.1.123.3, 00:00:27, FastEthernet0/0

After the command has been applied to R3’s interface:

R1#show  ip route eigrp
140.1.0.0/16 is variably subnetted, 2 subnets, 2 masks
D EX 140.1.2.2/32
[170/2560002816] via 140.1.123.2, 00:00:04, FastEthernet0/0

The same behavior is observed when redistributing OSPF into EIGRP, but not when redistributing BGP. For some reason, BGP's next-hop is not copied into EIGRP - even with no ip next-hop-self eigrp configured, EIGRP will NOT insert BGP's next-hop into its updates. Notice that you may enable or disable the third-party next-hop behavior in EIGRP using the interface-level command ip next-hop-self eigrp.

RIP

RIP passes along the third-party next-hop learned from OSPF, BGP or EIGRP. For instance, assume EIGRP redistribution into RIP. You have to configure no ip split-horizon on R3's Ethernet connection to get this to work:

third-party-nh-eigrp2rip

R2#show ip route rip 
140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
R 140.1.1.1/32 [120/1] via 140.1.123.1, 00:00:17, FastEthernet0/0

Notice the following RIP debugging output, which lists the third-party next-hop:

RIP: received v2 update from 140.1.123.3 on FastEthernet0/0
140.1.1.1/32 via 140.1.123.1 in 1 hops
140.1.123.0/24 via 0.0.0.0 in 1 hops

Surprisingly, there is NO need to configure no ip split-horizon on the interface when redistributing BGP or OSPF routes into RIP. It seems that only EIGRP-to-RIP redistribution requires it. Keep in mind, however, that split horizon is OFF by default on physical frame-relay interfaces. Here is a sample output of redistributing BGP into RIP using the third-party next-hop:

R3#show ip route bgp 
140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
B 140.1.2.2/32 [20/0] via 140.1.123.2, 00:22:13
R3#

R1#show ip route rip
140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
R 140.1.2.2/32 [120/1] via 140.1.123.2, 00:00:09, FastEthernet0/0

RIP’s third-party next-hop behavior is fully automatic. You cannot disable or enable it, as you can in EIGRP.

OSPF

Similarly to RIP, OSPF has no problems picking up the third-party next-hop from BGP, EIGRP or RIP. Here is how it looks (guess which protocol is redistributed into OSPF, based solely on the command output):

R1#sh ip route ospf 
140.1.0.0/16 is variably subnetted, 3 subnets, 2 masks
O E2 140.1.2.2/32 [110/1] via 140.1.123.2, 00:34:59, FastEthernet0/0

R1#show ip ospf database external

OSPF Router with ID (140.1.1.1) (Process ID 1)

Type-5 AS External Link States

Routing Bit Set on this LSA
LS age: 131
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 140.1.2.2 (External Network Number )
Advertising Router: 140.1.123.3
LS Seq Number: 80000002
Checksum: 0xF749
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 1
Forward Address: 140.1.123.2
External Route Tag: 200

If you’re still guessing, the external protocol is BGP, as could have been deduced from the automatic External Route Tag - OSPF sets it to the last AS# found in the AS_PATH.

third-party-nh-bgp2ospf

There are special conditions to be met by OSPF for the FA address to be used. First, the interface where the third-party next-hop resides should be advertised into OSPF using the network command. Secondly, this interface should not be passive in OSPF and should not have network type point-to-point or point-to-multipoint. Violating any of these conditions will stop OSPF from setting the FA in the Type-5 LSAs created for external routes.

OSPF is special in one other respect. Distance-vector protocols such as RIP or EIGRP modify the next-hop as soon as they pass the routing information to other devices. That is, the third-party next-hop is not maintained across the RIP or EIGRP domain. Contrary to these, OSPF LSAs are flooded within their scope with the FA unmodified. This creates an interesting problem: if the FA address is not reachable in the receiving router’s routing table, the external information found in the Type-7/5 LSA is not used. This situation is discussed in the blog post “OSPF Filtering using FA Address”.

BGP

When you redistribute any protocol into BGP, the system correctly sets the third-party next-hop in the local BGP table. Look at the diagram below, where EIGRP prefixes are being redistributed into BGP AS 300:

third-party-nh-eigrp2bgp

R3’s BGP process installs R1’s Loopback0 prefix into the BGP table with a next-hop value of R1’s address, not “0.0.0.0” as it would be for locally advertised routes. You will observe the same behavior if you inject EIGRP prefixes into BGP using the network command.

R3#sh ip bgp
BGP table version is 9, local router ID is 140.1.123.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path
*> 140.1.1.1/32 140.1.123.1 156160 32768 ?

Furthermore, BGP is supposed to change the next-hop to self when advertising prefixes over eBGP peering sessions. However, when all peers share the same segment, the prefixes re-advertised over the shared segment do not have their next-hop changed. See the diagram below:

third-pary-nh-bgp2bgp

Here R1 advertises prefix 140.1.1.1/32 to R3 and R3 re-advertises it back to R2 over the same segment. Unless non-physical interfaces are used to form the BGP sessions (e.g. Loopbacks), the next-hop received from R1 is not changed when the prefix is passed down to R2. This implements the default third-party next-hop preservation over eBGP sessions. Look at the sample output for the configuration illustrated above: R1 receives R2’s prefix with an unmodified next-hop.

R1#show ip bgp 
BGP table version is 3, local router ID is 140.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path
*> 140.1.1.1/32 0.0.0.0 0 32768 i
*> 140.1.2.2/32 140.1.123.2 0 300 200 i

There is a way to disable this default behavior in BGP. A logical assumption would be that the command neighbor X.X.X.X next-hop-self would work, and indeed it does in recent IOS versions. Older IOS releases, such as 12.2T, did not honor this command for eBGP sessions, and your option would have been a route-map with the set ip next-hop command. The route-map method may still be handy if you want to insert a totally “bogus” IP next-hop from the shared segment - the receiving BGP speaker will accept any IP address that is on the same segment. That is not something you would do in a production environment too often, but it is definitely an interesting idea for lab practice. One good use in production is changing the BGP next-hop to an HSRP virtual IP address, to provide physical BGP speaker redundancy. Here is a sample configuration for setting an explicit next-hop in BGP updates:

router bgp 300
neighbor 140.1.123.1 remote-as 100
neighbor 140.1.123.1 route-map BGP_NEXT_HOP out
!
route-map BGP_NEXT_HOP permit 10
set ip next-hop 140.1.123.100

Summary

All popular distance-vector protocols support third-party next-hop insertion. This mechanism is useful on multi-access segments, in situations where you want to pass optimum path information between routers belonging to different routing protocols. We illustrated that RIP implements this function automatically and does not allow any tuning. EIGRP, on the other hand, supports third-party next-hop passing from any protocol other than BGP, and you may turn this function on/off on a per-interface basis. Furthermore, OSPF’s special feature is propagation of the third-party next-hop within an area/autonomous system, unlike the distance-vector protocols that reset the next-hop at every hop (considering an AS to be a “single hop” for BGP). Thanks to that feature, OSPF offers an interesting possibility to filter external routing information by blocking the FA prefix from the routing tables. Finally, BGP gives the most flexibility when it comes to IP next-hop manipulation, allowing it to be changed to any value.

Further Reading

Common Routing Problem with OSPF Forwarding Address
OSPF Prefix Filtering Using Forwarding Address
BGP Redundancy using HSRP

Aug
14

In the first part of this series, we subdivided the processes of EIGRP into four discrete steps, and detailed troubleshooting the first two. This is taken from the 5-Day CCNP bootcamp:

  • Discovery of neighbors
  • Exchange of topology information
  • Best path selection
  • Neighbor and topology table maintenance

Let us now discuss path selection and maintenance troubleshooting.

We should all remember that we can view the EIGRP topology table with the command show ip eigrp topology. Here we can see the successor routes (the best routes, which are placed in the routing table) and the second-best routes, the feasible successors. These feasible successor routes are the key to the lightning-fast convergence that EIGRP can offer us. When a speaker loses its successor, it can quickly install a feasible successor route in its place.

We need to remember the important rule of feasible successors: the advertised distance (AD) of the proposed feasible successor must be less than the feasible distance (FD) of the current successor route. This is actually a loop prevention mechanism.

Another big gotcha when it comes to path selection in EIGRP is the configuration of variance for unequal-cost load balancing. I can remember fighting with this in an INE practice lab long ago when I was preparing for the exam. Something I had no idea of back then... in order to be considered for unequal-cost load balancing, the alternate paths must be feasible successors! Older editions of CCNP courses never thought to tell us that little nugget!
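That rule - alternate paths must be feasible successors AND fall within the variance multiplier - can be sketched as follows. This is an illustrative model; the function name and tuple layout are mine:

```python
def uclb_paths(paths, variance):
    """paths: list of (metric_via_this_path, neighbor_advertised_distance).

    The successor is the lowest-metric path; its metric is the feasible
    distance (FD). An alternate path is eligible for unequal-cost load
    balancing only if it is a feasible successor (AD < FD) and its own
    metric is within variance * FD."""
    fd = min(metric for metric, _ in paths)
    return [(metric, ad) for metric, ad in paths
            if metric == fd or (ad < fd and metric <= variance * fd)]
```

Note how a path whose advertised distance is not below the FD is excluded no matter how large you crank the variance - exactly the gotcha described above.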

We should be careful when modifying bandwidth to affect path selection. Cisco gave us delay for this purpose. Modifying the bandwidth can starve EIGRP updates of bandwidth to use. Remember, by default, EIGRP will use only 50% of an interface's bandwidth. We can control this with the command ip bandwidth-percent eigrp.

For table maintenance, show ip eigrp topology is critical. Note that in this table, passive is what we want to see. Active indicates there is no feasible successor and neighbors are being queried for an alternative path. SIA log messages indicate a Stuck in Active issue - the router is not receiving a reply to its queries. The most common reasons this can occur are:

  • Bad link
  • Congested link
  • The query scope is too big (too many routers involved)
  • Excessive redundancy is built into the network
  • The router CPU is overloaded
  • There is a shortage of memory on the router
  • There are software defects

When it comes to table maintenance, another excellent troubleshooting command is show ip eigrp topology summary. This command displays the total number of routes in the topology table and the total number of queries the router is awaiting responses for. It also shows a quiescent interfaces field, listing which interfaces have no outstanding packets to be sent or acknowledged.

Some of our favorite EIGRP verification commands:

  • show ip route eigrp
  • show ip protocols
  • show ip eigrp neighbor
  • show ip eigrp topology
  • show ip eigrp topology all-links
  • show ip eigrp topology summary
  • debug eigrp packet hello
  • debug eigrp packet query reply

Dec
23

CCNA students can typically rattle off the fact that EIGRP uses Bandwidth and Delay in its composite metric calculation by default. In fact, they tend to know this as well as their own last name. But I often notice they might have some pretty big misconceptions about how this metric is really calculated, and how they can manipulate it.

Here are some very important "Core Knowledge" facts that we need to keep in mind about the EIGRP metric:

  • The metric formula uses the bandwidth and delay values that are set as default on the interface, or those values configured on the interface by an administrator
  • The bandwidth value that is used in the calculation is the slowest bandwidth in the path from source to destination; to remember this just think of the "weakest link" in the path
  • The delay value used in the calculation is the sum of the delay values in the path
  • You can set the bandwidth value of an interface using the BANDWIDTH command and you can set the delay value of an interface using the DELAY command
  • Setting bandwidth or delay on an interface does not change any physical properties of the interface at all; you are just changing the values that the interface reports for EIGRP metric purposes

Let's examine some of this at the command line:

R1#show run interface fa0/0
Building configuration...

Current configuration : 95 bytes
!
interface FastEthernet0/0
ip address 10.10.10.1 255.255.255.0
duplex auto
speed auto

Notice that we have not set BANDWIDTH or DELAY under this interface at all. Let us examine what EIGRP will be using regarding this interface in its overall calculation:

R1#show interface fa0/0
FastEthernet0/0 is up, line protocol is up
Hardware is Gt96k FE, address is c201.0111.0000 (bia c201.0111.0000)
Internet address is 10.10.10.1/24
MTU 1500 bytes, BW 10000 Kbit/sec, DLY 1000 usec,
reliability 255/255, txload 1/255, rxload 1/255

Notice the values that EIGRP can use by default are in place.

The DELAY command is a powerful command for manipulating EIGRP paths. Since the BANDWIDTH command can end up impacting a lot of other configurations (like QoS), we can use the DELAY command to manipulate EIGRP metrics (and therefore, paths) without having to touch the BANDWIDTH command.

R1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R1(config)#interface fa0/0
R1(config-if)#delay ?
<1-16777215>  Throughput delay (tens of microseconds)

Notice from the above output just how easy it is to manipulate this value, and therefore, impact the EIGRP metric.

I hope you are enjoying your CCNA training here at INE, and I hope you will return to our blog often!

Nov
20

Every once in a while I come across a tip that is so exciting I want to share it with the world. I was recently going through one of the many posts I read, and saw the answer to a question that I have been wondering about for many years. Awesome job to Steve Shaw who came up with this. Here is the scenario. We are running EIGRP, and have a neighbor, but no console access to that neighbor. We get the message on our local router saying “%DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor A.B.C.D (Fa0/1) is down: K-value mismatch”.

Now for the tricky part. There are 5 K values, each holding a value between 0 and 255 inclusive. That is 256^5 - over a trillion combinations - so it would take a very long time to test all the possible values of K1-K5 until we “crack the code” and get the right ones.

Here is a solution to discover which K values are in use on the remote neighbor. Locally, create an access-list to match on EIGRP updates:

access-list 100 permit ip host 136.1.45.4 host 224.0.0.10

Be sure to pick a free ACL #. For the source address, use the IP address of the neighbor.

Then use the following debug command:

debug ip packet 100 detail dump

Context sensitive help may not show the "dump" keyword as an option at the end, but it is most likely still available.  Make sure to log to a location where the output of the debug may be seen, such as the buffer, console or a syslog server.

If our neighbor has K values of:
EIGRP metric weight K1=1, K2=1, K3=1, K4=1, K5=1
On our local router we should then see debug output similar to this:

01:04:28: IP: s=136.1.45.4 (FastEthernet0/1), d=224.0.0.10, len 60, rcvd 2, proto=88
0F4019C0: 0100 5E00000A ..^...
0F4019D0: 00444444 44440800 45C0003C 00000000 .DDDDD..E@.<....
0F4019E0: 0158239B 88012D04 E000000A 0205EDC9 .X#...-.`.....mI
0F4019F0: 00000000 00000000 00000000 00000001 ................
0F401A00: 0001000C 01010101 0100000F 00040008 ................
0F401A10: 0C040102 ....

The 16th byte from the end (the start of the 4th grouping from the end) is where the K values begin: the five bytes 01 01 01 01 01 in the output above. There is one byte per K value, represented in the output as 2 hex characters.

If our neighbor has K values as:
EIGRP metric weight K1=1, K2=1, K3=0, K4=0, K5=0
Our output from the debug would resemble the output below.

01:04:28: IP: s=136.1.45.4 (FastEthernet0/1), d=224.0.0.10, len 60, rcvd 2, proto=88
0F4019C0: 0100 5E00000A ..^...
0F4019D0: 00444444 44440800 45C0003C 00000000 .DDDDD..E@.<....
0F4019E0: 0158239B 88012D04 E000000A 0205EDC9 .X#...-.`.....mI
0F4019F0: 00000000 00000000 00000000 00000001 ................
0F401A00: 0001000C 01010000 0000000F 00040008 ................
0F401A10: 0C040102 ....

The K values are conveniently displayed in order in the debug output. Remember, when you implement K values under the EIGRP routing process using the metric weights command, the first value is the TOS parameter, and the remaining 5 values are the K values in order 1-5.

One more example is called for, because the K values are not restricted to the values of 0 or 1. What if the remote neighbor had used the following under the EIGRP routing process:
metric weights 0 225 1 1 1 1
Then, on the local router, the debug output would look similar to the following:

01:23:07: IP: s=136.1.45.4 (FastEthernet0/1), d=224.0.0.10, len 60, rcvd 2, proto=88
0F8000E0: 0100 5E00000A ..^...
0F8000F0: 00444444 44440800 45C0003C 00000000 .DDDDD..E@.<....
0F800100: 0158239B 88012D04 E000000A 02050EC9 .X#...-.`......I
0F800110: 00000000 00000000 00000000 00000001 ................
0F800120: 0001000C E1010101 0100000F 00040008 ....`...........
0F800130: 0C040102 ....

225 in binary is 1110 0001. Converting one nibble at a time to hexadecimal gives E1, as shown in the debug output. We could then set our K values to match the neighbor's and form a working adjacency.
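The byte counting above is easy to automate. Here is a short helper (my own sketch, not a Cisco tool) that pulls the five K bytes out of the hex words of such a dump:

```python
def k_values(dump_hex):
    """Extract K1..K5 from the hex bytes of a dumped EIGRP Hello.

    dump_hex: the packet's hex words concatenated together. As described
    above, the five K bytes start 16 bytes from the end of the packet."""
    data = bytes.fromhex(dump_hex.replace(" ", ""))
    return list(data[-16:-11])

# Tail of the last dump above: K1 = 0xE1 = 225, K2..K5 = 1
print(k_values("0001000C E1010101 0100000F 00040008 0C040102"))
# -> [225, 1, 1, 1, 1]
```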

Of course, in an environment where we manage both routers, we could just look at the output of show ip protocols or show run | section router and solve it immediately - but where is the challenge in that?

Good luck with your studies.

If you know of other solutions, please add them as a comment and share them with your peers (assuming your K-values match your peers' ;-).

Aug
03

EIGRP is based on the concept of diffusing computations. When something changes in the network topology, the routers that detect the loss of a network prefix send out EIGRP QUERY messages that propagate in circular waves, similar to ripples on a water surface. Every queried router will in turn query its neighbors, and so on, until every router that knew about the affected prefix has been queried. After this, the expanding circle collapses back with EIGRP REPLY messages. The maximum radius of that circle may be viewed as the query scope. From a scalability standpoint, it is very important to know what conditions limit the average query scope, as this directly impacts network stability. You may compare the "query scope" with the concept of a flooding domain in OSPF or IS-IS. However, in contrast with the link-state protocols, you are very flexible in choosing the query scope boundaries, which is a powerful feature of EIGRP.

There are four conditions that affect query propagation. Almost all of them are based on the fact that a query stops once the queried router cannot find an exact match for the requested subnet in its topology table. In that case the router simply responds that the network is unknown. Based on this behavior, the following will stop a query from propagating:

1) Network summarization. This could be considered one of the most effective methods of query scoping. If router A sends a summary prefix to router B, then any query from A to B regarding the subnets encompassed by the summary route will not find an exact match in B’s topology table. Thus queries are stopped at routers that are one hop from the point of summarization. Given that EIGRP allows summarization to be introduced virtually anywhere, you may easily partition your network into query domains. One important thing here, however - this requires well-planned hierarchical addressing, based on the Core-Edge model. Sending a default route to an isolated stub domain could be considered an extreme case of summarization and is very effective.

2) Route filtering. You may filter EIGRP routes at any point using distribute-lists, route-maps, etc. As soon as a route is filtered, the next-hop router will not learn it. When a query propagates to that next hop, it will stop and be returned. Route filtering is probably not as popular as route summarization, but it could be used when you want to reduce the overall amount of queries yet retain some routing information. In general, proper route summarization should be enough.

3) Stub routers. This feature is different. Instead of “deflecting” queries, it simply signals the neighbors NOT to query the given router. This means the stub router cannot be used as a transit path for any network. The stub router’s neighbors will not query it, and thus the stub feature prevents queries from even being generated. It is especially effective in “star” network designs, such as hub-and-spoke networks.

4) Different EIGRP AS numbers. EIGRP processes run independently from each other, and queries from one autonomous system don’t leak into another. However, if redistribution is configured between two processes, a behavior similar to query leaking is observed. Consider a router R running EIGRP processes for AS1 and AS2 with mutual redistribution configured between the two processes. Assuming that a query from AS1 reaches R and bounces back, R is supposed to remove the route from the routing table. This in turn triggers route removal from the AS2 topology table, as redistribution is configured. Immediately after this, R will originate a query into AS2, as the prefix becomes active. The query will travel across AS2 per the normal query propagation rules, and eventually R will receive all replies and become passive for the prefix. It is even possible for R to learn a path to the lost prefix via AS2 and re-inject it back into AS1, if the network topology permits. As with the regular query propagation rules, you may use prefix summarization when redistributing routes to limit the query scope.

Those are the four main mechanisms that limit query scope. If you are using different AS numbers for this purpose, make sure you configure route summarization when redistributing; otherwise you may get the “leaked query” effect.
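The common rule behind summarization and filtering - a query propagates onward only through routers that hold an exact match for the prefix - can be illustrated with a small flood simulation (the names and data structures are mine, purely for illustration):

```python
def query_scope(adjacency, origin, has_exact_match):
    """Flood a query from origin over the adjacency map. A queried router
    propagates the query onward only if its topology table holds an exact
    match for the prefix; otherwise it just replies "unknown". Returns the
    set of routers that receive the query."""
    seen = {origin}
    frontier = [origin]
    while frontier:
        nxt = []
        for router in frontier:
            for neighbor in adjacency.get(router, []):
                if neighbor not in seen:
                    seen.add(neighbor)              # the neighbor is queried...
                    if has_exact_match(neighbor):   # ...and forwards only on a match
                        nxt.append(neighbor)
        frontier = nxt
    return seen

# Chain A-B-C-D where B summarizes toward C: C holds no exact match,
# so it is queried but never forwards the query to D.
topo = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(sorted(query_scope(topo, "A", lambda r: r in {"A", "B"})))
# -> ['A', 'B', 'C']
```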

May
01

The problem of unequal-cost load-balancing

Have you ever wondered why, among all IGPs, only EIGRP supports unequal-cost load balancing (UCLB)? Is there any special reason for that? Apparently, there is. Let’s start with the basic idea of equal-cost load balancing (ECLB). This one is simple: if there are multiple paths to the same destination with equal costs, it is reasonable to use them all and share traffic equally among them. The alternate paths are guaranteed to be loop-free, as they are “symmetric” with respect to cost to the primary path. If there are multiple paths of unequal cost, the same idea cannot be applied as easily. For example, consider the figure below:

uclb1

Suppose there is a destination behind R2 that R1 routes to. There are two paths from R1 to R2: one directly to R2, and another via R3. The cost of the primary path is 10 and the cost of the secondary path is 120. Intuitively, it would make sense to send traffic across both paths in proportion 12:1 (inversely to their costs) to make the most of the network. However, if R3 implements the same idea of unequal-cost load balancing, we have a problem: R3’s primary path to R2 runs via R1, so some of the packets that R1 sends to R2 via R3 will be routed straight back to R1. This is the core problem of UCLB: some secondary paths may result in routing loops, because a node on the path may prefer to route back toward the origin.

EIGRP’s solution to the problem

As you remember, EIGRP only uses an alternate path if it satisfies the feasibility condition: AD < FD, where AD is the “advertised distance” (the peer’s metric to reach the destination) and FD is the “feasible distance” (the local best metric to the destination). The condition ensures that the path chosen by the peer can never lead into a loop, because it cannot contain our primary path (otherwise AD would be greater than or equal to FD). In our case, looking at R1: FD=10 to reach R2 and R3’s AD=100, so the alternate path might lead into a loop. Garcia-Luna-Aceves proved that the feasibility condition is sufficient to always select loop-free alternate paths; the interested reader may consult the original DUAL paper for the proof. In addition, if you still think EIGRP is history, I recommend reading RFC 5286 to find the same loop-free condition in the Basic IP Fast Reroute procedure (there are alternate approaches to IP FRR, though). Since the feasibility condition admits only loop-free alternatives, it is safe to enable UCLB on EIGRP routers: provided that EIGRP is the only routing protocol, the alternate paths will never result in a routing loop.
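The feasibility condition itself is a one-line predicate; a minimal Python sketch (the function name is mine, not an EIGRP internal):

```python
def is_feasible_successor(advertised_distance, feasible_distance):
    """EIGRP feasibility condition: a neighbor's path is a loop-free
    alternate only if its advertised distance (AD) is strictly less
    than our feasible distance (FD)."""
    return advertised_distance < feasible_distance

# R1's view from the example: FD = 10 via the direct link, while R3
# advertises AD = 100, so R3 fails the check and cannot be safely
# used as an alternate.
print(is_feasible_successor(100, 10))  # False: R3 is not feasible
print(is_feasible_successor(5, 10))    # True: a hypothetical AD of 5 would pass
```

Note the strict inequality: an AD exactly equal to FD is also rejected.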

Configuring EIGRP for Unequal-Cost Load Balancing

A few words about configuring UCLB in EIGRP. You achieve it by setting the “variance” value to something greater than 1. The EIGRP routing process will then install into the local routing table every feasible path with metric < best_metric * variance, where metric is the full metric of the alternate path and best_metric is the metric of the primary path. By default, variance is 1, meaning that only equal-cost paths are used. Let’s configure a quick test-bed scenario. We will use EIGRP as the routing protocol on all routers and set the metric weights so that K1=0, K2=0, K3=1, K4=K5=0. This means only link delay is used for EIGRP metric calculations, which makes the metric purely additive and easy to work with.

uclb2

The metric to reach R2’s Loopback0 from R1 via the directly connected link is FD = 256*(10+10) = 5120. R3 is a feasible successor for the same destination, as AD = 5*256 = 1280 < FD. Thus, R1 may use it for unequal-cost load balancing. We should set variance to satisfy (50+10+5)*256 < 256*(10+10)*variance, where (50+10+5)*256 is the alternate path’s full metric. From this inequality, variance > 65/20 = 3.25, so we need to set the value to at least 4 in order to utilize the alternate path. If we look at R1’s routing table we see the following:

Rack1R1#show ip route 150.1.2.2
Routing entry for 150.1.2.0/24
Known via "eigrp 100", distance 90, metric 5120, type internal
Redistributing via eigrp 100
Last update from 155.1.13.3 on Serial0/0.13, 00:00:04 ago
Routing Descriptor Blocks:
155.1.13.3, from 155.1.13.3, 00:00:04 ago, via Serial0/0.13
Route metric is 16640, traffic share count is 37
Total delay is 650 microseconds, minimum bandwidth is 128 Kbit
Reliability 255/255, minimum MTU 1500 bytes
Loading 1/255, Hops 2
* 155.1.12.2, from 155.1.12.2, 00:00:04 ago, via Serial0/0.12
Route metric is 5120, traffic share count is 120
Total delay is 200 microseconds, minimum bandwidth is 1544 Kbit
Reliability 255/255, minimum MTU 1500 bytes
Loading 1/255, Hops 1
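A short Python sketch can double-check the numbers in this scenario: the delay-only metric implied by the K-value settings, the minimum usable variance, and the inverse proportionality of the traffic share counts shown in the output above:

```python
# Delay-only EIGRP metric (K1=K2=K4=K5=0, K3=1): 256 * sum of delays.
def delay_metric(delays):
    return 256 * sum(delays)

primary = delay_metric([10, 10])       # R1 -> R2 via the direct link
alternate = delay_metric([50, 10, 5])  # R1 -> R3 -> R2

# The alternate path is installed when alternate < primary * variance;
# with an integer variance the smallest usable value is floor(ratio)+1.
min_variance = alternate // primary + 1
print(primary, alternate, min_variance)  # 5120 16640 4

# Traffic share counts (120 and 37 above) are inversely proportional
# to the path metrics: 120/37 is approximately 16640/5120 = 3.25.
print(round(120 / 37, 2), round(alternate / primary, 2))  # 3.24 3.25
```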

Notice the traffic share counters – essentially they represent the ratio in which traffic is shared between the two paths. You may notice that 120:37 is almost the same as 16640:5120 – that is, the amount of traffic sent across a particular path is inversely proportional to the path’s metric. This is the default EIGRP behavior, set by the EIGRP process command traffic-share balanced. You may change this behavior using the command traffic-share min across-interfaces, which instructs EIGRP to use only the minimum-cost path (or paths, if there are several). Other feasible paths will be kept in the routing table but not used until the primary path fails. The benefit is slightly faster convergence, as the alternate path does not need to be inserted into the RIB after a failure, unlike scenarios where UCLB is disabled. This is how the routing table entry looks when you enable minimal-metric path routing:

Rack1R1#sh ip route 150.1.2.2
Routing entry for 150.1.2.0/24
Known via "eigrp 100", distance 90, metric 130560, type internal
Redistributing via eigrp 100
Last update from 155.1.13.3 on Serial0/0.13, 00:00:14 ago
Routing Descriptor Blocks:
155.1.13.3, from 155.1.13.3, 00:00:14 ago, via Serial0/0.13
Route metric is 142080, traffic share count is 0
Total delay is 5550 microseconds, minimum bandwidth is 128 Kbit
Reliability 255/255, minimum MTU 1500 bytes
Loading 1/255, Hops 2
* 155.1.12.2, from 155.1.12.2, 00:00:14 ago, via Serial0/0.12
Route metric is 130560, traffic share count is 1
Total delay is 5100 microseconds, minimum bandwidth is 1544 Kbit
Reliability 255/255, minimum MTU 1500 bytes
Loading 1/255, Hops 1

How CEF implements Unequal-Cost Load-Balancing

And now, a few words about the data-plane implementation of UCLB. It’s not documented on Cisco’s documentation site; I first read about it in the Cisco Press book “Traffic Engineering with MPLS” by Eric Osborne and Ajay Simha. The method does not seem to have changed since then.

As you know, the most prevalent switching method in Cisco IOS is CEF (we don’t consider distributed platforms here). Layer 3 load balancing is similar to the load balancing used with Ethernet Port-Channels. The router takes an ingress packet, hashes the source and destination IP addresses (and possibly the L4 ports), and normalizes the result to a range of, say, [1-16] (the result is often called a “hash bucket”). The router then uses the hash bucket as a selector for one of the alternate paths. If the hash function distributes IP src/dst combinations evenly across the result space, all paths will be utilized equally. To implement unequal balancing, an additional level of indirection is needed. Suppose you have 3 alternate paths and want to balance them in proportion 1:2:3. You need to fill the 16 hash buckets with path selectors in the same proportion. Solving the simple equation 1*x + 2*x + 3*x = 16 gives x ≈ 2.67. Thus, to maintain the proportion 1:2:3 we may associate 3 hash buckets with the first path, 6 hash buckets with the second path, and the remaining 7 hash buckets with the third path. This is not exactly the desired proportion, but it is close. Here is how the “hash bucket” to “path ID” mapping may look for the above proportion of 1:2:3:

Hash | Path ID
--------------
[01] -> path 1
[02] -> path 2
[03] -> path 3
[04] -> path 1
[05] -> path 2
[06] -> path 3
[07] -> path 1
[08] -> path 2
[09] -> path 3
[10] -> path 2
[11] -> path 3
[12] -> path 2
[13] -> path 3
[14] -> path 2
[15] -> path 3
[16] -> path 3

Once again, provided that the hash function distributes inputs evenly among all buckets, the paths will be used in the desired proportions. As you can see, the way the control plane divides traffic flows among different paths may be significantly affected by the data-plane implementation.
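The bucket-filling logic described above can be sketched in Python. The ceiling-based quota split and round-robin fill are my reconstruction of the idea (they reproduce the 3/6/7 split and the table’s ordering), not Cisco’s actual CEF code:

```python
import math

def fill_buckets(weights, nbuckets=16):
    """Split nbuckets among paths roughly in proportion to weights:
    every path but the last gets ceil(weight * nbuckets / total),
    the last path takes whatever remains. Buckets are then filled
    round-robin until each path exhausts its quota."""
    total = sum(weights)
    quotas = [math.ceil(w * nbuckets / total) for w in weights[:-1]]
    quotas.append(nbuckets - sum(quotas))
    mapping, remaining = [], quotas[:]
    while len(mapping) < nbuckets:
        for path, left in enumerate(remaining):
            if left > 0 and len(mapping) < nbuckets:
                mapping.append(path + 1)  # 1-based path IDs, as in the table
                remaining[path] -= 1
    return quotas, mapping

quotas, mapping = fill_buckets([1, 2, 3])
print(quotas)   # [3, 6, 7]
print(mapping)  # [1, 2, 3, 1, 2, 3, 1, 2, 3, 2, 3, 2, 3, 2, 3, 3]
```

The produced mapping matches the hash-bucket table above exactly.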

CEF may load-balance with per-packet or per-flow granularity. In the first case, successive packets of the same flow (src/dst IP and possibly src/dst ports) are routed across different paths. While this may look like a good way to better utilize all paths, it usually results in packets arriving out of order. The result is poor application performance, since many L4 protocols work best when packets arrive in order. Thus, in the real world the preferred (and default) load-balancing mode is per-flow (called per-destination in CEF terminology). To change the CEF load-balancing mode on an interface, use the interface-level command ip load-sharing per-packet|per-destination. Notice that it only affects the packets ingressing the configured interface.
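The per-flow selection boils down to hashing the flow tuple into a bucket. A toy Python sketch (the actual CEF hash is different; this only illustrates the determinism that keeps a flow on one path and its packets in order):

```python
import hashlib

def flow_bucket(src_ip, dst_ip, nbuckets=16):
    """Map a src/dst pair to a hash bucket in [1, nbuckets]. The same
    flow always lands in the same bucket, so its packets always
    follow the same path and arrive in order."""
    digest = hashlib.md5(f"{src_ip}->{dst_ip}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % nbuckets + 1

b1 = flow_bucket("155.1.13.1", "150.1.2.2")
b2 = flow_bucket("155.1.13.1", "150.1.2.2")
print(b1 == b2)  # True: per-flow hashing is deterministic
```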

Let’s look at the CEF data structures that reveal the load-balancing implementation. The following command shows all the alternate adjacencies used to route packets toward the prefix in question. This is the first part of the output:

Rack1R1#show ip cef 150.1.2.2 internal
150.1.2.0/24, version 77, epoch 0, per-destination sharing
0 packets, 0 bytes
via 155.1.13.3, Serial0/0.13, 0 dependencies
traffic share 37
next hop 155.1.13.3, Serial0/0.13
valid adjacency
via 155.1.12.2, Serial0/0.12, 0 dependencies
traffic share 120
next hop 155.1.12.2, Serial0/0.12
valid adjacency

The next block of information is more interesting: it reveals those 16 hash buckets loaded with path indexes (after the “Load distribution” string). Here zero means the first path and 1 means the second, alternate path. As we can see, 4 buckets are loaded with path “zero” and 12 buckets with path “one”. The resulting share 4:12 = 1:3 is very close to 37:120, so in this case the data-plane implementation did not change the load-balancing logic too much. The rest of the output details every hash bucket index and its associated output interface, as well as the number of packets switched using each particular hash bucket. Keep in mind that only CEF-switched packets are reflected in these statistics, so if you use the ping command from the router itself, you will not see any counters incrementing.

[...Output Continues...]
0 packets, 0 bytes switched through the prefix
tmstats: external 0 packets, 0 bytes
internal 0 packets, 0 bytes
Load distribution: 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 (refcount 1)

Hash OK Interface Address Packets
1 Y Serial0/0.13 point2point 0
2 Y Serial0/0.12 point2point 0
3 Y Serial0/0.13 point2point 0
4 Y Serial0/0.12 point2point 0
5 Y Serial0/0.13 point2point 0
6 Y Serial0/0.12 point2point 0
7 Y Serial0/0.13 point2point 0
8 Y Serial0/0.12 point2point 0
9 Y Serial0/0.12 point2point 0
10 Y Serial0/0.12 point2point 0
11 Y Serial0/0.12 point2point 0
12 Y Serial0/0.12 point2point 0
13 Y Serial0/0.12 point2point 0
14 Y Serial0/0.12 point2point 0
15 Y Serial0/0.12 point2point 0
16 Y Serial0/0.12 point2point 0
refcount 6

Any other protocols supporting Unequal Cost Load-Balancing?

As we already know, neither OSPF, IS-IS, nor RIP supports UCLB, as they cannot verify that alternate paths are loop-free. In fact, all IGPs could be extended to support this feature, as specified in the above-mentioned RFC 5286; however, this is not (yet?) implemented by any well-known vendor. Still, one protocol besides EIGRP supports UCLB. As you may have guessed, it is BGP. What makes BGP so special? Nothing else but the routing-loop detection implemented for eBGP sessions. When a BGP speaker receives a route from an external AS, it looks for its own AS number in the AS_PATH attribute and discards matching routes. This prevents routing loops at the AS scale. Additionally, it allows BGP to use alternative eBGP paths for unequal-cost load balancing. The proportions for the alternative paths are chosen based on a special BGP extended community attribute called DMZ Link Bandwidth. By default, this attribute value is copied from the bandwidth of the interface connecting to the eBGP peer. To configure UCLB with BGP you need a configuration similar to the following:

router bgp 100
maximum-paths 3
bgp dmzlink-bw
neighbor 1.1.1.1 remote-as 200
neighbor 1.1.1.1 dmzlink-bw
neighbor 2.2.2.2 remote-as 200
neighbor 2.2.2.2 dmzlink-bw
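The eBGP loop-detection rule described above is trivial to express; a minimal sketch (the function name is mine):

```python
def accept_ebgp_route(as_path, local_asn):
    """An eBGP speaker discards any route whose AS_PATH already
    contains its own AS number -- the loop-prevention rule that also
    guarantees alternate eBGP paths are loop-free."""
    return local_asn not in as_path

print(accept_ebgp_route([200, 300], 100))       # True: loop-free
print(accept_ebgp_route([200, 100, 300], 100))  # False: our ASN is in the path
```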

In order for paths to be eligible for UCLB, they must have the same weight, local preference, AS_PATH length, Origin, and MED. The local speaker may then utilize the paths in proportion to the value of the DMZ Link Bandwidth attribute. Keep in mind that BGP multipathing is disabled by default until you enable it with the maximum-paths command. iBGP speakers may use the DMZ Link Bandwidth feature as well, for paths injected into the local AS via eBGP. For this to work, the DMZ Link Bandwidth attribute must be propagated across the local AS (the send-community extended command) and the exit points for every path must have equal IGP costs in the iBGP speaker’s RIB. The data-plane implementation remains the same as for EIGRP multipathing, since CEF is the underlying switching method in both cases.
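A small sketch of how the resulting proportions could be derived from the DMZ Link Bandwidth values (my illustration, with hypothetical peer addresses and bandwidths; it simply reduces the bandwidths to the smallest integer ratio):

```python
from functools import reduce
from math import gcd

def dmz_share_ratio(link_bw):
    """Reduce per-path DMZ Link Bandwidth values to the smallest
    integer ratio in which traffic would be shared among the paths."""
    g = reduce(gcd, link_bw.values())
    return {peer: bw // g for peer, bw in link_bw.items()}

# hypothetical eBGP peers behind 10 Mbit/s and 30 Mbit/s links
print(dmz_share_ratio({"1.1.1.1": 10_000, "2.2.2.2": 30_000}))
```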

So how should I do that in production?

Surprisingly enough, the best way to implement UCLB in real life is MPLS Traffic Engineering. This solution allows a high level of administrative control and is IGP-agnostic (officially you need OSPF or IS-IS, but you may use verbatim MPLS TE tunnels even with RIP or EIGRP). It is safe to use unequal-cost load balancing with MPLS TE tunnels because they connect two nodes using “virtual circuits”, and no transit node ever performs a routing lookup. Thus, you may create as many tunnels between the source and the destination as you want and assign the tunnel bandwidth values appropriately. After that, CEF switching takes care of the rest.

Further Reading

Load Balancing with CEF
Unequal Cost Load Sharing
CEF Load Sharing Details

Aug
02

DMVPN stands for Dynamic Multipoint VPN and it is an effective solution for dynamic secure overlay networks. In short, DMVPN is a combination of the following technologies:

1) Multipoint GRE (mGRE)
2) Next-Hop Resolution Protocol (NHRP)
3) Dynamic Routing Protocol (EIGRP, RIP, OSPF, BGP)
4) Dynamic IPsec encryption
5) Cisco Express Forwarding (CEF)

Assuming that the reader has a general understanding of what DMVPN is and a solid understanding of IPsec and CEF, we are going to describe the role and function of each component in detail. In this post we illustrate the two major phases of DMVPN evolution:

1) Phase 1 – Hub and Spoke (mGRE hub, p2p GRE spokes)
2) Phase 2 – Hub and Spoke with Spoke-to-Spoke tunnels (mGRE everywhere)

As for DMVPN Phase 3 – “Scalable Infrastructure” – a separate post is required to cover the subject. This is due to the significant changes made to the NHRP resolution logic (NHRP redirects and shortcuts), which are better illustrated once the reader has a good understanding of the first two phases. However, some hints about Phase 3 will also be provided in this post.

Note: Before we start, I would like to thank my friend Alexander Kitaev, for taking time to review the post and providing me with useful feedback.

Multipoint GRE

Let us start with the most basic building block of DMVPN – the multipoint GRE tunnel. A classic GRE tunnel is point-to-point, but mGRE generalizes the idea by allowing a tunnel to have multiple destinations.

GRE Tunnels

This may seem natural if the tunnel destination address is multicast (e.g. 239.1.1.1). The tunnel could be used to effectively distribute the same information (e.g. video stream) to multiple destinations on top of a multicast-enabled network. Actually, this is how mGRE is used for Multicast VPN implementation in Cisco IOS. However, if tunnel endpoints need to exchange unicast packets, special “glue” is needed to map tunnel IP addresses to “physical” or “real” IP addresses, used by endpoint routers. As we’ll see later, this glue is called NHRP.

mGRE Tunnel

Note that if you source multiple mGRE tunnels from the same interface (e.g. Loopback0) of a single router, GRE can use a special “multiplexer” field in the tunnel header to differentiate them. This field is known as the “tunnel key”, and you can define it under the tunnel configuration. As a matter of fact, up to IOS 12.3(14)T or 12.3(11)T3 the use of a tunnel key was mandatory – an mGRE tunnel would not come up until the key was configured. Since the mentioned versions, you may configure a tunnel without a key. There were two reasons to remove the requirement. First, the hardware ASICs of the 6500 and 7600 platforms do not support mGRE tunnel-key processing, so optimal switching performance on those platforms is penalized when you configure a tunnel key. Second, as we’ll see later, DMVPN Phase 3 allows interoperation between different mGRE tunnels sharing the same NHRP network-id only when they have the same tunnel key or no tunnel key at all (since this allows sending packets “between” tunnels).
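For illustration, the tunnel key is just an optional 32-bit field in the GRE header. A small Python sketch of the header layout per RFC 2890 (the builder function is mine):

```python
import struct

def build_gre_header(key=None, protocol=0x0800):
    """Build a minimal GRE header: a 16-bit flags/version word, a
    16-bit protocol type (0x0800 = IPv4), then the optional 32-bit
    key field, present only when the K bit (0x2000) is set."""
    flags = 0x2000 if key is not None else 0x0000  # K bit per RFC 2890
    header = struct.pack("!HH", flags, protocol)
    if key is not None:
        header += struct.pack("!I", key)  # the "tunnel key" multiplexer
    return header

print(len(build_gre_header()))         # 4 bytes: no key
print(len(build_gre_header(key=123)))  # 8 bytes: the key adds 4 bytes
```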

Generic NHRP

Now let’s move to the component that makes DMVPN truly dynamic – NHRP. The protocol was defined quite some time ago, in RFC 2332 (1998), to create a routing optimization scheme inside NBMA (non-broadcast multiple-access) networks such as ATM, Frame Relay and SMDS (anybody remember that one nowadays? :) The general idea was to use SVCs (switched virtual circuits) to create temporary shortcuts in a non-fully-meshed NBMA cloud. Consider the following schematic illustration, where IP subnet 10.0.0.0/24 overlays a partially meshed NBMA cloud. NHRP is similar in function to ARP, resolving L3 addresses to L2 addresses, but does so in a more efficient manner, suitable for partially meshed NBMA clouds that support dynamic Layer 2 connections.

NHRP Illustration

The following is a simplified, schematic illustration of the NHRP process. In the above topology, in order for R1 to reach R4, it must send packets over the PVCs R1-R2, R2-R3 and finally R3-R4. Suppose the NBMA cloud supports SVCs (switched virtual circuits, i.e. dynamic paths) – then it would be more reasonable for R1 to establish an SVC directly with R4 and send packets along the optimal path. However, this requires R1 to know the NBMA address (e.g. ATM NSAP) associated with R4 in order to “place a call”. Preferably, R1 should learn the R4 IP-address-to-NSAP (NBMA address) mapping dynamically.

Now assume we enable NHRP on all NBMA interfaces in the network. Each router in the topology acts as either an NHC (Next-Hop Client) or an NHS (Next-Hop Server). One of the functions of an NHC is to register with its NHS the mapping of its IP address to its NBMA Layer 2 address (e.g. ATM NSAP address). To make registration possible, you configure each NHC with the IP address of at least one NHS. In turn, the NHS acts as a database agent, storing all registered mappings and replying to NHC queries. If an NHS does not have a requested entry in its database, it can forward the packet to another NHS to see if that one has the requested association. Note that a router may act as a Next-Hop Server and Client at the same time. Back to the diagram: assume that R2 and R3 are NHSs and R1 and R4 are NHCs. Further, assume R4 registers its IP-to-NBMA address mapping with R3, while R1 treats R2 as its NHS. R2 and R3 treat each other as NHSs. When R1 wants to send traffic to R4 (next-hop 10.0.0.4), it tries to resolve 10.0.0.4 by sending an NHRP resolution request to R2 – the configured NHS. In turn, R2 forwards the request to R3, since it has no local information.

Obviously, modern networks tend not to use ATM/SMDS or Frame Relay SVCs much anymore, but NHRP can be adapted to work with “simulated NBMA” networks such as mGRE tunnels. The NBMA layer maps to the “physical” underlying network, while the mGRE VPN is the “logical” network (the tunnel’s internal IP addressing). In this case, mGRE uses NHRP to map “logical” or “tunnel inside” IP addresses to “physical” or real IP addresses. Effectively, NHRP performs the “glue” function described above, allowing mGRE endpoints to discover each other’s real IP addresses. Since NHRP defines a server role, it is natural to lay out the mGRE topology in a hub-and-spoke manner (or a combination of hubs and spokes in more advanced cases). Let’s walk through some particular scenarios to illustrate NHRP functionality with mGRE.
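Functionally, an NHS is little more than a mapping database with request forwarding. A toy Python model of the R1/R2/R3/R4 example above (class and method names are mine, not an NHRP implementation):

```python
class NextHopServer:
    """Toy NHS: stores logical-IP -> NBMA-IP registrations and
    forwards unknown resolution requests to a peer NHS, mimicking
    the R2 -> R3 forwarding in the example."""

    def __init__(self, peer=None):
        self.cache = {}
        self.peer = peer

    def register(self, logical_ip, nbma_ip):
        self.cache[logical_ip] = nbma_ip

    def resolve(self, logical_ip):
        if logical_ip in self.cache:
            return self.cache[logical_ip]
        if self.peer is not None:          # no local entry: ask the peer NHS
            return self.peer.resolve(logical_ip)
        return None

r3 = NextHopServer()                 # the NHS that serves R4
r2 = NextHopServer(peer=r3)          # the NHS that R1 queries
r3.register("10.0.0.4", "150.1.4.4") # R4 registers its mapping with R3
print(r2.resolve("10.0.0.4"))        # R2 forwards the request to R3
```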

NHRP Phase 1

With NHRP Phase 1, mGRE uses NHRP to inform the hub about dynamically appearing spokes. Initially, you configure every spoke with the IP address of the hub as the NHS. However, each spoke’s tunnel is a regular point-to-point GRE tunnel with a fixed destination IP that equals the physical address of the hub. The spokes can only reach the hub directly and can reach other spokes’ networks only across the hub. The benefit of Phase 1 is a simplified hub router configuration, which does not require a static NHRP mapping for every new spoke.

As all packets go across the hub, almost any dynamic routing protocol attains reachability. The hub just needs to advertise a default route to the spokes, while the spokes should advertise their subnets dynamically to the hub. It probably makes sense to run EIGRP and summarize all subnets to 0.0.0.0/0 on the hub, effectively sending a default route to all spokes (provided the spokes do not use any other default route, e.g. from their ISPs). Configure the spokes as EIGRP stubs and advertise their respective connected networks. RIP could be set up in a similar manner, by simply configuring the GRE tunnels on the spokes as passive interfaces. Both EIGRP and RIP require split horizon to be disabled on the hub mGRE interface in order to exchange subnets spoke-to-spoke. As for OSPF, the optimal choice would be the point-to-multipoint network type on all GRE and mGRE interfaces. In addition, configure ip ospf database-filter all out on the hub and set up static default routes via the tunnel interfaces on the spokes (or static specific routes for corporate networks).

Here is a sample configuration. A detailed explanation of the NHRP commands and the “show” command output follows the example.

mGRE + NHRP Phase 1 + EIGRP

R1:
!
! Hub router
!
router eigrp 123
no auto-summary
network 10.0.0.0 0.255.255.255
!
! Tunnel source
!
interface Loopback0
ip address 150.1.1.1 255.255.255.0
!
! VPN network
!
interface Loopback 1
ip address 10.0.1.1 255.255.255.0
!
! mGRE tunnel
!
interface Tunnel0
ip address 10.0.0.1 255.255.255.0
no ip redirects
ip nhrp authentication cisco
ip nhrp map multicast dynamic
ip nhrp network-id 123
no ip split-horizon eigrp 123
ip summary-address eigrp 123 0.0.0.0 0.0.0.0 5
tunnel source Loopback0
tunnel mode gre multipoint
tunnel key 123

R2:
!
! Spoke Router
!
router eigrp 123
no auto-summary
network 10.0.0.0 0.255.255.255
eigrp stub connected
!
interface Loopback0
ip address 150.1.2.2 255.255.255.0
!
interface Loopback 1
ip address 10.0.2.2 255.255.255.0
!
! GRE tunnel
!
interface Tunnel0
ip address 10.0.0.2 255.255.255.0
ip nhrp authentication cisco
ip nhrp map multicast 150.1.1.1
ip nhrp map 10.0.0.1 150.1.1.1
ip nhrp nhs 10.0.0.1
ip nhrp network-id 123
ip nhrp registration timeout 30
ip nhrp holdtime 60
tunnel source Loopback0
tunnel destination 150.1.1.1
tunnel key 123

R3:
!
! Spoke Router
!
router eigrp 123
no auto-summary
network 10.0.0.0 0.255.255.255
eigrp stub connected
!
interface Loopback0
ip address 150.1.3.3 255.255.255.0
!
interface Loopback 1
ip address 10.0.3.3 255.255.255.0
!
interface Tunnel0
ip address 10.0.0.3 255.255.255.0
ip nhrp authentication cisco
ip nhrp map multicast 150.1.1.1
ip nhrp map 10.0.0.1 150.1.1.1
ip nhrp nhs 10.0.0.1
ip nhrp network-id 123
ip nhrp registration timeout 30
ip nhrp holdtime 60
tunnel source Loopback0
tunnel destination 150.1.1.1
tunnel key 123

Note that only the hub tunnel uses mGRE encapsulation; the spokes use regular point-to-point GRE tunnels. Now, let’s look at the NHRP commands used in the example above.

The most basic command, ip nhrp map [Logical IP] [NBMA IP], creates a static binding between a logical IP address and an NBMA IP address. Since NHRP treats mGRE as an NBMA medium, the logical IP corresponds to the IP address “inside” the tunnel (the “inner” address) and the NBMA IP address corresponds to the “outer” IP address (the address used to source the tunnel). (From now on, we will call the “inner” IP address simply the “IP address” or “logical IP address”, and the “outer” IP address the “NBMA address” or “physical IP address”.) Static NHRP mappings serve to “bootstrap” the information the spokes need to reach the logical IP address of the hub.

The next command is ip nhrp map multicast dynamic|[StaticIP], and its purpose is the same as that of frame-relay map … broadcast: it specifies the list of destinations that will receive the multicast/broadcast traffic originated by this router. The spokes map multicasts to the static NBMA IP address of the hub, while the hub maps multicast packets to the “dynamic” mappings – that is, the hub replicates multicast packets to all spokes registered via NHRP. Mapping multicasts is important to let the dynamic routing protocol establish adjacencies and exchange update packets.

The ip nhrp nhs [ServerIP] command configures an NHRP client with the IP address of its NHRP server. Note that [ServerIP] is the logical IP address of the hub (inside the tunnel), and therefore the spokes need the static NHRP mappings in order to reach it. The spokes use the NHS to register their logical-IP-to-NBMA-IP associations and to send NHRP resolution requests. (However, in this particular scenario the spokes will not send any NHRP Resolution Requests, since they use directed GRE tunnels – only Registration Requests will be sent.)
The commands ip nhrp network-id [ID] and ip nhrp authentication [Key] identify and authenticate the logical NHRP network. The [ID] and the [Key] must match on all routers sharing the same GRE tunnel. It is possible to split an NBMA medium into multiple NHRP networks, but that is for advanced scenarios. As for the authentication, it is a simple plain-text key sent in all NHRP messages. While the network-id is mandatory for NHRP to work, you may omit the authentication. The next command, ip nhrp holdtime, specifies the hold-time value set in NHRP Registration Requests. The NHS keeps the registration cached for the duration of the hold-time and then, if no registration update is received, times it out. The NHS also sends the same hold-time in NHRP Resolution Replies when queried for the respective NHRP association. Note that you configure the ip nhrp holdtime command on the spokes, and a spoke sends Registration Requests every 1/3 of the hold-time. However, if you also configure ip nhrp registration timeout [Timeout] on a spoke, the NHRP Registration Requests will be sent every [Timeout] seconds instead of every 1/3 of the configured hold-time. The hold-time value sent in the NHRP Registration Requests remains the same, of course.
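The interaction between the two timers can be summarized as a tiny rule; a sketch in Python (the function name is mine):

```python
def registration_interval(holdtime, registration_timeout=None):
    """A spoke re-registers every holdtime/3 seconds by default; an
    explicit 'ip nhrp registration timeout' value overrides that
    interval without changing the hold-time advertised to the NHS."""
    if registration_timeout is not None:
        return registration_timeout
    return holdtime // 3

print(registration_interval(60))      # 20: default, holdtime/3
print(registration_interval(60, 30))  # 30: explicit registration timeout
```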

Now let’s move to the show commands. Since only the hub uses dynamic NHRP mappings to resolve the spokes’ NBMA addresses, it is useful to observe R1’s NHRP cache:

Rack1R1#show ip nhrp detail
10.0.0.2/32 via 10.0.0.2, Tunnel0 created 00:16:59, expire 00:00:30
Type: dynamic, Flags: authoritative unique registered used
NBMA address: 150.1.2.2
10.0.0.3/32 via 10.0.0.3, Tunnel0 created 00:11:34, expire 00:00:55
Type: dynamic, Flags: authoritative unique registered used
NBMA address: 150.1.3.3

As we can see, the logical IP 10.0.0.2 maps to NBMA address 150.1.2.2, and the logical IP 10.0.0.3 maps to NBMA address 150.1.3.3. The “authoritative” flag means the NHS learned the NHRP mapping directly from a Registration Request (the NHS “serves” that particular NHC). The “unique” flag means the NHRP Registration Request had its own “unique” flag set; its purpose is to prevent duplicate NHRP mappings in the cache. If a unique mapping for a particular logical IP is already in the NHRP cache and another NHC tries to register the same logical IP, the server rejects the registration until the unique entry expires. By default, IOS routers set this flag in Registration Requests; it can be disabled with the ip nhrp registration no-unique command on a spoke. This may be needed when a spoke changes its NBMA IP address often and needs to re-register a new mapping with the hub. The last flag, “used”, means the router is using the NHRP entry to switch IP packets; we will discuss its meaning in the NHRP process-switching section below. Also note the “expire” field, a countdown timer started from the hold-time specified in the Registration Request packet.
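The “unique” flag behavior can be modeled in a few lines; a toy sketch (entry expiry is deliberately not modeled, and all names are mine):

```python
def try_register(cache, logical_ip, nbma_ip, unique=True):
    """Toy NHS registration handler for the 'unique' flag: another
    NHC may not overwrite an existing unique mapping for the same
    logical IP until it expires (expiry not modeled here)."""
    entry = cache.get(logical_ip)
    if entry is not None and entry["unique"] and entry["nbma"] != nbma_ip:
        return False  # registration rejected: duplicate unique mapping
    cache[logical_ip] = {"nbma": nbma_ip, "unique": unique}
    return True

cache = {}
print(try_register(cache, "10.0.0.2", "150.1.2.2"))  # True: first registration
print(try_register(cache, "10.0.0.2", "150.1.9.9"))  # False: unique entry exists
print(try_register(cache, "10.0.0.2", "150.1.2.2"))  # True: same NBMA re-registers
```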

Let’s see the NHRP registration and reply process flows on the NHS.

Rack1R1#debug nhrp
NHRP protocol debugging is on

Rack1R1#debug nhrp packet
NHRP activity debugging is on

First, R3 tries to register its logical-IP-to-NBMA-IP mapping with the hub. Note the specific NHRP packet format, split into three parts.

1) (F) – fixed part. Specifies the version, address family (afn) and protocol type (type) for resolution, as well as the subnetwork layer (NBMA) address and subaddress lengths (shtl and sstl). Note that shtl equals 4 – the length of an IPv4 address in bytes – while sstl refers to the “subaddress” field, which is not used with IPv4.

2) (M) – mandatory part. Specifies flags such as the “unique” flag, and the Request ID used to match requests with responses. Also included are the source NBMA address (the tunnel source in GRE/mGRE) and the source/destination protocol (IP) addresses. The destination IP address is the logical IP address of the hub and the source IP address is the logical IP address of the spoke. From this information the hub can populate the spoke’s logical-IP-to-NBMA-IP mapping.

3) (C-1) – CIE 1, where CIE stands for “Client Information Element”. While it is not used in the packets below, in the more advanced scenarios explored later we’ll see this field carrying information about the networks connected to the requesting/responding routers.

Also note the NAT-check output – a Cisco extension used to make NHRP work for routers that build tunnels from behind NAT.

NHRP: Receive Registration Request via Tunnel0 vrf 0, packet size: 81
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "unique", reqid: 26
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.1
(C-1) code: no error(0)
prefix: 255, mtu: 1514, hd_time: 60
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: netid_in = 123, to_us = 1
NHRP: NAT-check: matched destination address 150.1.3.3
NHRP: Tu0: Found and skipping dynamic multicast mapping NBMA: 150.1.3.3
NHRP: Attempting to send packet via DEST 10.0.0.3
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.3.3

After processing the request, the router responds with an NHRP Registration Reply. Note that the (M) part did not change; only the packet’s source and destination logical IP addresses are reversed (R1 -> R3).

NHRP: Send Registration Reply via Tunnel0 vrf 0, packet size: 101
src: 10.0.0.1, dst: 10.0.0.3
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "unique", reqid: 26
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.1
(C-1) code: no error(0)
prefix: 255, mtu: 1514, hd_time: 60
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: 101 bytes out Tunnel0

Next, the NHS receives the Registration Request from R2 and adds the corresponding entry to its NHRP cache:

NHRP: Receive Registration Request via Tunnel0 vrf 0, packet size: 81
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "unique", reqid: 38
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.0.1
(C-1) code: no error(0)
prefix: 255, mtu: 1514, hd_time: 60
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: netid_in = 123, to_us = 1
NHRP: NAT-check: matched destination address 150.1.2.2
NHRP: Tu0: Found and skipping dynamic multicast mapping NBMA: 150.1.2.2
NHRP: Attempting to send packet via DEST 10.0.0.2
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.2.2

NHRP: Send Registration Reply via Tunnel0 vrf 0, packet size: 101
src: 10.0.0.1, dst: 10.0.0.2
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "unique", reqid: 38
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.0.1
(C-1) code: no error(0)
prefix: 255, mtu: 1514, hd_time: 60
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: 101 bytes out Tunnel0

We now see how NHRP Phase 1 works. The spokes register their associations with the hub via NHRP, and the hub learns their NBMA addresses dynamically. At the same time, spokes use point-to-point tunnels to speak to the hub and reach each other. Note that EIGRP is not the only protocol suitable for use with NHRP Phase 1. OSPF is also a viable solution, thanks to the point-to-multipoint network type and the ip ospf database-filter all out command. See the example below for an OSPF configuration with NHRP Phase 1:

mGRE + NHRP Phase 1 + OSPF

R1:
!
! Hub router
!
router ospf 123
router-id 10.0.0.1
network 10.0.0.0 0.255.255.255 area 0
!
interface Loopback0
ip address 150.1.1.1 255.255.255.0
!
interface Loopback 1
ip address 10.0.1.1 255.255.255.0
!
interface Tunnel0
ip address 10.0.0.1 255.255.255.0
no ip redirects
ip nhrp authentication cisco
ip nhrp map multicast dynamic
ip nhrp network-id 123
tunnel source Loopback0
tunnel mode gre multipoint
tunnel key 123
ip ospf network point-to-multipoint
ip ospf database-filter all out

R2:
!
! Spoke Router
!
router ospf 123
network 10.0.0.0 0.255.255.255 area 0
router-id 10.0.0.2
!
interface Loopback0
ip address 150.1.2.2 255.255.255.0
!
interface Loopback 1
ip address 10.0.2.2 255.255.255.0
!
interface Tunnel0
ip address 10.0.0.2 255.255.255.0
ip nhrp authentication cisco
ip nhrp map multicast 150.1.1.1
ip nhrp map 10.0.0.1 150.1.1.1
ip nhrp nhs 10.0.0.1
ip nhrp network-id 123
ip nhrp registration timeout 30
ip nhrp holdtime 60
tunnel source Loopback0
tunnel destination 150.1.1.1
tunnel key 123
ip ospf network point-to-multipoint
!
ip route 0.0.0.0 0.0.0.0 Tunnel0

R3:
!
! Spoke Router
!
router ospf 123
network 10.0.0.0 0.255.255.255 area 0
router-id 10.0.0.3
!
interface Loopback0
ip address 150.1.3.3 255.255.255.0
!
interface Loopback 1
ip address 10.0.3.3 255.255.255.0
!
interface Tunnel0
ip address 10.0.0.3 255.255.255.0
ip nhrp authentication cisco
ip nhrp map multicast 150.1.1.1
ip nhrp map 10.0.0.1 150.1.1.1
ip nhrp nhs 10.0.0.1
ip nhrp network-id 123
ip nhrp registration timeout 30
ip nhrp holdtime 60
tunnel source Loopback0
tunnel destination 150.1.1.1
tunnel key 123
ip ospf network point-to-multipoint
!
ip route 0.0.0.0 0.0.0.0 Tunnel0

As we said, the main benefit of using NHRP Phase 1 is the simplified configuration on the hub router. Additionally, spoke routers receive minimal routing information (it’s either summarized or filtered on the hub) and are configured in a uniform manner. In the simplest case, spoke routers could be configured without any NHRP at all, simply using point-to-point GRE tunnels. This scenario requires the hub to create a static NHRP mapping for every spoke. For example:

mGRE + NHRP Phase 1 + OSPF + Static NHRP mappings

R1:
!
! Hub router
!
router ospf 123
router-id 10.0.0.1
network 10.0.0.0 0.255.255.255 area 0
!
interface Loopback0
ip address 150.1.1.1 255.255.255.0
!
interface Loopback 1
ip address 10.0.1.1 255.255.255.0
!
interface Tunnel0
ip address 10.0.0.1 255.255.255.0
no ip redirects
ip nhrp authentication cisco
ip nhrp map 10.0.0.2 150.1.2.2
ip nhrp map 10.0.0.3 150.1.3.3
ip nhrp map multicast 150.1.2.2
ip nhrp map multicast 150.1.3.3
ip nhrp network-id 123
tunnel source Loopback0
tunnel mode gre multipoint
tunnel key 123
ip ospf network point-to-multipoint
ip ospf database-filter all out

R2:
!
! Spoke Router
!
router ospf 123
network 10.0.0.0 0.255.255.255 area 0
router-id 10.0.0.2
!
interface Loopback0
ip address 150.1.2.2 255.255.255.0
!
interface Loopback 1
ip address 10.0.2.2 255.255.255.0
!
interface Tunnel0
ip address 10.0.0.2 255.255.255.0
tunnel source Loopback0
tunnel destination 150.1.1.1
tunnel key 123
ip ospf network point-to-multipoint
!
ip route 0.0.0.0 0.0.0.0 Tunnel0

R3:
!
! Spoke Router
!
router ospf 123
network 10.0.0.0 0.255.255.255 area 0
router-id 10.0.0.3
!
interface Loopback0
ip address 150.1.3.3 255.255.255.0
!
interface Loopback 1
ip address 10.0.3.3 255.255.255.0
!
interface Tunnel0
ip address 10.0.0.3 255.255.255.0
tunnel source Loopback0
tunnel destination 150.1.1.1
tunnel key 123
ip ospf network point-to-multipoint
!
ip route 0.0.0.0 0.0.0.0 Tunnel0

The disadvantage of NHRP Phase 1 is the inability to establish spoke-to-spoke shortcut tunnels. NHRP Phase 2 resolves this issue and allows for spoke-to-spoke tunnels. To better understand the second phase, we first need to find out how NHRP interacts with CEF – now the default IP switching method on most Cisco routers. Consider the topology and example configuration that follow; see the detailed breakdown after the configuration.

mGRE + NHRP Phase 2 + EIGRP

DMVPN Phase 2

R1:
!
! Hub router
!
router eigrp 123
no auto-summary
network 10.0.0.0 0.255.255.255
!
interface Loopback0
ip address 150.1.1.1 255.255.255.0
!
interface Loopback 1
ip address 10.0.1.1 255.255.255.0
!
interface Tunnel0
ip address 10.0.0.1 255.255.255.0
no ip redirects
ip nhrp authentication cisco
ip nhrp map multicast dynamic
ip nhrp network-id 123
no ip split-horizon eigrp 123
no ip next-hop-self eigrp 123
tunnel source Loopback0
tunnel mode gre multipoint
tunnel key 123

R2:
!
! Spoke Router
!
router eigrp 123
no auto-summary
network 10.0.0.0 0.255.255.255
eigrp stub connected
!
interface Loopback0
ip address 150.1.2.2 255.255.255.0
!
interface Loopback 1
ip address 10.0.2.2 255.255.255.0
!
interface Tunnel0
ip address 10.0.0.2 255.255.255.0
ip nhrp authentication cisco
ip nhrp map multicast 150.1.1.1
ip nhrp map 10.0.0.1 150.1.1.1
ip nhrp nhs 10.0.0.1
ip nhrp network-id 123
ip nhrp registration timeout 30
ip nhrp holdtime 60
tunnel source Loopback0
tunnel mode gre multipoint
tunnel key 123

R3:
!
! Spoke Router
!
router eigrp 123
no auto-summary
network 10.0.0.0 0.255.255.255
eigrp stub connected
!
interface Loopback0
ip address 150.1.3.3 255.255.255.0
!
interface Loopback 1
ip address 10.0.3.3 255.255.255.0
!
interface Tunnel0
ip address 10.0.0.3 255.255.255.0
ip nhrp authentication cisco
ip nhrp map multicast 150.1.1.1
ip nhrp map 10.0.0.1 150.1.1.1
ip nhrp nhs 10.0.0.1
ip nhrp network-id 123
ip nhrp registration timeout 30
ip nhrp holdtime 60
tunnel source Loopback0
tunnel mode gre multipoint
tunnel key 123

Note that both spokes use mGRE tunnel encapsulation mode, and the hub preserves the originating router’s next-hop IP address in “reflected” EIGRP updates (by default, EIGRP sets the next-hop field to “0.0.0.0” – that is, to self). By virtue of this EIGRP configuration, the subnet “10.0.2.0/24” (attached to R2) reaches R3 with the next-hop IP address of “10.0.0.2” (R2). It is important that R3 learns “10.0.2.0/24” with the next hop of R2’s logical IP address; as we will see later, this is the key to triggering CEF next-hop resolution. The mGRE encapsulation used on the spokes will trigger NHRP resolutions, since the tunnel is now an NBMA medium. Now, assuming that traffic to 10.0.2.0/24 does not flow yet, check the routing table entry for 10.0.2.2 and the CEF entries for the route and its next-hop:

Rack1R3#show ip route 10.0.2.2
Routing entry for 10.0.2.0/24
Known via "eigrp 123", distance 90, metric 310172416, type internal
Redistributing via eigrp 123
Last update from 10.0.0.2 on Tunnel0, 00:09:55 ago
Routing Descriptor Blocks:
* 10.0.0.2, from 10.0.0.1, 00:09:55 ago, via Tunnel0
Route metric is 310172416, traffic share count is 1
Total delay is 1005000 microseconds, minimum bandwidth is 9 Kbit
Reliability 255/255, minimum MTU 1472 bytes
Loading 1/255, Hops 2

Rack1R3#show ip cef 10.0.2.2
10.0.2.0/24, version 48, epoch 0
0 packets, 0 bytes
via 10.0.0.2, Tunnel0, 0 dependencies
next hop 10.0.0.2, Tunnel0
invalid adjacency

Rack1R3#show ip cef 10.0.0.2
10.0.0.0/24, version 50, epoch 0, attached, connected
0 packets, 0 bytes
via Tunnel0, 0 dependencies
valid glean adjacency

Note that the CEF prefix for “10.0.2.0/24” is invalid (but not “glean”), since “10.0.0.2” has not yet been resolved. The CEF prefix for “10.0.0.2” has a “glean” adjacency, which means the router needs to send an NHRP resolution request to map the logical IP to the NBMA address. Therefore, with CEF switching, NHRP resolution requests are only sent for “next-hop” IP addresses, never for the networks themselves (e.g. 10.0.2.0/24); process switching, as we’ll see later, resolves any prefix. Go ahead and ping from R3 to “10.0.2.2” and observe the process:

Rack1R3#ping 10.0.2.2

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.2.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/80/180 ms

Check the mappings on the hub router. The only two entries registered are the VPN IP addresses of R2 and R3, together with the respective NBMA IP addresses. Note the “expire” field which, as mentioned above, counts down the time for the entry to expire based on the “holdtime” setting of the registering router’s interface. Later we will see how CEF uses this countdown timer to refresh or delete CEF entries for the next-hop IP address.

Rack1R1#show ip nhrp
10.0.0.2/32 via 10.0.0.2, Tunnel0 created 00:16:33, expire 00:00:43
Type: dynamic, Flags: authoritative unique registered
NBMA address: 150.1.2.2
Requester: 10.0.0.3 Request ID: 798
10.0.0.3/32 via 10.0.0.3, Tunnel0 created 00:16:34, expire 00:00:51
Type: dynamic, Flags: authoritative unique registered
NBMA address: 150.1.3.3
Requester: 10.0.0.2 Request ID: 813

Check the mappings on R2 (note that R2 now has a mapping for R3’s next-hop associated with its NBMA IP address):

Rack1R2#show ip nhrp
10.0.0.1/32 via 10.0.0.1, Tunnel0 created 00:14:52, never expire
Type: static, Flags: authoritative used
NBMA address: 150.1.1.1
10.0.0.2/32 via 10.0.0.2, Tunnel0 created 00:05:49, expire 00:00:10
Type: dynamic, Flags: router authoritative unique local
NBMA address: 150.1.2.2
(no-socket)
10.0.0.3/32 via 10.0.0.3, Tunnel0 created 00:00:30, expire 00:00:29
Type: dynamic, Flags: router used
NBMA address: 150.1.3.3

The same command output on R3 is symmetric to the output on R2:

Rack1R3#show ip nhrp
10.0.0.1/32 via 10.0.0.1, Tunnel0 created 00:14:00, never expire
Type: static, Flags: authoritative used
NBMA address: 150.1.1.1
10.0.0.2/32 via 10.0.0.2, Tunnel0 created 00:00:05, expire 00:00:54
Type: dynamic, Flags: router
NBMA address: 150.1.2.2
10.0.0.3/32 via 10.0.0.3, Tunnel0 created 00:01:46, expire 00:00:13
Type: dynamic, Flags: router authoritative unique local
NBMA address: 150.1.3.3
(no-socket)

Now check the CEF entry for R2’s next-hop IP address on R3:

Rack1R3#sh ip cef 10.0.0.2
10.0.0.2/32, version 65, epoch 0, connected
0 packets, 0 bytes
via 10.0.0.2, Tunnel0, 0 dependencies
next hop 10.0.0.2, Tunnel0
valid adjacency

The CEF entry for “10.0.0.2” is now valid, since the NHRP mapping entry is present. If the next-hop for the prefix “10.0.2.0/24” were pointing toward the hub (R1) (e.g. if the hub were using the default ip next-hop-self eigrp 123), then the NHRP lookup would not be triggered and the cut-through NHRP entry would not be installed. Let’s look at the debugging command output on R1, R2 and R3 to observe how the routers collectively resolve the next-hop IP addresses when R3 pings R2:

Rack1R1#debug nhrp
NHRP protocol debugging is on
Rack1R1#debug nhrp packet
NHRP activity debugging is on

Rack1R2#debug nhrp
NHRP protocol debugging is on
Rack1R2#debug nhrp packet
NHRP activity debugging is on

Rack1R3#debug nhrp
NHRP protocol debugging is on
Rack1R3#debug nhrp packet
NHRP activity debugging is on

It all starts when R3 tries to route a packet to “10.0.2.2” and finds that it has a “glean” adjacency for its next-hop of “10.0.0.2”. R3 then attempts to send an NHRP resolution request directly to R2, but fails since R2’s NBMA address is unknown. At the same time, the original data packet (ICMP echo) travels to R2 across the hub (R1).

Rack1R3#
NHRP: MACADDR: if_in null netid-in 0 if_out Tunnel0 netid-out 123
NHRP: Checking for delayed event 0.0.0.0/10.0.0.2 on list (Tunnel0).
NHRP: No node found.
NHRP: Sending packet to NHS 10.0.0.1 on Tunnel0
NHRP: Checking for delayed event 0.0.0.0/10.0.0.2 on list (Tunnel0).
NHRP: No node found.
NHRP: Attempting to send packet via DEST 10.0.0.2
NHRP: Send Resolution Request via Tunnel0 vrf 0, packet size: 81
src: 10.0.0.3, dst: 10.0.0.2
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 994
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: Encapsulation failed for destination 10.0.0.2 out Tunnel0

Next, R3 tries to send the resolution request to the NHS, which is R1. The resolution request contains the source NBMA address of R3 and the source/destination protocol (logical IP) addresses of R3 and R2.

Rack1R3#
NHRP: Attempting to send packet via NHS 10.0.0.1
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.1.1
NHRP: Send Resolution Request via Tunnel0 vrf 0, packet size: 81
src: 10.0.0.3, dst: 10.0.0.1
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 994
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: 81 bytes out Tunnel0

The Resolution Request from R3 arrives at the NHS. In essence, R3 tries to resolve the “glean” CEF adjacency using NHRP, the same way it would use ARP on Ethernet. Note that the request only mentions the logical IP addresses of R3 (“10.0.0.3”) and R2 (“10.0.0.2”) and the NBMA address of R3.

Rack1R1#
NHRP: Receive Resolution Request via Tunnel0 vrf 0, packet size: 81
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 994
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: netid_in = 123, to_us = 0
NHRP: NAT-check: matched destination address 150.1.3.3
NHRP: nhrp_rtlookup yielded Tunnel0
NHRP: Tu0: Found and skipping dynamic multicast mapping NBMA: 150.1.3.3
NHRP: netid_out 123, netid_in 123
NHRP: nhrp_cache_lookup_comp returned 0x855C7B90
NHRP: Attempting to send packet via DEST 10.0.0.3
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.3.3

The NHS has the NHRP mapping for “10.0.0.2” in its NHRP cache – R2 registered this association with R1 – so the NHS may immediately reply to the client. Note the “(C-1)” CIE header in the NHRP reply packet. While the “(M)” (mandatory) header contains the same information received in the request packet from R3, the CIE header contains the actual NHRP reply, with the mapping information for R2. This is because the NHS considers R2 its “client”, and therefore sends the actual information in the CIE header. Note the “prefix” length of 32 – this means the reply is for a single host logical IP address.

Rack1R1#
NHRP: Send Resolution Reply via Tunnel0 vrf 0, packet size: 109
src: 10.0.0.1, dst: 10.0.0.3
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth dst-stable unique src-stable", reqid: 994
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.2
(C-1) code: no error(0)
prefix: 32, mtu: 1514, hd_time: 342
addr_len: 4(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client NBMA: 150.1.2.2
client protocol: 10.0.0.2
NHRP: 109 bytes out Tunnel0

At this point, R2 receives the original data packet from R3 (the ICMP echo) and tries to send a response back. The problem is that the destination IP address for the echo reply is “10.0.3.3” and the next-hop is “10.0.0.3”, which has a “glean” CEF adjacency. Again, R2 replies across the hub and sends a Resolution Request packet: first directly to R3 (this attempt fails), then to the NHS.

Rack1R2#
NHRP: MACADDR: if_in null netid-in 0 if_out Tunnel0 netid-out 123
NHRP: Checking for delayed event 0.0.0.0/10.0.0.3 on list (Tunnel0).
NHRP: No node found.
NHRP: Sending packet to NHS 10.0.0.1 on Tunnel0
NHRP: Checking for delayed event 0.0.0.0/10.0.0.3 on list (Tunnel0).
NHRP: No node found.
NHRP: Attempting to send packet via DEST 10.0.0.3
NHRP: Send Resolution Request via Tunnel0 vrf 0, packet size: 81
src: 10.0.0.2, dst: 10.0.0.3
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 1012
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.0.3
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: Encapsulation failed for destination 10.0.0.3 out Tunnel0
NHRP: Attempting to send packet via NHS 10.0.0.1
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.1.1

Rack1R2#
NHRP: Send Resolution Request via Tunnel0 vrf 0, packet size: 81
src: 10.0.0.2, dst: 10.0.0.1
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 1012
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.0.3
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: 81 bytes out Tunnel0
NHRP: MACADDR: if_in null netid-in 0 if_out Tunnel0 netid-out 123
NHRP: Checking for delayed event 0.0.0.0/10.0.0.3 on list (Tunnel0).
NHRP: No node found.
NHRP: Sending packet to NHS 10.0.0.1 on Tunnel0

R3 finally receives the Resolution Reply from the NHS, and now it may complete the CEF adjacency for “10.0.0.2”. From that moment on, it switches all packets to “10.0.2.2” directly via R2, not across R1.

Rack1R3#
NHRP: Receive Resolution Reply via Tunnel0 vrf 0, packet size: 109
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth dst-stable unique src-stable", reqid: 994
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.2
(C-1) code: no error(0)
prefix: 32, mtu: 1514, hd_time: 342
addr_len: 4(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client NBMA: 150.1.2.2
client protocol: 10.0.0.2
NHRP: netid_in = 0, to_us = 1
NHRP: Checking for delayed event 150.1.2.2/10.0.0.2 on list (Tunnel0).
NHRP: No node found.
NHRP: No need to delay processing of resolution event nbma src:150.1.3.3 nbma dst:150.1.2.2

The resolution request that R2 sent earlier, in an attempt to resolve the NBMA address for “10.0.0.3”, arrives at R1. Since the NHS has all the information in its local cache (R3 registered its IP to NBMA address mapping), it immediately replies to R2. Note the CIE header in the NHRP reply packet, which contains the actual mapping information.

Rack1R1#
NHRP: Receive Resolution Request via Tunnel0 vrf 0, packet size: 81
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 1012
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.0.3
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: netid_in = 123, to_us = 0
NHRP: NAT-check: matched destination address 150.1.2.2
NHRP: nhrp_rtlookup yielded Tunnel0
NHRP: Tu0: Found and skipping dynamic multicast mapping NBMA: 150.1.2.2
NHRP: netid_out 123, netid_in 123
NHRP: nhrp_cache_lookup_comp returned 0x848EF9E8
NHRP: Attempting to send packet via DEST 10.0.0.2
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.2.2

Rack1R1#
NHRP: Send Resolution Reply via Tunnel0 vrf 0, packet size: 109
src: 10.0.0.1, dst: 10.0.0.2
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth dst-stable unique src-stable", reqid: 1012
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.0.3
(C-1) code: no error(0)
prefix: 32, mtu: 1514, hd_time: 242
addr_len: 4(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client NBMA: 150.1.3.3
client protocol: 10.0.0.3
NHRP: 109 bytes out Tunnel0

At last, R2 receives the reply to its original request, and now it has all the information needed to complete the CEF entry for “10.0.0.3” and switch packets across the optimal path to R3. At this moment, both spokes have symmetric information to reach each other.

Rack1R2#
NHRP: Receive Resolution Reply via Tunnel0 vrf 0, packet size: 109
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth dst-stable unique src-stable", reqid: 1012
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.0.3
(C-1) code: no error(0)
prefix: 32, mtu: 1514, hd_time: 242
addr_len: 4(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client NBMA: 150.1.3.3
client protocol: 10.0.0.3
NHRP: netid_in = 0, to_us = 1
NHRP: Checking for delayed event 150.1.3.3/10.0.0.3 on list (Tunnel0).
NHRP: No node found.
NHRP: No need to delay processing of resolution event nbma src:150.1.2.2 nbma dst:150.1.3.3

Timing out NHRP entries

Now that we know that CEF resolves the next-hop information via NHRP, how does the router time out an unused cut-through tunnel? As we remember, each NHRP entry has a countdown expire timer, initialized from the registration hold-time. Every 60 seconds, a global NHRP process runs on the router and checks the expire timer of every NHRP entry. If the expire timer for an NHRP entry is greater than 120 seconds, nothing is done to the corresponding CEF entry. If the timer is less than 120 seconds, the NHRP process marks the corresponding CEF entry as “stale” but still usable. As soon as the router switches an IP packet using the “stale” entry, it triggers a new NHRP resolution request, which eventually refreshes the corresponding NHRP entry as well as the CEF entry itself. If no packet hits the “stale” CEF entry, the NHRP mapping will eventually time out (since the router does not send any “refreshing” requests) and the corresponding CEF entry will become invalid. This effectively tears down the spoke-to-spoke tunnel.
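Since the expire countdown is seeded from the registering spoke’s hold-time, the spoke-side NHRP timers effectively control how long an idle spoke-to-spoke tunnel survives. A minimal sketch, with purely illustrative values:

! Spoke tunnel: re-register every 30 seconds and ask peers
! to cache our mapping for 600 seconds. With this hold-time,
! the periodic NHRP process marks the corresponding CEF entry
! "stale" once the entry has less than 120 seconds left.
interface Tunnel0
 ip nhrp holdtime 600
 ip nhrp registration timeout 30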

NHRP Phase 2 Conclusions

Let us quickly recap what we have learned so far about NHRP Phase 2 and CEF. Firstly, this mode requires all the spokes to have complete routing information with the next-hop preserved. This may limit scalability in large networks, since not all spokes may accept the full load of routing updates. Secondly, CEF only resolves the next-hop information via NHRP, not the full routing prefixes. In fact, the second feature directly implies the first limitation.

As we noted, the no ip next-hop-self eigrp 123 command is required to make spoke-to-spoke tunnels work with CEF. However, this command only appeared in IOS version 12.3. Is there a way to make spoke-to-spoke tunnels work when the next-hop is set to “self” (the default) in EIGRP updates? Actually, there are a few ways. The first and best one – do not use old IOS images to implement DMVPN :) Better still, use the latest 12.4T train images with DMVPN Phase 3 for the deployment – but then again, those images are from the “T” train!

The other option is to get rid of EIGRP and use OSPF with the network type “broadcast”. OSPF is a link-state protocol – it does not hide topology information and does not mask the next-hop in any way (at least when the network type is “broadcast”). However, the limitation is that the corresponding OSPF topology may have just two redundant hubs – corresponding to the OSPF DR and BDR for the segment – because every hub must form OSPF adjacencies with all spokes. Such a limitation is not acceptable in large installations, but still works fine in smaller deployments.

Finally, there is one last workaround, which is probably the one you may want to use in the current CCIE lab exam – disable CEF on the spokes. This is a very interesting case per se, and we are now going to see how NHRP works with process switching.
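The OSPF “broadcast” workaround could look like the sketch below (assuming the same tunnel addressing as the earlier examples; the key points are the broadcast network type and priority 0 on the spokes, so only the hub can become DR/BDR):

R1:
!
! Hub router - the only DR candidate
!
interface Tunnel0
 ip ospf network broadcast
 ip ospf priority 255

R2 and R3:
!
! Spoke routers - priority 0 prevents DR/BDR election,
! which would otherwise require adjacency with every spoke
!
interface Tunnel0
 ip ospf network broadcast
 ip ospf priority 0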

NHRP Phase 2 + EIGRP next-hop-self + no CEF

In this scenario, EIGRP next-hop-self is enabled on R1 (the hub), so R3 now sees 10.0.2.0/24 with the next hop of R1. Disable CEF on R2 and R3, and try pinging 10.0.2.2, sourcing from R3’s Loopback1 interface.
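Relative to the previous Phase 2 scenario, the configuration changes could be sketched as follows (note that no ip cef disables CEF globally on the spokes):

R1:
!
! Restore the default EIGRP behavior of setting next-hop to self
!
interface Tunnel0
 ip next-hop-self eigrp 123

R2 and R3:
!
! Fall back to process switching
!
no ip cef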

R3 sees the route behind R2 as reachable via R1

Rack1R3#show ip route 10.0.2.2
Routing entry for 10.0.2.0/24
Known via "eigrp 123", distance 90, metric 310172416, type internal
Redistributing via eigrp 123
Last update from 10.0.0.1 on Tunnel0, 00:09:55 ago
Routing Descriptor Blocks:
* 10.0.0.1, from 10.0.0.1, 00:09:55 ago, via Tunnel0
Route metric is 310172416, traffic share count is 1
Total delay is 1005000 microseconds, minimum bandwidth is 9 Kbit
Reliability 255/255, minimum MTU 1472 bytes
Loading 1/255, Hops 2

R3 pings “10.0.2.2”, sourcing the packet from “10.0.3.3”. Since CEF is disabled, the system performs an NHRP lookup to find the NBMA address for “10.0.2.2” itself. This is in contrast to CEF behavior, which would only resolve the next-hop for the “10.0.2.2” entry. Naturally, the router forwards the NHRP request to R3’s NHS, which is R1. At the same time, R3 forwards the data packet (ICMP echo) via its current next-hop – “10.0.0.1”, that is, via the hub.

Rack1R3#
NHRP: MACADDR: if_in null netid-in 0 if_out Tunnel0 netid-out 123
NHRP: Checking for delayed event 0.0.0.0/10.0.2.2 on list (Tunnel0).
NHRP: No node found.
NHRP: Sending packet to NHS 10.0.0.1 on Tunnel0
NHRP: Checking for delayed event 0.0.0.0/10.0.2.2 on list (Tunnel0).
NHRP: No node found.
NHRP: Attempting to send packet via DEST 10.0.2.2
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.1.1
NHRP: Send Resolution Request via Tunnel0 vrf 0, packet size: 81
src: 10.0.0.3, dst: 10.0.2.2
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 900
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.2.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: 81 bytes out Tunnel0
NHRP: MACADDR: if_in null netid-in 0 if_out Tunnel0 netid-out 123
NHRP: Checking for delayed event 0.0.0.0/10.0.2.2 on list (Tunnel0).
NHRP: No node found.
NHRP: Sending packet to NHS 10.0.0.1 on Tunnel0

The Resolution Request arrives at R1 (the NHS). Since R1 has no mapping for “10.0.2.2” (R2 only registers the IP address 10.0.0.2 – its own next-hop IP address), the NHS looks into its routing table to find the next-hop toward 10.0.2.2. Since that happens to be R2’s IP “10.0.0.2”, the NHS forwards the resolution request toward the next router on the path to the requested network – that is, to R2. Thanks to R2’s NHRP registration with R1, the NHS knows R2’s NBMA address and successfully encapsulates the packet. In addition, R1 forwards the data packet to R2 using its routing table. Obviously, the data packet arrives at R2 a bit earlier, since NHRP requires more time to process and forward the request.

Rack1R1#
NHRP: Receive Resolution Request via Tunnel0 vrf 0, packet size: 81
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 900
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.2.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: netid_in = 123, to_us = 0
NHRP: NAT-check: matched destination address 150.1.3.3
NHRP: nhrp_rtlookup yielded Tunnel0
NHRP: Tu0: Found and skipping dynamic multicast mapping NBMA: 150.1.3.3
NHRP: netid_out 123, netid_in 123
NHRP: nhrp_cache_lookup_comp returned 0x0
NHRP: Attempting to send packet via DEST 10.0.2.2
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.2.2

NHRP: Forwarding Resolution Request via Tunnel0 vrf 0, packet size: 101
src: 10.0.0.1, dst: 10.0.2.2
(F) afn: IPv4(1), type: IP(800), hop: 254, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 900
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.2.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: 101 bytes out Tunnel0

Now the data packet (ICMP echo) has arrived at R2. R2 generates the response (an ICMP echo reply from “10.0.2.2” to “10.0.3.3”) and now needs the NBMA address for “10.0.3.3” (CEF is disabled on R2). As usual, R2 generates a resolution request to its NHS (R1). At the same time, R2 sends the echo reply across the hub, since it does not know the NBMA address of R3.

Rack1R2#
NHRP: Send Resolution Request via Tunnel0 vrf 0, packet size: 81
src: 10.0.0.2, dst: 10.0.3.3
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 919
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.3.3
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: 81 bytes out Tunnel0

Soon after the data packet arrives, R2 receives the Resolution Request from R3, forwarded by R1. Since R2 is the egress router on the NBMA segment for the target “10.0.2.2”, it may reply to the request.

Rack1R2#
NHRP: Receive Resolution Request via Tunnel0 vrf 0, packet size: 101
(F) afn: IPv4(1), type: IP(800), hop: 254, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 900
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.2.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: netid_in = 123, to_us = 0
NHRP: nhrp_rtlookup yielded Loopback1
NHRP: netid_out 0, netid_in 123
NHRP: We are egress router for target 10.0.2.2, recevied via Tunnel0
NHRP: Redist mask now 1
NHRP: Attempting to send packet via DEST 10.0.0.3
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.3.3

Note that R2 replies with the full prefix found in its routing table – “10.0.2.0/24”, not just the single host “10.0.2.2/32” (this feature is critical for DMVPN Phase 3). This information is encapsulated inside the “(C-1)” part of the NHRP reply packet (Client Information Element 1), which describes a client – a network connected to the router (R2). The “prefix” field is “24”, which is exactly the value taken from the routing table.

Also note that R2 learned R3’s NBMA address from the Resolution Request and now replies directly to R3, bypassing R1. The “stable” flag means that the querying/replying router directly knows the source or destination IP address in the resolution request/reply.

Rack1R2#
NHRP: Send Resolution Reply via Tunnel0 vrf 0, packet size: 129
src: 10.0.0.2, dst: 10.0.0.3 <-- logical (tunnel) IP addresses of R2/R3
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth dst-stable unique src-stable", reqid: 900
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.2.2
(C-1) code: no error(0)
prefix: 24, mtu: 1514, hd_time: 360
addr_len: 4(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client NBMA: 150.1.2.2
client protocol: 10.0.2.2
NHRP: 129 bytes out Tunnel0

At this moment, the Resolution Request from R2 for the network "10.0.3.3" reaches R1 – the NHS. Since the NHS has no information on "10.0.3.3", it forwards the request to R3 – the next hop toward "10.0.3.3" found via the routing table.

Rack1R1#
NHRP: Receive Resolution Request via Tunnel0 vrf 0, packet size: 81
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 919
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.3.3
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: netid_in = 123, to_us = 0
NHRP: NAT-check: matched destination address 150.1.2.2
NHRP: nhrp_rtlookup yielded Tunnel0
NHRP: Tu0: Found and skipping dynamic multicast mapping NBMA: 150.1.2.2
NHRP: netid_out 123, netid_in 123
NHRP: nhrp_cache_lookup_comp returned 0x0
NHRP: Attempting to send packet via DEST 10.0.3.3
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.3.3

NHRP: Forwarding Resolution Request via Tunnel0 vrf 0, packet size: 101
src: 10.0.0.1, dst: 10.0.3.3
(F) afn: IPv4(1), type: IP(800), hop: 254, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 919
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.3.3
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: 101 bytes out Tunnel0

Back to R3. By this point, it has received the ICMP reply to the original ICMP echo packet. Now R3 receives the NHRP Resolution Reply to its original Resolution Request directly from R2. This allows R3 to learn that “10.0.2.0/24” is reachable via NBMA IP address “150.1.2.2”. Note the CIE field “(C-1)” in the reply packet, which tells R3 about the whole “10.0.2.0/24” network – the “prefix” field is set to “24”.

Rack1R3#
NHRP: Receive Resolution Reply via Tunnel0 vrf 0, packet size: 129
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth dst-stable unique src-stable", reqid: 900
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.2.2
(C-1) code: no error(0)
prefix: 24, mtu: 1514, hd_time: 360
addr_len: 4(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client NBMA: 150.1.2.2
client protocol: 10.0.2.2
NHRP: netid_in = 0, to_us = 1
NHRP: NAT-check: matched destination address 150.1.2.2
NHRP: Checking for delayed event 150.1.2.2/10.0.2.2 on list (Tunnel0).
NHRP: No node found.
NHRP: No need to delay processing of resolution event nbma src:150.1.3.3 nbma dst:150.1.2.2
NHRP: Checking for delayed event 0.0.0.0/10.0.2.2 on list (Tunnel0).
NHRP: No node found.

Finally, the Resolution Request from R2, forwarded by R1 (the NHS), arrives at R3. The local router performs a lookup for 10.0.3.3 and finds it to be a directly connected network with a prefix of /24. Therefore, R3 generates a Resolution Reply packet and sends it directly to R2, bypassing R1. This packet tells R2 to map the network “10.0.3.0/24” to NBMA address “150.1.3.3”.

Rack1R3#
NHRP: Receive Resolution Request via Tunnel0 vrf 0, packet size: 101
(F) afn: IPv4(1), type: IP(800), hop: 254, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth src-stable", reqid: 919
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.3.3
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 360
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 0, pref: 0
NHRP: netid_in = 123, to_us = 0
NHRP: nhrp_rtlookup yielded Loopback1
NHRP: netid_out 0, netid_in 123
NHRP: We are egress router for target 10.0.3.3, recevied via Tunnel0
NHRP: Redist mask now 1
NHRP: Attempting to send packet via DEST 10.0.0.2
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.2.2

NHRP: Send Resolution Reply via Tunnel0 vrf 0, packet size: 129
src: 10.0.0.3, dst: 10.0.0.2
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth dst-stable unique src-stable", reqid: 919
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.3.3
(C-1) code: no error(0)
prefix: 24, mtu: 1514, hd_time: 360
addr_len: 4(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client NBMA: 150.1.3.3
client protocol: 10.0.3.3
NHRP: 129 bytes out Tunnel0

At last, R2 receives the response to its Resolution Request, and the state is now stable: R2 and R3 know how to reach “10.0.3.0/24” and “10.0.2.0/24” respectively.

Rack1R2#
NHRP: Receive Resolution Reply via Tunnel0 vrf 0, packet size: 129
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "router auth dst-stable unique src-stable", reqid: 919
src NBMA: 150.1.2.2
src protocol: 10.0.0.2, dst protocol: 10.0.3.3
(C-1) code: no error(0)
prefix: 24, mtu: 1514, hd_time: 360
addr_len: 4(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client NBMA: 150.1.3.3
client protocol: 10.0.3.3
NHRP: netid_in = 0, to_us = 1
NHRP: NAT-check: matched destination address 150.1.3.3
NHRP: Checking for delayed event 150.1.3.3/10.0.3.3 on list (Tunnel0).
NHRP: No node found.
NHRP: No need to delay processing of resolution event nbma src:150.1.2.2 nbma dst:150.1.3.3
NHRP: Checking for delayed event 0.0.0.0/10.0.3.3 on list (Tunnel0).
NHRP: No node found.

Now let’s look at NHRP caches of all three routers:

Rack1R1#show ip nhrp
10.0.0.2/32 via 10.0.0.2, Tunnel0 created 01:00:47, expire 00:04:02
Type: dynamic, Flags: authoritative unique registered
NBMA address: 150.1.2.2
10.0.0.3/32 via 10.0.0.3, Tunnel0 created 01:00:47, expire 00:04:23
Type: dynamic, Flags: authoritative unique registered
NBMA address: 150.1.3.3

Rack1R2#show ip nhrp
10.0.0.1/32 via 10.0.0.1, Tunnel0 created 01:56:30, never expire
Type: static, Flags: authoritative used
NBMA address: 150.1.1.1
10.0.0.3/32 via 10.0.0.3, Tunnel0 created 00:00:24, expire 00:05:35
Type: dynamic, Flags: router implicit
NBMA address: 150.1.3.3
10.0.2.0/24 via 10.0.2.2, Tunnel0 created 00:00:24, expire 00:05:35
Type: dynamic, Flags: router authoritative unique local
NBMA address: 150.1.2.2
(no-socket)
10.0.3.0/24 via 10.0.3.3, Tunnel0 created 00:00:24, expire 00:05:35
Type: dynamic, Flags: router
NBMA address: 150.1.3.3

Rack1R3#show ip nhrp
10.0.0.1/32 via 10.0.0.1, Tunnel0 created 01:56:00, never expire
Type: static, Flags: authoritative used
NBMA address: 150.1.1.1
10.0.0.2/32 via 10.0.0.2, Tunnel0 created 00:00:02, expire 00:05:57
Type: dynamic, Flags: router implicit used
NBMA address: 150.1.2.2
10.0.2.0/24 via 10.0.2.2, Tunnel0 created 00:00:02, expire 00:05:57
Type: dynamic, Flags: router used
NBMA address: 150.1.2.2
10.0.3.0/24 via 10.0.3.3, Tunnel0 created 00:00:02, expire 00:05:57
Type: dynamic, Flags: router authoritative unique local
NBMA address: 150.1.3.3
(no-socket)

The “implicit” flag means that the router learned the mapping without an explicit request, as part of another router’s reply or request. The “router” flag means that the mapping is either for the remote router itself or for a network behind it. The “(no-socket)” flag means that the local router will not use this entry to trigger IPsec socket creation. The “local” flag means the mapping is for a network directly connected to the local router. The router uses these mappings when it loses connection to the local network, so that the NHC may send a purge request to all other clients, telling them that the network is gone and they must remove their mappings.

Here is an example. Ensure R3 has the above-mentioned mappings, then shut down the Loopback1 interface, observing the debugging output on R3 and R2. R3 sends a purge request directly to R2, since it knows R2 requested that mapping.

Rack1R3#
NHRP: Redist callback: 10.0.3.0/24
NHRP: Invalidating map tables for prefix 10.0.3.0/24 via Tunnel0
NHRP: Checking for delayed event 150.1.3.3/10.0.3.3 on list (Tunnel0).
NHRP: No node found.
NHRP: Attempting to send packet via DEST 10.0.0.2
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.2.2
NHRP: Send Purge Request via Tunnel0 vrf 0, packet size: 73
src: 10.0.0.3, dst: 10.0.0.2
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "reply required", reqid: 36
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 0
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client protocol: 10.0.3.3
NHRP: 73 bytes out Tunnel0

R2 receives the Purge Request from R3. Note that the “reply required” flag is set; hence, R2 must confirm with a Purge Reply packet that it deleted the mapping. R2 erases the corresponding mapping learned via “10.0.0.3” and generates a response packet.

Rack1R2#
NHRP: Receive Purge Request via Tunnel0 vrf 0, packet size: 73
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "reply required", reqid: 36
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 0
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client protocol: 10.0.3.3
NHRP: netid_in = 123, to_us = 1
NHRP: Attempting to send packet via DEST 10.0.0.3
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.3.3

R2 first tries to send the Purge Reply to R3 directly, using the NBMA address of R3. Note that the CIE header mentions the network erased from the local mapping list.

Rack1R2#
NHRP: Send Purge Reply via Tunnel0 vrf 0, packet size: 73
src: 10.0.0.2, dst: 10.0.0.3
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "reply required", reqid: 36
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 0
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client protocol: 10.0.3.3
NHRP: 73 bytes out Tunnel0
NHRP: Invalidating map tables for prefix 10.0.3.0/24 via Tunnel0
NHRP: Attempting to send packet via DEST 10.0.0.1
NHRP: Encapsulation succeeded. Tunnel IP addr 150.1.1.1

R3 receives the reply to its purge request and now knows that R2’s cache is consistent.

Rack1R3#
NHRP: Receive Purge Reply via Tunnel0 vrf 0, packet size: 73
(F) afn: IPv4(1), type: IP(800), hop: 255, ver: 1
shtl: 4(NSAP), sstl: 0(NSAP)
(M) flags: "reply required", reqid: 36
src NBMA: 150.1.3.3
src protocol: 10.0.0.3, dst protocol: 10.0.0.2
(C-1) code: no error(0)
prefix: 0, mtu: 1514, hd_time: 0
addr_len: 0(NSAP), subaddr_len: 0(NSAP), proto_len: 4, pref: 0
client protocol: 10.0.3.3
NHRP: netid_in = 0, to_us = 1

Timing out NHRP entries with Process-Switching

The last question is how NHRP times out unused entries in process-switching mode. Recall the “used” flag set on NHRP mappings. Every time a packet is process-switched using the respective NHRP entry, the entry is marked as “used”. The background NHRP process runs every 60 seconds and checks the expiration timer of each NHRP entry. If the “used” flag is set and the entry’s expiration timer is greater than 120 seconds, the process clears the flag (and every new packet will set it again). If the timer is less than 120 seconds and the flag is set, IOS generates a refreshing NHRP request. However, if the flag is not set, the system allows the entry to expire, unless another packet hits the entry and marks it as used again.
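The expiration timer starts from the hold time the replying router advertises in its CIE (“hd_time: 360” in the debugs above). That value is configurable per tunnel interface; a minimal sketch, using the 360-second value seen in the debugs:

interface Tunnel0
 ! Advertise a 6-minute hold time in NHRP registrations/replies
 ip nhrp holdtime 360

Lowering the hold time makes stale spoke-to-spoke entries (and the IPsec sessions tied to them) expire faster, at the cost of more frequent NHRP refresh traffic.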

The above-described behavior of NHRP with process switching allows for one interesting feature: the hub router may now summarize all information sent down to the spokes, say, into one default route. This will not affect the spokes, for they will continue querying next-hop information for every destination prefix sent over the mGRE tunnel interface, and thus learning the optimal next hop. It would be great to combine this “summarization” feature with the performance of CEF switching; this is exactly what Cisco implemented in DMVPN Phase 3. However, Phase 3 is a subject for a separate discussion.
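For instance, with EIGRP running over the tunnel (as in this topology), the hub could advertise a single default summary toward the spokes. A sketch, assuming EIGRP AS 123 on Tunnel0 (the AS number here is illustrative):

interface Tunnel0
 ! Replace all specific prefixes with one default route toward the spokes
 ip summary-address eigrp 123 0.0.0.0 0.0.0.0

The spokes then route everything toward the hub initially, and NHRP resolution still discovers the optimal spoke-to-spoke next hop per destination.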

Integrating IPsec

Haven’t we forgotten something for DMVPN Phase 1/Phase 2? That would be IPsec – the component that provides confidentiality and integrity checking to mGRE/NHRP. Compared with the complexity of NHRP operations, IPsec integration is straightforward.

First, the hub needs to know how to authenticate all the spokes using IKE. The most scalable way is to use X.509 certificates and PKI, but for simplicity, we will just use the same pre-shared key on all routers. Note that we need to configure the routers with a wildcard pre-shared key in order to accept IKE negotiation requests from any dynamic peer.

As for IPsec Phase 2, we need dynamic crypto maps, since the hub has no idea of the connecting peers’ IP addresses. Fortunately, Cisco IOS has a cute feature called IPsec profiles, designed for use with tunnel interfaces. The profile attaches to a tunnel interface and automatically treats all traffic going out of the tunnel as triggering IPsec Phase 2. The IPsec Phase 2 proxy identities used by the profile are the source and destination host IP addresses of the tunnel. It makes sense to use IPsec transport mode with mGRE, as the latter already provides tunnel encapsulation. Besides, IOS supports some features, such as NAT traversal, only with IPsec transport mode.

Let’s review the example below and explain how it works.

mGRE + NHRP Phase 2 + Spoke-to-spoke tunnels + IPsec

R1:
crypto isakmp policy 10
encryption 3des
authentication pre-share
hash md5
group 2
!
crypto isakmp key 0 CISCO address 0.0.0.0 0.0.0.0
!
crypto ipsec transform-set 3DES_MD5 esp-3des esp-md5-hmac
mode transport
!
crypto ipsec profile DMVPN
set transform-set 3DES_MD5
!
interface Tunnel 0
tunnel protection ipsec profile DMVPN

R2 & R3:

crypto isakmp policy 10
encryption 3des
authentication pre-share
hash md5
group 2
!
crypto isakmp key 0 CISCO address 0.0.0.0 0.0.0.0
!
crypto ipsec transform-set 3DES_MD5 esp-3des esp-md5-hmac
mode transport
!
crypto ipsec profile DMVPN
set transform-set 3DES_MD5
!
interface Tunnel 0
tunnel protection ipsec profile DMVPN
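For reference, the mGRE/NHRP side of the tunnel interfaces (configured earlier in this series) can be inferred from the debugs: NHRP network-id 123, tunnel subnet 10.0.0.0/24, and Loopback0 as the tunnel source. A sketch of the hub’s Tunnel0 under these assumptions:

interface Tunnel0
 ip address 10.0.0.1 255.255.255.0
 ! Network-id matches "netid_in = 123" seen in the NHRP debugs
 ip nhrp network-id 123
 ! Hub learns multicast mappings dynamically from spoke registrations
 ip nhrp map multicast dynamic
 tunnel source Loopback0
 tunnel mode gre multipoint
 tunnel protection ipsec profile DMVPN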

Start with any spoke, e.g. R3. Since the router runs EIGRP on the Tunnel 0 interface, a multicast packet will eventually be sent out of the tunnel interface. Thanks to the static NHRP multicast mapping, mGRE will encapsulate the EIGRP packet toward the hub router. The IPsec profile will see GRE traffic going from “150.1.3.3” to “150.1.1.1”; ISAKMP negotiation with R1 will start automatically, authenticating with the pre-shared keys. Eventually both R1 and R3 will create IPsec SAs for GRE traffic between “150.1.3.3” and “150.1.1.1”. Now R3 may send NHRP Resolution Requests. As soon as R3 tries to send traffic to a network behind R2, it will resolve next-hop “10.0.0.2” to the NBMA address 150.1.2.2. This new NHRP entry will trigger ISAKMP negotiation with NBMA address 150.1.2.2 as soon as the router tries to use it for packet forwarding. IKE negotiation between R3 and R2 will start and result in the formation of new SAs for the IP address pair “150.1.2.2/150.1.3.3” and the GRE protocol. As soon as the routers complete IPsec Phase 2, packets may flow between R2 and R3 across the shortcut path.

When an unused NHRP entry times out, it signals the ISAKMP process to terminate the respective IPsec connection. We described the process of timing out NHRP entries above; as you remember, it depends on the “hold-time” value set by the routers. Additionally, the systems may expire ISAKMP/IPsec connections due to IPsec lifetime timeouts.
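Those IPsec lifetimes are themselves tunable. A sketch, using the IOS default values for illustration:

! Global IPsec SA lifetimes (values shown are the IOS defaults)
crypto ipsec security-association lifetime seconds 3600
crypto ipsec security-association lifetime kilobytes 4608000

Whichever limit is reached first (time or traffic volume) triggers SA re-keying, and an SA left idle until the NHRP entry is purged is simply torn down.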

This is the crypto subsystem status on the hub from the example with NHRP Phase 2 and process switching:

IPsec Phase 1 has been established with both spokes

Rack1R1#show crypto isakmp sa
dst src state conn-id slot status
150.1.1.1 150.1.2.2 QM_IDLE 1 0 ACTIVE
150.1.1.1 150.1.3.3 QM_IDLE 3 0 ACTIVE

The IPsec Phase 2 SA entries for both protected connections, to R2 and R3, follow. Note that the SAs are for GRE traffic between the loopback interfaces (the NBMA addresses).

Rack1R1#show crypto ipsec sa

interface: Tunnel0
Crypto map tag: Tunnel0-head-0, local addr 150.1.1.1

protected vrf: (none)
local ident (addr/mask/prot/port): (150.1.1.1/255.255.255.255/47/0)
remote ident (addr/mask/prot/port): (150.1.2.2/255.255.255.255/47/0)
current_peer 150.1.2.2 port 500
PERMIT, flags={origin_is_acl,}
#pkts encaps: 230, #pkts encrypt: 230, #pkts digest: 230
#pkts decaps: 227, #pkts decrypt: 227, #pkts verify: 227
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 0, #pkts compr. failed: 0
#pkts not decompressed: 0, #pkts decompress failed: 0
#send errors 12, #recv errors 0

local crypto endpt.: 150.1.1.1, remote crypto endpt.: 150.1.2.2
path mtu 1514, ip mtu 1514, ip mtu idb Loopback0
current outbound spi: 0x88261BA3(2284198819)

inbound esp sas:
spi: 0xE279A1EE(3799622126)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2001, flow_id: SW:1, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4472116/2632)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE
spi: 0xB4F6A9E5(3036064229)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2003, flow_id: SW:3, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4596176/2630)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE
spi: 0x1492E4D0(345171152)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2005, flow_id: SW:5, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4525264/2630)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE

inbound ah sas:

inbound pcp sas:

outbound esp sas:
spi: 0x81949874(2173999220)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2002, flow_id: SW:2, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4472116/2626)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE
spi: 0xAA5D21A7(2858230183)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2004, flow_id: SW:4, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4596176/2627)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE
spi: 0x88261BA3(2284198819)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2006, flow_id: SW:6, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4525265/2627)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE

outbound ah sas:

outbound pcp sas:

protected vrf: (none)
local ident (addr/mask/prot/port): (150.1.1.1/255.255.255.255/47/0)
remote ident (addr/mask/prot/port): (150.1.3.3/255.255.255.255/47/0)
current_peer 150.1.3.3 port 500
PERMIT, flags={origin_is_acl,}
#pkts encaps: 225, #pkts encrypt: 225, #pkts digest: 225
#pkts decaps: 226, #pkts decrypt: 226, #pkts verify: 226
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 0, #pkts compr. failed: 0
#pkts not decompressed: 0, #pkts decompress failed: 0
#send errors 17, #recv errors 0

local crypto endpt.: 150.1.1.1, remote crypto endpt.: 150.1.3.3
path mtu 1514, ip mtu 1514, ip mtu idb Loopback0
current outbound spi: 0xBEB1D9CE(3199326670)

inbound esp sas:
spi: 0x10B44B31(280251185)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2007, flow_id: SW:7, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4436422/2627)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE

inbound ah sas:

inbound pcp sas:

outbound esp sas:
spi: 0xBEB1D9CE(3199326670)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2008, flow_id: SW:8, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4436424/2627)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE

outbound ah sas:

outbound pcp sas:

Now let’s see how a spoke router establishes a spoke-to-spoke IPsec tunnel:


No NHRP mapping for the spoke’s network at first

Rack1R3#sh ip nhrp
10.0.0.1/32 via 10.0.0.1, Tunnel0 created 02:02:42, never expire
Type: static, Flags: authoritative used
NBMA address: 150.1.1.1


ISAKMP negotiated just with R1

Rack1R3#sh crypto isakmp sa
dst src state conn-id slot status
150.1.1.1 150.1.3.3 QM_IDLE 1 0 ACTIVE

Generate traffic to the network behind R2. Note that the first ping passes through, since it’s routed across the hub, but the second packet is sent directly to R2 and is lost, since IPsec Phase 2 has not yet been established

Rack1R3#ping 10.0.2.2

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.2.2, timeout is 2 seconds:
!.!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 52/121/324 ms


Notice the new NHRP mappings. Note that the tunnel will expire in about 3 minutes if no new traffic is generated

Rack1R3#sh ip nhrp
10.0.0.1/32 via 10.0.0.1, Tunnel0 created 02:05:38, never expire
Type: static, Flags: authoritative used
NBMA address: 150.1.1.1
10.0.2.0/24 via 10.0.2.2, Tunnel0 created 00:02:44, expire 00:03:15
Type: dynamic, Flags: router
NBMA address: 150.1.2.2

IOS creates IPsec Phase 2 SAs for the tunnels between R2-R3 and R1-R3. The tunnel between R2 and R3 is dynamic and is used only for data traffic.

Rack1R3#show crypto isakmp sa
dst src state conn-id slot status
150.1.1.1 150.1.3.3 QM_IDLE 1 0 ACTIVE
150.1.3.3 150.1.2.2 QM_IDLE 2 0 ACTIVE

Rack1R3#show crypto ipsec sa

interface: Tunnel0
Crypto map tag: Tunnel0-head-0, local addr 150.1.3.3

protected vrf: (none)
local ident (addr/mask/prot/port): (150.1.3.3/255.255.255.255/47/0)
remote ident (addr/mask/prot/port): (150.1.1.1/255.255.255.255/47/0)
current_peer 150.1.1.1 port 500
PERMIT, flags={origin_is_acl,}
#pkts encaps: 290, #pkts encrypt: 290, #pkts digest: 290
#pkts decaps: 284, #pkts decrypt: 284, #pkts verify: 284
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 0, #pkts compr. failed: 0
#pkts not decompressed: 0, #pkts decompress failed: 0
#send errors 0, #recv errors 0

local crypto endpt.: 150.1.3.3, remote crypto endpt.: 150.1.1.1
path mtu 1514, ip mtu 1514, ip mtu idb Loopback0
current outbound spi: 0x10B44B31(280251185)

inbound esp sas:
spi: 0xBEB1D9CE(3199326670)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2001, flow_id: SW:1, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4526856/2383)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE

inbound ah sas:

inbound pcp sas:

outbound esp sas:
spi: 0x10B44B31(280251185)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2002, flow_id: SW:2, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4526853/2381)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE

outbound ah sas:

outbound pcp sas:

protected vrf: (none)
local ident (addr/mask/prot/port): (150.1.3.3/255.255.255.255/47/0)
remote ident (addr/mask/prot/port): (150.1.2.2/255.255.255.255/47/0)
current_peer 150.1.2.2 port 500
PERMIT, flags={origin_is_acl,}
#pkts encaps: 3, #pkts encrypt: 3, #pkts digest: 3
#pkts decaps: 4, #pkts decrypt: 4, #pkts verify: 4
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 0, #pkts compr. failed: 0
#pkts not decompressed: 0, #pkts decompress failed: 0
#send errors 0, #recv errors 0

local crypto endpt.: 150.1.3.3, remote crypto endpt.: 150.1.2.2
path mtu 1514, ip mtu 1514, ip mtu idb Loopback0
current outbound spi: 0x847D8EEC(2222821100)

inbound esp sas:
spi: 0xA6851754(2793740116)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2004, flow_id: SW:4, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4602306/3572)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE

inbound ah sas:

inbound pcp sas:

outbound esp sas:
spi: 0x847D8EEC(2222821100)
transform: esp-3des esp-md5-hmac ,
in use settings ={Transport, }
conn id: 2003, flow_id: SW:3, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4602306/3572)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE

outbound ah sas:

outbound pcp sas:

Now you see how all the components of DMVPN work together. We have not covered some other major topics, like NAT traversal with NHRP and DMVPN redundancy with multiple hubs. Those advanced topics probably deserve a separate post, since this one has grown too big already :)

Feb
19

UPDATE: For more information on Redistribution see the video series Understanding Route Redistribution – Excerpts from CCIE R&S ATC

Simple Redistribution Step-by-Step

We're going to take our basic topology from the previous post, Understanding Redistribution Part I, and configure it to provide full connectivity between all devices with the simplest possible configuration. Then we are going to tweak some settings and see how they affect redistribution and optimal routing. This is an introductory example to illustrate the redistribution control techniques mentioned previously.

First, we configure IGP routing per the diagram and advertise the Loopback0 interfaces (150.X.Y.Y/24, where X is the rack number and Y is the router number) into the IGPs. Specifically, the R2, R3 and R4 Loopbacks are advertised into OSPF Area 0, the R5 and R6 Loopbacks are advertised into EIGRP 356, and the R1 Loopback interface is simply advertised into EIGRP 123.

Next, we designate OSPF as the core routing domain, used as the transit path to reach any other domain. All other domains, in effect, will be connected as stubs (non-transit) to the core domain. We start with the EIGRP 123 and OSPF domains, and enable mutual redistribution between them on routers R2 and R3:

R2:
!
! Metric values are chosen just to be easy to type in
! No big secret rules of thumb behind this one
!
router eigrp 123
redistribute ospf 1 metric 1 1 1 1 1
!
router ospf 1
redistribute eigrp 123 subnets

R3:
router eigrp 123
redistribute ospf 1 metric 1 1 1 1 1
!
router ospf 1
redistribute eigrp 123 subnets

In effect, we would expect both routing domains to become transit for each other. However, EIGRP has a really nice feature of assigning a default AD value of 170 to external routes. This, in turn, prevents circular redistribution – all OSPF routes injected into the EIGRP 123 domain have an AD of 170 and are effectively stopped from being advertised back into OSPF, since native OSPF routes preempt them. Let's see the routing table states:

Rack18R1#show ip route eigrp
174.18.0.0/24 is subnetted, 2 subnets
D EX 174.18.234.0
[170/2560002816] via 174.18.123.3, 00:02:03, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:02:03, FastEthernet0/0
150.18.0.0/24 is subnetted, 4 subnets
D EX 150.18.4.0
[170/2560002816] via 174.18.123.3, 00:02:03, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:02:03, FastEthernet0/0
D EX 150.18.2.0
[170/2560002816] via 174.18.123.3, 00:02:03, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:02:03, FastEthernet0/0
D EX 150.18.3.0
[170/2560002816] via 174.18.123.3, 00:02:03, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:02:03, FastEthernet0/0

Just as we would expect, R1 has two routes for each OSPF prefix, since both R2 and R3 assign the same seed metric to the redistributed routes.
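The loop prevention described above hinges on EIGRP's default distances; if you ever need to tune them, both the internal and external AD are configurable per EIGRP process. A sketch showing the defaults (internal 90, external 170):

router eigrp 123
 ! distance eigrp <internal-ad> <external-ad>; 90/170 are the defaults
 distance eigrp 90 170

Raising or lowering the external distance changes which protocol wins for redistributed prefixes, so alter it with care – the 170 default is what keeps the external routes from looping back here.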

Rack18R2#show ip route | beg Gate
Gateway of last resort is not set

174.18.0.0/24 is subnetted, 2 subnets
C 174.18.234.0 is directly connected, Serial0/0
C 174.18.123.0 is directly connected, FastEthernet0/0
150.18.0.0/24 is subnetted, 4 subnets
O 150.18.4.0 [110/65] via 174.18.234.4, 00:00:19, Serial0/0
D 150.18.1.0 [90/156160] via 174.18.123.1, 00:00:13, FastEthernet0/0
C 150.18.2.0 is directly connected, Loopback0
O 150.18.3.0 [110/65] via 174.18.234.3, 00:00:19, Serial0/0

Rack18R3#show ip route | beg Gate
Gateway of last resort is not set

174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial1/0
C 174.18.0.0 is directly connected, FastEthernet0/1
C 174.18.123.0 is directly connected, FastEthernet0/0
150.18.0.0/24 is subnetted, 6 subnets
O 150.18.4.0 [110/782] via 174.18.234.4, 00:00:54, Serial1/0
D 150.18.5.0 [90/156160] via 174.18.0.5, 00:17:51, FastEthernet0/1
D 150.18.6.0 [90/156160] via 174.18.0.6, 00:17:49, FastEthernet0/1
D 150.18.1.0 [90/156160] via 174.18.123.1, 00:00:48, FastEthernet0/0
O 150.18.2.0 [110/782] via 174.18.234.2, 00:00:54, Serial1/0
C 150.18.3.0 is directly connected, Loopback0

All seems to be fine here too: thanks to the EIGRP internal AD value of 90, EIGRP "native" prefixes are reachable via EIGRP, and native OSPF subnets are reachable via OSPF (since the EIGRP external AD value is 170). Next, we move a step further and redistribute between the OSPF and EIGRP 356 domains, i.e. attach the latter domain to the "core":

R3:
router eigrp 356
redistribute ospf 1 metric 1 1 1 1 1
!
router ospf 1
redistribute eigrp 123 subnets
redistribute eigrp 356 subnets

Since the EIGRP 356 domain has just one point of attachment to the core, there should be no big problems. Look at the routing table of R1:

Rack18R1#show ip route eigrp
174.18.0.0/24 is subnetted, 3 subnets
D EX 174.18.234.0
[170/2560002816] via 174.18.123.3, 00:05:52, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:05:52, FastEthernet0/0
D EX 174.18.0.0
[170/2560002816] via 174.18.123.2, 00:01:36, FastEthernet0/0
150.18.0.0/24 is subnetted, 6 subnets
D EX 150.18.4.0
[170/2560002816] via 174.18.123.3, 00:05:52, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:05:52, FastEthernet0/0
D EX 150.18.5.0
[170/2560002816] via 174.18.123.2, 00:01:36, FastEthernet0/0
D EX 150.18.6.0
[170/2560002816] via 174.18.123.2, 00:01:36, FastEthernet0/0
D EX 150.18.2.0
[170/2560002816] via 174.18.123.3, 00:05:52, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:05:52, FastEthernet0/0
D EX 150.18.3.0
[170/2560002816] via 174.18.123.3, 00:05:52, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:05:52, FastEthernet0/0

Ah, great. R1 sees the R5 and R6 loopbacks as advertised by R2 only. Naturally, redistribution between local routing processes on R3 is non-transitive – when we inject EIGRP 356 routes into OSPF, they are not further propagated into EIGRP 123, even though OSPF is redistributed into EIGRP 123. So R1's packets have to transit the OSPF domain in order to reach R5 and R6.

R2 sees the EIGRP 356 routes as reachable via R3:

Rack18R2#show ip route | beg Gate
Gateway of last resort is not set

174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial0/0
O E2 174.18.0.0 [110/20] via 174.18.234.3, 00:02:20, Serial0/0
C 174.18.123.0 is directly connected, FastEthernet0/0
150.18.0.0/24 is subnetted, 6 subnets
O 150.18.4.0 [110/65] via 174.18.234.4, 00:06:41, Serial0/0
O E2 150.18.5.0 [110/20] via 174.18.234.3, 00:02:20, Serial0/0
O E2 150.18.6.0 [110/20] via 174.18.234.3, 00:02:20, Serial0/0
D 150.18.1.0 [90/156160] via 174.18.123.1, 00:06:35, FastEthernet0/0
C 150.18.2.0 is directly connected, Loopback0
O 150.18.3.0 [110/65] via 174.18.234.3, 00:06:41, Serial0/0

And R3, in turn, sees them as native EIGRP 356 routes:

Rack18R3#show ip route | beg Gate
Gateway of last resort is not set

174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial1/0
C 174.18.0.0 is directly connected, FastEthernet0/1
C 174.18.123.0 is directly connected, FastEthernet0/0
150.18.0.0/24 is subnetted, 6 subnets
O 150.18.4.0 [110/782] via 174.18.234.4, 00:02:47, Serial1/0
D 150.18.5.0 [90/156160] via 174.18.0.5, 00:02:36, FastEthernet0/1
D 150.18.6.0 [90/156160] via 174.18.0.6, 00:02:36, FastEthernet0/1
D 150.18.1.0 [90/156160] via 174.18.123.1, 00:02:34, FastEthernet0/0
O 150.18.2.0 [110/782] via 174.18.234.2, 00:02:47, Serial1/0
C 150.18.3.0 is directly connected, Loopback0

Fine; now, to finish the picture, we redistribute between RIP and OSPF on R4:

R4:
router ospf 1
redistribute rip subnets
!
! Assign some large metric value to redistributed routes
!
router rip
redistribute ospf 1 metric 8

Check the resulting routing table of R1:

Rack18R1#show ip route eigrp
D EX 222.22.2.0/24
[170/2560002816] via 174.18.123.3, 00:02:16, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:02:16, FastEthernet0/0
D EX 220.20.3.0/24
[170/2560002816] via 174.18.123.3, 00:02:16, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:02:16, FastEthernet0/0
174.18.0.0/24 is subnetted, 3 subnets
D EX 174.18.234.0
[170/2560002816] via 174.18.123.3, 00:05:17, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:05:17, FastEthernet0/0
D EX 174.18.0.0
[170/2560002816] via 174.18.123.2, 00:08:33, FastEthernet0/0
D EX 192.10.18.0/24
[170/2560002816] via 174.18.123.3, 00:02:16, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:02:16, FastEthernet0/0
150.18.0.0/24 is subnetted, 6 subnets
D EX 150.18.4.0
[170/2560002816] via 174.18.123.3, 00:05:17, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:05:17, FastEthernet0/0
D EX 150.18.5.0
[170/2560002816] via 174.18.123.2, 00:05:14, FastEthernet0/0
D EX 150.18.6.0
[170/2560002816] via 174.18.123.2, 00:05:15, FastEthernet0/0
D EX 150.18.2.0
[170/2560002816] via 174.18.123.3, 00:05:20, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:05:20, FastEthernet0/0
D EX 150.18.3.0
[170/2560002816] via 174.18.123.3, 00:05:20, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:05:20, FastEthernet0/0
D EX 205.90.31.0/24
[170/2560002816] via 174.18.123.3, 00:02:19, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:02:19, FastEthernet0/0

It looks like we have connectivity to all prefixes, even the ones injected by BB2 (205.90.31.0/24 etc.). All of them, with the exception of the EIGRP 356 routes, are symmetrically reachable via both R2 and R3. Now, a long snapshot of all the other routing tables:

Rack18R2#show ip route | beg Gate
Gateway of last resort is not set

O E2 222.22.2.0/24 [110/20] via 174.18.234.4, 00:03:29, Serial0/0
O E2 220.20.3.0/24 [110/20] via 174.18.234.4, 00:03:29, Serial0/0
174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial0/0
O E2 174.18.0.0 [110/20] via 174.18.234.3, 00:03:29, Serial0/0
C 174.18.123.0 is directly connected, FastEthernet0/0
O E2 192.10.18.0/24 [110/20] via 174.18.234.4, 00:03:29, Serial0/0
150.18.0.0/24 is subnetted, 6 subnets
O 150.18.4.0 [110/65] via 174.18.234.4, 00:03:29, Serial0/0
O E2 150.18.5.0 [110/20] via 174.18.234.3, 00:03:29, Serial0/0
O E2 150.18.6.0 [110/20] via 174.18.234.3, 00:03:29, Serial0/0
D 150.18.1.0 [90/156160] via 174.18.123.1, 00:19:36, FastEthernet0/0
C 150.18.2.0 is directly connected, Loopback0
O 150.18.3.0 [110/65] via 174.18.234.3, 00:03:29, Serial0/0
O E2 205.90.31.0/24 [110/20] via 174.18.234.4, 00:03:29, Serial0/0

Rack18R3#show ip route | beg Gate
Gateway of last resort is not set

O E2 222.22.2.0/24 [110/20] via 174.18.234.4, 00:03:56, Serial1/0
O E2 220.20.3.0/24 [110/20] via 174.18.234.4, 00:03:56, Serial1/0
174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial1/0
C 174.18.0.0 is directly connected, FastEthernet0/1
C 174.18.123.0 is directly connected, FastEthernet0/0
O E2 192.10.18.0/24 [110/20] via 174.18.234.4, 00:03:56, Serial1/0
150.18.0.0/24 is subnetted, 6 subnets
O 150.18.4.0 [110/782] via 174.18.234.4, 00:03:56, Serial1/0
D 150.18.5.0 [90/156160] via 174.18.0.5, 00:06:58, FastEthernet0/1
D 150.18.6.0 [90/156160] via 174.18.0.6, 00:06:58, FastEthernet0/1
D 150.18.1.0 [90/156160] via 174.18.123.1, 00:06:57, FastEthernet0/0
O 150.18.2.0 [110/782] via 174.18.234.2, 00:03:56, Serial1/0
C 150.18.3.0 is directly connected, Loopback0
O E2 205.90.31.0/24 [110/20] via 174.18.234.4, 00:03:56, Serial1/0

Rack18R4#show ip route | beg Gate
Gateway of last resort is not set

R 222.22.2.0/24 [120/7] via 192.10.18.254, 00:00:19, FastEthernet0/0
R 220.20.3.0/24 [120/7] via 192.10.18.254, 00:00:19, FastEthernet0/0
174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial0/0
O E2 174.18.0.0 [110/20] via 174.18.234.3, 00:05:11, Serial0/0
O E2 174.18.123.0 [110/20] via 174.18.234.3, 00:05:11, Serial0/0
[110/20] via 174.18.234.2, 00:05:11, Serial0/0
C 192.10.18.0/24 is directly connected, FastEthernet0/0
150.18.0.0/24 is subnetted, 6 subnets
C 150.18.4.0 is directly connected, Loopback0
O E2 150.18.5.0 [110/20] via 174.18.234.3, 00:05:11, Serial0/0
O E2 150.18.6.0 [110/20] via 174.18.234.3, 00:05:11, Serial0/0
O E2 150.18.1.0 [110/20] via 174.18.234.3, 00:05:11, Serial0/0
[110/20] via 174.18.234.2, 00:05:11, Serial0/0
O 150.18.2.0 [110/65] via 174.18.234.2, 00:05:11, Serial0/0
O 150.18.3.0 [110/65] via 174.18.234.3, 00:05:11, Serial0/0
R 205.90.31.0/24 [120/7] via 192.10.18.254, 00:00:20, FastEthernet0/0

A quick stop at R5. Since the RIPv2 AD of 120 is lower than the EIGRP external AD of 170, R5 sees all non-EIGRP 356 routes via RIP. Not a big problem right now, but it does create "suboptimal" paths to the EIGRP 123 routes, since it would be more efficient to reach them over the Ethernet links.

Rack18R5#sh ip route | beg Gate
Gateway of last resort is not set

R 222.22.2.0/24 [120/7] via 192.10.18.254, 00:00:25, FastEthernet0/1
R 220.20.3.0/24 [120/7] via 192.10.18.254, 00:00:25, FastEthernet0/1
174.18.0.0/24 is subnetted, 3 subnets
R 174.18.234.0 [120/8] via 192.10.18.4, 00:00:04, FastEthernet0/1
C 174.18.0.0 is directly connected, FastEthernet0/0
R 174.18.123.0 [120/8] via 192.10.18.4, 00:00:04, FastEthernet0/1
C 192.10.18.0/24 is directly connected, FastEthernet0/1
150.18.0.0/24 is subnetted, 6 subnets
R 150.18.4.0 [120/8] via 192.10.18.4, 00:00:04, FastEthernet0/1
C 150.18.5.0 is directly connected, Loopback0
D 150.18.6.0 [90/156160] via 174.18.0.6, 00:43:41, FastEthernet0/0
R 150.18.1.0 [120/8] via 192.10.18.4, 00:00:04, FastEthernet0/1
R 150.18.2.0 [120/8] via 192.10.18.4, 00:00:04, FastEthernet0/1
R 150.18.3.0 [120/8] via 192.10.18.4, 00:00:04, FastEthernet0/1
R 205.90.31.0/24 [120/7] via 192.10.18.254, 00:00:25, FastEthernet0/1

We may opt to create an access-list that selects all native RIP routes and freezes their AD at 120, and then raise the global RIP distance to something larger than 170 to fix this issue. Not a big deal, though! OK, so the only ones left are R6 and BB2:
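The fix just described could be sketched like this. This is a hypothetical configuration, not taken from the actual lab: the access-list entries simply match the prefixes BB2 injects, and the value 171 is just any AD above the EIGRP external AD of 170.

R5:
! Hypothetical sketch: keep the BB2-injected prefixes at AD 120
access-list 10 permit 222.22.2.0
access-list 10 permit 220.20.3.0
access-list 10 permit 205.90.31.0
!
router rip
! Raise the default RIP AD above the EIGRP external AD of 170
distance 171
! Routes from any source matching access-list 10 keep AD 120
distance 120 0.0.0.0 255.255.255.255 10

With something like this in place, R5 would fall back to the EIGRP external paths for the EIGRP 123 prefixes while still reaching the backbone prefixes via RIP.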

Rack18R6#show ip route eigrp
D EX 222.22.2.0/24 [170/2560002816] via 174.18.0.3, 00:07:29, FastEthernet0/0
D EX 220.20.3.0/24 [170/2560002816] via 174.18.0.3, 00:07:29, FastEthernet0/0
174.18.0.0/24 is subnetted, 2 subnets
D EX 174.18.234.0
[170/2560002816] via 174.18.0.3, 00:10:31, FastEthernet0/0
D EX 192.10.18.0/24 [170/2560002816] via 174.18.0.3, 00:07:29, FastEthernet0/0
150.18.0.0/24 is subnetted, 5 subnets
D EX 150.18.4.0 [170/2560002816] via 174.18.0.3, 00:10:31, FastEthernet0/0
D 150.18.5.0 [90/156160] via 174.18.0.5, 00:41:30, FastEthernet0/0
D EX 150.18.2.0 [170/2560002816] via 174.18.0.3, 00:10:31, FastEthernet0/0
D EX 150.18.3.0 [170/2560002816] via 174.18.0.3, 00:10:31, FastEthernet0/0
D EX 205.90.31.0/24 [170/2560002816] via 174.18.0.3, 00:07:29, FastEthernet0/0

BB2>sh ip ro rip
174.18.0.0/24 is subnetted, 3 subnets
R 174.18.234.0 [120/8] via 192.10.18.4, 00:00:24, Ethernet0
R 174.18.0.0 [120/8] via 192.10.18.4, 00:00:24, Ethernet0
R 174.18.123.0 [120/8] via 192.10.18.4, 00:00:24, Ethernet0
150.18.0.0/24 is subnetted, 6 subnets
R 150.18.4.0 [120/8] via 192.10.18.4, 00:00:24, Ethernet0
R 150.18.5.0 [120/8] via 192.10.18.4, 00:00:24, Ethernet0
R 150.18.6.0 [120/8] via 192.10.18.4, 00:00:24, Ethernet0
R 150.18.1.0 [120/8] via 192.10.18.4, 00:00:24, Ethernet0
R 150.18.2.0 [120/8] via 192.10.18.4, 00:00:24, Ethernet0
R 150.18.3.0 [120/8] via 192.10.18.4, 00:00:24, Ethernet0

This seems to be an easy one. We're done: full connectivity attained, and no loops introduced. Thanks to the EIGRP external AD, there is no need to manually filter routes on the border routers between EIGRP 123 and OSPF. But what if we are asked to redistribute some "native" EIGRP 123 prefixes (e.g. R1 Loopback0) instead of advertising them?

R1:
router eigrp 123
redistribute connected
network 174.18.123.1 0.0.0.0

See what happens:

Rack18R2#show ip route | beg Gate
Gateway of last resort is not set

O E2 222.22.2.0/24 [110/20] via 174.18.234.4, 00:15:13, Serial0/0
O E2 220.20.3.0/24 [110/20] via 174.18.234.4, 00:15:13, Serial0/0
174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial0/0
O E2 174.18.0.0 [110/20] via 174.18.234.3, 00:15:13, Serial0/0
C 174.18.123.0 is directly connected, FastEthernet0/0
O E2 192.10.18.0/24 [110/20] via 174.18.234.4, 00:15:13, Serial0/0
150.18.0.0/24 is subnetted, 6 subnets
O 150.18.4.0 [110/65] via 174.18.234.4, 00:15:13, Serial0/0
O E2 150.18.5.0 [110/20] via 174.18.234.3, 00:15:13, Serial0/0
O E2 150.18.6.0 [110/20] via 174.18.234.3, 00:15:13, Serial0/0
D EX 150.18.1.0 [170/156160] via 174.18.123.1, 00:01:49, FastEthernet0/0
C 150.18.2.0 is directly connected, Loopback0
O 150.18.3.0 [110/65] via 174.18.234.3, 00:15:13, Serial0/0
O E2 205.90.31.0/24 [110/20] via 174.18.234.4, 00:15:13, Serial0/0

R2 sees R1 Loopback0 as an EIGRP external route reachable via R1. However, the situation is different on R3:

Rack18R3#show ip route | beg Gate
Gateway of last resort is not set

O E2 222.22.2.0/24 [110/20] via 174.18.234.4, 00:16:10, Serial1/0
O E2 220.20.3.0/24 [110/20] via 174.18.234.4, 00:16:10, Serial1/0
174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial1/0
C 174.18.0.0 is directly connected, FastEthernet0/1
C 174.18.123.0 is directly connected, FastEthernet0/0
O E2 192.10.18.0/24 [110/20] via 174.18.234.4, 00:16:10, Serial1/0
150.18.0.0/24 is subnetted, 6 subnets
O 150.18.4.0 [110/782] via 174.18.234.4, 00:16:10, Serial1/0
D 150.18.5.0 [90/156160] via 174.18.0.5, 00:19:12, FastEthernet0/1
D 150.18.6.0 [90/156160] via 174.18.0.6, 00:19:12, FastEthernet0/1
O E2 150.18.1.0 [110/20] via 174.18.234.2, 00:02:46, Serial1/0
O 150.18.2.0 [110/782] via 174.18.234.2, 00:16:10, Serial1/0
C 150.18.3.0 is directly connected, Loopback0
O E2 205.90.31.0/24 [110/20] via 174.18.234.4, 00:16:10, Serial1/0

Since R1 Loopback0 is now redistributed into OSPF, R3 prefers the path via R2, not R1, due to the AD values (OSPF external at 110 beats EIGRP external at 170). This is not a very big problem in this scenario, but in more complex cases it may lead to routing loops, since a suboptimal path may be chosen. To fix this, we could adjust the AD for the EIGRP prefix. However, there is a problem here: we can't change the EIGRP external AD selectively, only for all external prefixes at once.
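For reference, here is what the blunt EIGRP-side knob looks like (illustrative values only). The distance eigrp command takes a single internal and a single external value, so any change applies to every external prefix at once:

R2 & R3:
router eigrp 123
! distance eigrp <internal> <external> - affects ALL external prefixes
distance eigrp 90 160

Since there is no per-prefix variant of this command, a different approach is needed.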

This is why we may choose to play with the AD under OSPF instead. We could simply adjust the AD for the R1 Loopback0 prefix to a value of 171. However, we may go further and simulate this EIGRP feature with OSPF - specifically, make sure OSPF internal routes have a different AD than external prefixes. For EIGRP the values are 90 and 170. What values should we pick for OSPF? Since OSPF is the core routing domain, it has more information about what is going on around it. It is also the only transit domain, so we should make sure it has the highest preference. It therefore makes sense to set the internal/external ADs for OSPF to 80 and 160 - lower than the corresponding EIGRP values. This ensures that internal routes are always preferred over external ones, while the core routing protocol remains the most trusted.

R2 & R3:
access-list 1 permit 150.18.1.0
!
router ospf 1
distance ospf intra-area 80
distance ospf inter-area 80
distance ospf external 160
distance 171 0.0.0.0 255.255.255.255 1

See how this works on R2 and R3 (the last distance command pins the AD of the OSPF-learned route matching access-list 1, i.e. R1's Loopback0, to 171, so the EIGRP external path at AD 170 wins):

Rack18R2#show ip route | beg Gate
Gateway of last resort is not set

O E2 222.22.2.0/24 [160/20] via 174.18.234.4, 00:00:07, Serial0/0
O E2 220.20.3.0/24 [160/20] via 174.18.234.4, 00:00:07, Serial0/0
174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial0/0
O E2 174.18.0.0 [160/20] via 174.18.234.3, 00:00:35, Serial0/0
C 174.18.123.0 is directly connected, FastEthernet0/0
O E2 192.10.18.0/24 [160/20] via 174.18.234.4, 00:00:35, Serial0/0
150.18.0.0/24 is subnetted, 6 subnets
O 150.18.4.0 [80/65] via 174.18.234.4, 00:00:35, Serial0/0
O E2 150.18.5.0 [160/20] via 174.18.234.3, 00:00:35, Serial0/0
O E2 150.18.6.0 [160/20] via 174.18.234.3, 00:00:35, Serial0/0
D EX 150.18.1.0 [170/156160] via 174.18.123.1, 00:03:10, FastEthernet0/0
C 150.18.2.0 is directly connected, Loopback0
O 150.18.3.0 [80/65] via 174.18.234.3, 00:00:35, Serial0/0
O E2 205.90.31.0/24 [160/20] via 174.18.234.4, 00:00:07, Serial0/0

OK, just as it was before, next comes R3:

Rack18R3#show ip route | beg Gate
Gateway of last resort is not set

O E2 222.22.2.0/24 [160/20] via 174.18.234.4, 00:00:55, Serial1/0
O E2 220.20.3.0/24 [160/20] via 174.18.234.4, 00:00:55, Serial1/0
174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial1/0
C 174.18.0.0 is directly connected, FastEthernet0/1
C 174.18.123.0 is directly connected, FastEthernet0/0
O E2 192.10.18.0/24 [160/20] via 174.18.234.4, 00:01:14, Serial1/0
150.18.0.0/24 is subnetted, 6 subnets
O 150.18.4.0 [80/782] via 174.18.234.4, 00:01:14, Serial1/0
D 150.18.5.0 [90/156160] via 174.18.0.5, 00:37:14, FastEthernet0/1
D 150.18.6.0 [90/156160] via 174.18.0.6, 00:37:14, FastEthernet0/1
D EX 150.18.1.0 [170/156160] via 174.18.123.1, 00:03:58, FastEthernet0/0
O 150.18.2.0 [80/782] via 174.18.234.2, 00:01:14, Serial1/0
C 150.18.3.0 is directly connected, Loopback0
O E2 205.90.31.0/24 [160/20] via 174.18.234.4, 00:00:56, Serial1/0

Everything looks great, except for the fact that EIGRP 123 and EIGRP 356 have to transit the OSPF domain to reach each other. Though this is only true for R1, we still need to find a way to resolve this issue. What if we start exchanging routes between EIGRP 123 and EIGRP 356 on R3? This will not cause any new domain to become transit, because the EIGRP external AD will block the external routes from being re-injected into the core domain. This is why we may safely turn on mutual redistribution between the two domains.

R3:
router eigrp 123
redistribute eigrp 356
!
router eigrp 356
redistribute eigrp 123

Observe the effect on R1 now. Note the metrics assigned to the R5 and R6 Loopback prefixes: they are taken directly from the native EIGRP 356 metric values.

Rack18R1#show ip route eigrp
D EX 222.22.2.0/24
[170/2560002816] via 174.18.123.3, 00:20:05, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:20:05, FastEthernet0/0
D EX 220.20.3.0/24
[170/2560002816] via 174.18.123.3, 00:20:05, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:20:05, FastEthernet0/0
174.18.0.0/24 is subnetted, 3 subnets
D EX 174.18.234.0
[170/2560002816] via 174.18.123.3, 00:35:22, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:35:22, FastEthernet0/0
D EX 174.18.0.0 [170/30720] via 174.18.123.3, 00:00:37, FastEthernet0/0
D EX 192.10.18.0/24
[170/2560002816] via 174.18.123.3, 00:20:33, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:20:33, FastEthernet0/0
150.18.0.0/24 is subnetted, 6 subnets
D EX 150.18.4.0
[170/2560002816] via 174.18.123.3, 00:20:33, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:20:33, FastEthernet0/0
D EX 150.18.5.0 [170/158720] via 174.18.123.3, 00:00:38, FastEthernet0/0
D EX 150.18.6.0 [170/158720] via 174.18.123.3, 00:00:38, FastEthernet0/0
D EX 150.18.2.0
[170/2560002816] via 174.18.123.3, 00:20:25, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:20:25, FastEthernet0/0
D EX 150.18.3.0
[170/2560002816] via 174.18.123.3, 00:20:37, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:20:37, FastEthernet0/0
D EX 205.90.31.0/24
[170/2560002816] via 174.18.123.3, 00:20:09, FastEthernet0/0
[170/2560002816] via 174.18.123.2, 00:20:09, FastEthernet0/0

R6 also learns EIGRP 123 routes via R3:

Rack18R6#show ip route eigrp
D EX 222.22.2.0/24 [170/2560002816] via 174.18.0.3, 00:23:59, FastEthernet0/0
D EX 220.20.3.0/24 [170/2560002816] via 174.18.0.3, 00:23:59, FastEthernet0/0
174.18.0.0/24 is subnetted, 3 subnets
D EX 174.18.234.0
[170/2560002816] via 174.18.0.3, 01:00:18, FastEthernet0/0
D EX 174.18.123.0 [170/30720] via 174.18.0.3, 00:04:15, FastEthernet0/0
D EX 192.10.18.0/24 [170/2560002816] via 174.18.0.3, 00:27:22, FastEthernet0/0
150.18.0.0/24 is subnetted, 6 subnets
D EX 150.18.4.0 [170/2560002816] via 174.18.0.3, 00:27:22, FastEthernet0/0
D 150.18.5.0 [90/156160] via 174.18.0.5, 01:31:17, FastEthernet0/0
D EX 150.18.1.0 [170/158720] via 174.18.0.3, 00:04:15, FastEthernet0/0
D EX 150.18.2.0 [170/2560002816] via 174.18.0.3, 00:24:18, FastEthernet0/0
D EX 150.18.3.0 [170/2560002816] via 174.18.0.3, 01:00:18, FastEthernet0/0
D EX 205.90.31.0/24 [170/2560002816] via 174.18.0.3, 00:23:59, FastEthernet0/0

R3 will not use the OSPF backbone for transit, thanks to our choice of administrative distances:

Rack18R3#show ip route | beg Gate
Gateway of last resort is not set

O E2 222.22.2.0/24 [160/20] via 174.18.234.4, 02:22:45, Serial1/0
O E2 220.20.3.0/24 [160/20] via 174.18.234.4, 02:22:45, Serial1/0
174.18.0.0/24 is subnetted, 3 subnets
C 174.18.234.0 is directly connected, Serial1/0
C 174.18.0.0 is directly connected, FastEthernet0/1
C 174.18.123.0 is directly connected, FastEthernet0/0
O E2 192.10.18.0/24 [160/20] via 174.18.234.4, 02:23:04, Serial1/0
150.18.0.0/24 is subnetted, 6 subnets
O 150.18.4.0 [80/782] via 174.18.234.4, 02:23:04, Serial1/0
D 150.18.5.0 [90/156160] via 174.18.0.5, 02:59:03, FastEthernet0/1
D 150.18.6.0 [90/156160] via 174.18.0.6, 02:59:04, FastEthernet0/1
D EX 150.18.1.0 [170/156160] via 174.18.123.1, 02:25:48, FastEthernet0/0
O 150.18.2.0 [80/782] via 174.18.234.2, 02:23:04, Serial1/0
C 150.18.3.0 is directly connected, Loopback0
O E2 205.90.31.0/24 [160/20] via 174.18.234.4, 02:22:45, Serial1/0

So we have come up with a (mostly) optimal redistribution configuration for our topology. What we learned so far is that a split-horizon rule may also be implemented using administrative distances. EIGRP implements this extremely useful feature automatically, while OSPF must be manually tuned for it. However, as we will learn in further examples, some additional work is needed for RIP. We also saw that redistribution is non-transitive on the local router. Finally, we learned a way to introduce a hierarchy of administrative distances for a star-like routing domain topology (a core transit domain and stub edge domains). More complicated examples are to follow this post.
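To recap, the administrative distance hierarchy built in this scenario can be collected into a single snippet for the core border routers (values taken from the examples above):

R2 & R3:
router ospf 1
! OSPF internal (80) beats EIGRP internal (90):
! the core domain is the most trusted
distance ospf intra-area 80
distance ospf inter-area 80
! OSPF external (160) beats EIGRP external (170),
! but loses to any internal route
distance ospf external 160

The resulting preference order is OSPF internal (80), then EIGRP internal (90), then OSPF external (160), then EIGRP external (170): internal routes always stay ahead of external ones, and the core protocol stays ahead of the edge protocols.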
