OSPF and MTU Mismatch
What is the difference between using the “system mtu routing 1500” and the “ip ospf mtu-ignore” commands when running OSPF between a router and a switch?
Within the scope of the CCIE Lab Exam, it may be acceptable to issue either of these commands to solve a specific lab task. However, ignoring the MTU for the purpose of forming an OSPF adjacency is not the same as matching the MTU, and in a real production network the difference matters.
By design, OSPF automatically detects an MTU mismatch between two devices when they exchange Database Description (DBD) packets during adjacency formation. This behavior is defined in the OSPF specification, RFC 2328, “OSPF Version 2”. Specifically, the RFC states the following:
10.6. Receiving Database Description Packets

This section explains the detailed processing of a received Database Description Packet.
[snip]
If the Interface MTU field in the Database Description packet indicates an IP datagram size that is larger than the router can accept on the receiving interface without fragmentation, the Database Description packet is rejected.
[/snip]
Basically, this means that if a router tries to form an adjacency on an interface where the remote neighbor has a larger MTU, the adjacency will be denied. The purpose of this check is two-fold. The first is to avoid a problem in the data plane, in which a sending host transmits packets that are too large for the receiver to accept. Path MTU Discovery (PMTUD) on the sender should normally prevent this, but PMTUD relies on ICMP messages that may be filtered in the transit path by a security policy. The second, and most important, is to avoid a problem in the control plane, in which the OSPF packets exchanged between the neighbors are themselves too large to be received.
The control-plane problem arises because the OSPF Hello, Database Description (DBD), Link-State Request (LSR), and Link-State Acknowledgement (LSAck) packets are generally small, while the Link-State Update (LSU) packets generally are not.
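The receive-side check from RFC 2328 section 10.6 can be sketched as a single comparison. This is a simplified model, not IOS code: `accept_dbd`, `local_mtu`, and `neighbor_mtu` are illustrative names, and the `mtu_ignore` flag is a hypothetical stand-in for the IOS “ip ospf mtu-ignore” interface command discussed later in this article.

```python
def accept_dbd(local_mtu: int, neighbor_mtu: int, mtu_ignore: bool = False) -> bool:
    """Simplified model of the RFC 2328 section 10.6 Interface MTU check.

    local_mtu:    IP MTU of the receiving interface, in bytes.
    neighbor_mtu: Interface MTU field carried in the neighbor's DBD packet.
    mtu_ignore:   hypothetical flag standing in for IOS "ip ospf mtu-ignore",
                  which skips the check entirely.
    """
    if mtu_ignore:
        return True
    # Reject the DBD if the neighbor could send datagrams larger than we accept.
    return neighbor_mtu <= local_mtu


print(accept_dbd(local_mtu=1500, neighbor_mtu=2000))                   # False: R1 rejects R2
print(accept_dbd(local_mtu=2000, neighbor_mtu=1500))                   # True: R2 accepts R1
print(accept_dbd(local_mtu=1500, neighbor_mtu=2000, mtu_ignore=True))  # True: check skipped
```

The check is one-sided: only the router with the smaller MTU rejects the DBD, which is consistent with R1, not R2, reporting the mismatch in the debug output later in this article.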
When a new OSPF adjacency is established, DBD packets tell the new neighbor which LSAs are in the database without giving the details about them. Specifically, the DBD contains the LSA header information, but not the actual LSA payload. This optimizes flooding for the case where the receiving router has already received the LSA from another neighbor, in which case the LSA does not need to be flooded during adjacency establishment.
For example, suppose that you and I, routers A and B, both have neighbors C and D, and the database is synchronized. If you and I form a new adjacency, my DBD packets to you will say that I have LSAs A, B, C, and D in my database. Since we are both already adjacent with C and D, you already have all of my LSAs, possibly with the exception of those describing the new link that connects us. This means that even though my DBD packets describe these LSAs to you, you don’t send me an LSR for them, and therefore I don’t send you an LSU about them. This is the normal optimization of the database exchange that prevents excessive flooding.
Suppose next that you, router A, know about LSAs A1 through An in your database, and I, router B, know about LSAs B1 through Bn. When we establish an adjacency, your DBD to me will describe LSAs A1-An, while mine will describe LSAs B1-Bn. Since I don’t have LSAs A1-An, I will send you an LSR for them, and likewise, since you don’t have B1-Bn, you will send me an LSR for those. When you reply with the LSUs for A1-An, a single LSU packet will likely carry more than one LSA in its payload, and if the LSAs are large, the LSU may span multiple IP fragments. Since you need to send me more than one LSA, it is more efficient to send them in as few LSUs as possible instead of sending one LSA per LSU. The problem with this procedure, however, occurs when the flooding router has a larger MTU than the receiving router.
For example, suppose that the flooding router has a Gigabit Ethernet interface that supports jumbo frames, which exceed the normal Ethernet MTU of 1500 bytes, while the receiving router does not have jumbo frame support enabled, which means that frames over 1500 bytes (excluding Layer 2 overhead) will be dropped. If the flooding router packs multiple LSAs into an LSU so that the packet exceeds 1500 bytes, or if a single LSA it sends is larger than 1500 bytes, such as a Router LSA (LSA Type 1) with many links, the results can be non-deterministic.
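The decision a router makes after reading its neighbor’s DBD headers can be sketched as follows. This is a simplified illustration, not the full RFC 2328 procedure: the function name is hypothetical, the string labels are placeholders rather than real router IDs, sequence numbers are compared as plain integers, and the RFC’s additional checksum and age comparisons are omitted.

```python
from typing import Dict, List, Tuple

# An LSA is identified by (LS type, Link State ID, Advertising Router).
# The string labels used below are placeholders, not real router IDs.
LsaKey = Tuple[int, str, str]


def lsas_to_request(local_db: Dict[LsaKey, int],
                    dbd_headers: Dict[LsaKey, int]) -> List[LsaKey]:
    """Return the LSAs to ask for in a Link-State Request.

    Both dicts map an LSA key to its sequence number. Simplification: an LSA
    is requested only if it is missing locally or the DBD advertises a higher
    sequence number; checksum and age comparisons are ignored.
    """
    return sorted(key for key, seq in dbd_headers.items()
                  if key not in local_db or seq > local_db[key])


# Synchronized case: B already holds every LSA that A's DBD describes.
synced = {(1, name, name): 0x80000005 for name in ("A", "B", "C", "D")}
print(lsas_to_request(local_db=synced, dbd_headers=synced))  # []

# Disjoint case: A's DBD describes placeholder external LSAs A1..A3 that B lacks.
a_headers = {(5, f"A{i}", "A"): 0x80000001 for i in range(1, 4)}
print(lsas_to_request(local_db={}, dbd_headers=a_headers))
# [(5, 'A1', 'A'), (5, 'A2', 'A'), (5, 'A3', 'A')]
```

An empty request list means no LSU needs to be sent for those LSAs, which is exactly the flooding saved by exchanging headers first.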
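Both failure cases, several LSAs sharing one LSU and one LSA too large for any LSU, can be seen in a greedy packing sketch. This is a hypothetical, in-order packer written for illustration; it is not how IOS actually builds LSUs, which this article does not describe. The 28-byte overhead is the 24-byte OSPFv2 header plus the 4-byte “# LSAs” field of an LSU.

```python
from typing import List

LSU_OVERHEAD = 24 + 4  # OSPFv2 header + "# LSAs" field of a Link State Update


def pack_lsus(lsa_sizes: List[int], max_ospf_bytes: int) -> List[List[int]]:
    """Greedily pack LSAs, in order, into LSUs of at most max_ospf_bytes.

    lsa_sizes are LSA lengths in bytes. An LSA that cannot fit even in an
    otherwise empty LSU still gets an LSU of its own; that LSU exceeds the
    limit and would have to be carried in IP fragments.
    """
    lsus: List[List[int]] = []
    current: List[int] = []
    used = LSU_OVERHEAD
    for size in lsa_sizes:
        if current and used + size > max_ospf_bytes:
            lsus.append(current)  # close the current LSU and start a new one
            current, used = [], LSU_OVERHEAD
        current.append(size)
        used += size
    if current:
        lsus.append(current)
    return lsus


# A 1500-byte IP MTU leaves 1480 bytes for the OSPF packet.
print(pack_lsus([600, 600, 600], 1480))  # [[600, 600], [600]]
print(pack_lsus([600, 3084], 1480))      # [[600], [3084]]
```

In the second call, the 3084-byte LSA (the size of a Router LSA with 255 links) ends up alone in an LSU that is larger than the 1480-byte budget, so it can only be delivered as IP fragments.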
To demonstrate the problem, consider the following topology.
R1 and R2 connect via GigabitEthernet, while R2 and R3 connect via FastEthernet. R1 uses the default MTU of 1500 bytes on its link to R2, while R2 has jumbo frame support configured up to 2000 bytes. The link between R2 and R3 uses the default MTU of 1500 bytes. Per the behavior defined in the RFC, R1 should reject an OSPF adjacency with R2. This default behavior can be seen as follows:
R1:
interface GigabitEthernet1/0
 ip address 220.127.116.11 255.255.255.0
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

R2:
interface GigabitEthernet1/0
 mtu 2000
 ip address 18.104.22.168 255.255.255.0
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

R1#debug ip packet detail
IP packet debugging is on (detailed)
R1#debug ip ospf adj
OSPF adjacency events debugging is on
01:07:18: OSPF: Rcv DBD from 22.214.171.124 on GigabitEthernet1/0 seq 0x172A opt 0x52 flag 0x7 len 32 mtu 2000 state EXSTART
01:07:18: OSPF: Nbr 126.96.36.199 has larger interface MTU
01:07:18: OSPF: Retransmitting DBD to 188.8.131.52 on GigabitEthernet1/0
01:07:18: OSPF: Up DBD Retransmit cnt to 5 for 184.108.40.206 on GigabitEthernet1/0
01:07:18: OSPF: Send DBD to 220.127.116.11 on GigabitEthernet1/0 seq 0x1813 opt 0x52 flag 0x7 len 32
In this case we can see that R1 rejects R2’s DBD packet because R2’s MTU is larger. Although the obvious solution is to match the MTU on both ends of the link in the first place, IOS also offers the “ip ospf mtu-ignore” interface command, which skips this check in the OSPF adjacency state machine. Once it is applied, as seen below, R1 and R2 form an adjacency.
R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#interface Gig1/0
R1(config-if)#ip ospf mtu-ignore
R1(config-if)#end
R1#
%OSPF-5-ADJCHG: Process 1, Nbr 18.104.22.168 on GigabitEthernet1/0 from LOADING to FULL, Loading Done
R1#show ip ospf neighbor

Neighbor ID       Pri   State      Dead Time   Address          Interface
22.214.171.124      1   FULL/DR    00:00:36    126.96.36.199    GigabitEthernet1/0
At this point, both R1 and R2 learn the routes to each other’s Loopback0 interfaces, as seen below.
R1#show ip route ospf
     188.8.131.52/32 is subnetted, 1 subnets
O       184.108.40.206 [110/2] via 220.127.116.11, 00:00:05, GigabitEthernet1/0

R2#show ip route ospf
     18.104.22.168/32 is subnetted, 1 subnets
O       22.214.171.124 [110/2] via 126.96.36.199, 00:00:46, GigabitEthernet1/0
As expected, however, since there is an MTU mismatch, R1 is unable to receive packets from R2 that are larger than 1500 bytes.
R2#ping 188.8.131.52

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 184.108.40.206, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/16/20 ms
R2#ping
Protocol [ip]:
Target IP address: 220.127.116.11
Repeat count :
Datagram size : 2000
Timeout in seconds :
Extended commands [n]: y
Source address or interface:
Type of service :
Set DF bit in IP header? [no]: yes
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 2000-byte ICMP Echos to 18.104.22.168, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
In theory, this MTU mismatch should not matter, since end hosts that send traffic should ideally implement Path MTU Discovery. However, let’s now look at a case where R2 is unable to flood LSAs to R1 because the resulting IP packets exceed 1500 bytes.
R3, which connects to R2, has been configured with a large number of Loopback interfaces in order to generate a large Router LSA (LSA Type 1). R3’s configuration is as follows, with Loopbacks 22.214.171.124 through 126.96.36.199 omitted:
R3:
interface FastEthernet0/0
 ip address 188.8.131.52 255.255.255.0
 shutdown
!
interface Loopback3330
 ip address 184.108.40.206 255.255.255.255
!
[snip]
!
interface Loopback333254
 ip address 220.127.116.11 255.255.255.255
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0
The resulting number of local links can be seen in R3’s database:
R3#show ip ospf database

            OSPF Router with ID (18.104.22.168) (Process ID 1)

                Router Link States (Area 0)

Link ID          ADV Router       Age    Seq#        Checksum   Link count
22.214.171.124   126.96.36.199    299    0x80000007  0x0050D2   254
Now let’s activate the link between R2 and R3. This causes R3 to flood its large Router LSA to R2, which in turn floods it to R1.
R3#config t
Enter configuration commands, one per line. End with CNTL/Z.
R3(config)#int Fa0/0
R3(config-if)#no shutdown
R3(config-if)#end
R3#

R2#debug ip packet detail
IP packet debugging is on (detailed)
R2#debug ip ospf packet
OSPF packet debugging is on
R2#config t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#interface Fa2/0
R2(config-if)#no shutdown
R2(config-if)#end
R2#
%SYS-5-CONFIG_I: Configured from console by console
IP: s=188.8.131.52 (FastEthernet2/0), d=184.108.40.206, len 76, rcvd 0, proto=89
OSPF: rcv. v:2 t:1 l:44 rid:220.127.116.11 aid:0.0.0.0 chk:D59B aut:0 auk: from FastEthernet2/0
IP: s=18.104.22.168 (local), d=22.214.171.124 (FastEthernet2/0), len 80, sending, proto=89
[snip]
R2 and R3 form an adjacency, and R3’s LSA is flooded to R2. Since the LSU carrying the LSA is larger than 1500 bytes, it is fragmented into multiple IP packets, the largest of which matches the 1500-byte MTU shared by R2 and R3.
IP: s=126.96.36.199 (FastEthernet2/0), d=188.8.131.52, len 1500, rcvd 0
    IP Fragment, Ident = 497, fragment offset = 0, proto=89
IP: recv fragment from 184.108.40.206 offset 0 bytes
IP: s=220.127.116.11 (FastEthernet2/0), d=18.104.22.168, len 1500, rcvd 0
    IP Fragment, Ident = 497, fragment offset = 1480
IP: recv fragment from 22.214.171.124 offset 1480 bytes
IP: s=126.96.36.199 (FastEthernet2/0), d=188.8.131.52, len 172, rcvd 0
    IP Fragment, Ident = 497, fragment offset = 2960
IP: recv fragment from 184.108.40.206 offset 2960 bytes
OSPF: rcv. v:2 t:4 l:3112 rid:220.127.116.11 aid:0.0.0.0 chk:297C aut:0 auk: from FastEthernet2/0
%OSPF-5-ADJCHG: Process 1, Nbr 18.104.22.168 on FastEthernet2/0 from LOADING to FULL, Loading Done
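The OSPF length in this log (l:3112) follows directly from the OSPFv2 packet formats, given that R3’s Router LSA now carries 255 links (the link count R2 later shows for it). The sketch below assumes an IPv4 header without options and Router LSA link entries without extra TOS metrics; the constant and function names are illustrative.

```python
from typing import Tuple

# Field sizes in bytes from the OSPFv2 packet formats (RFC 2328, Appendix A).
IP_HEADER = 20         # IPv4 header without options
OSPF_HEADER = 24       # common OSPF packet header
LSU_COUNT = 4          # "# LSAs" field of a Link State Update
LSA_HEADER = 20        # common LSA header
ROUTER_LSA_FIXED = 4   # flags byte, zero byte, 2-byte "# links" field
LINK_ENTRY = 12        # one Router LSA link, assuming no extra TOS metrics


def router_lsa_len(links: int) -> int:
    """Length of a Router LSA (LSA Type 1) with the given number of links."""
    return LSA_HEADER + ROUTER_LSA_FIXED + LINK_ENTRY * links


def lsu_lengths(links: int) -> Tuple[int, int]:
    """(OSPF length, IP total length) of an LSU carrying one such Router LSA."""
    ospf_len = OSPF_HEADER + LSU_COUNT + router_lsa_len(links)
    return ospf_len, IP_HEADER + ospf_len


print(router_lsa_len(255))  # 3084
print(lsu_lengths(255))     # (3112, 3132)

# Smallest link count whose LSU no longer fits in a 1500-byte IP MTU.
print(next(n for n in range(1, 1000) if lsu_lengths(n)[1] > 1500))  # 120
```

The 3132-byte IP total is also the “len 3132” that R2 reports when it floods the same LSU toward R1, and the last line shows that roughly 120 links are enough to push a single Router LSA past a standard Ethernet MTU.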
Once the adjacency is full, R2 installs R3’s routes and begins flooding R3’s LSA to R1:
R2#show ip route ospf
     22.214.171.124/32 is subnetted, 1 subnets
O       126.96.36.199 [110/2] via 188.8.131.52, 00:00:10, GigabitEthernet1/0
     184.108.40.206/32 is subnetted, 254 subnets
O       220.127.116.11 [110/2] via 18.104.22.168, 00:00:10, FastEthernet2/0
[snip]
O       22.214.171.124 [110/2] via 126.96.36.199, 00:00:10, FastEthernet2/0
R2#
IP: s=188.8.131.52 (local), d=184.108.40.206 (GigabitEthernet1/0), len 3132, sending broad/multicast, proto=89
IP: s=220.127.116.11 (local), d=18.104.22.168 (GigabitEthernet1/0), len 1996, sending fragment
    IP Fragment, Ident = 854, fragment offset = 0, proto=89
IP: s=22.214.171.124 (local), d=126.96.36.199 (GigabitEthernet1/0), len 1156, sending last fragment
    IP Fragment, Ident = 854, fragment offset = 1976
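The fragment sizes in both debug captures follow from standard IPv4 fragmentation: every fragment except the last carries a multiple of 8 data bytes. The sketch below is a simplified model that assumes a 20-byte IP header without options; `fragment` is an illustrative name, not an IOS function.

```python
from typing import List, Tuple

IP_HEADER = 20  # IPv4 header without options


def fragment(total_len: int, mtu: int) -> List[Tuple[int, int]]:
    """Return (fragment offset, fragment length) pairs, both in bytes.

    A packet that already fits the MTU is returned unchanged. Otherwise each
    fragment except the last carries the largest multiple of 8 data bytes
    that fits in the MTU after the IP header.
    """
    if total_len <= mtu:
        return [(0, total_len)]
    payload = total_len - IP_HEADER
    max_data = (mtu - IP_HEADER) // 8 * 8
    frags = []
    offset = 0
    while offset < payload:
        data = min(max_data, payload - offset)
        frags.append((offset, IP_HEADER + data))
        offset += data
    return frags


print(fragment(3132, 1500))  # R3 -> R2: [(0, 1500), (1480, 1500), (2960, 172)]
print(fragment(3132, 2000))  # R2 -> R1: [(0, 1996), (1976, 1156)]

# R1 only accepts packets up to its 1500-byte MTU, so the first fragment is lost.
print([length <= 1500 for _, length in fragment(3132, 2000)])  # [False, True]
```

Because 2000 minus the 20-byte header is not a multiple of 8, R2’s first fragment is 1996 bytes rather than 2000, matching the log above.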
Note that since the 3132-byte LSU exceeds R2’s MTU of 2000 bytes, it is fragmented into multiple packets. Because R1 cannot accept packets larger than its MTU of 1500 bytes, it drops the 1996-byte first fragment, so the LSU can never be reassembled and is never received. This means that R1 cannot synchronize its database with R2, as seen below.
R1#show ip ospf database

            OSPF Router with ID (188.8.131.52) (Process ID 1)

                Router Link States (Area 0)

Link ID          ADV Router       Age    Seq#        Checksum   Link count
184.108.40.206   220.127.116.11   62     0x80000005  0x6592     2
18.104.22.168    22.214.171.124   35     0x8000000D  0x613E     3

                Net Link States (Area 0)

Link ID          ADV Router       Age    Seq#        Checksum
126.96.36.199    188.8.131.52     62     0x80000001  0x61BB
184.108.40.206   220.127.116.11   36     0x80000001  0x974C

R2#show ip ospf database

            OSPF Router with ID (18.104.22.168) (Process ID 1)

                Router Link States (Area 0)

Link ID          ADV Router       Age    Seq#        Checksum   Link count
22.214.171.124   126.96.36.199    67     0x80000005  0x6592     2
188.8.131.52     184.108.40.206   38     0x8000000D  0x613E     3
220.127.116.11   18.104.22.168    39     0x80000005  0x2AAD     255

                Net Link States (Area 0)

Link ID          ADV Router       Age    Seq#        Checksum
22.214.171.124   126.96.36.199    67     0x80000001  0x61BB
188.8.131.52     184.108.40.206   39     0x80000001  0x974C

R3#show ip ospf database

            OSPF Router with ID (220.127.116.11) (Process ID 1)

                Router Link States (Area 0)

Link ID          ADV Router       Age    Seq#        Checksum   Link count
18.104.22.168    22.214.171.124   69     0x80000005  0x006592   2
126.96.36.199    188.8.131.52     40     0x8000000D  0x00613E   3
184.108.40.206   220.127.116.11   39     0x80000005  0x002AAD   255

                Net Link States (Area 0)

Link ID          ADV Router       Age    Seq#        Checksum
18.104.22.168    22.214.171.124   69     0x80000001  0x0061BB
126.96.36.199    188.8.131.52     39     0x80000001  0x00974C
This also means that R1 cannot install routes toward R3:
R1#show ip route ospf
     184.108.40.206/32 is subnetted, 1 subnets
O       220.127.116.11 [110/2] via 18.104.22.168, 00:00:02, GigabitEthernet1/0
     22.214.171.124/24 is subnetted, 1 subnets
O       126.96.36.199 [110/2] via 188.8.131.52, 00:00:02, GigabitEthernet1/0
Eventually, the adjacency between R1 and R2 is lost, due to the lack of LSAcks sent in response to R2’s LSUs. This can be seen in R1’s “debug ip ospf packet” output below, along with “show ip ospf neighbor” on both devices:
R1#
OSPF: rcv. v:2 t:1 l:44 rid:184.108.40.206 aid:0.0.0.0 chk:DC98 aut:0 auk: from GigabitEthernet1/0
OSPF: Cannot see ourself in hello from 220.127.116.11 on GigabitEthernet1/0, state INIT
R1#show ip ospf neighbor

Neighbor ID       Pri   State         Dead Time   Address          Interface
18.104.22.168       1   LOADING/DR    00:00:34    22.214.171.124   GigabitEthernet1/0

R2#show ip ospf neighbor

Neighbor ID       Pri   State         Dead Time   Address          Interface
126.96.36.199       1   FULL/DR       00:00:35    188.8.131.52     FastEthernet2/0
184.108.40.206      1   FULL/BDR      00:00:39    220.127.116.11   GigabitEthernet1/0
The key point of this example is that although the “ip ospf mtu-ignore” command allows the initial adjacency between R1 and R2 to form, synchronization between them fails as soon as LSA flooding causes R2 to generate packets that exceed R1’s MTU.
Based on this, we can see that the “ip ospf mtu-ignore” command is not a fix for the underlying problem. Instead, it is simply an exception to the OSPF adjacency state machine. The real fix is to ensure that the MTU values match between neighbors, for example with the “system mtu routing 1500” command from the opening question when the neighbor uses a 1500-byte MTU. Matching MTUs prevents both failed routing exchanges in the control plane and packet drops due to unsupported sizes in the data plane.
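Since matching MTUs is the real fix, a simple audit over both ends of each OSPF link can flag mismatches before they cause trouble. This is a hypothetical sketch: the interface MTUs are hand-entered from the example topology, and collecting them from real devices is not shown.

```python
from typing import Dict, List, Tuple


def mtu_mismatches(links: List[Tuple[str, str]],
                   mtu: Dict[str, int]) -> List[Tuple[str, str, int, int]]:
    """Return each OSPF link whose two interfaces have different IP MTUs."""
    return [(a, b, mtu[a], mtu[b]) for a, b in links if mtu[a] != mtu[b]]


# Interface MTUs from the example topology (hand-entered, hypothetical inventory).
mtu = {"R1 Gi1/0": 1500, "R2 Gi1/0": 2000, "R2 Fa2/0": 1500, "R3 Fa0/0": 1500}
links = [("R1 Gi1/0", "R2 Gi1/0"), ("R2 Fa2/0", "R3 Fa0/0")]
print(mtu_mismatches(links, mtu))  # [('R1 Gi1/0', 'R2 Gi1/0', 1500, 2000)]
```

Unlike the DBD check, which only fires on the side with the smaller MTU, this comparison is symmetric and reports the link regardless of which end is larger.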
About Brian McGahan, CCIE #8593, CCDE #2013::13:
Brian McGahan was one of the youngest engineers in the world to obtain the CCIE, having achieved his first CCIE in Routing & Switching at the age of 20 in 2002. Brian has been teaching and developing CCIE training courses for over 10 years, and has assisted thousands of engineers in obtaining their CCIE certification. When not teaching or developing new products, Brian consults with large ISPs and enterprise customers in the Midwest region of the United States.