OSPF and MTU Mismatch
Dear Brian,
What is the difference between using the “system mtu routing 1500” and the “ip ospf mtu-ignore” commands when running OSPF between a router and a switch?
Thanks,
Paul
Hi Paul,
Within the scope of the CCIE Lab Exam, it may be acceptable to issue either of these commands to solve a specific lab task. However, it is key to note that there is a difference between ignoring the MTU for the purpose of OSPF adjacency and matching the MTU within a real production network.
By design, OSPF automatically detects an MTU mismatch between two devices when they exchange Database Description (DBD) packets during adjacency formation. This behavior is defined in the standard OSPF specification, RFC 2328, “OSPF Version 2”. Specifically, the RFC states the following:
10.6. Receiving Database Description Packets
This section explains the detailed processing of a received
Database Description Packet.
[snip]
If the Interface MTU field in the Database Description packet
indicates an IP datagram size that is larger than the router can
accept on the receiving interface without fragmentation, the
Database Description packet is rejected.
[/snip]
In other words, if a router tries to form an adjacency on an interface where the remote neighbor has a larger MTU, the adjacency will be denied. The reason for this check is twofold. The first is to avoid a problem in the data plane, in which a sending host transmits packets that are too large for the receiver to accept. Typically, Path MTU Discovery (PMTUD) on the sender should prevent this case; however, PMTUD relies on ICMP messages that may be filtered in the transit path by a security policy. The second, and more important, reason is to avoid a problem in the control plane, where the OSPF packets themselves are exchanged.
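The receive-side check quoted from RFC 2328 can be sketched in Python as follows. This is a minimal, hypothetical model: the class and function names are illustrative, only the Interface MTU field of the DBD is modeled, and the mtu_ignore flag stands in for the effect of the IOS “ip ospf mtu-ignore” command. The MTU values are the ones used in the lab later in this post.

```python
from dataclasses import dataclass


@dataclass
class DatabaseDescription:
    """Only the DBD field this sketch needs (hypothetical model)."""
    neighbor_id: str
    interface_mtu: int  # the Interface MTU field advertised by the sender


def accept_dbd(dbd: DatabaseDescription, local_ip_mtu: int,
               mtu_ignore: bool = False) -> bool:
    """Apply the RFC 2328 section 10.6 MTU check to a received DBD.

    local_ip_mtu: largest IP datagram the receiving interface accepts
    without fragmentation. mtu_ignore models "ip ospf mtu-ignore",
    which skips the check entirely.
    """
    if mtu_ignore:
        return True
    # Reject if the neighbor may send datagrams larger than we can accept.
    return dbd.interface_mtu <= local_ip_mtu


r2_dbd = DatabaseDescription("2.2.2.2", interface_mtu=2000)
r1_dbd = DatabaseDescription("1.1.1.1", interface_mtu=1500)
print(accept_dbd(r2_dbd, local_ip_mtu=1500))                   # False: R1 rejects R2
print(accept_dbd(r2_dbd, local_ip_mtu=1500, mtu_ignore=True))  # True
print(accept_dbd(r1_dbd, local_ip_mtu=2000))                   # True: R2 accepts R1
```

Note that the check is one-directional: only the router with the smaller MTU rejects its neighbor's DBD.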
Specifically, this problem stems from the fact that OSPF Hello, Database Description (DBD), Link-State Request (LSR), and Link-State Acknowledgement (LSAck) packets are generally small, but Link-State Update (LSU) packets generally are not.
When a new OSPF adjacency is established, DBD packets tell the new neighbor which LSAs are in the database without giving their details. Specifically, the DBD contains the LSA header information, but not the actual LSA payload. This optimizes flooding for the case where the receiving router has already received the LSA from another neighbor, in which case the LSA does not need to be flooded again during adjacency establishment.
For example, suppose that you and I, routers A and B, both have neighbors C and D, and the database is synchronized. If you and I form a new adjacency, my DBD packets to you will say that I have LSAs A, B, C, and D in my database. Since we are both already adjacent with C and D, you already have all of my LSAs, possibly with the exception of the new link that connects us. This means that even though I describe LSAs A and B to you in my DBD packets, you don’t send me an LSR for them, so I don’t send you an LSU about them. This is the normal optimization of the database exchange that prevents excessive flooding.
Suppose next that you, router A, know about LSAs A1 through An in your database, and I, router B, know about LSAs B1 through Bn. When we establish an adjacency, your DBD to me will describe LSAs A1-An, while mine will describe LSAs B1-Bn. Since I don’t have LSAs A1-An, I will send you an LSR for them, and likewise, since you don’t have B1-Bn, you will send me an LSR for those. When you reply with LSUs for A1-An, a single LSU packet will likely carry more than one LSA in its payload, and if an LSA is large, the packet may span multiple IP fragments. Since you need to send me more than one LSA, it is more efficient to send them in as few LSUs as possible instead of sending one LSA per LSU. The problem with this procedure, however, occurs when the flooding router has a larger MTU than the receiving router.
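The tension between packing LSAs into as few LSUs as possible and the receiver's MTU can be illustrated with a simple greedy packer. The packing policy, the function names, the hypothetical 36-byte LSAs, and the header sizes (a 20-byte IPv4 header without options, a 24-byte OSPFv2 header, and a 4-byte LSA count at the start of the LSU body) are assumptions for illustration; this is not how IOS actually builds LSUs.

```python
from typing import List

IP_HEADER = 20    # bytes, IPv4 header without options
OSPF_HEADER = 24  # bytes, OSPFv2 packet header
LSU_COUNT = 4     # bytes, "# LSAs" field of the Link State Update


def pack_lsus(lsa_sizes: List[int], max_ip_packet: int) -> List[List[int]]:
    """Greedily group LSA sizes into LSUs whose IP packet fits max_ip_packet.

    An LSA too large to fit on its own still gets its own LSU; that packet
    then needs IP fragmentation. Illustrative policy only.
    """
    overhead = IP_HEADER + OSPF_HEADER + LSU_COUNT
    lsus: List[List[int]] = []
    current: List[int] = []
    current_size = overhead
    for size in lsa_sizes:
        # Start a new LSU if this LSA would push the current one over the limit.
        if current and current_size + size > max_ip_packet:
            lsus.append(current)
            current, current_size = [], overhead
        current.append(size)
        current_size += size
    if current:
        lsus.append(current)
    return lsus


def ip_packet_size(lsu: List[int]) -> int:
    """Total IP packet length of an LSU carrying LSAs of the given sizes."""
    return IP_HEADER + OSPF_HEADER + LSU_COUNT + sum(lsu)


small = [36] * 50  # fifty hypothetical 36-byte LSAs
print([len(l) for l in pack_lsus(small, 1500)])  # [40, 10]
print(len(pack_lsus(small, 2000)))                # 1
print(ip_packet_size(small))                      # 1848
```

A sender that packs against a 2000-byte MTU puts all fifty LSAs into one 1848-byte LSU, which a receiver limited to 1500 bytes cannot accept, while packing against 1500 bytes splits the same LSAs into two LSUs.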
For example, suppose that the flooding router has a Gigabit Ethernet interface that supports Jumbo frames, which exceed the normal Ethernet MTU of 1500 bytes, but the receiving router has not enabled Jumbo frame support, which means that frames carrying more than 1500 bytes (excluding Layer 2 overhead) will be dropped. If the flooding router packs multiple LSAs into an LSU so that the packet exceeds 1500 bytes, or if a single LSA it sends is large enough to exceed 1500 bytes, such as a Router LSA (LSA Type 1) with many links, the results can be non-deterministic.
To demonstrate this, consider the following topology.
R1 and R2 connect via GigabitEthernet, while R2 and R3 connect via FastEthernet. R1 uses the default MTU of 1500 bytes on its link to R2, while R2 has Jumbo frame support configured up to 2000 bytes. The R2-R3 link uses the default MTU of 1500 bytes. Per the behavior defined in the RFC, R1 should reject an OSPF adjacency with R2. This default behavior can be seen as follows:
R1:
interface GigabitEthernet1/0
 ip address 12.0.0.1 255.255.255.0
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

R2:
interface GigabitEthernet1/0
 mtu 2000
 ip address 12.0.0.2 255.255.255.0
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

R1#debug ip packet detail
IP packet debugging is on (detailed)
R1#debug ip ospf adj
OSPF adjacency events debugging is on
01:07:18: OSPF: Rcv DBD from 2.2.2.2 on GigabitEthernet1/0 seq 0x172A opt 0x52 flag 0x7 len 32 mtu 2000 state EXSTART
01:07:18: OSPF: Nbr 2.2.2.2 has larger interface MTU
01:07:18: OSPF: Retransmitting DBD to 2.2.2.2 on GigabitEthernet1/0
01:07:18: OSPF: Up DBD Retransmit cnt to 5 for 2.2.2.2 on GigabitEthernet1/0
01:07:18: OSPF: Send DBD to 2.2.2.2 on GigabitEthernet1/0 seq 0x1813 opt 0x52 flag 0x7 len 32
Here R1 rejects R2’s DBD packet because R2’s MTU is larger. Although the obvious solution is simply to match the MTU on both ends of the link, IOS also offers the interface-level “ip ospf mtu-ignore” command, which skips this check in the OSPF adjacency state machine. Once it is applied, R1 and R2 form an adjacency, as seen below.
R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#interface Gig1/0
R1(config-if)#ip ospf mtu-ignore
R1(config-if)#end
R1#
%OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on GigabitEthernet1/0 from LOADING to FULL, Loading Done
R1#show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface
2.2.2.2 1 FULL/DR 00:00:36 12.0.0.2 GigabitEthernet1/0
At this point, both R1 and R2 learn the routes to each other’s Loopback0 interfaces, as seen below.
R1#show ip route ospf
2.0.0.0/32 is subnetted, 1 subnets
O 2.2.2.2 [110/2] via 12.0.0.2, 00:00:05, GigabitEthernet1/0
R2#show ip route ospf
1.0.0.0/32 is subnetted, 1 subnets
O 1.1.1.1 [110/2] via 12.0.0.1, 00:00:46, GigabitEthernet1/0
As expected, however, since there is an MTU mismatch, R1 is unable to receive packets from R2 that are larger than 1500 bytes.
R2#ping 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/16/20 ms
R2#ping
Protocol [ip]:
Target IP address: 1.1.1.1
Repeat count [5]:
Datagram size [100]: 2000
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]: yes
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 2000-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
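The failed DF-set ping can be modeled in a few lines of Python. This is a deliberately simplified, hypothetical model (the function name and outcome labels are illustrative, not IOS messages): only the sending interface's own MTU check can produce any feedback, while a packet that fits the sender's MTU but exceeds what the receiver's interface accepts is simply dropped, with no ICMP error coming back.

```python
def df_ping_outcome(packet_len: int, egress_mtu: int, ingress_mtu: int) -> str:
    """Classify a DF-set IP packet sent across one Ethernet segment.

    egress_mtu: MTU of the sending interface (R2 Gi1/0 = 2000 in the lab).
    ingress_mtu: largest IP packet the receiving interface accepts
    (R1 Gi1/0 = 1500 in the lab).
    """
    if packet_len > egress_mtu:
        # The sender itself knows the packet is too big for its link; a
        # transit router in this position returns ICMP "fragmentation
        # needed", which is what PMTUD relies on.
        return "too-big-at-sender"
    if packet_len > ingress_mtu:
        # The sender believes the packet fits; the receiver's interface drops
        # it before IP processing, so no ICMP error is generated.
        return "silently-dropped"
    return "delivered"


print(df_ping_outcome(100, egress_mtu=2000, ingress_mtu=1500))   # delivered
print(df_ping_outcome(2000, egress_mtu=2000, ingress_mtu=1500))  # silently-dropped
print(df_ping_outcome(2000, egress_mtu=1500, ingress_mtu=1500))  # too-big-at-sender
```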
Theoretically, this MTU mismatch should not matter, since end hosts that send traffic should ideally implement Path MTU Discovery. However, let’s now look at a case where R2 is unable to flood LSAs to R1 because the IP packets carrying them exceed 1500 bytes.
R3, which connects to R2, has been configured with a large number of Loopback interfaces in order to generate a large Router LSA (LSA Type 1). R3’s configuration is as follows, where Loopbacks 3.3.3.2 through 3.3.3.253 have been omitted:
R3:
interface FastEthernet0/0
 ip address 23.0.0.3 255.255.255.0
 shutdown
!
interface Loopback3331
 ip address 3.3.3.1 255.255.255.255
!
[snip]
!
interface Loopback333254
 ip address 3.3.3.254 255.255.255.255
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0
The resulting number of local links can be seen in R3’s database as follows:
R3#show ip ospf database
OSPF Router with ID (23.0.0.3) (Process ID 1)
Router Link States (Area 0)
Link ID ADV Router Age Seq# Checksum Link count
23.0.0.3 23.0.0.3 299 0x80000007 0x0050D2 254
Now let’s activate the link between R2 and R3. This causes R3 to flood a large Router LSA to R2, which in turn floods it to R1.
R3#config t
Enter configuration commands, one per line. End with CNTL/Z.
R3(config)#int Fa0/0
R3(config-if)#no shutdown
R3(config-if)#end
R3#
R2#debug ip packet detail
IP packet debugging is on (detailed)
R2#debug ip ospf packet
OSPF packet debugging is on
R2#config t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#interface Fa2/0
R2(config-if)#no shutdown
R2(config-if)#end
R2#
%SYS-5-CONFIG_I: Configured from console by console
IP: s=23.0.0.3 (FastEthernet2/0), d=224.0.0.5, len 76, rcvd 0, proto=89
OSPF: rcv. v:2 t:1 l:44 rid:23.0.0.3
aid:0.0.0.0 chk:D59B aut:0 auk: from FastEthernet2/0
IP: s=23.0.0.2 (local), d=23.0.0.3 (FastEthernet2/0), len 80, sending, proto=89
[snip]
R2 and R3 form an adjacency, and R3’s LSA is flooded to R2. Since the LSU carrying the LSA is larger than 1500 bytes, it is split into multiple IP fragments, none larger than the shared 1500-byte MTU between them.
IP: s=23.0.0.3 (FastEthernet2/0), d=23.0.0.2, len 1500, rcvd 0
IP Fragment, Ident = 497, fragment offset = 0, proto=89
IP: recv fragment from 23.0.0.3 offset 0 bytes
IP: s=23.0.0.3 (FastEthernet2/0), d=23.0.0.2, len 1500, rcvd 0
IP Fragment, Ident = 497, fragment offset = 1480
IP: recv fragment from 23.0.0.3 offset 1480 bytes
IP: s=23.0.0.3 (FastEthernet2/0), d=23.0.0.2, len 172, rcvd 0
IP Fragment, Ident = 497, fragment offset = 2960
IP: recv fragment from 23.0.0.3 offset 2960 bytes
OSPF: rcv. v:2 t:4 l:3112 rid:23.0.0.3
aid:0.0.0.0 chk:297C aut:0 auk: from FastEthernet2/0
%OSPF-5-ADJCHG: Process 1, Nbr 23.0.0.3 on FastEthernet2/0 from LOADING to FULL, Loading Done
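The packet sizes in this debug output can be reproduced with a little arithmetic. The sketch assumes a Router LSA with no extra TOS metrics (20-byte LSA header, 4 fixed bytes, 12 bytes per link), a 24-byte OSPFv2 header, a 4-byte LSA count in the LSU, a 20-byte IPv4 header without options, and an LSU carrying only R3’s 255-link Router LSA, which is consistent with the l:3112 shown above. The function names are illustrative.

```python
from typing import List, Tuple

# Sizes in bytes (assumptions stated in the lead-in).
LSA_HEADER = 20        # common LSA header
ROUTER_LSA_FIXED = 4   # flags, reserved byte, number of links
ROUTER_LINK = 12       # one link entry with no additional TOS metrics
OSPF_HEADER = 24       # OSPFv2 packet header
LSU_COUNT = 4          # "# LSAs" field at the start of an LSU body
IP_HEADER = 20         # IPv4 header without options


def router_lsa_length(links: int) -> int:
    """Length of a Router LSA (header included) with the given link count."""
    return LSA_HEADER + ROUTER_LSA_FIXED + ROUTER_LINK * links


def fragment(payload_len: int, mtu: int) -> List[Tuple[int, int]]:
    """Return (offset_in_bytes, ip_total_length) for each IPv4 fragment."""
    # All fragments but the last must carry a multiple of 8 data bytes.
    max_data = (mtu - IP_HEADER) // 8 * 8
    frags = []
    offset = 0
    while offset < payload_len:
        data = min(max_data, payload_len - offset)
        frags.append((offset, IP_HEADER + data))
        offset += data
    return frags


# R3's Router LSA with 255 links, per the Link count in R2's database.
ospf_len = OSPF_HEADER + LSU_COUNT + router_lsa_length(255)
print(ospf_len)                  # 3112, the "l:3112" in R2's debug
print(ospf_len + IP_HEADER)      # 3132, the packet R2 later sends to R1
print(fragment(ospf_len, 1500))  # [(0, 1500), (1480, 1500), (2960, 172)]
```

The three fragments match the lengths and offsets that R2 logged when receiving the LSU from R3.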
Once the adjacency is FULL, R2 installs R3’s routes and begins flooding the LSA to R1:
R2#show ip route ospf
1.0.0.0/32 is subnetted, 1 subnets
O 1.1.1.1 [110/2] via 12.0.0.1, 00:00:10, GigabitEthernet1/0
3.0.0.0/32 is subnetted, 254 subnets
O 3.3.3.1 [110/2] via 23.0.0.3, 00:00:10, FastEthernet2/0
[snip]
O 3.3.3.254 [110/2] via 23.0.0.3, 00:00:10, FastEthernet2/0
R2#
IP: s=12.0.0.2 (local), d=224.0.0.5 (GigabitEthernet1/0), len 3132, sending broad/multicast, proto=89
IP: s=12.0.0.2 (local), d=224.0.0.5 (GigabitEthernet1/0), len 1996, sending fragment
IP Fragment, Ident = 854, fragment offset = 0, proto=89
IP: s=12.0.0.2 (local), d=224.0.0.5 (GigabitEthernet1/0), len 1156, sending last fragment
IP Fragment, Ident = 854, fragment offset = 1976
Note that since the LSU exceeds R2’s MTU of 2000 bytes, it is fragmented into two packets of 1996 and 1156 bytes. Since R1 cannot accept packets larger than its 1500-byte MTU, the first fragment is dropped, so the LSU can never be reassembled and is never received. This means that R1 cannot synchronize its database with R2, as seen below.
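The same arithmetic explains why R1 never receives the LSU. This is a minimal sketch, assuming a 20-byte IPv4 header without options and treating 3112 bytes as the OSPF payload of R2’s 3132-byte packet; the helper names are hypothetical. IP reassembly needs every fragment, so losing any single fragment loses the whole LSU.

```python
from typing import List, Tuple

IP_HEADER = 20  # IPv4 header without options


def fragment(payload_len: int, mtu: int) -> List[Tuple[int, int]]:
    """Return (offset_in_bytes, ip_total_length) for each IPv4 fragment."""
    # All fragments but the last carry a multiple of 8 data bytes.
    max_data = (mtu - IP_HEADER) // 8 * 8
    return [(off, IP_HEADER + min(max_data, payload_len - off))
            for off in range(0, payload_len, max_data)]


def reassembles(frags: List[Tuple[int, int]], ingress_mtu: int) -> bool:
    """Reassembly needs every fragment; any fragment over ingress_mtu is lost."""
    return all(length <= ingress_mtu for _, length in frags)


lsu_payload = 3112                        # 3132-byte packet minus the IP header
to_r1 = fragment(lsu_payload, mtu=2000)
print(to_r1)                              # [(0, 1996), (1976, 1156)]
print([n for _, n in to_r1 if n > 1500])  # [1996]
print(reassembles(to_r1, ingress_mtu=1500))                        # False
print(reassembles(fragment(lsu_payload, 1500), ingress_mtu=1500))  # True
```

With matching 1500-byte MTUs, R2 would instead send fragments no larger than 1500 bytes, just as R3 does toward R2, and reassembly on R1 would succeed.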
R1#show ip ospf database
OSPF Router with ID (1.1.1.1) (Process ID 1)
Router Link States (Area 0)
Link ID ADV Router Age Seq# Checksum Link count
1.1.1.1 1.1.1.1 62 0x80000005 0x6592 2
2.2.2.2 2.2.2.2 35 0x8000000D 0x613E 3
Net Link States (Area 0)
Link ID ADV Router Age Seq# Checksum
12.0.0.1 1.1.1.1 62 0x80000001 0x61BB
23.0.0.3 23.0.0.3 36 0x80000001 0x974C
R2#show ip ospf database
OSPF Router with ID (2.2.2.2) (Process ID 1)
Router Link States (Area 0)
Link ID ADV Router Age Seq# Checksum Link count
1.1.1.1 1.1.1.1 67 0x80000005 0x6592 2
2.2.2.2 2.2.2.2 38 0x8000000D 0x613E 3
23.0.0.3 23.0.0.3 39 0x80000005 0x2AAD 255
Net Link States (Area 0)
Link ID ADV Router Age Seq# Checksum
12.0.0.1 1.1.1.1 67 0x80000001 0x61BB
23.0.0.3 23.0.0.3 39 0x80000001 0x974C
R3#show ip ospf database
OSPF Router with ID (23.0.0.3) (Process ID 1)
Router Link States (Area 0)
Link ID ADV Router Age Seq# Checksum Link count
1.1.1.1 1.1.1.1 69 0x80000005 0x006592 2
2.2.2.2 2.2.2.2 40 0x8000000D 0x00613E 3
23.0.0.3 23.0.0.3 39 0x80000005 0x002AAD 255
Net Link States (Area 0)
Link ID ADV Router Age Seq# Checksum
12.0.0.1 1.1.1.1 69 0x80000001 0x0061BB
23.0.0.3 23.0.0.3 39 0x80000001 0x00974C
This also means that R1 cannot install routes to R3’s Loopback networks:
R1#show ip route ospf
2.0.0.0/32 is subnetted, 1 subnets
O 2.2.2.2 [110/2] via 12.0.0.2, 00:00:02, GigabitEthernet1/0
23.0.0.0/24 is subnetted, 1 subnets
O 23.0.0.0 [110/2] via 12.0.0.2, 00:00:02, GigabitEthernet1/0
Eventually the adjacency between R1 and R2 breaks down, due to the lack of LSAcks sent in response to R2’s LSUs. This can be seen in R1’s “debug ip ospf packet” output and in “show ip ospf neighbor” on both devices, as follows:
R1#
OSPF: rcv. v:2 t:1 l:44 rid:2.2.2.2
aid:0.0.0.0 chk:DC98 aut:0 auk: from GigabitEthernet1/0
OSPF: Cannot see ourself in hello from 2.2.2.2 on GigabitEthernet1/0, state INIT
R1#show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface
2.2.2.2 1 LOADING/DR 00:00:34 12.0.0.2 GigabitEthernet1/0
R2#show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface
23.0.0.3 1 FULL/DR 00:00:35 23.0.0.3 FastEthernet2/0
1.1.1.1 1 FULL/BDR 00:00:39 12.0.0.1 GigabitEthernet1/0
The key point of this example is that although the “ip ospf mtu-ignore” command allows the initial adjacency between R1 and R2 to form, database synchronization between them fails when an LSA flooding event causes R2 to generate packets that exceed R1’s MTU.
Based on this, we can see that the “ip ospf mtu-ignore” command is not a fix for the underlying problem; it is simply an exception in the OSPF adjacency state machine. The real fix is to ensure that the MTU values match between neighbors, which prevents both failed routing exchanges in the control plane and packet drops due to unsupported sizes in the data plane.
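A pre-change audit along these lines can catch mismatches that “ip ospf mtu-ignore” would otherwise hide. The sketch is hypothetical: the OspfLink record and how it is populated (for example, from parsed device configurations) are assumptions, and it models only the MTU check described in this post, where the side with the smaller MTU rejects the larger neighbor’s DBD unless mtu-ignore is configured on that side.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OspfLink:
    """One end of an OSPF adjacency (hypothetical inventory record)."""
    router: str
    interface: str
    ip_mtu: int
    mtu_ignore: bool


def audit(pairs: List[Tuple[OspfLink, OspfLink]]) -> List[str]:
    """Report neighbor pairs whose MTUs differ, and how the mismatch shows up."""
    findings = []
    for a, b in pairs:
        if a.ip_mtu == b.ip_mtu:
            continue
        # The side with the smaller MTU is the one that rejects the DBD.
        smaller = a if a.ip_mtu < b.ip_mtu else b
        if smaller.mtu_ignore:
            note = (f"masked by mtu-ignore on {smaller.router}; adjacency can reach "
                    f"FULL but packets over {smaller.ip_mtu} bytes toward "
                    f"{smaller.router} are dropped")
        else:
            note = f"DBD rejected by {smaller.router}; adjacency stuck in EXSTART/EXCHANGE"
        findings.append(f"{a.router} {a.interface} ({a.ip_mtu}) <-> "
                        f"{b.router} {b.interface} ({b.ip_mtu}): {note}")
    return findings


links = [
    (OspfLink("R1", "Gi1/0", 1500, True), OspfLink("R2", "Gi1/0", 2000, False)),
    (OspfLink("R2", "Fa2/0", 1500, False), OspfLink("R3", "Fa0/0", 1500, False)),
]
for line in audit(links):
    print(line)
```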
About Brian McGahan, CCIE #8593, CCDE #2013::13:
Brian McGahan was one of the youngest engineers in the world to obtain the CCIE, having achieved his first CCIE in Routing & Switching at the age of 20 in 2002. Brian has been teaching and developing CCIE training courses for over 8 years, and has assisted thousands of engineers in obtaining their CCIE certification. When not teaching or developing new products, Brian consults with large ISPs and enterprise customers in the Midwest region of the United States.
23 Responses to “OSPF and MTU Mismatch”
Very nice write-up.
Very helpful. tks.
Awesome... that’s why I always check with INE each time I need some depth on a piece of technology. Thx.
Great article.
I must admit, I’ve probably used OSPF mtu-ignore too many times in the past, but this is a real eye opener.
Where is the routing loop, though? The other post leading to this one talked about a routing loop.
Don’t get me wrong, it’s a great post, but I’m still confused.
@Bob Any time routers in the network do not agree on the topology, both traffic loops and traffic black holes can occur. These temporary conditions are typically called “transient” loops. Essentially, it is a failure in convergence that results in packet loss. One case in which this can occur is the MTU issue described above. This is not the same as a routing loop such as one caused by redistribution.
I agree with Darren…same here!
“Theoretically this MTU mismatch should not matter, since end hosts that send traffic should ideally implement Path MTU Discovery.”
I’m struggling with this assertion.
Suppose a router is trying to deliver a large IP packet to a host (or to another router) with a too-small MTU configured. The router will format the large frame, and put it onto the wire. The receiving host/router will log an error.
How will the sending (large MTU) router know that it’s formatting un-receivable frames, so that it can generate an ICMP “too big” message in order to effect PMTUD?
I’m under the impression that these rules apply to MTU sizing…
1) All L3 systems (hosts/routers) sharing an IP subnet must agree on the MTU.
2) All L2 gear must support an MTU at least as large as the L3 system MTU.
Do I need to adjust my thinking on these rules?
If all links in the transit path use an MTU of 1500 bytes, but one segment or just one router supports giants, let’s say 4000 bytes, PMTUD should automatically negotiate the MTU down to 1500. Anything larger than this would assume that all devices in the transit path support larger than 1500, which includes the end host.
Hi Brian,
I’m down with the “one segment supports” an oddball MTU scenario.
But I remain confused about the scenario where only one /interface/ in a broadcast domain supports a large MTU. It seems to me that this one device has no way to know about his neighbor’s limitation.
If he doesn’t know about his neighbor’s small MTU, then he’ll happily forward an un-receivable frame onto the wire, rather than kicking back a “too big” message to the originator.
Mismatched MTU within a broadcast domain seems utterly unsupportable. What am I missing here?
Right, the issue is that the routers don’t know that there is an MTU mismatch, hence the packet drops. This is what the OSPF problem demonstrates. Theoretically end hosts *could* prevent this by agreeing on the lowest MTU bidirectionally, but they don’t. PMTUD doesn’t normally account for this because mismatching the MTU between devices on the same segment isn’t a design issue, it’s just a misconfiguration.
Okay, we’re on the same page then.
I guess I interpreted your PMTUD comment to mean that PMTUD is able to catch/fix/workaround mismatched router MTU on a transit link.
Thanks for clarifying!
I’m a bit confused here.
Do you mean PMTUD works only when sending traffic (data plane)?
Why didn’t PMTUD work when R2 tried sending the LSU to R1? (Isn’t sending an LSU to an OSPF neighbor considered data plane?)
Or did you build the scenario with PMTUD disabled on one side between R1 and R2?
Thanks,
@Ahmed PMTUD does not fix the problem outlined in this post. The problem is in the control plane (OSPF) not the data plane (end host traffic). PMTUD works in the data plane not the control plane.
So I am hopeful that I have found a solution in your post here, but I don’t get exactly what I am supposed to do. My setup is simply a cable modem in bridge mode wired into my Apple Extreme and wirelessly connected to the Xbox, no static IP. I get the MTU error. Any help would be greatly appreciated. By the way, I too am a Nole fan (Class of ’97, Marching Chief).
Thanks Brian,
Now I understand that the simple “ip ospf mtu-ignore” isn’t enough and why.
Still very helpful almost a year on.
Going to have to rethink some of our mtu-ignores on our routers.
Thank you for this post.
Thanks a lot Brian… its very helpful.
Hi Brian,
We are having the same issue here. We have two 4500 routers, R1 and R2, with two interconnects between them: a primary and a secondary link for redundancy. On the primary interconnect we use a bandwidth shaper between R1 and R2. The secondary link is just a direct connection between R1 and R2.
R1’s interface is configured with an MTU of 1500, and R2’s with 1520.
OSPF is running between these two routers.
R1’s interface is configured with “ip ospf mtu-ignore”, but R2’s is not.
The situation is that the OSPF adjacency has no issue on the primary link, but when traffic shifts to the secondary link, the issue starts: the adjacency reaches FULL, and after 1 minute it shows DOWN. After 30 minutes, it goes back to FULL and then DOWN again.
Can I know what the reason behind this is?
It’s because there is a problem in the database exchange. The fix for this is to set the MTU to be the same on the link connecting them.
Hi Brian,
Good document. I have one question:
When R2 sent a 2000-byte packet, R1 rejected it because it cannot accept it. R1 should send an ICMP message saying fragmentation is needed, so why does R1 not send an ICMP message?
Thanks
-BG
How can it send an ICMP message if the interface driver drops the packet?
Great, well put.