Managing OSPF MTU Mismatch: Commands & Impact

OSPF and MTU Mismatch

Dear Brian,

What is the difference between using the “system mtu routing 1500” and the “ip ospf mtu-ignore” commands when running OSPF between a router and a switch?

Thanks,

Paul

Hi Paul,

Within the scope of the CCIE Lab Exam, it may be acceptable to issue either of these commands to solve a specific lab task. However, it is key to note that there is a difference between ignoring the MTU for the purpose of OSPF adjacency and matching the MTU within a real production network.

By design, OSPF automatically detects an MTU mismatch between two devices when they exchange Database Description (DBD) packets during adjacency formation. This is per the standard OSPF specification defined in RFC 2328, “OSPF Version 2”. Specifically, the RFC states the following:

10.6.  Receiving Database Description Packets

        This section explains the detailed processing of a received
        Database Description Packet.

[snip]

        If the Interface MTU field in the Database Description packet
        indicates an IP datagram size that is larger than the router can
        accept on the receiving interface without fragmentation, the
        Database Description packet is rejected.

[/snip]

Basically, this means that if a router tries to negotiate an adjacency on an interface where the remote neighbor has a larger MTU, the adjacency will be denied. The idea behind this check is twofold. The first goal is to alleviate a problem in the data plane, in which a sending host transmits packets that are too large for the receiver to accept. Typically, Path MTU Discovery (PMTUD) should be implemented on the sender to prevent this case; however, this process relies on ICMP messages that could be filtered out in the transit path due to a security policy. The second, and more important, goal is to alleviate a problem in the control plane, in which OSPF packets are exchanged.
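The check quoted from RFC 2328 can be sketched in a few lines of Python. This is a simplified model, not IOS's actual implementation: the Interface class, its ip_mtu and mtu_ignore fields, and the accept_dbd function are names invented for illustration, and the 1500 and 2000 values are only sample MTUs.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical model of an OSPF-enabled interface."""
    name: str
    ip_mtu: int               # largest IP datagram accepted without fragmentation
    mtu_ignore: bool = False  # models the effect of "ip ospf mtu-ignore"


def accept_dbd(iface: Interface, dbd_interface_mtu: int) -> bool:
    """Return True if a received DBD passes the Interface MTU check."""
    if iface.mtu_ignore:
        # The check is skipped entirely; nothing about the link itself changes.
        return True
    # RFC 2328, section 10.6: reject if the neighbor advertises a larger MTU
    # than this interface can accept without fragmentation.
    return dbd_interface_mtu <= iface.ip_mtu


small = Interface("GigabitEthernet1/0", ip_mtu=1500)
large = Interface("GigabitEthernet1/0", ip_mtu=2000)

print(accept_dbd(small, 2000))  # False: the neighbor's MTU is larger
print(accept_dbd(large, 1500))  # True: the neighbor's MTU is smaller

small.mtu_ignore = True
print(accept_dbd(small, 2000))  # True: the check is bypassed
```

Note that the check is one-sided in this model: only the router with the smaller MTU rejects the DBD, and bypassing the check does not make that router able to receive larger packets.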

Specifically, this problem stems from the fact that the OSPF Hello, Database Description (DBD), Link-State Request (LSR), and Link-State Acknowledgement (LSAck) packets are generally small, but the Link-State Update (LSU) packets are generally not.

When establishing a new OSPF adjacency, the DBD packet is used to tell new neighbors what LSAs are in the database, but not to give the details about them. Specifically the DBD contains the LSA Header information, but not the actual LSA payload. The idea behind this is to optimize flooding in the case that the receiving router already received the LSA from another neighbor, in which case flooding does not need to occur during adjacency establishment.

For example, suppose that you and I, routers A and B, both have neighbors C and D, and the database is synchronized. If you and I form a new adjacency, my DBD exchange to you will say that I have LSAs A, B, C, and D in my database. Since you are already adjacent with C and D, and I am adjacent with them, you already have all of my LSAs, possibly with the exception of the new link that connects us. This means that even though I describe LSAs A and B to you with my DBD packet, you don't send an LSR to me for them, which means I don't send you an LSU about them. This is the normal optimization of how the database is exchanged so that excessive flooding doesn't occur.
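The request logic in this example can be sketched as a comparison between the headers a neighbor describes and the local database. This is a deliberately simplified model: the function name lsas_to_request is hypothetical, LSAs are identified by a single letter, and "newer" is decided by sequence number alone, whereas a real implementation also considers checksum and age.

```python
from typing import Dict, List

# Simplified LSA headers: LSA identifier -> sequence number.
Headers = Dict[str, int]


def lsas_to_request(my_lsdb: Headers, neighbor_dbd: Headers) -> List[str]:
    """Return the LSAs described by the neighbor that we lack or hold an older copy of."""
    return sorted(
        lsa_id
        for lsa_id, seq in neighbor_dbd.items()
        if lsa_id not in my_lsdb or my_lsdb[lsa_id] < seq
    )


# Both routers are already synchronized through neighbors C and D.
router_a = {"A": 1, "B": 1, "C": 1, "D": 1}
router_b = {"A": 1, "B": 1, "C": 1, "D": 1}
print(lsas_to_request(router_a, router_b))  # [] -> no LSR, so no LSU is sent

# Router B has re-originated its own LSA to include the new link.
router_b["B"] = 2
print(lsas_to_request(router_a, router_b))  # ['B'] -> only this LSA is requested
```

Only the LSAs returned here end up in a Link-State Request, which is why a DBD exchange between already-synchronized routers triggers little or no flooding.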

Suppose next that you, router A, know about LSAs A1 through An in your database, and I, router B, know about LSAs B1 through Bn. When we establish an adjacency, your DBD to me will describe LSAs A1-An, while mine will describe LSAs B1-Bn. Since I don't have LSAs A1-An, I will send you an LSR for them, and likewise, since you don't have B1-Bn, you will send an LSR for those to me. When you reply to me with the LSUs for A1-An, it is likely that each LSU packet will contain more than one LSA in its payload, or, if a single LSA is large, that the packet will span multiple IP fragments. The idea behind this is that since you need to send me more than one LSA, it's more efficient to send them in as few LSUs as possible, instead of sending one LSA per LSU. The problem with this procedure occurs when the router that is flooding has a larger MTU than the router that is receiving.

For example, suppose that the flooding router has a Gigabit Ethernet interface that supports Jumbo frames, which exceed the normal Ethernet MTU of 1500 bytes; however, the receiving router has not enabled Jumbo frame support, which implies that frames over 1500 bytes (excluding layer 2 overhead) will be dropped. If the flooding router sends multiple LSAs in an LSU forcing the packet size to exceed 1500 bytes, or if a single LSA sent by the flooding router is large enough to exceed 1500 bytes, such as a Router LSA (LSA Type 1) with many links, the results can be non-deterministic.

To demonstrate this, take the following topology.

R1 and R2 connect with GigabitEthernet, while R2 and R3 connect with FastEthernet. R1 has a default MTU of 1500 bytes configured on its link to R2, while R2 has Jumbo frame support configured up to 2000 bytes. The link between R2 and R3 uses the default MTU of 1500 bytes. Per the RFC’s defined behavior, R1 should reject an OSPF adjacency with R2. This default behavior can be seen as follows:

R1:
interface GigabitEthernet1/0
ip address 12.0.0.1 255.255.255.0
!
router ospf 1
network 0.0.0.0 255.255.255.255 area 0

R2:
interface GigabitEthernet1/0
mtu 2000
ip address 12.0.0.2 255.255.255.0
!
router ospf 1
network 0.0.0.0 255.255.255.255 area 0

R1#debug ip packet detail
IP packet debugging is on (detailed)
R1#debug ip ospf adj
OSPF adjacency events debugging is on

01:07:18: OSPF: Rcv DBD from 2.2.2.2 on GigabitEthernet1/0 seq 0x172A opt 0x52 flag 0x7 len 32 mtu 2000 state EXSTART
01:07:18: OSPF: Nbr 2.2.2.2 has larger interface MTU
01:07:18: OSPF: Retransmitting DBD to 2.2.2.2 on GigabitEthernet1/0
01:07:18: OSPF: Up DBD Retransmit cnt to 5 for 2.2.2.2 on GigabitEthernet1/0
01:07:18: OSPF: Send DBD to 2.2.2.2 on GigabitEthernet1/0 seq 0x1813 opt 0x52 flag 0x7 len 32

In this case we can see that R1 rejects R2's DBD packet, since the MTU it advertises (2000) is larger than R1's own (1500). Although the obvious solution is to simply match the MTU on both ends of the link, IOS also offers the "ip ospf mtu-ignore" command at the interface level to skip this check in the OSPF adjacency state machine. Once applied, as seen below, R1 and R2 form an adjacency.

R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#interface Gig1/0
R1(config-if)#ip ospf mtu-ignore
R1(config-if)#end
R1#
%OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on GigabitEthernet1/0 from LOADING to FULL, Loading Done
R1#show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2           1   FULL/DR         00:00:36    12.0.0.2        GigabitEthernet1/0

At this point, both R1 and R2 learn the routes to each other's Loopback0 interfaces, as seen below.

R1#show ip route ospf
2.0.0.0/32 is subnetted, 1 subnets
O 2.2.2.2 [110/2] via 12.0.0.2, 00:00:05, GigabitEthernet1/0

R2#show ip route ospf
1.0.0.0/32 is subnetted, 1 subnets
O 1.1.1.1 [110/2] via 12.0.0.1, 00:00:46, GigabitEthernet1/0

As expected, however, because of the MTU mismatch R1 is unable to receive packets from R2 that exceed 1500 bytes.

R2#ping 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/16/20 ms

R2#ping
Protocol [ip]:
Target IP address: 1.1.1.1
Repeat count [5]:
Datagram size [100]: 2000
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]:
Set DF bit in IP header? [no]: yes
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 2000-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Theoretically this MTU mismatch should not matter, since end hosts that send traffic should ideally implement Path MTU Discovery. However, let's now see a case where R2 is unable to flood LSAs to R1 for which the IP packet size exceeds 1500 bytes.

R3, which connects to R2, has been configured with a large number of Loopback interfaces in order to generate a large Router LSA (LSA Type 1). R3's configuration is as follows, where Loopbacks 3.3.3.2 - 3.3.3.253 have been omitted:

R3:
interface FastEthernet0/0
 ip address 23.0.0.3 255.255.255.0
 shutdown
!
interface Loopback3330
 ip address 3.3.3.0 255.255.255.255
!
[snip]
!
interface Loopback333254
 ip address 3.3.3.254 255.255.255.255
!
router ospf 1
 network 0.0.0.0 255.255.255.255 area 0

The number of resulting local links can be seen in R3's database as follows:

R3#show ip ospf database

OSPF Router with ID (23.0.0.3) (Process ID 1)

Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
23.0.0.3        23.0.0.3        299         0x80000007 0x0050D2 254

Now let's activate the link between R2 and R3, which will cause R3 to flood a large Router LSA to R2, which in turn causes R2 to flood this to R1.

R3#config t
Enter configuration commands, one per line.  End with CNTL/Z.
R3(config)#int Fa0/0
R3(config-if)#no shutdown
R3(config-if)#end
R3#

R2#debug ip packet detail
IP packet debugging is on (detailed)
R2#debug ip ospf packet
OSPF packet debugging is on
R2#config t
Enter configuration commands, one per line.  End with CNTL/Z.
R2(config)#interface Fa2/0
R2(config-if)#no shutdown
R2(config-if)#end
R2#
%SYS-5-CONFIG_I: Configured from console by console
IP: s=23.0.0.3 (FastEthernet2/0), d=224.0.0.5, len 76, rcvd 0, proto=89
OSPF: rcv. v:2 t:1 l:44 rid:23.0.0.3
      aid:0.0.0.0 chk:D59B aut:0 auk: from FastEthernet2/0
IP: s=23.0.0.2 (local), d=23.0.0.3 (FastEthernet2/0), len 80, sending, proto=89

[snip]

R2 and R3 form an adjacency, and R3's LSA is flooded to R2. Since the LSA does not fit in a single 1500-byte packet, it is fragmented into multiple packets, the largest being the shared MTU of 1500 bytes between them.

IP: s=23.0.0.3 (FastEthernet2/0), d=23.0.0.2, len 1500, rcvd 0
    IP Fragment, Ident = 497, fragment offset = 0, proto=89
IP: recv fragment from 23.0.0.3 offset 0 bytes
IP: s=23.0.0.3 (FastEthernet2/0), d=23.0.0.2, len 1500, rcvd 0
    IP Fragment, Ident = 497, fragment offset = 1480
IP: recv fragment from 23.0.0.3 offset 1480 bytes
IP: s=23.0.0.3 (FastEthernet2/0), d=23.0.0.2, len 172, rcvd 0
    IP Fragment, Ident = 497, fragment offset = 2960
IP: recv fragment from 23.0.0.3 offset 2960 bytes
OSPF: rcv. v:2 t:4 l:3112 rid:23.0.0.3
      aid:0.0.0.0 chk:297C aut:0 auk: from FastEthernet2/0

%OSPF-5-ADJCHG: Process 1, Nbr 23.0.0.3 on FastEthernet2/0 from LOADING to FULL, Loading Done
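The l:3112 in the debug output above can be reproduced from the packet formats in RFC 2328. The sketch below is only arithmetic, with hypothetical function names. It assumes that R3's Router LSA carries 255 links once FastEthernet0/0 comes up (the 254 loopback stub links plus one link toward R2), that no TOS metrics are present, and that the IP header carries no options.

```python
from typing import List

IP_HEADER = 20         # IPv4 header without options
OSPF_HEADER = 24       # common OSPF packet header
LSU_COUNT_FIELD = 4    # "# LSAs" field at the start of a Link-State Update
LSA_HEADER = 20        # common LSA header
ROUTER_LSA_FIXED = 4   # flags byte, zero byte, and 16-bit link count
ROUTER_LINK = 12       # Link ID, Link Data, type, # TOS, metric (no TOS entries)


def router_lsa_len(links: int) -> int:
    """Length in bytes of a Router LSA describing the given number of links."""
    return LSA_HEADER + ROUTER_LSA_FIXED + ROUTER_LINK * links


def lsu_len(lsa_lengths: List[int]) -> int:
    """Length of an OSPF Link-State Update carrying LSAs of the given sizes."""
    return OSPF_HEADER + LSU_COUNT_FIELD + sum(lsa_lengths)


def max_links_unfragmented(ip_mtu: int) -> int:
    """Most links a lone Router LSA can carry and still fit in one IP packet."""
    room = ip_mtu - IP_HEADER - OSPF_HEADER - LSU_COUNT_FIELD - LSA_HEADER - ROUTER_LSA_FIXED
    return room // ROUTER_LINK


lsa = router_lsa_len(255)
print(lsa)                           # 3084
print(lsu_len([lsa]))                # 3112, matching l:3112 in the debug
print(IP_HEADER + lsu_len([lsa]))    # 3132 bytes as an IP packet
print(max_links_unfragmented(1500))  # 119
```

Under these assumptions, any Router LSA with more than 119 links needs IP fragmentation on a 1500-byte link, so R3's 255-link LSA is well past that point.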

Once the adjacency is full, R2 installs R3's routes, and begins to flood to R1:

R2#show ip route ospf
     1.0.0.0/32 is subnetted, 1 subnets
O       1.1.1.1 [110/2] via 12.0.0.1, 00:00:10, GigabitEthernet1/0
     3.0.0.0/32 is subnetted, 254 subnets
O       3.3.3.1 [110/2] via 23.0.0.3, 00:00:10, FastEthernet2/0
[snip]
O       3.3.3.254 [110/2] via 23.0.0.3, 00:00:10, FastEthernet2/0
R2#
IP: s=12.0.0.2 (local), d=224.0.0.5 (GigabitEthernet1/0), len 3132, sending broad/multicast, proto=89
IP: s=12.0.0.2 (local), d=224.0.0.5 (GigabitEthernet1/0), len 1996, sending fragment
    IP Fragment, Ident = 854, fragment offset = 0, proto=89
IP: s=12.0.0.2 (local), d=224.0.0.5 (GigabitEthernet1/0), len 1156, sending last fragment
    IP Fragment, Ident = 854, fragment offset = 1976

Note that since the LSA exceeds the MTU of 2000 bytes, it is fragmented into multiple packets. Since R1 cannot accept packets that exceed its MTU of 1500 bytes, the 1996-byte first fragment is dropped, so the LSU can never be reassembled and is never received. This means that R1 cannot synchronize the database with R2, as shown below.
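The fragment sizes in both debug captures follow directly from how IPv4 fragmentation works: every fragment except the last must carry a payload that is a multiple of 8 bytes. The following is a minimal sketch of that arithmetic, not router code. The fragment function is a hypothetical name, and it assumes a 20-byte IP header with no options.

```python
from typing import List, Tuple

IP_HEADER = 20  # IPv4 header without options (assumption)


def fragment(total_len: int, mtu: int) -> List[Tuple[int, int]]:
    """Split an IP packet into (fragment length, payload offset in bytes) pairs."""
    if total_len <= mtu:
        return [(total_len, 0)]
    # Payload per fragment, rounded down to a multiple of 8 bytes.
    step = (mtu - IP_HEADER) // 8 * 8
    if step <= 0:
        raise ValueError("MTU too small to carry any payload")
    payload = total_len - IP_HEADER
    fragments = []
    offset = 0
    while offset < payload:
        chunk = min(step, payload - offset)
        fragments.append((chunk + IP_HEADER, offset))
        offset += chunk
    return fragments


# R3 -> R2 over the 1500-byte FastEthernet link.
print(fragment(3132, 1500))  # [(1500, 0), (1500, 1480), (172, 2960)]

# R2 -> R1 using R2's 2000-byte MTU.
to_r1 = fragment(3132, 2000)
print(to_r1)                 # [(1996, 0), (1156, 1976)]

# Fragments that R1, with its 1500-byte MTU, cannot accept.
print([f for f in to_r1 if f[0] > 1500])  # [(1996, 0)]
```

Both results match the lengths and offsets in the debug output. The second fragment toward R1 is small enough to arrive, but without the first fragment it is useless.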

R1#show ip ospf database

OSPF Router with ID (1.1.1.1) (Process ID 1)

Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
1.1.1.1         1.1.1.1         62          0x80000005 0x6592   2
2.2.2.2         2.2.2.2         35          0x8000000D 0x613E   3

Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
12.0.0.1        1.1.1.1         62          0x80000001 0x61BB
23.0.0.3        23.0.0.3        36          0x80000001 0x974C

R2#show ip ospf database

OSPF Router with ID (2.2.2.2) (Process ID 1)

Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
1.1.1.1         1.1.1.1         67          0x80000005 0x6592   2
2.2.2.2         2.2.2.2         38          0x8000000D 0x613E   3
23.0.0.3        23.0.0.3        39          0x80000005 0x2AAD   255

Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
12.0.0.1        1.1.1.1         67          0x80000001 0x61BB
23.0.0.3        23.0.0.3        39          0x80000001 0x974C

R3#show ip ospf database

OSPF Router with ID (23.0.0.3) (Process ID 1)

Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
1.1.1.1         1.1.1.1         69          0x80000005 0x006592 2
2.2.2.2         2.2.2.2         40          0x8000000D 0x00613E 3
23.0.0.3        23.0.0.3        39          0x80000005 0x002AAD 255

Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
12.0.0.1        1.1.1.1         69          0x80000001 0x0061BB
23.0.0.3        23.0.0.3        39          0x80000001 0x00974C

This also implies that R1 cannot install routes towards R3:

R1#show ip route ospf
     2.0.0.0/32 is subnetted, 1 subnets
O       2.2.2.2 [110/2] via 12.0.0.2, 00:00:02, GigabitEthernet1/0
     23.0.0.0/24 is subnetted, 1 subnets
O       23.0.0.0 [110/2] via 12.0.0.2, 00:00:02, GigabitEthernet1/0

Eventually the adjacency between R1 and R2 is lost: R1 never receives R2's LSUs, so it never sends the LSAcks that R2 is waiting for. This can be seen in R1's "debug ip ospf packet" output as follows, and in the "show ip ospf neighbor" output on both devices:

R1#
OSPF: rcv. v:2 t:1 l:44 rid:2.2.2.2
aid:0.0.0.0 chk:DC98 aut:0 auk: from GigabitEthernet1/0
OSPF: Cannot see ourself in hello from 2.2.2.2 on GigabitEthernet1/0, state INIT

R1#show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
2.2.2.2           1   LOADING/DR      00:00:34    12.0.0.2        GigabitEthernet1/0

R2#show ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
23.0.0.3          1   FULL/DR         00:00:35    23.0.0.3        FastEthernet2/0
1.1.1.1           1   FULL/BDR        00:00:39    12.0.0.1        GigabitEthernet1/0

The key with this example is that although the "ip ospf mtu-ignore" command allows the initial adjacency to form between R1 and R2, we can see that synchronization fails between them when an LSA replication event causes packet sizes generated by R2 to exceed R1's MTU.

Based on this we can see that the "ip ospf mtu-ignore" command is not a fix to the underlying problem. Instead, it is simply an exception to the OSPF adjacency state machine. The real fix to this problem is to ensure that the MTU values match between neighbors, which prevents both failed routing exchange in the control plane and packet drops due to unsupported sizes in the data plane.