    OSPF and MTU Mismatch
    30 March 11

    Posted by Brian McGahan

    Dear Brian,

    What is the difference between using the “system mtu routing 1500” and the “ip ospf mtu-ignore” commands when running OSPF between a router and a switch?

    Thanks,

    Paul

    Hi Paul,

    Within the scope of the CCIE Lab Exam, it may be acceptable to issue either of these commands to solve a specific lab task. However, it is key to note that there is a difference between ignoring the MTU for the purpose of OSPF adjacency and matching the MTU within a real production network.

    By design, OSPF automatically detects an MTU mismatch between two devices when they exchange Database Description (DBD) packets while forming an adjacency. This is per the standard OSPF specification defined in RFC 2328, “OSPF Version 2”. Specifically, the RFC states the following:

    10.6.  Receiving Database Description Packets
    This section explains the detailed processing of a received
    Database Description Packet.
    [snip]
    If the Interface MTU field in the Database Description packet
    indicates an IP datagram size that is larger than the router can
    accept on the receiving interface without fragmentation, the
    Database Description packet is rejected.
    [/snip]

    Basically, this means that if a router tries to negotiate an adjacency on an interface where the remote neighbor advertises a larger MTU, the adjacency will be denied. The idea behind this check is twofold. The first is to alleviate a problem in the data plane, in which a sending host transmits packets that are too large for the receiver to accept. Typically, Path MTU Discovery (PMTUD) should be implemented on the sender to prevent this case. However, PMTUD relies on ICMP messages that could be filtered out in the transit path due to a security policy. The second, and more important, issue is to alleviate a problem in the control plane, in which OSPF packets are exchanged.
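
    As a rough illustration of the check quoted above, the following Python sketch models the decision a router makes when a DBD arrives. The function name and the mtu_ignore flag are made up for this example and are not taken from any real implementation. The flag simply stands in for skipping the check, which is what the "ip ospf mtu-ignore" command from the question does.

```python
def accept_dbd(neighbor_mtu, local_mtu, mtu_ignore=False):
    """Simplified model of the Interface MTU check in RFC 2328, section 10.6.

    neighbor_mtu: Interface MTU field carried in the received DBD packet.
    local_mtu:    largest IP datagram the receiving interface accepts
                  without fragmentation.
    mtu_ignore:   hypothetical flag that models skipping the check.
    """
    if mtu_ignore:
        return True
    # The DBD is rejected only when the neighbor advertises a LARGER MTU.
    return neighbor_mtu <= local_mtu


# A router with a 1500-byte MTU hears a neighbor advertising 2000 bytes.
print(accept_dbd(neighbor_mtu=2000, local_mtu=1500))                   # False
# The neighbor with the larger MTU has no objection to the smaller one.
print(accept_dbd(neighbor_mtu=1500, local_mtu=2000))                   # True
# With the check skipped, the DBD is accepted regardless.
print(accept_dbd(neighbor_mtu=2000, local_mtu=1500, mtu_ignore=True))  # True
```

    Note that the check is one-sided: only the router with the smaller MTU rejects the DBD, which is why the complaint always shows up on that side of the link.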

    Specifically, this problem stems from the fact that the OSPF Hello, Database Description (DBD), Link-State Request (LSR), and Link-State Acknowledgement (LSAck) packets are generally small, but the Link-State Update (LSU) packets are generally not.

    When establishing a new OSPF adjacency, the DBD packet is used to tell new neighbors what LSAs are in the database, but not to give the details about them. Specifically the DBD contains the LSA Header information, but not the actual LSA payload. The idea behind this is to optimize flooding in the case that the receiving router already received the LSA from another neighbor, in which case flooding does not need to occur during adjacency establishment.

    For example, suppose that you and I, routers A and B, both have neighbors C and D, and the database is synchronized. If you and I form a new adjacency, my DBD exchange to you will say that I have LSAs A, B, C, and D in my database. Since you are already adjacent with C and D, and I am adjacent with them, you already have all of my LSAs, possibly with the exception of the new link that connects us. This means that even though I describe LSAs A and B to you with my DBD packet, you don't send an LSR to me for them, which means I don't send you an LSU about them. This is the normal optimization of how the database is exchanged so that excessive flooding doesn't occur.
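
    The request logic in this example boils down to a set difference. The Python sketch below is a deliberately simplified model: LSAs are plain strings and the function name is invented for illustration. A real OSPF implementation also compares sequence number, age, and checksum to decide whether the neighbor's copy is newer, which is ignored here.

```python
def lsas_to_request(local_db, neighbor_dbd):
    """Return the LSAs described in the neighbor's DBD that we do not hold.

    Simplification: an LSA is identified by name only. Real OSPF also
    requests an LSA when the neighbor's instance is more recent.
    """
    return set(neighbor_dbd) - set(local_db)


# You and I are both already synchronized through neighbors C and D.
my_db = {"A", "B", "C", "D"}
your_db = {"A", "B", "C", "D"}

# My DBD describes everything I have, but you have nothing to ask for,
# so no LSR is sent and no LSU comes back.
print(lsas_to_request(local_db=your_db, neighbor_dbd=my_db))  # set()
```

    If the two databases had nothing in common, the same function would return every LSA in the neighbor's DBD, and all of them would have to be carried in LSUs.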

    Suppose next that you, router A, know about LSAs A1 through An in your database, and I, router B, know about LSAs B1 through Bn. When we establish an adjacency, your DBD to me will describe LSAs A1-An, while mine will describe LSAs B1-Bn. Since I don't have LSAs A1-An, I will send you an LSR for them, and likewise, since you don't have B1-Bn, you will send me an LSR for those. When you reply to me with the LSUs for A1-An, it is likely that an LSU packet will contain more than one LSA in its payload, or, if an LSA is large, that the LSU will span multiple IP fragments. The idea is that since you need to send me more than one LSA, it's more efficient to send them in as few LSUs as possible instead of sending one LSA per LSU. The problem with this procedure occurs when the router that is flooding has a larger MTU than the router that is receiving.

    For example, suppose that the flooding router has a Gigabit Ethernet interface that supports Jumbo frames, which exceed the normal Ethernet MTU of 1500 bytes; however, the receiving router has not enabled Jumbo frame support, which implies that frames over 1500 bytes (excluding layer 2 overhead) will be dropped. If the flooding router sends multiple LSAs in an LSU forcing the packet size to exceed 1500 bytes, or if a single LSA sent by the flooding router is large enough to exceed 1500 bytes, such as a Router LSA (LSA Type 1) with many links, the results can be non-deterministic.

    To demonstrate this, take the following topology.

    R1 and R2 connect with GigabitEthernet, while R2 and R3 connect with FastEthernet. R1 has a default MTU of 1500 bytes configured on its link to R2, while R2 has Jumbo frame support configured up to 2000 bytes. R2 and R3’s link uses the default MTU of 1500 bytes. Per the RFC’s defined behavior, R1 should reject an OSPF adjacency with R2. This default behavior can be seen as follows:

    R1:
    interface GigabitEthernet1/0
    ip address 12.0.0.1 255.255.255.0
    !
    router ospf 1
    network 0.0.0.0 255.255.255.255 area 0

    R2:
    interface GigabitEthernet1/0
    mtu 2000
    ip address 12.0.0.2 255.255.255.0
    !
    router ospf 1
    network 0.0.0.0 255.255.255.255 area 0

    R1#debug ip packet detail
    IP packet debugging is on (detailed)
    R1#debug ip ospf adj
    OSPF adjacency events debugging is on

    01:07:18: OSPF: Rcv DBD from 2.2.2.2 on GigabitEthernet1/0 seq 0x172A opt 0x52 flag 0x7 len 32 mtu 2000 state EXSTART
    01:07:18: OSPF: Nbr 2.2.2.2 has larger interface MTU
    01:07:18: OSPF: Retransmitting DBD to 2.2.2.2 on GigabitEthernet1/0
    01:07:18: OSPF: Up DBD Retransmit cnt to 5 for 2.2.2.2 on GigabitEthernet1/0
    01:07:18: OSPF: Send DBD to 2.2.2.2 on GigabitEthernet1/0 seq 0x1813 opt 0x52 flag 0x7 len 32

    In this case we can see that R1 rejects R2's DBD packet, since the advertised MTU is larger than its own. Although the obvious solution is to match the MTU on both ends of the link and avoid the problem in the first place, IOS also offers the "ip ospf mtu-ignore" command at the interface level to skip this check in the OSPF adjacency state machine. Once applied, as seen below, R1 and R2 form an adjacency.

    R1#conf t
    Enter configuration commands, one per line. End with CNTL/Z.
    R1(config)#interface Gig1/0
    R1(config-if)#ip ospf mtu-ignore
    R1(config-if)#end
    R1#
    %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on GigabitEthernet1/0 from LOADING to FULL, Loading Done
    R1#show ip ospf neighbor

    Neighbor ID     Pri   State        Dead Time   Address     Interface
    2.2.2.2         1     FULL/DR      00:00:36    12.0.0.2    GigabitEthernet1/0

    At this point, both R1 and R2 learn the routes to each other's Loopback0 interfaces, as seen below.

    R1#show ip route ospf
    2.0.0.0/32 is subnetted, 1 subnets
    O 2.2.2.2 [110/2] via 12.0.0.2, 00:00:05, GigabitEthernet1/0

    R2#show ip route ospf
    1.0.0.0/32 is subnetted, 1 subnets
    O 1.1.1.1 [110/2] via 12.0.0.1, 00:00:46, GigabitEthernet1/0

    As expected however, since there is an MTU mismatch, R1 is unable to receive packets from R2 that exceed an MTU of 1500 bytes.

    R2#ping 1.1.1.1
    

    Type escape sequence to abort.
    Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
    !!!!!
    Success rate is 100 percent (5/5), round-trip min/avg/max = 12/16/20 ms

    R2#ping
    Protocol [ip]:
    Target IP address: 1.1.1.1
    Repeat count [5]:
    Datagram size [100]: 2000
    Timeout in seconds [2]:
    Extended commands [n]: y
    Source address or interface:
    Type of service [0]:
    Set DF bit in IP header? [no]: yes
    Validate reply data? [no]:
    Data pattern [0xABCD]:
    Loose, Strict, Record, Timestamp, Verbose[none]:
    Sweep range of sizes [n]:
    Type escape sequence to abort.
    Sending 5, 2000-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
    .....
    Success rate is 0 percent (0/5)

    Theoretically this MTU mismatch should not matter, since end hosts that send traffic should ideally implement Path MTU Discovery. However, let's now see a case where R2 is unable to flood LSAs to R1 for which the IP packet size exceeds 1500 bytes.

    R3, which connects to R2, has been configured with a large number of Loopback interfaces in order to generate a large Router LSA (LSA Type 1). R3's configuration is as follows, where Loopbacks 3.3.3.2 - 3.3.3.253 have been omitted:

    R3:
    interface FastEthernet0/0
    ip address 23.0.0.3 255.255.255.0
    shutdown
    !
    interface Loopback3330
    ip address 3.3.3.0 255.255.255.255
    !
    [snip]
    !
    interface Loopback333254
    ip address 3.3.3.254 255.255.255.255
    !
    router ospf 1
    network 0.0.0.0 255.255.255.255 area 0

    The number of resulting local links can be seen in R3's database as follows:

    R3#show ip ospf database
    

    OSPF Router with ID (23.0.0.3) (Process ID 1)

    Router Link States (Area 0)

    Link ID     ADV Router   Age   Seq#         Checksum   Link count
    23.0.0.3    23.0.0.3     299   0x80000007   0x0050D2   254
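
    To get a feel for how large this LSA is on the wire, the Python sketch below estimates its size from the link count. The constants follow the RFC 2328 packet formats (20-byte LSA header, 4 bytes of flags and link count, 12 bytes per link with no TOS metrics, 24-byte OSPF header, 4-byte LSA count in the LSU) and assume a 20-byte IP header with no options. The function names are made up for this example.

```python
IP_HEADER = 20         # IPv4 header without options (assumption)
OSPF_HEADER = 24       # common OSPF packet header
LSU_COUNT_FIELD = 4    # "# LSAs" field in a Link-State Update
LSA_HEADER = 20        # common LSA header
ROUTER_LSA_FIXED = 4   # flags + "# links" fields
ROUTER_LINK = 12       # one link description with no TOS metrics

# Everything in the IP packet except the per-link entries.
FIXED_OVERHEAD = (IP_HEADER + OSPF_HEADER + LSU_COUNT_FIELD
                  + LSA_HEADER + ROUTER_LSA_FIXED)


def router_lsa_len(links):
    """Length in bytes of a Router LSA that describes `links` links."""
    return LSA_HEADER + ROUTER_LSA_FIXED + ROUTER_LINK * links


def lsu_ip_len(links):
    """IP packet length of an LSU carrying only that one Router LSA."""
    return FIXED_OVERHEAD + ROUTER_LINK * links


def max_links_unfragmented(mtu):
    """Largest link count whose LSU still fits in one packet of size mtu."""
    return (mtu - FIXED_OVERHEAD) // ROUTER_LINK


print(router_lsa_len(254))           # 3072
print(lsu_ip_len(254))               # 3120
print(max_links_unfragmented(1500))  # 119
```

    With 254 links the Router LSA alone is 3072 bytes, so an LSU that carries it cannot fit in a single packet on either a 1500-byte or a 2000-byte link.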

    Now let's activate the link between R2 and R3, which will cause R3 to flood a large Router LSA to R2, which in turn causes R2 to flood this to R1.

    R3#config t
    Enter configuration commands, one per line. End with CNTL/Z.
    R3(config)#int Fa0/0
    R3(config-if)#no shutdown
    R3(config-if)#end
    R3#

    R2#debug ip packet detail
    IP packet debugging is on (detailed)
    R2#debug ip ospf packet
    OSPF packet debugging is on

    R2#config t
    Enter configuration commands, one per line. End with CNTL/Z.
    R2(config)#interface Fa2/0
    R2(config-if)#no shutdown
    R2(config-if)#end
    R2#
    %SYS-5-CONFIG_I: Configured from console by console
    IP: s=23.0.0.3 (FastEthernet2/0), d=224.0.0.5, len 76, rcvd 0, proto=89
    OSPF: rcv. v:2 t:1 l:44 rid:23.0.0.3
    aid:0.0.0.0 chk:D59B aut:0 auk: from FastEthernet2/0
    IP: s=23.0.0.2 (local), d=23.0.0.3 (FastEthernet2/0), len 80, sending, proto=89
    [snip]

    R2 and R3 form an adjacency, and R3's LSA is flooded to R2. Since the LSA does not fit in a single 1500-byte packet, it is fragmented into multiple packets, with the largest being the shared MTU of 1500 bytes between them.

    IP: s=23.0.0.3 (FastEthernet2/0), d=23.0.0.2, len 1500, rcvd 0
    IP Fragment, Ident = 497, fragment offset = 0, proto=89
    IP: recv fragment from 23.0.0.3 offset 0 bytes
    IP: s=23.0.0.3 (FastEthernet2/0), d=23.0.0.2, len 1500, rcvd 0
    IP Fragment, Ident = 497, fragment offset = 1480
    IP: recv fragment from 23.0.0.3 offset 1480 bytes
    IP: s=23.0.0.3 (FastEthernet2/0), d=23.0.0.2, len 172, rcvd 0
    IP Fragment, Ident = 497, fragment offset = 2960
    IP: recv fragment from 23.0.0.3 offset 2960 bytes
    OSPF: rcv. v:2 t:4 l:3112 rid:23.0.0.3
    aid:0.0.0.0 chk:297C aut:0 auk: from FastEthernet2/0
    %OSPF-5-ADJCHG: Process 1, Nbr 23.0.0.3 on FastEthernet2/0 from LOADING to FULL, Loading Done

    Once the adjacency is full, R2 installs R3's routes, and begins to flood to R1:

    R2#show ip route ospf
    1.0.0.0/32 is subnetted, 1 subnets
    O 1.1.1.1 [110/2] via 12.0.0.1, 00:00:10, GigabitEthernet1/0
    3.0.0.0/32 is subnetted, 254 subnets
    O 3.3.3.1 [110/2] via 23.0.0.3, 00:00:10, FastEthernet2/0
    [snip]
    O 3.3.3.254 [110/2] via 23.0.0.3, 00:00:10, FastEthernet2/0

    R2#
    IP: s=12.0.0.2 (local), d=224.0.0.5 (GigabitEthernet1/0), len 3132, sending broad/multicast, proto=89
    IP: s=12.0.0.2 (local), d=224.0.0.5 (GigabitEthernet1/0), len 1996, sending fragment
    IP Fragment, Ident = 854, fragment offset = 0, proto=89
    IP: s=12.0.0.2 (local), d=224.0.0.5 (GigabitEthernet1/0), len 1156, sending last fragment
    IP Fragment, Ident = 854, fragment offset = 1976
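
    The fragment lengths in the two debug outputs above can be reproduced with some simple arithmetic. The Python sketch below is a simplified model of IPv4 fragmentation: it assumes a 20-byte IP header with no options, and it relies on the rule that every fragment except the last carries a multiple of 8 data bytes. The name fragment() is chosen for this example only.

```python
def fragment(total_len, mtu, ip_header=20):
    """Split an IP packet of total_len bytes for a link with the given MTU.

    Returns a list of (fragment offset in bytes, fragment IP length).
    Simplified model: fixed 20-byte header, no IP options.
    """
    if total_len <= mtu:
        return [(0, total_len)]

    payload = total_len - ip_header
    # Data per fragment must be a multiple of 8 bytes (except the last one).
    max_data = (mtu - ip_header) // 8 * 8

    fragments = []
    offset = 0
    while offset < payload:
        data = min(max_data, payload - offset)
        fragments.append((offset, data + ip_header))
        offset += data
    return fragments


# 3132-byte LSU packet on the R3-R2 link (MTU 1500)
print(fragment(3132, 1500))  # [(0, 1500), (1480, 1500), (2960, 172)]
# The same packet on the R2-R1 link, using R2's MTU of 2000
print(fragment(3132, 2000))  # [(0, 1996), (1976, 1156)]
```

    The (offset, length) pairs match the debug lines for both links.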

    Note that since the LSA exceeds the MTU of 2000 bytes, it is fragmented into multiple packets. Since R1 cannot accept packets that exceed its MTU of 1500 bytes, the LSU never arrives intact. This means that R1 cannot synchronize the database with R2, as seen below.

    R1#show ip ospf database
    

    OSPF Router with ID (1.1.1.1) (Process ID 1)

    Router Link States (Area 0)

    Link ID     ADV Router   Age   Seq#         Checksum   Link count
    1.1.1.1     1.1.1.1      62    0x80000005   0x6592     2
    2.2.2.2     2.2.2.2      35    0x8000000D   0x613E     3

    Net Link States (Area 0)

    Link ID     ADV Router   Age   Seq#         Checksum
    12.0.0.1    1.1.1.1      62    0x80000001   0x61BB
    23.0.0.3    23.0.0.3     36    0x80000001   0x974C

    R2#show ip ospf database

    OSPF Router with ID (2.2.2.2) (Process ID 1)

    Router Link States (Area 0)

    Link ID     ADV Router   Age   Seq#         Checksum   Link count
    1.1.1.1     1.1.1.1      67    0x80000005   0x6592     2
    2.2.2.2     2.2.2.2      38    0x8000000D   0x613E     3
    23.0.0.3    23.0.0.3     39    0x80000005   0x2AAD     255

    Net Link States (Area 0)

    Link ID     ADV Router   Age   Seq#         Checksum
    12.0.0.1    1.1.1.1      67    0x80000001   0x61BB
    23.0.0.3    23.0.0.3     39    0x80000001   0x974C

    R3#show ip ospf database

    OSPF Router with ID (23.0.0.3) (Process ID 1)

    Router Link States (Area 0)

    Link ID     ADV Router   Age   Seq#         Checksum   Link count
    1.1.1.1     1.1.1.1      69    0x80000005   0x006592   2
    2.2.2.2     2.2.2.2      40    0x8000000D   0x00613E   3
    23.0.0.3    23.0.0.3     39    0x80000005   0x002AAD   255

    Net Link States (Area 0)

    Link ID     ADV Router   Age   Seq#         Checksum
    12.0.0.1    1.1.1.1      69    0x80000001   0x0061BB
    23.0.0.3    23.0.0.3     39    0x80000001   0x00974C

    This also implies that R1 cannot install routes towards R3:

    R1#show ip route ospf
    2.0.0.0/32 is subnetted, 1 subnets
    O 2.2.2.2 [110/2] via 12.0.0.2, 00:00:02, GigabitEthernet1/0
    23.0.0.0/24 is subnetted, 1 subnets
    O 23.0.0.0 [110/2] via 12.0.0.2, 00:00:02, GigabitEthernet1/0

    Eventually the adjacency state between R1 and R2 is lost, due to the lack of LSAcks sent in response to R2's LSUs. This can be seen in R1's "debug ip ospf packet" as follows, and the "show ip ospf neighbor" on both devices:

    R1#
    OSPF: rcv. v:2 t:1 l:44 rid:2.2.2.2
    aid:0.0.0.0 chk:DC98 aut:0 auk: from GigabitEthernet1/0
    OSPF: Cannot see ourself in hello from 2.2.2.2 on GigabitEthernet1/0, state INIT

    R1#show ip ospf neighbor

    Neighbor ID     Pri   State        Dead Time   Address     Interface
    2.2.2.2         1     LOADING/DR   00:00:34    12.0.0.2    GigabitEthernet1/0

    R2#show ip ospf neighbor

    Neighbor ID     Pri   State        Dead Time   Address     Interface
    23.0.0.3        1     FULL/DR      00:00:35    23.0.0.3    FastEthernet2/0
    1.1.1.1         1     FULL/BDR     00:00:39    12.0.0.1    GigabitEthernet1/0

    The key with this example is that although the "ip ospf mtu-ignore" command allows the initial adjacency to form between R1 and R2, we can see that synchronization fails between them when an LSA replication event causes packet sizes generated by R2 to exceed R1's MTU.

    Based on this we can see that the "ip ospf mtu-ignore" command is not a fix to the underlying problem. Instead, it is simply an exception to the OSPF adjacency state machine. The real fix is to ensure that the MTU values match between neighbors, which prevents both failed routing exchange in the control plane and packet drops due to unsupported sizes in the data plane.
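
    One way to catch this condition ahead of time is to compare the configured MTU on both ends of every link. The Python sketch below assumes the values have already been collected by some other means. The data structure and the function name are hypothetical, and the MTUs are the ones used in this example topology.

```python
def mtu_mismatches(links):
    """Return the links whose two endpoints disagree on MTU.

    links: list of ((name_a, mtu_a), (name_b, mtu_b)) pairs, a made-up
    structure used only for this illustration.
    """
    return [
        (name_a, name_b, mtu_a, mtu_b)
        for (name_a, mtu_a), (name_b, mtu_b) in links
        if mtu_a != mtu_b
    ]


# MTU values from the topology in this post
links = [
    (("R1 Gi1/0", 1500), ("R2 Gi1/0", 2000)),
    (("R2 Fa2/0", 1500), ("R3 Fa0/0", 1500)),
]

print(mtu_mismatches(links))  # [('R1 Gi1/0', 'R2 Gi1/0', 1500, 2000)]
```

    Only the R1 to R2 link is flagged, which is the same link on which "ip ospf mtu-ignore" was needed to bring the adjacency up.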
