Nov
05

Fragmented IPv4 traffic can cause a lot of problems in real life. Not only does it increase the load on router CPUs, it also hurts application performance (e.g. TCP must re-send the whole packet when a single fragment is lost). In addition, traffic fragmentation is used in numerous network attacks, allowing an attacker to bypass firewalls or IDSes in some situations. For all these reasons, you may want to avoid fragmentation altogether and/or ensure your network is insulated from fragmented packets. Unfortunately, there are cases when IPv4 fragmentation is unavoidable.

MTU Mismatch Issues

Fragmentation occurs when there is an MTU mismatch on the path between two communicating endpoints. Endpoints know only their local MTU settings, not the minimum MTU along the path (although an MTU discovery procedure exists). Nowadays, most end-user connections are Ethernet-based, so you commonly expect endpoint MTU values of at least 1500 bytes. Even though modern equipment supports jumbo Gigabit Ethernet frames of more than 9 KB, by default you commonly see the MTU set to 1500 bytes everywhere. The default value is usually fine when traffic crosses the Internet, since most (read: good) ISPs support a user-side MTU of 1500.

Your Ethernet network may function perfectly until the day you decide to “virtualize” networking resources. Commonly, people face MTU issues when they run tunneling technologies on top of Ethernet with its default MTU value of 1500. These could be GRE, IPsec tunnels or MPLS, you name it. Any tunneling solution that does not terminate at the endpoint PC may cause MTU issues and lead to packet fragmentation. In most cases, if you originate the tunnel from the user’s PC, the software automatically adjusts the MTU.

The problem is that by default many switches use a system MTU value of 1500. Many lower-end FastEthernet switches don’t even support jumbo frames. If a VPN tunnel traverses a whole Ethernet infrastructure, you face the problem of upgrading all switches to support jumbo frames. In large installations this could be a serious issue, so some workarounds would be nice to have.

Path MTU Discovery

Defined in RFC 1191 and called PMTUD for short, this simple procedure allows endpoints to dynamically discover the minimum MTU along the communication path. The procedure uses the special Don’t Fragment (DF) flag in the IP packet header. Packets with this flag set are never fragmented, but rather dropped when a router sees that the packet does not fit the outgoing link’s MTU. When dropping the packet, the router should signal back to the sending host with a special ICMP unreachable message (“fragmentation needed and DF set”), telling it that the packet has been dropped because it was too large and suggesting the new MTU value.

Based on this, an endpoint may first “probe” a new path with MTU-sized packets that have the DF bit set. By listening to the ICMP responses, the host can find the proper path MTU value. PMTUD commonly starts with the first TCP session between the two hosts.
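The probing loop can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions: the function name discover_path_mtu and the list of per-link MTUs are made up for illustration, every router is assumed to return the ICMP message, and no real packets are sent.

```python
def discover_path_mtu(local_mtu, link_mtus):
    """Toy PMTUD: send DF-marked packets and shrink on each ICMP reply.

    link_mtus lists the MTU of every link along the path, in order.
    Returns (path_mtu, number_of_probes_sent).
    """
    size = local_mtu
    probes = 0
    while True:
        probes += 1
        # The first link too small for the packet drops it, and the router
        # reports that link's MTU back in the ICMP unreachable message.
        blocking = next((m for m in link_mtus if m < size), None)
        if blocking is None:
            return size, probes  # the packet made it end to end
        size = blocking          # retry with the suggested MTU


# A path with a 1476-byte tunnel hop followed by a 1400-byte hop
print(discover_path_mtu(1500, [1500, 1476, 1400, 1500]))
```

Each ICMP reply uncovers only the next bottleneck, so a path with several successively smaller links needs several probes. If the ICMP messages are lost, the loop never learns anything and the sender keeps using its local MTU.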

The problem with PMTUD is that router-based signaling does not work well in the modern Internet. Either routers rate-limit the ICMP unreachable messages or firewalls filter them. This effectively prevents PMTUD from working properly and often makes it another problem rather than a solution.

In order to resolve PMTUD “issues”, people commonly use either of two hacks on Cisco IOS routers. The first one is clearing the DF bit in all packets (commonly TCP packets) using a route-map like this:


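! ACL referenced by the route-map; matching all TCP traffic is an assumption
ip access-list extended TCP
 permit tcp any any
!
route-map CLEAR_DF permit 10
 match ip address TCP
 set ip df 0
!
interface FastEthernet 0/0
 ip policy route-map CLEAR_DF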

However, this solution results in the ingress router fragmenting packets (the very effect you ultimately want to avoid). Therefore, use the above configuration as a last resort, when your customers start bashing you :)

Preventing Fragmentation

The second procedure applies to TCP traffic only (it’s the most commonly used type of communication anyway). Using the special TCP option called MSS (Maximum Segment Size), a TCP endpoint signals to the other end the largest segment it can accept, derived from its local MTU (basically, MSS = MTU - IP_header_size - TCP_header_size). Each endpoint then limits the segments it sends to the MSS advertised by its peer. Cisco IOS supports a special feature called TCP MSS adjustment, which allows the router to rewrite the option with a value provided by the system administrator. By setting this option to a value matching the new MTU, you trick both endpoints into thinking that the actual MSS is lower than they suppose. The interface-level command is: ip tcp adjust-mss [value]. With this command configured, every TCP SYN packet passing through the interface is inspected for the TCP MSS option, and the value is lowered to the configured one if it exceeds it. You often see this feature implemented with VPN solutions such as PPPoE, DMVPN, GRE, etc.
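To pick a value for ip tcp adjust-mss, the MSS formula above can be worked through in Python. This is a rough sketch: the helper names are hypothetical, the headers are assumed to carry no options, and the overhead figures (24 bytes for plain GRE over IPv4, 8 bytes for PPPoE) are common textbook values that you should verify for your own encapsulation, especially with IPsec.

```python
IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20  # bytes, TCP header without options (assumption)


def mss_for_mtu(mtu):
    """MSS = MTU - IP_header_size - TCP_header_size."""
    return mtu - IP_HEADER - TCP_HEADER


def adjust_mss_value(physical_mtu, tunnel_overhead):
    """MSS that still fits once the tunnel headers are added."""
    return mss_for_mtu(physical_mtu - tunnel_overhead)


GRE_OVERHEAD = 24    # assumption: 20-byte outer IPv4 header + 4-byte GRE header
PPPOE_OVERHEAD = 8   # assumption: 6-byte PPPoE header + 2-byte PPP protocol ID

print(mss_for_mtu(1500))                       # plain Ethernet
print(adjust_mss_value(1500, GRE_OVERHEAD))    # GRE tunnel over Ethernet
print(adjust_mss_value(1500, PPPOE_OVERHEAD))  # PPPoE access link
```

In practice many people configure a slightly smaller value than the computed one to leave headroom for extra headers.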

Of course, the best way to prevent fragmentation and PMTUD issues is setting the underlying MTU to a value large enough to accommodate the original packet with tunnel overhead (GRE, MPLS, IPsec). If you’ve done that before actually implementing VPN solutions, you’re a lucky person ;)

Filtering Fragments

The final part is ridding your network of fragmented traffic. With Cisco IOS you have two ways to accomplish this. The first solution is matching fragments with the special “fragments” keyword like this: permit ip host 1.1.1.1 host 2.2.2.2 fragments. This ACL entry matches any non-initial packet fragment. A non-initial fragment is an IP packet with a non-zero fragment offset (FO) field. When a router fragments a packet, the packet is split as follows:

a) Initial fragment: the first part of the packet. This fragment has the “M” (more fragments) flag set, meaning more fragments will follow. It has an FO value of zero, signaling its position at the start of the original datagram. Usually, this initial fragment contains the upper-level protocol header (such as TCP/UDP) and thus carries the port number information. Note that normal, unfragmented packets have FO=0, M=0.

b) Non-initial fragments: packets with a non-zero FO field, meaning these fragments follow the initial one. These fragments have the M bit set, with the exception of the final fragment. The final fragment has the M bit set to zero, signaling the end of the sequence.
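The FO and M values described in a) and b) can be illustrated with a short Python sketch. The function name fragment is hypothetical, a 20-byte IPv4 header without options is assumed, and the code relies on the standard IPv4 rule that the FO field counts 8-byte units, so every fragment except the last carries a multiple of 8 data bytes.

```python
def fragment(payload_len, mtu, ip_header=20):
    """Split an IP payload into fragments for a link with the given MTU.

    Returns a list of (fo, data_len, m_flag) tuples, where fo is the
    fragment offset in 8-byte units and m_flag is the "more" bit.
    """
    # Largest data size per fragment, rounded down to a multiple of 8
    max_data = (mtu - ip_header) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        data_len = min(max_data, payload_len - offset)
        more = offset + data_len < payload_len  # M=0 only on the last piece
        fragments.append((offset // 8, data_len, more))
        offset += data_len
    return fragments


# 1480 bytes of IP payload (a full 1500-byte packet) crossing a 1000-byte link
for fo, length, m in fragment(1480, 1000):
    print(f"FO={fo} len={length} M={int(m)}")
```

A packet that fits the link comes back as a single entry with FO=0 and M=0, which is the “normal packet” case noted above.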

Based on the above, the “fragments” keyword can only be used with ACL entries that do not reference any port numbers or TCP flags. This is because non-initial fragments don’t carry the upper-level protocol information. (Well, there are some special cases [tiny fragments] where crafted fragments are used to split TCP port information across the fragments, but discussing those in depth is beyond the scope of this document.)

Note the special treatment that IOS gives to non-initial fragments when matching against an access-list entry that contains protocol/port numbers, e.g. permit tcp any host 1.1.1.1 eq 80. If a non-initial fragment is matched against this entry, IOS ignores any upper-level protocol information in the ACL entry (the protocol and the port number) and compares the source/destination IP in the fragment with the ACL source/destination values. Effectively, this procedure permits any non-initial fragment matching the layer 3 information in the ACL entry. “Deny” entries that carry port information are treated differently: a non-initial fragment matching only their layer 3 information is not denied, and IOS moves on to the next ACL entry.

Therefore, if you want to prevent fragmented IP packets from reaching your application ports, put a “deny” statement with the “fragments” keyword before the “permit” statement allowing traffic to the application port, like this:


ip access-list extended ONLY_NON_FRAGMENTS
 deny ip any host 1.1.1.1 fragments
 permit tcp any host 1.1.1.1 eq www
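To see why the order of these two entries matters, here is a simplified Python model of the matching rules described above. It is a sketch for illustration, not how IOS is implemented: the Packet and Entry classes are hypothetical, only destination addresses are compared, and only the two kinds of entries used in this ACL are modelled (a “fragments” entry and a permit entry with a port). Deny entries with port information are deliberately rejected as out of scope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    dst: str
    frag_offset: int = 0             # non-zero FO means non-initial fragment
    proto: Optional[str] = None      # only visible when FO == 0
    dst_port: Optional[int] = None   # only visible when FO == 0


@dataclass
class Entry:
    action: str                      # "permit" or "deny"
    proto: str                       # "ip" or "tcp"
    dst: str
    dst_port: Optional[int] = None
    fragments: bool = False          # the "fragments" keyword

    def __post_init__(self):
        has_l4 = self.proto != "ip" or self.dst_port is not None
        if self.action == "deny" and has_l4 and not self.fragments:
            raise ValueError("deny entries with layer 4 info are not modelled")


def entry_matches(entry, pkt):
    if entry.dst != pkt.dst:
        return False
    non_initial = pkt.frag_offset != 0
    if entry.fragments:
        return non_initial           # matches non-initial fragments only
    if non_initial:
        return True                  # layer 4 part of the entry is ignored
    if entry.proto != "ip" and entry.proto != pkt.proto:
        return False
    return entry.dst_port is None or entry.dst_port == pkt.dst_port


def evaluate(acl, pkt):
    for entry in acl:
        if entry_matches(entry, pkt):
            return entry.action
    return "deny"                    # implicit deny at the end of every ACL


deny_frags = Entry("deny", "ip", "1.1.1.1", fragments=True)
permit_www = Entry("permit", "tcp", "1.1.1.1", 80)
late_fragment = Packet("1.1.1.1", frag_offset=185)

print(evaluate([deny_frags, permit_www], late_fragment))  # with the deny entry
print(evaluate([permit_www], late_fragment))              # permit entry alone
```

With the permit entry alone, the non-initial fragment is let through on its layer 3 match, which is exactly the behaviour the leading deny entry is there to prevent.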

IP Virtual Reassembly

Virtual Reassembly is a special IOS feature that allows the router to obtain the full picture of a fragmented packet on the fly. When you activate virtual reassembly on an interface, using the command ip virtual-reassembly, IOS starts tracking all incoming fragmented packets. The code holds fragmented packets until it receives all of them, or until the reassembly timeout expires (there are some other thresholds, discussed below). After this, the router performs “virtual” datagram reassembly. Here “virtual” means the packet is not actually assembled into a single entity; rather, IOS views it as a whole for subsequent processing. If the router does not receive all fragments within the reassembly timeout, the incomplete packet is dropped.

Reassembly is needed by some features such as NAT or IPS in order to perform true packet inspection. For example, NAT ALGs (Application Level Gateways) may need to rewrite the packet contents or get some additional information not present in a single fragment. (This is why you see virtual reassembly enabled automatically when you activate NAT on an interface.) After the virtual reassembly procedure and other internal processing have been performed, the router switches all datagrams in their original, fragmented form. The router does not send out the assembled datagram, to avoid any MTU issues.

Here is the full syntax for the virtual-reassembly command:

ip virtual-reassembly [max-reassemblies number] [max-fragments number] [timeout seconds] [drop-fragments]

In this syntax, max-reassemblies tells the router the maximum number of datagrams to track simultaneously on the interface. If a fragment of a new packet arrives on the interface and the maximum number of reassembly states is already present, the router drops the new fragment. This is a security precaution against DoS attacks that try to drain router resources. Next, max-fragments is the maximum number of fragments allowed in a packet. For any single reassembly currently in progress, the router rejects any further fragments exceeding this count. The timeout value is the maximum amount of time the router waits for reassembly to complete before discarding all accumulated fragments; we mentioned this value above. Finally, if you specify the drop-fragments keyword, the router drops any fragments received on the interface, which is a quick and simple way to block all fragmented traffic.

Finally, virtual reassembly automatically detects common fragmented-packet attacks, such as tiny fragments (hiding TCP/UDP port numbers in non-initial fragments) or overlapping fragments (crafting fragments so that they overlap in the reassembled packet). This provides some minimal level of protection and makes packet fragmentation less risky from a security perspective (though still highly undesirable from an application perspective).

Further Reading:

IP Virtual Reassembly
Access Control Lists and IP Fragments

About Petr Lapukhov, 4xCCIE/CCDE:

Petr Lapukhov's career in IT began in 1988 with a focus on computer programming, and progressed into networking with his first exposure to Novell NetWare in 1991. Initially involved with Kazan State University's campus network support and UNIX system administration, he went through the path of becoming a networking consultant, taking part in many network deployment projects. Petr currently has over 12 years of experience working in the Cisco networking field, and is the only person in the world to have obtained four CCIEs in under two years, passing each on his first attempt. Petr is an exceptional case in that he has been working with all of the technologies covered in his four CCIE tracks (R&S, Security, SP, and Voice) on a daily basis for many years. When not actively teaching classes, developing self-paced products, studying for the CCDE Practical & the CCIE Storage Lab Exam, and completing his PhD in Applied Mathematics.


16 Responses to “Dealing with Fragmented Traffic”

 
  1. Muhammad says:

    Hello Petr, ur the coolest geek i have ever seen on the face of the earth….

  2. TheGrave says:

    Cool post, thanks for the useful information! Just please fix the code example, some guys may wonder why a route map attached with the service-policy command doesn’t work:)

  3. Tamer A.saleh says:

    Hello,

    When we have two routers directly connected to each other ( e.g via serial interface ), and each serial interface has different MTU ( e.g 1500, 1000), when we do ping from one to another with size 1500, will that trigger fragmentation, or will cause drops?

  4. shion says:

    hi Petr, could this feature works with traffic load sharing?

  5. Smudger says:

    Superb article

  6. shahzad says:

    Superb article…..cleared many doubts……keep up the good work

  7. Wiggo says:

    Hi Petr, will PMTUD also work with UDP? If yes, is it running in windows os per default? What happens if PMTUD doesnt get the ICMP msg, packet too big back? Thanks :)

  8. Anzolex says:

    Excellent !!!

  9. Nick Davitashvili says:

    Wiggo,

    PMTUD is only supported by TCP.

    It is enabled in Windows 2000/XP/2003/Vista/2008/Win7 by default and can be turned off by altering the value of “HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\EnablePMTUDiscovery” (default is 1) and changing it to “0″.

    Note that this can impact performance as Windows will revert to an MTU of 576 bytes per packet for packets to destinations hosts that are not within the same subnet unless you manually set the MTU.

    If packet-too-big or ICMP unreachables are blocked on the return path to the host, it essentially considers any that max MTU available to it will work for the path. This in turn will significantly increase chances of fragmentation.

  10. Anis Oumlil says:

    Many thanks for the post it’s really helpfull. I have one remark regading MSS usage for TCP connection : Each host will take into account and the sent MSS by the other end and his MTU fix the (sent MSS) attribute. But the two end will not negotiate an agreed value.

  11. Steve McNutt says:

    I owe you a beer, those tips finally solved my split-tunnel/slow internet connection.

 
