Apr 06

Having a blast in Chicago with the RS bootcamp students.    Thanks for all the hard work you are doing this week!

A student from a past Reno class, named Michal, asked if I would create a blog post regarding BGP proportional load balancing based on the bandwidth of the links to EBGP peers. It has been on my list of things to do, and here it is. Thanks for the request Michal.

The secret to this trick is to pay attention to the links between directly connected external BGP neighbors (in this case R6-R5 and R2-R3) and send the link bandwidth extended community attribute to IBGP peer R1. The feature is enabled with the bgp dmzlink-bw command, and extended communities carry the information. To summarize: routes learned from a directly connected external neighbor are advertised to IBGP peers along with the bandwidth of the external link on which they were learned, and the IBGP router (R1) can then load balance proportionally between the two paths.

Here is the diagram we will use.

BGP Diagram

We’ll use loopbacks for our IBGP connections, so let’s verify that we have connectivity between loopbacks in AS 126.

R1#ping 6.6.6.6 source loopback 0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 6.6.6.6, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/43/76 ms
R1#
R1#ping 2.2.2.2 source loopback 0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/40/72 ms

Ok, that looks good, so let’s configure R1 to be an IBGP peer with R6 and R2.  The dmzlink-bw feature is implemented as part of the IPv4 address family configuration.

R1(config)#router bgp 126
R1(config-router)#neighbor 6.6.6.6 remote-as 126
R1(config-router)#neighbor 2.2.2.2 remote-as 126
R1(config-router)#neighbor 6.6.6.6 update-source lo0
R1(config-router)#neighbor 2.2.2.2 update-source lo0

R1(config-router)#address-family ipv4
R1(config-router-af)#bgp dmzlink-bw
R1(config-router-af)#neighbor 6.6.6.6 activate
R1(config-router-af)#neighbor 2.2.2.2 activate
R1(config-router-af)#neighbor 6.6.6.6 send-community both
R1(config-router-af)#neighbor 2.2.2.2 send-community both
R1(config-router-af)#maximum-paths ibgp 2
R1(config-router-af)#end

Next, we will configure R6 and R2 to be IBGP neighbors with R1, and EBGP neighbors with R5 and R3 respectively. We are going to set the bandwidth on the external interfaces of R6 and R2 to 6000 kbps and 5000 kbps respectively using the bandwidth command. BGP can originate the link bandwidth community only for directly connected links to EBGP neighbors. In our example, it will be originated by R6 and R2.

R6(config)#router bgp 126
R6(config-router)#neighbor 1.1.1.1 remote-as 126
R6(config-router)#neighbor 1.1.1.1 update-source lo0
R6(config-router)#neighbor 10.56.0.5 remote-as 345
R6(config-router)#address-family ipv4
R6(config-router-af)#bgp dmzlink-bw
R6(config-router-af)#neighbor 1.1.1.1 activate
R6(config-router-af)#neighbor 1.1.1.1 next-hop-self
R6(config-router-af)#neighbor 1.1.1.1 send-community both
R6(config-router-af)#neighbor 10.56.0.5 activate
R6(config-router-af)#neighbor 10.56.0.5 dmzlink-bw
R6(config-router-af)#int fa 0/0
R6(config-if)#bandwidth 6000

Now, on to R2, with virtually the same configuration.

R2(config)#router bgp 126
R2(config-router)#neighbor 1.1.1.1 remote-as 126
R2(config-router)#neighbor 1.1.1.1 update-source lo0
R2(config-router)#neighbor 10.23.0.3 remote-as 345
R2(config-router)#address-family ipv4
R2(config-router-af)#bgp dmzlink-bw
R2(config-router-af)#neighbor 1.1.1.1 activate
R2(config-router-af)#neighbor 1.1.1.1 next-hop-self
R2(config-router-af)#neighbor 1.1.1.1 send-community both
R2(config-router-af)#neighbor 10.23.0.3 activate
R2(config-router-af)#neighbor 10.23.0.3 dmzlink-bw
R2(config-router-af)#int ser 0/1.23
R2(config-subif)#bandwidth 5000

Now we will configure R5 and R3 as the EBGP neighbors of R6 and R2 respectively.  These EBGP peers don’t need any special configuration, other than standard BGP.

R5(config)#router bgp 345
R5(config-router)#neighbor 10.56.0.6 remote-as 126
R5(config-router)#neighbor 4.4.4.4 remote-as 345
R5(config-router)#neighbor 4.4.4.4 update-source lo0
R5(config-router)#neighbor 4.4.4.4 next-hop-self

R3(config)#router bgp 345
R3(config-router)#neighbor 10.23.0.2 remote-as 126
R3(config-router)#neighbor 4.4.4.4 remote-as 345
R3(config-router)#neighbor 4.4.4.4 update-source lo0
R3(config-router)#neighbor 4.4.4.4 next-hop-self

Last but not least, we configure R4 as an IBGP peer to R5 and R3. In addition, we will create a loopback and advertise it into BGP. We will use the loopback as a target destination from R1 to verify the load balancing in a later step, so watch for that coming up.

R4(config)#int loop 44
R4(config-if)#ip add 44.44.44.44 255.255.255.0
R4(config-if)#router bgp 345
R4(config-router)#neighbor 5.5.5.5 remote-as 345
R4(config-router)#neighbor 3.3.3.3 remote-as 345
R4(config-router)#network 44.44.44.0 mask 255.255.255.0

Now let’s verify. Because we are on R4, let’s verify the BGP neighborships it has.

R4#show ip bgp summary
BGP router identifier 44.44.44.44, local AS number 345
BGP table version is 2, main routing table version 2
1 network entries using 120 bytes of memory
1 path entries using 52 bytes of memory
2/1 BGP path/bestpath attribute entries using 248 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
Bitfield cache entries: current 1 (at peak 1) using 32 bytes of memory
BGP using 452 total bytes of memory
BGP activity 1/0 prefixes, 1/0 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
3.3.3.3         4   345       4       5        2    0    0 00:00:41        0
5.5.5.5         4   345       4       5        2    0    0 00:00:35        0

! Note:  we can easily verify what routes are being advertised out from R4.

R4#show ip bgp neighbors 5.5.5.5 advertised-routes
BGP table version is 2, local router ID is 44.44.44.44
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 44.44.44.0/24    0.0.0.0                  0         32768 i

Total number of prefixes 1
R4#show ip bgp neighbors 3.3.3.3 advertised-routes
BGP table version is 2, local router ID is 44.44.44.44
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 44.44.44.0/24    0.0.0.0                  0         32768 i

Total number of prefixes 1
R4#

Looks like AS 345 is fine. Let’s jump to R1, in AS 126, and verify from there.

R1#show ip bgp summary
BGP router identifier 1.1.1.1, local AS number 126
BGP table version is 3, main routing table version 3
1 network entries using 120 bytes of memory
2 path entries using 104 bytes of memory
1 multipath network entries and 2 multipath paths
2/1 BGP path/bestpath attribute entries using 248 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 496 total bytes of memory
BGP activity 1/0 prefixes, 2/0 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2.2.2.2         4   126      10       9        3    0    0 00:06:39        1
6.6.6.6         4   126      11      10        3    0    0 00:07:14        1
R1#show ip bgp
BGP table version is 3, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
* i44.44.44.0/24    6.6.6.6                  0    100      0 345 i
*>i                 2.2.2.2                  0    100      0 345 i

! Note:  Looks like we have the neighbors, and the 44.44.44.0/24 prefix.
! To see more detail on the 44.44.44.0 network, we can use a couple additional commands.

R1#show ip bgp 44.44.44.0
BGP routing table entry for 44.44.44.0/24, version 3
Paths: (2 available, best #2, table Default-IP-Routing-Table)
Multipath: iBGP
Flag: 0x820
  Not advertised to any peer
  345
    6.6.6.6 (metric 1) from 6.6.6.6 (6.6.6.6)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath
      DMZ-Link Bw 750 kbytes
  345
    2.2.2.2 (metric 1) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath, best
      DMZ-Link Bw 625 kbytes

! Note: Let's see what the routing table has to say about this network.

R1#show ip route 44.44.44.0
Routing entry for 44.44.44.0/24
  Known via "bgp 126", distance 200, metric 0
  Tag 345, type internal
  Last update from 2.2.2.2 00:02:56 ago
  Routing Descriptor Blocks:
  * 6.6.6.6, from 6.6.6.6, 00:02:56 ago
      Route metric is 0, traffic share count is 6
      AS Hops 1
      Route tag 345
    2.2.2.2, from 2.2.2.2, 00:02:56 ago
      Route metric is 0, traffic share count is 5
      AS Hops 1
      Route tag 345

! Note: We can also get the information from the CEF table.

R1#show ip cef 44.44.44.0
44.44.44.0/24, version 47, epoch 0, per-destination sharing
0 packets, 0 bytes
  via 6.6.6.6, 0 dependencies, recursive
    traffic share 6
    next hop 10.16.0.6, FastEthernet0/1 via 6.6.6.0/24
    valid adjacency
  via 2.2.2.2, 0 dependencies, recursive
    traffic share 5
    next hop 10.12.0.2, FastEthernet0/0 via 2.2.2.0/24
    valid adjacency
  0 packets, 0 bytes switched through the prefix
  tmstats: external 0 packets, 0 bytes
           internal 0 packets, 0 bytes
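
The numbers in these three outputs tie together. The bandwidth command takes kilobits per second, while show ip bgp reports DMZ-Link Bw in kilobytes, so 6000 and 5000 show up as 750 and 625, and those reduce to the 6 and 5 traffic share values in the routing and CEF tables. Here is a minimal Python sketch of that arithmetic. The function names are made up for illustration, and dividing by the greatest common divisor is only an assumption that happens to reproduce the share counts above; it is not a description of how IOS computes them internally.

```python
from math import gcd

def dmz_link_bw_kbytes(interface_bw_kbps):
    """Convert an interface bandwidth in kbit/s to kbytes/s (8 bits per byte)."""
    return interface_bw_kbps / 8

def traffic_shares(bw_by_next_hop):
    """Reduce per-path bandwidths to the smallest whole-number ratio (assumed model)."""
    divisor = 0
    for bw in bw_by_next_hop.values():
        divisor = gcd(divisor, int(bw))
    return {hop: int(bw) // divisor for hop, bw in bw_by_next_hop.items()}

# Bandwidth values configured on the external interfaces of R6 and R2.
configured_kbps = {"6.6.6.6": 6000, "2.2.2.2": 5000}

link_bw = {hop: dmz_link_bw_kbytes(kbps) for hop, kbps in configured_kbps.items()}
shares = traffic_shares(link_bw)
total = sum(shares.values())

for hop in configured_kbps:
    print(f"{hop}: DMZ-Link Bw {link_bw[hop]:g} kbytes, "
          f"traffic share {shares[hop]}, about {shares[hop] / total:.1%} of traffic")
```

With the values from this lab, the R6 path should carry roughly 6/11 of the outbound traffic and the R2 path roughly 5/11.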

So now that the route is there, how do we test the load balancing? One option is to do an extended ping, and record the path. We are expecting a 6 to 5 ratio for outbound traffic favoring the R6 path more than the R2 path. Let’s send 30 ping requests, and show the full response for the benefit of verification.

R1#ping
Protocol [ip]:
Target IP address: 44.44.44.44
Repeat count [5]: 30
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface: loopback0
Type of service [0]:
Set DF bit in IP header? [no]:
Validate reply data? [no]:
Data pattern [0xABCD]:
Loose, Strict, Record, Timestamp, Verbose[none]: r
Number of hops [ 9 ]: 4
Loose, Strict, Record, Timestamp, Verbose[RV]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 30, 100-byte ICMP Echos to 44.44.44.44, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
Packet has IP options:  Total option bytes= 19, padded length=20
 Record route: <*>
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)
   (0.0.0.0)

Reply to request 0 (204 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 1 (156 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

! Note: the path changes on the next ping request, and begins to use R6 as the next hop.

Reply to request 2 (160 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 3 (128 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 4 (156 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 5 (172 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 6 (108 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 7 (136 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 8 (180 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 9 (152 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 10 (80 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 11 (308 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 12 (204 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 13 (108 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 14 (160 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 15 (140 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 16 (140 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 17 (104 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 18 (84 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 19 (192 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 20 (232 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 21 (220 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 22 (168 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 23 (140 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.12.0.1)
   (10.23.0.2)
   (10.34.0.3)
   (44.44.44.44)
   <*>
 End of list

Reply to request 24 (88 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 25 (224 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 26 (484 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 27 (128 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 28 (108 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Reply to request 29 (136 ms).  Received packet has options
 Total option bytes= 20, padded length=20
 Record route:
   (10.16.0.1)
   (10.56.0.6)
   (10.45.0.5)
   (44.44.44.44)
   <*>
 End of list

Success rate is 100 percent (30/30), round-trip min/avg/max = 80/166/484 ms
R1#

The first 2 requests, numbered 0-1, used the path of R2-R3-R4. The next 6 requests, numbered 2-7, used the path of R6-R5-R4. The next 5, numbered 8-12, used the R2-R3-R4 path again, and then the next 6, numbered 13-18, used the R6-R5-R4 path. The same 5-and-6 pattern repeats for requests 19-23 and 24-29.
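
For a quick sanity check, the record-route results can be tallied. The short Python sketch below hard-codes the run lengths read off the 30 replies above (the run list and variable names are just for illustration) and compares the observed split with the 6 to 5 share from the routing table.

```python
from collections import Counter

# (path, number of consecutive replies) in the order seen in the ping output above.
runs = [("R2", 2), ("R6", 6), ("R2", 5), ("R6", 6), ("R2", 5), ("R6", 6)]

# Expand the runs into one entry per echo request, requests 0 through 29.
paths = [path for path, count in runs for _ in range(count)]
tally = Counter(paths)

# Requests 2-23 cover two complete 6 + 5 cycles.
full_cycles = Counter(paths[2:24])

print("all 30 requests:", dict(tally))
print("requests 2-23:  ", dict(full_cycles))
print(f"observed R6 share {tally['R6'] / len(paths):.1%}, expected {6 / 11:.1%}")
```

Over all 30 requests, R6 carried 18 and R2 carried 12. The sample opens with a short run of 2 on R2, so the whole-sample split is a little off 6 to 5, while the two complete cycles in requests 2-23 split 12 to 10, which is exactly 6 to 5.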

Happy studies.



15 Responses to “Move over Variance: BGP Proportional Load Balancing is here!”

 
  1. shivlu jain says:

    really a good post.

  2. Deepak Arora says:

    Can you add download as PDF option for each blog post so that we can print it and use it for studies

  3. robert says:

    Great post, thank you.

    Is there anything we could do to optimize load balancing in the opposite direction, from R4-R1 in a case when we don’t control AS345?

    Robert

  4. Michal says:

    Hi,
    wow great ! thanks for coming back on that topic with your article which is helpful for the proper understanding of things involved in this technique.

    Salutations sinceres ,
    Michal

  5. Franco Isaac says:

    Hi,

    Excellent Post.

    We have a large bgp Multihomed enviroment and just wanted to know if the BGP proportional load balancing is widely used?

    Franco

  6. Deepak- I will pass the idea along. It sounds like it would make it more available. I like the idea.

    Robert- Without administrative control, we couldn’t enable proportional load balancing on AS345 for the return path. However, due to the slow connection on the external R2 connection, we could pre-pend the AS path as R2 shares routes with R3, so that AS345 would see our networks as an additional “hop” (AS-Path) away through R2, compared to using R6 which has the faster external link.

    Michal- Thanks for the original request, and your current response. Hope you are enjoying your home in Paris. It was great to get to know you!

    Franco- I don’t have statistics on how frequently the proportional load balancing is used, but it seems like it would be a waste of bandwidth (egress) if we didn’t do it.

    Thanks all for your comments!

  7. Ofer says:

    Hi,

    g8 post but isn’t your test somewhat problematic?

    pinging/tracing from R1 will make the probes process swtiched on R1, not CEF swtiched. if they were CEF switched you would get the same path with all probes (’cause same src/dst are used).
    A better test would be to pinging from a device behind R1…

    I’m not sure if process switched traffic will follow the the UECMP behavior (although your test prove so), can you confirm?

    TIA

  8. Only4rizq says:

    Sorry if this is too insignificant, but I missed the part where it shows DMZ-link Bw are 750K and 625K even though the interfaces were configured with bandwidth of 6000 and 5000.

    R1#show ip bgp 44.44.44.0

    345
    6.6.6.6 (metric 1) from 6.6.6.6 (6.6.6.6)
    Origin IGP, metric 0, localpref 100, valid, internal, multipath
    DMZ-Link Bw 750 kbytes
    345
    2.2.2.2 (metric 1) from 2.2.2.2 (2.2.2.2)
    Origin IGP, metric 0, localpref 100, valid, internal, multipath, best
    DMZ-Link Bw 625 kbytes

    Excellent post by the way. Great way to learn monster technologies in bite size pieces. ! !

  9. borekbp says:

    as I understand – you are loadbalance outgoing traffic only? so like you mentioned – you need to route your AS126 with 1.1.1.0/24 and use prepend for example. But even when you are using prepend technik – you are not 100% loadbalansed over unequal links

  10. Amar says:

    Hey, Great post !!!

    One Question though why we need these commands on R6
    & R2, shouldn’t “maximum-paths ibgp 2″ on R1 should be enough. Why we need to tell R2 & R6 that there are multiple paths.

    R6(config-router-af)#maximum-paths ibgp 2
    R6(config-router-af)#maximum-paths 2

    R6(config)#router bgp 126
    R6(config-router)#neighbor 1.1.1.1 remote-as 126
    R6(config-router)#neighbor 1.1.1.1 update-source lo0
    R6(config-router)#neighbor 10.56.0.5 remote-as 345
    R6(config-router)#address-family ipv4
    R6(config-router-af)#bgp dmzlink-bw
    R6(config-router-af)#neighbor 1.1.1.1 activate
    R6(config-router-af)#neighbor 1.1.1.1 next-hop-self
    R6(config-router-af)#neighbor 1.1.1.1 send-community both
    R6(config-router-af)#neighbor 10.56.0.5 activate
    R6(config-router-af)#neighbor 10.56.0.5 dmzlink-bw
    R6(config-router-af)#maximum-paths ibgp 2
    R6(config-router-af)#maximum-paths 2
    R6(config-router-af)#int fa 0/0
    R6(config-if)#bandwidth 6000

  11. Amar- You are right. Only R1 needed the maximum-path statements. I just updated the example, and removed the additional statements.

    Thanks.

  12. Dom says:

    Thanks for a great post – I needed to know how to verify the traffic was being load shared in an unequal fashion and this showed me.

  13. Lavona Phanco says:

    Thank you so a lot for your good suggestion.I am going to give it a have a go with.

  14. Daniel Freitas says:

    You made very clear the explanations about how to do this load sharing! Thanks a lot!

 
