Insights on CBWFQ

Try assessing your understanding of Cisco's CBWFQ by looking at the following example:

class-map match-all HTTP_R6
 match access-group name HTTP_R6
!
policy-map CBWFQ
 class HTTP_R6
  bandwidth remaining percent 5
!
interface Serial 0/1
 bandwidth 128
 clock rate 128000
 service-policy output CBWFQ

Then answer a question about the following scenario: two TCP flows (think of them as HTTP file transfers) cross the Serial 0/1 interface. One flow matches the class HTTP_R6; the other, marked with IP Precedence 7 (pretty high), does not match any user-defined class. The traffic overwhelms the interface, so the system engages CBWFQ. The question is: how will CBWFQ share the interface bandwidth between the two flows?

This is not an easy one: you can't answer correctly if you stick with the bandwidth allocation logic described on the DocCD. First, look at the "answer":

Rack1R4#show policy-map interface serial 0/1
Serial0/1

Service-policy output: CBWFQ

Class-map: HTTP_R6 (match-all)
10982 packets, 6368035 bytes
30 second offered rate 95000 bps, drop rate 0 bps
Match: access-group name HTTP_R6
Queueing
Output Queue: Conversation 41
Bandwidth remaining 5 (%)Max Threshold 64 (packets)
(pkts matched/bytes matched) 10973/6363227
(depth/total drops/no-buffer drops) 8/0/0

Class-map: class-default (match-any)
3429 packets, 1765978 bytes
30 second offered rate 29000 bps, drop rate 0 bps
Match: any

The bandwidth is shared in a proportion of approximately 3.3:1 (95000 bps against 29000 bps). The user-configured class gets more bandwidth than “class-default”, even though we reserved just 5% of the available bandwidth for it. This does not look deterministic: what about the idea that unused bandwidth goes to “class-default”? This behavior is inherent to CBWFQ's design, as explained below.

The following are the condensed facts about CBWFQ:

1) CBWFQ works the same way as WFQ! You just have the option to use flexible criteria for flow classification using MQC syntax. If you feel lost thinking about WFQ, you may find more information in the post "QoS Teaser: Hold-Queue and WFQ".
2) CBWFQ shares interface bandwidth in inverse proportion to flow "weights" (these weights are based on the bandwidth settings, as we will see later). If you have N flows, where flow “i” has a weight of Weight(i), CBWFQ guarantees flow “i” the following share of bandwidth: Share(i)=(Weight(1)+...+Weight(i)+...+Weight(N))/Weight(i). Thus, flows with smaller weights get more bandwidth. Note that you should treat those values as relative to each other, not as absolute shares.
3) CBWFQ assigns weights to dynamic conversations (flows that don’t match any user-defined class) using the formula Weight(i) = 32384/(IP_Precedence(i)+1), the same logic found in WFQ.
4) CBWFQ assigns a weight to a user-defined class using one of the following formulas:

4.1) Weight(i) = Const*Interface_BW/Class_BW if the class is configured with an explicit bandwidth value.
4.2) Weight(i) = Const*100/Bandwidth_Percent if the class is configured with either bandwidth percent or bandwidth remaining percent.

Here Const is a special constant that depends on the number of flow queues in WFQ. Cisco never gave any explicit formula, but it looks like the constants are chosen according to the following table (a result of some sweaty modeling experiments):

Number of flows	Constant
16	64
32	64
64	57
128	30
256	16
512	8
1024	4
2048	2
4096	1
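
The weight rules above are easy to check numerically. The following Python sketch is only an illustration: the function names are made up for this example, and the constants are the empirically derived values from the table above, not figures published by Cisco.

```python
# Empirical constants from the table above (number of flow queues -> Const).
# These come from modeling experiments, not from Cisco documentation.
WFQ_CONST = {16: 64, 32: 64, 64: 57, 128: 30, 256: 16,
             512: 8, 1024: 4, 2048: 2, 4096: 1}


def dynamic_weight(ip_precedence):
    """Weight of a dynamic (unclassified) flow: 32384 / (precedence + 1)."""
    return 32384 / (ip_precedence + 1)


def class_weight_kbps(flow_queues, interface_kbps, class_kbps):
    """Weight of a class configured with an explicit bandwidth value."""
    return WFQ_CONST[flow_queues] * interface_kbps / class_kbps


def class_weight_percent(flow_queues, percent):
    """Weight of a class configured with bandwidth [remaining] percent."""
    return WFQ_CONST[flow_queues] * 100 / percent


print(dynamic_weight(7))                 # dynamic flow with IP Precedence 7
print(class_weight_percent(32, 5))       # 5% class, 32 flow queues
print(class_weight_kbps(32, 128, 64))    # 64 Kbps class on a 128 Kbps link
```

Smaller results mean better scheduling treatment, so the 5% class already beats the best possible dynamic flow.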

That’s all the magic behind the meaning of the “bandwidth” statement. As you can see, user-configurable classes are nothing more than separate conversations within the CBWFQ flow pool. Each flow is simply a FIFO queue, scheduled according to its sequence number, which is proportional to the flow weight (by the way, the formula for the sequence number is the same as with WFQ!). All flows share the buffer pool that the system allocates to CBWFQ; you set its size with the interface-level command hold-queue N out, where N is the number of buffers. In addition, you can specify the WFQ Congestive Discard Threshold using the queue-limit command under “class-default” in your policy map.

!
! Implementing pure WFQ using MQC syntax
!
policy-map WFQ
 class class-default
  !
  ! number of dynamic flows
  !
  fair-queue 256
  !
  ! WFQ Congestive Discard Threshold
  !
  queue-limit 32
!
interface Serial 0/1
 no fair-queue
 service-policy output WFQ
 !
 ! WFQ total size
 !
 hold-queue 4096 out

Now look at the following table:

Flow/Conversation Numbers	Weight	Description
Below 2^N	Weight(i)=32384/(IP_Precedence(i)+1)	Dynamic flows, unclassified traffic. This is the classic “fair-queue”.
2^N…2^N+7	Weight(i)=1024	Link Queues. Routing updates, Layer 2 Keepalives etc. Basically it’s the traffic marked as PAK_PRIORITY inside the router.
2^N+8	Weight(i)=0	LLQ, or the priority queue. CBWFQ always services this queue first, but de-queued packets are policed using the defined token bucket parameters.
Above 2^N+8	Weight(i)=Const*Interface_BW/Class_BW OR Weight(i)=Const*100/Bandwidth_Percent	User-defined classes. CBWFQ treats those classes like RSVP flows, with relatively low weights. Their weights are almost always better than the weights of dynamic flows.

A few notes here. The value of “N” is the base parameter that defines the number of dynamic flows for CBWFQ. Remember that you can only specify the number of flows as a power of 2, and “N” is that power. You configure the number of dynamic flows using the fair-queue command under “class-default”. Next, CBWFQ uses a special hash function to distribute unclassified packets among dynamic conversations. They have the same weights as they would have with classic WFQ. Now the Link Queues: we remember them from WFQ as well. The system uses those queues to send critical control plane traffic. The link queues have a weight of 1024, which is much better than any dynamic flow weight and, as we will see later, almost on par with user-defined class weights. Since control plane traffic is intermittent (unless you pump huge BGP tables ;) those flows do not affect bandwidth distribution too much. By the way, the well-known max-reserved-bandwidth 75% rule specifically ensures that link queues will not starve, by preventing a user from allocating too much "weight" to the user-defined classes.
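
The conversation numbering in the table can be expressed as a small lookup. This is a hedged Python sketch of the ranges as described above; the function name and the returned labels are made up for illustration, and the example assumes 32 dynamic queues.

```python
def conversation_type(conversation, dynamic_queues):
    """Map a conversation number to its queue type, following the table above.

    dynamic_queues is 2^N, the value configured with fair-queue under
    class-default.
    """
    if dynamic_queues < 1 or dynamic_queues & (dynamic_queues - 1):
        raise ValueError("number of dynamic queues must be a power of 2")
    if conversation < dynamic_queues:
        return "dynamic flow"
    if conversation < dynamic_queues + 8:
        return "link queue"
    if conversation == dynamic_queues + 8:
        return "priority queue (LLQ)"
    return "user-defined class"


# Assuming 32 dynamic queues: 0..31 dynamic, 32..39 link queues, 40 LLQ.
for number in (17, 35, 40, 41):
    print(number, conversation_type(number, 32))
```

Under that assumption, Conversation 41 from the show policy-map output earlier is the first slot available to a user-defined class.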

Now, take a quick look at the user-defined classes. Think of two extreme cases:

a) We assign all interface bandwidth to the class (you may need max-reserved-bandwidth 100). Then the weight is 64 in the worst case of just 16 or 32 flow queues, which is better than almost any other possible weight. This class will get what it wants (almost), no matter what :)
b) We assign a small amount of interface bandwidth to the class, e.g. 2%. Then the class weight is 64*100/2=3200 in the worst case of 16 or 32 flow queues. This is getting close to 32384/(7+1)=4048, which is the weight of the “best” dynamic queue, the one with IP Precedence 7.

From those two facts, we may conclude that user-defined classes dominate dynamic flows almost all the time, unless they have small shares of bandwidth configured. Of course, the priority queue beats all, but it is rate-limited (conditionally!), so it can’t starve other conversations, unless you set the policer rate to the whole interface bandwidth. By the way, you need to account for layer 2 overhead when setting the rate-limit bandwidth for a priority class. This is important when you are working with voice traffic flows, which have small packets, so the layer 2 overhead is significant compared to the payload.
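
To put a number on "small shares of bandwidth", one can solve Const*100/Percent = 32384/(IP_Precedence+1) for the percent value. The Python sketch below does that; the function name is hypothetical and the constant 64 assumes the worst case of 16 or 32 flow queues from the table of constants.

```python
def break_even_percent(const, ip_precedence):
    """Bandwidth percent at which a class weight equals a dynamic flow weight.

    Below this percent value the user-defined class has a worse (larger)
    weight than a dynamic flow with the given IP Precedence.
    """
    dynamic = 32384 / (ip_precedence + 1)
    return const * 100 / dynamic


for precedence in (0, 5, 7):
    percent = break_even_percent(64, precedence)
    print(f"precedence {precedence}: break-even at {percent:.2f}%")
```

With these assumptions, a class loses to a precedence 7 dynamic flow only when it is configured with less than about 1.6% of the bandwidth.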

Keeping all those facts in mind, let’s look at the following output – the CBWFQ queue contents from the first configuration sample:

Rack1R4#show queueing interface serial 0/1
Interface Serial0/1 queueing strategy: fair
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: Class-based queueing
Output queue: 12/1000/64/0 (size/max total/threshold/drops)
Conversations 2/5/32 (active/max active/max total)
Reserved Conversations 1/1 (allocated/max allocated)
Available Bandwidth 96 kilobits/sec

(depth/weight/total drops/no-buffer drops/interleaves) 5/1280/0/0/0
Conversation 41, linktype: ip, length: 580
source: 155.1.146.6, destination: 155.1.108.10, id: 0x47F9, ttl: 254,
TOS: 0 prot: 6, source port 80, destination port 11003

(depth/weight/total drops/no-buffer drops/interleaves) 7/4048/0/0/0
Conversation 17, linktype: ip, length: 580
source: 155.1.146.1, destination: 155.1.45.5, id: 0xADC9, ttl: 254,
TOS: 224 prot: 6, source port 80, destination port 56546

Note the weight values for each flow. The weight for the user-defined HTTP conversation is 64*100/5=1280, while the weight for the dynamic flow is 32384/(7+1)=4048. Thus, using the formula for bandwidth shares, we obtain the following:

Share(1)=(4048+1280)/1280=4.16
Share(2)=(4048+1280)/4048=1.32

Normalize that by dividing by the smallest number, 1.32, and you get the proportion “3.16:1”, which is pretty close to the distribution we've seen above. Some unfairness is probably due to the slow line and large serialization delays.
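
The same arithmetic can be reproduced in a few lines of Python. This is a sketch under the assumptions of this post (constant 64 for 32 flow queues); the helper name relative_shares is made up for the example, and the measured rates are the 30-second offered rates from the show policy-map output.

```python
def relative_shares(weights):
    """Turn per-flow weights into fractions of the link.

    Share(i) = sum(weights) / Weight(i), then normalized so that all
    shares add up to 1.
    """
    total = sum(weights.values())
    raw = {name: total / weight for name, weight in weights.items()}
    norm = sum(raw.values())
    return {name: value / norm for name, value in raw.items()}


weights = {
    "HTTP_R6": 64 * 100 / 5,             # user-defined class, 5%
    "dynamic prec 7": 32384 / (7 + 1),   # unclassified flow, IP Precedence 7
}
shares = relative_shares(weights)
for name, fraction in shares.items():
    print(f"{name}: {fraction:.1%} of the link")

predicted = shares["HTTP_R6"] / shares["dynamic prec 7"]
measured = 95000 / 29000
print(f"predicted {predicted:.2f}:1, measured {measured:.2f}:1")
```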

To summarize what we learned so far:

1) CBWFQ is nothing else than WFQ on steroids ;)
2) User-defined classes have much better scheduling weights than any dynamic flow queue. Therefore, the bandwidth allocated to a dynamic queue is usually small compared to any user-defined class.
3) The scheduler shares interface bandwidth in relative proportions. For example, if you have two classes configured with bandwidth values of “32” and “64” and an interface bandwidth of 128, that does not mean the system will allocate 32Kbps and 64Kbps to the classes. It means that, in case of congestion, CBWFQ will share bandwidth in the proportion 32:64=1:2 between the two classes, plus some small amount for class-default. If you want the bandwidth values to be "realistic", ensure that all your bandwidth values sum to the interface bandwidth. The same goes for bandwidth percentages.
4) If you want the scheduler to honor class-default traffic, assign it an explicit bandwidth value. This will effectively disable dynamic flow queues (though preserve Link Queues) and assign all unclassified traffic to a single FIFO queue.

Now a few simple rules for understanding how the various CBWFQ commands apply in case of interface congestion. All those rules assume that the class bandwidth settings are large enough to make the share taken by dynamic flows negligible.

1) If you have priority bandwidth configured in your policy map, subtract this value from the total interface bandwidth to yield the amount of bandwidth available to other classes. The priority queue is only rate-limited under interface congestion, and in that case it cannot get more bandwidth than configured with the priority statement. Note that in the following text we refer to priority bandwidth as configured in Kbps, but you may replace its value with priority_percent*interface_bandwidth/100 if you configured the rate in percent.

2) Suppose that you configured user-defined classes with the bandwidth statement. First, the IOS CLI will check that:

bandwidth(1)+…+bandwidth(N) + priority <= max_reserved_bandwidth*interface_bandwidth/100.

In case of congestion, the scheduler allocates the following amount of bandwidth to class “k”.

share(k)=(interface_bandwidth - priority) * bandwidth(k)/(bandwidth(1)+…+bandwidth(N))  Kbps.

Therefore, as mentioned above, if you want the share to be equal to the bandwidth you set for the class, make sure all bandwidth settings sum to the interface bandwidth.
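
As a worked illustration of rule 2, here is a small Python sketch of the admission check and the congestion shares. The function names are hypothetical, the 75% default comes from the max-reserved-bandwidth rule mentioned earlier, and the model ignores the small amount of bandwidth taken by dynamic flows and link queues.

```python
def admission_ok(class_kbps, priority_kbps, interface_kbps,
                 max_reserved_percent=75):
    """Check that configured bandwidth fits under max-reserved-bandwidth."""
    reserved = sum(class_kbps) + priority_kbps
    return reserved <= max_reserved_percent * interface_kbps / 100


def congestion_shares(class_kbps, priority_kbps, interface_kbps):
    """Bandwidth (Kbps) each class gets under congestion, per the formula above."""
    available = interface_kbps - priority_kbps
    total = sum(class_kbps)
    return [available * bw / total for bw in class_kbps]


# Two classes with bandwidth 32 and 64 on a 128 Kbps link, no priority class.
print(admission_ok([32, 64], 0, 128))
print([round(s, 2) for s in congestion_shares([32, 64], 0, 128)])

# Same link with a 32 Kbps priority class and two 32 Kbps classes.
print(admission_ok([32, 32], 32, 128))
print(congestion_shares([32, 32], 32, 128))
```

In the first case the classes end up with roughly 42.67 and 85.33 Kbps rather than 32 and 64, which is the 1:2 proportion applied to the whole link.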

3) Another case: you configured your classes with bandwidth percent. The IOS CLI performs the following assertion:

[bw_percent(1)+…+bw_percent(N)]*interface_bandwidth/100 + priority <= max_reserved_bandwidth*interface_bandwidth/100

In case of congestion, the scheduler allocates the following amount of bandwidth to class “k”.

share(k)= (interface_bandwidth-priority) * bw_percent(k)/(bw_percent(1)+…+bw_percent(N)).

4) Final case: you configured your classes with bandwidth remaining percent. The IOS CLI performs the following assertion:

bw_rem_percent(1)+…+bw_rem_percent(N) <= 100%

In case of congestion, the scheduler allocates the following amount of bandwidth to class “k”.

share(k)= (interface_bandwidth - priority) * bw_rem_percent(k)/(bw_rem_percent(1)+…+bw_rem_percent(N)).

The funniest thing is that this is the same formula as in the case of simple bandwidth percent. However, the admission check is much simpler, and it lets you forget about all those bandwidth computations.
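
The Python sketch below contrasts the two percent-based cases. All names are made up for illustration, percent values are treated as plain percentages of the interface bandwidth, and the 75% default reflects the max-reserved-bandwidth rule; treat it as a model of the rules in this post, not as IOS behavior verified line by line.

```python
def percent_shares(percents, priority_kbps, interface_kbps):
    """Congestion shares (Kbps); the same formula serves both percent modes."""
    available = interface_kbps - priority_kbps
    total = sum(percents)
    return [available * p / total for p in percents]


def bandwidth_percent_ok(percents, priority_kbps, interface_kbps,
                         max_reserved_percent=75):
    """Admission check for classes configured with bandwidth percent."""
    reserved = sum(percents) * interface_kbps / 100 + priority_kbps
    return reserved <= max_reserved_percent * interface_kbps / 100


def remaining_percent_ok(percents):
    """Admission check for classes configured with bandwidth remaining percent."""
    return sum(percents) <= 100


# 30% and 60% on a 128 Kbps link, no priority class.
percents = [30, 60]
print(bandwidth_percent_ok(percents, 0, 128))   # exceeds the 75% limit
print(remaining_percent_ok(percents))           # fits within 100%
print([round(s, 2) for s in percent_shares(percents, 0, 128)])
```

The shares depend only on the ratio between the percent values, so 30/60 and 20/40 produce the same split; only the admission check differs between the two modes.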

Now you know how to answer those tricky CBWFQ questions :) Cisco never published (at least I have never seen it) the details of the CBWFQ algorithm. All the information in this post is based on simulations and on what is publicly available about Cisco WFQ. Hope it helps!