Aug 17

Try assessing your understanding of Cisco’s CBWFQ by looking at the following example:

class-map match-all HTTP_R6
 match access-group name HTTP_R6
!
policy-map CBWFQ
 class HTTP_R6
  bandwidth remaining percent 5
!
interface Serial 0/1
  bandwidth 128
  clock rate 128000
  service-policy output CBWFQ

and answering a question on the following imaginary scenario: two TCP flows (think of them as HTTP file transfers) are going across the Serial 0/1 interface. One of the flows matches the class HTTP_R6, and the other flow, marked with IP Precedence 7 (pretty high), does not match any class. The traffic overwhelms the interface, so the system engages CBWFQ. Now the question is: how will CBWFQ share the interface bandwidth among the flows?

This is not an easy one – you can’t answer correctly if you stick with the bandwidth allocation logic described in the DocCD. First, look at the “answer”:

Rack1R4#show policy-map interface serial 0/1
 Serial0/1 

  Service-policy output: CBWFQ

    Class-map: HTTP_R6 (match-all)
      10982 packets, 6368035 bytes
      30 second offered rate 95000 bps, drop rate 0 bps
      Match: access-group name HTTP_R6
      Queueing
        Output Queue: Conversation 41
        Bandwidth remaining 5 (%)Max Threshold 64 (packets)
        (pkts matched/bytes matched) 10973/6363227
        (depth/total drops/no-buffer drops) 8/0/0

    Class-map: class-default (match-any)
      3429 packets, 1765978 bytes
      30 second offered rate 29000 bps, drop rate 0 bps
      Match: any

The bandwidth is shared approximately in the proportion 3.3:1. The user-configured class gets more bandwidth than “class-default”, even though we reserved just 5% of the available bandwidth for it. This does not look deterministic: what about the idea that the unused bandwidth goes to “class-default”? Below is an explanation of this behavior, which is inherent to CBWFQ’s design.

The following are the condensed facts about CBWFQ:

1) CBWFQ works the same way as WFQ! You just have the option of using flexible criteria for flow classification via MQC syntax. If you feel lost thinking about WFQ, you may find more information in the post QoS Teaser: Hold-Queue and WFQ.
2) CBWFQ shares interface bandwidth inversely proportional to flow “weights” (these weights are based on the bandwidth settings, as we will see later). If you have N flows, where flow “i” has a weight of Weight(i), then CBWFQ guarantees flow “i” the following share of bandwidth: Share(i) = (Weight(1)+…+Weight(i)+…+Weight(N))/Weight(i). Thus, flows with smaller weights get more bandwidth. Note that you should treat those values as relative to each other, not as absolute shares.
3) CBWFQ assigns weights to dynamic conversations (flows that don’t match any user-defined class) using the formula Weight(i) = 32384/(IP_Precedence(i)+1), the same logic found in WFQ.
4) CBWFQ assigns a weight to a user-defined class using either of the following formulas:

4.1) Weight(i) = Const*Interface_BW/Class_BW if the class is configured with an explicit bandwidth value.
4.2) Weight(i) = Const*100/Bandwidth_Percent if the class is configured with either bandwidth percent or bandwidth remaining percent.

Here, Const is a special constant that depends on the number of flow queues in WFQ. Cisco never gave an explicit formula, but it looks like the constants are chosen according to the following table (a result of some sweaty modeling experiments); the sketch after the table puts these weight formulas together:

Number of flows    Constant
16                 64
32                 64
64                 57
128                30
256                16
512                8
1024               4
2048               2
4096               1
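
To make the arithmetic concrete, here is a minimal Python sketch that puts the weight formulas from items 3 and 4 together with the constants table. The table and formulas are reconstructed from modeling experiments rather than official Cisco documentation, so treat the numbers as approximations; the function and variable names are ours.

# Approximate CBWFQ weight formulas as reconstructed in this post.
# The constant table is an empirical result, not official Cisco data.

WFQ_CONSTANT = {16: 64, 32: 64, 64: 57, 128: 30, 256: 16,
                512: 8, 1024: 4, 2048: 2, 4096: 1}

def dynamic_flow_weight(ip_precedence):
    """Weight of an unclassified (dynamic) conversation."""
    return 32384 // (ip_precedence + 1)

def user_class_weight(num_flows, interface_bw=None, class_bw=None,
                      bandwidth_percent=None):
    """Weight of a user-defined class, from either an explicit
    'bandwidth <kbps>' value or a 'bandwidth [remaining] percent' value."""
    const = WFQ_CONSTANT[num_flows]
    if bandwidth_percent is not None:
        return const * 100 // bandwidth_percent
    return const * interface_bw // class_bw

# Examples: a precedence-0 dynamic flow versus a class given 5 percent,
# on a CBWFQ with 32 dynamic queues (as in the first example).
print(dynamic_flow_weight(0))                                # 32384
print(user_class_weight(num_flows=32, bandwidth_percent=5))  # 1280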

That’s all the magic behind the meaning of the “bandwidth” statement. As you can see, user-configured classes are nothing more than separate conversations within the CBWFQ flow pool. A flow is simply a FIFO queue, scheduled according to its sequence numbers, which grow in proportion to the flow weight (by the way, the sequence number formula is the same as with WFQ!). All flows share the buffer pool that the system allocates to CBWFQ, sized with the interface-level command hold-queue N out, where N is the number of buffers. In addition, you can specify the WFQ Congestive Discard Threshold using the queue-limit command under the “class-default” of your policy map.

!
! Implementing pure WFQ using MQC syntax
!
policy-map WFQ
 class class-default
   !
   ! number of dynamic flows
   !
   fair-queue 256
   !
   ! WFQ Congestive Discard Threshold
   !
   queue-limit 32
!
interface Serial 0/1
 no fair-queue
 service-policy output WFQ
 !
 ! WFQ total size
 !
 hold-queue 4096 out

Now look at the following table:

Flow/Conversation Numbers | Weight | Description
Below 2^N | Weight(i) = 32384/(IP_Precedence(i)+1) | Dynamic flows, unclassified traffic. This is the classic “fair-queue”.
2^N…2^N+7 | Weight(i) = 1024 | Link queues: routing updates, Layer 2 keepalives, etc. Basically, this is the traffic marked as PAK_PRIORITY inside the router.
2^N+8 | Weight(i) = 0 | LLQ, the priority queue. CBWFQ always services this queue first, but de-queued packets are policed using the defined token bucket parameters.
Above 2^N+8 | Weight(i) = Const*Interface_BW/Class_BW or Const*100/Bandwidth_Percent | User-defined classes. CBWFQ treats these classes like RSVP flows, with relatively low weights. Their weights are almost always better than the weights of dynamic flows.

A few notes here. The value of “N” is the base parameter that defines the number of dynamic flows for CBWFQ. Remember that you can only specify the number of flows as a power of 2, and “N” is that exponent. You configure the number of dynamic flows using the fair-queue command under “class-default”. Next, CBWFQ uses a special hash function to distribute unclassified packets among the dynamic conversations; they get the same weights as they would with classic WFQ. Now the link queues: recall that they exist with WFQ as well. The system uses those queues to send critical control-plane traffic. The link queues have a weight value of 1024, which is much better than any dynamic flow weight and, as we will see, almost on par with the weights of user-defined classes. Since control-plane traffic is intermittent (unless you pump huge BGP tables ;) those flows do not affect bandwidth distribution much. By the way, the well-known max-reserved-bandwidth 75% rule specifically ensures that link queues will not starve, by preventing a user from allocating too much “weight” to the user-defined classes.
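
If it helps to visualize the numbering, here is a small Python sketch that maps a conversation number to its queue type, assuming the reverse-engineered layout in the table above (the text labels are ours):

def classify_conversation(conv, num_dynamic_flows):
    """Map a CBWFQ conversation number to its queue type, assuming the
    (reverse-engineered) conversation layout from the table above."""
    n = num_dynamic_flows   # must be a power of two, e.g. 'fair-queue 256'
    if conv < n:
        return "dynamic flow, weight 32384/(precedence+1)"
    if conv <= n + 7:
        return "link queue (PAK_PRIORITY), weight 1024"
    if conv == n + 8:
        return "LLQ / priority queue, weight 0 (policed)"
    return "user-defined class, weight from the bandwidth statement"

# With the 32 dynamic queues seen in the example 'show queueing' output,
# conversation 41 is a user-defined class and conversation 17 a dynamic flow.
print(classify_conversation(41, 32))
print(classify_conversation(17, 32))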

Now, take a quick look at the user-defined classes. Think of two extreme cases:

a) We assign all the interface bandwidth to the class (you may need max-reserved-bandwidth 100). Then the weight value is 64 in the worst case of just 16 flow queues, which is better than almost any other possible weight. This class will get what it wants (almost), no matter what :)
b) We assign a small amount of interface bandwidth to the class, e.g. 2%. Then the class weight is 64*100/2=3200 in the worst case of 16 or 32 flow queues. This is getting close to 32384/(7+1)=4048, which is the weight value for the “best” dynamic queue, the one carrying IP Precedence 7 traffic.

From those two cases, we may conclude that user-defined classes dominate dynamic flows almost all the time, unless they are configured with very small shares of bandwidth. Of course, the priority queue beats them all, but it is (conditionally!) rate-limited, so it can’t starve the other conversations unless you set the policer rate to the whole interface bandwidth. By the way, you need to account for Layer 2 overhead when setting the rate limit for a priority class. This is important when you are working with voice traffic flows, which have small packet sizes, so the Layer 2 overhead is significant compared to the payload.
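
As a rough illustration only (the codec and encapsulation numbers below are assumptions: G.729 at 50 packets per second per call and a 4-byte HDLC header; adjust for your own codec and Layer 2 encapsulation), here is how that overhead changes the rate you should configure for the priority class:

# Rough illustration of Layer 2 overhead in LLQ sizing. Assumed values:
# G.729 voice (20-byte payload + 40 bytes of IP/UDP/RTP), 50 packets per
# second per call, and a 4-byte HDLC header.

calls = 10
pps_per_call = 50
l3_packet_bytes = 20 + 40        # payload + IP/UDP/RTP headers
l2_overhead_bytes = 4            # assumed HDLC framing

l3_rate_kbps = calls * pps_per_call * l3_packet_bytes * 8 / 1000
l2_rate_kbps = calls * pps_per_call * (l3_packet_bytes + l2_overhead_bytes) * 8 / 1000

print(l3_rate_kbps, l2_rate_kbps)   # 240.0 vs 256.0: size the 'priority' rate for the latter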

Keeping all those facts in mind, let’s look at the following output – the CBWFQ queue contents from the first configuration sample:

Rack1R4#show queueing interface serial 0/1
Interface Serial0/1 queueing strategy: fair
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: Class-based queueing
  Output queue: 12/1000/64/0 (size/max total/threshold/drops)
     Conversations  2/5/32 (active/max active/max total)
     Reserved Conversations 1/1 (allocated/max allocated)
     Available Bandwidth 96 kilobits/sec

  (depth/weight/total drops/no-buffer drops/interleaves) 5/1280/0/0/0
  Conversation 41, linktype: ip, length: 580
  source: 155.1.146.6, destination: 155.1.108.10, id: 0x47F9, ttl: 254,
  TOS: 0 prot: 6, source port 80, destination port 11003

  (depth/weight/total drops/no-buffer drops/interleaves) 7/4048/0/0/0
  Conversation 17, linktype: ip, length: 580
  source: 155.1.146.1, destination: 155.1.45.5, id: 0xADC9, ttl: 254,
  TOS: 224 prot: 6, source port 80, destination port 56546

Note the weight values for each flow. The weight for the user-defined HTTP conversation is 64*100/5=1280, while the weight for the dynamic flow is 32384/(7+1)=4048. Thus, using the formula for bandwidth shares, we obtain the following:

Share(1)=(4048+1280)/1280=4.1
Share(2)=(4048+1280)/4048=1.3

Normalize that by dividing by the smaller number, 1.3, and you will get the proportion 3.1:1, which is pretty close to the distribution we’ve seen above. Some unfairness is probably due to the slow line and large serialization delays.
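
A quick numeric check of this, as a sketch, using the weights from the show queueing output above:

# Verify the proportion from the observed weights: 1280 for the user
# class, 4048 for the precedence-7 dynamic flow.
weights = [1280, 4048]
total = sum(weights)
shares = [total / w for w in weights]   # [4.16, 1.32]
print(shares[0] / shares[1])            # ~3.16, the ~3.1:1 proportion mentioned above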

To summarize what we learned so far:

1) CBWFQ is nothing else than WFQ on steroids ;)
2) User-defined classes have much better scheduling weights than any dynamic flow queue. Therefore, the bandwidth allocated to a dynamic queue is usually small compared to any user-defined class.
3) The scheduler shares interface bandwidth in relative proportions. For example, if you have two classes configured with bandwidth values of “32” and “64” on an interface with bandwidth 128, that does not mean the system will allocate 32 Kbps and 64 Kbps to them. It means that in case of congestion, CBWFQ will share bandwidth between the two classes in the proportion 32:64=1:2, plus some small amount to class-default. If you want the bandwidth values to be “realistic”, ensure all your bandwidth values sum to the interface bandwidth. The same goes for bandwidth percentages.
4) If you want the scheduler to honor class-default traffic, assign it an explicit bandwidth value. This effectively disables the dynamic flow queues (though it preserves the link queues) and assigns all unclassified traffic to a single FIFO queue.

Now, a few simple rules to understand how the various CBWFQ command syntaxes apply in case of interface congestion. All those rules assume that the configured bandwidth values are large enough to make the dynamic flows’ share of bandwidth negligible.

1) If you have priority bandwidth configured in your policy map, subtract this value from the total interface bandwidth to yield the amount of bandwidth available to the other classes. The priority queue is only rate-limited under interface congestion, and in that case it cannot get more bandwidth than configured with the priority statement. Note that in the following text we refer to the priority bandwidth as configured in Kbps, but you may replace its value with priority_percent*interface_bandwidth/100 if you configured the rate in percent.

2) Suppose that you configured the user-defined classes with the bandwidth statement. First, the IOS CLI will check that:

bandwidth(1)+…+bandwidth(N) + priority <= max_reserved_bandwidth*interface_bandwidth/100

In case of congestion, the scheduler allocates the following amount of bandwidth to class “k”.

share(k)=(interface_bandwidth - priority) * bandwidth(k)/(bandwidth(1)+…+bandwidth(N))  Kbps.

Therefore, as mentioned above, if you want the share to equal the bandwidth you set for the class, make sure all bandwidth settings sum to the interface bandwidth. (A sketch after rule 4 below puts these share formulas into code.)

3) Another case: you configured your classes with bandwidth percent. The IOS CLI performs the following assertion:

[bw_percent(1)+…+bw_percent(N)]*interface_bandwidth/100 + priority <= max_reserved_bandwidth*interface_bandwidth/100

In case of congestion, the scheduler allocates the following amount of bandwidth to class “k”.

share(k)= (interface_bandwidth-priority) * bw_percent(k)/(bw_percent(1)+…+bw_percent(N)).

4) Final case: you configured your classes with bandwidth remaining percent. The IOS CLI performs the following assertion:

bw_rem_percent(1)+…+bw_rem_percent(N) <= 100%

In case of congestion, the scheduler allocates the following amount of bandwidth to class “k”.

share(k)= (interface_bandwidth - priority) * bw_rem_percent(k)/(bw_rem_percent(1)+…+bw_rem_percent(N)).

Curiously, this is the same formula as in the simple bandwidth percent case. However, the verification is much simpler, and it lets you forget about all those bandwidth computations.
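
To tie rules 1 through 4 together, here is a small Python sketch. The function names and dictionary-based interface are ours, and the CLI admission checks are approximations of the behavior described above, not an exact model of IOS:

def cbwfq_shares_kbps(interface_bw, class_bw, priority_bw=0, max_reserved_pct=75):
    """Per-class bandwidth under congestion when classes use an explicit
    'bandwidth <kbps>' (rules 1 and 2). class_bw maps class name to its
    configured kbps; the small class-default share is ignored."""
    if sum(class_bw.values()) + priority_bw > max_reserved_pct * interface_bw / 100:
        raise ValueError("rejected: exceeds max-reserved-bandwidth")
    available = interface_bw - priority_bw
    total = sum(class_bw.values())
    return {name: available * bw / total for name, bw in class_bw.items()}

def cbwfq_shares_percent(interface_bw, class_pct, priority_bw=0,
                         remaining=False, max_reserved_pct=75):
    """Per-class bandwidth under congestion for 'bandwidth percent'
    (remaining=False, rule 3) or 'bandwidth remaining percent'
    (remaining=True, rule 4). class_pct maps class name to a percentage."""
    total_pct = sum(class_pct.values())
    if remaining:
        if total_pct > 100:
            raise ValueError("rejected: remaining percentages exceed 100")
    elif total_pct * interface_bw / 100 + priority_bw > max_reserved_pct * interface_bw / 100:
        raise ValueError("rejected: exceeds max-reserved-bandwidth")
    available = interface_bw - priority_bw
    return {name: available * pct / total_pct for name, pct in class_pct.items()}

# Two classes with 'bandwidth 32' and 'bandwidth 64' on a 128 kbps link get
# the 1:2 proportion (roughly 42.7 and 85.3 kbps), not a hard 32 and 64 kbps:
print(cbwfq_shares_kbps(128, {"A": 32, "B": 64}))
# Both percent forms split the non-priority bandwidth the same way under
# congestion; only the CLI admission check differs:
print(cbwfq_shares_percent(128, {"HTTP_R6": 5, "OTHER": 20}, remaining=True))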

Now you know how to answer those tricky CBWFQ questions :) Cisco never published (at least I have never seen it) the details of the CBWFQ algorithm. All the information in this post is based on simulations and the information available on Cisco’s WFQ. Hope it helps!

About Petr Lapukhov, 4xCCIE/CCDE:

Petr Lapukhov's career in IT began in 1988 with a focus on computer programming, and progressed into networking with his first exposure to Novell NetWare in 1991. Initially involved with Kazan State University's campus network support and UNIX system administration, he went through the path of becoming a networking consultant, taking part in many network deployment projects. Petr currently has over 12 years of experience working in the Cisco networking field, and is the only person in the world to have obtained four CCIEs in under two years, passing each on his first attempt. Petr is an exceptional case in that he has been working with all of the technologies covered in his four CCIE tracks (R&S, Security, SP, and Voice) on a daily basis for many years. He divides his remaining time between teaching classes, developing self-paced products, studying for the CCDE Practical and the CCIE Storage Lab Exam, and completing his PhD in Applied Mathematics.




33 Responses to “Insights on CBWFQ”

 
  1. Jun Prieto says:

    Perfect! Very informative. You just pulled out a thorn that has been bothering me for quite some time. Thanks! :) However, does it behave similarly with LLQ?

  2. To: Jun Prieto

    LLQ is just a single conversation (number 2^N+8) inside the CBWFQ flow pool. CBWFQ assigns a weight value of 0 to this flow and treats it as a priority queue. However, when dequeueing packets, CBWFQ polices this flow using the configurable settings: speed and burst size. This way, the priority queue does not starve the other flow queues.

  3. NTllect says:

    Great article, thanks Petr! QoS mechanics is really your favorite one ;)

  4. To: NTIlect
    Yeah, since you can hardly find any good information on the subject! Cisco documentation is a mess ;)

  5. Alexander Horwatt says:

    Excellent article describing CBWFQ internals! What was your source for this information?

  6. To: Alexander Horwatt

    Some Cisco documents on WFQ and Vegesna’s IP Quality of Service, plus a number of simulation runs to put all the pieces together :)

  7. Pavel Stefanov says:

    Great article, Petr! I just have one question – inside the different classes including the class-default does the packet length come into play and are sequence numbers assigned to the packets? Why I ask this is because one of the Cisco Press books says that when using WFQ every packet is assigned a SN = Previous_SN + (weight*new_packet_length) and the packet with the lowest SN is taken out from the queues for transmission. Moreover, none of the documentation for IOS 12.4 mentions packet length for WFQ. I am not sure this can be labbed, but who knows, you’re God :)

  8. To: Pavel Stefanov

    Absolutely, the SN is still based on adding packet lengths scaled by the weight, the same algorithm as found in WFQ. However, I omitted this detail for the sake of simplicity, because CBWFQ/WFQ use the packet size to ensure “fairness” but still allocate bandwidth proportional to flow weights. Think of using the packet size as analogous to using a deficit counter with round-robin scheduling.
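
    Here is a tiny simplified model of that scheduling (a sketch only, not the exact IOS implementation; the flow definitions are illustrative):

import heapq

def wfq_transmit_order(flows):
    """flows: list of (weight, [packet lengths for that flow]).
    Each flow's sequence number advances by weight * packet_length;
    the scheduler always sends the packet with the lowest SN."""
    queues = [list(pkts) for _, pkts in flows]
    sn = [0] * len(flows)
    heap = []
    for i, (weight, _) in enumerate(flows):
        if queues[i]:
            sn[i] += weight * queues[i][0]
            heapq.heappush(heap, (sn[i], i))
    order = []
    while heap:
        _, i = heapq.heappop(heap)
        order.append((i, queues[i].pop(0)))
        if queues[i]:
            sn[i] += flows[i][0] * queues[i][0]
            heapq.heappush(heap, (sn[i], i))
    return order

# Equal weights, one flow with 1500-byte packets and one with 100-byte packets:
# the small packets are all scheduled first, thanks to their lower SNs.
print(wfq_transmit_order([(100, [1500, 1500]), (100, [100, 100, 100])]))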

  9. Pavel Stefanov says:

    Thank you very much, Petr!

  10. Pushkar Bhatkoti says:

    Petr,
    you’ve really nailed it.

    BTW, in your blog above, was the value 32384 selected for any particular reason,
    or is it just a random number you took?

    -Pushkar Bhatkoti

    3) CBWFQ assigns weights to dynamic conversation (flows that don’t match any user-defined class) using the formula Weight(i) = 32384/(IP_Precedence(i)+1)

  11. To: Pushkar Bhatkoti

    I first saw this formula in Cisco’s WFQ documents. You can easily verify it by sending a flow of packets with IP precedence of zero: this will result in a weight of “32384”. It used to be 4096 in old IOS versions. You may also see this formula in Vegesna’s “IP Quality of Service” book, which incorrectly states the numerator as “32768”.

    The choice of this number is probably dictated by some sort of numerical optimization, to speed up the internal calculations.

    It is important to understand that, from a logical standpoint, this numerator does not matter: you can choose any value. You may also rewrite the formula for user-defined classes in a “normalized” format:

    Weight=32384*InterfaceBW/(512*ClassBW) [approximately]

    And see how this value compares to dynamic flows.

  12. Atif says:

    Excellent! I am anxiously waiting for more articles on QoS!

  13. icebale says:

    As far as I understand it: if only one class has incoming traffic, then it may fill 75% of all the bandwidth? Is that right?

  14. Sirus Moghadasian says:

    Petr

    All I can say is it’s very impressive.
    It’s like IEEE transaction on CBWFQ ;)

  15. Robert says:

    According to Cisco this is incorrect:

    “If you have priority bandwidth configured in your policy map, subtract this value from total interface bandwidth to yield the amount of bandwidth available to other classes”

    http://www.cisco.com/en/US/tech/tk543/tk757/technologies_tech_note09186a0080103eae.shtml#summaryofdifferences

    Although the bandwidth guarantees provided by the bandwidth and priority commands have been described with words like “reserved” and “bandwidth to be set aside”, neither command implements a true reservation. In other words, if a traffic class is not using its configured bandwidth, any unused bandwidth is shared among the other classes.

    Any comments appreciated.


  17. Alex says:

    Hi Petr. Great article; it appears there are more layers to this onion than I originally thought.

    I am happy that CB flows within WFQ are conversations and their weight is proportional to %|BW. But if two flows have the same %|BW allocation, and one contains large packets while the other contains smaller packets, will the small ones still be scheduled first? I.e., can a small packet received later be transmitted before a larger one due to a more preferable sequence number?

    Alex

    • Yes, smaller packets give a smaller increment to the sequence-number-based scheduling. Essentially, the SN for a flow is incremented by packet_size*weight. If two flows with equal weight differ in their packet sizes, then the flow with the small packets will be given more frequent access to the link.

  18. Amit says:

    Excellent information Petr. Well done!!

  19. Amit says:

    Although, I have a question-

    According to the table, there is only one queue for LLQ. Does it mean that there can be only one “priority” command in a policy?

    Can you please explain in detail?

    Thanks.

    • @Amit

      You may configure multiple classes with the “priority” statement. Every class will have a “conditional” policer configured in effect. All classes, in turn, will feed a SINGLE priority queue in FIFO manner – i.e. all classes will share a single queue. The result could be hard to predict, thus it is recommended to have no more than one class configured with the priority statement.

  20. Amit says:

    Yeah, that’s a great point Petr. Multiple “priority” classes end up in a single FIFO queue. That has got to be a big NO-NO.

    BTW, I had “fair-queue” configured in the class-default class but could not see any conversations that matched the class in the output of the “show queueing interface serial 0/1” command. Is there any other way to check the “weight” of individual flows?

    Thanks.
    Amit.

  21. sonny says:

    This is great, thanks Petr Lapukhov.

  22. Arul says:

    Hi Petr,
    You are doing a great job!!!!

    I have a question for you. If we are configuring two classes…

    For example,

    Total interface bandwidth is 20 Mbps.
    I am creating a class named FTP-access with 2 Mbps bandwidth and a queue limit of 1024 packets.
    Another class, ASD-access, with a bandwidth of 2 Mbps and a queue limit of 1024 packets.
    And another, default, class is created.
    Now, in the class FTP-access I am going to allow only file transfers, and they should stay within the 2 Mbps bandwidth; if it is exceeded, the packets will be dropped.

    My question is: when there is no file transfer, is that 2 Mbps used by the other classes, or does it remain vacant?
    How much bandwidth should we assign to the classes?

  23. Matt says:

    Petr.
    What IOS level were you testing this with? Also were these routers or switches? Thx

  24. Dave says:

    Excellent article that talks about the internals of CBWFQ!

  25. sa says:

    I have a question that has been bugging me for a long time. What happens with priority class (say voice) when the utilization of the bandwidth is at 100%? I know that it’s prioritized and that it’s guaranteed a certain percentage of the bandwidth, but in the real world when b/w utilization is at 100% all bets could be off. Is this the case, and would it make sense to artificially limit the total b/w utilization to say 85% to prevent congestion from negatively affecting high priority packets?

    • @sa

      As long as the LLQ relative bandwidth is say below 30% and everything else is non-priority traffic, voice will not degrade even with 100% utilization. The reason being, to LLQ packets it looks like the whole interface bandwidth is *dedicated* to them, due to priority service mechanism. As soon as there is a priority packet, it is served first. The only problem could be high serialization delay incurred by non-LLQ packets.

      Therefore, you will only face issues with Voice quality if you allocate too much voice traffic on the interface. The normal rule of thumb is 33%, and it does not really matter if interface is utilized 100% (well in fact it does, as say control plane traffic will be dropped, so getting above 75% average interface utilization is never recommended).

  26. sa says:

    Thanks for clarification Petr. Will keep those guidelines in mind for production environments. There is one more ‘real life’ scenario that actually happened twice, and the only explanation that I could think of is that priority queue can’t handle larger packets. The first ‘incident’ happened when video traffic was put in the voice priority q by mistake. There were lots of dropped packets on the interface and the video quality was terrible. There was plenty of priority q reserved b/w available and the overall b/w utilization was low. The second time there was an issue with connecting to a phone system via http (some legacy app) after that traffic was also handled by priority q. Wireshark captures have shown lots of retransmits for these flows. What could be the reason for such behavior?

    • @sa

      For video traffic – most likely the problem was relatively small burst size for LLQ policer. Voice traffic is CBR, while video is VBR, with small bursts being mixed with larger bursts. Even though assigning voice to PQ is not recommended, some SRNDs still do this – e.g. Telepresence SRND. In this case, make sure you size burst for LLQ policer large enough to accommodate sudden changes in video traffic flow.

      For the second problem, I believe you were sending voice CBR traffic over TCP connections? If yes, then was it handled by LLQ before or after being encapsulated in TCP? In general, the problem with TCP is that even a single drop will cause retransmission and window shrinking, while the voice source will keep sending at CBR. This may result in larger bursts sent less often, which, in turn, may not be properly handled by the LLQ scheduler’s built-in policer. Remember, the LLQ policer is normally sized to accommodate multiple CBR flows, and any TCP wrapping will result in more bursty behavior compared to normal CBR. The resulting effect is normally poor voice quality.

  27. sa says:

    The second problem is related to a pure tcp app flow – basically one would open a web browser and point to the phone system ip, then log in to see if there are any voice mails. Once that http traffic from the phone system was put into the priority q the app stopped functioning and captures have shown those retransmits. One would get to the login screen and it would just hang. It could be that the app was poorly written and couldn’t handle retransmits.
    My conclusion was that the priority q in LLQ scenario should be used for voice only.

    Thanks for the explanations.

  28. mb says:

    I found the ‘weight’ value in the output of the ‘show queueing’ command quite informative. But now that this command is deprecated, what is the alternative way to find this information? The ‘show policy-map’ command surely shows the offered rate and drop rate, but not the CBWFQ weights.

 
