This post is a partial excerpt from the QoS section of our IEWB-RS VOL1 V5. We’ll skip the discussion of how much of a legacy Custom Queuing is, and get straight to the working details. The Custom Queue feature is similar to WFQ in that it tries to share bandwidth between packet flows using the max-min approach: each flow class gets a guaranteed share proportional to its weight, plus any class may claim the “unused” interface bandwidth. However, unlike WFQ, there are no dynamic conversations, just 16 static queues with configurable classification criteria. Custom Queuing assigns a byte counter to every queue (1500 bytes is the default value) and serves the queues in round-robin fashion, proportionally to the counters. Every dequeued packet decrements the queue’s byte counter by its size, until the counter drops to zero.

Custom Queuing also supports an additional system queue, number 0. The system queue is a priority queue and is always served first, before all other (regular) queues. By default, the system queue is used for layer 2 keepalives, but not for routing update packets (e.g. RIP, OSPF, EIGRP). Therefore, it’s recommended to map routing update packets to system queue 0 manually, unless the interface is Frame-Relay, which uses a special broadcast queue to send broadcasts. Note that all unclassified packets are by default assigned to queue 1 (e.g. routing updates will use this queue unless mapped to some other queue), provided the default queue number has not been changed.

The limitation of round-robin scheduling is that it can’t naturally dequeue less than one packet (quantum) from each queue. Since queues may have different average packet sizes (e.g. voice packets of 60 bytes and TCP packets of 1500 bytes), this may lead to an undesirable bandwidth distribution ratio (e.g. if the queue byte counter is 100 bytes and a 1500-byte packet is in the queue, the packet will still be sent, since the counter is non-zero). In order to make the distribution “fair”, every queue’s byte counter should be proportional to the queue’s average packet size. To make that happen, try to classify traffic so that packets in every queue have approximately the same size, then use the following example to calculate the byte counters.
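To make the mechanics concrete, here is a minimal Python model of the scheduler just described. This is a simplification for intuition only, not IOS internals, and the queue contents and byte counts are invented for the demo:

from collections import deque

def custom_queue_round(queues, byte_counts):
    # Serve one round-robin cycle; return the bytes sent from each queue.
    sent = [0] * len(queues)
    for i, q in enumerate(queues):
        counter = byte_counts[i]
        while counter > 0 and q:
            pkt = q.popleft()        # packet size in bytes
            sent[i] += pkt
            counter -= pkt           # may go negative; the excess is forgotten
    return sent

# 60-byte voice packets vs 1500-byte TCP packets, both queues congested.
voice, tcp = deque([60] * 1000), deque([1500] * 1000)
print(custom_queue_round([voice, tcp], [1500, 1500]))   # [1500, 1500] - even split
voice, tcp = deque([60] * 1000), deque([1500] * 1000)
print(custom_queue_round([voice, tcp], [100, 100]))     # [120, 1500] - heavily skewed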

Consider the following scenario:

• Configure CQ as the queuing mechanism on the connection between R4 and R5, which runs at 128Kbps
• Ensure RIP routing updates use the system queue
• Voice packets should be guaranteed 30% of interface bandwidth.
• Classify voice packets based on their size of 60 bytes (G.729 codec)
• Allocate 60% of interface bandwidth to HTTP transfers from WWW servers on VLAN146
• The remaining 10% should be allocated to ICMP packets of average size 100 bytes
• Make sure the remaining traffic uses the same queue as ICMP packets
• Limit the queue size to 10 packets for all queues
• Set the IP MTU to 156 bytes in order to enforce a maximum serialization delay of 10ms

The voice packets have a layer 3 size of 60 bytes (G.729 samples of 20 bytes per packet at 50pps), the TCP packets have a layer 3 size of 156 bytes (due to the requirement to set the MTU for 10ms serialization), and the ICMP packets have a layer 3 size of 100 bytes. (Note that the custom queue byte counter takes the Layer 2 overhead into account; this is true with all legacy queuing techniques.) For the HDLC protocol the overhead is 4 bytes, thus the counters should be based on frame sizes of 64, 160 and 104 bytes respectively. The desired distribution ratio is 30:60:10, and the following procedure applies:

1) Because each byte counter should correspond to a whole number of that queue’s packets, the following should hold: C1=a*64, C2=b*160, C3=c*104, where “a”, “b” and “c” are the multipliers we are looking for and C(i) is the byte counter for queue “i”.

2) From the proportion C1:C2:C3 = 30:60:10 we quickly get a = 30/64 ≈ 0.47, b = 60/160 ≈ 0.38, c = 10/104 ≈ 0.096. The problem is that these multipliers are fractional, and we can’t send a fractional number of packets per scheduling round.

3) To get to whole numbers, we first normalize the fractions by dividing each of them by the smallest one, in our case c = 0.096. This yields a1 = a/c ≈ 4.89, b1 = b/c ≈ 3.95, c1 = 1.

4) It is possible to simply round the above numbers up to the next integer (ceiling function) and use them as the number of packets to send every turn. However, what if we got numbers like 3.5 or 4.2? Rounding those up would give integers far larger than the initial values, distorting the ratio. To avoid this, we could first scale (multiply) all the numbers by some integer, e.g. 2 or 3, to obtain values closer to whole numbers.

5) In our case it is even possible to multiply by 100 and get exact whole numbers (489, 395 and 100 packets). However, this would yield excessively high byte counts, and would result in considerable delays between round-robin scheduler runs when the queues are congested. Also, should you still want to use such big counters, make sure the queue depths are large enough to hold that many packets.

6) Therefore we simply set the multipliers to a=5, b=4, c=1. The resulting byte counters are C1=a*64=320, C2=b*160=640, C3=c*104=104. Let’s calculate the weight each queue obtains with this configuration:

W1 = C1/(C1+C2+C3) = 320/(320+640+104) = 320/1064 ≈ 0.30
W2 = C2/(C1+C2+C3) = 640/1064 ≈ 0.60
W3 = C3/(C1+C2+C3) = 104/1064 ≈ 0.10

This is what we want: the proportions of 30:60:10. Also, the maximum byte counter is 640 bytes, so the maximum possible delay is based on 4×160-byte packets, which at 128Kbps amounts to 40ms (not too VoIP friendly, huh).
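For reference, the whole procedure is easy to capture in a few lines of Python. This is a sketch of the calculation above (not a Cisco tool), using the same frame sizes, target shares and 128Kbps rate:

import math

# L3 sizes of 60/156/100 bytes plus 4 bytes of HDLC overhead
frame_sizes = [64, 160, 104]
shares = [30, 60, 10]               # desired ratio, percent

mult = [s / f for s, f in zip(shares, frame_sizes)]     # a, b, c from step 2
norm = [m / min(mult) for m in mult]                    # ~4.9, ~3.9, 1.0 (step 3)
pkts = [math.ceil(n) for n in norm]                     # 5, 4, 1 packets per round
counts = [p * f for p, f in zip(pkts, frame_sizes)]     # 320, 640, 104 bytes

total = sum(counts)
print([round(c / total, 2) for c in counts])            # [0.3, 0.6, 0.1]
print(max(counts) * 8 / 128000)                         # 0.04 - the 40ms worst case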

When you configure custom-queue mappings of traffic classes to queue IDs, the statements are evaluated sequentially, and the first one a packet matches determines the queue it goes to. That is, if you match packets smaller than 65 bytes (e.g. queue-list 1 protocol ip 1 lt 65) before you match ICMP packets (e.g. using an access-list), then all ICMP packets smaller than 65 bytes will be classified by the first statement. Pay special attention to the fact that the packet size criterion in legacy QoS commands is based on the full layer 2 length; this is true with all legacy queuing techniques. In this case the HDLC header is 4 bytes, and thus the voice packets have a full layer 2 size of 64 bytes. Also, when you specify a port number for CQ classification, it matches either the source or the destination port.
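To illustrate, here is a toy Python classifier modeling the first-match evaluation of the queue-list statements used below (the udp 520 rule for queue 0 is omitted for brevity; this is a model of the ordering logic, not IOS code):

def classify(l2_size, is_icmp=False, is_www=False):
    # Statements are tried in configuration order; the first match wins.
    if l2_size < 65:          # queue-list 1 protocol ip 1 lt 65
        return 1
    if is_www:                # queue-list 1 protocol ip 2 list 100
        return 2
    if is_icmp:               # queue-list 1 protocol ip 3 list 101
        return 3
    return 3                  # queue-list 1 default 3

print(classify(64))                  # 60-byte voice + 4B HDLC -> queue 1
print(classify(44, is_icmp=True))    # a small ICMP packet also lands in queue 1!
print(classify(104, is_icmp=True))   # 100-byte ICMP + 4B HDLC -> queue 3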

R4:
access-list 100 permit tcp 155.1.146.0 0.0.0.255 eq www any
access-list 101 permit icmp any any

!
! Map RIP updates to system queue 0 (priority queue)
!
queue-list 1 protocol ip 0 udp 520
!
! G.729 packets are 60 bytes plus 4 bytes for L2 overhead
! Thus the voice packets are less than 65 bytes and we classify
! them based on their size
!
queue-list 1 protocol ip 1 lt 65
queue-list 1 protocol ip 2 list 100
queue-list 1 protocol ip 3 list 101

!
! Set default queue
!
queue-list 1 default 3

!
! Limit the queue depths
!
queue-list 1 queue 0 limit 10
queue-list 1 queue 1 limit 10
queue-list 1 queue 2 limit 10
queue-list 1 queue 3 limit 10

!
! Set the byte counts for queues
!
queue-list 1 queue 1 byte-count 320
queue-list 1 queue 2 byte-count 640
queue-list 1 queue 3 byte-count 104
!
interface Serial 0/1
 ip mtu 156
 custom-queue-list 1

R5:
interface Serial 0/1
 ip mtu 156

To verify your settings, configure R6 to source a stream of G.729-sized packets towards R5, measuring jitter and RTT. The G.729 stream consumes slightly more than 24Kbps of bandwidth. R5 should respond to those RTP probes. At the same time, configure R1 as an HTTP server and allow other routers to access the IOS image in its flash memory.

R1:
ip http server
ip http path flash:

R5:
rtr responder

R6:
!
! Configure SLA probe for jitter with G.729 codec
!
ip sla monitor 1
 type jitter dest-ipaddr 155.1.45.5 dest-port 16384 codec g729a
 timeout 1000
 frequency 1
!

ip sla monitor schedule 1 life forever start-time now

Configure R5 to meter the incoming traffic rate. Create three traffic classes to match the packets generated by the SLA probe, the WWW traffic and the ICMP packets. Assign the classes to a policy map and attach it as an input service policy on R5’s serial interface. Note that MQC matches the L3 packet size (e.g. 60 bytes for G.729) without any L2 overhead.

R5:
class-map match-all ICMP
 match access-group 101
!
! Note that MQC matches L3 packet size, without L2 overhead
!
class-map match-all VOICE
 match packet length min 60 max 60
class-map match-all WWW
 match access-group 100
!
!
policy-map METER
 class VOICE
 class WWW
 class ICMP
!
interface Serial 0/1
 service-policy input METER
 load-interval 30

Start transferring a large file (the IOS image) from R1 to SW2 and send a barrage of ICMP packets from SW1 to R5:

Rack1SW2#copy http://admin:cisco@155.1.146.1/c2600-adventerprisek9-mz.124-10.bin null:

Rack1SW1#ping 155.1.45.5 size 100 repeat 100000000 timeout 0

Type escape sequence to abort.
Sending 100000, 100-byte ICMP Echos to 155.1.45.5, timeout is 1 seconds:
................

Verify the custom queue settings and statistics on R4.

Rack1R4#show queueing custom
Current custom queue configuration:

List   Queue  Args
1      3      default
1      1      protocol ip          lt 65
1      2      protocol ip          list 100
1      3      protocol ip          list 101
1      0      protocol ip          udp port 520
1      1      byte-count 320
1      2      byte-count 640
1      3      byte-count 104 limit 10

Check the current queue utilization. Note that almost all queues have a non-zero depth, and that the queue for ICMP packets has by far the highest drop count. Also note that the voice packets (queue 1) are being queued too, but their drop count is very low.

Rack1R4#show interfaces serial 0/1
Serial0/1 is up, line protocol is up
[snip]
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1332299
  Queueing strategy: custom-list 1
  Output queues: (queue #: size/max/drops)
     0: 0/20/0 1: 8/20/55 2: 11/20/24497 3: 10/10/1307747 4: 0/20/0
     5: 0/20/0 6: 0/20/0 7: 0/20/0 8: 0/20/0 9: 0/20/0
     10: 0/20/0 11: 0/20/0 12: 0/20/0 13: 0/20/0 14: 0/20/0
     15: 0/20/0 16: 0/20/0
  5 minute input rate 49000 bits/sec, 95 packets/sec
  5 minute output rate 121000 bits/sec, 143 packets/sec
[snip]

It’s even possible to view the current queue contents, using the command show queue [interface] [x/y] [qNumber]. Specify the custom queue number in the range 0-16, where 0 is the system queue. For example, look at the contents of queue 2 (WWW packets in this case):

Rack1R4#show queue serial 0/1 2
Output queue for Serial0/1 is 5/20

Packet 1, linktype: ip, length: 160, flags: 0x88
  source: 155.1.146.1, destination: 155.1.58.8, id: 0x60A7, ttl: 254, prot: 6
    data: 0x9577 0x93FD 0xB6FB 0xA7DC 0x7E54 0xBD08 0x74B7
          0xB79C 0x3EF4 0x27E1 0x75B2 0xC3B7 0x37F3 0xDC17 

Packet 2, linktype: ip, length: 160, flags: 0x88
  source: 155.1.146.1, destination: 155.1.58.8, id: 0x60A8, ttl: 254,
  TOS: 0 prot: 6, source port 80, destination port 11015
    data: 0x0050 0x2B07 0x21FC 0xF1E0 0xA32B 0x3C56 0x5010
          0x0F62 0x7866 0x0000 0x3DBE 0xBE6D 0xE70D 0x3E27 

Packet 3, linktype: ip, length: 160, flags: 0x88
  source: 155.1.146.1, destination: 155.1.58.8, id: 0x60A8, ttl: 254, prot: 6
    data: 0x39FD 0xBEC1 0xC76A 0xF5F7 0x37FC 0xB731 0x4E09
          0xDEB5 0x761C 0xE6F6 0x7390 0xB3B3 0x791C 0x400A
[snip]

Check the SLA probe statistics on R6. Note the high RTT for the probe. This is because the voice packets do not get into a priority queue but are served like any other regular class.

Rack1R6#show ip sla monitor statistics 1
Round trip time (RTT)	Index 1
	Latest RTT: 85 ms
Latest operation start time: 05:28:54.827 UTC Tue Aug 5 2008
Latest operation return code: OK
RTT Values
	Number Of RTT: 1000
	RTT Min/Avg/Max: 12/85/169 ms
Latency one-way time milliseconds
	Number of one-way Samples: 0
	Source to Destination one way Min/Avg/Max: 0/0/0 ms
	Destination to Source one way Min/Avg/Max: 0/0/0 ms
Jitter time milliseconds
	Number of Jitter Samples: 999
	Source to Destination Jitter Min/Avg/Max: 2/26/148 ms
	Destination to Source Jitter Min/Avg/Max: 1/2/19 ms
Packet Loss Values
	Loss Source to Destination: 0		Loss Destination to Source: 0
	Out Of Sequence: 0	Tail Drop: 0	Packet Late Arrival: 0
Voice Score Values
	Calculated Planning Impairment Factor (ICPIF): 11
MOS score: 4.06
Number of successes: 50
Number of failures: 0
Operation time to live: Forever

Look at R5 to see the resulting bandwidth allocation. Note that the voice class bandwidth is close to 24Kbps and does not exceed its allocated share (0.3*128K). Therefore, other classes may claim the remaining bandwidth, and this explains why the WWW traffic gets more than its guaranteed share (0.6*128K).

Note that the ICMP packet flood gets 13Kbps of bandwidth and does not seriously impair the other traffic flows, unlike with WFQ scheduling. This is due to the fact that each class has its own FIFO queue and the traffic flows don’t share the same buffer space (plus there is no congestive discard procedure). The aggressive ICMP traffic does not seriously affect the other queues, since the excess traffic is simply dropped.

In total, all classes draw approximately 120Kbps of L3 bandwidth, which is naturally less than the configured 128Kbps line rate.

Rack1R5#show policy-map interface serial 0/1
 Serial0/1 

  Service-policy input: METER

    Class-map: VOICE (match-all)
      878252 packets, 56208128 bytes
      1 minute offered rate 23000 bps
      Match: packet length min 60 max 60

    Class-map: WWW (match-all)
      211383 packets, 72688125 bytes
      1 minute offered rate 84000 bps
      Match: access-group 100

    Class-map: ICMP (match-all)
      68628 packets, 7137268 bytes
      1 minute offered rate 13000 bps
      Match: access-group 101

    Class-map: class-default (match-any)
      2573 packets, 630376 bytes
      1 minute offered rate 0 bps, drop rate 0 bps
      Match: any

Clear the counters on the serial interface, and configure the custom queue to give priority treatment to queue 1. This is possible by means of the lowest-custom option, which specifies the number of the first queue to participate in round-robin scheduling. All queues below this number use priority scheduling, with queue 0 being the most important, queue 1 the next most important, and so on. Alternatively, it’s possible to map the “voice” packets to queue 0, which always has priority.

Note that even though this allows enabling a priority queue with CQ, it’s not recommended. The problem is that priority queues in CQ are not policed in any way, and thus may effectively starve the other queues. Voice traffic should use either the IP RTP Priority or LLQ mechanisms for priority queuing.
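For intuition, the lowest-custom behavior can be modeled as follows (an assumed simplification: priority queues are drained exhaustively, in order, before the round-robin queues are served their byte counts):

from collections import deque

def serve_round(queues, byte_counts, lowest_custom=1):
    sent = [0] * len(queues)
    # Queues 0 .. lowest_custom-1 are strict priority and drained completely.
    for i in range(lowest_custom):
        while queues[i]:
            sent[i] += queues[i].popleft()
    # The remaining queues share the link via the usual byte counts.
    for i in range(lowest_custom, len(queues)):
        counter = byte_counts[i]
        while counter > 0 and queues[i]:
            pkt = queues[i].popleft()
            sent[i] += pkt
            counter -= pkt
    return sent

# With "queue-list 1 lowest-custom 2", queues 0 and 1 (voice) get priority:
q = [deque(), deque([64] * 20), deque([160] * 20), deque([104] * 20)]
print(serve_round(q, [0, 320, 640, 104], lowest_custom=2))
# [0, 1280, 640, 104] - queue 1 is drained fully each round, which is
# exactly why an unpoliced priority queue can starve the others.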

Rack1R4#clear counters Serial 0/1
Clear "show interface" counters on this interface [confirm]
Rack1R4#conf t
Rack1R4(config)#queue-list 1 lowest-custom 2

Rack1R4#show queueing custom
Current custom queue configuration:

List   Queue  Args
1      2      lowest custom queue
1      3      default
1      1      protocol ip          lt 65
1      2      protocol ip          list 100
1      3      protocol ip          list 101
1      0      protocol ip          udp port 520
1      1      byte-count 320
1      2      byte-count 640
1      3      byte-count 104 limit 10

Rack1R4#show interfaces serial 0/1
Serial0/1 is up, line protocol is up
[snip]
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1053009
  Queueing strategy: custom-list 1
  Output queues: (queue #: size/max/drops)
     0: 0/20/0 1: 0/20/55 2: 7/20/20301 3: 7/10/1032653 4: 0/20/0
     5: 0/20/0 6: 0/20/0 7: 0/20/0 8: 0/20/0 9: 0/20/0
     10: 0/20/0 11: 0/20/0 12: 0/20/0 13: 0/20/0 14: 0/20/0
     15: 0/20/0 16: 0/20/0
  5 minute input rate 49000 bits/sec, 95 packets/sec
  5 minute output rate 121000 bits/sec, 144 packets/sec
[snip]

Now verify the SLA statistics for the RTP probe on R6. Note that the RTT has dropped significantly, since the voice packets now go through the priority queue.

Rack1R6#show ip sla monitor statistics 1
Round trip time (RTT)	Index 1
	Latest RTT: 19 ms
Latest operation start time: 05:44:00.485 UTC Tue Aug 5 2008
Latest operation return code: OK
RTT Values
	Number Of RTT: 1000
	RTT Min/Avg/Max: 12/19/35 ms
Latency one-way time milliseconds
	Number of one-way Samples: 0
	Source to Destination one way Min/Avg/Max: 0/0/0 ms
	Destination to Source one way Min/Avg/Max: 0/0/0 ms
Jitter time milliseconds
	Number of Jitter Samples: 999
	Source to Destination Jitter Min/Avg/Max: 1/4/10 ms
	Destination to Source Jitter Min/Avg/Max: 1/2/18 ms
Packet Loss Values
	Loss Source to Destination: 0		Loss Destination to Source: 0
	Out Of Sequence: 0	Tail Drop: 0	Packet Late Arrival: 0
Voice Score Values
	Calculated Planning Impairment Factor (ICPIF): 11
MOS score: 4.06
Number of successes: 91
Number of failures: 0
Operation time to live: Forever

To summarize, Custom Queuing aims to implement a version of max-min bandwidth sharing. It allows classifying packets into queues and assigning byte counters, which effectively act as queue weights. We’ve seen that with classic round-robin scheduling this may lead to unfair bandwidth allocation, unless average packet sizes are taken into account. However, since IOS 12.1 the custom queue implementation actually uses something close to “Deficit Round Robin”, which tracks the amount of excess bytes consumed by every queue and takes it into account in subsequent scheduling rounds. Therefore, when configuring CQ in modern IOS versions, it’s not necessary to set the byte counts proportional to average packet sizes. Still, it’s nice to know how to compute those byte counters ;)
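As a postscript, here is a minimal textbook Deficit Round Robin model in Python. This is assumed to be close in spirit to what post-12.1 IOS does; the actual IOS implementation is not publicly documented:

from collections import deque

def drr_round(queues, quanta, deficits):
    sent = [0] * len(queues)
    for i, q in enumerate(queues):
        deficits[i] += quanta[i]
        # Send only while the head-of-line packet fits the accumulated credit.
        while q and q[0] <= deficits[i]:
            deficits[i] -= q[0]
            sent[i] += q.popleft()
        if not q:
            deficits[i] = 0          # an empty queue doesn't bank credit
    return sent

# 60-byte voice vs 1500-byte TCP with a deliberately tiny 100-byte quantum:
voice, tcp = deque([60] * 1000), deque([1500] * 1000)
deficits, totals = [0, 0], [0, 0]
for _ in range(100):
    sent = drr_round([voice, tcp], [100, 100], deficits)
    totals = [t + s for t, s in zip(totals, sent)]
print(totals)   # ~[9960, 9000]: near-equal shares, unlike the classic scheduler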


9 Responses to “Understanding Custom Queuing”

 
  1. NTllect says:

    thanks for the article, Petr!

  2. Ivan Pepelnjak says:

    Where would anyone use custom queuing these days (apart from the somewhat ridiculous tasks in the CCIE world)? Do you still see it used somewhere in new installations in real life?

  3. Petr Lapukhov says:

    To: Ivan Pepelnjak

    The same goes for all legacy QoS technologies :) You won’t see many of those around nowadays, although I still use FR PIPQ in some places ;)

    However, I think it’s really useful to see how technologies evolved, in their historical perspective. CQ is a nice illustration of RR scheduling, and seeing how it works gives people more insight into max-min resource sharing in general.

    Besides, nobody says it aloud, but CBWFQ is really just the good old WFQ with some “enhancements”. Likewise, LLQ maps directly to IP RTP Priority. However, you can hardly find any information on the internals of CBWFQ: e.g. how it computes weight values, how it handles Link Queues, and how it works when you have flow-based WFQ and manual classes configured at the same time.

    Based on that, I believe that introducing new concepts along their evolutionary steps makes them much easier to understand than breaking down a new technology from scratch.

  4. michael says:

    On R4, shouldn’t the configuration for RIP use udp instead of tcp? That is, shouldn’t “queue-list 1 protocol ip 0 tcp 520” be “queue-list 1 protocol ip 0 udp 520”?

    studying for this exam is causing me to pay attention to detail and question everything.

  5. Petr Lapukhov says:

    To: michael

    Yeah, thanks for noting that! I was re-typing the configuration instead of copy-pasting from the router config ;) Fixed that now.

  6. Dominik says:

    Where can I find this byte count calculation on Cisco website? Could you provide me the link to the source?

  7. Izack says:

    Why use packet size to classify VOIP traffic rather than an ACL? Wouldn’t any small packet match this queue (telnet)? Also, the requirement says to give 10% bandwidth to ICMP which is assigned to queue 3. But then queue 3 is made the default queue which would receive all unclassified traffic wouldn’t it? Why not create a separate default queue? Thanks.

  8. Paul says:

    Should the Custom Queue configuration be assigned to the interface with “custom-queue-list 1” ?

  9. Matt says:

    Paul, yep it should. Won’t get very far without applying “custom-queue-list 1” to the interface :)

 
