Mar
07

In this post we are going to look into the STP convergence process. Many people understand STP well, yet face difficulties when they see questions like “How many seconds will it take for STP to recover connectivity if a given link fails?”. The post follows the outline below:

1) General overview of STP convergence process
2) How STP converges if a directly connected link fails
3) How STP converges when it detects indirect link failure
4) Topology changes and their effect

See more detailed overview at: http://blog.ine.com/wp-content/uploads/2010/04/understanding-stp-rstp-convergence.pdf

STP Convergence in General

As we know, STP follows a simple procedure to calculate the loop-free subset of the network topology. STP could be compared to RIP in some sense. Both execute a version of the Bellman-Ford iterative algorithm, which could be described as “gradient” (meaning it iteratively looks for the optimal solution, selecting the “closest” candidate every time). Every switch accepts and retains only the best current root bridge information. The switch then blocks alternate paths to the root bridge, leaving only the single optimal (in terms of path cost) uplink, and continues relaying the optimal information. If a switch learns about a better (“superior”) root bridge than the one it knows now (e.g. a better bridge ID, or a shorter path to the root), the old information is erased and the new information is immediately accepted and relayed. Note that the switch stores the most recent STP BPDU with every port that receives BPDUs. Therefore, for a given switch, there is a BPDU stored with every root or alternate (blocked) port.

There are certain features in STP designed to improve the algorithm's stability and ensure that old information ages out. Every BPDU contains two fields for this purpose: Max_Age and Message_Age. The Message_Age field is incremented every time a BPDU traverses a switch (so it might be compared to the hop count). When a switch stores the BPDU with the respective port, it counts the time in seconds, starting from Message_Age and up to Max_Age. If no further BPDUs are received during this interval, the current BPDU is wiped out and the port is declared designated. This procedure ensures that the old information is eventually aged out of the topology.
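The aging rule above is simple arithmetic, and a small sketch makes it concrete. The class and method names below are made up for illustration, and the default Max_Age of 20 seconds is the commonly used STP default rather than something stated in this post.

```python
from dataclasses import dataclass


@dataclass
class StoredBpdu:
    """A BPDU retained on a port (hypothetical model, not a real API)."""
    max_age: int = 20     # seconds; assumed default Max_Age
    message_age: int = 0  # seconds; grows as the BPDU is relayed by switches

    def seconds_until_expiry(self) -> int:
        # The port counts from Message_Age up to Max_Age, so the stored
        # information survives for Max_Age - Message_Age seconds.
        return max(self.max_age - self.message_age, 0)


print(StoredBpdu(message_age=0).seconds_until_expiry())  # 20
print(StoredBpdu(message_age=2).seconds_until_expiry())  # 18
```

The further a switch sits from the root, the larger Message_Age is and the sooner the stored BPDU expires once refreshes stop.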

There is one more thing, similar to the “hold-down” feature found in RIP. It is the way in which STP deals with “inferior” BPDUs. A BPDU is considered inferior if it carries information about a root bridge that is worse than the one currently stored for the port, or if it advertises a longer distance to the current root bridge (compare this to RIP’s increase in metric). Inferior BPDUs may appear when a neighboring switch suddenly loses its uplink and declares itself the new root of the topology. By default, every switch should ignore inferior BPDUs until the currently stored BPDU expires (time=Max_Age-Message_Age). This feature is intended to stabilize the STP topology in situations where an uplink on some switch flaps, causing the switch to start sending inferior information.
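To make the superior/inferior distinction concrete, here is a minimal sketch that compares BPDUs as ordered tuples, where lower values win. The post only mentions the root bridge ID and the distance to the root; the sender bridge ID and port ID tie-breakers are included here as they are commonly described for STP. Bridge IDs are reduced to small integers for readability, and all names are hypothetical.

```python
from typing import NamedTuple


class Bpdu(NamedTuple):
    """Simplified BPDU; fields are ordered by comparison priority."""
    root_id: int         # advertised root bridge ID (lower is better)
    root_path_cost: int  # cost to reach that root (lower is better)
    sender_id: int       # bridge ID of the sending switch (tie-breaker)
    sender_port: int     # sending port ID (final tie-breaker)


def is_superior(new: Bpdu, stored: Bpdu) -> bool:
    # Tuples compare field by field, so the first differing field decides.
    return new < stored


def is_inferior(new: Bpdu, stored: Bpdu) -> bool:
    return new > stored


# Stored on the port: root is bridge 1, reached at cost 19 through bridge 2.
stored = Bpdu(root_id=1, root_path_cost=19, sender_id=2, sender_port=1)
# Bridge 2 lost its uplink and now claims to be the root itself.
claim = Bpdu(root_id=2, root_path_cost=0, sender_id=2, sender_port=1)

print(is_inferior(claim, stored))  # True
```

A receiving switch would ignore `claim` until `stored` expires, which is exactly the hold-down-like behavior described above.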

STP Convergence in case of directly connected link failure

Consider the switch in Fig 1, with two uplinks: one forwarding (root port, Port A) and another blocking (alternate port, Port B). Imagine now that the root port fails.

[Fig 1: stp_convergence_1]

There are two different situations:

1) The switch detects loss of carrier and immediately declares the port dead. Since this was the port with the best information, the switch immediately invalidates it, and selects the next “best candidate” which is the alternate port (Port B) as the new root port. The switch will transition Port B through Listening and Learning states, which takes 2xForward_Time. Therefore, the connectivity is restored in 2xForward_Time.

2) The switch does not detect the loss of carrier (for example, the uplink is fiber connected to a converter or connects through a hub), and thus the port remains up. The root port, however, loses the continuous stream of BPDUs. Thus, the stored BPDU information is no longer updated. Based on the default procedure, it takes time=Max_Age-Message_Age to expire the stored information. After this, the switch considers the BPDU stored with the alternate port, and unblocks Port B. It will take another 2xForward_Time to bring the port to the forwarding state. Therefore, the connectivity is restored in 2xForward_Time + (Max_Age-Message_Age).
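The two cases above boil down to one formula with an optional aging term. The sketch below uses the common STP defaults of Max_Age = 20 seconds and Forward_Time = 15 seconds (the Forward Delay timer); the function name is invented for this example.

```python
MAX_AGE = 20        # seconds; assumed default Max_Age
FORWARD_DELAY = 15  # seconds; assumed default, called Forward_Time in the text


def direct_failure_recovery(carrier_loss_detected: bool,
                            message_age: int = 0,
                            max_age: int = MAX_AGE,
                            forward_delay: int = FORWARD_DELAY) -> int:
    """Seconds until the alternate port forwards after the root port fails."""
    # Listening + Learning, each lasting one Forward_Time.
    listening_and_learning = 2 * forward_delay
    if carrier_loss_detected:
        # Case 1: the port goes down, so its BPDU is invalidated at once.
        return listening_and_learning
    # Case 2: the port stays up, so the stored BPDU must age out first.
    return (max_age - message_age) + listening_and_learning


print(direct_failure_recovery(carrier_loss_detected=True))   # 30
print(direct_failure_recovery(carrier_loss_detected=False))  # 50
```

With Message_Age ignored, these are the familiar 30 and 50 second answers to the "how long until connectivity recovers" question.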

If the switch detects loss of carrier on the designated port (Port C), not much will happen. Since no BPDUs are received on this port, the switch will only generate a topology change event (more on that later), but will not block or unblock any other local ports. This event might, however, affect the downstream switches.

STP Convergence in case of indirect link failure

Consider the topology on Fig 2.

[Fig 2: stp_convergence_2]

In this case, SW2 has a better Bridge ID than SW3, and thus Port D is designated on the segment between SW2 and SW3. SW3 blocks the redundant uplink via SW2 (Port B) and elects Port A as the root port. Now imagine that SW2 detects loss of carrier on the link connected to SW1 (Port C). The switch will immediately invalidate the best BPDU stored for Port C, and will assume itself the root of the spanning tree, as there are no other ports receiving BPDUs. SW2 will start advertising BPDUs to SW3, setting the designated and the root bridge to itself in the configuration BPDUs. Those are, by definition, inferior BPDUs, and SW3 will ignore them, as it still hears better information from SW1. SW3 will also keep the previous BPDU associated with Port B for the duration of Max_Age-Message_Age. When this timer expires, SW3 will start considering the inferior BPDUs. Port B will move to the Listening state, and SW3 will start relaying SW1’s BPDUs to SW2, as those are superior to SW2’s BPDUs. Now, SW2 will detect the better information on its formerly designated port (Port D) and will cycle the port through the Listening and Learning states. Both switches (SW2 and SW3) will eventually move their ports into the forwarding state, recovering the connectivity. Therefore, it will take Max_Age-Message_Age + 2xForward_Time to recover from an indirect link failure.
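The sequence of events is easier to follow as a timeline. The sketch below lays out SW3's side of the recovery, again assuming the common defaults (Max_Age = 20, Forward_Time = 15) and ignoring Message_Age unless it is passed in. The function is purely illustrative.

```python
def indirect_failure_timeline(max_age: int = 20,
                              forward_delay: int = 15,
                              message_age: int = 0) -> list[tuple[int, str]]:
    """Return (seconds after failure, event) pairs as seen from SW3."""
    expiry = max_age - message_age  # lifetime of the BPDU stored on Port B
    return [
        (0, "SW2 loses Port C, claims root, sends inferior BPDUs to SW3"),
        (expiry, "SW3 expires the BPDU stored on Port B; Port B -> Listening"),
        (expiry + forward_delay, "Port B -> Learning"),
        (expiry + 2 * forward_delay, "Port B -> Forwarding; connectivity restored"),
    ]


for seconds, event in indirect_failure_timeline():
    print(f"t={seconds:>2}s  {event}")
```

The last entry lands at Max_Age-Message_Age + 2xForward_Time, the same figure as the undetected direct failure in the previous section.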

The effect of topology changes

Switches forward Ethernet frames based on their MAC address tables (filtering tables) that bind MAC addresses to egress ports. When a change in topology occurs (e.g. a link failure), the MAC address tables may become invalid, as the paths between switches have changed. The switches will eventually re-learn the new information, but it may take considerable time, especially if the traffic is scarce and the MAC address aging time is large (5 minutes by default). For this reason, if a switch detects a change in the topology (e.g. a link going up or down), it should notify all other switches that something has changed. In response to this notification, all switches will reduce their MAC address aging time to Forward_Time (15 secs by default), effectively speeding up the aging process.

As we know, topology changes are signaled via a special TCN BPDU, which is sent upstream from the originating switch (the one that detected the change) to the root switch via the root ports. When the root switch hears the TCN BPDU, it will set the TCN ACK flag in all its outgoing configuration BPDUs for the duration of Max_Age+Forward_Time. All switches that see this flag will set their MAC address table aging time to Forward_Time. Once the switch that originated the TCN BPDU hears the TCN ACK, it will stop signaling the topology change.

Now what is the effect of a topology change event? Two major things are impacted:

1) Connectivity. In some cases, it may take an additional Forward_Delay seconds to expire the old MAC address information and recover connectivity. This may only happen if the old information persists in some switches, and the frames are black-holed.

2) Network performance. Shortening the MAC address table aging time results in a less stable topology. When a switch loses a MAC address, it starts flooding frames for this destination, effectively acting like a hub. If the flow of packets in your network is not intense enough, the switches may start losing MAC address table information, resulting in excessive traffic flooding.

The second issue might become pretty dangerous with a high number of topology changes. Excessive flooding might severely impact your network performance. Note that this issue also pertains to L2 topologies that run RSTP, as the topology changes are handled in a similar way. In order to reduce the number of topology changes, configure all edge ports in the topology (connected to hosts, IP Phones, servers) as spanning-tree portfast. Portfast ports do not generate TC events when they go up or down.

For a more detailed description of topology change notification, read the following great article at Cisco’s site:

Understanding Spanning-Tree Topology Changes

Part II of this post will consider UplinkFast and BackboneFast features, and their effect on STP convergence.

PS
We often use the formula Max_Age-Message_Age in this text, to be precise. However, most STP topologies are small enough to ignore Message_Age and assume the value of Max_Age for most calculations, unless Max_Age is artificially set to a very low value.

About Petr Lapukhov, 4xCCIE/CCDE:

Petr Lapukhov's career in IT began in 1988 with a focus on computer programming, and progressed into networking with his first exposure to Novell NetWare in 1991. Initially involved with Kazan State University's campus network support and UNIX system administration, he went through the path of becoming a networking consultant, taking part in many network deployment projects. Petr currently has over 12 years of experience working in the Cisco networking field, and is the only person in the world to have obtained four CCIEs in under two years, passing each on his first attempt. Petr is an exceptional case in that he has been working with all of the technologies covered in his four CCIE tracks (R&S, Security, SP, and Voice) on a daily basis for many years. When not actively teaching classes, he is developing self-paced products, studying for the CCDE Practical & the CCIE Storage Lab Exam, and completing his PhD in Applied Mathematics.


21 Responses to “Understanding STP Convergence, Part I”

 
  1. Deepak Arora says:

    Hi Petr,

    It was really a nice article. Can you please share something about how topology changes occur in case of failure in RSTP and MST. Because I haven’t seen any Cisco documentation on these topics.

    Deepak Arora

  2. Tassos says:

    > Now what is the effect of a topology change
    > event? Two major things are impacted:
    >
    > 1) Connectivity….
    >
    > 2) Network performance….
    >
    > The second issue might become pretty dangerous
    > with high number of topology changes.
    > Excessive flooding might severely impact your
    > network performance. Note, that this issue
    > also pertains to L2 topologies that runs RSTP,
    > as the topology changes are handled in the
    > similar way.

    Actually, RSTP might cause temporarily worse network performance (issue 2) than STP, because all MAC addresses (besides the ones on the port that received the TC) are cleared. But this “disadvantage” also helps RSTP provide faster connectivity (issue 1) than STP.

  3. Tassos, your observation is absolutely correct. RSTP's fast flushing of MAC address tables indeed speeds up information relearning, but results in temporary excessive flooding, which is pretty dangerous on a busy network.

    Deepak, We are going to discuss RSTP convergence in-depth in a separate post. However, in most cases it is just a matter of seconds (detecting link failure + synchronization + re-learning).

  4. Joshua Walton says:

    Nice post, Petr.

  5. alex tam says:

    Can I take part of your post to my blog?really like this,mate

  6. Sudhir Bhagat says:

    Hii Petr,

    Really a very helpful article for understanding the Key process in STP. One small querry here from my side

    time=Max_Age – Message_Age

    What is the value of Message_Age here. is that a default value or there is any formula to calculate that.

    regards
    Sudhir

  7. Gino says:

    Excellent post

  8. Gino says:

    Hi Petr

    The port D in SW2(Fig.2) was in Forwarding before failure. When SW2 detects loss of carrier on the link connected to SW1 (Port C), The Port D will move to Blocking or stay in forwarding?, then after the Max_Age-Message_Age, Port D will move to listening when it receive BPDU from SW3?
    It’s weird for me, PORTD was in forwarding state before failure and after that, PORTD is in forwarding state.

  9. sayeed says:

    best article to explain stp convergence

    thank u

  10. mikehil says:

    Hello Petr

    i was read (Link you gave ) ” Understanding Spanning-Tree Protocol Topology Changes ” document. I have question in the example giving in the this document
    then link b2 to b3 goes down B2 generates TCN ACK and switch B1 will receive and decrease age time to 15 second , mac table age time mean host mac will be removed when host is silent . what if host not silent i mean try send information, host mac will not removed from mac table and switch still try send frame to the switch B2 ?

    Thank You

  11. Aaron Dhiman says:

    Great article Petr! Short and Sweet, and I learned some things!

  12. Octavian says:

    Hi Petr,

    Nice articole, although I have an observation to make and please correct me if I’m wrong.

    You said that:
    “As we know, topology changes are signaled via special TCN BPDU, which is being sent upstream from the originating switch (the one that detected the change) to the root switch via the root ports. As the root switch hears the TCN BPDU, it will set TCN ACK flag in all its outgoing configuration BPDUs for the duration of Max_Age+Forward_Time. All switches that see this flag, will set their MAC address tables aging time to Forward_Time. Once the switch that originated the TCN BPDU will hear the TCN ACK, it will stop signaling about the topology change.”

    I agree with you up to the part where you said that the RootBridge will reply with a TCN ACK flag to everybody for max age + forward time.

    As far as I know, the RootBridge will respond to a TCN BPDU with a BPDU with the TC flag set, sent to all nonroot switches for max age + forward time, and only the switch that detected the topology change will generate a TCN BPDU upstream towards the designated switches, so that every designated switch will relay that TCN and respond downstream with a BPDU with the TCA flag set;

    I’m saying this because to my knowledge there is no TCN ACK flag;
    STP has two BPDU message types:
    - configuration BPDU that can have TC or TCA flag set;
    - TCN BPDU;

    How do feel about it?

    • Ankit says:

      @Octavian: Yes, I think you’re right.
      The ‘msg_type’ field determines the kind of BPDU. It’s 0x00 for config BPDU and 0x80 for TCN BPDU.

      According to me, the bridge that suffered a change will send the TCN BPDU upstream towards the root and every switch on the path will relay it out its root port. In addition to this, every switch that receives the TCN BPDU will send a BPDU with TCA flag set downstream.

      Would be great if Petr could confirm though.

      Cheers,
      Ankit

    • didziso says:

      Octavian said:
      “STP has two BPDU message types:
      - configuration BPDU that can have TC or TCA flag set;
      - TCN BPDU;”

      i could almost agree with you.
      ConfBPDU’s can have TC or TCA or BOTH bit’s set. After root bridge receives the first TCN BPDU, it sends the first TC-flagged ConfBPDU also with the TCA bit set. Little nuance, but still worth mentioning.

  13. Ashish says:

    In the case 1, you indicated Port D as the designated port because SW2 has a better bridge ID (lower bridge ID) than SW3. I was under the impression that the bridge which sends a lower cost BPDU to the LAN segment is selected as the designated bridge, and its port becomes the designated port. For example, if the link connecting SW2 and SW1 cost 100 and the link connecting SW3 and SW1 cost 50, then SW3 becomes the designated bridge and Port B becomes the designated port.
    Doesn't the bridge add the link cost to the “cost” field received in the BPDU and then forward it through all designated ports?

  14. Leo says:

    Once the root bridge has been determined, which two items are assigned to every other switch before STP converges a switched network?

  15. Ali Hassan says:

    it is really very nice and helpful

  16. chandra mohan says:

    This article is really short and sweet (so much information explained very simple and short) :) .

    Thanks,
    Chandra Mohan H

  17. zerny says:

    Hi,

    In the article, you’ve stated that the original switch which detected the change will not stop sending its TCN BPDU until it receives the TCN ACK from the root. But in the provided link, it says
    “The TCN is a very simple BPDU that contains absolutely no information that a bridge sends out every hello_time seconds (this is locally configured hello_time, not the hello_time specified in configuration BPDUs). The designated bridge acknowledges the TCN by immediately sending back a normal configuration BPDU with the topology change acknowledgement (TCA) bit set. The bridge that notifies the topology change does not stop sending its TCN until the designated bridge has acknowledged it. Therefore, the designated bridge answers the TCN even though it does not receive configuration BPDU from its root.”

    I’d like to know which one is correct. Looking forward to your reply. Thanks.

 
