Saturday, October 23, 2010

Lesson 20 - Spanning-Tree Protocol Operation

In my previous post I tried to stress the need for redundant connections between switches. Multiple paths help us avoid a single point of failure in our designs. However, adding new connections inevitably creates loops, which cause multiple problems. The last section of lesson 19 presented the solution: the Spanning-Tree Protocol. It's time to learn a bit more about Spanning-Tree Protocol terminology and scrutinize its operation. So hold on to your hats as we begin the ride ;)

In order to understand the nuts and bolts of Spanning-Tree Protocol (STP), we need to get familiar with its terminology first.

Spanning-Tree Protocol Terminology
The ports participating in STP play different roles and those roles use different states of operation.

Spanning-Tree Port Roles
  1. Root Port (RP) - It is the port on a non-root switch that offers the shortest (best) path towards the root bridge. The root bridge does NOT have any root ports (there is no shortest path to itself ;-)).
  2. Designated Port (DP) - It is a port that is in the forwarding state. All ports of the root bridge are designated ports (they are never in a blocking state). BPDU frames are sent out of this port.
  3. Non-Designated Port (NDP) - It is a port that is in the blocking state in the STP topology.
Spanning-Tree Port States
  1. Disabled - The port in this state does not participate in the STP operation (it is shut down).
  2. Blocking - The port does NOT forward any Ethernet frames, does NOT accept any Ethernet frames (discards arriving frames), does NOT learn any MAC addresses. However, the port DOES process BPDU frames received from a neighboring switch. If the port transitions to this state (blocking), it can stay blocked for 20 seconds by default (max_age)
  3. Listening - The port in this state CAN send and receive BPDU frames. However, the port in this state does NOT learn any MAC addresses, and does NOT forward or process incoming frames either. All Ethernet frames are discarded. The computation of the loop-free topology takes place in this state. If the port transitions to this state (listening), it can stay in this state for 15 seconds by default (forward_delay).
  4. Learning - The port in this state already knows its role (root port or designated port) in the STP domain. However, the port will not forward any Ethernet frames yet. It learns MAC addresses from the frames arriving at the port in order to populate the MAC address table. This helps avoid excessive flooding when the port transitions to the forwarding state. If the port transitions to this state (learning), it can stay in this state for 15 seconds by default (forward_delay).
  5. Forwarding - The port in this state will forward all Ethernet frames as per switch operation. Also, the port will process all incoming Ethernet frames and will actively learn MAC addresses from the arriving traffic.
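To keep the five states straight, here is a small Python sketch that encodes what each state may do, as summarised in the list above, plus the normal path a port follows on its way to forwarding. The names `PortState`, `Caps`, `CAPS` and `path_to_forwarding` are hypothetical, chosen only for illustration; this is a study aid, not how a switch implements STP.

```python
from enum import Enum, auto
from typing import NamedTuple


class PortState(Enum):
    DISABLED = auto()
    BLOCKING = auto()
    LISTENING = auto()
    LEARNING = auto()
    FORWARDING = auto()


class Caps(NamedTuple):
    processes_bpdus: bool  # accepts and processes received BPDUs
    learns_macs: bool      # adds source MAC addresses to the MAC table
    forwards_frames: bool  # forwards user Ethernet frames


# Behaviour of each state as described in the list of port states.
CAPS = {
    PortState.DISABLED:   Caps(False, False, False),
    PortState.BLOCKING:   Caps(True, False, False),
    PortState.LISTENING:  Caps(True, False, False),
    PortState.LEARNING:   Caps(True, True, False),
    PortState.FORWARDING: Caps(True, True, True),
}

# Normal progression of a port that STP has decided should forward.
NEXT_STATE = {
    PortState.BLOCKING: PortState.LISTENING,
    PortState.LISTENING: PortState.LEARNING,
    PortState.LEARNING: PortState.FORWARDING,
}


def path_to_forwarding(state):
    """List the states a port passes through until it stops advancing."""
    path = [state]
    while state in NEXT_STATE:
        state = NEXT_STATE[state]
        path.append(state)
    return path
```

Calling `path_to_forwarding(PortState.BLOCKING)` walks through listening and learning before reaching forwarding, which is exactly why a blocked port cannot start forwarding instantly.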

Bridges and switches are functionally the same devices. I will use both terms interchangeably.

Now that you are familiar with STP port roles and port states, it is time to explain how the Spanning-Tree Protocol works.

Pic. 1 - STP Port Terminology
Icons designed by: Andrzej Szoblik -

STP (IEEE 802.1d) Principles of Operation
STP uses three stages to compute a loop-free topology (pic. 2):
  1. A single root bridge is elected.
  2. Each non-root switch selects a single best port towards the root (root port).
  3. A single forwarding port (designated port) is selected on each segment.
 Pic. 2 - STP Overview
Icons designed by: Andrzej Szoblik -

Bridge Protocol Data Unit (BPDU)
All switches communicate with one another using special frames called Bridge Protocol Data Units (BPDUs). These frames contain multiple parameters that switches process in order to create and maintain a loop-free topology.

Root Bridge
The root bridge is the switch that has all its ports working in the designated role. It is the reference point from which the loop-free topology is computed. The root bridge also imposes the timers that the other switches use:
  • hello time - how often BPDUs are going to be sent/relayed (default timer=2 seconds), 
  • max age - how long the configuration is valid (default timer=20 seconds),
  • forward delay - how long a port should be in listening/learning state (default timer=15 seconds). 
The root bridge announces its presence by sending BPDU frames every hello time. The other switches relay those frames out of their designated ports.
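These timers also explain the 30-50 seconds it takes a blocked port to start forwarding after a root port fails, a figure quoted later in this lesson. The sketch below assumes the classic 802.1D defaults listed above and the usual rule of thumb: a blocked port may first wait max age for stale information to expire, then spends forward delay in listening and again in learning. The class `StpTimers` and the `wait_max_age` flag are hypothetical names; whether the max age wait applies depends on how the failure is detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StpTimers:
    """Timers imposed by the root bridge (802.1D defaults, in seconds)."""
    hello_time: int = 2      # how often BPDUs are sent/relayed
    max_age: int = 20        # how long stored BPDU information stays valid
    forward_delay: int = 15  # time spent in listening, and again in learning

    def blocked_to_forwarding(self, wait_max_age: bool = True) -> int:
        """Rough time for a blocked port to start forwarding.

        wait_max_age=True: the port first waits for the old information to age out.
        wait_max_age=False: that wait is skipped, leaving listening + learning only.
        """
        aging = self.max_age if wait_max_age else 0
        return aging + 2 * self.forward_delay


defaults = StpTimers()
```

With the defaults, `defaults.blocked_to_forwarding()` gives 20 + 15 + 15 = 50 seconds, and `defaults.blocked_to_forwarding(wait_max_age=False)` gives 30 seconds, the two ends of that range.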

1. Root Bridge Election

Only one switch in the layer 2 network becomes the root bridge. This is how the IEEE standard was defined, and it is known as the Common Spanning-Tree approach (CST). Cisco changed that paradigm and introduced the Per-VLAN Spanning-Tree approach (PVST+). Cisco switches elect a single root switch per VLAN, so in theory each VLAN could have its own root bridge.

Root election is based on a single parameter found in the BPDU frame: the Bridge ID. The switch with the lowest Bridge ID becomes the root. The Bridge ID has the following format:

Bridge ID = Priority.Base-MAC-Address (for example 32768.0000.1111.1111)

Priority is a configurable parameter that lets you choose which device becomes the root bridge. The default value is 32768. The lower the value, the more likely a switch is to become the root.

The base MAC address is the unique MAC address every switch has been given by the manufacturer. It acts as a tie breaker when the priority on all switches is identical.
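Because the priority comes first and the MAC address second, comparing two Bridge IDs behaves like comparing two tuples: the MAC address only matters when the priorities are equal. Here is a minimal Python sketch of that idea. `BridgeId` is a hypothetical helper that parses the priority.base-mac-address notation used in this lesson, and `as_int` packs it into the 8-byte value (16-bit priority followed by the 48-bit MAC) carried in the BPDU.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    """Bridge ID = priority + base MAC; tuples compare field by field,
    so a lower priority wins and the MAC only matters when priorities tie."""
    priority: int  # default 32768; lower is preferred
    mac: int       # 48-bit base MAC address assigned by the manufacturer

    @classmethod
    def parse(cls, text: str) -> "BridgeId":
        """Parse the 'priority.base-mac-address' notation, e.g. '32768.0000.1111.1111'."""
        priority, mac = text.split(".", 1)
        return cls(int(priority), int(mac.replace(".", ""), 16))

    def as_int(self) -> int:
        """The 64-bit value carried in a BPDU: 16-bit priority, then 48-bit MAC."""
        return (self.priority << 48) | self.mac


sw1 = BridgeId.parse("32768.0000.1111.1111")
sw2 = BridgeId.parse("32768.0000.2222.2222")
```

Here `sw1 < sw2` is true only because of the MAC tie breaker; a hypothetical switch configured with priority 4096 would win regardless of its MAC address.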

If you've understood everything so far, you're ready to look at the election process in more detail.

Pic. 3 - Root Bridge Election.
Icons designed by: Andrzej Szoblik -

Imagine that we've just wired the topology in pic. 3. Now we start up all the switches, and as soon as their ports transition to the LISTENING state, they begin to send BPDU frames out of all active ports. In those frames, both the Bridge ID and Root ID parameters carry the switch's own priority.base-mac-address value. In other words, each switch thinks it is the root bridge. It is as if each switch were saying: "Hi there! This is my name (Bridge ID) and, by the way, I'm the root (Root ID is the same as the Bridge ID value)." As they process the incoming BPDUs from their neighbors, SW2 and SW3 realize that SW1's Bridge ID is lower than theirs. From that point onwards, they begin to relay BPDU frames naming SW1 as the root bridge.

In our example, upon receiving BPDUs from SW1, SW2 and SW4, SW3 compares their Bridge IDs with its own and concludes that SW1's Bridge ID has the lowest value (the base MAC address breaks the tie, since all priorities are equal). From this point onwards, it relays the BPDU frame out of all its active ports with the following parameters:

Bridge ID = 32768.0000.3333.3333
Root ID = 32768.0000.1111.1111

Similarly, all the switches agree that SW1 is the root (their own Bridge ID is higher).
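The election can be mimicked with a few lines of Python: every switch starts by claiming to be the root, then keeps replacing its Root ID whenever a neighbor advertises a lower one, until nothing changes. This is a simplified sketch of the exchange described above, not the real BPDU state machine. The links follow this lesson's topology (SW1-SW2, SW1-SW3, SW2-SW3 and two links between SW3 and SW4); SW4's MAC address is a hypothetical value, since the lesson only tells us it is higher than SW1's.

```python
# Bridge IDs as (priority, base MAC) tuples; Python compares tuples field by field.
BRIDGE_IDS = {
    "SW1": (32768, 0x000011111111),
    "SW2": (32768, 0x000022222222),
    "SW3": (32768, 0x000033333333),
    "SW4": (32768, 0x000044444444),  # hypothetical: the lesson does not give SW4's MAC
}

# Who hears whose BPDUs; SW3 and SW4 are joined by two parallel links.
NEIGHBORS = {
    "SW1": ["SW2", "SW3"],
    "SW2": ["SW1", "SW3"],
    "SW3": ["SW1", "SW2", "SW4", "SW4"],
    "SW4": ["SW3", "SW3"],
}


def elect_root(bridge_ids, neighbors):
    """Return, for every switch, the Root ID it ends up advertising."""
    # At start-up every switch claims to be the root: Root ID = own Bridge ID.
    root_id = dict(bridge_ids)
    changed = True
    while changed:  # keep exchanging BPDUs until nobody learns a better root
        changed = False
        for switch, heard_from in neighbors.items():
            for neighbor in heard_from:
                # A BPDU advertising a lower Root ID replaces the current belief.
                if root_id[neighbor] < root_id[switch]:
                    root_id[switch] = root_id[neighbor]
                    changed = True
    return root_id


result = elect_root(BRIDGE_IDS, NEIGHBORS)
root_name = min(BRIDGE_IDS, key=BRIDGE_IDS.get)
```

After the loop settles, every switch's Root ID equals SW1's Bridge ID, and `root_name` is "SW1".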

2. Root Port Selection

As soon as the root has been elected, all non-root switches begin to calculate which port is the best (the least cost) towards the root bridge. This port will be called the root port.

Pic. 4 - Root Port Selection
Icons designed by: Andrzej Szoblik -
SW2, SW3 and SW4 receive BPDUs from different directions. For instance, SW2 receives them on its ports F0/1 and F0/2 (look at pic. 4). The accumulated cost (the sum of the port costs along the path towards the root) is taken into consideration. The port with the lowest cost to reach the root becomes the root port.

How is the path cost calculated?

Each link speed has a default cost, which is configurable. A few examples are below:

10 Mbps = 100
100 Mbps = 19
1 Gbps = 4
10 Gbps = 2
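In code, the table above is a simple lookup, and the cost of a path grows by one port cost at every hop: the receiving switch adds the cost of the port a BPDU arrived on to the Root Path Cost carried in that BPDU. A minimal Python sketch, where `configured_cost` is a hypothetical stand-in for a cost set manually on a port:

```python
# Default costs from the list above: link speed in Mbps -> port cost.
DEFAULT_PORT_COST = {10: 100, 100: 19, 1_000: 4, 10_000: 2}


def port_cost(speed_mbps, configured_cost=None):
    """Cost of a port: a manually configured value wins over the speed-based default."""
    if configured_cost is not None:
        return configured_cost
    return DEFAULT_PORT_COST[speed_mbps]


def root_path_cost(advertised_cost, speed_mbps, configured_cost=None):
    """Root Path Cost via a port = cost carried in the received BPDU + cost of that port."""
    return advertised_cost + port_cost(speed_mbps, configured_cost)
```

For a FastEthernet port, `root_path_cost(0, 100)` is 19 (a BPDU straight from the root) and `root_path_cost(19, 100)` is 38 (a BPDU relayed by a switch one hop away).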

The root bridge (here SW1) sends its BPDU frame every 2 seconds. It uses the BPDU parameter called Root Path Cost to advertise its cost to the root. It puts the value '0' in it, as it is the root bridge and has no cost to reach itself. The frame is sent out of its port F0/1 towards SW3 and F0/2 towards SW2. SW2, upon receiving it, adds the cost of the port on which the BPDU arrived, based on the predefined speed-to-cost value (all ports in our topology are FastEthernet = 19).

Root Path Cost = 0 + 19 = 19 via F0/2

SW2 is going to advertise its best (as of now) cost out of its F0/1 port towards SW3. SW3 will receive a BPDU from SW1 with Root Path Cost = 0 on its F0/1 port. It will also receive a BPDU from SW2 on its F0/2 interface with Root Path Cost = 19. As both of SW3's ports have a cost of 19, the following math is done to choose the least-cost path towards the root bridge:

Root Path Cost = 0 + 19 = 19 via F0/1
Root Path Cost = 19 + 19 = 38 via F0/2

It is clear that the direct connection towards root bridge via F0/1 is going to be selected as the root port.

SW3's lowest cost towards the root is 19 (via its F0/1 port). SW3 advertises this value as the Root Path Cost in the BPDUs it sends out of F0/2, F0/3 and F0/4. Of course, SW2 also chooses its F0/2 port as the root port, since that cost is smaller.
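SW3's decision can be written as "add my port cost to every advertised cost, then keep the cheapest port". The Python sketch below reproduces that arithmetic with the values from pic. 4. `choose_root_port` is a hypothetical helper and, on purpose, does not handle ties: `min()` would just return the first port listed, which is exactly the gap the next part of the lesson fills.

```python
FAST_ETHERNET_COST = 19  # every port in pic. 4 is FastEthernet

# BPDUs received by SW3: local port -> Root Path Cost advertised by the sender.
RECEIVED_ON_SW3 = {
    "F0/1": 0,   # from SW1, the root bridge
    "F0/2": 19,  # from SW2, which is one hop from the root
}


def choose_root_port(received, local_port_cost):
    """Add the local port cost to each advertised cost and keep the cheapest port.

    Returns (root port, cost to advertise downstream, all computed costs).
    Ties are NOT handled here; min() would simply return the first port listed.
    """
    totals = {port: cost + local_port_cost for port, cost in received.items()}
    best = min(totals, key=totals.get)
    return best, totals[best], totals


root_port, advertised, totals = choose_root_port(RECEIVED_ON_SW3, FAST_ETHERNET_COST)
```

The result is F0/1 with a cost of 19 (F0/2 would cost 38), and 19 is the Root Path Cost SW3 puts into the BPDUs it sends downstream.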

What if the Root Path Cost is identical?

We run into that situation on SW4. It receives BPDUs on its ports F0/1 and F0/2 with the following parameters:

Bridge ID = 32768.0000.3333.3333
Root ID = 32768.0000.1111.1111
Root Path Cost = 19

The cost clearly does not help to choose a single root port as both ports have the same cost:
19 + 19 = 38.

The following algorithm is used to determine the root port or designated port (in order):
  1. Prefer the lowest Root Path Cost.
  2. In case of the same Root Path Cost, prefer the lowest Bridge ID of the designated switch (the neighbor that sends BPDUs).
  3. In case BPDUs arrive on multiple ports from the same designated switch (BPDU sender), prefer the lowest port priority of the sender (the priority part of its Port ID). That parameter has a default value of 128 and is configurable.
  4. If none of the above resolves the tie, prefer the lowest port number of the BPDU sender (the remaining part of its Port ID).
Equipped with that knowledge let us consider SW4 now.
  1. SW4 receives BPDUs on port F0/1 and F0/2. The Root Path cost is the same: 19 + 19 = 38 on both ports.
  2. The designated switch (SW3) is the same switch on both ports, i.e. the same Bridge ID (32768.0000.3333.3333).
  3. The designated switch (SW3) sends BPDUs out of its F0/3 and F0/4 ports with the same port priority (128).
  4. The tie breaker is the lowest port number of the sender. SW4's F0/1 is connected to SW3's F0/3 and its F0/2 to SW3's F0/4; since F0/3 is lower than F0/4 on SW3, F0/1 becomes the root port on SW4.
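The four tie-breaking steps map neatly onto a tuple comparison: Python compares tuples left to right, so a later field only matters when all earlier fields are equal. The sketch below encodes the steps in that order and applies them to SW4. `Candidate`, `preference_key` and `select_root_port` are hypothetical names for illustration, and the sender's port number is taken from its interface name (3 for F0/3).

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One BPDU as seen by the receiving switch."""
    local_port: str            # port on the receiving switch
    root_path_cost: int        # advertised Root Path Cost + local port cost
    sender_bridge_id: tuple    # (priority, base MAC) of the designated switch
    sender_port_priority: int  # port priority of the sender's port, default 128
    sender_port_number: int    # e.g. 3 for the sender's F0/3


def preference_key(c: Candidate) -> tuple:
    # Steps 1-4 of the algorithm, in order: lower tuple = better port.
    return (c.root_path_cost, c.sender_bridge_id,
            c.sender_port_priority, c.sender_port_number)


def select_root_port(candidates) -> str:
    return min(candidates, key=preference_key).local_port


SW3_ID = (32768, 0x000033333333)

# SW4 hears SW3 on both links: F0/1 is cabled to SW3's F0/3, F0/2 to SW3's F0/4.
sw4_candidates = [
    Candidate("F0/2", 19 + 19, SW3_ID, 128, 4),
    Candidate("F0/1", 19 + 19, SW3_ID, 128, 3),
]
```

`select_root_port(sw4_candidates)` returns "F0/1": cost, Bridge ID and port priority all tie, so the sender's port number decides. The list order is deliberately reversed to show that the result does not depend on the order in which BPDUs are seen.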
The root ports have been selected on all non-root switches (pic. 5). Next, STP selects a single designated (forwarding) port per segment and blocks the remaining redundant ports towards the root bridge, so no loop can form. Should any of the root ports fail, it takes around 30-50 seconds to move a blocked port into the forwarding state.

3. Designated Port Selection
This procedure follows exactly the same algorithm used for root port selection.

Pic. 5 - Designated Port Selection
Icons designed by: Andrzej Szoblik -

Since the root port is the best port towards the root bridge, it is going to be in the forwarding state (look at the beginning of this lesson). What is left to do is to choose one of the ports between SW2 and SW3 as designated (forwarding) and the other as non-designated (blocked). The same applies between SW3 and SW4: either SW3 blocks its F0/4, or SW4 blocks its F0/2 port.

SW3 will block its F0/2 (non-designated) and SW2 will make its F0/1 port designated (forwarding). The process will look as follows:
  1. The Root Path Cost advertised by SW2 is 19, and so is the cost advertised by SW3, so the cost alone does not decide.
  2. SW2 has a lower Bridge ID (32768.0000.2222.2222) than SW3 (32768.0000.3333.3333), so SW3 must block its F0/2.
The last selection happens between SW3 (port F0/4) and SW4 (port F0/2).
  1. The Root Path Cost advertised by SW3 is 19, but SW4 advertises its cost as 38 (two hops, via its F0/1). SW4 blocks its port F0/2 (non-designated), and SW3 promotes its port F0/4 to the designated role (forwarding).
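The whole designated port step can be replayed in Python. For every segment, the sketch below compares the two ends the same way the lesson does: the lower advertised Root Path Cost wins, and on a tie the lower Bridge ID wins. The winner becomes designated; the other end keeps its root port role if step 2 gave it one, otherwise it is blocked. This is a simplified model under stated assumptions: the port-ID tie breakers are left out because every segment here joins two different switches, SW4's MAC address is hypothetical, and `assign_roles` is an illustrative helper name.

```python
# Bridge IDs as (priority, base MAC); SW4's MAC is hypothetical.
SW1 = (32768, 0x000011111111)
SW2 = (32768, 0x000022222222)
SW3 = (32768, 0x000033333333)
SW4 = (32768, 0x000044444444)

# Each segment: two ends of (switch, port, Root Path Cost advertised, Bridge ID).
SEGMENTS = [
    (("SW1", "F0/1", 0, SW1), ("SW3", "F0/1", 19, SW3)),
    (("SW1", "F0/2", 0, SW1), ("SW2", "F0/2", 19, SW2)),
    (("SW2", "F0/1", 19, SW2), ("SW3", "F0/2", 19, SW3)),
    (("SW3", "F0/3", 19, SW3), ("SW4", "F0/1", 38, SW4)),
    (("SW3", "F0/4", 19, SW3), ("SW4", "F0/2", 38, SW4)),
]

# Result of step 2 (root port selection).
ROOT_PORTS = {("SW2", "F0/2"), ("SW3", "F0/1"), ("SW4", "F0/1")}


def assign_roles(segments, root_ports):
    """Pick one designated port per segment; the other end is a root port or is blocked."""
    roles = {}
    for end_a, end_b in segments:
        # Lower advertised cost wins; on a tie, the lower Bridge ID wins.
        winner, loser = sorted((end_a, end_b), key=lambda end: (end[2], end[3]))
        roles[winner[:2]] = "designated"
        loser_port = loser[:2]
        roles[loser_port] = "root" if loser_port in root_ports else "non-designated"
    return roles


roles = assign_roles(SEGMENTS, ROOT_PORTS)
blocked = sorted(port for port, role in roles.items() if role == "non-designated")
```

Only two ports end up blocked, SW3's F0/2 and SW4's F0/2, which matches the topology computed above: both of SW1's ports are designated, and every other segment has exactly one forwarding designated port.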
Pic. 6 - Spanning-Tree Topology Computed
Icons designed by: Andrzej Szoblik -

This process happens while all ports are in the LISTENING state. Since the computed topology contains no loops (the appropriate ports are blocked), it is safe to move to the next states: learning and finally forwarding.

In the next post, we will look at this process one more time using command line interface and real equipment.