Wednesday, December 22, 2010

Lesson 34 - Dynamic Routing Protocols Introduction

If you have read the previous post, you must have noticed that using a static routing method tends to be a bit cumbersome in larger implementations. Using one of the dynamic routing protocols feels like an easier solution in these scenarios.

In this post I will briefly explain the general concepts behind dynamic routing protocols. Then, we can jump to implementation fundamentals.

One way of classifying dynamic routing protocols is based on where they are used. This criterion allows us to distinguish between two major solutions:
  1. Interior Gateway Protocols  (IGP) 
  2. Exterior Gateway Protocols (EGP)
Common Interior Gateway Protocols are: 
  • Routing Information Protocol (RIP), 
  • Open Shortest Path First (OSPF), 
  • Enhanced Interior Gateway Protocol (EIGRP, Cisco proprietary protocol), 
  • Intermediate System to Intermediate System (IS-IS).
Exterior Gateway Protocols (currently there is only one in use)
  • Border Gateway Protocol  (BGP)
IGPs are designed to work in private networks. EGPs are used to provide paths in the public network (Internet). 

We can also classify routing protocols based on the algorithm they use to distribute and maintain information (routing table). There are three major algorithms supported by Cisco routers:
  1. Distance Vector (DV, aka Bellman-Ford) – example of protocol: RIP.
  2. Link-State – example of protocols: OSPF, IS-IS.
  3. Advanced Distance Vector – protocols: EIGRP (also BGP is partly distance vector protocol).
Understanding the algorithms helps us determine the proper solution for a given design. There is no one best routing protocol out there, but there could be the best one in a specific design.

In this post I am going to focus in on the first algorithm listed above.

Distance Vector Algorithm Characteristics
This method is sometimes referred to as ‘routing by rumor’. The main characteristics of this approach are:
  • Routers do not know the topology of the network. They only know which is the outbound interface and the next-hop router’s IP address (vector) as well as the metric value which describes how far the destination is (distance).
  • Routers advertise their full routing table periodically. This method of route distribution creates two problems: routing loops and counting to infinity. Special techniques were created to solve these issues (details later in the post).
  • Routers perform automatic summarization if they are connected to different classful (A, B, C) networks.
  • No VLSM support. All network masks must be identical if the subnets of a major class are used in the network (RIPv1). RIPv2 is classless (VLSM supported using ‘no auto-summary’ command).
  • Routers are slow to converge. It takes a lot of time to invalidate lost routes and pick the new path if one is available as well as to synchronize their routing information.
  • Routers use simple metric. The metric number tells a router how many routers the packet has to traverse in order to reach the destination. In modern networks bandwidth of the path is much more important than how many hops will be used.
The above characteristics do not encourage us to use this kind of solution in our modern networks. But knowing the DV rules help us appreciate protocols such as OSPF or EIGRP which are more likely to be used in our designs.

Let’s see how things work when DV algorithm is used. As an example, I will use RIP protocol hoping to explain the principles of operation and how the two design issues have been solved (routing loop and counting to infinity).

Distance Vector Principles of Operation
Consider this simple topology. Without getting into configuration (syntax) details let’s have a quick discussion on how information is distributed using DV algorithm. Initially, the routers recognize only connected subnets. They are populated in the routing table as soon as IP addresses and network masks are configured and they are activated (no shutdown).

Pic. 1 - Connected Subnets.



Icons designed by: Andrzej Szoblik - http://www.newo.pl

Let’s assume that we have enabled RIPv2 protocol in the topology presented above (pic. 1). This version of RIP allows the routers to announce both the subnet IP addresses and the network masks (we’ll put it into practice in the next post). 

The RIP process must be activated in the ‘config’ mode. Then we need to instruct it which interfaces should be activated in the RIP domain. This is configured in the ‘config-router’ mode (‘network’ statement).  The routers begin to ‘chat’ and advertise their routing tables every 30 seconds.

Pretend that R1’s timer of sending the advertisement has just kicked in (pic. 2). R1 is advertising its routing table out of the RIP-enabled interfaces (in my example all interfaces of all routers are in the RIP domain). This way, R2 learns about 10.1.1.0/24 subnet. So from R2’s perspective, R1 router becomes the gateway towards 10.1.1.0/24. 

Now, a word about the metric being advertised. 

Metric used in DV reflects how many routers the packet has to traverse to reach the destination network/subnet (so called 'hop-count'). R1’s routing table’s entries (subnets: 10.1.1.0/24 and 10.1.12.0/24) show the metric of ‘0’ hops (pic. 1) since they are directly connected to F0/0 and F0/1 interfaces respectively (they are local to R1). While advertising them to the neighbors (pic. 2), R1 adds 1 hop (itself) to the existing metric found in the routing table.


NOTICE!
Bear in mind, that algorithm prompts the router to send the full routing table. Current implementation changes that behavior (split-horizon) but more on this later in the post.


Pic. 2 – R1’s RIP advertisements.


Icons designed by: Andrzej Szoblik - http://www.newo.pl

R2 accepts the advertisement about 10.1.1.0/24. It puts this information in the RIP’s database and then it creates the entry in the routing table (purple color). Pay a close attention to what has just happened (pic. 2). The update arrives on R2’s F0/0 interface (RIP-enabled), sourced by the IP address of 10.1.12.1. This way, R2 considers its F0/0 the egress (outbound) interface towards the subnet advertised by R1. The IP address of the sender (10.1.12.1) becomes the next-hop IP address towards the subnet 10.1.1.0/24

Next, let’s imagine R2’s timer has expired and it is sending its routing table out F0/0 and F0/1. Please take a closer look at the picture 3 which shows this process in the graphical form. Just like previously R1 router has done, R2 is sending its routing table adding itself as an additional hop added to the existing metric (existing metric +1).

Pic. 3 – R2’s RIP advertisement.


Icons designed by: Andrzej Szoblik - http://www.newo.pl

Now, R1 and R3 accept the advertisement from R2 and register the information sent in their RIP databases (the interfaces process the update as they RIP-enabled). Appropriate entries in the routing tables also show the egress interfaces and the metric expressed in the number of ‘hops’ (how many routers the packet will have to traverse to reach the destination subnet). Also, the IP address of the gateway (the sender IP address) is registered. Again, take a look at pic.3 which shows the new entries (in purple).

Now is the time for R3 to send its own advertisement. Using the same logic you should be able to tell what is going to happen. Take a look at pic. 4 to see what is going to be advertised and what is going to be learned.

The advertisement sent out R3’s F0/0 interface is useless in our topology because there is no other router listening to it. In my next post, I will show you how to prevent a router from doing it. Advertisement sent out F0/1 interface contains information about R3’s directly connected subnet 10.1.3.0/24. Since the existing metric in R3’s routing table for this subnet is ‘0’ (directly connected to F0/0), R3 will add itself as the hop and advertise it with the metric of ‘1’ (existing metric + 1). R2 is going to learn it on its F0/1 interface which becomes the outbound interface to reach the subnet 10.1.3.0/24. It is the interface to reach the advertising router’s IP address 10.1.23.3 after all.

Pic. 4 – R3’s RIP Advertisement.


Icons designed by: Andrzej Szoblik - http://www.newo.pl

Picture 4 shows this process.

This whole process of advertising the routing table out of all RIP-enabled interfaces occurs every 30 seconds but in fact, there is a jitter time introduced so this may vary between 25-30 seconds. When R2 advertising timer expires, it will pass the information contained in the its routing table on to R1. By doing this, R1 learns about all subnets R2 can reach, including 10.1.3.0/24 now (pic. 5).

Pic. 5 – R2’s RIP Advertisement.


Icons designed by: Andrzej Szoblik - http://www.newo.pl

The process of spreading the information explained using this method is referred to as the ‘routing by rumor’. The state in which all routers have stable information about all networks/subnets that can be reached is called the ‘convergence’. Do not confuse it with ‘convergent networks’ which allow all sort of packet transmissions (voice, video, and data).

Take a look at picture 6. It shows that all routers can reach all the subnets available in the RIP domain. Convergence has been accomplished since their routing table are synchronized and up-to-date.

Pic. 6 – Convergence Achieved.


Icons designed by: Andrzej Szoblik - http://www.newo.pl

The method of distributing information presented is prone to introduce two problems:
  • Routing Loops
  • Counting to Infinity
Of course, they have been resolved by using different techniques which I am going to explain later in the post.

Let’s take a look at the downside of using distance vector algorithm.

Routing Loops
In the picture 7, R1’s F0/0 interface. As soon as the IOS detects this fact, the entry in the routing table about 10.1.1.0/24 is immediately flushed (removed from the routing table completely).

Pic. 7 – R1’s F0/0 Interface Goes Down.


Icons designed by: Andrzej Szoblik - http://www.newo.pl

As per the DV algorithm R1 would still wait till its advertisement timer expires. So instead of sending this ‘update’ immediately after it has lost the subnet, it will wait till its timer says: ‘now you can advertise your routing table’. This behavior might create a loop between R1 and R2 as far as the 10.1.1.0/24 subnet is concerned. Consider this situation depicted below.

Pic 8 – R2’s Advertising Timer Expires.


Icons designed by: Andrzej Szoblik - http://www.newo.pl


R2 is advertising its full routing table out of all RIP-enabled interfaces. In this announcement, there is 10.1.1.0/24 subnet. The metric being advertised is: ‘2’ (the existing metric on R2 + 1). By now, you already know that the advertising router is going to add itself as the hop to the metric of the subnet/network it advertises.

Here is the issue. R1 is receiving 10.1.1/0/24 with the metric of 2 hops, the egress interface (the one the ad came on) is F0/1, and the next-hop-address is 10.1.12.2. Look at the pic. 8 and tell me (I can’t hear you though), what would you do if you were R1? Obviously, you would reject this information because by looking at the topology diagram, you already know that 10.1.1.0/24 is inaccessible (down) now, and the only way to reach it is through R1, right?

But the problem is, that routers using DV algorithm do NOT know the topology like explained in the characteristics section. In fact, R1 IS going to accept the information and treat R2 as the gateway towards 10.1.1.0/24 !!!

Wow! As ridiculous as it sounds, it is exactly what would happen according to the rules set by the designers of this algorithm. So R1’s routing table is going to look like shown in the picture 8. Take a look at it now again!

We have a loop between R1 and R2 regarding 10.1.1.0/24. If R2 receives the packets destined to 10.1.1.0/24 subnet, according to its knowledge (current routing table), it is going to send it out F0/0 interface towards R1. This one in turn, will use its F0/1 interface for the destination 10.1.1.0/24, sending it back to R2. The packets will be looped until their TTL values are decremented reaching the value of TTL=0. Then, a router must drop the packet.

Counting To Infinity
A routing loop is not going to be the only problem here. R1 is going to accept advertisements from R2 regarding 10.1.1.0/24 with the number of hops equal '2’. When R1 advertises its own routing table, it is going to add itself (as the hop) to the metric that already exists in the routing table. Look what is going to happen (pic. 9)

Pic. 9 – R1’s RIP Advertisement.


Icons designed by: Andrzej Szoblik - http://www.newo.pl

Initially, R2 is going to ignore the information about 10.1.1.0/24 from R1 containing the metric of ‘3’ hops since it has much better entry in the routing table (lower metric). However, it was R1 that initially sent the metric of ‘1’ hop. Now, the same R1 router keeps sending the metric of ‘3’ hops. The previous metric of ‘1’ is no longer refreshed. Since it uses the aging timer of 180 seconds (how long the information is valid), it finally accepts the entry with the metric of ‘3’ hops instead.

Then R2 begins to advertise the metric of 4 regarding 10.1.1.0/24 subnet out F0/0 and F0/1. You can predict what is going to happen. Remember, that entries must be refreshed every 30 seconds. If they are not refreshed, the ‘Invalidation Timer’ (180 seconds), allows to accept the entry with worse metric than previously. Take a look at the sequence of events in the picture 10.


Pic. 10 – Larger and Larger Metric Propagation.


Icons designed by: Andrzej Szoblik - http://www.newo.pl

It would last forever despite of the fact that 10.1.1.0/24 is not reachable at all!

The Distance Vector algorithm uses a few techniques to prevent these two problems from happening. Here they are:

  • Triggered Update (aka flash update)
  • Route Poisoning
  • Maximum Metric (RIP considers 16 hops as inaccessible)
  • Poison Reverse
  • Hold-Down Timer
  • Split-Horizon
These methods deserve a few words of explanation.

Triggered Update
IOS uses this method to send the update immediately rather than wait for the advertisement timer to expire. However, there is no guarantee that some router in the chain is not going to send its own information before it receives this update. This might still lead to a situation where the two problems occur. So this method, as the only solution here, is not enough to make it work. Other methods must be used as well in order to avoid routing loops and counting to infinity.

Route Poisoning
Upon losing subnet/network reachability, a router is sending a triggered update. This update is going to include the maximum metric value (RIP=16 hops) which is considered as ‘subnet/network inaccessible’ (cannot be reached).

Maximum Metric (RIP=16)
If a RIP router receives an update about a network/subnet with the metric of 16 hops it is considered as inaccessible. This way, the advertising router is excluded from the list of gateways for the subnet/network advertised with the maximum metric.

Poison Reverse
Once a router receives the advertisement including the maximum metric, if it does not have an alternate path towards the subnet/network lost, it is going to send the same subnet/network prefix with the maximum metric (RIP=16) informing the other routers about it. This will also be sent back to the sender of this information it does not have an alternate path (this might be seen as violation of split-horizon, but remember the metric is the maximum value). Poisoning the path back to the advertising router is the way of informing it that the receiver of this information has no alternate path available either.

Hold-Down Timer
Upon receiving information from a neighbor that a subnet/network is inaccessible, the receiving router is going to enable a hold-down timer for 180 seconds. During this time, the receiving router keeps sending packet to the destination being inaccessible for some time rather than withdrawing the entry from its routing table. Why?

In the past, the routers did not have that much power and the media were unreliable. Interfaces were prone to flaps more often than in today’s reliable networks. An ‘interface flap’ is the condition when it goes down and up subsequently in a very short space of time (1-2 seconds perhaps). Under such circumstances, a router would advertise network as inaccessible and then as accessible again. Since it takes some CPU power to withdraw the entry and put it back in, the designers preferred to wait a bit longer to be absolutely sure (180 seconds by default) that the entry was supposed to be removed from the routing table. In case of an interface flapping, not only would the packets still be delivered but the CPU would not waste its ‘precious’ cycles on removing and putting the entry back in the routing table. 

Split-Horizon
This method prevents the loops from occurring in the scenario we have talked about. This technique prevents a router from sending information it learned back out the interface it was received on. Consider our first example. R2 sent information about 10.1.1.0/24 before R1 had had a chance to send the maximum metric towards R2 (subnet down). Split-Horizon prevents R2 from sending information about 10.1.1.0/24 it learned on its F0/0 interface back out the same interface. As a result of that, R1 is never going to receive information it sent towards R2 (10.1.1.0/24) and believe R2 could be the gateway to 10.1.1.0/24. Thus, there is no loop

In my next post I’m going to show you how to enable RIP and how all these techniques work in practice.