Friday, September 17, 2010

Lesson 11 - Layer 2 Connectivity Troubleshooting Part 1

In the last two lessons I tried to explain the foundations of layer 2 operation. I discussed the very important switching process and CDP, which comes in handy at times. If you also watched the video in lesson 10, you got a glimpse of a few of the commands explained in the theory earlier on.

In this lesson, we'll focus on the practical application of layer 1 and layer 2 concepts that can help in troubleshooting networking issues.

Even though we're capable of creating lots of great things, we're still human beings. And the adage 'to err is human' is as true in the networking field as anywhere else. This means that occasionally things won't work as expected. In such situations we need to be able to isolate and fix the problems quickly.

Reactive troubleshooting almost always uses the following workflow:
  1. A problem is reported.
  2. Data and facts are collected.
  3. The data is analyzed.
  4. Potential causes are eliminated.
  5. A hypothesis is formed.
  6. The hypothesis is verified.
In some situations steps 3 and 4 can be omitted, depending on the nature of the problem reported, the severity of the issue, the skills of the technician, etc.

As we have yet to learn layer 2 technologies such as VLANs and Spanning Tree Protocol, which add complexity to the troubleshooting process, let us, for now, focus on simple cases. This way we're going to build our troubleshooting skills step by step, given the knowledge we possess.

Recall the process of moving data from one computer to another. Everything sent from the application layer goes down to the physical layer. That teaches us one important thing: if layer 1 is not operational, nothing else will work!

So, we could start diagnosing networking problems by checking layer 1 connectivity first. Then we could move on to layer 2 and work our way up until we isolate the problem. This makes perfect sense. However, a lot of technicians use another method known as 'divide and conquer'. In network diagnostics that might mean starting with the layers in the middle. For instance, using the 'ping' utility we can check the status of layers 1 through 3 as a starting point. Of course, you already know how 'ping' works, don't you? 'Ping' uses ICMP, a layer 3 messaging mechanism whose messages are encapsulated directly in IP packets.

The ping utility is a layer 3 reachability check. It sends a number of ICMP echo request messages to the destination host. If the destination host receives them, it sends ICMP echo reply messages back to the sender. Assuming that nothing along the path filters those messages (a local firewall, for instance), the sender receives the replies, which confirms layer 1 through layer 3 reachability. Thus, the path is verified in BOTH directions.
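The check described above can be sketched in a few lines of Python. This is only an illustration, not part of the lab: it assumes the operating system's `ping` command is on the path, that it accepts `-c` (Unix-like) or `-n` (Windows) for the number of echo requests, and that an exit code of 0 means echo replies came back (true for the common Linux ping; other implementations may differ). The function names and the `run` parameter are made up for the example.

```python
import platform
import subprocess


def build_ping_command(host, count=4):
    """Build the argument list for the operating system's ping utility."""
    # Assumption: Windows ping takes -n for the echo count, Unix-like ping takes -c.
    count_flag = "-n" if platform.system().lower() == "windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host, count=4, run=subprocess.run):
    """Return True if ping reports success (exit code 0) for the host.

    The `run` parameter exists only so the function can be exercised
    without sending real packets; by default it is subprocess.run.
    """
    result = run(build_ping_command(host, count), capture_output=True)
    # Assumption: exit code 0 means echo replies were received.
    return result.returncode == 0
```

Calling `is_reachable("192.0.2.1")` (a placeholder documentation address) would run the real ping. A `True` result suggests layers 1 through 3 work in both directions; a `False` result only says that more diagnostics are needed.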

If you test layer 3 reachability using the 'ping' utility, you may get one of two results:
  1. You get a reply from the destination. This leads to the conclusion that layers 1 through 3 are working properly, and the connectivity problems might be related to the upper layers (layer 4 and up).
  2. You do not get a reply from the destination. In that case, this step alone is not enough to determine the nature of the issue, and you must perform some additional diagnostics.
Now, I am going to show you a few such steps.

The steps presented in this lesson are not ALL the possible diagnostics you can do. And they do not have to be done in this specific order. I am merely listing some logical steps which might be useful in order to 'nail down' the root cause of the problem.

Trouble Ticket 1
Data transfer from PC1 toward PC2 is very slow (refer to Pic 1).

Pic. 1 - Simple Topology
Icons designed by: Andrzej Szoblik

As you see, without the topology diagram, it is much more challenging to do diagnostics. If we know the topology and the path between the source and destination devices in question, we can focus on all components that participate in the data transmission and isolate the issue.

First, let us determine what we know.

The transmission succeeds, but it is slow. Verify that yourself. Do not assume it is true or was properly tested by the person who reported it. Most users cannot accurately describe the nature of a problem.

So, you have done the tests and the transfer proves to be slow indeed.

Some questions you might ask:
  • Has this problem occurred recently?
  • Is this a new client computer or destination server?
  • Have any configuration changes, updates, cable replacements, or other changes been made on any device in the path before the problem manifested itself?
  • Are other computers experiencing the same problem, or is this an isolated case?
  • etc.
Further data and fact collection may depend on the answers to those questions. For argument's sake, let's assume that the server is a brand new computer installed the day before the problem occurred. All clients sending files to the server suffer from slow data transfer. When you connect the same client directly to the server (computer-to-server through a crossover cable), the transfer is very fast.

Given those facts, we're going to take a quick look at the status of the interface where our server computer is connected. The command that is very useful to check both the layer 1 and layer 2 status is:

SW2#show int f0/2 

Pic. 2 - show interface command

This output deserves some explanation.
FastEthernet is up,
This is the status of layer 1 connectivity (your cable seems to be attached, right?).

line protocol is up
This is the status of layer 2. It looks like the keepalives sent every 10 seconds are working back and forth (do not trust it entirely though; see trouble ticket 2 in the next lesson).

half-duplex, 100Mb/s
The negotiated duplex is HALF. Most network adapters in computers use AUTO negotiation. This means that the NIC (Network Interface Card, aka network adapter) sends special signals to the switch port, trying to negotiate FULL duplex and the highest speed both sides support. Unfortunately, some NIC manufacturers do not follow the specification for this signaling, which causes a "misunderstanding" between the switch port and the NIC. The switch typically falls back from FULL to HALF duplex. That makes the switch port enable Carrier Sense Multiple Access with Collision Detection (CSMA/CD), which is meant for SHARED, not dedicated, connections (for details see lesson 8).
What we end up with is the NIC working in FULL duplex while the switch port runs in HALF duplex. The port 'thinks' it can either send or receive data, but not both at the same time. While the server 'pushes' data across the network, the switch port cannot send anything toward the server. When it finally 'thinks' it can (the medium is free) and starts sending toward the server, the server may begin transmitting toward the port at the same moment, as FULL duplex allows. The switch treats this as a collision and must stop transmitting immediately. At least that is how the switch works under these circumstances. Then it waits until the carrier is free again (no data coming from the server). This problem is known as a DUPLEX MISMATCH. It results in a great number of collisions and late collisions recorded on the switch port, as shown in the output above.
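One way to spot this condition without reading every counter by eye is to scan the `show interface` text for the duplex setting and the collision counters. The Python sketch below is an illustration under stated assumptions: it expects the usual IOS wording (`Half-duplex`, `N collisions`, `N late collision`), the function names are hypothetical, and the sample text and its counter values are invented for the example; they are not the output from Pic. 2.

```python
import re


def parse_interface(show_output):
    """Pull duplex mode and collision counters out of 'show interface' text."""
    duplex = re.search(r"\b(half|full)-duplex", show_output, re.IGNORECASE)
    collisions = re.search(r"(\d+) collisions", show_output)
    late = re.search(r"(\d+) late collision", show_output)
    return {
        "duplex": duplex.group(1).lower() if duplex else None,
        "collisions": int(collisions.group(1)) if collisions else 0,
        "late_collisions": int(late.group(1)) if late else 0,
    }


def suspect_duplex_mismatch(counters):
    """Flag a half-duplex port that records late collisions."""
    return counters["duplex"] == "half" and counters["late_collisions"] > 0


# Hypothetical sample lines; the numbers are made up for illustration.
SAMPLE = """\
FastEthernet0/2 is up, line protocol is up
  Half-duplex, 100Mb/s
     0 output errors, 1520 collisions, 1 interface resets
     0 babbles, 340 late collision, 0 deferred
"""

counters = parse_interface(SAMPLE)
print(counters)
print("possible duplex mismatch:", suspect_duplex_mismatch(counters))
```

The flag is only a hint. Late collisions on a half-duplex port point toward a duplex mismatch, but the other end of the link still has to be checked before drawing a conclusion.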

Possible solutions to this problem include the following actions:
  1. Try to upgrade the NIC driver using your server/computer vendor's web site.
  2. If the problem persists, you may try to hard code speed and duplex. You have to do this on both ends of the connection. This disables the AUTO NEGOTIATION feature (do not listen to people who say you should do this on only one end of the connection).
  3. Sometimes, though it is very unlikely in our situation, the cable can cause this sort of behavior. Replacing it with one that is known to be good might help. Then again, with a bad cable we would typically see other layer 1 errors (CRC errors, carrier loss).
  4. Replace the NIC in the server with one that you are sure works well.

Hard Coding Speed and Duplex

SW2#configure terminal
SW2(config)#interface fastethernet0/2
SW2(config-if)#speed 100
SW2(config-if)#duplex full

As for the computer, refer to your operating system's documentation to find out how to set speed and duplex on the NIC manually. Perhaps google it. Google is the best!

In the next lessons, we will resolve two more connectivity issues using the skills we gained in the previous lessons.