Troubleshooting Line Cards
This chapter discusses troubleshooting faults on the following Cisco uBR10012 line cards:
•General Information for Troubleshooting Line Card Crashes
•Troubleshooting the Timing, Communication, and Control Plus Card
•Troubleshooting the OC-12 Packet-Over-SONET Line Card
•Troubleshooting the OC-12 Dynamic Packet Transport Spatial Reuse Protocol WAN Card
•Troubleshooting the Cisco uBR10012 OC-48 DPT/POS Line Card
•Troubleshooting the Gigabit Ethernet Line Card
General Information for Troubleshooting Line Card Crashes
Line card crashes occur when the hardware or software encounters a situation that the current design does not anticipate. As a general rule, a crash indicates a configuration error, a software error, or a hardware problem.
Table 4-1 lists the show commands that are most useful in collecting information to troubleshoot line card crashes.
Use the following procedure if you suspect that a line card has crashed.
Step 1 If you can identify the particular card that has crashed or is experiencing problems, first use the other sections in this chapter to perform basic troubleshooting. In particular, ensure that the line card is fully inserted into the proper slot, and that all cables are properly connected.
Step 2 If any system messages were displayed on the console or in the SYSLOG logs at the time of the crash, consult the Cisco CMTS System Messages guide and the Cisco IOS System Messages Guide for possible suggestions on the source of the problem.
Step 3 Line cards can crash or appear to crash when an excessive number of debug messages are being generated. In particular, this can happen when using the verbose or detail mode of a debug command, or if the debug command is dumping the contents of packets or packet buffers. If the console contains a large volume of debug output, turn off all debugging with the no debug all command.
Step 4 If the system message log contains messages that indicate the line card is not responding (for example, %IPCOIR-3-TIMEOUT), and the card's LEDs are not lit, the line card might have shut down because of overheating. Ensure that every chassis slot has the proper card or module installed in it. If a slot is empty, ensure that it has a blank front panel installed, so that proper airflow and cooling can be maintained in the chassis.
Step 5 Use the show context summary command to identify all of the line cards that have experienced a crash:
Router# show context summary
CRASH INFO SUMMARY
Slot 1/0: 0 crashes
Slot 1/1: 0 crashes
Slot 2/0: 0 crashes
Slot 2/1: 0 crashes
Slot 3/0: 0 crashes
Slot 3/1: 0 crashes
Slot 4/0: 1 crashes
1 - crash at 04:28:56 EDT Tue Apr 20 1999
Slot 4/1: 0 crashes
Slot 5/0: 0 crashes
Slot 5/1: 0 crashes
Slot 6/0: 0 crashes
Slot 6/1: 0 crashes
Slot 7/0: 0 crashes
Slot 7/1: 0 crashes
Slot 8/0: 0 crashes
Slot 8/1: 0 crashes
Router#
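When the summary is long, the slots of interest can be picked out with a short script. The following Python sketch assumes the output has been captured to text in the format shown above. The function name crashed_slots is hypothetical and not part of any Cisco tool.

```python
import re

# Matches lines such as "Slot 4/0: 1 crashes" from show context summary.
SLOT_LINE = re.compile(r"^Slot (\d+)/(\d+): (\d+) crashes?\s*$")

def crashed_slots(summary_text):
    """Return {"slot/subslot": crash_count} for every slot with at least one crash."""
    crashed = {}
    for line in summary_text.splitlines():
        match = SLOT_LINE.match(line.strip())
        if match:
            slot, subslot, count = match.groups()
            if int(count) > 0:
                crashed[f"{slot}/{subslot}"] = int(count)
    return crashed

sample = """\
CRASH INFO SUMMARY
Slot 3/1: 0 crashes
Slot 4/0: 1 crashes
1 - crash at 04:28:56 EDT Tue Apr 20 1999
Slot 4/1: 0 crashes
"""
print(crashed_slots(sample))  # {'4/0': 1}
```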
Step 6 After identifying the particular card that crashed, use the show context slot command for that card to display more information about its most recent crash. For example:
Router# show context slot 2/0
CRASH INFO: Slot 2/0, Index 1, Crash at 19:57:56 PDT Wed Nov 27 2002
VERSION:
7200 Software (UBR10KCLC-LCK8-M), Version 12.2(122BC.021127.), CISCO DEVELOPMENN
Compiled Wed 27-Nov-02 12:57 by
Card Type: UNKNOWN, S/N CAB0544L6F5
System exception: sig=10, code=0x8000000C, context=0x60A1BDE4
STACK TRACE:
traceback 601C28FC 601C29B4 601B9E8C 600F99B0 600F999C
CONTEXT:
$0 : 00000000, AT : 60930000, v0 : FFFFFFFF, v1 : 60940000
a0 : 00000000, a1 : 00000000, a2 : 00000001, a3 : 0000EA60
t0 : FFFFFFFF, t1 : FFFFA91C, t2 : 601284E0, t3 : FFFF00FF
t4 : 601284D8, t5 : 00000062, t6 : 00000000, t7 : D1B71759
s0 : 00000000, s1 : 00000008, s2 : 00000000, s3 : 60CD0998
s4 : 60CD0990, s5 : 00000000, s6 : 00000002, s7 : 60940000
t8 : 60D98C2C, t9 : 0000001B, k0 : 3040D001, k1 : BE840244
gp : 6093BD60, sp : 60CD0968, s8 : 60A70000, ra : 601C2900
EPC : 0x601C28F8, SREG : 0x3400F903, Cause : 0x8000000C
ErrorEPC : 0xCF1998F2
SLOT 2/0: *Jan 1 00:01:30.371: %SYS-2-EXCEPTIONDUMP: System Crashed, Writing Coredump...
Router#
Step 7 Look for the SIG Type in the line that starts with "System exception" to identify the reason for the crash. Table 4-2 lists the most common SIG error types and their causes.
Step 8 The vast majority of line card crashes are Cache Parity Exceptions (SIG type=20), Bus Error Exceptions (SIG type=10), or Software-Forced Crashes (SIG type=23). Use the following sections to further troubleshoot these problems:
•Cache Parity Errors
•Bus Errors
•Software-Forced Crashes
If the line card crashed for some other reason, capture the output of the show tech-support command. Registered Cisco.com users can decode the output of this command by using the Output Interpreter tool, which is at the following URL:
https://www.cisco.com/cgi-bin/Support/OutputInterpreter/home.pl
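Steps 7 and 8 can be illustrated with a small Python sketch that pulls the SIG type out of the "System exception" line and maps it to one of the three crash types named in this chapter. The mapping covers only those three values, and the function name classify_crash is hypothetical.

```python
import re

# SIG types named in this chapter; any other value falls through to "other".
SIG_TYPES = {
    20: "Cache Parity Exception",
    10: "Bus Error Exception",
    23: "Software-forced Crash",
}

def classify_crash(context_text):
    """Return (sig, description) from show context output, or None if no SIG is found."""
    match = re.search(r"System exception:\s*sig=(\d+)", context_text)
    if match is None:
        return None
    sig = int(match.group(1))
    return sig, SIG_TYPES.get(sig, "other (capture show tech-support)")

line = "System exception: sig=10, code=0x8000000C, context=0x60A1BDE4"
print(classify_crash(line))  # (10, 'Bus Error Exception')
```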
Step 9 If you cannot resolve the problem using the information from the Output Interpreter, collect the following information and contact Cisco TAC:
•All relevant information about the problem that you have available, including any troubleshooting you have performed.
•Any console output that was generated at the time of the problem.
•Output of the show tech-support command.
•Output of the show log command (or the log that was captured by your SYSLOG server, if available).
For information on contacting TAC and opening a case, see the "Obtaining Technical Assistance" section on page x.
Cache Parity Errors
A cache parity error (SIG type is 20) means that one or more bits at a memory location were unexpectedly changed after they were originally written. This error could indicate a potential problem with the Dynamic Random Access Memory (DRAM) that is onboard the line card.
Parity errors are not expected during normal operations and could force the line card to crash or reload. These memory errors can be categorized in two different ways:
•Soft parity errors occur when an energy level within the DRAM changes a bit from a one to a zero, or a zero to a one. Soft errors are rare and are most often the result of normal background radiation. When the CPU detects a soft parity error, it attempts to recover by restarting the affected subsystem, if possible. If the error is in a portion of memory that is not recoverable, it could cause the system to crash. Although soft parity errors can cause a system crash, you do not need to swap the board or any of the components, because the problem is not defective hardware.
•Hard parity errors occur when a hardware defect in the DRAM or processor board causes data to be repeatedly corrupted at the same address. In general, suspect a hard parity error when more than one parity error occurs in a particular memory region within a relatively short period of time (several weeks to months).
When a parity error occurs, take the following steps to resolve the problem:
Step 1 Determine whether this is a soft parity error or a hard parity error. Soft parity errors are 10 to 100 times more frequent than hard parity errors. Therefore, wait for a second parity error before taking any action. Monitor the router for several weeks after the first incident, and if the problem recurs, assume that the problem is a hard parity error and proceed to the next step.
Step 2 When a hard parity error occurs (two or more parity errors at the same memory location), try removing and reinserting the line card, making sure to fully insert the card and to securely tighten the restraining screws on the front panel.
Step 3 If this does not resolve the problem, remove and reseat the DRAM chips. If the problem continues, replace the DRAM chips.
Step 4 If parity errors still occur, the problem is either with the line card or the router chassis. Try removing the line card and reinserting it. If the problem persists, try removing the line card from its current slot and reinserting it in another slot, if one is available. If that does not fix the problem, replace the line card.
Step 5 If the problems continue, collect the following information and contact Cisco TAC:
•All relevant information about the problem that you have available, including any troubleshooting you have performed.
•Any console output that was generated at the time of the problem.
•Output of the show tech-support command.
•Output of the show log command (or the log that was captured by your SYSLOG server, if available).
For information on contacting TAC and opening a case, see the "Obtaining Technical Assistance" section on page x.
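The soft versus hard distinction used in the steps above can be sketched in Python as follows. The sketch treats two or more parity errors at the same address as a hard error and anything seen once as soft. It is a simplification: it compares exact addresses rather than memory regions and ignores the time between incidents. The function name and the sample addresses are hypothetical.

```python
from collections import Counter

def classify_parity_errors(addresses):
    """Map each address to 'hard' if it was seen two or more times, else 'soft'."""
    counts = Counter(addresses)
    return {addr: ("hard" if n >= 2 else "soft") for addr, n in counts.items()}

# Hypothetical addresses recorded from three separate parity error incidents.
observed = [0x60A1BDE4, 0x60CD0968, 0x60A1BDE4]
result = classify_parity_errors(observed)
for addr, kind in result.items():
    print(f"0x{addr:08X}: {kind}")
```

An address reported as soft needs no hardware action yet. An address reported as hard is the point at which Step 2 applies.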
Bus Errors
Bus errors (SIG type is 10) occur when the line card tries to access a memory location that either does not exist (which indicates a software error) or that does not respond (which indicates a hardware error). Use the following procedure to determine the cause of a bus error and to resolve the problem.
Perform these steps as soon as possible after the bus error. In particular, perform these steps before manually reloading or power cycling the router, or before performing an Online Insertion/Removal (OIR) of the line card, because doing so eliminates much of the information that is useful in debugging line card crashes.
Step 1 Capture the output of the show stacks, show context, and show tech-support commands. Registered Cisco.com users can decode the output of these commands by using the Output Interpreter tool, which is at the following URL:
https://www.cisco.com/cgi-bin/Support/OutputInterpreter/home.pl
Step 2 If the results from the Output Interpreter indicate a hardware-related problem, try removing and reinserting the hardware into the chassis. If this does not correct the problem, replace the DRAM chips on the hardware. If the problem persists, replace the hardware.
Step 3 If the problem appears software-related, verify that you are running a released version of software, and that this release of software supports all of the hardware that is installed in the router. If necessary, upgrade the router to the latest version of software.
Tip The most effective way of using the Output Interpreter tool is to capture the output of the show stacks and show tech-support commands and upload the output into the tool. If the problem appears related to a line card, you can also try decoding the show context command.
Upgrading to the latest version of the Cisco IOS software picks up the fixes for all known bugs that can cause line card bus errors. If the crash is still present after the upgrade, collect the relevant information from the above troubleshooting, as well as any information about recent network changes, and contact Cisco TAC.
Software-Forced Crashes
Software-forced crashes (SIG type is 23) occur when the Cisco IOS software encounters a problem with the line card and determines that it can no longer continue, so it forces the line card to crash. The original problem could be either hardware-based or software-based.
The most common reason for a software-forced crash on a line card is a "Fabric Ping Timeout," which occurs when the PRE-1 module sends five keepalive messages (fabric pings) to the line card and does not receive a reply. If this occurs, you should see error messages similar to the following in the router's console log:
%GRP-3-FABRIC_UNI: Unicast send timed out (4)
%GRP-3-COREDUMP: Core dump incident on slot 4, error: Fabric ping failure
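To find these messages in a captured console or SYSLOG file, a short Python sketch such as the following can be used. It assumes the %GRP-3-COREDUMP message has exactly the form shown above. The function name fabric_ping_failures is hypothetical.

```python
import re

# Matches the core dump message shown above and captures the slot and error text.
COREDUMP = re.compile(
    r"%GRP-3-COREDUMP: Core dump incident on slot (\d+), error: (.+)"
)

def fabric_ping_failures(log_text):
    """Return the slot numbers reported with a 'Fabric ping failure' core dump."""
    slots = []
    for line in log_text.splitlines():
        match = COREDUMP.search(line)
        if match and "fabric ping failure" in match.group(2).lower():
            slots.append(int(match.group(1)))
    return slots

log = """\
%GRP-3-FABRIC_UNI: Unicast send timed out (4)
%GRP-3-COREDUMP: Core dump incident on slot 4, error: Fabric ping failure
"""
print(fabric_ping_failures(log))  # [4]
```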
Fabric ping timeouts are usually caused by one of the following problems:
•High CPU Utilization—Either the PRE-1 module or line card is experiencing high CPU utilization. The PRE-1 module or line card could be so busy that either the ping request or ping reply message was dropped. Use the show processes cpu command to determine whether CPU usage is exceptionally high (at 95 percent or more). If so, see the "High CPU Utilization Problems" section for information on troubleshooting the problem.
•CEF-Related Problems—If the crash is accompanied by system messages that begin with "%FIB," it could indicate a problem with Cisco Express Forwarding (CEF) on one of the line card's interfaces. For more information, see Troubleshooting CEF-Related Error Messages, at the following URL:
http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a0080110d68.shtml
•IPC Timeout—The InterProcess Communication (IPC) message that carried the original ping request or the ping reply was lost. This could be caused by a software bug that is disabling interrupts for an excessive period of time, high CPU usage on the PRE-1 module, or by excessive traffic on the line card that is filling up all available IPC buffers.
If the router is not running the most current Cisco IOS software, upgrade the router to the latest software release, so that any known IPC bugs are fixed. If the show processes cpu command shows that CPU usage is exceptionally high (at 95 percent or more), or if traffic on the line card is excessive, see the "High CPU Utilization Problems" section.
If the crash is accompanied by %IPC-3-NOBUFF messages, see Troubleshooting IPC-3-NOBUFF Messages on the Cisco 12000, 10000 and 7500 Series, at the following URL:
http://www.cisco.com/en/US/products/hw/routers/ps133/products_tech_note09186a00800945a1.shtml
•Hardware Problem—The card might not be fully inserted into its slot, or the card hardware itself could have failed. In particular, if the problem began occurring after the card was moved or after a power outage, the card could have been damaged by static electricity or a power surge. Only a small number of fabric ping timeouts are caused by hardware failures, so try the following before replacing the card:
–Reload the software on the line card, using the hw-module slot reset command.
–Remove and reinsert the line card in its slot.
–Try moving the card to another slot, if one is available.
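The 95 percent CPU check described in the list above can be sketched in Python. The sketch assumes that the captured show processes cpu output begins with a summary line of the form "CPU utilization for five seconds: X%/Y%; one minute: Z%; five minutes: W%". It flags the router when any of the three averages reaches the threshold, which is one possible reading of "exceptionally high". The function name is hypothetical.

```python
import re

# Summary line assumed to open the show processes cpu output.
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)

def cpu_is_exceptionally_high(show_proc_cpu_text, threshold=95):
    """Return True/False against the threshold, or None if no summary line is found."""
    match = CPU_LINE.search(show_proc_cpu_text)
    if match is None:
        return None
    # The second value is the interrupt-level share of the five-second figure.
    five_sec, _interrupt, one_min, five_min = map(int, match.groups())
    return max(five_sec, one_min, five_min) >= threshold

busy = "CPU utilization for five seconds: 99%/20%; one minute: 97%; five minutes: 96%"
idle = "CPU utilization for five seconds: 8%/4%; one minute: 6%; five minutes: 5%"
print(cpu_is_exceptionally_high(busy), cpu_is_exceptionally_high(idle))  # True False
```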
If software-forced crashes continue, collect the following information and contact Cisco TAC:
•All relevant information about the problem that you have available, including any troubleshooting you have performed.
•Any console output that was generated at the time of the problem.
•Output of the show tech-support command.
•Output of the show log command (or the log that was captured by your SYSLOG server, if available).
For information on contacting TAC and opening a case, see the "Obtaining Technical Assistance" section on page x.
Troubleshooting the Timing, Communication, and Control Plus Card
At least one working Timing, Communication, and Control Plus (TCC+) card must be installed in the Cisco uBR10012 router for normal operations. The TCC+ card acts as a secondary processor that performs the following functions:
•Generates and distributes 10.24 MHz clock references to each cable interface line card.
•Generates and distributes 32-bit time stamp references to each cable interface line card.
•Allows software to independently power off any or all cable interface line cards.
•Provides support for Online Insertion/Removal (OIR) operations of line cards.
•Drives the LCD panel used to display system configuration and status information.
•Monitors the supply power usage of the chassis.
•Provides two redundant RJ-45 ports for external timing clock reference inputs such as a Global Positioning System (GPS) or BITS clock.
If the Cisco uBR10012 router does not have a working TCC+ card installed, the WAN and cable interface line cards will experience excessive packet drops, or all traffic will be dropped, because of an invalid timing signal. Also, if no TCC+ card is installed, the cable power command is disabled, because this function is performed by the TCC+ card.
Note Because the TCC+ card is considered a half-height card, use slot numbers 1/1 or 2/1 to display information for the TCC+ card using the show diag command. The show cable clock and show controllers clock-reference commands also use these slot numbers when displaying clock-related information.
Figure 4-1 TCC+ Front Panel
The front panel on the TCC+ card has seven LEDs. Table 4-3 describes each LED on the TCC+ card.
When performing any troubleshooting on the TCC+ cards, first check the LEDs as follows:
1. Check the POWER LEDs on each TCC+ card. Are the POWER LEDs on each TCC+ card on (green)?
–If no, remove the TCC+ card and reinsert it, making sure that it firmly connects to the backplane and that both captive screws are tightly connected.
–If yes, proceed to the next step.
2. Is the STATUS LED on the primary TCC+ card on (green) to indicate that it is the primary card? Is the STATUS LED on the secondary TCC+ card flashing (green) to indicate that it is the redundant card?
Use Table 4-4 to continue troubleshooting the TCC+ cards.
Troubleshooting the OC-12 Packet-Over-SONET Line Card
Figure 4-2 describes the LEDs on the Cisco uBR10-1OC12/P-SMI OC-12 Packet-Over-SONET (POS) line card faceplate. Use these descriptions to verify the operation of the OC-12 POS line card.
Figure 4-2 OC-12 POS Line Card LEDs
Table 4-5 describes fault conditions on the OC-12 POS line card and recommended corrective actions.
Troubleshooting the OC-12 Dynamic Packet Transport Spatial Reuse Protocol WAN Card
Figure 4-3 shows, and Table 4-6 describes, the LEDs on the Cisco uBR10-SRP-OC12SML Dynamic Packet Transport (DPT) Spatial Reuse Protocol (SRP) WAN card.
Figure 4-3 OC12 SRP/DPT WAN Line Card LEDs
Troubleshooting the Cisco uBR10012 OC-48 DPT/POS Line Card
The Cisco OC-48 DPT/POS interface module has a pair of OC-48c, fiber-optic standard connector (SC) duplex ports that provide an SC connection for either the single-mode short-reach or single-mode long-reach version. Figure 4-4 shows the faceplate on the Cisco OC-48 DPT/POS interface module, and Table 4-7 describes each LED.
Figure 4-4 Cisco OC-48 DPT Interface Module Faceplate
Troubleshooting the Gigabit Ethernet Line Card
Figure 4-5 describes the LEDs on the Cisco uBR10-1GE Gigabit Ethernet line card faceplate to help you verify correct operation.
Tip Make sure that the Gigabit Interface Converter (GBIC) type on the Cisco uBR10012 router matches the GBIC type at the other end of the fiber-optic cable.
Figure 4-5 Gigabit Ethernet Line Card Faceplate and LED Descriptions
Table 4-8 describes the gigabit Ethernet line card fault indications and suggests responses to each.