The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product.
This document explains what causes parity errors on Cisco routers, and how to troubleshoot them.
Cisco recommends that you have knowledge of how to troubleshoot router crashes.
Refer to Troubleshooting Router Crashes for more information.
This document is not restricted to specific software and hardware versions.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
Refer to Cisco Technical Tips Conventions for more information on document conventions.
Memory parity errors occur in products based on MIPS processors, such as:
Cisco 4500/4700 Series Routers
Cisco 7500 Series Routers (RSP1, RSP2, RSP4, RSP8, VIP2-10, VIP2-15, VIP2-20, VIP2-40, VIP2-50)
Cisco 7000 Series Routers (RSP 7000)
Cisco 7200 Series Routers (NPE-100, NPE-150, NPE-175, NPE-200, NPE-225, NPE-300)
Cisco 12000 Series Internet Router
These messages all relate to the detection of bad parity somewhere in the system. The list is not exhaustive, but it contains the most common messages:
In the show version command output:
System restarted by processor memory parity error at PC 0x6014F7C0, address 0x0
or
System restarted by shared memory parity error at PC 0x60130F40
If you have the output of a show version command from your Cisco device, you can use Cisco CLI Analyzer to display potential issues and fixes. In order to use Cisco CLI Analyzer, you must be a registered customer, be logged in, and have JavaScript enabled.
In the console logs, or in the crashinfo files:
- *** Cache Error Exception ***
  Cache Err Reg = 0xa401a65a
  data reference, primary cache, data field error , error on SysAD Bus
  PC = 0xbfc17950, Cause = 0x0, Status Reg = 0x3040d007
- Error: primary data cache, fields: data,
  virtual addr 0x6058A000, physical addr(21:3) 0x18A000, vAddr(14:12) 0x2000
  virtual address corresponds to main:data, cache word 0
               Low Data   High Data  Par    Low Data   High Data  Par
  L1 Data  : 0:0xFEFFFEFE 0x65776179 0x13 1:0x20536572 0x76657220 0x89
             2:0x646F6573 0x206E6F74 0x9C 3:0x20737570 0x706F7274 0xF8
               Low Data   High Data  Par    Low Data   High Data  Par
  Mem Data : 0:0xFEFFFEFE 0x65776179 0x13 1:0x20536572 0x76657220 0x89
             2:0x646F6573 0x206E6F74 0x9C 3:0x20737570 0x706F7274 0xF8
- *** Shared Memory Parity Error ***
  shared memory control register= 0xffe3
  error(s) reported for: CPU on byte(s): 0/1
- %PAR-1-FATAL: Shared memory parity error
  shared memory status register= 0xFFEF
  error(s) reported for: CPU on byte(s): 0/1 2/3
- %RSP-3-ERROR: MD error 0000008000000200
  %RSP-3-ERROR: QA parity error (bytes 0:3) 02
  %RSP-3-ERROR: MEMD parity error condition
  %RSP-2-QAERROR: reused or zero link error, write at addr 0100 (QA)
  log 22010000, data 00000000 00000000
  %RSP-3-RESTART: cbus complex
- %RSP-3-ERROR: CyBus error 01
  %RSP-3-ERROR: read data parity
  %RSP-3-ERROR: read parity error (bytes 0:7) 20
  %RSP-3-ERROR: physical address (bits 20:15) 000000
- %RSP-3-ERROR: MD error 00800080C000C000
  %RSP-3-ERROR: SRAM parity error (bytes 0:7) F0
  %RSP-3-RESTART: cbus complex
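Each of these messages reports that a parity check failed when data was read. As a rough illustration of what such a check does, this Python sketch uses generic even parity over a single byte. The scheme and the function names (parity_bit, write, read_ok) are assumptions for illustration only, not the parity logic of any particular Cisco memory controller.

```python
# Generic even parity over one byte. Illustration only: this is NOT the
# parity scheme of any particular Cisco memory controller.

def parity_bit(value: int) -> int:
    """Return 1 if the byte has an odd number of 1 bits, else 0, so that the
    byte plus its parity bit always holds an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2

def write(value: int):
    """Store a byte together with the parity bit computed at write time."""
    return value & 0xFF, parity_bit(value)

def read_ok(value: int, stored_parity: int) -> bool:
    """On read, recompute the parity and compare it with the stored bit."""
    return parity_bit(value) == stored_parity

data, par = write(0x65)
print(read_ok(data, par))         # True: data unchanged
print(read_ok(data ^ 0x04, par))  # False: one bit flipped, parity error detected
print(read_ok(data ^ 0x06, par))  # True: two bits flipped, error goes undetected
```

A single flipped bit changes the parity and is detected; two flipped bits in the same byte cancel out. Simple parity therefore detects a single-bit error but cannot correct it.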
There are two kinds of parity errors:
Soft parity errors
These errors occur when the energy level that stores a bit within a memory chip changes, so that, for example, a one becomes a zero. When the CPU references the corrupted data, the system either crashes (if the error is in an area that is not recoverable) or recovers (for example, the CyBus complex restarts if the error was in the packet memory (MEMD)). A soft parity error does not require you to swap the board or any of its components. See the Related Information section for additional information about soft parity errors.
Hard parity errors
These errors occur when a chip or board failure corrupts data. In this case, you need to re-seat or replace the affected component, which usually means a memory chip swap or a board swap. Multiple parity errors at the same address indicate a hard parity error. Other cases are harder to identify, but in general, if you see more than one parity error in a particular memory region within a relatively short period, you can consider it a hard parity error.
Studies have shown that soft parity errors are 10 to 100 times more frequent than hard parity errors. Therefore, Cisco strongly recommends that you wait for a second parity error before you replace anything. This greatly reduces the impact on your network.
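The soft/hard distinction above can be sketched as a simple heuristic over a history of parity events. This Python example is hypothetical: the ParityEvent record, the region labels, and the 30-day window that stands in for "a relatively short period" are assumptions, not Cisco-defined values.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class ParityEvent:
    when: datetime
    region: str                    # hypothetical label, for example "DRAM" or "SRAM"
    address: Optional[int] = None  # faulting address, if the logs report one

def classify(events: List[ParityEvent],
             window: timedelta = timedelta(days=30)) -> str:
    """Heuristic only; the 30-day default window is an assumption that stands
    in for "a relatively short period"."""
    if not events:
        return "no parity errors"
    # Rule 1: more than one error at the same address suggests a hard error.
    by_address = Counter(e.address for e in events if e.address is not None)
    if any(n > 1 for n in by_address.values()):
        return "probably hard: repeated address"
    # Rule 2: more than one error in the same region within the window.
    by_region = defaultdict(list)
    for e in events:
        by_region[e.region].append(e.when)
    for times in by_region.values():
        times.sort()
        # After sorting, the closest pair of events is always adjacent.
        if any(later - earlier <= window for earlier, later in zip(times, times[1:])):
            return "probably hard: repeated errors in one region"
    return "probably soft: wait for a second parity error"

t0 = datetime(2024, 1, 1)
first = ParityEvent(t0, "DRAM", 0x6058A000)
print(classify([first]))          # probably soft: wait for a second parity error
second = ParityEvent(t0 + timedelta(days=3), "DRAM", 0x6058A000)
print(classify([first, second]))  # probably hard: repeated address
```

A single event always yields "wait for a second parity error", which matches the recommendation above to replace nothing after the first occurrence.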
A router has memory in different locations. In theory, a parity error can affect any memory location, but most memory problems occur in dynamic RAM (DRAM) or shared RAM (SRAM). Depending on the platform, here is how to find out which memory location has been affected and, if it turns out to be a hard parity error, which part you must replace:
On the Cisco 4500 and 4700 platforms, the crashinfo file is not available in versions earlier than Cisco IOS® Software Release 12.2(10) and 12.2(10)T.
One way to find out where the error occurred is to look at the "restart reason" in the console logs, and in the output of the show version command:
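On the Cisco 4500 and 4700, the two restart reasons described below map to the two memory types: a processor memory parity error points to DRAM, and a shared memory parity error points to SRAM. This Python sketch applies that mapping to saved show version text. The function name parity_restart is hypothetical, and the pattern recognizes only the two message formats shown in this document.

```python
import re
from typing import Optional, Tuple

# Only the two restart reasons shown in this document are recognized.
RESTART_RE = re.compile(
    r"System restarted by (processor|shared) memory parity error at PC (0x[0-9A-Fa-f]+)"
)

def parity_restart(show_version: str) -> Optional[Tuple[str, str]]:
    """Return (memory type, PC) for a parity-error restart, else None."""
    m = RESTART_RE.search(show_version)
    if m is None:
        return None
    kind, pc = m.groups()
    # processor memory -> DRAM, shared memory -> SRAM (Cisco 4500/4700)
    return ("DRAM" if kind == "processor" else "SRAM"), pc

sample = (
    "System restarted by shared memory parity error at PC 0x60130F40\n"
    'System image file is "flash:c4500-inr-mz.111-14.bin", booted via flash'
)
print(parity_restart(sample))  # ('SRAM', '0x60130F40')
```

Any other restart reason returns None rather than a guess, because this document does not describe how to map it.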
Parity Error in DRAM:
If you did not manually reload the router after the crash, the show version output looks like this:
System restarted by processor memory parity error at PC 0x601799C4, address 0x0
System image file is "flash:c4500-inr-mz.111-14.bin", booted via flash
If a crashinfo file is available, or if console logs have been captured, you can also see something like this:
*** Cache Error Exception ***
Cache Err Reg = 0xa0255c61
data reference, primary cache, data field error , error on SysAD Bus
PC = 0xbfc0edc0, Cause = 0xb800, Status Reg = 0x34408007
Repeated parity errors in DRAM indicate that either the DRAM or the chassis is defective. If you recently moved the chassis or made any hardware configuration changes, re-seat the DRAM chips to solve the problem. Otherwise, replace the DRAM as a first step; this should stop the parity errors. If the router still crashes, replace the chassis.
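One way to read the DRAM procedure above is as an escalation ladder, where you move to the next action only if parity errors continue after the current one. This is a minimal Python sketch of that ordering; the function names are hypothetical and the action strings are taken from the text.

```python
from typing import List, Optional

def dram_escalation(recent_hw_change: bool) -> List[str]:
    """Ordered actions for repeated DRAM parity errors on the Cisco 4500/4700.
    Move to the next action only if parity errors continue."""
    actions = ["re-seat the DRAM"] if recent_hw_change else []
    actions += ["replace the DRAM", "replace the chassis"]
    return actions

def next_action(recent_hw_change: bool, actions_done: int) -> Optional[str]:
    """Return the next action to try, or None when the ladder is exhausted."""
    plan = dram_escalation(recent_hw_change)
    return plan[actions_done] if actions_done < len(plan) else None

print(dram_escalation(False))  # ['replace the DRAM', 'replace the chassis']
print(next_action(True, 1))    # replace the DRAM
```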
Parity Error in SRAM:
If you did not manually reload the router after the crash, the show version command output looks like this:
System restarted by shared memory parity error at PC 0x60130F40
System image file is "flash:c4500-inr-mz.111-14.bin", booted via flash
If a crashinfo file is available, or if console logs have been captured, you can also see something like this:
*** Shared Memory Parity Error ***
shared memory control register= 0xffe3
error(s) reported for: CPU on byte(s): 0/1
or
%PAR-1-FATAL: Shared memory parity error
shared memory status register= 0xFFEF
error(s) reported for: CPU on byte(s): 0/1 2/3
or
*** Shared Memory Parity Error ***
shared memory control register= 0xffdf
error(s) reported for: NIM1 on byte(s): 0/1 2/3
Note:
If the error is reported for the CPU, replace the SRAM.
If the error is reported for NIM(x), replace the network module in slot (x). The SRAM allocated to slot (x) can also be affected; in that case, replace the SRAM as well.
Repeated parity errors in SRAM indicate either defective SRAM chips or a defective network module that has written bad parity into the SRAM. If you recently moved the chassis or made any hardware configuration changes, re-seat the network modules and the SRAM chips to solve the problem. Otherwise, check where the error is reported in the console logs (see the output examples above).
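The rule in the note above (CPU points to the SRAM; NIM(x) points to the network module in slot x and possibly the SRAM allocated to it) can be applied to a captured log with a short parser. This Python sketch assumes that the "error(s) reported for:" text appears on a single line, as in the examples above; the function name sram_suspects is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches text such as "error(s) reported for: NIM1 on byte(s): 0/1 2/3"
REPORTED_FOR_RE = re.compile(
    r"error\(s\) reported for:\s*(CPU|NIM(\d+))\s+on byte\(s\):\s*([\d/ ]+)"
)

def sram_suspects(log_text: str) -> Optional[Tuple[List[str], List[str]]]:
    """Return (parts to suspect, in order; byte(s) field as reported), or None."""
    m = REPORTED_FOR_RE.search(log_text)
    if m is None:
        return None
    source, slot, byte_field = m.groups()
    bytes_reported = byte_field.split()
    if source == "CPU":
        return ["SRAM"], bytes_reported
    # NIM(x): the network module in slot x first, then the SRAM allocated to it.
    return [f"network module in slot {slot}", f"SRAM allocated to slot {slot}"], bytes_reported

print(sram_suspects("error(s) reported for: CPU on byte(s): 0/1"))
# (['SRAM'], ['0/1'])
```

Act on the suggested part only after the errors repeat; a single SRAM parity error is most likely soft.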
On the Cisco 7000, 7200, and 7500 series, as with the Cisco 4000 series, the problem can be due to faulty DRAM or SRAM. It can also be caused by a defective processor card (RP, RSP, or NPE). The Cisco 7000 and 7500 can also report parity errors generated by a faulty or badly seated interface processor (legacy xIP or VIP).
Check the crashinfo file and the console logs for one of these error messages:
For the RP, RSP and NPE, you usually see something like this:
Error: primary data cache, fields: data,
(SysAD) virtual addr 0x6058A000, physical addr(21:3) 0x18A000, vAddr(14:12) 0x2000
virtual address corresponds to main:data, cache word 0
or simply:
Error: primary data cache, fields: data, SysAD phy21:3 0x201880, va14:12 0x1000, addr 63E01880
This indicates a problem on the processor card (RP, RSP, or NPE) itself. If the problem occurs only once, it is most likely a transient issue.
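Because multiple parity errors at the same address indicate a hard parity error, it can help to compare the addresses that successive cache errors report. Both message formats above include physical address bits 21:3 (physical addr(21:3) and phy21:3), so this Python sketch extracts that field and lists values seen more than once. It rests on an assumption: bits 21:3 are only part of the physical address, so two different locations can share a value, and a repeat is a hint rather than proof. The function names are hypothetical, and the three-message history in the usage is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

# Matches both formats shown above:
#   "physical addr(21:3) 0x18A000"   and   "phy21:3 0x201880"
PHYS_RE = re.compile(r"(?:physical addr\(21:3\)|phy21:3)\s+0x([0-9A-Fa-f]+)")

def cache_error_phys_bits(line: str) -> Optional[int]:
    """Return physical address bits 21:3 from a cache error line, or None."""
    m = PHYS_RE.search(line)
    return int(m.group(1), 16) if m else None

def repeated_locations(lines: Iterable[str]) -> List[str]:
    """List the address-bit values that appear in more than one cache error."""
    counts = Counter(
        bits for bits in map(cache_error_phys_bits, lines) if bits is not None
    )
    return sorted(hex(bits) for bits, n in counts.items() if n > 1)

# Hypothetical history in which the same message was logged twice.
errors = [
    "Error: primary data cache, fields: data, SysAD phy21:3 0x201880, va14:12 0x1000, addr 63E01880",
    "(SysAD) virtual addr 0x6058A000, physical addr(21:3) 0x18A000, vAddr(14:12) 0x2000",
    "Error: primary data cache, fields: data, SysAD phy21:3 0x201880, va14:12 0x1000, addr 63E01880",
]
print(repeated_locations(errors))  # ['0x201880']
```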
For the RSP, the message can look like this:
%RSP-3-ERROR: MD error 0000008000000200
%RSP-3-ERROR: QA parity error (bytes 0:3) 02
%RSP-3-ERROR: MEMD parity error condition
%RSP-2-QAERROR: reused or zero link error, write at addr 0100 (QA)
log 22010000, data 00000000 00000000
%RSP-3-RESTART: cbus complex
or
%RSP-3-ERROR: CyBus error 01
%RSP-3-ERROR: read data parity
%RSP-3-ERROR: read parity error (bytes 0:7) 20
%RSP-3-ERROR: physical address (bits 20:15) 000000
If nothing indicates that an interface processor is writing bad parity into the SRAM (for example, %VIP2-1-MSG error messages), the SRAM itself is the most likely cause of the parity error. In this case, replace the RSP.
If other error messages indicate that an interface processor is writing bad parity, the cause can be a faulty or badly seated card.
If you receive %VIP2-1-MSG: slot(x) messages in the logs or in the crashinfo file, refer to Troubleshooting VIP Crashes.
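The RSP guidance above can be sketched as a log check: if %VIP2-1-MSG messages name a slot, look at that VIP first; otherwise, RSP parity errors point to the RSP. This Python sketch is hypothetical. It looks only for %VIP2-1-MSG messages (not every message that can implicate an interface processor) and assumes that the slot number directly follows the word slot, as in slot(x).

```python
import re

# Hypothetical patterns based on the message formats shown in this document.
VIP_MSG_RE = re.compile(r"%VIP2-1-MSG: slot\s*(\d+)")
RSP_PARITY_RE = re.compile(r"%RSP-3-ERROR: .*parity")

def rsp_triage(log_text: str) -> str:
    """Suggest where to look first, following the RSP guidance above."""
    vip_slots = sorted({int(s) for s in VIP_MSG_RE.findall(log_text)})
    if vip_slots:
        # An interface processor may be writing bad parity (faulty or badly seated card).
        return f"check VIP in slot(s) {vip_slots}: see Troubleshooting VIP Crashes"
    if RSP_PARITY_RE.search(log_text):
        # No interface processor implicated: the SRAM itself is the likely cause.
        return "no interface processor implicated: replace the RSP"
    return "no RSP parity error found"

log = (
    "%RSP-3-ERROR: MD error 00800080C000C000\n"
    "%RSP-3-ERROR: SRAM parity error (bytes 0:7) F0\n"
    "%RSP-3-RESTART: cbus complex"
)
print(rsp_triage(log))  # no interface processor implicated: replace the RSP
```

As with the other checks, act on the suggestion only after the parity errors repeat.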
At the first occurrence of a parity error, it is not possible to distinguish between a soft and a hard parity error. From experience, most parity errors are soft parity errors, and you can usually dismiss them. If you have recently changed some hardware or moved the router, try to re-seat the affected part (DRAM, SRAM, NPE, RP, RSP, or VIP). Frequent, repeated parity errors indicate faulty hardware: replace the affected part (DRAM, RSP, VIP, or motherboard) as described in this document.
If you still need assistance after you follow the troubleshooting steps above and want to open a service request with the Cisco TAC, be sure to include this information:
Note: Do not manually reload or power-cycle the router before you collect the above information, unless this is required to troubleshoot a processor memory parity error. A reload can cause the loss of important information that is needed to determine the root cause of the problem.