General Troubleshooting

This chapter provides procedures for troubleshooting the most common problems encountered when operating the NCS 1014 chassis. To troubleshoot specific alarms, see the Alarm Troubleshooting chapter. If you cannot find what you are looking for, contact Cisco Technical Support (1 800 553-2447).

Capture Logs

When troubleshooting NCS 1014 issues, your technical support representative needs certain information about the situation and the symptoms that you are experiencing. To speed up the problem isolation and resolution process, collect the necessary data before you contact your representative.

To collect all debugging information, perform these steps:


Procedure


Step 1

show logging

Displays the contents of the logging buffers. You can also view details of FPD upgrade failures.

Example:

RP/0/RP0/CPU0:ios# show logging
Fri Nov 26 15:03:48.886 UTC
Syslog logging: enabled (0 messages dropped, 0 flushes, 0 overruns)
    Console logging: Disabled
    Monitor logging: level debugging, 0 messages logged
    Trap logging: level informational, 0 messages logged
    Buffer logging: level debugging, 1025 messages logged

Log Buffer (2097152 bytes):

RP/0/RP0/CPU0:Nov 25 16:40:28.533 UTC: syslogd[155]: %SECURITY-XR_SSL-6-INFO : XR SSL info: Setting fips register
RP/0/RP0/CPU0:Nov 25 16:40:36.323 UTC: cfgmgr-rp[120]: %MGBL-CONFIG-7-INTERNAL : Configuration Manager was unable to find subtree for 'sh_p_service_role_daemon' partition.  : cfgmgr-rp : (PID=2522) :  -Traceback= 7f1be3f92420 7f1be4bdd0c6 7f1be4bdd208 7f1be4bd74a4 7f1be4bd7e45 7f1be4bdb972 7f1be4bd7f0e 55e025a46170 55e025a42429 55e025a3168f
RP/0/RP0/CPU0:Nov 25 16:40:36.457 UTC: aib[291]: Registering with IM
RP/0/RP0/CPU0:Nov 25 16:40:36.661 UTC: cma_partner[350]: Packet received on undiscovered module 160
RP/0/RP0/CPU0:Nov 25 16:40:37.113 UTC: ifmgr[142]: platform_pfi_ifh_get_if_alloc_info: Setting pic
............
............

Step 2

show tech-support ncs1014

Creates a .tgz file that contains a dump of the configuration and show command outputs. This file provides system information for Cisco Technical Support.

Example:

RP/0/RP0/CPU0:ios# show tech-support ncs1014 
Fri Nov 26 15:05:28.996 UTC
++ Show tech start time: 2021-Nov-26.150529.UTC ++
Fri Nov 26 15:05:30 UTC 2021 Waiting for gathering to complete
..................................................................................................
Fri Nov 26 15:10:38 UTC 2021 Compressing show tech output
Show tech output available at 0/RP0/CPU0 : /harddisk:/showtech/showtech-ncs1014-2021-Nov-26.150529.UTC.tgz
++ Show tech end time: 2021-Nov-26.151040.UTC ++

Step 3

show tech-support install

Collects installation information into a support file for Cisco Technical Support. By default, the output of this command is saved on the NCS 1014 hard disk in a file with a .tgz extension. Similarly, you can use other show tech-support commands to gather data for a specific area.

Example:

RP/0/RP0/CPU0:N112#show tech-support install
++ Show tech start time: 2023-Dec-07.062636.UTC ++
Thu Dec  7 06:26:37 UTC 2023 Waiting for gathering to complete
...............................................................................................................
Thu Dec  7 06:32:48 UTC 2023 Compressing show tech output
Show tech output available at 0/RP0/CPU0 : /harddisk:/showtech/showtech-N112-install-2023-Dec-07.062636.UTC.tgz
++ Show tech end time: 2023-Dec-07.063258.UTC ++
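When the session output from the steps above is saved to a text file, the archive locations can be pulled out automatically before the files are copied off the device. The following Python sketch is illustrative only: the function name and the regular expression are assumptions based on the "Show tech output available at" line in the examples above, not part of any Cisco tool.

```python
import re

# Assumption: the archive is always reported on a line of the form
# "Show tech output available at <node> : <path>.tgz", as in the examples.
SHOWTECH_RE = re.compile(r"Show tech output available at\s+(\S+)\s*:\s*(\S+\.tgz)")


def find_showtech_archives(session_text):
    """Return a list of (node, path) pairs for each archive reported."""
    return SHOWTECH_RE.findall(session_text)


# Sample lines copied from the show tech-support ncs1014 example.
sample = """\
Fri Nov 26 15:10:38 UTC 2021 Compressing show tech output
Show tech output available at 0/RP0/CPU0 : /harddisk:/showtech/showtech-ncs1014-2021-Nov-26.150529.UTC.tgz
++ Show tech end time: 2021-Nov-26.151040.UTC ++
"""

archives = find_showtech_archives(sample)
for node, path in archives:
    print(node, path)
```

The resulting list gives the node and the full path of each .tgz file, which is the information a support representative typically asks for.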

Using Onboard Failure Logging

Onboard Failure Logging (OBFL) collects and stores boot, environmental, and critical hardware data in the nonvolatile flash memory of the CPU controller card. You can use this information for troubleshooting, testing, and diagnosis if a failure or other error occurs, and it improves the accuracy of hardware troubleshooting and root cause isolation. The data collected includes the field-replaceable unit (FRU) serial number, OS version, total run time, boot status, temperature and voltage at boot, temperature and voltage history, and other board-specific errors.

Procedure


show logging onboard {fmea | inventory | temperature | uptime | voltage}

Displays OBFL data.

Example:

The following example shows the uptime information.

sysadmin-vm:0_RP0# show logging onboard uptime

OBFL Uptime Information For : 0/RP0
       * indicates incomplete time-sync while record was written 
       ! indicates time reset backwards while system was running 
  -----------------------------------------------------------------------------------
       UPTIME CARD INFORMATION     
  -----------------------------------------------------------------------------------
       Entity Name                :  Value
  -----------------------------------------------------------------------------------
      Previous Chassis SN         : CAT2311B0C5 
      Current Chassis SN          : CAT2311B0CM 
      Previous R/S/I              : 0/0/0 
      Current R/S/I               : 0/0/0 
      Write Interval              : 15 (min) 
      First Power On TS           : 07/30/2019 07:33:56  
      Last Erase TS               : --/--/---- --:--:-- 
      Rack Change Count           : 8
      Slot Change Count           : 8
   
  -----------------------------------------------------------------------------------
       UPTIME INFORMATION     
  -----------------------------------------------------------------------------------
   Start Time (UTC)    | End Time (UTC)      | Card Uptime info    
   mm/dd/yyyy hh:mm:ss | mm/dd/yyyy hh:mm:ss | Weeks.Days.Hrs.Min.Sec 
  -----------------------------------------------------------------------------------
   10/28/2021 12:23:17 | 11/14/2021 21:09:18 | 2.3.8.46.1 
   11/14/2021 21:09:18 | 11/18/2021 16:31:15 | 0.3.19.21.57 
   11/18/2021 16:31:15 | 11/18/2021 21:10:35 | 0.0.4.39.20 
   11/18/2021 21:10:35 | 11/19/2021 12:40:39 | 0.0.15.30.4 
   11/19/2021 12:40:39 | 11/19/2021 14:16:10 | 0.0.1.35.31 
   11/19/2021 14:16:10 | 11/22/2021 11:49:20 | 0.2.21.33.10 
   11/22/2021 11:49:20 | 11/22/2021 22:51:48 | 0.0.11.2.28 
   11/22/2021 22:51:48 | 11/23/2021 17:17:41 | 0.0.18.25.53 
   11/24/2021 21:22:12 | 11/24/2021 23:11:16 | 0.0.1.49.4 
   11/24/2021 23:11:16 | 11/24/2021 23:39:49 | 0.0.0.28.33 
   11/24/2021 23:39:49 | 11/25/2021 15:25:32 | 0.0.15.45.43 
   11/25/2021 15:25:32 | 11/25/2021 16:10:05 | 0.0.0.44.33 
   11/25/2021 16:10:05 | 11/25/2021 16:25:08 | 0.0.0.15.3 
   11/25/2021 16:25:08 | 11/25/2021 16:37:18 | 0.0.0.12.10 
   11/25/2021 16:37:18 | 11/26/2021 15:08:27 | 0.0.22.31.9 


OBFL Uptime Information For : 0/SC0
       * indicates incomplete time-sync while record was written 
       ! indicates time reset backwards while system was running 
  -----------------------------------------------------------------------------------
       UPTIME CARD INFORMATION     
  -----------------------------------------------------------------------------------
       Entity Name                :  Value
  -----------------------------------------------------------------------------------
      Previous Chassis SN         : ------------ 
      Current Chassis SN          : CAT2311B0CM 
      Previous R/S/I              : -/-/- 
      Current R/S/I               : 0/1/0 
      Write Interval              : 15 (min) 
      First Power On TS           : 06/07/2019 08:52:42  
      Last Erase TS               : --/--/---- --:--:-- 
      Rack Change Count           : 0
      Slot Change Count           : 0
   
  -----------------------------------------------------------------------------------
       UPTIME INFORMATION     
  -----------------------------------------------------------------------------------
   Start Time (UTC)    | End Time (UTC)      | Card Uptime info    
   mm/dd/yyyy hh:mm:ss | mm/dd/yyyy hh:mm:ss | Weeks.Days.Hrs.Min.Sec 
  -----------------------------------------------------------------------------------
   10/24/2021 05:48:29 | 10/24/2021 06:27:51 | 0.0.0.39.22 
   10/24/2021 06:27:51 | 10/24/2021 07:05:24 | 0.0.0.37.33 
   10/24/2021 07:05:24 | 10/26/2021 23:43:32 | 0.2.16.38.8 
   10/26/2021 23:43:32 | 10/26/2021 23:55:49 | 0.0.0.12.17 
   10/26/2021 23:55:49 | 10/27/2021 00:09:49 | 0.0.0.14.0 
   10/27/2021 00:09:49 | 10/27/2021 00:16:08 | 0.0.0.6.19 
   10/27/2021 00:16:08 | 10/27/2021 23:37:51 | 0.0.23.21.43 
   10/27/2021 23:37:51 | 10/27/2021 23:50:33 | 0.0.0.12.42
   11/24/2021 21:22:12 | 11/24/2021 23:11:16 | 0.0.1.49.4 
   11/24/2021 23:11:16 | 11/24/2021 23:39:49 | 0.0.0.28.33 
   11/24/2021 23:39:49 | 11/25/2021 15:25:32 | 0.0.15.45.43 
   11/25/2021 15:25:32 | 11/25/2021 16:10:05 | 0.0.0.44.33 
   11/25/2021 16:10:05 | 11/25/2021 16:25:08 | 0.0.0.15.3 
   11/25/2021 16:25:08 | 11/25/2021 16:37:18 | 0.0.0.12.10 
   11/25/2021 16:37:18 | 11/26/2021 15:09:27 | 0.0.22.32.9 
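The Card Uptime info column uses a Weeks.Days.Hrs.Min.Sec layout, which is easy to misread. The following Python sketch shows one way to convert a row of the uptime table into a duration and cross-check it against the start and end timestamps. The function names are hypothetical, and the row format is assumed to match the example output above.

```python
from datetime import datetime, timedelta

# Timestamp layout taken from the table header: mm/dd/yyyy hh:mm:ss
TS_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_uptime(field):
    """Convert a 'Weeks.Days.Hrs.Min.Sec' string into a timedelta."""
    weeks, days, hours, minutes, seconds = (int(p) for p in field.strip().split("."))
    return timedelta(weeks=weeks, days=days, hours=hours,
                     minutes=minutes, seconds=seconds)


def parse_uptime_row(row):
    """Split one 'start | end | uptime' table row into parsed values."""
    start_s, end_s, uptime_s = (col.strip() for col in row.split("|"))
    start = datetime.strptime(start_s, TS_FORMAT)
    end = datetime.strptime(end_s, TS_FORMAT)
    return start, end, parse_uptime(uptime_s)


# First data row of the 0/RP0 table in the example above.
row = "   10/28/2021 12:23:17 | 11/14/2021 21:09:18 | 2.3.8.46.1 "
start, end, uptime = parse_uptime_row(row)
print(uptime)                  # duration decoded from the uptime column
print(end - start == uptime)   # True when the column agrees with the timestamps
```

For this row, the decoded uptime of 2 weeks, 3 days, 8 hours, 46 minutes, and 1 second equals the difference between the end and start times. A gap between the end time of one row and the start time of the next may indicate a period when the card was not recording uptime.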

Clear the CARD FAILED State

In Cisco NCS 1014, the "CARD FAILED" state indicates that a line card is no longer operational. This state typically points to a hardware failure, a software issue, or another critical fault that prevents the card from performing its intended functions. One such critical fault is a warm reload executed on a line card that is in the shutdown state.

Use this task to clear the CARD FAILED state of a line card located in rack 0 and slot 0.

Procedure


Step 1

Check the contents of the logging buffers.

The %PLATFORM-SHELFMGR-3-OP_FAIL entry in the following output suggests that a warm reload was executed on a line card in the shutdown state. Only a cold reload (reload location 0/*) can recover the line card from the shutdown state.

Example:

RP/0/RP0/CPU0:ios# show logging
Fri Nov 26 15:03:48.886 UTC
Syslog logging: enabled (0 messages dropped, 0 flushes, 0 overruns)
    Console logging: Disabled
    Monitor logging: level debugging, 0 messages logged
    Trap logging: level informational, 0 messages logged
    Buffer logging: level debugging, 1025 messages logged

Log Buffer (2097152 bytes):

RP/0/RP0/CPU0:Nov 25 16:40:28.533 UTC: syslogd[155]: %SECURITY-XR_SSL-6-INFO : XR SSL info: Setting fips register
RP/0/RP0/CPU0:Nov 25 16:40:36.323 UTC: cfgmgr-rp[120]: %MGBL-CONFIG-7-INTERNAL : Configuration Manager was unable to find subtree for 'sh_p_service_role_daemon' partition.  : cfgmgr-rp : (PID=2522) :  -Traceback= 7f1be3f92420 7f1be4bdd0c6 7f1be4bdd208 7f1be4bd74a4 7f1be4bd7e45 7f1be4bdb972 7f1be4bd7f0e 55e025a46170 55e025a42429 55e025a3168f
RP/0/RP0/CPU0:Nov 25 16:40:36.457 UTC: aib[291]: Registering with IM
RP/0/RP0/CPU0:Nov 25 16:40:36.661 UTC: cma_partner[350]: Packet received on undiscovered module 160
RP/0/RP0/CPU0:Nov 25 16:40:37.113 UTC: ifmgr[142]: platform_pfi_ifh_get_if_alloc_info: Setting pic
RP/0/RP0/CPU0:Nov 25 16:30:38.122 UTC: shelfmgr[227]: %PLATFORM-SHELFMGR-3-OP_FAIL : Failed to reload 0/0/NXR0: 'CPA_INTF' detected the 'fatal' condition 'Operation not supported'

............
............

Step 2

Perform a cold reload of the affected line card in rack 0 and slot 0.

Example:

RP/0/RP0/CPU0:ios# reload location 0/0
Fri Nov 26 15:03:48.886 UTC#
Proceed with reload? [confirm] 

Step 3

Type y to continue.

Warning

The reload operation impacts the running traffic.

Wait for the device to restart and continue with the next step.

Step 4

Verify the state of the line card.

The line card should now be in the OPERATIONAL state.

Example:

RP/0/RP0/CPU0:ios#show platform
Node              Type                     State                    Config state
--------------------------------------------------------------------------------
0/RP0/CPU0        NCS1K14-CNTLR-K9(Active) IOS XR RUN               NSHUT,NMON
0/PM0             NCS1K4-AC-PSU            OPERATIONAL              NSHUT,NMON
0/PM1             NCS1K4-AC-PSU            OPERATIONAL              NSHUT,NMON
0/FT0             NCS1K14-FAN              OPERATIONAL              NSHUT,NMON
0/FT1             NCS1K14-FAN              OPERATIONAL              NSHUT,NMON
0/FT2             NCS1K14-FAN              OPERATIONAL              NSHUT,NMON
0/0/NXR0          NCS1K4-1.2T-K9           OPERATIONAL              NSHUT,NMON
0/1/NXR0          NCS1K14-2.4T-K9          OPERATIONAL              NSHUT,NMON
0/2/NXR0          NCS1K4-1.2T-K9           OPERATIONAL              NSHUT,NMON
0/3/NXR0          NCS1K4-1.2T-K9           OPERATIONAL              NSHUT,NMON

If the CARD FAILED state persists after a cold reload, contact your Cisco account representative, log in to the Technical Support Website at http://www.cisco.com/c/en/us/support/index.html for more information, or call Cisco TAC (1 800 553-2447).
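The checks in this procedure can also be run against saved command output. The following Python sketch is a rough illustration rather than a supported tool: it looks for the OP_FAIL reload message in show logging output and lists nodes in show platform output whose state is not healthy. The function names are hypothetical, the set of healthy states is an assumption drawn from the example output, and the parser assumes that the Type and Config state columns contain no spaces.

```python
import re

# Message format taken from the show logging example in Step 1.
OP_FAIL_RE = re.compile(r"%PLATFORM-SHELFMGR-3-OP_FAIL : Failed to reload (\S+):")

# Assumption: these are the only healthy states, based on the Step 4 example.
HEALTHY_STATES = {"OPERATIONAL", "IOS XR RUN"}


def failed_reload_nodes(log_text):
    """Return the nodes named in OP_FAIL reload messages."""
    return OP_FAIL_RE.findall(log_text)


def unhealthy_nodes(platform_text):
    """Map node name to state for every row that is not in a healthy state."""
    result = {}
    for line in platform_text.splitlines():
        tokens = line.split()
        # Data rows start with a rack/slot style name such as 0/0/NXR0.
        if len(tokens) < 4 or not tokens[0][0].isdigit() or "/" not in tokens[0]:
            continue
        # First token is the node, last is the config state, and everything
        # between the type and the config state is the (possibly multi-word) state.
        node, state = tokens[0], " ".join(tokens[2:-1])
        if state not in HEALTHY_STATES:
            result[node] = state
    return result


log_sample = (
    "RP/0/RP0/CPU0:Nov 25 16:30:38.122 UTC: shelfmgr[227]: "
    "%PLATFORM-SHELFMGR-3-OP_FAIL : Failed to reload 0/0/NXR0: "
    "'CPA_INTF' detected the 'fatal' condition 'Operation not supported'"
)

# Hypothetical show platform output; the CARD FAILED row is made up for illustration.
platform_sample = """\
Node              Type                     State                    Config state
--------------------------------------------------------------------------------
0/RP0/CPU0        NCS1K14-CNTLR-K9(Active) IOS XR RUN               NSHUT,NMON
0/0/NXR0          NCS1K4-1.2T-K9           CARD FAILED              NSHUT,NMON
0/1/NXR0          NCS1K14-2.4T-K9          OPERATIONAL              NSHUT,NMON
"""

print(failed_reload_nodes(log_sample))
print(unhealthy_nodes(platform_sample))
```

An empty result from unhealthy_nodes after the reload corresponds to the outcome shown in Step 4, where every node is in the OPERATIONAL or IOS XR RUN state.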