This document describes how to troubleshoot a hardware module in a Nexus 7k (N7K) switch when the module is non-responsive or responds only intermittently.
Step 1. Run an snmpwalk against the switch for each SNMPv3 user ID and/or SNMPv2 community string in use (that is, walk the hostname MIB).
Repeat the walk in a continuous loop so that intermittent failures become visible.
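The continuous loop in Step 1 can be scripted. The following is a minimal Python sketch, not a Cisco-provided tool. It assumes that the Net-SNMP snmpwalk command is installed on the polling host, that SNMPv2c is in use, and that the "hostname MIB" refers to sysName (OID 1.3.6.1.2.1.1.5). The function names snmp_probe and poll are hypothetical.

```python
import subprocess
import time
from datetime import datetime

# Assumption: the "hostname MIB" is SNMPv2-MIB sysName.
SYSNAME_OID = "1.3.6.1.2.1.1.5"


def snmp_probe(host, community, timeout_s=2):
    """Return True if one SNMPv2c walk of sysName succeeds (hypothetical helper)."""
    cmd = ["snmpwalk", "-v", "2c", "-c", community,
           "-t", str(timeout_s), "-r", "0", host, SYSNAME_OID]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s + 1)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def poll(probe, cycles, interval_s=1.0, sleep=time.sleep):
    """Call probe() `cycles` times; return a list of (timestamp, ok) samples."""
    samples = []
    for _ in range(cycles):
        ok = bool(probe())
        samples.append((datetime.now(), ok))
        # Print each result so that gaps are visible while the loop runs.
        print(f"{samples[-1][0]:%H:%M:%S} {'ok' if ok else 'NO RESPONSE'}")
        sleep(interval_s)
    return samples
```

For example, poll(lambda: snmp_probe("lab-sw01", "public"), cycles=300) collects a few minutes of samples; the host name and community string here are placeholders.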
Step 2. Use SSH to log in to the VDC whose snmpwalks for the hostname intermittently failed to respond in Step 1, and check whether the SSH session shows the same intermittent responsiveness.
If both Step 1 and Step 2 are impacted at the same time on a regular cycle (for example, every 60 seconds), this points to a hardware failure inside the N7K control plane, because the N7K runs a hardware diagnostic health check continuously. A pattern of 30 seconds of responsiveness followed by 30 seconds of non-responsiveness that then repeats is a clear indication that the hardware diagnostic health check is scanning all hardware: the 30 seconds of responsiveness correspond to the scan of the good hardware, and the 30 seconds of non-responsiveness correspond to the scan of the failed hardware.
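To check whether a series of probe results follows the alternating pattern described above, the results can be collapsed into runs of consecutive successes and failures. This is a rough Python heuristic under stated assumptions: one sample per second, and the 30-second expectation and 5-second tolerance are illustrative values, not Cisco thresholds. The names run_lengths and looks_cyclic are hypothetical.

```python
from itertools import groupby


def run_lengths(samples, interval_s=1.0):
    """Collapse True/False probe results into (ok, duration_s) runs."""
    return [(ok, sum(1 for _ in group) * interval_s)
            for ok, group in groupby(samples)]


def looks_cyclic(runs, expected_s=30.0, tolerance_s=5.0, min_runs=4):
    """Heuristic: every complete run lasts roughly `expected_s` seconds."""
    # The first and last runs are usually cut off by the capture window,
    # so only the runs in between are judged.
    inner = runs[1:-1]
    if len(inner) < min_runs:
        return False
    return all(abs(duration - expected_s) <= tolerance_s
               for _, duration in inner)
```

Because groupby only starts a new run when the value changes, consecutive runs always alternate between responsive and non-responsive, so checking the durations is sufficient.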
Step 3. If Step 2 points to a hardware failure, search the admin VDC (VDC-1) logfile for EOBC messages, and then perform the steps that follow.
Note: EOBC (Ethernet Out-of-Band Channel) is the internal control plane channel that the N7K uses for communication between the SUP, the Fabric Modules, and the Line Cards. If EOBC is impacted in any way, the module named in the admin VDC-1 logfile is most likely the culprit of the intermittent responsiveness seen in the previous tests: the SUP has lost consistent communication with that module, and while it tries to recover communication with it, other control plane processes respond only intermittently.
Example:
lab-sw01-admin-vdc-1# show logging logfile | inc EOBC
2022 Feb 22 19:46:15 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure on standby sup in device DEV_EOBC_MAC (device error 0xc0a0504f)
2022 Feb 22 19:46:15 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure in device DEV_EOBC_MAC (device error 0xc0a0514d)
2022 Feb 22 19:46:16 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure on standby sup in device DEV_EOBC_MAC (device error 0xc0a0504f)
2022 Feb 22 19:46:16 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure in device DEV_EOBC_MAC (device error 0xc0a0514d)
2022 Feb 22 19:46:21 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure in device DEV_EOBC_MAC (device error 0xc0a0514d)
2022 Feb 22 19:46:21 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure on standby sup in device DEV_EOBC_MAC (device error 0xc0a0504f)
2022 Feb 22 19:46:22 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure in device DEV_EOBC_MAC (device error 0xc0a0514d)
2022 Feb 22 19:46:23 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure on standby sup in device DEV_EOBC_MAC (device error 0xc0a0504f)
2022 Feb 22 19:46:23 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure in device DEV_EOBC_MAC (device error 0xc0a0514d)
2022 Feb 22 19:46:24 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure on standby sup in device DEV_EOBC_MAC (device error 0xc0a0504f)
2022 Feb 22 19:46:24 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure in device DEV_EOBC_MAC (device error 0xc0a0514d)
2022 Feb 22 19:46:26 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure on standby sup in device DEV_EOBC_MAC (device error 0xc0a0504f)
2022 Feb 22 19:46:26 lab-sw01-admin-vdc-1 %MODULE-4-MOD_WARNING: Module 8 (Serial number: JAA00000000) reported warning 8/1-8/0 due to EOBC heartbeat failure in device DEV_EOBC_MAC (device error 0xc0a0514d)
This log output clearly shows that Module 8 has EOBC heartbeat failures with the standby SUP, is in an unhealthy state, and requires immediate action.
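When the logfile is long, the EOBC warnings can be tallied per module rather than read line by line. The sketch below is an illustration in Python; its regular expression is based only on the message format shown in the example above, and other NX-OS releases may word the message differently. The function name count_eobc_failures is hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the MODULE-4-MOD_WARNING lines in the example above.
EOBC_RE = re.compile(
    r"%MODULE-4-MOD_WARNING: Module (?P<module>\d+) .*?"
    r"EOBC heartbeat failure(?P<standby> on standby sup)? in device"
)


def count_eobc_failures(log_lines):
    """Count EOBC heartbeat warnings per (module, on_standby_sup) pair."""
    counts = Counter()
    for line in log_lines:
        match = EOBC_RE.search(line)
        if match:
            on_standby = match.group("standby") is not None
            counts[(int(match.group("module")), on_standby)] += 1
    return counts
```

A module that accumulates many warnings in a short time window, as Module 8 does in the example, is the first candidate to investigate.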
Step 1. Run show module and capture the output for reference:
lab-sw01-admin-vdc-1# show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
2    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
3    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
4    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
5    0      Supervisor Module-2                 N77-SUP2E          active *
6    0      Supervisor Module-2                 N77-SUP2E          ha-standby
7    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
8    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
Mod  Sw               Hw
---  ---------------  ------
1    8.4(4)           1.5
2    8.4(4)           1.5
3    8.4(4)           1.9
4    8.4(4)           1.9
5    8.4(4)           1.3
6    8.4(4)           1.3
7    8.4(4)           1.2
8    8.4(4)           1.2
Note: All modules are online (that is, ok). Module 5 is the active SUP (that is, active *), and Module 6 is the High Availability standby SUP (that is, ha-standby). Even though the admin VDC logfile contains EOBC warnings about Module 8, this output shows Module 8 as ok.
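To keep the captured show module output as reference data, the status column can be extracted into a simple mapping. The parser below is a Python sketch that assumes the layout shown in the capture above (a first table that begins with a "Mod Ports" header, followed by further tables that begin with "Mod"); it is not an official parser, and the function name parse_module_status is hypothetical.

```python
import re

# A status row starts with the module number and the port count.
ROW_RE = re.compile(r"^\s*(?P<mod>\d+)\s+(?P<ports>\d+)\s+(?P<rest>.+?)\s*$")


def parse_module_status(output):
    """Return {module_number: status} from the first table of show module."""
    statuses = {}
    in_status_table = False
    for line in output.splitlines():
        if line.split()[:2] == ["Mod", "Ports"]:
            in_status_table = True
            continue
        if line.lstrip().startswith("Mod "):
            # A later table (Sw/Hw, Power-Status, ...) begins here.
            in_status_table = False
            continue
        if not in_status_table:
            continue
        match = ROW_RE.match(line)
        if match:
            tokens = match.group("rest").split()
            if tokens[-1] == "*":  # the active supervisor prints "active *"
                tokens.pop()
            # The status is the last token, even when the Model column is empty.
            statuses[int(match.group("mod"))] = tokens[-1]
    return statuses
```

Taking the last token as the status also handles rows in which the Model column is blank, which happens while a module is still powering up.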
Step 2. From the admin VDC, either reload the switch or perform a supervisor switchover:
- reload of the switch
lab-sw01-admin-vdc-1# reload
- system (that is, supervisor) switchover. Note: This is the preferred method because it does not impact active data flows on the box.
lab-sw01-admin-vdc-1# system switchover
Note: In either case, before you perform the reload or the system switchover, connect to both supervisor consoles so that you can see all of the supervisor output firsthand.
Step 3. If Module 8 is the suspected culprit, you are likely to see Module 8 error out on the console upon the system (that is, supervisor) switchover:
lab-sw01-admin-vdc-1(standby) login: 2022 Feb 23 02:09:45 lab-sw01-admin-vdc-1 %$ VDC-1 %$ %KERN-2-SYSTEM_MSG: [12392164.927835] Switchover started by redundancy driver - kernel
2022 Feb 23 02:09:45 lab-sw01-admin-vdc-1 %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).
2022 Feb 23 02:09:45 lab-sw01-admin-vdc-1 %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_START: Supervisor 6 is becoming active.
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth0/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-admin-vdc-1 %$ VDC-1 %$ %SYSMGR-2-SWITCHOVER_OVER: Switchover completed.
2022 Feb 23 02:09:47 lab-sw01-admin-vdc-1 %$ VDC-1 %$ %PLATFORM-1-PFM_ALERT: Disabling ejector based shutdown on sup in slot 6
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth1/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth2/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth3/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth4/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth5/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth6/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth7/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth8/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth9/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth10/8 return status No card found in slot
2022 Feb 23 02:09:46 lab-sw01-vdc-2 %$ VDC-2 %$ %ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface lc-eth11/8 return status No card found in slot
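The ELTM messages in the console output above can be grouped to confirm that they all point at the same module. The Python sketch below relies only on the message format shown in this capture; treating the number after the slash in names such as lc-eth0/8 as the slot of the affected module is an assumption, not documented behavior. The function name group_ltl_failures is hypothetical.

```python
import re
from collections import defaultdict

# Pattern derived from the ELTM-2-ELTM_INTF_TO_LTL lines in the capture above.
ELTM_RE = re.compile(
    r"%ELTM-2-ELTM_INTF_TO_LTL: Failed to get LTL for interface "
    r"(?P<intf>\S+)/(?P<suffix>\d+) return status (?P<status>.+)$"
)


def group_ltl_failures(console_lines):
    """Group failing interface names by (number after the slash, return status)."""
    groups = defaultdict(list)
    for line in console_lines:
        match = ELTM_RE.search(line)
        if match:
            key = (int(match.group("suffix")), match.group("status").strip())
            groups[key].append(match.group("intf"))
    return dict(groups)
```

If every failure falls into one group, such as the group for 8 with the status "No card found in slot" in this capture, that is consistent with a single module dropping out during the switchover.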
Step 4. Run show module repeatedly and watch to see if and when Module 8 comes back online:
Module 5 and Module 8 dropped out and are powered-up:
lab-sw01-admin-vdc-1# show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
2    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
3    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
4    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
5    0      Supervisor Module-2                                    powered-up
6    0      Supervisor Module-2                 N77-SUP2E          active *
7    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
8    24     10/40 Gbps Ethernet Module                             powered-up
Mod  Power-Status  Reason
---  ------------  ---------------------------
8    powered-up    Unknown. Issue show system reset mod ...
Mod  Sw               Hw
---  ---------------  ------
1    8.4(4)           1.5
2    8.4(4)           1.5
3    8.4(4)           1.9
4    8.4(4)           1.9
6    8.4(4)           1.3
7    8.4(4)           1.2
lab-sw01-admin-vdc-1# 2022 Feb 23 02:11:11 lab-sw01-vdc-2 %$ VDC-2 %$ %PLATFORM-2-MOD_DETECT: Module 8 detected (Serial number JAA00000000) Module-Type 10/40 Gbps Ethernet Module Model N77-F324FQ-25
2022 Feb 23 02:11:11 lab-sw01-vdc-2 %$ VDC-2 %$ %PLATFORM-2-MOD_PWRUP: Module 8 powered up (Serial number JAA00000000)
2022 Feb 23 02:11:11 lab-sw01-admin-vdc-1 %$ VDC-1 %$ %PLATFORM-2-MOD_DETECT: Module 8 detected (Serial number JAA00000000) Module-Type 10/40 Gbps Ethernet Module Model N77-F324FQ-25
2022 Feb 23 02:11:11 lab-sw01-admin-vdc-1 %$ VDC-1 %$ %PLATFORM-2-MOD_PWRUP: Module 8 powered up (Serial number JAA00000000)
Module 8 is power-cycled (pwr-cycld):
lab-sw01-admin-vdc-1# show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
2    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
3    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
4    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
5    0      Supervisor Module-2                                    powered-up
6    0      Supervisor Module-2                 N77-SUP2E          active *
7    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
8    24     10/40 Gbps Ethernet Module                             pwr-cycld
Mod  Power-Status  Reason
---  ------------  ---------------------------
8    pwr-cycld     Unknown. Issue show system reset mod ...
Mod  Sw               Hw
---  ---------------  ------
1    8.4(4)           1.5
2    8.4(4)           1.5
3    8.4(4)           1.9
4    8.4(4)           1.9
6    8.4(4)           1.3
7    8.4(4)           1.2
lab-sw01-admin-vdc-1# show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
2    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
3    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
4    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
5    0      Supervisor Module-2                                    powered-up
6    0      Supervisor Module-2                 N77-SUP2E          active *
7    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
8    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      powered-up
Mod  Sw               Hw
---  ---------------  ------
1    8.4(4)           1.5
2    8.4(4)           1.5
3    8.4(4)           1.9
4    8.4(4)           1.9
6    8.4(4)           1.3
7    8.4(4)           1.2
8    8.4(4)           1.2
The automatic EPLD upgrade check runs on Module 8 and reports that its EPLD versions are up to date:
lab-sw01-admin-vdc-1# 2022 Feb 23 02:13:06 lab-sw01-admin-vdc-1 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: <<%EPLD_AUTO-2-AUTO_UPGRADE_CHECK>> Automatic EPLD upgrade check for module 8: EPLD versions are up to date. - epld_auto
lab-sw01-admin-vdc-1# show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
2    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
3    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
4    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
5    0      Supervisor Module-2                                    powered-up
6    0      Supervisor Module-2                 N77-SUP2E          active *
7    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
8    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      powered-up
Mod  Sw               Hw
---  ---------------  ------
1    8.4(4)           1.5
2    8.4(4)           1.5
3    8.4(4)           1.9
4    8.4(4)           1.9
6    8.4(4)           1.3
7    8.4(4)           1.2
8    8.4(4)           1.2
Module 8 moves to testing by the hardware diagnostics:
lab-sw01-admin-vdc-1# show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
2    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
3    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
4    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
5    0      Supervisor Module-2                                    powered-up
6    0      Supervisor Module-2                 N77-SUP2E          active *
7    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
8    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      testing
Mod  Sw               Hw
---  ---------------  ------
1    8.4(4)           1.5
2    8.4(4)           1.5
3    8.4(4)           1.9
4    8.4(4)           1.9
6    8.4(4)           1.3
7    8.4(4)           1.2
8    8.4(4)           1.2
Module 8 moves to initializing after passing hardware diagnostics:
lab-sw01-admin-vdc-1# show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
2    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
3    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
4    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
5    0      Supervisor Module-2                                    powered-up
6    0      Supervisor Module-2                 N77-SUP2E          active *
7    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
8    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      initializing
Mod  Sw               Hw
---  ---------------  ------
1    8.4(4)           1.5
2    8.4(4)           1.5
3    8.4(4)           1.9
4    8.4(4)           1.9
6    8.4(4)           1.3
7    8.4(4)           1.2
8    8.4(4)           1.2
Module 8 comes online:
lab-sw01-admin-vdc-1# show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
2    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
3    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
4    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
5    0      Supervisor Module-2                                    powered-up
6    0      Supervisor Module-2                 N77-SUP2E          active *
7    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
8    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
Mod  Sw               Hw
---  ---------------  ------
1    8.4(4)           1.5
2    8.4(4)           1.5
3    8.4(4)           1.9
4    8.4(4)           1.9
6    8.4(4)           1.3
7    8.4(4)           1.2
8    8.4(4)           1.2
Module 5 SUP comes back up (inserted):
lab-sw01-admin-vdc-1# show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
2    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
3    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
4    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
5    0      Supervisor Module-2                 N77-SUP2E          inserted
6    0      Supervisor Module-2                 N77-SUP2E          active *
7    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
8    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
Mod  Sw               Hw
---  ---------------  ------
1    8.4(4)           1.5
2    8.4(4)           1.5
3    8.4(4)           1.9
4    8.4(4)           1.9
5    8.4(4)           1.3
6    8.4(4)           1.3
7    8.4(4)           1.2
8    8.4(4)           1.2
Module 5 SUP becomes ha-standby:
2022 Feb 23 02:16:38 lab-sw01-admin-vdc-1 %$ VDC-1 %$ %PLATFORM-1-PFM_ALERT: Enabling ejector based shutdown on sup in slot 6
lab-sw01-admin-vdc-1# show module
Mod  Ports  Module-Type                         Model              Status
---  -----  ----------------------------------- ------------------ ----------
1    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
2    12     100 Gbps Ethernet Module            N77-F312CK-26      ok
3    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
4    48     1/10 Gbps Ethernet Module           N77-F348XP-23      ok
5    0      Supervisor Module-2                 N77-SUP2E          ha-standby
6    0      Supervisor Module-2                 N77-SUP2E          active *
7    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
8    24     10/40 Gbps Ethernet Module          N77-F324FQ-25      ok
Mod  Sw               Hw
---  ---------------  ------
1    8.4(4)           1.5
2    8.4(4)           1.5
3    8.4(4)           1.9
4    8.4(4)           1.9
5    8.4(4)           1.3
6    8.4(4)           1.3
7    8.4(4)           1.2
8    8.4(4)           1.2
2022 Feb 23 02:15:44 lab-sw01-admin-vdc-1 %MODULE-5-MOD_OK: Module 8 is online (Serial number: JAA00000000)
2022 Feb 23 02:15:43 lab-sw01-admin-vdc-1 %SYSMGR-SLOT8-5-MODULE_ONLINE: System Manager has received notification of local module becoming online.
2022 Feb 23 02:15:44 lab-sw01-admin-vdc-1 %PLATFORM-5-MOD_STATUS: Module 8 current-status is MOD_STATUS_ONLINE/OK
2022 Feb 23 02:16:38 lab-sw01-admin-vdc-1 %MODULE-5-STANDBY_SUP_OK: Supervisor 5 is standby
Note: All modules are online (that is, ok). Module 6 is now the active SUP (that is, active *), and Module 5 is the High Availability standby SUP (that is, ha-standby).
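The sequence of show module snapshots in Step 4 can be reduced to the ordered list of states that one module passed through, which makes a stuck state easy to spot. The following Python sketch assumes that each snapshot has already been reduced to a mapping of module number to status string; the function name status_timeline and the "absent" placeholder are hypothetical.

```python
def status_timeline(snapshots, module):
    """Collapse successive {module: status} snapshots into distinct states.

    Consecutive snapshots that report the same status are merged, so the
    result lists each state change once, in the order it was observed.
    """
    timeline = []
    for snapshot in snapshots:
        status = snapshot.get(module, "absent")
        if not timeline or timeline[-1] != status:
            timeline.append(status)
    return timeline
```

For the Module 8 captures in this document, the resulting timeline is ok, powered-up, pwr-cycld, powered-up, testing, initializing, ok. A timeline that stops before the final ok suggests that the module did not recover.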
Step 5. Once all modules are online, repeat Step 1 and validate that all connectivity is back to normal.
Revision | Publish Date | Comments |
---|---|---|
1.0 | 24-Mar-2022 | Initial Release |