Introduction
This article is an extension to the document “Nexus 7000 Supervisor 2/2E Compact Flash Failure Recovery,” which addresses all possible failure scenarios. This document is useful in the case where the flash recovery tool fails to run. It is recommended to have console access to the device in order to perform the changes. It is also strongly recommended not to make any changes at the Linux kernel level other than those described in this document, as this can impact switch operations. Cisco TAC supervision is advisable.
Background
As explained in that document, each Nexus 7000 Supervisor 2/2E is equipped with two eUSB flash devices in a RAID1 configuration, one primary and one mirror. Together they provide non-volatile repositories for boot images, the startup configuration, and persistent application data. When the RAID fails on a supervisor in the chassis, the flash recovery tool is run in order to repair it. In almost all cases, if the flash recovery tool fails to run, the supervisor must be reloaded or failed over. However, in certain scenarios this can be fixed without a reload or failover.
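For reference, the RAID status byte mentioned throughout this document (0xf0 for a healthy array, 0xe1 in the failure scenario described here) can be read on the affected supervisor with the same command that is used later in this document to confirm the recovery. Replace x with the slot number of the affected supervisor:
Switch# slot x show system internal raid | i "cmos|block" | head line 5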
Prerequisites
Requirements
Cisco recommends that you have knowledge of Cisco NX-OS, storage or flash disk recovery methods, and Linux-level debugging.
Components Used
The information in this document is based on Cisco Nexus 7000 Series switches with Supervisor 2/2E modules.
Symptom
A RAID failure is observed on a supervisor, and the switch reports RAID failure status code 0xe1. When the flash recovery tool is run in order to recover the flash on the affected supervisor, this error appears:
ERROR: Cannot perform recovery. /dev/sdb has incorrect partition info.
ERROR: Disk /dev/sdb needs to be manually inspected for errors.
INFO: No recovery was attempted on module 5. All flashes left intact.
INFO: A detailed copy of this log was saved as volatile:flash_repair_log_mod5.tgz.
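For context, the flash recovery tool that produces this output is distributed by Cisco as a loadable plugin and is run from the switch CLI in the same way as the debug plugin shown in the next section. The filename below is only a placeholder; use the recovery tool image that Cisco provides for your release:
Switch# load bootflash:n7000-s2-flash-recovery-tool.<version>.gbin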
Solution
Load the debug plugin on the switch in order to log in to the Linux shell:
Switch# load bootflash:n7000-s2-debug-sh.6.1.4a.gbin
Be careful while you run commands from this shell.
Once at the Linux prompt, look for the affected device as indicated in the error message. In this case it is /dev/sdb, but it could be a different device:
Linux(debug)# ls -l /dev/sd?
brw-r----- 1 root root 8, 0 Aug 28 2015 sda
brw-rw-r-- 1 root disk 8, 32 Dec 18 2013 sdc
brw-rw-r-- 1 root disk 8, 48 Dec 18 2013 sdd
brw-rw-r-- 1 root disk 8, 64 Dec 18 2013 sde
brw-rw-r-- 1 root disk 8, 80 Dec 18 2013 sdf
brw-rw-r-- 1 root disk 8, 96 Dec 18 2013 sdg
brw-rw-r-- 1 root disk 8, 112 Dec 18 2013 sdh
brw-rw-r-- 1 root disk 8, 128 Dec 18 2013 sdi
brw-rw-r-- 1 root disk 8, 144 Dec 18 2013 sdj
brw-rw-r-- 1 root disk 8, 160 Dec 18 2013 sdk
brw-rw-r-- 1 root disk 8, 176 Dec 18 2013 sdl
brw-rw-r-- 1 root disk 8, 192 Dec 18 2013 sdm
The device node for /dev/sdb is missing, which causes the error when the recovery tool runs. Create the missing device node manually, with the same permissions as the other block devices:
Linux(debug)# mknod -m 664 /dev/sdb b 8 16
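In this mknod command, b creates a block device node, -m 664 sets the same permissions as the other disk nodes, and 8 16 are the major and minor numbers: major 8 is used for these disk devices, and the minor number increases by 16 for each whole disk (sda is 8 0, sdc is 8 32, and so on), so sdb is 8 16, consistent with the ls output above. If you want to confirm the numbers that the kernel itself reports before you create the node, the kernel partition list can be checked; this is standard Linux behavior and not specific to NX-OS. The columns are major, minor, number of blocks, and device name:
Linux(debug)# cat /proc/partitions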
The sdb device node is now present under /dev:
Linux(debug)# ls -l /dev/sd?
brw-r----- 1 root root 8, 0 Aug 28 2015 sda
brw-rw-r-- 1 root root 8, 16 May 26 07:31 sdb
brw-rw-r-- 1 root disk 8, 32 Dec 18 2013 sdc
brw-rw-r-- 1 root disk 8, 48 Dec 18 2013 sdd
brw-rw-r-- 1 root disk 8, 64 Dec 18 2013 sde
brw-rw-r-- 1 root disk 8, 80 Dec 18 2013 sdf
brw-rw-r-- 1 root disk 8, 96 Dec 18 2013 sdg
brw-rw-r-- 1 root disk 8, 112 Dec 18 2013 sdh
brw-rw-r-- 1 root disk 8, 128 Dec 18 2013 sdi
brw-rw-r-- 1 root disk 8, 144 Dec 18 2013 sdj
brw-rw-r-- 1 root disk 8, 160 Dec 18 2013 sdk
brw-rw-r-- 1 root disk 8, 176 Dec 18 2013 sdl
brw-rw-r-- 1 root disk 8, 192 Dec 18 2013 sdm
Exit from the Linux shell and run the flash recovery tool again.
This time the tool runs without any error messages, and the RAID failure on the primary flash is recovered (status 0xf0). This can be confirmed with this command, where x is the slot of the affected supervisor:
Switch# slot x show system internal raid | i "cmos|block" | head line 5
The tool should now run without such errors and should recover the affected supervisor from the RAID failure state. If the recovery tool still fails to run, there could be another cause, such as actual corruption of the partition, and a reload or supervisor failover may be required.
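If a supervisor failover is needed on a chassis with dual supervisors, the standard NX-OS switchover command can be used; it is shown here only as a reminder, and the standby supervisor should be verified as ha-standby before you use it:
Switch# show module | include Supervisor
Switch# system switchover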
Related Information