THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Revision | Publish Date | Comments |
---|---|---|
1.0 | 20-May-21 | Initial Release |
2.0 | 21-May-21 | Updated the Workaround/Solution and How to Identify Affected Products Sections |
2.1 | 07-Jun-21 | Updated the Defect Information and Workaround/Solution Sections |
Affected Product ID | Comments |
---|---|
N540-24Z8Q2C-M= | Part Alternate |
N540-24Z8Q2C-M | |
N540-24Z8Q2C-SYS | |
N540-ACC-SYS | |
N540-ACC-SYS= | Part Alternate |
N540-ACC-M | |
N540-ACC-M= | Part Alternate |
N540-24Z8Q2C-SYS= | Part Alternate |
Defect ID | Headline |
---|---|
CSCvx16766 | Support firmware upgrade functionality for onboard SSD |
Due to a flaw in the solid-state drive (SSD) firmware, the SSD stops responding after approximately 3.2 years of accumulated operation. After the first unresponsive event occurs, each subsequent power-cycle allows the SSD to operate for another 1008 hours (approximately six weeks) before it becomes unresponsive again.
After 28,224 hours (approximately 3.2 years) of accumulated Power On Hours (POH), a memory buffer overrun condition occurs, which triggers the firmware event. This event causes the drive to become unresponsive until the drive is power-cycled. No data loss occurs when the memory buffer overrun firmware event occurs, and a power-cycle restores normal operation of the drive. The drive then continues to operate normally for approximately six weeks (1008 additional accumulated POH), at which time it becomes unresponsive again. Power-cycling the drive again restarts the 1008-hour window.
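The timing above is simple arithmetic on the two thresholds. The following Python sketch converts them into years and weeks and lists the accumulated POH at which successive lockups would occur, under the hypothetical assumption that the drive is power-cycled immediately after every unresponsive event. The names `lockup_schedule`, `HOURS_PER_YEAR`, and so on are illustrative only.

```python
FIRST_LOCKUP_POH = 28_224          # accumulated POH before the first unresponsive event
WINDOW_AFTER_POWER_CYCLE = 1_008   # POH allowed after each subsequent power-cycle

HOURS_PER_WEEK = 24 * 7
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def lockup_schedule(count: int) -> list[int]:
    """Accumulated POH at each of the first `count` lockups.

    Hypothetical assumption: the drive is power-cycled immediately after
    every unresponsive event, so each 1008-hour window starts at the
    previous lockup.
    """
    return [FIRST_LOCKUP_POH + n * WINDOW_AFTER_POWER_CYCLE for n in range(count)]


print(f"first lockup: ~{FIRST_LOCKUP_POH / HOURS_PER_YEAR:.1f} years")  # prints: first lockup: ~3.2 years
print(f"window after power-cycle: {WINDOW_AFTER_POWER_CYCLE / HOURS_PER_WEEK:.0f} weeks")  # prints: ... 6 weeks
print(lockup_schedule(4))  # [28224, 29232, 30240, 31248]
```

In practice, any delay before the power-cycle shifts every later entry, so treat the schedule as a lower bound on how often the drive would lock up.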
After approximately 3.2 years of operation, router behavior becomes unpredictable because the SSD locks up.
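The following is one instance of an SSD in the locked state, in which the file system has become read-only. Run this command to search the log for read-only file system errors:
RP/0/RP0/CPU0:ios#show logging | inc Read-only
Sample output: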
RP/0/RP0/CPU0:ios#show logging | inc Read-only
Day MMM X HH:MM:SS.048 UTC
start_backing_thread:bind: Read-only file system
start_backing_thread:bind: Read-only file system
ctrace_enable_configuration(): inotify_add_watch() failed fd 11 ctrl_file_name /var/log/ctrace/show_logging/xr_ds_capi_info/ctrace.ctrl.
No such file or directory (2)
ctrace_enable_configuration2 failed with error 0x5
ctrace_enable_configuration(): inotify_add_watch() failed fd 12 ctrl_file_name /var/log/ctrace/show_logging/xr_ds_capi_error/ctrace.ctrl.
No such file or directory (2)
ctrace_enable_configuration2 failed with error 0x5
ctrace_enable_configuration(): inotify_add_watch() failed fd 13 ctrl_file_name /var/log/ctrace/show_logging/xr_ds_capi_conn/ctrace.ctrl.
No such file or directory (2)
ctrace_enable_configuration2 failed with error 0x5
ctrace_enable_configuration(): inotify_add_watch() failed fd 14 ctrl_file_name /v
ctrace_enable_configuration(): inotify_add_watch() failed fd 12 ctrl_file_name /var/log/ctrace/show_logging/xr_ds_capi_error/ctrace.ctrl.
No such file or directory (2)
ctrace_enable_configuration2 failed with error 0x5
ctrace_enable_configuration(): inotify_add_watch() failed fd 13 ctrl_file_name /var/log/ctrace/show_logging/xr_ds_capi_conn/ctrace.ctrl.
No such file or directory (2)
ctrace_enable_configuration2 failed with error 0x5
ctrace_enable_configuration(): inotify_add_watch() failed fd 14 ctrl_file_name /var/log/ctrace/show_logging/xr_ds_capi_msc/ctrace.ctrl.
No such file or directory (2)
ctrace_enable_configuration2 failed with error 0x5
start_backing_thread:bind: Read-only file system
RP/0/RP0/CPU0:Day MMM X HH:MM:SS.418 UTC: syslog_dev[117]: syslog_infra_hm[141] PID-23708: ctrace abort handler: unable to open trace file /var/log/ctrace/_pkg_bin_logger/xr_ds_capi_msc/ctrace_0.trc (Read-only file system)))ctrace abort handler: unable to open trace file /var/log/ctrace/_pkg_bin_logger/xr_ds_capi_conn/ctrace_0.trc (Read-only file system))ctrace abort handler: unable to open trace file /var/log/ctrace/_pkg_bin_logger/xr_ds_capi_error/ctrace_0.trc (Read-only file system)
RP/0/RP0/CPU0:Day MMM X HH:MM:SS.418 UTC: syslog_dev[117]: syslog_infra_hm[141] PID-23708: ctrace abort handler: unable to open trace file /var/log/ctrace/_pkg_bin_logger/xr_ds_capi_info/ctrace_0.trc (Read-only file system))
Workaround
Power-cycle the system to recover from the problem temporarily. However, the failure reappears after another 1008 hours of operation.
Solution
To prevent this issue and the resulting disruption to the network and operations, Cisco recommends that you proactively upgrade the SSD firmware before the accumulated POH reaches 28,224 hours.
Refer to the "How to Identify Affected Products" section to determine whether your system is affected. If it is, installing the SMU and then upgrading the SSD firmware permanently resolves this defect.
Note: A product return and replacement (RMA) is not recommended because the firmware upgrade process will resolve the issue.
Follow these steps to perform the SSD firmware upgrade.
Step 1: Install the required SMU:
IOS XR Release | SMU Link | SMU Install Type |
---|---|---|
IOS XR Release 6.5.3 | ncs540-sysadmin-6.5.3.CSCvx16766.tar | Process Restart* |
IOS XR Release 6.6.25 | ncs540-sysadmin-6.6.25.CSCvx16766.tar | Reload** |
IOS XR Release 6.6.3 | ncs540-sysadmin-6.6.3.CSCvx16766.tar | Process Restart* |
IOS XR Release 7.0.1 | ncs540-sysadmin-7.0.1.CSCvx16766.tar | Process Restart* |
IOS XR Release 7.0.2 | ncs540-sysadmin-7.0.2.CSCvx16766.tar | Process Restart* |
IOS XR Release 7.1.1 | ncs540-sysadmin-7.1.1.CSCvx16766.tar | Process Restart* |
IOS XR Release 7.1.2 | ncs540-sysadmin-7.1.2.CSCvx16766.tar | Reload** |
IOS XR Release 7.2.1 | ncs540-sysadmin-7.2.1.CSCvx16766.tar | Reload** |
IOS XR Release 7.2.2 | ncs540-sysadmin-7.2.2.CSCvx16766.tar | Process Restart* |
IOS XR Release 7.3.1 | ncs540-sysadmin-7.3.1.CSCvx16766.tar | Process Restart* |
* If the SMU install type is Process Restart, the sata_fpd process restarts on the router when the SMU is activated.
** If the SMU install type is Reload, the router reloads when the SMU is activated.
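If you track many routers, the Step 1 table can be encoded as a lookup. This Python sketch transcribes the table; the dictionary and function names are hypothetical, and a release that is not listed raises an error instead of guessing an SMU.

```python
# Install type per IOS XR release, transcribed from the Step 1 table.
SMU_INSTALL_TYPE = {
    "6.5.3": "Process Restart",
    "6.6.25": "Reload",
    "6.6.3": "Process Restart",
    "7.0.1": "Process Restart",
    "7.0.2": "Process Restart",
    "7.1.1": "Process Restart",
    "7.1.2": "Reload",
    "7.2.1": "Reload",
    "7.2.2": "Process Restart",
    "7.3.1": "Process Restart",
}


def smu_for_release(release: str) -> tuple[str, str]:
    """Return (SMU file name, install type) for a listed IOS XR release.

    Raises KeyError for releases that are not in the table.
    """
    install_type = SMU_INSTALL_TYPE[release]
    # Every file name in the table follows this pattern.
    return f"ncs540-sysadmin-{release}.CSCvx16766.tar", install_type


def activation_effect(install_type: str) -> str:
    """Describe what happens on SMU activation, per the table footnotes."""
    if install_type == "Process Restart":
        return "sata_fpd process restarts"
    if install_type == "Reload":
        return "router reloads"
    raise ValueError(f"unexpected install type: {install_type}")


name, kind = smu_for_release("7.1.2")
print(name, "->", activation_effect(kind))
```

Knowing the install type in advance lets you schedule a maintenance window for the Reload releases.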
Step 2: Upgrade the firmware:
Run this command to check the FPD device name, status, and firmware version:
[admin] show hw-module location 0/RP0 fpd
Sample output:
Location  Card type     HWver  FPD device        ATR  Status     Running  Programd
----------------------------------------------------------------------
0/RP0     N540-ACC-SYS  1.0    MB-MIFPGA              CURRENT    0.05     0.05
0/RP0     N540-ACC-SYS  1.0    Bootloader             NEED UPGD  1.05     1.05
0/RP0     N540-ACC-SYS  1.0    CPU-IOFPGA             NEED UPGD  1.15     1.15
0/RP0     N540-ACC-SYS  1.0    MB-IOFPGA              NEED UPGD  0.20     0.20
0/RP0     N540-ACC-SYS  1.0    SATA-M500IT-MC         NEED UPGD  2.00     2.00
Note: For NCS540, the FPD device name can be SATA-M500IT-MC or SATA-M500IT-MU-B.
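If you collect this output with automation, a minimal Python sketch like the following can pick out the SSD row. It assumes the column layout of the sample output (single-token Card type, empty ATR column, Status of one or two words) and matches only the two FPD device names in the note; `find_ssd_fpd` is a hypothetical helper, not a Cisco tool.

```python
from typing import Optional

# FPD device names for the onboard SSD, per the note above.
SSD_FPD_NAMES = ("SATA-M500IT-MC", "SATA-M500IT-MU-B")


def find_ssd_fpd(output: str) -> Optional[dict]:
    """Return the SSD row of `show hw-module location ... fpd` output, or None.

    Assumes whitespace-separated columns as in the sample: Location, Card type,
    HWver, FPD device, (empty ATR), Status, Running, Programd.
    """
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[3] in SSD_FPD_NAMES:
            return {
                "location": fields[0],
                "fpd_device": fields[3],
                "status": " ".join(fields[4:-2]),  # e.g. "CURRENT" or "NEED UPGD"
                "running": fields[-2],
                "programmed": fields[-1],
            }
    return None


sample = """\
Location  Card type     HWver  FPD device        ATR  Status     Running  Programd
----------------------------------------------------------------------
0/RP0     N540-ACC-SYS  1.0    MB-MIFPGA              CURRENT    0.05     0.05
0/RP0     N540-ACC-SYS  1.0    SATA-M500IT-MC         NEED UPGD  2.00     2.00
"""

print(find_ssd_fpd(sample))
```

If the ATR column is populated on your system, its value would be folded into `status`, so adjust the slicing accordingly.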
Use this command to upgrade the firmware:
[admin] upgrade hw-module location 0/RP0 fpd <fpd-name>
Sample output:
sysadmin-vm:0_RP0# upgrade hw-module location 0/RP0 fpd SATA-M500IT-MC
Wed May 12 22:41:07.183 UTC
sysadmin-vm:0_RP0# 0/RP0/ADMIN0:May 12 22:41:19.186 UTC: fpdserv[3797]: %INFRA-FPD_Manager-1-UPGRADE_ALERT : Upgrade for the following FPDs has been committed:
0/RP0/ADMIN0:May 12 22:41:19.187 UTC: fpdserv[3797]: %INFRA-FPD_Manager-1-UPGRADE_ALERT : Location FPD name Force
0/RP0/ADMIN0:May 12 22:41:19.187 UTC: fpdserv[3797]: %INFRA-FPD_Manager-1-UPGRADE_ALERT : ==================================================
0/RP0/ADMIN0:May 12 22:41:19.189 UTC: fpdserv[3797]: %INFRA-FPD_Manager-1-UPGRADE_ALERT : 0/RP0 SATA-M500IT-MC FALSE
0/RP0/ADMIN0:May 12 22:41:26.732 UTC: sata_fpd[28230]: %INFRA-FPD_Driver-1-UPGRADE_ALERT : FPD SATA-M500IT-MC@0/RP0 image programming completed with UPGRADE DONE state Info: [SATA FPD upgrade Complete]
Use this command to confirm that the firmware has been upgraded to the latest version:
[admin] show hw-module location 0/RP0 fpd
Sample output:
Location  Card type     HWver  FPD device        ATR  Status     Running  Programd
----------------------------------------------------------------------
0/RP0     N540-ACC-SYS  1.0    MB-MIFPGA              CURRENT    0.05     0.05
0/RP0     N540-ACC-SYS  1.0    Bootloader             NEED UPGD  1.05     1.05
0/RP0     N540-ACC-SYS  1.0    CPU-IOFPGA             NEED UPGD  1.15     1.15
0/RP0     N540-ACC-SYS  1.0    MB-IOFPGA              NEED UPGD  0.20     0.20
0/RP0     N540-ACC-SYS  1.0    SATA-M500IT-MC         CURRENT    3.00     3.00
Ensure that the firmware is updated to a fixed version, as listed in the table in the "How to Identify Affected Products" section.
For a more detailed FAQ on the SSD issue, refer to Solid State Drive Issue on Certain Products - Software Update Required to Avoid Failure and Customer Impact.
Cisco has identified the serial numbers of products that shipped with an affected firmware (FW) version. Refer to the Serial Number Validation section to determine whether your product might be affected. If the product is listed as “Affected”, check the firmware version to see whether it has already been upgraded.
Use this command to check the firmware version:
admin show smart-monitor location all | inc "Location|Device Model|Firmware Version"
Sample output:
RP/0/RP0/CPU0:ios#admin show smart-monitor location all | inc "Location|Device Model|Firmware Version"
Day MMM DD HH:MM:SS.672 UTC
Location : 0/RP0
Device Model: Micron_M500IT_MTFDDAT128MBD
Firmware Version: MU01.00
Compare the firmware version listed in the output to the list of impacted firmware versions in this table:
Device Model | Impacted Firmware Version | Fixed Firmware Version |
---|---|---|
Micron_M500IT_MTFDDAT128MBD | MU01.00 | MU04.00 or higher |
Micron_M500IT_MTFDDAT128MBD | MC02.00 | MC03.00 or higher |
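The comparison against this table can be scripted. The Python sketch below extracts the `Firmware Version:` values from `show smart-monitor` output shaped like the sample and classifies each one. It assumes version strings of the form shown (a two-letter MU or MC prefix plus `NN.NN`); versions the table does not mention, such as a hypothetical MU02.00, are reported as unknown rather than assumed safe. The helper names are illustrative.

```python
import re

# Transcribed from the table above.
IMPACTED_VERSIONS = {"MU01.00", "MC02.00"}
FIXED_MINIMUM = {"MU": (4, 0), "MC": (3, 0)}  # MU04.00 or higher, MC03.00 or higher


def firmware_versions(output: str) -> list[str]:
    """Extract every 'Firmware Version:' value from smart-monitor output."""
    return re.findall(r"Firmware Version:\s*(\S+)", output)


def classify(version: str) -> str:
    """Return 'impacted', 'fixed', or 'unknown' (not covered by the table)."""
    if version in IMPACTED_VERSIONS:
        return "impacted"
    match = re.fullmatch(r"(MU|MC)(\d+)\.(\d+)", version)
    if match:
        family, major, minor = match.group(1), int(match.group(2)), int(match.group(3))
        # Compare numerically as (major, minor) so that e.g. 10.00 > 04.00.
        if (major, minor) >= FIXED_MINIMUM[family]:
            return "fixed"
    return "unknown"


sample = """\
Location : 0/RP0
Device Model: Micron_M500IT_MTFDDAT128MBD
Firmware Version: MU01.00
"""

for version in firmware_versions(sample):
    print(version, classify(version))  # MU01.00 impacted
```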
If the firmware version is affected, install the recommended SMU and upgrade the firmware before the drive reaches 28,224 POH.
Use this command to check the POH:
sysadmin-vm:0_RP0# show smart-monitor location 0/RP0 | inc Power_On_Hours
Sample Output:
sysadmin-vm:0_RP0# show smart-monitor location 0/RP0 | inc Power_On_Hours
DDD MMM DD HH:MM:SS.538 UTC+00:00
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       17849
In this example, the current POH is 17,849 hours, so the SSD issue will occur after 10,375 more hours of operation (28,224 - 17,849 = 10,375).
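The same calculation can be scripted. This Python sketch reads the raw value (the last of the ten attribute columns) of the `Power_On_Hours` row from output shaped like the sample and subtracts it from 28,224. It assumes the raw value is a plain integer, as in the sample; `parse_power_on_hours` and `hours_remaining` are hypothetical helper names.

```python
FIRST_LOCKUP_POH = 28_224  # from this field notice


def parse_power_on_hours(output: str) -> int:
    """Return the raw Power_On_Hours value from SMART attribute output.

    Assumes the 10-column attribute layout of the sample:
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in output.splitlines():
        fields = line.split()
        # The echoed command line also contains "Power_On_Hours", so require
        # the attribute name in the second column of a full-width row.
        if len(fields) >= 10 and fields[1] == "Power_On_Hours":
            return int(fields[9])
    raise ValueError("Power_On_Hours attribute not found")


def hours_remaining(poh: int) -> int:
    """Hours left before the first lockup (0 if the threshold has passed)."""
    return max(FIRST_LOCKUP_POH - poh, 0)


sample = """\
sysadmin-vm:0_RP0# show smart-monitor location 0/RP0 | inc Power_On_Hours
DDD MMM DD HH:MM:SS.538 UTC+00:00
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       17849
"""

poh = parse_power_on_hours(sample)
print(poh, hours_remaining(poh))  # 17849 10375
```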
To determine whether the serial number of a device is affected by this issue, enter it in the Serial Number Validation tool at https://snvui.cisco.com/snv/FN72107.
If you require further assistance, or if you have any further questions regarding this field notice, contact the Cisco Systems Technical Assistance Center (TAC).