THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Revision | Publish Date | Comments |
---|---|---|
10.1 | 13-Oct-22 | Updated the Product Hierarchy Metatags to Optimize Search Results |
10.0 | 13-Oct-17 | Migration to new field notice system |
1.0 | 17-May-16 | Initial Release |
Affected Product ID | Comments |
---|---|
UCS-VIC-M82-8P= | |
UCSB-MLOM-40G-01= | |
UCSB-MLOM-40G-03= | |
UCSC-PCIE-C10T-02= | |
UCS-VIC-M82-8P | |
UCSB-MLOM-40G-01 | |
UCSB-MLOM-40G-03 | |
UCSC-PCIE-C10T-02 | |
Defect ID | Headline |
---|---|
CSCuh61202 | VIC : FC traffic drops after IOM reset/reseat/cable pull, or upgrade |
CSCus64439 | "PFC feature operational" not updated causes storage latency and aborts. |
When the Input/Output Module (IOM) is upgraded/reset/removed/reinserted or the cable between the IOM and the Fabric Interconnect (FI) is pulled/replaced, the Fibre Channel (FC) storage traffic that goes from the Cisco Virtual Interface Card (VIC) 1225/1227/1240/1280/1340/1380 through the IOM can experience problems (drops) after the connection has been re-established.
There are no visual indicators in Unified Computing System Manager (UCSM) that the defects mentioned in this document have been encountered. Though the virtual network interface cards/virtual host bus adapters show as green in UCSM, the underlying storage paths might still be down when affected.
If the egress FC port is congested, the switch sends Priority-based Flow Control (PFC) frames to the servers in order to reduce the Fibre Channel over Ethernet (FCoE) rate and avoid packet drops with defined Class of Service (CoS) priority. Data Center Bridging (DCB) PFC extends the standard PAUSE frame to include IEEE 802.1p CoS values. With PFC, instead of halting all traffic on the link when a PAUSE frame is sent, traffic is paused only for those CoS values that are enabled in the PFC frame. A PFC frame is sent for the enabled priority for which traffic needs to be paused.
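As a rough illustration of the per-priority behavior described above, the following Python sketch builds the MAC Control payload of a PFC frame as laid out in IEEE 802.1Qbb: the PFC opcode (0x0101), a class-enable vector, and eight per-priority pause timers. The helper name `build_pfc_payload` is hypothetical, and the use of CoS 3 for FCoE in the example is an assumption; the CoS value actually used depends on the configured QoS system class.

```python
import struct

PFC_OPCODE = 0x0101  # MAC Control opcode for Priority-based Flow Control


def build_pfc_payload(pause_quanta_by_cos):
    """Return the 20-byte PFC payload: opcode, class-enable vector, 8 timers.

    pause_quanta_by_cos maps a CoS value (0-7) to a pause time in quanta
    (0-65535). Only the listed CoS values are flagged in the enable vector,
    so traffic in every other class is left untouched.
    """
    enable_vector = 0
    timers = [0] * 8
    for cos, quanta in pause_quanta_by_cos.items():
        if not 0 <= cos <= 7:
            raise ValueError(f"CoS out of range: {cos}")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError(f"pause time out of range: {quanta}")
        enable_vector |= 1 << cos
        timers[cos] = quanta
    return struct.pack("!HH8H", PFC_OPCODE, enable_vector, *timers)


# Assumed example: pause only CoS 3 for the maximum time.
payload = build_pfc_payload({3: 0xFFFF})
```

A standard PAUSE frame has no class-enable vector and halts the whole link; here only bit 3 of the vector is set, which is the selective behavior the defects in this notice prevent the adapter from honoring.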
This problem has the potential to manifest under several different conditions:
In Cisco bug ID CSCuh61202 the switch/IOM sends Pause Registers (Priority Groups (PGS) and PFC updates) to the adapter, specifically a PGS update followed by a PFC update. Because of an issue in the VIC firmware, when the PFC update happens the pause configuration does not get programmed correctly. This results in the adapter not honoring the pause for the FCoE CoS. This leads to dropped FCoE frames on the IOM.
Cisco bug ID CSCus64439 is triggered by a sequence of events on the VIC adapter in which PFC might not be correctly reconfigured after it receives a PFC Inoperative frame followed by PFC Operational frame. This results in the adapter not honoring the pause for the FCoE CoS. This leads to dropped FCoE frames on the IOM.
Because this behavior is a race condition, it can occur on some servers but not on others. It is typically observed across an entire chassis. However, when there are multiple chassis in the system, a chassis with the correct pause configuration can still be affected because of the amount of traffic that is flooded onto the network.
Steps to Identify Cisco bug ID CSCuh61202
Search the adapter syslog for "feature operational". Here is a non-affected example from the adapter syslog:
140416-07:30:41.096482 mcp.uif_dcbx Port 1-0: FCOE feature operational
140416-07:30:41.096588 mcp.uif_dcbx Port 1-0: MTU feature operational
140416-07:30:41.096774 mcp.uif_dcbx Port 1-0: NIV feature operational
140416-07:30:41.096910 mcp.uif_dcbx Port 1-0: PFC feature operational <--
140416-07:30:41.097054 mcp.uif_dcbx Port 1-0: PGS feature operational <--
Here is an affected example from the adapter syslog:
140416-04:46:00.769934 mcp.uif_dcbx Port 1-0: NIV feature operational
140416-04:46:06.645865 mcp.uif_dcbx Port 1-0: PGS feature operational <--
140416-04:46:06.848516 mcp.uif_dcbx Port 1-0: MTU feature operational
140416-04:46:08.782584 mcp.uif_dcbx Port 1-0: FCOE feature operational
140416-04:46:10.788342 mcp.uif_dcbx Port 1-0: PFC feature operational <--
"PFC feature operational".
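The ordering difference between the two examples can be checked mechanically once the adapter syslog has been saved to a file. The following Python sketch flags every port where "PFC feature operational" is logged shortly after "PGS feature operational", which is the order seen in the affected example. It is a heuristic derived only from the two samples in this notice, not an official detection tool; the function name `find_pgs_then_pfc` and the 30-second window are assumptions.

```python
import re
from datetime import datetime, timedelta

LINE_RE = re.compile(
    r"^(?P<ts>\d{6}-\d{2}:\d{2}:\d{2}\.\d+)\s+mcp\.uif_dcbx\s+"
    r"Port (?P<port>[^:\s]+): (?P<feature>\w+) feature operational"
)


def find_pgs_then_pfc(lines, window_seconds=30):
    """Return (port, pgs_time, pfc_time) for each PGS -> PFC sequence found.

    A hit is recorded when a PFC message follows the most recent PGS
    message on the same port within window_seconds (assumed value).
    """
    window = timedelta(seconds=window_seconds)
    last_pgs = {}  # port -> time of the most recent unmatched PGS message
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match["ts"], "%y%m%d-%H:%M:%S.%f")
        port, feature = match["port"], match["feature"]
        if feature == "PGS":
            last_pgs[port] = stamp
        elif feature == "PFC":
            pgs_time = last_pgs.pop(port, None)
            if pgs_time is not None and stamp - pgs_time <= window:
                hits.append((port, pgs_time, stamp))
    return hits
```

The time window keeps a PFC message from one DCBX negotiation from being paired with a PGS message left over from a much earlier one. A hit only indicates that the triggering sequence occurred; the abort messages and IOM counters described in the notes remain the confirmation.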
Notes:
Example abort message from adapter log:
140416-08:13:12.809365 ecom.ecom_main ecom(8:1): abort called for exch 4d56, status 3 rx_id b38 s_stat 0x1 xmit_recvd 0x200 burst_offset 0x200 sgl_err 0x0 last_param 0x0 last_seq_cnt 0x0 tot_bytes_exp 0x200 h_seq_cnt 0x0 exch_type 0x1 s_id 0x3d0200 d_id 0x3d03a0 host_tag 0x78
From the IOM show-tech-support-iom-nxos.out, search for "frames rcvd without credit for pausable class":
: |3 |000001ad | wo_cr[3] | frames rcvd without credit for pausable classes. Pause is missing. <-- non-zero incrementing value
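Because the indicator is a counter that keeps incrementing, it helps to compare two captures taken some time apart. The Python sketch below extracts the `wo_cr[n]` rows from two excerpts of the IOM output and reports which classes grew. It assumes the counter column is hexadecimal (the sample value `000001ad` suggests this) and that each excerpt covers a single port; the function names are hypothetical.

```python
import re

# Matches "| <value> | wo_cr[<class>] |" as it appears in the sample row.
WO_CR_RE = re.compile(r"\|\s*([0-9a-fA-F]+)\s*\|\s*wo_cr\[(\d+)\]\s*\|")


def parse_wo_cr(text):
    """Return {class_index: counter_value} for every wo_cr[n] row in text."""
    counters = {}
    for value, index in WO_CR_RE.findall(text):
        counters[int(index)] = int(value, 16)  # assumption: hex counter
    return counters


def incrementing_classes(earlier_text, later_text):
    """Return the class indexes whose counter grew between two captures."""
    before = parse_wo_cr(earlier_text)
    after = parse_wo_cr(later_text)
    return sorted(i for i, v in after.items() if v > before.get(i, 0))
```

If a capture contains rows for several ports, later rows overwrite earlier ones in this sketch, so split the text per port before calling it.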
Steps to Identify Cisco bug ID CSCus64439
150209-16:12:35.817073 mcp.uif_dcbx Port 0-0: PFC feature operational
150209-17:31:11.454181 mcp.uif_dcbx Port 0-0: PFC feature inoperative <--
150209-17:31:20.512303 mcp.uif_dcbx Port 0-0: PGS feature operational
150209-17:31:24.482730 mcp.uif_dcbx Port 0-0: PFC feature operational
"PFC feature operational".
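The trigger for this defect is a "PFC feature inoperative" message followed by "PFC feature operational" on the same port, as in the sample above. A minimal Python sketch for spotting that sequence in a saved adapter syslog could look like the following. The function name `find_pfc_flaps` is hypothetical, and a match shows only that the triggering sequence occurred, not that the adapter is actually dropping frames.

```python
import re

PFC_RE = re.compile(
    r"^(?P<ts>\S+)\s+mcp\.uif_dcbx\s+Port (?P<port>[^:\s]+): "
    r"PFC feature (?P<state>operational|inoperative)"
)


def find_pfc_flaps(lines):
    """Return (port, inoperative_ts, operational_ts) for each PFC flap."""
    pending = {}  # port -> timestamp of an unmatched "inoperative" message
    flaps = []
    for line in lines:
        match = PFC_RE.match(line.strip())
        if not match:
            continue
        port = match["port"]
        if match["state"] == "inoperative":
            pending[port] = match["ts"]
        elif port in pending:
            flaps.append((port, pending.pop(port), match["ts"]))
    return flaps
```

Timestamps are kept as the raw strings from the log so that they can be searched for directly when looking for abort messages logged after the flap.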
Notes:
Example abort message from adapter log:
140416-08:13:12.809365 ecom.ecom_main ecom(8:1): abort called for exch 4d56, status 3 rx_id b38 s_stat 0x1 xmit_recvd 0x200 burst_offset 0x200 sgl_err 0x0 last_param 0x0 last_seq_cnt 0x0 tot_bytes_exp 0x200 h_seq_cnt 0x0 exch_type 0x1 s_id 0x3d0200 d_id 0x3d03a0 host_tag 0x78
From the IOM show-tech-support-iom-nxos.out, search for "frames rcvd without credit for pausable class":
: |3 |000001ad | wo_cr[3] | frames rcvd without credit for pausable classes. Pause is missing. <-- non-zero incrementing value
Adapter Log Instructions
Gather the Log Information from the CLI
If you want to look for the adapter logs that are associated with chassis 2, server 3, adapter 1, then here is the syntax from the CLI of a live system:
ucs-B# connect adapter 2/3/1
adapter 2/3/1 # connect
No entry for terminal type "dumb"; using dumb terminal settings.
adapter 2/3/1 (top):1# show-log
Note: It is not possible to grep or | include in this shell.
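Since the adapter shell cannot filter its own output, one option is to capture the `show-log` output with the terminal client's session logging and filter the saved file offline. The Python sketch below shows that idea; the function names and the default search strings (taken from the messages quoted in this notice) are assumptions, and the file path is whatever the terminal client wrote.

```python
DEFAULT_PATTERNS = (
    "feature operational",
    "feature inoperative",
    "abort called",
)


def filter_log(lines, patterns=DEFAULT_PATTERNS):
    """Return the lines that contain any of the given substrings."""
    return [ln.rstrip("\n") for ln in lines if any(p in ln for p in patterns)]


def filter_log_file(path, patterns=DEFAULT_PATTERNS):
    """Apply filter_log to a saved copy of the show-log output."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return filter_log(handle, patterns)
```

Plain substring matching is used rather than regular expressions so that the search strings can be pasted from the log without escaping.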
Gather the Log Information from the UCSM
Collect a Chassis tech-support for the suspected chassis.
Untar the Cisco Integrated Management Controller (CIMC) tech-support file and navigate to the MEZZ tech-support folder of the affected adapter. In this example, the adapter logs for server 3 are shown. Untar the MEZZ31_TechSupport.tar.gz file.
Expand the MEZZ tech-support folder and untar the obfl.tar.gz file.
Navigate to the obfl folder and open the syslog/syslog.1 files with a text editor.
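The untar steps above can also be scripted. The following Python sketch opens a MEZZ tech-support archive, finds the nested obfl.tar.gz, and returns the contents of the syslog files without writing anything to disk. It assumes both archives are gzip-compressed tar files and that the syslog file names start with "syslog"; the exact directory layout inside the archives is not stated in this notice, so the code searches by file name only. The function name is hypothetical.

```python
import io
import tarfile


def read_nested_syslogs(mezz_tar_path):
    """Return {member_name: text} for syslog files in the nested obfl.tar.gz."""
    logs = {}
    with tarfile.open(mezz_tar_path, "r:gz") as outer:
        for member in outer.getmembers():
            if not (member.isfile() and member.name.endswith("obfl.tar.gz")):
                continue
            # Read the inner archive into memory instead of extracting it.
            inner_bytes = outer.extractfile(member).read()
            with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r:gz") as inner:
                for item in inner.getmembers():
                    base = item.name.rsplit("/", 1)[-1]
                    if item.isfile() and base.startswith("syslog"):
                        data = inner.extractfile(item).read()
                        logs[item.name] = data.decode("utf-8", errors="replace")
    return logs
```

Reading members in memory avoids extracting untrusted paths onto the filesystem and leaves the original tech-support bundle unchanged.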
Depending on the situation, customers can choose a temporary/immediate workaround or the preferred workaround.
Immediate Workaround: Reboot the blade that floods traffic to the IOM. This results in the pause configuration being reprogrammed.
Preferred Workaround: This method avoids the issue long-term. If an upgrade is to be performed, plan it so that the adapter firmware (B/C-bundle) is upgraded before the infrastructure components (A-bundle). It is recommended to upgrade all components in the system, from the blade to the FI, in the same maintenance window so that the system does not remain on mismatched versions.
If you plan to upgrade from any 2.2x to 3.1x releases, upgrade B/C-bundle to 2.2(6g) first, then upgrade to the 2.2(6g) A-bundle. Afterwards you can upgrade to 3.1x in the normal order.
If you plan to upgrade from any 2.1x to 2.2x/3.1x releases, upgrade B/C-bundle to 2.1(3k) first, then upgrade to the 2.1(3k) A-bundle. Afterwards you can upgrade to 2.2x/3.1x in the normal order.
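The two upgrade paths above follow the same pattern: adapters to an interim release first, then infrastructure to that interim release, then the target release. A small Python sketch of that decision is shown below as a planning aid. The function name `upgrade_steps` is hypothetical, release strings are matched by prefix only, and the sketch does not cover systems that already run the interim release or later, which this notice does not address.

```python
def upgrade_steps(current_release, target_release):
    """Return the ordered upgrade steps for the release families in this notice."""
    if current_release.startswith("2.1"):
        interim = "2.1(3k)"
    elif current_release.startswith("2.2"):
        interim = "2.2(6g)"
    else:
        raise ValueError("this notice only covers upgrades from 2.1x and 2.2x")
    return [
        f"Upgrade B/C-bundle (adapter firmware) to {interim}",
        f"Upgrade A-bundle (infrastructure) to {interim}",
        f"Upgrade to {target_release} in the normal order",
    ]


# Assumed example releases, for illustration only.
for step in upgrade_steps("2.2(3a)", "3.1(1e)"):
    print(step)
```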
Releases 3.1(1e) and 2.2(6g), and their subsequent releases, contain fixes for all defects mentioned in this notice. They can be found at Cisco Software Download Release 2.2 and later.
Refer to Upgrading Cisco UCS from Release 2.1 to Release 2.2.
How to Avoid Further Problems
In order to avoid this issue, upgrade the Cisco VIC 1225/1227/1240/1280/1340/1380 adapter firmware before you update the UCS infrastructure components (UCSM/IOM/FI). This order departs from the one in Cisco's UCS upgrade guide, but it is recommended in order to avoid this issue.
If you require further assistance, or if you have any further questions regarding this field notice, please contact the Cisco Systems Technical Assistance Center (TAC) by one of the following methods: