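The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using inclusive language.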
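Cisco has translated this document using a combination of machine and human technology so that users worldwide can read support content in their own language. Note that even the best machine translation is not as accurate as one done by a professional translator. Cisco Systems, Inc. accepts no responsibility for the accuracy of these translations and recommends always referring to the original English document (link provided).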
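This document describes how to troubleshoot Layer 1 link flap issues on Nexus 9000 switches.
Cisco recommends that you be familiar with Cisco Nexus Operating System (NX-OS) and basic Nexus architecture before you proceed with the information in this document.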
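The information in this document is based on these software and hardware versions:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.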
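A link flap is a network problem in which a physical interface on a switch, such as a Nexus 9000, repeatedly alternates between the up and down states. This disruptive behavior degrades network performance, destabilizes the network, and interrupts communication. Link flaps are usually caused by physical layer faults or by protocol synchronization problems.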
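A protocol-triggered link flap occurs when protocol synchronization fails, for example with Link Aggregation Control Protocol (LACP) or virtual port channel (vPC). The underlying cause can be a protocol misconfiguration or packet loss, either of which leaves the link unstable. Regular monitoring and timely software updates help prevent this type of flap.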
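A link flap can also originate at Layer 1, the physical layer of the network, which involves physical components such as cables and interfaces. Damaged, loose, or aged cables and faulty interfaces can all cause link flaps. Regular physical inspection and maintenance, including cable checks and interface tests, help identify and correct these problems before they cause the link to flap.
This document focuses on troubleshooting Layer 1 physical issues.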
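Link flaps can easily be identified from the logs. This example shows a link flap event on port E1/5, in which the port goes down and then comes back up shortly afterward.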
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel100: first operational port changed from Ethernet1/5 to none
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel100: Ethernet1/5 is down
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel100,bandwidth changed to 100000 Kbit
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/5 is down (Link failure)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-SPEED: Interface Ethernet1/5, operational speed changed to 10 Gbps
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_DUPLEX: Interface Ethernet1/5, operational duplex mode changed to Full
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface Ethernet1/5, operational Receive Flow Control state changed to off
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface Ethernet1/5, operational Transmit Flow Control state changed to off
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-SPEED: Interface port-channel100, operational speed changed to 10 Gbps
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_DUPLEX: Interface port-channel100, operational duplex mode changed to Full
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_RX_FLOW_CONTROL: Interface port-channel100, operational Receive Flow Control state changed to off
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-IF_TX_FLOW_CONTROL: Interface port-channel100, operational Transmit Flow Control state changed to off
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_UP: port-channel100: Ethernet1/5 is up
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel100: first operational port changed from none to Ethernet1/5
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel100,bandwidth changed to 10000000 Kbit
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETHPORT-5-IF_UP: Interface Ethernet1/5 is up in mode access
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETHPORT-5-IF_UP: Interface port-channel100 is up in mode access
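To pair the down and up messages programmatically, the sketch below scans copied syslog lines like the ones above and reports how long an interface stayed down. It is a minimal Python sketch, assuming the `%ETHPORT-5-IF_DOWN_*` and `%ETHPORT-5-IF_UP` message formats shown in this example; the function name `flap_durations` is hypothetical and not part of NX-OS.

```python
import re
from datetime import datetime

# Assumption: state changes are marked only by %ETHPORT-5-IF_DOWN_* and
# %ETHPORT-5-IF_UP messages, in the format shown in the example log.
LOG_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} \d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"%ETHPORT-5-(?P<tag>IF_DOWN_\w+|IF_UP): "
    r"Interface (?P<intf>\S+) is (?P<state>down|up)"
)


def flap_durations(lines, interface):
    """Return (down_time, up_time, seconds_down) for each down/up cycle."""
    cycles = []
    down_at = None
    for line in lines:
        match = LOG_RE.match(line.strip())
        if not match or match["intf"] != interface:
            continue
        ts = datetime.strptime(match["ts"], "%Y %b %d %H:%M:%S")
        if match["state"] == "down":
            # Several down messages can belong to one flap; keep the first one.
            if down_at is None:
                down_at = ts
        elif down_at is not None:
            cycles.append((down_at, ts, (ts - down_at).total_seconds()))
            down_at = None
    return cycles
```

With the E1/5 lines above, this reports one cycle of 27 seconds (05:27:35 to 05:28:02). Log lines that only report speed, duplex, flow control, or bandwidth are ignored because they do not match the pattern.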
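The Ethernet Port Manager (ethpm) is the process that manages Ethernet interfaces. The ethpm event history can be used to identify the cause of a link flap.
E1/5 went down with a link failure at 05:27:35, and the ethpm state transition was triggered by ETH_PORT_FSM_EV_LINK_DOWN. This indicates a Layer 1 flap.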
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel100,bandwidth changed to 100000 Kbit
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/5 is down (Link failure)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
N9K-C93180YC-FX# show system internal ethpm event-history interface e1/5
[143] 2024-01-21T05:26:02.100255000+00:00 [-] FSM:<Ethernet1/5> Transition:
Previous state: [ETH_PORT_FSM_ST_WAIT_BUNDLE_MEMBER_BRINGUP]
Triggered event: [ETH_PORT_FSM_EV_FIRST_BRINGUP_BUNDLE_MEMBER_DONE]
Next state: [ETH_PORT_FSM_ST_BUNDLE_MEMBER_UP]
[144] 2024-01-21T05:27:35.783495000+00:00 [-] FSM:<Ethernet1/5> Transition:
Previous state: [ETH_PORT_FSM_ST_BUNDLE_MEMBER_UP]
Triggered event: [ETH_PORT_FSM_EV_LINK_DOWN]
Next state: [FSM_ST_NO_CHANGE]
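When the event history is long, it can help to turn each FSM transition into a record and filter on the triggering event. This is a minimal Python sketch that assumes the four-line record layout shown above (header, Previous state, Triggered event, Next state); the `Transition` type and the `parse_transitions` function are hypothetical helpers, not an NX-OS API.

```python
import re
from dataclasses import dataclass

# Header, e.g. "[144] 2024-01-21T05:27:35.783495000+00:00 [-] FSM:<Ethernet1/5> Transition:"
HEADER_RE = re.compile(r"^\[(\d+)\]\s+(\S+)\s+\[-\]\s+FSM:<([^>]+)>\s+Transition:")
# Body line, e.g. "Triggered event: [ETH_PORT_FSM_EV_LINK_DOWN]"
FIELD_RE = re.compile(r"^(Previous state|Triggered event|Next state):\s+\[(\w+)\]")
FIELD_KEYS = {
    "Previous state": "previous_state",
    "Triggered event": "event",
    "Next state": "next_state",
}


@dataclass
class Transition:
    index: int
    timestamp: str
    interface: str
    previous_state: str
    event: str
    next_state: str


def parse_transitions(text):
    """Parse ethpm event-history output into Transition records."""
    transitions, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = {"index": int(header[1]), "timestamp": header[2],
                       "interface": header[3]}
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            current[FIELD_KEYS[field[1]]] = field[2]
            if field[1] == "Next state":  # the Next state line closes a record
                if len(current) == 6:    # skip incomplete records
                    transitions.append(Transition(**current))
                current = None
    return transitions
```

For the E1/5 output above, filtering with `t.event == "ETH_PORT_FSM_EV_LINK_DOWN"` leaves only record [144].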
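E1/8 went to the down (Initializing) state at 07:40:07, and the ethpm state transition was triggered by ETH_PORT_FSM_EV_EXTERNAL_REINIT_NO_FLAP_REQ. This indicates that the flap was triggered by Link Aggregation Control Protocol (LACP), not by a Layer 1 fault.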
2024 Jan 21 07:37:20 N9K-C93180YC-FX %ETHPORT-5-IF_UP: Interface port-channel200 is up in Layer3
2024 Jan 21 07:40:07 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel200 is down (No operational members)
2024 Jan 21 07:40:07 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel200: first operational port changed from Ethernet1/8 to none
2024 Jan 21 07:40:07 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel200: Ethernet1/8 is down
2024 Jan 21 07:40:07 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel200,bandwidth changed to 100000 Kbit
2024 Jan 21 07:40:07 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_INITIALIZING: Interface Ethernet1/8 is down (Initializing)
N9K-C93180YC-FX# show system internal ethpm event-history interface e1/8
[218] 2024-01-21T07:37:20.551880000+00:00 [-] FSM:<Ethernet1/8> Transition:
Previous state: [ETH_PORT_FSM_ST_WAIT_BUNDLE_MEMBER_BRINGUP]
Triggered event: [ETH_PORT_FSM_EV_FIRST_BRINGUP_BUNDLE_MEMBER_DONE]
Next state: [ETH_PORT_FSM_ST_BUNDLE_MEMBER_UP]
[219] 2024-01-21T07:40:07.104339000+00:00 [-] FSM:<Ethernet1/8> Transition:
Previous state: [ETH_PORT_FSM_ST_BUNDLE_MEMBER_UP]
Triggered event: [ETH_PORT_FSM_EV_EXTERNAL_REINIT_NO_FLAP_REQ]
Next state: [FSM_ST_NO_CHANGE]
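The two examples differ only in the triggering event that takes the port out of the up state. The following minimal Python sketch applies that comparison to raw event-history text. The cause table contains only the two mappings shown in this document, and looking only at transitions out of a state whose name ends in `_UP` is an assumption based on these examples; `classify_flaps` is a hypothetical helper.

```python
import re

# Only these two mappings come from the examples in this document;
# any other trigger is reported as unknown rather than guessed.
TRIGGER_CAUSES = {
    "ETH_PORT_FSM_EV_LINK_DOWN": "Layer 1 flap",
    "ETH_PORT_FSM_EV_EXTERNAL_REINIT_NO_FLAP_REQ":
        "protocol-triggered (LACP in the E1/8 example)",
}

# Matches a "Previous state" line followed by its "Triggered event" line.
PAIR_RE = re.compile(
    r"Previous state:\s*\[(?P<prev>\w+)\]\s*Triggered event:\s*\[(?P<event>\w+)\]"
)


def classify_flaps(text):
    """Return (event, cause) for each transition that leaves an up state."""
    results = []
    for match in PAIR_RE.finditer(text):
        # Assumption: up states end in "_UP", e.g. ETH_PORT_FSM_ST_BUNDLE_MEMBER_UP.
        if not match["prev"].endswith("_UP"):
            continue
        event = match["event"]
        results.append((event, TRIGGER_CAUSES.get(event, "unknown, check other logs")))
    return results
```

Run against the E1/8 output above, this returns the EXTERNAL_REINIT_NO_FLAP_REQ event as protocol-triggered; run against the E1/5 output, it returns LINK_DOWN as a Layer 1 flap.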
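Cisco offers a wide array of optical modules that support various speeds, media, and distances. Before you connect a link to a Nexus 9000, make sure that the SFP and the cable are compatible with the current software and hardware. You can verify this with: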
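Starting with NX-OS 10.2(1), Platform Insights Engine (PIE) is supported on all Cloud Scale ToR and EoR platforms. PIE is a real-time root cause analysis application that runs on the switch.
Three PIEs can help you troubleshoot Layer 1 link flap issues.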
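The link flap PIE analyzes link flap events published by the User Space Driver (USD) and determines the root cause of each flap. When a link flaps, the USD (a PIE client) collects all the relevant data needed for root cause analysis from the ASIC and from the USD itself, and publishes that data to the broker as a link flap event. The link flap PIE analyzes the data, arrives at the most likely root cause of the flap, and publishes its root cause analysis insights to the broker.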
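The link down PIE finds the root cause when a link does not come up. When an interface is configured up but its operational state is not up, the USD collects data about the interface and publishes it to the PIE application. The link down PIE subscribes to these events, receives the data from the broker, and analyzes it to find the root cause.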
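The publish/subscribe flow described for the link flap and link down PIEs can be pictured with a toy in-process broker. This is a minimal, hypothetical Python sketch: the `Broker` class, the topic names, the field names, and the analysis rule are illustrative assumptions and do not reflect how the USD, the broker, or PIE are implemented on the switch.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Toy broker: delivers each published message to the topic's subscribers."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, data: dict) -> None:
        for callback in self._subscribers[topic]:
            callback(data)


def run_example() -> list:
    broker = Broker()
    insights = []

    def link_down_pie(data: dict) -> None:
        # Hypothetical analysis step; the real PIE root cause logic is not shown here.
        if data["admin_state"] == "up" and data["oper_state"] != "up":
            broker.publish("insight", {"interface": data["interface"],
                                       "finding": "configured up but not operationally up"})

    # The PIE subscribes to link down events; a consumer subscribes to insights.
    broker.subscribe("link_down", link_down_pie)
    broker.subscribe("insight", insights.append)

    # USD side: publish data for an interface that is configured up but is down.
    broker.publish("link_down", {"interface": "Ethernet1/5",
                                 "admin_state": "up", "oper_state": "down"})
    return insights
```

The point of the design is decoupling: the data collector never calls the analyzer directly, so new analyzers can subscribe to the same events without changing the collector.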
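The optics PIE is a continuous monitoring engine that performs time series analysis on periodically collected Digital Optical Monitoring (DOM) data. By tracking various DOM parameters over time, the PIE derives a metric that describes the optical health of each optical port. This metric gives insight into the health trend of the optical transceiver.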
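To illustrate what time series analysis on DOM data can look like, here is a minimal Python sketch that labels a series of Rx power samples. It is not the optics PIE algorithm, which this document does not describe: the least-squares slope, the -0.1 dB per sample degradation cutoff, and the GOOD/DEGRADING/BAD labels are all hypothetical choices. The default low warning threshold of -9.91 dBm is taken from the `show interface e1/5 transceiver details` output in this document.

```python
def slope(samples):
    """Least-squares slope of samples taken at equal intervals (units per sample)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def rx_health(samples_dbm, low_warning_dbm=-9.91, degrade_per_sample=-0.1):
    """Hypothetical health label for a series of Rx power readings in dBm."""
    if samples_dbm[-1] < low_warning_dbm:
        return "BAD"        # latest reading is already below the low warning threshold
    if slope(samples_dbm) < degrade_per_sample:
        return "DEGRADING"  # still in range, but trending down
    return "GOOD"
```

A trend check like this can flag an optic that is still within its thresholds but losing power steadily, which a single snapshot of the DOM values would not reveal.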
For more information, see this PIE document:
Cisco Nexus 9000 Series NX-OS Platform Insights Engine Guide, Release 10.2(x)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel100: first operational port changed from Ethernet1/5 to none
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel100: Ethernet1/5 is down
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel100,bandwidth changed to 100000 Kbit
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/5 is down (Link failure)
2024 Jan 21 05:27:35 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:27:58 N9K-C93180YC-FX %ETHPORT-5-SPEED: Interface Ethernet1/5, operational speed changed to 10 Gbps
<snip>
2024 Jan 21 05:28:02 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_UP: port-channel100: Ethernet1/5 is up
N9K-C93180YC-FX# show pie interface ethernet 1/5 link-flap-rca
2024-01-21 05:27:35 Event Id: 00000068 Ethernet1/5 Source Id: 436209664 RCA Code: 41 >>>PIE event time
Reason: Link flapped/down due to Local Fault, check peer >>>PIE link flap reason
N9K-C93180YC-FX# show pie interface ethernet 1/5 transceiver-insights
2024-01-21 05:30:12 Event Id: 00000080 Event Class: xcvr DOM DB Event Interface: Ethernet1/5 Health Metric: --------GOOD------- Mod: 01
2024-01-21 05:28:12 Event Id: 00000072 Event Class: xcvr DOM DB Event Interface: Ethernet1/5 Health Metric: --------GOOD------- Mod: 01
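If you collect PIE root cause output from many interfaces, the RCA code and reason can be extracted with a small parser. This is a minimal Python sketch, assuming the two-line layout shown above (an event line followed by a `Reason:` line); it drops the `>>>` annotations that were added to this document's examples, and `parse_rca` is a hypothetical helper. The same layout appears in the `link-down-rca` output.

```python
import re

# Event line, e.g. "2024-01-21 05:27:35 Event Id: 00000068 Ethernet1/5 Source Id: 436209664 RCA Code: 41"
EVENT_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) Event Id: (?P<event_id>\d+) "
    r"(?P<interface>\S+) Source Id: \d+ RCA Code: (?P<code>\d+)"
)
# Reason line; any trailing ">>>" annotation is dropped.
REASON_RE = re.compile(r"^Reason:\s*(?P<reason>.+?)(?:\s*>>>.*)?$")


def parse_rca(text):
    """Return one dict per PIE RCA event with its code and reason."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        event = EVENT_RE.match(line)
        if event:
            events.append({"time": event["time"], "event_id": event["event_id"],
                           "interface": event["interface"],
                           "code": int(event["code"]), "reason": None})
            continue
        reason = REASON_RE.match(line)
        if reason and events:
            events[-1]["reason"] = reason["reason"]  # attach to the latest event
    return events
```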
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel100: first operational port changed from Ethernet1/5 to none
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETH_PORT_CHANNEL-5-PORT_DOWN: port-channel100: Ethernet1/5 is down
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETHPORT-5-IF_BANDWIDTH_CHANGE: Interface port-channel100,bandwidth changed to 100000 Kbit
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_LINK_FAILURE: Interface Ethernet1/5 is down (Link failure)
2024 Jan 21 05:48:38 N9K-C93180YC-FX %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel100 is down (No operational members)
N9K-C93180YC-FX# show pie interface ethernet 1/5 link-down-rca
2024-01-21 05:48:48 Event Id: 00000197 Ethernet1/5 Source Id: 436209664 RCA Code: 16 >>>PIE event time
Reason: No PCS alignment detected. Please check Fec, speed, Autoneg configurations with peer >>>Physical layer failed
N9K-C93180YC-FX# show pie interface ethernet 1/5 transceiver-insights
2024-01-21 05:50:12 Event Id: 00000199 Event Class: xcvr DOM DB Event Interface: Ethernet1/5 Health Metric: ********BAD******** Mod: 01
2024-01-21 05:48:12 Event Id: 00000187 Event Class: xcvr DOM DB Event Interface: Ethernet1/5 Health Metric: --------GOOD------- Mod: 01
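The transceiver-insights output is listed newest first, so a change from GOOD to BAD, such as the one at 05:50:12, is easy to miss in a long list. This minimal Python sketch, assuming the line format shown above, sorts the events by time and reports each change in the health metric; `health_changes` is a hypothetical helper.

```python
import re

INSIGHT_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) Event Id: \d+ .*?"
    r"Interface: (?P<interface>\S+) Health Metric: [-*]*(?P<health>[A-Z]+)[-*]*"
)


def health_changes(text):
    """Return (time, interface, old, new) for every change in the health metric."""
    readings = []
    for raw in text.splitlines():
        match = INSIGHT_RE.match(raw.strip())
        if match:
            readings.append((match["time"], match["interface"], match["health"]))
    readings.sort()  # timestamps are fixed-width, so string order is time order
    changes, last = [], {}
    for time, interface, health in readings:
        previous = last.get(interface)
        if previous is not None and previous != health:
            changes.append((time, interface, previous, health))
        last[interface] = health
    return changes
```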
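Based on the PIE output, replace the potentially faulty component and continue to monitor the link. If the link flap persists, perform swap testing to narrow down the faulty part: change one component at a time and keep everything else the same. The link stabilizes once the faulty component has been swapped out.
PIE is not supported on NX-OS releases earlier than 10.2(1). On those releases, checking for a Layer 1 link flap requires several manual steps.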
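The swap-test procedure can be written down as a loop that changes exactly one component per trial and restores everything else. This is a minimal, hypothetical Python sketch: the component names, the spare parts, and the `link_is_stable` callback (in practice, replacing the part and watching the link for a while) are illustrative assumptions.

```python
from typing import Callable, Dict, Optional


def swap_test(
    components: Dict[str, str],
    spares: Dict[str, str],
    link_is_stable: Callable[[Dict[str, str]], bool],
) -> Optional[str]:
    """Return the first component whose replacement stabilizes the link."""
    for role in components:
        trial = dict(components)    # start from the original setup each time
        trial[role] = spares[role]  # change exactly one component
        if link_is_stable(trial):
            return role
    return None  # no single swap helped; the fault may involve several parts
```

For example, with `{"local_sfp": "SFP-A", "cable": "CABLE-1", "peer_sfp": "SFP-B"}` and a link that flaps whenever `CABLE-1` is installed, the function returns `"cable"`. Restoring the original setup before each trial is what keeps the result attributable to a single component.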
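This command lists all link events on the attached module. The debounce time is the duration that the interface waits before it notifies the supervisor that the link is down. During this period, the interface waits to see whether the link recovers. This is used to determine whether the link has actually gone down or has only flapped briefly.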
N9K-C93180YC-FX# attach module 1
module-1# show system internal port-client link-event
*************** Port Client Link Events Log ***************
---- ------ ----- ----- ------
Time PortNo Speed Event Stsinfo
---- ------ ----- ----- ------
Jan 21 05:48:38 2024 00122142 Ethernet1/5 ---- DOWN Link down debounce timer stopped and link is down
Jan 21 05:48:37 2024 00993003 Ethernet1/5 ---- DOWN Link down debounce timer started(0x40e50006)
Jan 21 05:45:14 2024 00432606 Ethernet1/5 10G UP SUCCESS(0x0)
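The gap between the `debounce timer started` and `debounce timer stopped` entries shows how long the port waited before reporting the link as down. The minimal Python sketch below computes that gap from lines like the ones above. It assumes that the eight-digit number after the year is a microsecond offset within that second; that interpretation and the `debounce_intervals` helper are assumptions, not documented NX-OS behavior.

```python
import re
from datetime import datetime, timedelta

# e.g. "Jan 21 05:48:37 2024 00993003 Ethernet1/5 ---- DOWN Link down debounce timer started(...)"
EVENT_RE = re.compile(
    r"^(?P<ts>\w{3} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}) (?P<usec>\d+) "
    r"(?P<port>\S+) (?P<speed>\S+) (?P<event>UP|DOWN) (?P<info>.*)$"
)


def debounce_intervals(lines):
    """Return (port, timedelta) between each debounce timer start and stop."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            when = datetime.strptime(match["ts"], "%b %d %H:%M:%S %Y")
            # Assumption: the second field is microseconds within that second.
            when += timedelta(microseconds=int(match["usec"]))
            events.append((when, match["port"], match["info"]))
    events.sort()  # the CLI lists the newest event first
    started, intervals = {}, []
    for when, port, info in events:
        if "debounce timer started" in info:
            started[port] = when
        elif "debounce timer stopped" in info and port in started:
            intervals.append((port, when - started.pop(port)))
    return intervals
```

Under that assumption, the output above gives an interval of about 129 ms (05:48:37.993003 to 05:48:38.122142).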
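This command provides detailed information about each link event, including the ASIC MAC/PCS/SerDes register values captured when the port went down.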
N9K-C93180YC-FX# attach module 1
module-1# show hardware internal tah link-events fp-port 5
324) Jan 21 05:48:37 2024 uSec 992843: Fp 5 : tahusd_isr.c #8469
Port Down with an ASIC interrupt
------------- ASIC MAC/PCS/Serdes REGS (Mac Channel 0) -------------
Link flapped due to Local Fault, check peer >>>Local Fault means the local device detected the issue on the receive path.
>>>Remote Fault means a Local Fault is detected across the link.
Intr Regs 00:0x0000, 01:0x0000, 02:0x0000, 03:0x0010, 07:0x0000, 11:0x0000, 15:0x0000
sts2.bercount : 0x0f00 sts2.erroredblocks : 0x0000
bercounthi : 0x0000 erroredblockhi : 0x0000
counters0.syncloss : 0x0001 counters0.blocklockloss: 0x0001
counters1.highber : 0x0000 counters1.vlderr : 0x0000
counters2.unkerr : 0x0012 counters2.invlderr : 0x0000
Error code | Description |
---|---|
sts2.erroredblocks | Count of errored blocks (low-order bits). |
sts2.bercount | Count of errored sync headers (low-order bits). |
bercounthi | Count of errored sync headers (high-order bits). |
erroredblockhi | Count of errored blocks (high-order bits). |
counters0.syncloss | Sync loss |
counters0.blocklockloss | Block lock loss |
counters1.highber | High BER |
counters1.vlderr | Valid error |
counters2.unkerr | Unknown error |
counters2.invlderr | Invalid error |
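When you compare several of these register dumps, it helps to list only the counters that are not zero, together with their meaning. This is a minimal Python sketch, assuming the `name : 0xVALUE` layout shown in the output above; the interrupt register line (`Intr Regs ...`) is deliberately not decoded, the descriptions are based on the error code table, and `nonzero_counters` is a hypothetical helper.

```python
import re

# Descriptions based on the error code table for this output.
DESCRIPTIONS = {
    "sts2.erroredblocks": "Errored blocks (low-order bits)",
    "sts2.bercount": "Errored sync headers (low-order bits)",
    "bercounthi": "Errored sync headers (high-order bits)",
    "erroredblockhi": "Errored blocks (high-order bits)",
    "counters0.syncloss": "Sync loss",
    "counters0.blocklockloss": "Block lock loss",
    "counters1.highber": "High BER",
    "counters1.vlderr": "Valid error",
    "counters2.unkerr": "Unknown error",
    "counters2.invlderr": "Invalid error",
}

# A counter name that starts with a letter, a colon, and a hex value.
COUNTER_RE = re.compile(r"([A-Za-z][\w.]*)\s*:\s*0x([0-9a-fA-F]+)")


def nonzero_counters(dump):
    """Return {name: (value, description)} for every known non-zero counter."""
    found = {}
    for name, hex_value in COUNTER_RE.findall(dump):
        value = int(hex_value, 16)
        if value and name in DESCRIPTIONS:
            found[name] = (value, DESCRIPTIONS[name])
    return found
```

For the E1/5 dump above, this reports sts2.bercount (0x0f00), counters0.syncloss and counters0.blocklockloss (1 each), and counters2.unkerr (0x12).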
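The output of this command contains several pieces of small form-factor pluggable (SFP) information. If any value falls outside the acceptable range of the SFP diagnostics, consider the SFP a potentially faulty component and replace it. In this example, all values are within range.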
N9K-C93180YC-FX# show interface e1/5 transceiver details
Ethernet1/5
transceiver is present
type is 10Gbase-SR >>>SFP type
name is CISCO-OPLINK >>>SFP vendor
part number is TPP4XGDS0CCISE2G
revision is 02
serial number is OPMXXXXXXXX >>>SFP SN
nominal bitrate is 10300 MBit/sec >>>SFP bitrate
Link length supported for 50/125um OM2 fiber is 82 m
Link length supported for 62.5/125um fiber is 26 m
Link length supported for 50/125um OM3 fiber is 300 m
cisco id is 3
cisco extended id number is 4
cisco part number is 10-2415-03
cisco product id is SFP-10G-SR >>>SFP PID
cisco version id is V03
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
Current Alarms Warnings
Measurement High Low High Low
----------------------------------------------------------------------------
Temperature 36.52 C 75.00 C -5.00 C 70.00 C 0.00 C
Voltage 3.28 V 3.63 V 2.97 V 3.46 V 3.13 V
Current 6.61 mA 12.00 mA 0.50 mA 11.50 mA 1.00 mA
Tx Power -2.70 dBm 1.99 dBm -11.30 dBm -1.00 dBm -7.30 dBm
Rx Power -2.40 dBm 1.99 dBm -13.97 dBm -1.00 dBm -9.91 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
peer side information is snipped.
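The alarm and warning columns make this check easy to automate. This minimal Python sketch, assuming the column order shown above (current value, high alarm, low alarm, high warning, low warning), classifies each measurement as ok, warning, or alarm. Treating a value exactly on a threshold as in range is an assumption, and `check_dom` is a hypothetical helper.

```python
import re

FIELDS = ("Temperature", "Voltage", "Current", "Tx Power", "Rx Power")
NUM_RE = re.compile(r"-?\d+\.\d+")


def check_dom(text):
    """Classify each DOM measurement against its alarm and warning thresholds."""
    results = {}
    for raw in text.splitlines():
        line = raw.strip()
        for field in FIELDS:
            if line.startswith(field):
                nums = [float(n) for n in NUM_RE.findall(line[len(field):])]
                if len(nums) < 5:
                    break  # e.g. the "Current Alarms Warnings" header line
                value, hi_alarm, lo_alarm, hi_warn, lo_warn = nums[:5]
                if value > hi_alarm or value < lo_alarm:
                    results[field] = "alarm"
                elif value > hi_warn or value < lo_warn:
                    results[field] = "warning"
                else:
                    results[field] = "ok"
                break
    return results
```

For the E1/5 output above, every measurement is classified as ok, which matches the conclusion that this SFP is healthy.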
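If all of the previous checks look normal, perform swap testing to narrow down the faulty part: change one component at a time and keep everything else the same. The link stabilizes once the faulty component has been swapped out.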
Revision | Publish Date | Comments |
---|---|---|
1.0 | 31-Jan-2024 | Initial Release |