Introduction
This document describes troubleshooting the User Plane Function (UPF) upgrade after the Redundancy Configuration Manager (RCM) switchover.
Problem
Step 1: In the Active UPF (RCM-based), instances of missing chunks are observed:
[local]UPF# context n6
[n6]UPF# show ipv6 chunks
Failure: This CLI is only for User-plane
Note: Always check for DIMM/ECC/UEC/ADDDC errors on source and target UCS servers that are hosting UPFs and take RCM tac debug/
Step 2: On the Active UPF, in cases where chunks are missing, monitor SNMP trap events for the transition of UPF state from Standby to Active.
[n6]UPF# show snmp trap history verbose | grep RCM
Tuesday November 14 21:16:45 UTC 2023
Mon Oct 13 08:24:42 2023 Internal trap notification 1426 (RCMChassisState) RCM Chassis State: (0) Chassis State Init
Mon Oct 13 08:24:49 2023 Internal trap notification 1414 (RCMServiceStart) Context Name:rcm Service Name:rcm started
Mon Oct 13 08:25:04 2023 Internal trap notification 1425 (RCMTCPConnect) Context Name: rcm
Mon Oct 13 08:25:04 2023 Internal trap notification 1421 (RCMConfigPushCompleteSent) Context Name: rcm
Mon Oct 13 08:25:04 2023 Internal trap notification 1426 (RCMChassisState) RCM Chassis State: (2) Chassis State Standby
Mon Oct 13 08:33:47 2023 Internal trap notification 1420 (RCMConfigPushCompleteReceived) Context Name:
Mon Oct 13 08:33:47 2023 Internal trap notification 1421 (RCMConfigPushCompleteSent) Context Name: rcm
Mon Oct 13 08:48:10 2023 Internal trap notification 1421 (RCMConfigPushCompleteSent) Context Name: rcm
Mon Oct 13 08:48:10 2023 Internal trap notification 1420 (RCMConfigPushCompleteReceived) Context Name: up
Mon Oct 13 08:48:12 2023 Internal trap notification 1426 (RCMChassisState) RCM Chassis State: (1) Chassis State Active
Step 3: On the Active UPF, when encountering missing chunks, examine the syslog for log events indicating the cessation of the rest(5) sx-demux services in the corresponding Redundancy Group (RG-1), while the Standby UPF transitions to the active state.
Oct 13 08:48:11 UPF evlogd: [local-60sec11.091] [sessctrl 8066 info] [1/0/9050 <sessctrl:0> ctrl_mgrs_cfg.c:2201] [context: up, contextID: 2] [software internal system critical-info syslog] Session Controller: stopping SX-DEMUX service up1 2023-10-13T08:48:11.000+0000
Oct 13 08:48:11 UPF evlogd: [local-60sec11.483] [sessctrl 8066 info] [1/0/9050 <sessctrl:0> ctrl_mgrs_cfg.c:2201] [context: up, contextID: 2] [software internal system critical-info syslog] Session Controller: stopping SX-DEMUX service up2 2023-10-13T08:48:11.000+0000
Oct 13 08:48:11 UPF evlogd: [local-60sec11.582] [sessctrl 8066 info] [1/0/9050 <sessctrl:0> ctrl_mgrs_cfg.c:2201] [context: up, contextID: 2] [software internal system critical-info syslog] Session Controller: stopping SX-DEMUX service up3 2023-10-13T08:48:11.000+0000
Oct 13 08:48:11 UPF evlogd: [local-60sec11.726] [sessctrl 8066 info] [1/0/9050 <sessctrl:0> ctrl_mgrs_cfg.c:2201] [context: up, contextID: 2] [software internal system critical-info syslog] Session Controller: stopping SX-DEMUX service up5 2023-10-13T08:48:11.000+0000
Oct 13 08:48:18 UPF evlogd: [local-60sec18.749] [sessctrl 8066 info] [1/0/9050 <sessctrl:0> ctrl_mgrs_cfg.c:2201] [context: up, contextID: 2] [software internal system critical-info syslog] Session Controller: stopping SX-DEMUX service up6 2023-10-13T08:48:18.000+0000
Step 4: On the Active UPF with missing chunks, enable debug mode (cli test-commands password <password>) and execute the command to monitor Sx DeReg transactions that align with the UPF active period.
[n6]UPF# show ip pool vpn-sx-transactions
Context: n6
Sx transactions:
sent: 0, received: 0
Failed transactions: 0
**************************************************************************************
Sx Deregistration transactions:
**************************************************************************************
Peer Address Deregistration Time
================================ ====================================================
192.168.1.55 Mon Oct 13 08:48:18 2023
192.168.1.49 Mon Oct 13 08:48:18 2023
192.168.1.49 Mon Oct 13 08:48:18 2023
192.168.2.55 Mon Oct 13 08:48:18 2023
192.168.2.55 Mon Oct 13 08:48:18 2023
192.168.2.49 Mon Oct 13 08:48:18 2023
192.168.2.49 Mon Oct 13 08:48:18 2023
[n6]UPF#
Step 5: On the Active UPF with missing chunks, search the syslog for the logs occurring in the vicinity of the UPF transitioning to the Active state.
Oct 13 08:48:12 UPF evlogd: [local-60sec12.060] [vpn 5013 error] [1/0/9399 <vpnmgr:3> _cups_ip_pool.c:16149] [context: n6, contextID: 3] [software internal system syslog] #01Prefix fd12:3456:7890:abcd::/64 is not allocated to this UP: Closest chunk found with id -2146435055 prefix0: 638057330 start_prefix1: -1391067126 end_prefix1: -1391050752 2023-10-13T08:48:12.000+0000
Observe log events continuously coming from vpnmgr instance 3 task (vpnmgr:3).
localsystem:$ less UPF-Destination-UPF-Syslog.log | grep "Pool_name is not present" | head -1
Oct 13 08:48:18 UPF evlogd: [local-60sec18.811] [vpn 5013 error] [1/0/9399 <vpnmgr:3> vpn_ip_pool.c:27493] [context: n6, contextID: 3] [software internal system syslog] #01Pool_name is not present in release request for prefixfd1:3456:7892:abcd::/64 2023-10-13T08:48:18.000+0000
localsystem:$
localsystem:$ less UPF-Destination-UPF-Syslog.log | grep "Pool_name is not present" | tail -1
Oct 13 09:29:59 UPF evlogd: [local-60sec59.671] [vpn 5013 error] [1/0/9399 <vpnmgr:3> vpn_ip_pool.c:27493] [context: n6, contextID: 3] [software internal system syslog] #01Pool_name is not present in release request forprefixfd1:3456:7894:abcd::/64 2023-10-13T09:29:59.000+0000
localsystem:$
Solution
To address this issue, please refer to the bug report for more details: Cisco bug ID CSCwh97931
The fix for this issue involves enhancing SxDemux to prevent IP chunk clean-up during SRP transition while also improving log debug capabilities.
If the CDETS fix mentioned is not yet available in the UPF build you are using, you can employ this workaround:
Execute the Standard N4 association Dis-associate/Associate MOP within the MW timeframe.