Troubleshoot UPF State Mismatch in RCM

Updated: December 4, 2023

Document ID: 221211

Introduction

This document describes how to troubleshoot User Plane Function (UPF) state mismatches in the Redundancy Configuration Manager (RCM).

Prerequisites

Requirements

There are no specific requirements for this document.

Components Used

The information in this document is based on these software and hardware versions:

Redundancy Configuration Manager (RCM)
User Plane Function (UPF/UP)

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.

Logs Collection

RCM

Step 1. Capture Some Command Outputs

First, identify which UP is problematic and what pattern the issue follows. To determine which UPs experienced a switchover and where the current issue lies, record the reason for each switchover.

rcm show-statistics switchover
rcm show-statistics switchover-verbose

rcm show-statistics configmgr  --------------- shows how many UPs are registered for config push

rcm show-statistics controller --------------- shows the number of UPs registered with the controller and their states

Step 2. Collect Controller and Configmgr Logs

Once you identify which UPs are affected, collect the controller logs and configmgr logs to determine what caused the switchover and why the UPs became stuck in the Pending state.

Refer to the RCM Log Collection link for the log collection procedure.

UP

Collect the SSD, syslogs, and SNMP traps for the problematic timestamp. Cover a timeframe that starts at least two hours before the issue begins.

Troubleshooting

Scenario for UPs Getting Stuck Into the Pending State

Generally, every UP registers itself with the RCM through the controller.
The controller maintains both the state it receives from each UP and the state that RCM assigns to that UP, and reports them together:

rcm show-statistics controller

message :
{
  "keepalive_version": "f1ab207c5d3120f8a4286b999b9f4cd207034e7c61e204d74e41f48578c476de",
  "keepalive_timeout": "20s",
  "num_groups": 2,
  "groups": [
    {
      "groupid": 1,
      "endpoints_configured": 7,
      "standby_configured": 1,
      "pause_switchover": false,
      "active": 2,
      "standby": 0,
      "endpoints": [
        {
          "endpoint": "X.X.X.X",    -------- UP IP
          "bfd_status": "STATE_UP",
          "upf_registered": true,
          "upf_connected": true,
          "upf_state_received": "UpfMsgState_Active",
          "bfd_state": "BFDState_UP",
          "upf_state": "UPFState_PendActive",
          "route_modifier": 32,
          "pool_received": false,
          "echo_received": 253,
          "management_ip": "X.X.X.X",
          "host_id": "SEUD2413",
          "ssh_ip": "Y.Y.Y.Y",
          "force_nso_registration": false
        },

The controller statistics show several states that the controller maintains for each UP, and each state has its own meaning.

BFD state - Indicates the state of the BFD session between RCM and the UP (do not treat it as the UPF state; it is purely the BFD state)

UPF state - The current state of the UPF in the RCM

UPF state received - The state that the UP sends to RCM
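The comparison between these fields can be scripted. The following is a minimal Python sketch, assuming the controller statistics have already been saved as clean JSON (without the "message :" prefix or the inline annotations shown in the sample output). The function names and the rule that any UPF state containing "Pend" counts as pending are assumptions made for illustration; they are not part of the RCM software.

```python
import json

# Sample input trimmed to the fields used below; values mirror the sample output.
SAMPLE = """
{
  "groups": [
    {
      "groupid": 1,
      "endpoints": [
        {
          "endpoint": "X.X.X.X",
          "bfd_state": "BFDState_UP",
          "upf_state_received": "UpfMsgState_Active",
          "upf_state": "UPFState_PendActive"
        }
      ]
    }
  ]
}
"""


def state_suffix(value):
    """Return the part after the first underscore, for example 'PendActive'."""
    return value.split("_", 1)[-1]


def find_suspect_endpoints(stats):
    """List endpoints whose RCM-side state is pending or differs from the UP-side state.

    Assumption: the suffixes of upf_state and upf_state_received are directly comparable.
    """
    suspects = []
    for group in stats.get("groups", []):
        for ep in group.get("endpoints", []):
            rcm_state = state_suffix(ep.get("upf_state", ""))
            up_state = state_suffix(ep.get("upf_state_received", ""))
            if "Pend" in rcm_state or rcm_state != up_state:
                suspects.append({
                    "group": group.get("groupid"),
                    "endpoint": ep.get("endpoint"),
                    "rcm_state": rcm_state,
                    "up_state": up_state,
                })
    return suspects


if __name__ == "__main__":
    for item in find_suspect_endpoints(json.loads(SAMPLE)):
        print(item)
```

For the sample endpoint, the UP reports Active while RCM still holds PendActive, so the endpoint is flagged for further log collection.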

In general, whenever there is a switchover from the Active UP to the Standby UP, RCM must complete these procedures to ensure a smooth handover:

1. Checkpointmgr flush from old UP and checkpoint sync with new Active UP

2. Config flush

3. Config push

4. Managing UP states

Consider a UP pair, UP-A (Active UP) and UP-B (Standby UP). When a switchover occurs, each UP first enters a Pending state before it reaches its new Active or Standby state.

UP-A (Active UP) --------------------- PendStandby ---------------------- Standby

UP-B (Standby UP) ------------------- PendActive ---------------------- Active
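The two transition paths can be written down as a small lookup table, which is convenient when you check collected states against the expected sequence. This is only an illustrative Python sketch; the names SWITCHOVER_PATH, expected_path, and is_pending are hypothetical and do not correspond to anything in RCM.

```python
# Transition paths taken from the UP-A / UP-B example.
SWITCHOVER_PATH = {
    "Active": ("PendStandby", "Standby"),
    "Standby": ("PendActive", "Active"),
}


def expected_path(start_state):
    """Return the full sequence of states for a UP that starts in start_state."""
    return (start_state,) + SWITCHOVER_PATH[start_state]


def is_pending(state):
    """True for the intermediate states a UP should only hold briefly."""
    return state in ("PendStandby", "PendActive")


if __name__ == "__main__":
    print(" -> ".join(expected_path("Active")))
    print(" -> ".join(expected_path("Standby")))
```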

As shown, before a UP becomes Active or Standby, the procedures listed earlier take place between RCM and the UP in order to ensure a smooth switchover.

Whenever there is a switchover, RCM must perform a config push: it pushes the Active UP configuration to the UP that becomes Active and the Standby UP configuration to the UP that becomes Standby.

Note: On the Standby UP, RCM normally pushes the configuration of all the UPs that are currently Active, so that whenever this UP becomes Active it removes all the unwanted configuration.

As soon as the switchover is initiated, RCM starts a timer of 15 minutes (the value varies based on the configuration). The switchover must complete within this time, and it concludes once the config push is completed.
If, for any reason, the config push does not complete before the timer expires, RCM initiates a reload of the UP. This repeats until the config push is completed.
While RCM pushes the configuration to the UP, it expects a configuration complete signal from the UP. On receipt of this signal, RCM treats the config push as complete and the switchover as successful.
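The timer logic described here can be sketched as follows when you correlate timestamps from the collected logs. This is a simplified Python model, not RCM code: the function name switchover_outcome, its return strings, and the 15-minute default are assumptions based only on the description in this section.

```python
from datetime import datetime, timedelta

# Assumed default; the real value depends on the RCM configuration.
DEFAULT_TIMER = timedelta(minutes=15)


def switchover_outcome(started_at, push_complete_at, now, timer=DEFAULT_TIMER):
    """Classify a switchover from the time it started and the time
    the config push complete signal was seen (None if never seen)."""
    deadline = started_at + timer
    if push_complete_at is not None and push_complete_at <= deadline:
        return "complete"
    if now <= deadline:
        return "pending"
    # Timer ran out without a timely signal: per the text, RCM reloads the UP.
    return "timer_expired"


if __name__ == "__main__":
    start = datetime(2023, 11, 13, 12, 0, 0)
    print(switchover_outcome(start, None, start + timedelta(minutes=16)))
```

A result of timer_expired for a given UP points to the config push as the area to investigate.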

These entries appear in the syslogs and the SNMP traps when the config push is complete.

Syslogs 

Nov 13 12:01:09 INVIGJ02GNR1D1UP12CO evlogd: [local-60sec9.041] [cli 30000 debug] [1/0/10935 <cli:1010935> cliparse.c:571] [context: local, contextID: 1]  [software internal system syslog] CLI command [user rcmadmin, mode [local]INVIGJ02GNR1D1UP12CO]: rcm-config-push-complete
Nov 13 12:01:09 INVIGJ02GNR1D1UP12CO evlogd: [local-60sec9.041] [cli 30000 debug] [1/0/10935 <cli:1010935> cliparse.c:571] [context: local, contextID: 1]  [software internal system syslog] CLI command [user rcmadmin, mode [local]INVIGJ02GNR1D1UP12CO]: rcm-config-push-complete end-of-config
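When the syslog file is large, a short script can pull out these entries. The Python sketch below is an illustration only: the regular expression is written against the two sample lines above and assumes other syslog entries follow the same layout, and the function name find_push_complete is hypothetical.

```python
import re

# Pattern derived from the two sample syslog lines; other layouts are not covered.
PUSH_COMPLETE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+evlogd:"
    r".*CLI command \[user (?P<user>[^,]+),.*\]:\s*"
    r"(?P<cmd>rcm-config-push-complete.*)$"
)

SAMPLE_LINES = [
    "Nov 13 12:01:09 INVIGJ02GNR1D1UP12CO evlogd: [local-60sec9.041] [cli 30000 debug] "
    "[1/0/10935 <cli:1010935> cliparse.c:571] [context: local, contextID: 1]  "
    "[software internal system syslog] CLI command [user rcmadmin, mode "
    "[local]INVIGJ02GNR1D1UP12CO]: rcm-config-push-complete",
    "Nov 13 12:01:09 INVIGJ02GNR1D1UP12CO evlogd: [local-60sec9.041] [cli 30000 debug] "
    "[1/0/10935 <cli:1010935> cliparse.c:571] [context: local, contextID: 1]  "
    "[software internal system syslog] CLI command [user rcmadmin, mode "
    "[local]INVIGJ02GNR1D1UP12CO]: rcm-config-push-complete end-of-config",
]


def find_push_complete(lines):
    """Return timestamp, host, user, and command for each matching syslog line."""
    events = []
    for line in lines:
        match = PUSH_COMPLETE.match(line.strip())
        if match:
            events.append({
                "timestamp": match.group("ts"),
                "host": match.group("host"),
                "user": match.group("user"),
                "command": match.group("cmd").strip(),
            })
    return events


if __name__ == "__main__":
    for event in find_push_complete(SAMPLE_LINES):
        print(event["timestamp"], event["host"], event["command"])
```

If no matching entry appears for a UP within the switchover window, that UP did not report config push completion.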

SNMP

Fri Mar 24 09:59:01 2023 Internal trap notification 1425 (RCMTCPConnect) Context Name: rcm 
Fri Mar 24 09:59:01 2023 Internal trap notification 1421 (RCMConfigPushCompleteSent) Context Name: rcm 
Fri Mar 24 09:59:01 2023 Internal trap notification 1426 (RCMChassisState) RCM Chassis State: (2) Chassis State Standby 
Fri Mar 24 09:59:04 2023 Internal trap notification 1276 (BFDSessionUp) vpn n6 OurAddr fc00:10:5:132::10 NeighborAddr fc00:10:5:132::254 Session(6/1090552866), Diagnostic code 0 PhyPortId 0
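The same check can be applied to the SNMP trap history. The following Python sketch assumes each trap line follows the layout of the four sample lines above; the function names parse_traps and push_complete_traps are hypothetical, and only the trap name RCMConfigPushCompleteSent is taken from the sample output.

```python
import re

# Layout assumed from the sample trap lines: timestamp, trap ID, trap name, detail.
TRAP_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"Internal trap notification (?P<id>\d+) \((?P<name>\w+)\)\s*(?P<detail>.*)$"
)

SAMPLE_TRAPS = [
    "Fri Mar 24 09:59:01 2023 Internal trap notification 1425 (RCMTCPConnect) Context Name: rcm ",
    "Fri Mar 24 09:59:01 2023 Internal trap notification 1421 (RCMConfigPushCompleteSent) Context Name: rcm ",
    "Fri Mar 24 09:59:01 2023 Internal trap notification 1426 (RCMChassisState) RCM Chassis State: (2) Chassis State Standby ",
    "Fri Mar 24 09:59:04 2023 Internal trap notification 1276 (BFDSessionUp) vpn n6 OurAddr fc00:10:5:132::10 "
    "NeighborAddr fc00:10:5:132::254 Session(6/1090552866), Diagnostic code 0 PhyPortId 0",
]


def parse_traps(lines):
    """Parse trap lines into dictionaries; lines that do not match are skipped."""
    traps = []
    for line in lines:
        match = TRAP_LINE.match(line.strip())
        if match:
            traps.append({
                "timestamp": match.group("ts"),
                "id": int(match.group("id")),
                "name": match.group("name"),
                "detail": match.group("detail"),
            })
    return traps


def push_complete_traps(lines):
    """Return only the traps that report config push completion."""
    return [t for t in parse_traps(lines) if t["name"] == "RCMConfigPushCompleteSent"]


if __name__ == "__main__":
    for trap in push_complete_traps(SAMPLE_TRAPS):
        print(trap["timestamp"], trap["id"], trap["name"])
```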

If any issue delays the config push completion long enough for the timer to expire, the UP becomes stuck in the Pending state.
Because RCM did not receive the config push completion status, it considers the switchover incomplete and keeps the UP in the Pending state.
The different reasons for config push issues are explained in UP Reload Causes.

Workaround

1. As a temporary measure, you can force the config push complete signal from the UP towards RCM with this command in order to bring the UP back to the Active/Standby state:

rcm-config-push-complete end-of-config

2. This workaround is only temporary. You must still identify why the config push takes so long, as described in UP Reload Causes.

Revision History

Revision	Publish Date	Comments
1.0	04-Dec-2023	Initial Release

Contributed by Cisco Engineers

Saumyakanta Sahoo
Cisco TAC Engineer
Bharati Choudhary
Cisco TAC Engineer
Krishna Kishore D V
Cisco TAC Engineer
