The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.
This document describes the procedure to perform maintenance activities in the IP Multimedia Subsystem (IMS) and Data User Plane Function (UPF) nodes.
Cisco recommends that you have knowledge of these topics:
The information in this document is based on these software and hardware versions:
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
The User Plane Interface (UPF) is one of the network functions (NFs) of the 5G core network (5GC). It is responsible for packet routing and forwarding, packet inspection, handling QoS, and external PDU sessions to interconnect Data Networks (DN), in the 5G architecture.
VPC-SI consolidates the operations of physical Cisco ASR 5500 chassis which runs StarOS into a single Virtual Machine (VM) able to run on commercial off-the-shelf (COTS) servers. Each VPC-SI VM operates as an independent StarOS instance, and incorporates the management and session process capabilities of a physical chassis.
Kernel-based Virtual Machine (KVM) is an open-source virtualization technology built into Linux. Specifically, KVM lets you turn Linux into a hypervisor that allows a host machine to run multiple, isolated virtual environments called guests or virtual machines (VMs).
Interchassis Session Recovery (ICSR) is a licensed Cisco feature that requires a separate license, this feature provides the highest possible availability for continuous call process without interruption to subscriber services. ICSR allows the operator to configure gateways for redundancy purposes. In the event of a gateway failure, ICSR allows sessions to be transparently routed around the failure, thus maintaining the user experience. ICSR also preserves session information and state.
Hardware maintenance such as hardware failure or software/firmware upgrade, and more, need downtime in the servers. This procedure needs to be followed for maintenance to be performed in the UPF baremetal servers and how to gracefully switch over the services to avoid unwanted downtime in the UPF application.
UPF Nodes are StarOS VMs which are hosted in KVM hypervisor. One KVM hypervisor hosts 2 VM instances. IMS UPF has 1:1 redundancy, Every active instance has a standby instance. it uses ICSR along with Session Redundancy Protocol (SRP) to handle redundancy. SRP is used to exchange hello messages between ICSR chassis. It also exchanges session state information between active/standby chassis (checkpoint data). Complete Subscriber Session information is sent from the ACTIVE Chassis to the STANDBY Chassis in the form of a Call Recovery Record (CRR), over the SRP link.
Log in to the KVM node and list the VM instances with KVM virsh command.
cloud-user@podname-upf-ims-kvmnode-1:~$ sudo virsh list --all
Id Name State
----------------------------------------------------
1 imsupf01 running
4 imsupf10 running
cloud-user@podname-upf-ims-kvmnode-1:~$
Log in to the UPF instance and check the chassis status.
[local]imsupf10# show srp info
Friday July 22 15:50:24 UTC 2022
Service Redundancy Protocol:
-------------------------------------------------------------------------------
Context: srp
Local Address: 10.x.x.74
Chassis State: Standby
Chassis Mode: Backup
Chassis Priority: 2
Local Tiebreaker: 02-7E-35-53-F9-F1
Route-Modifier: 9
Peer Remote Address: 10.x.x.73
Peer State: Active
Peer Mode: Primary
Peer Priority: 1
Peer Tiebreaker: 02-11-59-73-87-35
Peer Route-Modifier: 8
Last Hello Message received: Fri Jul 22 15:50:21 2022 (3 seconds ago)
Peer Configuration Validation: Complete
Last Peer Configuration Error: None
Last Peer Configuration Event: Fri Jul 22 15:50:22 2022 (2 seconds ago)
Last Validate Switchover Status: None
Connection State: Connected
[local]imsupf01# show srp info
Friday July 22 15:31:20 UTC 2022
Service Redundancy Protocol:
-------------------------------------------------------------------------------
Context: srp
Local Address: 10.x.x.66
Chassis State: Active
Chassis Mode: Backup
Chassis Priority: 2
Local Tiebreaker: 02-7C-1A-62-FA-3C
Route-Modifier: 5
Peer Remote Address: 10.x.x.65
Peer State: Standby
Peer Mode: Primary
Peer Priority: 1
Peer Tiebreaker: 02-87-33-98-6D-08
Peer Route-Modifier: 6
Last Hello Message received: Fri Jul 22 15:31:20 2022 (1 seconds ago)
Peer Configuration Validation: Complete
Last Peer Configuration Error: None
Last Peer Configuration Event: Fri Jul 22 15:20:13 2022 (668 seconds ago)
Last Validate Switchover Status: None
Connection State: Connected
Check if the number of lines are the same on the active-Standby ICSR Pair for IMS UPF.
Active node
# show configuration | grep -n -E "^end$"
Thursday July 21 07:30:17 UTC 2022
14960:end
Standby Node
# show configuration | grep -n -E "^end$"
Thursday July 21 07:31:02 UTC 2022
14959:end
Check if the SRP sessmgr are in the active-connected state before SRP switchover on active UPF and ensure there is no pending-active state.
[local]imsupf01# show srp checkpoint statistics active
Thursday July 21 07:38:04 UTC 2022
Number of Sessmgrs: 20
Sessmgrs in Active-Connected state: 20
Sessmgrs in Standby-Connected state: 0
Sessmgrs in Pending-Active state: 0
Check if SRP sessmgr are in the active-connected state before SRP switchover on standby UPF and ensure there is no pending-active state
[local]imsupf02# show srp checkpoint statistics active
Thursday July 21 07:40:03 UTC 2022
Number of Sessmgrs: 20
Sessmgrs in Active-Connected state: 0
Sessmgrs in Standby-Connected state: 20
Sessmgrs in Pending-Active state: 0
If any of these two is in an Active state, you need to first perform these tasks before switchover:
[upf-ims]# save config /flash/xxx_production.cfg. --> Replace xxx with the desired name of the config
[upf-ims]# srp validate-configuration
[upf-ims]# srp validate-switchover
Before VM shutdown, you need to ensure that the Active instances are switched over to standby so that subscribers are gracefully switched over. If the instance is already on standby, no action is needed. If the instance is active, check the highlighted values and ensure that the standby is ready to take over.
Check the current subscribers in the active UPF instance.
[local]imsupf01# show subscribers data-rate summary
Friday July 22 16:01:37 UTC 2022
Total Subscribers : 175024
Active : 175024 Dormant : 0
Switchover the active instance to standby.
[context-name]<hostname># srp initiate-switchover
Check the status of the standby, that would have become active by now, and the subscriber sessions are also moved to the new active instance. Now since both the VM instances are in the standby state, they are good to go down for server maintenance. Use given virsh commands to shutdown the VM instances and verify the status.
cloud-user@podname-upf-ims-kvmnode-1:~$ sudo virsh shutdown imsupf01
Domain imsupf01 is being shutdown
cloud-user@podname-upf-ims-kvmnode-1:~$ sudo virsh shutdown imsupf10
Domain imsupf10 is being shutdown
cloud-user@podname-upf-ims-kvmnode-1:~$ sudo virsh list --all
Id Name State
----------------------------------------------------
1 imsupf01 shut off
4 imsupf10 shut off
cloud-user@podname-upf-ims-kvmnode-1:~$
Once the server is brought back after the maintenance, VMs are started automatically. UPF instances remain on standby. verify with the given command.
[local]imsupf10# show srp info
Friday July 22 15:50:24 UTC 2022
Service Redundancy Protocol:
-------------------------------------------------------------------------------
Context: srp
Local Address: 10.x.x.74
Chassis State: Standby
Chassis Mode: Backup
Chassis Priority: 2
Local Tiebreaker: 02-7E-35-53-F9-F1
Route-Modifier: 9
Peer Remote Address: 10.x.x.73
Peer State: Active
Peer Mode: Primary
Peer Priority: 1
Peer Tiebreaker: 02-11-59-73-87-35
Peer Route-Modifier: 8
Last Hello Message received: Fri Jul 22 15:50:21 2022 (3 seconds ago)
Peer Configuration Validation: Complete
Last Peer Configuration Error: None
Last Peer Configuration Event: Fri Jul 22 15:50:22 2022 (2 seconds ago)
Last Validate Switchover Status: None
Connection State: Connected
Data UPF uses RCM which has N:M redundancy wherein N is a number of Active UPFs and is less than 10, and M is a number of Standby UPs in the redundancy group. RCM is a Cisco proprietary node or network function (NF) that provides redundancy for StarOS-based User Plane Functions (UPF). It stores or mirrors all the required session information from all the Active UPF. On a switchover trigger, one Standby UPF is selected to receive the appropriate session data from the common location. RCM runs on a K3 cluster on a VM. The Ops Centre configures the RCM node.
Data UPF nodes are also the same as IMS UPF nodes. The only difference is the RCM redundancy management.
Check the VM status in the KVM node.
cloud-user@podname-upf-data-kvmnode-1:~$ sudo virsh list --all
Id Name State
----------------------------------------------------
1 dataupf20 running
2 dataupf11 running
cloud-user@podname-upf-data-kvmnode-1:~$
After login to the UPF instance, check the RCM redundancy status. If the instance is already on standby, no action is needed. If it is active, it needs to be gracefully switched over to standby.
[local]dataupf11# show rcm info
Friday July 22 17:23:17 UTC 2022
Redundancy Configuration Module:
-------------------------------------------------------------------------------
Context: rcm
Bind Address: 10.x.x.75
Chassis State: Active
Session State: SockActive
Route-Modifier: 26
RCM Controller Address: 10.x.x.163
RCM Controller Port: 9200
RCM Controller Connection State: Connected
Ready To Connect: Yes
Management IP Address: 10.x.x.149
Host ID: DATAUPF15
SSH IP Address: 10.x.x.158 (Activated)
SSH IP Installation: Enabled
[local]dataupf11#
Check if all sessmgr are in an Active-connected state.
local]dataupf11# show rcm checkpoint statistics active
Thursday July 21 07:47:03 UTC 2022
Number of Sessmgrs: 22
Sessmgrs in Active-Connected state: 22
Sessmgrs in Standby-Connected state: 0
Sessmgrs in Pending-Active state: 0
Identify the corresponding RCM node from the Customer Information Questionnaire (CIQ) and check the RCM status. Note that RCM switchover can be done only from the master node. Ensure you log in to the master RCM.
[podname-aio-1/dcrm01] rcm# rcm show-status
message :
{"status":"MASTER"}
[podname-aio-1/dcrm01] rcm#
Find the active and standby UPF nodes with the given command (output truncated):
[podname-aio-1/dcrm01] rcm# rcm show-statistics controller
message :
{
"keepalive_version": "e7386cb81b1fefc3396dfd1d528e0d2a27de80d5de6a78364caf938a0d2149b6",
"keepalive_timeout": "20s",
"num_groups": 2,
"groups": [
{
"groupid": 1,
"endpoints_configured": 7,
"standby_configured": 1,
"pause_switchover": false,
"active": 6,
"standby": 1,
"endpoints": [
{
"endpoint": "10.x.x.75",
"bfd_status": "STATE_UP",
"upf_registered": true,
"upf_connected": true,
"upf_state_received": "UpfMsgState_Active",
"bfd_state": "BFDState_UP",
"upf_state": "UPFState_Active",
"route_modifier": 26,
"pool_received": true,
"echo_received": 142354,
"management_ip": "10.x.x.149",
"host_id": "DATAUPF15",
"ssh_ip": "10.x.x.158",
"force_nso_registration": false
....
....
{
"endpoint": "10.x.x.77",
"bfd_status": "STATE_UP",
"upf_registered": true,
"upf_connected": true,
"upf_state_received": "UpfMsgState_Standby",
"bfd_state": "BFDState_UP",
"upf_state": "UPFState_Standby",
"route_modifier": 50,
"pool_received": false,
"echo_received": 3673,
"management_ip": "10.x.x.153",
"host_id": "",
"ssh_ip": "10.x.x.186",
"force_nso_registration": false
},
Login to the standby UPF instance with the management IP and verify the status
[local]dataupf13# show rcm info
Friday July 22 17:36:04 UTC 2022
Redundancy Configuration Module:
-------------------------------------------------------------------------------
Context: rcm
Bind Address: 10.x.x.77
Chassis State: Standby
Session State: SockStandby
Route-Modifier: 50
RCM Controller Address: 10.x.x.163
RCM Controller Port: 9200
RCM Controller Connection State: Connected
Ready To Connect: Yes
Management IP Address: 10.x.x.153
Host ID:
SSH IP Address: 10.x.x.186 (Activated)
SSH IP Installation: Enabled
[local]dataupf13#
After the verification, gracefully switchover the active to the standby. Ensure to use the management IP.
[podname-aio-1/dcrm01] rcm# rcm switchover-mgmt-ip source 10.x.x.149 destination 10.x.x.153
Note: In case after switchover if new Active UP sessmgr are stuck in SERVER state. Engage Cisco technical support. In case of problematic instances, sessmgr has to be killed, so it reconnects to RCM with proper CLIENT socket state and recovers. All sessmgr needs to be in CLIENT state. Verify it with the given command (in hidden mode).
# show session subsystem facility sessmgr all debug-info | grep -E "SessMgr|Mode:"
Thursday July 21 07:56:26 UTC 2022
SessMgr: Instance 5000
Mode: UNKNOWN State: SRP_SESS_STATE_SOCK_ACTIVE
SessMgr Activity Detected: FALSE
SessMgr: Instance 22
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
SessMgr Activity Detected: TRUE
SessMgr: Instance 21
Mode: CLIENT State: SRP_SESS_STATE_SOCK_ACTIVE
SessMgr Activity Detected: TRUE
Check all sessmgr are in active and ready state.
# show rcm checkpoint statistics verbose
Thursday July 21 07:52:29 UTC 2022
smgr state peer recovery pre-alloc chk-point rcvd chk-point sent
inst conn records calls full micro full micro
---- ------- ----- ------- -------- ----- ----- ----- ----
1 Actv Ready 0 0 1731 68120 3107912 409200665
2 Actv Ready 0 0 1794 70019 3060062 408647685
3 Actv Ready 0 0 1753 68793 3078531 406227415
4 Actv Ready 0 0 1744 67585 3080952 410218643
5 Actv Ready 0 0 1749 69155 3096067 404944553
6 Actv Ready 0 0 1741 68805 3067392 407133464
7 Actv Ready 0 0 1744 67963 3084023 406772101
8 Actv Ready 0 0 1748 68702 3009558 408073589
9 Actv Ready 0 0 1736 68169 3030624 405679108
10 Actv Ready 0 0 1707 67386 3071592 406000628
11 Actv Ready 0 0 1738 68086 3052899 407991476
12 Actv Ready 0 0 1720 68500 3102045 408803079
13 Actv Ready 0 0 1772 69683 3082235 406426650
14 Actv Ready 0 0 1727 66900 2873736 392352402
15 Actv Ready 0 0 1739 68465 3032395 409603844
16 Actv Ready 0 0 1756 69221 3063447 411445527
17 Actv Ready 0 0 1755 68708 3051573 406333047
18 Actv Ready 0 0 1698 66328 3066983 407320405
19 Actv Ready 0 0 1736 68030 3037073 408215965
20 Actv Ready 0 0 1733 67873 3069116 405634816
21 Actv Ready 0 0 1763 69259 3074942 409802455
22 Actv Ready 0 0 1748 68228 3051222 406470380
Verify the subscribers are moved to the new standby:
[local]dataupf11# show subscribers data-rate summary
Friday July 22 17:40:18 UTC 2022
Total Subscribers : 62259
Active : 62259 Dormant : 0
When both instances are standby, VMs can be shutdown from KVM with virsh commands.
cloud-user@podname-upf-data-kvmnode-1:~$ sudo virsh shutdown dataupf20
Domain dataupf20 is being shutdown
cloud-user@podname-upf-data-kvmnode-1:~$ sudo virsh shutdown dataupf11
Domain dataupf11 is being shutdown
cloud-user@podname-upf-data-kvmnode-1:~$ sudo virsh list --all
Id Name State
----------------------------------------------------
1 dataupf20 shut off
4 dataupf11 shut off
cloud-user@podname-upf-data-kvmnode-1:~$
When VMs are shut down, the KVM node (physical server) can be taken down for maintenance. Once completed, start the server. VMs are brought up automatically. UPF instances become standby on their own. Verify the same with given commands.
cloud-user@podname-upf-data-kvmnode-1:~$ sudo virsh list --all
Id Name State
----------------------------------------------------
1 dataupf20 running
2 dataupf11 running
cloud-user@podname-upf-data-kvmnode-1:~$
[local]dataupf11# show rcm info
Friday July 22 17:36:04 UTC 2022
Redundancy Configuration Module:
-------------------------------------------------------------------------------
Context: rcm
Bind Address: 10.x.x.77
Chassis State: Standby
Session State: SockStandby
Route-Modifier: 50
RCM Controller Address: 10.x.x.163
RCM Controller Port: 9200
RCM Controller Connection State: Connected
Ready To Connect: Yes
Management IP Address: 10.x.x.153
Host ID:
SSH IP Address: 10.x.x.186 (Activated)
SSH IP Installation: Enabled
[local]dataupf13#
In the RCM node, rcm controller still can show the standby UPF as "pending standby". This can take up to 15 to 20 minutes to transform to standby. Verify the same with the given commands (output truncated):
[podname-aio-1/dcrm01] rcm# rcm show-statistics controller
message :
{
"keepalive_version": "e7386cb81b1fefc3396dfd1d528e0d2a27de80d5de6a78364caf938a0d2149b6",
"keepalive_timeout": "20s",
"num_groups": 2,
"groups": [
{
"groupid": 1,
"endpoints_configured": 7,
"standby_configured": 1,
"pause_switchover": false,
"active": 6,
"standby": 1,
"endpoints": [
....
....
{
"endpoint": "10.x.x.77",
"bfd_status": "STATE_UP",
"upf_registered": true,
"upf_connected": true,
"upf_state_received": "UpfMsgState_Standby",
"bfd_state": "BFDState_UP",
"upf_state": "UPFState_Standby",
"route_modifier": 50,
"pool_received": false,
"echo_received": 3673,
"management_ip": "10.x.x.153",
"host_id": "",
"ssh_ip": "10.x.x.186",
"force_nso_registration": false
},
Revision | Publish Date | Comments |
---|---|---|
1.0 |
26-Jul-2022 |
Initial Release |