Introduction
This document describes how to troubleshoot S11 Key Performance Indicators (KPI) degradation issues.
Overview
S11 is the interface that connects the Mobility Management Entity (MME) and the Serving Gateway (SGW) in a Long Term Evolution (LTE) network. The interface uses the GPRS Tunnelling Protocol - Control plane (GTP-C), specifically GTPv2-C.
Messages in the S11 Interface
- Create Session Request/Response
- Modify Bearer Request/Response
- Delete Session Request/Response
EPS Session Establishment:
- S11 KPI degradation is observed when the number of Create Session Request (CSR) rejections increases relative to CSR attempts; these rejections must be investigated to find the root cause.
Identify the formula used to measure the KPI, make a note of all the counters included in the formula, and determine the exact counter responsible for the degradation.
S11 ASR (SPGW) = ((tun-sent-cresessrespaccept + ggsn_tun-sent-cresessrespdeniedUserAuthFailed + tun-sent-cresessrespdeniedPrefPdnTypeUnsupported + tun-sent-cresessrespdeniedCtxtNotFound) / EGTPC-ggsn_tun-recv-cresess) * 100
PDN Connectivity Success Rate (MME) = ((%esmevent-pdncon-success%) + (%esm-msgtx-pdncon-rej%)) / (%esmevent-pdncon-attempt%) * 100
Note: The formula can vary based on how the KPI is measured.
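As a quick sanity check, you can plug the bulkstat counters into the formula yourself. The short Python sketch below computes the S11 ASR (SPGW) from the counters named above; the numeric values are hypothetical and only illustrate how a drop in accepted Create Session Responses relative to received Create Session Requests shows up as a lower KPI.

# Hypothetical bulkstat counter values for one collection interval
counters = {
    "tun-sent-cresessrespaccept": 9500,
    "ggsn_tun-sent-cresessrespdeniedUserAuthFailed": 120,
    "tun-sent-cresessrespdeniedPrefPdnTypeUnsupported": 30,
    "tun-sent-cresessrespdeniedCtxtNotFound": 50,
    "EGTPC-ggsn_tun-recv-cresess": 10000,
}

numerator = (counters["tun-sent-cresessrespaccept"]
             + counters["ggsn_tun-sent-cresessrespdeniedUserAuthFailed"]
             + counters["tun-sent-cresessrespdeniedPrefPdnTypeUnsupported"]
             + counters["tun-sent-cresessrespdeniedCtxtNotFound"])
s11_asr = numerator / counters["EGTPC-ggsn_tun-recv-cresess"] * 100
print(f"S11 ASR (SPGW) = {s11_asr:.2f}%")   # 97.00% with the sample values above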
Logs Required at Initial Level:
- KPI trend depicting the degradation.
- KPI formula utilized.
- Raw bulkstat counters and cause code trends from the beginning of the issue.
- Two instances of Show Support Details (SSDs) captured from the node at a 30-minute interval during problematic periods.
- Syslogs from two hours before the degradation started up until the current time.
- mon sub/pro traces and logging monitor msid <imsi> output.
Troubleshooting Sequence
1. Evaluate the KPI trend of each counter involved in the S11 KPI formula by analyzing the bulkstats.
2. Compare the KPI trend during problematic timelines with non-problematic timelines.
3. Examine how the identified problematic bulkstat counter is defined based on flow and establish any patterns.
4. Collect disconnect reasons from the node through multiple iterations at intervals of 3 to 5 minutes.
You can analyze the delta of disconnect reasons between two SSDs collected at different timestamps. The disconnect reason that shows a significant increase in the delta value can be considered the cause of the KPI degradation (a minimal sketch of this comparison follows the command below). For detailed descriptions of all disconnect reasons, refer to the Cisco Statistics and Counters Reference here: https://www.cisco.com/c/en/us/td/docs/wireless/asr_5000/21-23/Stat-Count-Reference/21-23-show-comman...
show session disconnect-reasons verbose
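A minimal sketch of this delta comparison, assuming the show session disconnect-reasons verbose output from each SSD has been saved to a text file and that each counter line begins with the disconnect reason name followed by its count (the exact column layout varies by release, so the regular expression may need adjusting):

import re
from collections import Counter

def parse_disconnect_reasons(path):
    # Keeps the first integer that follows the disconnect reason name on each line
    counts = Counter()
    pattern = re.compile(r"^\s*(\S+)\s+(\d+)")
    with open(path) as f:
        for line in f:
            m = pattern.match(line)
            if m:
                counts[m.group(1)] += int(m.group(2))
    return counts

first = parse_disconnect_reasons("ssd_1_disconnect_reasons.txt")    # hypothetical file names
second = parse_disconnect_reasons("ssd_2_disconnect_reasons.txt")

# Disconnect reasons with the largest increase between the two captures
delta = {reason: second[reason] - first[reason] for reason in second}
for reason, diff in sorted(delta.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{reason:50s} {diff}")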
5. Check the egtpc statistics based on the type of node on which they are collected:
--- SGW end -----
show egtpc statistics interface sgw-ingress path-failure-reasons
show egtpc statistics interface sgw-ingress summary
show egtpc statistics interface sgw-ingress verbose
show egtpc statistics interface sgw-ingress sessmgr-only
show egtpc statistics interface sgw-egress path-failure-reasons
show egtpc statistics interface sgw-egress summary
show egtpc statistics interface sgw-egress verbose
show egtpc statistics interface sgw-egress sessmgr-only
---- PGW end -----
show egtpc statistics interface pgw-ingress path-failure-reasons
show egtpc statistics interface pgw-ingress summary
show egtpc statistics interface pgw-ingress verbose
show egtpc statistics interface pgw-ingress sessmgr-only
--- MME end -----
show egtpc statistics interface mme path-failure-reasons
show egtpc statistics interface mme summary
show egtpc statistics interface mme verbose
show egtpc statistics interface mme sessmgr-only
6. Once you have identified the specific counter causing the problem, you must capture mon-sub/mon-pro call traces to further analyze and identify the specific call flow that is causing the KPI degradation. Additionally, you can use external tools to obtain Wireshark traces for more detailed analysis.
Commands to capture mon-sub traces are as follows:
monitor subscriber with options 19, 26, 33, 34, 35, 49, A, S, X, Y, and verbosity +5 during the issue.
monitor protocol (mon-pro) with options 19, 26, 33, 34, 35, 49, A, S, X, Y, and verbosity +5 during the issue if a mon-sub capture is not possible.
More options can be enabled depending on the protocol or call flow that needs to be captured.
In cases where capturing traces such as mon-sub is not feasible due to a minimal percentage of KPI degradation, you must capture system-level debug logs instead. This involves capturing debug logs for sessmgr and egtpc, and if necessary, capturing gateway-specific flows.
logging filter active facility sessmgr level debug
logging filter active facility egtpc level debug
logging filter active facility sgw level debug
logging filter active facility pgw level debug
logging active ----------------- to enable
no logging active ------------- to disable
Note: Debug logging can increase CPU utilization, so keep a watch on the CPU load while debug logging is enabled.
7. After analyzing the debug logs, if you determine the cause of the issue, you can proceed to capture the core file for that specific event where you observe the error logs.
logging enable-debug facility sessmgr instance <instance-ID> eventid <event-ID> line-number <line-number> collect-cores 1
For example, consider that we see the below error log in the debug logs, which we suspect can be a cause of the issue, and we do not have any call trace:
[egtpc 141027 info] [15/0/6045 <sessmgr:93> _handler_func.c:10068] [context: INLAND_PTL_MME01, contextID: 6] [software internal user syslog] [mme-egress] Sending reject response for the message EGTP_MSG_UPDATE_BEARER_REQUEST with cause EGTP_CAUSE_NO_RESOURCES_AVAILABLE to <Host:x.x.x.x, Port:31456, seq_num:82011>
So, for this error event:
facility = sessmgr
event ID = 141027
line number = 10068
and the corresponding command would be:
logging enable-debug facility sessmgr instance 93 eventid 141027 line-number 10068 collect-cores 1
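If you prefer not to pick these values out by hand, a small sketch such as the one below can extract the facility, instance, event ID, and line number from a debug log line in the format shown above and print the corresponding logging enable-debug command; the regular expression assumes that exact log layout and may need adjusting for other releases.

import re

# Error log line of the format shown above (truncated for brevity)
log_line = ("[egtpc 141027 info] [15/0/6045 <sessmgr:93> _handler_func.c:10068] "
            "[context: INLAND_PTL_MME01, contextID: 6] ...")

# "141027" is the event ID, "<sessmgr:93>" gives the facility and instance,
# and ":10068" is the source line number
m = re.search(r"\[\w+ (\d+) \w+\] \[\S+ <(\w+):(\d+)> \S+:(\d+)\]", log_line)
if m:
    event_id, facility, instance, line_no = m.groups()
    print(f"logging enable-debug facility {facility} instance {instance} "
          f"eventid {event_id} line-number {line_no} collect-cores 1")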
Warning: Whenever requesting the collection of logs such as debug logs, logging monitor, mon-sub, or mon-pro, it is important to ensure that these logs are collected during a maintenance window. Additionally, it is crucial to monitor the CPU load during this time.
Analysis and Identification of Symptoms
- First, check whether any frequent crashes are observed in the system from the SSD.
show crash list
- Verify whether any license issues have been encountered. In some cases, when the license on the Serving and PDN Gateway (SPGW) has expired, the node can no longer accept new calls, resulting in failed calls and leading to S11 degradation or a dip.
show resource info
- Verify whether there are multiple sessmgr instances in warn/over state due to high memory or CPU usage (for example, from the show task resources output). If such instances are found, check whether new calls are being rejected because of these conditions.
- From the debug logs, you can check on which interface the call rejection errors occur.
If a significant number of call rejection errors occur for a specific subscriber in the "sgw-egress" context, followed by the same subscriber being rejected in the "sgw-ingress" context, it can be inferred that the rejections from the PDN Gateway (PGW) are propagated through the SGW to the MME over the S11 interface. To confirm and troubleshoot further from the PGW end, you can now take a mon-sub for this IMSI. A sketch for tallying these rejections per interface and cause follows the example logs below.
2022-Nov-26+00:20:51.763 [egtpc 141018 unusual] [7/0/16871 <sessmgr:579> _handler_func.c:3227] [context: gwctx, contextID: 2] [software internal user syslog] [sgw-egress] For IMSI: 427021600263284, create session request is rejected by the peer with cause EGTP_CAUSE_NO_RESOURCES_AVAILABLE
2022-Nov-26+00:20:51.763 [egtpc 141018 unusual] [7/0/16871 <sessmgr:579> _handler_func.c:2505] [context: gwctx, contextID: 2] [software internal user syslog] [sgw-ingress] For IMSI: 427021600263284, create session request is rejected by the SAP user with cause EGTPC_REASON_UNKNOWN
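When the debug logs contain many such lines, a short sketch like the one below can tally the rejects per interface and cause. It assumes the reject lines follow the format of the two examples above and that the debug logs have been saved to a file; the file name and regular expression are illustrative and may need to be adapted.

import re
from collections import Counter

# Matches lines like:
# "... [sgw-egress] For IMSI: 427021600263284, create session request is rejected
#  by the peer with cause EGTP_CAUSE_NO_RESOURCES_AVAILABLE"
pattern = re.compile(r"\[(sgw-ingress|sgw-egress|pgw-ingress|mme-egress)\].*?"
                     r"IMSI:\s*(\d+).*?cause\s+(\S+)")

rejects = Counter()
with open("sessmgr_debug.log") as f:   # hypothetical export of the debug logs
    for line in f:
        m = pattern.search(line)
        if m:
            interface, imsi, cause = m.groups()
            rejects[(interface, cause)] += 1

for (interface, cause), count in rejects.most_common():
    print(f"{interface:12s} {cause:45s} {count}")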
- Sometimes there can be multiple rejection reasons behind the KPI dip, so you need to check each reason separately and proceed accordingly.
For example, an increase in no_resource_available/user_auth_failure errors can be seen for certain International Mobile Subscriber Identity (IMSI) series belonging to in-roamer subscribers, so these need to be checked from the PGW. Another possible reason is a remote peer not responding, which causes the Create Session Request to time out at the SGW and degrades the S11 KPI; in this case the Create Session Request can be rejected with No_resource_available from the SGW towards the MME. These reject cause codes can be observed in the monitor protocol logs, where you can check the Create Session Requests and Create Session Responses to identify the specific IP addresses from which the reject cause codes are sent, as in the sketch below.
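To see whether the rejects cluster on a particular IMSI series (for example, in-roamers) or come from one specific peer, the IMSI, cause, and peer address extracted from the mon-pro traces or debug logs can be grouped as in this minimal sketch. The records and peer addresses shown are hypothetical, and the 5-digit MCC+MNC prefix length is an assumption that depends on the operator's numbering plan.

from collections import Counter

# Hypothetical records extracted from mon-pro traces or debug logs:
# each entry is (IMSI, reject cause, peer IP that sent the Create Session Response)
records = [
    ("427021600263284", "EGTP_CAUSE_NO_RESOURCES_AVAILABLE", "10.10.1.5"),
    ("427021600263285", "EGTP_CAUSE_NO_RESOURCES_AVAILABLE", "10.10.1.5"),
    ("262011234567890", "EGTP_CAUSE_USER_AUTH_FAILED", "10.10.2.7"),
]

by_imsi_series = Counter((imsi[:5], cause) for imsi, cause, _ in records)
by_peer = Counter((peer, cause) for _, cause, peer in records)

print("Rejects per IMSI series (MCC+MNC) and cause:")
for (series, cause), count in by_imsi_series.most_common():
    print(f"  {series}  {cause:40s} {count}")

print("Rejects per peer and cause:")
for (peer, cause), count in by_peer.most_common():
    print(f"  {peer:15s} {cause:40s} {count}")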