Introduction
This document describes how to troubleshoot the Session Management Function (SMF) log alert "All Peers are Dead, Setting status code to 0".
Problem
Session impact was reported on the SMF.
Analysis
Log All Peers are Dead
This log indicates that all peers (endpoints) configured under SelectedProfileName:CHF-OFF are dead.
The message covers every endpoint configured in that profile on the SMF. Whenever all peers in a profile are dead, the sessions that use the profile are impacted.
master-1 c77834f772f7[2516]: ************* TRANSACTION: 2475167152 *************
master-1 c77834f772f7[2516]: TRANSACTION SUCCESS:
master-1 c77834f772f7[2516]: GR Instance ID : 1
master-1 c77834f772f7[2516]: Txn Type : N40ChargingDataReq(3585)
master-1 c77834f772f7[2516]: Priority : 1
master-1 c77834f772f7[2516]: Session Namespace : smf(1)
master-1 c77834f772f7[2516]: CDL Slice Name : smf
master-1 c77834f772f7[2516]: LOG MESSAGES:
master-1 c77834f772f7[2516]: 2023/09/10 15:00:00.007 [ERROR] [nrfClient.Discovery.nrf] All Peers are Dead, Setting status code to 0 (timeout)
master-1 c77834f772f7[2516]: 2023/09/10 15:00:00.007 [ERROR] [nrfClient.Discovery.nrf] Message send failed, response [Type:CHF ServiceName:nchf-convergedcharging SelectedProfileName:"CHF-OFF" FailureProfile:"Fail-H-CHF-OFF" GroupID:"CHF-*" ]
master-1 c77834f772f7[2516]: ***********************************************
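When many transactions are logged, you can correlate these two messages programmatically. The following Python sketch is an illustration only, not a Cisco tool. It assumes that the log lines look exactly like the excerpt above. The helper names parse_send_failure and dead_profiles are hypothetical. The sketch extracts the key:value fields from the "Message send failed" response and reports which SelectedProfileName follows an "All Peers are Dead" message.

```python
import re
from typing import Dict, Iterable, Set

# key:value pairs inside "response [ ... ]"; values may be quoted,
# e.g. Type:CHF or SelectedProfileName:"CHF-OFF".
_FIELD = re.compile(r'(\w+):("([^"]*)"|\S+)')


def parse_send_failure(line: str) -> Dict[str, str]:
    """Return the fields of a 'Message send failed' line, or {} otherwise."""
    if "Message send failed" not in line:
        return {}
    start = line.find("response [")
    end = line.rfind("]")
    if start == -1 or end < start:
        return {}
    body = line[start + len("response ["):end]
    # Quoted values (profile names, group IDs) are returned without quotes.
    return {key: (quoted if raw.startswith('"') else raw)
            for key, raw, quoted in _FIELD.findall(body)}


def dead_profiles(lines: Iterable[str]) -> Set[str]:
    """Profiles whose send failure directly follows 'All Peers are Dead'."""
    dead: Set[str] = set()
    pending = False
    for line in lines:
        if "All Peers are Dead" in line:
            pending = True
        elif pending and "Message send failed" in line:
            profile = parse_send_failure(line).get("SelectedProfileName")
            if profile:
                dead.add(profile)
            pending = False
    return dead
```

Run against the excerpt above, dead_profiles returns the CHF-OFF profile. You can then look up its FailureProfile (Fail-H-CHF-OFF) in the configuration.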
Based on the configuration, the SMF first sends the request to the primary server, which is the endpoint with the higher priority. If that request times out with HTTP 504, the SMF retries once toward the secondary server. If the retry also fails, the system continues the session, as defined by the failure-handling action continue.
In this example, the secondary Charging Function (CHF) for offline charging is 10.10.10.2. The SMF received a 504 error from it, and the applied action is FailureContinueAction.
master-2 42013075464a[2621]: 2023/09/10 15:00:00.063 rest-ep [ERROR] [RestClient.go:175] [infra.rest_client.core] Error in rest call err Post "http://10.10.10.2:1090/OFFLINE/nchf-convergedcharging/v2/chargingdata": context deadline exceeded
master-2 42013075464a[2621]: 2023/09/10 15:00:00.063 rest-ep [ERROR] [Config.go:1721] [nrfClient.Discovery.nrf] Send to NF rpcName[CHF], method:[DataRequest] EndPoint[http://10.10.10.2:1090/OFFLINE/nchf-convergedcharging/v2] failed
master-2 42013075464a[2621]: ************* TRANSACTION: 2252879781 *************
master-2 42013075464a[2621]: TRANSACTION SUCCESS:
master-2 42013075464a[2621]: GR Instance ID : 1
master-2 42013075464a[2621]: Txn Type : N40ChargingDataReq(3521)
master-2 42013075464a[2621]: Priority : 1
master-2 42013075464a[2621]: Session Namespace : smf(1)
master-2 42013075464a[2621]: CDL Slice Name : smf
master-2 42013075464a[2621]: LOG MESSAGES:
master-2 42013075464a[2621]: 2023/09/10 15:00:00.063 [ERROR] [rest_ep.app.ChargingIntf] {imsi-1234567891011121:21} Received Charging Data Response error with timediff 10001557123 - Request message {{"invocationSequenceNumber":1,"invocationTimeStamp":"2025-11-10T14:29:29Z","nfConsumerIdentification":{"nFIPv4Address":"10.10.10.12","nFName":"dce0c1d7-aa37-4f2c-870b-6f7c1be10af1","nFPLMNID":{"mcc":"123","mnc":"456"},"nodeFunctionality":"SMF"},"notifyUri":"http://10.10.10.12:8195/callbacks/v2/notifyUri/1909959397/chargingNotification","pDUSessionChargingInformation":{"chargingId":1909959397,"pduSessionInformation":{"authorizedQoSInformation":{"5qi":1,"arp":{"preemptCap":"NOT_PREEMPT","preemptVuln":"PREEMPTABLE","priorityLevel":1}},"authorizedSessionAMBR":{"downlink":"2048000 bps","uplink":"2048000 bps"},"chargingCharacteristicsSelectionMode":"VISITING_DEFAULT","dnnId":"data","hPlmnId":{"mcc":"123","mnc":"456"},"networkSlicingInfo":{"sNSSAI":{"sst":1}},"pduAddress":{"iPv6dynamicPrefixFlag":true,"pduIPv6AddresswithPrefix":"x:x:x:x::"},"pduSessionID":21,"pduType":"IPV6","ratType":"WLAN","servingCNPlmnId":{"mcc":"123","mnc":"456"},"sscMode":"SSC_MODE_1","startTime":"2025-11-10T14:29:29Z"},"userInformation":{"roamerInOut":"IN_BOUND","servedGPSI":"msisdn-12345678901"},"userLocationinfo":{"n3gaLocation":{"portNumber":4505,"ueIpv4Addr":"x.x.x.x"}}},"subscriberIdentifier":{"subscriberIdentityType":"SUPI","supi":"imsi-1234567891011121"}}}
master-2 42013075464a[2621]: 2023/09/10 15:00:00.063 [ERROR] [nrfClient.SendMesg.NRF] FHI status 504 timediff 1000332537, Uri: http://10.10.10.2:1090/OFFLINE/nchf-convergedcharging/v2, retryCount = 0 loopMaxRetry = 0, maxRetry = 0
master-2 42013075464a[2621]: 2023/09/10 15:00:00.063 [ERROR] [nrfClient.Discovery.nrf] Message send failed, response [Type:CHF Http2_Status:504 FailAction:FailureContinueAction MsgType:3587 ServiceName:nchf-convergedcharging SelectedProfileName:"CHF-OFF" FailureProfile:"Fail-H-CHF-OFF" GroupID:"CHF-*" ]
master-2 42013075464a[2621]: ***********************************************
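The FHI status line records the HTTP status, the elapsed time (timediff), and the endpoint URI. The sketch below extracts these fields so that you can match the failing peer against the show peers output. It is a hypothetical Python helper, not part of the SMF. It also assumes that timediff is in nanoseconds and that responsetimeout is in milliseconds. Under those assumptions, 1000332537 is about 1000.3 ms, just above the responsetimeout 1000 that the CHF profile configures. Confirm both units for your release before you rely on the comparison.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

_FHI = re.compile(
    r"FHI status (?P<status>\d+) timediff (?P<timediff>\d+), "
    r"Uri: (?P<uri>[^,\s]+), retryCount = (?P<retry>\d+)"
)


class FhiRecord(NamedTuple):
    status: int
    timediff: int
    peer: str         # host:port taken from the Uri field
    retry_count: int


def parse_fhi(line: str) -> Optional[FhiRecord]:
    """Extract the fields of an 'FHI status' log line, or None if absent."""
    m = _FHI.search(line)
    if m is None:
        return None
    uri = urlsplit(m["uri"])
    return FhiRecord(int(m["status"]), int(m["timediff"]),
                     f"{uri.hostname}:{uri.port}", int(m["retry"]))


def exceeded_timeout(timediff: int, responsetimeout_ms: int) -> bool:
    """ASSUMPTION: timediff is in nanoseconds, responsetimeout in milliseconds."""
    return timediff / 1_000_000 >= responsetimeout_ms
```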
SMF Checks
On the SMF, check the peers and their connected time for the endpoint that reported the issue.
smf# show peers
GR                                                             POD              CONNECTED       ADDITIONAL
INSTANCE  ENDPOINT  LOCAL ADDRESS  PEER ADDRESS     DIRECTION  INSTANCE   TYPE  TIME       RPC  DETAILS     INTERFACE NAME  VRF
--------------------------------------------------------------------------------------------------------------------------------------------------------------
1         <none>    192.168.1.1    10.10.10.2:1090  Outbound   rest-ep-0  Rest  4 hours    CHF  <none>      n40             NA
1         <none>    192.168.1.2    10.10.10.2:1090  Outbound   rest-ep-1  Rest  4 hours    CHF  <none>      n40             NA
1         <none>    192.168.1.3    10.10.10.1:1090  Outbound   rest-ep-2  Rest  4 hours    CHF  <none>      n40             NA
1         <none>    192.168.1.3    10.10.10.2:1090  Outbound   rest-ep-2  Rest  4 hours    CHF  <none>      n40             NA
1         <none>    192.168.1.4    10.10.10.1:1090  Outbound   rest-ep-3  Rest  4 hours    CHF  <none>      n40             NA
1         <none>    192.168.1.2    10.10.10.1:1090  Outbound   rest-ep-1  Rest  4 hours    CHF  <none>      n40             NA
1         <none>    192.168.1.4    10.10.10.2:1090  Outbound   rest-ep-3  Rest  2 hours    CHF  <none>      n40             NA
1         <none>    192.168.1.1    10.10.10.1:1090  Outbound   rest-ep-0  Rest  4 hours    CHF  <none>      n40             NA
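In this output, every rest-ep pod is connected to both CHF peers. However, the rest-ep-3 connection to 10.10.10.2:1090 has been up for only 2 hours, compared with 4 hours for the others, which can indicate that the connection was re-established. The following hypothetical Python helper flags such rows automatically. It assumes the column order shown above and connected-time values in the form "<number> <unit>". The unit table is an assumption and can need extension for other CLI formats.

```python
from typing import Dict, List, NamedTuple, Optional

# ASSUMPTION: connected time is printed as "<number> <unit>" with these units.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class PeerRow(NamedTuple):
    local: str
    peer: str
    pod: str
    rpc: str
    connected_s: int


def parse_peer_row(row: str) -> Optional[PeerRow]:
    """Parse one data row of `show peers`; return None for headers/separators."""
    t = row.split()
    # GR, endpoint, local, peer, direction, pod, type, time value, time unit,
    # RPC, additional details, interface, VRF -> 13 tokens.
    if len(t) != 13 or not t[7].isdigit():
        return None
    unit = UNIT_SECONDS.get(t[8].rstrip("s"))
    if unit is None:
        return None
    return PeerRow(t[2], t[3], t[5], t[9], int(t[7]) * unit)


def recently_reconnected(rows: List[str]) -> List[PeerRow]:
    """Rows connected for less time than the longest connection to the same peer."""
    parsed = [p for p in map(parse_peer_row, rows) if p is not None]
    longest: Dict[str, int] = {}
    for p in parsed:
        longest[p.peer] = max(longest.get(p.peer, 0), p.connected_s)
    return [p for p in parsed if p.connected_s < longest[p.peer]]
```

Pass the raw output lines, split into rows, to recently_reconnected. Header and separator lines are skipped because they do not parse as 13-field rows.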
// CHF-related profiles
profile network-element chf CHF-OFFLINE
 nf-client-profile CHF-OFF
 failure-handling-profile Fail-H-CHF-OFF
 discovery local
exit
// Configuration of the CHF profile in which all peers are dead
profile nf-client nf-type chf
 chf-profile CHF-OFF
  locality LOC1
   priority 1
   service name type nchf-convergedcharging
    responsetimeout 1000
    endpoint-profile epprof
     capacity 10
     api-root OFFLINE
     uri-scheme http
     version
      uri-version v2
      exit
     exit
     endpoint-name ep1
      priority 1
      capacity 10
      primary ip-address ipv4 10.10.10.1
      primary ip-address port 1090
     exit
     endpoint-name ep2
      priority 2
      capacity 10
      primary ip-address ipv4 10.10.10.2
      primary ip-address port 1090
     exit
    exit
   exit
  exit
 exit
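The URI in the FHI status log is built from the profile settings: uri-scheme, the endpoint IP address and port, api-root, the service name, and uri-version. The Python sketch below rebuilds it for each endpoint and orders the endpoints so that the preferred one comes first. It assumes that a lower priority value is preferred. That matches this example, where ep1 (10.10.10.1) is the primary and ep2 (10.10.10.2) is the secondary. The Endpoint class and the endpoint_uris function are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Endpoint:
    name: str
    priority: int
    ip: str
    port: int


def endpoint_uris(
    endpoints: List[Endpoint],
    scheme: str = "http",
    api_root: str = "OFFLINE",
    service: str = "nchf-convergedcharging",
    uri_version: str = "v2",
) -> List[Tuple[str, str]]:
    """Return (endpoint name, base URI) pairs, most preferred first.

    ASSUMPTION: a lower priority value means a more preferred endpoint.
    """
    ordered = sorted(endpoints, key=lambda ep: ep.priority)
    return [
        (ep.name, f"{scheme}://{ep.ip}:{ep.port}/{api_root}/{service}/{uri_version}")
        for ep in ordered
    ]


# Values copied from endpoint-profile epprof in chf-profile CHF-OFF.
EPPROF = [
    Endpoint("ep1", priority=1, ip="10.10.10.1", port=1090),
    Endpoint("ep2", priority=2, ip="10.10.10.2", port=1090),
]

for name, uri in endpoint_uris(EPPROF):
    print(name, uri)
```

The ep2 URI matches the Uri field of the FHI status line. The REST error log shows the operation path (/chargingdata) appended to it when the request is posted.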
// Failure handling: on timeout (HTTP 504), retry once toward the secondary server, then continue the session
profile nf-client-failure nf-type chf
 profile failure-handling Fail-H-CHF-OFF
  service name type nchf-convergedcharging
   responsetimeout 1000
   message type ChfConvergedchargingCreate
    status-code httpv2 504
     retry 1
     action continue
    exit
   exit
   message type ChfConvergedchargingUpdate
    status-code httpv2 504
     retry 1
     action continue
    exit
   exit
   message type ChfConvergedchargingDelete
    status-code httpv2 504
     retry 1
     action continue
    exit
   exit
  exit
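You can model the retry-then-continue behavior of Fail-H-CHF-OFF as a small loop. This is a simplified, hypothetical Python model, not the SMF implementation. It assumes that each retry goes to the next endpoint in priority order and that the configured action (continue) applies once the retries are used up. FailureRule and send_with_failover are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FailureRule:
    retry: int   # extra attempts to make (mirrors "retry 1")
    action: str  # what to do once retries are exhausted ("continue")


# Hypothetical mirror of Fail-H-CHF-OFF: the same rule applies to the
# Create, Update and Delete message types.
FAIL_H_CHF_OFF: Dict[int, FailureRule] = {504: FailureRule(retry=1, action="continue")}


def send_with_failover(
    endpoints: List[str],
    send: Callable[[str], int],
    rules: Dict[int, FailureRule],
) -> Tuple[str, List[Tuple[str, int]]]:
    """Try endpoints in priority order and return (outcome, attempts).

    `send(endpoint)` returns an HTTP status code; 504 stands for a timeout.
    ASSUMPTION: each retry goes to the next endpoint in the list.
    """
    attempts: List[Tuple[str, int]] = []
    rule: Optional[FailureRule] = None
    retries_left: Optional[int] = None
    for ep in endpoints:
        status = send(ep)
        attempts.append((ep, status))
        if 200 <= status < 300:
            return "success", attempts
        rule = rules.get(status)
        if rule is None:
            return "fail", attempts  # no failure handling for this status
        if retries_left is None:
            retries_left = rule.retry  # budget comes from the first matched rule
        if retries_left == 0:
            return rule.action, attempts
        retries_left -= 1
    # Endpoints ran out: apply the last matched action, if any.
    return (rule.action if rule else "fail"), attempts
```

With the two endpoints in this example and both timing out, the model tries 10.10.10.1, retries once on 10.10.10.2, and then returns continue. This is consistent with the FailureContinueAction in the logs.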
Grafana Checks
A direct correlation was observed between HTTP 504 (timeout) responses and the time of the issue.
Query: sum(increase(smf_restep_http_msg_total{nf_type="chf", namespace=~"$namespace"}[15m])) by (api_name, response_cause, response_status)
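To check the same data outside Grafana, you can send the query to the Prometheus HTTP API (GET /api/v1/query) and sum the results per response_status. The sketch below assumes that the SMF metrics are stored in a Prometheus server reachable at a base URL you supply. The URL in the test is a placeholder, and the sketch substitutes a concrete namespace for the Grafana $namespace variable. The network call is left out so that you can test the parsing on a saved response. The function names are hypothetical.

```python
import json
from typing import Dict
from urllib.parse import urlencode

# Same expression as the Grafana panel; {ns} replaces the $namespace variable.
QUERY = ('sum(increase(smf_restep_http_msg_total{{nf_type="chf", namespace=~"{ns}"}}[15m])) '
         'by (api_name, response_cause, response_status)')


def query_url(prometheus_base: str, namespace: str) -> str:
    """Build a Prometheus instant-query URL (GET /api/v1/query)."""
    return f"{prometheus_base}/api/v1/query?" + urlencode({"query": QUERY.format(ns=namespace)})


def totals_by_status(response_body: str) -> Dict[str, float]:
    """Sum the instant-vector samples per response_status label."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    totals: Dict[str, float] = {}
    for sample in payload["data"]["result"]:
        status = sample["metric"].get("response_status", "unknown")
        _, value = sample["value"]  # [timestamp, "value as string"]
        totals[status] = totals.get(status, 0.0) + float(value)
    return totals
```

A sharp rise in the "504" total around the time of the issue matches the correlation seen in Grafana.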
Nexus Checks
Check the Nexus switches for any BFD flaps around the time of the issue.
Nexus# show logging last 500 | include BFD
Solution
The resolution of this problem depends on the case, because the SMF is the client and the CHF is the server.
In this case, the loss of connection was not caused by the SMF.