This document describes how to troubleshoot the "OS-SHMWIN-2-ERROR_ENCOUNTERED" error on a Cisco IOS® XR router.
Examples of the error message are:
"%OS-SHMWIN-2-ERROR_ENCOUNTERED"
LC/0/0/CPU0:Dec 16 09:45:58 : fib_mgr[260]: %OS-SHMWIN-2-ERROR_ENCOUNTERED : SHMWIN: Error encountered: System memory state is severe, please check the availability of the system memory
LC/0/0/CPU0:Dec 16 09:45:39 : l2fib[328]: %OS-SHMWIN-2-ERROR_ENCOUNTERED : SHMWIN: Error encountered: System memory state is severe, please check the availability of the system memory
RP/0/RSP0/CPU0:Aug 11 21:15:47.174 IST: show_ip_interface[65961]: %OS-SHMWIN-2-ERROR_ENCOUNTERED : SHMWIN: Error encountered: 'shmwin' detected the 'fatal' condition 'mutex operation failed'
The error indicates that the system memory state is severe. Specifically, there is an issue with the shared memory, which stores dynamic data shared between multiple processes.
Start by identifying the linecard (or RP/RSP) and the top memory consumers.
The error message can embed a process name or even a command. However, when the memory condition is low, anything can fail; the process named in the message is not necessarily the culprit. You need to identify what causes the available memory to go low.
The linecard is indicated in the error message itself (LC/0/0/CPU0 in the first two examples). Use these commands to find the top consumers of the memory; a sample of the watchdog output follows the list.
show memory location 0/x/CPUx
show memory summary location 0/x/CPUx
show watchdog memory-state location 0/x/CPUx
show processes memory location 0/x/CPUx
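Of these, show watchdog memory-state gives the quickest read of the overall condition. This is a minimal illustration of its output, assuming a cXR node; the values are placeholders, and the key field is Memory State, which takes the same Normal/Minor/Severe/Critical values seen in the wdsysmon messages that follow:
RP/0/RSP0/CPU0:R1#show watchdog memory-state location 0/RSP0/CPU0
Memory information:
    Physical Memory: 6144.0 MB
    Free Memory: 2795.0 MB
    Memory State: Normal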
Note: Other error messages can indicate what the culprit processes are.
For example:
RP/0/RSP0/CPU0:Apr 24 11:34:33.599 EST: wdsysmon[450]: %HA-HA_WD-4-MEMORY_ALARM : Memory threshold crossed: Normal with 892.125MB free
RP/0/RSP0/CPU0:Apr 24 13:23:12.947 EST: wdsysmon[450]: %HA-HA_WD-4-MEMORY_ALARM : Memory threshold crossed: Minor with 819.199MB free
RP/0/RSP0/CPU0:Apr 24 14:32:10.086 EST: wdsysmon[450]: %HA-HA_WD-4-MEMORY_STATE_CHANGE : New memory state: Severe
RP/0/RSP0/CPU0:Apr 24 14:32:10.086 EST: wdsysmon[450]: %HA-HA_WD-4-TOP_MEMORY_USERS_WARNING : Top 5 consumers of system memory (671084 Kbytes free):
RP/0/RSP0/CPU0:Apr 24 14:32:10.086 EST: wdsysmon[450]: %HA-HA_WD-4-TOP_MEMORY_USER_WARNING : 0: Process Name: eth_server[61], pid: 57385, Heap usage 54632 Kbytes, Virtual Shared memory usage: 73116 Kbytes.
RP/0/RSP0/CPU0:Apr 24 14:32:10.086 EST: wdsysmon[450]: %HA-HA_WD-4-TOP_MEMORY_USER_WARNING : 1: Process Name: bgp[1051], pid: 553285, Heap usage 28556 Kbytes, Virtual Shared memory usage: 90512 Kbytes.
RP/0/RSP0/CPU0:Apr 24 14:32:10.087 EST: wdsysmon[450]: %HA-HA_WD-4-TOP_MEMORY_USER_WARNING : 2: Process Name: instdir[252], pid: 184387, Heap usage 24808 Kbytes, Virtual Shared memory usage: 24800 Kbytes.
RP/0/RSP0/CPU0:Apr 24 14:32:10.087 EST: wdsysmon[450]: %HA-HA_WD-4-TOP_MEMORY_USER_WARNING : 3: Process Name: parser_server[352], pid: 204908, Heap usage 21896 Kbytes, Virtual Shared memory usage: 4184784 Kbytes.
RP/0/RSP0/CPU0:Apr 24 14:32:10.087 EST: wdsysmon[450]: %HA-HA_WD-4-TOP_MEMORY_USER_WARNING : 4: Process Name: ipv6_rib[1144], pid: 549174, Heap usage 21600 Kbytes, Virtual Shared memory usage: 24688 Kbytes.
If the top process is BGP or another routing protocol, verify that no recent change in the network contributed to the memory growth.
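For example, with BGP you can compare the current prefix and route counts against your baseline. This is an illustration that assumes BGP is the process in question; use the equivalent commands for your protocol:
show bgp summary
show route summary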
Use these commands to get an overview of the memory used and to identify the top processes that consume it.
0/x/CPUx is the specific linecard in the error.
show memory summary location 0/x/CPUx
show shared-memory location 0/x/CPUx
show memory-top-consumers location 0/x/CPUx
show shmwin summary location 0/x/CPUx
Examples:
RP/0/RSP1/CPU0:R1#show memory summary location 0/RSP0/CPU0
node: node0_RSP0_CPU0
--------------------------------------
Physical Memory: 6144M total
Application Memory : 5738M (2795M available)
Image: 117M (bootram: 117M)
Reserved: 224M, IOMem: 0, flashfsys: 0
Total shared window: 76M
RP/0/RSP1/CPU0:R1#show memory summary location 0/RSP0/CPU0
node: node0_RSP0_CPU0
--------------------------------------
Physical Memory: 6144M total
Application Memory : 5738M (2797M available)
Image: 117M (bootram: 117M)
Reserved: 224M, IOMem: 0, flashfsys: 0
Total shared window: 76M
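In these two snapshots, available application memory is 2795M and then 2797M out of 5738M, so this node is healthy. When you chase a leak, run the command a few times and check whether the available figure keeps dropping.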
RP/0/RSP1/CPU0:R1#show shared-memory location 0/0/cpu0
Total Shared memory: 1527M
ShmWin: 236M
Image: 703M
LTrace: 353M
AIPC: 33M
SLD: 3M
SubDB: 1M
CERRNO: 144K
GSP-CBP: 64M
EEM: 0
XOS: 4M
CHKPT: 2M
CDM: 4M
XIPC: 594K
DLL: 64K
SysLog: 0
Miscellaneous: 119M
LTrace usage details:
Used: 353M, Max: 2075M
Current: default(dynamic)
Configured: dynamic with scale-factor: 8 (changes take effect after reload)
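In this capture, LTrace uses 353M of the 1527M of shared memory, with a Max of 2075M. If LTrace dominates this output, see the ltrace scale-factor guidance at the end of this document.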
RP/0/RP0/CPU0:R1#show memory-top-consumers location 0/RP0/CPU0
Execute 'show memory-snapshots process <> location <>' to check memory usage trend.
###################################################################
Top memory consumers on 0/RP0/CPU0 (at 2023/Nov/8/15:41:42)
###################################################################
PID Process Total(MB) Heap(MB) Shared(MB)
7366 mibd_interface 233.2 192.64 37.7
2552 spp 228.2 9.71 222.1
49132 bgp 225.9 83.62 165.9
4844 l2rib 211.8 21.12 190.1
2787 gsp 137.9 24.64 113.1
3869 mpls_lsd 122.8 12.85 107.8
3804 fib_mgr 121.0 13.43 108.7
2975 parser_server 116.7 66.39 44.6
6685 l2vpn_mgr 116.5 43.77 82.3
3310 dpa_port_mapper 114.8 2.96 110.2
RP/0/RSP1/CPU0:R1#show shmwin summary location 0/0/cpu0
----------------------------------------
Shared memory window summary information
----------------------------------------
Data for Window "subdb_sco_tbl":
-----------------------------
Virtual Memory size : 1536 MBytes
Virtual Memory Range : 0x7c000000 - 0xdc000000
Virtual Memory Group 2 size : 352 MBytes
Virtual Memory Group 2 Range : 0x66000000 - 0x7c000000
Window Name ID GRP #Usrs #Wrtrs Ownr Usage(KB) Peak(KB) Peak Timestamp
---------------- --- --- ----- ------ ---- --------- -------- -------------------
subdb_sco_tbl 70 1 1 1 158 3 0 --/--/---- --:--:--
Data for Window "ptp":
-----------------------------
ptp 131 P 1 1 0 35 35 10/18/2023 11:56:31
Data for Window "cfmd-sla":
-----------------------------
cfmd-sla 53 1 1 1 0 99 99 10/18/2023 11:56:20
Data for Window "cfmd":
-----------------------------
cfmd 36 1 1 1 0 99 99 10/18/2023 11:56:30
Data for Window "vkg_pbr_ea":
-----------------------------
vkg_pbr_ea 83 1 1 1 0 147 147 10/18/2023 11:56:27
Data for Window "span_ea_pd":
-----------------------------
span_ea_pd 40 1 1 1 362 34 34 10/18/2023 11:56:13
Data for Window "vkg_l2fib_vqi":
-----------------------------
vkg_l2fib_vqi 97 1 2 2 0 3 0 --/--/---- --:--:--
Data for Window "statsd_db":
-----------------------------
statsd_db 60 1 1 1 0 3 0 --/--/---- --:--:--
Data for Window "statsd_db_l":
-----------------------------
statsd_db_l 130 P 1 1 0 1131 1131 10/18/2023 11:56:17
Data for Window "arp":
-----------------------------
arp 20 1 1 1 0 227 227 10/18/2023 11:56:37
Data for Window "bm_lacp_tx":
-----------------------------
bm_lacp_tx 54 1 1 1 132 1 0 --/--/---- --:--:--
Data for Window "ether_ea_shm":
-----------------------------
ether_ea_shm 26 1 4 4 406 227 227 10/18/2023 11:56:27
Data for Window "vkg_l2fib_evpn":
-----------------------------
vkg_l2fib_evpn 100 1 3 3 0 3 0 --/--/---- --:--:--
Data for Window "l2fib":
-----------------------------
l2fib 14 1 10 10 262 45265 45265 11/08/2023 15:03:18
Data for Window "ether_ea_tcam":
-----------------------------
ether_ea_tcam 58 1 5 5 313 595 595 10/18/2023 11:55:55
Data for Window "vkg_vpls_mac":
-----------------------------
vkg_vpls_mac 35 1 3 3 0 6291 6291 10/25/2023 13:15:04
Data for Window "prm_stats_svr":
-----------------------------
prm_stats_svr 24 1 21 21 0 12419 12419 10/18/2023 11:56:24
Data for Window "prm_srh_main":
-----------------------------
prm_srh_main 66 1 31 31 0 60163 60163 10/18/2023 11:56:31
Data for Window "prm_tcam_mm_svr":
-----------------------------
prm_tcam_mm_svr 23 1 1 1 0 22067 22163 10/18/2023 12:04:59
Data for Window "prm_ss_lm_svr":
-----------------------------
prm_ss_lm_svr 65 1 1 1 0 3233 3233 10/18/2023 11:56:33
Data for Window "prm_ss_mm_svr":
-----------------------------
prm_ss_mm_svr 22 1 5 5 0 3867 3867 10/18/2023 11:55:52
Data for Window "vkg_gre_tcam":
-----------------------------
vkg_gre_tcam 63 1 2 2 388 35 35 10/18/2023 11:55:54
Data for Window "tunl_gre":
-----------------------------
tunl_gre 62 1 2 2 388 39 39 10/18/2023 11:55:38
Data for Window "pd_fib_cdll":
-----------------------------
pd_fib_cdll 28 1 1 1 0 35 35 10/18/2023 11:55:36
Data for Window "SMW_TEST_2":
-----------------------------
SMW_TEST_2 86 1 1 1 0 1067 1067 10/18/2023 11:55:35
Data for Window "ifc-mpls":
-----------------------------
ifc-mpls 13 1 18 18 188 7161 9057 11/02/2023 18:32:41
Data for Window "ifc-ipv6":
-----------------------------
ifc-ipv6 17 1 18 18 188 25249 25665 11/02/2023 18:33:13
Data for Window "ifc-ipv4":
-----------------------------
ifc-ipv4 16 1 18 18 188 24205 24893 10/31/2023 18:12:27
Data for Window "ifc-protomax":
-----------------------------
ifc-protomax 18 1 18 18 188 6057 6297 10/18/2023 11:56:06
Data for Window "bfd_offload_shm":
-----------------------------
bfd_offload_shm 94 1 1 1 0 2 0 --/--/---- --:--:--
Data for Window "netio_fwd":
-----------------------------
netio_fwd 34 1 1 1 0 0 0 --/--/---- --:--:--
Data for Window "mfwd_info":
-----------------------------
mfwd_info 1 1 2 2 254 1373 1373 10/18/2023 11:56:24
Data for Window "mfwdv6":
-----------------------------
mfwdv6 15 1 1 1 258 737 737 10/18/2023 11:55:57
Data for Window "vkg_bmp_adj":
-----------------------------
vkg_bmp_adj 30 1 2 2 129 235 235 10/18/2023 11:55:55
Data for Window "rewrite-db":
-----------------------------
rewrite-db 101 1 3 3 0 4115 4115 10/18/2023 11:55:32
Data for Window "inline_svc":
-----------------------------
inline_svc 88 1 1 1 0 755 755 10/18/2023 11:55:33
Data for Window "im_rd":
-----------------------------
im_rd 33 1 75 75 217 1131 1131 10/18/2023 11:55:32
Data for Window "ipv6_pmtu":
-----------------------------
ipv6_pmtu 98 1 1 1 256 3 0 --/--/---- --:--:--
Data for Window "im_db_private":
-----------------------------
im_db_private 129 P 1 1 0 1131 1131 10/18/2023 11:55:34
Data for Window "infra_ital":
-----------------------------
infra_ital 19 1 3 3 340 387 387 10/18/2023 11:55:41
Data for Window "infra_statsd":
-----------------------------
infra_statsd 8 1 5 5 370 3 0 --/--/---- --:--:--
Data for Window "ipv6_nd_pkt":
-----------------------------
ipv6_nd_pkt 128 P 1 1 0 107 107 10/18/2023 11:55:30
Data for Window "aib":
-----------------------------
aib 2 1 10 10 114 2675 2675 10/18/2023 11:56:42
Data for Window "vkg_pm":
-----------------------------
vkg_pm 5 1 34 1 313 307 307 11/03/2023 11:25:06
Data for Window "subdb_fai_tbl":
-----------------------------
subdb_fai_tbl 75 2 11 1 0 51 51 10/18/2023 11:55:26
Data for Window "subdb_ifh_tbl":
-----------------------------
subdb_ifh_tbl 74 2 2 1 0 35 35 10/18/2023 11:55:26
Data for Window "subdb_ao_tbl":
-----------------------------
subdb_ao_tbl 72 2 1 1 0 43 43 10/18/2023 11:55:26
Data for Window "subdb_do_tbl":
-----------------------------
subdb_do_tbl 73 2 11 1 0 35 35 10/18/2023 11:55:26
Data for Window "subdb_co_tbl":
-----------------------------
subdb_co_tbl 71 2 11 1 0 4107 4107 10/18/2023 11:55:26
Data for Window "rspp_ma":
-----------------------------
rspp_ma 3 1 14 14 0 3 0 --/--/---- --:--:--
Data for Window "cluster_dlm":
-----------------------------
cluster_dlm 61 1 26 26 0 3 0 --/--/---- --:--:--
Data for Window "pfm_node":
-----------------------------
pfm_node 29 1 1 1 0 195 195 10/18/2023 11:56:11
Data for Window "im_rules":
-----------------------------
im_rules 31 1 85 85 217 453 453 10/18/2023 11:55:32
Data for Window "im_db":
-----------------------------
im_db 32 1 85 1 0 2065 2065 10/18/2023 11:56:26
Data for Window "spp":
-----------------------------
spp 27 1 51 51 88 1403 1403 10/18/2023 11:56:29
Data for Window "qad":
-----------------------------
qad 6 1 1 1 0 134 134 01/01/1970 02:00:08
Data for Window "pcie-server":
-----------------------------
pcie-server 39 1 1 1 0 39 39 01/01/1970 02:00:07
---------------------------------------------
Total SHMWIN memory usage : 235 MBytes
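The total at the bottom (235 MBytes) corresponds to the ShmWin line of show shared-memory (236M in the earlier capture). Compare the Usage(KB) and Peak(KB) columns per window: a window whose usage keeps setting new peaks points to where the shared memory goes; in this capture, l2fib shows the most recent peak timestamp.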
Verify that no process leaks memory:
You can take a memory compare. It shows the increase or decrease of memory per process over a period of time that you specify: run the start command, wait, then run the end command and the report. This is an example; note the 'difference' column.
RP/0/RSP0/CPU0:R1#show memory compare start
Successfully stored memory snapshot /harddisk:/malloc_dump/memcmp_start.out
RP/0/RSP0/CPU0:R1#show memory compare end
Successfully stored memory snapshot /harddisk:/malloc_dump/memcmp_end.out
RP/0/RSP0/CPU0:R1#show memory compare report
JID name mem before mem after difference mallocs restart/exit/new
--- ---- ---------- --------- ---------- ------- ----------------
376 parser_server 32069512 32070976 1464 1
463 sysdb_svr_local 10064204 10065084 880 20
459 sysdb_shared_nc 4103104 4103560 456 12
66013 exec 209964 210052 88 3
1241 xtc_agent 4796436 4796432 -4 0
1087 bgp 51646552 51646120 -432 -3
457 sysdb_mc 5094852 5094188 -664 -8
358 netio 19185724 19183804 -1920 -45
334 lpts_pa 76234948 76228484 -6464 -97
1031 ospf 9107084 9098232 -8852 -1
476 tcp 5725148 5708444 -16704 -8
254 gsp 9473460 9424452 -49008 14
1153 mdtd 25206084 24750076 -456008 -25
You are now free to remove snapshot memcmp_start.out and memcmp_end.out under /harddisk:/malloc_dump
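Leave enough time between the start and end snapshots for a trend to stand out; as a rule of thumb (an assumption, not a fixed interval), think hours rather than minutes for a slow leak. In the report shown, no process grows significantly: parser_server gains only 1464 bytes and several processes actually shrink.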
If ltrace is one of the top memory consumers, consider lowering the amount of memory it uses.
This document describes how to configure ltrace to take less memory: Configure ltrace Scale Factors on ASR9K Route Processors and Line Cards for Efficient Memory Management
If you did not find the solution to the problem in this document, provide this output:
0/x/CPUx is the specific linecard in the error. The Job ID (JID) of the process can be found with the command show processes (see the example after the list).
show tech-support
show hw-module fpd
show memory location 0/x/CPUx
show memory summary location all
show watchdog memory-state location all
show watchdog trace location all
show processes memory location all
show shmwin all header location 0/x/CPUx
show shmwin all bands location 0/x/CPUx
show shmwin all banks location 0/x/CPUx
show shmwin all list all location 0/x/CPUx
show shmwin all malloc-stats location 0/x/CPUx
show shmwin all mutex location 0/x/CPUx
show shmwin all participants all-stats location 0/x/CPUx
show shmwin all pool all-pools location 0/x/CPUx
show shmwin trace all location all
show memory <job id process> location 0/x/CPUx
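For example, a minimal illustration of finding a JID, using the BGP process from the earlier wdsysmon log (output shortened; the Job Id field appears near the top of show processes <name>):
RP/0/RSP0/CPU0:R1#show processes bgp | include Job Id
                  Job Id: 1051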
Revision | Publish Date | Comments
---------|--------------|----------------
1.0      | 01-Dec-2023  | Initial Release