The documentation set for this product strives to use bias-free language. For the purposes of this documentation set, bias-free is defined as language that does not imply discrimination based on age, disability, gender, racial identity, ethnic identity, sexual orientation, socioeconomic status, and intersectionality. Exceptions may be present in the documentation due to language that is hardcoded in the user interfaces of the product software, language used based on RFP documentation, or language that is used by a referenced third-party product. Learn more about how Cisco is using Inclusive Language.
This document is intended to present a structured troubleshooting methodology and useful tools to help identify and isolate Group Encrypted Transport VPN (GETVPN) problems and to provide possible solutions.
Cisco recommends that you have knowledge of these topics:
This document is not restricted to specific software and hardware versions.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.
As with most troubleshooting of complex technology problems, the key is to be able to isolate the problem to a specific feature, subsystem, or component. The GETVPN solution is comprised of a number of feature components, specifically:
It also provides an extensive set of troubleshooting tools in order to ease the troubleshoot process. It is important to understand which of these tools are available, and when they are appropriate for each troubleshooting task. When troubleshooting, it is always a good idea to start with the least intrusive methods so that the production environment is not negatively impacted. The key to this structured troubleshooting is to be able to break the problem down to either a control or data plane issue. You can do this if you follow the protocol or data flow and use the various tools presented here in order to checkpoint them.
This GETVPN topology and addressing scheme is used throughout the rest of this troubleshooting document.
crypto gdoi group G1
identity number 3333
server local
rekey authenmypubkeyrsa get
rekey transport unicast
sa ipsec 1
profile gdoi-p
match address ipv4ENCPOL
address ipv4 10.1.11.2
redundancy
local priority 10
peer address ipv4 10.1.12.2
crypto gdoi group G1
identity number 3333
server address ipv4 10.1.11.2
server address ipv4 10.1.12.2
!
crypto map gm_map 10 gdoi
set group G1
!
interface Serial1/0
crypto map gm_map
Note: The KS2 and GM2 configurations are not included here for brevity.
Before you begin to troubleshoot, ensure that you have prepared the logging facility as described here. Some best practices are also listed here:
service timestamps debug datetime msec
service timestamps log datetime msec
Router#terminal exec prompt timestamp
Control plane means all the protocol events that led up to the policy and Security Association (SA) creation on the GM so that they are ready to encrypt and decrypt data plane traffic. Some of the key checkpoints in the GETVPN control plane are:
These troubleshooting best practices are not GETVPN specific; they apply to almost any control plane debugging. It is critical to follow these best practices in order to ensure the most effective troubleshooting:
service timestamp debug datetime msec
service timestamp log datetime msec
terminal exec prompt timestamp
As a general rule, these are the command outputs you should collect for almost all GETVPN problems.
KS
show crypto gdoi
show crypto gdoi ks coop
show crypto gdoi ks members
show crypto gdoi ks rekey
show crypto gdoi ks policy
GM
show crypto eli
show crypto gdoi rekey sa
show crypto gdoi
show crypto gdoi gm
show crypto gdoi gm rekey
GETVPN provides an extensive set of syslog messages for significant protocol events and error conditions. The syslog should always be the first place to look when you perform GETVPN troubleshooting.
Syslog Messages | Explanation |
---|---|
COOP_CONFIG_MISMATCH |
The configuration between the primary key server and secondary key server is mismatched. |
COOP_KS_ELECTION |
The local key server has entered the election process in a group. |
COOP_KS_REACH |
The reachability between the configured cooperative key servers is restored. |
COOP_KS_TRANS_TO_PRI | The local key server transitioned to a primary role from being a secondary server in a group. |
COOP_KS_UNAUTH |
An authorized remote server tried to contact the local key server in a group, which could be considered a hostile event. |
COOP_KS_UNREACH |
The reachability between the configured cooperative key servers is lost, which could be considered a hostile event. |
KS_GM_REVOKED |
During rekey protocol, an unauthorized member tried to join a group, which could be considered a hostile event. |
KS_SEND_MCAST_REKEY |
Sending multicast rekey. |
KS_SEND_UNICAST_REKEY |
Sending unicast rekey. |
KS_UNAUTHORIZED |
During GDOI registration protocol, an unauthorized member tried to join a group, which could be considered a hostile event. |
UNAUTHORIZED_IPADDR |
The registration request was dropped because the requesting device was not authorized to join the group. |
Syslog Messages | Explanation |
---|---|
GM_CLEAR_REGISTER |
The clear crypto gdoi command has been executed by the local group member. |
GM_CM_ATTACH | A crypto map has been attached for the local group member. |
GM_CM_DETACH | A crypto map has been detached for the local group member.& |
GM_RE_REGISTER | IPsec SA created for one group might have been expired or cleared. Need to reregister to the key server. |
GM_RECV_REKEY | Rekey received. |
GM_REGS_COMPL | Registration complete. |
GM_REKEY_TRANS_2_MULTI | Group member has transitioned from using a unicast rekey mechanism to using a multicast mechanism. |
GM_REKEY_TRANS_2_UNI | Group member has transitioned from using a multicast rekey mechanism to using a unicast mechanism. |
PSEUDO_TIME_LARGE | A group member has received a pseudotime with a value that is largely different from its own pseudotime. |
REPLAY_FAILED |
A group member or key server has failed an anti-replay check. |
Note: The messages highlighted in red are the most common or significant messages seen in a GETVPN environment.
GETVPN debugs are divided:
F340.06.15-2900-18#debug cry gdoi ?
all-features All features in GDOI
condition GDOI Conditional Debugging
gm Group Member
ks Key Server
GM1#debug cry gdoi gm ?
all-features All Group Member features
infrastructure GM Infrastructure
registration GM messages related to registration
rekey GM messagese related to Re-Key
replay Anti Replay
GM1#debug cry gdoi gm all-features ?
all-levels All levels
detail Detail level
error Error level
event Event level
packet Packet level
terse Terse level
Debug Level | What You Will Get |
---|---|
Error | Error conditions |
Terse | Important messages to the user and protocol issues |
Event | State transitions and events such as send and receive rekeys |
Detail | Most detailed debug message information |
Packet | Includes dump of detailed packet information |
All | All of the above |
In Cisco IOS® Version 15.1(3)T and later, GDOI conditional debugging was added in order to help troubleshoot GETVPN in a large-scale environment. So all Internet Security Association and Key Management Protocol (ISAKMP) and GDOI debugs can now be triggered with a conditional filter based on the group or peer IP address. For most GETVPN problems, it is good to enable both ISAKMP and GDOI debugs with the appropriate conditional filter, since GDOI debugs only show GDOI-specific operations. In order to use ISAKMP and GDOI conditional debugs, complete these two simple steps:
For example:
KS1# debug crypto gdoi condition peer add ipv4 10.1.20.2
% GDOI Debug Condition added.
KS1#
KS1# show crypto gdoi debug-condition
GDOI Conditional Filters:
Peer Address 10.1.20.2
Unmatched NOT set
KS1#debug crypto gdoi ks registration all-levels
GDOI Key Server Registration Debug level: (Packet, Detail, Event, Terse, Error)
Note: With both ISAKMP and GDOI conditional debugs, in order to catch debug messages that might not have the conditional filter information, for example the IP address in the debug path, the unmatched flag can be enabled. However, this must be used with caution because it can produce a large amount of debug information.
This was added in Version 15.1(3)T. Event tracing offers light-weight, always-on tracing for significant GDOI events and errors. There is also exit-path tracing with traceback enabled for exception conditions. Event traces can provide more GETVPN event history information than traditional syslogs.
GDOI event traces are enabled by default and can be retrieved from the trace buffer with the show monitor even-trace command.
GM1#show monitor event-trace gdoi ?
all Show all the traces in current buffer
back Show trace from this far back in the past
clock Show trace from a specific clock time/date
coop GDOI COOP Event Traces
exit GDOI Exit Traces
from-boot Show trace from this many seconds after booting
infra GDOI INFRA Event Traces
latest Show latest trace events since last display
merged Show entries in all event traces sorted by time
registration GDOI Registration event Traces
rekey GDOI Rekey event Traces
GM1#show monitor event-trace gdoi rekey all
*Nov 6 15:55:16.117: GDOI_REKEY_EVENT: ACK_SENT: From 10.1.12.2 to 10.1.13.2
with seq no 1 for the group G1
*Nov 6 15:55:16.117: GDOI_REKEY_EVENT: REKEY_RCVD: From 10.1.12.2 to 10.1.13.2
with seq no 1 for the group G1
*Nov 6 16:11:01.125: GDOI_REKEY_EVENT: ACK_SENT: From 10.1.12.2 to 10.1.13.2
with seq no 1 for the group G1
*Nov 6 16:11:01.125: GDOI_REKEY_EVENT: REKEY_RCVD: From 10.1.12.2 to 10.1.13.2
with seq no 1 for the group G1
The exit path trace provides detailed information about exit path, that is exception and error conditions, with the traceback option enabled by default. The tracebacks can then be used in order to decode the exact code sequence that has led to the exit path condition. Use the detail option in order to retrieve the tracebacks from the trace buffer:
GM1#show monitor event-trace gdoi exit all detail
*Nov 6 15:15:25.611: NULL_VALUE_FOUND:Invalid GROUP Name
-Traceback= 0xCA51318z 0xCA1F4DBz 0xC9B2707z 0xCA1ED4Ez 0x97EB018z
0x97EA960z 0x97E8D62z 0x97F3706z 0x97F3361z 0xA02684Ez
*Nov 6 15:15:25.611: MAP_NOT_APPLIED_IN_ANY_INTERFACE:
-Traceback= 0xCA51318z 0xCA46718z 0xCA1EF79z 0x97EB018z 0x97EA960z
0x97E8D62z 0x97F3706z 0x97F3361z 0xA02684Ez 0xA01FD52z
*Nov 6 15:15:25.650: NULL_VALUE_FOUND:NULL Parameters passed idb or ipaddress
when idb ipaddress is changed
-Traceback= 0xCA51318z 0xCA22430z 0xA09A8DCz 0xA09D8F6z 0xA0F280Fz
0xBA1D1F4z 0xBA1CACCz 0xBA1C881z 0xBA1C5BBz 0xA0F494Az
The default trace buffer size is 512 entries, and this might not be enough if the problem is intermittent. In order to increase this default trace entry size, the event trace configuration parameters can be changed like shown here:
GM1#show monitor event-trace gdoi rekey parameters
Trace has 512 entries
Stacktrace is disabled by default
GM1#
GM1#config t
Enter configuration commands, one per line. End with CNTL/Z.
GM1(config)#monitor event-trace gdoi rekey size ?
<1-1000000> Number of entries in trace
Here are some of the common control plane issues for GETVPN. To re-iterate, the Control Plane is defined as all of the GETVPN feature components required in order to enable dataplane encryption and decryption on the GMs. At a high level, this requires successful GM registration, security policy and SA download/install, and subsequent KEK/TEK rekey.
In order to check and verify that the KS has successfully created the security policy and the associated KEK/TEK, enter:
KS1#show crypto gdoi ks policy
Key Server Policy:
For group G1 (handle: 2147483650) server 10.1.11.2 (handle: 2147483650):
For group G1 (handle: 2147483650) server 10.1.12.2 (handle: 2147483651):
# of teks : 1 Seq num : 10
KEK POLICY (transport type : Unicast)
spi : 0x18864836BA888BCD1126671EEAFEB4C7
management alg : disabled encrypt alg : 3DES
crypto iv length : 8 key size : 24
orig life(sec): 1200 remaining life(sec): 528
sig hash algorithm : enabled sig key length : 162
sig size : 128
sig key name : key1
TEK POLICY (encaps : ENCAPS_TUNNEL)
spi : 0x91E3985A
access-list : ENCPOL
transform : esp-null esp-sha-hmac
alg key size : 0 sig key size : 20
orig life(sec) : 900 remaining life(sec) : 796
tek life(sec) : 2203 elapsed time(sec) : 1407
override life (sec): 0 antireplay window size: 4
Replay Value 442843.29 secs
One common problem with the KS policy setup is when there are different policies configured between the primary and secondary KSs. This can result in unpredictable KS behavior and this error will be reported:
%GDOI-3-COOP_CONFIG_MISMATCH: WARNNING: replay method configuration between
Primary KS and Secondary KS are mismatched
Currently there is no automatic configuration sync between primary and secondary KSs, so these must be manually rectified.
Because COOP is a critical (and almost always mandatory) configuration for GETVPN, it is key to make sure COOP works correctly and the COOP KS roles are correct:
KS1#show crypto gdoi ks coop
Crypto Gdoi Group Name :G1
Group handle: 2147483650, Local Key Server handle: 2147483650
Local Address: 10.1.11.2
Local Priority: 200
Local KS Role: Primary , Local KS Status: Alive
Local KS version: 1.0.4
Primary Timers:
Primary Refresh Policy Time: 20
Remaining Time: 10
Antireplay Sequence Number: 40
Peer Sessions:
Session 1:
Server handle: 2147483651
Peer Address: 10.1.12.2
Peer Version: 1.0.4
Peer Priority: 100
Peer KS Role: Secondary , Peer KS Status: Alive
Antireplay Sequence Number: 0
IKE status: Established
Counters:
Ann msgs sent: 31
Ann msgs sent with reply request: 2
Ann msgs recv: 64
Ann msgs recv with reply request: 1
Packet sent drops: 7
Packet Recv drops: 0
Total bytes sent: 20887
Total bytes recv: 40244
In a functional COOP setup, this protocol flow should be observed:
IKE Exchange > ANN with COOP priorities exchanged > COOP Election > ANN from primary to secondary KS (policy, GM database, and keys)
When COOP does not work correctly, or if there is a COOP split, such as multiple KSs become the primary KS, these debugs must be collected for troubleshooting:
debug crypto isakmp
debug crypto gdoi ks coop all-levels
show crypto isakmp sa
show crypto gdoi ks coop
Successful IKE exchange is required for GETVPN in order to secure the control channel for the subsequent policy and SA download. At the end of the successful IKE exchange, a GDOI_REKEY sa is created.
In versions earlier than Cisco IOS 15.4(1)T, the GDOI_REKEY can be shown with the show crypto isakmp sa command:
GM1#show crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst src state conn-id status
10.1.13.2 10.1.11.2 GDOI_REKEY 1075 ACTIVE
10.1.11.2 10.1.13.2 GDOI_IDLE 1074 ACTIVE
IPv6 Crypto ISAKMP SA
GM1#
In Cisco IOS 15.4(1)T and later, this GDOI_REKEY sa is shown with the show crypto gdoi rekey sa command:
GM1#show crypto gdoi rekey sa GETVPN REKEY SA dst src conn-id status 10.1.13.2 10.1.11.2 1114 ACTIVE
Note: Once the initial IKE exchange completes, subsequent policies and keys will be pushed from the KS to the GM with the use of the GDOI_REKEY SA. So there is no rekey for the GDOI_IDLE SA when they expire; they disappear when their lifetimes expire. However, there should always be GDOI_REKEY SA on the GM in order for it to receive rekeys.
The IKE exchange for GETVPN is no different from the IKE used in traditional point-to-point IPsec tunnels, so the troubleshooting method remains the same. These debugs must be collected in order to troubleshoot IKE authentication issues:
debug crypto isakmp
debug crypto isakmp error
debug crypto isakmp detail (hidden command, if detailed isakmp exchange information
is needed)
debug crypto isakmp packet (hidden command, if packet level isakmp information is needed)
Once IKE authentication succeeds, GM registers with the KS. These syslog messages are expected to be seen when this occurs correctly:
%GDOI-5-GM_REKEY_TRANS_2_UNI: Group G1 transitioned to Unicast Rekey.
%GDOI-5-SA_KEK_UPDATED: SA KEK was updated
%GDOI-5-SA_TEK_UPDATED: SA TEK was updated
%GDOI-5-GM_REGS_COMPL: Registration to KS 10.1.12.2 complete for group G1 using
address 10.1.13.2
%GDOI-5-GM_INSTALL_POLICIES_SUCCESS: SUCCESS: Installation of Reg/Rekey policies
from KS 10.1.12.2 for group G1 & gm identity 10.1.13.2
The policy and keys can be verified with this command:
GM1#show crypto gdoi
GROUP INFORMATION
Group Name : G1
Group Identity : 3333
Crypto Path : ipv4
Key Management Path : ipv4
Rekeys received : 1
IPSec SA Direction : Both
Group Server list : 10.1.11.2
10.1.12.2
Group member : 10.1.13.2 vrf: None
Version : 1.0.4
Registration status : Registered
Registered with : 10.1.12.2
Re-registers in : 139 sec
Succeeded registration: 1
Attempted registration: 1
Last rekey from : 10.1.11.2
Last rekey seq num : 0
Unicast rekey received: 1
Rekey ACKs sent : 1
Rekey Rcvd(hh:mm:ss) : 00:05:20
allowable rekey cipher: any
allowable rekey hash : any
allowable transformtag: any ESP
Rekeys cumulative
Total received : 1
After latest register : 1
Rekey Acks sents : 1
ACL Downloaded From KS 10.1.11.2:
access-list deny icmp any any
access-list deny eigrp any any
access-list deny ip any 224.0.0.0 0.255.255.255
access-list deny ip 224.0.0.0 0.255.255.255 any
access-list deny udp any port = 848 any port = 848
access-list permit ip any any
KEK POLICY:
Rekey Transport Type : Unicast
Lifetime (secs) : 878
Encrypt Algorithm : 3DES
Key Size : 192
Sig Hash Algorithm : HMAC_AUTH_SHA
Sig Key Length (bits) : 1024
TEK POLICY for the current KS-Policy ACEs Downloaded:
Serial1/0:
IPsec SA:
spi: 0x8BF147EF(2347845615)
transform: esp-3des esp-sha-hmac
sa timing:remaining key lifetime (sec): (200)
Anti-Replay(Time Based) : 4 sec interval
GM1#
GM1#
GM1#show crypto ipsec sa
interface: Serial1/0
Crypto map tag: gm1map, local addr 10.1.13.2
protected vrf: (none)
local ident (addr/mask/prot/port): (0.0.0.0/0.0.0.0/0/0)
remote ident (addr/mask/prot/port): (0.0.0.0/0.0.0.0/0/0)
current_peer 0.0.0.0 port 848
PERMIT, flags={}
#pkts encaps: 0, #pkts encrypt: 0, #pkts digest: 0
#pkts decaps: 0, #pkts decrypt: 0, #pkts verify: 0
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 0, #pkts compr. failed: 0
#pkts not decompressed: 0, #pkts decompress failed: 0
#send errors 0, #recv errors 0
local crypto endpt.: 10.1.13.2, remote crypto endpt.: 0.0.0.0
path mtu 1500, ip mtu 1500, ip mtu idb Serial1/0
current outbound spi: 0x0(0)
PFS (Y/N): N, DH group: none
local crypto endpt.: 10.1.13.2, remote crypto endpt.: 0.0.0.0
path mtu 1500, ip mtu 1500, ip mtu idb Serial1/0
current outbound spi: 0x8BF147EF(2347845615)
PFS (Y/N): N, DH group: none
inbound esp sas:
spi: 0x8BF147EF(2347845615)
transform: esp-3des esp-sha-hmac ,
in use settings ={Tunnel, }
conn id: 1, flow_id: SW:1, sibling_flags 80000040, crypto map: gm1map
sa timing: remaining key lifetime (sec): (192)
Kilobyte Volume Rekey has been disabled
IV size: 8 bytes
replay detection support: Y replay window size: 4
Status: ACTIVE(ACTIVE)
inbound ah sas:
inbound pcp sas:
outbound esp sas:
spi: 0x8BF147EF(2347845615)
transform: esp-3des esp-sha-hmac ,
in use settings ={Tunnel, }
conn id: 2, flow_id: SW:2, sibling_flags 80000040, crypto map: gm1map
sa timing: remaining key lifetime (sec): (192)
Kilobyte Volume Rekey has been disabled
IV size: 8 bytes
replay detection support: Y replay window size: 4
Status: ACTIVE(ACTIVE)
outbound ah sas:
outbound pcp sas:
GM1#
Note: With GETVPN, inbound and outbound SAs use the same SPI.
With GETVPN registration and policy install type of problems, these debugs are needed in order to troubleshoot:
debug crypto isakmp (KS and GM)
debug crypto gdoi ks registration all-levels (KS)
debug crypto gdoi gm registration all-level (GM)
debug crypto engine (GM only)
show crypto eli detail (multiple iterations on GM)
Note: Additional debugs may be required depending on the outcome of these outputs.
Since GETVPN registration typically occurs immediately after the GM reload, this EEM script might be helpful in order to collect these debugs:
event manager applet debug
event syslog pattern "RESTART"
action 1.0 cli command "enable"
action 2.0 cli command "debug crypto gdoi all all"
Once the GMs are registered to the KS and the GETVPN network is properly set up, the primary KS is responsible for sending rekey messages to all the GMs registered to it. The rekey messages are used in order to synchronize all the policies, keys, and pseudotimes on the GMs. The rekey messages can be sent through a unicast or a multicast method.
This syslog message is seen on the KS when the rekey message is sent:
%GDOI-5-KS_SEND_UNICAST_REKEY: Sending Unicast Rekey for group G1 from address
10.1.11.2 with seq # 11
On the GMs, this is the syslog that is seen when it receives the rekey:
%GDOI-5-GM_RECV_REKEY: Received Rekey for group G1 from 10.1.11.2 to 10.1.20.2
with seq # 11
Rekey functionality requires the presence of RSA keys on the KS. The KS provides the public key of the RSA key pair to the GM through this secure channel during registration. The KS then signs the GDOI messages sent to the GM with the private RSA key in the GDOI SIG payload. The GM receives the GDOI messages and uses the public RSA key in order to verify the message. The messages between the KS and the GM are encrypted with the KEK, which is also distributed to the GM during registration. Once the registration is complete, subsequent rekeys are encrypted with the KEK and signed with the private RSA key.
If the RSA key is no present on the KS during GM registration, this message appears on the syslog:
%GDOI-1-KS_NO_RSA_KEYS: RSA Key - get : Not found, Required for group G1
When the keys are not present on the KS, the GM registers for the first time, but the next rekey fails from the KS. Eventually the existing keys on the GM expire, and it reregisters again.
%GDOI-4-GM_RE_REGISTER: The IPSec SA created for group G1 may have expired/been
cleared, or didn't go through. Re-register to KS.
Since the RSA key pair is used in order to sign the rekey messages, they MUST be the same between the primary and all secondary KSs. This ensures that during a primary KS failure, the rekeys sent by a secondary KS (the new primary KS) can still be properly validated by the GMs. When it generates the RSA key pair on the primary KS, the key pair must be created with the exportable option so that they can be exported to all the secondary KSs in order to meet this requirement.
KEK/TEK rekey failure is one of the most common GETVPN problems encountered in customer deployments. Troubleshooting rekey issues should follow the rekey steps as outlined here:
KS1#show crypto gdoi ks rekey
Group G1 (Unicast)
Number of Rekeys sent : 341
Number of Rekeys retransmitted : 0
KEK rekey lifetime (sec) : 1200
Remaining lifetime (sec) : 894
Retransmit period : 10
Number of retransmissions : 5
IPSec SA 1 lifetime (sec) : 900
Remaining lifetime (sec) : 405
KS1#show crypto gdoi ks members
Group Member Information :
Number of rekeys sent for group G1 : 346
Group Member ID : 10.1.14.2 GM Version: 1.0.4
Group ID : 3333
Group Name : G1
Key Server ID : 10.1.11.2
Rekeys sent : 346
Rekeys retries : 0
Rekey Acks Rcvd : 346
Rekey Acks missed : 0
Sent seq num : 2 1 2 1
Rcvd seq num : 2 1 2 1
Group Member ID : 10.1.13.2 GM Version: 1.0.4
Group ID : 3333
Group Name : G1
Key Server ID : 10.1.12.2
Rekeys sent : 340
Rekeys retries : 0
Rekey Acks Rcvd : 340
Rekey Acks missed : 0
Sent seq num : 2 1 2 1
Rcvd seq num : 2 1 2 1
GM1#show crypto gdoi gm rekey
Group G1 (Unicast)
Number of Rekeys received (cumulative) : 340
Number of Rekeys received after registration : 340
Number of Rekey Acks sent : 340
Multicast rekey is different from unicast rekey in these aspects:
The most commonly seen multicast rekey problem is when the rekey is not received on the GM. There could be a number of possible causes for this, such as:
The first step to troubleshoot an issue with multicast rekey is to see if rekey works when switched from the multicast to the unicast method.
Once you identify that the issue is specific to multicast rekey, verify that KS sends the rekey to the multicast address specified.
%GDOI-5-KS_SEND_MCAST_REKEY: Sending Multicast Rekey for group G1 from address
10.1.11.2 to 226.1.1.1 with seq # 6
Test multicast connectivity between the KS and GM with an Internet Control Message Protocol (ICMP) request to the multicast address. All the GMs that are part of the multicast group should reply to the ping. Ensure that ICMP is excluded from the KS encryption policy for this test.
KS1#ping 226.1.1.1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 226.1.1.1, timeout is 2 seconds:
Reply to request 0 from 10.1.21.2, 44 ms
If the multicast ping test fails, then multicast troubleshooting must be performed, which is outside of the scope of this document.
When customers upgrade their GM to a new Cisco IOS version, they might experience KEK rekey failures with this message observed in the syslog:
%GDOI-3-GDOI_REKEY_SEQ_FAILURE: Failed to process rekey seq # 1 in seq payload for
group G1, last seq # 11
%GDOI-3-GDOI_REKEY_FAILURE: Processing of REKEY payloads failed on GM 10.1.13.2 in the group G1,
with peer at 10.1.11.2
%CRYPTO-6-IKMP_MODE_FAILURE: Processing of GDOI mode failed with peer at 10.1.11.2
This behavior is caused by an interoperability issue introduced with the anti-replay check that is added for control plane messages. Specifically, a KS that runs the older code will reset the KEK rekey sequence number to 1, and this will be dropped by the GM that runs the new code when it interprets that as a replayed rekey packet. For more details, see Cisco bug ID CSCta05809 (GETVPN: GETVPN control-plane sensible to replay), and GETVPN Configuration Restrictions.
With GETVPN, the Control Plane messages can carry time-sensitive information in order to provide the time-based anti-replay check service. Therefore, these messages require anti-replay protection themselves in order to ensure time accruracy. These messages are:
As part of this anti-replay protection implementation, sequence number checks were added in order to protect replayed messages, as well as a pseudotime check when TBAR is enabled.
Solution
In order to resolve this issue, both the GM and KS must be upgraded to Cisco IOS versions after the Control Plane replay check feature. With the new Cisco IOS code, KS does not reset the sequence number back to 1 for a KEK rekey, but instead it continues to use the current sequence number and only resets the sequence number for TEK rekeys.
These Cisco IOS versions have the Replay Check features:
For other Control Plane Replay failures, collect this information and make sure the times are synched between the KS and GM.
With GETVPN, Control Plane Packet fragmentation is a common issue, and it can manifest itself in one of these two scenarios when the Control Plane packets are large enough that they will require IP fragmentation:
COOP Announcement Packets
The COOP Announcement packets carry the GM database information, and thus can grow big in a large GETVPN deployment. From past experience, a GETVPN network that consists of 1500+ GMs will produce Announcement packets larger than 18024 bytes, which is the Cisco IOS default Huge buffer size. When this happens, the KS fails to allocate a buffer large enough to transmit the ANN packets with this error:
%SYS-2-GETBUF: Bad getbuffer, bytes= 18872 -Process= "Crypto IKMP", ipl= 0, pid= 183
In order to rectify this condition, this buffer tuning is recommended:
buffers huge permanent 10
buffers huge size 65535
GETVPN rekey packets can also exceed the typical 1500 IP Maximum Transition Unit (MTU) size when the encryption policy is large, such as a policy that consists of 8+ lines of Access Control Entries (ACEs) in the encryption ACL.
In both of the previous scenarios, GETVPN must be able to properly transmit and receive the fragmented UDP packets in order for COOP or GDOI rekey to work properly. IP fragmentation can be a problem in some network environments. For example, a network that consists of Equal Cost Multi Path (ECMP) forwarding plane, and some devices in the forwarding plane require virtual reassembly of the fragmented IP packets, such as Virtual Fragmentation Reassembly (VFR).
In order to identify the problem, check the reassembly errors on the device where it is suspected that the fragmented UDP 848 packets are not properly received:
KS1#show ip traffic | section Frags
Frags: 10 reassembled, 3 timeouts, 0 couldn't reassemble
0 fragmented, 0 fragments, 0 couldn't fragment
If the reassembly timeouts continue to increment, use the debug ip error command in order to confirm if the drop is part of the rekey/COOP packet flow. Once confirmed, normal IP forwarding troubleshooting should be performed in order to isolate the exact device in the forwarding plane that might have dropped the packets. Some commonly used tools include:
Various interoperability issues have been found with GETVPN over the years, and it is critical to notice the Cisco IOS release versions between KS and GM and amongst the KSs for interoperability issues.
Other well known GETVPN interoperability issues are:
This Cisco IOS upgrade procedure should be followed when a Cisco IOS code upgrade needs to be performed in a GETVPN environment:
Compared to Control Plane problems, GETVPN data plane issues are problems where the GM has the policy and keys to perform dataplane encryption and decryption, but for some reason the end-to-end traffic flow does not work. Most of the dataplane issues for GETVPN relate to generic IPsec forwarding, and are not GETVPN specific. So most of the troubleshooting approach described here applies to generic IPsec dataplane issues as well.
With encryption problems (both Group-based or pair-wise tunnels), it is important to troubleshoot the problem and isolate the problem to a particular part of the datapath. Specifically, the troubleshooting approach described here is intended to help you answer these questions:
IPsec dataplane troubleshooting is very different from that for the Control Plane. With the dataplane, there are usually no debugs that you can run, or at least run safely in a production environment. So the troubleshooting relies heavily on different counters and traffic statistics that can help trace the packet along a forwarding path. The idea is to be able to develop a set of checkpoints in order to help isolate where packets might be dropped as shown here:
Here are some data plane debugging tools:
The checkpoints in the datapath in the previous image can be validated with these tools:
Encrypting GM
Decrypting GM
The return path follows the same traffic flow. The next sections have some examples of these dataplane tools in use.
The encryption/decryption counters on a router are based on an IPsec flow. Unfortunately this does not work well with GETVPN since GETVPN typically deploys a "permit ip any any" encryption policy that encrypts everything. So if the problem only happens for some of the flows and not all, these counters can be somewhat difficult to use in order to correctly assess if the packets are encrypted or decrypted when there is enough significant background traffic that works.
GM1#show crypto ipsec sa | in encrypt|decrypt
#pkts encaps: 100, #pkts encrypt: 100, #pkts digest: 100
#pkts decaps: 100, #pkts decrypt: 100, #pkts verify: 100
Netflow can be used in order to monitor both the ingress and egress traffic on both GMs. Note with the GETVPN permit ip any any policy, the enrypted traffic will be aggregate and does not provide the per-flow information. Per-flow information will then need to be collected with the DSCP/precedence marking described later.
In this example, the netflow for a 100 count ping from a host behind GM1 to a host behind GM2 is shown at the various checkpoints.
Encrypting GM
Netflow configuration:
interface Ethernet0/0
description LAN
ip address 192.168.13.1 255.255.255.0
ip flow ingress
ip pim sparse-dense-mode
!
interface Serial1/0
description WAN interface
ip address 10.1.13.2 255.255.255.252
ip flow egress
ip pim sparse-dense-mode
crypto map gm1map
Netflow output:
GM1#show ip cache flow | be SrcIf
SrcIf SrcIPaddress DstIf DstIPaddress Pr SrcP DstP Pkts
Et0/0 192.168.13.2 Se1/0* 192.168.14.2 32 8DE1 6523 100
Et0/0 192.168.13.2 Se1/0 192.168.14.2 01 0000 0800 100
GM1#
Note: In the previous output, * denotes egress traffic. The first line shows egress encrypted traffic (with protocol 0x32 = ESP) out of the WAN inteface, and the second line ingress ICMP traffic hitting the LAN interface.
Decrypting GM
Configuration:
interface Ethernet0/0
description LAN interface
ip address 192.168.14.1 255.255.255.0
ip flow egress
ip pim sparse-dense-mode
!
interface Serial1/0
description WAN interface
ip address 10.1.14.2 255.255.255.252
ip flow ingress
ip pim sparse-dense-mode
crypto map gm1map
Netflow output:
GM2#show ip cache flow | be SrcIf
SrcIf SrcIPaddress DstIf DstIPaddress Pr SrcP DstP Pkts
Se1/0 192.168.13.2 Et0/0 192.168.14.2 32 8DE1 6523 100
Se1/0 192.168.13.2 Et0/0* 192.168.14.2 01 0000 0800 100
GM2#
The challenge with troubleshooting an encryption problem is that once the packet is encrypted you lose visibility into the payload, which is what encryption is supposed to do, and that makes it difficult to trace the packet for a particular IP flow. There are two ways to address this limitation when it comes to troubleshooting an IPsec problem:
ESP-NULL require changes on both tunnel end points and often is not allowed based on the customer security policy. Therefore, Cisco typically recommends the use of DSCP/precedence marking instead.
ToS (hex) | ToS(Decimal) | IP Precedence | DSCP | Binary |
---|---|---|---|---|
0xE0 | 224 | 7 Network Control | 56 CS7 | 11100000 |
0xC0 | 192 | 6 Internetwork Control | 48 CS6 | 11000000 |
0xB8 | 184 | 5 Critical | 46 EF | 10111000 |
0xA0 | 160 | 40 CS5 | 10100000 | |
0x88 | 136 | 4 Flash Override | 34 AF41 | 10001000 |
0x80 | 128 | 32 CS4 | 10000000 | |
0x68 | 104 | 3 Flash | 26 AF31 | 01101000 |
0x60 | 96 | 24 CS3 | 01100000 | |
0x48 | 72 | 2 Immediate | 18 AF21 | 01001000 |
0x40 | 64 | 16 CS2 | 01000000 | |
0x20 | 32 | 1 Priority | 8 CS1 | 00100000 |
0x00 | 0 | 0 Routine | 0 Dflt | 00000000 |
These methods are typically used in order to mark packets with the specific DSCP/Precedence markings.
PBR
interface Ethernet1/0
ip policy route-map mark
!
access-list 150 permit ip host 172.16.1.2 host 172.16.254.2
!
route-map mark permit 10
match ip address 150
set ip precedence flash-override
MQC
class-map match-all my_flow
match access-group 150
!
policy-map marking
class my_flow
set ip precedence 4
!
interface Ethernet1/0
service-policy input marking
Router Ping
GM1-host#ping ip
Target IP address: 192.168.14.2
Repeat count [5]:
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]: 136
...
<snip>
Note: It is always a good idea to monitor the normal traffic flow and DSCP/precedence profile before you apply marking so that the marked traffic flow is unique.
IP Precedence Accounting
interface Ethernet0/0
ip address 192.168.1.2 255.255.255.0
ip accounting precedence input
middle_router#show interface precedence
Ethernet0/0
Input
Precedence 4: 100 packets, 17400 bytes
Interface ACL
middle_router#show access-list 144
Extended IP access list 144
10 permit ip any any precedence routine
20 permit ip any any precedence priority
30 permit ip any any precedence immediate
40 permit ip any any precedence flash
50 permit ip any any precedence flash-override (100 matches)
60 permit ip any any precedence critical
70 permit ip any any precedence internet (1 match)
80 permit ip any any precedence network
Embedded Packet Capture (EPC) is a useful tool to capture packets at the interface level in order to identify if a packet has reached a specific device. Remember that EPC works well for clear text traffic, but it can be a challenge when the captured packets are encrypted. Therefore techniques like DSCP/precedence marking discussed previously or other IP characters, such as the length of the IP packet, have to be used together with EPC in order to make the troubleshooting more effective.
This is a useful feature to trace the feature forwarding path on all platforms that run Cisco IOS-XE, such as CSR1000v, ASR1000, and ISR4451-X.
Troubleshooting the IPsec dataplane for GETVPN is mostly no different from troubleshooting traditional point-to-point IPsec dataplane issues, with two exceptions due to these unique dataplane properties of GETVPN.
In a GETVPN network, TBAR failures can often be difficult to troubleshoot since there are no longer pair-wise tunnels. In order to troubleshoot GETVPN TBAR failures, complete these steps:
%CRYPTO-4-PKT_REPLAY_ERR: decrypt: replay check failed
connection id=13, sequence number=1
%GDOI-4-TIMEBASED_REPLAY_FAILED: An anti replay check has failed in group G1:
my_pseudotime = 620051.84 secs, peer_pseudotime = 619767.09 secs, replay_window =
4 (sec), src_ip = 192.168.13.2, dst_ip = 192.168.14.2
GM2#show crypto gdoi gm replay
Anti-replay Information For Group G1:
Timebased Replay:
Replay Value : 621388.66 secs
Input Packets : 0 Output Packets : 0
Input Error Packets : 2 Output Error Packets : 0
Time Sync Error : 0 Max time delta : 0.00 secs
TBAR Error History (sampled at 10pak/min):
19:29:32.081 EST Wed Nov 13 2013: src=192.168.13.2; my_pst=620051.84 secs;
peer_pst=619767.09 secs; win=4
Note: The enhancements mentioned previously have since been implemented in Cisco IOS-XE by Cisco bug ID CSCun49335 and in Cisco IOS by Cisco bug ID CSCub91811.
GDOI:GM REPLAY:DET:(0):my_pseudotime is 621602.30 (secs), peer_pseudotime is 621561.14
(secs), replay_window is 4 (secs), src_addr = 192.168.14.2, dest_addr = 192.168.13.2
GM1
GM1#show crypto gdoi gm replay
Load for five secs: 0%/0%; one minute: 0%; five minutes: 0%
Time source is hardware calendar, *21:06:26.469 EST Wed Nov 13 2013
Anti-replay Information For Group G1:
Timebased Replay:
Replay Value : 625866.26 secs
Input Packets : 0 Output Packets : 0
Input Error Packets : 0 Output Error Packets : 0
Time Sync Error : 0 Max time delta : 0.00 secs
GM2
GM2#show crypto gdoi gm replay
Load for five secs: 0%/0%; one minute: 0%; five minutes: 0%
Time source is hardware calendar, *21:06:26.743 EST Wed Nov 13 2013
Anti-replay Information For Group G1:
Timebased Replay:
Replay Value : 625866.51 secs
Input Packets : 4 Output Packets : 4
Input Error Packets : 2 Output Error Packets : 0
Time Sync Error : 0 Max time delta : 0.00 secs
In the previous example, if the pseudotime (as indicated by Replay Value) is significantly different between the GMs when the outputs are captured with the same reference time, then the problem can be attributed to clock skew.
Note: On the Cisco Aggregated Services Router 1000 Series platform, due to the platform architecture, the datapath on the Quantum Flow Processor (QFP) actually refers to the wall clock for counting pseudotime ticks. This has created problems with TBAR when the wall clock time changes due to NTP sync. This problem is documented with Cisco bug ID CSCum37911.
With GETVPN, Path MTU Discovery (PMTUD) does not work between the encrypting and decrypting GMs, and large packets with the Don't Fragment (DF) bit set can get blackholed. The reason that this does not work is due to GETVPN Header Preservation where the data source/destination addresses are preserved in the ESP encapsulating header. This is depicted in this image:
As the image shows, PMTUD breaks down with GETVPN with this flow:
In summary, PMTUD does not work with GETVPN today. In order to work around this issue, Cisco recommends these steps:
Most of the IPsec dataplane troubleshooting is like troubleshooting traditional point-to-point IPsec tunnels. One of the common issues is %CRYPTO-4-RECVD_PKT_MAC_ERR. See Syslog "%CRYPTO-4-RECVD_PKT_MAC_ERR:" Error Message with Ping Loss Over IPsec Tunnel Troubleshooting for more troubleshooting details.
This message can be generated when an IPsec packet is received that does not match an SPI in the SADB. See Cisco bug ID CSCtd47420 - GETVPN - CRYPTO-4-RECVD_PKT_NOT_IPSEC reported for pkt not matching flow. An example is:
%CRYPTO-4-RECVD_PKT_NOT_IPSEC: Rec'd packet not an IPSEC packet. (ip)
vrf/dest_addr= /192.168.14.2, src_addr= 192.168.13.2, prot= 50
This message should be %CRYPTO-4-RECVD_PKT_INV_SPI, which is what gets reported for traditional IPsec as well as on some hardware platforms such as ASR. This cosmetic issue was fixed by Cisco bug ID CSCup80547: Error in reporting CRYPTO-4-RECVD_PKT_NOT_IPSEC for ESP pak.
Note: These messages can sometimes appear due to another GETVPN bug CSCup34371: GETVPN GM stops decrytping traffic after TEK rekey.
In this case, the GM cannot decrypt GETVPN traffic, although it has a valid IPsec SA in the SADB (the SA being rekeyed). The problem disappears as soon as the SA expires and is removed from the SADB. This issue causes significant outage, because TEK rekey is performed in advance. For example, the outage can be 22 minutes in the case of a TEK lifetime of 7200 seconds. See the bug description for the exact condition that should be met in order to encounter this bug.
Platforms that run Cisco IOS-XE have platform-specific implementations, and often require platform-specific debugging for GETVPN issues. Here are a list of commands typically used in order to troubleshoot GETVPN on these platforms:
show crypto eli all
show platform software ipsec policy statistics
show platform software ipsec fp active inventory
show platform hardware qfp active feature ipsec spd all
show platform hardware qfp active statistics drop clear
show platform hardware qfp active feature ipsec data drop clear
show crypto ipsec sa
show crypto gdoi
show crypto ipsec internal
debug crypto ipsec
debug crypto ipsec error
debug crypto ipsec states
debug crypto ipsec message
debug crypto ipsec hw-req
debug crypto gdoi gm infra detail
debug crypto gdoi gm rekey detail
An ASR1000 GM might continue to register to the Key Server if the crypto engine does not support the IPsec policy or algorithm received. For example, on Nitrox based ASR platforms (such as ASR1002), Suite-B or SHA2 policies are not supported and this can cause the continuous re-registration symptoms.
On the ASR1000 platform, the Cisco bug ID CSCum37911 fix introduced a limitation on this platform where TBAR time of less than 20 seconds is not supported. See Restrictions for GETVPN on IOS-XE.
This enhancement bug has been opened to lift this restriction, Cisco bug ID CSCuq25476 - ASR1k needs to support a GETVPN TBAR window size of less than 20 seconds.
Update: This restriction has since been lifted with the fix for Cisco bug ID CSCur57558 , and it is no longer a limitation in XE3.10.5, XE3.13.2 and later code.
Also note, for a GM that runs on Cisco IOS-XE platforms (ASR1k or ISR4k), it is highly recommended that the device runs a version with the fix for this issue if TBAR is enabled; Cisco bug ID CSCut91647 - GETVPN on IOS-XE: GM incorrectly drops packets due to TBAR failure.
A regression was found on the ISR4x00 platform where the deny policies are ignored. For details, see Cisco bug ID CSCut14355 - GETVPN - ISR4300 GM ignores deny policy.