Introduction
This document describes how to resolve the Viptela SD-WAN router Virtual Router Redundancy Protocol (VRRP) stuck in the Active-Active state.
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Basic knowledge of Meraki Solutions
- Basic knowledge of VRRP
Components Used
The information in this document is based on these software and hardware versions:
- vEdge 2000, Version 19.2.3
- MS250-48FP, Version MS 12.28
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Topology
Symptom 1. VRRP in Active-Active State
Both upstream gateway vEdge devices connected downwards to Meraki stack switches act as VRRP primary.
VE1# show vrrp
MASTER PREFIX
GROUP VRRP OMP ADVERTISEMENT DOWN LIST
VPN IF NAME ID VIRTUAL IP VIRTUAL MAC PRIORITY STATE STATE TIMER TIMER LAST STATE CHANGE TIME TRACK PREFIX LIST STATE
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
11 10ge0/0.670 1 10.17.69.1 00:00:5e:00:01:01 110 master up 1 3 2021-10-12T02:16:49+00:00 Default_Route_Prefix_List resolved
VE2# show vrrp
MASTER PREFIX
GROUP VRRP OMP ADVERTISEMENT DOWN LIST
VPN IF NAME ID VIRTUAL IP VIRTUAL MAC PRIORITY STATE STATE TIMER TIMER LAST STATE CHANGE TIME TRACK PREFIX LIST STATE
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
11 10ge0/0.670 1 10.17.69.1 00:00:5e:00:01:01 100 master up 1 3 2021-10-12T02:16:40+00:00 Default_Route_Prefix_List resolved
Symptom 2. Switch Alerted for BAD DNS
Switch 2 which connected to VE2 was alerted for “DNS is misconfigured” in the Meraki dashboard.
Symptom 3. APs Go in Repeater Mode
APs connected to Switch 2 went to repeater mode since the Switch doesn’t have gateway reachability.
Troubleshoot
- Verify the VRRP behavior from vEdges.
Collect the “tcpdump” from both vEdges and verify the VRRP packet status. In this case, noticed that VRRP packets receive and send by VE1. But no VRRP packets are received from VE1 to VE2. However, the same has been sent from VE1. Hence, you can confirm that there are no issues with gateway vEdges functionality.
From VE1:
10.17.69.3 > 224.0.0.18: vrrp 10.17.69.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 20, addrs: 10.17.69.1
08:57:12.744406 80:b7:09:32:e5:02 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 54: (tos 0xc0, ttl 255, id 6968, offset 0, flags [DF], proto VRRP (112), length 40)
10.17.69.2 > 224.0.0.18: vrrp 10.17.69.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 110, authtype none, intvl 1s, length 20, addrs: 10.17.69.1
08:57:13.708034 00:00:5e:00:01:01 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 56: (tos 0xc0, ttl 255, id 29924, offset 0, flags [DF], proto VRRP (112), length 40)
From VE2:
10.17.69.3 > 224.0.0.18: vrrp 10.17.69.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 20, addrs: 10.17.69.1
08:57:50.644532 80:b7:09:31:82:a2 > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 54: (tos 0xc0, ttl 255, id 31817, offset 0, flags [DF], proto VRRP (112), length 40)
No VRRP packet from VE1 (10.17.69.2), hence VE2 assumes VE1 is down and acts as VRRP primary.
- Verify the Meraki Stack Behavior.
Meraki dashboard indicates that AP4 and AP3 are in repeater mode which is connected to uplink switch2 which gets the alert for bad DNS.
To confirm the Stack status, open Meraki TAC as the stack communication massages is visible only to Meraki TAC. On verification, it is identified that intra-stack communication issues between the primary and secondary switches in the stack.
Meraki also confirmed this issue was caused by the VRRP packet from VE1 not reached to VE2 via Stack member switch1(primary) through stack member 2. This is a known issue in the 12.28 code.
Solution
- Reload all the member switches in the stack (temporary fix).
- Upgrade the Meraki Switch firmware to the latest stable build.