Introduction
This document describes ACI fault F3696 (coop-ep-dampening) and the steps to remediate it.
Background Information
This specific fault is triggered when EPs go into a "freeze" state due to the COOP Endpoint Dampening Feature. EPs are put into a "freeze" state when they are found to have consistent movement behavior, causing multiple updates to COOP in a short time interval.
COOP EP dampening is a COOP process protection mechanism which also helps identify the EP(s) responsible for the excessive updates.
COOP EP dampening was introduced, and is enabled by default, in Cisco Application Policy Infrastructure Controller (APIC) release 4.2(3).
code : F3696
descr : 1 EPs are in freeze state.
cause : coop-ep-dampening
Note: The nature of this fault and the associated burndown timers could cause the fault to trigger, then get cleared on its own.
Intersight Connected ACI Fabrics
This fault is actively monitored as part of Proactive ACI Engagements.
If you have an Intersight connected ACI fabric, a Service Request was generated on your behalf to indicate that instances of this fault were found within your Intersight Connected ACI fabric.
COOP Endpoint Dampening
The Council of Oracle Protocol (COOP) is used to communicate Endpoint (EP) Mapping Information (location and identity) to the spine proxy. Leaf switches forward endpoint address information to the spine switches via COOP, which then ensures all spine nodes maintain a consistent copy of endpoint address and location information.
Consistent EP movement, such as across interfaces or devices, causes constant endpoint updates towards spines to ensure the COOP database is accurate. An aggressive volume of updates due to ongoing Endpoint movement can result in COOP resource over-utilization, preventing the processing of valid endpoint updates.
Rogue Endpoint Detection, a feature of the leaf switch, prevents aggressive EP updates from reaching the spine switch as long as the moves are scoped to a single leaf. Other EP movement scenarios, such as cross-leaf EP movement, require a different mechanism to protect COOP. This is where COOP Endpoint Dampening comes into play.
To relieve pressure on COOP in EP Movement situations, the spine switches ask all leaf switches to ignore updates from the flagged endpoints for a specified period. When this occurs, the dampening state of any such endpoint is "freeze," and a fault F3696 is generated.
More details on the penalty values and thresholds are available in the configuration guide, for example the 4.2 configuration guide.
Refer to the version-specific configuration guide for the latest information on this feature.
Note: The other aggressive EP update protection features, such as Rogue EP Control and EP Loop Protection, must be explicitly enabled. More details on these features are covered in the ACI Fabric Endpoint Learning White Paper.
Possible Causes for EP Freezes
The two scenarios most often seen to cause this behavior in the field are:
- A server with two separate leaf connections running Active-Active instead of a single logical link (vPC) configuration
- A loop on downstream network devices
Quick Start to Address Fault
- Identify which endpoint(s) went into the "frozen" state.
- (Optional) If Dataplane impact is noticed, clear the frozen EP for temporary impact resolution.
- Identify and understand why the EP(s) moved and whether or not this is expected, and required, in your network design.
- If not required, take action to address the underlying condition which caused the EP movement.
- If the movement in question is required and necessary for the network design, consider disabling COOP EP Dampening.
Note: COOP EP Dampening is a protection mechanism for the COOP Process. In general, it is preferred to take action which mitigates unnecessary EP movement where possible.
Detailed Steps to Address Fault
Identify Frozen Endpoints
Use this switch CLI procedure to view all dampened endpoints in a spine or leaf node.
- Log in to the spine or leaf switch CLI and enter command: switch# show coop internal info repo ep dampening
(Optional) Clear Frozen Endpoints
Using the GUI
When performed using the GUI, this clears all frozen EPs on the selected node. This operation must be executed on all spine switches as well as on the source leaf switch of the frozen endpoint.
- On the menu bar, click Fabric > Inventory.
- In the Navigation pane, expand the pod and the spine or leaf node.
- Right-click the node and choose Clear Dampened Endpoints.
- Click Yes to confirm the action.
Note: If the EP(s) in question are still in the endpoint table on the leaf switch, the endpoint is published to the spine switch COOP database. If not, the dampened endpoint is deleted from the spine switch COOP database after two minutes.
Via the Switch CLI
When performed via a switch CLI, this procedure only clears a single endpoint at a time. This operation must be executed on all spine switches and on the source leaf switch of the endpoint.
- Log in to the spine or leaf switch CLI and enter command: switch# clear coop internal info repo ep dampening key <bd_vnid> <mac>
Note: If the EP(s) in question are still in the endpoint table on the leaf switch, the endpoint is published to the spine switch COOP database. If not, the dampened endpoint is deleted from the spine switch COOP database after two minutes.
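Because the CLI clears one endpoint per command, it can help to generate the command lines for several frozen endpoints in advance. The following is a minimal Python sketch that only builds the command strings from the syntax shown above. The function name and the sample BD VNID and MAC values are hypothetical, and the MAC notation the switch expects is an assumption.

```python
def clear_commands(endpoints):
    """Build one clear command per (bd_vnid, mac) pair."""
    template = "clear coop internal info repo ep dampening key {bd_vnid} {mac}"
    return [template.format(bd_vnid=bd_vnid, mac=mac) for bd_vnid, mac in endpoints]


# Hypothetical frozen endpoints as (BD VNID, MAC) pairs; not taken from a real fabric.
frozen = [(15000001, "00:00:10:00:00:01"), (15000002, "00:00:10:00:00:02")]

for command in clear_commands(frozen):
    print(command)
```

The printed commands still have to be entered on every spine switch and on the source leaf switch of each endpoint, as described above.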
Disable COOP EP Dampening
In general, this is not recommended. However, if you have found that your network design requires the EP movement in question, COOP EP dampening can be disabled.
An HTTP POST to /api/policymgr/mo/.xml with disableEpDampening="true" disables COOP EP dampening.
COOP EP dampening can be re-enabled with the same request, but by setting disableEpDampening="false".
POST /api/policymgr/mo/.xml
PAYLOAD:
<polUni>
<infraInfra>
<infraSetPol disableEpDampening="true"></infraSetPol>
</infraInfra>
</polUni>
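If the payload is generated by a script rather than typed by hand, it can be built with a standard XML library so that the attribute is always quoted correctly. The following is a minimal Python sketch, assuming only the element and attribute names shown in the payload above. The function name is hypothetical, and the sketch builds the XML string without sending it.

```python
import xml.etree.ElementTree as ET


def build_dampening_payload(disable: bool) -> str:
    """Return the polUni/infraInfra/infraSetPol payload as an XML string."""
    pol_uni = ET.Element("polUni")
    infra = ET.SubElement(pol_uni, "infraInfra")
    # The attribute value is the literal string "true" or "false", as in the payload above.
    ET.SubElement(infra, "infraSetPol", disableEpDampening="true" if disable else "false")
    return ET.tostring(pol_uni, encoding="unicode")


if __name__ == "__main__":
    print(build_dampening_payload(disable=True))
```

Calling build_dampening_payload(False) produces the payload that re-enables COOP EP dampening.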
Using APIC CLI
On the APIC CLI, the icurl command can facilitate the required HTTP POST.
Disable COOP EP dampening:
apic# icurl -X POST -d '<polUni><infraInfra><infraSetPol disableEpDampening="true"></infraSetPol></infraInfra></polUni>' http://localhost:7777/api/policymgr/mo/.xml
Validate that COOP EP dampening has been disabled:
apic# moquery -c infraSetPol
Total Objects shown: 1
# infra.SetPol
disableEpDampening : yes
dn : uni/infra/settings
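When this check is scripted, the moquery output can be read as simple "key : value" lines. The following is a minimal Python sketch that parses the sample output shown above. The function name is hypothetical, and the parser assumes a single object in the output, as is the case for infraSetPol here.

```python
def parse_moquery(text: str) -> dict:
    """Collect 'key : value' lines from a single-object moquery output."""
    attrs = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip the class header line, for example "# infra.SetPol"
        key, sep, value = line.partition(":")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


SAMPLE = """\
Total Objects shown: 1
# infra.SetPol
disableEpDampening : yes
dn : uni/infra/settings
"""

attrs = parse_moquery(SAMPLE)
dampening_disabled = attrs.get("disableEpDampening") == "yes"
print("dampening disabled:", dampening_disabled)
```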
Additional Details
COOP EP Dampening - DampFactor Customization
In 5.2.4d and later releases, the dampFactor can be modified to increase specific values associated with the COOP EP dampening feature.
Consider modifying the DampFactor for scenarios where a certain level of EP movement is expected outside of the default thresholds, and you do not want to disable COOP EP dampening.
There are 3 threshold values related to the damp penalty which work in tandem. All 3 of these values are modified when changing the DampFactor:
Threshold Name | Description | Default Value
dampReuseThresh | Reuse threshold value when EP moves back to normal state from a "freeze" state | 2500
dampSatThresh | Damp saturation threshold. When an EP crosses this penalty value, it is put into a "freeze" state | 10000
dampThresh | Critical state threshold. If the EP stays above the threshold for 10 min, it is put into a "freeze" state | 4000
The default DampFactor is set to 1. The dampFactor can be modified to values between 1 and 5.
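One way to read the three thresholds above is as a small decision rule. The following Python sketch is an illustration of that reading only, not the switch implementation. The class and function names are hypothetical, and the exact comparison at each boundary value (strict or inclusive) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DampThresholds:
    reuse: int = 2500        # dampReuseThresh default
    saturation: int = 10000  # dampSatThresh default
    critical: int = 4000     # dampThresh default


CRITICAL_HOLD_SECONDS = 10 * 60  # the 10 minutes stated for dampThresh


def should_freeze(penalty: int, seconds_above_critical: int,
                  thresholds: DampThresholds = DampThresholds()) -> bool:
    """Freeze when the penalty crosses saturation, or stays above critical for 10 min."""
    if penalty > thresholds.saturation:
        return True
    return penalty > thresholds.critical and seconds_above_critical >= CRITICAL_HOLD_SECONDS


def can_return_to_normal(penalty: int,
                         thresholds: DampThresholds = DampThresholds()) -> bool:
    """A frozen EP is assumed to return to normal once its penalty is below reuse."""
    return penalty < thresholds.reuse
```

With the default values, a penalty of 5000 does not freeze an EP immediately, but it does under this reading once the EP has stayed above 4000 for 10 minutes.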
Modify COOP EP DampFactor
To set the dampFactor to 4, which raises each threshold to 4 times its default value, use this POST on the APIC:
apic# icurl -X POST -d '<polUni><infraInfra><infraSetPol dampFactor="4"></infraSetPol></infraInfra></polUni>' http://localhost:7777/api/policymgr/mo/.xml
The modified thresholds can be validated, per spine and per repo, by checking the coopRepP class:
apic# moquery -c coopRepP
# coop.RepP
...
dampReuseThresh : 10000
dampSatThresh : 40000
dampThresh : 16000
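The values in this output equal the default thresholds multiplied by the dampFactor of 4. The following is a minimal Python sketch of that arithmetic, assuming the same linear scaling holds for every permitted factor from 1 to 5. Only the factor of 4 is confirmed by the output above, and the function name is hypothetical.

```python
DEFAULT_THRESHOLDS = {
    "dampReuseThresh": 2500,
    "dampSatThresh": 10000,
    "dampThresh": 4000,
}


def scaled_thresholds(damp_factor: int) -> dict:
    """Multiply each default threshold by the dampFactor (allowed range 1 to 5)."""
    if not 1 <= damp_factor <= 5:
        raise ValueError("dampFactor must be between 1 and 5")
    return {name: value * damp_factor for name, value in DEFAULT_THRESHOLDS.items()}


print(scaled_thresholds(4))
# {'dampReuseThresh': 10000, 'dampSatThresh': 40000, 'dampThresh': 16000}
```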