BGP update generation uses the concept of update groups to optimize performance. An update group is a collection of peers
that share an identical outbound policy. When generating updates, the group policy is used to format messages that are then transmitted
to the members of the group.
In order to maintain fairness in resource utilization, each update group is allocated a quota of formatted messages that it
keeps in its cache. Messages are added to the cache when they are formatted by the group, and they are removed when they are
transmitted to all the members of the group.
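
The cache behavior described above can be illustrated with a minimal Python sketch. This is a toy model, not the Cisco IOS implementation: the class name UpdateGroupCache, its methods, and the quota value are all hypothetical and chosen only for illustration.

```python
from collections import deque


class UpdateGroupCache:
    """Toy model of an update group's formatted-message cache (hypothetical)."""

    def __init__(self, peers, quota):
        self.quota = quota                  # maximum number of cached messages
        self.messages = deque()             # formatted messages, oldest first
        self.base = 0                       # number of messages already evicted
        self.sent = {p: 0 for p in peers}   # messages transmitted to each peer

    def format_message(self, payload):
        """Add a formatted message to the cache; fail if the quota is used up."""
        if len(self.messages) >= self.quota:
            return False
        self.messages.append(payload)
        return True

    def transmit(self, peer):
        """Send the next pending message to `peer`; return it, or None if idle."""
        index = self.sent[peer] - self.base
        if index >= len(self.messages):
            return None
        payload = self.messages[index]
        self.sent[peer] += 1
        self._evict()
        return payload

    def _evict(self):
        """Remove messages that every member of the group has received."""
        slowest = min(self.sent.values())
        while self.base < slowest:
            self.messages.popleft()
            self.base += 1
```

In this model a message leaves the cache only when the member with the lowest transmit count has received it, which is why a single lagging member determines how quickly quota is freed.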
A slow peer is a peer that cannot keep up with the rate at which the Cisco IOS software generates update messages, and
that fails to keep up over a prolonged period (on the order of a few minutes). A peer can be slow for several reasons:
- There is packet loss or high traffic on the link to the peer, and the throughput of the BGP TCP connection is very low.
- The peer has a heavy CPU load and cannot service the TCP connection at the required frequency.
When a slow peer is present in an update group, the number of formatted updates pending transmission builds up. When the cache
limit is reached, the group has no quota left to format new messages. In order for a new message to be formatted,
some of the existing messages must be transmitted to the slow peer and then removed from the cache. The rest of the members
of the group that are faster than the slow peer and have completed transmission of the formatted messages will not have anything
new to send, even though there may be newly modified BGP networks waiting to be advertised or withdrawn. This effect of blocking
formatting of all the peers in a group when one of the peers is slow in consuming updates is the "slow peer" problem.
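
The blocking effect can be reproduced with a small simulation. The sketch below is an assumption-laden model rather than the actual IOS scheduler: the function simulate, the per-tick drain rates, and the quota value are hypothetical, and the group is assumed to always have more changes waiting than the cache can hold.

```python
def simulate(drain_rates, quota, ticks):
    """Return the number of messages delivered to each peer after `ticks` steps.

    drain_rates: hypothetical number of messages each peer can accept per tick.
    quota:       hypothetical cache limit for the update group.
    """
    sent = {peer: 0 for peer in drain_rates}
    formatted = 0  # total messages formatted so far
    for _ in range(ticks):
        # A message stays cached until the slowest member has received it.
        cached = formatted - min(sent.values())
        # Format new messages only up to the remaining quota.
        formatted += quota - cached
        for peer, rate in drain_rates.items():
            # A peer can never receive more than has been formatted.
            sent[peer] = min(formatted, sent[peer] + rate)
    return sent


mixed = simulate({"fast": 10, "slow": 1}, quota=5, ticks=100)
healthy = simulate({"a": 10, "b": 10}, quota=5, ticks=100)
print(mixed)    # {'fast': 104, 'slow': 100}
print(healthy)  # {'a': 500, 'b': 500}
```

In the mixed group the fast peer could accept ten messages per tick, yet once the cache fills it receives only about one per tick, the same rate as the slow peer. In the group with no slow member, both peers receive everything the quota allows.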
Temporary Slowness Does Not Constitute a Slow Peer
Events that cause large churn in the BGP table (such as connection resets) can cause a brief spike in the rate of update generation.
A peer that temporarily falls behind during such events, but quickly recovers after the event, is not considered a slow peer.
In order for a peer to be marked as slow, it must be incapable of keeping up with the average rate of generated updates over
a longer period (in the order of a few minutes).
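
One way to picture the difference between temporary and sustained slowness is a detector that flags a peer only after it has been continuously behind for a set time. The sketch below is illustrative only: the class SlowPeerDetector, its observe method, and the 180-second threshold are hypothetical and are not taken from the Cisco IOS software.

```python
class SlowPeerDetector:
    """Flag a peer as slow only after it stays behind for a sustained period."""

    def __init__(self, threshold_seconds=180.0):
        # Hypothetical threshold standing in for "a few minutes".
        self.threshold = threshold_seconds
        self.behind_since = {}  # peer -> time at which it first fell behind

    def observe(self, peer, now, pending):
        """Record the peer's backlog at time `now`; return True if it is slow."""
        if pending == 0:
            # The peer has caught up, so any earlier lag was only temporary.
            self.behind_since.pop(peer, None)
            return False
        start = self.behind_since.setdefault(peer, now)
        return now - start >= self.threshold


detector = SlowPeerDetector(threshold_seconds=180.0)
print(detector.observe("peer1", now=0.0, pending=5))    # False: just fell behind
print(detector.observe("peer1", now=70.0, pending=0))   # False: recovered
print(detector.observe("peer1", now=100.0, pending=4))  # False: behind again
print(detector.observe("peer1", now=280.0, pending=4))  # True: behind for 180 s
```

A peer that falls behind during a burst of churn and then drains its backlog resets the timer, so only a peer that stays behind for the whole threshold period is reported.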