This document describes how the Internal Border Gateway Protocol (iBGP) between Provider Edge (PE) and Customer Edge (CE) feature is implemented in Cisco IOS®.
Until the new iBGP PE-CE feature, iBGP between PE and CE (hence on a Virtual Routing and Forwarding (VRF) interface on the PE router) was not officially supported. One exception is iBGP on VRF interfaces in a Multi-VRF CE (VRF-Lite) setup. The motivation to deploy this feature is:
With this feature, the sites of the VRF can have the same ASN as the SP core. However, in case the ASN of the VRF sites are different than the ASN of the SP core, it can be made to appear the same with the use of the feature local-Autonomous System (AS).
Here are the two major parts in order to make this feature work:
The new ATTR_SET attribute allows the SP to carry all of the BGP attributes of the customer in a transparent manner and does not interfere with the SP attributes and BGP policies. Such attributes are the cluster list, local preference, communities, and so on.
ATTR_SET is the new BGP attribute used in order to carry the VPN BGP attributes of the SP customer. It is an optional transitive attribute. In this attribute, all of the customer BGP attributes from the BGP Update message, except for the MP_REACH and MP_UNREACH attributes, can be carried.
The ATTR_SET attribute has this format:
+------------------------------+
| Attr Flags (O|T) Code = 128 |
+------------------------------+
| Attr. Length (1 or 2 octets) |
+------------------------------+
| Origin AS (4 octets) |
+------------------------------+
| Path Attributes (variable) |
+------------------------------+
The attribute flags are the regular BGP attribute flags (refer to RFC 4271). The attribute length indicates whether the attribute length is one or two octets. The purpose of the Origin AS field is to prevent the leak of one route originated in one AS to be leaked to another AS without the proper manipulation of the AS_PATH. The variable length Path Attributes field carries the VPN BGP attributes that must be carried across the SP core.
On the egress PE router, the VPN BGP attributes are pushed into this attribute. On the ingress PE router, these attributes are popped from the attribute, before the BGP prefix is sent to the CE router. This attribute provides the isolation of BGP attributes between the SP network and the customer VPN and vice versa. For example, the SP route reflection cluster list attribute is not seen and considered inside the VPN network. But also, the VPN route reflection cluster list attribute is not seen and considered inside the SP network.
Look at Figure 1 in order to see the propagation of a customer BGP prefix across the SP network.
Figure 1
CE1 and CE2 are in the same AS as the SP network: 65000. PE1 has iBGP configured towards CE1. PE1 reflects the path for the prefix 10.100.1.1/32 towards the RR in the SP network. The RR reflects the iBGP path towards the PE routers as usual. PE2 reflects the path towards CE2.
In order for this to work properly, you must:
Refer to Figure 1.
Here is the needed configuration for PE1 and PE2:
PE1
vrf definition customer1
rd 65000:1
route-target export 1:1
route-target import 1:1
!
address-family ipv4
exit-address-family
router bgp 65000
bgp log-neighbor-changes
neighbor 192.168.100.3 remote-as 65000
neighbor 192.168.100.3 update-source Loopback0
!
address-family vpnv4
neighbor 192.168.100.3 activate
neighbor 192.168.100.3 send-community extended
exit-address-family
!
address-family ipv4 vrf customer1
neighbor 10.1.1.4 remote-as 65000
neighbor 10.1.1.4 activate
neighbor 10.1.1.4 internal-vpn-client
neighbor 10.1.1.4 route-reflector-client
neighbor 10.1.1.4 next-hop-self
exit-address-family
PE2
vrf definition customer1
rd 65000:2
route-target export 1:1
route-target import 1:1
!
address-family ipv4
exit-address-family
router bgp 65000
bgp log-neighbor-changes
neighbor 192.168.100.3 remote-as 65000
neighbor 192.168.100.3 update-source Loopback0
!
address-family vpnv4
neighbor 192.168.100.3 activate
neighbor 192.168.100.3 send-community extended
exit-address-family
!
address-family ipv4 vrf customer1
neighbor 10.1.2.5 remote-as 65000
neighbor 10.1.2.5 activate
neighbor 10.1.2.5 internal-vpn-client
neighbor 10.1.2.5 route-reflector-client
neighbor 10.1.2.5 next-hop-self
exit-address-family
There is a new command, neighbor <internal-CE> internal-vpn-client, to make this feaure work. It must be configured on the PE router only for the iBGP session towards the CE routers.
Refer to Figure 1.
This is the prefix advertised by CE1:
CE1#show bgp ipv4 unicast 10.100.1.1/32
BGP routing table entry for 10.100.1.1/32, version 2
Paths: (1 available, best #1, table default)
Advertised to update-groups:
4
Refresh Epoch 1
Local
0.0.0.0 from 0.0.0.0 (10.100.1.1)
Origin IGP, metric 0, localpref 100, weight 32768, valid, sourced, local, best
rx pathid: 0, tx pathid: 0x0
When PE1 receives the BGP prefix 10.100.1.1/32 from CE1, it stores it twice:
PE1#show bgp vpnv4 unicast all 10.100.1.1/32
BGP routing table entry for 65000:1:10.100.1.1/32, version 21
Paths: (2 available, best #1, table customer1)
Advertised to update-groups:
5
Refresh Epoch 1
Local, (Received from ibgp-pece RR-client)
10.1.1.4 (via vrf customer1) from 10.1.1.4 (10.100.1.1)
Origin IGP, metric 0, localpref 200, valid, internal, best
mpls labels in/out 18/nolabel
rx pathid: 0, tx pathid: 0x0
Refresh Epoch 1
Local, (Received from ibgp-pece RR-client), (ibgp sourced)
10.1.1.4 (via vrf customer1) from 10.1.1.4 (10.100.1.1)
Origin IGP, localpref 100, valid, internal
Extended Community: RT:1:1
mpls labels in/out 18/nolabel
rx pathid: 0, tx pathid: 0
The first path is the actual path on PE1, because it is received from CE1.
The second path is the path that is advertised towards the RRs/PE routers. It is marked with ibgp sourced. It contains the ATTR_SET attribute. Notice that this path has one or more Route Targets (RTs) attached to it.
PE1 advertises the prefix as shown here:
PE1#show bgp vpnv4 unicast all neighbors 192.168.100.3 advertised-routes
BGP table version is 7, local router ID is 192.168.100.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 65000:1 (default for vrf customer1)
*>i 10.100.1.1/32 10.1.1.4 0 200 0 i
Total number of prefixes 1
This is how the RR sees the path:
RR#show bgp vpnv4 un all 10.100.1.1/32
BGP routing table entry for 65000:1:10.100.1.1/32, version 10
Paths: (1 available, best #1, no table)
Advertised to update-groups:
3
Refresh Epoch 1
Local, (Received from a RR-client)
192.168.100.1 (metric 11) (via default) from 192.168.100.1 (192.168.100.1)
Origin IGP, localpref 100, valid, internal, best
Extended Community: RT:1:1
Originator: 10.100.1.1, Cluster list: 192.168.100.1
ATTR_SET Attribute:
Originator AS 65000
Origin IGP
Aspath
Med 0
LocalPref 200
Cluster list
192.168.100.1,
Originator 10.100.1.1
mpls labels in/out nolabel/18
rx pathid: 0, tx pathid: 0x0
Notice that the local preference of this VPNv4 unicast prefix in the core is 100. In the ATTR_SET, the original local preference of 200 is stored. However, this is transparent to the RR in the SP core.
On PE2, you see the prefix as shown here:
PE2#show bgp vpnv4 unicast all 10.100.1.1/32
BGP routing table entry for 65000:1:10.100.1.1/32, version 5
Paths: (1 available, best #1, no table)
Not advertised to any peer
Refresh Epoch 2
Local
192.168.100.1 (metric 21) (via default) from 192.168.100.3 (192.168.100.3)
Origin IGP, localpref 100, valid, internal, best
Extended Community: RT:1:1
Originator: 10.100.1.1, Cluster list: 192.168.100.3, 192.168.100.1
ATTR_SET Attribute:
Originator AS 65000
Origin IGP
Aspath
Med 0
LocalPref 200
Cluster list
192.168.100.1,
Originator 10.100.1.1
mpls labels in/out nolabel/18
rx pathid: 0, tx pathid: 0x0
BGP routing table entry for 65000:2:10.100.1.1/32, version 6
Paths: (1 available, best #1, table customer1)
Advertised to update-groups:
1
Refresh Epoch 2
Local, imported path from 65000:1:10.100.1.1/32 (global)
192.168.100.1 (metric 21) (via default) from 192.168.100.3 (192.168.100.3)
Origin IGP, metric 0, localpref 200, valid, internal, best
Originator AS(ibgp-pece): 65000
Originator: 10.100.1.1, Cluster list: 192.168.100.1
mpls labels in/out nolabel/18
rx pathid:0, tx pathid: 0x0
The first path is the one received from the RR, with the ATTR_SET. Note that the RD is 65000:1, the origin RD. The second path is the imported path from the VRF table with RD 65000:1. The ATTR_SET has been removed.
This is the path as seen on CE2:
CE2#show bgp ipv4 unicast 10.100.1.1/32
BGP routing table entry for 10.100.1.1/32, version 10
Paths: (1 available, best #1, table default)
Not advertised to any peer
Refresh Epoch 1
Local
10.1.2.2 from 10.1.2.2 (192.168.100.2)
Origin IGP, metric 0, localpref 200, valid, internal, best
Originator: 10.100.1.1, Cluster list: 192.168.100.2, 192.168.100.1
rx pathid: 0, tx pathid: 0x0
Notice that the next-hop is 10.1.2.2, which is PE2. The cluster list contains routers PE1 and PE2. These are the RRs that matter inside the VPN. The SP RR (10.100.1.3) is not in the cluster list.
The local preference of 200 has been preserved inside the VPN across the SP network.
The debug bgp vpnv4 unicast updates command shows the update propagated in the SP network:
PE1#
BGP(4): Revise route installing 1 of 1 routes for 10.100.1.1/32 -> 10.1.1.4
(customer1) to customer1 IP table
BGP(4): 192.168.100.3 NEXT_HOP changed SELF for ibgp rr-client pe-ce net
65000:1:10.100.1.1/32,
BGP(4): 192.168.100.3 Net 65000:1:10.100.1.1/32 from ibgp-pece 10.1.1.4 format
ATTR_SET
BGP(4): (base) 192.168.100.3 send UPDATE (format) 65000:1:10.100.1.1/32, next
192.168.100.1, label 16, metric 0, path Local, extended community RT:1:1
BGP: 192.168.100.3 Next hop is our own address 192.168.100.1
BGP: 192.168.100.3 Route Reflector cluster loop; Received cluster-id 192.168.100.1
BGP: 192.168.100.3 RR in same cluster. Reflected update dropped
RR#
BGP(4): 192.168.100.1 rcvd UPDATE w/ attr: nexthop 192.168.100.1, origin i, localpref
100, originator 10.100.1.1, clusterlist 192.168.100.1, extended community RT:1:1,
[ATTR_SET attribute: originator AS 65000, origin IGP, aspath , med 0, localpref 200,
cluster list 192.168.100.1 , originator 10.100.1.1]
BGP(4): 192.168.100.1 rcvd 65000:1:10.100.1.1/32, label 16
RT address family is not configured. Can't create RTC route
BGP(4): (base) 192.168.100.1 send UPDATE (format) 65000:1:10.100.1.1/32, next
192.168.100.1, label 16, metric 0, path Local, extended community RT:1:1
PE2#
BGP(4): 192.168.100.3 rcvd UPDATE w/ attr: nexthop 192.168.100.1, origin i, localpref
100, originator 10.100.1.1, clusterlist 192.168.100.3 192.168.100.1, extended community
RT:1:1, [ATTR_SET attribute: originator AS 65000, origin IGP, aspath , med 0, localpref
200, cluster list 192.168.100.1 , originator 10.100.1.1]
BGP(4): 192.168.100.3 rcvd 65000:1:10.100.1.1/32, label 16
RT address family is not configured. Can't create RTC route
BGP(4): Revise route installing 1 of 1 routes for 10.100.1.1/32 -> 192.168.100.1
(customer1) to customer1 IP table
BGP(4): 10.1.2.5 NEXT_HOP is set to self for net 65000:2:10.100.1.1/32,
PE2# debug bgp vpnv4 unicast updates detail
BGP updates debugging is on with detail for address family: VPNv4 Unicast
PE2#
BGP(4): 192.168.100.3 rcvd UPDATE w/ attr: nexthop 192.168.100.1, origin i,
localpref 100, originator 10.100.1.1, clusterlist 192.168.100.3 192.168.100.1,
extended community RT:1:1, [ATTR_SET attribute: originator AS 65000, origin IGP,
aspath , med 0, localpref 200, cluster list 192.168.100.1 , originator 10.100.1.1]
BGP(4): 192.168.100.3 rcvd 65000:1:10.100.1.1/32, label 17
RT address family is not configured. Can't create RTC route
BGP: 192.168.100.3 rcv update length 125
BGP: 192.168.100.3 rcv update dump: FFFF FFFF FFFF FFFF FFFF FFFF FFFF FFFF
0090 0200 00
PE2#00 7980 0E21 0001 800C 0000 0000 0000 0000 C0A8 6401 0078 0001 1100 00FD E800
0000 010A 6401 0140 0101 0040 0200 4005 0400 0000 64C0 1008 0002 0001 0000 0001 800A
08C0 A864 03C0 A864 0180 0904 0A64 0101 C080 2700 00FD E840 0101 0040 0200 8004 0400
0000 0040 0504 0000 00C8 800A 04C0 A864 0180 0904 0A64 0101
BGP(4): Revise route installing 1 of 1 routes for 10.100.1.1/32 -> 192.168.100.1
(customer1) to customer1 IP table
BGP(4): 10.1.2.5 NEXT_HOP is set to self for net 65000:2:10.100.1.1/32,
Next-hop-self must be configured on the PE routers for this feature. The reason for this is that normally the next-hop is transported unchanged with iBGP. However, here there are two separate networks: the VPN network and the SP network, which run separate Interior Gateway Protocols (IGPs). Hence, the IGP metric cannot be easily compared and used for best path calculation between the two networks. The approach chosen by RFC 6368 is to have next-hop-self mandatory for the iBGP session towards the CE, which avoids the previously described issue all together. An advantage is that the VRF sites can run different IGPs with this approach.
RFC 6368 mentions that it is recommended that different VRF sites of the same VPN use different (unique) RDs. In Cisco IOS, this is mandatory for this feature.
Refer to Figure 2. The VPN customer1 has ASN 65001.
Figure 2
CE1 is in AS 65001. In order to make this internal BGP from the standpoint of PE1, it needs the iBGP local-as feature.
CE1
router bgp 65001
bgp log-neighbor-changes
network 10.100.1.1 mask 255.255.255.255
neighbor 10.1.1.1 remote-as 65001
PE1
router bgp 65000
bgp log-neighbor-changes
neighbor 192.168.100.3 remote-as 65000
neighbor 192.168.100.3 update-source Loopback0
!
address-family vpnv4
neighbor 192.168.100.3 activate
neighbor 192.168.100.3 send-community extended
exit-address-family
!
address-family ipv4 vrf customer1
neighbor 10.1.1.4 remote-as 65001
neighbor 10.1.1.4 local-as 65001
neighbor 10.1.1.4 activate
neighbor 10.1.1.4 internal-vpn-client
neighbor 10.1.1.4 route-reflector-client
neighbor 10.1.1.4 next-hop-self
exit-address-family
PE2 and CE2 are configured similarly.
PE1 sees the BGP prefix as shown here:
PE1#show bgp vpnv4 unicast all 10.100.1.1/32
BGP routing table entry for 65000:1:10.100.1.1/32, version 41
Paths: (2 available, best #1, table customer1)
Advertised to update-groups:
5
Refresh Epoch 1
Local, (Received from ibgp-pece RR-client)
10.1.1.4 (via vrf customer1) from 10.1.1.4 (10.100.1.1)
Origin IGP, metric 0, localpref 200, valid, internal, best
mpls labels in/out 18/nolabel
rx pathid: 0, tx pathid: 0x0
Refresh Epoch 1
Local, (Received from ibgp-pece RR-client), (ibgp sourced)
10.1.1.4 (via vrf customer1) from 10.1.1.4 (10.100.1.1)
Origin IGP, localpref 100, valid, internal
Extended Community: RT:1:1
mpls labels in/out 18/nolabel
rx pathid: 0, tx pathid: 0
The prefix is an internal BGP.
PE2 sees this:
PE2#show bgp vpnv4 unicast all 10.100.1.1/32
BGP routing table entry for 65000:1:10.100.1.1/32, version 33
Paths: (1 available, best #1, no table)
Not advertised to any peer
Refresh Epoch 5
Local
192.168.100.1 (metric 21) (via default) from 192.168.100.3 (192.168.100.3)
Origin IGP, localpref 100, valid, internal, best
Extended Community: RT:1:1
Originator: 10.100.1.1, Cluster list: 192.168.100.3, 192.168.100.1
ATTR_SET Attribute:
Originator AS 65001
Origin IGP
Aspath
Med 0
LocalPref 200
Cluster list
192.168.100.1,
Originator 10.100.1.1
mpls labels in/out nolabel/18
rx pathid: 0, tx pathid: 0x0
BGP routing table entry for 65000:2:10.100.1.1/32, version 34
Paths: (1 available, best #1, table customer1)
Advertised to update-groups:
5
Refresh Epoch 2
Local, imported path from 65000:1:10.100.1.1/32 (global)
192.168.100.1 (metric 21) (via default) from 192.168.100.3 (192.168.100.3)
Origin IGP, metric 0, localpref 200, valid, internal, best
Originator AS(ibgp-pece): 65001
Originator: 10.100.1.1, Cluster list: 192.168.100.1
mpls labels in/out nolabel/18
rx pathid: 0, tx pathid: 0x0
The Originator AS is 65001, which is the AS used when the prefix is sent from PE2 to CE2. So, the AS is preserved, and so is the local preference in this example.
CE2#show bgp ipv4 unicast 10.100.1.1/32
BGP routing table entry for 10.100.1.1/32, version 3
Paths: (1 available, best #1, table default)
Not advertised to any peer
Refresh Epoch 1
Local
10.1.2.2 from 10.1.2.2 (192.168.100.2)
Origin IGP, metric 0, localpref 200, valid, internal, best
Originator: 10.100.1.1, Cluster list: 192.168.100.2, 192.168.100.1
rx pathid: 0, tx pathid: 0x0
You see Local instead of an AS Path. This means it is an internal BGP route originated in AS 65001, which is also the configured ASN of router CE2. All of the BGP attributes have been taken from the ATTR_SET attribute. This adheres to the rules for Case 1 in the next section.
The ATTR_SET contains the Originator AS of the originating VRF. This Originating AS is checked by the remote PE, when it removes the ATTR_SET before it sends the prefix to the CE router.
Case 1: If the Originating AS matches the configured AS for the CE router, then the BGP attributes are taken from the ATTR_SET attribute when the PE imports the path into the destination VRF.
Case 2: If the Originating AS does not match the configured AS for the CE router, then the set of attributes for the constructed path are taken as shown here:
PE2 sees the route as follows:
PE2#show bgp vpnv4 unicast all 10.100.1.1/32
BGP routing table entry for 65000:1:10.100.1.1/32, version 43
Paths: (1 available, best #1, no table)
Not advertised to any peer
Refresh Epoch 6
Local
192.168.100.1 (metric 21) (via default) from 192.168.100.3 (192.168.100.3)
Origin IGP, localpref 100, valid, internal, best
Extended Community: RT:1:1
Originator: 10.100.1.1, Cluster list: 192.168.100.3, 192.168.100.1
ATTR_SET Attribute:
Originator AS 65000
Origin IGP
Aspath
Med 0
LocalPref 200
Cluster list
192.168.100.1,
Originator 10.100.1.1
mpls labels in/out nolabel/17
rx pathid: 0, tx pathid: 0x0
BGP routing table entry for 65000:2:10.100.1.1/32, version 44
Paths: (1 available, best #1, table customer1)
Advertised to update-groups:
6
Refresh Epoch 6
Local, imported path from 65000:1:10.100.1.1/32 (global)
192.168.100.1 (metric 21) (via default) from 192.168.100.3 (192.168.100.3)
Origin IGP, metric 0, localpref 200, valid, internal, best
Originator AS(ibgp-pece): 65000
Originator: 10.100.1.1, Cluster list: 192.168.100.1
mpls labels in/out nolabel/17
rx pathid: 0, tx pathid: 0x0
This is the prefix as seen on CE2:
CE2#show bgp ipv4 unicast 10.100.1.1/32
BGP routing table entry for 10.100.1.1/32, version 5
Paths: (1 available, best #1, table default)
Not advertised to any peer
Refresh Epoch 1
65000
10.1.2.2 from 10.1.2.2 (192.168.100.2)
Origin IGP, localpref 100, valid, external, best
rx pathid: 0, tx pathid: 0x0
This is Case 2. The Origin AS number contained in the ATTR_SET attribute is prepended to the AS_PATH by PE2 and follows the rules that apply to an eBGP peering between the source and destination AS. The iBGP-specific attributes are ignored by PE2 when it creates the route to be advertised to CE2. So, the local preference is 100 and not 200 (as seen in the ATTR_SET attribute).
Refer to Figure 4.
Figure 4
Figure 4 shows an additional CE router, CE3, connected to PE1. CE1 and CE3 are both connected to PE1 on the same VRF instance: customer1. This means that CE1 and CE3 are Multi-VRF CE routers (also known as VRF-Lite) of PE1. PE1 puts itself as next-hop when it advertises the prefixes from CE1 to CE3. In the case that this behavior is not wanted, you could configure neighbor 10.1.3.6 next-hop-unchanged on PE1. In order to configure this, you must remove neighbor 10.1.3.6 next-hop-self on PE1. Then CE3 sees the routes from CE1 with CE1 to be the next-hop for those BGP prefixes. In order to make this work, you need the routes for those BGP next-hops in the routing table of CE3. You need a dynamic routing protocol (IGP) or static routes on CE1, PE1, and CE3 in order to make sure the routers have a route for each others next-hop IP addresses. However, there is a problem with this configuration.
The configuration on PE1 is:
router bgp 65000
!
address-family ipv4 vrf customer1
neighbor 10.1.1.4 remote-as 65000
neighbor 10.1.1.4 activate
neighbor 10.1.1.4 internal-vpn-client
neighbor 10.1.1.4 route-reflector-client
neighbor 10.1.1.4 next-hop-self
neighbor 10.1.3.6 remote-as 65000
neighbor 10.1.3.6 activate
neighbor 10.1.3.6 internal-vpn-client
neighbor 10.1.3.6 route-reflector-client
neighbor 10.1.3.6 next-hop-unchanged
exit-address-family
The prefix from CE1 is seen fine on CE3:
CE3#show bgp ipv4 unicast 10.100.1.1
BGP routing table entry for 10.100.1.1/32, version 9
Paths: (1 available, best #1, table default)
Not advertised to any peer
Refresh Epoch 1
Local
10.1.1.4 from 10.1.3.1 (192.168.100.1)
Origin IGP, metric 0, localpref 200, valid, internal, best
Originator: 10.100.1.1, Cluster list: 192.168.100.1
rx pathid: 0, tx pathid: 0x0
However, the prefix from CE2 is seen on CE3 as shown here:
CE3#show bgp ipv4 unicast 10.100.1.2
BGP routing table entry for 10.100.1.2/32, version 0
Paths: (1 available, no best path)
Not advertised to any peer
Refresh Epoch 1
Local
192.168.100.2 (inaccessible) from 10.1.3.1 (192.168.100.1)
Origin IGP, metric 0, localpref 100, valid, internal
Originator: 10.100.1.2, Cluster list: 192.168.100.1, 192.168.100.2
rx pathid: 0, tx pathid: 0
The BGP next-hop is 192.168.100.2, the loopback IP address of PE2. PE1 did not rewrite the BGP next-hop to itself when it advertised the prefix 10.100.1.2/32 to CE3. This makes this prefix unusable on CE3.
So, in the case of a mix of the iBGP PE-CE feature across MPLS-VPN and iBGP VRF-Lite, you must make sure that you always have next-hop-self on the PE routers.
You cannot preserve the next-hop when a PE router is an RR that reflects iBGP routes from one CE to another CE across VRF interfaces locally on the PE. When you run iBGP PE-CE across an MPLS VPN network, you must use internal-vpn-client for the iBGP sessions towards the CE routers. When you have more than one local CE in a VRF on a PE router, then you must keep next-hop-self for those BGP peers.
You could look at route-maps in order to set the next-hop to self for prefixes received from other PE routers, but not for reflected prefixes from other locally-connected CE routers. However, it is not currently supported to set the next-hop to self in an outbound route-map. That configuration is shown here:
router bgp 65000
address-family ipv4 vrf customer1
neighbor 10.1.1.4 remote-as 65000
neighbor 10.1.1.4 activate
neighbor 10.1.1.4 internal-vpn-client
neighbor 10.1.1.4 route-reflector-client
neighbor 10.1.1.4 next-hop-self
neighbor 10.1.3.6 remote-as 65000
neighbor 10.1.3.6 activate
neighbor 10.1.3.6 internal-vpn-client
neighbor 10.1.3.6 route-reflector-client
neighbor 10.1.3.6 route-map NH-setting out
exit-address-family
ip prefix-list PE-loopbacks seq 10 permit 192.168.100.0/24 ge 32
!
route-map NH-setting permit 10
description set next-hop to self for prefixes from other PE routers
match ip route-source prefix-list PE-loopbacks
set ip next-hop self
!
route-map NH-setting permit 20
description advertise prefixes with next-hop other than the prefix-list in
route-map entry 10 above
!
However, this is not supported:
PE1(config)#route-map NH-setting permit 10
PE1(config-route-map)# set ip next-hop self
% "NH-setting" used as BGP outbound route-map, set use own IP/IPv6 address for the nexthop not supported
If PE1 runs older Cisco IOS software that lacks the feature iBGP PE-CE, then PE1 never sets itself as the next-hop for the reflected iBGP prefixes. This means that the reflected BGP prefix (10.100.1.1/32) from CE1 (10.100.1.1) to CE2 -via PE1- would have CE1 (10.1.1.4) as the next-hop.
CE3#show bgp ipv4 unicast 10.100.1.1
BGP routing table entry for 10.100.1.1/32, version 32
Paths: (1 available, best #1, table default)
Not advertised to any peer
Refresh Epoch 1
Local
10.1.1.4 from 10.1.3.1 (192.168.100.1)
Origin IGP, metric 0, localpref 200, valid, internal, best
Originator: 10.100.1.1, Cluster list: 192.168.100.1
rx pathid: 0, tx pathid: 0x0
The prefix from CE2 (10.100.1.2/32) is seen with PE2 as the next-hop, because PE1 does not do next-hop-self for this prefix either:
CE3#show bgp ipv4 unicast 10.100.1.2
BGP routing table entry for 10.100.1.2/32, version 0
Paths: (1 available, no best path)
Not advertised to any peer
Refresh Epoch 1
Local
192.168.100.2 (inaccessible) from 10.1.3.1 (192.168.100.1)
Origin IGP, localpref 100, valid, internal
Originator: 10.100.1.2, Cluster list: 192.168.100.1, 192.168.100.3, 192.168.100.2
ATTR_SET Attribute:
Originator AS 65000
Origin IGP
Aspath
Med 0
LocalPref 100
Cluster list
192.168.100.2,
Originator 10.100.1.2
rx pathid: 0, tx pathid: 0
In order for the iBGP PE-CE feature to work properly, all PE routers for the VPN where the feature is enabled must have the code to support the feature and have the feature enabled.
Refer to Figure 5.
Figure 5
Figure 5 shows a VRF-Lite setup. The session from PE1 towards CE4 is eBGP. The session from PE1 towards CE3 is still iBGP.
For eBGP prefixes, the next-hop is always set to self when it advertises the prefixes towards an iBGP neighbor on VRF. This is regardless of the fact if the session towards the iBGP neighbor across VRF has next-hop-self set or not.
In Figure 5, CE3 sees the prefixes from CE4 with PE1 as the next-hop.
CE3#show bgp ipv4 unicast 10.100.1.4
BGP routing table entry for 10.100.1.4/32, version 103
Paths: (1 available, best #1, table default)
Not advertised to any peer
Refresh Epoch 1
65004
10.1.3.1 from 10.1.3.1 (192.168.100.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
rx pathid: 0, tx pathid: 0x0
This occurs with next-hop-self on PE1 towards CE3 or without.
If the interfaces on PE1 towards CE3 and CE4 are not in a VRF, but in the global context, the next-hop-self towards CE3 does makes a difference.
Without next-hop-self on PE1 towards CE3, you see:
PE1#show bgp vrf customer1 vpnv4 unicast neighbors 10.1.3.6
BGP neighbor is 10.1.3.6, vrf customer1, remote AS 65000, internal link
...
For address family: VPNv4 Unicast
Translates address family IPv4 Unicast for VRF customer1
Session: 10.1.3.6
BGP table version 1, neighbor version 1/0
Output queue size : 0
Index 12, Advertise bit 0
Route-Reflector Client
12 update-group member
Slow-peer detection is disabled
Slow-peer split-update-group dynamic is disabled
Interface associated: (none)
Although the next-hop-self is implicitly enabled, the output does not indicate this.
With next-hop-self on PE1 towards CE3, you see:
PE1#show bgp vrf customer1 vpnv4 unicast neighbors 10.1.3.6
BGP neighbor is 10.1.3.6, vrf customer1, remote AS 65000, internal link
..
For address family: VPNv4 Unicast
...
NEXT_HOP is always this router for eBGP paths
Whereas, if the interfaces towards CE3 and CE4 are in a global context, the next-hop for prefixes from CE4 is CE4 itself when next-hop-self is not configured:
CE3#show bgp ipv4 unicast 10.100.1.4
BGP routing table entry for 10.100.1.4/32, version 124
Paths: (1 available, best #1, table default)
Not advertised to any peer
Refresh Epoch 1
65004
10.1.4.7 from 10.1.3.1 (192.168.100.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
rx pathid: 0, tx pathid: 0x0
For next-hop-self on PE1 towards CE3:
CE3#show bgp ipv4 unicast 10.100.1.4
BGP routing table entry for 10.100.1.4/32, version 125
Paths: (1 available, best #1, table default)
Not advertised to any peer
Refresh Epoch 1
65004
10.1.3.1 from 10.1.3.1 (192.168.100.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
rx pathid: 0, tx pathid: 0x0
This was done based on RFC 4364.
If you want to not set next-hop-self for eBGP prefixes towards an iBGP session across a VRF interface, you must configure next-hop-unchanged. The support for this only occured with Cisco bug ID CSCuj11720.
router bgp 65000
...
address-family ipv4 vrf customer1
neighbor 10.1.1.4 remote-as 65000
neighbor 10.1.1.4 activate
neighbor 10.1.1.4 route-reflector-client
neighbor 10.1.3.6 remote-as 65000
neighbor 10.1.3.6 activate
neighbor 10.1.3.6 route-reflector-client
neighbor 10.1.3.6 next-hop-unchanged
neighbor 10.1.4.7 remote-as 65004
neighbor 10.1.4.7 activate
exit-address-family
Now, CE3 sees CE4 as the next-hop for the prefixes advertised by CE4:
CE3#show bgp ipv4 unicast 10.100.1.4
BGP routing table entry for 10.100.1.4/32, version 130
Paths: (1 available, best #1, table default)
Not advertised to any peer
Refresh Epoch 3
65004
10.1.4.7 from 10.1.3.1 (192.168.100.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
rx pathid: 0, tx pathid: 0x0
If you try to configure the next-hop-unchanged keyword for the iBGP session towards CE3 on Cisco IOS code prior to Cisco bug ID CSCuj11720, you encounter this error:
PE1(config-router-af)# neighbor 10.1.3.6 next-hop-unchanged
%BGP: Can propagate the nexthop only to multi-hop EBGP neighbor
After Cisco bug ID CSCuj11720, the next-hop-unchanged keyword is valid for multi-hop eBGP neighbors and iBGP VRF-Lite neighbors.