Jan 17

Abstract:

The Inter-AS Multicast VPN solution introduces challenges in cases where the peering autonomous systems implement a BGP-free core. This post illustrates a known solution to this problem, implemented in Cisco IOS software. The solution relies on special MP-BGP and PIM extensions. The reader is assumed to understand Cisco's basic mVPN implementation, the PIM protocol and the Multi-Protocol BGP extensions.

Abbreviations used

mVPN – Multicast VPN
MSDP – Multicast Source Discovery Protocol
PE – Provider Edge
CE – Customer Edge
RPF – Reverse Path Forwarding
MP-BGP – Multi-Protocol BGP
PIM – Protocol Independent Multicast
PIM SM – PIM Sparse Mode
PIM SSM – PIM Source Specific Multicast
LDP – Label Distribution Protocol
MDT – Multicast Distribution Tree
P-PIM – Provider Facing PIM Instance
C-PIM – Customer Facing PIM Instance
NLRI – Network Layer Reachability Information

Inter-AS mVPN Overview

A typical “classic” Inter-AS mVPN solution leverages the following key components:

  • PIM-SM maintaining separate RP for every AS
  • MSDP used to exchange information on active multicast sources
  • (Optionally) MP-BGP multicast extension to propagate information about multicast prefixes for RPF validations

With this solution, the PEs participating in the same MDT discover each other by joining the shared tree toward the local RP and listening to the multicast packets sent by other PEs. Those PEs may belong to the same or to different Autonomous Systems; in the latter case, the sources are discovered by virtue of the MSDP peering. This scenario assumes that every router in the local AS has complete routing information about the multicast sources (the PEs' loopback addresses) residing in the other systems, since this information is necessary for the RPF check. In turn, this leads to the requirement of either running BGP on all P routers or redistributing the MP-BGP (BGP) multicast prefix information into the IGP. The redistribution approach clearly has limited scalability, while the other method requires enabling BGP on the P routers, which defeats the idea of a BGP-free core.

An alternative to running PIM in Sparse Mode is PIM SSM, which relies on out-of-band information for multicast source discovery. For this case, Cisco released a draft proposal defining a new MP-BGP MDT SAFI that is used to propagate MDT information and the associated PE addresses. Let's make a short detour into MP-BGP to get a better understanding of SAFIs.

MP-BGP Overview

Recall the classic BGP UPDATE message format. It consists of the following sections: [Withdrawn Prefixes (Optional)] + [Path Attributes] + [NLRIs]. The Withdrawn Prefixes and NLRIs are IPv4 prefixes, and their structure does not support any other network protocol. The Path Attributes (e.g. AS_PATH, ORIGIN, LOCAL_PREF, NEXT_HOP) apply to all NLRIs in the message; prefixes with a different set of path attributes must be carried in a separate UPDATE message. Also notice that NEXT_HOP is an IPv4 address as well.

In order to introduce support for non-IPv4 network protocols into BGP, two new optional non-transitive path attributes have been added. The first attribute, MP_REACH_NLRI, has the following structure: [AFI/SAFI] + [NEXT_HOP] + [NLRI]. Both NEXT_HOP and NLRI are formatted according to the protocol identified by the AFI/SAFI pair, which stands for Address Family Identifier and Subsequent Address Family Identifier respectively. For example, this could be an IPv6 or CLNS prefix. Thus, all information about non-IPv4 prefixes is encoded in a new BGP path attribute. A typical BGP UPDATE message carrying MP_REACH_NLRI attributes has no "classic" NEXT_HOP attribute and none of the "Withdrawn Prefixes" or "NLRIs" found in normal UPDATE messages; for next-hop calculations, a receiving BGP speaker uses the information found in the MP_REACH_NLRI attribute. The multi-protocol UPDATE message may still contain other BGP path attributes such as AS_PATH, ORIGIN, MED or LOCAL_PREF, but this time those attributes are associated with the non-IPv4 prefixes found in the attached MP_REACH_NLRI attributes.

The second attribute, MP_UNREACH_NLRI, has a format similar to MP_REACH_NLRI but lists the "multi-protocol" addresses to be withdrawn. No other path attributes need to accompany it; an UPDATE message may simply contain a list of MP_UNREACH_NLRI attributes.
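To make the layout concrete, here is a minimal Python sketch of the MP_REACH_NLRI value field described above. This is an assumption-level illustration of the RFC 4760 encoding, not a complete BGP implementation; the helper name is invented for the example.

```python
import socket
import struct

def mp_reach_nlri(afi: int, safi: int, next_hop: bytes, nlri: bytes) -> bytes:
    """Value field of MP_REACH_NLRI (RFC 4760): AFI (2 octets),
    SAFI (1), next-hop length (1), the next hop itself, one reserved
    octet, then the NLRI encoded per the AFI/SAFI protocol."""
    return (struct.pack("!HBB", afi, safi, len(next_hop))
            + next_hop
            + b"\x00"      # reserved octet (was the SNPA count in RFC 2858)
            + nlri)

# IPv6 unicast (AFI 2 / SAFI 1): prefix 2001:db8::/32, next hop 2001:db8::1.
next_hop = socket.inet_pton(socket.AF_INET6, "2001:db8::1")
prefix = bytes([32]) + socket.inet_pton(socket.AF_INET6, "2001:db8::")[:4]
attr = mp_reach_nlri(afi=2, safi=1, next_hop=next_hop, nlri=prefix)
print(attr.hex())
```

The NLRI keeps the usual length-in-bits plus truncated-prefix encoding, which is why only the first four octets of 2001:db8:: are carried for a /32.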

The list of supported AFIs may be found in RFC 1700 (though obsolete now, it is still very informative) – for example, AFI 1 stands for IPv4 and AFI 2 for IPv6. The Subsequent AFI clarifies the purpose of the information found in MP_REACH_NLRI. For example, a SAFI value of 1 means the prefixes should be used for unicast forwarding, SAFI 2 means the prefixes are to be used for multicast RPF checks, and SAFI 3 means the prefixes may be used for both purposes. Last, but not least, SAFI 128 denotes MPLS-labeled VPN addresses.

Just as a reminder, the BGP process performs a separate best-path election for the "classic" IPv4 prefixes and for the prefixes of every AFI/SAFI pair, based on the path attributes. This allows independent route propagation for the addresses found in different address families. Since a given BGP speaker may not support a particular network protocol, the list of supported AFI/SAFI pairs is advertised using the BGP capabilities feature (another BGP extension), and a particular network protocol's information is only propagated if both speakers support it.

MDT SAFI Overview

Cisco drafted a new SAFI to be used with the regular AFIs such as IPv4 or IPv6. This SAFI propagates the MDT group address and the associated PE's Loopback address. The format for AFI 1 (IPv4) is as follows:

MP_NLRI = [RD:PE’s IPv4 Address]:[MDT Group Address],
MP_NEXT_HOP = [BGP Peer IPv4 Address].

Here, RD is the RD corresponding to the VRF that has the MDT configured, and the IPv4 address is the respective PE router's Loopback address. Normally, per Cisco's rules, this is the same Loopback interface used for VPNv4 peering, but this could be changed using the VRF-level command bgp next-hop.
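As an illustration, the MDT SAFI NLRI above can be sketched in Python. This is a simplified byte layout based on the Cisco draft (later published as RFC 6037), assuming a type-0 RD; the one-octet length field that precedes the NLRI on the wire is omitted here.

```python
import socket
import struct

def mdt_safi_nlri(asn: int, assigned: int, pe_addr: str, mdt_group: str) -> bytes:
    """Sketch of an IPv4 MDT SAFI NLRI: an 8-octet type-0 RD
    (2-byte type, 2-byte ASN, 4-byte assigned number), followed by
    the PE's IPv4 loopback and the MDT default group address."""
    rd = struct.pack("!HHI", 0, asn, assigned)     # type-0 RD: ASN:nn
    return (rd
            + socket.inet_aton(pe_addr)            # PE loopback address
            + socket.inet_aton(mdt_group))         # MDT default group

# The 200:1:20.0.5.5 / group 232.1.1.1 advertisement from the case study:
nlri = mdt_safi_nlri(200, 1, "20.0.5.5", "232.1.1.1")
print(len(nlri), nlri.hex())
```

The 16-octet result matches the [RD:PE's IPv4 Address]:[MDT Group Address] structure shown above; the next hop travels separately in MP_NEXT_HOP.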

If all PEs in the local AS exchange this information and pass it to PIM SSM, the P-PIM (provider PIM, facing the SP core) process is able to build (S, G) trees for the MDT group address toward the other PEs' IPv4 addresses. This works by virtue of the fact that the PEs' IPv4 addresses are known via the IGP, as all PEs are in the same AS; there are no problems using a BGP-free core for intra-AS mVPN with PIM SSM. It's also worth mentioning that a precursor to the MDT SAFI was a special extended community used along with the VPNv4 address family. MP-BGP would use an RD of type 2 (not applicable to any unicast VRF) to transport the associated PE's IPv4 address, along with an extended community containing the MDT group address. Since the extended community was non-transitive, this allowed the "bootstrap" information for PIM SSM discovery to propagate only inside a single AS. This temporary solution was replaced by the MDT SAFI draft.

Next, consider the case of Inter-AS VPN where at least one AS uses a BGP-free core. When two peering Autonomous Systems activate the IPv4 MDT SAFI, the ASBRs advertise all the information learned from their PEs to each other, and this information further propagates down to each AS's PEs. Next, the P-PIM processes attempt to build (S, G) trees toward the PE IP addresses in the neighboring systems. Even though the PEs may know the other PEs' addresses (e.g. if Inter-AS VPN Option C is being used), the P routers don't have this information. If Inter-AS VPN Option B is in use, even the PE routers lack the information needed to build the (S, G) trees.

RPF Proxy Vector

The solution to this problem uses a modification to the PIM protocol and the RPF check functionality. Known as the RPF Proxy Vector, it defines a new PIM TLV that contains the IPv4 address of a "proxy" router used for RPF checks and as an intermediate destination for PIM Joins. Let's see how it works in a particular scenario.

On the diagram below you can see AS 100 and AS 200 using Inter-AS VPN Option B to exchange VPNv4 routes. The PEs and ASBRs peer via BGP and exchange VPNv4 and IPv4 MDT SAFI prefixes. For every external prefix relayed to its own PEs, the ASBR changes the next-hop found in MP_REACH_NLRI to its local Loopback address. For VPNv4 prefixes, this achieves the goal of terminating the LSP on the ASBR. For MDT SAFI prefixes, this procedure sets the IPv4 address to be used as the "proxy" in PIM Joins.

mvpn-inter-as-basic

Let's say the MDT group used by R1 and R5 is 232.1.1.1. When R1 receives the MDT SAFI update with the MDT value of 200:1:232.1.1.1, PE IPv4 address 20.0.5.5 and next-hop value of 10.0.3.3 (R3's Loopback0 interface), it passes this information down to the PIM process. The PIM process constructs a PIM Join for group 232.1.1.1 toward the IP address 20.0.5.5 (not known in AS 100) and inserts a proxy vector value of 10.0.3.3. The PIM process then uses the route to 10.0.3.3 to find the next upstream PIM peer to send the Join message to. Every P router processes the PIM Join message carrying the proxy vector and uses the proxy IPv4 address to relay the message upstream. As soon as the message reaches the proxy router (in our case R3), the proxy vector is removed and the PIM Joins propagate further using the regular procedure, as the domain behind the proxy is supposed to have visibility of the actual Join target.
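The upstream-selection logic just described can be sketched as follows. This is a hypothetical model, not actual IOS code; the rib and local_addrs structures, and the function name, are illustrative assumptions.

```python
def join_upstream(source, proxy, rib, local_addrs):
    """Pick the (interface, next_hop) to relay a PIM Join toward, and
    the proxy vector (if any) to carry forward in the relayed Join."""
    if proxy is None or proxy in local_addrs:
        # No vector, or we ARE the proxy: strip the vector and route
        # on the real Join target, which must be visible behind us.
        return rib[source], None
    # BGP-free core: the source may be unknown locally, so follow the
    # route toward the proxy and keep the vector in the relayed Join.
    return rib[proxy], proxy

# A P router's view: source 20.0.5.5 is unknown, proxy 10.0.3.3 is in IGP.
rib = {"10.0.3.3": ("Serial2/0.23", "10.0.23.3")}
hop, fwd = join_upstream("20.0.5.5", "10.0.3.3", rib, {"10.0.2.2"})
print(hop, fwd)   # routes via the proxy, vector preserved
```

Once the Join reaches a router whose own address matches the vector, the first branch fires and the Join continues as a regular (S, G) Join.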

In addition to using the proxy vector to relay the PIM Join upstream, every router creates a special mroute state for the (S, G) pair, where S is the PE IPv4 address and G is the MDT group. This mroute state has the proxy IPv4 address associated with it. When a matching multicast packet going from the external PE toward the MDT address hits the router, the RPF check is performed based on the upstream interface associated with the proxy IPv4 address, not the actual source IPv4 address found in the packet. For example, in our scenario R2 would have an mroute state for (20.0.5.5, 232.1.1.1) with the proxy IPv4 address of 10.0.3.3, and all packets coming from R5 to 232.1.1.1 would be RPF-checked based on the upstream interface toward 10.0.3.3.
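The modified RPF check can be modeled like this. Again, this is an assumed sketch rather than IOS internals; the mroutes and rib dictionaries are illustrative stand-ins for the multicast and unicast routing tables.

```python
def rpf_check(src, group, in_iface, mroutes, rib):
    """Accept a packet only if it arrived on the upstream interface
    toward the RPF target: the proxy if the state has one, else the
    packet's source address itself."""
    state = mroutes.get((src, group))
    if state is None:
        return False
    rpf_target = state.get("proxy") or src    # vector overrides the source
    rpf_iface, _next_hop = rib[rpf_target]
    return in_iface == rpf_iface

# R2's state from the scenario: (20.0.5.5, 232.1.1.1) with proxy 10.0.3.3.
mroutes = {("20.0.5.5", "232.1.1.1"): {"proxy": "10.0.3.3"}}
rib = {"10.0.3.3": ("Serial2/0.23", "10.0.23.3")}
print(rpf_check("20.0.5.5", "232.1.1.1", "Serial2/0.23", mroutes, rib))  # True
```

Note the unicast RIB is consulted for the proxy, never for 20.0.5.5, which a BGP-free P router does not know.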

Using the above-described "proxy-based" procedure, the P routers may successfully perform RPF checks for packets whose source IPv4 addresses are not found in the local RIB. The tradeoff is the amount of multicast state information that has to be stored in the P routers' memory: it is proportional to the number of PEs multiplied by the number of mVPNs in the worst-case scenario where every PE participates in every mVPN. There could be even more multicast route states in situations where Data MDTs are used in addition to the Default MDT.

BGP Connector Attribute

Another piece of information is needed for "intra-VPN" operations: joining a PIM tree toward a particular IP address inside a VPN and performing an RPF check inside the VPN. Consider the use of Inter-AS VPN Option B, where VPNv4 prefixes have their MP_REACH_NLRI next-hop changed to the local ASBR's IPv4 address. When a local PE receives a multicast packet on the MDT tunnel interface, it decapsulates the packet and performs a source IPv4 address lookup in the VRF's table. Based on the MP-BGP learned routes, the next-hop would point toward the ASBR (Option B), while the packets might be arriving across a different inter-AS link running the multicast MP-BGP peering. Thus, relying solely on the unicast next-hop may not be sufficient for Inter-AS RPF checks.

For example, look at the figure below, where R3 and R4 run MP-BGP for VPNv4 while R4 and R6 run the multicast MP-BGP extension. R1 peers with both ASBRs; it learns VPNv4 prefixes from R3, and MDT SAFI information along with the PEs' IPv4 addresses from R6. PIM is enabled only on the link connecting R6 and R4.

mvpn-inter-as-diverse-paths

In this situation, the RPF lookup would fail, as the MDT SAFI information is exchanged across the link running multicast BGP, while the VPNv4 prefixes' next-hop points to R3. Thus, a method is required to preserve the information needed for the RPF lookup.

Cisco suggested the use of a new optional transitive attribute named the BGP Connector, exported along with the VPNv4 prefixes out of a PE hosting an mVPN. This attribute contains two components: [AFI/SAFI] + [Connector Information], and in general defines the information needed by the network protocol identified by the AFI/SAFI pair to connect to the routing information found in MP_REACH_NLRI. If AFI=IPv4 and SAFI=MDT, the connector attribute contains the IPv4 address of the router originating the prefixes associated with the VRF that has an MDT configured.

The customer-facing PIM process (C-PIM) in the PE routers uses the information found in the BGP Connector attribute to perform the intra-VPN RPF check, as well as to find the next-hop to send PIM Joins to. Notice that the C-PIM Joins do not need the RPF proxy vector piggybacked in the PIM messages, as they are transported inside the MDT tunnel toward the remote PEs.

You may notice that the use of the BGP Connector attribute eliminates the need for a special "VPNv4 multicast" address family that could otherwise transport RPF check information for VPNv4 prefixes. Such an address family is not really needed, as the multicast packets are tunneled through the SP cores using the MDT tunnel, and the BGP Connector is sufficient for RPF checks at the provider edge. However, the multicast BGP address family is still needed in situations where diverse unicast and multicast transport paths are used between the Autonomous Systems.
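The PE-side decision described in this section can be sketched as follows. This is an assumed model; the dictionary route format and function name are illustrative, not an IOS API.

```python
def cpim_rpf_target(vpn_route):
    """Return the address C-PIM should RPF against for a VPN source:
    the Connector attribute (the originating PE) when present, else
    the plain unicast next hop (the ASBR under Option B)."""
    return vpn_route.get("connector") or vpn_route["next_hop"]

# R1's view of 192.168.5.0/24 from the case study outputs below:
route = {"next_hop": "10.0.3.3", "connector": "20.0.5.5"}
print(cpim_rpf_target(route))   # 20.0.5.5
```

RPF-checking against 20.0.5.5 resolves to the MDT tunnel interface rather than the path toward the ASBR, which is exactly what the intra-VPN check needs.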

Case Study

Let's put all the concepts to the test in a sample scenario. Here, two autonomous systems, AS 100 and AS 200, peer using Inter-AS VPN Option B and exchange VPNv4 prefixes across the R3-R4 link. At the same time, multicast prefixes along with MDT SAFI information are exchanged across the R6-R4 peering link. AS 100 implements a BGP-free core, so R2 does not peer via BGP with any other routers and only uses OSPF for IGP prefix exchange. R1 peers with both ASBRs, R3 and R6, via MP-BGP for the purpose of VPNv4, multicast prefix, and MDT SAFI exchange.

mvpn-case-study

On the diagram, the links highlighted in orange are enabled for PIM SM and may carry multicast traffic. Notice that MPLS traffic and multicast traffic take different paths between the systems. The following are the main highlights of R1's configuration:

  • VRF RED is configured with the MDT group 232.1.1.1. PIM SSM is configured for the P-PIM instance using the default group range of 232/8.
  • The command ip multicast vrf RED rpf proxy rd vector ensures the use of the RPF proxy vector for the MDT trees that P-PIM builds for VRF RED. Notice that the similar command ip multicast rpf proxy vector applies to the Joins received in the global routing table and is typically seen on the P routers.
  • R1 exchanges VPNv4 prefixes with R3 and multicast prefixes with R6 via BGP. At the same time R1 exchanges IPv4 MDT SAFI with R6 to learn the MDT information from AS 200.
  • Connected routes from VRF RED are redistributed into MP-BGP.
R1:
hostname R1
!
interface Serial 2/0
 encapsulation frame-relay
 no shutdown
!
ip multicast-routing
ip pim ssm default
!
interface Serial 2/0.12 point-to-point
 ip address 10.0.12.1 255.255.255.0
 frame-relay interface-dlci 102
 mpls ip
 ip pim sparse-mode
!
interface Loopback0
 ip pim sparse-mode
 ip address 10.0.1.1 255.255.255.255
!
router ospf 1
 network 10.0.12.1 0.0.0.0 area 0
 network 10.0.1.1 0.0.0.0 area 0
!
router bgp 100
  neighbor 10.0.3.3 remote-as 100
  neighbor 10.0.3.3 update-source Loopback 0
  neighbor 10.0.6.6 remote-as 100
  neighbor 10.0.6.6 update-source Loopback 0
address-family ipv4 unicast
  no neighbor 10.0.3.3 activate
  no neighbor 10.0.6.6 activate
address-family vpnv4 unicast
  neighbor 10.0.3.3 activate
  neighbor 10.0.3.3 send-community both
address-family ipv4 mdt
  neighbor 10.0.6.6 activate
address-family ipv4 multicast
  neighbor 10.0.6.6 activate
  network 10.0.1.1 mask 255.255.255.255
address-family ipv4 vrf RED
  redistribute connected
!
no ip domain-lookup
!
ip multicast vrf RED rpf proxy rd vector
!
ip vrf RED
  rd 100:1
  route-target both 200:1
  route-target both 100:1
  mdt default 232.1.1.1
!
ip multicast-routing vrf RED
!
interface FastEthernet 0/0
 ip vrf forwarding RED
 ip address 192.168.1.1 255.255.255.0
 ip pim dense-mode
 no shutdown

Notice that in the configuration above, the Loopback0 interface is used as the source for the MDT tunnel, and therefore has to have PIM (multicast routing) enabled on it. Next in turn, R2's configuration is straightforward: OSPF is used for the IGP, with adjacencies to R1, R3 and R6, and label exchange via LDP. Notice that R2 does NOT run PIM on the uplink to R3, and does NOT run LDP with R6. Effectively, the path via R6 is used only for multicast traffic, while the path across R3 is used only for MPLS LSPs. R2 is configured for PIM SSM and RPF proxy vector support in the global routing table.

R2:
hostname R2
!
no ip domain-lookup
!
interface Serial 2/0
 encapsulation frame-relay
 no shut
!
ip multicast-routing
!
interface Serial 2/0.12 point-to-point
 ip address 10.0.12.2 255.255.255.0
 frame-relay interface-dlci 201
 mpls ip
 ip pim sparse-mode
!
ip pim ssm default
!
interface Serial 2/0.23 point-to-point
 ip address 10.0.23.2 255.255.255.0
 frame-relay interface-dlci 203
 mpls ip
!
interface Serial 2/0.26 point-to-point
 ip address 10.0.26.2 255.255.255.0
 frame-relay interface-dlci 206
 ip pim sparse-mode
!
interface Loopback0
 ip address 10.0.2.2 255.255.255.255
!
ip multicast rpf proxy vector
!
router ospf 1
 network 10.0.12.2 0.0.0.0 area 0
 network 10.0.2.2 0.0.0.0 area 0
 network 10.0.23.2 0.0.0.0 area 0
 network 10.0.26.2 0.0.0.0 area 0

R3 is the ASBR implementing Inter-AS VPN Option B. It peers via BGP with R1 (the PE) and R4 (the other ASBR). Only the VPNv4 address family is enabled with the BGP peers. Notice that the next-hop for VPNv4 prefixes is changed to self in order to terminate the transport LSP from the PE on R3. No multicast or MDT SAFI information is exchanged across R3.

R3:
hostname R3
!
no ip domain-lookup
!
interface Serial 2/0
 encapsulation frame-relay
 no shut
!
interface Serial 2/0.23 point-to-point
 ip address 10.0.23.3 255.255.255.0
 frame-relay interface-dlci 302
 mpls ip
!
interface Serial 2/0.34 point-to-point
 ip address 172.16.34.3 255.255.255.0
 frame-relay interface-dlci 304
 mpls ip
!
interface Loopback0
 ip address 10.0.3.3 255.255.255.255
!
router ospf 1
 network 10.0.23.3 0.0.0.0 area 0
 network 10.0.3.3 0.0.0.0 area 0
!
router bgp 100
 no bgp default route-target filter
 neighbor 10.0.1.1 remote-as 100
 neighbor 10.0.1.1 update-source Loopback 0
 neighbor 172.16.34.4 remote-as 200
address-family ipv4 unicast
  no neighbor 10.0.1.1 activate
  no neighbor 172.16.34.4 activate
address-family vpnv4 unicast
  neighbor 10.0.1.1 activate
  neighbor 10.0.1.1 next-hop-self
  neighbor 10.0.1.1 send-community both
  neighbor 172.16.34.4 activate
  neighbor 172.16.34.4 send-community both

The second ASBR in AS 100, R6, could be characterized as a multicast-only ASBR. This ASBR is only used to exchange prefixes in the multicast and MDT SAFI address families with R1 and R4. MPLS is not enabled on this router; its sole purpose is multicast forwarding between AS 100 and AS 200. There is no need to run MSDP, as PIM SSM is used for multicast tree construction.

R6:
hostname R6
!
no ip domain-lookup
!
interface Serial 2/0
 encapsulation frame-relay
 no shut
!
ip multicast-routing
ip pim ssm default
ip multicast rpf proxy vector
!
interface Serial 2/0.26 point-to-point
 ip address 10.0.26.6 255.255.255.0
 frame-relay interface-dlci 602
 ip pim sparse-mode
!
interface Serial 2/0.46 point-to-point
 ip address 172.16.46.6 255.255.255.0
 frame-relay interface-dlci 604
 ip pim sparse-mode
!
interface Loopback0
 ip pim sparse-mode
 ip address 10.0.6.6 255.255.255.255
!
router ospf 1
 network 10.0.6.6 0.0.0.0 area 0
 network 10.0.26.6 0.0.0.0 area 0
!
router bgp 100
  neighbor 10.0.1.1 remote-as 100
  neighbor 10.0.1.1 update-source Loopback 0
  neighbor 172.16.46.4 remote-as 200
address-family ipv4 unicast
  no neighbor 10.0.1.1 activate
  no neighbor 172.16.46.4 activate
address-family ipv4 mdt
  neighbor 172.16.46.4 activate
  neighbor 10.0.1.1 activate
  neighbor 10.0.1.1 next-hop-self
address-family ipv4 multicast
  neighbor 172.16.46.4 activate
  neighbor 10.0.1.1 activate
  neighbor 10.0.1.1 next-hop-self

Pay attention to the following. Firstly, R6 is set up for PIM SSM and RPF proxy vector support. Secondly, R6 sets itself as the BGP next hop in the updates sent under the multicast and MDT SAFI address families; this is needed for proper MDT tree construction and correct RPF vector insertion. The next router, R4, is the combined VPNv4 and multicast ASBR for AS 200. It performs the same functions that R3 and R6 perform separately for AS 100. The VPNv4, MDT SAFI, and multicast address families are enabled under the BGP process on this router. At the same time, the router supports the RPF Proxy Vector and PIM SSM for proper multicast forwarding. This router is the most configuration-intensive of all routers in both Autonomous Systems, as it also has to support MPLS label propagation via BGP and LDP. Of course, as a classic Option B ASBR, R4 has to change the BGP next-hop to itself for all address-family updates sent to R5 – the PE in AS 200.

R4:
hostname R4
!
no ip domain-lookup
!
interface Serial 2/0
 encapsulation frame-relay
 no shut
!
ip pim ssm default
ip multicast rpf proxy vector
ip multicast-routing
!
interface Serial 2/0.34 point-to-point
 ip address 172.16.34.4 255.255.255.0
 frame-relay interface-dlci 403
 mpls ip
!
interface Serial 2/0.45 point-to-point
 ip address 20.0.45.4 255.255.255.0
 frame-relay interface-dlci 405
 mpls ip
 ip pim sparse-mode
!
interface Serial 2/0.46 point-to-point
 ip address 172.16.46.4 255.255.255.0
 frame-relay interface-dlci 406
 ip pim sparse-mode
!
interface Loopback0
 ip address 20.0.4.4 255.255.255.255
!
router ospf 1
 network 20.0.4.4 0.0.0.0 area 0
 network 20.0.45.4 0.0.0.0 area 0
!
router bgp 200
  no bgp default route-target filter
  neighbor 172.16.34.3 remote-as 100
  neighbor 172.16.46.6 remote-as 100
  neighbor 20.0.5.5 remote-as 200
  neighbor 20.0.5.5 update-source Loopback0
address-family ipv4 unicast
  no neighbor 172.16.34.3 activate
  no neighbor 20.0.5.5 activate
  no neighbor 172.16.46.6 activate
address-family vpnv4 unicast
  neighbor 172.16.34.3 activate
  neighbor 172.16.34.3 send-community both
  neighbor 20.0.5.5 activate
  neighbor 20.0.5.5 send-community both
  neighbor 20.0.5.5 next-hop-self
address-family ipv4 mdt
  neighbor 172.16.46.6 activate
  neighbor 20.0.5.5 activate
  neighbor 20.0.5.5 next-hop-self
address-family ipv4 multicast
  neighbor 20.0.5.5 activate
  neighbor 20.0.5.5 next-hop-self
  neighbor 172.16.46.6 activate

The last router in the diagram is R5, a PE in AS 200 configured symmetrically to R1. It has to support the VPNv4, MDT SAFI and multicast address families to learn all the necessary information from the ASBR. Of course, the PIM RPF proxy vector is enabled for VRF RED's MDT, and PIM SSM is configured for the default group range in the global routing table. As you may have noticed, there is no router in AS 200 that emulates a BGP-free core.

R5:
hostname R5
!
no ip domain-lookup
!
interface Serial 2/0
 encapsulation frame-relay
 no shut
!
ip multicast-routing
!
interface Serial 2/0.45 point-to-point
 ip address 20.0.45.5 255.255.255.0
 frame-relay interface-dlci 504
 mpls ip
 ip pim sparse-mode
!
interface Loopback0
 ip pim sparse-mode
 ip address 20.0.5.5 255.255.255.255
!
router ospf 1
 network 20.0.5.5 0.0.0.0 area 0
 network 20.0.45.5 0.0.0.0 area 0
!
ip vrf RED
 rd 200:1
 route-target both 200:1
 route-target both 100:1
 mdt default 232.1.1.1
!
router bgp 200
  neighbor 20.0.4.4 remote-as 200
  neighbor 20.0.4.4 update-source Loopback0
address-family ipv4 unicast
  no neighbor 20.0.4.4 activate
address-family vpnv4 unicast
  neighbor 20.0.4.4 activate
  neighbor 20.0.4.4 send-community both
address-family ipv4 mdt
  neighbor 20.0.4.4 activate
address-family ipv4 multicast
  neighbor 20.0.4.4 activate
  network 20.0.5.5 mask 255.255.255.255
address-family ipv4 vrf RED
  redistribute connected
!
ip multicast vrf RED rpf proxy rd vector
ip pim ssm default
!
ip multicast-routing vrf RED
!
interface FastEthernet 0/0
 ip vrf forwarding RED
 ip address 192.168.5.1 255.255.255.0
 ip pim dense-mode
 no shutdown

Validating Unicast Paths

This is the simplest part. Use the show commands to verify that the VPNv4 prefixes have propagated between the PEs and test end-to-end connectivity:

R1#sh ip route vrf RED

Routing Table: RED
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

B    192.168.5.0/24 [200/0] via 10.0.3.3, 00:50:35
C    192.168.1.0/24 is directly connected, FastEthernet0/0

R1#show bgp vpnv4 unicast vrf RED
BGP table version is 5, local router ID is 10.0.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf RED)
*> 192.168.1.0      0.0.0.0                  0         32768 ?
*>i192.168.5.0      10.0.3.3                 0    100      0 200 ?

R1#show bgp vpnv4 unicast vrf RED 192.168.5.0
BGP routing table entry for 100:1:192.168.5.0/24, version 5
Paths: (1 available, best #1, table RED)
  Not advertised to any peer
  200, imported path from 200:1:192.168.5.0/24
    10.0.3.3 (metric 129) from 10.0.3.3 (10.0.3.3)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:100:1 RT:200:1
      Connector Attribute: count=1
       type 1 len 12 value 200:1:20.0.5.5
      mpls labels in/out nolabel/23

R1#traceroute vrf RED 192.168.5.1

Type escape sequence to abort.
Tracing the route to 192.168.5.1

  1 10.0.12.2 [MPLS: Labels 17/23 Exp 0] 432 msec 36 msec 60 msec
  2 10.0.23.3 [MPLS: Label 23 Exp 0] 68 msec 8 msec 36 msec
  3 172.16.34.4 [MPLS: Label 19 Exp 0] 64 msec 16 msec 48 msec
  4 192.168.5.1 12 msec *  8 msec

Notice that in the output above, the prefix 192.168.5.0/24 has the next-hop value of 10.0.3.3 and the BGP Connector attribute value of 200:1:20.0.5.5. This information will be used for RPF checks later, when we start feeding multicast traffic.

Validating Multicast Paths

Multicast forwarding is a bit more complicated. The first thing we should do is make sure the MDTs have been built from R1 toward R5 and from R5 toward R1. Check the PIM MDT groups on every PE:

R1#show ip pim mdt 
  MDT Group       Interface   Source                   VRF
* 232.1.1.1       Tunnel0     Loopback0                RED
R1#show ip pim mdt bgp
MDT (Route Distinguisher + IPv4)               Router ID         Next Hop
  MDT group 232.1.1.1
   200:1:20.0.5.5                              10.0.6.6          10.0.6.6

R5#show ip pim mdt 
  MDT Group       Interface   Source                   VRF
* 232.1.1.1       Tunnel0     Loopback0                RED

R5#show ip pim mdt bgp
MDT (Route Distinguisher + IPv4)               Router ID         Next Hop
  MDT group 232.1.1.1
   100:1:10.0.1.1                              20.0.4.4          20.0.4.4

In the output above, pay attention to the next-hop values found in the MDT BGP information. In AS 100 it points toward R6, while in AS 200 it points to R4. These next-hops are to be used as the proxy vectors in the PIM Join messages. Check the mroutes for the tree (20.0.5.5, 232.1.1.1) starting from R1 and climbing up across R2, R6 and R4 to R5:

R1#show ip mroute 232.1.1.1
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group,
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(20.0.5.5, 232.1.1.1), 00:58:49/00:02:59, flags: sTIZV
  Incoming interface: Serial2/0.12, RPF nbr 10.0.12.2, vector 10.0.6.6
  Outgoing interface list:
    MVRF RED, Forward/Sparse, 00:58:49/00:01:17

(10.0.1.1, 232.1.1.1), 00:58:49/00:03:19, flags: sT
  Incoming interface: Loopback0, RPF nbr 0.0.0.0
  Outgoing interface list:
    Serial2/0.12, Forward/Sparse, 00:58:47/00:02:55

R1#show ip mroute 232.1.1.1 proxy
(20.0.5.5, 232.1.1.1)
  Proxy                      Assigner         Origin    Uptime/Expire
  200:1/10.0.6.6             0.0.0.0          BGP MDT   00:58:51/stopped

R1 shows the RPF proxy value of 10.0.6.6 for the source 20.0.5.5. Notice that there is also a tree toward 10.0.1.1, which results from the Join originated by R5. This tree has no proxies, as it is now inside its native AS. Next in turn, check R2 and R6 to find the same information (remember that the actual proxy removes the vector when it sees itself in the PIM Join message's proxy field):

R2#show ip mroute 232.1.1.1
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group,
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(10.0.1.1, 232.1.1.1), 01:01:41/00:03:25, flags: sT
  Incoming interface: Serial2/0.12, RPF nbr 10.0.12.1
  Outgoing interface list:
    Serial2/0.26, Forward/Sparse, 01:01:41/00:02:50

(20.0.5.5, 232.1.1.1), 01:01:43/00:03:25, flags: sTV
  Incoming interface: Serial2/0.26, RPF nbr 10.0.26.6, vector 10.0.6.6
  Outgoing interface list:
    Serial2/0.12, Forward/Sparse, 01:01:43/00:02:56

R2#show ip mroute 232.1.1.1 proxy 
(20.0.5.5, 232.1.1.1)
  Proxy                      Assigner         Origin    Uptime/Expire
    200:1/10.0.6.6             10.0.12.1        PIM       01:01:46/00:02:23

Notice the same proxy vector for (20.0.5.5, 232.1.1.1) set by R1. As expected, there is a "contra-directional" tree built toward R1 by R5 that has no RPF proxy vector. Proceed to the outputs from R6, and notice that R6 holds proxy entries for both trees, acting as the proxy itself (shown as "local"):

R6#show ip mroute 232.1.1.1
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group,
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(10.0.1.1, 232.1.1.1), 01:05:40/00:03:21, flags: sT
  Incoming interface: Serial2/0.26, RPF nbr 10.0.26.2
  Outgoing interface list:
    Serial2/0.46, Forward/Sparse, 01:05:40/00:02:56

(20.0.5.5, 232.1.1.1), 01:05:42/00:03:21, flags: sTV
  Incoming interface: Serial2/0.46, RPF nbr 172.16.46.4, vector 172.16.46.4
  Outgoing interface list:
    Serial2/0.26, Forward/Sparse, 01:05:42/00:02:51

R6#show ip mroute proxy
(10.0.1.1, 232.1.1.1)
  Proxy                      Assigner         Origin    Uptime/Expire
  100:1/local                172.16.46.4      PIM       01:05:44/00:02:21

(20.0.5.5, 232.1.1.1)
  Proxy                      Assigner         Origin    Uptime/Expire
  200:1/local                10.0.26.2        PIM       01:05:47/00:02:17

The show command outputs from R4 are similar to R6's – R4 is the proxy for both multicast trees:

R4#show ip mroute 232.1.1.1
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group,
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(10.0.1.1, 232.1.1.1), 01:08:42/00:03:16, flags: sTV
  Incoming interface: Serial2/0.46, RPF nbr 172.16.46.6, vector 172.16.46.6
  Outgoing interface list:
    Serial2/0.45, Forward/Sparse, 01:08:42/00:02:51

(20.0.5.5, 232.1.1.1), 01:08:44/00:03:16, flags: sT
  Incoming interface: Serial2/0.45, RPF nbr 20.0.45.5
  Outgoing interface list:
    Serial2/0.46, Forward/Sparse, 01:08:44/00:02:46

R4#show ip mroute proxy
(10.0.1.1, 232.1.1.1)
  Proxy                      Assigner         Origin    Uptime/Expire
  100:1/local                20.0.45.5        PIM       01:08:46/00:02:17

(20.0.5.5, 232.1.1.1)
  Proxy                      Assigner         Origin    Uptime/Expire
  200:1/local                172.16.46.6      PIM       01:08:48/00:02:12

Finally, the outputs on R5 mirror the ones we saw on R1, but with the roles of the multicast trees swapped: this time the tree toward R1 carries the proxy vector:

R5#show ip mroute 232.1.1.1
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group,
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(10.0.1.1, 232.1.1.1), 01:12:07/00:02:57, flags: sTIZV
  Incoming interface: Serial2/0.45, RPF nbr 20.0.45.4, vector 20.0.4.4
  Outgoing interface list:
    MVRF RED, Forward/Sparse, 01:12:07/00:00:02

(20.0.5.5, 232.1.1.1), 01:13:40/00:03:27, flags: sT
  Incoming interface: Loopback0, RPF nbr 0.0.0.0
  Outgoing interface list:
    Serial2/0.45, Forward/Sparse, 01:12:09/00:03:12

R5#show ip mroute proxy
(10.0.1.1, 232.1.1.1)
  Proxy                      Assigner         Origin    Uptime/Expire
  100:1/20.0.4.4             0.0.0.0          BGP MDT   01:12:10/stopped

R5's BGP table also holds the connector attribute for the prefix behind R1's Ethernet interface:

R5#show bgp vpnv4 unicast vrf RED 192.168.1.0
BGP routing table entry for 200:1:192.168.1.0/24, version 6
Paths: (1 available, best #1, table RED)
  Not advertised to any peer
  100, imported path from 100:1:192.168.1.0/24
    20.0.4.4 (metric 65) from 20.0.4.4 (20.0.4.4)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:100:1 RT:200:1
        Connector Attribute: count=1
       type 1 len 12 value 100:1:10.0.1.1
      mpls labels in/out nolabel/18
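
The role the connector attribute plays in the output above can be sketched as follows: because the Inter-AS ASBRs rewrite the VPNv4 next hop (here to 20.0.4.4), the connector preserves the original ingress PE address (10.0.1.1) for the in-VRF RPF check. This is an illustrative model under that assumption, not actual IOS code; the dictionary layout is hypothetical.

```python
# Hypothetical sketch: choosing the address for the C-PIM RPF check.

def vrf_rpf_address(route):
    """Prefer the connector attribute (original ingress PE) over the
    rewritten BGP next hop; fall back when no connector is attached."""
    return route.get("connector") or route["next_hop"]

route_192_168_1_0 = {
    "prefix": "192.168.1.0/24",
    "next_hop": "20.0.4.4",   # rewritten by the local ASBR
    "connector": "10.0.1.1",  # original PE address, carried end-to-end
}
assert vrf_rpf_address(route_192_168_1_0) == "10.0.1.1"
```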

In addition to the connector attributes, the last piece of information needed is the multicast source reachability propagated via the IPv4 multicast address family. Both R1 and R5 should have this information in their BGP tables:

R5#show bgp ipv4 multicast
BGP table version is 3, local router ID is 20.0.5.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i10.0.1.1/32      20.0.4.4                 0    100      0 100 i
*> 20.0.5.5/32      0.0.0.0                  0         32768 i

R1#show bgp ipv4 multicast
BGP table version is 3, local router ID is 10.0.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.0.1.1/32      0.0.0.0                  0         32768 i
*>i20.0.5.5/32      10.0.6.6                 0    100      0 200 i
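The tables above supply the RPF information that the BGP-free core cannot: assuming the RPF check consults the BGP IPv4 multicast table before falling back to the unicast RIB, R1 can still validate the source 20.0.5.5. The sketch below models that lookup order; it is illustrative only, and the table contents mirror the `show bgp ipv4 multicast` output on R1.

```python
# Hypothetical sketch of an RPF source lookup preferring the
# BGP ipv4 multicast table over the unicast RIB.

def rpf_lookup(source, mbgp_table, unicast_rib):
    """Resolve the RPF next hop: multicast BGP first, then unicast."""
    if source in mbgp_table:
        return mbgp_table[source]
    return unicast_rib.get(source)

r1_mbgp = {"20.0.5.5/32": "10.0.6.6"}  # learned from AS 200, next hop R6
r1_unicast = {}                        # BGP-free core: no unicast route
assert rpf_lookup("20.0.5.5/32", r1_mbgp, r1_unicast) == "10.0.6.6"
```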

Now it's time to verify the multicast connectivity. Make sure R1 and R5 see each other as PIM neighbors over the MDT tunnel, and then do a multicast ping toward the group joined by both R1 and R5:

R1#show ip pim vrf RED neighbor 
PIM Neighbor Table
Mode: B - Bidir Capable, DR - Designated Router, N - Default DR Priority,
      S - State Refresh Capable
Neighbor          Interface                Uptime/Expires    Ver   DR
Address                                                            Prio/Mode
20.0.5.5          Tunnel0                  01:31:38/00:01:15 v2    1 / DR S P

R5#show ip pim vrf RED neighbor 
PIM Neighbor Table
Mode: B - Bidir Capable, DR - Designated Router, N - Default DR Priority,
      S - State Refresh Capable
Neighbor          Interface                Uptime/Expires    Ver   DR
Address                                                            Prio/Mode
10.0.1.1          Tunnel0                  01:31:17/00:01:26 v2    1 / S P

R1#ping vrf RED 239.1.1.1 repeat 100

Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:

Reply to request 0 from 192.168.1.1, 12 ms
Reply to request 0 from 20.0.5.5, 64 ms
Reply to request 1 from 192.168.1.1, 8 ms
Reply to request 1 from 20.0.5.5, 56 ms
Reply to request 2 from 192.168.1.1, 8 ms
Reply to request 2 from 20.0.5.5, 100 ms
Reply to request 3 from 192.168.1.1, 16 ms
Reply to request 3 from 20.0.5.5, 56 ms

This concludes our testbed verification.

Summary

In this blog post we demonstrated how MP-BGP and PIM extensions can be used to implement Inter-AS multicast VPN between autonomous systems with BGP-free cores. PIM SSM is used to build the inter-AS trees, and the MDT SAFI is used to discover the MDT group addresses along with the PEs associated with them. The PIM RPF proxy vector allows successful RPF checks in a core free of multicast routes, by virtue of a proxy IPv4 address. Finally, the BGP connector attribute allows successful RPF checks inside a particular VRF.
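The MDT SAFI discovery mentioned above hinges on a small fixed-size NLRI. Per our reading of draft-nalawade-idr-mdt-safi, each NLRI carries an 8-byte RD, the 4-byte PE source address, and the 4-byte default MDT group. The parser below is a hedged sketch of that layout (type-0 RDs only), not production BGP code:

```python
import socket
import struct

def parse_mdt_nlri(data: bytes):
    """Parse one MDT SAFI NLRI: RD (8B) + PE address (4B) + MDT group (4B)."""
    assert len(data) == 16, "MDT SAFI NLRI is 16 bytes"
    rd_type, = struct.unpack("!H", data[0:2])
    if rd_type == 0:                       # type 0: 2-byte ASN : 4-byte value
        asn, num = struct.unpack("!HI", data[2:8])
        rd = f"{asn}:{num}"
    else:                                  # other RD types shown raw
        rd = data[2:8].hex()
    pe = socket.inet_ntoa(data[8:12])
    group = socket.inet_ntoa(data[12:16])
    return rd, pe, group

# Example: RD 100:1, PE 10.0.1.1, default MDT group 232.1.1.1
nlri = (struct.pack("!HHI", 0, 100, 1)
        + socket.inet_aton("10.0.1.1")
        + socket.inet_aton("232.1.1.1"))
assert parse_mdt_nlri(nlri) == ("100:1", "10.0.1.1", "232.1.1.1")
```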

Further Reading

Multicast VPN (draft-rosen-vpn-mcast)
Multicast Tunnel Discovery (draft-wijnands-mt-discovery)
PIM RPF Vector (draft-ietf-pim-rpf-vector-08)
MDT SAFI (draft-nalawade-idr-mdt-safi)

MDT SAFI Configuration
PIM RPF Proxy Vector Configuration

About Petr Lapukhov, 4xCCIE/CCDE:

Petr Lapukhov's career in IT began in 1988 with a focus on computer programming, and progressed into networking with his first exposure to Novell NetWare in 1991. Initially involved with Kazan State University's campus network support and UNIX system administration, he went through the path of becoming a networking consultant, taking part in many network deployment projects. Petr currently has over 12 years of experience working in the Cisco networking field, and is the only person in the world to have obtained four CCIEs in under two years, passing each on his first attempt. Petr is an exceptional case in that he has been working with all of the technologies covered in his four CCIE tracks (R&S, Security, SP, and Voice) on a daily basis for many years. When not actively teaching classes or developing self-paced products, he is studying for the CCDE Practical and the CCIE Storage Lab Exam and completing his PhD in Applied Mathematics.




14 Responses to “Inter-AS mVPNs: MDT SAFI, BGP Connector & RPF Proxy Vector”

 
  1. IK says:

    Thank you Petr, great as always! very useful for SP students.

  2. Dani Petrov says:

    Just excellent! That’s why you guys are number 1 in CCIE preparation courses. Keep it that way! TWO thumbs up!!

    Best regards,
    Dani Petrov

  3. David Franjoso says:

    Excellent! Even after I made my CCIE (with your help), IE is one of the most important sources of interesting material on the web. Thanks Petr

  4. Wow ;) I guess you had great fun figuring this one out.

  5. Ted says:

    Petr,

    For clarification, this way of achieving inter-AS mVPN would not be present in the SP Lab, correct? I’m saying this because MDT SAFI is not supported on the IOS for the SP Lab as far as I know. If this is correct, would using GRE Tunnels be a valid solution in the lab?

    Ted

    • @Ted

      Correct, the software used in the current SP exam supports MDT information propagation using the reserved RD:2, but this only allows for intra-AS PIM SSM support. For inter-AS VPNs you would need to use PIM-SM and fully propagate multicast routing information across the systems.

  6. Swap #19804 says:

    Current SP CCIE lab IOS 12.2S has a few restrictions when it comes to Inter-AS VPNs… I have summarized the details here:

    http://eminent-ccie.blogspot.com/2009/12/issues-in-122s-inter-as-and-csc.html

    Swap
    #19804

  7. shivlu jain says:

    Petr
    In case of inter-as communication, I would like to add one more thing when we have cisco and juniper communication:-
    For the data MDT, the method to signal the source address is described in draft-rosen-vpn-mcast section 7.2, which is supported by both IOS and JUNOS.
    http://www.potaroo.net/ietf/idref/draft-rosen-vpn-mcast/#page-19

    For the default MDT, the signaling in IOS is done using draft-nalawade-idr-mdt-safi, which is not supported in JUNOS.
    http://tools.ietf.org/html/draft-nalawade-idr-mdt-safi-03

    It means for inter-AS communication, we have to use SSM for the data MDT and anycast RP for the default MDT.

    http://www.mplsvpn.info/2009/02/multicast-vpn-faq.html


  9. vrf80 says:

    Hi,

    Does R5 have any use for the connector attribute in this scenario, seeing that it only has one exit point and everything points to 20.0.4.4? Does R1 use it inside the vrf for say a source in 192.168.5.0 – it sees the next hop as 20.0.5.5, looks up the connector attr being 10.0.6.6 and uses that as in the pim-joins? I might be a little off, feel free to clarify:-)


  11. mihai says:

    Could u pls share with us the IOS ver and HW u used for this minilab?
    Thank you.

  12. Mbong Hudson says:

    I have just taken a great deal of time to go through this because I couldn't respond positively to this during a test. Thank you for exposing my stupidity, because I am about to lab it and it looks much easier after reading it than what I thought. Thanks

 
