Unified MPLS

Unified MPLS is also known as seamless MPLS or hierarchical MPLS. It’s a way to scale a very large network with multiple IGP domains. Let’s get started exploring this feature.

I’ll use the below topology as we go along.

Unified MPLS Topology

What’s new here is RFC 3107 BGP IPv4 + label and the fact that we have multiple IGP domains. In this topology we have an OSPF domain to the left between R2-R3-R4 and an ISIS domain to the right between R4-R5-R6.

We still have a VPNv4 peering between our PE routers, which gives us the NLRI for the customer prefixes along with a VPN label for them. So far so good. The glue between the PE routers is what makes up unified MPLS. To get reachability between our PE routers without doing a full redistribution between our IGP domains, we use BGP IPv4 with labels and make our ASBRs inline RRs. Additionally, the RRs set the next hop (NH) to themselves. This is done because we keep the IGP domains separate and use BGP to allocate the transport label for the egress PE’s loopback0 address. In total, this builds up a hierarchical MPLS. We still need LDP in the IGP domains to provide a transport label between PE-ASBR, ASBR-ASBR (if we had any), and lastly ASBR-PE. The only thing that changes is that we need an extra label at our ingress PE. That’s how it works. Let’s have a look at the configuration and verification.

Configuration

! R2 (PE)
interface Loopback0
ip address 2.2.2.2 255.255.255.255
ip ospf 1 area 0
!
vrf definition a
rd 2.2.2.2:10
route-target export 65000:10
route-target import 65000:10
!
address-family ipv4
exit-address-family
!
mpls label range 200 299
mpls ldp router-id Loopback0 force
!
router ospf 1
router-id 2.2.2.2
mpls ldp autoconfig
!
router bgp 65000
bgp router-id 2.2.2.2
bgp log-neighbor-changes
no bgp default ipv4-unicast
neighbor 4.4.4.4 remote-as 65000
neighbor 4.4.4.4 update-source Loopback0
neighbor 6.6.6.6 remote-as 65000
neighbor 6.6.6.6 update-source Loopback0
!
address-family ipv4
network 2.2.2.2 mask 255.255.255.255
neighbor 4.4.4.4 activate
neighbor 4.4.4.4 send-label
exit-address-family
!
address-family vpnv4
neighbor 6.6.6.6 activate
neighbor 6.6.6.6 send-community extended
exit-address-family
!
address-family ipv4 vrf a
neighbor 10.0.12.1 remote-as 65001
neighbor 10.0.12.1 activate
exit-address-family

Let’s begin with our PE, R2. Everything is pretty standard until we reach the ipv4 address-family under BGP. Here we might start feeling a bit uncomfortable if we’re used to regular MPLS VPNs, but there is nothing to worry about. We’re just using IPv4+labels to allocate a transport label for the PE loopbacks for use in the other IGP domains.

R3 I’m not going to show as it only has OSPF + LDP configured, so nothing new here.

R4, on the other hand, has lots of new stuff going on. Let’s look:

! R4 (ASBR/RR)
interface Loopback0
ip address 4.4.4.4 255.255.255.255
ip ospf 1 area 0
!
mpls ldp router-id Loopback0 force
mpls label range 400 499
!
router ospf 1
router-id 4.4.4.4
mpls ldp autoconfig
!
router isis
net 49.0001.0040.0400.4004.00
is-type level-2-only
advertise passive-only
metric-style wide
passive-interface Loopback0
mpls ldp autoconfig
!
router bgp 65000
bgp router-id 4.4.4.4
bgp log-neighbor-changes
no bgp default ipv4-unicast
neighbor 2.2.2.2 remote-as 65000
neighbor 2.2.2.2 update-source Loopback0
neighbor 6.6.6.6 remote-as 65000
neighbor 6.6.6.6 update-source Loopback0
!
address-family ipv4
neighbor 2.2.2.2 activate
neighbor 2.2.2.2 route-reflector-client
neighbor 2.2.2.2 next-hop-self all
neighbor 2.2.2.2 send-label
neighbor 6.6.6.6 activate
neighbor 6.6.6.6 route-reflector-client
neighbor 6.6.6.6 next-hop-self all
neighbor 6.6.6.6 send-label
exit-address-family

Let’s begin by looking at our IGP configs. R4 is an ASBR because it borders two IGP domains: OSPF and ISIS. Note that in this example there is no need to redistribute between the two, but had we run OSPF in both domains using separate processes, we would have had to redistribute the loopback between the two, because you can only enable an interface for one OSPF process at a time.

In the above example, I’m simply advertising loopback0 into both OSPF and ISIS, making it possible for both R2 and R6 to reach R4, which ultimately provides full reachability between our PEs (along with all the other stuff in BGP, too, obviously).

The other stuff in BGP is what happens in our ipv4 address-family. Again we see configuration that isn’t used in regular MPLS VPN configurations. To be precise, we have the RR configuration along with next-hop-self all. The ‘all’ keyword is necessary because we’re reflecting iBGP routes, where the NH normally isn’t changed. Taken directly from a Cisco command reference:

"(Optional) Specifies that the next hop of both eBGP- and iBGP-learned routes is updated by the route reflector (RR)."

Verification

If we traceroute from R1 to R7, let’s see which labels are being used:

R1#traceroute 7.7.7.7 so lo0
Type escape sequence to abort.
Tracing the route to 7.7.7.7
VRF info: (vrf in name/id, vrf out name/id)
1 10.0.12.2 2 msec 1 msec 1 msec
2 10.0.23.3 [MPLS: Labels 301/405/603 Exp 0] 3 msec 2 msec 2 msec
3 10.0.34.4 [MPLS: Labels 405/603 Exp 0] 2 msec 2 msec 2 msec
4 10.0.45.5 [MPLS: Labels 500/603 Exp 0] 7 msec 11 msec 16 msec
5 10.0.67.6 [MPLS: Label 603 Exp 0] 2 msec 2 msec 2 msec
6 10.0.67.7 6 msec * 2 msec
R1#

So the first hop is the client-facing interface of our ingress PE, R2. Nothing special here, just IPv4 unicast.
Next we land on our P router, R3. Here we see a label stack of three labels! With regular MPLS VPNs we only have 2 labels in our stack when traversing the SP core. I’ve also done a Wireshark capture on Gi1.23 of R2:

Ping request from R1 to R7

Sure enough the same labels as we see in our traceroute are present in the ping. But why do we impose three labels on R2? Let’s have a look:

R2#sh ip cef vrf a 7.7.7.7/32 detail 
7.7.7.7/32, epoch 0, flags [rib defined all labels]
recursive via 6.6.6.6 label 603
recursive via 4.4.4.4 label 405
nexthop 10.0.23.3 GigabitEthernet1.23 label 301-(local:204)
R2#
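
The recursion in the CEF entry above can be sketched in a few lines of Python. This is only an illustration of how the three lookups stack up, not how CEF is implemented; the table and function names are made up for the example, and the values are copied from the output.

```python
# Hypothetical model of the recursive CEF entry shown above.
# Each prefix maps to (next hop, label that must be pushed to use it).
RECURSIVE_ROUTES = {
    "7.7.7.7/32": ("6.6.6.6/32", 603),   # learned via VPNv4 from 6.6.6.6
    "6.6.6.6/32": ("4.4.4.4/32", 405),   # learned via BGP IPv4 + label from 4.4.4.4
    "4.4.4.4/32": ("10.0.23.3", 301),    # IGP next hop with an LDP label
}

def build_label_stack(prefix):
    """Follow the recursion and return the label stack, top label first."""
    stack = []
    while prefix in RECURSIVE_ROUTES:
        next_hop, label = RECURSIVE_ROUTES[prefix]
        stack.insert(0, label)   # each deeper level of recursion ends up on top
        prefix = next_hop
    return stack

print(build_label_stack("7.7.7.7/32"))  # [301, 405, 603]
```

The result matches the label stack seen at hop 2 of the traceroute.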

Now the term hierarchical MPLS starts to make sense. So we start by putting label 603 into the stack. This is our VPN label that should be received from R6 via BGP VPNv4. Let’s double check:

R2#sh bgp vpnv4 u vrf a 7.7.7.7/32 
BGP routing table entry for 2.2.2.2:10:7.7.7.7/32, version 8
Paths: (1 available, best #1, table a)
Advertised to update-groups:
2
Refresh Epoch 1
65007, imported path from 6.6.6.6:10:7.7.7.7/32 (global)
6.6.6.6 (metric 3) (via default) from 6.6.6.6 (6.6.6.6)
Origin IGP, metric 0, localpref 100, valid, internal, best
Extended Community: RT:65000:10
mpls labels in/out nolabel/603
rx pathid: 0, tx pathid: 0x0
R2#

Sure enough we have an out label of 603 received from 6.6.6.6, R6. Great! The next label that R2 imposes is 405. We have this extra label due to the fact that R4 changes NH to self for 6.6.6.6/32. This means that we get an extra LSP endpoint, R4, on our way to R6. Hence we must have a label for the path.

Finally we push label 301 to be able to reach R4. This label gets popped by R3 because of PHP before the packet lands on R4, which is what we see at hop 3 (R4) in our traceroute. From here the label for 6.6.6.6/32 is swapped (405 becomes 500 on R4) and then popped by R5, again because of PHP, so the packet reaches our final destination, the egress PE R6, carrying only the VPN label 603.
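
To tie the traceroute together, here is a small Python sketch that walks the label stack hop by hop. The per-router actions are inferred from the traceroute output (I haven’t shown the LFIBs of R3-R6), so treat the table as an assumption; the names are hypothetical.

```python
# Per-router actions keyed by the top label, inferred from the traceroute.
# The dictionary is a hypothetical stand-in for each router's LFIB.
LFIB = {
    "R3": {301: ("pop", None)},    # PHP towards R4
    "R4": {405: ("swap", 500)},    # BGP label swapped for the label towards R6
    "R5": {500: ("pop", None)},    # PHP towards R6
    "R6": {603: ("pop", None)},    # VPN label, packet is forwarded in VRF a
}

def forward(stack, path):
    """Return the label stack as it arrives at each router on the path."""
    seen = []
    stack = list(stack)
    for router in path:
        seen.append((router, list(stack)))
        action, new_label = LFIB[router][stack[0]]
        if action == "swap":
            stack[0] = new_label
        else:
            stack.pop(0)
    return seen

for router, labels in forward([301, 405, 603], ["R3", "R4", "R5", "R6"]):
    print(router, labels)
```

Each printed line corresponds to one MPLS hop in the traceroute (hops 2 through 5).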

If we had had another IGP domain between OSPF and ISIS, a label stack of 3 would also be imposed by the ASBR, because we’d have a BGP IPv4+label here too, just like we saw on R2.

Legacy

Although unified MPLS makes it possible for an SP to build huge networks, it is an old way of scaling the network. Segment Routing has shown up as its replacement, offering a very scalable way of using MPLS. I’ll write a post about SR another time. For now, I hope the above post about unified MPLS was worth the read. Thanks!

Multicast VPN Extranet

This post talks about how you can do inter-VRF multicast using BGP VPNv4 multicast (SAFI 129).

You might have seen this guide on Cisco.com
Configuring Multicast VPN Extranet Support

They suggest leaking unicast routes between VRFs which isn’t required for this to work.

I’m using this topology to go through the configuration:

Our source is R9 in VRF a, which is configured with Rosen draft mVPN on R10 and R12. The configuration is plain:

! R10
vrf definition a
rd 10.10.10.10:10
route-target export 65000:10
route-target import 65000:10
!
address-family ipv4
mdt default 232.0.0.10
mdt data 232.0.1.0 0.0.0.255 threshold 1
mdt data threshold 1
exit-address-family
!
router bgp 65000
bgp router-id 10.10.10.10
bgp log-neighbor-changes
no bgp default ipv4-unicast
neighbor 11.11.11.11 remote-as 65000
neighbor 11.11.11.11 update-source Loopback0
!
address-family vpnv4
neighbor 11.11.11.11 activate
neighbor 11.11.11.11 send-community extended
exit-address-family
!
address-family vpnv4 multicast
neighbor 11.11.11.11 activate
neighbor 11.11.11.11 send-community extended
exit-address-family
!
address-family ipv4 mdt
neighbor 11.11.11.11 activate
neighbor 11.11.11.11 send-community extended
exit-address-family
!
address-family ipv4 vrf a
neighbor 10.0.109.9 remote-as 65009
neighbor 10.0.109.9 activate
exit-address-family
!
address-family ipv4 multicast vrf a
network 10.0.109.0 mask 255.255.255.0
exit-address-family

What might not seem so plain with this configuration is the BGP part. With L3 MPLS VPNs we’re used to configuring `address-family vpnv4`, which implies unicast. Since we’re not interested in using unicast for RPF, we need some way of distributing multicast routes for RPF checks. This we can do using a multicast address-family. In this case I’m using VPNv4 multicast, which gives us the ability to import and export the routes using route-targets. Specifically, we’re exporting the prefix 10.0.109.0/24 into VPNv4 multicast.

Let’s look at the other side, R12.

! R12
vrf definition a
rd 12.12.12.12:10
route-target export 65000:10
route-target import 65000:10
!
address-family ipv4
mdt default 232.0.0.10
mdt data 232.0.1.0 0.0.0.255 threshold 1
mdt data threshold 1
exit-address-family
!
vrf definition b
ipv4 multicast multitopology
rd 12.12.12.12:11
route-target export 65000:11
route-target import 65000:11
!
address-family ipv4
exit-address-family
!
address-family ipv4 multicast
topology base
route-replicate from vrf a multicast all

!
exit-address-family
!
router bgp 65000
bgp router-id 12.12.12.12
bgp log-neighbor-changes
no bgp default ipv4-unicast
neighbor 11.11.11.11 remote-as 65000
neighbor 11.11.11.11 update-source Loopback0
!
address-family ipv4
neighbor 11.11.11.11 activate
exit-address-family
!
address-family vpnv4
neighbor 11.11.11.11 activate
neighbor 11.11.11.11 send-community extended
exit-address-family
!
address-family vpnv4 multicast
neighbor 11.11.11.11 activate
neighbor 11.11.11.11 send-community extended
exit-address-family
!
address-family ipv4 mdt
neighbor 11.11.11.11 activate
neighbor 11.11.11.11 send-community extended
exit-address-family
!
address-family ipv4 vrf b
neighbor 10.12.14.14 remote-as 65014
neighbor 10.12.14.14 activate
exit-address-family

Now the fun begins. Instead of leaking unicast routes using RTs, we configure the address-family ipv4 multicast under the receiver vrf, VRF b. Note, this requires you to configure multicast multitopology first. Now we can replicate the multicast routes from VRF a into VRF b. In this case I import all multicast routes, but one could specify a route-map making the import focus on specific multicast routes.

R12# sh bgp vpnv4 multicast vrf a 10.0.109.0/24
BGP routing table entry for 12.12.12.12:10:10.0.109.0/24, version 13
Paths: (1 available, best #1, table a:multicast)
Not advertised to any peer
Refresh Epoch 7
Local, imported path from 10.10.10.10:10:10.0.109.0/24 (global)
10.10.10.10 (metric 3) (via default) from 11.11.11.11 (11.11.11.11)
Origin IGP, metric 0, localpref 100, valid, internal, best
Extended Community: RT:65000:10
Originator: 10.10.10.10, Cluster list: 11.11.11.11
Connector Attribute: count=1
type 1 len 12 value 10.10.10.10:10:10.10.10.10
rx pathid: 0, tx pathid: 0x0
R12#
R12#sh ip route multicast vrf b bgp | be +
+ - replicated route, % - next hop override
Gateway of last resort is not set

10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
B + 10.0.109.0/24 [200/0] via 10.10.10.10, 00:08:32
R12#
R12#sh ip rpf vrf b 10.0.109.10
RPF information for ? (10.0.109.10)
RPF interface: Tunnel1
RPF neighbor: ? (10.10.10.10)
RPF route/mask: 10.0.109.0/24
RPF type: multicast (bgp 65000)
Doing distance-preferred lookups across tables
Using Extranet RPF Rule: BGP Imported Route, RPF VRF: b
RPF topology: ipv4 multicast base, originated from ipv4 unicast base
R12#

So now we’re able to perform RPF check in the receiver vrf, VRF b, on the 10.0.109.0/24 prefix which is where the RP in VRF a is located, on R10. Let’s look at the RP configuration:

! R10
ip pim vrf a bsr-candidate GigabitEthernet1.109 0
ip pim vrf a rp-candidate GigabitEthernet1.109

! R12
ip pim vrf b rp-address 10.0.109.10

! Verification:
R12#sh ip pim vrf a rp mapping
PIM Group-to-RP Mappings
Group(s) 224.0.0.0/4
RP 10.0.109.10 (?), v2
Info source: 10.0.109.10 (?), via bootstrap, priority 0, holdtime 150
Uptime: 00:15:54, expires: 00:01:45
R12#
R12#sh ip pim vrf b rp mapping
PIM Group-to-RP Mappings
Group(s): 224.0.0.0/4, Static
RP: 10.0.109.10 (?)
R12#

Now if we join a multicast group on R14, our receiver, we should see state all the way to the RP on R10. Let’s try:

! R14
interface GigabitEthernet1.1214
encapsulation dot1Q 1214
ip address 10.12.14.14 255.255.255.0
ip pim dr-priority 0
ip pim sparse-mode
ip igmp join-group 224.10.11.14

! R12
R12#sh ip mroute vrf b 224.10.11.14 | be (
(*, 224.10.11.14), 00:00:15/stopped, RP 10.0.109.10, flags: SJC
Incoming interface: Tunnel1, RPF nbr 10.10.10.10, using vrf a, Mbgp
Outgoing interface list:
GigabitEthernet1.1214, Forward/Sparse, 00:00:15/00:02:56
R12#
R12#sh ip mroute vrf a 224.10.11.14 | be (
(*, 224.10.11.14), 00:01:15/00:01:44, RP 10.0.109.10, flags: SJCE
Incoming interface: Tunnel1, RPF nbr 10.10.10.10, Mbgp
Outgoing interface list: Null
Extranet receivers in vrf b:
(*, 224.10.11.14), 00:01:15/stopped, RP 10.0.109.10, OIF count: 1, flags: SJC
R12#

! R10
R10#sh ip mroute vrf a 224.10.11.14 | be (
(*, 224.10.11.14), 00:01:44/00:02:45, RP 10.0.109.10, flags: S
Incoming interface: Null, RPF nbr 0.0.0.0
Outgoing interface list:
Tunnel2, Forward/Sparse, 00:01:44/00:02:45
R10#

Great! We have multicast state for 224.10.11.14 all the way through VRF b to VRF a on R12 and to VRF a on R10. Notice the IIL (Incoming Interface List) on R10 is Null, because we haven’t generated any traffic for this group yet. But the main thing is that we have state, and we can actually see the connection between the VRFs on R12 in VRF a.

Last thing to do is check reachability by pinging the group on R9, our source.

! R9
R9#ping 224.10.11.14 rep 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 224.10.11.14, timeout is 2 seconds:
..........
R9#

! R10
R10#show ip mfib vrf a 224.10.11.14 count
Forwarding Counts: Pkt Count/Pkts per second/Avg Pkt Size/Kilobits per second
Source: 10.0.109.9,
SW Forwarding: 0/0/0/0, Other: 0/0/0
HW Forwarding: 10/0/117/0, Other: 0/0/0
R10#
R10#sh ip mroute vrf a 224.10.11.14 | be \(10.0.109.
(10.0.109.9, 224.10.11.14), 00:02:34/00:00:25, flags: T
Incoming interface: GigabitEthernet1.109, RPF nbr 10.0.109.9
Outgoing interface list:
Tunnel2, Forward/Sparse, 00:02:34/00:03:06
R10#

! R12
R12#sh ip mroute vrf a 224.10.11.14 | be \(10.0.109.
(10.0.109.9, 224.10.11.14), 00:02:21/00:00:38, flags: JTE
Incoming interface: Tunnel1, RPF nbr 10.10.10.10, Mbgp
Outgoing interface list: Null
Extranet receivers in vrf b:
(10.0.109.9, 224.10.11.14), 00:02:21/stopped, OIF count: 1, flags: T
R12#
R12#sh ip mroute vrf b 224.10.11.14 | be \(10.0.109.
(10.0.109.9, 224.10.11.14), 00:00:11/stopped, flags: T
Incoming interface: Tunnel1, RPF nbr 10.10.10.10, using vrf a, Mbgp
Outgoing interface list:
GigabitEthernet1.1214, Forward/Sparse, 00:00:11/00:02:48
R12#
R12#sh ip pim vrf a mdt
* implies mdt is the default MDT, # is (*,*) Wildcard,
> is non-(*,*) Wildcard
MDT Group/Num Interface Source VRF
232.0.0.10 Tunnel1 Loopback0 a
R12#
R12#sh ip pim vrf a mdt bgp
MDT (Route Distinguisher + IPv4) Router ID Next Hop
MDT group 232.0.0.10
12.12.12.12:10:10.10.10.10 11.11.11.11 10.10.10.10
R12#

Note the E flag in the above output: E – Extranet

And the IIL in VRF b is Tunnel1, which is the multicast tunnel interface for our Rosen draft implementation using group address 232.0.0.10 for transport in VRF a. We build the tunnel with R10, which is the BGP IPv4 MDT next hop for 232.0.0.10.

Oh, and it did look like the pings didn’t work, but with a debug ip icmp on R14, we can in fact see, that R14 tries to reply, but since it doesn’t have a unicast route back to 10.0.109.9, the reply never reaches R9.

! R14
*Jan 4 16:09:58.357: ICMP: echo reply sent, src 10.12.14.14, dst 10.0.109.9, topology BASE, dscp 0 topoid 0

That’s all for now. I hope you enjoyed seeing one way of using BGP to provide RPF check for an inter-VRF multicast extranet solution. An alternative could be static mroutes or the ip multicast rpf select feature.

LDP

Label Distribution Protocol (LDP) is one of the protocols that can be used for MPLS to distribute labels. Other protocols are RSVP-TE, BGP, and IGPs (ISIS and OSPF). This short post addresses LDP and how it works.

I’m using two routers to talk about LDP in this post. R2 and R3 that are connected like this:

Configuration

LDP is very simple to configure. Basically only one command is needed.

! R2 (IOS-XE)
interface GigabitEthernet1.23
 encapsulation dot1Q 23
 ip address 10.0.23.2 255.255.255.0
 ip router isis 1
 mpls ip
 isis network point-to-point

For IOS XR the configuration is this:

mpls ldp
 interface GigabitEthernet0/0/0/0.23

Packets

LDP establishes adjacencies between directly connected routers. It does so by using a multicast hello packet that looks like this:

This is an LDP Hello Message sent by LSR (Label Switching Router) R3. This we can see by looking at the IP header, which is sourced from 10.0.23.3. We can also look inside the LDP header and find the LSR ID of 3.3.3.3, which is R3’s Loopback0 address. The destination of the packet is 224.0.0.2, which is the link-local all-routers multicast address. And the TTL (not shown) is set to 1. So this truly is a link-local discovery message asking to see if any other LDP-enabled routers are available on the link. The UDP port number is 646 for both the source and destination ports. Finally, in the LDP header, we see an IPv4 transport address. This lets other LDP-enabled routers know to which address they should form a TCP session once they’ve discovered each other. So this address has to be routable.
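
To make the layout of the Hello a bit more concrete, here is a minimal Python sketch that builds and parses a link Hello PDU following the encoding in RFC 5036 (PDU header, Hello message, Common Hello Parameters TLV and IPv4 Transport Address TLV). The hold time of 15 seconds and the message ID are assumptions for the example, and the function names are my own. It only handles the two TLVs discussed above.

```python
import socket
import struct

HELLO_MSG_TYPE = 0x0100        # RFC 5036 Hello message
TLV_COMMON_HELLO = 0x0400      # Common Hello Parameters TLV
TLV_IPV4_TRANSPORT = 0x0401    # IPv4 Transport Address TLV

def build_hello(lsr_id, transport_addr, hold_time=15, message_id=1):
    """Build a link Hello PDU; hold_time and message_id are illustrative."""
    tlvs = struct.pack("!HHHH", TLV_COMMON_HELLO, 4, hold_time, 0)
    tlvs += struct.pack("!HH4s", TLV_IPV4_TRANSPORT, 4,
                        socket.inet_aton(transport_addr))
    # The message length covers the message ID plus all TLVs.
    message = struct.pack("!HHI", HELLO_MSG_TYPE, 4 + len(tlvs), message_id) + tlvs
    # The PDU length covers the 6-byte LDP identifier plus the message.
    header = struct.pack("!HH4sH", 1, 6 + len(message),
                         socket.inet_aton(lsr_id), 0)
    return header + message

def parse_hello(pdu):
    """Return (LSR ID, transport address) from a Hello PDU."""
    _version, _length, lsr_id, _label_space = struct.unpack("!HH4sH", pdu[:10])
    offset = 10 + 8            # skip the PDU header and the message header
    transport = None
    while offset < len(pdu):
        tlv_type, tlv_len = struct.unpack("!HH", pdu[offset:offset + 4])
        value = pdu[offset + 4:offset + 4 + tlv_len]
        if tlv_type == TLV_IPV4_TRANSPORT:
            transport = socket.inet_ntoa(value)
        offset += 4 + tlv_len
    return socket.inet_ntoa(lsr_id), transport

pdu = build_hello("3.3.3.3", "3.3.3.3")
print(len(pdu), parse_hello(pdu))
```

On the wire these bytes would be the UDP payload sent to 224.0.0.2 on port 646 with a TTL of 1, as described above.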

Next in the process of establishing an adjacency the routers send LDP Initialization Messages:

These messages are sent after the three-way TCP handshake and let each router know the capabilities of the other, along with an authentication TLV.

Finally LDP exchanges addresses and labels using Address and Label Mapping messages:

First (not expanded) is an LDP header that contains a keepalive. The last LDP header contains the addresses bound to ourselves (essentially all our interface addresses). Here we also see the various Label Mapping Messages that contain the actual labels to be used by the neighbor (R2). I’ve expanded one of them, listing the prefix 3.3.3.3/32, which is the loopback of R3 (the advertising router). The label value is 3, which is referred to as the implicit-null label. This means that the router receiving this label (R2) should pop (remove) the label before sending the packet to this router (R3). We call this process PHP (Penultimate Hop Popping).

Verification

We can verify LDP by looking at R2:

R2#sh mpls ldp bindings 3.3.3.3 32
  lib entry: 3.3.3.3/32, rev 6
        local binding:  label: 200
        remote binding: lsr: 3.3.3.3:0, label: imp-null
R2#

This is the LIB (Label Information Base) entry of R2 for prefix 3.3.3.3/32. Here we see the label that we (R2) have allocated for 3.3.3.3/32 (200), and we also see the binding received from R3, which is the imp-null label.

The LFIB (Label Forwarding Information Base) also verifies this:

R2#sh mpls forwarding-table 3.3.3.3 32 
Local      Outgoing   Prefix         Bytes Label   Outgoing   Next Hop    
Label      Label      or Tunnel Id   Switched      interface              
200        Pop Label  3.3.3.3/32     0             Gi1.23     10.0.23.3   
R2#

If R2 had received multiple remote bindings for 3.3.3.3/32 it would pick the label advertised by the LDP neighbor found by looking at the routing table and the LDP neighbor table. Let me demonstrate this. Again looking at R2:

R2#sh ip route 3.3.3.3
Routing entry for 3.3.3.3/32
  Known via "isis", distance 115, metric 10, type level-2
  Redistributing via isis 1
  Last update from 10.0.23.3 on GigabitEthernet1.23, 3d20h ago
  Routing Descriptor Blocks:
  * 10.0.23.3, from 3.3.3.3, 3d20h ago, via GigabitEthernet1.23
      Route metric is 10, traffic share count is 1
R2#

We have a route for 3.3.3.3/32 received from R3 (3.3.3.3). The next hop is 10.0.23.3 out Gi1.23. This information, in particular the next hop, can be used to decide from which neighbor we should pick a label for 3.3.3.3/32. We do so by looking at all our LDP neighbors:

R2#sh mpls ldp neighbor
    Peer LDP Ident: 3.3.3.3:0; Local LDP Ident 2.2.2.2:0
        TCP connection: 3.3.3.3.18131 - 2.2.2.2.646
        State: Oper; Msgs sent/rcvd: 84/83; Downstream
        Up time: 01:04:08
        LDP discovery sources:
          GigabitEthernet1.23, Src IP addr: 10.0.23.3
        Addresses bound to peer LDP Ident:
          10.0.23.3       10.0.34.3       3.3.3.3         
R2#

The important piece of information here is the Addresses bound to peer LDP Ident. This is where we correlate the next hop to the LDP neighbor. In this case the LDP neighbor is R3. Now we know that R2 must use the imp-null (pop) action when sending packets to 3.3.3.3/32.
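
The selection logic described above can be sketched like this in Python. The values are copied from the R2 outputs; the data structures and the function name are hypothetical and only meant to show the correlation between routing table, LDP neighbor table and LIB.

```python
# Values copied from the R2 outputs above; the structures are hypothetical.
ROUTES = {"3.3.3.3/32": "10.0.23.3"}                      # prefix -> next hop
NEIGHBOR_ADDRESSES = {
    "3.3.3.3:0": {"10.0.23.3", "10.0.34.3", "3.3.3.3"},   # addresses bound to peer
}
REMOTE_BINDINGS = {
    "3.3.3.3/32": {"3.3.3.3:0": "imp-null"},              # prefix -> {LSR: label}
}

def select_outgoing_label(prefix):
    """Pick the remote binding advertised by the LDP peer that owns the next hop."""
    next_hop = ROUTES[prefix]
    for lsr, addresses in NEIGHBOR_ADDRESSES.items():
        if next_hop in addresses:
            return lsr, REMOTE_BINDINGS[prefix].get(lsr)
    return None, None   # the next hop is not an LDP neighbor, so no label

print(select_outgoing_label("3.3.3.3/32"))  # ('3.3.3.3:0', 'imp-null')
```

With more neighbors and more remote bindings in the tables, the same lookup would still return only the binding from the neighbor that owns the next hop.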

mVPN – Profile 0 aka. “Rosen Draft”

mVPN Profile 0 is the original way of doing mVPN when we had no extensions to LDP. This means we need PIM in the underlay – the SP core. Specifically the profile is called:

Profile 0 Default MDT - GRE - PIM C-mcast Signaling

Topology

I’ll use the following topology to go through and explain the concepts and workings of Rosen Draft with Data MDT.

Default MDT

MDT (Multicast Distribution Tree) refers to the SP’s global environment, the underlay. The Default MDT is what is used to connect the PEs together in a LAN-like NBMA fashion where every PE becomes a PIM neighbor of all other PEs. We need the Default MDT to provide the underlay for the customer’s multicast traffic that travels inside the mVRF (multicast VRF).

To be able to join the Default MDT, each PE needs the source addresses of all other PEs. How does the PE know the source IPs of the other PEs? We use BGP IPv4 MDT to signal the (S,G) that should be joined for the Default MDT. SSM is the only sensible choice for the Default MDT; you could do ASM, but why would you? So the configuration in the core is really simple:

! P routers
ip multicast-routing distributed
ip pim ssm default
!
int x/y
 ip pim sparse-mode
! PE routers
ip multicast-routing distributed
ip multicast-routing vrf x distributed
ip pim ssm default
!
int x/y
 ip pim sparse-mode 
!
int Loopback0
 ip pim sparse-mode
!
router bgp 65000
 neighbor rr peer-group
 neighbor rr remote-as 65000
 neighbor rr update-source Loopback0
 neighbor 3.3.3.3 peer-group rr
 !
 address-family vpnv4
  neighbor rr send-community extended
  neighbor 3.3.3.3 activate
 exit-address-family
 !
 address-family ipv4 mdt
  neighbor 3.3.3.3 activate
 exit-address-family
 !
 address-family ipv4 vrf a
  redistribute connected metric 0
 exit-address-family
! RR
router bgp 65000
 neighbor rr-client peer-group
 neighbor rr-client remote-as 65000
 neighbor rr-client update-source Loopback0
 neighbor 2.2.2.2 peer-group rr-client
 neighbor 4.4.4.4 peer-group rr-client
 !
 address-family vpnv4
  neighbor rr-client send-community extended
  neighbor rr-client route-reflector-client
  neighbor 2.2.2.2 activate
  neighbor 4.4.4.4 activate
 exit-address-family
 !
 address-family ipv4 mdt
  neighbor rr-client route-reflector-client
  neighbor 2.2.2.2 activate
  neighbor 4.4.4.4 activate
 exit-address-family

On the PE routers you will need the regular RP configuration or SSM configuration in the mVRF.

When you configure the Default MDT group in the VRF, an MTI (Multicast Tunnel Interface) is created. It is of type mGRE using the VPNv4 update-source as the source interface and it is also unnumbered to this interface. Here is what the configuration looks like:

interface Tunnel0
 ip unnumbered Loopback0
 no ip redirects
 ip mtu 1500
 tunnel source Loopback0
 tunnel mode gre multipoint

Although we can’t see that the tunnel is a member of a VRF from the derived-config, it is clearly seen when viewing the VRF:

R4#sh vrf
  Name         Default RD            Protocols   Interfaces
  a            4.4.4.4:10            ipv4        Gi1.46
                                                 Tu0

It is a bit like having a backdoor VRF with a DMVPN tunnel interface and the underlay (source) in global (no frontdoor VRF).

The VRF is configured like this:

! VRF a on R4
vrf definition a
 rd 4.4.4.4:10
 route-target export 65000:10
 route-target import 65000:10
 !
 address-family ipv4
  mdt default 232.1.0.10
  mdt data 232.1.1.0 0.0.0.255 threshold 1
  mdt data threshold 1
 exit-address-family

Ignore the mdt data part for now. We see the mdt default configuration with the group 232.1.0.10. You must use unique groups per VRF.

To join the other PEs we need their addresses. This is received using BGP IPv4 MDT:

R4#sh bgp ipv4 mdt all 2.2.2.2/32
BGP routing table entry for 2.2.2.2:10:2.2.2.2/32 version 2
Paths: (1 available, best #1, table IPv4-MDT-BGP-Table)
  Not advertised to any peer
  Refresh Epoch 2
  Local
    2.2.2.2 from 3.3.3.3 (3.3.3.3)
      Origin incomplete, metric 0, localpref 100, valid, internal, best
      Originator: 2.2.2.2, Cluster list: 3.3.3.3,
      MDT group address: 232.1.0.10

      rx pathid: 0, tx pathid: 0x0
R4#

Here we see an update that originated from R2 and was reflected by R3. It has a next-hop matching its VPNv4 update-source, which is Loopback0 with an IP address of 2.2.2.2/32. It also contains the MDT group address 232.1.0.10 for the Default MDT. Now we can join the Default MDT. This information is used by PIM:

R4#sh ip pim mdt bgp 
MDT (Route Distinguisher + IPv4)    Router ID    Next Hop
  MDT group 232.1.0.10
   4.4.4.4:10:2.2.2.2               3.3.3.3      2.2.2.2
R4#
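
As a rough sketch of what PIM does with this information: every BGP IPv4 MDT update gives us a source (the next hop of the advertising PE) and a group, and for SSM that is exactly the (S,G) to join. The list format and function name below are made up for the example.

```python
# Hypothetical representation of received BGP IPv4 MDT updates:
# (next hop of the advertising PE, MDT group address).
MDT_UPDATES = [
    ("2.2.2.2", "232.1.0.10"),
]

def default_mdt_joins(updates, local_group, local_source):
    """Return the (S,G) entries a PE should join for its Default MDT."""
    return [
        (source, group)
        for source, group in updates
        if group == local_group and source != local_source
    ]

print(default_mdt_joins(MDT_UPDATES, "232.1.0.10", "4.4.4.4"))
# [('2.2.2.2', '232.1.0.10')]
```

Updates for other groups (other VRFs) are ignored, and a PE never joins its own source.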

And we can see in the multicast routing table (in global) that we in fact did join the Default MDT:

R4#sh ip mroute 232.1.0.10
IP Multicast Routing Table
Flags: s - SSM Group
T - SPT-bit set,
I - Received Source Specific Host Report, 
Z - Multicast Tunnel

(4.4.4.4, 232.1.0.10), 00:00:30/00:02:59, flags: sT
  Incoming interface: Loopback0, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet1.34, Forward/Sparse, 00:00:30/00:02:59
(2.2.2.2, 232.1.0.10), 00:28:51/stopped, flags: sTIZ
  Incoming interface: GigabitEthernet1.34, RPF nbr 10.0.34.3
  Outgoing interface list:
    MVRF a, Forward/Sparse, 00:28:51/00:01:08

R4#

This should give us a PIM neighbor over the MTI in VRF a:

R4#sh ip pim vrf a nei
PIM Neighbor Table
Mode: B - Bidir Capable, DR - Designated Router, N - Default DR Priority,
      P - Proxy Capable, S - State Refresh Capable, G - GenID Capable,
      L - DR Load-balancing Capable
Neighbor        Interface             Uptime/Expires    Ver   DR
Address                                                            Prio/Mode
2.2.2.2         Tunnel0               00:02:29/00:01:43 v2    1 / S P G
R4#

This is a bit like Layer 3 MPLS VPNs, where we transit the SP core using the global table. The customer multicast traffic in the overlay travels over the MTI, which uses the Default MDT (the underlay).

At this point we have no mroute entries in VRF a, but the network is ready to serve the customer’s multicast traffic via the Default MDT.

Data MDT

Why do we need a Data MDT? Well, the Default MDT is joined by all PEs, meaning that all PEs will receive all multicast packets. This is not very efficient and reminds us of the old days where we had PIM dense mode. The idea of the Data MDT is that once a certain threshold (in kbps) is passed, the ingress PE (so the PE with the source behind it) will send a PIM Data-MDT Join message over the Default MDT so all egress PEs receive it. The message contains (S,G,MDT) where MDT is a new multicast group taken from a range of addresses specified under the VRF. In this case that would be 232.1.1.0/24.

The PIM Data-MDT Join message looks like this:

The destination of the inner IP packet is the All PIM Routers address 224.0.0.13. It’s a UDP packet, and the data part should hold the (S,G,MDT) information. The outer IP header uses the Default MDT group address as destination.

Immediately we receive a PIM Join message initiated by the egress PE. This is sent upstream towards the ingress PE. It looks like this:

After 3 seconds the ingress PE stops sending the stream on the Default MDT and now sends it only over the Data MDT, meaning that only those egress PEs with receivers behind them will receive the traffic. Here is the Wireshark capture to confirm this:

The Data-MDT Join in the top (red) was at time 47.59. The next ICMP Echo sent is at 49.18 which isn’t after the 3 seconds, so this was sent over the Default MDT. Now the highlighted packet shows the new Data MDT Group which is 232.1.1.0 (the first available IP in the Data MDT range). The switchover has happened and the Default MDT is no longer used for this communication to 232.5.5.5 (the inner multicast destination – the customer packet).
The PEs that currently do not have receivers behind them, will cache the Data-MDT Join information for fast signaling if a receiver should send a report requesting the stream.

Data MDT groups can be reused. They time out after a period of inactivity.
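
The ingress PE’s decision can be modelled roughly like this in Python. It is only a sketch of the threshold check and the group allocation from the configured range; the class and method names are hypothetical, the 3 second switchover delay and the inactivity timeout are left out, and an exhausted pool simply keeps the stream on the Default MDT, which is a simplification.

```python
import ipaddress

class DataMdtPool:
    """Hypothetical model of the ingress PE's Data MDT decision."""

    def __init__(self, pool, threshold_kbps):
        self.free = list(ipaddress.ip_network(pool))   # every address in the range
        self.threshold_kbps = threshold_kbps
        self.assigned = {}                             # (S, G) -> Data MDT group

    def update_rate(self, source, group, rate_kbps):
        """Return the Data MDT group for the stream, or None for the Default MDT."""
        key = (source, group)
        if key in self.assigned:
            return self.assigned[key]
        if rate_kbps <= self.threshold_kbps or not self.free:
            return None
        self.assigned[key] = self.free.pop(0)
        return self.assigned[key]

pool = DataMdtPool("232.1.1.0/24", threshold_kbps=1)
print(pool.update_rate("10.0.12.1", "232.5.5.5", rate_kbps=0.5))  # None
print(pool.update_rate("10.0.12.1", "232.5.5.5", rate_kbps=8))    # 232.1.1.0
```

The first stream that passes the threshold gets 232.1.1.0, the first address in the range, which is the group we saw in the capture.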

MPLS

As the profile name says, we’re using GRE to transport the customer’s traffic across the SP core. Actually the packet is multicast in multicast. No MPLS is used in the forwarding of multicast! But why do we need MPLS then? Well, multicast relies heavily on the RPF (Reverse Path Forwarding) check to avoid loops and duplicate packets. So for the mVRF we must have a unicast route to the source for RPF checks. For this we use normal Layer 3 MPLS VPNs, which do require MPLS.

RPF

RPF in the mVRF can be a bit confusing. Here I’m thinking about how the egress PE (so the PE with receivers behind it) does RPF check when the multicast stream is received on the MTI.

R4#sh ip rpf vrf a 10.0.12.1
RPF information for ? (10.0.12.1)
  RPF interface: Tunnel0
  RPF neighbor: ? (2.2.2.2)
  RPF route/mask: 10.0.12.0/24
  RPF type: unicast (bgp 65000)
  Doing distance-preferred lookups across tables
  BGP originator: 2.2.2.2
  RPF topology: ipv4 multicast base, originated from ipv4 unicast base
R4#

The MTI isn’t used for unicast routing, so a plain unicast lookup for the customer source prefix will never return the MTI as the RPF interface, and RPF would fail. To get around this unfortunate situation, the RPF interface is set to the MTI that is associated with that mVRF. And when the RPF interface is set to the MTI, the RPF neighbor is set to the BGP next-hop of the customer source’s VPNv4 prefix, which is the ingress PE. This next-hop must also be a PIM neighbor in the mVRF.
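
The rule can be written down as a short Python sketch. The values mirror the R4 outputs in this post; the data structures and the function are hypothetical and ignore everything except the BGP-learned case.

```python
import ipaddress

# Values mirror the R4 outputs above; the structures are hypothetical.
VRF_ROUTES = {
    "10.0.12.0/24": {"protocol": "bgp", "next_hop": "2.2.2.2"},
}
MTI = "Tunnel0"
PIM_NEIGHBORS = {"Tunnel0": {"2.2.2.2"}}

def mvrf_rpf(source):
    """Return (RPF interface, RPF neighbor) for a customer source, or None."""
    address = ipaddress.ip_address(source)
    matches = [p for p in VRF_ROUTES if address in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)
    route = VRF_ROUTES[best]
    if route["protocol"] != "bgp":
        return None   # locally attached sources are outside this sketch
    neighbor = route["next_hop"]
    # The BGP next hop must also be a PIM neighbor on the MTI.
    if neighbor not in PIM_NEIGHBORS.get(MTI, set()):
        return None
    return MTI, neighbor

print(mvrf_rpf("10.0.12.1"))  # ('Tunnel0', '2.2.2.2')
```

If the ingress PE were missing from the PIM neighbor table of the mVRF, the function would return None, which corresponds to a failing RPF check.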