Recently, there has been a lot of discussion about Cisco’s new data-center technology – Overlay Transport Virtualization (OTV), implemented in Nexus 7k data-center switches (limited demo deployments only). The purpose of this technology is to connect separated data-center islands over a conventional packet-switched network. It is said that OTV is a better solution than the well-known VPLS, or any other Layer 2 VPN technology. In this post we are going to give a brief comparison of the two technologies and see what benefits OTV may actually bring to data-centers.
We are going to give a rather condensed overview of VPLS functionality here, just to have a baseline to compare OTV with. The reader is assumed to have a solid understanding of MPLS and Layer 2 VPNs, as technology fundamentals are not described here.
The following is the list of VPLS limitations, which are inherently rooted in the Ethernet technology:
Of course, the community came up with some solutions to the above problems. Firstly, the problem of address-table growth could be alleviated using MAC-in-MAC (IEEE 802.1ah) encapsulation for customer frames. In this solution, PE devices only have to learn other PEs’ MAC addresses, or the MAC-in-MAC stacking devices could be pushed down to the customer network. Even simpler, CE devices could be replaced with routers, thus exposing only a single CE MAC address to the VPLS cloud. Next, there is a lot of work being done on multicast optimization in VPLS. The root cause lies in adapting the main underlying transport – MPLS label switching – to handle multicast traffic effectively. The solution uses point-to-multipoint LSPs and either M-LDP or RSVP-TE extensions to signal them. It is not yet completely standardized and widely adopted, but the work is definitely in progress. However, the main scaling limitation of Ethernet – a topology-agnostic control plane with uncontrolled data-plane learning – is not addressed in VPLS extensions so far.
As an alternative to using sophisticated M-LSPs, the multicast problem could be “resolved” by co-locating an additional Layer 3 VPN topology and running Layer 3 mVPNs. A “proxy” PIM router deployed at every site will process the IGMP messages and signal multicast trees in the core network. While this solution is not as elegant as using P2MP LSPs and introduces additional operational expenses, it nonetheless offers a working alternative.
Despite the loud name, from a technical standpoint OTV looks like nothing more than VPLS stripped of its MPLS transport, with optimized multicast handling similar to the one used in Draft Rosen’s Layer 3 mVPNs. There are some Ethernet flooding behavior improvements, but those are questionable. Let’s see how OTV works. Notice that this time we use the notion of CE devices, not PE, as OTV is a technology to be deployed at the customer’s edge.
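As a rough illustration of the address-table argument made above, the sketch below compares how many MAC entries a PE might hold for one VPLS instance with and without MAC-in-MAC stacking. The function name, the parameters and the "one backbone MAC per site" simplification are assumptions made for this example only; real table sizes depend on traffic patterns and aging timers.

```python
def pe_mac_entries(sites, hosts_per_site, mac_in_mac):
    """Estimate MAC entries on a PE for a single VPLS instance.

    Assumption: with MAC-in-MAC, customer frames are already encapsulated
    before they reach the PE, so the PE sees one outer MAC per site.
    """
    if mac_in_mac:
        # Only the outer (backbone) addresses are visible to the PE.
        return sites
    # Plain VPLS: in the worst case every customer MAC at every site is learned.
    return sites * hosts_per_site


if __name__ == "__main__":
    for stacked in (False, True):
        print(stacked, pe_mac_entries(sites=10, hosts_per_site=1000, mac_in_mac=stacked))
```

The same arithmetic applies to the "replace the CE with a router" option, which also exposes a single MAC address per site.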
Now for some conclusions. OTV claims to be better than VPLS, but this could be argued. To begin with, VPLS is positioned as a provider-edge technology, while OTV is a customer-edge technology. Next, the following list captures similarities and differences between the two technologies:
To summarize, the price paid for flooding reduction is slower convergence in the presence of topology changes and control-plane scalability challenges. The main problem – topology unawareness, which leads to the need to re-learn MAC addresses – is not addressed in OTV (yet). However, if you think that data-plane flooding in data-centers could be very intensive, the amount of control-plane flooding introduced could be an acceptable price.
Overall, it looks like OTV is an attempt to fast-track a slightly better “customer” VPLS in data-centers, while IETF folks struggle with actual VPLS standardization. The technology is “CE-centric”, in the sense that it does not require any provider intervention, with the exception of providing multicast and L3 services. It is most likely that the OTV and VPLS projects are being carried out by different teams that are time-pressed and thus don’t have the resources to coordinate their efforts and come up with a unified solution. There are no huge improvements so far in terms of Ethernet optimization, with the exception of reduced flooding in the network core, traded for control-plane complexity. In its current form, OTV might look a bit disappointing, unless it is a first step to implementing TRILL (Transparent Interconnection of Lots of Links) – a new IETF standard for routing bridges.
Before we begin, it is worth noting that the IETF TRILL project somewhat parallels the IEEE 802.1aq standard development. Both standards propose replacing STP with a link-state routing protocol. We’ll be discussing mainly RBridges in this post, due to the fact that more open information is available on TRILL. Plus, IETF papers are much easier to read compared to IEEE documents!
As mentioned above, RBridges is short for Routing Bridges, a project pioneered by Radia Perlman of Sun Microsystems. If you remember, she is the person who invented the original (DEC) STP protocol. In the new proposal, all bridges become aware of each other and of the whole topology by using the IS-IS routing protocol. Every bridge has a “nickname” – a 16-bit identifier that addresses the bridge in the global topology (similar to an OSPF router-id). Once the topology is built, the switches operate as follows:
To summarize, TRILL keeps intact Ethernet’s dynamic data-plane learning feature that made the technology so “plug-and-play”. However, the amount of flooding is now controlled by the use of distribution trees and hop counting. The net effect of flooding is significantly reduced due to the fact that topology changes do not flush the MAC address tables. Load balancing is much more effective and deterministic in TRILL networks. TRILL networks are easier to troubleshoot, as every RBridge associates the “flat” MAC address with a “location” in the network, defined via the remote bridge nickname. Lastly, the problem of address-table growth is somewhat resolved, due to the fact that MAC addresses need not be known on every switch in the domain, but only on the switches that actually have a connection to the end equipment.
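To put a number on that trade-off, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every MAC appearance or withdrawal has to be advertised to every other edge device; the function names and this delivery model are our own simplification, not a description of the actual OTV implementation.

```python
def control_plane_updates(edge_devices, mac_events):
    """Advertisements received network-wide.

    Assumption: each MAC event is delivered to every other edge device.
    """
    return mac_events * (edge_devices - 1)


def relearn_cost(edge_devices, macs_per_site):
    """Updates triggered when one site has to re-advertise all of its MACs."""
    return control_plane_updates(edge_devices, macs_per_site)


if __name__ == "__main__":
    # One site with 5000 hosts re-advertising everything to 9 remote sites.
    print(relearn_cost(edge_devices=10, macs_per_site=5000))
```

Under this model the control-plane load grows with both the number of sites and the number of hosts, which is exactly where the scalability concern comes from.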
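The behavior summarized above can be sketched in a few lines. This is a toy model, not the TRILL specification: the class and function names, the dictionary-based link-state database and the returned strings are all assumptions made for the example. It shows an edge RBridge that maps each learned MAC address to an egress nickname, computes next hops with a shortest-path run over the link-state database, and keeps its MAC table untouched when the topology changes.

```python
import heapq


def next_hops(links, source):
    """Dijkstra over a link-state database: {nickname: first-hop nickname}."""
    dist = {source: 0}
    hop = {}
    heap = [(0, source, None)]
    while heap:
        cost, node, first = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale heap entry
        for nbr, metric in links.get(node, {}).items():
            new_cost = cost + metric
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                # Direct neighbors are their own first hop.
                hop[nbr] = nbr if first is None else first
                heapq.heappush(heap, (new_cost, nbr, hop[nbr]))
    return hop


class EdgeRBridge:
    """Hypothetical edge RBridge: MAC -> egress nickname, nickname -> next hop."""

    def __init__(self, nickname, links):
        self.nickname = nickname
        self.mac_to_nickname = {}
        self.hops = next_hops(links, nickname)

    def learn(self, mac, egress_nickname):
        # Data-plane learning: remember behind which RBridge the MAC lives.
        self.mac_to_nickname[mac] = egress_nickname

    def topology_change(self, links):
        # Only routes are recomputed; the MAC table is NOT flushed.
        self.hops = next_hops(links, self.nickname)

    def forward(self, dst_mac):
        egress = self.mac_to_nickname.get(dst_mac)
        if egress is None:
            return "flood-on-distribution-tree"
        if egress == self.nickname:
            return "local"
        return self.hops[egress]
```

Note how a link failure only changes the result of next_hops(): the MAC-to-nickname binding stays valid because it describes where the host is attached, not how to get there.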
Even though TRILL offers some benefits, MAC address learning and frame flooding remain. Furthermore, the problem of address space growth is not fully resolved, as with TRILL it results in “core-edge” address table asymmetry. If you are looking for a complete solution for all Ethernet issues, it is recommended to read the paper “Floodless in SEATTLE” (see Further Reading below), which offers a significantly re-engineered Ethernet technology utilizing DHTs (Distributed Hash Tables) found in peer-to-peer networks and link-state routing. SEATTLE offers truly floodless Ethernet and resolves the address space growth problem by making the global MAC address table work like a distributed database. Thanks to Daniel Ginsburg aka dbg for referring me to this wonderful reading!
Right now, OTV looks like an attempt to rapidly deploy VPLS functionality without relying on MPLS transport. This is probably driven by the growing need for deploying large data-centers and interconnecting them across conventional packet-switched networks at the customer edge. OTV reduces flooding in network cores but makes the convergence process slower in the presence of topology changes. Multicast traffic is forwarded in an optimal manner using the core network’s multicast services. If OTV is a first step toward TRILL, then it looks like a very promising technology. Otherwise it is just a VPLS replica with some optimizations. Still, I hope that one day the OTV and VPLS branches will be merged and TRILL will be implemented in one common VPLS framework!
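To give a feel for the "distributed database" idea, here is a simplified consistent-hashing sketch. It is not the algorithm from the SEATTLE paper: the class name, the choice of SHA-1 and the "next switch clockwise on the ring" rule are assumptions picked for this example. Each MAC address is hashed onto a ring of switches, and exactly one switch (the resolver) stores the host’s location, so no switch needs the full table and nobody has to flood to find a host.

```python
import bisect
import hashlib


def ring_position(key):
    """Map a switch name or MAC address onto the hash ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)


class MacDirectory:
    """Hypothetical DHT-style MAC directory spread over a set of switches."""

    def __init__(self, switches):
        self.ring = sorted((ring_position(s), s) for s in switches)
        self.store = {s: {} for s in switches}  # per-switch slice of the table

    def resolver_for(self, mac):
        # The resolver is the first switch clockwise from the MAC's position.
        positions = [pos for pos, _ in self.ring]
        i = bisect.bisect_right(positions, ring_position(mac)) % len(self.ring)
        return self.ring[i][1]

    def publish(self, mac, attached_switch):
        # Called by the switch that discovers a locally attached host.
        self.store[self.resolver_for(mac)][mac] = attached_switch

    def lookup(self, mac):
        # A unicast query to one resolver replaces unknown-unicast flooding.
        return self.store[self.resolver_for(mac)].get(mac)
```

Every switch computes the same resolver for a given MAC address, which is what lets a lookup go straight to one node instead of being flooded.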
OTV Patent Paper
RBridges Draft Document
VPLS using LDP Signaling
VPLS using BGP Signaling
Multicasting in VPLS
Multihoming in VPLS
Multicast VPNs using Draft Rosen
Introduction to M-LSPs and Practical Examples
Petr Lapukhov's career in IT began in 1988 with a focus on computer programming, and progressed into networking with his first exposure to Novell NetWare in 1991. Initially involved with Kazan State University's campus network support and UNIX system administration, he went through the path of becoming a networking consultant, taking part in many network deployment projects. Petr currently has over 12 years of experience working in the Cisco networking field, and is the only person in the world to have obtained four CCIEs in under two years, passing each on his first attempt. Petr is an exceptional case in that he has been working with all of the technologies covered in his four CCIE tracks (R&S, Security, SP, and Voice) on a daily basis for many years. When not actively teaching classes, developing self-paced products, studying for the CCDE Practical & the CCIE Storage Lab Exam, and completing his PhD in Applied Mathematics.
Is this part of R&S BP?
Very good article Petr!!
OTV is going to boom but the problem is the investment. Don’t know when Cisco will come up with existing IOS.
Make me unsee it.
Flooding MAC-address information over full mesh of IS-IS adjacencies. What a nauseating idea!
Just imagine (don’t, really) N^2 number of messages to be sent when a MAC comes or goes. Or worse yet, when the topology changes …
This makes me sick, literally.
/me runs away screaming
Hey Petr, it seems you’re working on SP Operations track
Thank you very much for the articles!
Well, they could reduce flooding by using mesh-groups, but still that sounds horrible. Here is another thing – one may implement the same behavior in the data plane, i.e. flooding reduction. Simply don’t flood unknown unicasts over the pseudowires or Ethernet ports (in a conventional Ethernet network). No control plane extensions really required. Heck, let me try and get this working with 3560s!
Thanks a million for referring me to SEATTLE. That was a wonderful reading!
Nope, it’s more like a “new hot technology” that Cisco pushes to the market. Apparently it’s not new and not too hot
My attempt was to demonstrate that OTV does not really solve any of the problems it claims to solve. Not with the current implementation. Maybe when they get TRILL integrated, but at the moment OTV looks rather bad, with multicast optimization being the only real benefit.
Well, they could. Though, as far as I understand from the article (I honestly don’t know a thing about OTV except what I learned here), it all looks autoconfigured. Coming up with consistent and reliable way to set up mesh-groups automatically can be quite challenging. Doable, though.
> Thanks a million for referring me to SEATTLE. That was a wonderful reading!
Sure. Bouncing papers you come across to someone is a valuable way to get some fresh thoughts back
And SEATTLE has some very simple and beautiful ideas behind it. I like it too.
Nice post–very comprehensive. For a comparison of OTV and VPLS, you can check out this doc. https://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9402/white_paper_c11-574984.html
Thanks for the interesting reading! However, the comparison is by no means biased towards OTV. There are a few statements that could be argued, e.g. VPLS’s inability to use control-plane learning/ARP signaling. LDP and BGP could be easily extended for this purpose, but I do understand that this involves the IETF standardization process and could be time-consuming. Plus, it will result in the same scalability issues that OTV has with this approach.
The document does not mention that control-plane MAC address learning results in significant control-plane scalability issues due to the excessive amount of flooding introduced to the control plane. It is true that suppressing data-plane flooding is more important due to the huge amount of data flows in data-centers, but designing an unscalable control plane could be dangerous. Not to mention that data-plane flooding reduction negatively impacts Ethernet data-plane convergence in situations where a topology change occurs.
Hello I am just studying OTV,
I think it is L2 technology however what is the difference among L2TP and Pseudowire technology and OTV?
Hi Peter ,
thanks for sharing your thoughts on OTV and Trill stuff , I wonder what’s your point of view on their “intermediate Trill” version known as L2MP layer 2 MultiPathing of the coming Nexus D modules ?
If you refer to Cisco’s DBridges (Datacenter Bridges or MAC-in-MAC routing) when talking about L2MP, then from a technology standpoint it is not much different from TRILL. The main distinction is the proprietary implementation (MiM) and “in-core only” usage at the moment.
L2MP and OTV are essentially different in their purpose, as OTV was created as DCI technology, while L2MP/TRILL are “internal data-center” technologies. L2MP allows for resolving two main problems associated with Ethernet spanning:
1) Suboptimal link bandwidth utilization and slow convergence in presence of changes
2) Topology changes causing massive flooding storms
However, some scalability problems still remain, as dynamic data-plane learning & unknown unicast flooding are still in place. Just their impact has been limited.
In my opinion, introducing routing features to a plug-and-play protocol is not worth it long term. The benefits of flat Ethernet domains are mainly transparent mobility and non-IP protocol bridging. The first feature could be emulated using L3 services, and the second feature could be implemented using controlled L2 tunneling.
If Ethernet is that much needed, then keep it there, but strip off dynamic learning and unknown unicast flooding. Make it work similar to Fibre Channel – explicit node logins and logouts (e.g. via 802.1x) that allow for control-plane address learning and separating of node IDs from topological information. This extension is part of TRILL and I hope this is the direction DC Ethernet would evolve into.
A note about 802.1aq. To help folks understand without having to get the full documents which are pretty verbose there is a wiki page that we are trying to keep up to date.
Hopefully this can help. Essentially ‘AQ’ is shortest path using MiM. One tree per source per ISID with optional head end or transit replication, multipathing, nodal or independent outer b-mac addressing etc.
I believe Cisco already has some intermediate 802.1AQ implementation for DBridges (Nexus) using MiM.
Hi Petr ,
thank you for your comments regarding L2MP , probably for some environments it can be interesting. I was recently wondering also what are the real examples of the ethernet frame loops apart the ones on the unknown unicast or the temporary loops related to lost bpdu’s .
Imagine a situation where all 3 switches of the looped ethernet topology have information on all active source mac addresses. In that particular case if we had only the communications between the known macs what would be the end result of disabling STP on all trunked ports ( spanning-tree bpdufilter enable ) ; are we still having a chance of having any looping unicast frames or not ?
I believe the Cisco use of Ethernet has a different I-tag than current 802.1ah (MiM) the purpose being to carry a TTL and the computations are like TRILL (i.e. shared trees, non symmetric, non congruent, hashed forwarding).
IEEE 802.1aq uses IEEE 802.1ah (stock I-tag) and produces source/service specific multicast trees with either head end (VPLS style) or transit replication and is backward compatible with the .1ag etc. OA&M. There are no unknown C-mac advertisements in AQ ISIS, they are constrained naturally by ISID. C-VID to Tree assignment is similar to MSTP (head end choice of 16 equal cost trees per source – ECT).
I agree its annoying the IEEE documents are not openly accessible and are somewhat ‘verbose’
I can definitely see a need for DHT style c-mac/b-mac resolution and c-mac/IP-ADDR resolution and its an area of active research. I had a prototype of this in my lab a few years ago and it works superbly, probably time to standardize this kind of light weight DHT.
Hi Peter, thanks a lot for your information!
As you said, “… in essence that it does not require any provider intervention with exception to providing multicast and L3 services..”. I think that L2 services (L2VPN) should also be provided by the provider here, at least IS-IS needs it to exchange hellos (layer-2 packets, as you know) with the remote CE.
If you’re interested in Trill/RBridge development check also the last NANOG seminar:
Hello Peter , I was wondering recently re-reading the Trill-Rbridge protocol specification version 16 if the outer ethernet header is really used when the transit bridges are RBridges only and not regular Ethernet bridges ? Is the ” outer ethernet header ” only there for the interop with regular ethernet switches ?
Reference: http://tools.ietf.org/pdf/draft-ietf-trill-rbridge-protocol-16.pdf , page 30.
This is the best guide on Overlay Transport Virtualization I’ve seen. I’m definitely bookmarking this.
Please tell me this is NOT on the RnS blueprint…last 2 days i just got mastering multi region MST.
I’m curious to read it but am going to refrain lest i read this and other stuff important to RnS falls out…
© 2011 INE, Inc., All Rights Reserved