NSX Lab: Configure Layer 2 VPN

So far we have set up routing on a single site. Now we want to emulate a second site and configure an L2VPN tunnel to create a stretched layer 2 network. The idea is to emulate a two-site deployment, with site 2 using a standalone edge.

I’m not including any screenshots of the site 2 build; try to build it using just the diagram.

Points to note:

– Create a transport zone for site 2 and add compute cluster B to it.
– Remove the Web-Tier-LIF from the dLR.
– Site 2 does not have a dLR, only logical switches and an ESG.
– Create a third web server VM and add it to the site2-web-tier logical switch when ready.

[Diagram: stretched layer 2 network]

Configure site-to-site layer 2 VPN

Once the Site-2 ESG has been set up, we can start configuring the layer 2 VPN.

[Diagram: site-to-site L2 stretched network]

Create the trunk interface on the original ESG in site A

[Screenshots: L2VPN-SiteA-1, L2VPN-SiteA-2.1, L2VPN-SiteA-3, sub-interface]

Once done, ping the gateway IP 172.16.10.1 and web-2 (172.16.10.11); both should respond.
Then ping app-1 (172.16.20.10) and web-3 (172.16.10.12); neither responds at this point.
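
If you want to script this check rather than eyeball it, here is a minimal sketch. It records the four results stated above and reports anything that differs. The names `EXPECTED_AFTER_TRUNK`, `ping_once` and `compare_reachability` are my own, not part of NSX, and `ping_once` assumes a Linux VM with the iputils `ping` command.

```python
import subprocess

# Expected results right after creating the trunk on the site A ESG,
# taken from the text above: name -> (address, should_respond).
EXPECTED_AFTER_TRUNK = {
    "gateway": ("172.16.10.1", True),
    "web-2": ("172.16.10.11", True),
    "app-1": ("172.16.20.10", False),
    "web-3": ("172.16.10.12", False),
}


def ping_once(address, timeout_s=2):
    """Hypothetical probe: send one ICMP echo with the Linux iputils ping."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def compare_reachability(expected, probe=ping_once):
    """Return {name: (address, expected, observed)} for every mismatch."""
    mismatches = {}
    for name, (address, should_respond) in expected.items():
        observed = probe(address)
        if observed != should_respond:
            mismatches[name] = (address, should_respond, observed)
    return mismatches
```

Calling `compare_reachability(EXPECTED_AFTER_TRUNK)` from a VM on the web tier should return an empty dictionary if the lab behaves as described. The `probe` argument can be swapped for any function that takes an address and returns True or False.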

 

Create the SSL certificate on the original ESG in site A

[Screenshots: L2VPN-SiteA-5, L2VPN-SiteA-6, L2VPN-SiteA-7, L2VPN-SiteA-8]

Create the peer site on the Site-1 ESG (Edge Perimeter Gateway)

[Screenshot: L2VPN-SiteA-2.2]

Set the listener

[Screenshot: L2VPN-SiteA-2.3]

Enable L2VPN, then publish the changes

[Screenshot: L2VPN-SiteA-2.4]

The tunnel status is now visible, but it will be down because only one side has been configured

[Screenshot: L2VPN-SiteA-12]

Create the trunk interface on the Site-2 ESG

[Screenshot: L2VPN-SiteB-2.1]

Once the trunk has been added, change the L2VPN mode to client from the VPN tab

[Screenshot: L2VPN-SiteB-2.2]

Add the client details; these are the values you used when creating the peer site on the Site-1 ESG

[Screenshot: L2VPN-SiteB-2.4]

Don’t forget to enable L2VPN and publish the changes.

After a few seconds you can click the Fetch Status button and, hopefully, the tunnel will be up

[Screenshot: L2VPN-SiteB-2.5]

Let’s run a few pings to see how it looks

Ping gateway (on ESG)

[root@web-1 ~]# ping 172.16.10.1 -c 2
PING 172.16.10.1 (172.16.10.1) 56(84) bytes of data.
64 bytes from 172.16.10.1: icmp_seq=1 ttl=64 time=9.82 ms
64 bytes from 172.16.10.1: icmp_seq=2 ttl=64 time=1.01 ms

Ping neighbor on same Tier and same site

[root@web-1 ~]# ping 172.16.10.11 -c 2
PING 172.16.10.11 (172.16.10.11) 56(84) bytes of data.
64 bytes from 172.16.10.11: icmp_seq=1 ttl=64 time=5.78 ms
64 bytes from 172.16.10.11: icmp_seq=2 ttl=64 time=1.14 ms

Ping vm in remote site, but on same tier

[root@web-1 ~]# ping 172.16.10.12 -c 2
PING 172.16.10.12 (172.16.10.12) 56(84) bytes of data.
64 bytes from 172.16.10.12: icmp_seq=1 ttl=64 time=11.6 ms
64 bytes from 172.16.10.12: icmp_seq=2 ttl=64 time=3.47 ms

The next test caused a headache: I had configured a default gateway of 172.16.10.254 on web-1, whereas in my case the local egress optimization IP was 172.16.10.1. Once I reset the gateway on web-1 to 172.16.10.1, I was able to ping the VMs on the other tiers/segments.

[root@web-1 ~]# ping 172.16.20.10 -c 2
PING 172.16.20.10 (172.16.20.10) 56(84) bytes of data.
64 bytes from 172.16.20.10: icmp_seq=1 ttl=62 time=7.48 ms
64 bytes from 172.16.20.10: icmp_seq=2 ttl=62 time=2.01 ms

[root@web-1 ~]# ping 172.16.30.10 -c 2
PING 172.16.30.10 (172.16.30.10) 56(84) bytes of data.
64 bytes from 172.16.30.10: icmp_seq=1 ttl=126 time=1.74 ms
64 bytes from 172.16.30.10: icmp_seq=2 ttl=126 time=1.81 ms
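
The gateway mismatch that caught me out is easy to check for. The sketch below is only an illustration: `check_default_gateway` is a made-up helper, and the web-1 address 172.16.10.10 and the /24 prefix are assumptions on my part. It flags a gateway that sits outside the VM's subnet or differs from the egress optimization IP.

```python
import ipaddress


def check_default_gateway(interface_cidr, configured_gateway, egress_optimization_ip):
    """Return a list of problems found with a VM's default gateway setting."""
    network = ipaddress.ip_interface(interface_cidr).network
    gateway = ipaddress.ip_address(configured_gateway)
    expected = ipaddress.ip_address(egress_optimization_ip)

    problems = []
    if gateway not in network:
        problems.append(f"gateway {gateway} is outside {network}")
    if gateway != expected:
        problems.append(
            f"gateway {gateway} does not match egress optimization IP {expected}"
        )
    return problems


# Assumed values: web-1 at 172.16.10.10 with a /24 mask.
before = check_default_gateway("172.16.10.10/24", "172.16.10.254", "172.16.10.1")
after = check_default_gateway("172.16.10.10/24", "172.16.10.1", "172.16.10.1")
```

With the original setting, `before` holds a single mismatch message, because 172.16.10.254 is inside the subnet but is not the address the edge answers on. After the fix, `after` is empty.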

What I haven’t been able to do is reach the app and db tiers from the remote site. I’d imagine there is a way to do this, but it’s getting more into routing than I want, so I’ll leave it aside for now.

[root@web-3 ~]# ping 172.16.10.10 -c 2
PING 172.16.10.10 (172.16.10.10) 56(84) bytes of data.
64 bytes from 172.16.10.10: icmp_seq=1 ttl=64 time=44.1 ms
64 bytes from 172.16.10.10: icmp_seq=2 ttl=64 time=3.10 ms

[root@web-3 ~]# ping 172.16.20.10 -c 2
connect: Network is unreachable
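
A note on that last error: ping prints "connect: Network is unreachable" when the sending machine itself has no route matching the destination, so the packet never leaves the VM. A likely cause is that web-3 has no default route, but I have not confirmed this in the lab. Below is a rough sketch of the longest-prefix lookup involved. The route tables are assumptions, not output from web-3, and `lookup_route` is a made-up helper.

```python
import ipaddress


def lookup_route(routes, destination):
    """Longest-prefix match over (network, next_hop) pairs; None if no route."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None  # the kernel would report "Network is unreachable"
    return max(matches, key=lambda route: route[0].prefixlen)


# Assumed table on web-3: only the directly connected web tier, no default route.
connected_only = [(ipaddress.ip_network("172.16.10.0/24"), None)]

# Same table with a default route added; 172.16.10.1 is assumed to be the
# egress optimization IP answered by the site 2 ESG.
with_default = connected_only + [(ipaddress.ip_network("0.0.0.0/0"), "172.16.10.1")]

same_tier = lookup_route(connected_only, "172.16.10.10")   # matches the connected /24
app_tier = lookup_route(connected_only, "172.16.20.10")    # no match
app_tier_with_gw = lookup_route(with_default, "172.16.20.10")
```

Adding a default route would only get the packets as far as the site 2 ESG. Whether they then reach the app and db tiers depends on the routing configured on that edge.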

 

So we have stretched the L2 network, with direct connections to the edge devices in each site. No dLR is used, so the edge is a single point of failure and a possible bottleneck; an HA deployment would be advisable in production.

Grateful acknowledgment to Giuliano Bertello: this is an elaboration of his blog post. I have added my own diagrams, and my lab runs NSX 6.2; he covers some of the technical aspects in more detail.
blog.bertello nsx-for-newbies-part-9-l2vpn-and-stretched-vlanvxlan-networks

5 Comments

  1. Hi Russ,

    Thank you for the post, very useful.
    I do have one question though: do we need to have uplinks assigned to the dedicated L2VPN dvpg, or do we just need to create the dvpg and use it for trunking?
    I created them without any uplinks and the L2VPN is working, but in my case I cannot ping the IP of the sub-interface on the edges.
    I have 10.10.10.1/24 for the sub-interfaces on both edges, and the VMs on the two stretched VXLANs also have their default gateway at 10.10.10.1/24.
    The VMs on both ends cannot ping the sub-interface IP 10.10.10.1/24.
    Do you have any idea what might be wrong?
    Thank you.

    1. Have you tried with uplinks on the dvpg-L2VPN for both sites? (My lab is down, so I can’t test it for you.)
      The fact that you have used the same network 10.10.10.x/24 for both edges has me thinking that you have accidentally invalidated the two-site setup.

      I found a lot of complexity with this lab in setting up the routing and the different address schemes on both sites in a nested environment.
      Half the learning process here is the environment setup and troubleshooting. More than once I’ve caused myself hours of investigation by leaving a firewall rule on, or by having the route set wrong on the vyos/physical segment.

      Don’t kill yourself if you don’t get it all working; sometimes it’s best to move on to something else and come back after a few weeks with a fresh setup.
      Have a look at module 5 http://docs.hol.vmware.com/HOL-2016/hol-sdc-1625_html_en/#l370416 and work through HOL 1625.

      1. Thanks for the hints Russ!
        I found that the problem was down to the firewall on the edges; once I turned them off, I was able to ping the sub-interface IPs.

  2. Hi ,

    Very nice blog. What is the use of the egress optimization gateway address? I have been digging a lot on Google but found no clear answer. Please help.

    1. It’s as Bertello explains: if the default gateway for the virtual machines is the same across the two sites (because you have stretched the L2 subnet), egress optimization ensures traffic uses the local site edge. Go back and look at the diagram in my post; it is just saying that egress traffic (traffic going out) should use the optimal or closest route, which will be the local gateway, not the gateway on the other site.

      So the VM Web-1 will use the gateway (Perimeter edge) on Site A, and Web-3 will use the site 2 ESG as its gateway.
