Why don’t we do like AWS do? – Multisite Networking

When I started playing with virtual machines, I got a real buzz from how easy it was: spin up a VM, or move it between servers or storage, with a right click and an action.

I get a similar buzz from working with AWS, which makes technically complex but desirable solutions look easy. For example, distributing an application across availability zones, or even regions, protects against site failure far better than most DR solutions.

AWS simple load balanced application

Distributed applications are a logical progression for high availability. Some applications, such as Microsoft Exchange DAGs, SQL Server Always On, or even VMware vCenter High Availability, replicate over the network. This removes the need for expensive synchronous storage replication, which is significant from a cost perspective; instead, bandwidth and network latency become the key measures and constraints.

Another approach to distributed applications is the use of load balancers. Having load balancers sit in front of your scaled-out servers makes everyone look good: if a server goes offline, its health check fails and the load balancer stops sending it traffic, and the failure can prompt a swift service or system restart, or even kill and redeploy the server. The result is high availability without service interruption.
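A minimal sketch of the health-check behaviour described above: a round-robin balancer that skips any backend whose most recent health check failed. The `Backend` and `RoundRobinBalancer` names and the pluggable `check` function are illustrative assumptions, not any product's API; a real load balancer would probe over HTTP or TCP on a timer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Backend:
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Round-robin over backends, skipping any that failed their last health check."""

    def __init__(self, backends: List[Backend], check: Callable[[Backend], bool]):
        self.backends = backends
        self.check = check  # hypothetical probe: returns True if the backend responds
        self._next = 0

    def run_health_checks(self) -> None:
        # A real balancer runs this periodically; here it is called explicitly.
        for backend in self.backends:
            backend.healthy = self.check(backend)

    def pick(self) -> Backend:
        # Try each backend at most once, continuing from where the last pick stopped.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backends")


if __name__ == "__main__":
    offline = {"web2"}  # pretend web2 has gone offline
    lb = RoundRobinBalancer(
        [Backend("web1"), Backend("web2"), Backend("web3")],
        check=lambda b: b.name not in offline,
    )
    lb.run_health_checks()
    print([lb.pick().name for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

Because health is evaluated separately from request routing, a failed server simply drops out of rotation until its check passes again, and clients keep being served by the others.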

VMware has used both native replication and load balancers in recent vCenter editions: HA for the Platform Services Controller is provided through a load balancer, whereas vCenter replicates natively over the network. Load balancers are a great solution, but they add cost and complexity to infrastructure design.

In a public cloud such as AWS, I can spin up a couple of EC2 instances in different availability zones and add them to a load balancer in a few minutes. The Route 53 DNS service directs users to the address AWS provides for the load balancer, and go-live takes less than 30 minutes. On premises, I’d be lucky to get a meeting with the network architect the next day.
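For comparison, the Route 53 step amounts to one alias record pointing at the load balancer. A minimal sketch, assuming boto3 and an existing hosted zone: the function below only builds the `ChangeBatch` structure that Route 53's `ChangeResourceRecordSets` API accepts. The record name, load balancer DNS name and zone IDs are placeholders, not real values.

```python
import json


def alias_to_load_balancer(record_name: str, lb_dns_name: str, lb_hosted_zone_id: str) -> dict:
    """Build a Route 53 ChangeBatch that aliases record_name to a load balancer."""
    return {
        "Comment": f"Send {record_name} to the load balancer",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or update it if it exists
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "AliasTarget": {
                        # The load balancer's own hosted zone ID, not the ID of your zone
                        "HostedZoneId": lb_hosted_zone_id,
                        "DNSName": lb_dns_name,
                        "EvaluateTargetHealth": True,
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only
    batch = alias_to_load_balancer(
        "app.example.com.",
        "my-lb-1234567890.eu-west-1.elb.amazonaws.com",
        "ZLBZONEPLACEHOLDER",
    )
    print(json.dumps(batch, indent=2))
    # With credentials configured, the change would be applied with:
    # boto3.client("route53").change_resource_record_sets(
    #     HostedZoneId="ZYOURZONEPLACEHOLDER", ChangeBatch=batch)
```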

The gap between the two got me wondering about on-premises SDDC solutions. Of course, this is not for the average start-up, but for large enterprises with two or more datacenters.

My search led me to the VMware NSX Solution team, and to the VMWorld 2016 sessions and blogs of Humair Ahmed.

NET7854R – Multisite Networking and Security with Cross-vCenter NSX—Part 1

This VMWorld session is a great introduction to multisite networking in general, and to NSX use cases for workload mobility, resource pooling, and disaster recovery.

Traditional solutions to multisite networking include spanning L2 over dark fiber or Virtual Private LAN Service (VPLS). Network overlays such as VXLAN extend L2 over L3, which simplifies linking datacenters together.
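To make "L2 over L3" concrete: VXLAN (RFC 7348) wraps the original Ethernet frame in an 8-byte VXLAN header carried over UDP, so the frame can cross any routed network between sites. A minimal sketch of building and parsing that header; it shows only the header layout, not a full datapath, and the VNI value in the usage is arbitrary.

```python
import struct
from typing import Tuple

VXLAN_UDP_PORT = 4789            # IANA-assigned destination port for VXLAN (RFC 7348)
VXLAN_FLAG_I = 0x08              # "I" flag: the VNI field is valid
HEADER = struct.Struct("!B3xI")  # flags, 3 reserved bytes, then VNI (24 bits) + 1 reserved byte

# Outer Ethernet (14) + outer IPv4 (20) + UDP (8) + VXLAN (8) bytes added to every frame
IPV4_ENCAP_OVERHEAD = 14 + 20 + 8 + HEADER.size


def vxlan_header(vni: int) -> bytes:
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return HEADER.pack(VXLAN_FLAG_I, vni << 8)  # VNI occupies the top 24 bits


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Return the UDP payload: VXLAN header followed by the untouched L2 frame."""
    return vxlan_header(vni) + inner_frame


def decapsulate(udp_payload: bytes) -> Tuple[int, bytes]:
    """Return (VNI, inner L2 frame) from a VXLAN UDP payload."""
    flags, vni_field = HEADER.unpack_from(udp_payload)
    if not flags & VXLAN_FLAG_I:
        raise ValueError("VNI flag not set")
    return vni_field >> 8, udp_payload[HEADER.size:]


if __name__ == "__main__":
    payload = encapsulate(b"original ethernet frame", vni=5001)
    print(decapsulate(payload))
    print(IPV4_ENCAP_OVERHEAD)  # 50
```

Those 50 bytes of IPv4 encapsulation overhead are why the transport network between sites needs an MTU larger than the standard 1500 bytes.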

NSX allows the creation of either local or universal (multisite) objects, such as the universal transport zone, universal logical switches, and the universal distributed logical router. It is these objects, along with VXLAN, that allow NSX to join geographically dispersed sites together.

In the following example, traffic enters site 1 and is then distributed to both sites; BGP weight or OSPF cost is used so that outbound traffic also leaves through site 1. Because the servers on the Web Logical Switch (virtual wire/VXLAN segment) are in the same L2 domain, this is a workload mobility solution, with possible use cases in datacenter migrations.

NSX Cross vCenter without local egress 
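The egress preference can be sketched as a route-selection rule: with BGP weight, the path with the highest weight wins; with OSPF, the path with the lowest cost wins. A simplified sketch, assuming one default route learned from each site's ESG; real BGP and OSPF apply further tie-breakers that are omitted here, and the site names and next-hop addresses are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EgressRoute:
    site: str
    next_hop: str
    bgp_weight: int  # higher is preferred (locally significant)
    ospf_cost: int   # lower is preferred


def best_by_bgp_weight(routes: List[EgressRoute]) -> EgressRoute:
    return max(routes, key=lambda r: r.bgp_weight)


def best_by_ospf_cost(routes: List[EgressRoute]) -> EgressRoute:
    return min(routes, key=lambda r: r.ospf_cost)


if __name__ == "__main__":
    routes = [
        EgressRoute("site1", "192.0.2.1", bgp_weight=60, ospf_cost=10),
        EgressRoute("site2", "198.51.100.1", bgp_weight=30, ospf_cost=20),
    ]
    print(best_by_bgp_weight(routes).site)  # site1: all egress leaves via site 1
    print(best_by_ospf_cost(routes).site)   # site1
    # If site 1's ESGs fail, their routes are withdrawn and site 2 takes over
    survivors = [r for r in routes if r.site != "site1"]
    print(best_by_bgp_weight(survivors).site)  # site2
```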

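The NSX local load balancer service could be placed on the ESGs (Edge Services Gateways) if these are deployed as an HA active/passive pair, or on a separate one-armed load balancer if the ESGs use ECMP (Equal-Cost Multi-Path). ECMP can send the two directions of a flow through different ESGs, which breaks stateful services such as load balancing.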

This of course isn’t truly load balancing between sites as traffic to and from site 2 must traverse the interconnect and route through site 1.
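Why keep the load balancer off ECMP edges? ECMP picks a next hop per flow by hashing header fields, and the return direction of the same connection is hashed separately, possibly onto a different ESG. A sketch assuming a 5-tuple hash; real routers use their own hardware hash functions rather than SHA-256, and the ESG names are hypothetical.

```python
import hashlib
from typing import List


def ecmp_next_hop(src_ip: str, dst_ip: str, proto: int, src_port: int, dst_port: int,
                  next_hops: List[str]) -> str:
    """Pick a next hop by hashing the flow's 5-tuple (stable for a given flow)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return next_hops[bucket % len(next_hops)]


if __name__ == "__main__":
    esgs = ["esg-1", "esg-2", "esg-3", "esg-4"]
    # The two directions of one TCP connection between a web server and a client
    out_hop = ecmp_next_hop("10.0.1.10", "203.0.113.5", 6, 443, 51515, esgs)
    back_hop = ecmp_next_hop("203.0.113.5", "10.0.1.10", 6, 51515, 443, esgs)
    print(out_hop, back_hop, "symmetric" if out_hop == back_hop else "asymmetric")
```

Each direction is consistent on its own, but nothing ties the two together, so a stateful device on one ESG can see only half of a connection.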

The next example modifies the design above by enabling the NSX Local Egress feature. The application is still stretched across sites, but ingress/egress to the physical network is local to each site.

NSX Cross vCenter with local egress 
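The difference local egress makes can be reduced to a toy model: without it, every VM's outbound traffic leaves through the preferred site, so traffic from site 2 VMs crosses the interconnect; with it, each VM uses its own site's edge. A sketch under those simplifying assumptions (one egress point per site, site names illustrative).

```python
from typing import List


def egress_site(vm_site: str, local_egress: bool, preferred_site: str = "site1") -> str:
    # Without local egress, all sites share the preferred site's ESGs
    return vm_site if local_egress else preferred_site


def interconnect_crossings(vm_sites: List[str], local_egress: bool) -> int:
    """Count VMs whose outbound traffic must cross the inter-site link."""
    return sum(1 for site in vm_sites if egress_site(site, local_egress) != site)


if __name__ == "__main__":
    web_tier = ["site1", "site1", "site2", "site2"]
    print(interconnect_crossings(web_tier, local_egress=False))  # 2
    print(interconnect_crossings(web_tier, local_egress=True))   # 0
```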

However, with local egress the routing starts to get complicated, because the stretched subnet is now reachable through both sites. One option is to advertise /32 host routes from the NSX ESGs, so traffic can be routed on a specific application IP address.
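The /32 approach relies on longest-prefix matching: a router always prefers the most specific route, so a host route for one application IP overrides the /24 for the stretched subnet. A sketch using the standard `ipaddress` module; the addresses and next-hop labels are illustrative.

```python
import ipaddress
from typing import List, Optional, Tuple


def longest_prefix_match(dest: str, table: List[Tuple[str, str]]) -> Optional[str]:
    """Return the next hop of the most specific prefix containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    candidates = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in table
        if addr in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


if __name__ == "__main__":
    table = [
        ("10.10.10.0/24", "site1-esg"),   # stretched web subnet, advertised from site 1
        ("10.10.10.20/32", "site2-esg"),  # host route: the app at .20 now runs in site 2
    ]
    print(longest_prefix_match("10.10.10.20", table))  # site2-esg
    print(longest_prefix_match("10.10.10.21", table))  # site1-esg
```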

It is advisable to automate the /32 route injection via the NSX REST API or scripting/orchestration, so that when a workload is vMotioned from one site to another, a /32 route is injected into the network at the new site.
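The automation step can be sketched as an event handler that, on a cross-site vMotion, advertises the workload's /32 at the destination site before withdrawing it from the source (make before break). This is a hypothetical sketch: `on_cross_site_vmotion` and the in-memory `advertised` table stand in for whatever the NSX REST API or orchestration tool would actually call; no real NSX endpoint is used.

```python
import ipaddress
from typing import Dict, List, Set, Tuple


def on_cross_site_vmotion(advertised: Dict[str, Set[str]], vm_ip: str,
                          src_site: str, dst_site: str) -> List[Tuple[str, str, str]]:
    """Hypothetical handler: move vm_ip's /32 host route from src_site to dst_site."""
    if src_site == dst_site:
        return []  # intra-site vMotion: routing does not change
    host_route = str(ipaddress.ip_network(f"{vm_ip}/32"))
    changes = []
    # Advertise at the new site first so there is no gap in reachability
    advertised.setdefault(dst_site, set()).add(host_route)
    changes.append(("advertise", dst_site, host_route))
    # Then withdraw the now-stale route from the old site, if it was there
    if host_route in advertised.get(src_site, set()):
        advertised[src_site].discard(host_route)
        changes.append(("withdraw", src_site, host_route))
    return changes


if __name__ == "__main__":
    advertised = {"site1": {"10.10.10.20/32"}, "site2": set()}
    for change in on_cross_site_vmotion(advertised, "10.10.10.20", "site1", "site2"):
        print(change)
    print(advertised)
```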

Advertising /32 host routes can work for a private network, but it is not practical for Internet-facing applications, because ISPs may not accept /32 route advertisements (prefixes longer than /24 are commonly filtered).

Another option is to use a global server load balancer (GSLB) such as F5 BIG-IP DNS.

NSX Cross vCenter with F5 GSLB

By using a GSLB solution with Cross-VC NSX, local site ingress/egress can be achieved: client-initiated traffic that needs to be load balanced uses the local F5 LTMs for ingress and egress through the local site.

Source network address translation (SNAT) is performed by the F5 LTM appliance on both the internal and external floating IPs, so return traffic from the servers flows back through the same LTM. When a client attempts to connect to a web server and makes a DNS request, F5 BIG-IP DNS replies with the F5 LTM external virtual IP (VIP) address selected by the load balancing algorithm.
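The BIG-IP DNS decision can be approximated as: answer with the external VIP of the client's local site if that site is healthy, otherwise with the VIP of another healthy site. This is a simplified, topology-style policy for illustration, not F5's implementation (BIG-IP DNS offers several load balancing methods); the site names and documentation-range addresses are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteVip:
    site: str
    vip: str            # external VIP of that site's local LTM (placeholder address)
    healthy: bool = True


def resolve(client_site: str, vips: List[SiteVip]) -> Optional[str]:
    """Return the VIP a GSLB might hand out in its DNS answer, or None if all sites are down."""
    healthy = [v for v in vips if v.healthy]
    for v in healthy:
        if v.site == client_site:
            return v.vip  # keep ingress local to the client's site
    return healthy[0].vip if healthy else None  # fail over to any healthy site


if __name__ == "__main__":
    vips = [SiteVip("site1", "203.0.113.10"), SiteVip("site2", "198.51.100.10")]
    print(resolve("site2", vips))  # 198.51.100.10
    vips[1].healthy = False
    print(resolve("site2", vips))  # 203.0.113.10
```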

Clearly, the F5 GSLB design is an expensive solution, aimed at enterprises that want to embrace distributed applications and see the advantages of spreading them between datacenters.

My view is that the role of NSX in the F5 design is more about L2 stretching, as the load balancing is done by F5. NSX local load balancers could be used instead of the F5 LTMs, and that is worth considering as it may drive down cost; but if you are going to invite F5 or another GSLB vendor to the party, you might as well let them do their thing.

The F5 GSLB solution is probably the most similar to what AWS does. In reality, AWS has load balancers in each site: DNS routes clients to the Elastic Load Balancer in one of the availability zones, which then forwards the traffic on to the virtual machine.

AWS ELB multisite
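A back-of-the-envelope way to see how traffic spreads across availability zones: assume DNS spreads clients evenly over one load balancer node per AZ. With cross-zone load balancing disabled, each node only uses targets in its own AZ; with it enabled, every target receives an equal share. A sketch under those assumptions (the AZ names and target counts are illustrative).

```python
from typing import Dict


def share_per_target(targets_per_az: Dict[str, int], cross_zone: bool) -> Dict[str, float]:
    """Fraction of all requests each single target in an AZ receives."""
    total_targets = sum(targets_per_az.values())
    az_count = len(targets_per_az)
    shares = {}
    for az, targets in targets_per_az.items():
        if cross_zone:
            shares[az] = 1 / total_targets         # every target is equal
        else:
            shares[az] = (1 / az_count) / targets  # each AZ's node gets an equal slice
    return shares


if __name__ == "__main__":
    layout = {"az-a": 2, "az-b": 8}
    print(share_per_target(layout, cross_zone=False))  # {'az-a': 0.25, 'az-b': 0.0625}
    print(share_per_target(layout, cross_zone=True))   # {'az-a': 0.1, 'az-b': 0.1}
```

The uneven split without cross-zone balancing is one reason to run the same number of instances in each availability zone.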

Further information on AWS load balancing can be found in AWS re:Invent 2016: Elastic Load Balancing Deep Dive and Best Practices (NET403).

The VMWorld session can be viewed here: NET7854R – Multisite Networking and Security with Cross-vCenter NSX—Part 1

The NSX multisite design guide is available at https://communities.vmware.com/docs/DOC-32552

The blog post Multi-site Active-Active Solutions with NSX-V and F5 BIG-IP DNS is available here: multi-site-active-active-solutions-nsx-f5-big-ip-dns



The relatively generous latency that VXLAN tolerates between sites makes it feasible to extend networks over distances beyond metro range:

  • 10 Gbps link, 5-10 ms latency – Single vCenter, NSX and stretched vSphere clusters (vSphere Metro Storage Cluster)
  • 1 Gbps link, up to 150 ms latency – Separate vSphere clusters, with either a single vCenter or a vCenter at each site
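The two options in the list can be encoded as a rule of thumb, together with the bandwidth-delay product (bandwidth × round-trip time), which shows how much data must be in flight to keep a long link full. A sketch: the thresholds come from the list above, while treating the latency figures as round-trip times and the function names are assumptions.

```python
from typing import Optional


def design_option(link_gbps: float, latency_ms: float) -> Optional[str]:
    """Map a site-to-site link onto the two designs listed above (rule of thumb only)."""
    if link_gbps >= 10 and latency_ms <= 10:
        return "stretched clusters (vMSC), single vCenter"
    if link_gbps >= 1 and latency_ms <= 150:
        return "separate clusters, single or per-site vCenter"
    return None  # outside both envelopes


def bandwidth_delay_product_bytes(link_bps: int, rtt_ms: int) -> int:
    """Bytes in flight needed to fill the link: bits/s x seconds / 8."""
    return link_bps * rtt_ms // (8 * 1000)


if __name__ == "__main__":
    print(design_option(10, 8))   # stretched clusters (vMSC), single vCenter
    print(design_option(1, 120))  # separate clusters, single or per-site vCenter
    print(bandwidth_delay_product_bytes(1_000_000_000, 150))  # 18750000 (about 18.75 MB)
```

At 150 ms, roughly 18.75 MB must be in flight to fill a 1 Gbps link, which is why TCP-based replication over such distances needs large windows to use the full bandwidth.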