Following on from Reference Design for SDDC with NSX and vSphere – NSX Components, vCenter Topology, Connectivity Considerations
This post covers the reference architecture for NSX Edge clusters.
NSX Edge Cluster Design
Design considerations for Edge Cluster Design
Establish the type of resources required for the edges: Load Balancers, VPN, DLR Control VM
What is the nature of the workload: North-South or East-West traffic?
Will additional VMs for monitoring and log collection share the cluster? Will NSX controllers also be deployed in the same cluster?
What physical hosts are available: how many uplinks, the throughput of the physical NICs, and the available CPU and memory on each host
Datacenter: size and topology (e.g. multiple data centers with SRM)
Availability and Scaling: VLAN, Rack Availability, Bandwidth, Over-subscription
Edge devices can be deployed standalone or in a high-availability configuration, either as an Active/Standby pair or as up to eight edges using ECMP
Understand the services that will be used, Routing, Load-Balancing, Perimeter firewall, VPN
Stateful services limit the use of ECMP
Determine the form factor that will be needed based on the function of the edge
What level of high availability is required?
Is multi-tenancy needed, requiring an edge per Tenant?
Is ECMP going to be used to allow scaling of north-south traffic?
What are the required bandwidth expectations?
A single-vCPU edge should achieve roughly 2 to 3 Gbps of throughput
For line rate use Quad-Large
*Multiple nodes used at once in ECMP
A pair of edge devices is deployed with heartbeat and service synchronization.
Layer 2 connectivity is required between the two devices; internal vNICs connect the pair.
Multi-interface routing is supported with OSPF and BGP; the hello/hold timers must be set to 40/120 seconds.
DRS anti-affinity rules are created automatically. It is recommended that the vSphere cluster has at least three nodes.
CPU and memory resources are reserved from NSX 6.2.4 onwards.
A Load Balanced Edge can be deployed near application tier.
Multiple tenants can have separate edges.
The Active/Standby gateway is used for:
Perimeter FW, NAT, LB, SSL-VPN, North-South routing
Scalable North-South traffic forwarding
Up to eight instances.
Smaller failure domain: for example, with eight edges only 1/8 of services are affected if an edge is lost.
Stateful services not supported in ECMP mode, so no firewall or load balancer can be used.
Layer 2 connectivity required for peering to physical network.
Multi-interface routing is supported with OSPF and BGP; the hello/hold timers must be set to 1/3 seconds.
DRS anti-affinity rules must be created manually; a cluster of at least three hosts is recommended.
Multiple tenants can have separate edges.
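The trade-off between the two models can be sketched in a few lines of Python. This is only an illustration of the rules listed above, not an NSX API: the service names, the contents of `STATEFUL_SERVICES` (taken from the Active/Standby use cases in this post) and both function names are assumptions made for the example.

```python
# Services this post associates with the Active/Standby gateway.
# Treating all of them as stateful is an assumption for this sketch.
STATEFUL_SERVICES = {"firewall", "nat", "load-balancer", "vpn"}

# Hello/hold timers in seconds, as quoted in this post for each model.
TIMERS = {"active-standby": (40, 120), "ecmp": (1, 3)}

MAX_ECMP_EDGES = 8


def choose_edge_mode(services):
    """Return 'active-standby' if any stateful service is needed, else 'ecmp'."""
    required = {s.lower() for s in services}
    return "active-standby" if required & STATEFUL_SERVICES else "ecmp"


def ecmp_failure_fraction(edge_count):
    """Fraction of north-south paths affected when one ECMP edge is lost."""
    if not 1 <= edge_count <= MAX_ECMP_EDGES:
        raise ValueError("ECMP supports between 1 and 8 edges")
    return 1 / edge_count


if __name__ == "__main__":
    for wanted in (["routing"], ["routing", "load-balancer"]):
        mode = choose_edge_mode(wanted)
        hello, hold = TIMERS[mode]
        print(f"{wanted}: {mode}, hello/hold {hello}/{hold}s")
    print(f"8 ECMP edges: {ecmp_failure_fraction(8):.3f} of paths per edge loss")
```

A pure routing edge lands in ECMP mode with the short timers, while adding any stateful service forces the Active/Standby model with the longer timers.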
ECMP with DLR and Edge
ECMP is supported on Distributed Logical Router Control VM and Edge devices. Both can forward up to eight equal cost routes to a destination. For a multi tenant environment eight edge devices can be deployed for each DLR Control VM.
The purpose is to increase the North-South bandwidth, and reduce the failure domain.
Routing protocol timers can be reduced to 1/3 seconds for hello/hold for OSPF/BGP.
Very important best practice is not to place ECMP edge instance and DLR Control VM on the same host.
A dual failure will trigger a race condition because of the short protocol timers, resulting in a much longer outage.
See page 134 of the NSX Reference Design Guide Version 3.0: Avoiding Dual Failure of Edge and Control VM.
DRS anti-affinity rules are automatically created between active-standby instances; an additional rule should be used to separate ECMP edges from the DLR Control VM.
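As a rough way to audit the placement rule above, the check can be expressed over a simple inventory export. This is a minimal sketch, not a vSphere or NSX API call: the `placement` and `roles` dictionaries, the role strings and the VM and host names are all hypothetical.

```python
from collections import defaultdict


def find_placement_violations(placement, roles):
    """Return hosts that run both an ECMP edge and a DLR Control VM.

    placement maps VM name -> host name; roles maps VM name -> role string
    ('ecmp-edge' or 'dlr-control-vm'). Both structures are hypothetical.
    """
    roles_by_host = defaultdict(set)
    for vm, host in placement.items():
        roles_by_host[host].add(roles[vm])
    return sorted(
        host
        for host, present in roles_by_host.items()
        if {"ecmp-edge", "dlr-control-vm"} <= present
    )


if __name__ == "__main__":
    placement = {"edge-1": "esx-01", "edge-2": "esx-02", "dlr-ctrl": "esx-02"}
    roles = {
        "edge-1": "ecmp-edge",
        "edge-2": "ecmp-edge",
        "dlr-ctrl": "dlr-control-vm",
    }
    print(find_placement_violations(placement, roles))
```

Any host returned by the function would need the manually created DRS anti-affinity rule to move either the edge or the Control VM elsewhere.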
Virtual Port Channel and Routing Peer Termination
vPC peering is not supported on some platforms, and adds complexity. Vendor support for your current code version should be established before deployment.
NSX Edge Routing Design with Rack-mount server
Edge uplink interfaces are mapped to individual VLANs; each VLAN maps to an upstream router.
Two VLAN uplinks are created with a one-to-one mapping to the physical routers.
On the vSphere Distributed Switch, port groups are created for the two VLANs.
The port group teaming mode is set to ‘route based on originating port’, and active/unused uplinks are configured to determine traffic flow.
LACP should be avoided due to vendor dependencies.
eBGP is used between the NSX edges and the routers (OSPF could also be used).
Redundancy is handled by Dynamic Routing as Edges have adjacencies with both routers
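The uplink layout described above can be written down as data, which makes the one-to-one mapping easy to check. This is only a sketch of the idea: the port group names, VLAN IDs, router names and uplink names are invented for the example, and `UplinkPortGroup` and `validate_one_to_one` are hypothetical helpers, not part of any vSphere tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UplinkPortGroup:
    name: str
    vlan_id: int
    peer_router: str
    active_uplink: str
    unused_uplink: str
    teaming: str = "route based on originating port"


def validate_one_to_one(port_groups):
    """Check that each VLAN, router and active uplink belongs to one port group."""
    for attr in ("vlan_id", "peer_router", "active_uplink"):
        values = [getattr(pg, attr) for pg in port_groups]
        if len(set(values)) != len(values):
            return False
    # An uplink cannot be both active and unused in the same port group.
    return all(pg.active_uplink != pg.unused_uplink for pg in port_groups)


# Hypothetical values: two VLANs, each pinned to one uplink and one router.
PORT_GROUPS = [
    UplinkPortGroup("edge-uplink-a", 10, "router-a", "uplink1", "uplink2"),
    UplinkPortGroup("edge-uplink-b", 20, "router-b", "uplink2", "uplink1"),
]

if __name__ == "__main__":
    print(validate_one_to_one(PORT_GROUPS))
```

Because each port group has one active and one unused uplink, traffic for a given VLAN always leaves on a known physical path, and failover is left to the routing protocol rather than to NIC teaming.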
Standard network best practices should be followed.
Route summarisation should be used to reduce the prefixes being sent upstream.
Default route must follow the uplink status.
Loss of both uplinks should withdraw all routes.
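Two of these practices, summarisation and withdrawal on uplink loss, can be illustrated with Python's standard `ipaddress` module. This is a sketch under stated assumptions: the prefixes are invented, `summarise` and `routes_to_advertise` are hypothetical helpers, and modelling "withdraw all routes" as an empty advertisement from the edge is one interpretation of the rule above. Note that `ipaddress.collapse_addresses` only merges prefixes that combine exactly; it does not create a wider covering summary.

```python
import ipaddress


def summarise(prefixes):
    """Collapse contiguous prefixes into the fewest exact supernets."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


def routes_to_advertise(prefixes, uplinks_up):
    """Advertise summarised prefixes only while at least one uplink is up."""
    if not any(uplinks_up):
        return []  # both uplinks lost: withdraw everything
    return summarise(prefixes)


if __name__ == "__main__":
    logical_networks = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
    print(routes_to_advertise(logical_networks, [True, True]))
    print(routes_to_advertise(logical_networks, [False, False]))
```

Four /24 logical networks collapse into a single /22 upstream, and nothing is advertised once both uplinks are down.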
Routing protocols will be discussed further in a later section
Edge HA Models Comparison
In the next section, routing protocols and topology will be discussed.