Command-line troubleshooting skills are invaluable when diagnosing problems. In this post I’ve collected some useful commands and a basic troubleshooting methodology for errors affecting the deployment of Distributed Logical Routers and Edge Gateways.
This assumes the NSX Manager and controllers have already been deployed.
1. First check that the ESXi hosts have been prepared and configured with VTEPs
2. Check that the ESXi hosts have been added to the transport zone
3. Check that the VTEP is available on the ESXi host
(here 192.168.130.0/24 is my VTEP subnet, and vmk3 is the VMkernel interface)
If you don’t see your VTEP, make sure you have added the host to the transport zone.
    <esx host> # esxcli network ip route ipv4 list -N vxlan
    Network        Netmask        Gateway        Interface  Source
    -------------  -------------  -------------  ---------  ------
    default        0.0.0.0        192.168.130.1  vmk3       MANUAL
    192.168.130.0  255.255.255.0  0.0.0.0        vmk3       MANUAL
4. Check the ESXi host's VTEP vmknic: confirm the MTU (1600 here) and the netstack (vxlan), and note the MAC address
    <esx host> # esxcfg-vmknic -l
    Interface  Port Group/DVPort/Opaque Network  IP Family  IP Address      Netmask        Broadcast        MAC Address        MTU   TSO MSS  Enabled  Type    NetStack
    vmk0       38                                IPv4       192.168.110.53  255.255.255.0  192.168.110.255  00:50:56:b2:09:10  1500  65535    true     STATIC  defaultTcpipStack
    vmk1       11                                IPv4       192.168.120.53  255.255.255.0  192.168.120.255  00:50:56:6a:8a:62  1500  65535    true     STATIC  defaultTcpipStack
    vmk2       17                                IPv4       192.168.111.53  255.255.255.0  192.168.111.255  00:50:56:6f:18:28  1500  65535    true     STATIC  defaultTcpipStack
    vmk3       50                                IPv4       192.168.130.53  255.255.255.0  192.168.130.255  00:50:56:69:b3:cd  1600  65535    true     STATIC  vxlan
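If you have many hosts to check, the MTU and netstack columns can be pulled out of saved `esxcfg-vmknic -l` output with a few lines of Python. This is only a rough sketch: the function name and the sample text are my own, it reads fields from the right-hand side of each IPv4 row (so a port group name containing spaces does not shift the columns), and the 1600-byte minimum is an assumption based on the usual VXLAN requirement.

```python
# Sample rows copied from the esxcfg-vmknic -l output above (trimmed to two rows).
SAMPLE_VMKNIC = """\
Interface  Port Group/DVPort/Opaque Network  IP Family  IP Address  Netmask  Broadcast  MAC Address  MTU  TSO MSS  Enabled  Type  NetStack
vmk0  38  IPv4  192.168.110.53  255.255.255.0  192.168.110.255  00:50:56:b2:09:10  1500  65535  true  STATIC  defaultTcpipStack
vmk3  50  IPv4  192.168.130.53  255.255.255.0  192.168.130.255  00:50:56:69:b3:cd  1600  65535  true  STATIC  vxlan
"""

MIN_VXLAN_MTU = 1600  # assumption: the usual minimum for VXLAN transport


def vxlan_vmknics(vmknic_output):
    """Return {interface: mtu} for IPv4 vmknics on the vxlan netstack."""
    result = {}
    for line in vmknic_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, short lines and non-IPv4 rows.
        if len(fields) < 12 or not fields[0].startswith("vmk") or "IPv4" not in fields:
            continue
        # Count from the right: NetStack is last, MTU is fifth from the end.
        netstack, mtu = fields[-1], fields[-5]
        if netstack == "vxlan":
            result[fields[0]] = int(mtu)
    return result


def undersized_vteps(vmknic_output, min_mtu=MIN_VXLAN_MTU):
    """Return the vxlan vmknics whose MTU is below min_mtu."""
    return {nic: mtu for nic, mtu in vxlan_vmknics(vmknic_output).items() if mtu < min_mtu}


if __name__ == "__main__":
    print(vxlan_vmknics(SAMPLE_VMKNIC))
    print(undersized_vteps(SAMPLE_VMKNIC))
```

An empty result from `undersized_vteps` means every VTEP vmknic in the pasted output meets the assumed minimum.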
5. Ping the other VTEPs in the environment from the ESXi host
    ping ++netstack=vxlan -I vmk3 <another host's VTEP IP>
    <esx host> # ping ++netstack=vxlan -I vmk3 192.168.130.55
    PING 192.168.130.55 (192.168.130.55): 56 data bytes
    64 bytes from 192.168.130.55: icmp_seq=0 ttl=64 time=2.638 ms
    64 bytes from 192.168.130.55: icmp_seq=1 ttl=64 time=1.054 ms
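A default ping only sends 56 data bytes, so it will not tell you whether the full VTEP MTU makes it across the physical network. A common follow-up is a ping with the don't-fragment bit set and a payload sized to fill the MTU. The arithmetic is sketched below, assuming a plain 20-byte IPv4 header (no options) and an 8-byte ICMP header; the helper names are my own, and the `-d` (don't fragment) and `-s` (payload size) switches are the ESXi ping options as I understand them.

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options
ICMP_HEADER_BYTES = 8


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def df_ping_command(mtu, vmknic, target_ip):
    """Build a don't-fragment ping for the vxlan netstack (run it on the ESXi host)."""
    return "ping ++netstack=vxlan -d -s {} -I {} {}".format(
        max_ping_payload(mtu), vmknic, target_ip
    )


if __name__ == "__main__":
    # MTU, vmknic and target VTEP taken from the examples in this post.
    print(df_ping_command(1600, "vmk3", "192.168.130.55"))
```

For the 1600-byte MTU used here the payload works out to 1572 bytes. If the small ping succeeds but this one fails, a switch or uplink somewhere in the path is likely still at a smaller MTU.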
6. Check on the ESXi host that the controllers are registered in netcpa
In the example the controllers are 192.168.110.71, 192.168.110.72 and 192.168.110.73
    <esx host> # cat /etc/vmware/netcpa/config-by-vsm.xml
    <config>
      <connectionList>
        <connection id="0000">
          <port>1234</port>
          <server>192.168.110.71</server>
          <sslEnabled>true</sslEnabled>
          <thumbprint>B3:15:5A:CE:FB:2D:F1:B9:3A:AB:D3:74:11:B6:74:03:04:70:C8:AE</thumbprint>
        </connection>
        <connection id="0001">
          <port>1234</port>
          <server>192.168.110.72</server>
          <sslEnabled>true</sslEnabled>
          <thumbprint>D5:0B:AC:10:21:AE:2F:A2:53:30:7C:1E:4C:7E:A2:25:9F:B6:29:3C</thumbprint>
        </connection>
        <connection id="0002">
          <port>1234</port>
          <server>192.168.110.73</server>
          <sslEnabled>true</sslEnabled>
          <thumbprint>CD:C0:43:80:D9:52:42:71:09:17:3A:86:72:DB:D5:AB:A1:B7:2D:A8</thumbprint>
        </connection>
      </connectionList>
      <vdrDvsList>
        <vdrDvs id="0000">
          <numActiveUplink>1</numActiveUplink>
          <numUplink>2</numUplink>
          <teamingPolicy>FAILOVER_ORDER</teamingPolicy>
          <uplinkPortNames>dvUplink1,dvUplink2</uplinkPortNames>
          <uuid>24 aa 2d 50 e3 17 ca 58-74 c7 06 f5 e5 74 f8 7b</uuid>
          <vxlanOnly>true</vxlanOnly>
        </vdrDvs>
      </vdrDvsList>
7. Check the ESXi host's VXLAN status
The control plane should be in sync (Control plane Out-Of-Sync: No), and the segment ID and VTEP IP should be populated.
    <esx host> # net-vdl2 -l
    VXLAN Global States:
            Control plane Out-Of-Sync:      No
            UDP port:       8472
    VXLAN VDS:      vDS-Branch
            VDS ID: 24 aa 2d 50 e3 17 ca 58-74 c7 06 f5 e5 74 f8 7b
            MTU:    1600
            Segment ID:     192.168.130.0
            Gateway IP:     192.168.130.1
            Gateway MAC:    00:50:56:b2:02:1d
            Vmknic count:   1
                    VXLAN vmknic:   vmk3
                            VDS port ID:    50
                            Switch port ID: 33554439
                            Endpoint ID:    0
                            VLAN ID:        0
                            IP:             192.168.130.53
                            Netmask:        255.255.255.0
                            Segment ID:     192.168.130.0
                            IP acquire timeout:     0
                            Multicast group count:  0
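The `net-vdl2 -l` output is a list of `key: value` lines, so the fields that matter can be checked mechanically once the output is saved to a file or variable. The sketch below is my own illustration rather than any VMware tooling: the function names are hypothetical, the sample is a trimmed copy of the output above, and the 1600-byte MTU threshold is an assumption.

```python
# Trimmed copy of the net-vdl2 -l output shown in this post.
SAMPLE_VDL2 = """\
VXLAN Global States:
        Control plane Out-Of-Sync:      No
        UDP port:       8472
VXLAN VDS:      vDS-Branch
        MTU:    1600
        Segment ID:     192.168.130.0
        Gateway MAC:    00:50:56:b2:02:1d
        Vmknic count:   1
                VXLAN vmknic:   vmk3
                        IP:             192.168.130.53
                        Segment ID:     192.168.130.0
"""


def parse_vdl2(output):
    """Collect 'key: value' pairs; the first occurrence of a key wins."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


def vdl2_problems(fields, min_mtu=1600):
    """Return a list of human-readable problems (empty when all checks pass)."""
    problems = []
    if fields.get("Control plane Out-Of-Sync") != "No":
        problems.append("control plane is out of sync (or state is missing)")
    if int(fields.get("MTU", 0)) < min_mtu:
        problems.append("VXLAN MTU is below {}".format(min_mtu))
    if not fields.get("Segment ID"):
        problems.append("no segment ID")
    if not fields.get("IP"):
        problems.append("no VTEP IP")
    return problems


if __name__ == "__main__":
    print(vdl2_problems(parse_vdl2(SAMPLE_VDL2)))
```

Section headings such as `VXLAN Global States:` have nothing after the colon and are skipped, which keeps the parser independent of the indentation.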
If a controller connection isn't registered, restart the netcpa service and look at the logs:
    <esx host> # /etc/init.d/netcpad restart
    <esx host> # cat /var/log/netcpa.log | grep connection
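When reading the log it helps to know which controller connections the host is supposed to have. One way to get that list is to parse a copy of config-by-vsm.xml with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch: the function names are my own, and the sample is a shortened, explicitly closed copy of the file shown earlier, so a real file may contain more elements.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the config-by-vsm.xml shown in this post.
SAMPLE_CONFIG = """\
<config>
  <connectionList>
    <connection id="0000">
      <port>1234</port>
      <server>192.168.110.71</server>
      <sslEnabled>true</sslEnabled>
    </connection>
    <connection id="0001">
      <port>1234</port>
      <server>192.168.110.72</server>
      <sslEnabled>true</sslEnabled>
    </connection>
  </connectionList>
</config>
"""


def configured_controllers(xml_text):
    """Return (server, port, ssl_enabled) for each <connection> element."""
    root = ET.fromstring(xml_text)
    return [
        (
            conn.findtext("server"),
            int(conn.findtext("port")),
            conn.findtext("sslEnabled") == "true",
        )
        for conn in root.iter("connection")
    ]


def missing_controllers(xml_text, expected_ips):
    """Return the expected controller IPs that are absent from the config."""
    present = {server for server, _port, _ssl in configured_controllers(xml_text)}
    return sorted(set(expected_ips) - present)


if __name__ == "__main__":
    print(configured_controllers(SAMPLE_CONFIG))
    print(missing_controllers(SAMPLE_CONFIG, ["192.168.110.71", "192.168.110.73"]))
```

Any IP returned by `missing_controllers` was never pushed to the host, in which case the netcpa log is not the place to look.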