VMware Cloud on AWS – Storage Overview

This post follows on from the previous posts, VMware on AWS – Overview and Setup and VMware on AWS – Elastic Resource Management.

I will focus first on VMC storage and then go on to discuss native AWS storage and how it could be used with VMC.

VMC Storage

ESXi will be installed directly on the host; these are not embedded versions sitting on top of an AWS hypervisor. Amazon’s Elastic Block Store (EBS) is used for the install and boot of ESXi. Customers will not have root access to this disk, as the install is a service provided and managed by VMware.

vSAN will be used to store customer virtual machine files and management components such as the VMC vCenter, NSX Manager and NSX Controllers. VMware will also restrict file-level access and permissions on these machines.

Eight local 1.7 TB NVMe SSD devices are assigned to each host for use by vSAN. These will be automatically assigned to two vSAN disk groups.

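The resulting 10.2 TB will be available as raw capacity per host. This figure is consistent with six of the eight devices contributing capacity, with one device in each disk group acting as the cache tier. If additional hosts are added, capacity from those nodes will be added dynamically.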

Currently, deduplication, compression and vSAN encryption are not available. As VMware on AWS is limited to a single availability zone, vSAN is single site.

The NVMe devices used for vSAN are self-encrypting, so encryption is provided, but not through vSAN. NVMe encryption is done at the firmware layer, and encryption keys are managed by AWS.

The effective capacity available per node should work out at around 5.2 TB, or 21 TB in total for a 4-node cluster, assuming FTT = 1 and RAID 1.
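The arithmetic behind these figures can be sketched in a few lines of Python. This is a rough estimate only: it assumes six of the eight 1.7 TB devices per host contribute capacity (which matches the 10.2 TB raw figure) and that RAID 1 with FTT = 1 simply halves the raw space. The function names are made up for illustration. Straight halving gives 5.1 TB per host and 20.4 TB for four hosts, close to the rounded numbers quoted above.

```python
# Rough vSAN capacity estimate for a VMC cluster.
DEVICE_TB = 1.7                # per-device size quoted in the post
CAPACITY_DEVICES_PER_HOST = 6  # assumption: 8 devices minus 1 cache device per disk group


def raw_capacity_tb(hosts: int) -> float:
    """Raw vSAN capacity contributed by the given number of hosts."""
    return hosts * CAPACITY_DEVICES_PER_HOST * DEVICE_TB


def effective_capacity_tb(hosts: int, ftt: int = 1) -> float:
    """Usable capacity with RAID 1 mirroring, which keeps FTT + 1 full copies."""
    return raw_capacity_tb(hosts) / (ftt + 1)


for hosts in (1, 4):
    print(f"{hosts} host(s): {raw_capacity_tb(hosts):.1f} TB raw, "
          f"{effective_capacity_tb(hosts):.1f} TB effective at FTT=1")
```

The estimate ignores the space taken by management components and any free-space headroom vSAN needs, so real usable capacity will be somewhat lower.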

A number of options exist for user-defined storage policies, which can be applied at the virtual machine disk level.

  • No data redundancy
  • 1 failure     RAID-1 (performance)
  • 1 failure     RAID-5 (capacity for 4 node cluster)
  • 2 failures   RAID-1 (performance)
  • 2 failures   RAID-6 (capacity for 6 node cluster)
  • 3 failures   RAID-1 (performance)

Increasing FTT will reduce the amount of available VM capacity. vSAN Object Space Reservation allows varying levels of thin/thick provisioning of virtual machines.
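To make the capacity trade-off between the policies concrete, here is a small Python sketch. The overhead multipliers are not from this post; they are the usual vSAN figures as I understand them (RAID-1 keeps FTT + 1 full copies, RAID-5 is 3 data + 1 parity, RAID-6 is 4 data + 2 parity), so treat them as assumptions. The names POLICY_OVERHEAD and usable_tb are hypothetical.

```python
# Raw capacity consumed per unit of VM data, keyed by (layout, failures tolerated).
# Assumed multipliers: mirrors store FTT + 1 copies, erasure coding stores data + parity.
POLICY_OVERHEAD = {
    ("none", 0): 1.0,        # no data redundancy
    ("RAID-1", 1): 2.0,      # 2 full copies
    ("RAID-5", 1): 4 / 3,    # 3 data + 1 parity
    ("RAID-1", 2): 3.0,      # 3 full copies
    ("RAID-6", 2): 6 / 4,    # 4 data + 2 parity
    ("RAID-1", 3): 4.0,      # 4 full copies
}


def usable_tb(raw_tb: float, layout: str, ftt: int) -> float:
    """Approximate VM capacity left after the policy's redundancy overhead."""
    return raw_tb / POLICY_OVERHEAD[(layout, ftt)]


raw = 40.8  # e.g. a 4-node cluster at 10.2 TB raw per host
for (layout, ftt), overhead in POLICY_OVERHEAD.items():
    print(f"{layout:7} FTT={ftt}: x{overhead:.2f} overhead, "
          f"{usable_tb(raw, layout, ftt):.1f} TB usable")
```

Because policies are applied per virtual machine disk, a real cluster will usually hold a mix of these, and Object Space Reservation changes how much is reserved up front, so this is only a per-policy upper bound.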

Keep in mind that a very small amount of the above capacity is required for the management services. VMware assumes responsibility for monitoring the health and performance of vSAN; as a result, Health Monitoring and the vSAN Performance Service are not exposed to the end user.

If a host is returned to AWS, a termination process that includes a secure drive wipe will ensure that all data has been removed from the disks.

Native AWS Storage Services

AWS has a number of storage services that could be leveraged by VMC.

S3 Object Storage

S3, or Simple Storage Service, is highly durable, highly available object storage.
By highly durable, Amazon means a design durability of 99.999999999% (eleven nines); the figure describes how likely an object is to survive, not how likely it is to be lost. By default, multiple copies of the data are replicated across availability zones within a region. Standard S3 is designed for 99.99% availability.
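To get a feel for what these percentages mean in practice, the following Python sketch converts them into expected losses and downtime. It reads the eleven nines as annual per-object durability and the 99.99% as an annual availability figure; both readings are assumptions for illustration, and the function names are made up.

```python
# Complement of the quoted percentages, written directly to avoid float rounding noise.
ANNUAL_LOSS_PROBABILITY = 1e-11  # 100% - 99.999999999% durability, per object per year
UNAVAILABILITY = 1e-4            # 100% - 99.99% availability

MINUTES_PER_YEAR = 365 * 24 * 60


def expected_objects_lost_per_year(n_objects: int) -> float:
    """Average number of objects lost per year at eleven nines of durability."""
    return n_objects * ANNUAL_LOSS_PROBABILITY


def downtime_minutes_per_year() -> float:
    """Minutes per year the service could be unavailable at 99.99%."""
    return UNAVAILABILITY * MINUTES_PER_YEAR


lost = expected_objects_lost_per_year(10_000_000)
print(f"10 million objects: about one loss every {1 / lost:,.0f} years")
print(f"99.99% availability: about {downtime_minutes_per_year():.0f} minutes down per year")
```

The contrast is the useful part: losing data is extraordinarily unlikely, while being unable to reach it for close to an hour a year is well within the stated figure.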

S3 capacity is assigned dynamically, so you pay for what you use. Available capacity is practically unlimited, and the maximum size of a single object is 5 TB. Besides standard S3, there is an Infrequent Access version, a Reduced Redundancy version and ultra-cheap Glacier archive storage.

AWS charges are based on multiple factors: capacity used, data access, and data transferred from S3 – either to the Internet or between AWS regions. By locating VMC on AWS infrastructure, S3 transfer costs can be avoided.

VMware is working with a number of backup vendors to certify backups to S3. Additionally, with AWS lifecycle management it is possible to automate moving older data to Glacier as a long-term archive solution.
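As an illustration of the lifecycle idea, here is a minimal Python sketch that builds a rule moving objects to Glacier after a number of days and applies it through a boto3 S3 client. The bucket name, prefix, rule ID and the 90-day threshold are hypothetical, and the helper functions are my own; only put_bucket_lifecycle_configuration is the actual boto3 call.

```python
def glacier_transition_rule(prefix: str, days: int) -> dict:
    """Build a lifecycle rule that moves objects under `prefix` to Glacier after `days`."""
    return {
        "ID": f"archive-{prefix.strip('/') or 'all'}-after-{days}d",  # hypothetical naming
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


def apply_lifecycle_rule(s3_client, bucket: str, rule: dict) -> None:
    """Apply the rule to a bucket. Note: this replaces any existing lifecycle configuration."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [rule]},
    )


rule = glacier_transition_rule("backups/", 90)
print(rule["ID"], rule["Transitions"])
```

With AWS credentials configured, this would be used as apply_lifecycle_rule(boto3.client("s3"), "my-backup-bucket", rule), where "my-backup-bucket" stands in for a real bucket name.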

The advantages are clear: no need to provide TBs of on-premises storage, the agility of pay-as-you-grow, and the move to an OPEX cost model. AWS customers report large savings when moving to S3 and Glacier as a backup and archive store.

A simple single-region deployment would look as below. Note, however, that S3 is a region-wide service, so data remains safe even if an AZ burns down.

Cross-region replication can be activated (cross-region data transfer has associated costs) to provide copies of data in a geographically distinct region.
In this case you would be protected in the unlikely event of a region-wide problem, and as VMC becomes available in other regions, you could have new hosts within a couple of hours and start repopulating the environment.

In comparison to having standby hosts and replicated storage sitting in a remote data center, this is a great option as a regional DR solution.

Another use case is as remote storage for virtual machines. S3 is accessed over URLs or through the AWS CLI, so virtual machines could use S3 as a replacement for object-level storage. Keep in mind that this is for GET, PUT or DELETE operations: objects can be replaced by newer versions, but not modified in place.
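The "replace, not modify" point is easiest to see in code. The Python sketch below uses the boto3 S3 client calls get_object, put_object and delete_object; the helper names, bucket and key are hypothetical. Appending a line to an object means downloading the whole object, changing it in memory and uploading a complete replacement.

```python
def append_line(s3_client, bucket: str, key: str, line: bytes) -> None:
    """'Modify' an object the only way S3 allows: read it all, then write a full replacement."""
    current = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()  # GET
    s3_client.put_object(Bucket=bucket, Key=key, Body=current + line)      # PUT (overwrite)


def remove_object(s3_client, bucket: str, key: str) -> None:
    """Delete an object by key."""
    s3_client.delete_object(Bucket=bucket, Key=key)                        # DELETE
```

With credentials in place, a virtual machine could call append_line(boto3.client("s3"), "my-vm-data", "logs/app.log", b"new entry\n"). For large or frequently changing files this read-and-rewrite pattern gets expensive, which is why S3 suits whole-object workloads rather than block or file-level updates.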

How Reliable is AWS Storage

S3 is incredibly durable, but errors that affect availability do occur, and when they do, AWS learns from them. Do an internet search for AWS postmortems and you will find some excellent, honest explanations and remediation steps for each major outage. Understanding that service outages do occur is fundamental to correctly architecting any workload on AWS.