vSAN Performance Benchmark Best Practices
After you set up a new vSAN/VxRail/VxRack HCI deployment, you will need to benchmark your cluster before moving it to production. Here is a small guide to “vSAN Performance Benchmark” using a well-known and useful tool called HCI Bench, which is freely available on the VMware Labs website.
You may be wondering what the recommended way is to test and review performance on a hyper-converged infrastructure.
If you have not gone through my previous posts, understanding-vm-storage-policies and understanding-vsan-component-placement-sizing, I highly encourage you to read them. They give deep insight into how components are created and placed, and how we can view them.
How NOT to assess Performance
Now let's discuss how to run a performance benchmark to validate a newly deployed vSAN cluster that is about to be staged for production. Many people prefer to deploy a Windows or Linux VM and run a guest-level tool such as IOmeter. That is one way to assess performance, but it does NOT assess the cluster properly. vSAN is a hyper-converged infrastructure, and data placement is controlled and driven purely by SPBM (Storage Policy Based Management).
Let's say we have a new 8-node cluster, and your test benchmark VM (Windows or Linux, running some performance benchmarking tool) was deployed with the “vSAN Default Storage Policy”. Imagine the VMDK on which you run the performance test is around 50GB. We should see 3 components (2 data components and a witness), which reside on only three hosts and three physical disks on those hosts. When you start running your performance tests, all the workload and IO hit only the two data components on the two hosts where they are placed, and never touch any other host or its disks. If the tests are very heavy, we only beat down the two disks that contain your data components, and chances are high that you will start seeing “Component congestion” and “SSD congestion”. This congestion is artificially induced, because all IOPS are targeted at those two components residing on those two disks. To summarize the problems with this approach:
- All your IOPS targeted only two individual capacity-tier disks, each in a disk group on one of the two hosts where the data components reside, and never touched any disks on your other 6 nodes (as per the above example of an 8-node cluster).
- A very high workload can cause component congestion against the VM's components without truly testing overall vSAN performance. The results will not be accurate and may report poor performance because of the induced component congestion (the disks are very busy and cannot cope with new incoming IO) and SSD congestion (there are too many outstanding IOs and the SSD cannot destage data from the cache tier to the capacity-tier disks, because the capacity disks are still busy serving existing IOPS for the components in question).
- Also, although this is an 8-node cluster with multiple disk groups, we never hit any other node over the vSAN network, or its disk groups, to see whether everything is healthy.
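To put numbers on the example above, here is a minimal sketch in Python. It assumes the default policy produces a RAID-1 mirror with FTT=1 (two data components and one witness) and that the VMDK is small enough not to be split or striped into further components. The function name and its parameters are hypothetical and only illustrate the arithmetic; they are not part of any vSAN tool.

```python
def single_vm_footprint(total_hosts, ftt=1):
    """Estimate how much of a cluster one RAID-1 object touches.

    Assumptions (not taken from any vSAN API): RAID-1 mirroring, one
    component per replica, no striping, no splitting of large objects.
    """
    data_components = ftt + 1        # full replicas that serve data IO
    hosts_for_object = 2 * ftt + 1   # replicas plus witness, one per host
    if total_hosts < hosts_for_object:
        raise ValueError("not enough hosts for this FTT value")
    return {
        "data_components": data_components,
        "hosts_serving_io": data_components,
        "hosts_never_touched_by_data_io": total_hosts - data_components,
        "share_of_hosts_tested": data_components / total_hosts,
    }


# The 8-node cluster from the example above.
result = single_vm_footprint(total_hosts=8)
print(result)
```

With 8 hosts, only a quarter of the nodes ever serve data IO for that single test VM, which is why a guest-level test says little about the cluster as a whole.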
HCI Bench is one of the best tools I have seen for running a good benchmark against a vSAN cluster. It can deploy multiple VMs on all the hosts, each with multiple vDisks whose components are spread across all the disks and disk groups on all hosts. This ensures that we push traffic through the vSAN network interfaces across all nodes and generate workload against all the SSD cache-tier and SSD/MD capacity-tier disks (All-Flash or Hybrid). Here is how you deploy the OVA and configure it to run the benchmark. How to interpret the results after running the test will be covered in the next section, which I will be adding soon.
Accept the Technical Preview License and download HCI Bench from https://labs.vmware.com/flings/hcibench
It is good to keep the instructions handy, as they may be needed later
Start importing the HCI Bench OVA
Name your HCI Bench management VM
Choose the cluster or resource pool on which this HCI Bench management VM will be deployed
Accept and click Next for the additional settings for this OVA
Choose the datastore where the HCI Bench management VM will be deployed; this does not need to be the destination datastore where the performance benchmark tests will run.
Here it is very important to understand what the Public and Private networks are; this will vary depending on the environment where the tool is being deployed. The public network is the network where the HCI Bench management VM is deployed and through which you run and manage your benchmark, so ensure you have a machine in the same network from which the tool can be managed. The private network is where the Vdbench machines will be deployed for the performance benchmark, and they need to be able to communicate with the HCI Bench management VM through its second NIC. It is good to have a DHCP server on this private network; alternatively, HCI Bench can assign private IPs on this network itself, which can be configured later during initial configuration.
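As a quick sanity check of the two-network layout described above, the sketch below uses Python's standard ipaddress module to confirm that each machine sits on the network it is supposed to use. Every subnet and IP address here is a made-up placeholder, not a value HCI Bench requires; substitute the ones from your own environment.

```python
import ipaddress


def on_network(ip, network_cidr):
    """Return True if the given IP address belongs to the given subnet."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(network_cidr)


# Hypothetical example values; replace with your own.
PUBLIC_NET = "10.0.10.0/24"      # management network (first NIC)
PRIVATE_NET = "192.168.50.0/24"  # Vdbench guest network (second NIC)

checks = {
    "admin workstation on public network": on_network("10.0.10.25", PUBLIC_NET),
    "HCI Bench first NIC on public network": on_network("10.0.10.40", PUBLIC_NET),
    "HCI Bench second NIC on private network": on_network("192.168.50.2", PRIVATE_NET),
    "Vdbench guest on private network": on_network("192.168.50.101", PRIVATE_NET),
}

for name, ok in checks.items():
    print(f"{'OK  ' if ok else 'FAIL'} {name}")
```

This only checks addressing on paper; it does not verify that the port groups or VLANs actually carry the traffic.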
Review your OVA deployment settings
Power on your HCI Bench VM and wait for it to show its IP address after boot
Open HCI Bench in a browser using its IP address and port 8080
Once you reach the configuration page (HCI Bench management IP address:8080), fill in your vCenter hostname or IP and the username/password to authenticate to vCenter. Also fill in the details of the datacenter, cluster and network where your Vdbench test machines will be deployed to run the performance benchmark tests.
You may choose to deploy the VMs directly on the hosts if you are not using VDS (distributed virtual switches) in the environment; if you do, make sure DRS is not set to automatic. Otherwise this option can be left unchecked.
Click the download button in the HCI Bench tool and it should direct you to the Oracle website where you can download Vdbench
Accept the EULA and download the Vdbench zip file from the Oracle website
Finish uploading the Vdbench zip file to the HCI Bench management VM
Validate and kick off the benchmark test run
After deployment and test completion, you should see a bunch of Vdbench test machines with multiple disks attached, on which the benchmark tests were run. The test below was run with easyMode, which deployed about 6 VMs on the three-node vSAN cluster, and each VM had 9 disks on which the benchmark test was run.