How to perform vSAN cluster maintenance activities? You may be wondering which steps and pre-checks are needed before rebooting a host that participates in a vSAN cluster. This guide helps you make sure that production is not affected while you reboot or upgrade a vSAN node. It is highly recommended to have a BACKUP in place for your VMs before you attempt any maintenance activity on a production cluster.
- Step1 : Check VM storage policy compliance on all virtual machines and ensure that each one is compliant with its assigned VM storage policy. The fastest way is to select all VMs ⇒ Actions ⇒ VM policies ⇒ Check VM storage policy compliance.
Note* : If any VM has FTT=0 (Failures to Tolerate) in its policy and you consider it an important VM, change its policy to the Virtual SAN default policy before placing any host in maintenance mode.
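The check in Step1 and the FTT=0 note can be sketched as a small filter. This is only an illustration: the VmPolicyStatus record and its fields are hypothetical names, and the data is assumed to have been collected by hand or by a script of your own; it is not a vSphere API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VmPolicyStatus:
    """Hypothetical record of one VM's storage policy state."""
    name: str
    compliant: bool          # result of "Check VM storage policy compliance"
    ftt: int                 # Failures to Tolerate value in the assigned policy
    important: bool = False  # your own judgment, not something vSAN reports


def vms_needing_attention(vms: List[VmPolicyStatus]) -> Tuple[List[str], List[str]]:
    """Return (non-compliant VM names, important VM names that still use FTT=0)."""
    non_compliant = [vm.name for vm in vms if not vm.compliant]
    ftt0_important = [vm.name for vm in vms if vm.ftt == 0 and vm.important]
    return non_compliant, ftt0_important


inventory = [
    VmPolicyStatus("db01", compliant=True, ftt=0, important=True),
    VmPolicyStatus("web01", compliant=False, ftt=1),
    VmPolicyStatus("test01", compliant=True, ftt=0),
]
non_compliant, ftt0_important = vms_needing_attention(inventory)
```

Both lists should be empty before any host is placed in maintenance mode.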
- Step2 : Check for inaccessible objects and VMs. The easiest way is to open the health plugin (available only on release 6.2 and above) and see whether there are any inaccessible objects. I do, however, recommend that you familiarize yourself with RVC commands, as they will be your best friend! It is just as easy to check the same status from the RVC command line on the vCenter Server that manages your vSAN cluster (see the RVC command line guide). First log in to the RVC console (see how-to-login to RVC), change directory to the vSAN cluster, and then run vsan.check_state to look for any inaccessible objects.
- Step3 : Make sure that there is no ongoing resync or rebalance in the cluster, using RVC commands on the vCenter Server (for example vsan.resync_dashboard, which is used again in Step6).
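Steps 1 to 3 together act as a gate: maintenance should start only when all three checks are clean. A minimal sketch of that gate follows, assuming you have already read the three numbers from the Web Client or from RVC output; the function name and its parameters are hypothetical.

```python
from typing import List


def maintenance_blockers(non_compliant_vms: int,
                         inaccessible_objects: int,
                         resync_bytes_left: int) -> List[str]:
    """Return the reasons why host maintenance should not start yet."""
    blockers = []
    if non_compliant_vms > 0:
        blockers.append(f"{non_compliant_vms} VM(s) not compliant with their storage policy")
    if inaccessible_objects > 0:
        blockers.append(f"{inaccessible_objects} inaccessible object(s) in the cluster")
    if resync_bytes_left > 0:
        blockers.append(f"resync or rebalance in progress ({resync_bytes_left} bytes left)")
    return blockers


# An empty list means the pre-checks passed.
clean = maintenance_blockers(0, 0, 0)
dirty = maintenance_blockers(2, 0, 5 * 1024**3)
```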
- Step4 : Increase the clom repair delay to two hours (120 mins) ONLY if you expect the reboot of a host to take more than one hour. Otherwise a rebuild kicks off and causes a complete resync, to other hosts in the cluster, of the components that reside on the host being placed in maintenance. Then put one host in maintenance mode with the ensure accessibility option from the Web Client. If DRS is not fully automated, you will have to manually move (vMotion) the VMs to other hosts. Below are screenshots from both 6.0 and 6.5, where the wizard for changing the clom repair delay value looks slightly different.
Note* : This step is not applicable to a three-node vSAN cluster, because vSAN cannot kick off a rebuild of the components on another host without violating the storage policies applied to the VMs. The step is only needed to avoid an unnecessary rebuild operation during the host reboot, so on a three-node cluster skip it and proceed to Step 5. Also note that if another node fails while a node is rebooting, virtual machine vmdk components may be compromised, depending on the policy applied to each object. Hence it is recommended to ALWAYS have backups.
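The decision in Step4 and its note can be written down as a small rule. This sketch only picks a value; it does not change any host setting. The function name is hypothetical, the 60-minute figure is the one-hour threshold from Step4, and treating clusters with fewer than three hosts like a three-node cluster is an assumption.

```python
DEFAULT_REPAIR_DELAY_MIN = 60    # one hour, the threshold mentioned in Step4
EXTENDED_REPAIR_DELAY_MIN = 120  # two hours, the value suggested in Step4


def recommended_repair_delay(host_count: int, expected_downtime_min: int) -> int:
    """Return the clom repair delay (in minutes) to use for this maintenance."""
    if host_count <= 3:
        # Three-node cluster: no rebuild can start on another host, so leave
        # the delay alone. (Assumption: smaller clusters are treated the same.)
        return DEFAULT_REPAIR_DELAY_MIN
    if expected_downtime_min > DEFAULT_REPAIR_DELAY_MIN:
        # The reboot is expected to outlast the current delay, so extend it.
        return EXTENDED_REPAIR_DELAY_MIN
    return DEFAULT_REPAIR_DELAY_MIN


delay_three_node = recommended_repair_delay(host_count=3, expected_downtime_min=180)
delay_long_reboot = recommended_repair_delay(host_count=6, expected_downtime_min=90)
delay_short_reboot = recommended_repair_delay(host_count=6, expected_downtime_min=30)
```

If you do raise the value, remember to set it back once the maintenance is finished.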
- Step5 : Once the host is in maintenance mode, reboot it and wait for it to come back online. If there are any issues while entering maintenance mode, DO NOT force reboot the host; contact support to investigate further. Forced reboots can have adverse effects on the cluster and VMs.
Note* : There are three modes for placing a host in maintenance mode. Choose the one that fits your need; see vSAN-Host-Maintenance-Guide for a detailed explanation of all three modes.
- Step6 : After the reboot, the host should come back still in maintenance mode, and you may now take it out of maintenance mode. Watch for a possible resync with the RVC command vsan.resync_dashboard, run from the cluster directory (see how-to-login to RVC). If a resync is in progress, DO NOT attempt maintenance on any other host; wait for the resync to complete. Contact VMware technical support if you see anomalies around resync completion or any other issue.
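Waiting for the resync in Step6 is a simple polling loop. The sketch below assumes a hypothetical callable, get_bytes_to_sync, that you supply yourself (for example by reading the number reported by vsan.resync_dashboard); it does not talk to RVC on its own.

```python
import time
from typing import Callable, Optional


def wait_for_resync(get_bytes_to_sync: Callable[[], int],
                    poll_seconds: float = 60,
                    max_polls: Optional[int] = None,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until nothing is left to resync.

    Returns True when the remaining amount reaches 0, or False if max_polls
    non-zero readings were seen first (time to investigate or call support).
    """
    polls = 0
    while True:
        if get_bytes_to_sync() == 0:
            return True
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(poll_seconds)


# Usage with canned readings instead of a real cluster; sleep is stubbed out.
readings = iter([10 * 1024**3, 2 * 1024**3, 0])
finished = wait_for_resync(lambda: next(readings), sleep=lambda s: None)
```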
- Step7 : After confirming that the resync is complete (0 GB to resync), you may proceed to the next host, if you planned maintenance on other hosts. Repeat Steps 1 to 6 for each host, one host after another.
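The whole procedure is a rolling loop that stops at the first sign of trouble. Here is a minimal sketch of that control flow, assuming each action is a callable you provide (all names are hypothetical placeholders for manual steps or your own tooling).

```python
from typing import Callable, List


def rolling_reboot(hosts: List[str],
                   prechecks_pass: Callable[[], bool],
                   enter_maintenance: Callable[[str], bool],
                   reboot: Callable[[str], None],
                   exit_maintenance: Callable[[str], None],
                   resync_complete: Callable[[], bool]) -> List[str]:
    """Reboot hosts one at a time and return the hosts that were rebooted."""
    done = []
    for host in hosts:
        if not prechecks_pass():
            break  # Steps 1 to 3 found a problem: fix it first
        if not enter_maintenance(host):
            break  # never force a reboot; contact support instead
        reboot(host)
        exit_maintenance(host)
        done.append(host)
        if not resync_complete():
            break  # do not touch the next host while a resync is running
    return done


# Dry run with stand-in callables: entering maintenance fails on the second host.
rebooted = []
result = rolling_reboot(
    ["esx01", "esx02", "esx03"],
    prechecks_pass=lambda: True,
    enter_maintenance=lambda h: h != "esx02",
    reboot=rebooted.append,
    exit_maintenance=lambda h: None,
    resync_complete=lambda: True,
)
```

In the dry run the loop stops before rebooting the second host, which is the behaviour Step5 asks for when maintenance mode cannot be entered cleanly.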
For more information on steps and best practices, see the VMware KB that covers the procedure to power down and power up vSAN clusters, and the admin guides for vSAN 6.1, 6.2, 6.5 and 6.6.