Learn the essentials of vSAN setup and administration. To understand most of what goes on in vSAN, you only need a few basic checks on your environment, and those same checks will help you monitor it better. The general questions that I think come to everyone's mind are listed below. For more information on vSAN administration and best practices, also refer to the VMware vSAN Admin Guides, which are available here.
- How do I validate my design?
- How can I monitor my vSAN health, and what tools can I use to validate the health status?
- What can I check and share when contacting VMware?
- What are the best practices?
- Am I running a good, stable version of vSAN? Should I upgrade the setup, and what are the challenges in doing so?
- What precautions do I need to consider before rebooting a host participating in a vSAN cluster?
What is a valid design for a vSAN HCI
The very first link I would refer you to is storagehub-vSAN; it has most of what you will be looking for from a new-design standpoint.
If you are designing a new vSAN hyper-converged infrastructure, one option to consider when procuring hardware is VxRail from Dell EMC, which currently has a very large market presence and is known for its performance, availability, scalability, manageability and customer experience. To check out the latest features, see the SPEC-SHEET, TECH-BOOK and Planning Guide. There are also "vSAN Ready Nodes", which are validated with certain VMware partners, so deployment is hassle-free. With VxRail in particular, EMC proactively sends you the network requirements for the appliance, vCenter and ESXi hosts; if all of these are ready, deployment can take under 30 minutes. From a support standpoint, all you need to do is open a single case with EMC.
If you already have server nodes in the data center and want to convert them into a vSAN HCI, you first need to check whether the current hardware is compatible with vSAN. See the VMware-vSAN-HCL, which has three categories: I/O controller, SSD and HDD. I suggest checking the vSAN-H/W-Quick-reference guide, or manually checking for a compatible I/O controller, SSD and HDD for your requirements.
A very important consideration is that the recommended I/O controller, SSDs and HDDs run firmware and driver versions that match the vSAN HCL. We have seen complicated issues when the drivers and firmware do not match the HCL, which is the very reason there is a separate HCL guide for vSAN.
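To make the "firmware and driver must match the HCL" rule concrete, here is a minimal shell sketch that compares a version reported by a host with the version the vSAN HCL lists for the same component. The function name hcl_check and the example versions are hypothetical; you still have to read both values yourself (from the host and from the HCL page), and the sketch assumes an exact string match is what counts as "matching".

```shell
# Hypothetical helper: compare the version a host reports for a component
# with the version the vSAN HCL lists for it. Assumes an exact string match.
# Exit status: 0 = match, 1 = mismatch, 2 = version not detected (e.g. N/A).
hcl_check() {
  component=$1   # free-text label, e.g. "PERC H710 driver"
  installed=$2   # version reported by the host
  hcl=$3         # version listed on the vSAN HCL
  case "$installed" in
    ""|N/A|n/a)
      echo "NOT-DETECTED $component (HCL lists $hcl)"
      return 2
      ;;
  esac
  if [ "$installed" = "$hcl" ]; then
    echo "MATCH $component $installed"
    return 0
  fi
  echo "MISMATCH $component installed=$installed hcl=$hcl"
  return 1
}

# Example (placeholder versions):
# hcl_check "controller driver" "1.2.3" "1.2.4"
```

A NOT-DETECTED result corresponds to the "N/A" symptom covered later under "Drivers not detected properly on vSAN".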
Monitor vSAN health from the GUI
What should I monitor for vSAN health, and how? The first thing to do is open the vSAN health service by navigating to Cluster ⇒ Monitor ⇒ vSAN ⇒ Health. Depending on your environment, you may or may not see warnings across the categories of tests that the vSAN health service runs: "Hardware Compatibility, Performance Service, Network, Physical disk, Data, Cluster, Limits and Online health". Any warning under a category needs further examination, which you do by expanding that category.
Here is an example where the Hardware compatibility and Performance service categories report issues. When we expand them, we see that vSAN is not pulling the controller firmware properly. You will almost certainly also see a warning against "vSAN HCL DB up-to-date". Update the HCL DB by clicking Get latest version online if vCenter has connectivity to the internet; otherwise, upload it manually by following the article here.
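The same checks can also be reviewed from a host's shell. On vSAN 6.6 and later, the command esxcli vsan health cluster list should print the health test results; the sketch below filters such a listing down to the tests that are not green. It assumes only that each result line contains its status as a separate word (green, yellow or red). The exact column layout is an assumption, so compare it against your own output before relying on it, and treat health_issues as a hypothetical helper name.

```shell
# Minimal sketch: print only the health-check lines whose status is not green.
# Assumption: each line carries its status as a separate word
# (green / yellow / red, any case); the real column layout may differ.
health_issues() {
  awk '{
    for (i = 1; i <= NF; i++) {
      s = tolower($i)
      if (s == "yellow" || s == "red") { print; next }
    }
  }'
}

# Usage on a host (output format not guaranteed):
# esxcli vsan health cluster list | health_issues
```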
Firmware upgrades made easy with vSAN 6.6
Each host needs a VIB that polls the firmware information from the hardware and displays it in the health service. vSAN 6.6 makes this very simple; earlier, the VIB had to be pushed manually to each host from the hardware vendor's management page (HP, Dell, Lenovo, etc.). On vSAN 6.6, navigate to Cluster ⇒ Configure ⇒ vSAN ⇒ Updates. Here you get a recommendation to install the management tool for the hardware in your hosts; the tool helps pull the required firmware version online and push it to each host. Once the management tool is installed on all your hosts, you should be able to see the current firmware on each host. You may see a warning about maintenance mode vs. DRS; choose to enable DRS on the cluster, and the health plugin should automatically proceed to put hosts into maintenance mode and reboot them. However, installing the management tool does not require a reboot in every case.
Once the required management tool is installed on all hosts participating in your cluster, you are prompted with the latest recommended firmware for the I/O controller used by vSAN. Download it from the Updates menu again, then push it to the hosts. If DRS is enabled on the cluster, the vSAN health service automatically puts the hosts into maintenance mode, updates the firmware and brings them out of maintenance mode, one host at a time.
Once all hosts have been rebooted one by one, the current and recommended firmware should match under Cluster ⇒ Monitor ⇒ vSAN ⇒ Health, and all hardware compatibility warnings should be cleared.
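To see why DRS matters here, the following dry-run sketch prints the one-host-at-a-time order that the rolling update follows: enter maintenance mode, update, reboot, exit maintenance mode, then move on to the next host. The host names are hypothetical and the firmware step is only a placeholder comment, because the actual update is delivered by the vendor management tool. The esxcli maintenance-mode and reboot commands are printed for illustration; nothing is executed.

```shell
# Dry-run sketch: print the per-host sequence of a rolling update.
# Nothing is executed. Host names are hypothetical; the firmware step
# is a placeholder for whatever the vendor management tool does.
rolling_plan() {
  for host in "$@"; do
    echo "ssh root@$host esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility"
    echo "# $host: apply the firmware update (vendor tool, placeholder)"
    echo "ssh root@$host esxcli system shutdown reboot --reason firmware-update"
    echo "# $host: wait for the host to reconnect and for vSAN resync to finish"
    echo "ssh root@$host esxcli system maintenanceMode set --enable false"
  done
}

# Example:
# rolling_plan esx01.lab.local esx02.lab.local esx03.lab.local
```

Only one host is ever out of service at a time, which is what keeps vSAN objects accessible during the whole upgrade.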
Drivers not detected properly on vSAN
Often the driver recommended in the vSAN HCL DB is already installed, yet its current version is detected as N/A. In these situations, follow the steps below to fix the problem.
I have used my own environment as the example; the details may differ with other hardware and with the drivers recommended in the vSAN HCL guide.
Step 1: Check the driver currently installed for the I/O controller
[root@localhost:~] esxcfg-scsidevs -a
vmhba0 lsi_mr3 link-n/a sas.514187706c076000 (0000:02:00.0) Avago (LSI) Dell PERC PERC H710
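If you want to pick the driver name out of this output in a script, a minimal approach is to match the adapter name in the first column and print the second column, which in the sample above is the driver (lsi_mr3). The function name adapter_driver is hypothetical, and the column positions are taken from this one sample line.

```shell
# Minimal sketch: read `esxcfg-scsidevs -a` output on stdin and print the
# driver (column 2) of the adapter named in $1 (column 1).
adapter_driver() {
  awk -v hba="$1" '$1 == hba { print $2 }'
}

# Usage on a host:
# esxcfg-scsidevs -a | adapter_driver vmhba0
```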
Step 2: Check which driver is loaded to manage the I/O controller card:
esxcli system module list | less
Here you will find the driver in use on the host. In this case it is lsi_mr3, the driver that is loaded and currently handling the Dell PERC H710 I/O controller.
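To check one module without paging through the whole list, the sketch below filters the output for a single module name. It assumes the usual three-column layout of esxcli system module list (Name, Is Loaded, Is Enabled); module_state is a hypothetical helper name.

```shell
# Minimal sketch: report whether one module is loaded and enabled.
# Assumes `esxcli system module list` prints: Name, Is Loaded, Is Enabled.
# Exits non-zero if the module does not appear in the list.
module_state() {
  awk -v m="$1" '
    $1 == m { print "loaded=" $2 " enabled=" $3; found = 1 }
    END     { if (!found) { print "not-listed"; exit 1 } }'
}

# Usage on a host:
# esxcli system module list | module_state lsi_mr3
# esxcli system module list | module_state megaraid_sas
```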
Step 3: Replace the driver with the correct one
If you have already installed the latest driver listed in the vSAN HCL guide for the above I/O controller (link here; in this case it is "megaraid_sas version 6.603.55.00.1vmw"), your job is pretty easy: all you need to do is disable the driver in use and then enable the correct one.
Note: Put the host in maintenance mode before you proceed with this action. A full data migration is not required in this case; the "Ensure accessibility" mode is sufficient.
SSH to the host and run a command like the one below for the driver that needs to be disabled. In my case, this simply disables the lsi_mr3 driver; next, enable the correct driver.
To disable the wrong driver:
esxcli system module set --enabled=false --module=lsi_mr3
To enable and load the correct driver:
esxcli system module set --enabled=true --module=megaraid_sas
esxcli system module load --module=megaraid_sas
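The whole of Step 3 can be sketched as a single function, assuming it is run in an SSH session on the host. It enters maintenance mode with the vSAN "ensure accessibility" option (ensureObjectAccessibility in esxcli), then disables the old module and enables and loads the new one. swap_driver, run_cmd and the DRY_RUN switch are hypothetical names; by default the function only prints the commands so you can review them before running them for real with DRY_RUN=0.

```shell
# Run a command, or only print it when DRY_RUN=1 (the default).
run_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Sketch of Step 3. Arguments: driver to disable, driver to enable.
swap_driver() {
  old=$1
  new=$2
  # Maintenance mode without a full data migration ("ensure accessibility").
  run_cmd esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility
  # Disable the driver currently claiming the controller.
  run_cmd esxcli system module set --enabled=false --module="$old"
  # Enable and load the driver listed on the vSAN HCL.
  run_cmd esxcli system module set --enabled=true --module="$new"
  run_cmd esxcli system module load --module="$new"
}

# Example, matching this post:
# swap_driver lsi_mr3 megaraid_sas            # review the printed commands
# DRY_RUN=0 swap_driver lsi_mr3 megaraid_sas  # then run them
```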
If you find that you have a lower version of the correct driver, or you want to upgrade or re-install it before enabling it, download the latest driver and upload it to the host's /tmp directory or a datastore using WinSCP. Then use esxcli software vib to re-install the correct driver, and finally reboot the host.
Example:
Remove the existing driver:
[root@localhost:/tmp] esxcli software vib remove --vibname=scsi-megaraid-sas
Removal Result
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
Reboot Required: true
VIBs Installed:
VIBs Removed: VMware_bootbank_scsi-megaraid-sas_6.603.55.00-2vmw.600.0.0.2494585
VIBs Skipped:
Install the correct driver:
[root@localhost:/tmp] esxcli software vib install -d "/tmp/megaraid_sas-6.606.06.00-offline_bundle-2351571.zip"
Installation Result
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
Reboot Required: true
VIBs Installed: LSI_bootbank_scsi-megaraid-sas_6.606.06.00-1OEM.550.0.0.1331820
VIBs Removed:
VIBs Skipped:
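Both results above report "Reboot Required: true". If you script the remove and install steps, a small sketch like the one below can read that result text and tell you whether a reboot is still pending and which VIB was installed. The helper names are hypothetical, and they rely only on the "Reboot Required:" and "VIBs Installed:" lines shown in the output above.

```shell
# Minimal sketch: inspect the result text of `esxcli software vib install`
# or `esxcli software vib remove` (read on stdin).

# Exit status 0 if the result says "Reboot Required: true".
needs_reboot() {
  grep -Eq '^[[:space:]]*Reboot Required:[[:space:]]*true[[:space:]]*$'
}

# Print the value of the "VIBs Installed:" line (an empty value for a removal).
installed_vibs() {
  sed -n 's/^[[:space:]]*VIBs Installed:[[:space:]]*//p'
}

# Usage on a host (the host should already be in maintenance mode):
# result=$(esxcli software vib install -d "/tmp/megaraid_sas-6.606.06.00-offline_bundle-2351571.zip")
# printf '%s\n' "$result" | installed_vibs
# if printf '%s\n' "$result" | needs_reboot; then
#   esxcli system shutdown reboot --reason "megaraid_sas driver update"
# fi
```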