Unable to manage vSAN from the Web Client

We often see the Web Client report a 120-second timeout, after which vSAN can no longer be managed from either Cluster ⇒ Configure ⇒ vSAN or Cluster ⇒ Monitor ⇒ vSAN. Over SSH, however, the cluster is healthy with no issues. This generally indicates a problem with the vSAN health service. Sometimes the cause is the storage provider service, which runs at the vCenter level and is responsible for polling information and displaying it in the vSAN area of the Web Client.

Common issues seen when there is a problem with the vSAN health plugin:

  • Unable to create a new cluster and enable vSAN services.
  • Unable to view and manage an existing vSAN cluster from the Web Client.
  • Unable to create and delete vSAN disk groups from the Web Client.
  • The vSAN health plugin reports no data.
  • Cannot enable or disable stretched clusters under the Configure (Manage) tab on a vSAN-enabled cluster.
  • Error: Unexpected status code: 503
  • Error: Error occurred while loading general configuration of vSAN
  • Error : The query execution timed out because of back-end property provider?

How to handle and troubleshoot issues with vSAN-Health Service

The best way to isolate the problem is to check whether the behavior affects only one cluster on the vCenter Server or all clusters on it. Once we know which case applies, we know where to look: a problem seen on every cluster points to the health service on the vCenter Server, while a problem limited to one cluster points to the hosts in that cluster.

Here is an example in which we manually stopped the vSAN health service on a vCenter Server running 6.0 Update 3 with the latest patch. The vSAN health plugin shows no data, and the vSAN General, Disk Management, and Health and Performance pages do not show any information either. I also created a new cluster and then tried to enable vSAN, but the option to turn on vSAN was not available. This set of symptoms clearly points to a problem with the health service on the vCenter Server (the vmware-vsan-health service).

Note**: Not all of these problems appear with vSAN 6.6.1 (running 6.5U2). It looks like the dependency for the Configure -> vSAN tab has been removed; on this release we can still see disk groups, create new clusters, and enable vSAN.

 

Step 1: A simple fix is to check whether the vSAN health service is running on the vCenter Server. If it is stopped, start the service manually and then check the Web Client again to see whether everything is back to normal.
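
The check-and-start logic described above could be scripted on the VCSA roughly as follows. This is only a sketch: the function names are made up for illustration, and it assumes that `service-control --status` prints the "Running:" and "Stopped:" lists on standard output in the layout shown in this post.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of any VMware tool.

# Read `service-control --status` output on stdin and print where
# vmware-vsan-health is listed: "running", "stopped", or "unknown".
vsan_health_state() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /Running:$/) { section = "running" }
                else if ($i ~ /Stopped:$/) { section = "stopped" }
                else if ($i == "vmware-vsan-health") { state = section }
            }
        }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Start the service only when it is reported as stopped.
# Assumption: the status lists are written to stdout.
start_vsan_health_if_stopped() {
    state=$(service-control --status 2>/dev/null | vsan_health_state)
    echo "vmware-vsan-health is $state"
    if [ "$state" = "stopped" ]; then
        service-control --start vmware-vsan-health
    fi
}

# Usage on the appliance: start_vsan_health_if_stopped
```

Parsing word by word rather than line by line keeps the check tolerant of the wrapped and run-together lines that this command tends to print.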

Check whether the vsan-health (vmware-vsan-health) service is running on the vCenter Server.

  • VCSA Appliance
    vCenter-P:~ # service-control --status
    
    Running: 
    vmware-cis-license (VMware License Service) vmware-eam (VMware ESX Agent Manager) vmware-invsvc (VMware Inventory Service) 
    vmware-perfcharts (VMware Performance Charts) vmware-psc-client (VMware Platform Services Controller Client) 
    vmware-rhttpproxy (VMware HTTP Reverse Proxy) vmware-sca (VMware Service Control Agent) vmware-sps (VMware vSphere Profile-Driven Storage Service) 
    vmware-syslog (VMware Common Logging Service) vmware-syslog-health (VMware Syslog Health Service) vmware-vapi-endpoint (VMware vAPI Endpoint) 
    vmware-vdcs (VMware Content Library Service) vmware-vpostgres (VMware Postgres) vmware-vpx-workflow (VMware vCenter Workflow Manager) 
    vmware-vpxd (VMware vCenter Server) vmware-vsm (VMware vService Manager) vmware-vws (VMware System and Hardware Health Manager) 
    vsphere-client ()
    
    Stopped:
    vmware-mbcs (VMware Message Bus Configuration Service) vmware-netdumper (VMware vSphere ESXi Dump Collector) vmware-rbd-watchdog (VMware vSphere Auto Deploy Waiter) 
    vmware-vsan-health (VMware VSAN Health Service)
    
    Start the service by running:
    vCenter-P:~ # service-control --start vmware-vsan-health
    INFO:root:Service: vmware-vsan-health, Action: start
    Service: vmware-vsan-health, Action: start
    2018-05-09T21:48:47.624Z Running command: ['/sbin/chkconfig', u'vmware-vsan-health']
    2018-05-09T21:48:47.676Z Done running command
    2018-05-09T21:48:47.676Z Running command: ['/sbin/service', u'vmware-vsan-health', 'status']
    2018-05-09T21:48:47.685Z Done running command
    2018-05-09T21:48:47.685Z Running command: ['/sbin/chkconfig', '--force', u'vmware-vsan-health', 'on']
    2018-05-09T21:48:47.731Z Done running command
    2018-05-09T21:48:47.732Z Running command: ['/sbin/service', u'vmware-vsan-health', 'start']
    2018-05-09T21:48:47.746Z Done running command
    2018-05-09T21:48:47.746Z Successfully started service vmware-vsan-health
    vCenter-P:~ #
    
    
  • Windows vCenter server
    Check status:

    Start the service:

Step 2: If the problem affects just one cluster under the vCenter Server and the other vSAN-enabled clusters respond fine, then the problem is with one of the vSAN hosts in that cluster.

  • Check whether the vsanmgmtd and vsanvpd services are running on all the hosts.
  • If either service is not running, feel free to restart it. Restarting these services does not affect running production virtual machines, as they only provide vSAN management and the vSAN VASA provider.
  • Even if both services are running fine on all the hosts, it is still okay to restart them and check back in the GUI; all GUI tasks should then work as expected.
    [root@esxi01:~] /etc/init.d/vsanmgmtd status ; /etc/init.d/vsanvpd status
    vsanperfsvc is running
    vsanvpd is running.
    
    [root@esxi01:~] /etc/init.d/vsanmgmtd restart ; /etc/init.d/vsanvpd restart
    watchdog-vsanperfsvc: Terminating watchdog process with PID 67652
    vsanperfsvc started
    watchdog-vsanvpd: Terminating watchdog process with PID 67133
    vsanvpd stopped
    vsanvpd started
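
To run the same status check across every host in the affected cluster, a small loop over SSH can help. The sketch below is an illustration under assumptions: the function name and host names are hypothetical, SSH access as root is assumed to be enabled on the hosts, and a service is treated as running only when its status line contains "is running", as in the output above.

```shell
#!/bin/sh
# Sketch only: the function name and host names are hypothetical.

# For each host given as an argument, report whether vsanmgmtd and
# vsanvpd answer "is running" to their init script status call.
check_vsan_mgmt_services() {
    for host in "$@"; do
        for svc in vsanmgmtd vsanvpd; do
            if ssh "root@$host" "/etc/init.d/$svc status" 2>/dev/null \
                    | grep -q 'is running'; then
                echo "$host $svc running"
            else
                echo "$host $svc NOT running"
            fi
        done
    done
}

# Usage: check_vsan_mgmt_services esxi01 esxi02 esxi03
```

Any host that reports "NOT running" (or that cannot be reached, which looks the same here) is the first candidate for the restart shown above.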

Step 3: If the vSAN health service won't start.

This is a very rare situation that can be caused by missing or corrupted files. We need to investigate it by checking the logs for the vSAN health service.

Note*: If the health service crashes after starting, it is recommended to engage VMware support to diagnose and fix the issue rather than troubleshooting it on your own.

Logs to check:

VCSA (appliance) path: /var/log/vmware/vsan-health/vmware-vsan-health-service.log

Windows vCenter path: C:\ProgramData\VMware\vCenterServer\logs\vsan-health\vmware-vsan-health-service.log
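
When collecting details for a support request, it can be handy to pull only the most recent suspicious lines out of the appliance log. The following is a minimal sketch: the function name is hypothetical, and the search keywords (error, exception, traceback) are an assumption about what failures look like in this log, not a documented format.

```shell
#!/bin/sh
# Sketch only: function name and search keywords are assumptions.

# Print the last N lines (default 20) of the vSAN health service log
# that mention an error, exception, or traceback.
recent_health_errors() {
    log=${1:-/var/log/vmware/vsan-health/vmware-vsan-health-service.log}
    count=${2:-20}
    grep -iE 'error|exception|traceback' "$log" | tail -n "$count"
}

# Usage on the VCSA: recent_health_errors
# Usage with another file: recent_health_errors /path/to/copy.log 50
```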

 

Additional helpful KB : vSAN Health Service – HCL Health – Host issues retrieving hardware info


Hareesh K G is a Site Reliability Engineer with VMware vSAN Engineering. His current focus is VMware vSAN® on-premises, and his overall expertise is in Storage Availability Business Unit products (VMware vSAN®, VMware Site Recovery Manager®, and vSphere Data Protection®). He started his career in EMC support for Clariion and VNX block storage in 2012 and has been with VMware since 2015.
