How to troubleshoot issues with vSAN storage policy compliance and VASA providers?

We often wonder why we cannot re-apply a storage policy to a vSAN object or a virtual machine: the option may be grayed out, and many VMs on the vSAN datastore show up as non-compliant. This guide explains the role and importance of VASA providers in a vSAN environment, the difference between the vSAN 6.7 VP and previous versions, and troubleshooting methods to fix VASA issues.

Introduction to VASA Providers

VASA is shorthand for vSphere Storage APIs for Storage Awareness. VASA was first introduced in vSphere 5.0 and allows storage arrays to integrate with vCenter for management functionality via server-side plug-ins, or Vendor Providers (VPs). The storage provider runs either on the storage array's service processor or on a standalone host; this is at the discretion of the vendor. So what does it do? The VASA provider exposes the array's features and capabilities to the vCenter Server and lets vCenter take advantage of them throughout the life cycle of a virtual machine or a datastore. vSAN also uses a VASA provider to expose the capabilities behind SPBM (Storage Policy Based Management), such as RAID level, fault tolerance, object space reservation and striping. In vSAN versions before 6.7, the ESXi hosts (nodes) that are part of the cluster run the VASA provider and expose it to the vCenter Server over port 8080, so that vCenter can understand all the features and capabilities of vSAN.

VASA Provider with earlier versions of vSAN

In earlier versions of vSAN, including 5.5, 6.1, 6.2 and 6.6, the VASA providers were always exposed to the vCenter Server through port 8080 on each vSAN node managed by that vCenter Server. When everything is healthy, the VASA providers of all hosts are listed under vCenter⇒Configure (*Manage)⇒Storage Providers as online, with one provider per cluster in the active state and the rest reported as standby. If the providers are not listed correctly or are not online/active, we usually face some of the following issues:

  • Unable to provision new VMs on the vSAN datastore.
  • Unable to verify storage compliance for any of the VMs on the vSAN datastore.
  • Unable to apply or re-apply a storage policy on any of the VMs on the vSAN datastore.
  • All existing VMs on the vSAN datastore show up as Non-Compliant.

Now that we know the types of issues we can face with VASA providers, let us see how to troubleshoot them.

Step 1: Check that the providers from all hosts are online and listed correctly.

Log in to the vCenter Server and navigate to vCenter⇒Configure (*Manage)⇒Storage Providers. Filter the VASA provider list with the keyword vSAN and check that the providers from all hosts show up as online and that at least one host is in the active state.

Step 2: Resynchronize the VASA providers if they are not listed. If the providers from all hosts still do not show up, restart the vSAN VASA provider service on all hosts and then resynchronize once again.

  1. Check the vSAN VASA provider service status and, if it is not running, start it. First SSH to every host in the affected cluster, then check the vSAN provider status and restart it if needed.
    1. Check the VASA provider status:
      [root@hostname:~] /etc/init.d/vsanvpd status
      
      vsanvpd is running.
    2. If vsanvpd is not running, start it manually:
      [root@hostname:~] /etc/init.d/vsanvpd start
      
      vsanvpd started
    3. If the service keeps crashing, check the cause in vsanvpd.log:
      [root@hostname:~] cat /var/run/log/vsanvpd.log
  2. If the VASA providers are running fine, what next? We need to check whether the providers are reachable from the vCenter Server. To test port liveness, connect to the VASA provider on port 8080 and determine whether the VASA XML information is returned. The process differs between the vCenter Server Appliance (VCSA) and Windows vCenter.
    1. Appliance:
      Use the 'curl' utility to check the VASA provider:
      curl --insecure https://<host>:8080/version.xml
      (example: root@vcsa1 [ ~ ]# curl --insecure https://esxi-2.gsslabs.org:8080/version.xml
      <vasa-provider><supported-versions><version id="2" serviceLocation="/vasa/services/vasaService"/></supported-versions></vasa-provider>)
    2. Windows:
      Use a web browser to check the VASA provider by navigating to 'https://<host>:8080/version.xml'
    3. Check ESXi host firewalls:
      Examine the host "Security Profile" and ensure that the 'vsanvp' rule is enabled to permit host communication over port 8080.

    4. Check vCenter Server firewall

      • On Windows vCenter Server, check that the Windows Firewall is either disabled, or that all VMware-installed rules are active. In addition, check for custom rules that may be interfering with port 8080 outbound or inbound.
      • On the VCSA, the firewall should be correctly configured by default.
    5. Examine VASA certificates
      If the VASA provider is running and it is reachable by vCenter Server, the problem may be related to certificates. VASA and SPBM use certificate exchange, and the vCenter Server must accept the VASA provider certificates.
      Certificate-related problems are called out in the wrapper log of the SPBM Java process. The location varies by vCenter Server type.

      Windows vCenter Server: %ProgramData%\VMware\vCenterServer\logs\vmware-sps\wrapper.log
      VCSA: /var/log/vmware/vmware-sps/wrapper.log

      In some cases the VASA provider certificate may have a size of 0 bytes. To resolve this, the host-side (provider) VASA certificates are required. These are stored on each ESXi host in /etc/vmware/ssl/:

      [root@esxi-1:~] ls -lah /etc/vmware/ssl/ | grep -i vsanvp
      -r--r--r-T 1 root root 0 Apr 7 2017 .#vsanvp_castore.pem
      -rw-r--r-- 1 root root 3.1K May 6 06:03 vsanvp.pem
      -rw-r--r-- 1 root root 2.1K May 6 06:18 vsanvp_castore.pem
      -rw-rw-rw- 1 root root 64 May 6 12:30 vsanvp_secret
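
The host-side checks above can be collected into a few small shell helpers. This is a minimal sketch, not an official VMware tool. The function names (vsanvpd_is_running, looks_like_vasa_version_xml, list_bad_vsanvp_certs) are hypothetical. It assumes that the status command prints a line containing "is running" as in the output shown earlier, that a reachable provider answers with a <vasa-provider> document, and that the certificate files are named as in the directory listing above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ESXi or vCenter.

# Succeeds when the text printed by `/etc/init.d/vsanvpd status` reports a
# running daemon. Assumption: any other text means "not running".
vsanvpd_is_running() {
    case "$1" in
        *"is running"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when a response body looks like the VASA version document
# returned by https://<host>:8080/version.xml.
looks_like_vasa_version_xml() {
    case "$1" in
        *"<vasa-provider>"*"</vasa-provider>"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints each expected provider certificate file in directory $1 that is
# missing or 0 bytes long (file names taken from the listing above).
list_bad_vsanvp_certs() {
    for name in vsanvp.pem vsanvp_castore.pem; do
        # `-s` is true only for an existing file larger than 0 bytes.
        [ -s "$1/$name" ] || echo "$1/$name"
    done
}

# Example use (commented out; needs an ESXi host or vCenter shell):
#   vsanvpd_is_running "$(/etc/init.d/vsanvpd status)" || /etc/init.d/vsanvpd start
#   looks_like_vasa_version_xml "$(curl --insecure -s https://<host>:8080/version.xml)" || echo "provider unreachable"
#   list_bad_vsanvp_certs /etc/vmware/ssl
```

The helpers only interpret text and file sizes, so they can be tried against saved command output before being run on a production host.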
      
      

What is new with VASA in vSAN 6.7

In vSAN 6.7 the VP does not run on all ESXi hosts as it did in previous versions of vSAN.

  • The VP now runs on the vCenter Server, so the SPBM service and the VP share the same certificate.
  • The new vSAN VP can self-register with SPBM: if the vSAN VP is offline or unregistered, it is registered back to the SPBM side automatically.
  • There is better integration with the health service and the APIs.
  • SPS now has less to support, as it only needs to support the IO filters.
  • The VASA provider uses port 8080 (TCP, unicast) for internal communication.
  • VASA does not need to run on 6.5 hosts, as it runs on the 6.7 vCenter Server.

Here is a comparison between the two versions, vSAN 6.6 and vSAN 6.7. In the side-by-side screenshots, the VP in a vSAN 6.6 environment is on the left and a vSAN 6.7 environment is on the right. The 6.6 environment shows that the VP is being managed by esxi-1, whereas the 6.7 environment shows that the VP is being managed internally. When moving to 6.7, SMS removes all host-based VPs: once the upgrade is complete, the old VPs are removed automatically and only one VP appears in the list. The vSAN VP now publishes all of the policies to SPBM; older hosts cannot support the new policies.

VASA Troubleshooting through the vCenter Server CLI in vSAN 6.7

Since the VASA provider has moved to the vCenter Server, CLI troubleshooting of the VASA provider is now also done on the vCenter Server only and not on ESXi. The vsanvpd.log has moved into the health plugin log directory on the vCenter Server.

  1. Check whether the vsan-health service is running.

    root@photon-machine [ ~ ]# vmon-cli -s vsan-health
    Name: vsan-health
    Starttype: AUTOMATIC
    RunState: STARTED
    RunAsUser: vsan-health
    CurrentRunStateDuration(ms): 386462471
    HealthState: HEALTHY
    FailStop: N/A
    MainProcessId: 11668

    To restart the service:

    root@photon-machine [ ~ ]# vmon-cli -r vsan-health
    Completed Restart service request.
  2. Logs to check for issues with the vSAN VP in 6.7:

    root@photon-machine [ ~ ]# cd /var/log/vmware/vsan-health/
    root@photon-machine [ /var/log/vmware/vsan-health ]# less vsanvp.log
    root@photon-machine [ /var/log/vmware/vsan-health ]# grep failed vmware-vsan-health-service.log
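
The two checks above can be scripted roughly as follows. This is a sketch under stated assumptions rather than a supported tool: the helper names (vmon_field, vsan_health_ok, count_failed_lines) are hypothetical, and the parsing assumes the "Key: value" layout shown in the vmon-cli output above, with STARTED and HEALTHY taken as the good values because that is what the healthy sample reports.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of vCenter.

# Prints the value of one "Key: value" field ($1) from status text ($2)
# in the format printed by `vmon-cli -s <service>`.
vmon_field() {
    printf '%s\n' "$2" | sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" | head -n 1
}

# Succeeds when the status text reports RunState STARTED and
# HealthState HEALTHY.
vsan_health_ok() {
    [ "$(vmon_field RunState "$1")" = "STARTED" ] &&
        [ "$(vmon_field HealthState "$1")" = "HEALTHY" ]
}

# Prints how many lines of log file $1 contain the string "failed".
# `|| true` keeps the exit status 0 when grep finds no match.
count_failed_lines() {
    grep -c failed "$1" || true
}

# Example use on the VCSA (commented out):
#   vsan_health_ok "$(vmon-cli -s vsan-health)" || vmon-cli -r vsan-health
#   count_failed_lines /var/log/vmware/vsan-health/vmware-vsan-health-service.log
```

Restarting the service automatically, as in the commented example, is a design choice that suits a lab; in production it may be preferable to only report the state and restart by hand.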


Hareesh K G is a Site Reliability Engineer with VMware vSAN Engineering. His current focus is VMware vSAN® on-premises, and his overall expertise is with Storage Availability Business Unit products (VMware vSAN®, VMware Site Recovery Manager® and vSphere Data Protection®). He started his career in 2012 with EMC support for Clariion and VNX block storage and has been with VMware since 2015.
