Introduction
This document describes the steps you can use to troubleshoot HyperFlex datastore mount issues.
Prerequisites
Requirements
There are no specific requirements for this document.
Components Used
This document is not restricted to specific software and hardware versions.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
By default, HyperFlex datastores are mounted with NFS v3.
NFS (Network File System) is a file-sharing protocol used by the hypervisor to communicate with a NAS (Network Attached Storage) server over a standard TCP/IP network.
Here is a description of NFS components used in a vSphere environment:
- NFS server – a storage device or a server that uses the NFS protocol to make files available over the network. In the HyperFlex world, each controller VM runs an NFS server instance. The NFS server IP for the datastores is the eth1:0 interface IP.
- NFS datastore – a shared partition on the NFS server that can be used to hold virtual machine files.
- NFS client – ESXi includes a built-in NFS client used to access NFS devices.
In addition to the regular NFS components, there is a VIB installed on the ESXi host called the IOVisor. This VIB provides a network file system (NFS) mount point so that the ESXi hypervisor can access the virtual disk drives that are attached to individual virtual machines. From the hypervisor's perspective, it is simply attached to a network file system.
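If you want to look at the IOVisor VIB details on a host, a quick check like this can be run from the ESXi shell (the VIB name scvmclient matches the listing shown later in this document; the status check itself is covered in the IOVisor section):
[root@node1:~] esxcli software vib get -n scvmclient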
Problem
The symptoms of mount issues can show up on the ESXi host as datastores that appear inaccessible.
Datastores Inaccessible in vCenter
Note: When your datastores show up as inaccessible in vCenter, they are seen as mounted unavailable in the ESXi CLI. This means the datastores were previously mounted on the host.
Check the Datastores via CLI:
- SSH to the ESXi host, and enter the command:
[root@node1:~] esxcfg-nas -l
test1 is 10.197.252.106:test1 from 3203172317343203629-5043383143428344954 mounted unavailable
test2 is 10.197.252.106:test2 from 3203172317343203629-5043383143428344954 mounted unavailable
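As an additional cross-check, the NFS mount state can also be confirmed with the esxcli storage namespace; the Accessible and Mounted columns reflect the same condition (column names can vary slightly between ESXi releases):
[root@node1:~] esxcli storage nfs list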
Datastores Not Available At All In vCenter/CLI
Note: When your datastores are not present in vCenter or the CLI, this indicates that the datastore was never successfully mounted on the host.
- Check the Datastores via CLI
SSH to the ESXi host and enter the command:
[root@node1:~] esxcfg-nas -l
[root@node1:~]
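To confirm that the datastore does exist at the cluster level even though this host never mounted it, you can list the datastores from any SCVM. This is a minimal check that assumes the stcli command set present on HyperFlex controller VMs:
root@SpringpathControllerGDAKPUCJLE:~# stcli datastore list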
Solution
The reasons for the mount issue can vary. Go through the list of checks below to validate and correct any that apply.
Network Reachability Check
The first thing to check in case of any datastore issues is whether the host is able to reach the NFS server IP.
The NFS server IP in the case of Hyperflex is the IP assigned to the virtual interface eth1:0, which is present on one of the SCVMs.
If the ESXi hosts are unable to ping the NFS server IP it causes the datastores to become inaccessible.
Find the eth1:0 IP with the ifconfig command on all SCVMs.
Note: eth1:0 is a virtual interface and is present on only one of the SCVMs.
root@SpringpathControllerGDAKPUCJLE:~# ifconfig eth1:0
eth1:0 Link encap:Ethernet HWaddr 00:50:56:8b:62:d5
inet addr:10.197.252.106 Bcast:10.197.252.127 Mask:255.255.255.224
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
SSH to the ESXi host with datastore mount issues and check whether it is able to reach the NFS server IP.
[root@node1:~] ping 10.197.252.106
PING 10.197.252.106 (10.197.252.106): 56 data bytes
64 bytes from 10.197.252.106: icmp_seq=0 ttl=64 time=0.312 ms
64 bytes from 10.197.252.106: icmp_seq=1 ttl=64 time=0.166 ms
If you are able to ping, proceed with the steps to troubleshoot in the next section.
If you are not able to ping, you have to check your environment to fix the reachability. There are a few pointers that can be looked at:
- hx-storage-data vSwitch Settings:
Note: By default, all the config is done by the installer during the cluster deployment. If it has been changed manually after that, please verify the settings
MTU Settings - If you have enabled jumbo MTU during cluster deployment, the MTU on the vSwitch must also be 9000. If you do not use jumbo MTU, this must be 1500. (A quick end-to-end MTU test is shown after these checks.)
Teaming and Failover - By default, the system tries to ensure that the storage data traffic is switched locally by the FI. Hence the active & standby adapters across all hosts must be the same.
Port Group Vlan settings - The storage-data VLAN must be specified on both Storage Controller Data Network & Storage Hypervisor Data Network port groups.
No overrides on Port Group level- The Teaming & Failover settings done on the vSwitch level get applied to the port groups by default, hence it is recommended to not override the settings on the port-group level.
- UCS vNIC Settings:
Note: By default, all the config is done by the installer during the cluster deployment. If it has been changed manually after that, please verify the settings
MTU Settings - Ensure the MTU size and QoS policy are configured correctly in the storage-data vNIC template. The storage-data vNICs use the Platinum QoS policy and the MTU must be configured as per your environment.
VLAN Settings - The hx-storage-data VLAN created during the cluster deployment must be allowed in the vNIC template. Ensure it is not marked as native.
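As a quick way to validate the MTU end-to-end for the checks above, you can send a ping that is not allowed to fragment from the ESXi host to the NFS server IP. This is a minimal sketch that assumes the storage data vmkernel interface is vmk1 and that jumbo frames (MTU 9000) are in use; confirm the correct vmk number with esxcfg-vmknic -l, and use a payload of 1472 instead of 8972 if your environment runs with the standard MTU of 1500.
[root@node1:~] esxcfg-vmknic -l
[root@node1:~] vmkping -I vmk1 -d -s 8972 10.197.252.106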
IOVisor / SCVMclient / NFS Proxy Status Check
The SCVMclient VIB on the ESXi host acts as the NFS proxy. It intercepts the virtual machine IO, sends it to the respective SCVM, and serves it back with the needed information.
Ensure that the VIB is installed on your hosts. SSH to one of the ESXi hosts and run the command:
[root@node1:~] esxcli software vib list | grep -i spring
scvmclient 3.5.2b-31674 Springpath VMwareAccepted 2019-04-17
stHypervisorSvc 3.5.2b-31674 Springpath VMwareAccepted 2019-05-20
vmware-esx-STFSNasPlugin 1.0.1-21 Springpath VMwareAccepted 2018-11-23
Check the status of scvmclient on the ESXi host now and ensure it is running. If it is stopped, start it with the command /etc/init.d/scvmclient start (a short recovery sequence is shown after the status output below).
[root@node1:~] /etc/init.d/scvmclient status
+ LOGFILE=/var/run/springpath/scvmclient_status
+ mkdir -p /var/run/springpath
+ trap mv /var/run/springpath/scvmclient_status /var/run/springpath/scvmclient_status.old && cat /var/run/springpath/scvmclient_status.old |logger -s EXIT
+ exec
+ exec
Scvmclient is running
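If the status shows the service as stopped, a minimal recovery sequence looks like this: start the service with the command mentioned above, confirm the status, and list the datastores again to see whether they return to a mounted state:
[root@node1:~] /etc/init.d/scvmclient start
[root@node1:~] /etc/init.d/scvmclient status
[root@node1:~] esxcfg-nas -l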
Cluster UUID Resolvable To The ESXi Loopback IP
HyperFlex maps the UUID of the cluster to the loopback interface of the ESXi host, so that ESXi passes the NFS requests to its own scvmclient. If this mapping is not there, you can face issues with the datastore mount on the host. In order to verify this, SSH to a host that has the datastores mounted and to the host with issues, and cat the file /etc/hosts.
If the nonfunctional host does not have the entry in /etc/hosts, you can copy it from a functional host into the /etc/hosts of the nonfunctional host.
Nonfunctional Host
[root@node1:~] cat /etc/hosts
# Do not remove these lines, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
10.197.252.75 node1
Functional Host
[root@node2:~] cat /etc/hosts
# Do not remove these lines, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
10.197.252.76 node2
127.0.0.1 3203172317343203629-5043383143428344954.springpath 3203172317343203629-5043383143428344954
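A minimal way to copy the entry is to append the cluster UUID line from the functional host to /etc/hosts on the nonfunctional host and then confirm the file contents; the UUID shown here is the one from the example above, so substitute the value from your own cluster:
[root@node1:~] echo "127.0.0.1 3203172317343203629-5043383143428344954.springpath 3203172317343203629-5043383143428344954" >> /etc/hosts
[root@node1:~] cat /etc/hosts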
Stale Datastore Entries In /etc/vmware/esx.conf
If the HX cluster has been recreated without a reinstall of ESXi, you might have old datastore entries in the esx.conf file.
This does not allow you to mount the new datastores with the same name. You can check all the HX datastores in esx.conf with this command:
[root@node1:~] cat /etc/vmware/esx.conf | grep -I nas
/nas/RepSec/share = "10.197.252.106:RepSec"
/nas/RepSec/enabled = "true"
/nas/RepSec/host = "5983172317343203629-5043383143428344954"
/nas/RepSec/readOnly = "false"
/nas/DS/share = "10.197.252.106:DS"
/nas/DS/enabled = "true"
/nas/DS/host = "3203172317343203629-5043383143428344954"
/nas/DS/readOnly = "false"
In the output, you can see that the old datastores are mapped to the old cluster UUID, so ESXi does not allow you to mount a datastore with the same name under the new UUID.
In order to resolve this, remove the old datastore entry with the command esxcfg-nas -d RepSec, as shown below.
Once removed, retry the mount of the datastore from HX Connect.
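A minimal cleanup sequence, using the stale RepSec entry from the example above, looks like this; confirm the entry is gone before you retry the mount from HX Connect:
[root@node1:~] esxcfg-nas -d RepSec
[root@node1:~] cat /etc/vmware/esx.conf | grep -I nas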
Check Firewall Rules in ESXi
Check the firewall enabled setting.
If it is set to false, it causes problems.
[root@node1:~] esxcli network firewall get
Default Action: DROP
Enabled: false
Loaded: true
Enable it with the commands:
[root@node1:~] esxcli network firewall set -e true
[root@node1:~] esxcli network firewall get
Default Action: DROP
Enabled: true
Loaded: true
Check the connection rule setting.
If it is set to false, it causes problems.
[root@node1:~] esxcli network firewall ruleset list | grep -i scvm
ScvmClientConnectionRule false
Enable it with the commands:
[root@node1:~] esxcli network firewall ruleset set -e true -r ScvmClientConnectionRule
[root@node1:~] esxcli network firewall ruleset list | grep -i scvm
ScvmClientConnectionRule true
Check iptables Rules On The SCVMs
Check and match the number of rules on all the SCVMs. If they do not match, open a TAC case to get it corrected.
root@SpringpathControllerI51U7U6QZX:~# iptables -L | wc -l
48
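To compare the counts quickly, you can run the same command against every controller VM from a single SCVM over SSH. This is a sketch with hypothetical controller management IPs; replace them with the IPs of your own SCVMs (you are prompted for a password on each connection unless SSH keys are in place):
root@SpringpathControllerI51U7U6QZX:~# for ip in 10.197.252.81 10.197.252.82 10.197.252.83; do echo -n "$ip: "; ssh root@$ip 'iptables -L | wc -l'; done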