Nutanix, ESXi, and iSCSI — a cautionary tale
We have been using Nutanix hyper-converged systems for just over five years and have received excellent performance from these clusters. Unfortunately, this article is a warning to my fellow users about a potential issue with ESXi and iSCSI on Nutanix.
Several years ago Nutanix AOS added a parameter to the ESXi iSCSI software adapter, likely so that ESXi could connect to an iSCSI datastore within Nutanix ABS. This parameter has been dropped in recent AOS releases, but if you have an older cluster it may still be in place.
In your vSphere client, select the ESXi host, then Configure, then Storage Adapters:
If you see the Nutanix CVM private IP addresses under the dynamic discovery targets or static discovery targets, then you are vulnerable to this potential issue.
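As a rough illustration, the check above can be written as a small Python sketch. It assumes the CVM private addresses live in 192.168.5.0/24, which is the usual Nutanix internal network but should be confirmed on your own cluster. The function names and sample entries are hypothetical, and the target list is typed in by hand from the discovery tabs rather than read from ESXi.

```python
import ipaddress

# Assumption: the CVM private network is 192.168.5.0/24; adjust if yours differs.
CVM_INTERNAL_NET = ipaddress.ip_network("192.168.5.0/24")


def split_host(target):
    """Return the host part of 'address' or 'address:port' (IPv4 only)."""
    return target.rsplit(":", 1)[0] if ":" in target else target


def cvm_internal_targets(targets):
    """Return the discovery targets whose address is inside the CVM private network."""
    flagged = []
    for target in targets:
        try:
            address = ipaddress.ip_address(split_host(target.strip()))
        except ValueError:
            continue  # hostname or malformed entry: cannot classify, so skip it
        if address in CVM_INTERNAL_NET:
            flagged.append(target)
    return flagged


if __name__ == "__main__":
    # Hypothetical entries copied by hand from the discovery tabs.
    sample = ["192.168.5.2:3260", "10.20.30.40:3260"]
    for entry in cvm_internal_targets(sample):
        print("CVM-internal target found:", entry)
```

Any entry this flags is a candidate for the cleanup described under Remediation.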
Explanation
This iSCSI target causes ESXi to interrogate the CVM for any available disks. When the Nutanix cluster has no iSCSI volumes available, this does not cause any problems. But once you configure an iSCSI volume (for any purpose), ESXi will see this disk, and then the problem starts.
In my experience, once ESXi sees any SCSI disk it presumes it can read details about that disk. But the CVM will deny access, so ESXi asks again and again, endlessly.
All of these queries against inaccessible iSCSI disks bog down the ESXi hostd service until the host eventually disconnects from vCenter. In our experience the host will eventually reconnect, but there was nothing we could do to help that particular host until it reconnected on its own.
Remediation
Here is the process we used to fix this issue:
- wait (patiently) for the host to reconnect to vCenter
- static discovery targets in the vSphere client: delete each entry that points to the CVM internal IP address (wait 15–30 seconds between deletions)
- dynamic discovery targets: delete each entry with the CVM internal IP address (there should only be one entry)
- rescan storage: the vSphere client will likely begin warning you to perform this function, but I suggest you wait until all targets have been removed
The screenshot above shows a cluster where we still need to remove these iSCSI targets. We have no iSCSI volumes configured on this cluster, so we can take our time.
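For anyone who prefers to script the cleanup, the ordering above (static targets first with a pause after each, then dynamic targets, then a single rescan) can be sketched in Python. This only builds a list of command strings and does not run anything. We did the actual work in the vSphere client, so treat the esxcli commands as assumptions to verify with `--help` on your ESXi version. The adapter name, IQN, and `build_plan` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticTarget:
    address: str  # e.g. "192.168.5.2:3260"
    iqn: str      # target name shown in the static discovery tab


def build_plan(adapter, static_targets, dynamic_addresses, pause_seconds=30):
    """Return ordered steps: static removals with pauses, dynamic removals, one rescan."""
    steps = []
    for target in static_targets:
        steps.append(
            "esxcli iscsi adapter discovery statictarget remove "
            f"--adapter={adapter} --address={target.address} --name={target.iqn}"
        )
        # Pause after each static removal (the article suggests 15 to 30 seconds).
        steps.append(f"sleep {pause_seconds}")
    for address in dynamic_addresses:
        steps.append(
            "esxcli iscsi adapter discovery sendtarget remove "
            f"--adapter={adapter} --address={address}"
        )
    # Rescan only once, after every target has been removed.
    steps.append(f"esxcli storage core adapter rescan --adapter={adapter}")
    return steps


if __name__ == "__main__":
    # Hypothetical adapter name and IQN, for illustration only.
    plan = build_plan(
        "vmhba64",
        [StaticTarget("192.168.5.2:3260", "iqn.example:placeholder-volume")],
        ["192.168.5.2:3260"],
    )
    print("\n".join(plan))
```

Printing the plan first, rather than executing it, leaves room to review each line against what the vSphere client shows before touching a host.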
Conclusion
This issue caused us considerable pain several months ago, and it took Nutanix support quite a bit of time to help us find the resolution. I hope this article might help you with your clusters!