While updating my vGhetto Automated vSphere Lab Deployment script to add support for NSX 6.3 with vSphere 6.5, I ran into an issue with the Host Preparation step. Although the resolution turned out to be quite simple, the problem was very difficult to diagnose. I suspect others could easily run into the same scenario, so I wanted to make folks aware of it. There is also another potential gotcha for host preparation that I did not encounter myself, but it was brought to my attention and I thought it was worth sharing as well.

Scenario 1 - I attempted Host Preparation and all "Install agent" tasks failed with "Cannot complete the operation. See the event log for details", as shown in the screenshot below. There was nothing useful in the event logs for either NSX or ESXi when viewed through the vSphere Web Client.

There was also nothing useful in the ESXi log /var/log/esxupdate.log that gave any insight into why the NSX VIBs failed to install:

2017-02-16T12:38:53Z esxupdate: 73899: Transaction: DEBUG: Populating VIB list from all VIBs in metadata https://vcenter65-1.primp-industries.com:443/eam/vib?id=d4917629-51d1-4da9-82d6-8da54815447d; depots:
2017-02-16T12:38:54Z esxupdate: 73899: downloader: DEBUG: Downloading https://vcenter65-1.primp-industries.com:443/eam/vib?id=d4917629-51d1-4da9-82d6-8da54815447d to /tmp/tmpdfcbr23q...
2017-02-16T12:38:54Z esxupdate: 73899: Metadata.pyc: INFO: Unrecognized file vendor-index.xml in Metadata file
2017-02-16T12:38:54Z esxupdate: 73899: imageprofile: INFO: Adding VIB VMware_locker_tools-light_6.5.0-0.0.4564106 to ImageProfile (Updated) ESXi-6.5.0-4564106-standard
2017-02-16T12:38:54Z esxupdate: 73899: imageprofile: INFO: Adding VIB VMware_bootbank_esx-vsip_6.5.0-0.0.4987428 to ImageProfile (Updated) ESXi-6.5.0-4564106-standard
2017-02-16T12:38:54Z esxupdate: 73899: imageprofile: INFO: Adding VIB VMware_bootbank_esx-vxlan_6.5.0-0.0.4987428 to ImageProfile (Updated) ESXi-6.5.0-4564106-standard
2017-02-16T12:38:54Z esxupdate: 73899: vmware.runcommand: INFO: runcommand called with: args = '['/bin/localcli', 'system', 'maintenanceMode', 'get']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2017-02-16T12:38:54Z esxupdate: 73899: HostInfo: INFO: localcli system returned status (0) Output: Disabled Error:

The root cause ended up being something very simple that I was aware of but had completely forgotten about. The NSX VIBs failed to install because my ESXi hosts were running 6.5 GA and not the recently released 6.5a, which is required for NSX 6.3. It would have been nice to get a simple error message stating that the required version of ESXi was not met; that would have quickly jogged my memory, or at least pointed me in the right direction for further troubleshooting. Once I applied the ESXi 6.5a patch and rebooted, NSX was able to successfully complete the host preparation.
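Since the error never mentions the ESXi version, a quick pre-check of the host build before running Host Preparation can save some time. Below is a minimal Python sketch, not taken from my deployment script, that pulls the build number out of an image profile name like the one in the esxupdate.log output above and compares it against a minimum build. The value 4887370 is an assumption for the ESXi 6.5a build number and the function names are made up for illustration, so please verify the required build against the NSX 6.3 release notes before relying on it.

```python
import re

# Assumption: 4887370 is the ESXi 6.5a build; confirm against the release notes.
ASSUMED_MIN_BUILD_FOR_NSX_63 = 4887370

# Matches image profile names such as "ESXi-6.5.0-4564106-standard".
PROFILE_PATTERN = re.compile(r"ESXi-(\d+\.\d+\.\d+)-(\d+)-\w+")


def parse_image_profile(text):
    """Return (version, build) from the first image profile name in text, else None."""
    match = PROFILE_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def meets_minimum_build(text, minimum_build=ASSUMED_MIN_BUILD_FOR_NSX_63):
    """Return True if the build in the image profile name is at least minimum_build."""
    parsed = parse_image_profile(text)
    if parsed is None:
        raise ValueError("no ESXi image profile name found")
    _, build = parsed
    return build >= minimum_build


if __name__ == "__main__":
    # Line fragment copied from the esxupdate.log excerpt above.
    line = ("imageprofile: INFO: Adding VIB VMware_bootbank_esx-vsip_6.5.0-0.0.4987428 "
            "to ImageProfile (Updated) ESXi-6.5.0-4564106-standard")
    print(parse_image_profile(line))
    print(meets_minimum_build(line))
```

Keep in mind that the comparison only makes sense within the same release line (6.5 here), and that the image profile name is only a rough proxy for what is actually installed on the host.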

Scenario 2 - This was not something I had encountered myself, but the behavior is similar to scenario #1 and is also quite difficult to troubleshoot. Luckily, there is a VMware KB (2053782) outlining this particular situation, but while reproducing it in the lab I found that the symptoms could easily misdirect customers into looking elsewhere when troubleshooting. In this case, Host Preparation is attempted and all tasks return successfully. The NSX VIBs actually do get installed, which you can verify on the ESXi hosts themselves; however, NSX continues to show a "Not Ready" status for all hosts, as shown in the screenshot below.

There was also nothing useful in the logs to help pinpoint the potential issue. In this particular scenario, the issue is observed when the vSphere Update Manager (VUM) service is not running on the vCenter Server, even if you are not using VUM to deploy the NSX VIBs. The ESX Agent Manager (EAM), which is responsible for deploying the NSX VIBs, apparently relies on VUM to approve the installation or uninstallation of VIBs. If VUM is not running, you will find yourself in this situation. The solution is either to get the VUM service running again or to disable the VUM check by changing a vCenter Server setting, as outlined in the KB article above. In my lab environment, I had manually disabled the VUM service to reproduce this issue, so once I re-enabled it, NSX was able to successfully complete the host preparation step.
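Because nothing in the UI tells you that VUM is down, one cheap sanity check is to see whether anything is listening on the VUM port of the vCenter Server. Here is a minimal Python sketch of a generic TCP reachability test. The function name is made up for illustration, and port 8084 is an assumption based on the eam.log error a reader reported in the comments; it is not something the KB confirms.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call (hypothetical hostname, assumed VUM port):
#   is_port_open("vcenter.example.com", 8084)
```

A successful connection only shows that something is listening on that port, not that VUM is healthy, so treat a False result as a hint to go check the VUM service rather than as a definitive diagnosis.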

I have already shared this feedback with the NSX team and they will be looking into how we can improve our error messages for the future to help customers better diagnose and troubleshoot NSX issues.

8 thoughts on “Potential ESXi Host Preparation issues with NSX 6.3”

  1. An additional comment on this.. If you don’t go back after upgrading each host to 6.5A and re-run NSX host prep on them then you get a totally orthogonal error message of

    “The vSphere Distributed Switch configuration on some hosts differed from that of the vCenter Server”.

    This means you can’t vMotion stuff off of your remaining 6.0 host or between 6.5A hosts. (like what happened to me!)

    Definitely need more complete error messaging. IMHO, NSX should push a VC event every x minutes stating that hosts are not compliant.

  2. Issue I had with 6u2 & 6.3 was eventually tracked down to it complaining that there was no room left in the bootbank – had to remove a driver to free some space before it would install. This was with HPE customised install, although it only did it on the one host I’d tried to get VIC working with too.

  3. I had a similar issue to scenario 2 on NSX 6.1 and found this error message in eam.log: “com.vmware.vim.vmomi.client.exception.ConnectionException: org.apache.http.conn.HttpHostConnectException: Connection to https://:8084 refused”
    Port 8084 seems to be the port for VUM, and bypassing VUM as you mentioned per KB 2053782 did the trick.
    If it throws the same error message for the VUM issue, maybe the eam.log error message could be added to the KB.

  4. Seriously need help here … I have been troubleshooting for the past few hours using the above two methods and even the jacobgardinermoon KB as well, and it still does not work … What else could it be? There is no way for me to proceed with the NSX deployment if I am still stuck at this host preparation tab.

  5. Having the same issue with vSphere 6.0U2 / NSX 6.3. I am trying the method of changing bypassVumEnabled to true (KB 2053782). The invoke method returns void, but on checking, the config remains set to false. Am I missing something to get the change to take?
