Automating Intel Sighting remediation using PowerCLI (SSH not required)

In case you may not be aware, Intel recently notified VMware that certain Intel Broadwell and Haswell CPUs are affected by Intel Sighting after applying the latest microcode update to remediate against the Spectre vulnerability. VMware has published the following KB 52345 which provides more details on the affected Intel CPUs along with the recommended workaround in case you have already applied the latest ESXi patches containing the faulty microcode. I highly recommend you carefully read over the KB before, even if you have not applied the ESXi patches proceeding further.

With this updated news, I have also updated my existing Spectre verification script (found here) to include the additional Intel Sighting information which can help customers easily identify whether they have ESXi hosts that are impacted. In this article, I will provide a solution to help automate the deployment of the Intel Sighting remediation as outlined in the KB article, but unlike the manual steps outlined in the KB, SSH access to the underlying ESXi host will NOT required.

Step 1 - Download the VerifyESXiMicrocode.ps1 PowerCLI script as well as the new PowerCLI script IntelSightingWorkaround.ps1

Step 2 - Run the "Verify-ESXiMicrocodePatch" function against a specific vSphere Cluster or ESXi host to determine if you are impacted by the Intel Sighting issue. Below is a screenshot for a system which is affected by Intel Sighting and we can determine this by seeing a value of "True" under the IntelSighting column.

Step 3 (Optional) - This step is optional, but I wanted to demonstrate how you can tell whether the Intel Sighting workaround has been applied correctly. You can use the "Verify-ESXiMicrocodePatchAndVM" function which provides information from a Virtual Machine's perspective and whether the new CPU instructions are exposed to the VM. In the screenshot below, I have a Test VM called TestVM-03 that has been powered on and as you can see, the three new CPU instructions (IBR,IBPB and STIB) are present as I have not applied the Intel Sighting workaround.

Step 4 (Optional) - This step is also optional, but I wanted to demonstrate how you can quickly check the contents of /etc/vmware/config without requiring SSH access or direct ESXi Shell access. You can use the "Get-Esxconfig" function and specify an ESXi host to query. The screenshot below confirms that we have not deployed the Intel Sighting workaround.

Step 5 - Once we have identified the ESXi hosts that are impacted by Intel Sightin (See step 2), we need to create a text file that contains the Hostname/IP of ESXi hosts (one on each line) that we wish to remediate. This will then be used as an input to the remediation function. In the example here, I have a single host to remediate and have added its name to a file that I have called "affected_hosts".txt (it can be named anything).

Note: The remediation does NOT require the ESXi host to reboot for the changes to go into effect, but you may still want to consider following standard procedures of putting the host into Maintenance Mode if you wish to control when VMs will see the masked instruction which will require a VM power cycle (OS restart is not sufficient).

To apply the remediation, you will use the "Set-IntelSightingsWorkaround" function which has a single mandatory parameter called AffectedHostList. Below is a screenshot of running the function and the remediation is fairly quick as it is merely appending a single string to the /etc/vmware/config configuration file. After the remediation has completed, we can then use the "Get-Esxconfig" function to confirm that we have added the masked CPU instruction to the ESXi host as shown in screenshot below.

Step 6 - As the KB mentions, for the changes to go into affect, you will need to perform a complete power cycle of your VMs before they can see the new masked CPU instructions. If you recall earlier, I had a VM called TestVM-03 which saw the three new CPU instructions. If we power cycle the VM and re-run the "Verify-ESXiMicrocodePatchAndVM" function against our VM, we should now see that the three CPU instructions have been properly masked away as shown in the screenshot below.

When it is time to remove the workaround, the process is also very straight forward as well. You can use the "Remove-IntelSightingsWorkaround" function which also accepts a list of ESXi hosts to remediate. Once the remediation has completed and you wish to expose the new CPU instructions to the VMs, you will also need to power cycle the VMs for CPU instructions to be unmasked.

Verify Hypervisor-Assisted Guest Mitigation (Spectre) patches using PowerCLI

VMware recently published a new knowledge base (KB) article 52085 that outlines instructions for enabling the Hypervisor-Assisted Guest Mitigation (CVE-2017-5715), also known as the Spectre vulnerability. This KB also provides steps to verify the updated microcode (included in the ESXi patch) has been applied along with Virtual Machine verification (those applicable) to ensure that they are seeing the new CPU features. While following the KB and patching one of my vSphere environments, I had noticed the verification steps was not only manual but it also to difficult to scale beyond a few VMs as it required customers to look for a specific set of strings from within the vmware.log file which is generated for each powered on VM, which could easily be several hundreds or thousands of VMs.

I figured there had to be a better way to help customers automate this at scale and remove the human factor. The other reason I was not fond of the current method is that it would require customers to either enable ESXi Shell/SSH access or to manually or through automation to download every single vmware.log file to inspect for specific log entries which can take quite a bit of time for any sizable environment. I had an idea on how this could be done without having to look at the vmware.log file while leveraging our vSphere APIs and did some investigation. Before proceeding further, please familiarize yourself with KB 52085. For complete background on both Spectre (CVE-2017-5753 & CVE-2017-5715) and Meltdown (CVE-2017-5754) as it relates to all VMware products, please carefully read through this top level KB 52245 which is being updated as new information is available, so you may want to subscribe to the KB for all the latest updates.

UPDATE 4 (01/23/18) - VMware has just updated KB 52345 with the current list of Intel CPUs affected by Intel Sightings. I have also updated my script to reflect all these changes. Make sure to download the latest version to ensure you have the latest changes.

UPDATE 3 (01/16/18) - I have just enhanced the script further to also collect the current microcode version running on a given ESXi host. Unfortunately, this information can only be collected when SSH is enabled and is something a user must explicitly allow. The benefit here is that Intel Sighting impact reporting is more robust as the code now checks for both impacted CPU signature as well as the microcode affected by Intel Sighting as outline in KB 52345. See below for more details.

UPDATE 2 (01/14/18) - As mentioned in the last update, I have been working on a Intel Sighting remediation script which can help customers automate the temporary workaround as recommended in KB 52345. Please see this blog post for complete details.

UPDATE 1 (01/13/18) - VMware just published a new KB 52345 outlining certain Intel Broadwell and Haswell CPUs being affected by Intel Sightings after applying latest microcode updates. Please refer to the KB for the complete details. I am currently working on a script to help with the remediation as the remediation method outlined in the KB would require customers to enable SSH access and manually update /etc/vmware/config. In the meantime, I wanted to provide a way for customers to easily identify whether their system are affected by Intel Sightings and thanks to community member Adam Robinson who just submitted a patch this morning to update my existing script to include these details. I have also added the CPU model to the output as additional information.

Note: This script only provides information about your existing vSphere environment, it does not make any changes or provides any remediation steps, please follow the KB for those instructions.

The PowerCLI script is called VerifyESXiMicrocodePatch.ps1 and it contains the following two functions:

  • Verify-ESXiMicrocodePatchAndVM
  • Verify-ESXiMicrocodePatch

Continue reading

Identifying ESXi boot method & boot device

There was an interesting discussion on our internal Socialcast platform last week on figuring out how an ESXi host is booted up whether it is from local device like a disk or USB device, Auto Deploy or even boot from SAN along with its respective boot device? Although I had answered the question, I was not confident that we actually had a reliable and programmatic method for identifying all the different ESXi boot methods, which of course piqued my interest.

With a bit of trial and error in the lab, I believe I have found a method in which we can identify the ESXi boot type (Local, Stateless, Stateless Caching, Stateful or Boot from SAN) along with some additional details pertaining to the boot device. To demonstrate this, I have created the following PowerCLI script ESXiBootDevice.ps1 which contains a function called Get-ESXiBootDevice.

The function can be called without any parameters, in which it will query all ESXi hosts for a given vCenter Server and/or standalone ESXi host. You can also specify a specific ESXi host by simply passing in the -VMHostname option.

Here is an example output for one of my lab environments which shows several ESXi hosts and their different boot methods from local disk to Auto Deploy which can include stateless, stateless caching and stateful deployments. Depending on the BootType, the boot device shown in the Device column will either be the MAC Address of the NIC used to network boot the ESXi host or the identifier of a disk device. I have also included some additional details such as vendor/model along with the media type (SAS, SSD or USB) which is available as part of ESXCLI.

This script also supports ESXi environments that boot from SAN (FC, FCoE or iSCSI) and you can easily identify that with the word "remote" for the BootType. I would like to give a huge thanks to David Stamen who helped me out with the boot from SAN testing.

Cross vCenter Clone with vSphere 6.0+

Similar to the Cross vCenter vMotion (xVC-vMotion) feature that was introduced in vSphere 6.0, Virtual Machine cloning can also be performed across two different vCenter Servers, regardless of their deployment type or SSO configurations. If vCenter Servers are part of an Enhanced Linked Mode (ELM), then a xVC-Clone operation can be initiated using the vSphere Web Client. However, if the vCenter Servers are not part of the same SSO Domain, then you will need to use the vSphere API to perform the operation similar to an xVC-vMotion.

Right before the holiday break, I had a few inquiries on providing some sample code that could demonstrate the Cross vCenter Clone (xVC-Clone) operation using the vSphere API. With that, I have created a new PowerCLI script called xNew-VM.ps1 which does exactly that and should feel familiar for anyone who has used my xMove-VM.ps1 script before. Before getting started, be sure to familiarize yourself with the requirements and supported configurations for xVC-Clone operations, which can be found here and here.

The script contains a number of variables that will need to be defined based on your specific environment. The following variables below pertain to the "source" vCenter Server from where you wish to initiate the clone operation from:

$sourcevmname = "PhotonOS-02"
$sourceVC = ""
$sourceVCUsername = "administrator@vsphere.local"
$sourceVCPassword = "VMware1!"

The following variables below pertain to the "destination" vCenter Server from where the cloned VM will be deployed to:

$destvmname= "PhotonOS-02-Clone"
$destVC = ""
$destVCUsername = "administrator@vsphere.local"
$destVCpassword = "VMware1!"
$datastorename = "vsanDatastore"
$datacenter = "Datacenter-SiteB"
$cluster = "Santa-Barbara"
$resourcepool = "MyRP" # cluster property not needed if you are using RP, useful for VMware Cloud on AWS
$vmhostname = ""
$vmnetworkname = "VM Network"
$foldername = "Discovered virtual machine"
$switchtype = "vss"
$poweron = $false #optional
$snapshotname = "pristine" #optional

The last two variables are optional which merely controls whether the cloned VM is powered on and whether the source VM is being cloned while running, in which case, a snapshot is required.

Below is a screenshot of my two vCenter Servers, where vcenter65-1 is my "Source" and vcenter65-3 is my "Destination. I have a VM called PhotonOS-02 which is powered on and I have already taken a snapshot which I have named "pristine". I will then perform a xVC-Clone of the VM to the "Destination" vCenter Server and the new VM will be called "PhotonOS-02-Clone".

Once you have updated the script, you can simply execute it as shown in the screenshot below.

If we now head back over to our vSphere Web/H5 Client, we should now see the xVC-Clone operation start and shortly after (depending on the size of the VM being cloned), it should eventually appear on the destination vCenter Server that was specified in the script.

For quick and small deployments of a few VMs, the xVC-Clone feature can be quite handy. However, if you plan to deploy more than a couple of VMs, it would be more efficient to setup a vSphere Content Library which can configured to automatically sync all or specific content to the destination vCenter Server(s), which will help speed up deployment. I should also point out that cloning from vSphere 6.5+ from onPrem to VMware Cloud on AWS (VMC) also works and is fully supported in both direction which the vMotion/Provisioning PM had just shared with me. For VMC usage, you will need to ensure you specify the Compute-ResourcePool as the permission model in VMC is restrictive and you can not deploy directly to the vSphere Cluster and/or ESXi host.

Lastly, as mentioned earlier, the script I have provided is merely an example demonstrating the use of the xVC-Clone operation using the vSphere API and consuming it through PowerCLI. Unlike the Move-VM cmdlet which supports xVC operation natively, the New-VM does not currently support xVC-Clone. If you would like to see this natively supported as part of PowerCLI's New-VM cmdlet, be sure to add a +1 to PowerCLI Feature Request here.

Quick Tip - NSX-T 2.1 upgrade gotcha, use NSX Manager IP Address vs Hostname

NSX-T 2.1 was released right before the holiday break and one of the biggest enhancement is support for VMware's upcoming Pivotal Container Service (PKS) which you can read more about here. Right before I took some much needed time off with the family, I had attempted an upgrade of my existing NSX-T 2.0 environment and ran into an issue which turned out to be a known issue but apparently it is no where listed on the NSX-T 2.1 release notes.

I figure I would share the simple workaround as it drove me a bit insane trying to figure out what was going on. After successfully uploading the NSX-T 2.1 upgrade bundle and then starting the upgrade, which starts off by updating the Upgrade Coordinator component and shortly after, I am shown the following error message:

This page is only available on the NSX Manager where Upgrade Coordinator is running. To enable the service, run the command "set service install-upgrade enabled" on the NSX Manager. If the install-upgrade service is already enabled, try disabling it using "clear service install-upgrade enabled" and then enable it again.

Although the error seems to suggest the Upgrade Coordinator service may not be running (it was), even after performing the recommended operations, the error message continue to persists as seen in the screenshot below.

After reaching out to Engineering as I was out of ideas on how to further troubleshoot the issue, it turns out this was actually a known upgrade issue going from 2.0 to 2.1. Instead of connecting to the NSX-T Manager using its Hostname (FQDN), I needed to use the IP Address instead for the upgrade. Once I logged back in using the IP Address, I was no longer seeing the error message and could proceed with my NSX-T upgrade as shown in the screenshot below.