For the past week and a half, I have been spending quite a bit of time familiarizing myself with the recently released VMware Pivotal Container Service solution, also referred to as VMware PKS for short (yes, that is a K and not a C, which is a nod to Kubernetes, Google's container scheduler). VMware PKS is part of a project that I am currently working on, and I figured I would share the process and steps I took to deploy VMware PKS in my own personal lab, in case other folks are interested in trying out this neat and powerful solution for deploying Cloud Native Apps using Kubernetes, which was co-developed by VMware, Pivotal and Google.
If you would like to learn more about this first release of VMware PKS and the benefits it provides to both developers (consumers) and operators (admins/SRE) for Kubernetes infrastructure, check out this blog post here. Merlin Glynn, one of the Product Managers for PKS, also did an awesome light board video overview of VMware PKS if you want the SparkNotes version. If you simply want to give PKS a try without deploying anything, the CNA folks have also published a PKS HOL, which you can find here.
This will be the first in a series of articles outlining my VMware PKS deployment and configuration, which will hopefully benefit others, as it took me several attempts while learning about the solution. Although the first few articles will include manual guidance, rest assured, there will be some cool automation towards the end, but I figured that folks may want to go through this once by hand to get a good understanding of all the different components and how they interact with each other. Plus, some of the PKS-specific automation is still being worked on by the product team and hopefully I will be able to share some of that real soon.
In the last few days, I have been ramping back up on NSX-T usage, especially as I take a closer look at VMware Pivotal Container Service (PKS), which I hope to share more about in an upcoming blog series (at least, that's the plan when I find some spare time). While working on some automation using the NSX-T REST APIs and the new NSX-T PowerCLI cmdlets, I found it pretty challenging and frustrating to troubleshoot NSX-T error messages thrown from the PowerCLI cmdlets.
For example, below is a PowerCLI snippet which I wrote to create a new T1 Logical Router in NSX-T.
However, when I ran the code, PowerCLI threw back a pretty generic and useless error, which I am sure some of you have seen before: "A server error occurred: 'com.vmware.vapi.std.errors.invalid_request': . Check $Error.Exception.ServerError". Even checking the error variable as stated in the error message did not yield any further details specific to the issue I was running into.
I was recently made aware of this article in which the author, Karim Elatov, had successfully demonstrated the use of a Sonnet Thunderbolt 2 to 10 Gigabit Ethernet Adapter with ESXi running on an Apple Mac Mini. As far as I am aware, this may be the first public confirmation that such a device works with ESXi, not to mention having it functional on the Mac Mini. I know that in past years there have been unconfirmed reports on various forums mentioning a Thunderbolt to 10GbE solution that works with ESXi, but it was unclear whether custom drivers were needed or if it would even work with newer versions of ESXi.
This topic has been popular amongst our customers who virtualize Apple macOS on vSphere. In fact, several years back I had written an article on Thunderbolt Storage for ESXi, which includes a number of solutions that our customers have implemented to provide remote storage for their vSphere infrastructure running on either an Apple Xserve, Mac Pro or Mac Mini. Questions about a functional Thunderbolt to 10GbE solution have definitely come up, but I had never heard from any customer who had a success story to share, at least until now.
From Karim's post, it looks like he was able to get this working using ESXi 6.0, but it was unclear if there was anything he needed to do to get the device recognized. I reached out to Karim and he was able to confirm that the Thunderbolt device was recognized by ESXi without any additional driver installation. In fact, if you look at this console output on his blog, you will see that it simply uses the inbox Intel ixgbe driver. I also asked whether Karim had tried this with the latest version of ESXi, which is currently 6.5 Update 1. He was kind enough to perform one additional test for me and confirmed that the device still works with the latest ESXi release, which you can see for yourself in the screenshot below.
While catching up on emails after being out on paternity leave, I came across a VSAN HCL feature request that I had filed in early December of last year. I was pleasantly surprised to see that it had been marked resolved. The VSAN Hardware Compatibility List is continuously being updated to include new hardware platforms and components that have been certified by our hardware vendors.
This of course is great news for our customers, but one challenge that I have heard on several occasions is how to keep up with the changes, especially the addition of new components. The VSAN HCL allows you to subscribe and get notifications on individual components, which is useful for getting the latest recommendation. However, when new components are added or even removed, there is no easy solution and most folks resort to checking the VSAN HCL by hand.
VSAN itself does include an HCL health check, which can pull the latest VSAN HCL JSON directly from VMware's website, although this may not always be possible for customers without direct or proxy access to the internet. This means customers with dark sites must periodically download the latest VSAN HCL JSON and distribute it throughout their vCenter Servers, whereas a more ideal solution is to push the update only when a new JSON has been provided by VMware.
My feature request was quite simple: it asked for a method by which customers can easily subscribe and get notified when the VSAN HCL database has been updated. This has been delivered as a tiny RSS feed service.
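To illustrate the "only push when a new JSON is available" idea for dark sites, here is a minimal sketch in Python. It assumes that each copy of the HCL JSON carries an integer "timestamp" field recording when it was last updated; that field name, and the helper names `hcl_timestamp` and `needs_update`, are my own assumptions for illustration and should be checked against the actual file before relying on them.

```python
import json


def hcl_timestamp(hcl_json_text):
    """Return the update timestamp stored in an HCL JSON document.

    Assumption: the document has a top-level integer "timestamp" field.
    """
    return int(json.loads(hcl_json_text)["timestamp"])


def needs_update(local_json_text, downloaded_json_text):
    """True when the downloaded copy is strictly newer than the local copy."""
    return hcl_timestamp(downloaded_json_text) > hcl_timestamp(local_json_text)


# Hypothetical sample documents standing in for the real HCL JSON
local_copy = json.dumps({"timestamp": 1515000000, "data": {}})
new_copy = json.dumps({"timestamp": 1516000000, "data": {}})
print(needs_update(local_copy, new_copy))
```

With a check like this in place, the distribution step to each vCenter Server would only run when the comparison reports a newer file.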
In case you are not aware, Intel recently notified VMware that certain Intel Broadwell and Haswell CPUs are affected by Intel Sighting after applying the latest microcode update to remediate against the Spectre vulnerability. VMware has published KB 52345, which provides more details on the affected Intel CPUs along with the recommended workaround in case you have already applied the latest ESXi patches containing the faulty microcode. I highly recommend you carefully read over the KB before proceeding further, even if you have not applied the ESXi patches.
With this updated news, I have also updated my existing Spectre verification script (found here) to include the additional Intel Sighting information, which can help customers easily identify whether they have ESXi hosts that are impacted. In this article, I will provide a solution to help automate the deployment of the Intel Sighting remediation as outlined in the KB article, but unlike the manual steps outlined in the KB, SSH access to the underlying ESXi host will NOT be required.
Step 2 - Run the "Verify-ESXiMicrocodePatch" function against a specific vSphere Cluster or ESXi host to determine if you are impacted by the Intel Sighting issue. Below is a screenshot for a system which is affected by Intel Sighting and we can determine this by seeing a value of "True" under the IntelSighting column.
Step 3 (Optional) - This step is optional, but I wanted to demonstrate how you can tell whether the Intel Sighting workaround has been applied correctly. You can use the "Verify-ESXiMicrocodePatchAndVM" function, which provides information from a Virtual Machine's perspective, including whether the new CPU instructions are exposed to the VM. In the screenshot below, I have a Test VM called TestVM-03 that has been powered on and as you can see, the three new CPU instructions (IBRS, IBPB and STIBP) are present, as I have not yet applied the Intel Sighting workaround.
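The check itself boils down to asking whether three feature names appear in the list of CPU features a VM reports. The Python sketch below shows only that logic and is not the code behind "Verify-ESXiMicrocodePatchAndVM". The key names `cpuid.IBRS`, `cpuid.IBPB` and `cpuid.STIBP`, and both helper names, are assumptions made for illustration.

```python
# Assumed feature key names; verify against what your VMs actually report
SPECTRE_FEATURES = ("cpuid.IBRS", "cpuid.IBPB", "cpuid.STIBP")


def exposed_features(feature_keys):
    """Map each of the three feature names to True if the VM reports it."""
    present = set(feature_keys)
    return {name: name in present for name in SPECTRE_FEATURES}


def all_masked(feature_keys):
    """True when none of the three features is visible to the VM."""
    return not any(exposed_features(feature_keys).values())


# Hypothetical feature list for a VM that still sees the new instructions
vm_features = ["cpuid.SSE3", "cpuid.IBRS", "cpuid.IBPB", "cpuid.STIBP"]
print(exposed_features(vm_features))
print(all_masked(vm_features))
```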
Step 4 (Optional) - This step is also optional, but I wanted to demonstrate how you can quickly check the contents of /etc/vmware/config without requiring SSH access or direct ESXi Shell access. You can use the "Get-Esxconfig" function and specify an ESXi host to query. The screenshot below confirms that we have not deployed the Intel Sighting workaround.
Step 5 - Once we have identified the ESXi hosts that are impacted by Intel Sighting (see Step 2), we need to create a text file that contains the hostname/IP of the ESXi hosts (one on each line) that we wish to remediate. This will then be used as an input to the remediation function. In the example here, I have a single host to remediate and have added its name to a file that I have called "affected_hosts.txt" (it can be named anything).
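If you build the host list with a script, it helps to tidy it before handing it to the remediation function. Below is a small Python sketch that reads the "one host per line" format described above and drops blank lines and duplicates. The helper name `read_host_list` and the sample hostnames are hypothetical, and the cleanup rules are my own choice, not something the remediation function requires.

```python
def read_host_list(text):
    """Return the hosts listed one per line, skipping blanks and duplicates."""
    hosts = []
    for line in text.splitlines():
        entry = line.strip()
        if entry and entry not in hosts:
            hosts.append(entry)
    return hosts


# Hypothetical file contents with a blank line and a repeated entry
sample = "esxi-01.example.local\n\n192.168.1.50\nesxi-01.example.local\n"
print(read_host_list(sample))
```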
Note: The remediation does NOT require the ESXi host to reboot for the changes to go into effect. However, VMs will only see the masked instructions after a VM power cycle (an OS restart is not sufficient), so you may still want to follow your standard procedure of putting the host into Maintenance Mode if you wish to control when that happens.
To apply the remediation, you will use the "Set-IntelSightingsWorkaround" function, which has a single mandatory parameter called AffectedHostList. Below is a screenshot of running the function. The remediation is fairly quick, as it merely appends a single string to the /etc/vmware/config configuration file. After the remediation has completed, we can then use the "Get-Esxconfig" function to confirm that we have added the masked CPU instruction to the ESXi host, as shown in the screenshot below.
Step 6 - As the KB mentions, for the changes to go into effect, you will need to perform a complete power cycle of your VMs before they can see the new masked CPU instructions. If you recall, I had a VM called TestVM-03 which saw the three new CPU instructions. If we power cycle the VM and re-run the "Verify-ESXiMicrocodePatchAndVM" function against it, we should now see that the three CPU instructions have been properly masked away, as shown in the screenshot below.
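Since the change is just one appended line, the part worth getting right is making it safe to run twice. The Python sketch below shows one way to append a line to configuration text only when it is not already there. It works on plain strings and does not touch an ESXi host. The helper name `append_line_once` and the sample setting are hypothetical placeholders, and the exact line to add should come from the KB.

```python
def append_line_once(config_text, line):
    """Append `line` to the config text unless an identical line exists."""
    wanted = line.strip()
    if wanted in (existing.strip() for existing in config_text.splitlines()):
        return config_text  # already present, nothing to do
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"  # keep the new entry on its own line
    return config_text + wanted + "\n"


# Placeholder setting used purely for illustration
current = 'libdir = "/usr/lib/vmware"\n'
updated = append_line_once(current, 'example.mask = "placeholder"')
print(updated)
```

Running the helper a second time with the same line returns the text unchanged, so re-running a remediation against an already remediated host would not add a duplicate entry.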
When it is time to remove the workaround, the process is just as straightforward. You can use the "Remove-IntelSightingsWorkaround" function, which also accepts a list of ESXi hosts to remediate. Once the remediation has completed and you wish to expose the new CPU instructions to the VMs, you will also need to power cycle the VMs for the CPU instructions to be unmasked.
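Removing the workaround is conceptually the reverse: drop the one line that was added to the configuration and leave everything else alone. A minimal Python sketch of that idea, again operating on plain text, is shown below. The helper name `remove_line` and the sample setting are hypothetical and are not taken from the "Remove-IntelSightingsWorkaround" function.

```python
def remove_line(config_text, line):
    """Return the config text without any line equal to `line`."""
    target = line.strip()
    kept = [
        existing
        for existing in config_text.splitlines(keepends=True)
        if existing.strip() != target
    ]
    return "".join(kept)


# Placeholder setting used purely for illustration
current = 'libdir = "/usr/lib/vmware"\nexample.mask = "placeholder"\n'
print(remove_line(current, 'example.mask = "placeholder"'))
```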