The idea of "Instant Cloning" a Nested ESXi VM (running ESXi in a VM) is not a new concept. In fact, I had shared a solution back in 2015 using the private VMFork APIs. However, what has changed is the ease of consumption, primarily due to the re-architecture of Instant Clone in vSphere 6.7 (more details here and here) which resulted in a public and simplified API. Some of you might ask, why not simply clone a Nested ESXi VM or create a Linked Clone? What benefit would I get by using Instant Clone?
The answer is not only speed, but also the fact that the instantiated VM is fully operational and ready to start executing, whereas a traditional full clone or linked clone requires a full OS boot, which can take up to several minutes to deploy and configure. This may not sound like much for a small number of Nested ESXi VMs, but as you increase the number of instances, Instant Clone really shines while still maintaining speed and the instant availability of the VM. As you can imagine, this definitely opens up some interesting use cases, whether for a personal home lab or for educational purposes like VMware HOL. In addition, we also have customers who deploy Nested ESXi not only at high scale but also with a high churn rate for development purposes (think CI/CD-type workloads), and they can also benefit from Instant Clone.
So how fast are we talking about? Let's say you wanted to test out the latest version of VSAN in vSphere 6.7. You would normally deploy three Nested ESXi VMs, power them up and wait for them to be ready on the network. With Instant Clone, you can deploy three fully functional Nested ESXi VMs in just 30 seconds! As the VMs are instantly available for consumption, you can start the VSAN enablement workflow immediately, and even parts of that can be baked into the Instant Clone workflow. With the ease of provisioning Nested ESXi VMs, you can simply maintain a catalog of ESXi templates which are in "frozen" states and then leverage Instant Clone to deploy just-in-time Nested ESXi environments and discard them once you are done. Pretty slick if you ask me, and something I plan on using going forward!
Disclaimer: Nested ESXi is still not officially supported by VMware. Please use at your own risk.
Step 1 - Create the base Nested ESXi template (I have tested both 6.5u2 and 6.7) which we will use to Instant Clone from. You can either install ESXi in a VM by hand OR you can use one of my existing Nested ESXi Virtual Appliances (here and here). Make sure the VM is only configured with a single VMDK, which should just contain the ESXi installation. If you are using my appliance, make sure to delete the second and third VMDK. The reason for this is that when we Instant Clone, the disk UUIDs will be duplicated and you will have conflicts when trying to create a VMFS volume or set up VSAN. We handle this as part of the Instant Clone instantiation, where we hot-add additional VMDKs based on your use case, and this ensures each VM has unique UUIDs for each disk.
Step 2 - Download the Nested ESXi customize.sh script from my Instant Clone community repo, upload it to your base Nested ESXi template and ensure it has the execute permission (chmod +x customize.sh) before running. This script is responsible for prepping the VM prior to initiating the "freeze" operation and cleaning out any unique identities, like the host UUID and vmkernel interfaces, which is needed to ensure we do not have duplication in our Instant Clones. At this point, you can run the script as shown in the screenshot below and it will perform a series of operations and then freeze the VM.
Step 3 - Next, download the PowerCLI driver script InstantClone-ESXi.ps1, which will be used to deploy new Instant Clones from our template. It expects that you have access to my Instant Clone PowerCLI module; if not, please download that from here. The script has a number of variables. They should be pretty self-explanatory, but I will quickly go over them below:
- $SourceVM - This is the name of your base Nested ESXi template, replace it with whatever
- $numOfVMs - This is the number of Instant Clones you wish to deploy. I recommend setting this to 1 to make sure it works before creating more
- $ipNetwork - This defines the first three octets of your network (e.g. 192.168.30) if you are using static assignment, or else you can ignore
- $ipStartingCount - This defines the initial starting address (e.g. 50) and will increment by one based on the $numOfVMs variable. This is only applicable for static assignment, or else you can ignore
- $netmask - This defines the netmask for your network if you are using static assignment, or else you can ignore
- $dns - This defines the DNS server to use if you are using static assignment, or else you can ignore
- $gw - This defines the network gateway if you are using static assignment, or else you can ignore
- $networktype - This can be a value of static or dhcp, where static requires the above properties to be set. If you specify dhcp, then the VM network on which you have placed your base template must support DHCP or you will not receive IP addresses when you deploy your Instant Clones
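To make the static addressing variables a bit more concrete, here is a minimal sketch of how the per-clone addresses could be derived. This is written in Python purely for illustration (the actual driver script is PowerCLI), and the function name plan_static_ips is hypothetical. It also assumes the first clone receives exactly the $ipStartingCount address, so check the script for the real behavior.

```python
import ipaddress


def plan_static_ips(ip_network, ip_starting_count, num_of_vms):
    """Return one IPv4 address per clone: <ip_network>.<start>, <start+1>, ...

    Hypothetical helper mirroring $ipNetwork, $ipStartingCount and $numOfVMs.
    """
    addresses = []
    for offset in range(num_of_vms):
        last_octet = ip_starting_count + offset
        # Stay within the usable host range of a /24-style network
        if not 1 <= last_octet <= 254:
            raise ValueError(f"last octet {last_octet} is out of range")
        address = f"{ip_network}.{last_octet}"
        ipaddress.IPv4Address(address)  # raises if the result is not valid IPv4
        addresses.append(address)
    return addresses


print(plan_static_ips("192.168.30", 50, 3))
# ['192.168.30.50', '192.168.30.51', '192.168.30.52']
```

Running a quick check like this before deploying is an easy way to confirm the range does not run past the end of your subnet or collide with addresses you already use.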
In addition to generating the appropriate guestinfo properties which will be fed to each of the Instant Clones for customization, we also need to generate a random UUID which will be used to configure each Instant Clone and ensure that they have unique identities, which is especially important if you plan to enable VSAN. I will not bore you with the details (you can refer to the code for the specifics), but if this is not performed, you will definitely run into a number of issues. This actually took me a bit of trial and error to figure out, so I have saved you a lot of the pain 🙂
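As a rough illustration of the idea (not the actual PowerCLI code), the sketch below builds a set of per-clone properties that includes a freshly generated random UUID. It is written in Python, and the guestinfo.example.* key names and the build_guestinfo function are hypothetical placeholders. Refer to the script for the real property names.

```python
import uuid


def build_guestinfo(network_type, ip_address=None, netmask=None, gateway=None, dns=None):
    """Build a dict of per-clone properties, including a random UUID.

    All key names here are hypothetical placeholders, not the script's real ones.
    """
    props = {
        "guestinfo.example.networktype": network_type,
        # uuid4() is random, so every clone gets its own identity
        "guestinfo.example.uuid": str(uuid.uuid4()),
    }
    if network_type == "static":
        # Static assignment needs the full set of network settings
        props.update({
            "guestinfo.example.ipaddress": ip_address,
            "guestinfo.example.netmask": netmask,
            "guestinfo.example.gateway": gateway,
            "guestinfo.example.dns": dns,
        })
    return props


# Example values below are made up for illustration
for ip in ["192.168.30.50", "192.168.30.51"]:
    clone_props = build_guestinfo("static", ip, "255.255.255.0", "192.168.30.1", "192.168.30.1")
    print(clone_props["guestinfo.example.ipaddress"], clone_props["guestinfo.example.uuid"])
```

The important part is that the UUID is generated once per clone rather than once per run, so no two Instant Clones ever share an identity.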
Note: The PowerCLI script is just an example of what can be done, you can easily modify this to perform other tasks as part of the deployment workflow
The script currently assumes you will use these Instant Clones for VSAN, so it also hot-adds two VMDKs (4 and 8 GB respectively). If you do not want this to happen or if you want to change the size, go ahead and update the PowerCLI script. Once you have saved all your changes, we are now ready to run the script. If everything was configured correctly, your new Instant Clone Nested ESXi VMs should be up and running immediately after the script has completed. In the example below, I created three Nested ESXi Instant Clones.
If you open the VM Console for one of our Instant Clones, you can get more details on what has occurred from within the ESXi VM, such as updating the UUID and recreating the management VMkernel interface, among other things. If you run into customization issues, they will show up here and are also stored in the logs under /ic-customization
I was also monitoring esxtop to see how much memory I was saving. The top image is the physical ESXi host prior to deploying the 64 Instant Clones (4GB) and the bottom is after deploying the 64 VMs. We can see we are saving a whopping 339GB of shared memory, which is pretty insane given this was deployed to a single SuperMicro E200-8D with just 128GB of physical memory!