OK, the wait is finally over! In this final article, we will walk through the process of getting access to this project as well as how to deploy it in your own environment. For those who just want to see the code, you can find it in the GitHub project below:

GitHub Project: https://github.com/lamw/usb-to-sddc

Below are the details outlining the environment and software requirements, as well as the instructions to consume this in your own home lab environment. The content below is a subset of what is published on the GitHub project, but it should get you going. For more details, please refer to the GitHub project, and if you have any issues or questions, feel free to file a GitHub issue.

Environment Requirements:

  • USB key that is at least 6GB in capacity
  • Access to either a macOS or Linux system, as the script that creates the USB key is only supported on these two platforms
  • No additional USB keys must be plugged into the hardware system other than the primary installer USB key
  • Hardware system must have at least 2 disk drives which can either be 1xHDD and 1xSSD for running Hybrid vSAN OR 2xSSD for running All-Flash vSAN
  • Both the Intel NUC 6th Gen and the Supermicro E200-8D and E300-8D have been tested with this solution. It should work with other hardware systems that meet the minimum requirements, but YMMV

Software Requirements:

Note: Other ESXi / VCSA 6.5.x versions can also be substituted; this includes the latest ESXi 6.5d (vSAN 6.6) release, which I have also verified myself.

Usage:

Step 1 - Clone the GitHub repository to your local system by running the following command:

git clone https://github.com/lamw/usb-to-sddc.git

If you do not have git installed on your computer, or if you prefer to download the scripts manually, you can do so via the following zip archive: https://github.com/lamw/usb-to-sddc/archive/master.zip

Step 2 - Change into the usb-to-sddc directory, or extract the contents if you downloaded the zip file from GitHub instead.

Step 3 - Download all the files listed in the Software Requirements above to your local desktop.

Step 4 - Open KS.CFG in your favorite text editor, such as vi or Visual Studio Code. Search for the tag # ---> START EDIT HERE <--- # in the file, which should be located on Line 10.
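If you want to confirm where the editable section begins without opening an editor, a few lines of Python will do. This is only an illustrative helper, not part of the project's scripts, and it assumes KS.CFG is in your current directory:

```python
def find_marker(path, marker="# ---> START EDIT HERE <--- #"):
    """Return the 1-based line number where the edit marker appears, or None."""
    with open(path) as f:
        for num, line in enumerate(f, start=1):
            if marker in line:
                return num
    return None
```

Running find_marker("KS.CFG") against an unmodified clone should return 10, matching the line number mentioned above.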

There are 25 variables as shown below which can be adjusted to customize your deployment:

VSAN_DISK_TYPE="AF"
PHOTON_IP="192.168.1.10"
PHOTON_CIDR="24"
PHOTON_GATEWAY="192.168.1.1"
PHOTON_DNS="192.168.1.1"
ESXI_IP="192.168.1.100"
ESXI_PASSWORD="VMware1!"
ESXI_NETMASK="255.255.255.0"
ESXI_GATEWAY="192.168.1.1"
ESXI_HOSTNAME="nuc.primp-industries.com"
ESXI_DNS="192.168.1.1"
VCSA_IP="192.168.1.200"
VCSA_HOSTNAME="192.168.1.200"
VCSA_PREFIX="24"
VCSA_GATEWAY="192.168.1.1"
VCSA_DNS="192.168.1.1"
VCSA_SSO_DOMAIN_NAME="vsphere.local"
VCSA_SSO_SITE_NAME="virtuallyGhetto"
VCSA_ROOT_PASSWORD="VMware1!"
VCSA_SSO_PASSWORD="VMware1!"
VCSA_SSH_ENABLED="true"
VCSA_CEIP_ENABLED="true"
VCSA_DATACENTER_NAME="VSAN-Datacenter"
VCSA_CLUSTER_NAME="VSAN-Cluster"
VCSA_WEBCLIENT_THEME_NAME="CormacHogan"

The variables should be pretty self-explanatory, but here are a few that need some additional explanation:

  • VSAN_DISK_TYPE - Defines whether you have a Hybrid or All-Flash vSAN setup based on your physical disks. The valid values are HYBRID or AF.
  • PHOTON_IP - This is the IP Address of the DeployVM. If you are deploying on an isolated network (e.g. using a cross-over cable between your laptop and server), make sure the DeployVM and the ESXi host are on the same network.
  • PHOTON_CIDR - This is the CIDR prefix length for the DeployVM's network (e.g. 24 = /24 = 255.255.255.0)
  • PHOTON_GATEWAY and PHOTON_DNS - These are the Gateway and DNS Server for the DeployVM.
  • VCSA_IP and VCSA_HOSTNAME - If you do not have valid DNS in your environment in which both forward and reverse resolution are functional, then make sure both of these variables are set to the exact same IP Address, or your VCSA deployment will fail as it tries to resolve the hostname (FQDN) against the DNS server you provided.
  • VCSA_WEBCLIENT_THEME_NAME - Defines the theme that will automatically be applied if you wish to customize the vSphere Web Client as described here. You can find the complete list of theme names here.
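Note that PHOTON_CIDR and VCSA_PREFIX take a prefix length, while ESXI_NETMASK takes a dotted netmask. If you want to double-check that the two notations describe the same network, here is a quick sketch using Python's standard library (my own helper, not part of the project scripts):

```python
import ipaddress

def prefix_to_netmask(prefix):
    """Convert a CIDR prefix length (e.g. PHOTON_CIDR="24") to a dotted netmask."""
    return str(ipaddress.ip_network("0.0.0.0/{}".format(int(prefix))).netmask)

# /24 corresponds to the ESXI_NETMASK default of 255.255.255.0
print(prefix_to_netmask(24))
```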

Step 5 - Next, edit either create_sddc_deployment_on_usb_for_osx.sh or create_sddc_deployment_on_usb_for_linux.sh, depending on the platform you will use to create the USB installer. You will need to edit the following variables at the top of the script to point it at the files you downloaded in Step 3.

  • UNETBOOTIN_APP_PATH - Complete path to the UNetbootin application directory, only applicable to macOS users
  • ESXI_ISO_PATH - Complete path to the ESXi ISO
  • VCSA_ISO_PATH - Complete path to the VCSA ISO
  • ESXI_KICKSTART_PATH - Complete path to the KS.CFG
  • DEPLOYVM_ZIP_PATH - Complete path to the DeployVM zip

Step 6 - The next step is to use create_sddc_deployment_on_usb_for_X.sh to create the USB installer. Plug a USB key into your system. Please be aware that all contents on the USB key will be wiped after you confirm which USB key is to be used to create the automated installer.

First, you need to identify the USB device that was plugged into your system, to make sure you select the right one.

On macOS - You can run diskutil list and identify the device which should look like /dev/diskX.

On Linux - You can run parted -l and identify the device which should look like /dev/sdX.

Next, to run the script, you will need to use sudo and pass in the device you retrieved from the previous commands.
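Since the script wipes whatever device it is handed, it is worth double-checking that the argument is a whole-device path and not a partition. Here is a small illustrative check (my own sketch, not something the project's scripts perform):

```python
import re

def looks_like_whole_device(dev):
    """Accept whole-device paths like /dev/diskX (macOS) or /dev/sdX (Linux),
    rejecting partition paths such as /dev/disk4s1 or /dev/sdb1."""
    return bool(re.fullmatch(r"/dev/disk\d+|/dev/sd[a-z]", dev))

print(looks_like_whole_device("/dev/disk4"))  # True
print(looks_like_whole_device("/dev/sdb1"))   # False (a partition, not the device)
```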

Here is an example of running on a macOS system:

sudo ./create_sddc_deployment_on_usb_for_osx.sh /dev/disk4


Here is an example of running on a Linux system:

sudo ./create_sddc_deployment_on_usb_for_linux.sh /dev/sdb


The script will automatically clear any existing partitions and create the expected partition scheme. It will copy over all the software packages you downloaded in Step 3, and once it has completed, it will unmount the USB device.

Step 7 - The final step is to take the USB key, plug it into your system, and simply power it on. If you want to verify that things are working, you can connect an external monitor and watch the installation, but I will warn you, it is pretty boring 🙂 If things are going well, you should see the ESXi installer stay on "reading installation file" for quite a while, as this is where the majority of the time is spent: during the %pre section, it forms the vSAN datastore and copies all the files from the PAYLOAD partition over to vSAN.

Once ESXi has been successfully installed, which you can verify by observing that it is on the main boot screen with an IP Address, you can open a browser to the ESXi Embedded Host Client (e.g. https://[IP]/ui) and log in. Depending on when you do this, you may only see the DeployVM and/or the VCSA being deployed. If you want to follow the remaining progress of the deployment, you can log in to the DeployVM using the IP Address you assigned it; the credentials are root/VMware1! by default.

Once logged into the DeployVM, you can tail /root/script.log, which will show you the progress of the VCSA deployment and configuration.

Expected Timings

Here is what you can expect from a timing standpoint, from creating the USB installer to plugging it into your system and provisioning the SDDC. From testing in my personal lab, the USB installer took 11 minutes to create, and the USB to SDDC deployment took 45 minutes, measured from the time I plugged the key into the NUC and powered it on to the point where I could log in to the vSphere Web Client of the vCenter Server. Obviously, YMMV depending on your hardware configuration.

Process                   Estimated Time
Create USB Installer key  10-15 minutes
USB to SDDC deployment    45-60 minutes

25 thoughts on “Project USB to SDDC - Part 3”

  1. “Access to either macOS or Linux system as the script that creates the USB key is only supported on these two platforms”

    Surely you can’t be serious

    • Ted,

      I would love to have multi-platform support and provide choice; however, there's a limitation with Windows, if you're not familiar with it, that prevents you from creating multiple partitions on a USB device (which is needed for this solution). There are workarounds, but they require 3rd-party tools and most don't provide automation interfaces, so I wasn't left with much choice. Now, you can still use the solution, but instead of having a script that automates the creation of the USB key, you will have to do that portion manually.

  2. Hi William,

    first of all thank you for the great post. Please keep up the good work.
    I just saw that the links around the "VCSA_WEBCLIENT_THEME_NAME" description are not working in the post. On GitHub they are fine. 😉

    Greetings from Cork,
    Benedikt

  3. Hi William,
    First of all thank you very much.

    I had to comment lines:
    #ls ${VCSA_ISO_DIRECTORY}/VCSA-part-* > /dev/null 2>&1
    #if [ $? -eq 1 ]; then

    #else
    # echo "VCSA ISO has already been splitted, skipping step ..."
    #fi

    from create_sddc_deployment_on_usb_for_osx.sh
    because of the error:
    cp: cannot stat '/VCSA-part-*': No such file or directory

    Also, Linux USB creation was OK (some errors during compilation though; I think I need to force gcc-4.8), but the install failed in %pre at line 209. I could not investigate more for lack of time (sorry about that).

  4. Don't you need the DNS server to deploy and run the vCSA? Or is it unnecessary if VCSA_HOSTNAME is set as an IP address?

    • Ideally yes, but for a lab env, DNS may or may not be available. If it's not available, then make sure IP and HOSTNAME are both set to the IP Address, or else you will get a failed VCSA deployment

      • Thanks for the reply. I'll give it a try and modify the DeployVM to also act as a DNS server, since I may want to change the vCSA IP after deployment.

  5. Hi William,

    i got your script running on an intel nuc7i3, using a prep'd iso. I used PowerCLI:
    Add-EsxSoftwareDepot https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml
    New-EsxImageProfile -CloneProfile "ESXi-6.5.0-20170104001-standard" -Name "ESXi-6.5.0-20170104001-standard-7nuc" -Vendor vGhetto
    Remove-EsxSoftwarePackage -ImageProfile "ESXi-6.5.0-20170104001-standard-7nuc" -SoftwarePackage "net-e1000e"
    Remove-EsxSoftwarePackage -ImageProfile "ESXi-6.5.0-20170104001-standard-7nuc" -SoftwarePackage "ne1000"
    Add-EsxSoftwarePackage -ImageProfile "ESXi-6.5.0-20170104001-standard-7nuc" -SoftwarePackage "net-e1000e 3.2.2.1-2vmw.600.3.57.5050593"
    Export-EsxImageProfile -ImageProfile "ESXi-6.5.0-20170104001-standard-7nuc" -ExportToIso -FilePath C:\Temp\esxi650nuc7.iso

    My setup is a nuc7i3 with 32 GB RAM, a 1050 GB SSD and a 128 GB NVMe. The kickstart script seems to have some issues when the SSD and NVMe capacities have similar digits but the SSD is one order of magnitude larger: after installation i got 128 GB on my vsanDatastore, as with this issue it used the ssd as cache and the nvme as storage... after the 3rd run i just put in the correct names for my configuration. Is there any possibility of seeing the logs after installation? I haven't found anything... my vdq -q output:
    [[email protected]:~] vdq -q
    [
    {
    "Name" : "naa.2020030102060804",
    "VSANUUID" : "",
    "State" : "Ineligible for use by VSAN",
    "Reason" : "Has partitions",
    "IsSSD" : "0",
    "IsCapacityFlash": "0",
    "IsPDL" : "0",
    },

    {
    "Name" : "t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03",
    "VSANUUID" : "52aee84a-85f1-55ef-42c7-fa97824005c1",
    "State" : "In-use for VSAN",
    "Reason" : "None",
    "IsSSD" : "1",
    "IsCapacityFlash": "1",
    "IsPDL" : "0",
    },

    {
    "Name" : "t10.NVMe____Force_MP500_____________________________170379320001225301C900000001",
    "VSANUUID" : "52210736-bd44-17aa-0d8b-8f783d31c162",
    "State" : "In-use for VSAN",
    "Reason" : "None",
    "IsSSD" : "1",
    "IsCapacityFlash": "0",
    "IsPDL" : "0",
    },

    running your code directly on my esx gives me for the nvme:
    localcli storage core device capacity list -d t10.NVMe____Force_MP500_____________________________170379320001225301C900000001 | tail -1 | awk '{print $5}'

    114473

    and for the storage ssd:
    localcli storage core device capacity list -d t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03 | tail -1 | awk '{print $5}'

    1001562

    And after i found out that changing the passwords to something with a "$" was not my best idea (they get cropped on transfer to the deployvm), my nuc7i3 is running esxi6.5 and vcsa 6.5 on vsan ;o)

    HTH
    Patric

    • Hi Patric,

      Thanks for giving this a try. Strange to see the larger SSD get used as cache ... I see in your vdq output that the correct device is tagged as capacity; was this something you did, or the script? The logic for the capacity comparison can be seen in https://github.com/lamw/usb-to-sddc/blob/master/KS.CFG#L110-L124

      In terms of logs, yes you can look at /var/log/esxi_install.log which will log all my syslog entries. You can also add additional entries for additional debugging purposes

      • Hi William,

        my vdq -q output is my actual output, after i altered the script in lines 136 and 142 with the actual names of my SSDs, so there is no mixup ;o) I also checked /var/log/esxi_install.log, but it only contains a whole copy of the script, not the actual logged lines.

        I will reinstall using my esxi6.5 (i also want to add the startech usb driver to the installation, so i can use the usb ethernet nic from the beginning). I will then post my vdq -q and the relevant parts of esxi_install.log.

        btw: thanks for your very good work, its just great 🙂

        • Hi again,

          as the log is a bit long, i pasted it here:
          https://pastebin.com/yGpwq8yt

          this gets logged a couple of times, always with the same input. If i try to grep through the log using some specific wording of yours, like "largest" (seen in line 126), i only get this as a result; hard to get anything out with that. There are also no variables filled in, so i can't see the number mixup, or even which disk is the largest.

          here is my vdq -q without changing the KS.CFG:
          [[email protected]:~] vdq -q
          [
          {
          "Name" : "naa.2020030102060804",
          "VSANUUID" : "",
          "State" : "Ineligible for use by VSAN",
          "Reason" : "Has partitions",
          "IsSSD" : "0",
          "IsCapacityFlash": "0",
          "IsPDL" : "0",
          },

          {
          "Name" : "t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03",
          "VSANUUID" : "5281956e-126d-93b6-b341-f33a48cec4c4",
          "State" : "In-use for VSAN",
          "Reason" : "None",
          "IsSSD" : "1",
          "IsCapacityFlash": "0",
          "IsPDL" : "0",
          },

          {
          "Name" : "t10.NVMe____Force_MP500_____________________________170379320001225301C900000001",
          "VSANUUID" : "523b5481-a0ac-f97e-4d8c-3cd2f5477b52",
          "State" : "In-use for VSAN",
          "Reason" : "None",
          "IsSSD" : "1",
          "IsCapacityFlash": "1",
          "IsPDL" : "0",
          },

          ]

          as you can see, the smaller NVMe flash is used as capacityFlash.

          Thanks

          Patric

    • As in can’t wait to see NSX, vRealize Automation, Network Insight and Log Insight into the mix!

  6. Patric,

    The log you've got doesn't actually show me what the script has done. This is merely the parsing of the script 🙂 What I'm looking for is the actual output messages from these commands, and the easiest method is to just do grep VSAN-KS /var/log/esxi_install.log since all my messages are prefixed with "VSAN-KS"; that should give us more info.

    You can also enable debugging within the Python script: just change L70 https://github.com/lamw/usb-to-sddc/blob/master/KS.CFG#L70 from False to True, and that should give us more info as well

    • Hi William,

      it does not matter what keyword i use to grep through esxi_install.log; there is nothing else logged than the parsed script. It is logged a couple of times (i think 3-4 times). I also read the logfile to check if i could see anything, but there is nothing of interest in it 🙁

      I saw L70 a couple of days ago and tried to use it, as i thought i could figure out my issue alone. But if i activate debug by changing L70 to True, all i get is an error while reading the installation script (just after the installer boot is done) at line 259 (which is the end of the python code). As i only have some basic python knowledge, i am not able to figure out what is wrong when debug = True.

      I just retried it with a fresh git clone and everything, same experience.

  7. absolutely, i am using

    esxcli vsan storage diskgroup unmount --ssd $SSDNAME

    and then delete the file systems using the webclient. i also used a live linux to remove all partitioning data in my tries.

    • It's actually much easier than that; just run the following two commands and you can reboot afterwards:

      esxcli vsan cluster leave
      esxcli vsan storage remove -s [INSERT-SSD-CACHE-ID]

      I'm still unable to determine why you're having issues or why you're not seeing the expected logs ... which I think may have to do with the system not being in a clean state. As mentioned, you can see the logic of the python script: it just iterates through the devices seen in vdq and then compares their sizes. You can try adding "dryrun" to the top of the kickstart and manually run the section in python that does the size comparison to see if you can identify where/how it's failing

      • OK, here's a python script (https://pastebin.com/wmk5jGU2) that you can upload to the ESXi host and run, which will walk through the exact same logic. It only does print statements.

        Just create a normal bootable ESXi image on a USB key, and when it starts the interactive installer, enable SSH (/etc/init.d/SSH start), upload this script, then run it and see what it returns

        Here’s sample output of what it should look like as it iterates through the disks:

        [[email protected]:~] python /tmp/simulate.py
        Found Disk: naa.6000c2995124c1351611d7f32912cbe2
        Running disk capacity command: localcli storage core device capacity list -d naa.6000c2995124c1351611d7f32912cbe2 | tail -1 | awk '{print $5}'
        Largest Capacity Disk so far: naa.6000c2995124c1351611d7f32912cbe2 (8192)
        Found Disk: naa.6000c29ede38305c2003b1df8b8cbeb8
        Running disk capacity command: localcli storage core device capacity list -d naa.6000c29ede38305c2003b1df8b8cbeb8 | tail -1 | awk '{print $5}'
        Largest Capacity Disk so far: naa.6000c29ede38305c2003b1df8b8cbeb8 (4096)
        Running disk capacity tagging command: localcli vsan storage tag add -d naa.6000c2995124c1351611d7f32912cbe2 -t capacityFlash
        Running disk group create command: localcli vsan storage add -s naa.6000c29ede38305c2003b1df8b8cbeb8 -d naa.6000c2995124c1351611d7f32912cbe2

  8. Actually, could you provide the output to the following:

    localcli storage core device capacity list -d t10.NVMe____Force_MP500_____________________________170379320001225301C900000001

    localcli storage core device capacity list -d t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03

    • Hi,

      i already posted the requested numbers in my first post; i guessed you would be interested in them 😉
      here is the output of the python script:

      [[email protected]:/tmp] python script.py
      Found Disk: t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03
      Running disk capacity command: localcli storage core device capacity list -d t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03 | tail -1 | awk '{print $5}'
      Largest Capacity Disk so far: t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03 (1001562)
      Found Disk: t10.NVMe____Force_MP500_____________________________170379320001225301C900000001
      Running disk capacity command: localcli storage core device capacity list -d t10.NVMe____Force_MP500_____________________________170379320001225301C900000001 | tail -1 | awk '{print $5}'
      Largest Capacity Disk so far: t10.NVMe____Force_MP500_____________________________170379320001225301C900000001 (114473)
      Running disk capacity tagging command: localcli vsan storage tag add -d t10.NVMe____Force_MP500_____________________________170379320001225301C900000001 -t capacityFlash
      Running disk group create command: localcli vsan storage add -s t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03 -d t10.NVMe____Force_MP500_____________________________170379320001225301C900000001

      here are the requested capacities:

      Device                                                                             Physical Blocksize  Logical Blocksize  Logical Block Count  Size         Format Type
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
      t10.NVMe____Force_MP500_____________________________170379320001225301C900000001   4096                512                234441648            114473 MiB   512e

      Device                                                                             Physical Blocksize  Logical Blocksize  Logical Block Count  Size         Format Type
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
      t10.ATA_____Crucial_CT1050MX300SSD1_________________________1710161C4F03           512                 512                2051200368           1001562 MiB  512n

  9. Hey William,

    just tinkered again (and learned a lot of python basics along the way, which is great, too). The main problem is that diskSize is a string, and using ">" as the operator on strings is tricky, so i added some conversions: basically i convert the string into an integer, and for my logging back to a string, so python won't bug around with printing an integer. My guess is that, as a string, 1001562 is bytewise compared smaller than 114473. I have already forked the project with my changes.

    HTH

    Patric
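Patric's diagnosis is easy to reproduce: Python compares strings lexicographically, so the capacity values returned by the awk pipeline sort by character, not by magnitude. A minimal demonstration using the two capacities from his output:

```python
# Capacities in MiB, exactly as the localcli | tail | awk pipeline returns them (strings)
ssd_capacity = "1001562"   # 1 TB Crucial SSD
nvme_capacity = "114473"   # 128 GB Force MP500 NVMe

# Lexicographic comparison: '0' < '1' at the second character, so the 1 TB SSD
# compares smaller than the 128 GB NVMe, which then gets tagged as capacityFlash.
print(ssd_capacity > nvme_capacity)            # False
print(int(ssd_capacity) > int(nvme_capacity))  # True once converted to int
```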
