How to create a vCenter Alarm to alert on the ESXi 5.5u1 NFS APD issue?

As some of you may have heard, there is currently a known issue with NFS-based datastores (including VSA NFS datastores) after upgrading to vSphere 5.5 Update 1. The issue causes NFS datastores to disconnect and go into an APD (All Paths Down) state. VMware is aware of the problem, and you can follow KB 2076392 for the latest updates.

While going through my Twitter stream this morning, I noticed an interesting question from fellow blogger and friend Jase McCarty, who asked the following:

[Screenshot: Jase McCarty's question on Twitter]
I was quite surprised to hear that no vCenter Alarms were being triggered for this issue. I decided to take a look at the KB to better understand the symptoms and see if there was anything I could do to help. From what I can tell, the only way to identify this particular problem is by looking at the logs, and the KB includes an example of what you would see.

Once I took a look at the logs, I knew there were at least two methods one could use to get alerts. One option would be to leverage vCenter Log Insight and create a query based on the particular string, but not every customer is using Log Insight and it does require a bit of setup. The second, and for me more obvious, option would be to key off the VMkernel VOBs being generated, which I have written about in the past for detecting duplicate IP addresses for ESXi and the VSAN component threshold count.

Here are the steps to create the vCenter Alarm:

Step 1 – Create a new vCenter Alarm and give it a name. Select “Hosts” for “Monitor” and “Specific event occurring …” for “Monitor for”.

Step 2 – For the Trigger, add the following VOB entries (just copy/paste them in):

  • esx.problem.storage.apd.start
  • esx.problem.vmfs.nfs.server.disconnect
  • esx.problem.storage.apd.timeout

Note: Because the triggers are combined as an OR statement, the alarm will activate if ANY of these VOBs is seen. It would have been nice to be able to group them together (an AND condition) to generate the alarm.
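To make the OR behaviour in the note concrete, here is a minimal Python model of how an event-based trigger list behaves. It is not vCenter code: the functions `or_trigger_fires` and `all_seen` are hypothetical and only illustrate the difference between the OR semantics vCenter gives you and the grouped (AND-style) semantics the note wishes for. The three event IDs are the VOBs from Step 2.

```python
# Hypothetical model of alarm trigger semantics; this is NOT a vCenter API.
APD_VOBS = frozenset({
    "esx.problem.storage.apd.start",
    "esx.problem.vmfs.nfs.server.disconnect",
    "esx.problem.storage.apd.timeout",
})


def or_trigger_fires(observed_event_ids, triggers=APD_VOBS):
    """OR semantics: fire as soon as ANY trigger event has been observed."""
    return any(event_id in triggers for event_id in observed_event_ids)


def all_seen(observed_event_ids, triggers=APD_VOBS):
    """Grouped (AND-style) semantics: fire only once EVERY trigger event has been observed."""
    return triggers.issubset(observed_event_ids)


if __name__ == "__main__":
    events = ["esx.problem.vmfs.nfs.server.disconnect"]
    print(or_trigger_fires(events))  # one matching VOB is enough
    print(all_seen(events))          # apd.start and apd.timeout have not been seen yet
```

With OR semantics a single NFS server disconnect is enough to raise the alarm, which may be noisier than a grouped condition but guarantees you hear about the first symptom.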

Once the alarm has been created, you will at least have a way to get notified if you are potentially affected by this problem. I would still highly recommend you subscribe to KB 2076392 for all the latest updates.

OVF templates for creating a 3-node or 32-node Nested ESXi VSAN Cluster

Last week I had to build a couple of Nested VSAN environments for testing and of course I used my VSAN Nested ESXi OVF template to help expedite the deployment. After deploying the OVF for the third time to get my three Nested ESXi nodes, it hit me. Why am I doing this each time when I know I will need a minimum of three nodes for a proper VSAN environment? Not sure why I did not think of this earlier, but why not create a vApp that contains three Nested ESXi VM templates?

By leveraging the Dynamic Disk feature in OVF, I was able to create two tiny vApps (a 3-node vApp at 40KB and a 32-node vApp at 410KB) based on my original Nested VSAN ESXi OVF template.

The only difference with these OVF templates is that you can now easily and quickly deploy a single OVF containing either the minimum number of VSAN nodes (3) or the maximum supported (32).

Disclaimer: Nested virtualization is not officially supported by VMware; please use at your own risk.

Prerequisites:

  • vSphere Web Client
    • To deploy either the single VSAN Nested ESXi OVF template or these new ones, make sure you deploy using the vSphere Web Client. The lossless OVF import/export feature is only available in the vSphere Web Client; otherwise the import will not capture all the settings the OVF template was configured with.
  • vSphere Cluster w/DRS enabled
    • vApp creation is only possible when DRS is enabled

Step 1 – Deploy the OVF template using the vSphere Web Client and make sure you select “Accept extra configuration options”, since the template contains extra parameters needed to run ESXi and VSAN in a nested environment.

Step 2 – Go through the OVF deployment wizard as you normally would. When you get to “Customize Template”, you will notice that each Nested ESXi VM has its own category, as seen in the screenshot below. Here you can either leave the defaults for a minimal VSAN deployment (a 2GB disk for the ESXi installation, a 4GB disk for an “emulated” SSD and an 8GB magnetic disk (MD)) or specify the size of each disk.

[Screenshot: Customize Template page, with each Nested ESXi VM in its own category]
In just a couple of seconds, you will have a vApp containing a 3-node Nested ESXi environment, or you can go big and deploy a 32-node one.

Note: Larger VSAN Clusters may require other configuration changes, such as this one, and/or more VM resources.
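The default disk sizes add up quickly as the node count grows. The sketch below simply does that arithmetic for the configured (not necessarily consumed) disk capacity of the vApp, using the Step 2 defaults and the 3 to 32 node range these templates cover. `nested_vsan_disk_gb` is a hypothetical helper for illustration, not part of the OVF templates.

```python
# Default per-node disk sizes (GB) from the template's "Customize Template" page.
DEFAULT_DISKS_GB = {"esxi_install": 2, "emulated_ssd": 4, "magnetic_disk": 8}

MIN_NODES, MAX_NODES = 3, 32  # minimum VSAN node count and the supported maximum


def nested_vsan_disk_gb(nodes, disks_gb=DEFAULT_DISKS_GB):
    """Return (per-node GB, total GB) of configured disk for a nested VSAN vApp."""
    if not MIN_NODES <= nodes <= MAX_NODES:
        raise ValueError(f"node count must be between {MIN_NODES} and {MAX_NODES}")
    per_node = sum(disks_gb.values())
    return per_node, per_node * nodes


if __name__ == "__main__":
    for n in (MIN_NODES, MAX_NODES):
        per_node, total = nested_vsan_disk_gb(n)
        print(f"{n} nodes: {per_node} GB per node, {total} GB total")
```

With the defaults, each node is configured with 14GB, so the 3-node vApp comes to 42GB and the 32-node vApp to 448GB before you change any disk sizes.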

I know these OVF templates will come in handy for me when I need to quickly deploy VSAN in a Nested ESXi environment, and hopefully they will benefit others in the community as well!

How to automatically monitor VSAN Component threshold using a vCenter Alarm?

Last week, Ron Oglesby shared an interesting VMware KB article that caught my eye.

I had noticed earlier in the week that Ron was interested in finding the current VSAN Component count, which is exposed through a variety of interfaces: RVC (Ruby vSphere Console), available on both Windows and Linux, as well as the vSphere API. I even recently created some scripts here and here that use the vSphere API to remotely query the number of VSAN components for each ESXi host. I much prefer this option from a management standpoint, as you do not have to log in to each individual ESXi host.

After looking at VMware KB 2071379, I can see why Ron had asked his question, as I also felt the KB was incomplete. It may not be obvious to the unsuspecting eye, but the KB actually does contain the answer; it just does not go into details that a customer can act on. The article mentions that VSAN can trigger an alarm when the number of VSAN components on a particular host reaches 80% of the supported maximum. What the article lacks are the details of how and where this alarm is triggered. First, the alarm mentioned here is a vCenter Server alarm. Second, it is made possible through the VOB (VMkernel Observation) ID mentioned in the article. You can create vCenter alarms based on these ESXi host-generated VOBs, which I have written about in the past, such as this one on detecting duplicate IP addresses for your ESXi hosts. The process for creating this vCenter Alarm is pretty straightforward, and I agree that this alarm should have been created by default (something I will raise internally with the engineering team).
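For reference, the 80% figure works out to 2400 components, assuming the VSAN 5.5 maximum of 3000 components per host (consistent with the 2400 components used in the test later in this article). The small sketch below does that calculation with integer math so no floating-point rounding creeps in; the constant and function names are hypothetical.

```python
MAX_COMPONENTS_PER_HOST = 3000  # VSAN 5.5 per-host maximum (assumption; verify for your release)
THRESHOLD_PERCENT = 80          # threshold described in VMware KB 2071379


def threshold_component_count(max_components=MAX_COMPONENTS_PER_HOST,
                              percent=THRESHOLD_PERCENT):
    """Smallest component count that is at least `percent`% of the maximum."""
    return -(-max_components * percent // 100)  # ceiling division on integers


def at_or_above_threshold(component_count, **kwargs):
    """True once a host's component count reaches the alarm threshold."""
    return component_count >= threshold_component_count(**kwargs)


if __name__ == "__main__":
    print(threshold_component_count())  # component count at which the 80% alarm applies
    print(at_or_above_threshold(2399), at_or_above_threshold(2400))
```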

Here are the steps to create a vCenter Server Alarm to notify at the 80% VSAN Component threshold:

Step 1 – Create a new vCenter Server Alarm, give it a name, select “Monitor specific event …” for a Host and make sure it is enabled.

Step 2 – Add esx.problem.vob.vsan.lsom.componentthreshold as the Event.

Step 3 – You can either leave the actions empty, which will just generate a regular vSphere Alarm, or specify an action.

Once our vCenter Alarm has been created, we will probably want to test and verify that it works, using either a Nested ESXi VSAN environment or an actual VSAN environment. The next question is: how do we go about creating 2400 VSAN components? Instead of manually creating 2400 VMs, which would take quite a while, we can leverage a neat little utility in the ESXi Shell called /usr/lib/vmware/osfs/bin/objtool.

Disclaimer: The tools and scripts used in this article are mainly for educational and informational purposes.

The command below will create an object called object-1 with a size of 1KB, using the VSAN policy hostFailuresToTolerate=0 and forceProvisioning=1:

/usr/lib/vmware/osfs/bin/objtool create -s 1KB -a 3 -n object-1 -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1))"
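If you want to vary the object name or policy, building the command programmatically avoids hand-escaping the nested quotes. The sketch below reproduces the command above (same flags, same policy syntax); `vsan_policy_string` and `objtool_create_argv` are hypothetical helpers, and `-a 3` is copied verbatim from the original command rather than interpreted. It targets Python 3.8+ (for `shlex.join`) and only builds command text, so you can run it wherever convenient and paste the result into the ESXi Shell.

```python
import shlex

OBJTOOL = "/usr/lib/vmware/osfs/bin/objtool"


def vsan_policy_string(policy):
    """Render {"name": int, ...} into objtool's policy syntax, e.g. (("hostFailuresToTolerate" i0))."""
    terms = " ".join(f'("{name}" i{value})' for name, value in policy.items())
    return f"({terms})"


def objtool_create_argv(name, size="1KB", policy=None):
    """Argument list for creating one object, mirroring the command in the article."""
    if policy is None:
        policy = {"hostFailuresToTolerate": 0, "forceProvisioning": 1}
    return [OBJTOOL, "create", "-s", size, "-a", "3", "-n", name,
            "-p", vsan_policy_string(policy)]


if __name__ == "__main__":
    # Shell-quoted version of the argument list, ready to paste into the ESXi Shell.
    print(shlex.join(objtool_create_argv("object-1")))
```

Keeping the command as an argument list means the policy string is passed as a single argument with no shell escaping if you ever hand it to `subprocess.run`; `shlex.join` is only needed for the human-readable version.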

For this particular test, we just want to quickly create 2400 VSAN components. To do so, you will need about 32GB of memory to reach the maximum number of supported VSAN Components. This should not be a problem for a “real” VSAN environment, but for my Nested ESXi environment I had to increase my resources for this test. Since VSAN is a Distributed Object Store, the objects being created will be randomly placed within a VSAN Cluster. To quickly get to 2400 components, I also put 2 out of 3 ESXi hosts into Maintenance Mode to ensure all objects are created on the first ESXi host.

Finally, to automate the creation of these VSAN objects, I created a quick script which you can run in the ESXi Shell.

Each object created has an associated UUID, which is saved to a temporary file, /tmp/uuid, and you can use the following script to delete the objects once you have confirmed the vCenter Alarm works.
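This is not the script mentioned above; it is a hypothetical sketch of the same workflow. It generates a batch of create commands (object-1 through object-N, same flags as the single command earlier) as shell script text, and parses the UUID file back into a list for cleanup. It assumes /tmp/uuid holds one UUID per line and that each 1KB object with hostFailuresToTolerate=0 maps to roughly one component; it deliberately does not guess objtool's output format or delete syntax, so capturing and deleting UUIDs is left to you.

```python
OBJTOOL = "/usr/lib/vmware/osfs/bin/objtool"
POLICY = '(("hostFailuresToTolerate" i0) ("forceProvisioning" i1))'


def create_script(count, prefix="object-"):
    """Shell script text with one objtool create line per object (object-1 .. object-count)."""
    lines = ["#!/bin/sh"]
    for i in range(1, count + 1):
        lines.append(f"{OBJTOOL} create -s 1KB -a 3 -n {prefix}{i} -p '{POLICY}'")
    return "\n".join(lines) + "\n"


def read_uuids(text):
    """Parse UUID file contents (assumed one UUID per line), dropping blanks and duplicates in order."""
    return list(dict.fromkeys(line.strip() for line in text.splitlines() if line.strip()))


if __name__ == "__main__":
    print(create_script(3))  # use create_script(2400) for the full test
    print(read_uuids("uuid-a\n\nuuid-b\nuuid-a\n"))
```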

Once the VSAN Component count reaches 2400, you should see the alarm we created earlier trigger for reaching the 80% threshold.


Exploring VSAN APIs Part 9 – VSAN Component count

The last topic I would like to explore before concluding my VSAN API blog series is the set of advanced VSAN disk statistics available for troubleshooting or informational purposes. One statistic that is handy to know is the number of VSAN Components per ESXi host, which I have already demonstrated in my recent VSAN configuration maximum query script and VSAN PowerCLI vCheck Plugins.

These disk statistics are exposed through the VSAN InternalSystem manager, and using the QueryPhysicalVsanDisks() vSphere API method, we can retrieve either all properties or a specific set of them for each ESXi host. I have created a sample vSphere SDK for Perl script called vsanDiskStatsQueries.pl that demonstrates the use of this API.

Disclaimer: These scripts are provided for informational and educational purposes only. They should be thoroughly tested before attempting to use them in a production environment.

Here is an example of running the script against a VSAN Cluster which will produce the number of VSAN components for each ESXi host:

./vsanDiskStatsQueries.pl --server vcenter55-1.primp-industries.com --username root --cluster VSAN-Cluster

[Screenshot: script output showing the VSAN component count for each ESXi host]
If we take a look at the script, you will notice we filtered on two specific properties: lsom_objects_count and owner. One thing to note is that the output of this method is a JSON string, so you will need to parse it accordingly.

The owner property indicates the UUID of a specific ESXi host, and lsom_objects_count represents the number of VSAN components. To identify the particular ESXi host and compare it to the owner property, we use the QueryHostStatus() API, which was discussed in Exploring VSAN APIs Part 5. Once we have a match for the current ESXi host, we simply extract the lsom_objects_count property; I use a simple hash table to keep track of the results and display them at the very end of the script.
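Here is the same matching logic sketched in Python instead of Perl, working on the JSON string offline so it runs without a vCenter. It assumes the JSON returned by QueryPhysicalVsanDisks() is an object keyed by disk UUID whose values contain the owner and lsom_objects_count properties, and that you already have a mapping of node UUID to host name built from QueryHostStatus(). The sample data is made up, and summing counts across a host's disks is an assumed aggregation that may differ from what vsanDiskStatsQueries.pl does.

```python
import json
from collections import defaultdict


def components_per_host(disks_json, node_uuid_to_host):
    """Sum lsom_objects_count per ESXi host.

    disks_json: JSON string, assumed shape {disk_uuid: {"owner": node_uuid, "lsom_objects_count": n}}
    node_uuid_to_host: {node_uuid: host_name}, e.g. built from QueryHostStatus() results
    """
    counts = defaultdict(int)  # the "simple hash table" of results
    for disk in json.loads(disks_json).values():
        host = node_uuid_to_host.get(disk.get("owner"))
        if host is not None:  # skip disks owned by hosts we did not ask about
            counts[host] += int(disk.get("lsom_objects_count", 0))
    return dict(counts)


if __name__ == "__main__":
    sample = json.dumps({  # made-up sample, not real API output
        "disk-a": {"owner": "node-1", "lsom_objects_count": 10},
        "disk-b": {"owner": "node-1", "lsom_objects_count": 5},
        "disk-c": {"owner": "node-2", "lsom_objects_count": 7},
    })
    hosts = {"node-1": "esxi-1", "node-2": "esxi-2"}
    for host, count in sorted(components_per_host(sample, hosts).items()):
        print(f"{host}: {count} components")
```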

This concludes my 9-part series exploring the new VSAN APIs. Hopefully those of you who followed the series enjoyed it; I know I definitely had fun learning about the new APIs and how you can automate every aspect of VSAN from a scripting and programmatic perspective.

  1. Exploring VSAN APIs Part 1 – Enable VSAN Cluster
  2. Exploring VSAN APIs Part 2 – Query available SSDs
  3. Exploring VSAN APIs Part 3 – Enable VSAN Traffic Type
  4. Exploring VSAN APIs Part 4 – VSAN Disk Mappings
  5. Exploring VSAN APIs Part 5 – VSAN Host Status
  6. Exploring VSAN APIs Part 6 – Modifying Virtual Machine VM Storage Policy
  7. Exploring VSAN APIs Part 7 – VSAN Datastore Folder Management
  8. Exploring VSAN APIs Part 8 – Maintenance Mode
  9. Exploring VSAN APIs Part 9 – VSAN Component count

VSAN vCheck Plugins

After creating my VSAN Configuration Maximum query script, I thought it would also be useful to create an equivalent set of VSAN vCheck Plugins. For those of you who have not heard of or used vCheck (pretty rare unless you do not use PowerShell/PowerCLI in your environment), it is a PowerShell HTML reporting framework created by Alan Renouf. vCheck allows you to schedule a series of PowerCLI scripts/checks against your vSphere environment and produces a daily report on the things you care most about, such as datastore capacity falling below a certain threshold or snapshots potentially growing out of control.

Given this is the primary use case for vCheck, I figured it would make sense to implement the same set of VSAN configuration maximum checks in vCheck as well. This would also give me the opportunity to learn more about vCheck, as I had never used it before. If you are new to vCheck, I highly recommend you check out Jonathan Medd's article on how to get started with vCheck here.

Here is a sample report from a real VSAN environment to give you an idea of what the report looks like: VSAN-vCheck-Report.html

Below are the VSAN vCheck Plugins I have created, including a bonus plugin that reports on the capacity of a VSAN Datastore. You can pick and choose the VSAN plugins you want to use in your environment and then customize the threshold parameter for each report based on your requirements.

  1. 990 VSAN Capacity Report.ps1
  2. 991 VSAN Configuration Maximum Disk Group Per Host Report.ps1
  3. 992 VSAN Configuration Maximum Magnetic Disks Per Disk Group Report.ps1
  4. 993 VSAN Configuration Maximum Total Magnetic Disks In All Disk Groups Per Host Report.ps1
  5. 994 VSAN Configuration Maximum Component Per Host Report.ps1
  6. 995 VSAN Configuration Maximum Hosts Per Cluster Report.ps1
  7. 996 VSAN Configuration Maximum VMs Per Host Report.ps1
  8. 997 VSAN Configuration Maximum VMs Per Cluster Report.ps1
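The plugins themselves are PowerShell, but the idea behind each configuration-maximum check is simple enough to sketch in Python: compare an observed value with its maximum and report anything at or above a percentage threshold. The maximums below are the VSAN 5.5 values as I understand them and should be verified against VMware's Configuration Maximums documentation; the key names, `over_threshold` and the 80% default are hypothetical and are not the plugins' actual parameters.

```python
# VSAN 5.5 configuration maximums (assumed values; verify against VMware's documentation).
VSAN_MAXIMUMS = {
    "disk_groups_per_host": 5,
    "magnetic_disks_per_disk_group": 7,
    "magnetic_disks_per_host": 35,
    "components_per_host": 3000,
    "hosts_per_cluster": 32,
    "vms_per_host": 100,
    "vms_per_cluster": 3200,
}


def over_threshold(observed, threshold_percent=80, maximums=VSAN_MAXIMUMS):
    """Return [(check, value, maximum, percent_used)] for every check at or above the threshold."""
    report = []
    for check, value in observed.items():
        maximum = maximums[check]
        percent_used = 100 * value / maximum
        if percent_used >= threshold_percent:
            report.append((check, value, maximum, round(percent_used, 1)))
    return report


if __name__ == "__main__":
    observed = {"components_per_host": 2450, "vms_per_host": 40, "hosts_per_cluster": 3}
    for row in over_threshold(observed):
        print(row)
```

As with the plugins, the threshold is the knob you tune per environment; a lower value gives earlier warning at the cost of more report entries.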

For those of you who are looking to evaluate VSAN in your environment, hopefully these VSAN vCheck reports will come in handy. If there are other checks you feel might be useful, feel free to leave a comment or contribute back to the vCheck project on GitHub.