I was just cleaning up one of my development labs and found that one of my VCSAs (vCenter Server Appliance), which I had configured with vSphere Syslog Collector, was no longer capturing logs for several of my ESXi hosts. After looking at some of the ESXi logs, I realized that I had rebooted the VCSA at one point, which interrupted syslog forwarding. I knew immediately that I just needed to reload the syslog configuration via ESXCLI, as noted in this VMware KB, to restore log forwarding.

After restoring my syslog configurations, I remembered a neat little trick I learned from one of the VMware TAMs: creating a vCenter Alarm that alerts you when an ESXi host is no longer able to reach a remote syslog server. This is a very handy alarm to have in your vCenter Server in case you hit a similar issue or run into connectivity problems with your syslog servers. By default, vCenter Server does not provide an alarm for syslog connectivity, but you can create one based on the eventId “esx.problem.vmsyslogd.remote.failure”, which shows up in both /var/log/hostd.log and /var/log/vobd.log.
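If you want to confirm that a host has actually logged this event before building the alarm, a quick scan of copies of those two logs for the eventId string is enough. Below is a minimal Python 3 sketch that assumes only that the eventId appears verbatim somewhere on the relevant lines; it makes no assumption about the rest of the hostd.log or vobd.log line format. Missing files are skipped rather than treated as errors.

```python
import os
import sys
from typing import Iterable, List

# eventId that ESXi logs when vmsyslogd cannot reach a remote syslog server
EVENT_ID = "esx.problem.vmsyslogd.remote.failure"


def find_syslog_failures(lines: Iterable[str]) -> List[str]:
    """Return every line that mentions the syslog remote-failure eventId."""
    return [line.rstrip("\n") for line in lines if EVENT_ID in line]


def main(paths: List[str]) -> int:
    """Print matching lines from each log file and return the total match count."""
    total = 0
    for path in paths:
        if not os.path.exists(path):
            # Skip logs that are not present on this machine
            print(f"skipping {path}: not found", file=sys.stderr)
            continue
        # errors="replace" keeps the scan going if a log contains odd bytes
        with open(path, encoding="utf-8", errors="replace") as handle:
            for hit in find_syslog_failures(handle):
                print(f"{path}: {hit}")
                total += 1
    return total


if __name__ == "__main__":
    # Default to the two logs mentioned above when no paths are given
    targets = sys.argv[1:] or ["/var/log/hostd.log", "/var/log/vobd.log"]
    print(f"{main(targets)} matching line(s)")
```

For example, running it as "python3 scan_syslog_failures.py hostd.log vobd.log" (the script name is hypothetical) against logs copied off a host prints each matching line prefixed with its file name.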

Now that we know the eventId, we just need to create a vCenter Alarm that notifies us when a host has a connectivity issue with its configured syslog server.

Step 1 – Create a new alarm. In this example I am calling it “Syslog Connection Error”. Specify the Alarm Type as “Host” and choose to monitor for a specific event.

Step 2 – Next, click on Triggers and paste in our eventId, “esx.problem.vmsyslogd.remote.failure”.

Step 3 – Lastly, you can configure an Action if you wish to send an SNMP trap, run a command or send an email notification. In this example, we are just going to generate a regular vCenter Alarm event, so simply click OK to save the alarm.

To test the alarm, I disabled the syslog-collector on the VCSA using “service syslog-collector stop”. You should then see an alarm generated for every ESXi host forwarding its logs to that particular syslog server.

So now when your ESXi hosts cannot reach their syslog server, you will automatically be notified and can look into the problem immediately. Having an alarm is great … but you might be wondering about the need to reload the syslog configuration on all your ESXi hosts to restore syslog forwarding. This can definitely be challenging and annoying, especially if connectivity to the syslog server is restored only after some time and you have hundreds of hosts.

Luckily, you no longer have to worry about this. With the ESXi 5.0 patch03 that was just released, this problem has been addressed, and the ESXi syslog daemon will automatically resume forwarding logs once connectivity to the syslog server has been restored. It is still definitely recommended that you have more than one syslog server in your environment and that they are properly monitored. Also, do not forget that with ESXi 5.0 you can configure more than one remote syslog server; for more details, take a look at this article.

Note: After applying the patch, you will no longer be able to generate an alarm based on this eventId when syslog uses UDP. Instead, you will see something like “Hostd [290D5B90 verbose 'SoapAdapter'] Responded to service state request” in hostd.log. The alarm remains valid only if you are using the TCP or SSL protocol for syslog, since those were not changed by patch03.
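Because the alarm behaves differently per protocol after patch03, it can help to check which of a host's remote syslog targets are UDP and which are TCP/SSL. The Python sketch below splits a comma-separated loghost value (for example, the Remote Host value reported by "esxcli system syslog config get") into protocol, host and port. It assumes the protocol://host:port form with UDP and port 514 as defaults when either is omitted; IPv6 literals are not handled.

```python
from typing import List, NamedTuple


class LogHost(NamedTuple):
    protocol: str
    host: str
    port: int


# Assumed defaults when a loghost entry omits them (UDP on port 514)
DEFAULT_PROTOCOL = "udp"
DEFAULT_PORT = 514


def parse_loghost(value: str) -> List[LogHost]:
    """Split a comma-separated loghost string into (protocol, host, port) entries."""
    targets = []
    for raw in value.split(","):
        entry = raw.strip()
        if not entry:
            continue  # tolerate empty values and stray commas
        protocol = DEFAULT_PROTOCOL
        if "://" in entry:
            protocol, entry = entry.split("://", 1)
            protocol = protocol.lower()
        host, sep, port_text = entry.rpartition(":")
        if not sep:
            # No ":port" suffix, so the whole entry is the host name
            host, port = entry, DEFAULT_PORT
        else:
            port = int(port_text)
        targets.append(LogHost(protocol, host, port))
    return targets


def alarm_capable(targets: List[LogHost]) -> List[LogHost]:
    """Targets whose connection failures should still raise the alarm (TCP/SSL)."""
    return [t for t in targets if t.protocol in ("tcp", "ssl")]
```

With a value such as "udp://10.0.0.5:514,tcp://syslog2:1514" (hypothetical hosts), only syslog2 would be returned by alarm_capable.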

If you are looking for a quick way to reload your syslog configurations, you can easily write a simple for loop that reloads syslog on each of your ESXi hosts using the remote ESXCLI:
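A minimal Python sketch of such a loop, assuming the vSphere CLI (vCLI) "esxcli" is on your PATH. HOSTS, USERNAME and PASSWORD are hypothetical placeholders for your own inventory. By default it only prints the commands it would run (dry run); pass dry_run=False to actually execute them. A small thread pool is used so that reloading hundreds of hosts does not take as long as a strictly serial loop.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical host list and credentials: replace with your own inventory
HOSTS = ["esxi-01.example.com", "esxi-02.example.com"]
USERNAME = "root"
PASSWORD = "changeme"


def reload_command(host: str, username: str, password: str) -> List[str]:
    """Build the remote ESXCLI command that reloads syslog on one host."""
    return [
        "esxcli",
        "--server", host,
        "--username", username,
        "--password", password,
        "system", "syslog", "reload",
    ]


def reload_all(hosts: List[str], username: str, password: str,
               dry_run: bool = True, workers: int = 8) -> Dict[str, int]:
    """Reload syslog on every host and return each host's exit code (0 = success)."""

    def run(host: str) -> int:
        if dry_run:
            # Show the command with the password masked instead of running it
            print(" ".join(reload_command(host, username, "****")))
            return 0
        try:
            result = subprocess.run(reload_command(host, username, password),
                                    capture_output=True)
        except FileNotFoundError:
            return 127  # esxcli is not installed or not on PATH
        return result.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(hosts, pool.map(run, hosts)))


if __name__ == "__main__":
    results = reload_all(HOSTS, USERNAME, PASSWORD, dry_run=True)
    failed = [host for host, code in results.items() if code != 0]
    print("failed:", failed or "none")
```

Putting a password on the command line is shown only for brevity; in practice you would likely want to avoid exposing credentials that way.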

Here is another example using PowerCLI in conjunction with ESXCLI:

2 thoughts on “Detecting ESXi Remote Syslog Connection Error Using a vCenter Alarm”

  1. Hi William,

    Your article popped up as a referrer on my website :) In the article you're talking about patch03 for ESXi 5, which resolves the issues with the syslog daemon. I was reading the changelog of patch03, and the only thing I found on the syslog daemon is:
    “PR 838922: An ESXi host might not restart UDP logging after a temporary interruption that might be caused by target server reboot or network UDP package being lost.”

    Is the issue also solved if TCP or SSL is used? That is an option after all.

    Thx, Viktor

Thanks for the comment!