VMware ESXi RAM Disk Full due to Adaptec’s ARCCONF Bug


Description

If you use ARCCONF to monitor your Adaptec RAID controller, you may hit a bug where the Adaptec CIM provider does not fully clean up its temporary files and fills up the root RAM disk of the ESXi server.

Affected System

This is a confirmed system configuration. One may experience this bug with other versions of the software or hardware. The primary suspect for the bug is the CIM provider.

Symptoms

You are using ARCCONF to monitor an Adaptec controller on your ESXi server and you start to receive one or more of the following errors in ESXi:

  • The VMRC console has disconnected…attempting to reconnect
  • unable to connect to the MKS: a general error occurred: internal error
  • ESXi logs contain RAM disk is full errors.
  • The vdf -h command over SSH shows the root RAM disk as 99%-100% used:
    Ramdisk                   Size      Used Available Use% Mounted on
    root                       32M       32M        0M 100% --
    etc                        28M      280K       27M   0% --
    tmp                       192M      112K      191M   0% --
    hostdstats                249M        4M      244M   2% --
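
To check for the last symptom from a script, the Use% column of the root RAM disk can be extracted from the vdf -h output. This is a minimal sketch, assuming the column layout shown above and that awk is available in the ESXi shell; the function name root_ramdisk_use is hypothetical and not part of ESXi.

```shell
# Hypothetical helper: reads `vdf -h` output on stdin and prints the
# Use% value (without the % sign) of the ramdisk named "root".
# Assumes the column layout shown above: name, size, used, available, use%.
root_ramdisk_use() {
    awk '$1 == "root" && $5 ~ /%$/ { sub(/%/, "", $5); print $5 }'
}

# On the ESXi host this would be used as:
#   vdf -h | root_ramdisk_use
```

A monitoring check could then compare the printed number against a threshold of its own choosing.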

Cause

When ARCCONF GETCONFIG is queried, a log file, /var/log/arcconf.log, is created on the ESXi server. This log file is only ever appended to and is never cleaned up by the driver.

The default size of the root RAM disk is 32 MB. How quickly it fills up depends on the monitoring interval and the actual configuration of the controller. In our previous configuration, it took 60 days to fill up the disk. As our monitoring became more complex and ran at shorter intervals, it took 7 days. Keep in mind that the log is deleted when the server restarts, so, depending on circumstances, you may never notice the bug.
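
The fill time can be estimated by dividing the free space by the daily log growth. The sketch below is only an illustration: the helper name days_until_full is hypothetical, and the growth figure of 4681 KB per day is derived from the 7-day case above under the assumption that the whole 32 MB was available to the log.

```shell
# Hypothetical helper: estimate days until the root RAM disk fills up.
# $1 = free space on the root RAM disk in KB
# $2 = observed log growth in KB per day
# Uses integer division, so the result is rounded down.
days_until_full() {
    free_kb=$1
    growth_kb_per_day=$2
    if [ "$growth_kb_per_day" -le 0 ]; then
        echo "growth must be positive" >&2
        return 1
    fi
    echo $(( free_kb / growth_kb_per_day ))
}

# Example: 32 MB free, log growing about 4681 KB per day (assumed figure).
days_until_full $((32 * 1024)) 4681   # prints 7
```

In practice the growth rate would be measured by comparing the size of arcconf.log at two points in time.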

Official Fix

There is no known official fix as of 02-Oct-2013.

Workaround

The workaround is to clean arcconf.log manually or with a cron job. We use a cron job that cleans arcconf.log every two minutes.

*/2   *    *   *   *  /bin/echo > /var/log/arcconf.log

For the cron job to persist across reboots, add the following lines to /etc/rc.local.d/local.sh:

/bin/kill $(cat /var/run/crond.pid)
/bin/echo "*/2   *    *   *   *  /bin/echo > /var/log/arcconf.log" >> /var/spool/cron/crontabs/root
/usr/lib/vmware/busybox/bin/busybox crond

The first line kills crond, the second appends our echo command to root's crontab, and the third restarts crond.
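
Because the second line appends with >>, running it more than once (for example, when testing local.sh by hand) adds a duplicate crontab entry each time. A possible guard is sketched below; the function name add_cron_entry is hypothetical, and the sketch assumes the shell's grep supports the -q, -x and -F options.

```shell
# Hypothetical helper: append a crontab entry only if it is not already there.
# $1 = crontab file, $2 = full crontab line
add_cron_entry() {
    crontab_file=$1
    entry=$2
    # -q: quiet, -x: match the whole line, -F: treat the entry as a fixed string
    if ! grep -qxF -e "$entry" "$crontab_file" 2>/dev/null; then
        echo "$entry" >> "$crontab_file"
    fi
}

# In local.sh, in place of the plain echo line:
#   add_cron_entry /var/spool/cron/crontabs/root \
#       "*/2   *    *   *   *  /bin/echo > /var/log/arcconf.log"
```

The kill and restart of crond around this call stay the same as in the lines above.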

—————————————————————–

UPD 18-10-2013: fixed typo in the crond schedule.

UPD 7-07-2014: fixed another typo in the crond schedule description.

6 thoughts on “VMware ESXi RAM Disk Full due to Adaptec’s ARCCONF Bug”

  1. Nik

    /tmp # echo 1 >> /var/log/arcconf.log
    /tmp # cat /var/log/arcconf.log

    1
    /tmp # ls -l
    -rw------- 1 root root 0 Feb 13 10:06 31NuQ4
    -rw------- 1 root root 0 Feb 11 19:58 3eTUJa
    -rw------- 1 root root 0 Feb 10 13:28 64UiRB
    -rw------- 1 root root 0 Feb 11 20:05 GDJqXp
    -rw------- 1 root root 0 Feb 10 12:44 Rff5X9
    -rw------- 1 root root 0 Feb 11 19:49 TZowEp
    -rw------- 1 root root 0 Feb 12 12:48 UolH23
    -rw-r--r-- 1 root root 201322481 Feb 14 12:03 arcconf.log

    The size of arcconf.log stays big.

    1. Alphall Post author

      Hi,
      You should use

       echo 1 > /var/log/arcconf.log
      

      only one “>”, not “>>”. The first overwrites the file; the second appends to it.

      Cheers!

      UPD: Also, I just noticed that you echo to arcconf.log in /var/log, but you run ls in /tmp.
      If it’s not a symlink, then you should use

      echo > /tmp/arcconf.log
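
The difference between the two redirections can be seen on a scratch file. This is a small illustration only: the file comes from mktemp (assumed to be available in the shell) and merely stands in for arcconf.log.

```shell
log=$(mktemp)             # scratch file standing in for arcconf.log
echo "first"  >> "$log"   # append: the file now has 1 line
echo "second" >> "$log"   # append: the file now has 2 lines
lines_after_append=$(wc -l < "$log" | tr -d ' ')
echo "third"  >  "$log"   # overwrite: the previous content is discarded
lines_after_overwrite=$(wc -l < "$log" | tr -d ' ')
echo "after >>: $lines_after_append lines, after >: $lines_after_overwrite line"
rm -f "$log"
```

Note that echo > file leaves a single newline in the file, so its size afterwards is 1 byte rather than 0.
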
    1. Alphall Post author

      aprogrammer,

      Yes, this article is not about monitoring but about a bug that may cause ESXi to fail while performing a monitoring task.

      Thank you for the link, it’s quite interesting! However, we have found that using ARCCONF GETCONFIG alone is not sufficient for adequate monitoring. It does not cover all failure modes of Adaptec’s controller, nor does it provide a way to predict a failure.
      We are planning to write an article about our monitoring procedure. Make sure to check back later!

      Regards!

  2. Dmitry

    */2 * * * * it’s not every half hour, it’s every two minutes …
    Every half an hour – */30 * * * *
