January 30, 2014

A few days after upgrading to Celery v3.1, we noticed that the Celerybeat worker that handles regular dispatching of jobs would often hang. Upon further inspection, the process appeared to be blocked reading from a certain file descriptor. What was baffling was that the descriptor wasn't connected to the RabbitMQ message broker; instead, the process seemed to be reading from a random socket.

Celery uses universally unique identifier (UUID) generation to create unique task IDs, and the Kombu library creates one when initializing the "remote control command exchange," also known as the pidbox queue. Python v2.7 relies on the libuuid library on Unix-based machines to create these, and generating a UUID is usually very quick. I could tell that the code was stuck inside the __uuid_generate_random() function in the libuuid library. So why would the process hang on a fairly simple operation?

The phenomenon only occurred when running in --detach mode, which forks the process and runs it as a daemon. Without this flag, there was no issue. We could see that Celerybeat would start a few tasks but then freeze and fail to launch subsequent tasks.

By monitoring the Celery startup process, I noticed an attempt to spawn a process called /usr/sbin/uuidd. The libuuid library attempts to do so inside the uuid_generate_time() function, which is ultimately executed on startup when Celery needs to generate a UUID for the pidbox. The problem starts when launching this uuidd daemon fails. In Ubuntu v12.04 and Debian Squeeze, the related /var/run/uuidd directory needs to be created with the correct permissions. Without it, the daemon will be unable to spawn, and the libuuid code will fall back to reading from /dev/urandom.

Because this read happens before the process forks, a file descriptor to /dev/urandom is opened in the parent. To avoid concurrent writes by a parent and child process, the standard practice is to close all existing file descriptors, and Celery performs this operation after forking. However, the libuuid library (v2.20) stores the file descriptor for /dev/urandom in a static variable, which means it remains cached for the lifetime of the program:

    static int get_random_fd(void)
    {
        static int fd = -2;
        /* ... */
    }

Consider what happens the next time a call is made to this function via uuid.uuid4(), which Celery calls whenever it needs to generate a unique ID for a task. Since the file descriptor to /dev/urandom is only created once (the static variable starts at -2, and subsequent calls reuse the existing descriptor), the read operation fails because the descriptor has already been closed, and there is no attempt to reopen /dev/urandom. Fortunately, there is a fallback mechanism in the library code that generates random data via the rand() function:

    /* We do this all the time, but this is the only source of
     * randomness if /dev/random/urandom is out to lunch.
     */
    for (cp = buf, i = 0; i < nbytes; i++)
        *cp++ ^= (rand() >> 7) & 0xFF;

What happens, though, if the closed file descriptor number is later reclaimed by a different file? In that situation, the read() operation can succeed but read from the wrong file descriptor. In our case, we noticed that the process was attempting to read from a UDP socket, which we traced back to a connection opened by the statsd client.

We were able to bypass this problem by starting the uuidd daemon before Celery loads. That way, libuuid never falls back to opening /dev/urandom before the process is forked. Later versions of libuuid (v2.24) appear to fix this problem, though the Debian/Ubuntu packages are still lagging behind in upgrading. You could also simply avoid --detach mode, but that may complicate restarting Celery with the provided restart scripts.

For more details about this issue, see: https://github.com/celery/celery/pull/1822.


