September 2, 2015

To take advantage of a Python patch that helps resolve high memory usage in long-lived tasks, we recently tried to upgrade some of our machines to Ubuntu 14.10, which ships a distribution of Python 2.7.8 that already incorporates this fix. After upgrading, we started to notice a large number of cache misses in our memcached cluster. We initially suspected cache evictions, but our dashboards all indicated that memcached still had plenty of memory available.

We ran some experiments and noticed strange behavior between memcache clients running on Ubuntu 14.04 and 14.10 machines. Often, a key/value pair written from a machine on one Ubuntu version could not be read back from a machine on the other. Between machines on the same version, the same pair could be read back consistently. For instance, we would set a test pair with the following commands against our memcache cluster:

import pylibmc
cc = pylibmc.Client(['10.10.1.1:11211', '10.10.1.2:11211', '10.10.1.3:11211'])
cc.behaviors = {'ketama': True, 'tcp_nodelay': True}
print(cc.set('test', 123))

Whether the value could then be read back on either side depended on which Ubuntu version had originally set the key/value pair:

print(cc.get('test'))

Some background context: we chose the ketama hashing algorithm, which is designed to reduce the number of cache misses as you add or remove servers in your memcache cluster. Since memcached nodes do not replicate data between each other, it is important to use a hashing algorithm that does not remap most keys when nodes are added or removed. For more on how the ketama hashing algorithm works, see here.
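To make the background above concrete, here is a minimal sketch of a consistent-hash ring in Python. It is a toy model and not libmemcached's actual ketama layout: the `HashRing` class, the 100 points per server, and the fourth server address are all assumptions made for illustration. It compares how many keys change servers when a node is added, against naive modulo placement.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a ring position using the first 8 hex digits of its MD5."""
    digest = hashlib.md5(value.encode('utf-8')).hexdigest()
    return int(digest[:8], 16)


class HashRing(object):
    """Toy consistent-hash ring (hypothetical; not libmemcached's exact scheme)."""

    def __init__(self, servers, points_per_server=100):
        # Each server owns many points on the ring to even out the load.
        self._ring = sorted(
            (_hash('%s-%d' % (server, i)), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key):
        # Pick the first ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


def moved_fraction(old_servers, new_servers, keys):
    """Fraction of keys that map to a different server after the change."""
    old_ring, new_ring = HashRing(old_servers), HashRing(new_servers)
    moved = sum(1 for k in keys if old_ring.server_for(k) != new_ring.server_for(k))
    return moved / float(len(keys))


servers = ['10.10.1.1:11211', '10.10.1.2:11211', '10.10.1.3:11211']
keys = ['key-%d' % i for i in range(10000)]

# '10.10.1.4:11211' is a made-up fourth node added for the comparison.
ring_fraction = moved_fraction(servers, servers + ['10.10.1.4:11211'], keys)

# Naive placement for comparison: server index = hash % number of servers.
modulo_fraction = sum(
    1 for k in keys if _hash(k) % 3 != _hash(k) % 4
) / float(len(keys))

print('ring: %.2f, modulo: %.2f' % (ring_fraction, modulo_fraction))
```

With the ring, only the keys that land on the new node's points move; with modulo placement, most keys change servers, which is the wave of cache misses a consistent hash is meant to avoid.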

The only major apparent change between the Ubuntu versions was the libmemcached version. By building individual libmemcached versions, we were able to confirm that the problem started happening in versions after libmemcached 1.0.9. One indication of the difference was that calling get_behaviors() yielded different results depending on the libmemcached version. On Ubuntu 14.04, the ketama_weighted parameter was always being set. On Ubuntu 14.10, it was not.
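One way to spot this kind of difference is to capture the get_behaviors() output on each machine and diff the two dictionaries. The sketch below uses a hypothetical helper, `diff_behaviors`, and made-up, abbreviated captures; the real output contains many more keys and the values shown are assumptions.

```python
def diff_behaviors(left, right):
    """Return {name: (left_value, right_value)} for every behavior that differs."""
    names = set(left) | set(right)
    return {
        name: (left.get(name), right.get(name))
        for name in sorted(names)
        if left.get(name) != right.get(name)
    }


# Hypothetical, abbreviated captures of cc.get_behaviors() from each machine.
behaviors_1404 = {'ketama': True, 'ketama_weighted': True, 'tcp_nodelay': True}
behaviors_1410 = {'ketama': True, 'ketama_weighted': False, 'tcp_nodelay': True}

differences = diff_behaviors(behaviors_1404, behaviors_1410)
print(differences)
```

Any key that shows up in the result is a candidate explanation for clients disagreeing about where a key lives.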

We traced the problem down to a bug in libmemcached 1.0.9 that was fixed in later releases:

if (MEMCACHED_DISTRIBUTION_CONSISTENT_WEIGHTED) // enum, which always resolves to true

It turns out the check for whether to use weighted Ketama in libmemcached 1.0.9 always resolved to true, so weighted Ketama was used on Ubuntu 14.04 regardless of any options you set. Because subsequent libmemcached versions (such as the one in Ubuntu 14.10) corrected this issue, the two Ubuntu versions were effectively using different hashing algorithms. And because memcached clusters do not replicate their data across machines, a different hashing algorithm means a client looks for a key on a different machine than the one it was written to.
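The same class of bug can be reproduced in Python, as a rough illustration only. The constant names and values below are stand-ins rather than libmemcached's real definitions, and the "fixed" function is an assumption about what a correct check looks like, not the actual upstream patch.

```python
# Illustrative values only; the real constants live in libmemcached's C headers.
DISTRIBUTION_MODULA = 0
DISTRIBUTION_CONSISTENT_KETAMA = 2
DISTRIBUTION_CONSISTENT_WEIGHTED = 5


def use_weighted_buggy(requested):
    # Tests the constant itself, which is non-zero, so the answer never varies
    # and the requested distribution is ignored.
    if DISTRIBUTION_CONSISTENT_WEIGHTED:
        return True
    return False


def use_weighted_fixed(requested):
    # Compares the requested distribution against the constant.
    return requested == DISTRIBUTION_CONSISTENT_WEIGHTED


print(use_weighted_buggy(DISTRIBUTION_CONSISTENT_KETAMA))
print(use_weighted_fixed(DISTRIBUTION_CONSISTENT_KETAMA))
```

The buggy version reports weighted Ketama for every input, which matches the behavior we saw on the older machines.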

The lesson here is that if you use memcached and plan to upgrade some of your Ubuntu machines, the safest option is to use weighted Ketama. If you already have a mix of older and newer Ubuntu machines in production, you should switch to weighted Ketama, upgrade all machines to the newer Ubuntu version, or both.
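As a rough sketch of what requesting weighted Ketama explicitly might look like with pylibmc, the snippet below sets the ketama_weighted behavior on every client. The `make_client` helper and the exact set of behaviors are assumptions, so it is worth confirming with a cross-version set/get test like the one earlier in this post before relying on it.

```python
# Behaviors to apply on every client, on both older and newer machines.
# Assumption: 'ketama_weighted' alone is enough to select weighted Ketama.
WEIGHTED_KETAMA_BEHAVIORS = {'ketama_weighted': True, 'tcp_nodelay': True}


def make_client(servers):
    """Hypothetical helper: build a pylibmc client with weighted Ketama requested."""
    import pylibmc  # imported here so the settings can be inspected without it

    client = pylibmc.Client(servers)
    client.behaviors = WEIGHTED_KETAMA_BEHAVIORS
    return client
```

Calling `make_client(['10.10.1.1:11211', '10.10.1.2:11211', '10.10.1.3:11211'])` on both Ubuntu versions and comparing get_behaviors() afterwards should show whether the two sides now agree.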


