June 16, 2013

One of the more convenient aspects of writing code in interpreted languages such as Python or Ruby is that you can normally avoid dealing with memory management. However, one known case where Python will definitely leak memory is when your objects form circular references and one of the classes involved implements a custom __del__ destructor method. For instance, consider the following example:

class A(object):
    def __init__(self, b_instance):
        self.b = b_instance

class B(object):
    def __init__(self):
        self.a = A(self)
    def __del__(self):
        print "die"

def test():
    b = B()


When the function test() is invoked, it creates an instance of B, which passes itself to A, which then stores a reference back to B, resulting in a circular reference. Normally Python's garbage collector, which exists to detect these cyclic references, would reclaim both objects. However, because of the custom destructor (the __del__ method), it marks the object as "uncollectable". By design, the collector doesn't know the order in which to destroy the objects, so it leaves them alone (see Python's garbage collection documentation for more background). You can verify this by forcing the Python garbage collector to run and inspecting the contents of the gc.garbage list:

import gc
test()
gc.collect()  # force a collection pass so the uncollectable cycle is detected
print gc.garbage
[<__main__.B object at 0x7f59f57c98d0>]

You can also see these circular references visually by using the objgraph library, which relies on Python's gc module to inspect the references to your Python objects. Note that the objgraph library also deliberately draws objects with custom __del__ methods in a red circle to spotlight a possible issue.

You would just need to make a call to objgraph.show_backrefs() to show this issue:
def test():
    b = B()

    import objgraph
    objgraph.show_backrefs([b, b.a], refcounts=True)

Although this behavior is fairly well known among Python developers, it still shows up often in practice. Circular references commonly creep in when implementing connection pooling, whereby a number of network connections are collected and reused by a process. In these cases, a connection pool object holds references to its connection objects, and each connection must be able to report status information back to the pool, so it holds a reference to the pool as well. Since the network connection must often be terminated gracefully by closing sockets and/or file descriptors, developers are tempted to implement their own __del__ destructor methods, which, combined with the cycle, creates a potential memory leak.
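To make the pooling scenario concrete, here is a minimal sketch of how the cycle forms. The ConnectionPool and Connection classes, the closed list, and the "conn-N" names are all hypothetical and not taken from any real library. The cyclic collector is switched off for a moment only to show that reference counting alone never runs the destructors.

```python
import gc

closed = []  # hypothetical log of connections whose destructor has run


class Connection(object):
    def __init__(self, pool, name):
        self.pool = pool  # strong reference back to the pool
        self.name = name

    def __del__(self):
        closed.append(self.name)


class ConnectionPool(object):
    def __init__(self, size):
        # pool -> connection -> pool: one reference cycle per connection
        self.connections = [Connection(self, "conn-%d" % i)
                            for i in range(size)]


def use_pool():
    pool = ConnectionPool(2)


gc.disable()                  # keep the cycle detector out of the way
use_pool()
still_alive = (closed == [])  # no destructor has run after leaving scope
gc.enable()
gc.collect()                  # hand the cycle to the garbage collector
```

What happens at the final gc.collect() depends on the interpreter: on the Python 2 interpreter this post targets, the connections end up in gc.garbage, while Python 3.4 and later can finalize such cycles.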

To avoid circular references, you usually need to use weak references, which tell the interpreter that an object's memory can be reclaimed when the only remaining references to it are weak, or to use context managers and the with statement (for an example of this latter approach, see how it was solved for the happybase library).
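Both ideas can be combined in one rough sketch. This is not how happybase does it; the class names, the closed list, and the close() method are hypothetical. The connection only holds a weakref.proxy to its pool, and cleanup happens in __exit__ rather than in a __del__ destructor.

```python
import weakref

closed = []  # hypothetical log so the cleanup is observable


class Connection(object):
    def __init__(self, pool, name):
        # A weak proxy does not keep the pool alive, so no cycle is formed
        self.pool = weakref.proxy(pool)
        self.name = name

    def close(self):
        # Explicit cleanup instead of a __del__ destructor
        closed.append(self.name)


class ConnectionPool(object):
    def __init__(self, size):
        self.connections = [Connection(self, "conn-%d" % i)
                            for i in range(size)]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block exits, even if an exception was raised
        for conn in self.connections:
            conn.close()
        return False  # do not swallow exceptions


def use_pool():
    with ConnectionPool(2) as pool:
        probe = weakref.ref(pool)  # lets the caller check whether the pool is gone
    return probe


probe = use_pool()
# Under CPython's reference counting, the pool is freed as soon as
# use_pool() returns, so probe() now yields None.
```

The trade-off is that callers must remember to use the with statement, but the cleanup order is then explicit instead of being left to the garbage collector.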

Also, if you think you've never been guilty of introducing circular references, think again! If you've ever captured a stack trace within a function by assigning sys.exc_info() (or the third value in the tuple it returns, the traceback) to a local variable, you have created a cycle between the stack frame and the local variable.

import sys

def main():
    try:
        raise Exception('here')
    except Exception:
        # exc_info[2] is the traceback, which refers back to this frame
        exc_info = sys.exc_info()
        import objgraph
        objgraph.show_backrefs([exc_info, main])
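If objgraph isn't installed, the same cycle can be checked by hand. The sketch below is only an illustration, using nothing but the standard library: it follows the traceback to its frame and confirms that the frame's locals contain the very tuple that holds the traceback.

```python
import sys


def main():
    try:
        raise Exception('here')
    except Exception:
        exc_info = sys.exc_info()
        # The traceback (third item) points at the frame running main() ...
        frame = exc_info[2].tb_frame
        # ... and that frame's locals include exc_info itself: a cycle.
        return frame.f_code.co_name, frame.f_locals['exc_info'] is exc_info


result = main()
```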


Again, this cycle can be shown visually:

In this specific case, the memory can normally be reclaimed, since cyclic references alone aren't going to cause leaks. But detecting the cycle forces Python to fall back on the garbage collector's heuristics, so it's best to clean up after introducing this cyclic reference. In this example, you should add a finally: clause and delete the local variable.

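import sys

def main():
    try:
        raise Exception('here')
    except Exception:
        exc_info = sys.exc_info()
    finally:
        # Tuples don't support item deletion, so drop the whole local
        del exc_info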
