Enums in Python

As you probably already know, enums are awesome because they allow you to refer to a set of items in a canonical way without having to worry too much about how the items are actually represented. However, this awesomeness is not available out of the box in Python 2.x. Luckily, starting in Python 3.4, an enum implementation is one of the included batteries. Since we're running Python 2.7.x at Hearsay Social, we can't take advantage of the built-in enum library in Python 3.4. There's also some functionality we require that's not part of the Python 3.4 enum library and is not provided by the existing enum libraries on the Python Package Index, so we decided to roll our own.

Note: We didn't extend the backported enum34 library since we had already written our own enum library before PEP 435 was created. Luckily, our implementation is similar to enums in Python 3.4 so it should be fairly straightforward to depend on enum34. Feel free to send us a PR to make this change :)


We wrote our own enum library because none of the existing Python enum implementations support all of the following features:

  • Static code analysis (e.g. PyLint) to help find typos
  • Nesting enums
  • Built-in internationalization support
  • Identity comparison

So the problem with existing enum implementations is that they weren't designed with nested enums or internationalization in mind. To clarify, nesting enums means associating the values of one enum with the values of another enum.

Take a look at the two implementations of CardSuites below. The first implementation uses Enum from the enum34 package and the second implementation uses RichEnum from our own richenum package.

    import gettext
    _ = gettext.translation("enums", fallback=True).ugettext

    from enum import Enum  # enum34 package

    class FMLCardColors(Enum):
        RED = "red"
        BLACK = "black"

    class FMLCardSuites(Enum):
        CLUBS = "clubs"
        DIAMONDS = "diamonds"
        HEARTS = "hearts"
        SPADES = "spades"

    FMLCardSuitesDisplay = {
        FMLCardSuites.CLUBS: _("clubs"),
        FMLCardSuites.DIAMONDS: _("diamonds"),
        FMLCardSuites.HEARTS: _("hearts"),
        FMLCardSuites.SPADES: _("spades"),
    }

    FMLCardSuiteToCardColor = {
        FMLCardSuites.CLUBS: FMLCardColors.BLACK,
        FMLCardSuites.DIAMONDS: FMLCardColors.RED,
        FMLCardSuites.HEARTS: FMLCardColors.RED,
        FMLCardSuites.SPADES: FMLCardColors.BLACK,
    }

    FMLCardColorsDisplay = {
        FMLCardColors.BLACK: _("black"),
        FMLCardColors.RED: _("red"),
    }

    print FMLCardSuitesDisplay[FMLCardSuites.DIAMONDS]  # prints "diamonds"
    print FMLCardColorsDisplay[FMLCardSuiteToCardColor[FMLCardSuites.DIAMONDS]]  # prints "red"
    print [k for k, v in FMLCardSuiteToCardColor.iteritems() if v == FMLCardColors.BLACK]  # prints a list of FMLCardSuites that are black

The second implementation, using RichEnum:

    import gettext
    _ = gettext.translation("enums", fallback=True).ugettext

    from richenum import RichEnumValue, RichEnum

    class _CardColors(RichEnumValue):
        pass

    class CardColors(RichEnum):
        RED = _CardColors(canonical_name="red", display_name=_("red"))
        BLACK = _CardColors(canonical_name="black", display_name=_("black"))

    class _CardSuites(RichEnumValue):
        def __init__(self, canonical_name, display_name, color):
            self.color = color
            super(_CardSuites, self).__init__(canonical_name, display_name)

    class CardSuites(RichEnum):
        CLUBS = _CardSuites(canonical_name="clubs", display_name=_("clubs"),
                            color=CardColors.BLACK)
        DIAMONDS = _CardSuites(canonical_name="diamonds", display_name=_("diamonds"),
                               color=CardColors.RED)
        HEARTS = _CardSuites(canonical_name="hearts", display_name=_("hearts"),
                             color=CardColors.RED)
        SPADES = _CardSuites(canonical_name="spades", display_name=_("spades"),
                             color=CardColors.BLACK)

        @classmethod
        def from_color(cls, color):
            return [e for e in cls if e.color == color]

    print CardSuites.DIAMONDS.display_name  # prints "diamonds"
    print CardSuites.DIAMONDS.color.display_name  # prints "red"
    print CardSuites.from_color(CardColors.BLACK)  # prints a list of CardSuites that are black

As you can see, both implementations offer similar functionality. Both the Enum in enum34 and RichEnum use metaclasses to create the enum class, so you can run static analysis programs like PyLint on your code to check for errors. The usage of metaclasses also allows for comparison by identity.

However, one big difference is that the RichEnum implementation is easier to maintain and allows for cleaner code. As demonstrated in the first implementation, if you want to associate multiple attributes/values to a single enum (i.e. nest enums), then you need to create and maintain another structure per additional attribute/value to map the enum to. It's easier to add a new attribute to each enum value than to add another dictionary that maps all the enum values to the new attribute. Since associating enums is so simple, adding classmethods to get enums based on a related enum is also straightforward. RichEnum also has built-in support for internationalization since it forces you to set a display_name upfront.
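The identity comparison mentioned above is easy to check for yourself. Here is a minimal sketch using the standard library enum module (Python 3.4+, or the enum34 backport on Python 2.7). The Color class is just an illustration and is not part of richenum; we'd expect RichEnum values to behave the same way, but this snippet only exercises Enum.

```python
from enum import Enum  # standard library in Python 3.4+, enum34 backport on 2.7


class Color(Enum):  # illustrative enum, not part of richenum
    RED = "red"
    BLACK = "black"


# Looking a member up by value or by name returns the single existing
# instance, so comparing with `is` is safe.
by_value = Color("red")
by_name = Color["RED"]

print(by_value is Color.RED)     # True
print(by_name is Color.RED)      # True
print(Color.RED is Color.BLACK)  # False
```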

Sometimes we want to specify an ordering for our RichEnums. That's where OrderedRichEnum comes in.

Using the implementation of CardSuites above:

    print list(suite.display_name for suite in CardSuites)  # prints [u'hearts', u'spades', u'clubs', u'diamonds']

With OrderedRichEnum, we can control the order in which the enum values are returned during iteration.

    import gettext
    _ = gettext.translation("enums", fallback=True).ugettext

    from richenum import OrderedRichEnumValue, OrderedRichEnum

    class _OrderedCardColors(OrderedRichEnumValue):
        pass

    class OrderedCardColors(OrderedRichEnum):
        RED = _OrderedCardColors(index=1, canonical_name="red", display_name=_("red"))
        BLACK = _OrderedCardColors(index=2, canonical_name="black", display_name=_("black"))

    class _OrderedCardSuites(OrderedRichEnumValue):
        def __init__(self, index, canonical_name, display_name, color):
            self.color = color
            super(_OrderedCardSuites, self).__init__(index, canonical_name, display_name)

    class OrderedCardSuites(OrderedRichEnum):
        CLUBS = _OrderedCardSuites(index=1, canonical_name="clubs", display_name=_("clubs"),
                                   color=OrderedCardColors.BLACK)
        DIAMONDS = _OrderedCardSuites(index=2, canonical_name="diamonds", display_name=_("diamonds"),
                                      color=OrderedCardColors.RED)
        HEARTS = _OrderedCardSuites(index=3, canonical_name="hearts", display_name=_("hearts"),
                                    color=OrderedCardColors.RED)
        SPADES = _OrderedCardSuites(index=4, canonical_name="spades", display_name=_("spades"),
                                    color=OrderedCardColors.BLACK)

    print list(suite.display_name for suite in OrderedCardSuites)  # prints [u'clubs', u'diamonds', u'hearts', u'spades']

Again, adding support for ordering enum values to the naive enum implementation would be annoying and a pain to maintain. It's easier to add an attribute than to add another dictionary.
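To give a rough idea of what index-based ordering involves, here is a small self-contained sketch. It is not richenum's actual implementation: OrderedValue, OrderedMeta and Suits are hypothetical names made up for this example, and the only assumption is that each value carries an index that iteration sorts on.

```python
class OrderedValue(object):
    """Hypothetical stand-in for an ordered enum value (not richenum's class)."""

    def __init__(self, index, canonical_name):
        self.index = index
        self.canonical_name = canonical_name


class OrderedMeta(type):
    """Hypothetical metaclass: iterating the class yields values sorted by index."""

    def __iter__(cls):
        members = [v for v in vars(cls).values() if isinstance(v, OrderedValue)]
        return iter(sorted(members, key=lambda v: v.index))


# Calling the metaclass directly builds the class the same way on Python 2 and 3.
# The values are deliberately declared out of order.
Suits = OrderedMeta("Suits", (object,), {
    "SPADES": OrderedValue(4, "spades"),
    "CLUBS": OrderedValue(1, "clubs"),
    "HEARTS": OrderedValue(3, "hearts"),
    "DIAMONDS": OrderedValue(2, "diamonds"),
})

names = [suit.canonical_name for suit in Suits]
print(names)  # ['clubs', 'diamonds', 'hearts', 'spades']
```

The point of the sketch is that the ordering lives on the value itself, so adding a new value only means picking an index for it.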

Stay tuned for part 2: RichEnums and OrderedRichEnums in Django!

Circular References in Python

One of the more convenient aspects of writing code in interpreted languages such as Python or Ruby is that you normally can avoid dealing with memory management. However, one known case where Python 2 (and Python 3 before 3.4) will leak memory is when you declare circular references in your object declarations and implement a custom __del__ destructor method in one of these classes. For instance, consider the following example:

class A(object):
    def __init__(self, b_instance):
        self.b = b_instance

class B(object):
    def __init__(self):
        self.a = A(self)
    def __del__(self):
        print "die"

def test():
    b = B()

test()

When the function test() is invoked, it creates an instance of B, which passes itself to A, which then stores a reference to B, resulting in a circular reference. Normally Python's garbage collector, which exists to detect these kinds of cyclic references, would reclaim the objects. However, because B defines a custom destructor (the __del__ method), the collector marks the cycle as "uncollectable". By design, it doesn't know the order in which to destroy the objects, so it leaves them alone (see Python's garbage collection documentation for more background). You can verify this by forcing the Python garbage collector to run and inspecting the contents of the gc.garbage list:

import gc
gc.collect()
print gc.garbage
[<__main__.B object at 0x7f59f57c98d0>]
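For contrast, here is a small sketch of the "normal" case described above: a cycle between objects that do not define __del__. Node and make_cycle are hypothetical names used only for this illustration. A weak reference lets us watch one of the objects without keeping it alive.

```python
import gc
import weakref


class Node(object):
    """A plain object with no __del__ method (hypothetical example class)."""
    pass


def make_cycle():
    a = Node()
    b = Node()
    a.other = b  # a -> b
    b.other = a  # b -> a, closing the cycle
    return weakref.ref(a)  # lets us observe `a` without holding it


gc.disable()  # keep automatic collection from running mid-example
probe = make_cycle()
alive_before = probe() is not None  # reference counting alone cannot free a cycle
gc.collect()  # the cycle detector finds and frees both objects
alive_after = probe() is not None
gc.enable()

print(alive_before)  # True
print(alive_after)   # False
```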

You can also see these circular references visually by using the objgraph library, which relies on Python's gc module to inspect the references to your Python objects. Note that the objgraph library also deliberately plots the custom __del__ methods in a red circle to spotlight a possible issue.

You would just need to make a call to objgraph.show_backrefs() to show this issue:
def test():
    b = B()

    import objgraph
    objgraph.show_backrefs([b, b.a], refcounts=True)

Although this behavior is fairly well known, it still shows up often in practice. Circular references commonly appear when implementing connection pooling, where a number of network connections are collected and reused by a process. In these cases, a connection pool object is created and given references to the various connection objects, and each connection must in turn be able to report status information back to the pool. Since network connections often must be terminated gracefully by closing sockets and/or file descriptors, developers often implement their own __del__ destructor methods, which can then create a memory leak.

To avoid circular references, you usually need to use weak references, which tell the interpreter that an object's memory can be reclaimed when only weak references to it remain, or to use context managers and the with statement (for an example of this latter approach, see how it was solved for the happybase library).
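Here is a minimal sketch that combines both ideas for the connection pool scenario. Pool and Connection are hypothetical classes written for this example (this is not how happybase or any particular library does it): the connection holds only a weak reference back to its pool, and cleanup happens through the with statement rather than __del__.

```python
import weakref


class Connection(object):
    """Hypothetical connection that can report back to its pool."""

    def __init__(self, pool):
        # Weak back-reference: the connection does not keep the pool alive,
        # so pool -> connection -> pool is no longer a strong cycle.
        self._pool_ref = weakref.ref(pool)
        self.closed = False

    @property
    def pool(self):
        return self._pool_ref()  # None once the pool has been reclaimed

    def close(self):
        self.closed = True  # a real connection would close its socket here

    # Context manager protocol: cleanup is explicit, no __del__ required.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions


class Pool(object):
    """Hypothetical pool holding strong references to its connections."""

    def __init__(self, size):
        self.connections = [Connection(self) for _ in range(size)]


pool = Pool(2)
conn = pool.connections[0]
with conn:
    has_pool_inside = conn.pool is pool
closed_after_with = conn.closed

del pool  # the only strong reference to the pool goes away
pool_gone = conn.pool is None  # immediate on CPython thanks to reference counting
```

Note that the immediate reclamation in the last line relies on CPython's reference counting; other interpreters may free the pool later.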

Also, if you think you've not been guilty of introducing circular references, think again! If you've ever tried to dump a stack trace within a function and assigned the sys.exc_info() tuple, or its third value (the traceback), to a local variable, you have created a cycle between the stack frame and that local variable.

import sys

def main():
    try:
        raise Exception('here')
    except:
        pass

    exc_info = sys.exc_info()
    import objgraph
    objgraph.show_backrefs([exc_info, main])

main()

Again, this cycle can be shown visually:

In this specific case, the memory can normally be reclaimed, since cyclic references alone aren't going to cause leaks. Still, leaving the cycle in place forces Python to fall back on its cycle detector to reclaim it, so it's best to clean up after introducing this cyclic reference. In this example, you should add a finally: clause and delete the local variable.

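import sys

def main():
    try:
        raise Exception('here')
    except:
        pass

    exc_info = sys.exc_info()
    try:
        pass  # inspect or log exc_info here
    finally:
        # Drop the local reference to break the frame <-> traceback cycle.
        del exc_info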

Hearsay Social Hosts San Francisco PyData Meetup

On Thursday, May 2, 2013, Hearsay Social welcomed members of the San Francisco PyData meetup to our living room. Wes McKinney, the founder of the Python pandas library, spoke about the tool and how it makes data analysis easy.

Hearsay Social is a longtime supporter of the Python community and numerous open source libraries. Along with sponsoring the annual PyCon conference and offering space for meetups, Hearsay Social engineers are encouraged to contribute to the libraries and frameworks we use every day: Django, Celery, Chef, etc.

For his talk, Wes walked through the process of analyzing a GitHub repository using pandas. In real time, he pulled data from the API, organized it using pandas, and began looking for answers to questions about who was contributing the most, how quickly bugs were being fixed, and how activity changed around major releases. Everything was done using an IPython notebook, which is available for download along with the slides here:

https://www.dropbox.com/sh/05d16q8zm5uozkl/dD2mHWqhhI