Store python PersistentMapping objects into an objects TreeSet
09 May 2016

When the OOTreeSet refuses to add new PersistentMapping objects...

I've been working with ZODB for quite some time now: first when I was doing Zope based web development (oh, the ol' good^hard days), then in some courses I gave about persistence and data storage in python, and nowadays with pyramid, where I still use ZODB for some apps.

Some days ago, working on one of those pyramid based projects, I found something really weird, something that was not working as expected, and it took me some time to figure out what it was. I'd like to share it, just in case it happens to any of you.

When working with ZODB, there are a couple of packages you will probably use: persistent and BTrees. Both are related to adding persistence to python objects, so you can store them easily in the object database. Long story short, they make your life a bit easier.

Let's show some sample source code:

from persistent.mapping import PersistentMapping
from functools import total_ordering

@total_ordering
class Box(PersistentMapping):
    __parent__ = __name__ = None
    def __init__(self, name):
        self.name = name
        super(Box, self).__init__()
    def __repr__(self):
        return "<%s.%s %s>" % (self.__class__.__module__,
                               self.__class__.__name__,
                               self.name)
    def __str__(self):
        return self.name

    def __hash__(self):
        return hash(str(self))

    def __eq__(self, other):
        return self.__hash__() == other.__hash__()

    def __lt__(self, other):
        return self.__hash__() > other.__hash__()

    def __gt__(self, other):
        return self.__hash__() < other.__hash__()

Just a model called Box. I'm going to skip all the ZODB setup here, as it is not needed to show the problem I found. The class has the usual python methods __init__, __repr__ and __str__, and a single name attribute.

It also has a __hash__ method, which is what the builtin hash() calls. Hash-based containers such as dictionaries and sets use that value to locate an item and then use __eq__ to confirm the match (for example, to detect a duplicate key when adding items to a dict).
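As a small illustration of that hash-then-equality lookup, here is a sketch that uses a plain class instead of a PersistentMapping. The Label class and its names are hypothetical, made up only for this example:

```python
class Label:
    """Hypothetical stand-in for Box: identity is based on the name."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Hash-based containers call this first to decide where to look.
        return hash(self.name)

    def __eq__(self, other):
        # Then equality confirms whether a candidate really matches.
        return isinstance(other, Label) and self.name == other.name


a = Label("sweets")
b = Label("sweets")  # a different instance with the same name
c = Label("apples")

seen = {a}
print(b in seen)        # True: same hash, and __eq__ agrees
print(c in seen)        # False
print(len({a, b, c}))   # 2: a and b collapse into a single entry
```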

The __hash__ method is also used in the __eq__, __lt__ and __gt__ methods, just as an example in this sample code. In a real model these methods should contain more extensive logic to decide whether two Box instances are equal or whether one is less/greater than the other.
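One possible version of that "more extensive logic" is to compare on the name itself rather than on its hash. In python 3 the hash of a string is randomized per process by default (see PYTHONHASHSEED), so an ordering built on hash() can change from one run to the next, which is probably not what you want for objects kept in a sorted, persistent container. A minimal sketch, assuming the name is a good enough identity; NamedBox is a hypothetical, non-persistent stand-in for Box:

```python
class NamedBox:
    """Hypothetical stand-in for Box that compares by name."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        if not isinstance(other, NamedBox):
            return NotImplemented
        return self.name == other.name

    def __lt__(self, other):
        if not isinstance(other, NamedBox):
            return NotImplemented
        # Alphabetical order gives the same result on every run.
        return self.name < other.name


boxes = [NamedBox("sweets"), NamedBox("onions"), NamedBox("apples")]
names = [box.name for box in sorted(boxes)]
print(names)  # ['apples', 'onions', 'sweets']
```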

Finally, the @total_ordering decorator is applied to the Box class. This decorator will extend our comparison methods (__eq__, __lt__ and __gt__), adding the ones that are missing (__le__ and __ge__). Basically, less code to type.
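To see what the decorator actually adds, here is a small sketch with a hypothetical Weight class (not part of the Box model) that only defines __eq__ and __lt__:

```python
from functools import total_ordering


@total_ordering
class Weight:
    """Hypothetical example class: only __eq__ and __lt__ are hand-written."""

    def __init__(self, grams):
        self.grams = grams

    def __eq__(self, other):
        return self.grams == other.grams

    def __lt__(self, other):
        return self.grams < other.grams


light, heavy = Weight(100), Weight(250)

print(light <= heavy)  # True, __le__ was generated by the decorator
print(light >= heavy)  # False, __ge__ was generated too
print(heavy > light)   # True, and so was __gt__

# The generated methods live on the class itself; no __cmp__ is created.
print("__le__" in vars(Weight))   # True
print("__cmp__" in vars(Weight))  # False
```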

Let's play a bit with this model in a python interpreter/shell:

>>> from models import Box
>>> sweets = Box(name='sweets')
>>> apples = Box(name='apples')
>>> onions = Box(name='onions')
>>> sweets == apples
False
>>> apples == onions
False
>>> onions == onions
True
>>> sweets > apples
True
>>> apples > sweets
False
>>> apples < sweets
True
>>> boxes = []
>>> boxes.append(sweets)
>>> boxes.append(apples)
>>> boxes.append(onions)
>>> boxes
[<models.Box sweets>, <models.Box apples>, <models.Box onions>]
>>> boxes.append(sweets)
>>> boxes
[<models.Box sweets>, <models.Box apples>, <models.Box onions>, <models.Box sweets>]
>>>

We can create boxes, compare them (well, the greater than/less than comparisons are a bit naive, as I mentioned, but for the purposes of this post they work) and even add them to a list of boxes. We can add the same Box instance multiple times to the boxes list. This works in the same way for both python 2.7.x and 3.x.

Now, when storing those lists of objects in a ZODB, in some situations using an OOTreeSet from the BTrees package is more efficient than storing a plain list. Let's play a bit more, this time in a python 3 shell:

>>> from BTrees.OOBTree import OOBTree, OOTreeSet
>>> boxes_ts = OOTreeSet()
>>> boxes_ts.add(sweets)
1
>>> boxes_ts.add(apples)
1
>>> boxes_ts.add(onions)
1
>>> boxes_ts
<BTrees.OOBTree.OOTreeSet object at 0x10c1c30d0>
>>> list(boxes_ts)
[<models.Box sweets>, <models.Box apples>, <models.Box onions>]
>>> boxes_ts.add(sweets)
0
>>> list(boxes_ts)
[<models.Box sweets>, <models.Box apples>, <models.Box onions>]
>>>

As expected, we can add the different instances of Box to the treeset, but we cannot add duplicates there, so the last attempt to add the sweets box does not work.

Now, same thing in python 2:

>>> from BTrees.OOBTree import OOBTree, OOTreeSet
>>> boxes_ts = OOTreeSet()
>>> boxes_ts.add(sweets)
1
>>> boxes_ts.add(apples)
0
>>> list(boxes_ts)
[<models.Box sweets>]
>>> boxes_ts.add(onions)
0
>>> list(boxes_ts)
[<models.Box sweets>]
>>>

"WTF!" was my first thought when I noticed I couldn't add more Box instances to the treeset. They are different objects, different instances of the same class, but our class has all the needed methods so anything trying to compare both objects can find out that they are actually not the same object, right?

Well, actually not, at least for python 2.

It took me a while to figure out what was happening, but first with the help of ztane in the #pyramid channel on freenode, and then after looking a bit at the source code of the BTrees package, I was able to find it.

The code that performs the comparison between two objects before adding them to the TreeSet was using a cmp() function imported from BTrees._compat. Taking a look at the file BTrees/_compat.py, I found:

if sys.version_info[0] < 3: #pragma NO COVER Python2

    PY2 = True
    PY3 = False

    from StringIO import StringIO
    BytesIO = StringIO

    int_types = int, long
    xrange = xrange
    cmp = cmp

    ...

else: #pragma NO COVER Python3

    PY2 = False
    PY3 = True

    from io import StringIO
    from io import BytesIO

    int_types = int,
    xrange = range

    def cmp(x, y):
        return (x > y) - (y > x)

    ...

So, if we run this code in python 3, it relies on a cmp() function defined in that module within BTrees, which performs two > comparisons (covered by our __gt__ method), but if we run it in python 2 it relies on the builtin cmp().

The reason for doing it like that is explained here:

http://python3porting.com/problems.html#unorderable-types-cmp-and-cmp
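The python 3 branch of cmp() quoted above is easy to try on its own. The sketch below copies that one-line expression and runs it against a few values; the OnlyGt class is a hypothetical helper, added just to show that the expression only needs the > operator:

```python
def cmp(x, y):
    # Same expression as the python 3 branch quoted from BTrees/_compat.py.
    return (x > y) - (y > x)


class OnlyGt:
    """Hypothetical helper that defines nothing but __gt__."""

    def __init__(self, value):
        self.value = value

    def __gt__(self, other):
        return self.value > other.value


print(cmp(1, 2), cmp(2, 1), cmp(2, 2))  # -1 1 0
print(cmp("apples", "sweets"))          # -1
print(cmp(OnlyGt(5), OnlyGt(3)))        # 1
```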

Now, let's go back to the python 2 shell:

>>> cmp(sweets, apples)
0
>>> cmp(sweets, sweets)
0
>>>

Indeed, for the builtin cmp in python 2, those objects are the same.
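To make the consequence easier to see, here is a toy sorted set that decides where to insert (and whether an item is a duplicate) using a three-way comparison function. It is only a sketch written for this post: TinySortedSet, three_way and always_equal are hypothetical names, and this is not how BTrees is implemented internally. It just shows why a comparison that always returns 0 lets the first item in and rejects everything else:

```python
class TinySortedSet:
    """Toy container for illustration only, not the BTrees implementation."""

    def __init__(self, cmp_func):
        self._cmp = cmp_func
        self._items = []

    def add(self, item):
        # Binary search driven entirely by the three-way comparison.
        lo, hi = 0, len(self._items)
        while lo < hi:
            mid = (lo + hi) // 2
            result = self._cmp(item, self._items[mid])
            if result == 0:
                return 0  # treated as "already present"
            if result < 0:
                hi = mid
            else:
                lo = mid + 1
        self._items.insert(lo, item)
        return 1

    def __iter__(self):
        return iter(self._items)


def three_way(x, y):
    return (x > y) - (y > x)


def always_equal(x, y):
    # Mimics what the python 2 shell session above shows for our boxes.
    return 0


good = TinySortedSet(three_way)
good_results = [good.add(n) for n in ("sweets", "apples", "onions", "sweets")]
print(good_results)  # [1, 1, 1, 0]
print(list(good))    # ['apples', 'onions', 'sweets']

broken = TinySortedSet(always_equal)
broken_results = [broken.add(n) for n in ("sweets", "apples", "onions")]
print(broken_results)  # [1, 0, 0]
print(list(broken))    # ['sweets']
```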

Easy to fix, as cmp will rely on the __cmp__ method, if our class provides it. So we only have to extend our class code a bit:

from persistent.mapping import PersistentMapping
from functools import total_ordering

@total_ordering
class Box(PersistentMapping):
    __parent__ = __name__ = None
    def __init__(self, name):
        self.name = name
        super(Box, self).__init__()
    def __repr__(self):
        return "<%s.%s %s>" % (self.__class__.__module__,
                               self.__class__.__name__,
                               self.name)
    def __str__(self):
        return self.name

    def __hash__(self):
        return hash(str(self))

    def __eq__(self, other):
        return self.__hash__() == other.__hash__()

    def __lt__(self, other):
        return self.__hash__() > other.__hash__()

    def __gt__(self, other):
        return self.__hash__() < other.__hash__()

    def __cmp__(self, other):
        if self.__lt__(other):
            return -1
        if self.__eq__(other):
            return 0
        if self.__gt__(other):
            return 1

Now, back to the python 2 shell:

>>> from models import Box
>>> sweets = Box('sweets')
>>> apples = Box('apples')
>>> cmp(sweets, apples)
-1
>>> cmp(apples, sweets)
1
>>> cmp(apples, apples)
0
>>>

Much better, now let's try adding to a TreeSet:

>>> from BTrees.OOBTree import OOBTree, OOTreeSet
>>> boxes_ts = OOTreeSet()
>>> boxes_ts.add(sweets)
1
>>> boxes_ts.add(sweets)
0
>>> boxes_ts.add(apples)
1
>>> boxes_ts.add(apples)
0
>>> boxes_ts.add(sweets)
0
>>>

It works!

Now, some open questions...

According to the official python 2 docs for __cmp__:

Called by comparison operations if rich comparison (see above) is not defined.

So, shouldn't the builtin cmp() use the rich comparison methods in case __cmp__ is not available?

If @total_ordering is supposed to fill in our code with the rest of the needed comparison methods (considering we already provided __eq__ and one of __lt__ or __gt__), shouldn't it add a __cmp__ method too in python 2?
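Since @total_ordering does not do it for us, one option is a tiny class decorator that derives __cmp__ from the rich comparisons, so the logic is not written twice. This is only a sketch: with_cmp and Crate are hypothetical names, not part of functools or BTrees. In python 3 the interpreter ignores __cmp__, so the example calls it directly; in python 2 I would expect the builtin cmp() to pick it up, just like the hand-written method above.

```python
from functools import total_ordering


def with_cmp(cls):
    """Hypothetical decorator: add a __cmp__ built on the rich comparisons."""

    def __cmp__(self, other):
        # -1, 0 or 1, derived from the class's own > and < operators.
        return (self > other) - (self < other)

    cls.__cmp__ = __cmp__
    return cls


@with_cmp
@total_ordering
class Crate(object):
    """Hypothetical, non-persistent stand-in for Box, ordered by name."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return self.name == other.name

    def __lt__(self, other):
        return self.name < other.name


apples, sweets = Crate("apples"), Crate("sweets")
print(apples.__cmp__(sweets))  # -1
print(sweets.__cmp__(apples))  # 1
print(apples.__cmp__(apples))  # 0
```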

Posted by wu at 08:47