Django, SQLite, GLOB, CAST and sorting
November 2017
Sun Mon Tue Wed Thu Fri Sat
      1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30    
About
This site is an effort to share some of the base knowledge I have gathered through all this years working with Linux, FreeBSD, OpenBSD, Python or Zope, among others. So, take a look around and I hope you will find the contents useful.
Recent Entries
Recent Comments
Recent Trackbacks
Categories
OpenBSD (9 items)
BSD (0 items)
FreeBSD (19 items)
Linux (3 items)
Security (3 items)
Python (22 items)
Zope (13 items)
Daily (144 items)
e-shell (9 items)
Hacks (14 items)
PostgreSQL (3 items)
OSX (8 items)
Nintendo DS (0 items)
enlightenment (0 items)
Apache (3 items)
Nintendo Wii (1 items)
Django (24 items)
Music (12 items)
Plone (7 items)
Varnish (0 items)
Lugo (2 items)
Sendmail (0 items)
europython (7 items)
Cherokee (1 items)
self (1 items)
Nature (1 items)
Hiking (0 items)
uwsgi (0 items)
nginx (0 items)
cycling (9 items)
Networking (1 items)
DNS (0 items)
Archives

Syndicate this site (XML)

RSS/RDF 0.91

26 octubre
2015

Django, SQLite, GLOB, CAST and sorting

... or how to sort (properly) by a char field containing number values

Some days ago I found an interesting dilemma while working on a django based application. The application has a model similar to this one:

class Customer(models.Model):
    code = models.CharField(_('Customer code'), max_length=10)
    joined = models.DateField(_('Joined date'), auto_now_add=True)
    first_name = models.CharField(_('First name'), max_length=255)
    last_name = models.CharField(_('First name'), max_length=255)
    email = models.EmailField(_('Email address'))

    ...

That code attribute is a CharField where alphanumeric customer codes are stored. Those codes are mostly numbers (1,2,3...N) but then there are also codes like 2a, 3-groupB, etc.

At one point, it became useful to sort such users based on that code. If that would be an IntegerField, it would be as easy as using order_by:

from .models import Customer
customers = Customer.objects.all().order_by('code')

Or, if you want the results sorted in descending order:

customers = Customer.objects.all().order_by('-code')

But then, this was not an IntegerField, but a CharField, which means that using order_by would return something like:

0
1
10
11
12
13
...
2
20
21
...
3
...

And so on. Not exactly what was needed.

The following is the write-up covering the little adventure that was the work to fix it.

My first thought was - hey, this is a string, it will be properly sorted if we just fill the numbers properly -. Using python's zfill it would be really easy to calculate the larger value, and then pad zeros to the smaller values and sort them.

Sounded good, but you can't tell django's ORM to sort by a model method or property. So that approach would mean writing some code to sort the results in python code, after retrieving them from the database. Something you could do, but that is not exactly a good choice (and then you would have to parse codes a bit, as they are not numeric only).

Doing it at database level sounded like the way to go. The database should be able to handle sorting efficiently, no need to write some more code at application level.

Thinking a bit more about it I did recall of the CAST function from PostgreSQL, that can be used to change dinamically the type of a field, for example when making a query. This means that instead of a query like this, that will sort the results incorrectly:

select * from customer order by code;

We could do something like:

select * from customer order by CAST(code as int);

Which would first transform codes into integers, then order the results by that. Nice! - I thought - That should do it. But it did not.

Remember that code has values that cannot be directly casted to an integer (3-groupB for example). That means that if you run that query, you will get an error like this one:

ERROR:  invalid input syntax for integer: "3-groupB"

Here is where the substring function from PostgreSQL becomes handy. Using such function, you can extract parts of a string/char value, matching patterns, even using some regexps, as it is needed in this case.

Something like that should do it:

select * from customer order by CAST(
    substring(code from '^[0-9]+')
    as int);

There, substring extracts characters from code that match numbers from 0 to 9, CAST changes that resulting string into an integer and the query orders the results by that.

"Got it" - I thought. "Now, let's tell django's ORM to use that".

That was the easy part. Django makes it really easy with the extra method of django QuerySets. Something like this should do it:

customers = Customer.objects.extra(select={
    'sorted_code': "CAST(substring(code from '^[0-9]+') as int)"}
).order_by('sorted_code')

"Great, that was easy, let's give it a try" - and then I opened a django shell and gave it a try, but all I got was a huge traceback ending with:

OperationalError: near "from": syntax error

"W T F" - I did exclaim. I did check the syntax like a thousand times, and it would have worked flawlessly on any PostgreSQL server. I was sure of that. And that was precisely the problem, for this project we have been using SQLite instead. FACEPALM.

Anyway, SQLite is good enough and should be able to do this too, maybe just using a slightly different syntax, right?

In SQLite we can use GLOB (instead of pgsql's substring) and then its own CAST implementation, both inside a CASE expression, to mimic what we did with pgsql. This SQL code should work:

select * from customer order by
case
  when code GLOB '[0-9]*' then cast(code as int)
  else code
end

This is a bit more "complete" than the previous code we used in PostgreSQL, as it has a fallback in case GLOB do not find any match to be casted, returning the value of code as it comes from the db [1].

No, let's translate this into django extra code:

cast = """
CASE
  WHEN code GLOB '[0-9]*' THEN CAST(code AS INT)
  ELSE code
END
"""
customers = Customer.objects.extra(
    select={'casted_code': cast}
).order_by('-casted_code')

Et Voilà. That code could be used anywhere in your app. In my case, I had to put it inside the get_queryset of a ModelAdmin subclass. This way the django admin shows customers sorted correctly by default:

class CustomerAdmin(admin.ModelAdmin):

    # here goes the usual inlines, list_display, list_filter, etc

    def get_queryset(self, request):
        qs = super(CustomerAdmin, self).get_queryset(request)
        cast = """
        CASE
          WHEN code GLOB '[0-9]*' THEN CAST(code AS INT)
          ELSE code
        END
        """
        qs = qs.extra(select={'casted_code': cast}).order_by('-casted_code')
        return qs
[1]A more complex example can be found here: http://stackoverflow.com/a/26055535/939007

Posted by wu at 11:07 | Comments (2) | Trackbacks (0)
<< Anxiety | Main | File locking in python >>
Comments
Re: Django, SQLite, GLOB, CAST and sorting

Nice post, had fun while reading :).

Posted by: r0sk at octubre 26,2015 23:34
Re: Django, SQLite, GLOB, CAST and sorting

I'm glad you enjoyed it! Thanks for reading.

Posted by: Wu at octubre 30,2015 12:53
Trackbacks
Please send trackback to:http://blog.e-shell.org/308/tbping
There are no trackbacks.
Post a comment