29 October 2015

File locking in python

... or how to prevent (periodic) processes from overlapping

So, let's keep up with the practical techie posts.

If you do software development, almost any kind of it, at some point you will find yourself in this scenario: you are running some code from a cron job (or any kind of periodic scheduler). The crontab entry says the process has to run, for example, every 5 minutes. One day, 5 minutes is not enough for that process (which usually takes less than a minute) to finish... and there goes the next call to run that code.

"What could go wrong?" (TM)

Well, depending on the code, maybe nothing happens, maybe an ugly mess will turn a nice day into a nightmare or maybe you will get a call in the middle of the night urging you to fix it ASAP.

Probably there will be a gazillion ways to fix it properly, and the fix will depend a lot on the code, what it does and how it was written to begin with.

In my case the solution seemed to be file locking, that is, create a locked file that could be checked before running the process. If the file is locked, that means another process is already running, so the current process should not start executing that code.

I was working on some python code (what else?), and python itself comes with some handy libraries and utilities to handle file locking: fcntl and, especially, fcntl.lockf. lockf has the benefit of being quite good at avoiding stale locks (that is, something goes wrong with the process running the locking code and the lock file is left in a locked state, even if the code that locked it is not running anymore). The lock belongs to the process, so the operating system releases it as soon as the process exits or the file is closed.

There are more options available, like zc.lockfile for example, but I usually prefer modules/code included in the standard library, unless the external package has too many benefits over it. Feel free to take a look at those options through Google or pypi.

Using lockf is really easy:

import fcntl
import sys

lock_filename = '/tmp/sample-locking.lock'
lock_file = open(lock_filename, 'w')

try:
    fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    print('Cannot lock: ' + lock_filename)
    sys.exit(1)

print('Locked! Running code...')

quit = False
while quit is not True:
    quit = input('Press q to quit ')
    quit = str(quit) == 'q'

print('Bye!')
sys.exit(0)

This sample code tries to lock a file (/tmp/sample-locking.lock) and, if it can set the lock, it goes on executing some more code. In this case, a very simple loop waiting for a command to finish/quit (this is enough for the scope of this post, keeping the process running).

If you want to learn more about lockf, please refer to the official documentation here: https://docs.python.org/3/library/fcntl.html#fcntl.lockf | https://docs.python.org/2/library/fcntl.html#fcntl.lockf

Now, if we run this code in a python interpreter:

$ python lockexample.py
Locked! Running code...
Press q to quit q

The process will be running until we press the key q. Now, try to run it in a new python interpreter:

$ python lockexample.py
Cannot lock: /tmp/sample-locking.lock
$

As the first process was still running, the second process cannot acquire/set the lock, and so it stops running, showing an error message to the user.

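Note: The locking part of this code runs fine with both python 2.x and 3.x. On python 2.x, replace input with raw_input, because there input tries to evaluate whatever you type.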

This is a very simple example, but I guess you got the point here.
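The two-terminal test above can also be scripted. What follows is only a rough sketch, under some assumptions of mine: python 3 on a Unix-like system, a throwaway lock file in a temporary directory, and a helper called try_lock that is not part of the original example. A child process takes the lock, the parent checks that it cannot get it, then the child is killed (to simulate a crash) and the parent tries again.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Hypothetical lock file, created in a throwaway directory for this demo.
lock_filename = os.path.join(tempfile.mkdtemp(), 'sample-locking.lock')

# Code run by the child process: take the lock, say so, then just sleep.
child_code = """
import fcntl, sys, time
lock_file = open(sys.argv[1], 'w')
fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
print('locked')
sys.stdout.flush()
time.sleep(60)
"""


def try_lock(filename):
    """Return the open (locked) file, or None if someone else holds the lock."""
    lock_file = open(filename, 'w')
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # IOError is an alias of OSError on python 3
        lock_file.close()
        return None
    return lock_file


child = subprocess.Popen([sys.executable, '-c', child_code, lock_filename],
                         stdout=subprocess.PIPE)
child.stdout.readline()  # blocks until the child reports it holds the lock

while_child_runs = try_lock(lock_filename)  # the child still holds the lock
child.kill()                                # simulate a crash
child.wait()
child.stdout.close()
after_child_died = try_lock(lock_filename)  # nobody holds the lock anymore

print('While the child runs:',
      'locked by us' if while_child_runs else 'cannot lock')
print('After killing it:',
      'locked by us' if after_child_died else 'cannot lock')
```

The first attempt should fail and the second one should succeed, without anybody having to clean up the lock file by hand.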

Pitfalls!

Easy, right? Well, it took me some time of writing code, testing, getting errors, looking through code, docs and sample code out there... until I got it right, because there are certain pitfalls you should be aware of.

Opening the lock file

I bet this line caught your eye:

lock_file = open(lock_filename, 'w')

And maybe you thought "hey, why not just use 'with' to open the file?". Maybe you thought something like this would be better (more pythonic):

with open(lock_filename, 'w') as lock_file:
    # Rest of the code goes indented here

That works, but make sure all the code that needs the lock is indented inside that with block. The file is closed as soon as execution leaves the with statement, and closing the file releases the lock.
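A minimal sketch of the with version could look like this. The names run_job and do_work are made up for the example (do_work stands for whatever your real job does), and the lock file lives in a temporary directory just to keep the sketch self-contained.

```python
import fcntl
import os
import tempfile


def do_work():
    """Hypothetical placeholder for the code that must not overlap."""
    return 'done'


def run_job(lock_filename):
    """Run do_work() under the lock; return None if the lock is taken."""
    with open(lock_filename, 'w') as lock_file:
        try:
            fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:  # IOError is an alias of OSError on python 3
            return None
        # Everything that needs the lock stays inside the with block.
        result = do_work()
    # From here on the file is closed and the lock is gone.
    return result


lock_filename = os.path.join(tempfile.mkdtemp(), 'sample-locking.lock')
job_result = run_job(lock_filename)
print(job_result)
```

Anything written after the with block (at the same indentation level as the with itself) runs without the lock.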

Refactoring code

At some point you may want to reuse the locking code. If your codebase is big enough, and this overlap problem shows up somewhere else, you will probably prefer to move this code into a reusable piece: a function or a method.

Something like this [1]:

def lock(filename):
    lock_file = open(filename, 'w')
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        return False
    return True

Then, in your code, you could simply call it:

import fcntl
import sys

def lock(filename):
    lock_file = open(filename, 'w')
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        return False
    return True

lock_filename = '/tmp/sample-locking.lock'
locked = lock(lock_filename)

if not locked:
    print('Cannot lock: ' + lock_filename)
    sys.exit(1)

print('Locked! Running code...')

quit = False
while quit is not True:
    quit = input('Press q to quit ')
    quit = str(quit) == 'q'

print('Bye!')
sys.exit(0)

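But if you try this in a couple of python interpreters, as I did before, you will notice both runs will acquire the lock. This is because lock_file is a local variable of the lock function: once the function returns, nothing references the file object anymore, so python cleans it up, closing the file and releasing the lock with it.

So, a reusable function has to hand the open file back to its caller (or keep a reference to it somewhere that outlives the call), and the caller has to keep it around for as long as the lock is needed. Otherwise you will end up repeating the inline locking code all over your codebase.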
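One way around this pitfall, as a sketch and not the only option: return the open file instead of True, and let the caller keep it around. The is_held helper below is something I made up for this demo; it asks a separate python process to try the same lock, which is how we can tell from the outside whether the lock is really being held. Python 3 on a Unix-like system is assumed.

```python
import fcntl
import os
import subprocess
import sys
import tempfile


def lock(filename):
    """Return the open, locked file on success, or None if already locked."""
    lock_file = open(filename, 'w')
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # IOError is an alias of OSError on python 3
        lock_file.close()
        return None
    return lock_file


# Code for the probe process: exit with status 3 if the file is locked.
probe_code = """
import fcntl, sys
probe_file = open(sys.argv[1], 'a')
try:
    fcntl.lockf(probe_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:
    sys.exit(3)
"""


def is_held(filename):
    """True if a separate process cannot get the lock on filename."""
    status = subprocess.call([sys.executable, '-c', probe_code, filename])
    return status == 3


# Hypothetical lock file, created in a throwaway directory for this demo.
lock_filename = os.path.join(tempfile.mkdtemp(), 'sample-locking.lock')

lock_file = lock(lock_filename)  # keep this reference while working
held_while_referenced = is_held(lock_filename)

lock_file.close()                # done: closing the file releases the lock
held_after_close = is_held(lock_filename)

print(held_while_referenced, held_after_close)
```

As long as lock_file is referenced (and not closed), other processes cannot get the lock. The caller decides when to let it go.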

Locking inside a single process (threading)

Finally, maybe you have this shiny code that runs some functions/methods in different threads... and maybe you feel the temptation of using this technique to prevent one thread from running some code while it is being run in another thread... It will not work: these locks belong to the process, not to the thread, so a second lock request coming from the same process simply succeeds.

Let's bring in an example (very rough/simple example):

import fcntl
import threading

def task():
    lock_filename = '/tmp/sample-locking.lock'
    lock_file = open(lock_filename, 'w')
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        print('Cannot lock: ' + lock_filename)
        return False
    print('Locked! Running code...')
    quit = False
    while quit is not True:
        quit = input('Press q to quit ')
        quit = str(quit) == 'q'
    print('Bye!')
    return True

if __name__ == '__main__':
    print('creating threads')
    first_thread = threading.Thread(target=task, args=())
    second_thread = threading.Thread(target=task, args=())

    print('starting threads')
    first_thread.start()
    second_thread.start()

    print('joining threads')
    first_thread.join()
    second_thread.join()

    print('closing')

If you run this code in a python interpreter, you will see both threads report the file as locked and the code is executed in both of them:

$ python lockexample-same-code.py
creating threads
starting threads
joining threads
 Locked! Running code...
Press q to quit Locked! Running code...

Pressing q once will end the execution of the first thread:

q
Bye!
Press q to quit

Then pressing q again will end the execution of the second thread.
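The same behaviour can be seen without threads or keyboard input. In this small sketch (python 3 on a Unix-like system assumed, throwaway lock file, try_lock being just a helper name I picked), a single process opens the lock file twice and asks for the lock on both file objects.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file, created in a throwaway directory for this demo.
lock_filename = os.path.join(tempfile.mkdtemp(), 'sample-locking.lock')


def try_lock(lock_file):
    """Return True if the lock could be set on this open file."""
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # IOError is an alias of OSError on python 3
        return False
    return True


# Two independent file objects for the same file, in the same process.
first_file = open(lock_filename, 'w')
second_file = open(lock_filename, 'w')

first_locked = try_lock(first_file)
second_locked = try_lock(second_file)  # same process, so this works too

print(first_locked, second_locked)

first_file.close()
second_file.close()
```

Both calls succeed, because from the operating system's point of view it is the same process asking for a lock it already owns.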

If you need locking between different threads in the same process, take a look at this in the official python documentation:

https://docs.python.org/2/library/threading.html

Especially the part about locking objects:

https://docs.python.org/2/library/threading.html#lock-objects

This article on Thread synchronization mechanisms in python would be helpful too:

http://effbot.org/zone/thread-synchronization.htm
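As a rough idea of what that looks like, here is the threads example reworked with a threading.Lock instead of a lock file (python 3 assumed). The two Event objects are not needed for the locking itself; I only added them so the sketch runs without keyboard input and always in the same order: the first thread takes the lock and holds it while the second one tries.

```python
import threading

task_lock = threading.Lock()
results = []


def task(name, holding, may_finish):
    # Non-blocking acquire: give up right away if another thread has it.
    if not task_lock.acquire(blocking=False):
        results.append((name, 'cannot lock'))
        return
    try:
        results.append((name, 'locked'))
        holding.set()      # tell the main thread we hold the lock
        may_finish.wait()  # stand-in for the real work
    finally:
        task_lock.release()


holding = threading.Event()
may_finish = threading.Event()
first_thread = threading.Thread(target=task,
                                args=('first', holding, may_finish))
second_thread = threading.Thread(target=task,
                                 args=('second', holding, may_finish))

first_thread.start()
holding.wait()        # wait until the first thread holds the lock
second_thread.start()
second_thread.join()  # the second thread gives up immediately
may_finish.set()      # let the first thread finish
first_thread.join()

print(results)
```

This time only the first thread gets to run the protected code, and the second one reports it cannot lock.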

[1] Inspired by this post: http://linux.byexamples.com/archives/494/how-can-i-avoid-running-a-python-script-multiple-times-implement-file-locking/

Posted by wu at 09:35
Comments
Re: File locking in python

Update: As part of some testing, I pulled the power plug from a box running that locking code (while the code was running). After the reboot, the file was unlocked.

This means the lock is released not only when the process running that code gets killed (for example). Even after something like a sudden power failure, you will be able to re-run the code after a reboot without stale locks getting in the way.

Posted by: Wu at October 30, 2015 11:36