File locking in Python
So, let's keep up with the practical techie posts.
If you do software development, almost any kind of it, at some point you will hit this scenario: some code runs as a process in a cron job (or any other periodic scheduler). The crontab entry says the process has to run, for example, every 5 minutes. One day those 5 minutes are not enough for the process (which usually takes less than a minute) to finish... and in comes the next scheduled run of that same code.
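As an illustration, such a crontab entry could look like this (the script path here is made up):

```
# Run the job every 5 minutes (hypothetical script path)
*/5 * * * * /usr/bin/python /path/to/periodic_job.py
```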
"What could go wrong?" (TM)
Well, depending on the code, maybe nothing happens, maybe an ugly mess will turn a nice day into a nightmare or maybe you will get a call in the middle of the night urging you to fix it ASAP.
Probably there are a gazillion ways to fix it properly, and the fix will depend a lot on the code: what it does and how it was written to begin with.
In my case the solution seemed to be file locking, that is, creating a lock file that can be checked before running the process. If the file is locked, another process is already running, so the current process should not start executing that code.
I was working on some Python code (what else?), and Python itself comes with some handy utilities in the standard library to handle file locking: the fcntl module and especially fcntl.lockf. lockf has the benefit of being quite good at avoiding stale locks: if something goes wrong and the process holding the lock dies, the operating system releases the lock when the file is closed, so the lock file is not left in a locked state once the code that locked it is no longer running.
There are more options available, like zc.lockfile for example, but I usually prefer modules included in the standard library, unless an external package has big enough benefits over them. Feel free to take a look at those options through Google or PyPI.
Using lockf is really easy:
```python
import fcntl
import sys

lock_filename = '/tmp/sample-locking.lock'
lock_file = open(lock_filename, 'w')
try:
    fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    print('Cannot lock: ' + lock_filename)
    sys.exit(1)

print('Locked! Running code...')
quit = False
while quit is not True:
    quit = input('Press q to quit ')
    quit = str(quit) == 'q'
print('Bye!')
sys.exit(0)
```
This sample code tries to lock a file (/tmp/sample-locking.lock) and, if it can set the lock, it goes on executing some more code. In this case, a very simple loop waiting for a command to finish/quit (this is enough for the scope of this post, keeping the process running).
If you want to learn more about lockf, please refer to the official documentation: https://docs.python.org/3/library/fcntl.html#fcntl.lockf (Python 3) | https://docs.python.org/2/library/fcntl.html#fcntl.lockf (Python 2)
Now, if we run this code in a python interpreter:
```
$ python lockexample.py
Locked! Running code...
Press q to quit q
```
The process will be running until we press the key q. Now, try to run it in a new python interpreter:
```
$ python lockexample.py
Cannot lock: /tmp/sample-locking.lock
$
```
As the first process was still running, the second process cannot acquire/set the lock, and so it stops running, showing an error message to the user.
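For completeness: the lock is released automatically when the file is closed or the process exits, but it can also be released explicitly with fcntl.LOCK_UN. A minimal sketch (the file name is made up for this demo):

```python
import fcntl
import os

lock_filename = '/tmp/sample-unlock-demo.lock'  # hypothetical demo path
lock_file = open(lock_filename, 'w')

# Acquire an exclusive, non-blocking lock (raises IOError if already held)
fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)

# ... the code that must not overlap would run here ...

# Release the lock explicitly; closing the file would release it as well
fcntl.lockf(lock_file, fcntl.LOCK_UN)
lock_file.close()
os.remove(lock_filename)
print('Lock released')
```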
Note: This code should run fine with both Python 2.x and 3.x (on Python 2.x, use raw_input instead of input, since input evaluates what you type).
This is a very simple example, but I guess you got the point here.
Easy, right? Well, it took me some time writing code, testing, getting errors, and digging through code, docs and sample code out there until I got it right, because there are certain pitfalls you should be aware of.
I bet this line caught your eye:
```python
lock_file = open(lock_filename, 'w')
```
And maybe you thought "hey, why not just use 'with' to open the file?". Maybe you thought something like this would be better (more pythonic):
```python
with open(lock_filename, 'w') as lock_file:
    # Rest of the code goes indented here
```
That would work, but make sure all the code that needs the lock is properly indented inside that with block. Otherwise the locking will not work: the file gets closed as soon as execution leaves the with statement, and closing the file releases the lock.
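A sketch of how the with-based version would look, with the whole critical section inside the block (the file name is made up for this demo):

```python
import fcntl
import sys

lock_filename = '/tmp/sample-with-locking.lock'  # hypothetical demo path

with open(lock_filename, 'w') as lock_file:
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        print('Cannot lock: ' + lock_filename)
        sys.exit(1)
    print('Locked! Running code...')
    # Everything that needs the lock stays indented in this block;
    # the lock is released when the file is closed at the end of it.
print('Lock released')
```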
At some point, maybe you would like to reuse the locking code. If your codebase is big enough and this overlap problem happens somewhere else, maybe you would prefer to extract this code into a reusable piece, a function or a method.
Something like this:
```python
def lock(filename):
    lock_file = open(filename, 'w')
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        return False
    return True
```
Then, in your code, you could simply call it:
```python
import fcntl
import sys


def lock(filename):
    lock_file = open(filename, 'w')
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        return False
    return True


lock_filename = '/tmp/sample-locking.lock'
locked = lock(lock_filename)
if not locked:
    print('Cannot lock: ' + lock_filename)
    sys.exit(1)

print('Locked! Running code...')
quit = False
while quit is not True:
    quit = input('Press q to quit ')
    quit = str(quit) == 'q'
print('Bye!')
sys.exit(0)
```
But if you try this in a couple of Python interpreters, as I did before, you will notice that both runs acquire the lock. Once the lock function returns, nothing holds a reference to lock_file anymore, so Python's garbage collector closes the file, and closing the file releases the lock.
So, if you have to lock multiple files, you will have to repeat that piece of code throughout your codebase.
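That said, one possible workaround (a sketch, not something from the original code) is to make the helper return the open file object instead of a boolean, so the caller holds a reference and the lock stays alive for as long as that reference does:

```python
import fcntl


def lock(filename):
    """Try to lock filename.

    Return the open file object on success (the caller must keep a
    reference to it while the lock is needed), or None on failure.
    """
    lock_file = open(filename, 'w')
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        lock_file.close()
        return None
    return lock_file


lock_file = lock('/tmp/sample-reusable-locking.lock')  # keep this reference alive
if lock_file is None:
    print('Cannot lock')
else:
    print('Locked! Running code...')
```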
Finally, maybe you have this shiny code that runs some functions/methods in different threads... and maybe you feel the temptation to use this technique to prevent one thread from running some code while it is being run in another thread... It will not work.
Let's bring in a (very rough/simple) example:
```python
import fcntl
import threading


def task():
    lock_filename = '/tmp/sample-locking.lock'
    lock_file = open(lock_filename, 'w')
    try:
        fcntl.lockf(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        print('Cannot lock: ' + lock_filename)
        return False
    print('Locked! Running code...')
    quit = False
    while quit is not True:
        quit = input('Press q to quit ')
        quit = str(quit) == 'q'
    print('Bye!')
    return True


if __name__ == '__main__':
    print('creating threads')
    first_thread = threading.Thread(target=task, args=())
    second_thread = threading.Thread(target=task, args=())
    print('starting threads')
    first_thread.start()
    second_thread.start()
    print('joining threads')
    first_thread.join()
    second_thread.join()
    print('closing')
```
If you run this code, you will see both threads acquire the lock and the code is executed in both of them (fcntl locks are held by the process, not by individual threads, so the second lockf call from the same process succeeds):
```
$ python lockexample-same-code.py
creating threads
starting threads
joining threads
Locked! Running code...
Press q to quit Locked! Running code...
```
Pressing q once will end the execution of the first thread:
```
q
Bye!
Press q to quit
```
Then pressing q again will end the execution of the second thread.
If you need locking between different threads in the same process, take a look at the threading module in the official Python documentation: https://docs.python.org/3/library/threading.html
Especially the part about lock objects: https://docs.python.org/3/library/threading.html#lock-objects
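As a sketch of that approach (the names here are made up for illustration), a threading.Lock can play the same role within one process; acquire(False) is non-blocking, behaving like LOCK_NB by failing immediately instead of waiting:

```python
import threading

run_lock = threading.Lock()


def task(name):
    # acquire(False) does not block: give up instead of waiting for the lock
    if not run_lock.acquire(False):
        print(name + ': cannot lock')
        return
    try:
        print(name + ': locked! Running code...')
        # ... the code that must not overlap would run here ...
    finally:
        run_lock.release()


run_lock.acquire()  # simulate a task that already holds the lock
other = threading.Thread(target=task, args=('second',))
other.start()
other.join()        # the second task reports it cannot lock
run_lock.release()
task('first')       # now the lock is free and the task runs
```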
This article on thread synchronization mechanisms in Python would be helpful too.
Inspired by this post: http://linux.byexamples.com/archives/494/how-can-i-avoid-running-a-python-script-multiple-times-implement-file-locking/