Memcache get
and set
can hit a race condition if you use them together to achieve something like locking because it is not atomic. A Simple memcache
locking logic using get
and set
would look like:
def access(resource):
if memcache.get(resource):
return "Access Denied"
else:
memcache.set(resource, timestamp)
return "Access Granted and Lock Obtained"
If this code is running in multi-threaded web applications, you can have a scenario where two get
can happen together. Hence both will get the access the resource.
add
doesn't get into this problem because it just sets the key
only if the key
does not exists. It also makes only a single roundtrip and hence it is atomic by default.
The same API with add
would look like:
def access(resource):
if memcache.add(resource, timestamp):
return "Access Granted and Lock Obtained"
else:
return "Acess Denied"
Now, I am writing a script to simulate this race condition while using get
, set
and prove that using add
solves the concurrent access problem with this naive memcache
lock implementation. I am using files as my resource here. Multiple users can see the same file in the web application front-end and move it to another location if they get a lock
to that file. However, sometimes get
, set
race condition happens giving access to the file for two users leading to file not found
error for one of the users. Using add
doesn't lead to this problem.
The script uses threading
to simulate this scenario. If a file not found
error happens then its considered a failure. I have two variants of the move
function, one using get
, set
and another using add
which can be used in the run
method of the thread
to demonstrate both functionalities.
Can anyone review this and tell me if this code is fine? A high level logical review is also fine if it is hard to set this up in your machine. One thing I notice is that the ratio of error is pretty high, but is this because I am actually doing threading
which really is not random.
Pre-requisites: You need memcache
module and memcached
server running on port 5551
to run this. You also need a folder called archive
to move the files.
import memcache
import signal
import sys
import shutil
import os
import threading
import time
mc = memcache.Client(['127.0.0.1:5551'], debug=0)
cwd = os.getcwd()
error = 0
processed = 0
def touch(fname, times=None):
with open(fname, 'a'):
os.utime(fname, times)
def signal_handler(signal, frame):
print('You pressed Ctrl+C!')
sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
print("Ctrl-C exits")
def create_files():
for fname in range(1,11):
touch(str(fname) + ".txt")
def move_files_get_set():
global error
global processed
files = [file for file in os.listdir(".") if file.endswith(".txt")]
for file in files:
if os.path.isfile(file) and not mc.get(file):
processed = processed + 1
mc.set(file, int(round(time.time() * 1000)))
try:
shutil.move(os.path.join(cwd, file), os.path.join(cwd, "archive", "%s" % file))
except Exception as e:
print(e)
processed = processed - 1
error = error + 1
print("%s errors happened with %s" % (error, processed))
mc.set(file, None)
def move_files_add():
global error
global processed
files = [file for file in os.listdir(".") if file.endswith(".txt")]
for file in files:
if os.path.isfile(file) and mc.add(file, int(round(time.time() * 1000))):
processed = processed + 1
try:
shutil.move(os.path.join(cwd, file), os.path.join(cwd, "archive", "%s" % file))
except Exception as e:
print(e)
processed = processed - 1
error = error + 1
print("%s errors happened with %s" % (error, processed))
mc.set(file, None)
class myThread (threading.Thread):
def __init__(self, threadID, name, counter):
threading.Thread.__init__(self)
self.threadID = threadID
self.name = name
self.counter = counter
def run(self):
print "Starting " + self.name
move_files_get_set()
print "Exiting " + self.name
def trigger():
while True:
create_files()
t1 = myThread("1", "Thread 1", "1")
t2 = myThread("2", "Thread 2", "2")
t1.start()
t2.start()
t1.join()
t2.join()
1 Answer 1
You have a race condition!
But not just the one you expected. Let's take out the 'beef' of the algorithm, and make something simpler:
import threading
processed = 0
def move_files_get_set():
global processed
for val in xrange(100000):
processed = processed + 1
class myThread (threading.Thread):
def __init__(self, threadID, name, counter):
threading.Thread.__init__(self)
self.threadID = threadID
self.name = name
self.counter = counter
def run(self):
print "Starting " + self.name
move_files_get_set()
print "Exiting " + self.name
def trigger():
t1 = myThread("1", "Thread 1", "1")
t2 = myThread("2", "Thread 2", "2")
t1.start()
t2.start()
t1.join()
t2.join()
print processed
if __name__ == "__main__":
trigger()
What I did was basically take out everything related to memcache and files.
Here's the output:
$ python base.py
Starting Thread 1
Starting Thread 2
Exiting Thread 1
Exiting Thread 2
123418
$ python base.py
Starting Thread 1
Starting Thread 2
Exiting Thread 2
Exiting Thread 1
182550
Another race condition
In move_files_add
.
def move_files_add():
global error
global processed
files = [file for file in os.listdir(".") if file.endswith(".txt")]
for file in files:
if os.path.isfile(file) and mc.add(file, int(round(time.time() * 1000))):
processed = processed + 1
try:
shutil.move(os.path.join(cwd, file), os.path.join(cwd, "archive", "%s" % file))
except Exception as e:
print(e)
processed = processed - 1
error = error + 1
print("%s errors happened with %s" % (error, processed))
mc.set(file, None)
What can happen is:
* p1: os.path.isfile(file) -> true
* p2: os.path.isfile(file) -> true
* p1: mc.add(file, bla) -> true
* p1: shutil.move() -> done
* p1: mc.set(file, None) -> done
* p2: mc.add(file, bla) -> true
* p2: shutil.move() -> KABOOM!
Moving files the correct way
When doing files, and there is a risk of race conditions, (almost) always use the following pattern:
try:
os.rename(source, target)
except FileNotFoundError:
# Oops, too late!
Now, os.rename
has a problem, it only works on the same filesystem. So depending on your needs you could do something like:
try:
os.rename(source, tempfile_in_same_dir_as_source)
except FileNotFoundError:
# Oops, too late! No worries.
pass
else:
# Yes, it worked. Now we just need to make sure it's moved
# to the right target location.
shutil.move(tempfile_in_same_dir_as_source, target)
-
\$\begingroup\$ Thanks for analyzing this one.
Thread
race itself can't be avoided but that is to simulate the real time scenario of users. So I need that :-) The second point is valid, I can bump into that scenario. I have enhanced theadd
function in this review question codereview.stackexchange.com/questions/131258/…. Do you see any races there? Solving using try: except is easy way out, so I want to avoid try: except as a challenge. Is there a theoretical way to avoid races, is that possible etc. \$\endgroup\$Nishant– Nishant2016年06月07日 05:53:20 +00:00Commented Jun 7, 2016 at 5:53 -
\$\begingroup\$ I tried also looking at that one yesterday, but wemt to bed instead. Maybe I'll take a look later today. For now: why not consider compare-and-set. neopythonic.blogspot.nl/2011/08/… ? It's supposed to also be used for atomic mutations. \$\endgroup\$Sjoerd Job Postmus– Sjoerd Job Postmus2016年06月07日 05:57:27 +00:00Commented Jun 7, 2016 at 5:57
-
\$\begingroup\$ I took a look. But compare-and-set can be used to implement compare-and-swap, which gives you a better ability to report the key of the process doing the handling. \$\endgroup\$Sjoerd Job Postmus– Sjoerd Job Postmus2016年06月07日 06:09:28 +00:00Commented Jun 7, 2016 at 6:09
-
\$\begingroup\$ Read this as 3rd comment. Fixing a typo. Yeah, please have a look. I read that blog by GVR. It helped me understand race condition better but compare-and-set seemed to be for handling dirty state while doing get-set which different from my use case like say when you want to append that value to a new value etc. In my case
add
seemed to be betterv and the development team also pointed out some examples usingadd
. More than data the avalibility or not of the key seemed more important. \$\endgroup\$Nishant– Nishant2016年06月07日 07:41:08 +00:00Commented Jun 7, 2016 at 7:41 -
1\$\begingroup\$ If compare-and-set is not needed, it's not needed. I already added answer to the other question. \$\endgroup\$Sjoerd Job Postmus– Sjoerd Job Postmus2016年06月07日 07:46:41 +00:00Commented Jun 7, 2016 at 7:46
Explore related questions
See similar questions with these tags.