1
\$\begingroup\$

We have about 14G files and each is about 150k.

I'm writing a script to upload them to Azure Blob storage and would like to run an upload() function (each with own file to upload) in 5 threads.

Here is how I realized it and it looks like it works - but I still have doubts about how correct this code is.

...
class Loader:
 def __init__(self):
 self.account = 'accname'
 #self.container = 'userdata'
 self.container = 'bar1'
 key = 'DQ4***A=='
 self.base_url = 'http://' + self.account + '.blob.core.windows.net'
 self.blob = BlobService(account_name=self.account, account_key=key)
 def uploader(self, filepath, userfile, locker):
 print 'Uploading file: {}'.format(userfile)
 self.blob.put_block_blob_from_path(self.container,
 userfile,
 filepath,
 max_connections=1,
 max_retries=5,
 retry_wait=1.0)
 locker.release()
 def upload(self, path):
 print('Upload files from {} to {}'.format(path, self.base_url))
 locker = threading.BoundedSemaphore(5)
 for root, dirs, files in os.walk(path):
 print root
 for userfile in files:
 #print 'Uploading {}'.format(os.path.join(root, userfile))
 locker.acquire()
 t = threading.Thread(target=self.uploader, args=(os.path.join(root, userfile), userfile, locker))
 t.start()
...

And script in action:

$ ./storage_upload.py -u --path /tmp/bartest/
Upload files from /tmp/bartest/ to http://accname.blob.core.windows.net
/tmp/bartest/
Uploading file: 15.txt
Uploading file: 12.txt
Uploading file: 7.txt
Uploading file: 13.txt
Uploading file: 19.txt
ferada
11.4k25 silver badges65 bronze badges
asked Apr 4, 2016 at 8:51
\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

Looks okay if a bit sloppy with different formatting for some print x vs. print(x) calls (the latter being preferred really); you probably should also use new-style classes, i.e. class Loader(object):.

Other than that the main concern I'd have is that the semaphore should be protected against exceptions. This is mostly a concern for bigger scripts, but it's a good habit regardless. Thus, the release method should be called regardless of whether an exception was raised by anything else in the thread - otherwise the program could just get stuck there, which is fine for a one-off script probably.

You should probably also check whether the threading actually does improve the throughput, considering Pythons global interpreter lock.

answered Apr 4, 2016 at 10:44
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.