Here I have class that supposed to take dictionaries of product objects and their articles (kind of unique identifier) from queue and insert or update them in database table.
Received dictionary looks like this:
products_and_articles= {'products':[],'articles':[]}
Please suggest any improvements I can make to this class.
class TableUpdater(threading.Thread):
def __init__(self, import_queue, model_for_update):
self.import_queue = import_queue
self.model_for_update = model_for_update
threading.Thread.__init__(self)
def run(self):
while True:
try:
products_and_articles = self.import_queue.get(block=True, timeout=20)
products_for_update = self.model_for_update.objects.filter(article__in=products_and_articles['articles'])
articles_for_update = [product.article for product in products_for_update]
articles_for_create = list(set(products_and_articles['articles']) - set(articles_for_update))
absent_products = [p for p in products_and_articles['products'] if p.article in articles_for_create]
products_for_create = []
for new_product in products_and_articles['products']:
for old_product in products_for_update:
if new_product.article == old_product.article:
for old_field in old_product._meta.get_all_field_names():
for new_field in new_product._meta.get_all_field_names():
if old_field == new_field and old_field != 'id' and old_field != 'slug':
setattr(old_product, old_field, getattr(new_product, old_field))
bulk_update(products_for_update)
if len(products_for_update) != len(products_and_articles['products']):
for new_product in absent_products:
old_product = self.model_for_update()
for old_field in old_product._meta.get_all_field_names():
for new_field in new_product._meta.get_all_field_names():
if old_field == new_field and old_field != 'id':
setattr(old_product, old_field, getattr(new_product, old_field))
products_for_create.append(old_product)
product_bulk_create(products_for_create, 0, articles_for_create)
except Queue.Empty:
print 'Queue is empty'
break
2 Answers 2
As mentioned by jcfollower, you have two sections that are deeply indented. These should be extracted into sub-methods. It will make the code easier to follow and not cause the text to bunch up at the right side of the screen.
More importantly, it is currently hard to tell what is actually happening with all the for loops and if clauses. Well named methods will provide context to the reader. When I'm looking at run()
, I'm not concerned with how each filed is updated. I just want to know the main tasks that are being done. Then, once I have a high level understanding, I can dive done into the implementation details to check if the updates are done correctly and efficiently1.
Specify the queue read timeout as a constant instead of a magic number. If you find the the value needs to be tuned, it will be easier to find.
self.model_for_update.objects.filter(article__in=products_and_articles['articles'])
Is the double underscore in the argument name intentional? Python has a convention for when double underscores are used and separating words in variable names is very different from the convention.
Does product_bulk_create()
put some restriction on articles_for_create
such that it must be a list? Unless that is the case, using the set
returned by the set subtraction should be sufficient.
products_and_articles
is a odd name for a variable. You never have an operation that directly uses the whole dictionary. You are just passing references of the respective lists. Instead, you can replace the dictionary with a tuple and unpack the values directly into variables.
products, articles = self.import_queue.get(block=True, timeout=20)
1 For loops that are nested 3 and 4 deep are flags that there might be an inefficient algorithm.
Instead of:
for new_product in products_and_articles['products']:
for old_product in products_for_update:
if new_product.article == old_product.article:
for old_field in old_product._meta.get_all_field_names():
for new_field in new_product._meta.get_all_field_names():
if old_field == new_field and old_field != 'id' and old_field != 'slug':
setattr(old_product, old_field, getattr(new_product, old_field))
try:
for new_product in products_and_articles['products']:
for old_product in products_for_update:
if new_product.article != old_product.article:
continue
for old_field in old_product._meta.get_all_field_names():
for new_field in new_product._meta.get_all_field_names():
if old_field == new_field and old_field != 'id' and old_field != 'slug':
setattr(old_product, old_field, getattr(new_product, old_field))
Then consider making a dispatch by making a dictionary:
article_to_product_for_update = {product.article: product for product in products_for_update}
although it might need to instead be:
article_to_product_for_update = defaultdict(list)
for product in products_for_update:
article_to_product_for_update[product.article].append(product)
and then your loop can be:
for new_product in products_and_articles['products']:
old_product = article_to_product_for_update[new_product.article]
for old_field in old_product._meta.get_all_field_names():
...
or
for new_product in products_and_articles['products']:
for old_product in article_to_product_for_update[new_product.article]:
for old_field in old_product._meta.get_all_field_names():
...
Then consider the inner part:
for old_field in old_product._meta.get_all_field_names():
for new_field in new_product._meta.get_all_field_names():
if old_field == new_field and old_field != 'id' and old_field != 'slug':
setattr(old_product, old_field, getattr(new_product, old_field))
Firstly this should be
for old_field in old_product._meta.get_all_field_names():
if old_field in {'id', 'slug'}:
continue
for new_field in new_product._meta.get_all_field_names():
if old_field == new_field:
setattr(old_product, old_field, getattr(new_product, old_field))
And then you can do the same transformation with a dictionary to get:
for old_field in old_product._meta.get_all_field_names():
if old_field in {'id', 'slug'}:
continue
new_field = field_to_new_product[old_field]
setattr(old_product, old_field, getattr(new_product, old_field))
Which gives the much flatter loop of:
for new_product in products_and_articles['products']:
old_product = article_to_product_for_update[new_product.article]
for old_field in old_product._meta.get_all_field_names():
if old_field in {'id', 'slug'}:
continue
new_field = field_to_new_product[old_field]
setattr(old_product, old_field, getattr(new_product, old_field))
Try the same tricks elsewhere, too.
-
2\$\begingroup\$
new_field
is not used in the last two snippets. It is quite redundant even in the original, as one could useif hasattr(new_product, old_field)
ortry except AttributeError
instead. \$\endgroup\$Janne Karila– Janne Karila2014年12月19日 06:56:07 +00:00Commented Dec 19, 2014 at 6:56 -
\$\begingroup\$ @JanneKarila Good eye! I'll wait for micgeronimo to chime in about what the intended purpose of the code is, because it confuses me. \$\endgroup\$Veedrac– Veedrac2014年12月19日 07:04:04 +00:00Commented Dec 19, 2014 at 7:04
-
\$\begingroup\$ @Veedrac I have e-commerce site of electronical goods. There are millions of products from about 20 of suppliers. This products being received by different ways, either through APIs of suppliers or simply scraping their sites. Obviously I need to update information about products, because price,amount or anything else can change frequently. New products my appear.I need something that will update/create information. Now I think about separating update and insert to individual classes and also use some kind of mutex(threading.Lock). Will implement it today and hope it will be more stable \$\endgroup\$micgeronimo– micgeronimo2014年12月19日 09:30:52 +00:00Commented Dec 19, 2014 at 9:30
-
\$\begingroup\$ @Veedrac would like to thank you for your answer, I've got a lot of useful stuff from it :) Actually I'm very new to Django and programming at all(just 3 month doing this) so it is really very helpful for me \$\endgroup\$micgeronimo– micgeronimo2014年12月19日 09:32:08 +00:00Commented Dec 19, 2014 at 9:32
-
\$\begingroup\$ @micgeronimo My question was meant on a smaller scale; what is this particular loop trying to do? (I suggest you add these response to the question for other readers.) \$\endgroup\$Veedrac– Veedrac2014年12月19日 15:08:00 +00:00Commented Dec 19, 2014 at 15:08
Explore related questions
See similar questions with these tags.
for
loops to separate functions and then call those functions from your while loop. \$\endgroup\$