I'm adding three attributes to a Django QuerySet that represents a hierarchical tree structure:
- a
display_name
which is added to all objects - the id of the object's parent (or
immediate_parent
) which is added to all objects based on a condition - a boolean
is_parent
field that should be added if and only if the object has children in the hierarchy.
The first two proved to be pretty simple to add while iterating over the QuerySet. The third proved more difficult because I was trying to modify an element outside of the one in the iteration. I tried to modify using both the .get()
and the .filter()
methods from the QuerySet API, but it didn't work to modify the original object. You can read more about that in the original stack overflow question.
Eventually I came up with this solution that involves iterating over the QuerySet 3 times:
causelist_data = CauseHierarchy.objects.filter(cause_set_version_id=cause_set_version_id).extra(order_by=['sort_order']).prefetch_related('cause_id','parent_id')
for c in causelist_data:
c.display_name = c.cause_id.cause_name + " (" + c.cause_id.acause + ")"
if len(c.path_to_top_parent.split(',')) == 1:
c.immediate_parent = c.path_to_top_parent.split(',')[-1]
else:
c.immediate_parent = c.path_to_top_parent.split(',')[-2]
parents = []
for c in causelist_data:
parents.append(int(c.immediate_parent))
for c in causelist_data:
if int(c.cause_id_id) in parents:
c.is_parent = True
This is pretty slow. Is there any way I can modify the parent
object in the original iteration? (Any other suggestions welcome as well)
1 Answer 1
Right now you are iterating over the queryset three times in order to get set the immediate_parent
and later set is_parent
. While it doesn't look like you can cut this down to a single iteration, you can definitely cut it down to two.
Removing the extra iteration
Right now you are iterating over your queryset three times
- Set the
immediate_parent
on the object - Get a list of all of the parents
- Set
is_parent
on the object
Which works, but you can actually combine the first two iterations into a single one. This is because getting the list of the parents does not require any extra knowledge that comes after the first iteration. So you end up with
- Set
immediate_parent
and build a list of parent ids - Set
is_parent
on the object if it's a parent
Or in code form
causelist_data = CauseHierarchy.objects.filter(cause_set_version_id=cause_set_version_id).extra(order_by=['sort_order']).prefetch_related('cause_id','parent_id')
parents = []
for c in causelist_data:
c.display_name = c.cause_id.cause_name + " (" + c.cause_id.acause + ")"
if len(c.path_to_top_parent.split(',')) == 1:
c.immediate_parent = c.path_to_top_parent.split(',')[-1]
else:
c.immediate_parent = c.path_to_top_parent.split(',')[-2]
parents.append(int(c.immediate_parent))
for c in causelist_data:
if int(c.cause_id_id) in parents:
c.is_parent = True
Use sets where the list can only have unique values
Right now you are using a list for parents
, so each time you append
the parent, it gets added to the list. This means that there can be multiple ids in the list for a single parent, which doesn't make sense when we are just using it to check for the existence of the id.
You can use a set
instead. This has the benefit of having a faster lookup time, so the second iteration won't take as long.
parents = set()
for c in causelist_data:
parents.add(int(c.immediate_parent))
Use in
to check if a character is in a string
Right now in your first iteration you are looking for a ,
in the path_to_top_parent
in order to determine if there is only a single parent in the path of now. But you're doing this by splitting the string on ,
and checking that there is only one part (so there wasn't a ,
).
if ',' in c.path_to_top_parent:
Is the same as
if len(c.path_to_top_parent.split(',')) > 1:
But it has the added benefit of making it clear why you are branching there.
Putting it all together
causelist_data = CauseHierarchy.objects.filter(cause_set_version_id=cause_set_version_id).extra(order_by=['sort_order']).prefetch_related('cause_id','parent_id')
parents = set()
for cause in causelist_data:
cause.display_name = cause.cause_id.cause_name + " (" + cause.cause_id.acause + ")"
if ',' in cause.path_to_top_parent:
cause.immediate_parent = cause.path_to_top_parent.split(',')[-2]
else:
cause.immediate_parent = cause.path_to_top_parent.split(',')[-1]
parents.add(int(cause.immediate_parent))
for cause in causelist_data:
if int(cause.cause_id_id) in parents:
cause.is_parent = True