I currently this code working, but its performance is very poor — 1.45 seconds seems a bit too much for a simple recursive if statement that only checks attribute values.
def _check_delete_status(self, obj) -> bool:
obj_name = obj._sa_class_manager.class_.__name__
self.visited.append(obj_name)
if getattr(obj, 'status', 'deleted').lower() != 'deleted':
children = [parent for parent in self._get_parents(self._get_table_by_table_name(obj_name)) if parent not in self.visited]
for child in children:
if (child_obj := getattr(obj, child, None)) and child not in self.visited:
if self._check_delete_status(child_obj):
return True
else:
return True
return False
Although self._get_parents
seems like a counterintuitive name (it's used elsewhere in the code), in this case it is still very useful to this solution: It returns a list with all possible attribute names that object might have as children. For example, an object named appointment
will have ['patient', 'schedule']
as response; of which patient
will have []
since it doesn't have any children, and schedule
will have ['physiotherapist', 'address', 'patient', 'service']
returned.
When those values are then used on getattr(object, child_name)
it returns the object corresponding to the child.
I tried to think on how to do this iteratively, but couldn't up come with any solutions.
PS: The reason for the self.visited
list, is that sometimes an object might have the exact same object nested inside, and since they have the same values they can be skipped.
EDIT: The "helper" methods:
def _get_table_by_table_name(self, table_name: str) -> Table:
return self._all_models[table_name]['table']
@staticmethod
def _get_parents(table: Table) -> set:
parents = set()
if table.foreign_keys:
for fk in table.foreign_keys:
parent_name = fk.column.table.name
parents.add(parent_name) if parent_name != table.name else None
return parents
-
1\$\begingroup\$ This is a class method - so please show the whole class. \$\endgroup\$Reinderien– Reinderien2021年07月04日 22:50:26 +00:00Commented Jul 4, 2021 at 22:50
-
\$\begingroup\$ @Reinderien The whole class is a bit too big, but I've included the other methods used. \$\endgroup\$Vinicius F.– Vinicius F.2021年07月04日 23:02:49 +00:00Commented Jul 4, 2021 at 23:02
2 Answers 2
Your code is improperly using lists.
children = [parent for parent in self._get_parents(self._get_table_by_table_name(obj_name)) if parent not in self.visited]
By using a list comprehension you have to generate every parent before you can validate a parent. Lets instead use say you want to know if 0 is in range(1_000_000)
. What your code would be doing is building a list of 1 million numbers before you check the first value is 0.
You should use a generator expression, or just use standard for
loops, to build children
so we can exit early. (Of course doing so would rely on self._get_parents
and self._get_table_by_table_name
not returning lists. Which I don't have access to, so cannot comment on.)
self.visited.append(obj_name)
parent not in self.visited
child not in self.visited
We know self.visited
is a list. So we know in
runs in \$O(n)\$ time. You want to instead use a set which runs in \$O(1)\$ time.
Ignoring changing self.visited
to a set you can simplify the code using any
and a generator expression.
def _check_delete_status(self, obj) -> bool:
obj_name = obj._sa_class_manager.class_.__name__
self.visited.append(obj_name)
if getattr(obj, 'status', 'deleted').lower() == 'deleted':
return True
else:
return any(
child not in self.visited
and (child_obj := getattr(obj, child, None))
and child not in self.visited
and self._check_delete_status(child_obj)
for child in self._get_parents(self._get_table_by_table_name(obj_name))
)
You should also notice you don't need child not in self.visited
twice.
-
\$\begingroup\$ While this does make the code way cleaner, it still took the same 1.45 seconds (even after changing
self.visited
to a set). Should I assume this happens for another reason? \$\endgroup\$Vinicius F.– Vinicius F.2021年07月04日 21:40:44 +00:00Commented Jul 4, 2021 at 21:40 -
\$\begingroup\$ Making a quick perf_counter shows this:
Now at patient: Time since start -> 0.0 seconds
Now at appointment: Time since start -> 0.0001 seconds
Now at schedule: Time since start -> 0.3194 seconds
Now at service: Time since start -> 0.5754 seconds
Now at organization: Time since start -> 0.8869 seconds
Now at physiotherapist: Time since start -> 1.312 seconds
Now at address: Time since start -> 1.5363 seconds
\$\endgroup\$Vinicius F.– Vinicius F.2021年07月04日 21:48:54 +00:00Commented Jul 4, 2021 at 21:48 -
1\$\begingroup\$ @ViniciusF. If you've made all the changes I suggested (including removing the duplicate
child not in self.visited
and ensuring both_get_parents
and_get_table_by_table_name
do not return lists) then I, or anyone else, can't improve the performance of your code as you have not provided the code which has the performance issue. \$\endgroup\$2021年07月04日 21:52:30 +00:00Commented Jul 4, 2021 at 21:52 -
1\$\begingroup\$ Yeah, thought so. I'm guessing this is a job for the cProfile then. But much appreciated for your help :D \$\endgroup\$Vinicius F.– Vinicius F.2021年07月04日 22:23:44 +00:00Commented Jul 4, 2021 at 22:23
I should've done this before, but I ran cProfile on my function and discovered that what was making it slow was totally unrelated to it being recursive or anything: SQLAlchemy was performing lazy-loading so it had to send some requests to get all relationships it needed which slowed my function down quite a bit.
After I added lazy='immediate'
to my mapper
parameters it ran smoothly under 0.0006 seconds.
Nevertheless, the code optimization suggested by Peilonrayz was quite useful and definitely implemented.