I have a list of groups that I would like to categorize based on a few criteria. An element can satisfy multiple criteria and thus end up in multiple categories.
I created this function to split them up:
def categorize_multi(groups: Iterable[T], categories: list[Callable[[T], bool]]) -> tuple[Iterator[T], ...]:
    '''Split a list into multiple categories. Elements of the iterable can be in multiple categories if they satisfy multiple categories.
    '''
    return tuple(filter(category, it) for category, it in zip(categories, it.tee(groups, len(categories))))
Here is an example of how I use it:
group_center_in_defend_zone = lambda group: group.center.distance_to(self.natural_defend_midpoint) < 30 and self.ai.pathing_manager.influence_maps[IMType.ZONES][group.center] not in {0,1}
group_air_only = lambda group: group.power.air_power > 0 and group.power.ground_power == 0
enemy_power_outside_natural, air_only_harass_groups_near_bases = categorize_multi(self.ai.combat.enemy_groups, [group_center_in_defend_zone, group_air_only])
`categorize_multi` always returns a tuple whose length equals the number of categories.
I am using it within code for a bot which plays Starcraft 2, so I am looking to minimize the time it takes to categorize the enemy units into the various groupings I require.
Are there ways I could improve its performance?
- Have you considered using pandas? (Reinderien, Jan 17, 2025 at 12:34)
- Why are these all iterators? Are you using all groups? Is the input any iterable? (Bharel, Feb 27, 2025 at 12:15)
- How are you using the returned iterators? (no comment, Mar 15, 2025 at 20:56)
2 Answers
missing Review Context
My reading of the code is that you neglected to prepend these lines to the OP source.
import itertools as it
from typing import Callable, TypeVar
from collections.abc import Iterable, Iterator
T = TypeVar("T")
lambda
This is just silly.
group_air_only = lambda group: group.power.air_power > 0 and group.power.ground_power == 0
Occasionally we find a short anonymous Callable convenient. But usually we're better off giving such a function a descriptive name, maybe annotating it, or adding a docstring.
What you meant to write was
def group_air_only(group):
    return group.power.air_power > 0 and group.power.ground_power == 0
Similarly for `group_center_in_defend_zone`.
could improve this for performance?
You're already taking advantage of short-circuiting `and`.
For an expression `x and y`, we "win", we can be "lazy", when the operand evaluated first is False, since the other operand is then skipped.
Evaluating `x`, which might need a function call, costs its average elapsed time \$\times 100\%\$.
If e.g. `x` is True just \$1\%\$ of the time, then evaluating `y` costs its average \$\times 1\%\$.
So depending on the percentages and the relative evaluation costs, rearranging as `y and x` may let us do less work.
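To make the ordering argument concrete, here is a minimal sketch that counts how often each operand is evaluated. The predicates `cheap_and_selective` and `costly` are hypothetical stand-ins, not the OP's group checks; the point is only that the right-hand operand runs solely when the left-hand one is True.

```python
# Count how often each side of `x and y` actually runs.
calls = {"cheap": 0, "costly": 0}


def cheap_and_selective(n: int) -> bool:
    """Hypothetical cheap test that is True for only 1% of the inputs."""
    calls["cheap"] += 1
    return n % 100 == 0


def costly(n: int) -> bool:
    """Hypothetical expensive test; it does some work and is always True."""
    calls["costly"] += 1
    return sum(range(n % 50)) >= 0


def count_matches(data) -> int:
    # The selective test goes first, so `costly` is skipped whenever it fails.
    return sum(1 for n in data if cheap_and_selective(n) and costly(n))


matches = count_matches(range(1000))
print(matches, calls)
```

With the operands swapped, `costly` would run for every input. Which order wins in the bot depends on measured costs and hit rates, so it is worth profiling before rearranging.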
"it" shadowing
import itertools as it
...
for category, it in zip(categories, it.tee( ... ))
...
Binding `it` to a module, and then to an iterator, is poor form.
Invent a new name.
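One possible renaming, as a sketch: the module is imported under its full name and the loop variable is called `copy`. Both names are my choice, not something from the OP code, and the sample predicates `is_even` and `is_large` are made up for the demonstration.

```python
import itertools
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def categorize_multi(
    groups: Iterable[T], categories: list[Callable[[T], bool]]
) -> tuple[Iterator[T], ...]:
    """Return one lazy filter per category; an element may appear in several."""
    # One independent copy of the input per category, so no name is rebound.
    copies = itertools.tee(groups, len(categories))
    return tuple(filter(category, copy) for category, copy in zip(categories, copies))


def is_even(n: int) -> bool:
    return n % 2 == 0


def is_large(n: int) -> bool:
    return n > 6


evens, large = categorize_multi(range(10), [is_even, is_large])
evens_list = list(evens)
large_list = list(large)
print(evens_list, large_list)
```

The behaviour is unchanged; only the names differ, so a reader no longer has to work out which `it` is meant.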
You could shorten your
return tuple(filter(category, it) for category, it in zip(categories, it.tee(groups, len(categories))))
with `map`:
return tuple(map(filter, categories, it.tee(groups, len(categories))))
Likely it'll also be slightly faster, but I doubt it's significant.
Unless you consume the resulting iterators in parallel, it might also be slightly faster and take slightly less memory to not use `tee` but just dump the groups into a tuple and give that to all filters:
return tuple(map(filter, categories, it.repeat(tuple(groups))))
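As a quick sanity check that the three variants select the same elements, here is a small sketch. The function names `via_genexpr`, `via_map_tee` and `via_map_tuple` and the sample predicates are made up for illustration, and the module is imported as `itertools` rather than `it`.

```python
import itertools


def via_genexpr(groups, categories):
    # Original approach: one tee copy per category, filtered in a genexpr.
    copies = itertools.tee(groups, len(categories))
    return tuple(filter(category, copy) for category, copy in zip(categories, copies))


def via_map_tee(groups, categories):
    # Same tee copies, but map pairs each category with its copy.
    return tuple(map(filter, categories, itertools.tee(groups, len(categories))))


def via_map_tuple(groups, categories):
    # Materialize the input once and hand the same tuple to every filter.
    return tuple(map(filter, categories, itertools.repeat(tuple(groups))))


def is_even(n):
    return n % 2 == 0


def is_large(n):
    return n > 6


results = [
    [list(selected) for selected in variant(range(10), [is_even, is_large])]
    for variant in (via_genexpr, via_map_tee, via_map_tuple)
]
print(results)
```

One difference to keep in mind: the tuple variant reads the whole input up front, so it only suits finite inputs, whereas the `tee` variants stay lazy until the returned iterators are consumed.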