split list elements into multiple categories

Question 1

I have a list of groups that I would like to categorize based on a few criteria. An element can satisfy multiple criteria and thus end up in multiple categories.

I created this function to split them up:

def categorize_multi(groups:Iterable[T], categories:list[Callable[[T], bool]])->tuple[Iterator[T],...]:
 '''Split a list into multiple categories. Elements of the iterable can be in multiple categories if they satisfy multiple categories.
 '''
 return tuple(filter(category, it) for category, it in zip(categories, it.tee(groups, len(categories))))

Here is an example of how I use it:

group_center_in_defend_zone = lambda group: group.center.distance_to(self.natural_defend_midpoint) < 30 and self.ai.pathing_manager.influence_maps[IMType.ZONES][group.center] not in {0,1} 
group_air_only = lambda group: group.power.air_power > 0 and group.power.ground_power == 0
enemy_power_outside_natural, air_only_harass_groups_near_bases = categorize_multi(self.ai.combat.enemy_groups, [group_center_in_defend_zone, group_air_only])

categorize_multi always returns a tuple whose length equals the number of categories.

I am using it within code for a bot which plays Starcraft 2, so I am looking to minimize the time it takes to categorize the enemy units into the various groupings I require.

Are there ways I could improve its performance?

Question 2

Have you considered using pandas?

Question 3

Why are these all iterators? Are you using all groups? Is the input any iterable?

Question 4

How are you using the returned iterators?

J_H · Answer 1 · 2025-03-15 00:40:06Z

missing Review Context

My reading of the code is that you neglected to prepend these lines to the OP source.

import itertools as it
from typing import Callable, TypeVar
from collections.abc import Iterable, Iterator
T = TypeVar("T")

lambda

This is just silly.

group_air_only = lambda group: group.power.air_power > 0 and group.power.ground_power == 0

Occasionally we find a short anonymous Callable convenient. But usually we're better off giving them a descriptive name, maybe annotating, or adding a docstring.

What you meant to write was

def group_air_only(group):
 return group.power.air_power > 0 and group.power.ground_power == 0

Similarly for def group_center_in_defend_zone()
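A minimal sketch of the named, annotated form with a docstring. The Power and Group dataclasses are hypothetical stand-ins for the bot's real types; only the air_power and ground_power attributes come from the question.

```python
from dataclasses import dataclass


@dataclass
class Power:
    """Hypothetical stand-in for the bot's power summary."""
    air_power: float
    ground_power: float


@dataclass
class Group:
    """Hypothetical stand-in for an enemy group."""
    power: Power


def group_air_only(group: Group) -> bool:
    """Return True if the group has air power but no ground power."""
    return group.power.air_power > 0 and group.power.ground_power == 0


air = Group(Power(air_power=5.0, ground_power=0.0))
mixed = Group(Power(air_power=5.0, ground_power=2.0))
print(group_air_only(air), group_air_only(mixed))  # True False
```

Unlike the lambda, the function carries its own name in tracebacks and profiler output, which helps when hunting for slow predicates.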

could improve this for performance?

You're already taking advantage of short-circuiting and. For an expression x and y, we "win" (we can be "lazy") whenever the left operand is False, because the right operand is then never evaluated. Evaluating x, which might need a function call, costs its average elapsed time \$\times 100\%\$. If e.g. x is True just \$1\%\$ of the time, then evaluating y costs its average \$\times 1\%\$. So depending on the percentage and the relative eval costs, rearranging as y and x may let us do less work.
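To make the operand-order effect concrete, here is a small sketch that counts predicate calls. The two predicates and their hit rates are invented for illustration; the real costs would have to be measured in the bot.

```python
calls = {"cheap": 0, "costly": 0}


def cheap_rare(n: int) -> bool:
    """Cheap predicate that is True for only 1% of inputs."""
    calls["cheap"] += 1
    return n % 100 == 0


def costly_common(n: int) -> bool:
    """Stand-in for an expensive predicate that is True half the time."""
    calls["costly"] += 1
    return n % 2 == 0


# Rarely-true test first: the costly one runs only when the first passes.
hits_a = [n for n in range(1000) if cheap_rare(n) and costly_common(n)]
first_order = dict(calls)

calls.update(cheap=0, costly=0)  # reset the counters

# Reversed order: the costly test now runs for every element.
hits_b = [n for n in range(1000) if costly_common(n) and cheap_rare(n)]
second_order = dict(calls)

print(first_order)   # {'cheap': 1000, 'costly': 10}
print(second_order)  # {'cheap': 500, 'costly': 1000}
```

Both orders select the same elements; only the number of calls to each predicate changes.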

"it" shadowing

import itertools as it
 ...
 for category, it in zip(categories, it.tee( ... ))
 ...

Binding it to a module, and then to an iterator, is poor form. Invent a new name.
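One way to avoid the clash, sketched with the illustrative names copies and group_iter (any non-conflicting names would do, and importing the module as plain itertools is a choice made here, not something from the OP code):

```python
import itertools
from collections.abc import Iterable, Iterator
from typing import Callable, TypeVar

T = TypeVar("T")


def categorize_multi(
    groups: Iterable[T],
    categories: list[Callable[[T], bool]],
) -> tuple[Iterator[T], ...]:
    """Split an iterable into one filtered iterator per category."""
    # One independent iterator over the input per category.
    copies = itertools.tee(groups, len(categories))
    return tuple(
        filter(category, group_iter)
        for category, group_iter in zip(categories, copies)
    )


evens, small = categorize_multi(range(10), [lambda n: n % 2 == 0, lambda n: n < 3])
print(list(evens), list(small))  # [0, 2, 4, 6, 8] [0, 1, 2]
```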

no comment · Answer 2 · 2025-03-15 21:10:02Z

You could shorten your

 return tuple(filter(category, it) for category, it in zip(categories, it.tee(groups, len(categories))))

with map:

 return tuple(map(filter, categories, it.tee(groups, len(categories))))

Likely it'll also be slightly faster, but I doubt it's significant.

Unless you consume the resulting iterators in parallel, it might also be slightly faster and take slightly less memory to not use tee but just dump the groups into a tuple and give that to all filters:

 return tuple(map(filter, categories, it.repeat(tuple(groups))))
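As a quick sanity check, the sketch below runs both variants on made-up integer data standing in for the enemy groups; the function names with_tee and with_tuple are only for illustration. Note that the tuple variant consumes the input eagerly, while tee buffers items lazily as the iterators advance.

```python
import itertools as it

groups = [3, 8, 12, 15, 20]
categories = [lambda n: n % 2 == 0, lambda n: n > 10]


def with_tee(groups, categories):
    # One tee'd iterator per category.
    return tuple(map(filter, categories, it.tee(groups, len(categories))))


def with_tuple(groups, categories):
    # Materialize the input once and share the same tuple with every filter.
    return tuple(map(filter, categories, it.repeat(tuple(groups))))


tee_result = [list(found) for found in with_tee(groups, categories)]
tuple_result = [list(found) for found in with_tuple(groups, categories)]
print(tee_result)    # [[8, 12, 20], [12, 15, 20]]
print(tuple_result)  # [[8, 12, 20], [12, 15, 20]]
```

Which one is faster in practice depends on how the iterators are consumed, so timing both with timeit on representative data is the safer guide.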
