I'm bothered by my answer to this SO question.
I'm pretty sure it's more efficient than the sort-and-cull implementations, but I'm having trouble expressing that in a way that I trust.
Also, it's rather convoluted by python standards.
Is there a way to express the same process more clearly, without pages of comments?
The problem statement:
Given a subject list of strings, make a list containing every element of the subject that isn't a substring of a different element of the subject.
(削除) Original (削除ここまで) Current solution:
from itertools import groupby
from operator import itemgetter
from typing import Dict, Generator, Iterable, List, Union
# Exploded is a recursive data type representing a culled list of strings as a tree of character-by-character common prefixes. The leaves are the non-common suffixes.
Exploded = None
Exploded = Dict[str, Union[Exploded, str]]
def explode(subject:typing.Iterable[str])->Exploded:
heads_to_tails = dict()
for s in subject:
if s:
head = s[0]
tail = s[1:]
if head in heads_to_tails:
heads_to_tails[head].append(tail)
else:
heads_to_tails[head] = [tail]
return {
h: prune_or_follow(t)
for (h, t) in heads_to_tails.items()
}
def prune_or_follow(tails: List[str]) -> Union[Exploded, str]:
if 1 < len(tails):
return explode(tails)
else: #we just assume it's not empty.
return tails[0]
def implode(e: Exploded) -> Generator[str, None, None]:
for (head, continued) in e.items():
if isinstance(continued, str):
yield head + continued
else:
for tail in implode(continued):
yield head + tail
def cull(subject: List[str]) -> List[str]:
return list(implode(explode(subject)))
print(cull(['a','ab','ac','add']))
print(cull([ 'a boy ran' , 'green apples are worse' , 'a boy ran towards the mill' , ' this is another sentence ' , 'a boy ran towards the mill and fell']))
print(cull(['a', 'ab', 'ac', 'b', 'add']))
I know that it works for the (削除) two (削除ここまで) three test cases.
1 Answer 1
All in all this looks good already.
A few tips:
continue
If you change the test in explode
from if s:
to
if not s:
continue
You can save a level of indentation for the rest of the loop
collections.defaultdict
checking whether a key is in a dict, and adding it if it isn't is why there is collections.defaultdict
which can simplify explode
a lot:
def explode(subject: Iterable[str]) -> Exploded:
heads_to_tails = defaultdict(list)
for s in subject:
if not s:
continue
heads_to_tails[s[0]].append(s[1:])
return {h: prune_or_follow(t) for (h, t) in heads_to_tails.items()}
tuple unpacking
A matter of style, but in most situations where you use my_list[0]
and my_list[1:]
, you can use tuple unpacking. You can use this in prune_or_follow
def prune_or_follow(tails: List[str]) -> Union[Exploded, str]:
start, *rest = tails
if rest:
return explode(tails)
else:
return start
Whether this is more clear is a matter of taste.
This tuple unpacking does not work as clean in explode
because it converts the string into a list of 1-character strings.
Exploded
You can use a string in typing
so you can do:
Exploded = Dict[str, Union["Exploded", str]]
without the Exploded = None
cull
typing
cull can accept any iterable. For the result, I don't think the order of strings is important, so there is no need to limit you to a list
. The signature can then be def cull(subject: Iterable[str]) -> Collection[str]:
. Whether you return a list or a set is a matter of implementation.
If the result is just used for iteration, you can even forgo the list
call, and just return the implode
generator, and let the caller decide whether he needs in in a list, set or whatever data structure he wants.
implode
now you use e
as argument name. One-letter variable names are unclear in general, and to be avoided most of the time*. Since it is a kind of a tree, I would change this variable.
A slightly different take in implode
which doesn't use string concatenation but tuple unpacking:
def implode(tree: Exploded, previous=()) -> Generator[str, None, None]:
for root, branches in tree.items():
if isinstance(branches, str):
yield "".join((*previous, root, *branches))
else:
yield from implode(branches, *previous, root)
or equivalently:
def implode(tree: Exploded, previous=()) -> Generator[str, None, None]:
for root, branches in tree.items():
if isinstance(branches, str):
yield "".join((*previous, root, *branches))
else:
yield from implode(branches, (*previous, root))
The choice between your version and this is a matter of taste, but I wanted to present the different possibilities.
Another variation, is using try-except ArgumentError
instead of the isinstance
:
def implode_except(tree: Exploded, previous=()) -> Generator[str, None, None]:
for root, branches in tree.items():
try:
yield from implode(branches, (*previous, root))
except AttributeError:
yield "".join((*previous, root, *branches))
'* I find i
, j
etc acceptable for a counter or index, and x
and y
for coordinates, but in general I try not to use one-letter variable names.
-
\$\begingroup\$ Thanks; i've implemented most of this over at the parent question! \$\endgroup\$ShapeOfMatter– ShapeOfMatter2019年06月27日 13:47:49 +00:00Commented Jun 27, 2019 at 13:47
Explore related questions
See similar questions with these tags.
print(cull(['a', 'ab', 'ac', 'b', 'add']))
outputs['add', 'b']
.groupby()
does not sort it's input, so when the dictionary comprehension inexplode()
gets to 'add', it overwrites the entry for 'a' that was created when processing `['a', 'ab', 'ac']. \$\endgroup\$groupby
would sort, and why we'd care if it did. This isn't even the first time I've been hit by this. In absolute seriousness, who in hell publishes a function like that to a public library? \$\endgroup\$