In order to provide a better output of sets with many consecutive elements (a similar class could be written for a list
or tuple
, but in this case I stay with a set
), I wrote the following class, derived from a set
.
However, some inspiration has been found in the itertools
and operator
package documentation.
I'm using the code to output ports assigned to a device under test in a test system.
import itertools
import operator
class PrettySet(set[int]):
""" Subclassed set which redefines __str__ operator so that consecutive elements are compressed in the
formatted string """
@staticmethod
def _seg_to_str(segment: list):
return f"{segment[0]}..{segment[-1]}" if len(segment) > 2 else ", ".join(map(str, segment))
def __str__(self):
groups = itertools.groupby(enumerate(sorted(self)), lambda x: x[1] - x[0])
segments = [list(map(operator.itemgetter(1), g)) for _, g in groups]
return "{" + ", ".join(map(self._seg_to_str, segments)) + "}"
Example output:
> data = {1, 2, 3, 6, 7, 8, 10, 11, 14, 15, 16, 18, 20, 21, 22, 23, 24, 25}
> print(PrettySet(data))
> {1..3, 6..8, 10, 11, 14..16, 18, 20..25}
Please review my code in terms of
- readability
- usability
- size
What would you do differently? Thank you!
3 Answers 3
Stop Writing Classes
See Stop writing classes by Jack Diederich. Short summary: if you instantiate a class, use the class once, and then throw it away, (eg, print(PrettySet(data))
) you should just write a function.
def pretty_set(data: set[int]) -> str:
"""Create a string representation of a set of integers, with the integers
in increasing order, and consecutive integers represented by a start value,
two dots, and the end value.
>>> pretty_set({1, 5, 12, 2, 3, 13, 6, 7})
'{1..3, 5..7, 12, 13}'
"""
... implementation here ...
Minor nit
The empty set will return the string {}
which appears to be an empty dictionary, not an empty set.
Readability
Your code needs blank lines before method definitions.
You need to use type hints consistently. In def _seg_to_str(segment: list):
, segment should be list[int]
, and the return type should be -> str
.
groups = itertools.groupby(enumerate(sorted(self)), lambda x: x[1] - x[0])
is very dense and opaque. It returns something like ((1, [(0, 1), (1, 2), (2, 3)]), (3, [(3, 6), (4, 7), (5, 8)]), (4, [(6, 10), (7, 11)]), (6, [(8, 14), (9, 15), (10, 16)]), (7, [(11, 18)]), (8, [(12, 20), (13, 21), (14, 22), (15, 23), (16, 24), (17, 25)]))
, which is an iterable of tuples of an int and a list of tuples of two ints. No comment in sight as to what those mean, so maintainability is low.
Next, you have [list(map(operator.itemgetter(1), g)) for _, g in groups]
, which is making a nested list of lists ... plus for _, g in groups
is extracting the second item of elements of groups, and operator.itemgetter(1)
is also extracting the second item of something. Again, no comment in sight.
Different implementation
The segments = [list(...) for ...]
statement could be turned into a generator to prevent unnecessary construction of the outer loop, but the inner lists all get unnecessarily created. A lighter weight implementation could be made with a function that takes elements one-at-a-time, collect sequential elements, and yield
string of either single elements or ranges. No tuples or lists need to be constructed.
Perhaps:
def pretty_set(data: set[int]) -> str:
"""Create a string representation of a set of integers, with the integers
in increasing order, and consecutive integers represented by a start value,
two dots, and the end value.
>>> pretty_set({1, 5, 12, 2, 3, 13, 6, 7})
'{1..3, 5..7, 12, 13}'
"""
def segments():
it = iter(sorted(data))
start = end = next(it)
for item in it:
if item == end + 1:
end = item
else:
if end > start + 1:
yield f"{start}..{end}"
else:
yield str(start)
if start != end:
yield str(end)
start = item
end = item
if end > start + 1:
yield f"{start}..{end}"
else:
yield str(start)
if start != end:
yield str(end)
if len(data) < 2:
return str(data)
return "{" + ", ".join(segments()) + "}"
data = {1, 2, 3, 6, 7, 8, 10, 11, 14, 15, 16, 18, 20, 21, 22, 23, 24, 25}
print(pretty_set(data))
It is longer, but perhaps it might be more understandable, and possibly a little more efficient.
-
\$\begingroup\$ When I first saw the original code I also thought this was yet another class that should actually be just a function. But in this case I see a real benefit of the class because you can pass the
PrettySet
instance around and it will pretty print everywhere it is printed. \$\endgroup\$M472– M4722023年01月23日 08:51:47 +00:00Commented Jan 23, 2023 at 8:51 -
\$\begingroup\$ The actual usecase of this class is to be used as a
set
. However, I needed some improved output for logging purposes to prevent sth like:WARNING:root:Something happened on ports {1, 2, 3, 4, 5, 6, 7, 8, ..., 20}
. Therefor, I replaced the originalset
containing a subset of ports by thePrettySet
class. This way I can log failing ports without thinking about output format on each and everylogger.warning(f"blabla {failing_ports}")
. I also watched the video, which makes some good points, but I think it's biased as it misses any discussion of the drawbacks of the procedural approach. \$\endgroup\$Looser User– Looser User2023年01月25日 13:58:05 +00:00Commented Jan 25, 2023 at 13:58 -
\$\begingroup\$ @LooserUser Based on the usage shown in the question as posted, with
print(PrettySet(data))
, the "Stop Writing Classes" feedback is correct. In future questions, post all relevant code (ie, both the utility and usages of the utility) to get feedback relevant to your real question instead of a toy example. \$\endgroup\$AJNeufeld– AJNeufeld2023年01月25日 23:27:20 +00:00Commented Jan 25, 2023 at 23:27 -
\$\begingroup\$ I'm still not completely decided whether I use the class or the proc approach. I's not that I object the proc approach. It's just that I was confronted very often with repetitive code which spreads pod's and dict accesses all over and I wished people used more type sane and reusable code being it a set of classes or even procedures. Anyway, there's a drawback in my derived class I didn't see before. The
intersection
,union
and such do not work as expected, returning the originalset
instead of thePrettySet
. A proc would free me from the need to implement those overloads. \$\endgroup\$Looser User– Looser User2023年01月26日 09:28:32 +00:00Commented Jan 26, 2023 at 9:28
Useability
I think your code does a good job at outputting the information in a readable and concise way. I also see potential for reuseability in other code.
In my opinion, contrary to AJNeufeld's answer it also makes sense for the pretty set to be a class instead of a function because this enables you to pass a PrettySet
around and have it pretty printed everywhere.
Typing
As AJNeufeld already pointed out in their answer there are some issues with your type annotations.
What I would like to add here is that the type checker mypy
can detect these errors if you run it in strict mode:
$ mypy --strict pretty_set.py
pretty_set.py:37: error: Function is missing a return type annotation
pretty_set.py:37: error: Missing type parameters for generic type "list"
pretty_set.py:39: error: Function is missing a type annotation
Fund 3 errors in 1 file (checked 1 source file)
Readability
I strongly agree with AJNeufeld's answer that the first two lines of your __str__
methods where the groups
and segments
are constructed are hard to understand.
I would suggest to use itertools.pairwise
instead to compare each element with its predecessor. This can then be used to check if the elements are consecutive or not.
import itertools
class PrettySet(set[int]):
"""Subclassed set which redefines __str__ operator so that consecutive
elements are compressed in the formatted string"""
def __str__(self) -> str:
"""Create a string representation of the set.
Consecutive values are compressed, for example {1, 3, 4, 5, 7}
is formatted as {1, 3..5, 7}.
"""
parts: list[str] = []
if self:
def make_string(start: int, prev: int) -> str:
return f"{prev}" if start == prev else f"{start}..{prev}"
ordered = sorted(self)
start = ordered[0]
for prev, elem in itertools.pairwise(ordered):
if elem - prev > 1:
parts.append(make_string(start, prev))
start = elem
parts.append(make_string(start, ordered[-1]))
return f"{{{', '.join(parts)}}}"
I think this code is easier to understand but I may have a bias because I wrote it.
I am also not sure if I find return f"{{{', '.join(parts)}}}"
or return "{" + ", ".join(parts) + "}"
more readble.
Size
I do not know what exactly you mean by reviewing the code with regard to "size", but I don't think number of characters, number of bytes, number of lines or anything like that is relevant in this case.
-
\$\begingroup\$
prev is not None
is always true, so redundant. The assignment toprev
is overwritten by the next iteration of thefor prev, elem ...
loop, so is meaningless, and can be removed. \$\endgroup\$AJNeufeld– AJNeufeld2023年01月24日 05:03:36 +00:00Commented Jan 24, 2023 at 5:03 -
\$\begingroup\$ You are right, these are artifacts from an earlier stage when I did not use
itertools.pairwise
. I edited the code accordingly. \$\endgroup\$M472– M4722023年01月24日 08:26:43 +00:00Commented Jan 24, 2023 at 8:26 -
\$\begingroup\$ For completeness, you might add a
from_compressed()
class method to reconstruct a set from the compressed string. \$\endgroup\$RootTwo– RootTwo2023年01月25日 03:57:04 +00:00Commented Jan 25, 2023 at 3:57 -
\$\begingroup\$ @RootTwo Haha, I would like that as an exersize. \$\endgroup\$Looser User– Looser User2023年01月25日 14:04:48 +00:00Commented Jan 25, 2023 at 14:04
-
\$\begingroup\$ I started to look how I could implement the suggestions given in this thread. When I implemented my
PrettySet
class I dealt withint
set
's. However, now I think a type hint likeset[Ordered]
would be more appropriate, so a set using complex numbers as keys would not suffice. I've seen some suggestions, here: stackoverflow.com/questions/37669222/…. However, it would add more boilerplate code for little benefit. So I would perhaps stay withclass PrettySet(set)
??? \$\endgroup\$Looser User– Looser User2023年01月25日 14:34:02 +00:00Commented Jan 25, 2023 at 14:34
Here is my new implementation which adresses the following issues as suggested by AJNeufeld and M472. I just wanted to share this code. I'm not posting just to accept my own answer as a solution.
- Readability: Using
itertools.groupby
is hard to read, therefore usingitertools.pairwise
- The custom
set
was implemented as aclass
calledPrettySet
so it can be used as aset
and being passed around. However, the methods which should produce newPrettySet
's from a combination or copy returnset
s instead. This surprising behaviour contradicts the original justification for writing a class. Therefore, theclass
has been replaced by afunction
which takes a normalset[int]
as input. The overhead of implementing all those necessary overloads cannot be justified. - The code sections were properly separated by empty lines.
- Intermediate creation of data structures (e.g. nested lists) has been avoided in order to format the string directly. This addresses memory usage and performance.
- The corner case for formatting an empty set is solved by returning a string which cannot be confused with a dict.
- Type issues have been checked using
mypy
. - The original implementation compressed a section of consecutive elements of length 3 and up. This was considered an implementation detail and was not taken into account here.
import itertools
def pretty_format_set(src_set: set[int]) -> str:
""" Formats a set into a comma separated string compressing consecutive elements, for example
{1, 3, 4, 5, 7} is formatted as {1, 3..5, 7}. """
def make_string(range_start: int, range_end: int) -> str:
""" Returns a string representation of an integer range, compressing consecutive values """
return f"{range_start}" + ("" if range_start == range_end else f"..{range_end}")
if src_set:
formatted_str = "{"
ordered = sorted(src_set)
start = ordered[0]
# Iterate over the src_set to find consecutive elements and format string on the fly
for prev, elem in itertools.pairwise(ordered):
if elem - prev > 1:
formatted_str += make_string(start, prev) + ", "
start = elem
else:
# Terminate the output after the last loop iteration
formatted_str += make_string(start, ordered[-1]) + "}"
return formatted_str
return "{*[]}"
Usage:
test_array: list[set[int]] = [{1}, {1, 2}, {1, 2, 3}, {1, 3}, {1, 2, 4}, {1, 3, 4},
{1, 2, 3, 6, 7, 8, 10, 11, 14, 15, 16, 18, 20, 21, 22, 23, 24, 25}, {*[]}]
for data in test_array:
print(pretty_format_set(data))
Output:
{1}
{1..2}
{1..3}
{1, 3}
{1..2, 4}
{1, 3..4}
{1..3, 6..8, 10..11, 14..16, 18, 20..25}
{*[]}
Type check:
user@l-user:~/.config/JetBrains/PyCharm2022.3/scratches$ python3 -m mypy --strict pretty_print_set2.py
Success: no issues found in 1 source file