Feedback Request
- Is the manner in which I use
attrs
adequate (i.e. designed well)? - Should I not expect to be able to use argument ordering to order the joining of groups, given my current setup (see
def label(self):
below)? - Are there any general Pythonic suggestions for the code I have?
Context
I am working on creating labels for specimens in paleontological collections.
At the moment, I have a poetry
package called paleo_utils
. __init__.py
contains code that takes Label
, CollectionsLabel
(subclassed Label
), and SystematicsLabel
(subclassed Label
) from label.py
. For the sake of this question, the code in Label
and SystematicsLabel
does not matter.
I would prefer to not have to use __init__
while also using attrs
in the code below but at present I am not sure I am able to get {key: kwargs[key] for key in kwargs}
using just __attrs_post_init__
.
My wish is to be able to have these two cases work as follows:
import paleo_utils
label = paleo_utils.CollectionsLabel(
save_directory="../assets/saved_images/",
id_number="3244",
collection="AMNH",
collector="Dr. Montague",
location="North America",
formation="Navesink",
coordinates=(40.7128, -74.0060),
date_found="2024年01月01日",
title_overrides={"collection": "Museum: "},
)
print(label.label())
# ID Number: 3244
# Museum: AMNH
# Collector: Dr. Montague
# Location: North America
# Formation: Navesink
# Coordinates: (40.7128, -74.006)
# Date Found: 2024年01月01日
and
import paleo_utils
label = paleo_utils.CollectionsLabel(
save_directory="../assets/saved_images/",
date_found="2024年01月01日",
id_number="3244",
formation="Navesink",
collection="AMNH",
collector="Dr. Montague",
location="North America",
coordinates=(40.7128, -74.0060),
title_overrides={
"date_found": "Date Collected: ",
"location": "Locality: "},
)
print(label.label())
# Date Collected: 2024年01月01日
# ID Number: 3244
# Formation: Navesink
# Collection: AMNH
# Collector: Dr. Montague
# Locality: North America
# Coordinates: (40.7128, -74.006)
Code For CollectionsLabel
@attrs.define(kw_only=True)
class CollectionsLabel(Label):
collection: str | None = attrs.field(
default=None
)
id_number: str | None = attrs.field(
default=None
)
collector: str | None = attrs.field(
default=None
)
species: str | None = attrs.field(
default=None
)
species_author: str | None = attrs.field(
default=None
)
common_name: str | None = attrs.field(
default=None
)
location: str | None = attrs.field(
default=None
)
coordinates: tuple[float, float] | None = (
attrs.field(default=None)
)
coordinates_separate: bool = attrs.field(
default=False
)
date_found: str | None = attrs.field(
default=None
)
date_cataloged: str | None = attrs.field(
default=None
)
formation: str | None = attrs.field(
default=None
)
formation_author: str | None = attrs.field(
default=None
)
chrono_age: str | None = attrs.field(
default=None
)
chrono_age_author: str | None = attrs.field(
default=None
)
size: str | None = attrs.field(default=None)
link: str | None = attrs.field(default=None)
default_titles = {
"collection": "Collection: ",
"id_number": "ID Number: ",
"collector": "Collector: ",
"species": "Scientific Name: ",
"species_author": "Species Author: ",
"common_name": "Common Name: ",
"location": "Location: ",
"coordinates": "Coordinates: ",
"date_found": "Date Found: ",
"date_cataloged": "Date Cataloged: ",
"formation": "Formation: ",
"formation_author": "Formation Author: ",
"chrono_age": "Age: ",
"chrono_age_author": "Age Author: ",
"size": "Size: ",
"link": "Link: ",
}
title_overrides: dict[str, str] = attrs.field(
factory=dict
) # empty by default
_ordered_kwargs: dict = attrs.field(init=False)
def __init__(self, **kwargs):
self._ordered_kwargs = {key: kwargs[key] for key in kwargs}
def __attrs_post_init__(self):
# update title_overrides with any user-provided overrides
if self.title_overrides:
# merge user-provided titles, overriding defaults
for (
key,
value,
) in self.title_overrides.items():
if key in self.default_titles:
self.default_titles[key] = (
value
)
def _get_collections_attrs(self):
label_attrs = {
attr.name
for attr in Label.__attrs_attrs__
}
# collections_attrs = {
# attr.name: getattr(self, attr.name)
# for attr in self.__attrs_attrs__
# if attr.name not in label_attrs
# }
# print(self.__attrs_attrs__)
collections_attrs = {
key: value for key, value in self._ordered_kwargs.items() if key not in label_attrs
}
return collections_attrs
def label(self):
# empty list for parts of the final label
parts = []
# collections label exclusive attrs
collections_attrs = (
self._get_collections_attrs()
)
# iterative over collections attrs
for (
key,
value,
) in collections_attrs.items():
# for all non-None collections attrs, proceed
if (
value is not None
and not isinstance(value, dict)
):
# edit title with spaces and capitalized
title = self.default_titles.get(
key,
f"{key.replace('_', ' ').capitalize()}: ",
)
# add the group
parts.append(f"{title}{value}")
# consolidate to multiline label
return "\n".join(parts)
2 Answers 2
use pathlib
label = paleo_utils.CollectionsLabel(
save_directory="../assets/saved_images/",
date_found="2024年01月01日",
Rather than str
, consider storing that folder name as a Path.
Then it will be significantly more convenient for clients to make use of.
Kudos on using ISO8601, so lexical order matches chronological order.
optional values
You consistently admit of None
for each attribute.
This resembles a relational database table lacking
even a single NOT NULL clause in its
DDL.
I encourage you to go back and reconsider whether a CollectionsLabel
truly could have e.g. no collection
and no id_number
.
data type
Several date fields curiously use str
rather than a datetime
.
I imagine chrono_age: str
gives you the flexibility to store
human-friendly descriptions like "1.2 Bya" or "66 Mya",
which would lead to unfortunate lexical ordering.
And I'm concerned that someone might try to jam "late Devonian" in there.
Consider using one or two numeric fields to store age,
and worry about human-friendly formatting when outputting a retrieved record.
Giving size
a type of str
seems mind boggling to me,
but I imagine you have your reasons.
Perhaps it is a {small, medium, large} enum?
Or some sizes are in square meters while others are in cubic meters?
idiom
Certainly this works,
... {key: kwargs[key] for key in kwargs}
but it is more usual to iterate over kwargs.items()
.
And in this particularly simple case you should use kwargs.copy()
to produce the desired shallow copy.
But it's not clear why you'd want to make a copy at all,
given that each function call makes a shallow copy already.
The OP contains no unit test which would turn Red if we stop copying.
If you are trying to conform to some
documented
attrs advice, then please add a #
comment which cites the docs.
If there is another concern at work here, please comment on it,
and write a unit test which highlights the effect of interest.
I claim that any valid calling code one might write
wouldn't care about whether a copy had happened,
that is, one could not demonstrate a Green test
which turns Red when we remove that copying operation.
If inheritance plays into this, show that in your unit test.
As written the code is obscure -- I cannot think of a good
reason for making a copy of the args.
(In an inheritance situation is common enough for a V2 child class
to destructively .pop()
off V2 args that it knows about, so they
won't bother the V1 parent class. But that doesn't appear to be
relevant here.)
zero items
You have declared that title_overrides
cannot be None, which is a good choice.
def __attrs_post_init__(self):
...
if self.title_overrides:
No need for the guard, there. Just iterate over zero items in an empty dict, no harm done.
Please phrase the for
as for key, value in self.title_overrides.items():
,
and use similar single-line formatting down in label()
.
Wouldn't a simple .update() suffice here?
meaningful identifier
label_attrs = {
attr.name ...
That looks more like you're creating a new names
variable, to me.
Also, rather than merge that code fragment into the main
branch,
just delete it, along with the half dozen related lines of
commented code that follow it.
isinstance()
if (
value is not None
and not isinstance(value, dict)
):
Please simplify to
if not isinstance(value, (dict, NoneType)):
(Your IDE will help you add from types import NoneType
.)
embedded newlines
return "\n".join(parts)
I note in passing that you apparently prohibit certain characters in all those fields, such as NEWLINE, but that wasn't enforced anywhere. If one crept in, it would corrupt this returned value, turning a single record into apparently multiple records.
Overall this seems a reasonable design.
expect to be able to use argument ordering ...
You didn't mention your target environment, but I'm willing to assume it
does not include e.g. Jython, and will be restricted to just modern cPython.
The python language distinguishes between unordered dict
and
the OrderedDict
class which makes stronger guarantees.
The cPython interpreter now imposes ordering on dict
.
The old promise was "an arbitrary order which is non-random",
but that changed several years ago.
Starting with interpreter
3.7,
dict
iteration promises to preserve insertion order.
Also, PEP 468 ensures that kwargs
order shall match
the arg order seen in the source code.
So, yes, your expectation is reasonable.
-
1\$\begingroup\$ +1. Thank you very much for this thoughtful comment. I will likely approve your answer once I've gone through my code again (in the next 36 hours) with the above considerations. I will provide an update with how I've modified my code. \$\endgroup\$R509– R5092024年11月04日 02:41:22 +00:00Commented Nov 4, 2024 at 2:41
Attrs
You need to code your own __init__
function to be able to use **kwargs
and get the argument order. Otherwise, attrs will use the order the fields are defined. But you can still use attrs to initialize the class instance by calling __attrs_init__
from your __init__
:
def __init__(self, **kwargs):
self._ordered_kwargs = kwargs.copy()
self.__attrs_init__(**kwargs)
But should you?
This seems like an overly complicated way to provide different label formats. In the future, a poor hapless programmer tasked with adding a new label format would need to figure out that the order of the output depends on the order of the arguments in the call to the constructor.
Instead, use the attrs library (or dataclasses) to create the class, including __init__
and other useful dunder methods. And use string.format
to get your desired output. For example:
# use *string.format* to create functions that return a formatted label
AMNH = """\
ID Number: {0.id_number}
Museum: {0.collection}
Collector: {0.collector}
Location: {0.location}
Formation: {0.formation}
Coordinates: {0.coordinates}
Date Found: {0.date_found}"""
formation = """\
Date Collected: {0.date_found}
ID Number: {0.id_number}
Formation: {0.formation}
Collection: {0.collection}
Collector: {0.collector}
Locality: {0.location}
Coordinates: {0.coordinates}"""
# create an instance of the label
label = paleo_utils.CollectionsLabel(
save_directory="../assets/saved_images/",
id_number="3244",
collection="AMNH",
collector="Dr. Montague",
location="North America",
formation="Navesink",
coordinates=(40.7128, -74.0060),
date_found="2024年01月01日",
)
# print the label in any format
print(AMNH.format(label))
print(formation.format(label))
Because formats defined this way are just strings, it makes it much easier to select formats at run-time. For example, a number of format strings can be loaded from a configuration file (e.g., toml or ini) and one selected for use based on a command line argument.
-
\$\begingroup\$ Thank you for this and my apologies for the delay in getting back with an acknowledgement! This is simpler and only occurred to me in such vague mental terms that I did not have enough confidence to shift paths. \$\endgroup\$R509– R5092024年11月27日 05:59:39 +00:00Commented Nov 27, 2024 at 5:59
TypedDict
with a helper static class? \$\endgroup\$