One of the big problems I have is cleanly converting from an internal datatype to an external datatype. We can all do it the not so clean way, however I think this add too much mess.
I can use libraries that read from a filetype to a Python object, however some libraries don't allow you to convert the data from one type to another. Most libraries also don't allow converting from one structure to another, so if you need data to be nested when it comes flat, you have to perform the conversion manually.
This library still uses these filetype libraries to convert to a Python object. It just offloads some of the work from these libraries. And adds some features I don't think will be added to these libraries.
This library consists of two public classes; Converter
and Converters
. These work almost completely independently. A short explanation of most of the code is:
Converters
defines someproperty
functions, these interface withConverter._obj
to convert to and from the base class.ron
is used to raise whenConverter._obj
returns a 'null' (ABuilderObject
), this is as we build aBuilderObject
before building the actual class. This allows you to initialize usingsetattr
, rather than just passing a dictionary. Which I find to be a little cleaner at times.
This should be used whenever you get data fromConverter._obj
.BuilderObject
is a simple object that defaults nested objects to itself. This means we can build nested datatypes without having to build the objects themselves - as we don't have the data.Converter
is a small unobtrusive class to convert to and from the base class and itself. ProvidingT
when using the class is required for the code to work.
from datetime import datetime
from typing import Generic, TypeVar, Type, get_type_hints, Union, List, Optional, Tuple, Any
__all__ = ['ron', 'Converter', 'Converters']
T = TypeVar('T')
class BuilderObject:
def __init__(self):
super().__setattr__('__values', {})
def __getattr__(self, name):
return super().__getattribute__('__values').setdefault(name, BuilderObject())
def __setattr__(self, name, value):
super().__getattribute__('__values')[name] = value
def __delattr__(self, name):
del super().__getattribute__('__values')[name]
def _build(base: Type[T], values: Union[BuilderObject, dict]) -> T:
"""Build the object recursively, utilizes the type hints to create the correct types"""
types = get_type_hints(base)
if isinstance(values, BuilderObject):
values = super(BuilderObject, values).__getattribute__('__values')
for name, value in values.items():
if isinstance(value, BuilderObject) and name in types:
values[name] = _build(types[name], value)
return base(**values)
def _get_args(obj: object, orig: Type) -> Optional[Tuple[Type]]:
"""Get args from obj, filtering by orig type"""
bases = getattr(type(obj), '__orig_bases__', [])
for b in bases:
if b.__origin__ is orig:
return b.__args__
return None
class Converter(Generic[T]):
_obj: T
def __init__(self, **kwargs) -> None:
self._obj = BuilderObject()
for name, value in kwargs.items():
setattr(self, name, value)
def build(self, exists_ok: bool=False) -> T:
"""Build base object"""
t = _get_args(self, Converter)
if t is None:
raise ValueError('No base')
base_cls = t[0]
if isinstance(self._obj, base_cls):
if not exists_ok:
raise TypeError('Base type has been built already.')
return self._obj
self._obj = _build(base_cls, self._obj)
return self._obj
@classmethod
def from_(cls, b: T):
"""Build function from base object"""
c = cls()
c._obj = b
return c
def ron(obj: T) -> T:
"""Error on null result"""
if isinstance(obj, BuilderObject):
raise AttributeError()
return obj
TPath = Union[str, List[str]]
class Converters:
@staticmethod
def _read_path(path: TPath) -> List[str]:
"""Convert from public path formats to internal one"""
if isinstance(path, list):
return path
return path.split('.')
@staticmethod
def _get(obj: Any, path: List[str]) -> Any:
"""Helper for nested `getattr`s"""
for segment in path:
obj = getattr(obj, segment)
return obj
@classmethod
def property(cls, path: TPath, *, get_fn=None, set_fn=None):
"""
Allows getting data to and from `path`.
You can convert/type check the data using `get_fn` and `set_fn`. Both take and return one value.
"""
p = ['_obj'] + cls._read_path(path)
def get(self):
value = ron(cls._get(self, p))
if get_fn is not None:
return get_fn(value)
return value
def set(self, value: Any) -> Any:
if set_fn is not None:
value = set_fn(value)
setattr(cls._get(self, p[:-1]), p[-1], value)
def delete(self: Any) -> Any:
delattr(cls._get(self, p[:-1]), p[-1])
return property(get, set, delete)
@classmethod
def date(cls, path: TPath, format: str):
"""Convert to and from the date format specified"""
def get_fn(value: datetime) -> str:
return value.strftime(format)
def set_fn(value: str) -> datetime:
return datetime.strptime(value, format)
return cls.property(path, get_fn=get_fn, set_fn=set_fn)
An example of using this code is:
from dataclasses import dataclass
from datetime import datetime
from converters import Converter, Converters
from dataclasses_json import dataclass_json
@dataclass
class Range:
start: datetime
end: datetime
@dataclass
class Base:
date: datetime
range: Range
@dataclass_json
@dataclass(init=False)
class International(Converter[Base]):
date: str = Converters.date('date', '%d/%m/%y %H:%M')
start: str = Converters.date('range.start', '%d/%m/%y %H:%M')
end: str = Converters.date('range.end', '%d/%m/%y %H:%M')
class American(Converter[Base]):
date: str = Converters.date('date', '%m/%d/%y %H:%M')
start: str = Converters.date('range.start', '%m/%d/%y %H:%M')
end: str = Converters.date('range.end', '%m/%d/%y %H:%M')
if __name__ == '__main__':
i = International.from_json('''{
"date": "14/02/19 12:00",
"start": "14/02/19 12:00",
"end": "14/02/19 12:00"
}''')
b = i.build()
a = American.from_(b)
FORMAT = '{1}:\n\tdate: {0.date}\n\tstart: {0.range.start}\n\tend: {0.range.end}'
FORMAT_C = '{1}:\n\tdate: {0.date}\n\tstart: {0.start}\n\tend: {0.end}'
print(FORMAT.format(b, 'b'))
print(FORMAT_C.format(a, 'a'))
print(FORMAT_C.format(i, 'i'))
print('\nupdate b.date')
b.date = datetime(2019, 2, 14, 12, 30)
print(FORMAT.format(b, 'b'))
print(FORMAT_C.format(a, 'a'))
print(FORMAT_C.format(i, 'i'))
print('\nupdate b.range.start')
b.range.start = datetime(2019, 2, 14, 13, 00)
print(FORMAT.format(b, 'b'))
print(FORMAT_C.format(a, 'a'))
print(FORMAT_C.format(i, 'i'))
print('\njson dump')
print(i.to_json())
From a code review I mostly want to focus on increasing the readability of the code. I also want to keep Converter
to contain all of the logic, whilst also being very transparent so that most libraries like dataclasses_json
work with it. I don't care about performance just yet.
2 Answers 2
Let's get from the simplest thing to change to the hardest one:
BuilderObject
This is actually a recursive dictionary, which is more easily achieved using either collections.defaultdict
or by implementing dict.__missing__
. Since you also want __getattr__
to act like __getitem__
and so on, I’d go the dict
subclass route:
class BuilderObject(dict):
def __missing__(self, item):
self[item] = missing = BuilderObject()
return missing
def __getattr__(self, item):
return self[item]
def __setattr__(self, item, value):
self[item] = value
def __delattr__(self, item):
del self[item]
Simpler to read and understand. I’m not fond of the name however, but couldn't come up with something better.
Converters
These are in fact descriptors in disguise. Instead of writing the getter, setter and deleter functions to feed to property
, you could as well write them as __get__
, __set__
and __delete__
method on a custom class. Also operator.attrgetter
is your friend, no need to rewrite it yourself:
class AttributeProxy:
def __init__(self, path: str, *, get_fn=None, set_fn=None):
self.__path = '_obj.' + path
self.__parent, self.__attribute_name = self.__path.rsplit('.', 1)
self.__getter = get_fn
self.__setter = set_fn
def __get__(self, instance, owner):
if not issubclass(owner, Converter):
raise RuntimeError('cannot use Property descriptors on non Converter types')
if instance is None:
return self
value = operator.attrgetter(self.__path)(instance)
if isinstance(value, BuilderObject):
raise AttributeError
if self.__getter is not None:
value = self.__getter(value)
return value
def __set__(self, instance, value):
if self.__setter is not None:
value = self.__setter(value)
setattr(operator.attrgetter(self.__parent)(instance), self.__attribute_name, value)
def __delete__(self, instance):
delattr(operator.attrgetter(self.__parent)(instance), self.__attribute_name)
class DateProxy(AttributeProxy):
def __init__(self, path, format):
super().__init__(
path,
get_fn=lambda value: value.strftime(format),
set_fn=lambda value: datetime.strptime(value, format)
)
_get_args
The main purpose of this function is to check that, when one derives from Converter
, they provide a specialization of T
; and optionally retrieve that specialization. I find the implementation somewhat cryptic and potentially missing some edge cases. I fiddled with a metaclass so the check is performed very early in the program. This is the main advantage of the approach as you would get an error message right away, not latter on in an unrelated section of the code:
class CheckBaseExist(type):
def __new__(mcls, name, bases, attrs):
cls = super().__new__(mcls, name, bases, attrs)
if not issubclass(cls, Generic):
raise TypeError('CheckBaseExist metaclass should be used on typing.Generic subclasses')
if Generic not in bases:
# We already know that klass is a subclass of Generic so it must define __orig_bases__
try:
base, = cls.__orig_bases__
except ValueError:
raise TypeError('cannot use more than one specialization of a base CheckBaseExist class in the inheritance tree') from None
if base.__origin__ is Generic:
raise TypeError('no specialization provided when inheriting from a base CheckBaseExist class')
else:
generic_subclasses = ' or '.join(
klass.__name__
for klass in bases
if klass is not Generic and issubclass(klass, Generic)
)
if generic_subclasses:
raise TypeError(f'cannot use typing.Generic as a common base class with {generic_subclasses}')
return cls
Not too fond of the name either... Usage being:
class Converter(Generic[T], metaclass=CheckBaseExist):
_obj: T
def __init__(self, **kwargs) -> None:
self._obj = BuilderObject()
for name, value in kwargs.items():
setattr(self, name, value)
def to_base(self, exists_ok: bool=False) -> T:
"""Build base object"""
base_cls = self.__class__.__orig_bases__[0].__args__[0]
if isinstance(self._obj, BuilderObject):
self._obj = _build(base_cls, self._obj)
elif not exists_ok:
raise RuntimeError('Base type has been built already.')
return self._obj
@classmethod
def from_base(cls, base: T):
"""Build function from base object"""
instance = cls()
instance._obj = base
return instance
This works as:
- we checked in the metaclass that
self.__class__.__orig_bases__
contains a single item; Generic[T]
ensures that__args__
contains a single item.
_build
I really dislike the fact that this function relies on typing.get_type_hints
but couldn't come up with something clean that doesn't use it, or at least does it optionally. Maybe an extra argument in the AttributeProxy
constructor, defaulting to None. I’m not a huge fan of it, but you’ll have to provide type hints somehow anyway.
This is important in case you want to convert to/from objects in external libraries that don't use those type hints, so you must implement a fallback mechanism.
-
\$\begingroup\$ Holy moley this is a sick answer. Thank you for staying up so late to write it! I'm going to need to digest it again tomorrow too to fully understand it. Whilst your pros for the meta class are clear, I'm not sure they outweigh the cons. 1. It'll make converting to Python <3.7 hard, as Generic has its own (iirc private) metaclass, and it'll mean it won't work with 'filetype libraries' that require metaclasses. I think I'll fix this by adding another class variable to hold the base, and have
CheckBaseExist
automagically set it. AllowingConverter
andGenericConverter
classes. (Thoughts?) \$\endgroup\$2019年02月27日 02:47:05 +00:00Commented Feb 27, 2019 at 2:47 -
\$\begingroup\$ I'm not sure "This is important in case you want to convert to/from objects in external libraries that don't use those type hints" is quite true. This is as it's passed
Base
not aConverter
, and so it should be separated from these external libraries. Or have I misunderstood you here? \$\endgroup\$2019年02月27日 02:50:56 +00:00Commented Feb 27, 2019 at 2:50 -
\$\begingroup\$ @Peilonrayz No, it means that you can't do
Converter[socket.socket]
for instance, or anything that doesn't use type hints. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2019年02月27日 06:43:20 +00:00Commented Feb 27, 2019 at 6:43
There are a couple of things that come to mind when reading your code. I'll write them down, in no particular order of importance.
Your example imports your library like this:
from converters import Converter, Converters
That's a major code smell. You import both a Converter
and Converters
from a file named converters
. If you have 3 things with the same name, you should name them better. Why do you have a Converter
and a Converters
class anyway? I'd expect one to be a collection of the other, but since Converter
already takes a template and is generic, what the heck do I need it's multiple for? It's not intuitive and probably violates the Zen of Python on half a dozen principles.
I see a lot of single-letter variables. While T
is somewhat acceptable here, the rest is not. i = International.from_json(
What? No. i
is an index or some other integer, not something much more complicated than that.
b = i.build()
a = American.from_(b)
Please, no. American
and International
are terrible names for classes anyway. You could use it as a sub-class or sub-type or something, or an instance of a class if the class makes clear it's a date of some sort, but don't make an American
class.
Now we're talking about those classes anyway, did you notice the absurd amount of repetition?
class International(Converter[Base]):
date: str = Converters.date('date', '%d/%m/%y %H:%M')
start: str = Converters.date('range.start', '%d/%m/%y %H:%M')
end: str = Converters.date('range.end', '%d/%m/%y %H:%M')
class American(Converter[Base]):
date: str = Converters.date('date', datetime_format)
start: str = Converters.date('range.start', datetime_format)
end: str = Converters.date('range.end', datetime_format)
So, a class has 3 lines and all of those lines contain either '%d/%m/%y %H:%M'
or '%m/%d/%y %H:%M'
. Have you considered making something like this instead?
class TerriblyNamedGeneric(Converter[Base], datetime_format):
date: str = Converters.date('date', datetime_format)
start: str = Converters.date('range.start', datetime_format)
end: str = Converters.date('range.end', datetime_format)
That's still not pretty and it can probably be done with even less repetition, but you get the idea.
The rest of your code is riddled with ambiguity.
def from_(cls, b: T):
"""Build function from base object"""
c = cls()
c._obj = b
return c
Why is a build function named from_
? What is a cls (if it's not short for one of these, pick a better name) and why is that entire function just a jumble of one-letter variable names?
You say you want a review focussed on the readability. In short, I don't think it's that readable. It may work like a charm, but the readability leaves much to be desired.
-
1\$\begingroup\$ Good answer, as discussed in chat I think
ObjectConverter
andPropertyConverters
would be better names. I thinkfrom_
is standard, andcls
is standard inclassmethods
. But the rest aboutfrom_
I agree with. Thanks :) \$\endgroup\$2019年02月26日 15:38:36 +00:00Commented Feb 26, 2019 at 15:38 -
\$\begingroup\$ Why would
from converters import Converter, Converters
be inherently wrong? I often dofrom datetime import datetime
orfrom pprint import pprint
orimport socket; sock = socket.socket(...)
... \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2019年02月26日 22:10:44 +00:00Commented Feb 26, 2019 at 22:10 -
\$\begingroup\$ @MathiasEttinger We talked about that in chat. It's not so much wrong, as in it's unclear why you'd need to import both
Converter
andConverters
from a library calledconverters
. It's a naming problem. The whole thing leads to ambiguity. Plenty of libraries are plagued with it, I know. That doesn't make it right. \$\endgroup\$2019年02月27日 14:26:09 +00:00Commented Feb 27, 2019 at 14:26
Range
has a third attributestep: int
, how would you refer to it in your external classes?step: int = Converters.property('range.step')
? \$\endgroup\$Range
too. If you need to convert to int from str then you can usestep: int = Converters.property('range.step', get_fn=str, set_fn=int)
. \$\endgroup\$