In a Python (3.6) application I receive messages from Kafka in JSON format. (The code base makes heavy use of static type annotations, and every file is automatically checked using mypy --strict to catch type errors as early as possible.)
So I try to deserialize the received messages into objects immediately in order to avoid working with Dict[str, Any] instances in the downstream code. I'd like the deserialization to be type-safe, i.e., to fail not only if some class members are missing from the JSON string, but also if one or more of them have an incorrect type.
My current approach looks as follows (the class Foo in the unit tests is an example of a typical target_class):
#!/usr/bin/env python3
"""
Type-safe JSON deserialization
"""
import inspect
from typing import Any, List
import json
import unittest


def get_all_member_variable_names(target_class: Any) -> List[str]:
    """Return list of all public member variables."""
    all_members = [name for name, _ in inspect.getmembers(
        target_class, lambda a: not inspect.isroutine(a))]
    return list(filter(lambda name: not name.startswith("__"), all_members))


def deserialize_json(target_class: Any, object_repr: str) -> Any:
    """Constructs an object in a type-safe way from a JSON string"""
    data = json.loads(object_repr)
    members = get_all_member_variable_names(target_class)
    # Every class-level member must be present in the JSON data.
    for needed_key in members:
        if needed_key not in data:
            raise ValueError(f'Key {needed_key} is missing.')
    # Compare each JSON value's type with the type of the class default.
    dummy = target_class()
    for needed_key in members:
        json_type = type(data[needed_key])
        target_type = type(getattr(dummy, needed_key))
        if json_type != target_type:
            raise TypeError(f'Key {needed_key} has incorrect type. '
                            f'({json_type} instead of {target_type})')
    return target_class(**data)


class Foo():
    """Some dummy class"""
    val: int = 0
    msg: str = ''
    frac: float = 0.0

    def __init__(self, val: int = 0, msg: str = '', frac: float = 0.0) -> None:
        self.val: int = val
        self.msg: str = msg
        self.frac: float = frac


class TestDeserialization(unittest.TestCase):
    """Test with valid and invalid JSON strings"""

    def test_ok(self) -> None:
        """Valid JSON string"""
        object_repr = '{"val": 42, "msg": "hello", "frac": 3.14}'
        a_foo: Foo = deserialize_json(Foo, object_repr)
        self.assertEqual(a_foo.val, 42)
        self.assertEqual(a_foo.msg, 'hello')
        self.assertEqual(a_foo.frac, 3.14)

    def test_missing(self) -> None:
        """Invalid JSON string: missing a field"""
        object_repr = '{"val": 42, "msg": "hello"}'
        with self.assertRaises(ValueError):
            deserialize_json(Foo, object_repr)

    def test_incorrect_type(self) -> None:
        """Invalid JSON string: incorrect type of a field"""
        object_repr = '{"val": 42, "msg": "hello", "frac": "incorrect"}'
        with self.assertRaises(TypeError):
            deserialize_json(Foo, object_repr)
It works and the unit tests succeed, but I'm not sure if I am missing some potential problem or other opportunities to improve this. It would be cool if you could give me some hints or general criticism.
You need to:
- Get the annotated signature of target_class's __init__ method;
- Apply whatever arguments come from the provided JSON string;
- Check that all arguments are present;
- Check that all arguments conform to the annotations of the class's __init__.
All of these steps except the last one are pretty straightforward using the inspect module:
import inspect
import json
from typing import Callable, TypeVar

T = TypeVar('T')


def deserialize_json(target_class: Callable[..., T], object_repr: str) -> T:
    data = json.loads(object_repr)
    signature = inspect.signature(target_class)
    # Raises TypeError on missing or unexpected keys, like a real call would.
    bound_signature = signature.bind(**data)
    bound_signature.apply_defaults()
    return target_class(**bound_signature.arguments)
inspect.Signature.bind validates arguments much the same way a real call to target_class.__init__ would, raising TypeError if a required argument is missing or if an unexpected keyword argument is found.
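To see what Signature.bind checks on its own, here is a small sketch. The Foo class and the check_keys helper are hypothetical illustrations, not part of the answer's code: the helper binds the JSON-derived keyword arguments to the constructor signature and returns the message of the TypeError that bind raises, if any.

```python
import inspect
import json
from typing import Any, Dict, Optional


class Foo:
    """Hypothetical target class mirroring the question's example."""

    def __init__(self, val: int, msg: str, frac: float = 0.0) -> None:
        self.val = val
        self.msg = msg
        self.frac = frac


def check_keys(target_class: Any, object_repr: str) -> Optional[str]:
    """Return None if the JSON keys fit the constructor, else bind's error message."""
    data: Dict[str, Any] = json.loads(object_repr)
    signature = inspect.signature(target_class)  # 'self' is not included
    try:
        # bind() performs the same argument matching a real call would.
        signature.bind(**data)
    except TypeError as error:
        return str(error)
    return None


print(check_keys(Foo, '{"val": 1, "msg": "a"}'))         # None: frac has a default
print(check_keys(Foo, '{"val": 1}'))                     # error naming the missing 'msg'
print(check_keys(Foo, '{"val": 1, "msg": "a", "x": 2}'))  # error naming the extra 'x'
```

Note that bind only checks names and arity; the annotations are not consulted at this stage.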
Now you "just" need to validate the type of the arguments based on the annotations. A simple, and potentially sufficient, way of doing so is to read the annotation attribute of each parameter of the Signature and check the value's type using isinstance:
def deserialize_json(target_class: Callable[..., T], object_repr: str) -> T:
    data = json.loads(object_repr)
    signature = inspect.signature(target_class)
    bound_signature = signature.bind(**data)
    bound_signature.apply_defaults()
    # Compare every bound value (including defaults) with its annotation.
    for name, value in bound_signature.arguments.items():
        expected_type = signature.parameters[name].annotation
        if not isinstance(value, expected_type):
            raise TypeError('<error message>')
    return target_class(**bound_signature.arguments)
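One caveat of the plain isinstance check above when the input comes from JSON: bool is a subclass of int, so a JSON true passes an int annotation, while a JSON number written without a decimal point (e.g. 3) is parsed as int and fails a float annotation. A minimal sketch of a stricter check follows, assuming you want to reject booleans for numeric fields and accept integers for float fields (both are policy choices, not something the answer prescribes); the helper name matches_json_type is hypothetical.

```python
from typing import Any


def matches_json_type(value: Any, expected_type: type) -> bool:
    """Hypothetical isinstance wrapper tuned for values produced by json.loads."""
    if isinstance(value, bool):
        # bool subclasses int, so only accept it when bool itself is expected.
        return expected_type is bool
    if expected_type is float:
        # JSON has a single number type; accept 3 as well as 3.0 for float fields.
        return isinstance(value, (int, float))
    return isinstance(value, expected_type)


print(isinstance(True, int))          # True: the pitfall
print(matches_json_type(True, int))   # False
print(matches_json_type(3, float))    # True
print(matches_json_type("3", float))  # False
```

It could replace the isinstance call inside the loop of deserialize_json without other changes.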
Note that this simple example would most likely have trouble validating Generic types or parametrized Generic types (isinstance([], List[int]), for instance, raises TypeError). Adapting the code to fit those needs is left as an exercise for the reader (but Python 3.7 has better support for such checks).
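As one possible starting point for that exercise, here is a sketch that recursively checks List[...], Dict[...] and Optional[...]/Union[...] annotations. It assumes Python 3.8+ for typing.get_origin and typing.get_args (neither exists in 3.6), and the function name is_instance_of is hypothetical. Other constructs, such as Literal or the X | Y union syntax, are not handled.

```python
from typing import Any, Dict, List, Optional, Union, get_args, get_origin


def is_instance_of(value: Any, annotation: Any) -> bool:
    """Recursively check a JSON value against a possibly parametrized annotation."""
    if annotation is Any:
        return True
    origin = get_origin(annotation)
    if origin is None:
        # Plain class such as int, str or a user-defined class.
        return isinstance(value, annotation)
    args = get_args(annotation)
    if origin is Union:
        # Optional[X] is Union[X, None]: accept a match against any member.
        return any(is_instance_of(value, arg) for arg in args)
    if not args:
        # Unparametrized alias such as List or Dict: check the container only.
        return isinstance(value, origin)
    if origin is list:
        return isinstance(value, list) and all(
            is_instance_of(item, args[0]) for item in value)
    if origin is dict:
        key_type, value_type = args
        return isinstance(value, dict) and all(
            is_instance_of(k, key_type) and is_instance_of(v, value_type)
            for k, v in value.items())
    # Any other generic: fall back to checking the container type only.
    return isinstance(value, origin)


print(is_instance_of([1, 2], List[int]))            # True
print(is_instance_of([1, "a"], List[int]))          # False
print(is_instance_of({"a": 1.5}, Dict[str, float]))  # True
print(is_instance_of(None, Optional[int]))          # True
```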
As a side note, if all target classes resemble the Foo example and you don't mind having them immutable, you can make use of typing.NamedTuple:
import typing


class Foo(typing.NamedTuple):
    val: int
    msg: str
    frac: float = 1.0


print(Foo(1, 'bar'))       # Foo(val=1, msg='bar', frac=1.0)
print(Foo(2, 'baz', 0.0))  # Foo(val=2, msg='baz', frac=0.0)
print(Foo(3))              # TypeError: __new__() missing 1 required positional argument: 'msg'
or switch to Python 3.7 and use full-blown dataclasses.
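For the dataclasses route, here is a minimal sketch (Python 3.7+) that reads the declared field types with dataclasses.fields instead of the constructor signature. deserialize_dataclass is a hypothetical name; the sketch assumes fields are annotated with plain classes (no generics) and that the module does not use from __future__ import annotations, which would turn field.type into strings.

```python
import dataclasses
import json
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


@dataclasses.dataclass
class Foo:
    val: int
    msg: str
    frac: float = 1.0


def deserialize_dataclass(target_class: Type[T], object_repr: str) -> T:
    """Build a dataclass instance from JSON, checking keys and field types."""
    data: Dict[str, Any] = json.loads(object_repr)
    field_types = {field.name: field.type for field in dataclasses.fields(target_class)}
    unknown = set(data) - set(field_types)
    if unknown:
        raise TypeError(f"Unexpected keys: {sorted(unknown)}")
    for name, value in data.items():
        expected = field_types[name]
        if not isinstance(value, expected):
            raise TypeError(f"Key {name} has incorrect type: "
                            f"{type(value).__name__} instead of {expected.__name__}")
    # Missing required fields are reported by the generated __init__ itself.
    return target_class(**data)


print(deserialize_dataclass(Foo, '{"val": 42, "msg": "hello"}'))
# Foo(val=42, msg='hello', frac=1.0)
```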
- Awesome, thanks a lot. This is much better, since I don't need the default values for the class. Using the signature of the target-class constructor is very clever. :) I only had to make some adjustments for mypy --strict and pylint (see edit of my original post). (Tobias Hermann, Jul 27, 2018 at 15:20)
- @TobiasHermann Regarding this edit: avoid using type and prefer isinstance. Not that it matters much for JSON types, but consider that list != typing.List, whereas isinstance([], typing.List) is true. (301_Moved_Permanently, Jul 27, 2018 at 17:49)
- Thanks, you are right. I fixed this too. (Tobias Hermann, Jul 27, 2018 at 21:02)
Thanks to the help of Mathias Ettinger, here is the improved version (I only had to make some additional minor adjustments for mypy --strict):
#!/usr/bin/env python3
"""
Type-safe JSON deserialization
"""
import inspect
from typing import Any, Callable, Dict, TypeVar
import json
import unittest

TypeT = TypeVar('TypeT')


def deserialize_dict(target_class: Callable[..., TypeT], data: Dict[str, Any]) -> TypeT:
    """Constructs an object in a type-safe way from a dictionary"""
    signature = inspect.signature(target_class)
    bound_signature = signature.bind(**data)
    bound_signature.apply_defaults()
    # Use the bound value: parameters filled in by apply_defaults() are not keys of data.
    for name, value in bound_signature.arguments.items():
        expected_type = signature.parameters[name].annotation
        if not isinstance(value, expected_type):
            json_type = type(value)
            raise TypeError(f'Key {name} has incorrect type. '
                            f'{json_type.__name__} instead of '
                            f'{expected_type.__name__}')
    return target_class(**bound_signature.arguments)


def deserialize_json(target_class: Callable[..., TypeT], object_repr: str) -> TypeT:
    """Constructs an object in a type-safe way from a JSON string"""
    return deserialize_dict(target_class, json.loads(object_repr))


class Foo():
    """Some dummy class"""

    def __init__(self, val: int, msg: str, frac: float) -> None:
        self.val: int = val
        self.msg: str = msg
        self.frac: float = frac


class TestDeserialization(unittest.TestCase):
    """Test with valid and invalid JSON strings"""

    def test_dict(self) -> None:
        """Valid data dict"""
        data = {"val": 42, "msg": "hello", "frac": 3.14}
        a_foo: Foo = deserialize_dict(Foo, data)
        self.assertEqual(a_foo.val, 42)
        self.assertEqual(a_foo.msg, 'hello')
        self.assertEqual(a_foo.frac, 3.14)

    def test_ok(self) -> None:
        """Valid JSON string"""
        object_repr = '{"val": 42, "msg": "hello", "frac": 3.14}'
        a_foo: Foo = deserialize_json(Foo, object_repr)
        self.assertEqual(a_foo.val, 42)
        self.assertEqual(a_foo.msg, 'hello')
        self.assertEqual(a_foo.frac, 3.14)

    def test_additional(self) -> None:
        """Invalid JSON string: additional field"""
        object_repr = '{"val": 42, "msg": "hello", "frac": 3.14, "ignore": 1}'
        with self.assertRaises(TypeError):
            deserialize_json(Foo, object_repr)

    def test_missing(self) -> None:
        """Invalid JSON string: missing a field"""
        object_repr = '{"val": 42, "msg": "hello"}'
        with self.assertRaises(TypeError):
            deserialize_json(Foo, object_repr)

    def test_incorrect_type(self) -> None:
        """Invalid JSON string: incorrect type of a field"""
        object_repr = '{"val": 42, "msg": "hello", "frac": "incorrect"}'
        with self.assertRaises(TypeError):
            deserialize_json(Foo, object_repr)
- Shouldn't the deserialize_* functions' annotations return TypeT? Or is it a limitation of mypy? (301_Moved_Permanently, Jul 27, 2018 at 21:25)
- @MathiasEttinger Of course you are absolutely right again. This was a leftover from my original attempt. I just fixed it. (Tobias Hermann, Jul 27, 2018 at 21:54)
- @MathiasEttinger: Just in case you are interested in hearing this: the idea continued to grow and I made a library out of it. So thank you again for your help. :) (Tobias Hermann, Aug 24, 2018 at 13:04)