I am writing a web app with a Tornado backend and, of course, JavaScript and jQuery on the frontend, so I am using the json module from the standard library to serialize objects for the frontend. I had started writing a custom JSONEncoder for my classes, but then it occurred to me that I could instead write one very simple, generic object encoder:
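import json

class ObjectEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called for any object json can't serialize natively;
        # return the instance's attribute dict instead.
        return vars(obj)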
It seems to work nicely, so I wondered why this isn't included in the module, and whether this technique has drawbacks. I haven't tested whether it works with check_circular, but I have no reason to believe it doesn't.
Any comments on my doubts? Otherwise, I suppose this technique may be useful to somebody, since I didn't find it with a search (admittedly, a quick one).
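The check_circular doubt is easy to test. Below is a minimal sketch using a hypothetical self-referencing Node class (not from the question). With the default check_circular=True, the encoder records each object before it calls default(), so a cycle that passes through a vars()-encoded object should still be detected rather than recursing forever.

```python
import json


class ObjectEncoder(json.JSONEncoder):
    """The generic encoder from the question."""

    def default(self, obj):
        return vars(obj)


class Node:
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self):
        self.next = None


node = Node()
node.next = node  # the object now refers to itself

try:
    json.dumps(node, cls=ObjectEncoder)  # check_circular=True is the default
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # Circular reference detected
```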
EDIT: here's an example, as simple as it gets, to show the behaviour of the json module:
>>> import json
>>> class Foo:
...     def __init__(self):
...         self.bar = 'bar'
...
>>> foo = Foo()
>>> json.dumps(foo)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <__main__.Foo object at 0x7f14660236d8> is not JSON serializable
>>> class ObjectEncoder(json.JSONEncoder):
...     def default(self, obj):
...         return vars(obj)
...
>>> json.dumps(foo, cls=ObjectEncoder)
'{"bar": "bar"}'
1 Answer
vars(obj) is syntactic sugar for obj.__dict__, so it doesn't work on any object without __dict__. This includes stuff like:
- User-defined objects where every level in the class hierarchy defined __slots__ (without a __dict__ slot) to reduce memory usage
- Objects of built-in types that don't opt in to an instance dictionary (at the C level, types that don't set tp_dictoffset)
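A quick way to see both cases fail. This is a minimal sketch: Point is a hypothetical fully-slotted class, set stands in for a built-in type without an instance __dict__, and try_encode is a hypothetical helper that reports the error instead of raising it.

```python
import json


class ObjectEncoder(json.JSONEncoder):
    """The generic encoder from the question."""

    def default(self, obj):
        return vars(obj)


class Point:
    # Every level of the hierarchy uses __slots__, so instances have no __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def try_encode(obj):
    """Return the JSON text, or the TypeError message if encoding fails."""
    try:
        return json.dumps(obj, cls=ObjectEncoder)
    except TypeError as exc:
        # vars() raises TypeError for objects without a __dict__.
        return "TypeError: " + str(exc)


print(try_encode(Point(1, 2)))  # slotted user class: vars() fails
print(try_encode({1, 2, 3}))    # built-in set: vars() fails
```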
Worse, there are in-between cases, where some attributes are set on the __dict__, while others aren't (e.g. a class hierarchy where __slots__ was used for some levels, but other levels didn't use __slots__ and relied on the implicit __dict__). In cases like that, you wouldn't get an error to let you know something had gone wrong, you'd just serialize the __dict__ part of the object state and silently ignore the rest.
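The in-between case is the dangerous one because nothing fails. In this minimal sketch (Base and Child are hypothetical), Base stores ident in a slot while Child gets an implicit __dict__, so vars() only sees name.

```python
import json


class ObjectEncoder(json.JSONEncoder):
    """The generic encoder from the question."""

    def default(self, obj):
        return vars(obj)


class Base:
    # This level uses __slots__, so 'ident' lives in a slot, not in __dict__.
    __slots__ = ("ident",)

    def __init__(self, ident):
        self.ident = ident


class Child(Base):
    # No __slots__ here, so Child instances get an implicit __dict__.
    def __init__(self, ident, name):
        super().__init__(ident)
        self.name = name


child = Child(42, "widget")
print(child.ident)                            # 42: the data is on the object
print(json.dumps(child, cls=ObjectEncoder))   # {"name": "widget"}: 'ident' is silently dropped
```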
You'd have similar problems if the interface uses @property: properties are accessed like attributes, but they aren't stored in the instance __dict__, so you'd either lose the information completely (if there is no hidden underlying attribute), or serialize it under the "wrong" name (the internal attribute name, rather than the public name exposed by the @property).
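Both property failure modes show up in one small sketch, assuming a hypothetical Temperature class whose public API is two properties: celsius is backed by a private _celsius attribute, and fahrenheit is computed with no backing attribute at all.

```python
import json


class ObjectEncoder(json.JSONEncoder):
    """The generic encoder from the question."""

    def default(self, obj):
        return vars(obj)


class Temperature:
    """Hypothetical class whose public interface is made of properties."""

    def __init__(self, celsius):
        self._celsius = celsius  # hidden backing attribute

    @property
    def celsius(self):
        return self._celsius

    @property
    def fahrenheit(self):
        # Computed on demand; there is no stored attribute at all.
        return self._celsius * 9 / 5 + 32


encoded = json.dumps(Temperature(100), cls=ObjectEncoder)
print(encoded)  # {"_celsius": 100}: internal name, and fahrenheit is missing
```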
In short, lots of things can go subtly wrong by trying to guess at the correct behavior like this, which is why The Zen of Python (type import this in an interactive terminal to see it) includes stuff like:
Errors should never pass silently.
and
In the face of ambiguity, refuse the temptation to guess.
Beyond these errors, there's also the general problem of reversibility. The output of a general encoder of this form can't, by definition, be turned back into the original objects by a general decoder, because all the type information is lost. Offering an easy way to lose important information is... suboptimal.
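The loss of type information can be seen directly. In this minimal sketch, Baz is a hypothetical class unrelated to the question's Foo that happens to have the same attributes: both encode to identical JSON, and decoding yields a plain dict, so nothing in the output says which class to rebuild.

```python
import json


class ObjectEncoder(json.JSONEncoder):
    """The generic encoder from the question."""

    def default(self, obj):
        return vars(obj)


class Foo:
    def __init__(self):
        self.bar = "bar"


class Baz:
    # Hypothetical unrelated class with the same attribute layout.
    def __init__(self):
        self.bar = "bar"


foo_json = json.dumps(Foo(), cls=ObjectEncoder)
baz_json = json.dumps(Baz(), cls=ObjectEncoder)
print(foo_json == baz_json)  # True: both are '{"bar": "bar"}'

decoded = json.loads(foo_json)
print(type(decoded).__name__)  # dict, not Foo
```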
2 Comments
... set in one of my classes, and set objects haven't got a __dict__, so I had to change approach: I use the vars() approach for a predefined list of user classes, and custom code for other classes like set.
... json.dumps(obj)? TypeError: <... instance at ...> is not JSON serializable. This seems to be by design, since it is explicitly stated in the documentation; I would just like to understand why it is so, and whether there are complications which I do not see.
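The workaround described in the first comment (vars() only for an explicit list of classes, custom handling for types like set) might look roughly like this. It is a sketch under assumptions: User, Room and SelectiveEncoder are hypothetical names, and encoding sets as sorted JSON arrays is one possible choice, not necessarily the commenter's.

```python
import json


class User:
    """Hypothetical user class; one of the 'predefined' classes."""

    def __init__(self, name, tags):
        self.name = name
        self.tags = tags  # a set, which JSON has no type for


class Room:
    """Another hypothetical predefined class."""

    def __init__(self, title):
        self.title = title


class SelectiveEncoder(json.JSONEncoder):
    # Only these classes are serialized via vars(); anything else is an error.
    VARS_CLASSES = (User, Room)

    def default(self, obj):
        if isinstance(obj, self.VARS_CLASSES):
            return vars(obj)
        if isinstance(obj, (set, frozenset)):
            # Sets have no __dict__; encode them as a sorted JSON array.
            return sorted(obj)
        # Unknown type: defer to the base class, which raises TypeError.
        return super().default(obj)


print(json.dumps(User("ann", {"dev", "admin"}), cls=SelectiveEncoder))
# {"name": "ann", "tags": ["admin", "dev"]}
```

Keeping an explicit whitelist means an unexpected type still fails loudly instead of being guessed at. sorted() assumes the set's elements are mutually comparable; list(obj) would avoid that at the cost of a nondeterministic order.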