I want you to look the following code for validating dictionary schemas. There are some libraries out there that do the same thing, but I just wanted to write my own for fun and not to overload my current project with a big third party library.
Feel free to comment regarding code optimization, duplication, API design and the testing part.
You can see the API usage in the test snippet.
class ValidationError(Exception):
def __init__(self, *args, **kwargs):
super(ValidationError, self).__init__()
class Schema(object):
"""
Schema class to validate the structure of dic objects.
Accepts a dict of Attr's and validates their type, a validation function, etc.
usage: Schema({'name': Attr(str, validator=lambda x: len(x) > 0),
'age': Attr(int, lambda x: x > 0)})
To denote a nested dict you use it like this Attr(schema=Schema({...}))
"""
def __init__(self, attrs):
self._attrs = attrs
def bind(self, data):
if isinstance(data, dict):
self._data = data
return self
raise TypeError('data parameter is not a dict')
def validate(self, data=None):
"""
returns True if the bound data is valid to the schema
raises an exception otherwise
"""
if data:
self.bind(data)
for key in self._attrs.keys():
attr = self._attrs[key]
if not attr.is_optional and not key in self._data:
raise ValidationError('Required attribute {0} not present'.format(key))
if key in self._data:
attr.validate(self._data[key])
return True
class Attr(object):
"""
Class to be used inside a Schema object.
It validates individual items in the target dict,
usage: Attr(int, validatos=some_func)
"""
def __init__(self, t=None, optional=False, validator=None, schema=None):
self._t = t
self._optional = optional
self._validator = validator
self._schema = schema
@property
def is_schema(self):
return isinstance(self._schema, Schema)
@property
def is_optional(self):
return self._optional
@property
def validator(self):
return self._validator
@validator.setter
def validator(self, val):
self._validator = val
def validate(self, data):
if self._schema:
if not isinstance(self._schema, Schema):
raise TypeError('{0} is not instance of Schema'.format(data))
return self._schema.validate(data)
elif not isinstance(data, self._t):
raise TypeError('{0} is not instance of {1}'.format(data, self._t))
if self.validator and not self.validator(data):
raise ValidationError('attribute did not pass validation test')
return True
If you think some test are missing just tell it.
import unittest
class TestAttr(unittest.TestCase):
def test_validates_type(self):
attr = Attr(int)
self.assertTrue(attr.validate(1), 'The attribute should be valid')
attr = Attr(int)
self.assertRaises(TypeError, attr.validate, '1')
def test_validates_validator(self):
attr = Attr(int, validator=lambda x: x > 1)
self.assertTrue(attr.validate(2), 'The attribute should be valid')
self.assertRaises(ValidationError, attr.validate, 0)
class TestSchema(unittest.TestCase):
def test_flat_schema(self):
schema = Schema({'name': Attr(str),
'age': Attr(int, validator=lambda x: x > 0),
'awesome': Attr(bool, optional=True, validator=lambda x: x == True)})
valid = {'name': 'python', 'age': 19, 'awesome': True}
self.assertTrue(schema.validate(valid), 'schema should be valid')
# omit optional
valid = {'name': 'python', 'age': 19}
self.assertTrue(schema.validate(valid), 'schema should be valid')
invalid = {'name': 'python'}
self.assertRaises(ValidationError, schema.validate, invalid)
# validate optional if present
invalid = {'name': 'python', 'age': 19, 'awesome': False}
self.assertRaises(ValidationError, schema.validate, invalid)
def test_nested_schema(self):
address = Schema({'street': Attr(str, validator=lambda x: len(x) > 0),
'number': Attr(int)})
schema = Schema({'name': Attr(str),
'age': Attr(int, validator=lambda x: x > 0),
'address': Attr(schema=address)})
valid_address = {'street': 'pip', 'number': 2}
invalid_address = {'street': '', 'number': 5}
valid_data = {'name': 'python', 'age': 4, 'address': valid_address}
invalid_data = {'name': 'python', 'age': 4, 'address': invalid_address}
self.assertTrue(schema.validate(valid_data), 'schema should be valid')
self.assertRaises(ValidationError, schema.validate, invalid_data)
1 Answer 1
1. Error messages
The biggest problem with this code is that the error messages are useless. If the purpose is to validate data structures, then you have to think about what happens when validation fails. How easily are users going to be able to track down the cause of the problem?
So imagine that you are validating some data using a schema that you got from some library:
>>> from somewhere import schema >>> data = dict(m=0) >>> schema.validate(data) Traceback (most recent call last): ... ValidationError: attribute did not pass validation test
Now what? Why didn't the attribute pass the validation test? What are you supposed to do?
The message really needs to explain what the problem is, for example this would be much better:
ValidationError: 0 is not a valid month (use 1=January to 12=December)
So the
Attr
class needs some way of generating good error messages. For example, maybe the constructor could take a function that generates the message:Attr(int, validator=lambda m:1 <= m <= 12, message="{} is not a valid month (use 1=January to 12=December)".format)
(There are other ways of doing this which I'm sure you can think of.)
Let's suppose you've implemented the suggestion above, and we try it out:
>>> data = dict(a=0, b=0, c=0) >>> schema.validate(data) Traceback (most recent call last): ... ValidationError: 0 is not a valid month (use 1=January to 12=December)
Oh. Which attribute was the invalid month? Was it
a
,b
orc
? In order to be any use to the developer, the error message needs to include the attribute name:ValidationError: b=0 is not a valid month (use 1=January to 12=December)
But the attribute name on its own is not enough when you are validating a nested structure:
>>> data = dict(coordinates=dict(a=0, b=0, c=0), date=dict(a=0, b=0, c=0)) >>> schema.validate(data) Traceback (most recent call last): ... ValidationError: b=0 is not a valid month (use 1=January to 12=December)
The error message needs to include the complete path through the data structure to the erroneous attribute, perhaps like this:
ValidationError: date.b=0 is not a valid month (use 1=January to 12=December)
2. Other commentary
The validator doesn't fail if there are attributes that are present in the data but missing in the schema. For example, shouldn't this fail validation:
>>> schema = Schema({}) >>> schema.validate(dict(a=1)) True
You added some documentation, but it's very sketchy. It's really not good enough to say, "You can see the API usage in the test snippet." Did you learn to use Python by reading the unit tests? If not, how can you in all fairness expect users of your class to do that?
For example, what does a
Schema
object represent? What is its purpose? Can you give us a use case? Can you provide some examples? What does thebind
method do? In thevalidate
method, what's the role of thedata
argument? In theAttr
constructor, what's the meaning of thet
,optional
, andschema
arguments? And so on.There are some mistakes in the examples (there's a missing
validator=
in one place, and a typovalidatos
in another). If you used thedoctest
module to make your examples runnable, then you would have found and fixed these mistakes.In
ValidationError.__init__
, if you're just going to call the superclass's method, there's no point in defining the method at all. Just write:class ValidationError(Exception): pass
What is the purpose of the
bind
method? It allows the caller to divide the validation process into two steps: first, binddata
to a schema object; second, call.validate()
with no argument to do the actual validation. But what does this achieve over calling.validate(data)
?By testing
isinstance(data, dict)
you limit the kinds of object that can validated. The only methods ondata
you actually call are__contains__
,__iter__
and__getitem__
, so it would be better to test againstcollections.abc.Mapping
rather thandict
.A dictionary with no keys converts to Boolean as
False
, so instead of writing:if data: self.bind(data)
(which would go wrong when
data={}
) you should write:if data is not None: self.bind(data)
Instead of:
for key in self._attrs.keys(): attr = self._attrs[key]
you should write:
for key, attr in self._attrs.items():
(or
.iteritems()
in Python 2). This avoids an unnecessary lookup, and in Python 2 also avoids the unnecessary allocation of a list of keys.In these lines:
if not attr.is_optional and not key in self._data: raise ValidationError('Required attribute {0} not present'.format(key)) if key in self._data: attr.validate(self._data[key])
you look up
key
inself._data
three times. This seems a waste. Turn the logic round and look up the key just once:try: attr.validate(self._data[key]) except KeyError: if not attr.is_optional: raise ValidationError(...)
It seems inelegant to me for the
is_optional
logic to be in theSchema
class. Shouldn't this be in theAttr
class?When you've removed the
bind
method, you'll see that the only method left on the class is thevalidate
method. So I think it would be simpler to name the classValidator
and use the__call__
method instead. (This has the advantage that you no longer need the special-caseschema
argument to theAttr
constriuctor.)Having a property
Attr.validator
with getter and setter seems pointless because these are only ever called from within theAttr
class. TheAttr.is_schema
property is pointless because it is never used.Having a property
Attr.is_optional
seems pointless: why not just use an attribute?Having a special case for
Attr(schema=Schema(...))
is unnecessary: one can just writeAttr(validator=Schema(...).validate)
. Or if you take my suggestion above about using the__call__
method, you'd be able to just writeAttr(validator=Schema(...))
.The
t
argument to theAttr
constructor would be better namedclassinfo
to match the name of the second argument toisinstance
. Since it's an optional argument, it should default toobject
(rather thanNone
), so that if you don't specify it, the test passes rather than fails.