DOMDocument field class

Question 1

Here is the class:

class Field(object):
 """Class which contains logic of field-processing
 """
 def __init__(self, routes, callback=None, is_iterable=False,default=None):
 """
 Arguments:
 - `routes`: sequence of xPath routes to the field, in order of descending priority.
 - `callback`: callable handler of field
 - `is_iterable`: if there may be a more then one field with a such route in the xml Document
 """
 self._routes = routes
 self._callback = callback
 self._is_iterable = is_iterable
 self._default = default
 def _clean_up(self, r):
 """Cleans up the result of xpath in according to
 self._is_iterable value and some parameters qualiteies.
 Sometimes xpath method returns list containing one element, but field is not iterable, so we need not the list, but it's single element
 Arguments:
 - `r`: conversable to True result of lxml.etree._Element.xpath method.
 """
 if isinstance(r,list) and self._is_iterable:
 return map(self._callback,r) if callable(self._callback) else r
 if isinstance(r,list) and not self._is_iterable:
 if len(r) == 1:
 return self._callback(r.pop()) if callable(self._callback) else r.pop() 
 else:
 raise error.RuntimeError('Many instances of non-iterable field')
 else:
 return self._callback(r) if callable(self._callback) else r
 def get_value(self,xml,nsmap = namespaces):
 """Returns value of the field in passed document
 If you passed False value of is_iterable to construct, you get a SINGLE result or error, not a list
 if you passed True, you get LIST, which may contain one element.
 if the field was not found, you get None. Anyway.
 Arguments:
 - `xml`: lxml.etree._Element instance
 - `nsmap`: dict with namespaces for xpath.
 """
 if not etree.iselement(xml):
 raise error.InvalidArgumentError('Passed document is not valid lxml.etree Element');
 for route in self._routes:
 if xml.xpath(route):
 return self._clean_up(xml.xpath(route,namespaces=nsmap))
 else:
 return self._default

And there are its unit tests:

Example XML:

<xml version="1.0">
<root>
 <non-iterable>Non-ascii content.Привет.</non-iterable>
 <iterable>Раз</iterable>
 <iterable>Два</iterable>
 <iterable>Три</iterable>
</root>

and test script:

document = etree.parse('examples/document_for_common_module_testing.xml').getroot()
class TestGetFieldValueMethod(unittest.TestCase):
 """
 """
 def test_invalid_argument_exception(self):
 """An exception must be risen if passed document isn't 
 valid lxml.etree Element
 """
 f = common.Field(['/root/iterable/text()'])
 try:
 f.get_value('Not an element...')
 except error.InvalidArgumentError:
 return
 self.fail('There mus be an exception')
 def test_non_iterable_value(self):
 """Testing if returned value is correct
 """
 f = common.Field(['/root/non-iterable/text()'])
 val = f.get_value(document)
 self.assertEqual(val,u'Non-ascii content.Привет.')
 #Now a callback is added
 f = common.Field(['string(/root/non-iterable/text())'],
 callback=(lambda x:x.upper()))
 val = f.get_value(document)
 self.assertEqual(val,u'NON-ASCII CONTENT.ПРИВЕТ.')
 #Now a default value is added.
 f = common.Field(['/unexisting'],default='Lol')
 val = f.get_value(document)
 self.assertEqual(val,'Lol')
 def test_non_iterable_exception(self):
 """The error must be raisen if there are more then one
 non-iterable element
 """
 f = common.Field(['/root/iterable/text()'])
 try:
 f.get_value(document)
 except error.RuntimeError:
 return
 self.fail('There must be a runtime error')
 def test_iterable_values(self, ):
 """
 """
 f = common.Field(['/root/iterable/text()'],is_iterable=True)#callback=(lambda x:x.upper())
 val = f.get_value(document)
 self.assertEqual(val,[u'Раз',u'Два',u'Три'])
 f = common.Field(['/root/iterable/text()'],is_iterable=True,
 callback=(lambda x:x.upper()))
 val = f.get_value(document)
 self.assertEqual(val,[u'РАЗ',u'ДВА',u'ТРИ'])
 def test_unexisting_value(self):
 """
 """
 f = common.Field(['/unexisting/text()'])
 self.assertEqual(f.get_value(document),None)
 f = common.Field(['/unexisting/text()'],
 callback=(lambda x:x.upper()))
 self.assertEquals(f.get_value(document),None)

It is used for parsing XML documents with lxml library. For example I create

f=Field(['string(/root/non-iterable/text())'],callback=(lambda x: x.upper()))

then do

f.get_value(document)

and it returns /root/title/text() value where all chars are uppercase.

What do you think of this class?
Is this simple to understand how it works and what it does?
Do you think it is a good piece of code?

Question 2

You are only evaluating the first matching route, but the expected behavior for is_iterable=True and multiple matching routes would be to match all routes. If this was not the intention, distinguish between all 4 use cases:

Only use first matching route, only match 1
Use all routes, only match 1 per route
Only use first matching route, match all
Use all routes, match all

When is_iterable=True and a default value is set, your class will either return a list or a single value, but never None. Your comment is deceiving.
Also: If default is no list, should it be encapsulated in a list in such case?

With is_iterable=True, I would expect an empty list as default return value.

Ambiguous XPath should match the first occurrence, but not throw if multiple are matched. Validating the XML is not your area of responsibility. Your current implementation will even throw an error when the XML is well formed according to an attached, authoritative XSD.

Unexpected mismatch in behavior between "multiple routes match once" and "single route matches multiple times". In the first case, you get the first result. In the second case you are throwing a runtime error.

Should the combination of default set and callback set apply the callback to the default value as well or return the plain default value? When the callback is supposed to consume default, and is_iterable=True is set, what is the expected behavior?

Why is the method called _clean_up? It's not cleaning up anything, it's processing the return value of the xpath call. Naming the parameter r isn't optimal either. The rule of choosing descriptive names applies to parameters of internal helper methods as well.

Question 3

Inaugurating your entry to the site with a very helpful answer. Welcome fellow CodeReviewer! I hope you enjoy your time here.

Ext3h Ext3h 2,83913 silver badges17 bronze badges · Answer 1 · 2015-03-26 12:12:09Z

You are only evaluating the first matching route, but the expected behavior for is_iterable=True and multiple matching routes would be to match all routes. If this was not the intention, distinguish between all 4 use cases:

Only use first matching route, only match 1
Use all routes, only match 1 per route
Only use first matching route, match all
Use all routes, match all

When is_iterable=True and a default value is set, your class will either return a list or a single value, but never None. Your comment is deceiving.
Also: If default is no list, should it be encapsulated in a list in such case?

With is_iterable=True, I would expect an empty list as default return value.

Ambiguous XPath should match the first occurrence, but not throw if multiple are matched. Validating the XML is not your area of responsibility. Your current implementation will even throw an error when the XML is well formed according to an attached, authoritative XSD.

Unexpected mismatch in behavior between "multiple routes match once" and "single route matches multiple times". In the first case, you get the first result. In the second case you are throwing a runtime error.

Should the combination of default set and callback set apply the callback to the default value as well or return the plain default value? When the callback is supposed to consume default, and is_iterable=True is set, what is the expected behavior?

Why is the method called _clean_up? It's not cleaning up anything, it's processing the return value of the xpath call. Naming the parameter r isn't optimal either. The rule of choosing descriptive names applies to parameters of internal helper methods as well.

Inaugurating your entry to the site with a very helpful answer. Welcome fellow CodeReviewer! I hope you enjoy your time here.

Stack Exchange Network

DOMDocument field class

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

DOMDocument field class

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions