Skip to main content
Code Review

Return to Answer

Bounty Awarded with 100 reputation awarded by Community Bot
fix typo
Source Link
Quentin Pradet
  • 7.1k
  • 1
  • 25
  • 44

You can use lxml XPath support here if you want to avoid the loop: meta.xpath("meta[name = $name]$name]", name=name)".

You can use lxml XPath support here if you want to avoid the loop: meta.xpath("meta[name = $name], name=name)".

You can use lxml XPath support here if you want to avoid the loop: meta.xpath("meta[name = $name]", name=name).

Source Link
Quentin Pradet
  • 7.1k
  • 1
  • 25
  • 44

Boilerplate

class etree(old_etree._ElementTree):
 @staticmethod
 def parse(path): 
 old_tdx = old_etree.parse(path)
 new_tdx = etree(old_tdx)
 new_tdx._setroot(old_tdx.getroot())
 return new_tdx
 @staticmethod
 def fromstring(string):
 old_tdx = old_etree.fromstring(string)
 new_tdx = etree(old_tdx.getroottree())
 new_tdx._setroot(old_tdx)
 return new_tdx

In those boilerplate functions, I don't know what "tdx" means. I guess it's linked to the file specification you use.

Those functions are the main problem I have with your code. You mixed the lxml.etree module with the lxml.etree.ElementTree class. Those four methods are methods of the module, not the class! I think you should:

  1. have a real etree module
  2. put your current etree class (renamed as ElementTree) in it
  3. add those functions in the module, not the class
  4. fix the return type to return what lxml returns

This would make things way less confusing because it would be closer to how lxml.etree and the standard xml.etree work. Your coworkers would be able to pick your code faster.

You would do from lxml.etree import * and only redefine the functions you want to redefine. Your probably can't expect a perfect compatibility, but at least the basic API would be the same.

I don't have much to say about the implementation of the functions apart from the return type: ugly but I did not find a way to make things better. And yes, it's quite easy to understand them.

 @staticmethod
 def SubElement(*args, **kargs):
 return old_etree.SubElement(*args, **kargs)
 
 @staticmethod
 def Element(*args, **kargs):
 return old_etree.Element(*args, **kargs)

You no longer would need do to this with my proposal above.

getISOTime

 @staticmethod
 def getISOTime():
 iso_time = datetime.now().isoformat()
 format_time = iso_time.split('.')[0] + 'Z'
 return format_time
 

Adding the 'Z' means you're saying this is an UTC time, but it's not since you've used datetime.now() instead of datetime.utcnow(). I live in UTC+4, and the format_time you're returning is 4 hours off for me.

getMetaNode

 def getMetaNode(self, name, default=''):
 metas = self.findall('meta')
 for meta in metas:
 if meta.get('name') == name:
 return meta

You can use lxml XPath support here if you want to avoid the loop: meta.xpath("meta[name = $name], name=name)".

Write

 def write(self, path, updateTool=True): 
 self.setMetaNode('saved', etree.getISOTime())
 
 if updateTool:
 self.setMetaNode('version', VERSION)
 self.setMetaNode('type', TOOL)
 
 super(etree, self).write(path)
 clean_xml(path)

nitpick: Cleaning at the ElementTree would prevent your filesystem to see two different versions. Say at some point you decide to use watchdog, the callback will kick in before you have a chance to run clean_xml, which could cause subtle bugs.

Answering your questions

  1. Is there a better way to extend the functionality of lxml.etree than inheriting from _ElementTree? Ultimately, I want to be changing the behavior of the method ElementTree.write and to add the methods ElementTree.getMetaNode and ElementTree.setMetaNode.

Since lxml is written in Cython, I think you can't monkeypatch lxml directly anyway, so subclassing is the way to go. lxml could have chosen to make things easier by using __new__ just like numpy does for facilitating ndarray subclassing.

  1. Is there a better way of incorporating the methods parse, fromstring, SubElement, and Element, rather than just creating thin wrappers that reference the original methods?

See "Boilerplate" above.

  1. Does anyone understand what my methods parse and fromstring are doing? I know that my method fromstring has different functionality from lxml.etree.fromstring because I want it to return an ElementTree rather than an Element. These methods were written largely through guess-and-check (oops).

Why do you want to get an ElementTree rather than an Element? I think it's a bad idea for two reasons:

  1. It's very easy to get an ElementTree from an Element.
  2. It's already hard enough to reason about the type of object you get when using lxml, if we can't use our knowledge about ElementTree/lxml it's only going to get worse.
  1. Is there a better way to write the return old_etree.SubElement(...) line so it doesn't have to take up two lines? This isn't major, but...

If anything, I think I would use more than two lines. If you want to stop worrying about such things, I highly recommand yapf.

Originally, I tried to just overwrite lxml.etree.ElementTree.write directly, but that throws the error AttributeError: 'lxml.etree._ElementTree' object attribute 'write' is read-only. General critiques are also welcomed. I know that I should have comments in here, but everything is pretty straight-forward except for the two methods that I don't understand.

Yep, you can't monkeypatch it (as said in answer to 1.), but that's not what you wanted to do, since you wanted to keep the original implementation too. And a decorator would not have improved things.

Naming

Also, I'm not sure if anyone is going to think that using the name etree in order to overwrite lxml.etree is a bad idea. The reason for this is so that I only have to change the import line at the top of any previously written file from from lxml import etree to from my_lxml import etree.

Nope, I think using the name etree is a good idea. It's a common way to provide replacements in Python, and this is way the standard library xml.etree does things. However, you need to do things correctly. Again, see the "Boilerplate" section above.

Oh, and PEP 8 recommends against underscores in module names unless it improves readability, but I think mylxml is readable enough. However, simply adding my in front of a module is not such a good idea, because you just lost an opportunity to explain what makes your lxml different. Since it seems specific to the file specification you use, why not use that in the name?

default

AltStyle によって変換されたページ (->オリジナル) /