This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2012-08-24 08:18 by nemeskeyd, last changed 2022-04-11 14:57 by admin.
| Messages (9) | |||
|---|---|---|---|
| msg168980 | Author: Dávid Nemeskey (nemeskeyd) | Date: 2012-08-24 08:18 | |
The C expat library provides the XML_StopParser() function, which allows parsing to be stopped from within the handler functions. It would be nice to have this option in Python as well, maybe by adding a StopParser() method to the XMLParser class. |
|||
| msg169207 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2012-08-27 16:22 | |
If a handler function raises an exception, the Parse() method exits and the exception is propagated; internally, this also calls XML_StopParser(). Why would one call XML_StopParser() explicitly? |
|||
| msg169255 | Author: Dávid Nemeskey (nemeskeyd) | Date: 2012-08-28 08:17 | |
OK, then this issue has a "bug" part, too: the documentation does not mention that exceptions raised in the handler methods propagate through the Parse() method. I guess the parser can then be stopped this way too, but it is a dirty method compared to calling StopParser(). To answer your question, there are several situations where StopParser() could come in handy. For instance, the XML file might contain records (such as the output of a search engine), of which we only need the first n. Another example: partway through the file we realize that it does not contain the information we need, contains wrong information, etc., so we want to skip the rest of it. Since the file might be huge, and since XML parsing can in no way be considered fast, being able to stop the parsing in a clean way would spare the superfluous and possibly lengthy computation. |
|||
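The "first n records" use case described above can be sketched with the exception-based approach that already works today. This is only an illustration under stated assumptions: the names `StopParsing`, `first_n_records`, the `<hit>` element and the chunk size are all hypothetical, and the input is fed to `Parse()` in chunks so that the rest of a large file is never read once the handler raises.

```python
import io
import xml.parsers.expat


class StopParsing(Exception):
    """Hypothetical marker exception raised by a handler to end parsing early."""


def first_n_records(stream, tag, n, chunk_size=64):
    """Return the text of the first n <tag> elements, reading no more than needed."""
    records = []
    buffer = []
    inside = False

    def start_element(name, attrs):
        nonlocal inside
        if name == tag:
            inside = True
            buffer.clear()

    def char_data(text):
        # expat may deliver one text node in several pieces, so accumulate them.
        if inside:
            buffer.append(text)

    def end_element(name):
        nonlocal inside
        if name == tag:
            inside = False
            records.append("".join(buffer))
            if len(records) >= n:
                raise StopParsing  # propagates out of Parse()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element

    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                parser.Parse(b"", True)  # signal end of input
                break
            parser.Parse(chunk, False)
    except StopParsing:
        pass  # enough records collected; the remaining input is skipped
    return records


# Made-up sample data: 1000 records, of which only the first 3 are wanted.
body = "".join("<hit>doc%d</hit>" % i for i in range(1000))
xml_bytes = ("<results>" + body + "</results>").encode("utf-8")
stream = io.BytesIO(xml_bytes)
hits = first_n_records(stream, "hit", 3)
print(hits, stream.tell(), len(xml_bytes))
```

The stream position after the call shows how much input was actually consumed, which is the saving the request is about.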
| msg169281 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012-08-28 13:39 | |
nemeskeyd: would you like to work on a patch (for Python 3.4)? |
|||
| msg169285 | Author: Dávid Nemeskey (nemeskeyd) | Date: 2012-08-28 15:34 | |
loewis: I don't think it would be difficult to fix, so theoretically I'd be in. However, I don't really have the time to work on this right now. |
|||
| msg169879 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2012-09-05 16:33 | |
Below is a sample script showing that it's possible to stop parsing XML in the middle without an explicit call to XML_StopParser(): raise StopParsing from any handler, and catch it around the Parse() call. This method covers the two proposed use cases. Do we need another way to do it?

```python
import xml.parsers.expat

class StopParsing(Exception):
    pass

def findFirstElementByName(data, what):
    def end_element(name):
        # Raising from a handler aborts Parse() and propagates to the caller.
        if name == what:
            raise StopParsing(name)

    p = xml.parsers.expat.ParserCreate()
    p.EndElementHandler = end_element

    try:
        p.Parse(data, True)
    except StopParsing as e:
        print("Element found:", e)
    else:
        print("Element not found")

data = """<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>"""

findFirstElementByName(data, "child2")   # Found
findFirstElementByName(data, "child3")   # Not found
```
|||
| msg169905 | Author: Dávid Nemeskey (nemeskeyd) | Date: 2012-09-06 07:28 | |
Amaury: see my previous comment. There are two problems with the method you proposed: 1. The documentation does not mention that exceptions are propagated through Parse(). 2. Exceptions usually mean that an error has happened, and they are not the preferred way to do flow control (at least this is the policy in other languages, e.g. Java; I don't know about Python). |
|||
| msg169906 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2012-09-06 09:30 | |
Your first point is true, even if the Python zen (try "import this") states that "Errors should never pass silently." For your second point: exceptions are a common thing in Python code. This is similar to the EAFP principle http://docs.python.org/glossary.html#term-eafp Also, this example http://docs.python.org/release/2.7.3/library/imp.html#examples shows that exceptions can be part of the normal flow control. |
|||
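For readers unfamiliar with EAFP ("easier to ask forgiveness than permission"), here is a small sketch contrasting it with the look-before-you-leap style. The function names are made up for illustration; both functions return the same results, and the second simply lets the exception carry the "not found" case.

```python
def lookup_lbyl(mapping, key, default=None):
    """Look before you leap: test for the key first, then read it."""
    if key in mapping:
        return mapping[key]
    return default


def lookup_eafp(mapping, key, default=None):
    """EAFP: just try the access and handle the failure as ordinary control flow."""
    try:
        return mapping[key]
    except KeyError:
        return default


ports = {"http": 80, "https": 443}
print(lookup_lbyl(ports, "http"), lookup_eafp(ports, "http"))
print(lookup_lbyl(ports, "gopher", 70), lookup_eafp(ports, "gopher", 70))
```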
| msg169913 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012-09-06 11:26 | |
Dávid: Another (similar) example is the Python for loop. In its original form, it would increase an index and invoke __getitem__ until that *raised* IndexError. In the current definition, it converts the iterated-over object into an iterator, and keeps calling .next until that *raises* StopIteration. So raising an exception to indicate that something is finished is an established Python idiom. In any case, I still think adding StopParser is a useful addition, in particular since that would also allow giving True as the "resumable" argument. Any such change needs to be accompanied by also exposing XML_ResumeParser, and possibly XML_GetParsingStatus. Since we all agree that this is not an important change, I don't mind keeping this issue around until someone comes along to contribute code for it. |
|||
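Both for-loop protocols mentioned above can be observed directly. The sketch below uses a hypothetical `Countdown` class that defines only `__getitem__`, and then spells out by hand what a for loop does with an iterator. Note that in Python 3 the iterator method is `__next__`, normally called through the built-in `next()`, rather than `.next`.

```python
class Countdown:
    """Hypothetical class with no __iter__: iteration falls back to __getitem__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index >= self.start:
            raise IndexError(index)  # tells the for loop that iteration is over
        return self.start - index


# Legacy protocol: indices 0, 1, 2, ... are tried until IndexError is raised.
legacy = [value for value in Countdown(3)]

# Iterator protocol, written out manually: call next() until StopIteration.
iterator = iter([10, 20])
seen = []
while True:
    try:
        seen.append(next(iterator))
    except StopIteration:
        break

print(legacy, seen)
```

In both cases an exception marks normal completion rather than an error, which is the idiom the handler-raises-to-stop approach relies on.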
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:35 | admin | set | github: 59979 |
| 2012-09-06 11:26:29 | loewis | set | messages: + msg169913; title: Add StopParser() to expat -> Add StopParser(), ResumeParser, and GetParsingStatus to expat |
| 2012-09-06 09:30:44 | amaury.forgeotdarc | set | nosy: + docs@python; messages: + msg169906; assignee: docs@python; components: + Documentation |
| 2012-09-06 07:28:31 | nemeskeyd | set | messages: + msg169905 |
| 2012-09-05 16:33:22 | amaury.forgeotdarc | set | messages: + msg169879 |
| 2012-08-31 21:29:01 | berker.peksag | set | versions: + Python 3.4 |
| 2012-08-28 15:34:58 | nemeskeyd | set | messages: + msg169285 |
| 2012-08-28 13:39:37 | loewis | set | nosy: + loewis; messages: + msg169281 |
| 2012-08-28 08:17:04 | nemeskeyd | set | messages: + msg169255 |
| 2012-08-27 16:22:20 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg169207 |
| 2012-08-24 08:18:17 | nemeskeyd | create | |