This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.
Created on 2003-06-17 02:27 by smroid, last changed 2022-04-10 16:09 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| patch.txt | smroid, 2003-06-17 02:28 | |
| htmlparser_error.diff | ajaksu2, 2009-02-12 00:14 | Steven's patch updated to trunk |
| parser.diff | BreamoreBoy, 2010-08-18 13:52 | |
| Messages (8) | |||
|---|---|---|---|
| msg44029 | Author: Steven Rosenthal (smroid) | Date: 2003-06-17 02:27 | |
The HTMLParser.error method raises HTMLParseError, which terminates parsing as soon as a parse error is detected. This patch allows HTMLParser to continue parsing if the error() method is overridden so that it does not raise an exception. The only documentation impact is on the error() method. The existing test_htmlparser.py unit test is unaffected by the patch. The base file is HTMLParser.py, revision 1.11.2.1. |
|||
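The idea in the message above can be illustrated with a small, self-contained sketch. It does not use the real HTMLParser API or reproduce the attached patch: `ParseError`, `ToyParser`, and `LenientParser` are hypothetical names invented for illustration. The point is only that a call site which returns a sensible value after `self.error()` lets a subclass keep parsing when `error()` is overridden not to raise.

```python
class ParseError(Exception):
    """Stand-in for HTMLParseError (hypothetical, for illustration only)."""


class ToyParser:
    """Hypothetical mini-parser that splits text into tags and plain text."""

    def error(self, message):
        # Default behavior: abort the parse by raising.
        raise ParseError(message)

    def feed(self, text):
        tokens = []
        i = 0
        while i < len(text):
            if text[i] == "<":
                end = text.find(">", i)
                if end == -1:
                    self.error(f"unterminated tag at offset {i}")
                    # Reached only if error() was overridden not to raise:
                    # keep the remainder as plain text and stop scanning.
                    tokens.append(("text", text[i:]))
                    return tokens
                tokens.append(("tag", text[i + 1:end]))
                i = end + 1
            else:
                end = text.find("<", i)
                if end == -1:
                    end = len(text)
                tokens.append(("text", text[i:end]))
                i = end
        return tokens


class LenientParser(ToyParser):
    """Subclass that records problems instead of raising."""

    def __init__(self):
        self.problems = []

    def error(self, message):
        self.problems.append(message)


lenient = LenientParser()
tokens = lenient.feed("a<b>c<d")
```

With the default `error()`, the same input raises `ParseError`; with the override, the parse finishes and the problem is available in `lenient.problems`.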
| msg44030 | Author: Steven Rosenthal (smroid) | Date: 2003-06-18 03:13 | |
This fixes bug #736428 (which I submitted earlier). |
|||
| msg44031 | Author: Titus Brown (titus) | Date: 2004-12-19 00:45 | |
This patch allows developers to override the behavior of HTMLParser when parsing malformed HTML. Normally HTMLParser calls self.error(), which raises an exception. This patch adds appropriate return values for situations where self.error has been redefined in subclasses to *not* raise an exception. It does not change the default behavior of HTMLParser and so presents no backwards compatibility issues.

The patch itself consists of an added comment and two added lines of code that call 'return' with appropriate values after a self.error call. Nothing wrong with 'em. I can't verify that the "junk characters" error call will leave the parser in a good state, though, if execution returns from error().

The library documentation could be updated to reflect the ability to override error() behavior; I've written a short patch, available at http://issola.caltech.edu/~t/transfer/HTMLParser-doc-error.patch

More problems exist with markupbase.py, upon which HTMLParser is based. markupbase calls error() as well, and has some stickier situations. See comments in bug 917188 as well.

Comments in 683938 and 699079 suggest that raising an exception is the correct response to the parse errors. I recommend application of the patch anyway, because it (a) doesn't change any behavior by default and (b) may solve some problems for people.

An alternative would be to distinguish between unrecoverable errors and recoverable errors by having two different functions, e.g. error() (for recoverable errors) and _fail() (for unrecoverable errors). By default error() would call _fail(), and internal code could be changed to call _fail() where recovery is impossible. This might alter behavior in situations where subclasses override error(), but then again that's not legitimate to do anyway, at least not at the moment, since error() isn't in the docs ;).

If nothing is done, at least close patch 755660 and bug 736428 with a comment saying that this behavior will not be addressed ;). |
|||
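The error()/_fail() split suggested in the review above might look roughly like the following sketch. This is not code from any patch on this issue: the class names, the `ParseError` exception, and the toy `parse_number_list` method are all assumptions chosen to keep the example short and runnable.

```python
class ParseError(Exception):
    """Stand-in for HTMLParseError (hypothetical name)."""


class TwoTierParser:
    """Hypothetical parser separating recoverable from unrecoverable errors."""

    def _fail(self, message):
        # Unrecoverable: always raises and is not meant to be overridden.
        raise ParseError(message)

    def error(self, message):
        # Recoverable: subclasses may override; the default still aborts.
        self._fail(message)

    def parse_number_list(self, text):
        if not text.strip():
            # Nothing sensible to recover to, so use the hard failure path.
            self._fail("empty input")
        values = []
        for field in text.split(","):
            try:
                values.append(int(field))
            except ValueError:
                self.error(f"bad field {field!r}")
                continue  # reached only if error() returned
        return values


class Tolerant(TwoTierParser):
    """Subclass that downgrades recoverable errors to warnings."""

    def __init__(self):
        self.warnings = []

    def error(self, message):
        self.warnings.append(message)


tolerant = Tolerant()
values = tolerant.parse_number_list("1, x,3")
```

In this arrangement an overriding subclass can only soften the errors the base class has declared recoverable; the empty-input case still raises even in `Tolerant`.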
| msg81693 | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2009-02-12 00:14 | |
Tests still pass with updated patch, but new tests (and docs!) for this feature are needed if Titus' positive review stands. |
|||
| msg95107 | Author: Francesco Frassinelli (frafra) | Date: 2009-11-10 12:16 | |
I'm using Python 3.1.1 and the patch (patch.txt, provided by smroid) works very well. It's useful, and I really need it, thanks :) Without this patch, I can't parse http://ftp.vim.org/pub/vim/ (due to a fake tag, like "<user@mail.com>") or many other websites. I hope this patch will be merged in Python 3.2 :) |
|||
| msg95109 | Author: Francesco Frassinelli (frafra) | Date: 2009-11-10 12:47 | |
Site: http://ftp.vim.org/pub/vim/unstable/patches/

Output without a customized error() function:

[...]
  File "./takeit.py", line 54, in inspect
    parser.feed(data.read().decode())
  File "/home/frafra/Scrivania/takeit/html/parser.py", line 107, in feed
    self.goahead(0)
  File "/home/frafra/Scrivania/takeit/html/parser.py", line 163, in goahead
    k = self.parse_declaration(i)
  File "/usr/local/lib/python3.1/_markupbase.py", line 97, in parse_declaration
    decltype, j = self._scan_name(j, i)
  File "/usr/local/lib/python3.1/_markupbase.py", line 387, in _scan_name
    % rawdata[declstartpos:declstartpos+20])
  File "/home/frafra/Scrivania/takeit/html/parser.py", line 122, in error
    raise HTMLParseError(message, self.getpos())
html.parser.HTMLParseError: expected name token at '<! gives an error me', at line 153, column 48

Output with a customized error() function:

[...]
  File "./takeit.py", line 55, in inspect
    parser.feed(data.read().decode())
  File "/home/frafra/Scrivania/takeit/html/parser.py", line 107, in feed
    self.goahead(0)
  File "/home/frafra/Scrivania/takeit/html/parser.py", line 163, in goahead
    k = self.parse_declaration(i)
  File "/usr/local/lib/python3.1/_markupbase.py", line 97, in parse_declaration
    decltype, j = self._scan_name(j, i)
TypeError: 'NoneType' object is not iterable |
|||
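The second traceback above is consistent with a helper that calls error() and then falls off the end of the function: once error() no longer raises, the helper implicitly returns None, and the caller's tuple unpacking fails. The sketch below reproduces that failure mode in isolation. The two `scan_name_*` functions are simplified, hypothetical stand-ins rather than the real `_markupbase._scan_name`, and the `(None, -1)` sentinel is an assumption made for illustration.

```python
def scan_name_unpatched(rawdata, i, error):
    """Toy helper that reports problems through an error callback."""
    end = i
    while end < len(rawdata) and rawdata[end].isalnum():
        end += 1
    if end == i:
        error("expected name token")
        # Falls through: implicitly returns None if error() did not raise.
    else:
        return rawdata[i:end], end


def scan_name_patched(rawdata, i, error):
    """Same helper, but with an explicit return after the error call."""
    end = i
    while end < len(rawdata) and rawdata[end].isalnum():
        end += 1
    if end == i:
        error("expected name token")
        return None, -1  # hypothetical sentinel the caller can check
    return rawdata[i:end], end


messages = []  # a non-raising error handler: just collect the messages

unpack_failed = False
try:
    name, j = scan_name_unpatched("<! oops", 2, messages.append)
except TypeError:
    # None cannot be unpacked into two names.
    unpack_failed = True

name, j = scan_name_patched("<! oops", 2, messages.append)
```

A caller of the patched variant still has to check for the sentinel before continuing, which is the kind of "stickier situation" a non-raising error() creates.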
| msg114219 | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010-08-18 13:52 | |
Attached a patch for py3k where the file name has changed. Doc changes could be based on the comment added to the error method in the patch. I don't think a unit test is needed but could easily be persuaded otherwise. |
|||
| msg158787 | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2012-04-20 00:33 | |
HTMLParser should now be able to parse invalid HTML too, so this patch is not necessary anymore. |
|||
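As a rough check of the closing comment, the following sketch feeds the two kinds of malformed markup mentioned in this thread (a fake tag such as "<user@mail.com>" and a stray "<! ...>" declaration) to `html.parser.HTMLParser`. It assumes a recent Python 3 interpreter; the `TagCollector` class and the sample string are made up for illustration, and the exact tokens reported for the malformed parts may differ between versions.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects start tags and comments seen while parsing."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_comment(self, data):
        self.comments.append(data)


collector = TagCollector()
# No override of error() is involved; the parser is expected to cope with
# the malformed pieces on its own instead of raising.
collector.feed("<p>Mail <user@mail.com> or <! gives an error message> here</p>")
collector.close()
```

After the call, `collector.start_tags` and `collector.comments` show how the interpreter in use classified the malformed pieces.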
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-10 16:09:15 | admin | set | github: 38663 |
| 2012-04-20 00:33:01 | ezio.melotti | set | status: open -> closed; assignee: ezio.melotti; versions: + Python 3.3, - Python 3.2; nosy: + ezio.melotti; messages: + msg158787; resolution: out of date; stage: patch review -> resolved |
| 2011-11-14 17:07:52 | ezio.melotti | unlink | issue755670 dependencies |
| 2010-08-18 13:52:52 | BreamoreBoy | set | files: + parser.diff; versions: + Python 3.2, - Python 2.7; keywords: + patch; nosy: + BreamoreBoy; messages: + msg114219; stage: test needed -> patch review |
| 2009-11-10 12:47:08 | frafra | set | messages: + msg95109 |
| 2009-11-10 12:17:00 | frafra | set | nosy: + frafra; messages: + msg95107 |
| 2009-04-22 18:49:51 | ajaksu2 | set | keywords: + easy, - patch |
| 2009-04-05 18:45:17 | georg.brandl | link | issue736428 superseder |
| 2009-02-12 03:01:19 | ajaksu2 | link | issue755670 dependencies |
| 2009-02-12 00:15:27 | ajaksu2 | set | type: enhancement |
| 2009-02-12 00:15:02 | ajaksu2 | set | files: + htmlparser_error.diff; nosy: + ajaksu2; stage: test needed; messages: + msg81693; versions: + Python 2.7, - Python 2.3 |
| 2003-06-17 02:27:45 | smroid | create | |