This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2004-10-13 10:11 by nnseva, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| HTMLParser.py.patch | nnseva, 2004-10-15 06:27 | This is a patch |
| html.parser.diff | svilend, 2011-05-05 10:34 | patch to limit nonstrict-regexp from eating too much |
| test-htmlparser-attrs.py | svilend, 2011-05-05 10:35 | test with unquoted attributes |
Messages (11)
msg22675 - Author: Vsevolod Novikov (nnseva) - Date: 2004-10-13 10:11

This is a patch to fix bugs #975556 and #921657. I think it should be applied simply because the parser should accept as many pages as it can. On the other hand, the code near the fix already contains a regexp that accepts malformed attributes in other cases: compare the values of the attrfind and locatestarttagend variables.

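The change discussed here is about which attribute syntaxes the parser's regular expressions accept. The sketch below is a rough illustration only. It contrasts a hypothetical pattern that requires quoted values with one that also accepts unquoted (bare) values. `STRICT_ATTR`, `LENIENT_ATTR` and `parse_attrs` are invented for this example. They are not the actual `attrfind` or `locatestarttagend` patterns from `HTMLParser`.

```python
import re

# Hypothetical, simplified patterns; NOT the real attrfind/locatestarttagend
# regexes from HTMLParser.
NAME = r'\s*([A-Za-z_][-.:A-Za-z0-9_]*)\s*=\s*'
# Strict: the value must be enclosed in double or single quotes.
STRICT_ATTR = re.compile(NAME + r'("[^"]*"|\'[^\']*\')')
# Lenient: additionally accept a bare value made of anything except
# whitespace, quotes and '>'.
LENIENT_ATTR = re.compile(NAME + r'("[^"]*"|\'[^\']*\'|[^\s"\'>]+)')


def parse_attrs(pattern, text):
    """Match attributes one after another from the start of text.

    Returns the (name, value) pairs found and the unparsed remainder.
    """
    attrs = []
    pos = 0
    while (m := pattern.match(text, pos)) is not None:
        name, value = m.group(1).lower(), m.group(2)
        if value[:1] in ('"', "'"):
            value = value[1:-1]  # drop the surrounding quotes
        attrs.append((name, value))
        pos = m.end()
    return attrs, text[pos:]


text = ' href="a.html" width=100'
print(parse_attrs(STRICT_ATTR, text))   # ([('href', 'a.html')], ' width=100')
print(parse_attrs(LENIENT_ATTR, text))  # ([('href', 'a.html'), ('width', '100')], '')
```

This is the trade-off debated in this issue. The lenient pattern recovers more real-world markup, but a bare-value alternative this broad is also easier to over-match.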
msg22676 - Author: Johannes Gijsbers (jlgijsbers) (Python triager) - Date: 2004-10-13 11:09

Logged In: YES user_id=469548 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( )

msg22677 - Author: Vsevolod Novikov (nnseva) - Date: 2004-10-15 06:27

Logged In: YES user_id=325678 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( )

msg22678 - Author: Vsevolod Novikov (nnseva) - Date: 2004-10-15 06:27

Logged In: YES user_id=325678 Missed patch, sorry...

msg81692 - Author: Daniel Diniz (ajaksu2) (Python triager) - Date: 2009-02-11 23:57

Heh, the patch applies cleanly to trunk more than four years later and tests pass fine. We'll surely need better tests if the behavior change is considered an improvement.

msg114333 - Author: Mark Lawrence (BreamoreBoy) - Date: 2010-08-19 07:18

The patch is a one-line change to a compiled regex. Would someone with HTML and/or regex knowledge like to comment? I have no idea of the implications. I also agree with the comments in msg81692 regarding better unit tests. Please don't ask me! :)

msg121677 - Author: Neil Muller (Neil Muller) - Date: 2010-11-20 16:31

I think this change makes the parser far too lenient. Something like the explicit tolerant mode proposed in #1486713 is a better solution.

msg123176 - Author: R. David Murray (r.david.murray) (Python committer) - Date: 2010-12-03 04:17

Included this in the 'strict=False' mode in the issue 1486713 patch.

msg135179 - Author: svilen dobrev (svilend) - Date: 2011-05-05 10:34

This seems to eat too far into the data and runs past the endpos of the chunk being processed. The parser then gets confused and treats everything that follows as data. I didn't work out how to fix the regexp itself, but instead limited its span to :endpos so it does not eat too much. It seems to happen with unquoted attributes.

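The workaround described here bounds the regex so that it cannot read past a given position. The sketch below shows that idea using the optional `pos` and `endpos` arguments of a compiled pattern's `match()` method. Slicing the string to `[:endpos]`, as the message describes, has the same effect. The over-lenient `ATTR` pattern, the sample markup and `read_attr` are hypothetical. They are not the code from `html.parser.diff`.

```python
import re

# Hypothetical over-lenient attribute pattern: a bare value is "anything that
# is not whitespace", so it can run through the closing '>' and into the data.
ATTR = re.compile(r'\s+([a-z]+)=(\S*)')


def read_attr(rawdata, pos, endpos=None):
    """Match one attribute starting at pos; never look at rawdata[endpos:]."""
    if endpos is None:
        endpos = len(rawdata)
    m = ATTR.match(rawdata, pos, endpos)
    return (m.group(1), m.group(2), m.end()) if m else None


rawdata = '<a href=x>next<b c=d>'
tag_end = rawdata.index('>')  # index of the '>' closing the first tag (9)

print(read_attr(rawdata, 2))           # ('href', 'x>next<b', 16): overran the tag
print(read_attr(rawdata, 2, tag_end))  # ('href', 'x', 9): stops at the '>'
```

Bounding the match keeps it inside the current tag, but it only limits the damage. As the message notes, the pattern itself is still too permissive.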
msg135180 - Author: Ezio Melotti (ezio.melotti) (Python committer) - Date: 2011-05-05 10:44

This issue is closed, so it's better to create a new issue. It would be even better if you could attach a patch that adds a test case to Lib/test/test_htmlparser.py.

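A test of the kind requested might look like the following. It is a sketch written against the current `html.parser` module, not the 2011 code, and it is not the test that was eventually added for #12008. `AttrCollector`, `collect` and `UnquotedAttributeTest` are hypothetical names. The second case splits the tag in the middle of an unquoted value, to exercise the chunk boundary described in msg135179.

```python
import unittest
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records (tag, attrs) for every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append((tag, attrs))


def collect(*chunks):
    """Feed the chunks one by one, then close, and return the recorded tags."""
    parser = AttrCollector()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.starttags


class UnquotedAttributeTest(unittest.TestCase):
    expected = [('a', [('href', 'foo.html'), ('title', 'bar')])]

    def test_single_feed(self):
        self.assertEqual(collect('<a href=foo.html title=bar>text'),
                         self.expected)

    def test_split_across_chunks(self):
        # The start tag is cut in the middle of an unquoted value; the parser
        # should buffer the incomplete tag until the rest arrives.
        self.assertEqual(collect('<a href=foo', '.html title=bar>text'),
                         self.expected)
```

Saved as a module, it can be run with `python -m unittest` followed by the module name.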
msg135701 - Author: Ezio Melotti (ezio.melotti) (Python committer) - Date: 2011-05-10 14:06

For the record, the new issue is #12008.

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:56:07 | admin | set | github: 41013 |
| 2011-05-10 14:06:07 | ezio.melotti | set | messages: + msg135701 |
| 2011-05-05 10:44:16 | ezio.melotti | set | nosy: + ezio.melotti; messages: + msg135180 |
| 2011-05-05 10:35:01 | svilend | set | files: + test-htmlparser-attrs.py |
| 2011-05-05 10:34:11 | svilend | set | files: + html.parser.diff; nosy: + svilend; messages: + msg135179 |
| 2010-12-03 04:24:57 | r.david.murray | set | title: HTMLParser fix to accept mailformed tag attributes -> HTMLParser fix to accept malformed tag attributes |
| 2010-12-03 04:17:17 | r.david.murray | set | status: open -> closed; nosy: + r.david.murray, - BreamoreBoy; messages: + msg123176; resolution: accepted; superseder: HTMLParser : A auto-tolerant parsing mode; stage: patch review -> resolved |
| 2010-11-20 16:31:00 | Neil Muller | set | nosy: + Neil Muller; messages: + msg121677 |
| 2010-08-19 07:18:41 | BreamoreBoy | set | versions: + Python 3.2, - Python 2.7; nosy: + BreamoreBoy; messages: + msg114333; stage: test needed -> patch review |
| 2009-04-22 18:49:57 | ajaksu2 | set | keywords: + patch, easy; stage: test needed |
| 2009-02-11 23:57:49 | ajaksu2 | set | nosy: + ajaksu2; messages: + msg81692 |
| 2009-02-09 06:13:35 | ajaksu2 | set | type: enhancement; versions: + Python 2.7, - Python 2.3 |
| 2004-10-13 10:11:24 | nnseva | create | |