
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: HTMLParser fix to accept malformed tag attributes
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: accepted
Dependencies: Superseder: HTMLParser : A auto-tolerant parsing mode
View: 1486713
Assigned To: Nosy List: Neil Muller, ajaksu2, ezio.melotti, jlgijsbers, nnseva, r.david.murray, svilend
Priority: normal Keywords: easy, patch

Created on 2004-10-13 10:11 by nnseva, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
HTMLParser.py.patch nnseva, 2004-10-15 06:27 This is a patch
html.parser.diff svilend, 2011-05-05 10:34 patch to limit nonstrict-regexp from eating too much
test-htmlparser-attrs.py svilend, 2011-05-05 10:35 test with unquoted attributes
Messages (11)
msg22675 - Author: Vsevolod Novikov (nnseva) Date: 2004-10-13 10:11
This is a patch to fix bugs #975556 and #921657.
I think it should be applied simply because the parser
should accept as many pages as it can. Besides, the code
next to the fix already contains a regexp that accepts
malformed attributes in other cases: compare the values of
the attrfind and locatestarttagend variables.
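To illustrate the kind of input under discussion, here is a minimal sketch that feeds a start tag with unquoted attribute values to the standard library parser on a recent Python 3 release. The AttrCollector class is a hypothetical helper written for this example; it is not part of the patch, and the exact malformed inputs from bugs #975556 and #921657 are not reproduced here.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs as parsed by HTMLParser.
        self.seen.append((tag, attrs))


parser = AttrCollector()
# Unquoted attribute values: not well-formed markup, but common in real pages.
parser.feed('<a href=foo.html class=nav>link</a>')
parser.close()
print(parser.seen)
```

On a recent Python 3 this should report the a tag with both attributes, which is the lenient behaviour the patch argues for.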
msg22676 - Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-10-13 11:09
There's no uploaded file! You have to check the
checkbox labeled "Check to Upload & Attach File"
when you upload a file.
Please try again.
(This is a SourceForge annoyance that we can do
nothing about. :-( )
msg22677 - Author: Vsevolod Novikov (nnseva) Date: 2004-10-15 06:27
There's no uploaded file! You have to check the
checkbox labeled "Check to Upload & Attach File"
when you upload a file.
Please try again.
(This is a SourceForge annoyance that we can do
nothing about. :-( )
msg22678 - Author: Vsevolod Novikov (nnseva) Date: 2004-10-15 06:27
Missed patch, sorry ...
msg81692 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-02-11 23:57
Heh, the patch applies cleanly to trunk more than four years later and
tests pass fine. We'll surely need better tests if the behavior change
is considered an improvement.
msg114333 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-19 07:18
The patch is a one-line change to a compiled regex. Would someone with HTML and/or regex knowledge like to comment? I have no idea what the implications are. I also agree with the comments in msg81692 regarding better unit tests. Please don't ask me! :)
msg121677 - Author: Neil Muller (Neil Muller) Date: 2010-11-20 16:31
I think this change makes the parser far too lenient. Something like the explicit tolerant mode proposed in #1486713 is a better solution.
msg123176 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-03 04:17
Included this in the 'strict=False' mode in the issue 1486713 patch.
msg135179 - Author: svilen dobrev (svilend) Date: 2011-05-05 10:34
This seems to eat too much into the data and gets past the endpos of the chunk being processed, so the parser gets confused and treats everything that follows as data. I didn't work out how to fix the regexp itself; instead I limited its span to :endpos so it does not eat too much.
It seems to happen with unquoted attributes.
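The idea of capping a match at endpos can be sketched with the optional pos and endpos arguments of a compiled pattern's match() method. The pattern and input below are assumptions made up for illustration; they are not the regexp from html.parser, and this is not the content of html.parser.diff.

```python
import re

# Hypothetical, deliberately greedy pattern for an unquoted attribute.
# NOT the regexp used by html.parser.
unquoted_attr = re.compile(r'\w+=\S*')

rawdata = 'href=x>text'
endpos = 6  # assumed end of the chunk being processed, i.e. 'href=x'

# Without a limit, \S* runs past the '>' and swallows the following data.
unbounded = unquoted_attr.match(rawdata)

# With endpos, the match behaves as if the string ended at that index.
bounded = unquoted_attr.match(rawdata, 0, endpos)

print(unbounded.group())  # prints href=x>text
print(bounded.group())    # prints href=x
```

Bounding the match this way contains the damage without changing the pattern itself, which matches the trade-off described in the message.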
msg135180 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-05-05 10:44
This issue is closed, so it's better if you create a new issue.
Even better if you can attach a patch that adds a testcase to Lib/test/test_htmlparser.py 
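A test case of the kind requested might look roughly like the sketch below. It is an assumption about shape only: the class and method names are hypothetical, it does not follow the helpers actually used in Lib/test/test_htmlparser.py, and it is not the uploaded test-htmlparser-attrs.py. It checks that markup following an unquoted attribute is still reported as tags rather than as data.

```python
import unittest
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records parser events in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('starttag', tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(('endtag', tag))

    def handle_data(self, data):
        self.events.append(('data', data))


class UnquotedAttrTest(unittest.TestCase):
    def test_tags_after_unquoted_attribute(self):
        parser = EventCollector()
        parser.feed('<a href=foo.html>text</a><b>bold</b>')
        parser.close()
        # Everything after the unquoted value must still be parsed as markup.
        self.assertEqual(parser.events, [
            ('starttag', 'a', [('href', 'foo.html')]),
            ('data', 'text'),
            ('endtag', 'a'),
            ('starttag', 'b', []),
            ('data', 'bold'),
            ('endtag', 'b'),
        ])

# Run with: python -m unittest <module name>
```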
msg135701 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-05-10 14:06
For the record, the new issue is #12008.
History
Date User Action Args
2022-04-11 14:56:07 admin set github: 41013
2011-05-10 14:06:07 ezio.melotti set messages: + msg135701
2011-05-05 10:44:16 ezio.melotti set nosy: + ezio.melotti; messages: + msg135180
2011-05-05 10:35:01 svilend set files: + test-htmlparser-attrs.py
2011-05-05 10:34:11 svilend set files: + html.parser.diff; nosy: + svilend; messages: + msg135179
2010-12-03 04:24:57 r.david.murray set title: HTMLParser fix to accept mailformed tag attributes -> HTMLParser fix to accept malformed tag attributes
2010-12-03 04:17:17 r.david.murray set status: open -> closed; nosy: + r.david.murray, - BreamoreBoy; messages: + msg123176; resolution: accepted; superseder: HTMLParser : A auto-tolerant parsing mode; stage: patch review -> resolved
2010-11-20 16:31:00 Neil Muller set nosy: + Neil Muller; messages: + msg121677
2010-08-19 07:18:41 BreamoreBoy set versions: + Python 3.2, - Python 2.7; nosy: + BreamoreBoy; messages: + msg114333; stage: test needed -> patch review
2009-04-22 18:49:57 ajaksu2 set keywords: + patch, easy; stage: test needed
2009-02-11 23:57:49 ajaksu2 set nosy: + ajaksu2; messages: + msg81692
2009-02-09 06:13:35 ajaksu2 set type: enhancement; versions: + Python 2.7, - Python 2.3
2004-10-13 10:11:24 nnseva create
