This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Handling of broken markup in HTMLParser on 2.7
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: benjamin.peterson, eli.bendersky, eric.araujo, ezio.melotti, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2012-02-10 13:45 by ezio.melotti, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: issue13987.diff
Uploaded: ezio.melotti, 2012-02-10 13:45
Description: First patch against 2.7.
Messages (5)
msg153043 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-02-10 13:45
The attached patch fixes a few problems with HTMLParser on 2.7.
Instead of raising an error when invalid markup is detected, the parser now consumes the invalid input and proceeds. This patch is a partial backport of #1486713.
After this, two more patches will follow.
The first will get rid of errors raised while parsing declarations and should also solve #13576:
     def unknown_decl(self, data):
-        self.error("unknown declaration: %r" % (data,))
+        pass
The second will take care of "bogus comments" (see #13960).
Once this is done, HTMLParser should be able to parse (almost) everything. I'm planning to commit this before the release of 2.7.3.
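As a rough illustration of the unknown_decl change described above, here is a minimal sketch. The DeclLogger class and the sample markup are hypothetical and not part of the patch. With the base method reduced to a no-op, an unrecognized declaration no longer aborts parsing, and a subclass can still override the hook to record what it saw. Whether this particular input reaches unknown_decl may vary between Python versions, so the sketch only relies on the surrounding text being parsed.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class DeclLogger(HTMLParser):
    """Hypothetical subclass that records declarations instead of failing."""

    def __init__(self):
        # Explicit base call: HTMLParser is an old-style class on Python 2.7.
        HTMLParser.__init__(self)
        self.decls = []
        self.text = []

    def unknown_decl(self, data):
        # Record the declaration body; the parser keeps going afterwards.
        self.decls.append(data)

    def handle_data(self, data):
        self.text.append(data)


parser = DeclLogger()
# "<![if IE]>" is a non-standard conditional section (assumed sample input).
parser.feed("<p>before</p><![if IE]><p>after</p>")
parser.close()
print(parser.decls)
print(parser.text)
```

The point of the sketch is only that the text on both sides of the odd declaration is delivered to handle_data rather than being lost to an exception.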
msg153100 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-02-11 05:28
LGTM, http://shipitsquirrel.github.com/ 
msg153398 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-02-15 10:44
New changeset 11a31eb5da93 by Ezio Melotti in branch '2.7':
#13987: HTMLParser is now able to handle EOFs in the middle of a construct.
http://hg.python.org/cpython/rev/11a31eb5da93 
msg153399 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-02-15 11:19
New changeset 3d7904e3f4b9 by Ezio Melotti in branch '2.7':
#13987: HTMLParser is now able to handle malformed start tags.
http://hg.python.org/cpython/rev/3d7904e3f4b9 
msg153400 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-02-15 11:27
This should be fixed now.
The first two chunks of the attached patch have been committed in the two changesets linked in the previous messages. The third chunk about the end tag has been fixed as part of #13933. The error previously raised by unknown_decl has been removed in 4743a3a1e669. More fixes have been backported as part of #13960.
2.7 should now behave like 3.2 non-strict.
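The tolerant behavior described in this thread (malformed start tags, EOF in the middle of a construct) can be sketched roughly as follows. The EventCollector class and the BROKEN sample string are hypothetical, chosen only for illustration. The exact attributes reported for the malformed tag and the handling of the trailing fragment may differ between Python versions, so the sketch records only tag names and text.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class EventCollector(HTMLParser):
    """Hypothetical helper that records parser events as simple tuples."""

    def __init__(self):
        # Explicit base call: HTMLParser is an old-style class on Python 2.7.
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


# Assumed sample: a start tag with a stray token after a quoted value,
# followed by input that ends in the middle of a tag ("<b").
BROKEN = '<p>ok</p><div class="c"x>text</div><b'

collector = EventCollector()
collector.feed(BROKEN)
collector.close()  # flushes whatever is still buffered at EOF
events = collector.events
print(events)
```

On a parser that behaves as described here, neither feed() nor close() raises, and the well-formed parts of the input are still reported in order.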
History
Date User Action Args
2022-04-11 14:57:26 admin set github: 58195
2012-02-15 11:27:15 ezio.melotti set status: open -> closed; resolution: fixed; messages: + msg153400; stage: patch review -> resolved
2012-02-15 11:19:30 python-dev set messages: + msg153399
2012-02-15 10:44:35 python-dev set nosy: + python-dev; messages: + msg153398
2012-02-11 05:28:41 eric.araujo set messages: + msg153100
2012-02-10 13:46:39 eli.bendersky set nosy: + eli.bendersky
2012-02-10 13:45:58 ezio.melotti create