[Python-checkins] cpython (3.3): #20288: fix handling of invalid numeric charrefs in HTMLParser.

ezio.melotti python-checkins at python.org
Sat Feb 1 20:23:13 CET 2014


http://hg.python.org/cpython/rev/32097f193892
changeset: 88887:32097f193892
branch: 3.3
parent: 88862:7a611b7aae38
user: Ezio Melotti <ezio.melotti at gmail.com>
date: Sat Feb 01 21:21:01 2014 +0200
summary:
 #20288: fix handling of invalid numeric charrefs in HTMLParser.
files:
 Lib/html/parser.py | 6 +++---
 Lib/test/test_htmlparser.py | 6 ++++++
 Misc/NEWS | 2 ++
 3 files changed, 11 insertions(+), 3 deletions(-)
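
A minimal sketch of the indexing error this changeset fixes, for readers skimming the diff below. The helpers `bail_old` and `bail_new` are hypothetical names introduced only for illustration; they are not functions in Lib/html/parser.py. They mirror the slice and the new position used before and after the change.

```python
# Illustrative sketch only: bail_old and bail_new are hypothetical helpers,
# not part of Lib/html/parser.py. They reproduce just the slice and the
# position arithmetic from the changed lines.

def bail_old(rawdata, i):
    """Pre-fix: the slice and the new position ignore the current index i."""
    return rawdata[0:2], 2


def bail_new(rawdata, i):
    """Post-fix: the slice and the new position are relative to i."""
    return rawdata[i:i+2], i + 2


rawdata = "<div>&#bad;</div>"
i = rawdata.index("&#")  # offset of the invalid charref in the buffer

old_data, old_pos = bail_old(rawdata, i)
new_data, new_pos = bail_new(rawdata, i)

print(i)                         # 5
print(repr(old_data), old_pos)   # '<d' 2
print(repr(new_data), new_pos)   # '&#' 7
```

The two versions agree only when the charref sits at offset 0 of the buffer. The new test passes its source as a one-element list, which per its comment avoids buffering; presumably that keeps the charref at a non-zero offset, which is the case the old code got wrong.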
diff --git a/Lib/html/parser.py b/Lib/html/parser.py
--- a/Lib/html/parser.py
+++ b/Lib/html/parser.py
@@ -228,9 +228,9 @@
                     i = self.updatepos(i, k)
                     continue
                 else:
-                    if ";" in rawdata[i:]: #bail by consuming &#
-                        self.handle_data(rawdata[0:2])
-                        i = self.updatepos(i, 2)
+                    if ";" in rawdata[i:]: # bail by consuming &#
+                        self.handle_data(rawdata[i:i+2])
+                        i = self.updatepos(i, i+2)
                     break
             elif startswith('&', i):
                 match = entityref.match(rawdata, i)
diff --git a/Lib/test/test_htmlparser.py b/Lib/test/test_htmlparser.py
--- a/Lib/test/test_htmlparser.py
+++ b/Lib/test/test_htmlparser.py
@@ -151,6 +151,12 @@
             ("data", "&#bad;"),
             ("endtag", "p"),
         ])
+        # add the [] as a workaround to avoid buffering (see #20288)
+        self._run_check(["<div>&#bad;</div>"], [
+            ("starttag", "div", []),
+            ("data", "&#bad;"),
+            ("endtag", "div"),
+        ])
 
     def test_unclosed_entityref(self):
         self._run_check("&entityref foo", [
diff --git a/Misc/NEWS b/Misc/NEWS
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -45,6 +45,8 @@
 Library
 -------
 
+- Issue #20288: fix handling of invalid numeric charrefs in HTMLParser.
+
 - Issue #20424: Python implementation of io.StringIO now supports lone surrogates.
 
 - Issue #19456: ntpath.join() now joins relative paths correctly when a drive
-- 
Repository URL: http://hg.python.org/cpython


