
Using Python 3.2, I attempted the example straight from the html.parser documentation:

    from html.parser import HTMLParser

    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            print("Encountered a start tag:", tag)
        def handle_endtag(self, tag):
            print("Encountered an end tag :", tag)
        def handle_data(self, data):
            print("Encountered some data :", data)

    parser = MyHTMLParser(strict=False)
    parser.feed('<html><head><title>Test</title></head>'
                '<body><h1>Parse me!</h1></body></html>')

Instead of the result shown in the documentation, I get:

Encountered some data : <html>
Encountered some data : <head>
Encountered some data : <title>
Encountered some data : Test
Encountered an end tag : title
Encountered an end tag : head
Encountered some data : <body>
Encountered some data : <h1>
Encountered some data : Parse me!
Encountered an end tag : h1
Encountered an end tag : body
Encountered an end tag : html

For some reason, it treats the start tags as data, but only if strict=False. With strict=True I get the correct result:

Encountered a start tag: html
Encountered a start tag: head
Encountered a start tag: title
Encountered some data : Test
Encountered an end tag : title
Encountered an end tag : head
Encountered a start tag: body
Encountered a start tag: h1
Encountered some data : Parse me!
Encountered an end tag : h1
Encountered an end tag : body
Encountered an end tag : html
Cairnarvon
asked Feb 18, 2012 at 15:59
  • This seems like a bug: at least the documentation should be changed. Consider filing it: bugs.python.org Commented Feb 18, 2012 at 16:27

1 Answer


This was a bug that has since been fixed (http://bugs.python.org/issue13273). In fact, when you look at http://hg.python.org/cpython/log/9ce5d456138b/Lib/html/parser.py, there are quite a few log messages about problems with strict=False; it almost feels like this mode should still be considered beta.

If you take the most recent version of the file (http://hg.python.org/cpython/raw-file/9ce5d456138b/Lib/html/parser.py) and use that, at least the example from the documentation works again. Still, personally I would be a bit wary of trusting strict=False to work in "critical applications" at the moment.
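
If you want to check whether the html/parser.py you have installed behaves correctly, something like the following rough sketch should do. It records the parser events in a list and compares them with the output shown in the documentation. The names RecordingParser, events_for, DOC and EXPECTED are my own inventions for illustration and are not part of html.parser.

```python
from html.parser import HTMLParser


class RecordingParser(HTMLParser):
    """Hypothetical helper: records parser events instead of printing them."""

    def __init__(self, **kwargs):
        # Keyword arguments are passed straight through to HTMLParser.
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def events_for(markup, **parser_kwargs):
    """Parse markup and return the list of recorded events."""
    parser = RecordingParser(**parser_kwargs)
    parser.feed(markup)
    parser.close()  # flush anything still buffered
    return parser.events


# The example document from the question.
DOC = ('<html><head><title>Test</title></head>'
       '<body><h1>Parse me!</h1></body></html>')

# The event sequence corresponding to the documented output.
EXPECTED = [
    ("start", "html"), ("start", "head"), ("start", "title"),
    ("data", "Test"),
    ("end", "title"), ("end", "head"),
    ("start", "body"), ("start", "h1"),
    ("data", "Parse me!"),
    ("end", "h1"), ("end", "body"), ("end", "html"),
]

if __name__ == "__main__":
    # On Python 3.2, test the lenient mode with events_for(DOC, strict=False).
    events = events_for(DOC)
    print("OK" if events == EXPECTED else "MISMATCH: %r" % (events,))
```

On 3.2 you would call events_for(DOC, strict=False) to exercise the lenient mode. As far as I know, the strict argument was deprecated in 3.3 and removed in 3.5, so on later versions the constructor has to be called without it.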

answered Feb 18, 2012 at 17:43