Message 60736 - Python tracker



Author	jhylton
Date	2005-05-12 02:30:55

Content

The HTML spec describes two ways to encode an attribute
value that contains a URI with an ampersand.
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.2
>>> from HTMLParser import *
>>> class P(HTMLParser):
...     def handle_starttag(self, tag, attrs):
...         print attrs
...
>>> P().feed("<tag attr=\"&amp;\">")
[('attr', '&')]
>>> P().feed("<tag attr=\"&#38;\">")
[('attr', '&#38;')]
It seems that each string should produce the same
parsed value. I would hazard a guess that the easiest
way to make this happen is to extend the current
unescape() to unescape character references. Is there
any reason not to do that? I'll provide a fix if that
sounds like a reasonable answer.
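
For illustration, here is a minimal sketch of the kind of extended unescape() the message proposes, written for Python 3 rather than the Python 2 module used in the session above. The name `unescape_refs` and its regular expression are assumptions made for this example; this is not the patch that was applied to HTMLParser.

```python
import re
from html.entities import name2codepoint

# Matches decimal refs (&#38;), hex refs (&#x26;) and named entities (&amp;).
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_refs(value):
    """Replace entity and character references in an attribute value.

    Hypothetical helper: unknown or out-of-range references are left as-is.
    """
    def replace(match):
        ref = match.group(1)
        try:
            if ref.startswith("#"):
                if ref[1] in "xX":
                    return chr(int(ref[2:], 16))   # hexadecimal character reference
                return chr(int(ref[1:]))           # decimal character reference
            return chr(name2codepoint[ref])        # named entity such as "amp"
        except (KeyError, ValueError, OverflowError):
            return match.group(0)                  # leave unrecognized text alone

    # A single pass avoids double-unescaping input such as "&amp;#38;".
    return _REF.sub(replace, value)


if __name__ == "__main__":
    print(unescape_refs("http://host/?x=1&amp;y=2"))
    print(unescape_refs("http://host/?x=1&#38;y=2"))
```

With a helper along these lines, both spellings of the ampersand decode to the same attribute value, which is the behavior the message asks for.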

History
Date	User	Action	Args
2008-01-20 09:57:49	admin	link	issue1200313 messages
2008-01-20 09:57:49	admin	create
