

Author jhylton
Date 2005-05-12.02:30:55
Content
The HTML spec describes two ways to encode an attribute
value that contains a URI with an ampersand.
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.2
>>> from HTMLParser import *
>>> class P(HTMLParser):
...     def handle_starttag(self, tag, attrs):
...         print attrs
...
>>> P().feed("<tag attr=\"&amp;\">")
[('attr', '&')]
>>> P().feed("<tag attr=\"&#38;\">")
[('attr', '&#38;')]
It seems that each string should produce the same
parsed value. I would hazard a guess that the easiest
way to make this happen is to extend the current
unescape() to unescape character references. Is there
any reason not to do that? I'll provide a fix if that
sounds like a reasonable answer.
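
To make the proposal concrete, here is a minimal sketch of an unescape step that resolves both named entity references and numeric character references. It is written for Python 3 and is not the HTMLParser implementation: the helper name unescape_attr and the regular expression are assumptions made for illustration, and unknown or out-of-range references are simply left untouched.

```python
import re
from html.entities import name2codepoint

# Matches decimal (&#38;), hexadecimal (&#x26;) and named (&amp;) references.
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_attr(value):
    """Hypothetical helper: decode entity and character references in value."""

    def replace(match):
        ref = match.group(1)
        if ref.startswith("#"):
            body = ref[1:]
            # A leading x or X marks a hexadecimal character reference.
            if body[0] in "xX":
                digits, base = body[1:], 16
            else:
                digits, base = body, 10
            try:
                return chr(int(digits, base))
            except (ValueError, OverflowError):
                # Code point out of range: keep the original text.
                return match.group(0)
        codepoint = name2codepoint.get(ref)
        if codepoint is None:
            # Unknown entity name: keep the original text.
            return match.group(0)
        return chr(codepoint)

    return _REF.sub(replace, value)


print(unescape_attr("&amp;") == unescape_attr("&#38;"))
```

With a step like this applied to attribute values, both spellings of the ampersand would yield the same parsed value. In current Python 3 the standard library's html.unescape() covers the same ground and is likely the better choice in real code.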
History
Date                 User   Action  Args
2008-01-20 09:57:49  admin  link    issue1200313 messages
2008-01-20 09:57:49  admin  create
