Message60736
| Author | jhylton |
| Date | 2005-05-12.02:30:55 |
| Content |
The HTML spec describes two ways to encode an ampersand in
an attribute value that contains a URI: the entity reference
&amp; and the numeric character reference &#38;.
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.2
>>> from HTMLParser import *
>>> class P(HTMLParser):
...     def handle_starttag(self, tag, attrs):
...         print attrs
...
>>> P().feed("<tag attr=\"&amp;\">")
[('attr', '&')]
>>> P().feed("<tag attr=\"&#38;\">")
[('attr', '&#38;')]
It seems that each string should produce the same
parsed value. I would hazard a guess that the easiest
way to make this happen is to extend the current
unescape() to unescape character references. Is there
any reason not to do that? I'll provide a fix if that
sounds like a reasonable answer.
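
A minimal sketch of the kind of unescaping proposed above, written for Python 3. It is not HTMLParser's actual unescape(); the helper name unescape_attr, the NAMED table, and the decision to leave unknown references untouched are all assumptions made for illustration.

```python
import re

# Hypothetical table of named entities; a real implementation would
# cover the full HTML entity set rather than these five.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Matches decimal references (&#38;), hex references (&#x26;),
# and named entity references (&amp;).
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_attr(value):
    """Resolve entity and character references in one pass (sketch only)."""

    def replace(match):
        ref = match.group(1)
        if ref.startswith("#"):
            digits = ref[1:]
            try:
                if digits[0] in "xX":
                    code = int(digits[1:], 16)
                else:
                    code = int(digits)
                return chr(code)
            except (ValueError, OverflowError):
                # Out-of-range code point: keep the original text.
                return match.group(0)
        # Unknown named entities are left as written.
        return NAMED.get(ref, match.group(0))

    return _REF.sub(replace, value)


print(unescape_attr("&amp;"))   # &
print(unescape_attr("&#38;"))   # &
```

Because the substitution runs in a single pass, text produced by one replacement is not unescaped a second time, so "&amp;#38;" becomes "&#38;" rather than "&".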
History
| Date | User | Action | Args |
|---|---|---|---|
| 2008-01-20 09:57:49 | admin | link | issue1200313 messages |
| 2008-01-20 09:57:49 | admin | create | |