This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: entity unescape for sgml/htmllib
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4

process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: expose html.parser.unescape (issue 2927)
Assigned To: ezio.melotti
Nosy List: BreamoreBoy, ezio.melotti, fdrake, glchapman
Priority: normal
Keywords: easy

Created on 2002-02-06 17:55 by glchapman, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Messages (4)
msg61076 - Author: Greg Chapman (glchapman) Date: 2002-02-06 17:55
The parsers defined in htmllib and sgmllib do not 
provide any facilities for unescaping a tag attribute 
which has an embedded html entityref (i.e., they do 
not provide a way to convert "a&amp;b" to "a&b"). The 
parser in HTMLParser unescapes all tag attributes 
automatically. I'm not sure that's the right approach 
for sgmllib and htmllib (since it might break existing 
code), but it seems to me that one of the modules 
ought to provide a function or method which can do the 
unescaping if needed. (I'm not familiar with either 
the SGML or the HTML specification, but I assume one 
of them mandates the escaping of '&' (e.g.) in tag 
attributes. If so, then it seems appropriate for one 
of the modules to provide a function which undoes the 
mandated transformation.)
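
For readers on a current Python, the operation requested above can be sketched as follows. This assumes Python 3.4 or later, where the standard library provides html.unescape; that function did not exist when this issue was filed, and the variable names here are only illustrative.

```python
import html

# Raw attribute value as it would appear in the HTML source.
raw_attribute = "a&amp;b"

# html.unescape converts named and numeric character references to text.
decoded = html.unescape(raw_attribute)
print(decoded)
```

The same call also handles numeric references such as "&#65;".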
msg61077 - Author: Fred Drake (fdrake) (Python committer) Date: 2006-06-22 03:57
This request is making me reconsider some other changes that
have already been made on the trunk (and are now in 2.5b1).
Reading this, I thought "Doesn't it already do that?" Turns
out that in Python 2.4, it doesn't. Both versions handle
this in parsed character data; the difference is confined to
attribute values.
I'd like to propose adding a Boolean configuration attribute
on the parser instance that, when set, causes the parser to
decode entity and character references. By default, it
would be unset. This would support backward compatibility
and make it easier to get attribute value decoding.
Another possibility would be to revert the new feature and
add a separate method to perform the decoding.
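
The two options in this message can be illustrated with a rough sketch. Everything here is hypothetical: the class AttributeDecodingSketch, the attribute decode_references, and the methods handle_attributes and unescape are invented for illustration and are not part of sgmllib or htmllib. The sketch also assumes Python 3.4+ so that html.unescape can do the actual decoding.

```python
import html


class AttributeDecodingSketch:
    """Hypothetical stand-in for a parser; not sgmllib's real API."""

    # Unset by default, so existing callers keep seeing raw values.
    decode_references = False

    def __init__(self):
        self.seen = []

    def handle_attributes(self, attrs):
        # attrs is a list of (name, raw_value) pairs as found in the source.
        if self.decode_references:
            attrs = [(name, self.unescape(value)) for name, value in attrs]
        self.seen.append(attrs)

    @staticmethod
    def unescape(value):
        # The "separate method" alternative: callers decode on demand.
        return html.unescape(value)


parser = AttributeDecodingSketch()
parser.handle_attributes([("href", "a&amp;b")])  # flag unset: value stays raw
parser.decode_references = True
parser.handle_attributes([("href", "a&amp;b")])  # flag set: value is decoded
```

With the flag unset, existing code sees exactly what it saw before; the static method covers the alternative where decoding is left to the caller.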
msg114175 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-17 21:41
Is anyone aware whether this was implemented in 2.5 or later, as hinted at in msg61077? If yes, please close this. If not, is there any point in putting this into 3.2?
msg185129 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-03-24 11:33
See also #2927.
History
Date | User | Action | Args
2022-04-10 16:04:57 | admin | set | github: 36039
2013-11-18 09:54:25 | ezio.melotti | set | status: open -> closed; assignee: ezio.melotti; superseder: expose html.parser.unescape; resolution: duplicate; stage: test needed -> resolved
2013-03-24 11:33:06 | ezio.melotti | set | messages: + msg185129; versions: + Python 3.4, - Python 3.2
2013-03-23 22:22:01 | ezio.melotti | set | nosy: + ezio.melotti
2010-08-17 21:41:06 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg114175; versions: + Python 3.2, - Python 2.7
2009-02-12 20:03:12 | ajaksu2 | set | keywords: + easy; stage: test needed; versions: + Python 2.7
2002-02-06 17:55:02 | glchapman | create |
