Message 148615 - Python tracker



Author	loewis
Recipients	Brian.Jones, eric.araujo, eric.smith, ezio.melotti, hp.dekoning, loewis
Date	2011-11-29 21:24:21
Message-id	<4ED54D84.8050908@v.loewis.de>
In-reply-to	<1322556184.55.0.779951108031.issue11113@psf.upfronthosting.co.za>


Content

> 1) the current approach of having a dict with name -> intvalue doesn't work anymore, and a name -> valuelist should be used instead;
> 2) the reverse dict for this would have to use tuples as keys, but I'm not sure how useful that would be (producing entities is not a common case, especially "unusual" ones like these).
> 3) The name -> char dict might still be useful, and can easily become a name -> str dict in order to deal with the multichar entities;
> 
> Since 1) is not backward-compatible the HTML5 entities should probably go in a separate dict.

+1 for a separate dict; -1 for a value list. The right value type is
'str'; name2codepoint ought to be deprecated (it's a leftover from
when the str type wasn't Unicode in 2.x).

As for the reverse mapping: I'd add a dictionary that is the reverse of
entitydefs (i.e. with str keys). That some keys then have two characters
is no real issue: applications that want to use this dictionary can
either ignore them, or follow the approach of always checking for
Unicode combining characters - I'd expect that all "second" characters
are indeed combining.
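
The expectation about "second" characters can be checked rather than assumed. The following is only a sketch: it assumes Python 3.3 or later, where html.entities.html5 maps names to str values, and it treats "combining" as "Unicode general category M*", which is my reading and not something stated above. The helper names are hypothetical.

```python
import unicodedata
from html.entities import html5  # name -> str mapping, available in Python 3.3+


def is_mark(ch):
    """True if ch is in a Unicode mark category (Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def trailing_chars_are_marks(value):
    """True if value has several characters and all but the first are marks."""
    return len(value) > 1 and all(is_mark(ch) for ch in value[1:])


# Entities whose replacement text is longer than one character.
multi = {name: value for name, value in html5.items() if len(value) > 1}

# Names whose trailing characters are NOT all marks; these would need
# special handling by an application relying on the combining-character check.
exceptions = sorted(
    name for name, value in multi.items() if not trailing_chars_are_marks(value)
)

print(len(multi), "multi-character values;",
      len(exceptions), "with a non-mark trailing character")
```

Printing the exceptions list shows which entities, if any, do not fit the combining-character approach.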

OTOH, it's easy enough to create an inverted dictionary yourself
when you need it, and not every three-line function needs to be
in the standard library. It might actually be more useful to compile
the values into a regular expression which you can then use to
find out which characters can be escaped using entity references.
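
Both ideas from the last paragraph fit in a few lines. This is a minimal sketch, not a proposed API: it assumes html.entities.html5 (Python 3.3+) as the name -> str table, and invert_entities and build_escapable_regex are hypothetical names. When several names map to the same string, the sketch arbitrarily keeps the first one in sorted order.

```python
import re
from html.entities import html5  # name -> str mapping, available in Python 3.3+


def invert_entities(table):
    """Build a str -> name dict from a name -> str table."""
    inverse = {}
    for name in sorted(table):
        if not name.endswith(";"):
            continue  # skip legacy names that lack the trailing semicolon
        # Keep the first name seen for each string (arbitrary choice).
        inverse.setdefault(table[name], name)
    return inverse


def build_escapable_regex(inverse):
    """Compile a pattern matching any string that has an entity name."""
    # Longest alternatives first, so two-character values win over
    # the single characters they start with.
    alternatives = sorted(inverse, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in alternatives))


CHAR_TO_NAME = invert_entities(html5)
ESCAPABLE = build_escapable_regex(CHAR_TO_NAME)

# List each escapable substring of a sample text with a matching reference.
for match in ESCAPABLE.finditer("caf\u00e9 < bar"):
    print(repr(match.group(0)), "->", "&" + CHAR_TO_NAME[match.group(0)])
```

Note that the full table also has names for ordinary punctuation and whitespace, so an application that actually escapes text would probably want to restrict the table first.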

History
Date	User	Action	Args
2011-11-29 21:24:22	loewis	set	recipients: + loewis, eric.smith, ezio.melotti, eric.araujo, Brian.Jones, hp.dekoning
2011-11-29 21:24:22	loewis	link	issue11113 messages
2011-11-29 21:24:21	loewis	create
