Putting Unicode characters in JSON

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Mar 23 06:29:24 EDT 2018


On Fri, 23 Mar 2018 18:35:20 +1100, Chris Angelico wrote:
> That doesn't seem to be a strictly-correct Latin-1 decoder, then. There
> are a number of unassigned byte values in ISO-8859-1.

That's incorrect, but I don't blame you for getting it wrong. Who thought 
that it was a good idea to distinguish between "ISO 8859-1" and 
"ISO-8859-1" as two related but distinct encodings?

https://en.wikipedia.org/wiki/ISO/IEC_8859-1

The old ISO 8859-1 standard, the one with undefined values, is mostly of 
historical interest. For the last twenty years or so, anyone talking 
about either Latin-1 or ISO-8859-1 (with or without dashes) is almost 
certainly meaning the 1992 IANA superset version, which defines all 256 
characters:

 "In 1992, the IANA registered the character map ISO_8859-1:1987,
 more commonly known by its preferred MIME name of ISO-8859-1
 (note the extra hyphen over ISO 8859-1), a superset of ISO 
 8859-1, for use on the Internet. This map assigns the C0 and C1
 control characters to the unassigned code values thus provides
 for 256 characters via every possible 8-bit value."

Either that, or they actually mean Windows-1252, but let's not go there.
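
For anyone who wants to check this from the interpreter, here is a minimal sketch, assuming CPython 3 and its built-in "latin-1" and "cp1252" codecs. The variable names are only illustrative. It decodes every possible byte value as Latin-1, then collects the byte values that the Windows-1252 codec refuses to decode.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Python's "latin-1" codec maps byte N straight to code point N, so all
# 256 values decode, including the C0 and C1 control ranges.
text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in text]
print(code_points == list(range(256)))

# The mapping is reversible: encoding gives back the original bytes.
print(text.encode("latin-1") == all_bytes)

# Windows-1252 is a different beast: collect the byte values that
# Python's "cp1252" codec rejects.
rejected = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        rejected.append(hex(value))
print(rejected)
```

Other decoders may treat the Windows-1252 gaps differently, so the last result says something about Python's codec only.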
-- 
Steve

