Message 76646 - Python tracker

In-reply-to
Author	mark.dickinson
Recipients	MrJean1, amaury.forgeotdarc, loewis, mark.dickinson
Date	2008-11-30.17:32:51
SpamBayes Score	6.209612e-05
Marked as misclassified	No
Message-id	<1228066372.85.0.95271528221.issue4388@psf.upfronthosting.co.za>

Content

I'm now very confused.
In trying to follow things of type wchar_t* around the Python source, I
discovered PyUnicode_FromWideChar in unicodeobject.c. For OS X, the
conversion lands in the following code, where w is the incoming WideChar
array, declared as wchar_t *.
	register Py_UNICODE *u;
	register Py_ssize_t i;
	u = PyUnicode_AS_UNICODE(unicode);
	for (i = size; i > 0; i--)
		*u++ = *w++;
But this looks wrong: on OS X, sizeof(wchar_t) is 4 and I think w is 
encoded in UTF-32. So I was expecting to see some kind of explicit 
conversion from UTF-32 to UCS-2 here. Instead, it looks as though the 
incoming values are implicitly truncated from 32 bits to 16. Doesn't this 
do the wrong thing for characters outside the BMP?
Should I open an issue for this, or am I simply misunderstanding?
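
To make the concern concrete, here is a minimal sketch of the two behaviours, not the actual CPython code. It assumes a 32-bit wide character holding a UTF-32 code point and a 16-bit Py_UNICODE; the names unit16, copy_truncating and copy_with_surrogates are hypothetical, and fixed-width integer types stand in for wchar_t and Py_UNICODE so the result does not depend on the platform. The second function encodes surrogate pairs, which is strictly UTF-16 rather than UCS-2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit Py_UNICODE. */
typedef uint16_t unit16;

/* Mirrors the shape of the quoted loop: each 32-bit value is assigned to a
   16-bit slot, so the upper 16 bits are silently discarded. */
static void copy_truncating(const uint32_t *w, unit16 *u, size_t size)
{
    size_t i;
    assert(w != NULL && u != NULL);
    for (i = 0; i < size; i++)
        u[i] = (unit16)w[i];
}

/* Explicit conversion: code points above U+FFFF become a surrogate pair.
   The output buffer must hold up to 2 * size units. Returns the number of
   16-bit units written. No validation of values above U+10FFFF is done. */
static size_t copy_with_surrogates(const uint32_t *w, unit16 *u, size_t size)
{
    size_t i, n = 0;
    assert(w != NULL && u != NULL);
    for (i = 0; i < size; i++) {
        uint32_t ch = w[i];
        if (ch > 0xFFFF) {
            ch -= 0x10000;
            u[n++] = (unit16)(0xD800 | (ch >> 10));   /* high surrogate */
            u[n++] = (unit16)(0xDC00 | (ch & 0x3FF)); /* low surrogate */
        } else {
            u[n++] = (unit16)ch;
        }
    }
    return n;
}
```

With U+1D11E as input, the truncating copy stores the single unit 0xD11E, an unrelated BMP character, while the explicit conversion stores the pair 0xD834 0xDD1E.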

History
Date	User	Action	Args
2008-11-30 17:32:53	mark.dickinson	set	recipients: + mark.dickinson, loewis, amaury.forgeotdarc, MrJean1
2008-11-30 17:32:52	mark.dickinson	set	messageid: <1228066372.85.0.95271528221.issue4388@psf.upfronthosting.co.za>
2008-11-30 17:32:51	mark.dickinson	link	issue4388 messages
2008-11-30 17:32:51	mark.dickinson	create