Message76646
| Author |
mark.dickinson |
| Recipients |
MrJean1, amaury.forgeotdarc, loewis, mark.dickinson |
| Date |
2008-11-30.17:32:51 |
| Message-id |
<1228066372.85.0.95271528221.issue4388@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
I'm now very confused.
In trying to follow things of type wchar_t* around the Python source, I
discovered PyUnicode_FromWideChar in unicodeobject.c. For OS X, the
conversion lands in the following code, where w is the incoming WideChar
array, declared as wchar_t *.
    register Py_UNICODE *u;
    register Py_ssize_t i;
    u = PyUnicode_AS_UNICODE(unicode);
    for (i = size; i > 0; i--)
        *u++ = *w++;
But this looks wrong: on OS X, sizeof(wchar_t) is 4 and I think w is
encoded in UTF-32. So I was expecting to see some kind of explicit
conversion from UTF-32 to UCS-2 here. Instead, it looks as though the
incoming values are implicitly truncated from 32 bits to 16. Doesn't this
do the wrong thing for characters outside the BMP?
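To make the concern concrete, here is a minimal standalone sketch, not taken from the Python source. It assumes a 16-bit destination unit (as in a UCS-2 build) and 32-bit input code points. The names truncate_to_16 and utf32_to_utf16 are hypothetical helpers introduced only for illustration. The first mimics what an implicit narrowing assignment would do; the second is the kind of explicit surrogate-pair encoding I was expecting to find.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mimics the implicit narrowing of a 32-bit value
   stored into a 16-bit unit. The high bits are silently dropped. */
static uint16_t truncate_to_16(uint32_t cp)
{
    return (uint16_t)cp;
}

/* Hypothetical helper: encode one UTF-32 code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written.
   Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate);
   no validation is done in this sketch. */
static size_t utf32_to_utf16(uint32_t cp, uint16_t *out)
{
    if (cp < 0x10000) {
        /* BMP character: fits in a single 16-bit unit. */
        out[0] = (uint16_t)cp;
        return 1;
    }
    /* Outside the BMP: split the 20-bit offset into a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For U+1D11E, for example, truncation yields the single unit 0xD11E, an unrelated BMP code point, while the explicit conversion yields the pair 0xD834 0xDD1E.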
Should I open an issue for this, or am I simply misunderstanding? |
|