This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Use PyUnicode_AsWideCharString() instead of PyUnicode_AsUnicode()
Type: enhancement Stage:
Components: Versions: Python 3.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: loewis, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2014-09-01 22:39 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
wchar.patch vstinner, 2014-09-01 22:39
wchar_posixmodule.patch vstinner, 2014-09-01 22:53
Messages (8)
msg226247 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-01 22:39
I would like to deprecate PyUnicode_AsUnicode(); see issue #22271 for the rationale (hint: memory footprint).
To deprecate PyUnicode_AsUnicode(), we should stop using it internally.
The attached patch is a work-in-progress patch, untested on Windows (only tested on Linux). It gives an idea of how many files should be modified.
TODO:
* Modify posixmodule.c: I don't understand how the Argument Clinic generates the call to PyUnicode_AsUnicode() when the parameter type is declared as "unicode". What is the "unicode" type? Where is the code generating the call to PyUnicode_AsUnicode()?
* Modify a few other files.
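For readers unfamiliar with the two functions, here is a minimal sketch of the calling convention that PyUnicode_AsWideCharString() implies: the caller receives a freshly allocated wchar_t buffer and must release it with PyMem_Free(), whereas PyUnicode_AsUnicode() returned a pointer owned by the string object. The sketch drives the C API through ctypes so it can be run as-is; the helper name to_wide_copy is hypothetical and is not taken from the attached patches.

```python
import ctypes

api = ctypes.pythonapi  # PyDLL: calls are made with the GIL held

# wchar_t *PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
api.PyUnicode_AsWideCharString.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]
api.PyUnicode_AsWideCharString.restype = ctypes.c_void_p

# void PyMem_Free(void *p)
api.PyMem_Free.argtypes = [ctypes.c_void_p]
api.PyMem_Free.restype = None


def to_wide_copy(text: str) -> str:
    """Hypothetical helper: convert to wchar_t*, read it back, free it."""
    size = ctypes.c_ssize_t()
    buf = api.PyUnicode_AsWideCharString(text, ctypes.byref(size))
    if not buf:
        # Defensive check; ctypes normally raises the pending exception itself.
        raise MemoryError("PyUnicode_AsWideCharString() failed")
    try:
        # size.value is the number of wchar_t units, excluding the trailing NUL.
        return ctypes.wstring_at(buf, size.value)
    finally:
        # The buffer belongs to the caller, so nothing stays attached
        # to the str object after this call.
        api.PyMem_Free(buf)


print(to_wide_copy("caf\u00e9 \u20ac"))
```

Every call pays for one allocation and one copy, which is the cost discussed in the later messages.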
msg226248 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-01 22:41
Oh, I didn't generate wchar.patch correctly: please ignore changes in the unicodeobject.c files. These changes are part of issues #22271 and #22323.
msg226250 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-01 22:53
wchar_posixmodule.patch: patch for posixmodule.c. Sorry, the code calling PyUnicode_AsUnicode() was in fact not generated by Argument Clinic.
msg226295 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-03 06:44
Won't this cause a performance regression? When we work heavily with the wchar_t-based API, it seems good to cache the encoded value.
msg226298 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-03 07:10
> Won't this cause a performance regression? When we work heavily with the wchar_t-based API, it seems good to cache the encoded value.
Yes, it will be slower. But I prefer slower code with a lower memory footprint. On UNIX, I don't think that anyone will notice the difference.
My concern is that the cache is never released. If the conversion is only needed once at startup, the memory will stay until Python exits. It's not really efficient.
On Windows, conversion to wchar_t* is common because Python uses the Windows wide character API ("W" API vs "A" ANSI code page API). For example, most filesystem accesses use the wchar_t* type.
Before Python 3.3, Python was compiled in narrow mode and so Unicode strings already used wchar_t* internally to store characters. Since Python 3.3, Python uses a more compact representation. The wchar_t* buffer shares the Unicode data only if sizeof(wchar_t) == KIND, where KIND is 1, 2 or 4 bytes per character. Examples: "\u20ac" on Windows (16-bit wchar_t) or "\U0010ffff" on Linux (32-bit wchar_t).
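The sharing condition above can be checked from Python. The following sketch assumes CPython 3.3 or later, where sys.getsizeof() reflects the compact string storage; it estimates KIND for a few sample characters and compares it with the platform's sizeof(wchar_t). The helper name bytes_per_char is hypothetical.

```python
import ctypes
import sys


def bytes_per_char(ch: str) -> int:
    """Estimate the storage width (KIND) CPython uses for strings made of ch.

    Comparing two lengths cancels out the fixed object header.
    Assumes CPython's compact string representation (3.3+).
    """
    n = 1000
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


wchar_size = ctypes.sizeof(ctypes.c_wchar)  # typically 2 on Windows, 4 on Linux

for sample in ("a", "\u20ac", "\U0010ffff"):
    kind = bytes_per_char(sample)
    # When kind == wchar_size the wchar_t* view can alias the string data;
    # otherwise a separate converted buffer has to be allocated.
    print(f"U+{ord(sample):06X}: kind={kind}, shares data: {kind == wchar_size}")
```

Strings whose KIND differs from sizeof(wchar_t), for example pure ASCII text on any platform, are the ones for which a cached conversion costs extra memory.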
msg226517 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-06 20:46
The cache is released when the string is released. While the string exists, its wchar_t representation can be needed again. That is what the cache exists for.
msg228586 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-10-05 15:57
> The cache is released when the string is released. While the string exists, its wchar_t representation can be needed again. That is what the cache exists for.
I know. But I don't want to waste memory for this cache. I want to stop using it. IMO the performance overhead will be null.
In which use case do you think that the overhead of not using the cache would be important enough?
msg228588 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-05 16:24
> In which use case do you think that the overhead of not using the cache would be important enough?
I suppose in the same use cases where the memory overhead of using the cache is important enough.
We need measurements of the performance and memory consumption effects of these changes across a wide range of programs.
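As a rough illustration of the trade-off under discussion, the sketch below models a cached conversion against a fresh conversion on every call. It is a toy model, not CPython's implementation: CachedText, wide_uncached, extra_memory and compare are hypothetical names, and encoding to UTF-32-LE only stands in for a conversion to a 32-bit wchar_t. It produces no figures that apply to the real patches; it only shows what a measurement would need to compare.

```python
import sys
import timeit


class CachedText:
    """Toy model: the wide copy stays alive as long as the object does."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._wide = None

    def wide(self) -> bytes:
        if self._wide is None:
            # Converted once, then retained until the object is released.
            self._wide = self.text.encode("utf-32-le")
        return self._wide


def wide_uncached(text: str) -> bytes:
    """Toy model: convert on every call, retain nothing."""
    return text.encode("utf-32-le")


def extra_memory(obj: CachedText) -> int:
    """Bytes held only because of the cache (0 until first use)."""
    return 0 if obj._wide is None else sys.getsizeof(obj._wide)


def compare(text: str, repeats: int = 1000):
    """Return (cached time, uncached time, bytes retained by the cache)."""
    cached = CachedText(text)
    t_cached = timeit.timeit(cached.wide, number=repeats)
    t_fresh = timeit.timeit(lambda: wide_uncached(text), number=repeats)
    return t_cached, t_fresh, extra_memory(cached)


t_cached, t_fresh, retained = compare("some/file/path.txt")
print(f"cached: {t_cached:.6f}s  uncached: {t_fresh:.6f}s  retained: {retained} bytes")
```

Which side of the trade-off matters depends on how often the same string is converted and how long it lives, which is why measurements on real programs were requested.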
History
Date User Action Args
2022-04-11 14:58:07 admin set github: 66520
2015-10-02 21:06:01 vstinner set status: open -> closed; resolution: wont fix
2014-10-05 16:24:47 serhiy.storchaka set messages: + msg228588
2014-10-05 15:57:04 vstinner set messages: + msg228586
2014-09-06 20:46:13 serhiy.storchaka set messages: + msg226517
2014-09-03 07:10:36 vstinner set messages: + msg226298
2014-09-03 06:44:07 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg226295
2014-09-01 22:53:41 vstinner set files: + wchar_posixmodule.patch; messages: + msg226250
2014-09-01 22:41:29 vstinner set messages: + msg226248
2014-09-01 22:39:50 vstinner create