[Python-Dev] Re: Draft PEP: Remove wstr from Unicode

22 Jun 2020 15:15:24 -0700

Hi INADA-san,
First of all, thanks for writing down a PEP!
On Thu, 18 Jun 2020 at 11:42, Inada Naoki <[email protected]> wrote:
> To support legacy Unicode objects created by
> ``PyUnicode_FromUnicode(NULL, length)``, many Unicode APIs have a
> ``PyUnicode_READY()`` check.
I don't see PyUnicode_READY() removal in the specification section.
When can we remove these calls and the function itself?
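For context on what these calls cost: a legacy object only has its `wstr` buffer filled at creation, and `PyUnicode_READY()` converts it to the canonical PEP 393 representation on first use, returning -1 on failure. The standalone C sketch below models that pattern with a hypothetical `legacy_str` struct (not CPython's layout; it also skips the choice of a 1-, 2- or 4-byte kind) to show that every accessor must branch on the check.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a legacy string: only `wstr` is filled at
   creation; the canonical `data` array is built lazily on first use. */
typedef struct {
    uint32_t *wstr;     /* legacy buffer, filled in place by the caller */
    size_t length;
    uint32_t *data;     /* canonical form, NULL until "ready" */
} legacy_str;

/* Models a READY check: convert once; returns 0 on success, -1 on error. */
static int legacy_str_ready(legacy_str *s)
{
    if (s->data != NULL)
        return 0;                       /* already ready: cheap path */
    s->data = malloc((s->length ? s->length : 1) * sizeof *s->data);
    if (s->data == NULL)
        return -1;                      /* the conversion itself can fail */
    if (s->length)
        memcpy(s->data, s->wstr, s->length * sizeof *s->data);
    return 0;
}

/* Every accessor has to pay for the check before touching `data`. */
static long legacy_str_char_at(legacy_str *s, size_t i)
{
    if (legacy_str_ready(s) < 0 || i >= s->length)
        return -1;
    return (long)s->data[i];
}
```

Once legacy objects can no longer be created, this branch and its error path are what could disappear from the accessors.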
> Supporting legacy Unicode objects makes the Unicode implementation complex.
> Until we drop legacy Unicode objects, it is very hard to try other Unicode
> implementations, like the UTF-8 based implementation in PyPy.
I'm not sure whether it belongs in the scope of the PEP, but there
are also other C API functions which are too closely tied to the
concrete PEP 393 implementation. For example, I'm not sure that
PyUnicode_MAX_CHAR_VALUE(str) would be relevant or efficient if Python
str is reimplemented to use UTF-8 internally. Should we deprecate it
as well? Do you think that it should be addressed in a separate PEP?
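To make the PyUnicode_MAX_CHAR_VALUE() concern concrete: with PEP 393 the answer is implied by the stored kind, but a UTF-8 based str would have to scan the whole buffer (or cache the result). The standalone C sketch below, with a hypothetical function name and assuming valid UTF-8 input, computes the same buckets that PyUnicode_MAX_CHAR_VALUE() reports (0x7f, 0xff, 0xffff, 0x10ffff).

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the PEP 393-style "max char" bucket for a UTF-8 buffer.
   Assumes the input is valid UTF-8; no error checking is done. */
static uint32_t utf8_max_char_bucket(const unsigned char *s, size_t n)
{
    uint32_t max = 0;
    size_t i = 0;
    while (i < n) {                      /* O(n): every byte is visited */
        uint32_t cp;
        unsigned char c = s[i];
        if (c < 0x80) {                  /* 1-byte sequence */
            cp = c;
            i += 1;
        } else if (c < 0xE0) {           /* 2-byte sequence */
            cp = ((uint32_t)(c & 0x1F) << 6) | (s[i + 1] & 0x3F);
            i += 2;
        } else if (c < 0xF0) {           /* 3-byte sequence */
            cp = ((uint32_t)(c & 0x0F) << 12)
               | ((uint32_t)(s[i + 1] & 0x3F) << 6)
               | (s[i + 2] & 0x3F);
            i += 3;
        } else {                         /* 4-byte sequence */
            cp = ((uint32_t)(c & 0x07) << 18)
               | ((uint32_t)(s[i + 1] & 0x3F) << 12)
               | ((uint32_t)(s[i + 2] & 0x3F) << 6)
               | (s[i + 3] & 0x3F);
            i += 4;
        }
        if (cp > max)
            max = cp;
    }
    if (max < 0x80)
        return 0x7F;                     /* ASCII */
    if (max < 0x100)
        return 0xFF;                     /* 1-byte kind (Latin-1) */
    if (max < 0x10000)
        return 0xFFFF;                   /* 2-byte kind (UCS-2) */
    return 0x10FFFF;                     /* 4-byte kind (UCS-4) */
}
```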
In fact, a large part of the Unicode C API is based on the current
implementation of the Python str type. For example, I'm not sure that
PyUnicode_New(size, max_char) would still make sense if we change the
code to store strings as UTF-8 internally.
In an ideal world, I would prefer to have a "string builder" API, like
the current _PyUnicodeWriter C API, to create a string, and never
allow modifying a string in place.
CPython's str is only "almost" immutable: it can be modified in place
"if the reference count is equal to 1", which has corner cases and can
be misused. But again, I don't think that it should be part of this
PEP :-) Sorry for being off-topic ;-)
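To illustrate why "reference count is equal to 1" alone is not enough, here is a standalone C sketch with a hypothetical `mock_str` struct. The guards are modeled loosely on the ones CPython's internal `unicode_modifiable()` helper applies (sole reference, no cached hash, not interned, exact str type); each one closes a corner case where an in-place change would be observable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a str object; not CPython's layout. */
typedef struct {
    size_t refcnt;
    long hash;          /* -1 means "not computed yet" */
    bool interned;
    bool exact_type;    /* false for instances of a str subclass */
} mock_str;

/* In-place modification is only acceptable if nobody can observe it. */
static bool mock_str_modifiable(const mock_str *s)
{
    if (s->refcnt != 1)
        return false;   /* another reference would see the change */
    if (s->hash != -1)
        return false;   /* a cached hash would become stale */
    if (s->interned)
        return false;   /* interned strings are shared via the intern table */
    if (!s->exact_type)
        return false;   /* only exact str instances qualify */
    return true;
}
```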
> Specification
> =============
>
> Affected APIs
> --------------
>
> From the Unicode implementation, ``wstr`` and ``wstr_length`` members are
> removed.
>
> Macros and functions to be removed:
>
> * PyUnicode_GET_SIZE
> * PyUnicode_GET_DATA_SIZE
> * Py_UNICODE_WSTR_LENGTH
> * PyUnicode_AS_UNICODE
> * PyUnicode_AS_DATA
> * PyUnicode_AsUnicode
> * PyUnicode_AsUnicodeAndSize
Which ones are already deprecated?
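One migration detail for the size macros in this list: PyUnicode_GET_SIZE() reports the length of `wstr` in `wchar_t` units, whereas PyUnicode_GET_LENGTH() reports code points. Where `wchar_t` is 16 bits (Windows), a character above U+FFFF takes two units, so the two can differ. A standalone C sketch of the wstr-style count (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Number of wchar_t units needed to store n code points, i.e. what a
   wstr-based size would report; n itself is the code point length. */
static size_t wstr_units(const uint32_t *cps, size_t n, int wchar_bits)
{
    size_t units = 0;
    for (size_t i = 0; i < n; i++) {
        /* With a 16-bit wchar_t, characters above U+FFFF need a surrogate pair. */
        units += (wchar_bits == 16 && cps[i] > 0xFFFF) ? 2 : 1;
    }
    return units;
}
```

For the text "a€😀" (3 code points) this gives 4 units with a 16-bit `wchar_t` and 3 with a 32-bit one.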
> Behaviors to be removed:
>
> * PyUnicode_FromUnicode -- ``PyUnicode_FromUnicode(NULL, size)`` where
> ``size > 0`` causes RuntimeError instead of creating a legacy Unicode
> object. While this API is deprecated by PEP 393, this API will be kept
> when ``wstr`` is removed. This API will be removed later.
I'm not sure that it's relevant to keep PyUnicode_FromUnicode()
when PyUnicode_FromWideChar() has a cleaner API (it uses wchar_t*,
not Py_UNICODE*). I also suggest disallowing
PyUnicode_FromUnicode(NULL, 0).
By the way, when can we finally remove the Py_UNICODE type?
I would prefer to remove Py_UNICODE and PyUnicode_FromUnicode().
> * PyUnicode_FromStringAndSize -- Like PyUnicode_FromUnicode,
> ``PyUnicode_FromStringAndSize(NULL, size)`` causes RuntimeError
> instead of creating a legacy Unicode object.
> All APIs to be changed should raise DeprecationWarning for the behavior to be
> removed. Note that ``PyUnicode_FromUnicode`` has both a compiler deprecation
> warning and a runtime DeprecationWarning [3]_, [4]_.
Will every function scheduled for removal emit a DeprecationWarning?
Even PyUnicode_GET_SIZE()? I'm not sure that C extensions are prepared
for PyUnicode_GET_SIZE() raising an exception when Python runs with
-Werror.
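A standalone C sketch of the -Werror concern, using hypothetical names (`warnings_are_errors`, `get_size_or_error()`, `legacy_buffer_bytes()`): once a size accessor can emit a warning, turning warnings into errors makes it fail and return -1, and a caller written when the macro could never fail uses that value unchecked.

```c
#include <stdbool.h>

/* Hypothetical global standing in for "python -W error". */
static bool warnings_are_errors = false;

/* Models a deprecated size accessor that now warns: when warnings are
   errors, the warning becomes an exception and -1 is returned. */
static long get_size_or_error(long real_size)
{
    if (warnings_are_errors)
        return -1;              /* exception set; the caller must check */
    return real_size;
}

/* Legacy caller written when the accessor could never fail: the result
   is used directly, so -1 silently flows into later arithmetic. */
static long legacy_buffer_bytes(long real_size)
{
    return get_size_or_error(real_size) * 4;
}
```

With warnings as errors, `legacy_buffer_bytes(10)` yields -4 instead of surfacing the exception.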
> All deprecations will be implemented in Python 3.10.
> Some deprecations will be backported in Python 3.9.
>
> Actual removal will happen in Python 3.12.
Many functions have been declared with Py_DEPRECATED() for a long
time. Would it make sense to remove these functions earlier?
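For extension authors, the draft schedule maps onto compile-time checks against `PY_VERSION_HEX`, which packs a version as `major << 24 | minor << 16 | micro << 8 | level << 4 | serial` (so 3.10.0 final is 0x030A00F0). The standalone sketch below classifies a version under the draft timeline only (deprecated in 3.10, removed in 3.12; the partial 3.9 backport is ignored). The enum and function name are hypothetical, and the schedule may still change.

```c
/* Status of the wstr-based APIs under the draft PEP schedule. */
typedef enum { WSTR_AVAILABLE, WSTR_DEPRECATED, WSTR_REMOVED } wstr_status;

/* hexversion uses the PY_VERSION_HEX layout. */
static wstr_status wstr_api_status(unsigned long hexversion)
{
    if (hexversion >= 0x030C0000UL)      /* 3.12 and later, including alphas */
        return WSTR_REMOVED;
    if (hexversion >= 0x030A0000UL)      /* 3.10 and 3.11 */
        return WSTR_DEPRECATED;
    return WSTR_AVAILABLE;
}
```

In a real extension the same threshold would typically appear as `#if PY_VERSION_HEX < 0x030C0000` around code that still uses `PyUnicode_AS_UNICODE()`.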
Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- [email protected]
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/M7JFI5TWLM7KOYVSBFFTPQS5HHO4DF2M/
Code of Conduct: http://python.org/psf/codeofconduct/
