[Python-checkins] peps: Introduce union.
martin.v.loewis
python-checkins at python.org
Sun Aug 28 21:44:27 CEST 2011
http://hg.python.org/peps/rev/3953e5bcb9d9
changeset: 3935:3953e5bcb9d9
user: Martin v. Löwis <martin at v.loewis.de>
date: Sun Aug 28 20:51:49 2011 +0200
summary:
Introduce union.
files:
pep-0393.txt | 27 ++++++++++++++++-----------
1 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/pep-0393.txt b/pep-0393.txt
--- a/pep-0393.txt
+++ b/pep-0393.txt
@@ -57,7 +57,12 @@
   typedef struct {
     PyObject_HEAD
     Py_ssize_t length;
-    void *str;
+    union {
+        void *any;
+        Py_UCS1 *latin1;
+        Py_UCS2 *ucs2;
+        Py_UCS4 *ucs4;
+    } data;
     Py_hash_t hash;
     int state;
     Py_ssize_t utf8_length;
@@ -69,7 +74,7 @@
These fields have the following interpretations:
- length: number of code points in the string (result of sq_length)
-- str: shortest-form representation of the unicode string.
+- data: shortest-form representation of the unicode string.
The string is null-terminated (in its respective representation).
- hash: same as in Python 3.2
- state:
@@ -77,7 +82,7 @@
   * lowest 2 bits (mask 0x03) - interned-state (SSTATE_*) as in 3.2
   * next 2 bits (mask 0x0C) - form of str:
-    + 00 => reserved
+    + 00 => str is not initialized (data are in wstr)
     + 01 => 1 byte (Latin-1)
     + 10 => 2 byte (UCS-2)
     + 11 => 4 byte (UCS-4);
@@ -89,7 +94,7 @@
(null-terminated). If wchar_t is 16-bit, this form may use surrogate
pairs (in which case wstr_length differs from length).
-All three representations are optional, although the str form is
+All three representations are optional, although the data form is
considered the canonical representation which can be absent only
while the string is being created. If the representation is absent,
the pointer is NULL, and the corresponding length field may contain
@@ -99,8 +104,8 @@
defined as a typedef for wchar_t, so the wstr representation can double
as Py_UNICODE representation.
-The str and utf8 pointers point to the same memory if the string uses
-only ASCII characters (using only Latin-1 is not sufficient). The str
+The data and utf8 pointers point to the same memory if the string uses
+only ASCII characters (using only Latin-1 is not sufficient). The data
and wstr pointers point to the same memory if the string happens to
fit exactly to the wchar_t type of the platform (i.e. uses some
BMP-not-Latin-1 characters if sizeof(wchar_t) is 2, and uses some
@@ -129,14 +134,14 @@
representation is not yet set for the string.
PyUnicode_FromUnicode remains supported but is deprecated. If the
-Py_UNICODE pointer is non-null, the str representation is set. If the
+Py_UNICODE pointer is non-null, the data representation is set. If the
pointer is NULL, a properly-sized wstr representation is allocated,
which can be modified until PyUnicode_Ready() is called (explicitly
or implicitly). Resizing a Unicode string remains possible until it
is finalized.
PyUnicode_Ready() converts a string containing only a wstr
-representation into the canonical representation. Unless wstr and str
+representation into the canonical representation. Unless wstr and data
can share the memory, the wstr representation is discarded after the
conversion. PyUnicode_FAST_READY() is a wrapper that avoids the
function call if the string is already ready. Both APIs return 0
@@ -219,7 +224,7 @@
The Py_UNICODE representation is not instantaneously available,
slowing down applications that request it. While this is also true,
applications that care about this problem can be rewritten to use the
-str representation.
+data representation.
The question was raised whether the wchar_t representation is
discouraged, or scheduled for removal. This is not the intent of this
@@ -234,8 +239,8 @@
expectation is that applications that have many large strings will see
a reduction in memory usage. For small strings, the effects depend on
the pointer size of the system, and the size of the Py_UNICODE/wchar_t
-type. The following table demonstrates this for various small string
-sizes and platforms.
+type. The following table demonstrates this for various small ASCII
+string sizes and platforms.
+-------+---------------------------------+----------------+
|string | Python 3.2 | This PEP |
--
Repository URL: http://hg.python.org/peps
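
The changeset above replaces the untyped `void *str` field with a union of typed pointers, selected by the two "form" bits (mask 0x0C) of `state`. The following is only a minimal sketch of how such a union might be read, not the PEP's actual implementation: the `sketch_*` names are hypothetical, the fixed-width integer typedefs stand in for `Py_UCS1`/`Py_UCS2`/`Py_UCS4`, and the struct omits `PyObject_HEAD` and the hash, utf8 and wstr fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PEP's Py_UCS1, Py_UCS2 and Py_UCS4. */
typedef uint8_t  sketch_ucs1;
typedef uint16_t sketch_ucs2;
typedef uint32_t sketch_ucs4;

/* Values of the two "form" bits (mask 0x0C), as listed in the diff. */
enum {
    SKETCH_FORM_WSTR  = 0,  /* 00: data not initialized, contents are in wstr */
    SKETCH_FORM_1BYTE = 1,  /* 01: 1 byte per code point (Latin-1) */
    SKETCH_FORM_2BYTE = 2,  /* 10: 2 bytes per code point (UCS-2) */
    SKETCH_FORM_4BYTE = 3   /* 11: 4 bytes per code point (UCS-4) */
};

/* Reduced, hypothetical version of the struct shown in the diff. */
typedef struct {
    size_t length;          /* number of code points */
    union {
        void        *any;
        sketch_ucs1 *latin1;
        sketch_ucs2 *ucs2;
        sketch_ucs4 *ucs4;
    } data;
    int state;
} sketch_string;

/* Extract the form bits from state. */
static int sketch_form(const sketch_string *s)
{
    return (s->state & 0x0C) >> 2;
}

/* Read code point i through the union member that matches the form.
   Returns (sketch_ucs4)-1 if the data representation is not initialized. */
static sketch_ucs4 sketch_read(const sketch_string *s, size_t i)
{
    assert(i < s->length);
    switch (sketch_form(s)) {
    case SKETCH_FORM_1BYTE: return s->data.latin1[i];
    case SKETCH_FORM_2BYTE: return s->data.ucs2[i];
    case SKETCH_FORM_4BYTE: return s->data.ucs4[i];
    default:                return (sketch_ucs4)-1;
    }
}
```

Compared with a single `void *`, the typed members let each branch index the buffer without a cast, while `any` remains available for code that only needs to compare the pointer or test it for NULL.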