[Python-checkins] peps: Introduce union.

Sun Aug 28 21:44:27 CEST 2011

http://hg.python.org/peps/rev/3953e5bcb9d9
changeset: 3935:3953e5bcb9d9
user: Martin v. Löwis <martin at v.loewis.de>
date: Sun Aug 28 20:51:49 2011 +0200
summary:
 Introduce union.
files:
 pep-0393.txt | 27 ++++++++++++++++-----------
 1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/pep-0393.txt b/pep-0393.txt
--- a/pep-0393.txt
+++ b/pep-0393.txt
@@ -57,7 +57,12 @@
 typedef struct {
 PyObject_HEAD
 Py_ssize_t length;
- void *str;
+ union {
+ void *any;
+ Py_UCS1 *latin1;
+ Py_UCS2 *ucs2;
+ Py_UCS4 *ucs4;
+ } data;
 Py_hash_t hash;
 int state;
 Py_ssize_t utf8_length;
@@ -69,7 +74,7 @@
 These fields have the following interpretations:
 
 - length: number of code points in the string (result of sq_length)
-- str: shortest-form representation of the unicode string.
+- data: shortest-form representation of the unicode string.
 The string is null-terminated (in its respective representation).
 - hash: same as in Python 3.2
 - state:
@@ -77,7 +82,7 @@
 * lowest 2 bits (mask 0x03) - interned-state (SSTATE_*) as in 3.2
 * next 2 bits (mask 0x0C) - form of str:
 
- + 00 => reserved
+ + 00 => str is not initialized (data are in wstr)
 + 01 => 1 byte (Latin-1)
 + 10 => 2 byte (UCS-2)
 + 11 => 4 byte (UCS-4);
@@ -89,7 +94,7 @@
 (null-terminated). If wchar_t is 16-bit, this form may use surrogate
 pairs (in which case wstr_length differs from length).
 
-All three representations are optional, although the str form is
+All three representations are optional, although the data form is
 considered the canonical representation which can be absent only
 while the string is being created. If the representation is absent,
 the pointer is NULL, and the corresponding length field may contain
@@ -99,8 +104,8 @@
 defined as a typedef for wchar_t, so the wstr representation can double
 as Py_UNICODE representation.
 
-The str and utf8 pointers point to the same memory if the string uses
-only ASCII characters (using only Latin-1 is not sufficient). The str
+The data and utf8 pointers point to the same memory if the string uses
+only ASCII characters (using only Latin-1 is not sufficient). The data
 and wstr pointers point to the same memory if the string happens to
 fit exactly to the wchar_t type of the platform (i.e. uses some
 BMP-not-Latin-1 characters if sizeof(wchar_t) is 2, and uses some
@@ -129,14 +134,14 @@
 representation is not yet set for the string.
 
 PyUnicode_FromUnicode remains supported but is deprecated. If the
-Py_UNICODE pointer is non-null, the str representation is set. If the
+Py_UNICODE pointer is non-null, the data representation is set. If the
 pointer is NULL, a properly-sized wstr representation is allocated,
 which can be modified until PyUnicode_Ready() is called (explicitly
 or implicitly). Resizing a Unicode string remains possible until it
 is finalized.
 
 PyUnicode_Ready() converts a string containing only a wstr
-representation into the canonical representation. Unless wstr and str
+representation into the canonical representation. Unless wstr and data
 can share the memory, the wstr representation is discarded after the
 conversion. PyUnicode_FAST_READY() is a wrapper that avoids the 
 function call if the string is already ready. Both APIs return 0
@@ -219,7 +224,7 @@
 The Py_UNICODE representation is not instantaneously available,
 slowing down applications that request it. While this is also true,
 applications that care about this problem can be rewritten to use the
-str representation.
+data representation.
 
 The question was raised whether the wchar_t representation is
 discouraged, or scheduled for removal. This is not the intent of this
@@ -234,8 +239,8 @@
 expectation is that applications that have many large strings will see
 a reduction in memory usage. For small strings, the effects depend on
 the pointer size of the system, and the size of the Py_UNICODE/wchar_t
-type. The following table demonstrates this for various small string
-sizes and platforms.
+type. The following table demonstrates this for various small ASCII
+string sizes and platforms.
 
 +-------+---------------------------------+----------------+
 |string | Python 3.2 | This PEP |
-- 
Repository URL: http://hg.python.org/peps
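
A minimal C sketch of how the union introduced in this changeset might be read together with the "form" bits of the state field (mask 0x0C). Everything prefixed sketch_ or SKETCH_ is hypothetical: the stand-in typedefs, the reduced struct and the helper function are assumptions made for illustration and are not taken from pep-0393.txt or from CPython.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for Py_UCS1/Py_UCS2/Py_UCS4; assumed widths, not the real headers. */
typedef uint8_t  sketch_ucs1;
typedef uint16_t sketch_ucs2;
typedef uint32_t sketch_ucs4;

/* Hypothetical names for the form bits (mask 0x0C) listed in the diff. */
enum {
    SKETCH_KIND_MASK  = 0x0C,
    SKETCH_KIND_WSTR  = 0x00,  /* 00 => data not initialized (only wstr) */
    SKETCH_KIND_1BYTE = 0x04,  /* 01 => 1 byte (Latin-1) */
    SKETCH_KIND_2BYTE = 0x08,  /* 10 => 2 byte (UCS-2) */
    SKETCH_KIND_4BYTE = 0x0C   /* 11 => 4 byte (UCS-4) */
};

/* Reduced version of the struct: only the fields this sketch needs. */
typedef struct {
    ptrdiff_t length;          /* number of code points */
    union {
        void *any;
        sketch_ucs1 *latin1;
        sketch_ucs2 *ucs2;
        sketch_ucs4 *ucs4;
    } data;
    int state;                 /* low 2 bits: interned-state, next 2: form */
} sketch_unicode;

/* Return code point i, or UINT32_MAX if the index is out of range or the
 * canonical data representation is absent. */
static uint32_t sketch_read(const sketch_unicode *s, ptrdiff_t i)
{
    assert(s != NULL);
    if (i < 0 || i >= s->length || s->data.any == NULL)
        return UINT32_MAX;
    switch (s->state & SKETCH_KIND_MASK) {
    case SKETCH_KIND_1BYTE: return s->data.latin1[i];
    case SKETCH_KIND_2BYTE: return s->data.ucs2[i];
    case SKETCH_KIND_4BYTE: return s->data.ucs4[i];
    default:                return UINT32_MAX;  /* form 00: not ready */
    }
}
```

The typed union members let each case index the buffer without casting from void *, which appears to be the point of replacing the single str pointer; the any member remains useful for NULL checks and for code that does not care about the element width.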