Commit 5615bc9

committed

update

1 parent 1684f28 commit 5615bc9

File tree

2 files changed

+266

-0

lines changed

README.md
ch03.md

`README.md`

Lines changed: 10 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -8,8 +8,18 @@`
`8`	`8`
`9`	`9`	`Python source code is built with Docker. For instructions, see the [Docker usage guide](docker.md).`
`10`	`10`
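	`11`	`+## Source code`
	`12`	`+`
	`13`	`+While reading 《Python 源码剖析》, I made quite a few changes to the Python 2.5 source code to test some ideas. The modified code is [here](https://github.com/ausaki/python25).`
	`14`	`+`
	`15`	`+The master branch holds the original code.`
	`16`	`+`
	`17`	`+Each chxx branch corresponds to the matching chapter of the book and is derived from the master branch.`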
	`18`	`+`
	`19`	`+`
`11`	`20`	`## Contents`
`12`	`21`
`13`	`22`	`- [ch01 - A First Look at Python Objects](ch01.md)`
`14`	`23`	`- [ch02 - Integer Objects in Python](ch02.md)`
	`24`	`+- [ch03 - String Objects in Python](ch03.md)`
`15`	`25`

`ch03.md`

Lines changed: 256 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,256 @@`
	`1`	`+# String Objects in Python`
	`2`	`+`
	`3`	`+A string object is a variable-length, immutable type. It is defined as follows:`
	`4`	`+`
	`5`	+```C
	`6`	`+typedef struct {`
	`7`	`+ PyObject_VAR_HEAD`
	`8`	`+ long ob_shash;`
	`9`	`+ int ob_sstate;`
	`10`	`+ char ob_sval[1];`
	`11`	`+`
	`12`	`+ /* Invariants:`
	`13`	`+ * ob_sval contains space for 'ob_size+1' elements.`
	`14`	`+ * ob_sval[ob_size] == 0.`
	`15`	`+ * ob_shash is the hash of the string or -1 if not computed yet.`
	`16`	`+ * ob_sstate != 0 iff the string object is in stringobject.c's`
	`17`	`+ * 'interned' dictionary; in this case the two references`
	`18`	`+ * from 'interned' to this object are not counted in ob_refcnt.`
	`19`	`+ */`
	`20`	`+} PyStringObject;`
	`21`	+```
	`22`	`+`
	`23`	`+The comment inside the struct explains what each field means.`
	`24`	`+`
	`25`	`+The string hash is computed as follows:`
	`26`	`+`
	`27`	+```C
	`28`	`+static long`
	`29`	`+string_hash(PyStringObject *a)`
	`30`	`+{`
	`31`	`+ register Py_ssize_t len;`
	`32`	`+ register unsigned char *p;`
	`33`	`+ register long x;`
	`34`	`+`
	`35`	`+ if (a->ob_shash != -1)`
	`36`	`+ return a->ob_shash;`
	`37`	`+ len = a->ob_size;`
	`38`	`+ p = (unsigned char *) a->ob_sval;`
	`39`	`+ x = *p << 7;`
	`40`	`+ while (--len >= 0)`
	`41`	`+ x = (1000003*x) ^ *p++;`
	`42`	`+ x ^= a->ob_size;`
	`43`	`+ if (x == -1)`
	`44`	`+ x = -2;`
	`45`	`+ a->ob_shash = x;`
	`46`	`+ return x;`
	`47`	`+}`
	`48`	+```
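The hashing loop above can be tried outside the interpreter. The following is a minimal standalone sketch, not CPython code: the name `sketch_string_hash` is hypothetical, and the arithmetic is done in `unsigned long` (an assumption made here so that overflow wraps around instead of being undefined), with the result converted back to `long` at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Standalone sketch of the loop in string_hash(); the name is hypothetical.
   `s` must point to at least one readable byte, even when len is 0,
   because the seed is taken from the first byte (the terminator for ""). */
long
sketch_string_hash(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned long x;
    size_t i;

    assert(s != NULL);
    x = (unsigned long)*p << 7;        /* seed from the first byte */
    for (i = 0; i < len; i++)
        x = (1000003UL * x) ^ p[i];    /* multiply, then mix in one byte */
    x ^= (unsigned long)len;           /* fold the length in */
    if ((long)x == -1)                 /* -1 is reserved for "not computed yet" */
        return -2;
    return (long)x;
}
```

Calling `sketch_string_hash("", 0)` returns 0, since the seed byte is the terminator and the loop body never runs. Unlike `string_hash`, the sketch does not cache its result; in the real object the cache lives in `ob_shash`.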
	`49`	`+`
	`50`	`+The code that creates a PyStringObject from a C string is as follows:`
	`51`	`+`
	`52`	+```C
	`53`	`+/* This dictionary holds all interned strings. Note that references to`
	`54`	`+ strings in this dictionary are not counted in the string's ob_refcnt.`
	`55`	`+ When the interned string reaches a refcnt of 0 the string deallocation`
	`56`	`+ function will delete the reference from this dictionary.`
	`57`	`+`
	`58`	`+ Another way to look at this is that to say that the actual reference`
	`59`	`+ count of a string is: s->ob_refcnt + (s->ob_sstate?2:0)`
	`60`	`+*/`
	`61`	`+static PyObject *interned;`
	`62`	`+`
	`63`	`+/*`
	`64`	`+ For both PyString_FromString() and PyString_FromStringAndSize(), the`
	`65`	+ parameter `size' denotes number of characters to allocate, not counting any
	`66`	`+ null terminating character.`
	`67`	`+`
	`68`	+ For PyString_FromString(), the parameter `str' points to a null-terminated
	`69`	+ string containing exactly `size' bytes.
	`70`	`+`
	`71`	+ For PyString_FromStringAndSize(), the parameter the parameter `str' is
	`72`	+ either NULL or else points to a string containing at least `size' bytes.
	`73`	+ For PyString_FromStringAndSize(), the string in the `str' parameter does
	`74`	`+ not have to be null-terminated. (Therefore it is safe to construct a`
	`75`	+ substring by calling `PyString_FromStringAndSize(origstring, substrlen)'.)
	`76`	+ If `str' is NULL then PyString_FromStringAndSize() will allocate `size+1'
	`77`	`+ bytes (setting the last byte to the null terminating character) and you can`
	`78`	+ fill in the data yourself. If `str' is non-NULL then the resulting
	`79`	`+ PyString object must be treated as immutable and you must not fill in nor`
	`80`	`+ alter the data yourself, since the strings may be shared.`
	`81`	`+`
	`82`	+ The PyObject member `op->ob_size', which denotes the number of "extra
	`83`	`+ items" in a variable-size object, will contain the number of bytes`
	`84`	`+ allocated for string data, not counting the null terminating character. It`
	`85`	+ is therefore equal to the equal to the `size' parameter (for
	`86`	+ PyString_FromStringAndSize()) or the length of the string in the `str'
	`87`	`+ parameter (for PyString_FromString()).`
	`88`	`+*/`
	`89`	`+PyObject *`
	`90`	`+PyString_FromStringAndSize(const char *str, Py_ssize_t size)`
	`91`	`+{`
	`92`	`+ register PyStringObject *op;`
	`93`	`+ assert(size >= 0);`
	`94`	`+ if (size == 0 && (op = nullstring) != NULL) {`
	`95`	`+#ifdef COUNT_ALLOCS`
	`96`	`+ null_strings++;`
	`97`	`+#endif`
	`98`	`+ Py_INCREF(op);`
	`99`	`+ return (PyObject *)op;`
	`100`	`+ }`
	`101`	`+ if (size == 1 && str != NULL &&`
	`102`	`+ (op = characters[*str & UCHAR_MAX]) != NULL)`
	`103`	`+ {`
	`104`	`+#ifdef COUNT_ALLOCS`
	`105`	`+ one_strings++;`
	`106`	`+#endif`
	`107`	`+ Py_INCREF(op);`
	`108`	`+ return (PyObject *)op;`
	`109`	`+ }`
	`110`	`+`
	`111`	`+ /* Inline PyObject_NewVar */`
	`112`	`+ op = (PyStringObject *)PyObject_MALLOC(sizeof(PyStringObject) + size);`
	`113`	`+ if (op == NULL)`
	`114`	`+ return PyErr_NoMemory();`
	`115`	`+ PyObject_INIT_VAR(op, &PyString_Type, size);`
	`116`	`+ op->ob_shash = -1;`
	`117`	`+ op->ob_sstate = SSTATE_NOT_INTERNED;`
	`118`	`+ if (str != NULL)`
	`119`	`+ Py_MEMCPY(op->ob_sval, str, size);`
	`120`	`+ op->ob_sval[size] = '\0';`
	`121`	`+ /* share short strings */`
	`122`	`+ if (size == 0) {`
	`123`	`+ PyObject *t = (PyObject *)op;`
	`124`	`+ PyString_InternInPlace(&t);`
	`125`	`+ op = (PyStringObject *)t;`
	`126`	`+ nullstring = op;`
	`127`	`+ Py_INCREF(op);`
	`128`	`+ } else if (size == 1 && str != NULL) {`
	`129`	`+ PyObject *t = (PyObject *)op;`
	`130`	`+ PyString_InternInPlace(&t);`
	`131`	`+ op = (PyStringObject *)t;`
	`132`	`+ characters[*str & UCHAR_MAX] = op;`
	`133`	`+ Py_INCREF(op);`
	`134`	`+ }`
	`135`	`+ return (PyObject *) op;`
	`136`	`+}`
	`137`	`+`
	`138`	`+PyObject *`
	`139`	`+PyString_FromString(const char *str)`
	`140`	`+{`
	`141`	`+ register size_t size;`
	`142`	`+ register PyStringObject *op;`
	`143`	`+`
	`144`	`+ assert(str != NULL);`
	`145`	`+ size = strlen(str);`
	`146`	`+ if (size > PY_SSIZE_T_MAX) {`
	`147`	`+ PyErr_SetString(PyExc_OverflowError,`
	`148`	`+ "string is too long for a Python string");`
	`149`	`+ return NULL;`
	`150`	`+ }`
	`151`	`+ if (size == 0 && (op = nullstring) != NULL) {`
	`152`	`+#ifdef COUNT_ALLOCS`
	`153`	`+ null_strings++;`
	`154`	`+#endif`
	`155`	`+ Py_INCREF(op);`
	`156`	`+ return (PyObject *)op;`
	`157`	`+ }`
	`158`	`+ if (size == 1 && (op = characters[*str & UCHAR_MAX]) != NULL) {`
	`159`	`+#ifdef COUNT_ALLOCS`
	`160`	`+ one_strings++;`
	`161`	`+#endif`
	`162`	`+ Py_INCREF(op);`
	`163`	`+ return (PyObject *)op;`
	`164`	`+ }`
	`165`	`+`
	`166`	`+ /* Inline PyObject_NewVar */`
	`167`	`+ op = (PyStringObject *)PyObject_MALLOC(sizeof(PyStringObject) + size);`
	`168`	`+ if (op == NULL)`
	`169`	`+ return PyErr_NoMemory();`
	`170`	`+ PyObject_INIT_VAR(op, &PyString_Type, size);`
	`171`	`+ op->ob_shash = -1;`
	`172`	`+ op->ob_sstate = SSTATE_NOT_INTERNED;`
	`173`	`+ Py_MEMCPY(op->ob_sval, str, size+1);`
	`174`	`+ /* share short strings */`
	`175`	`+ if (size == 0) {`
	`176`	`+ PyObject *t = (PyObject *)op;`
	`177`	`+ PyString_InternInPlace(&t);`
	`178`	`+ op = (PyStringObject *)t;`
	`179`	`+ nullstring = op;`
	`180`	`+ Py_INCREF(op);`
	`181`	`+ } else if (size == 1) {`
	`182`	`+ PyObject *t = (PyObject *)op;`
	`183`	`+ PyString_InternInPlace(&t);`
	`184`	`+ op = (PyStringObject *)t;`
	`185`	`+ characters[*str & UCHAR_MAX] = op;`
	`186`	`+ Py_INCREF(op);`
	`187`	`+ }`
	`188`	`+ return (PyObject *) op;`
	`189`	`+}`
	`190`	+```
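Both creation functions allocate the header and the character data in a single block (`sizeof(PyStringObject) + size`). The sketch below illustrates that layout on its own. `SketchString` and `sketch_string_new` are hypothetical names, and the sketch uses a C99 flexible array member instead of the `ob_sval[1]` declaration, so the byte for the terminator is added explicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyStringObject: a header followed by the bytes. */
typedef struct {
    size_t size;    /* number of bytes, not counting the terminator */
    char   sval[];  /* character data, stored right after the header */
} SketchString;

/* Allocate header and data in one block, mirroring
   PyString_FromStringAndSize(): `str` may be NULL, in which case the
   data bytes are left for the caller to fill in. */
SketchString *
sketch_string_new(const char *str, size_t size)
{
    SketchString *op = malloc(sizeof(SketchString) + size + 1);

    if (op == NULL)
        return NULL;
    op->size = size;
    if (str != NULL)
        memcpy(op->sval, str, size);  /* str need not be null-terminated */
    op->sval[size] = '\0';            /* invariant: sval[size] == 0 */
    return op;
}
```

For example, `sketch_string_new("hello world", 5)` yields an object whose `sval` is `"hello"`, which is the substring use case mentioned in the comment above. The caller releases the object with `free`.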
	`191`	`+`
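	`192`	`+The key points are covered in the comments.`
	`193`	`+`
	`194`	`+One more thing to note is the string intern mechanism. In the code above it is applied to short strings (the empty string and strings of length 1), so that they are shared.`
	`195`	`+The opening part of the code above defines a variable, interned. It is a dictionary (PyDict) that holds the interned strings, each one mapped to itself.`
	`196`	`+`
	`197`	`+When a string is created and it is a short string (say "A"), there are two cases. If interned already contains "A", that "A" is returned and the PyStringObject that was just created is released (its reference count is decremented). If interned does not contain "A", the new "A" is stored in it.`
	`198`	`+`
	`199`	`+Overall, the intern mechanism reduces the memory that strings use. Even if you create 100 short strings "A", Python keeps only one "A" internally, and all 100 refer to that same object.`
	`200`	`+`
	`201`	`+The following code shows how short strings are shared:`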
	`202`	`+`
	`203`	+```C
	`204`	`+void`
	`205`	`+PyString_InternInPlace(PyObject **p)`
	`206`	`+{`
	`207`	`+ register PyStringObject *s = (PyStringObject *)(*p);`
	`208`	`+ PyObject *t;`
	`209`	`+ if (s == NULL || !PyString_Check(s))`
	`210`	`+ Py_FatalError("PyString_InternInPlace: strings only please!");`
	`211`	`+ /* If it's a string subclass, we don't really know what putting`
	`212`	`+ it in the interned dict might do. */`
	`213`	`+ if (!PyString_CheckExact(s))`
	`214`	`+ return;`
	`215`	`+ if (PyString_CHECK_INTERNED(s))`
	`216`	`+ return;`
	`217`	`+ if (interned == NULL) {`
	`218`	`+ interned = PyDict_New();`
	`219`	`+ if (interned == NULL) {`
	`220`	`+ PyErr_Clear(); /* Don't leave an exception */`
	`221`	`+ return;`
	`222`	`+ }`
	`223`	`+ }`
	`224`	`+ t = PyDict_GetItem(interned, (PyObject *)s);`
	`225`	`+ if (t) {`
	`226`	`+ Py_INCREF(t);`
	`227`	`+ Py_DECREF(*p);`
	`228`	`+ *p = t;`
	`229`	`+ return;`
	`230`	`+ }`
	`231`	`+`
	`232`	`+ if (PyDict_SetItem(interned, (PyObject *)s, (PyObject *)s) < 0) {`
	`233`	`+ PyErr_Clear();`
	`234`	`+ return;`
	`235`	`+ }`
	`236`	`+ /* The two references in interned are not counted by refcnt.`
	`237`	`+ The string deallocator will take care of this */`
	`238`	`+ s->ob_refcnt -= 2;`
	`239`	`+ PyString_CHECK_INTERNED(s) = SSTATE_INTERNED_MORTAL;`
	`240`	`+}`
	`241`	+```
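The lookup-or-insert step in `PyString_InternInPlace` can be illustrated without the interpreter. This is a rough sketch under simplifying assumptions: a fixed-size array with a linear search stands in for the `interned` dictionary, plain C strings stand in for PyStringObject, and there is no reference counting. The names `sketch_intern`, `sketch_interned` and `SKETCH_MAX_INTERNED` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_INTERNED 64  /* arbitrary capacity chosen for the sketch */

/* Stand-in for the `interned` dictionary: one canonical copy per value. */
static char *sketch_interned[SKETCH_MAX_INTERNED];
static size_t sketch_interned_count;

/* Return the canonical copy of `s`, adding one if none exists yet.
   Returns NULL if the table is full or allocation fails. */
const char *
sketch_intern(const char *s)
{
    size_t i, n;
    char *copy;

    assert(s != NULL);
    for (i = 0; i < sketch_interned_count; i++)
        if (strcmp(sketch_interned[i], s) == 0)
            return sketch_interned[i];  /* already interned: share it */
    if (sketch_interned_count == SKETCH_MAX_INTERNED)
        return NULL;
    n = strlen(s) + 1;
    copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n);
    sketch_interned[sketch_interned_count++] = copy;  /* first occurrence */
    return copy;
}
```

Two calls with equal contents, even from different buffers, return the same pointer. That pointer identity is the sharing effect that interning provides.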
	`242`	`+`
	`243`	`+`
	`244`	`+## Short string cache pool`
	`245`	`+`
	`246`	+Python keeps strings of length 1 in the `characters` array. When a new string is created and `characters` already holds that string, the entry in `characters` is returned directly.
	`247`	`+`
	`248`	+```C
	`249`	`+static PyStringObject *characters[UCHAR_MAX + 1];`
	`250`	`+// UCHAR_MAX = 255`
	`251`	+```
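As a rough illustration of how such a table is filled lazily and indexed by byte value, here is a minimal sketch. It uses plain C strings instead of PyStringObject, and the names `sketch_characters` and `sketch_one_char_string` are hypothetical. Only the indexing expression follows the `characters[*str & UCHAR_MAX]` pattern from the creation functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* One slot per possible byte value; every slot starts out NULL. */
static char *sketch_characters[UCHAR_MAX + 1];

/* Return the shared one-character string for `c`, creating it on first
   use. Returns NULL if allocation fails. */
const char *
sketch_one_char_string(char c)
{
    unsigned int idx = c & UCHAR_MAX;  /* maps any char to 0..UCHAR_MAX */
    char *s = sketch_characters[idx];

    if (s != NULL)
        return s;                      /* cache hit: reuse the entry */
    s = malloc(2);
    if (s == NULL)
        return NULL;
    s[0] = c;
    s[1] = '\0';
    sketch_characters[idx] = s;        /* remember it for later calls */
    return s;
}
```

In this sketch every byte value is treated the same way, so repeated calls with `'*'` return the same pointer once the first call has filled the slot.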
	`252`	`+`
	`253`	`+`
	`254`	`+## Testing by modifying the code`
	`255`	`+`
	`256`	`+It seems that characters does not cache special characters such as '*'.`

0 commit comments