Commit 5615bc9

update
1 parent 1684f28 commit 5615bc9

2 files changed: +266 -0 lines changed

‎README.md‎

Lines changed: 10 additions & 0 deletions

@@ -8,8 +8,18 @@
 
 Use Docker to compile the Python source code; see the [Docker usage guide](docker.md) for instructions.
 
+## Source code
+
+While reading 《Python 源码剖析》, I made quite a few changes to the Python 2.5 source code to verify some ideas. The modified code is [here](https://github.com/ausaki/python25).
+
+The master branch is the original code.
+
+Each chxx branch corresponds to a chapter of the book and is based on the master branch.
+
+
 ## Contents
 
 - [ch01 - A first look at Python objects](ch01.md)
 - [ch02 - Integer objects in Python](ch02.md)
+- [ch03 - String objects in Python](ch03.md)
 

‎ch03.md‎

Lines changed: 256 additions & 0 deletions

@@ -0,0 +1,256 @@

# String objects in Python

A string object is a variable-length, immutable type. It is defined as follows:

```C
typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;
```

See the comments for the meaning of each field.

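
The variable-length layout can be illustrated outside CPython. The following is a minimal sketch, not CPython code: `DemoString` and `demo_string_new` are hypothetical names, and it uses a C99 flexible array member (`char sval[]`) where the original struct uses the older `char ob_sval[1]` idiom. The invariants it keeps are the ones listed in the comment above: room for `size + 1` bytes, a trailing NUL, and a hash of -1 meaning "not computed yet".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyStringObject: a fixed header followed by
   a character buffer whose real length is chosen at allocation time. */
typedef struct {
    size_t size;   /* number of bytes, not counting the trailing NUL */
    long   hash;   /* -1 means "not computed yet" */
    char   sval[]; /* size + 1 bytes are allocated for this member */
} DemoString;

/* Allocate header and character data in one block and copy `size`
   bytes from `src`. Returns NULL if the allocation fails. */
DemoString *demo_string_new(const char *src, size_t size)
{
    DemoString *s = malloc(sizeof(DemoString) + size + 1);
    if (s == NULL)
        return NULL;
    s->size = size;
    s->hash = -1;
    memcpy(s->sval, src, size);
    s->sval[size] = '\0'; /* invariant: sval[size] == 0 */
    return s;
}
```

A caller would write something like `DemoString *s = demo_string_new("hello", 5);` and release it with `free(s)`. Because the header and the data share one allocation, there is no second pointer to follow when reading the characters.
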

How the string hash is computed:

```C
static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

    if (a->ob_shash != -1)
        return a->ob_shash;
    len = a->ob_size;
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= a->ob_size;
    if (x == -1)
        x = -2;
    a->ob_shash = x;
    return x;
}
```
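
To experiment with the hash without building Python, the same arithmetic can be applied to a plain C buffer. This is a sketch under some assumptions: `demo_string_hash` is a hypothetical name, there is no cached value, and the multiplication is done in `unsigned long` so that wraparound is well defined (the original relies on `long` overflow behaving the same way). The result depends on the width of `long` on the platform.

```c
#include <stddef.h>

/* Hash `size` bytes starting at `s` with the multiply-and-xor scheme
   used by string_hash(). `s` must point to at least one readable byte
   even when size is 0 (a NUL-terminated string satisfies this). */
long demo_string_hash(const char *s, size_t size)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned long x = (unsigned long)*p << 7; /* seed from the first byte */
    size_t n = size;
    long h;

    while (n-- > 0)
        x = (1000003UL * x) ^ *p++;
    x ^= (unsigned long)size;

    h = (long)x;
    if (h == -1) /* -1 is reserved to mean "not computed yet" */
        h = -2;
    return h;
}
```

For example, `demo_string_hash("", 0)` returns 0, because the seed byte is the terminating NUL and the loop body never runs.
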

The code that creates a PyStringObject from a C string is as follows:

```C
/* This dictionary holds all interned strings.  Note that references to
   strings in this dictionary are *not* counted in the string's ob_refcnt.
   When the interned string reaches a refcnt of 0 the string deallocation
   function will delete the reference from this dictionary.

   Another way to look at this is that to say that the actual reference
   count of a string is:  s->ob_refcnt + (s->ob_sstate?2:0)
*/
static PyObject *interned;

/*
   For both PyString_FromString() and PyString_FromStringAndSize(), the
   parameter `size' denotes number of characters to allocate, not counting any
   null terminating character.

   For PyString_FromString(), the parameter `str' points to a null-terminated
   string containing exactly `size' bytes.

   For PyString_FromStringAndSize(), the parameter the parameter `str' is
   either NULL or else points to a string containing at least `size' bytes.
   For PyString_FromStringAndSize(), the string in the `str' parameter does
   not have to be null-terminated.  (Therefore it is safe to construct a
   substring by calling `PyString_FromStringAndSize(origstring, substrlen)'.)
   If `str' is NULL then PyString_FromStringAndSize() will allocate `size+1'
   bytes (setting the last byte to the null terminating character) and you can
   fill in the data yourself.  If `str' is non-NULL then the resulting
   PyString object must be treated as immutable and you must not fill in nor
   alter the data yourself, since the strings may be shared.

   The PyObject member `op->ob_size', which denotes the number of "extra
   items" in a variable-size object, will contain the number of bytes
   allocated for string data, not counting the null terminating character.  It
   is therefore equal to the equal to the `size' parameter (for
   PyString_FromStringAndSize()) or the length of the string in the `str'
   parameter (for PyString_FromString()).
*/
PyObject *
PyString_FromStringAndSize(const char *str, Py_ssize_t size)
{
    register PyStringObject *op;
    assert(size >= 0);
    if (size == 0 && (op = nullstring) != NULL) {
#ifdef COUNT_ALLOCS
        null_strings++;
#endif
        Py_INCREF(op);
        return (PyObject *)op;
    }
    if (size == 1 && str != NULL &&
        (op = characters[*str & UCHAR_MAX]) != NULL)
    {
#ifdef COUNT_ALLOCS
        one_strings++;
#endif
        Py_INCREF(op);
        return (PyObject *)op;
    }

    /* Inline PyObject_NewVar */
    op = (PyStringObject *)PyObject_MALLOC(sizeof(PyStringObject) + size);
    if (op == NULL)
        return PyErr_NoMemory();
    PyObject_INIT_VAR(op, &PyString_Type, size);
    op->ob_shash = -1;
    op->ob_sstate = SSTATE_NOT_INTERNED;
    if (str != NULL)
        Py_MEMCPY(op->ob_sval, str, size);
    op->ob_sval[size] = '\0';
    /* share short strings */
    if (size == 0) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        nullstring = op;
        Py_INCREF(op);
    } else if (size == 1 && str != NULL) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        characters[*str & UCHAR_MAX] = op;
        Py_INCREF(op);
    }
    return (PyObject *) op;
}

PyObject *
PyString_FromString(const char *str)
{
    register size_t size;
    register PyStringObject *op;

    assert(str != NULL);
    size = strlen(str);
    if (size > PY_SSIZE_T_MAX) {
        PyErr_SetString(PyExc_OverflowError,
            "string is too long for a Python string");
        return NULL;
    }
    if (size == 0 && (op = nullstring) != NULL) {
#ifdef COUNT_ALLOCS
        null_strings++;
#endif
        Py_INCREF(op);
        return (PyObject *)op;
    }
    if (size == 1 && (op = characters[*str & UCHAR_MAX]) != NULL) {
#ifdef COUNT_ALLOCS
        one_strings++;
#endif
        Py_INCREF(op);
        return (PyObject *)op;
    }

    /* Inline PyObject_NewVar */
    op = (PyStringObject *)PyObject_MALLOC(sizeof(PyStringObject) + size);
    if (op == NULL)
        return PyErr_NoMemory();
    PyObject_INIT_VAR(op, &PyString_Type, size);
    op->ob_shash = -1;
    op->ob_sstate = SSTATE_NOT_INTERNED;
    Py_MEMCPY(op->ob_sval, str, size+1);
    /* share short strings */
    if (size == 0) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        nullstring = op;
        Py_INCREF(op);
    } else if (size == 1) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        characters[*str & UCHAR_MAX] = op;
        Py_INCREF(op);
    }
    return (PyObject *) op;
}
```
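
The key points are all covered in the comments.

Another thing to note is string interning. `PyString_FromString` and `PyString_FromStringAndSize` intern short strings (the empty string and strings of length 1) so that they are shared.
The opening part of the code above defines a variable `interned`, which is actually a dictionary (PyDict) that maps each interned string to itself.

When a string is created and it is a short string (say "A"): if `interned` already contains "A", that existing "A" is returned and the PyStringObject that was just created is released (its reference count is decremented); if `interned` does not contain "A", the new "A" is stored in it.

Overall, interning reduces the memory used by strings. Even if you create 100 short strings "A", Python internally holds only one "A", and all 100 refer to that same object.

The following code shows how short strings are shared:
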
```C
void
PyString_InternInPlace(PyObject **p)
{
    register PyStringObject *s = (PyStringObject *)(*p);
    PyObject *t;
    if (s == NULL || !PyString_Check(s))
        Py_FatalError("PyString_InternInPlace: strings only please!");
    /* If it's a string subclass, we don't really know what putting
       it in the interned dict might do. */
    if (!PyString_CheckExact(s))
        return;
    if (PyString_CHECK_INTERNED(s))
        return;
    if (interned == NULL) {
        interned = PyDict_New();
        if (interned == NULL) {
            PyErr_Clear(); /* Don't leave an exception */
            return;
        }
    }
    t = PyDict_GetItem(interned, (PyObject *)s);
    if (t) {
        Py_INCREF(t);
        Py_DECREF(*p);
        *p = t;
        return;
    }

    if (PyDict_SetItem(interned, (PyObject *)s, (PyObject *)s) < 0) {
        PyErr_Clear();
        return;
    }
    /* The two references in interned are not counted by refcnt.
       The string deallocator will take care of this */
    s->ob_refcnt -= 2;
    PyString_CHECK_INTERNED(s) = SSTATE_INTERNED_MORTAL;
}
```
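
The lookup-or-store pattern in `PyString_InternInPlace` can be tried in isolation. The sketch below is a simplified illustration, not CPython's implementation: `demo_intern`, `demo_interned` and `DEMO_INTERN_MAX` are hypothetical names, it works on plain C strings, it uses a small fixed-size table with linear search instead of a dictionary, and it has no reference counting.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_INTERN_MAX 64 /* arbitrary capacity for this sketch */

static char *demo_interned[DEMO_INTERN_MAX];
static size_t demo_interned_count;

/* Return the canonical copy of `s`. The first call for a given value
   stores a copy; later calls with equal contents return that same
   pointer. Returns NULL if the table is full or allocation fails. */
const char *demo_intern(const char *s)
{
    size_t i, n;
    char *copy;

    for (i = 0; i < demo_interned_count; i++) {
        if (strcmp(demo_interned[i], s) == 0)
            return demo_interned[i]; /* already interned: share it */
    }
    if (demo_interned_count == DEMO_INTERN_MAX)
        return NULL;
    n = strlen(s) + 1;
    copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n);
    demo_interned[demo_interned_count++] = copy;
    return copy;
}
```

Calling `demo_intern` on two separate buffers that both contain "A" yields the same pointer, which is the sharing effect described above. A real implementation would use a hash table so that lookup cost does not grow with the number of interned strings.
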


## Short-string cache pool

Python uses the `characters` array to hold strings of length 1. When a new string is created and that string already exists in `characters`, the entry in `characters` is returned directly.

```C
static PyStringObject *characters[UCHAR_MAX + 1];
// UCHAR_MAX = 255
```
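
A table indexed by byte value is simple enough to sketch on its own. The following assumes plain C strings instead of PyStringObject; `demo_characters` and `demo_char_string` are hypothetical names, and entries are filled lazily on first request, similar to how the creation functions above fill `characters` only when a one-character string is first created.

```c
#include <limits.h>
#include <stdlib.h>

/* One slot per possible byte value; NULL means "not created yet". */
static char *demo_characters[UCHAR_MAX + 1];

/* Return a shared one-character string for `c`, creating it on first
   use. Returns NULL if allocation fails. */
const char *demo_char_string(char c)
{
    unsigned char idx = (unsigned char)c; /* same role as *str & UCHAR_MAX */

    if (demo_characters[idx] == NULL) {
        char *s = malloc(2);
        if (s == NULL)
            return NULL;
        s[0] = c;
        s[1] = '\0';
        demo_characters[idx] = s;
    }
    return demo_characters[idx];
}
```

Two calls with the same character return the same pointer, so the lookup is a single array access with no hashing involved.
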


## Testing by modifying the code

It seems that `characters` does not cache special characters such as '*'.
