Message115339
| Author |
vstinner |
| Recipients |
docs@python, vstinner |
| Date |
2010-09-01.22:41:28 |
| Message-id |
<1283380895.91.0.777071955411.issue9738@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Many C functions take a byte-string argument (char* type), but its encoding is not documented. It would not be a problem if the encoding were always the same, but it is not. Examples:
- format of PyUnicode_FromFormat() should be encoded as ISO-8859-1
- filename of PyParser_ASTFromString() should be encoded as utf-8
- filename of PyErr_SetFromErrnoWithFilename() should be encoded to the filesystem encoding (with strict error handler, and not surrogateescape)
- 's' argument of PyParser_ASTFromString() should be encoded as utf-8 if the PyPARSE_IGNORE_COOKIE flag is set; otherwise the parser checks for a #coding:xxx cookie (if there is no cookie, utf-8 is used)
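To illustrate why the undocumented encoding matters, here is a minimal Python sketch. It is not part of the attached patch, and the sample string "café" is an arbitrary assumption chosen for illustration. It shows that the same text yields different char* contents under utf-8 and ISO-8859-1, and how the strict and surrogateescape error handlers differ when the bytes do not match the expected encoding.

```python
# Illustration only: the sample text is hypothetical.
text = "caf\u00e9"  # "café"

# The caller *encodes* text to bytes before passing a char* to C.
as_utf8 = text.encode("utf-8")          # two bytes for the last character
as_latin1 = text.encode("iso-8859-1")   # one byte for the last character

# The C function *decodes* the bytes. If it expects utf-8 but receives
# ISO-8859-1 bytes, a strict decoder raises an error.
try:
    as_latin1.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# If it expects ISO-8859-1 but receives utf-8 bytes, decoding succeeds
# silently and produces the wrong text.
mojibake = as_utf8.decode("iso-8859-1")

# strict vs surrogateescape: with surrogateescape, an undecodable byte is
# kept as a lone surrogate and can be converted back to the original byte.
raw = b"caf\xe9"
escaped = raw.decode("utf-8", errors="surrogateescape")
round_trip = escaped.encode("utf-8", errors="surrogateescape")
```

In this sketch the wrong guess either fails loudly (strict utf-8) or corrupts the text without any error (ISO-8859-1), which is why each function needs its expected encoding stated explicitly.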
The attached patch is an attempt to document most low-level functions. I chose to add the names of function arguments to the headers because I consider that a header can serve as quick documentation. I only touched .c files to change argument names.
It is hard to get the right encoding, so I cannot guarantee that my patch is correct. My patch is just a draft.
I don't know if "encoded to utf-8" is the right expression. Or should it be "decoded as utf-8"? |
|
History
|
|---|
| Date |
User |
Action |
Args |
| 2010-09-01 22:41:36 | vstinner | set | recipients:
+ vstinner, docs@python |
| 2010-09-01 22:41:35 | vstinner | set | messageid: <1283380895.91.0.777071955411.issue9738@psf.upfronthosting.co.za> |
| 2010-09-01 22:41:34 | vstinner | link | issue9738 messages |
| 2010-09-01 22:41:34 | vstinner | create |
|