This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: PyUnicode_FromFormat: implement width and precision for %s, %S, %R, %V, %U, %A
Type: enhancement
Stage: patch review
Components: Unicode
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Sean.Ochoa, eric.smith, ezio.melotti, lekma, lemburg, mark.dickinson, petri.lehtinen, python-dev, ron_adam, serhiy.storchaka, vstinner, ysj.ray
Priority: normal
Keywords: needs review, patch

Created on 2009-11-15 19:42 by mark.dickinson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded by | Date
issue_7330.diff | ysj.ray | 2011-03-21 15:48
unicode_fromformat_precision.patch | vstinner | 2012-10-06 22:53
unicode_fromformat_precision-2.patch | vstinner | 2012-10-07 20:35
unicode_fromformat_precision-3.patch | vstinner | 2013-05-05 23:03
Messages (42)
msg95306 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-11-15 19:42
There seems to be something wrong with the width handling code in 
PyUnicode_FromFormat; or perhaps I'm misusing it.
To reproduce: replace the line
 return PyUnicode_FromFormat("range(%R, %R)", r->start, r->stop);
in range_repr in Objects/rangeobject.c with
 return PyUnicode_FromFormat("range(%20R, %20R)", r->start, r->stop);
On my machine (OS X 10.6), this results in a segfault when invoking 
range_repr:
Python 3.2a0 (py3k:76311M, Nov 15 2009, 19:16:40) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> range(0, 10)
Segmentation fault
Perhaps these modifiers aren't supposed to be used with a width?
msg95310 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2009-11-15 21:35
It looks like PyUnicode_FromFormatV is computing callcount incorrectly.
It's looking for 'S', 'R', or 'A' immediately following '%', before the
width. It seems to me it should be treating them the same as 's',
although I'll admit to not having looked at it closely enough to know
exactly what's going on.
The whole routine could use some attention, I think.
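A minimal standalone sketch of the miscount described above. It does not reproduce the real PyUnicode_FromFormatV() code; count_naive and count_skipping_width are hypothetical helpers that only show why looking at the character right after '%' misses "%20R", while skipping the width and precision digits first does not.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reconstruction of the miscount: only the character
   immediately after '%' is examined, so "%20R" is not counted. */
size_t count_naive(const char *fmt)
{
    assert(fmt != NULL);
    size_t n = 0;
    for (const char *f = fmt; *f; f++) {
        if (*f == '%' && f[1] != '\0' && strchr("SRA", f[1]) != NULL)
            n++;
    }
    return n;
}

/* Skip an optional width and ".precision" before looking at the
   conversion character. */
size_t count_skipping_width(const char *fmt)
{
    assert(fmt != NULL);
    size_t n = 0;
    const char *f = fmt;
    while (*f) {
        if (*f != '%') { f++; continue; }
        f++;                                      /* step past '%' */
        if (*f == '%') { f++; continue; }         /* literal "%%" */
        while (isdigit((unsigned char)*f)) f++;   /* width */
        if (*f == '.') {
            f++;
            while (isdigit((unsigned char)*f)) f++;   /* precision */
        }
        if (*f == 'S' || *f == 'R' || *f == 'A')
            n++;
        if (*f) f++;
    }
    return n;
}
```

With the format string from the report, "range(%20R, %20R)", the naive count is 0 and the width-aware count is 2, so a caller sizing an array of conversion results from the naive count would allocate too little.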
msg111802 - Author: ysj.ray (ysj.ray) Date: 2010-07-28 13:11
I don't think it's appropriate to allow a width restriction on %S, %R, and %A. These types correspond to PyObject_Str(), PyObject_Repr(), and PyObject_ASCII() respectively, and their results are usually a complete string representation of an object. If you put a width restriction on the string, the result is likely to be truncated and no longer meaningful. If you really want to restrict the width of the result, you can use %s instead, with one or two more lines to get the corresponding char* from the object.
msg111808 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-07-28 13:44
Ray.Allen wrote:
> I don't think it's appropriate to allow a width restriction on %S, %R, and %A. These types correspond to PyObject_Str(), PyObject_Repr(), and PyObject_ASCII() respectively, and their results are usually a complete string representation of an object. If you put a width restriction on the string, the result is likely to be truncated and no longer meaningful. If you really want to restrict the width of the result, you can use %s instead, with one or two more lines to get the corresponding char* from the object.
I agree with that, but don't feel strongly about not allowing this
use case.
If it's easy to support, why not have it? Otherwise, I'd be +1 on
adding a check and raising an error in case a width modifier is used
with these markers.
msg111820 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-07-28 14:59
I think under the "we're all consenting adults" doctrine that it should be allowed. If you really want that behavior, why force the char*/%s dance at each call site when it's easy enough to do it in one place? I don't think anyone supplying a width would really be surprised that it would truncate the result and possibly break round-tripping through repr.
Besides, it's allowed in pure python code:
>>> '%.5r' % object()
'<obje'
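For comparison, C's own printf family treats width and precision on %s the same way as the Python example above. The sketch below is plain standard C, not CPython code; format_field is a hypothetical wrapper around snprintf() that passes the width and precision through "%*.*s".

```c
#include <assert.h>
#include <stdio.h>

/* Format s into out, truncated to `precision` bytes and right-aligned
   in a field of at least `width` bytes. A negative precision means
   "no limit", as for printf. Returns what snprintf() returns. */
int format_field(char *out, size_t outsz, const char *s,
                 int width, int precision)
{
    assert(out != NULL && s != NULL);
    return snprintf(out, outsz, "%*.*s", width, precision, s);
}
```

Calling format_field(buf, sizeof buf, "<object object>", 8, 5) leaves the five characters "<obje" preceded by three padding spaces in buf: the precision truncates first, then the width pads.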
msg111894 - Author: ysj.ray (ysj.ray) Date: 2010-07-29 06:08
You can pass "%20s" in the format argument of PyUnicode_FromFormat(), but it has no effect. The width and precision modifiers are not intended to apply to string formatting (%s, %S, %R, %A), only to integers (%d, %u, %i, %x). And although you can write "%20s", you cannot write "%20S", "%20R" or "%20A".
There are several possible fixes:
1. Make width and precision modifiers on %s, %S, %R, %A raise an exception, such as ValueError, instead of causing a segmentation fault.
2. Make width and precision modifiers on %s, %S, %R, %A have no effect, just like the current %s.
3. Make width and precision modifiers on %s, %S, %R, %A take effect correctly, like %r and %s in string formatting in Python code.
Thanks to Eric for his ideas. I'm now sure I prefer the last fix. I will work out a patch for this.
msg112041 - Author: ysj.ray (ysj.ray) Date: 2010-07-30 06:38
Is this really worth fixing?
msg112298 - Author: ysj.ray (ysj.ray) Date: 2010-08-01 09:27
Here is the patch. It adds support for width and precision modifiers in PyUnicode_FromFormat() for %s, %S, %R, %V, %U, %A, and it also fixes two things that I believe are bugs:
1. According to the PyUnicode_FromFormat() documentation (http://docs.python.org/dev/py3k/c-api/unicode.html?highlight=pyunicode_fromformat#PyUnicode_FromFormat), "%A" should produce the result of ascii(). But in the existing code I can only find the call to ascii(object) and the calculation of the space needed for it, not the code that appends the ascii() output to the result. My simple test also shows that %A doesn't work. Here is the test function:
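static PyObject *
getstr(PyObject *self, PyObject *args)
{
    const char *s = "hello world";
    PyObject *unicode = PyUnicode_FromString(s);
    PyObject *result;

    if (unicode == NULL)
        return NULL;
    /* %A should format the argument with ascii() */
    result = PyUnicode_FromFormat("%A", unicode);
    Py_DECREF(unicode);  /* PyUnicode_FromFormat() does not steal the reference */
    return result;
}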
This should return the result of calling ascii() on the object named *unicode*, that is, a unicode object containing "'hello world'" (ascii() keeps the quotes, like repr()). Instead it returns a unicode object containing "%A". This can be fixed by adding the following line:
 case 'A':
in step 4.
2. Another bug. Here is a piece of code in Objects/unicodeobject.c, PyUnicode_FromFormatV():
797     if (*f == '%') {
798 #ifdef HAVE_LONG_LONG
799         int longlongflag = 0;
800 #endif
801         const char* p = f;
802         width = 0;
803         while (ISDIGIT((unsigned)*f))
804             width = (width*10) + *f++ - '0';
Here the variable *width* cannot be calculated correctly, because the while loop never executes: at this point *f is always '%'! So the width is always 0. Currently this doesn't cause an error, since the following code ensures width >= MAX_LONG_CHARS:
834         case 'd': case 'u': case 'i': case 'x':
835             (void) va_arg(count, int);
836 #ifdef HAVE_LONG_LONG
837             if (longlongflag) {
838                 if (width < MAX_LONG_LONG_CHARS)
839                     width = MAX_LONG_LONG_CHARS;
840             }
841             else
842 #endif
843                 /* MAX_LONG_CHARS is enough to hold a 64-bit integer,
844                    including sign. Decimal takes the most space. This
845                    isn't enough for octal. If a width is specified we
846                    need more (which we allocate later). */
847                 if (width < MAX_LONG_CHARS)
848                     width = MAX_LONG_CHARS;
(Currently width and precision only apply to the integer types %d, %u, %i, %x, not to the string and object types %s, %S, %R, %A, %U, %V.)
To fix, the following line:
801 const char* p = f;
should be:
801 const char* p = f++;
just as the similar loop in step 4, and add another line:
 f--;
after calculating the width, to move the character pointer back.
My patch fixes these two problems. I hope somebody can take a look at it.
msg117995 - Author: ysj.ray (ysj.ray) Date: 2010-10-05 08:23
I updated the patch. I hope somebody can review it.
msg117996 - Author: ysj.ray (ysj.ray) Date: 2010-10-05 08:24
I updated the patch. I hope somebody can review it.
msg117997 - Author: ysj.ray (ysj.ray) Date: 2010-10-05 08:26
Oops! Sorry for submitting the request twice...
msg127690 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-02-01 10:52
I opened other tickets related to PyUnicode_FromFormatV:
 * #10833: Replace %.100s by %s in PyErr_Format(): the arbitrary limit of 500 bytes is outdated
 * #10831: PyUnicode_FromFormatV() doesn't support %li, %lli, %zi
 * #10830: PyUnicode_FromFormatV("%c") doesn't support non-BMP characters on narrow build
 * #10829: PyUnicode_FromFormatV() bugs with "%" and "%%" format strings
(see also #10832: Add support of bytes objects in PyBytes_FromFormatV())
PyUnicode_FromFormatV() now has tests in test_unicode: issue_7330.diff should add new tests, at least to check that %20R doesn't crash.
msg128296 - Author: ysj.ray (ysj.ray) Date: 2011-02-10 15:37
Thanks haypo!
Here is the updated patch. It adds tests for the width and precision modifiers of %S, %R, %A. However, I don't know how to add tests for %s: when calling through ctypes, I could not get the correct result as a Python object from PyUnicode_FromFormat() with '%s' in the format string.
msg128359 - Author: ysj.ray (ysj.ray) Date: 2011-02-11 02:40
Here's the complete patch. It adds unit tests for the width and precision modifiers of the '%s' formatter of PyUnicode_FromFormat().
msg128381 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-02-11 12:48
It looks like your patch fixes #10829: you should add tests for that, you can just reuse the tests of my patch (attached to #10829).
---
unicode_format() looks suboptimal.
+ memset(buffer, ' ', width);
+ width_unicode = PyUnicode_FromStringAndSize(buffer, width);
You should avoid this byte string (buffer) and use memset() on the Unicode string directly. Something like:
Py_UNICODE *u;
Py_ssize_t i;
width_unicode = PyUnicode_FromUnicode(NULL, width);
u = PyUnicode_AS_UNICODE(width_unicode);
for(i=0; i < width; i++) {
 *u = (Py_UNICODE)' ';
 u++;
}
You should also avoid the creation of a temporary unicode object (it can be slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY() does already truncate the string because you can pass an arbitrary length.
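The two suggestions above (fill the padding directly in the destination, and truncate by copying fewer characters rather than slicing into a temporary) can be sketched in standard C. This is only an illustration under stated assumptions: wchar_t stands in for Py_UNICODE, pad_and_copy is a hypothetical name, and padding is placed on the left as printf does by default.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Write src into out, truncated to `precision` characters (negative
   means no limit) and left-padded with spaces up to `width`.
   Returns the number of characters written, excluding the terminator.
   out must have room for max(width, copied length) + 1 characters. */
size_t pad_and_copy(wchar_t *out, const wchar_t *src,
                    size_t width, long precision)
{
    assert(out != NULL && src != NULL);
    size_t len = wcslen(src);
    if (precision >= 0 && (size_t)precision < len)
        len = (size_t)precision;      /* truncate by copying fewer chars */
    size_t pad = width > len ? width - len : 0;
    for (size_t i = 0; i < pad; i++)
        out[i] = L' ';                /* fill in place, no byte buffer */
    wmemcpy(out + pad, src, len);     /* no temporary slice object */
    out[pad + len] = L'\0';
    return pad + len;
}
```

With src L"abcde", width 8 and precision 3, the output is five spaces followed by "abc".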
---
I don't like "unicode_format" function name: it sounds like "str.format()" in Python. A suggestion: "unicode_format_align"
---
With your patch, "%.200s" truncates the input string to 200 *characters*, but I think that it should truncate to 200 *bytes*, as printf does.
---
- n += PyUnicode_GET_SIZE(str);
+ n += width > PyUnicode_GET_SIZE(str) ? width : PyUnicode_GET_SIZE(str);
I don't like this change because I hate having to compute string lengths manually. It seems that it would be easier if you formatted strings with width and precision directly at step 3, instead of doing it at step 4: then you can just read the length of the formatted string, and it avoids having to handle width/precision in two steps (which may be inconsistent :-/).
---
Your patch implements %.100s (and %.100U): we might decide what to do with #10833 before committing your patch.
---
In my opinion, the patch is a little bit too big. We may first commit the fix on the code parsing the width and precision: fix #10829?
---
Can you add tests for "%.s"? I would like to know if "%.s" is different than "%s" :-)
---
- "must be a sequence, not %200s",
+ "must be a sequence, not %.200s",
Hum, I think that there are many other places where such a fix should be done. Nobody noticed this typo before because neither %.200s nor %200s was implemented (#10833).
---
Finally, do you really need to implement %200s, %2.5s and %.100s? I don't know, but I would be ok to commit the patch if you fix it for all of my remarks :-)
msg128725 - Author: ysj.ray (ysj.ray) Date: 2011-02-17 14:07
Thanks haypo!
> It looks like your patch fixes #10829: you should add tests for that, you can just reuse the tests of my patch (attached to #10829).
Sorry, but I don't think my patch fixes #10829. It seems like another issue. When I apply my patch and add the tests from #10829's patch, the tests do not pass. Or did I miss something?
> You should also avoid the creation of a temporary unicode object (it can be slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY() does already truncate the string because you can pass an arbitrary length.
In order to use Py_UNICODE_COPY, I have to create a unicode object with the required length first. I feel this has the same cost as using PySequence_GetSlice(). Do I understand correctly?
> With your patch, "%.200s" truncates the input string to 200 *characters*, but I think that it should truncate to 200 *bytes*, as printf does.
Sorry, I don't understand. The result of PyUnicode_FromFormatV() is a unicode object, so how can it be truncated to 200 *bytes*? I think the %s formatter just indicates that the argument is a C-style char string; the result is always a unicode string, and the width and precision modifiers are applied after converting the C-style chars to a string.
> I don't like this change because I hate having to compute string lengths manually. It seems that it would be easier if you formatted strings with width and precision directly at step 3, instead of doing it at step 4: then you can just read the length of the formatted string, and it avoids having to handle width/precision in two steps (which may be inconsistent :-/).
Do you mean combining step 3 and step 4? Currently step 3 just computes the biggest width value, and step 4 computes the exact width and does the real formatting work. Only by doing the real formatting can we get the exact width of a string. So I have to compute each width twice, in both step 3 and step 4. Is combining the two steps into one a good idea?
> In my opinion, the patch is a little bit too big. We may first commit the fix on the code parsing the width and precision: fix #10829?
Again, I guess #10829 needs its own patch.
> Can you add tests for "%.s"? I would like to know if "%.s" is different than "%s" :-)
Err, '%.s' gives an unexpected result both with and without my patch. Maybe it's yet another bug?
msg128773 - Author: ysj.ray (ysj.ray) Date: 2011-02-18 02:56
> Do you mean combining step 3 and step 4? Currently step 3 just computes the biggest width value, and step 4 computes the exact width and does the real formatting work. Only by doing the real formatting can we get the exact width of a string. So I have to compute each width twice, in both step 3 and step 4. Is combining the two steps into one a good idea?
Sorry, what I meant here is:
Do you mean combining step 3 and step 4? Currently step 3 just computes the biggest width value, and step 4 computes the exact width and does the conversion work (by calling PyObject_Str()/PyObject_Repr()/PyObject_ASCII()/PyUnicode_DecodeUTF8() for %S/%R/%A/%s). Only by doing the conversion can we get the exact width of a string. So I have to compute each width twice, in both step 3 and step 4. Is combining the two steps into one a good idea?
msg128776 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-02-18 07:36
> > It looks like your patch fixes #10829: you should add tests for that, you can just reuse the tests of my patch (attached to #10829).
> 
> Sorry, but I think my patch doesn't fix #10829.
Ah ok, so don't add failing tests :-)
> > You should also avoid the creation of a temporary unicode object (it can be slow if precision is large) using PySequence_GetSlice(). Py_UNICODE_COPY() does already truncate the string because you can pass an arbitrary length.
> 
> In order to use Py_UNICODE_COPY, I have to create a unicode object with required length first.
No you don't. You can copy a substring of the input string with
Py_UNICODE_COPY: just pass a smaller length.
> > With your patch, "%.200s" truncates the input string to 200 *characters*, but I think that it should truncate to 200 *bytes*, as printf does.
> 
> Sorry, I don't understand. The result of PyUnicode_FromFormatV() is a unicode object. Then how to truncate to 200 *bytes*?
You can truncate the input char* on the call to PyUnicode_DecodeUTF8:
pass a size smaller than strlen(s).
case 's':
{
    /* UTF-8 */
    const char *s = va_arg(count, const char*);
    PyObject *str = PyUnicode_DecodeUTF8(s, strlen(s), "replace");
    if (!str)
        goto fail;
    n += PyUnicode_GET_SIZE(str);
    /* Remember the str and switch to the next slot */
    *callresult++ = str;
    break;
}
I don't know if we should truncate to a number of bytes, or a number of
characters.
> > I don't like this change because I hate having to compute manually strings length. It should that it would be easier if you format directly strings with width and precision at step 3, instead of doing it at step 4: so you can just read the length of the formatted string, and it avoids having to handle width/precision in two steps (which may be inconsistent :-/).
> 
> Do you mean combine step 3 and step 4 together? Currently step 3 is just to compute the biggest width value and step 4 is to compute exact width and do the real format work. Only by doing real format we can get the exact width of a string. So I have to compute each width twice in both step 3 and step 4. Is combining the two steps in to one a good idea?
"Do you mean combine step 3 and step 4 together?"
Yes, but I am no longer sure that it is the right thing to do.
> > Can you add tests for "%.s"? I would like to know if "%.s" is different than "%s" :-)
> 
> Err, '%.s' causes unexpected result both with and without my patch. Maybe it's still another bug?
If the fix (always have the same behaviour) is short, it would be nice
to include it in your patch.
msg128785 - Author: ysj.ray (ysj.ray) Date: 2011-02-18 15:02
> No you don't. You can copy a substring of the input string with
Py_UNICODE_COPY: just pass a smaller length.
Oh, yes, I understand what you mean now. I'll do this.
> You can truncate the input char* on the call to PyUnicode_DecodeUTF8:
Oh, what if the truncated char* cannot be decoded correctly? E.g. a two-byte character is split in the middle?
> Yes, but I am no longer sure that it is the right thing to do.
If I understand correctly (my English ability is limited), your suggestion is to combine them, right? I'm afraid that combining them may make the code too complicated. The current four steps divide the process into smaller and simpler pieces. I'm not sure.
msg128786 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-02-18 15:05
> Oh, what if the truncated char* cannot be decoded correctly?
> E.g. a two-byte character is split in the middle?
Yes, but PyUnicode_FromFormatV() uses UTF-8 decoder with replace error handler, and so the incomplete byte sequence will be replaced by � (it doesn't fail with an error). Example:
>>> "abc€".encode("utf-8")[:-1].decode("utf-8", "replace")
'abc�'
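To make the byte-truncation case concrete in C, the sketch below detects whether a UTF-8 buffer that was cut at an arbitrary byte offset ends in the middle of a character. It is not the CPython decoder; utf8_incomplete_tail is a hypothetical helper, and it assumes the data was valid UTF-8 before truncation. A decoder using the "replace" error handler would substitute U+FFFD for exactly these trailing bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many bytes at the end of s[0..n) form an incomplete UTF-8
   sequence, or 0 if the buffer ends on a character boundary. */
size_t utf8_incomplete_tail(const unsigned char *s, size_t n)
{
    assert(s != NULL);
    size_t back = 0;
    /* Walk back over at most 3 continuation bytes (10xxxxxx). */
    while (back < n && back < 3 && (s[n - 1 - back] & 0xC0) == 0x80)
        back++;
    if (back == n)
        return back;                /* nothing but continuation bytes */
    unsigned char lead = s[n - 1 - back];
    size_t need;
    if (lead < 0x80)                need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else                            return 0;  /* not a lead byte: out of scope */
    return (back + 1 < need) ? back + 1 : 0;
}
```

For "abc€" encoded as UTF-8 (61 62 63 E2 82 AC) with the last byte removed, the function reports 2 dangling bytes, matching the Python example where they decode to a single replacement character.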
msg128790 - Author: ysj.ray (ysj.ray) Date: 2011-02-18 15:38
> Can you add tests for "%.s"? I would like to know if "%.s" is different than "%s" :-)
Oh sorry, I made a mistake. There is no bug here. I have attached tests that show that '%.s' is the same as '%s'.
Here is the updated patch:
1, changed the function name unicode_format() to 
2, removed
"""
- "must be a sequence, not %200s",
+ "must be a sequence, not %.200s",
"""
in Python/ceval.c
3, Removed the use of PySequence_GetSlice() in unicode_format_align() and refactored to optimize the process.
4, Added tests for '%.s' and '%s', as haypo wanted.
This is obviously not the final patch; it is just convenient for others to review. Some things still need to be discussed.
msg128933 - Author: ysj.ray (ysj.ray) Date: 2011-02-21 03:18
> > > With your patch, "%.200s" truncates the input string to 200 *characters*, but I think that it should truncate to 200 *bytes*, as printf does.
> > 
> > Sorry, I don't understand. The result of PyUnicode_FromFormatV() is a unicode object. Then how to truncate to 200 *bytes*?
> You can truncate the input char* on the call to PyUnicode_DecodeUTF8:
pass a size smaller than strlen(s).
Now I wonder how we should treat the precision modifier of '%s'. First of all, PyUnicode_FromFormat() should behave like C printf(). In C printf(), the precision of %s specifies a maximum width of the displayed result. If the final result is longer than that value, it must be truncated. That means the precision is applied to the final result. Since PyUnicode_FromFormat() produces unicode strings, the width and precision should be applied to the final unicode string. The formatting is split into two stages: one converts each parameter to a unicode string, the other applies the width and precision to it. So I wonder whether we should apply the precision in the conversion stage, that is, in PyUnicode_DecodeUTF8(). In my opinion, precision should not be applied to the input chars but to the output unicode characters.
I hope I didn't misunderstand something.
So haypo, what's your opinion?
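The difference between the two readings of precision (bytes of the input versus characters of the output) shows up as soon as the input contains a multi-byte character. A small sketch in plain C, with utf8_prefix_bytes as a hypothetical helper that reports how many input bytes the first N characters occupy:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes occupied by the first max_chars characters of the
   UTF-8 string s, or by the whole string if it is shorter.
   Assumes s is valid UTF-8. */
size_t utf8_prefix_bytes(const char *s, size_t max_chars)
{
    assert(s != NULL);
    size_t n = strlen(s);
    size_t i = 0, chars = 0;
    while (i < n && chars < max_chars) {
        i++;                                        /* lead byte */
        while (i < n && ((unsigned char)s[i] & 0xC0) == 0x80)
            i++;                                    /* continuation bytes */
        chars++;
    }
    return i;
}
```

For the string "ab€z", a precision of 3 counted in characters covers 5 bytes (a, b, and the 3-byte euro sign), whereas a precision of 3 counted in bytes would stop after the first byte of the euro sign.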
msg129942 - Author: ysj.ray (ysj.ray) Date: 2011-03-03 09:27
Here is the updated patch:
1, It works with the function parse_format_flags(), which was introduced in issue10829, and the patch is simpler and clearer than before.
2, It changes parse_format_flags() to set the precision value to -1 in the case of '%s', in order to distinguish it from '%.0s'.
3, It moves the call of unicode_format_align() into step 3, in order to avoid a lot of code like "n += width > PyUnicode_GET_SIZE(str) ? width : PyUnicode_GET_SIZE(str);" (following haypo's comments).
msg130258 - Author: ysj.ray (ysj.ray) Date: 2011-03-07 14:27
I noticed that after applying my last patch and running the full unit test suite, some weird errors occurred whose cause I don't know, for example:
AttributeError: 'dict' object has no attribute 'get'
and
AttributeError: 'Queue' object has no attribute 'get'
I didn't look deeply into it. But I found that after I optimized my patch, these errors disappeared: I removed the unicode_format_align() function from the previous patch, and now directly add the needed spaces and copy part of the unicode string obtained from the parameters, according to the width and precision modifiers, in step 4 (using Py_UNICODE_FILL() and Py_UNICODE_COPY()). This avoids creating temporary unicode objects with unicode_format_align() in step 3. The patch also becomes simpler.
So this patch is intended to replace the previous one. If I have more time, I will try to find the cause of the weird errors.
msg131649 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-21 13:37
Ray Allen: Your patch doesn't touch the documentation. At least, you should mention (using .. versionchanged:: 3.3) that PyUnicode_FromFormat() now supports width and precision. It is important to specify the unit of the sizes: number of bytes or number of characters? Many developers may refer to printf(), which counts in bytes (especially for %s). PyUnicode_FromFormat() is closer to wprintf(), but I don't know if wprintf() uses bytes or characters for width and precision with the %s and %ls formats.
I plan to fix #10833 by replacing %.100s by %s in most (or all) error messages, and then commit your patch.
msg131668 - Author: ysj.ray (ysj.ray) Date: 2011-03-21 15:48
Oops! I found that my last submitted patch was the wrong one.
Here is the updated patch, which adds doc entries about the changes. Test cases that assert equality of error messages generated by PyUnicode_FromFormat() with "%.200s" formatters would fail due to this patch. I hope you don't miss any of them.
msg131710 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-03-22 00:10
New changeset d3ae3fe3eb97 by Victor Stinner in branch 'default':
Issue #7330, #10833: Replace %100s by %.100s and %200s by %.200s
http://hg.python.org/cpython/rev/d3ae3fe3eb97 
msg131964 - Author: ysj.ray (ysj.ray) Date: 2011-03-24 10:48
By the way, according to my simple tests, wprintf() with "%ls" applies the width and precision modifiers in units of characters.
msg131965 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-24 10:53
There are 4 patches "issue 7330" attached to this issue. Some of them have a version number in their name, some don't. You did the same on other issues. It is easier to follow a patch if it has a version number, for example: issue_7330.diff, issue_7330-2.diff, issue_7330-3.diff, issue_7330-4.diff, ... And I suppose that you can remove all old patches, except if they are alternative implementations or contain something special.
msg131968 - Author: ysj.ray (ysj.ray) Date: 2011-03-24 11:18
Sorry for having done that! I will remove old patches and leave a cleaner view.
msg132057 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-25 00:11
I closed #10833 as invalid, because it is a regression of Python 3. PyErr_Format() uses PyString_FromFormatV() in Python 2, which supports precision for %s, whereas it uses PyUnicode_FromFormatV() in Python 3, which never supported precision for %s.
msg144626 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-29 20:06
Hum, the issue is still open, I will try to review it.
msg147861 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-18 12:03
Issue #13428 has been marked as a duplicate of this issue.
msg147966 - Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-11-19 19:30
Hi!
I'd like to have this committed to be able to fix #13349. So here's a review.
- In Doc/c-api/unicode.rst, the two "versionchanged:: 3.3" directives can be merged
- In tests, I'd use 'abcde' rather than 'xxxxx' to make sure that correct characters are copied to the output (hope you understand what I mean)
- No test checks that width and precision work on characters rather than bytes
- The changes to unicodeobject.c don't apply on top of current default branch.
msg172258 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-06 22:53
I rewrote PyUnicode_FromFormatV() to use a single step instead of four: see issue #16147. So it's now simpler to fix this issue. Here is a new patch to implement width and precision modifiers for %s, %A, %R, %S and %U formats.
msg172262 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-06 23:09
I reread this old issue. I still think that it would be better to truncate to a number of *bytes* for the "%s" format (and the %V format when the first argument is NULL) to mimic printf(). The "replace" error handler of the UTF-8 decoder handles truncated strings correctly. So I should update my patch.
msg172343 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-07 20:35
Updated patch: precision for the "%s" and "%V" (if the first PyObject* argument is NULL) formats is now a number of bytes, rather than a number of characters. Width is still always a number of characters.
The reason is that "%.100s" can be used to avoid a crash if the argument is not terminated by a null character (see issue #10833).
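The same property holds for the C library's printf family, which is what the byte-based precision mimics: when a precision is given for %s, at most that many bytes are read, so the array does not need a terminating null as long as the precision does not exceed its size. A minimal sketch in standard C (format_bounded is a hypothetical wrapper, not part of any patch in this issue):

```c
#include <assert.h>
#include <stdio.h>

/* Copy at most max_bytes bytes of buf into out using "%.*s".
   buf does not need to be null-terminated, provided that it holds at
   least max_bytes bytes. */
int format_bounded(char *out, size_t outsz, const char *buf, int max_bytes)
{
    assert(out != NULL && buf != NULL && max_bytes >= 0);
    return snprintf(out, outsz, "%.*s", max_bytes, buf);
}
```

Passing a four-byte array holding 'a', 'b', 'c', 'd' with no terminator and max_bytes set to 4 yields "abcd" without reading past the array.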
msg177206 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-09 10:24
I found one bug and added some nitpicks and an optimization suggestion on Rietveld.
msg188476 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-05 23:03
New version of my patch taking Serhiy's remarks into account:
 - add a check_format() function to clean up unit tests
 - only call _PyUnicodeWriter_Prepare() once per formatted argument: compute the length and maximum character. Be more optimistic about sprintf() for integer and pointer: expect that the maximum character is 127 or less
 - make the code parsing width and precision uniform
 - factor out the common code for '%s' and '%V'
Note: also remove _PyUnicode_WriteSubstring() from the patch; it was already added.
msg188477 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-05 23:05
I didn't add the following optimization (proposed by Serhiy in his review) because I'm not convinced that it's faster, and it's unrelated to this issue:
 if (width > (PY_SSIZE_T_MAX - 9) / 10
 && width > (PY_SSIZE_T_MAX - ((int)*f - '0')) / 10)
 { ... }
instead of 
 if (width > (PY_SSIZE_T_MAX - ((int)*f - '0')) / 10)
 { ... }
msg188596 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-05-06 21:23
New changeset 9e0f1c3bf9b6 by Victor Stinner in branch 'default':
Issue #7330: Implement width and precision (ex: "%5.3s") for the format string
http://hg.python.org/cpython/rev/9e0f1c3bf9b6 
msg188597 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-06 21:36
Finally, I closed this issue. Sorry for the long delay, but many other PyUnicode_FromFormat() issues had to be discussed and fixed before this one could be fixed. It was also much easier to fix this issue after my refactoring of PyUnicode_FromFormat() to parse the format string only once (thanks to the _PyUnicodeWriter API) instead of having 4 steps.
Thanks to Ysj Ray, thanks to reviewers.
This is one of the oldest issues that I had to fix :-)
History
Date | User | Action | Args
2022-04-11 14:56:54 | admin | set | github: 51579
2013-05-06 21:36:01 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg188597
2013-05-06 21:23:34 | python-dev | set | messages: + msg188596
2013-05-05 23:05:27 | vstinner | set | messages: + msg188477
2013-05-05 23:03:09 | vstinner | set | files: + unicode_fromformat_precision-3.patch; messages: + msg188476
2013-01-27 12:44:22 | serhiy.storchaka | set | type: crash -> enhancement; versions: + Python 3.4, - Python 3.3
2012-12-09 10:24:11 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg177206
2012-12-09 07:18:03 | Sean.Ochoa | set | nosy: + Sean.Ochoa
2012-11-10 16:59:30 | serhiy.storchaka | link | issue13349 dependencies
2012-10-07 20:35:40 | vstinner | set | files: + unicode_fromformat_precision-2.patch; messages: + msg172343
2012-10-06 23:09:17 | vstinner | set | messages: + msg172262
2012-10-06 22:53:44 | vstinner | set | files: + unicode_fromformat_precision.patch; messages: + msg172258
2011-11-19 19:30:36 | petri.lehtinen | set | keywords: + needs review; stage: patch review; messages: + msg147966; versions: + Python 3.3, - Python 3.2
2011-11-18 12:03:43 | vstinner | set | nosy: + petri.lehtinen; messages: + msg147861
2011-10-09 08:48:07 | lekma | set | nosy: + lekma
2011-09-29 20:06:08 | vstinner | set | messages: + msg144626
2011-03-25 00:11:01 | vstinner | set | messages: + msg132057
2011-03-24 11:19:52 | ysj.ray | set | files: - issue7330_2.diff
2011-03-24 11:19:46 | ysj.ray | set | files: - issue_7330.diff
2011-03-24 11:18:43 | ysj.ray | set | files: - issue_7330.diff
2011-03-24 11:18:34 | ysj.ray | set | messages: + msg131968
2011-03-24 10:53:10 | vstinner | set | messages: + msg131965
2011-03-24 10:48:42 | ysj.ray | set | messages: + msg131964
2011-03-22 00:10:07 | python-dev | set | nosy: + python-dev; messages: + msg131710
2011-03-21 15:48:38 | ysj.ray | set | files: + issue_7330.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg131668
2011-03-21 15:25:10 | ysj.ray | set | files: - issue7330_3.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray
2011-03-21 13:37:12 | vstinner | set | nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg131649
2011-03-07 14:27:15 | ysj.ray | set | files: + issue7330_3.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg130258
2011-03-03 09:43:30 | vstinner | set | nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; title: PyUnicode_FromFormat segfault -> PyUnicode_FromFormat: implement width and precision for %s, %S, %R, %V, %U, %A
2011-03-03 09:27:35 | ysj.ray | set | files: + issue7330_2.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg129942
2011-02-21 03:18:04 | ysj.ray | set | nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg128933
2011-02-18 15:38:52 | ysj.ray | set | files: + issue_7330.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg128790
2011-02-18 15:05:33 | vstinner | set | nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg128786
2011-02-18 15:02:35 | ysj.ray | set | nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg128785
2011-02-18 07:36:11 | vstinner | set | nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg128776
2011-02-18 02:56:25 | ysj.ray | set | nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg128773
2011-02-17 14:07:11 | ysj.ray | set | nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg128725
2011-02-14 02:18:59 | ysj.ray | set | files: - issue_7330.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray
2011-02-14 02:18:54 | ysj.ray | set | files: - issue_7330.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray
2011-02-14 02:18:49 | ysj.ray | set | files: - issue_7330.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray
2011-02-11 12:48:14 | vstinner | set | nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg128381
2011-02-11 02:40:01 | ysj.ray | set | files: + issue_7330.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg128359
2011-02-10 15:37:32 | ysj.ray | set | files: + issue_7330.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg128296
2011-02-10 13:40:19 | ysj.ray | set | files: - issue_7330.diff; nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray
2011-02-01 10:52:58 | vstinner | set | nosy: lemburg, mark.dickinson, vstinner, eric.smith, ron_adam, ezio.melotti, ysj.ray; messages: + msg127690
2011-02-01 05:30:07 | belopolsky | link | issue7574 superseder
2011-02-01 05:24:06 | belopolsky | set | nosy: + vstinner; components: + Unicode
2010-10-05 08:26:14 | ysj.ray | set | messages: + msg117997
2010-10-05 08:24:56 | ysj.ray | set | files: + issue_7330.diff; messages: + msg117996
2010-10-05 08:24:06 | ysj.ray | set | files: + issue_7330.diff; messages: + msg117995
2010-08-01 09:27:52 | ysj.ray | set | files: + issue_7330.diff; keywords: + patch; messages: + msg112298
2010-07-30 06:38:52 | ysj.ray | set | messages: + msg112041
2010-07-29 06:08:07 | ysj.ray | set | messages: + msg111894
2010-07-28 14:59:24 | eric.smith | set | messages: + msg111820
2010-07-28 13:44:53 | lemburg | set | nosy: + lemburg; messages: + msg111808
2010-07-28 13:11:39 | ysj.ray | set | nosy: + ysj.ray; messages: + msg111802
2010-07-28 04:09:14 | ezio.melotti | set | nosy: + ezio.melotti
2010-07-27 18:46:52 | ron_adam | set | nosy: + ron_adam; title: PyUnicode_FromFormat segfault when using widths. -> PyUnicode_FromFormat segfault
2009-11-15 21:35:47 | eric.smith | set | nosy: + eric.smith; messages: + msg95310
2009-11-15 19:42:32 | mark.dickinson | create |