This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Improve f-string implementation: FORMAT_VALUE opcode
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: eric.smith
Nosy List: Mark.Shannon, berker.peksag, brett.cannon, eric.smith, larry, python-dev, serhiy.storchaka, skrah
Priority: normal
Keywords: patch

Created on 2015-10-26 15:44 by eric.smith, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name             Uploaded
format-opcode.diff    eric.smith, 2015-10-26 15:44
format-opcode-1.diff  eric.smith, 2015-10-26 17:35
format-opcode-2.diff  eric.smith, 2015-10-28 20:59
format-opcode-3.diff  eric.smith, 2015-11-02 13:30
Messages (18)
msg253476 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2015-10-26 15:44
Currently, the f-string f'a{3!r:10}' evaluates to bytecode that does the same thing as:
''.join(['a', format(repr(3), '10')])
That is, it literally calls the functions format() and repr(). The same holds true for str() and ascii() with !s and !a, respectively.
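The equivalence described above can be checked directly. This is only a sketch comparing the two results on a Python version that supports f-strings; it says nothing about the bytecode either form compiles to:

```python
# Compare the f-string with the spelled-out expression quoted in the message.
fstring_result = f'a{3!r:10}'
manual_result = ''.join(['a', format(repr(3), '10')])

# repr(3) is the string '3'; formatting a string with width 10 pads it on the right.
print(repr(fstring_result))
print(fstring_result == manual_result)
```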
By redefining format, str, repr, and ascii, you can break or pervert the computation of the f-string's value:
>>> def format(v, fmt=None): return '42'
...
>>> f'{3}'
'42'
It's always been my intention to fix this. This patch adds an opcode FORMAT_VALUE, which instead of looking up format, etc., directly calls PyObject_Format, PyObject_Str, PyObject_Repr, and PyObject_ASCII. Thus, you can no longer modify what an f-string produces merely by overriding the named functions.
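A small sketch of the behaviour after the change, assuming Python 3.6 or later: names that shadow format and repr no longer influence the f-string, although explicit calls still reach the shadowing functions. The function name demo is hypothetical and only serves to keep the shadowing local.

```python
def demo():
    # These local definitions deliberately shadow the builtins.
    def format(value, fmt=None):
        return '42'

    def repr(value):
        return 'shadowed'

    plain = f'{3}'        # formatted without looking up the name "format"
    converted = f'{3!r}'  # converted without looking up the name "repr"
    explicit = format(3)  # an explicit call still uses the local function
    return plain, converted, explicit


print(demo())
```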
In addition, because I'm now saving the name lookups and function calls, performance is improved.
Here are the times without this patch:
$ ./python -m timeit -s 'x="test"' 'f"{x}"'
1000000 loops, best of 3: 0.3 usec per loop
$ ./python -m timeit -s 'x="test"' 'f"{x!s}"'
1000000 loops, best of 3: 0.511 usec per loop
$ ./python -m timeit -s 'x="test"' 'f"{x!r}"'
1000000 loops, best of 3: 0.497 usec per loop
$ ./python -m timeit -s 'x="test"' 'f"{x!a}"'
1000000 loops, best of 3: 0.461 usec per loop
And with this patch:
$ ./python -m timeit -s 'x="test"' 'f"{x}"'
10000000 loops, best of 3: 0.02 usec per loop
$ ./python -m timeit -s 'x="test"' 'f"{x!s}"'
100000000 loops, best of 3: 0.02 usec per loop
$ ./python -m timeit -s 'x="test"' 'f"{x!r}"'
10000000 loops, best of 3: 0.0896 usec per loop
$ ./python -m timeit -s 'x="test"' 'f"{x!a}"'
10000000 loops, best of 3: 0.0923 usec per loop
So an 80% to 96% reduction in run time for these simple cases (over 90% for the plain and !s forms).
Also, now f-strings are faster than %-formatting, at least for some types:
$ ./python -m timeit -s 'x="test"' '"%s"%x'
10000000 loops, best of 3: 0.0755 usec per loop
$ ./python -m timeit -s 'x="test"' 'f"{x}"'
10000000 loops, best of 3: 0.02 usec per loop
Note that people often "benchmark" %-formatting with code like the following. But the optimizer converts this to a constant string, so it's not a fair comparison:
$ ./python -m timeit '"%s"%"test"'
100000000 loops, best of 3: 0.0161 usec per loop
These microbenchmarks aren't the end of the story, since the string concatenation also takes some time. That's another optimization I might implement in the future.
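The command-line measurements above can also be reproduced from a script. The following is a sketch using the standard timeit module; the helper name time_formatting and the loop count are assumptions, and the numbers it reports will depend on the machine and the Python version.

```python
import timeit


def time_formatting(number=100_000):
    """Return the best time in microseconds per loop for each statement."""
    setup = 'x = "test"'  # use a variable so the result cannot be a folded constant
    statements = {
        'f-string': 'f"{x}"',
        'f-string !r': 'f"{x!r}"',
        '%-formatting': '"%s" % x',
    }
    results = {}
    for label, stmt in statements.items():
        best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=3))
        results[label] = best / number * 1e6
    return results


for label, usec in time_formatting().items():
    print(f'{label}: {usec:.4f} usec per loop')
```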
Thanks to Mark and Larry for some advice on this.
msg253484 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2015-10-26 17:35
Small cleanups. Fixed a bug in PyCompile_OpcodeStackEffect.
msg253505 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2015-10-26 22:50
This patch addresses Larry's review, plus bumps the bytecode magic number.
msg253610 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2015-10-28 17:22
Oops. Forgot to include the diff with that last message. But it turns out it doesn't work, anyway, because I put the #defines in opcode.h, which is generated (so my code got deleted!).
I'll try to find some reasonable .h file to use and submit a new patch soon.
msg253611 - Author: Brett Cannon (brett.cannon) (Python committer) Date: 2015-10-28 17:25
I know this issue is slowly turning into "make Eric update outdated docs", but if you find that https://docs.python.org/devguide/compiler.html#introducing-new-bytecode is outdated, could you update that doc?
msg253614 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-10-28 18:35
> I'll try to find some reasonable .h file to use and submit a new patch soon.
It's Lib/opcode.py.
msg253615 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2015-10-28 18:46
Brett: I'll take a look.
Serhiy: I'm looking for a place to put some #defines related to the bit masks and bit values that my FORMAT_VALUE opcode is using for opargs. One option is to just put them in Tools/scripts/generate_opcode_h.py, so that they end up in the generated opcode.h, but that seems a little sleazy. I can't find a better place they'd belong, though.
Specifically, I want to put these lines into a .h file for use by ceval.c and compile.c:
/* Masks and values for FORMAT_VALUE opcode. */
#define FVC_MASK 0x3
#define FVS_MASK 0x4
#define FVC_NONE 0x0
#define FVC_STR 0x1
#define FVC_REPR 0x2
#define FVC_ASCII 0x3
#define FVS_HAVE_SPEC 0x4
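To make the oparg layout concrete, here is a pure-Python model of how these masks and values could be combined and tested. It is a sketch only: the real opcode is implemented in C and calls PyObject_Format, PyObject_Str, PyObject_Repr, and PyObject_ASCII directly, and the names format_value and _CONVERSIONS are hypothetical.

```python
# Values copied from the #defines above.
FVC_MASK = 0x3
FVS_MASK = 0x4
FVC_NONE = 0x0
FVC_STR = 0x1
FVC_REPR = 0x2
FVC_ASCII = 0x3
FVS_HAVE_SPEC = 0x4

# Hypothetical lookup table from conversion value to the Python-level callable.
_CONVERSIONS = {FVC_NONE: None, FVC_STR: str, FVC_REPR: repr, FVC_ASCII: ascii}


def format_value(value, oparg, fmt_spec=''):
    """Model the opcode: optional conversion, then formatting."""
    # 'and' with the mask to extract the two-bit conversion field.
    convert = _CONVERSIONS[oparg & FVC_MASK]
    if convert is not None:
        value = convert(value)
    # The format-spec flag is a single bit, so mask and value coincide.
    have_spec = (oparg & FVS_MASK) == FVS_HAVE_SPEC
    return format(value, fmt_spec if have_spec else '')


# Values are 'or'ed together to build an oparg, as for f'{3!r:10}'.
print(repr(format_value(3, FVC_REPR | FVS_HAVE_SPEC, '10')))
```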
msg253618 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-10-28 19:27
Does the dis module need these constants? If not, you can use either ceval.h or compile.h.
msg253630 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2015-10-28 20:59
Thanks, Serhiy. I looked at those, and neither one is a great fit. But not having a better option, I went with ceval.h. Here's the patch.
msg253910 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2015-11-02 13:30
Some formatting improvements.
I removed one of the optimizations I was doing, because it's also done in PyObject_Format(). I plan on moving other optimizations into PyObject_Format(), but I'll open a separate issue for that.
I swapped the order of the parameters on the stack, so that I could use the micro-optimization of TOP() and SET_TOP().
I'll commit this shortly.
msg253912 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-11-02 13:49
It looks to me that FVS_MASK and FVS_HAVE_SPEC are duplicates. FVS_HAVE_SPEC is set but FVS_MASK is tested.
msg253915 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2015-11-02 14:10
Right, they're the same because it's a single bit. You 'and' with a mask to get the bits you want, and you 'or' together the values. It's an old habit left over from my bit-twiddling days.
I guess the test could really be:
have_fmt_spec = (oparg & FVS_MASK) == FVS_HAVE_SPEC;
to make it more clear what I'm doing.
It's easier to see the same thing with the FVC_MASK and FVC_* values, since that field is multiple bits.
msg253928 - Author: Stefan Krah (skrah) (Python committer) Date: 2015-11-02 15:50
The MASK idiom is nice and I think it's good to be exposed to it from time to time.
msg254006 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-03 17:45
New changeset 1ddeb2e175df by Eric V. Smith in branch 'default':
Issue 25483: Add an opcode to make f-string formatting more robust.
https://hg.python.org/cpython/rev/1ddeb2e175df 
msg254008 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-03 18:09
New changeset 4734713a31ed by Eric V. Smith in branch 'default':
Issue 25483: Update dis.rst with FORMAT_VALUE opcode description.
https://hg.python.org/cpython/rev/4734713a31ed 
msg254009 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2015-11-03 18:13
Brett: https://docs.python.org/devguide/compiler.html#introducing-new-bytecode looks correct (and reminded me to update dis.rst!).
msg254015 - Author: Berker Peksag (berker.peksag) (Python committer) Date: 2015-11-03 20:48
+ * ``(flags & 0x03) == 0x00``: *value* is formattedd as-is.
Just noticed a small typo: formattedd
Also, needs ``.. versionadded:: 3.6``
msg254019 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-03 21:30
New changeset 93fd7adbc7dd by Eric V. Smith in branch 'default':
Issue 25483: Fix doc typo and added versionadded. Thanks, Berker Peksag.
https://hg.python.org/cpython/rev/93fd7adbc7dd 
History
Date User Action Args
2022-04-11 14:58:23 admin set github: 69669
2015-11-03 21:30:52 python-dev set messages: + msg254019
2015-11-03 20:48:29 berker.peksag set nosy: + berker.peksag; messages: + msg254015
2015-11-03 18:13:36 eric.smith set status: open -> closed; resolution: fixed; messages: + msg254009; stage: patch review -> resolved
2015-11-03 18:09:29 python-dev set messages: + msg254008
2015-11-03 17:45:35 python-dev set nosy: + python-dev; messages: + msg254006
2015-11-02 15:50:25 skrah set nosy: + skrah; messages: + msg253928
2015-11-02 14:10:06 eric.smith set messages: + msg253915
2015-11-02 13:49:23 serhiy.storchaka set messages: + msg253912
2015-11-02 13:30:15 eric.smith set files: + format-opcode-3.diff; messages: + msg253910
2015-10-28 20:59:49 eric.smith set files: + format-opcode-2.diff; messages: + msg253630
2015-10-28 19:27:42 serhiy.storchaka set messages: + msg253618
2015-10-28 18:46:09 eric.smith set messages: + msg253615
2015-10-28 18:35:27 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg253614
2015-10-28 17:25:52 brett.cannon set messages: + msg253611
2015-10-28 17:22:46 eric.smith set messages: + msg253610
2015-10-27 17:06:58 brett.cannon set nosy: + brett.cannon
2015-10-26 22:50:48 eric.smith set messages: + msg253505
2015-10-26 17:35:21 eric.smith set files: + format-opcode-1.diff; messages: + msg253484
2015-10-26 15:44:59 eric.smith create
