This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012年09月20日 20:31 by skrah, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| profile.bat | loewis, 2012年09月21日 21:07 | |||
| profiletests.txt | loewis, 2012年09月21日 21:07 | |||
| issue15993.diff | skrah, 2012年09月21日 22:54 | review | ||
| ull_vctest.diff | skrah, 2014年06月18日 11:20 | Quick-and-dirty alternative version of PyLong_AsUnsignedLongLong(). Just for testing. | review | |
| Messages (32) | |||
|---|---|---|---|
| msg170844 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月20日 20:31 | |
I've installed 3.3.0-rc2 on Windows-7 64-bit using the msi installer. I'm getting these failures in test_buffer, but I can *not* reproduce them when I build Win-32/pgo/python.exe from source: ====================================================================== FAIL: test_memoryview_assign (test.test_buffer.TestBufferProtocol) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Program Files (x86)\Python33\lib\test\test_buffer.py", line 2863, self.assertEqual(m[i], 8) AssertionError: 34359738368 != 8 ====================================================================== FAIL: test_memoryview_struct_module (test.test_buffer.TestBufferProtocol) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Program Files (x86)\Python33\lib\test\test_buffer.py", line 2476, self.assertEqual(m[0], nd[0]) AssertionError: 15080797365275624638 != 6838299039298601293 |
|||
| msg170858 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月20日 22:05 | |
Both lzma and memoryview use PyLong_AsUnsignedLongLong() in the
affected code paths. I get this with the msi installed python.exe:
>>> import array
>>> a = array.array('Q', [1,2,3,4])
>>> m = memoryview(a)
>>> m[0] = 4
>>> m[0]
17179869184
>>>
And the correct result with the self-compiled (PGO) python.exe:
>>> import array
>>> a = array.array('Q', [1,2,3,4])
>>> m = memoryview(a)
>>> m[0] = 4
>>> m[0]
4
|
|||
| msg170873 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月21日 09:27 | |
The high and low words of the 64-bit value are switched:
>>> a = array.array('Q', [1])
>>> m = memoryview(a)
>>> m[0]= 2**32+5
>>> m[0]
21474836481
>>> struct.unpack_from('8s', m, 0)
(b'\x01\x00\x00\x00\x05\x00\x00\x00',)
Can anyone reproduce this in a source build? I think this should
be a blocker.
|
|||
| msg170916 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012年09月21日 20:54 | |
Issue #15995 was marked as a duplicate of this issue. Copy of its initial message (msg170853): This is similar to #15993: With the installed Python from the rc2-msi test_lzma fails. I cannot reproduce the failure with python.exe (PGO) compiled from source: ====================================================================== ERROR: test__decode_filter_properties (test.test_lzma.MiscellaneousTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python33\lib\test\test_lzma.py", line 1105, in test__decode_filter_properties lzma.FILTER_LZMA1, b"]\x00\x00\x80\x00") ValueError: Invalid filter ID: 4611686018427387905 ====================================================================== ERROR: test_filter_properties_roundtrip (test.test_lzma.MiscellaneousTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python33\lib\test\test_lzma.py", line 1114, in test_filter_properties_roundtrip lzma.FILTER_LZMA1, b"]\x00\x00\x80\x00") ValueError: Invalid filter ID: 4611686018427387905 ---------------------------------------------------------------------- |
|||
| msg170917 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012年09月21日 21:02 | |
I fail to reproduce the issue on Windows 7 (Version 6.0.1, number 7601, Service Pack 1): --- Microsoft Windows [version 6.1.7601] Copyright (c) 2009 Microsoft Corporation. Tous droits réservés. C:\Users\haypo>cd \python33 C:\Python33>python.exe Python 3.3.0rc2 (v3.3.0rc2:88a0792e8ba3, Sep 9 2012, 10:13:38) [MSC v.1600 64 b it (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> exit() C:\Python33>python.exe -m test test_buffer test_lzma [1/2] test_buffer [2/2] test_lzma All 2 tests OK. --- I'm running Windows 7 in KVM, my host CPU is a Intel i7-2600. > I'm getting these failures in test_buffer, > but I can *not* reproduce them when I build > Win-32/pgo/python.exe from source: The issue looks to be specific to 64 bits binaries. You need the professional version of Visual Studio 10. The express version doesn't support 64 bits. I only have the Express version. |
|||
| msg170918 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年09月21日 21:04 | |
Declaring this a release blocker is technically difficult. If it is a release blocker, further releases cannot be done until it is resolved. Since it is an issue with the binary only, the only possible way to resolve this is with a release. So declaring this a release blocker essentially deadlocks the release. |
|||
| msg170919 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年09月21日 21:07 | |
For the record, the released binary is not just a PGO build, but has been trained with the attached training script. |
|||
| msg170920 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2012年09月21日 21:14 | |
> You need the professional version of Visual Studio 10. > The express version doesn't support 64 bits. I only have the > Express version. Ah yes, I now remember my issue with VS10 Express: when I set the project to 64 bits, I get such error popup: http://www.haypocalc.com/tmp/visual_studio_64bits.png Which version should I try? Ultimate? Premium? Professional? |
|||
| msg170923 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月21日 21:29 | |
STINNER Victor <report@bugs.python.org> wrote: > Which version should I try? Ultimate? Premium? Professional? Try Ultimate, it's AFAIK the only version these days that supports PGO. |
|||
| msg170924 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月21日 21:30 | |
Martin v. Löwis <report@bugs.python.org> wrote: > For the record, the released binary is not just a PGO build, but has > been trained with the attached training script. Thanks. Now I can reproduce the issue with a source build. |
|||
| msg170929 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年09月21日 21:40 | |
I'm using Ultimate, but I think Professional should provide you with all required tools. |
|||
| msg170935 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月21日 22:54 | |
It's a bit late here already, but unless I'm missing something I think this is an optimizer bug. I'm attaching a workaround for memoryview.c and _lzmamodule.c. |
|||
| msg170971 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月22日 08:32 | |
System: Windows 7 64-bit
Build (unpatched): PCBuild\Win32-pgo\python.exe, trained with profile.bat
In the unpatched version, I stepped through this test case in the debugger:
import array
a = array.array('Q', [1])
m = memoryview(a)
m[0] = 1
At Objects/memoryobject.c:1572: llu == 1
At Objects/memoryobject.c:1782: pylong_as_llu returned value 4294967296
So I think it's pretty safe to say that this is indeed an optimizer bug.
|
|||
| msg170973 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月22日 08:38 | |
Sorry, the line numbers are messed up. They should be: Objects/memoryobject.c:1569 Objects/memoryobject.c:1776 |
|||
| msg170981 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年09月22日 10:00 | |
Unfortunately (?) I now cannot produce this myself, anymore, either. Given all the problems, I'll stop using PGO on all branches and targets, as the compiler apparently generates bad code. Anybody curious to investigate the issue further who is able to reproduce it might want to look at the generated machine code, to find out where exactly the incorrect processing happens. |
|||
| msg170985 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月22日 11:09 | |
> Given all the problems, I'll stop using PGO on all branches and
> as the compiler apparently generates bad code.
That is probably the best solution.
The problem in memoryview.c:pack_single() is that Visual Studio optimizes the
memcpy() to mov instructions, but exchanges the low and high dwords:
unsigned __int64 llu = 1;
do { unsigned __int64 x;
x = (unsigned __int64)llu;
// At this point x=llu is in edx=1 and ecx=0.
// The memcpy() is optimized to:
// mov dword ptr [esi], ecx
// mov dword ptr [esi+4], edx
memcpy(ptr, (char *)&x, sizeof x);
} while (0);
|
|||
| msg170988 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年09月22日 11:48 | |
It's interesting that the compiler uses ecx:edx to represent a __int64 quantity when 64-bit registers are available... |
|||
| msg170989 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月22日 12:00 | |
Hmm, the bug is in the 32-bit build. The 64-bit build is fine. Or do you mean MMX registers? |
|||
| msg220180 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2014年06月10日 17:34 | |
"I ran a quick test with profile-guided optimization (PGO, pronounced "pogo"), which has supposedly been improved since VC9, and +saw a very unscientific 20% speed improvement on pybench.py and 10% size reduction in python35.dll. I'm not sure what we used +to get from VC9, but it certainly seems worth enabling provided it doesn't break anything. (Interestingly, PGO decided that +only 1% of functions needed to be compiled for speed. Not sure if I can find out which ones those are but if anyone's +interested I can give it a shot?)" Steve, could you try if this is still a problem? The (presumed) Visual Studio bug only surfaced when training the instrumented build with "profiletests.txt" and then doing the PGupdate build. I'm opening this issue again and leave it at release blocker for 3.5. |
|||
| msg220183 - (view) | Author: Steve Dower (steve.dower) * (Python committer) | Date: 2014年06月10日 18:17 | |
test_memoryview_assign seems to be okay, but the two test_lzma tests still fail with the same message. Both pass without PGO. I'll get in touch with the PGO team and try and get it fixed. I haven't checked, but it looks consistent with Stefan's analysis of the disassembly. My guess it that it's a broken space optimisation rather than a speed one - our release builds normally optimise everything for speed but PGO will often decide that space is better. |
|||
| msg220531 - (view) | Author: Steve Dower (steve.dower) * (Python committer) | Date: 2014年06月14日 07:12 | |
It's actually bad code generation for the switch statement in build_filter_spec() in _lzmamodule.c. I've filed a bug, so hopefully it will be fixed, but if not then it should be easy enough to exclude that function (or even the whole module - _lzmamodule.c doesn't have any of the speed critical parts in it, by the look of it). |
|||
| msg220545 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2014年06月14日 10:08 | |
Isn't PyLong_FromUnsignedLongLong() still involved through spec_add_field()? |
|||
| msg220551 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2014年06月14日 12:49 | |
Please don't. If the compiler is demonstrated to generate bad code in one case, we should *not* exclude that code from optimization, but not use optimization at all. How many other places will there be which also cause bad code being generated that just happens not to be uncovered by the test suite? |
|||
| msg220557 - (view) | Author: Steve Dower (steve.dower) * (Python committer) | Date: 2014年06月14日 14:55 | |
> Isn't PyLong_FromUnsignedLongLong() still involved through spec_add_field()? The two issues were unrelated - the 'invalid filter ID' (4611686018427387905 =わ=わ 0x40000000_00000001) is the correct value but the wrong branch in the switch was taken, leading to the error message. > If the compiler is demonstrated to generate bad code in one case, we should *not* exclude that code from optimization, but not use optimization at all. By that logic, we should be using a debug build on every platform... I've encountered various codegen bugs in gcc and MSVC, though they've all been fixed (apart from this one). All developers are human, including most compiler writers. That said, I'll wait on the response from the PGO team. If they don't give me enough confidence, then I'll happily forget about the whole idea of using it for 3.5. |
|||
| msg220894 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2014年06月17日 21:14 | |
I'd be fine to reconsider if a previously-demonstrated bug is now demonstrated-fixed. However, if the actual bug persists, optimization should be disabled for all code, not just for the code that allows to demonstrate the bug. This principle should indeed been followed for all platforms (and it has, e.g. on hpux). |
|||
| msg220932 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2014年06月18日 11:35 | |
> The two issues were unrelated - the 'invalid filter ID' > (4611686018427387905 =わ=わ 0x40000000_00000001) is the correct > value but the wrong branch in the switch was taken, leading > to the error message. Unfortunately I don't have a Visual Studio setup right now. It seems to me that at the time the wrong branch is taken, f->id could be in the registers in the wrong order (same as in msg170985), but when the error message is printed, the value is read from memory. This is just a guess of course. As Martin, I'm uncomfortable that the memoryview issue no longer appears, but this one still does. I've attached an alternative version of PyLong_AsUnsignedLongLong() that is just intended for testing the compiler. If the optimizer does whole progam optimization, it might choke on _PyLong_AsByteArray(). |
|||
| msg220992 - (view) | Author: Steve Dower (steve.dower) * (Python committer) | Date: 2014年06月19日 15:14 | |
> I'd be fine to reconsider if a previously-demonstrated bug is now > demonstrated-fixed. However, if the actual bug persists, optimization > should be disabled for all code, not just for the code that allows to > demonstrate the bug. I'm okay with that. I thought you meant never enable optimizations with that compiler ever again, which is obviously ridiculous and I should have dismissed the idea on that basis rather than posting a snarky response. Sorry. > It seems to me that at the time the wrong branch is taken, f->id > could be in the registers in the wrong order (same as in msg170985), > but when the error message is printed, the value is read from > memory. This is just a guess of course. I checked that and the registers are fine. Here's the snippet of disassembly I posted with the bug I filed: mov edx,dword ptr [edi+4] ; == 0x40000000 mov ecx,dword ptr [edi] ; == 0x00000001 test edx,edx ; should be cmp edx,40000000h or equiv. ja lbl1 ; 'default:' jb lbl2 ; should be je after change above cmp ecx,21h jbe lbl2 ; should probably be lbl3 lbl1: ; default: ... lbl2: cmp ecx,1 jne lbl3 ; case 0x4000000000000001 ... It's clearly an incorrect `test` opcode, and I'd expect switch statements where the first case is a 64-bit integer larger than 2**32 to be rare - I've certainly never encountered one before - which is why such a bug could go undiscovered. When I looked at the disassembly for memoryview it was fine. I actually spent far longer than I should have trying to find the bug that was no longer there... Also bear in mind that I'm working with VC14 and not VC10, so the difference is due to the compiler and not simply time or magic :) |
|||
| msg221380 - (view) | Author: Steve Dower (steve.dower) * (Python committer) | Date: 2014年06月23日 21:47 | |
This has been confirmed as a bug in VC14 (and earlier) and there'll be a fix going in soon. For those interested, here's a brief rundown of the root cause: * the switch in build_filter_spec() switches on a 64-bit value * one case is 0x4000000000000001 and the rest are <=0x21 * PGO detects that 0x4000000000000001 is the hot case (bug starts here) * PGO detects that the cold cases are 32-bits or less and so enables an optimisation to skip comparing the high DWORD * PGO adds check for the hot case, but using the 32-bit optimisation - it checks for "0x1" rather than the full value (bug ends here) * PGO adds checks for cold cases The fix will be to check both hot and cold cases to see whether the 32-bit optimisation can be used. A "workaround" (that I wouldn't dream of using, but it illustrates the issue) would be to add a dead case that requires 64-bits. This would show up in the list of cold cases and prevent the 32-bit optimisation from being used. No indication of when the fix will go in, but it should be in the next public release, and I'll certainly be able to test it in advance of that. |
|||
| msg221381 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2014年06月23日 21:59 | |
Thanks a lot for this investigation; I'm glad you are working on this. |
|||
| msg242548 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2015年05月04日 09:57 | |
Is this now fixed in VS? I don't believe I can test myself as I've only got express/community editions. |
|||
| msg246282 - (view) | Author: Larry Hastings (larry) * (Python committer) | Date: 2015年07月05日 01:05 | |
So, the purpose in marking this as a "release blocker" is so that we can hold up the release while we wait for Microsoft to release a new compiler? If our approach to fixing this is to get the compiler fixed, I can live with marking this as "critical", but not "release blocker". |
|||
| msg246287 - (view) | Author: Steve Dower (steve.dower) * (Python committer) | Date: 2015年07月05日 02:07 | |
Eh, why bother. I don't remember if the fix is in for 3.5.0b3, but I'll vouch that the compiler build with the fix does exist and will be used for 3.5, so this should just be closed (again). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:36 | admin | set | github: 60197 |
| 2015年07月05日 02:07:22 | steve.dower | set | status: open -> closed messages: + msg246287 stage: resolved |
| 2015年07月05日 01:05:33 | larry | set | priority: release blocker -> critical nosy: + larry messages: + msg246282 |
| 2015年05月04日 09:57:14 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages: + msg242548 components: + Windows |
| 2014年10月14日 18:03:01 | skrah | set | nosy:
- skrah |
| 2014年06月23日 22:24:14 | vstinner | set | nosy:
+ vstinner |
| 2014年06月23日 22:21:53 | vstinner | set | nosy:
+ tim.golden, zach.ware, - vstinner |
| 2014年06月23日 21:59:45 | loewis | set | messages: + msg221381 |
| 2014年06月23日 21:47:48 | steve.dower | set | messages: + msg221380 |
| 2014年06月19日 15:14:59 | steve.dower | set | messages: + msg220992 |
| 2014年06月18日 11:35:49 | skrah | set | messages: + msg220932 |
| 2014年06月18日 11:20:22 | skrah | set | files: + ull_vctest.diff |
| 2014年06月17日 21:14:12 | loewis | set | messages: + msg220894 |
| 2014年06月14日 14:55:51 | steve.dower | set | messages: + msg220557 |
| 2014年06月14日 12:49:18 | loewis | set | messages: + msg220551 |
| 2014年06月14日 10:08:49 | skrah | set | messages: + msg220545 |
| 2014年06月14日 07:12:53 | steve.dower | set | messages: + msg220531 |
| 2014年06月10日 18:17:38 | steve.dower | set | messages: + msg220183 |
| 2014年06月10日 17:34:57 | skrah | set | status: closed -> open versions: + Python 3.5, - Python 3.3 nosy: + steve.dower messages: + msg220180 type: compile error |
| 2012年09月22日 12:00:06 | skrah | set | messages: + msg170989 |
| 2012年09月22日 11:48:53 | loewis | set | messages: + msg170988 |
| 2012年09月22日 11:09:28 | skrah | set | messages: + msg170985 |
| 2012年09月22日 10:01:07 | loewis | set | status: open -> closed resolution: fixed |
| 2012年09月22日 10:00:59 | loewis | set | messages: + msg170981 |
| 2012年09月22日 08:38:31 | skrah | set | messages: + msg170973 |
| 2012年09月22日 08:32:17 | skrah | set | messages: + msg170971 |
| 2012年09月21日 22:54:14 | skrah | set | files:
+ issue15993.diff keywords: + patch messages: + msg170935 |
| 2012年09月21日 21:40:14 | loewis | set | messages: + msg170929 |
| 2012年09月21日 21:30:59 | skrah | set | messages: + msg170924 |
| 2012年09月21日 21:29:38 | skrah | set | messages: + msg170923 |
| 2012年09月21日 21:14:39 | vstinner | set | messages: + msg170920 |
| 2012年09月21日 21:07:21 | loewis | set | files: + profiletests.txt |
| 2012年09月21日 21:07:12 | loewis | set | files:
+ profile.bat messages: + msg170919 |
| 2012年09月21日 21:04:08 | loewis | set | messages: + msg170918 |
| 2012年09月21日 21:02:24 | vstinner | set | messages: + msg170917 |
| 2012年09月21日 20:54:19 | vstinner | set | messages: + msg170916 |
| 2012年09月21日 09:30:53 | skrah | set | nosy:
+ nadeem.vawda |
| 2012年09月21日 09:30:09 | skrah | link | issue15995 superseder |
| 2012年09月21日 09:27:29 | skrah | set | priority: critical -> release blocker messages: + msg170873 |
| 2012年09月20日 22:05:45 | skrah | set | messages: + msg170858 |
| 2012年09月20日 21:53:37 | vstinner | set | nosy:
+ vstinner |
| 2012年09月20日 20:31:49 | skrah | create | |