This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2016年05月03日 15:37 by xdegaye, last changed 2022年04月11日 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| test_output.txt | xdegaye, 2016年05月03日 19:07 | gdb back trace | ||
| python_tests_on_arm.txt | yan12125, 2016年05月06日 10:20 | |||
| atomic_explicit.patch | pitrou, 2016年05月06日 11:30 | review | ||
| build-failure.txt | xdegaye, 2016年05月06日 14:53 | |||
| atomic_explicit2.patch | pitrou, 2016年05月06日 14:58 | review | ||
| pycond_timedwait.patch | xdegaye, 2016年05月16日 16:09 | review | ||
| test_lru_cache_threaded-x86-printf.txt | xdegaye, 2016年06月04日 09:16 | |||
| test_lru_cache_threaded-armv7-printf.txt | xdegaye, 2016年06月04日 09:16 | |||
| pycond_timedwait_2.patch | xdegaye, 2016年06月04日 09:20 | review | ||
| setswitchinterval.patch | xdegaye, 2016年10月28日 07:15 | review | ||
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 552 | closed | dstufft, 2017年03月31日 16:36 | |
| Messages (22) | |||
|---|---|---|---|
| msg264742 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月03日 15:37 | |
test_functools hangs on an android emulator running an armv7 system image (but not on x86) at API level 21. The test suite hangs at test_lru_cache_threaded (test.test_functools.TestLRUC) ... A cause of this problem may be related to the fact that the android arm emulator runs very, very slowly. |
|||
| msg264756 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月03日 19:07 | |
The attached test_output.txt file is the corresponding gdb backtrace[1]. Interestingly, all the tests (issues #26938, #26939, #26940 and #26941) that hang on arm use sys.setswitchinterval() aggressively. [1] The backtrace is interleaved with: * Python Exception <type 'exceptions.UnicodeDecodeError'> 'utf8' codec can't decode byte 0xb8 in posit ion 0: invalid start byte: that seem to come from libpython, the python that is embeded in gdb to pretty-print gdb output * ---Type <return> to continue, or q <return> to quit--- sorry, had to resort to copy/paste, gdb 'set logging' does not seem to work |
|||
| msg264922 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2016年05月05日 21:28 | |
Can you post the backtraces of all threads (at least the innermost frames)? Also, can you reproduce on an actual armv7 machine? Just wondering if there may be something peculiar in the emulator. |
|||
| msg264957 - (view) | Author: (yan12125) * | Date: 2016年05月06日 10:20 | |
FWIW, test_functools passes on ASUS Zenfone 2 Laser ZE500KL with stock ROM (rooted) and my patchset. [1] The CPU is ARM64 and the Python is built as 32-bit ARM binaries. test_threading (#26941) and test_importlib (#26940) pass, too. test_concurrent_futures (#26938) fails with 27 errors, all with "OSError: [Errno 38] Function not implemented", which seems the same as #26924. The attached file contains logs of the four tests mentioned here. [1] https://github.com/yan12125/python3-android |
|||
| msg264963 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2016年05月06日 11:11 | |
I suspect this bug is caused by memory ordering differences between x86 and ARM (see https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory_ordering ), which is why it may not reproduce easily (may depend on the particular CPU implementation and/or various timing subtleties in the tests themselves). Xavier / Chi, can you tell if Android has stdatomic.h? See Include/pyatomic.h for the difference it makes. |
|||
| msg264965 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2016年05月06日 11:30 | |
Xavier, as a strating point, can you try the following patch? (atomic_explicit.patch) If it solves the issue, can you also measure the performance drop compared to unpatched (on a non-debug build)? `python -m test.pystone` should be sufficient. |
|||
| msg264974 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月06日 14:53 | |
Android has stdatomic.h and HAVE_STD_ATOMIC is defined. The build fails with the patch, see the errors in the attached file. This is the native compilation of python, the patches in my build system are also applied to the native build so that they can be checked. I can prevent that if needed. |
|||
| msg264976 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2016年05月06日 14:58 | |
Oops, sorry, hadn't seen the patch didn't compile. Can you tried with this new patch? (atomic_explicit2.patch) |
|||
| msg264981 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月06日 15:28 | |
Still hangs at test_lru_cache_threaded with this last patch :( |
|||
| msg264988 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月06日 15:55 | |
> Can you post the backtraces of all threads (at least the innermost frames)? There are two threads, gdb does not print the backtrace of the other thread. My current gdb setup with a connection to a gdb-server misses some stuff (access to the loader or some libraries I guess) and needs to be completed with the same setup as the one used by the ndk-gdb script (released by google for android applications). > Also, can you reproduce on an actual armv7 machine? Just wondering if there may be something peculiar in the emulator. I don't have an armv7 device that can be used for testing. |
|||
| msg264995 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月06日 16:55 | |
The strace of test_lru_cache_threaded:
...
gettimeofday({1462553238, 580860}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 581276772}) = 0
gettimeofday({1462553238, 581672}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 582078767}) = 0
gettimeofday({1462553238, 583956}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 585865959}) = 0
gettimeofday({1462553238, 587597}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 589381724}) = 0
gettimeofday({1462553238, 589973}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 591858839}) = 0
gettimeofday({1462553238, 593003}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 593452750}) = 0
gettimeofday({1462553238, 593886}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 594364345}) = 0
gettimeofday({1462553238, 594764}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 595227771}) = 0
gettimeofday({1462553238, 595626}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 596027590}) = 0
gettimeofday({1462553238, 596448}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 596848736}) = 0
gettimeofday({1462553238, 597259}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 597665484}) = 0
gettimeofday({1462553238, 598060}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 598869963}) = 0
gettimeofday({1462553238, 599547}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1462553238, 601053152}) = 0
gettimeofday({1462553238, 601593}, NULL) = 0
clock_gettime(CLOCK_REALTIME, ^CProcess 908 detached
<detached ...>
|
|||
| msg265003 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2016年05月06日 17:40 | |
> Still hangs at test_lru_cache_threaded with this last patch :( What about the other tests (test_importlib, test_threading)? Do you know in which precise test it hangs? (use "-v" to print test names) > There are two threads, gdb does not print the backtrace of the other thread. Even with "thread apply all backtrace" (Google tells me this is the incantation)? |
|||
| msg265020 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月06日 20:11 | |
> Do you know in which precise test it hangs? They are named in the first msg of each issue, the issues are: issue #26938: android: test_concurrent_futures hangs on armv7 issue #26939: android: test_functools hangs on armv7 issue #26940: android: test_importlib hangs on armv7 issue #26941: android: test_threading hangs on armv7 The test_threading and test_importlib issues have also now a gdb backtrace, so we know not only the test name but also the full python frame stack with line numbers thanks to python-gdb.py magic. It seems that test_functools, test_threading and test_importlib all hang within take_gil(). test_concurrent_futures is different and hangs in a futex(). With "thread apply all backtrace", nothing changes and gdb still prints: Cannot access memory at address 0x0 #0 0xb6efae6c in ?? () #1 0xb6efd830 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) The problem seems to be that gdb does not have enough information to unwind the stack for the other threads and for thread #1 it only works after an interrupted continue command, and not on gdb startup. |
|||
| msg265027 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月06日 20:45 | |
Forgot to say that the backtraces for test_threading and test_importlib were obtained with the patch applied while they were hanging. |
|||
| msg265701 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月16日 15:52 | |
Got now a full fledged gdb. Thread #1 loops infinitely in the "while (_Py_atomic_load_relaxed(&gil_locked))" loop in take_gil(). Thread #2 is stuck in the MUTEX_LOCK(gil_mutex) statement in drop_gil() as can be seen by setting a breakpoint at the following line and seeing that this breakpoint is never hit. Setting gil_interval to 10 (from the initial value of 1) from gdb, allows the test to pass. With this patch of PyCOND_TIMEDWAIT(): diff -r eee959fee5f5 Python/condvar.h --- a/Python/condvar.h Sat May 07 21:13:50 2016 +0300 +++ b/Python/condvar.h Mon May 16 15:52:49 2016 +0200 @@ -99,6 +99,7 @@ PyCOND_ADD_MICROSECONDS(deadline, us); ts.tv_sec = deadline.tv_sec; ts.tv_nsec = deadline.tv_usec * 1000; + PyCOND_GETTIMEOFDAY(&deadline); and a breakpoint set on pthread_cond_timedwait(). When this breakpoint is hit, there is 20 usec elapsed time instead of the value of 1 (as requested by 'us') between ts and deadline as shown here: (gdb) p ts 1ドル = {tv_sec = 1463407197, tv_nsec = 468546000} (gdb) p deadline 2ドル = {tv_sec = 1463407197, tv_usec = 468566} And when iterating the loop, the measured elapsed times are: 20, 12, 12, 11, 11, 11. Clearly, the ts date is in the past when pthread_cond_timedwait() is called, and the function returns immediately without releasing the mutex. This is the expected behavior[1]. [1] http://linux.die.net/man/3/pthread_cond_wait |
|||
| msg265704 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2016年05月16日 16:02 | |
I don't know if it can fix the issue, but you may see my issue #23428: "Use the monotonic clock for thread conditions on POSIX platforms". |
|||
| msg265707 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月16日 16:09 | |
On a heavily loaded system, there is no guarantee that the date obtained from a call to gettimeofday() in PyCOND_TIMEDWAIT(), is not in the past when pthread_cond_timedwait() is called, as there may be a context switch in between. When this happens, a Python thread cannot take the gil even if it is available. This tentative patch monitors the load of the system and dynamically increases the value used for the switch interval when needed. With this patch on the android-21-armv7 emulator: python -m test -m test_lru_cache_threaded -v test_functools Ok python -m test -m test_deadlock -v test_importlib Ok python -m test -m test_is_alive_after_fork -v test_threading Ok The measured elapsed time is not correct in this patch when it is above 1 second. |
|||
| msg265721 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年05月16日 19:07 | |
> I don't know if it can fix the issue, but you may see my issue #23428: "Use the monotonic clock for thread conditions on POSIX platforms". pthread_condattr_setclock() is declared in the Android API 21 headers. When a monotonic clock is set for pthread_cond_timedwait(), one could measure the monotonic time elapsed to execute the instructions from _PyTime_monotonic() to the setting of the ts fields in PyCOND_TIMEDWAIT(), i.e the time to execute those statements taken from your patch: #ifdef MONOTONIC _PyTime_monotonic(&deadline); #else _PyTime_gettimeofday(&deadline); #endif /* TODO: add overflow and truncation checks */ assert(us <= LONG_MAX); deadline.tv_usec += (long)us; deadline.tv_sec += deadline.tv_usec / 1000000; deadline.tv_usec %= 1000000; and prevent gil_interval to be set below that measured value. This would solve the problem for this soooo slow emulator. _PyTime_monotonic() does not exist, is this _PyTime_GetMonotonicClock() ? So I did not try to test it, not knowing the status of your patch. IMHO PyCOND_TIMEDWAIT() should use a monotonic clock when available. |
|||
| msg267250 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年06月04日 09:20 | |
pycond_timedwait_2.patch prints the delta time in microseconds when deadline is in the past. Running the patch with the following command: python -m test -m test_lru_cache_threaded -v test_functools gives these results: a) natively: deadline is never in the past and the test run prints only a bunch of 'Ok's. b) on the x86 emulator: Although the test does not hang when python is not patched, the output of the test run, test_lru_cache_threaded-x86-printf.txt, shows that deadline is almost always in the past. c) on the armv7 emulator: The output of the test run, test_lru_cache_threaded-armv7-printf.txt (this is a truncated version, the original file is half a Mbytes long), shows that deadline is always in the past. With the patch, the test does not hang. It is possible that this problem could be a quirk in the implementation of the qemu emulator, given the results of b), and the fact that the python test suite runs at about the same speed when run natively and on the x86 emulator. |
|||
| msg267410 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年06月05日 11:56 | |
The problem can be fixed by changing the switch interval to a higher value when the test is run on an Android emulator. Issue 26855 can provide a way to know when this is the case through the value of the ro.kernel.qemu property. |
|||
| msg279588 - (view) | Author: Xavier de Gaye (xdegaye) * (Python triager) | Date: 2016年10月28日 07:15 | |
Patch attached. |
|||
| msg282703 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2016年12月08日 10:10 | |
New changeset fd1718badb67 by Xavier de Gaye in branch '3.6': Issue #26939: Add the support.setswitchinterval() function to fix https://hg.python.org/cpython/rev/fd1718badb67 New changeset c5d7e46926ac by Xavier de Gaye in branch 'default': Issue #26939: Merge 3.6. https://hg.python.org/cpython/rev/c5d7e46926ac |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:30 | admin | set | github: 71126 |
| 2017年03月31日 16:36:11 | dstufft | set | pull_requests: + pull_request871 |
| 2016年12月08日 11:02:40 | xdegaye | set | status: open -> closed resolution: fixed stage: commit review -> resolved |
| 2016年12月08日 10:10:36 | python-dev | set | nosy:
+ python-dev messages: + msg282703 |
| 2016年11月17日 16:28:25 | xdegaye | set | stage: patch review -> commit review |
| 2016年10月28日 07:29:02 | xdegaye | link | issue26941 dependencies |
| 2016年10月28日 07:25:09 | xdegaye | link | issue26940 dependencies |
| 2016年10月28日 07:15:16 | xdegaye | set | files:
+ setswitchinterval.patch dependencies: - android: add platform.android_ver() assignee: xdegaye components: + Tests, - Library (Lib) versions: + Python 3.7 messages: + msg279588 stage: patch review |
| 2016年06月05日 11:56:32 | xdegaye | set | dependencies:
+ android: add platform.android_ver() messages: + msg267410 |
| 2016年06月04日 09:20:20 | xdegaye | set | files:
+ pycond_timedwait_2.patch messages: + msg267250 |
| 2016年06月04日 09:16:56 | xdegaye | set | files: + test_lru_cache_threaded-armv7-printf.txt |
| 2016年06月04日 09:16:42 | xdegaye | set | files: + test_lru_cache_threaded-x86-printf.txt |
| 2016年05月21日 07:06:39 | xdegaye | link | issue26865 dependencies |
| 2016年05月16日 19:07:19 | xdegaye | set | messages: + msg265721 |
| 2016年05月16日 16:09:21 | xdegaye | set | files:
+ pycond_timedwait.patch messages: + msg265707 |
| 2016年05月16日 16:02:31 | vstinner | set | nosy:
+ vstinner messages: + msg265704 |
| 2016年05月16日 15:52:27 | xdegaye | set | messages: + msg265701 |
| 2016年05月06日 20:45:56 | xdegaye | set | messages: + msg265027 |
| 2016年05月06日 20:11:19 | xdegaye | set | messages: + msg265020 |
| 2016年05月06日 17:40:20 | pitrou | set | messages: + msg265003 |
| 2016年05月06日 16:55:47 | xdegaye | set | messages: + msg264995 |
| 2016年05月06日 15:55:47 | xdegaye | set | messages: + msg264988 |
| 2016年05月06日 15:28:15 | xdegaye | set | messages: + msg264981 |
| 2016年05月06日 14:58:38 | pitrou | set | files:
+ atomic_explicit2.patch messages: + msg264976 |
| 2016年05月06日 14:53:04 | xdegaye | set | files:
+ build-failure.txt messages: + msg264974 |
| 2016年05月06日 11:30:24 | pitrou | set | files:
+ atomic_explicit.patch keywords: + patch messages: + msg264965 |
| 2016年05月06日 11:12:00 | pitrou | set | nosy:
+ neologix |
| 2016年05月06日 11:11:51 | pitrou | set | messages: + msg264963 |
| 2016年05月06日 10:20:55 | yan12125 | set | files:
+ python_tests_on_arm.txt nosy: + yan12125 messages: + msg264957 |
| 2016年05月05日 21:28:32 | pitrou | set | nosy:
+ pitrou messages: + msg264922 |
| 2016年05月03日 19:07:46 | xdegaye | set | files:
+ test_output.txt messages: + msg264756 |
| 2016年05月03日 15:37:46 | xdegaye | create | |