This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in Python's Developer Guide.
Created on 2012-06-08 18:31 by kristjan.jonsson, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | | |
|---|---|---|---|
| File name | Uploaded | Description | Edit |
| 77369.patch | kristjan.jonsson, 2012-06-08 18:31 | | review |
| Messages (35) | |||
|---|---|---|---|
| msg162545 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-08 18:31 | |
The attached patch does three things:
"
- Abstract the condition variable used by ceval_gil.h into a separate file, condvar.h. It now defines a PyMUTEX_T, PyCOND_T and associated functions. This file can be used by different parts of the python core.
- Implement locking on Windows using custom structures based on condition variables, rather than using a semaphore kernel object. This avoids kernel transitions for uncontested locks and provides a large speedup for Windows.
- Add a condition variable implementation using native primitives for builds targeted for Vista. Experimental and disabled by default.
"
Using this locking mechanism on Windows results in a 60% speedup for uncontested locks, due to the removal of the kernel transition that regular semaphore objects require.
Before:
D:\pydev\hg\cpython3\PCbuild\amd64>.\python.exe -m timeit -s "from _thread import allocate_lock; l=allocate_lock()" "l.acquire();l.release()"
1000000 loops, best of 3: 0.731 usec per loop
After:
D:\pydev\hg\cpython3\PCbuild\amd64>.\python.exe -m timeit -s "from _thread import allocate_lock; l=allocate_lock()" "l.acquire();l.release()"
1000000 loops, best of 3: 0.27 usec per loop |
|||
| msg162546 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-08 18:32 | |
This defect springs out of issue #11618 |
|||
| msg162669 | Author: Paul Moore (paul.moore) (Python committer) | Date: 2012-06-12 12:56 | |
Applies and builds cleanly on Win7 32-bit. The speed difference is visible here too:
PS D:\Data\cpython\PCbuild> .\python.exe -m timeit -s "from _thread import allocate_lock; l=allocate_lock()" "l.acquire();l.release()"
1000000 loops, best of 3: 0.608 usec per loop
PS D:\Data\cpython\PCbuild> hg revert --all
reverting pythoncore.vcxproj
reverting pythoncore.vcxproj.filters
reverting ..\Python\ceval_gil.h
forgetting ..\Python\condvar.h
reverting ..\Python\thread_nt.h
PS D:\Data\cpython\PCbuild> .\python.exe -m timeit -s "from _thread import allocate_lock; l=allocate_lock()" "l.acquire();l.release()"
1000000 loops, best of 3: 1.66 usec per loop
The test suite had a few Python crashes, but a build of trunk did too. No time to diagnose these now, but I didn't see any failures that weren't also in the unpatched build. Basically, test suite results look the same as for the unpatched build. |
|||
| msg162680 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-12 14:58 | |
I've tested Ubuntu 64 myself using a Virtualbox, confirming that the pythread functionality is untouched. (funny how those vi keystrokes seem to be embedded into your amygdala after decades of disuse) |
|||
| msg162702 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-13 09:44 | |
While I'm confident about the correctness of this implementation (it's in production use right now) I'd like comments on the architecture.
- Are people comfortable with the notion of an include file with an inline implementation of portable condition variables like this?
- Martin voiced concerns over having a _future_ compatible implementation in there with Windows 7 support only. I thought it cool to have around, since it strengthens the "portability" design of the pattern.
- Names of types/apis. Am I doing it right? |
|||
| msg163066 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-17 12:50 | |
Ok, I take the lack of negative reviews as general approval. I'll improve comments a bit, write the appropriate NEWS item and make a commit soon. |
|||
| msg163120 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-06-18 20:33 | |
New changeset 978326f98316 by Kristján Valur Jónsson in branch 'default':
Issue #15038: Optimize python Locks on Windows
http://hg.python.org/cpython/rev/978326f98316 |
|||
| msg163126 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2012-06-18 21:52 | |
There's a problem here:
Fatal Python error: PyCOND_SIGNAL(gil_cond) failed
http://www.python.org/dev/buildbot/all/builders/x86%20XP-4%203.x/builds/6859/steps/test/logs/stdio |
|||
| msg163128 | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012-06-18 23:20 | |
Py_LOCAL_INLINE(int)
_PyCOND_WAIT_MS(PyCOND_T *cv, PyMUTEX_T *cs, DWORD ms)
{
    DWORD wait;
    cv->waiting++;
    PyMUTEX_UNLOCK(cs);
    /* The "lost wakeup bug" would occur if the caller were interrupted here,
     * but we are safe because we are using a semaphore which has an internal
     * count.
     */
    wait = WaitForSingleObject(cv->sem, ms);
    PyMUTEX_LOCK(cs);
    if (wait != WAIT_OBJECT_0)
        --cv->waiting;
    /* Here we have a benign race condition with PyCOND_SIGNAL.
     * On failure or timeout, it is possible that
     * PyCOND_SIGNAL also decrements this value
     * and releases the semaphore. This is benign because it
     * just means an extra spurious wakeup for a waiting thread.
     */
    ...
Are you really sure this race is benign?
If cv->waiting gets double decremented then it can become negative. PyCOND_SIGNAL() is defined as
Py_LOCAL_INLINE(int)
PyCOND_SIGNAL(PyCOND_T *cv)
{
    if (cv->waiting) {
        cv->waiting--;
        return ReleaseSemaphore(cv->sem, 1, NULL) ? 0 : -1;
    }
    return 0;
}
While cv->waiting is negative, each call of PyCOND_SIGNAL() decrements cv->waiting, and increments the semaphore, while each call of PyCOND_WAIT() will increment cv->waiting and decrement the semaphore.
So if calls of PyCOND_SIGNAL() outnumber calls of PyCOND_WAIT() then we can have cv->waiting becoming very negative and the semaphore overflowing.
Maybe just changing the test in PyCOND_SIGNAL() to
if (cv->waiting > 0) {
would be enough, but I am not convinced.
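The compounding drift described above can be illustrated with a few lines of Python (a hypothetical simulation of the two counters, not the actual C implementation in condvar.h):

```python
# Minimal model of the semaphore-based condvar counters (hypothetical
# simulation; names mirror cv->waiting and the semaphore's internal count).
class CondModel:
    def __init__(self, waiting=0, sem=0):
        self.waiting = waiting  # cv->waiting
        self.sem = sem          # semaphore internal count

    def signal(self):
        # Buggy guard: "if (cv->waiting)" also fires when waiting is negative.
        if self.waiting:
            self.waiting -= 1
            self.sem += 1

# Post-race state from the analysis above: a timed-out waiter and a
# concurrent signal both decremented, leaving waiting == -1, sem == 1.
cv = CondModel(waiting=-1, sem=1)
for _ in range(5):
    cv.signal()
print(cv.waiting, cv.sem)  # -6 6: the counters drift without bound
```

Each extra signal drives `waiting` further negative and inflates the semaphore, matching the overflow scenario described above.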
|
|||
| msg163151 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-19 09:14 | |
Thanks Antoine. I tested this in my VirtualBox, so something new must have happened... Anyway, the GIL code should not have changed from before, only moved about slightly. I'll figure out what happened |
|||
| msg163152 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-19 09:44 | |
You are right, Richard. Thanks for pointing this out. This is not a new problem, however, because this code has been in the New GIL since it was launched.
The purpose of the "n_waiting" member is to make "signal" a no-op when no one is there waiting. Otherwise, we could increase the semaphore's internal count without bound, since the condition variable protocol allows "signal" to be called as often as one desires. It is important that it doesn't fall out of sync with the actual semaphore count.
Normally what will happen when the race occurs is this: n_waiting will be double decremented, and the semaphore's internal count will be increased by one (the wait timed out, yet we called signal). Thus, for example, we could end up with the idle state of semaphore_count == 1 and n_waiting == -1. (The invariant is semaphore_count + n_waiting - waiting_threads == 0.) When the next waiter comes along, with semaphore count == 1, it will just pass through, having incremented n_waiting to 0.
The problem you describe is that if 'signal' is ever hit with n_waiting < 0, it will continue to tip this balance. What this will do is just cause the semaphore count to start growing. And thus, we have defeated the purpose of n_waiting. This oversight is my fault. However, rest assured that it is rare, having been in the new GIL implementation since its launch :)
There are two fixes:
1) As you describe, make a positive test in the "signal".
2) Make the waiter the only one who updates n_waiting, whenever it wakes up. The side effect of this is a possible slight increase of spurious wakeups in some usage scenarios. But it is simpler.
I'll implement number 1), and improve the documentation to that effect. |
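Fix 1 can be sketched against a small Python model of the counters (hypothetical simulation, not the committed C code): with a strictly positive guard, signal becomes a no-op while n_waiting is negative, so the drift cannot compound.

```python
# Hypothetical model of the counter logic with fix 1 applied.
class CondModelFixed:
    def __init__(self, waiting=0, sem=0):
        self.waiting = waiting  # n_waiting
        self.sem = sem          # semaphore internal count

    def signal(self):
        # Fix 1: only release when a waiter is known to be blocked.
        if self.waiting > 0:
            self.waiting -= 1
            self.sem += 1

# Start from the post-race idle state (n_waiting == -1, sem == 1).
cv = CondModelFixed(waiting=-1, sem=1)
for _ in range(5):
    cv.signal()
print(cv.waiting, cv.sem)  # -1 1: signal is a no-op until a waiter arrives
```

The imbalance is then self-correcting: the next waiter increments n_waiting back toward zero and consumes the stray semaphore token.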
|||
| msg163154 | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012-06-19 09:57 | |
The old version was
__inline static void _cond_signal(COND_T *cond) {
    /* NOTE: This must be called with the mutex held */
    if (cond->n_waiting > 0) {
        if (!ReleaseSemaphore(cond->sem, 1, NULL))
            Py_FatalError("ReleaseSemaphore() failed");
        --cond->n_waiting;
    }
}
So the test should be "if (cv->waiting > 0)" not "if (cv->waiting)".
|
|||
| msg163155 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-19 10:07 | |
Well spotted. This probably fixes the failure we saw in the buildbots as well. |
|||
| msg163156 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-06-19 10:11 | |
New changeset 110b38c36a31 by Kristjan Valur Jonsson in branch 'default':
Issue #15038:
http://hg.python.org/cpython/rev/110b38c36a31 |
|||
| msg163158 | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012-06-19 13:08 | |
Standard condition variables have the following guarantees:
* if there are any waiters then signal()/notify() will awaken at least one of them;
* if there are any waiters then broadcast()/notify_all() will awaken all of them.
The implementation in condvar.h does not have these guarantees since a future waiter (possibly the signalling thread) may steal the signal intended for a current waiter.
In many cases this does not matter, but in some it can cause a deadlock. For instance, consider
from threading import Condition, Thread
import time
def set_to_value(value, cond, state):
    while 1:
        with cond:
            while state.value == value:
                cond.wait()
            state.value = value
            print("set_to_value(%s)" % value)
            cond.notify_all()

class state:
    value = False

c = Condition()
for i in (0, 1):
    t = Thread(target=set_to_value, args=(i, c, state))
    t.daemon = True
    t.start()

time.sleep(5)
This *should* make state.value bounce back and forth between 0 and 1 continually for five seconds.
But using a condition variable implemented like in condvar.h this program is liable to deadlock because the signalling thread steals the signal intended for the other thread.
I think a note about this should be added to condvar.h.
|
|||
| msg163162 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-19 13:26 | |
Yes, another correct observation. This can happen if a thread is interrupted in between releasing the mutex and waiting for the semaphore. I see no way around this apart from manually creating event objects for every thread. Personally, I'm not sure it is a wise guarantee to make. The shift has been away from "handover" semantics towards "retry" semantics for locking in general over the last few years, particularly with the rise of multiprocessing. But this distinction should be made clear and I will make sure to document it. Thanks for pointing this out. |
|||
| msg163164 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2012-06-19 13:47 | |
> Personally I'm not sure it is a wise guarantee to make.
If you make sure internal users are immune to this issue, then fine (but make sure to document it somewhere). However, if this lost wakeup problem can affect current users of the API, then it sounds unacceptable. |
|||
| msg163165 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2012-06-19 13:51 | |
> However, if this lost wakeup problem can affect current users of the API, then it sounds unacceptable.
Let me elaborate: the GIL can perhaps suffer lost wakeups from time to time. The Lock API certainly shouldn't. |
|||
| msg163166 | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012-06-19 14:05 | |
> Let me elaborate: the GIL can perhaps suffer lost wakeups from time to
> time. The Lock API certainly shouldn't.
I think with FORCE_SWITCHING defined (the default?) it is not possible for the thread releasing the GIL to immediately reacquire it (unless there is a spurious wakeup when waiting on switch_cond). If all threads which wait on a condition are testing the same predicate then the stolen wakeup issue probably won't cause any misbehaviour. |
|||
| msg163173 | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012-06-19 14:45 | |
The implementation in condvar.h is basically the same as one of the attempts mentioned in http://birrell.org/andrew/papers/ImplementingCVs.pdf (Listing 2 fixed to use non-binary semaphores.) The implementation for multiprocessing.Condition is virtually the same as Listing 3 which the author says he thinks is "formally correct" but with "a fundamental performance problem". |
|||
| msg163177 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2012-06-19 15:05 | |
> The implementation for multiprocessing.Condition is virtually the same
> as Listing 3 which the author says he thinks is "formally correct" but
> with "a fundamental performance problem".
To me, it seems similar to the last listing (under "The Sequel—NT and PThreads"): there's a separate semaphore per waiter, used to wake it up when signal() or broadcast() is called. |
|||
| msg163178 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-19 15:06 | |
The problem Richard describes isn't a lost wakeup. PyCOND_SIGNAL _will_ wake up _at least_ one thread. It just isn't guaranteed to be one of those that previously called PyCOND_WAIT(): it could be a latecomer to the game, including the thread that called signal itself. If no such thread comes in to steal it, then one of the waiting threads _will_ wake up. None of the internal usages of condition variables makes this assumption about the order of wakeup from PyCOND_WAIT(). |
|||
| msg163179 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2012-06-19 15:07 | |
> > The implementation for multiprocessing.Condition is virtually the same
> > as Listing 3 which the author says he thinks is "formally correct" but
> > with "a fundamental performance problem".
>
> To me, it seems similar to the last listing (under "The Sequel—NT and
> PThreads"): there's a separate semaphore per waiter, used to wake it up
> when signal() or broadcast() is called.
Ah, you said multiprocessing.Condition. Sorry. I was thinking about threading.Condition. |
|||
| msg163180 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2012-06-19 15:07 | |
> The problem Richard describes isn't a lost wakeup. PyCOND_SIGNAL
> _will_ wake up _at least_ one thread. It just isn't guaranteed to be
> one of those who previously called PyCOND_WAIT(): It could be a
> latecomer to the game, including the one who called Signal himself.
> If no such thread comes in to steal it, then one of the waiting
> threads _will_ wake up.
Ok, thanks for clearing that up. |
|||
| msg163183 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-19 15:51 | |
It's an interesting article, Richard, but I don't see how their 2nd attempt solves the problem. All it does is block the thread doing the Signal(), not other threads, from stealing the wakeup. I think I know how to fix this correctly, using a separate internal "locking" condition variable. I will make some offline experiments with that, to see if it makes sense, given the added complexity. In the meantime, I will document this issue and add a link to the article you mentioned. |
|||
| msg163184 | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012-06-19 15:54 | |
The notes should also mention that PyCOND_SIGNAL() and PyCOND_BROADCAST() must be called while holding the mutex. (pthreads does not have that restriction.) |
|||
| msg163187 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-19 16:08 | |
Right. Without holding the mutex, the definition of "already blocked" is of course meaningless, since only holding the mutex can define any ordering. The pthread standard indeed says "however, if predictable scheduling behaviour is required, then that mutex is locked by the thread calling pthread_cond_signal() or pthread_cond_broadcast()." There are a number of implementations that are subject to serious problems if the mutex isn't held when doing pthread_cond_signal(), including the notorious 'lost wakeup' bug, e.g. http://docs.oracle.com/cd/E19963-01/html/821-1601/sync-21067.html, so it is certainly recommended practice to use pthread_cond_signal() with the mutex held regardless. But you are right, the emulation implementation depends on the mutex not only for predictable scheduling but for synchronizing access to the internal state, and this should be documented. |
|||
| msg163191 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012-06-19 16:31 | |
New changeset d7a72fdcc168 by Kristjan Valur Jonsson in branch 'default':
Issue #15038: Document caveats with the emulated condition variables.
http://hg.python.org/cpython/rev/d7a72fdcc168 |
|||
| msg163194 | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012-06-19 17:14 | |
> It's an interesting article Richard, but I don't see how their 2nd attempt
> solves the problem. All it does is block the thread doing the Signal(),
> not other threads, from stealing the wakeup.
Do you mean the listing on page 5? (The earlier attempts were failures.) The signalling thread holds the lock "x" while issuing the signal "s.V()" and waiting for notification of wakeup "h.P()". A new thread cannot steal the wakeup because it needs to acquire the lock "x" before it can start its wait. Of course, if the main mutex is always held when doing signal()/broadcast() then the lock "x" is unnecessary. I don't think trying to do a full emulation is necessary. Better to just document the limitations. |
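The handshake described above can be sketched in Python (a simplification of Birrell's page-5 listing; the names x, s, h follow the paper, and this is an illustration, not the condvar.h implementation):

```python
import threading
import time

class HandshakeCond:
    """Sketch of Birrell's handshake: the signaller holds x while posting a
    token on s and waiting on h, so a latecomer cannot steal the wakeup."""
    def __init__(self, mutex):
        self.mutex = mutex               # the user's mutex "m"
        self.x = threading.Lock()        # serializes signallers and new waiters
        self.s = threading.Semaphore(0)  # wakeup tokens
        self.h = threading.Semaphore(0)  # handshake: waiter acknowledges wakeup
        self.waiters = 0

    def wait(self):
        with self.x:
            self.waiters += 1
        self.mutex.release()
        self.s.acquire()                 # block until a token is posted
        self.h.release()                 # tell the signaller the token was taken
        self.mutex.acquire()

    def signal(self):
        with self.x:                     # new waiters cannot register now
            if self.waiters > 0:
                self.waiters -= 1
                self.s.release()
                self.h.acquire()         # wait until a waiter consumed the token

# Demo: one waiter, one signal.
m = threading.Lock()
cv = HandshakeCond(m)
woken = []

def waiter():
    with m:
        cv.wait()
        woken.append(True)

t = threading.Thread(target=waiter)
t.start()
while True:                              # wait until the waiter is registered
    with cv.x:
        if cv.waiters == 1:
            break
    time.sleep(0.01)
cv.signal()
t.join(timeout=5)
print(woken)                             # prints [True]
```

Because signal() does not return until a registered waiter has taken the token, the wakeup cannot be consumed by a thread that calls wait() afterwards.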
|||
| msg163195 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-19 17:17 | |
Ah, right, the lock x, I forgot about that. |
|||
| msg163198 | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012-06-19 18:08 | |
1.41 Generic emulations of the pthread_cond_* API using
1.42 earlier Win32 functions can be found on the Web.
1.43 The following read can be edificating (or not):
1.44 http://www.cse.wustl.edu/~schmidt/win32-cv-1.html
1.45 +
1.46 + See also
1.47 */
1.45 and 1.46 should be removed? Also, I would not recommend the win32-cv-1.html page as "edificating" (edifying?). The implementations all either suffer from the same stolen-wakeup issue or are broken.*
* win32-cv-1.html assumes that SignalObjectAndWait() is atomic and that PulseEvent() is guaranteed to wake a waiting thread if there is one -- but Microsoft's documentation now admits that both these assumptions are false. |
|||
| msg163262 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-06-20 08:57 | |
Thanks. The comments remain as the last bit of the original Condition Variable emulation code that was put in place for the new GIL. |
|||
| msg165850 | Author: Benjamin Peterson (benjamin.peterson) (Python committer) | Date: 2012-07-19 18:36 | |
I see dead code here:
Py_LOCAL_INLINE(int)
PyCOND_BROADCAST(PyCOND_T *cv)
{
    if (cv->waiting > 0) {
        return ReleaseSemaphore(cv->sem, cv->waiting, NULL) ? 0 : -1;
        cv->waiting = 0;
    }
    return 0;
}
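The assignment after `return` is unreachable, so a broadcast never resets the counter. The intended ordering can be shown with a small Python model of the counter logic (hypothetical illustration; the actual fix lands in the C code via the changeset below):

```python
# Hypothetical model of PyCOND_BROADCAST's counter logic with the
# reset performed before returning.
class BroadcastModel:
    def __init__(self, waiting=0, sem=0):
        self.waiting = waiting  # cv->waiting
        self.sem = sem          # semaphore internal count

    def broadcast(self):
        if self.waiting > 0:
            self.sem += self.waiting  # ReleaseSemaphore(cv->sem, cv->waiting)
            self.waiting = 0          # reset BEFORE returning, not after
        return 0

cv = BroadcastModel(waiting=3)
cv.broadcast()
print(cv.waiting, cv.sem)  # 0 3
```

With the dead code as written, `waiting` would keep its old value after a broadcast, so later signals and broadcasts would release tokens for waiters that no longer exist.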
|
|||
| msg165943 | Author: Kristján Valur Jónsson (kristjan.jonsson) (Python committer) | Date: 2012-07-20 16:00 | |
Thanks, Benjamin. That's what reviews are for :) |
|||
| msg184730 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-03-20 03:35 | |
New changeset 08b87dda6f6a by Kristján Valur Jónsson in branch 'default':
Issue #15038: Fixing the condition broadcast and docs.
http://hg.python.org/cpython/rev/08b87dda6f6a
New changeset fde60d3f542e by Kristján Valur Jónsson in branch '3.3':
Issue #15038: Fixing the condition broadcast and docs.
http://hg.python.org/cpython/rev/fde60d3f542e |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:31 | admin | set | github: 59243 |
| 2013-03-20 03:37:50 | kristjan.jonsson | set | status: open -> closed resolution: fixed |
| 2013-03-20 03:35:53 | python-dev | set | messages: + msg184730 |
| 2012-07-20 16:00:38 | kristjan.jonsson | set | messages: + msg165943 |
| 2012-07-19 18:36:30 | benjamin.peterson | set | nosy: + benjamin.peterson messages: + msg165850 |
| 2012-06-20 08:57:19 | kristjan.jonsson | set | messages: + msg163262 |
| 2012-06-19 18:08:54 | sbt | set | messages: + msg163198 |
| 2012-06-19 17:17:14 | kristjan.jonsson | set | messages: + msg163195 |
| 2012-06-19 17:14:58 | sbt | set | messages: + msg163194 |
| 2012-06-19 16:31:17 | python-dev | set | messages: + msg163191 |
| 2012-06-19 16:08:11 | kristjan.jonsson | set | messages: + msg163187 |
| 2012-06-19 15:54:56 | sbt | set | messages: + msg163184 |
| 2012-06-19 15:51:36 | kristjan.jonsson | set | messages: + msg163183 |
| 2012-06-19 15:07:47 | pitrou | set | messages: + msg163180 |
| 2012-06-19 15:07:04 | pitrou | set | messages: + msg163179 |
| 2012-06-19 15:06:24 | kristjan.jonsson | set | messages: + msg163178 |
| 2012-06-19 15:05:59 | pitrou | set | messages: + msg163177 |
| 2012-06-19 14:45:45 | sbt | set | messages: + msg163173 |
| 2012-06-19 14:05:03 | sbt | set | messages: + msg163166 |
| 2012-06-19 13:51:41 | pitrou | set | messages: + msg163165 |
| 2012-06-19 13:47:03 | pitrou | set | messages: + msg163164 |
| 2012-06-19 13:26:30 | kristjan.jonsson | set | messages: + msg163162 |
| 2012-06-19 13:08:56 | sbt | set | messages: + msg163158 |
| 2012-06-19 10:11:12 | python-dev | set | messages: + msg163156 |
| 2012-06-19 10:07:11 | kristjan.jonsson | set | messages: + msg163155 |
| 2012-06-19 09:57:06 | sbt | set | messages: + msg163154 |
| 2012-06-19 09:44:49 | kristjan.jonsson | set | messages: + msg163152 |
| 2012-06-19 09:14:30 | kristjan.jonsson | set | messages: + msg163151 |
| 2012-06-18 23:20:34 | sbt | set | nosy: + sbt messages: + msg163128 |
| 2012-06-18 21:52:07 | pitrou | set | messages: + msg163126 |
| 2012-06-18 20:33:22 | python-dev | set | nosy: + python-dev messages: + msg163120 |
| 2012-06-17 12:50:56 | kristjan.jonsson | set | messages: + msg163066 |
| 2012-06-13 09:44:26 | kristjan.jonsson | set | messages: + msg162702 |
| 2012-06-13 09:38:55 | pitrou | set | type: enhancement -> performance stage: patch review |
| 2012-06-13 09:38:48 | pitrou | set | superseder: Locks broken wrt timeouts on Windows -> |
| 2012-06-13 08:31:56 | vstinner | set | nosy: + vstinner |
| 2012-06-12 14:58:05 | kristjan.jonsson | set | messages: + msg162680 |
| 2012-06-12 12:56:10 | paul.moore | set | nosy: + paul.moore messages: + msg162669 |
| 2012-06-08 18:32:44 | kristjan.jonsson | set | superseder: Locks broken wrt timeouts on Windows messages: + msg162546 |
| 2012-06-08 18:31:40 | kristjan.jonsson | create | |