This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2015-12-18 19:41 by jacksontj, last changed 2022-04-11 14:58 by admin.
Files

| File name | Uploaded | Description | Edit |
|---|---|---|---|
| worker_ignore_interrupt.patch | jacksontj, 2015-12-18 19:41 | | |
Messages (4)

msg256703 - (view) | Author: Thomas Jackson (jacksontj) * | Date: 2015-12-18 19:41

If a KeyboardInterrupt is received while a worker process is grabbing an item off of the queue, that worker process dies with an uncaught exception. This means that the ProcessPool has now lost a process, and it currently has no mechanism to recover from dead processes. This is especially noticeable if the CallItem is relatively large (since the call_queue.get() includes all the pickle time).

A simple fix is to have the worker process not do anything with the keyboard interrupt, since it would have no idea what to do. This cannot be implemented with a regular try/except, as the item will be partially pulled off of the queue and lost. My proposed fix is to disable the SIGINT handler in the worker process while getting items off of the queue.

An alternate approach is to actually change multiprocessing.Queue.get() to leave the item on the queue if it is interrupted with a keyboard interrupt; the way to do this is to catch the KeyboardInterrupt and simply continue on, so we can rely on the caller to do the cleanup.

Proposed patch attached.
msg256705 - (view) | Author: Thomas Jackson (jacksontj) * | Date: 2015-12-18 19:46

It seems that I accidentally hit submit, so let me finish the last bit of my message here: an alternate approach is to actually change multiprocessing.Queue.get() to leave the item on the queue if it is interrupted with a keyboard interrupt. Then the worker process could handle the exception in a more meaningful way.

It is also interesting to note that, in the event the caller gets a KeyboardInterrupt, there is no `terminate` method which would let you kill jobs before they run. I'm not certain whether that should be included in this issue or filed as a separate ticket, since the two are related but different.
msg256711 - (view) | Author: Thomas Jackson (jacksontj) * | Date: 2015-12-18 21:45

After some more investigation, it seems that the alternate `Queue` fix is a non-starter. From what I can tell, ProcessPoolExecutor assumes that multiprocessing.Queue has guaranteed delivery, and it doesn't (because of the pickling). So the issue is that the worker process drops the message if it's interrupted while unpickling, and the Pool class has no idea and assumes that the job is still running. With that being said, it seems like my attached patch is probably the most reasonable fix short of a major rework of how ProcessPoolExecutor works.
msg257085 - (view) | Author: Davin Potts (davin) * (Python committer) | Date: 2015-12-27 17:11

Noting the connection to issue22393.
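The fix jacksontj describes in msg256703 can be sketched roughly as follows. This is a minimal illustration of the idea, not the attached patch: the names `sigint_ignored`, `worker_loop`, and the `CallItem` attributes (`fn`, `args`, `kwargs`, `work_id`) are assumptions chosen to mirror the shape of the `concurrent.futures` worker, and the real implementation differs.

```python
# Sketch of the proposed approach: ignore SIGINT in the worker while it
# pulls an item off the call queue, so a Ctrl-C cannot kill the worker
# mid-get() and silently drop a partially unpickled item.
# Hypothetical names, for illustration only.
import signal
from contextlib import contextmanager


@contextmanager
def sigint_ignored():
    """Temporarily replace the SIGINT handler with SIG_IGN."""
    old_handler = signal.signal(signal.SIGINT, signal.SIG_IGN)
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, old_handler)


def worker_loop(call_queue, result_queue):
    """Simplified worker loop; a None item is the shutdown sentinel."""
    while True:
        # The get() (including the unpickling of the CallItem) runs with
        # SIGINT ignored, so the item cannot be half-consumed and lost.
        with sigint_ignored():
            call_item = call_queue.get(block=True)
        if call_item is None:
            return
        try:
            result = call_item.fn(*call_item.args, **call_item.kwargs)
        except BaseException as exc:
            result_queue.put((call_item.work_id, exc))
        else:
            result_queue.put((call_item.work_id, result))
```

Note that this only shields the dequeue step; a KeyboardInterrupt delivered while the job itself is running still propagates normally and is reported back through the result queue.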
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:25 | admin | set | github: 70096 |
| 2015-12-27 17:11:24 | davin | set | dependencies: + multiprocessing.Pool shouldn't hang forever if a worker process dies unexpectedly; messages: + msg257085 |
| 2015-12-27 16:47:57 | davin | set | nosy: + davin |
| 2015-12-25 17:40:12 | terry.reedy | set | versions: - Python 3.2, Python 3.3, Python 3.4 |
| 2015-12-18 21:45:31 | jacksontj | set | messages: + msg256711 |
| 2015-12-18 19:46:29 | jacksontj | set | messages: + msg256705 |
| 2015-12-18 19:41:10 | jacksontj | create | |