This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2015年09月17日 20:08 by desbma, last changed 2022年04月11日 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue25156.patch | desbma, 2015年10月28日 23:41 | First patch | review | |
| issue25156_v2.patch | desbma, 2015年10月30日 22:53 | review | ||
| issue25156_v3.patch | desbma, 2015年10月31日 16:13 | review | ||
| issue25156_v4.patch | desbma, 2015年12月05日 00:14 | review | ||
| issue25156_v5.patch | desbma, 2015年12月05日 21:48 | review | ||
| Messages (21) | |||
|---|---|---|---|
| msg250918 - (view) | Author: desbma (desbma) * | Date: 2015年09月17日 20:08 | |
This is related to issue25063 (https://bugs.python.org/issue25063). Trying to use sendfile internally in shutil.copyfileobj was considered risky because of special Python files that expose a file descriptor but wrap it to add special behavior (eg: GzipFile). I believe such risk does not exist for shutil.copyfile, and it would be possible to use sendfile if available. |
|||
| msg251246 - (view) | Author: desbma (desbma) * | Date: 2015年09月21日 18:16 | |
Additional advantage of calling sendfile from shutil.copyfile: other fonctions in shutil module would automatically benefit from the use of senfile because they call copyfile directly (copy, copy2) or indirectly (copytree). So for example, the performance of shutil.copytree should be improved for free for directory trees containing big files. |
|||
| msg253643 - (view) | Author: desbma (desbma) * | Date: 2015年10月28日 23:41 | |
Thoughts anyone? Here is a patch that implements the change. My tests show a 30-40% performance improvement for 128KB-512MB single file copy: 128 KB file copy: $ dd if=/dev/urandom of=/tmp/f1 bs=1K count=128 Without the patch: $ ./python -m timeit -s 'import shutil; p1 = "/tmp/f1"; p2 = "/tmp/f2"' 'shutil.copyfile(p1, p2)' 10000 loops, best of 3: 109 usec per loop With the patch: $ ./python -m timeit -s 'import shutil; p1 = "/tmp/f1"; p2 = "/tmp/f2"' 'shutil.copyfile(p1, p2)' 10000 loops, best of 3: 75.7 usec per loop -------- 8 MB file copy: $ dd if=/dev/urandom of=/tmp/f1 bs=1M count=8 Without the patch: $ ./python -m timeit -s 'import shutil; p1 = "/tmp/f1"; p2 = "/tmp/f2"' 'shutil.copyfile(p1, p2)' 100 loops, best of 3: 4.99 msec per loop With the patch: $ ./python -m timeit -s 'import shutil; p1 = "/tmp/f1"; p2 = "/tmp/f2"' 'shutil.copyfile(p1, p2)' 100 loops, best of 3: 3.03 msec per loop -------- 512 MB file copy: $ dd if=/dev/urandom of=/tmp/f1 bs=1M count=512 Without the patch: $ ./python -m timeit -s 'import shutil; p1 = "/tmp/f1"; p2 = "/tmp/f2"' 'shutil.copyfile(p1, p2)' 10 loops, best of 3: 305 msec per loop With the patch: $ ./python -m timeit -s 'import shutil; p1 = "/tmp/f1"; p2 = "/tmp/f2"' 'shutil.copyfile(p1, p2)' 10 loops, best of 3: 178 msec per loop |
|||
| msg253648 - (view) | Author: Josh Rosenberg (josh.r) * (Python triager) | Date: 2015年10月29日 00:51 | |
Adding interested parties from earlier ticket. |
|||
| msg253649 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2015年10月29日 01:18 | |
I’ve never used sendfile() nor shutil.copyfile(), but my immediate reaction is maybe we need a backup plan if os.sendfile() is available but not supported in the circumstances. E.g. if it is practical to use copyfile() to copy from a named socket in the filesystem, the Linux man page <http://man7.org/linux/man-pages/man2/sendfile.2.html> says it will raise EINVAL in this case. Maybe a test case would be good to prove this is still handled. |
|||
| msg253650 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2015年10月29日 01:24 | |
Also, the os.sendfile() doc suggests that some platforms only support writing to sockets, so I definitely think a backup plan is needed. |
|||
| msg253669 - (view) | Author: desbma (desbma) * | Date: 2015年10月29日 09:36 | |
Thanks for the comment. > Also, the os.sendfile() doc suggests that some platforms only support writing to sockets, so I definitely think a backup plan is needed. You are right, the man page clearly says: > Applications may wish to fall back to read(2)/write(2) in the case > where sendfile() fails with EINVAL or ENOSYS. I will improve the code and add tests for conditions where sendfile fails. I have tested it manually, but I will also add a test with a copy of a file > 4GB (it causes several calls to sendfile). |
|||
| msg253701 - (view) | Author: desbma (desbma) * | Date: 2015年10月29日 23:25 | |
I played a bit with Unix domain sockets, and it appears you can not open them like a file with open(). So they do no work with the current implementation of shutil.copyfile anyway. |
|||
| msg253767 - (view) | Author: desbma (desbma) * | Date: 2015年10月30日 22:53 | |
Here is an improved patch with the following changes: * Fallback to copyfileobj if sendfile fails with errno set to EINVAL or ENOSYS * Add a test for > 4GB file |
|||
| msg253778 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2015年10月31日 07:36 | |
I left a few comments. But it might be good if someone more familiar with the APIs could review this. Have you seen the socket.sendfile() implementation? It’s a bit of a mess, and the circumstances are slightly different, but it may offer partial inspiration. It seems a shame to have two separate high-level sendfile() wrappers, but I guess the use cases are just a little too far apart to be worth sharing much code. For the test case, it may be worth skipping the test if you run out of disk space. Similar to test_mmap and test_largefile. |
|||
| msg253781 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2015年10月31日 08:08 | |
Also, man pages for Free BSD and OS X (where writing to a disk file is not supported) say it raises: * ENOTSUP if "the ‘fd’ argument does not refer to a regular file" * EBADF if "the ‘s’ argument is not a valid socket descriptor" * ENOTSOCK if "the ‘s’ argument is not a socket" * EOPNOTSUPP if "the file system for descriptor fd does not support sendfile()" It is not clear what the priority of these errors is, so it might be safest to catch them all. But I wouldn’t catch any arbitrary OSError, because you may end up doing weird double copying or something for an out-of-space error. |
|||
| msg253790 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2015年10月31日 14:29 | |
I'm not at all sure this is worth the maintenance burden at this point in time. So let's say I'm -0.5. I agree that a review and opinion by someone more familiar with the API would be best. I'm adding gps as nosy since this feels like the kind of performance improvement he might find interesting, so if he thinks it is worth it I'll change my vote :) |
|||
| msg253799 - (view) | Author: desbma (desbma) * | Date: 2015年10月31日 16:13 | |
Here is an updated patch that takes into account Martin's suggestions, both here and in the code review. |
|||
| msg255769 - (view) | Author: desbma (desbma) * | Date: 2015年12月02日 20:13 | |
Ping A small patch, but a good performance improvement :) |
|||
| msg255908 - (view) | Author: desbma (desbma) * | Date: 2015年12月05日 00:14 | |
Here is a new patch, with changes suggested by SilentGhost and josh.rosenberg in the review. |
|||
| msg255981 - (view) | Author: desbma (desbma) * | Date: 2015年12月05日 21:48 | |
Thank you SilentGhost for the second review on the v4 patch. Attached is the v5 patch which hopefully is getting even better. |
|||
| msg255983 - (view) | Author: SilentGhost (SilentGhost) * (Python triager) | Date: 2015年12月05日 21:56 | |
No further comments from me. I haven't run the test, but I trust it passes without any warnings. |
|||
| msg257345 - (view) | Author: desbma (desbma) * | Date: 2016年01月02日 11:55 | |
Can this patch be merged, or is there something I can do to improve it? |
|||
| msg259808 - (view) | Author: desbma (desbma) * | Date: 2016年02月07日 23:03 | |
If anyone is interested, I have created a package to monkey patch shutil.copyfile to benefit from sendfile, similarly to the last patch, but it also works on older Python versions down to 2.7. PyPI link: https://pypi.python.org/pypi/pyfastcopy/ |
|||
| msg317628 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2018年05月24日 22:03 | |
Different implementation: bpo-33639. |
|||
| msg322489 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2018年07月27日 13:13 | |
Closing as duplicate of #33671. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:21 | admin | set | github: 69343 |
| 2018年07月27日 13:26:04 | berker.peksag | set | superseder: Efficient zero-copy for shutil.copy* functions (Linux, OSX and Win) |
| 2018年07月27日 13:13:35 | giampaolo.rodola | set | status: open -> closed resolution: duplicate messages: + msg322489 stage: patch review -> resolved |
| 2018年05月24日 22:03:49 | vstinner | set | nosy:
+ vstinner messages: + msg317628 |
| 2018年05月19日 11:25:56 | giampaolo.rodola | set | nosy:
+ giampaolo.rodola |
| 2016年02月07日 23:03:57 | desbma | set | messages: + msg259808 |
| 2016年01月02日 11:55:27 | desbma | set | messages: + msg257345 |
| 2015年12月05日 21:56:29 | SilentGhost | set | nosy:
+ SilentGhost messages: + msg255983 |
| 2015年12月05日 21:48:05 | desbma | set | files:
+ issue25156_v5.patch messages: + msg255981 |
| 2015年12月05日 00:14:21 | desbma | set | files:
+ issue25156_v4.patch messages: + msg255908 |
| 2015年12月02日 20:13:59 | desbma | set | messages: + msg255769 |
| 2015年10月31日 16:13:50 | desbma | set | files:
+ issue25156_v3.patch messages: + msg253799 |
| 2015年10月31日 14:29:39 | r.david.murray | set | nosy:
+ gps messages: + msg253790 |
| 2015年10月31日 08:08:00 | martin.panter | set | messages: + msg253781 |
| 2015年10月31日 07:36:33 | martin.panter | set | messages: + msg253778 |
| 2015年10月30日 22:53:31 | desbma | set | files:
+ issue25156_v2.patch messages: + msg253767 |
| 2015年10月29日 23:25:28 | desbma | set | messages: + msg253701 |
| 2015年10月29日 09:36:16 | desbma | set | messages: + msg253669 |
| 2015年10月29日 01:24:17 | martin.panter | set | messages:
+ msg253650 stage: patch review |
| 2015年10月29日 01:18:13 | martin.panter | set | messages: + msg253649 |
| 2015年10月29日 00:51:00 | josh.r | set | nosy:
+ r.david.murray, martin.panter, josh.r messages: + msg253648 |
| 2015年10月28日 23:41:24 | desbma | set | files:
+ issue25156.patch keywords: + patch messages: + msg253643 |
| 2015年09月21日 18:16:17 | desbma | set | messages: + msg251246 |
| 2015年09月17日 20:08:05 | desbma | create | |