This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011年12月09日 06:56 by giampaolo.rodola, last changed 2022年04月11日 14:57 by admin.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| ftplib-sendfile.patch | giampaolo.rodola, 2011年12月09日 06:56 | initial draft | review | |
| ftplib-sendfile2.patch | giampaolo.rodola, 2013年03月07日 12:28 | nobytes = fiilesize | review | |
| ftplib-sendfile3.patch | giampaolo.rodola, 2013年03月08日 19:22 | use poll(); sock timeout; look for EMFILE; | review | |
| ftplib-sendfile4.patch | giampaolo.rodola, 2013年03月09日 16:08 | do not set non-blocking; factor out support functions; improve test | review | |
| ftplib-sendfile5.patch | giampaolo.rodola, 2014年06月11日 12:29 | uses socket.sendfile() | review | |
| Messages (32) | |||
|---|---|---|---|
| msg149076 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2011年12月09日 06:56 | |
In attachment. This is actually just an excuse to store the patch somewhere and possibly collect opinions as I don't really think this should go in because: - it's UNIX only - as such, deciding whether using sendfile() should probably be done silently (no explicit argument) - on the other hand, I don't think it's safe to decide this automatically because: - the input fd should be a regular file and it's not clear how to determine this beforehand - in case of disconnection OSError is raised instead of socket.error (minor backward compatibility issue) |
|||
| msg149170 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2011年12月10日 16:40 | |
> deciding whether using sendfile() should probably be done silently (no explicit argument) As an optimization taking advantage from OS support, I think this should be automatic too. But if there are too many issues, then explicit argument sounds better. > the input fd should be a regular file and it's not clear how to determine this beforehand Calling some function like os.stat that only works with real files? (not sure os.stat is the right function, just giving an idea) > in case of disconnection OSError is raised instead of socket.error PEP 3151! >>> socket.error, OSError, IOError (<class 'OSError'>, <class 'OSError'>, <class 'OSError'>) :) |
|||
| msg149237 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年12月11日 18:52 | |
os.fstat wouldn't work since it succeeds with non-"regular" files, e.g. standard I/O: >>> os.fstat(0) posix.stat_result(st_mode=8592, st_ino=5, st_dev=11, st_nlink=1, st_uid=500, st_gid=5, st_size=0, st_atime=1323629303, st_mtime=1323629303, st_ctime=1323628616) I think the best solution is to call sendfile() and catch OSError, then fallback on the generic loop. However, you must also guard against fileno() failing: >>> io.BytesIO().fileno() Traceback (most recent call last): File "<stdin>", line 1, in <module> io.UnsupportedOperation: fileno |
|||
| msg183599 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2013年03月06日 16:13 | |
Here's the result of a benchmark sending a 1GB file over a Gb/s ethernet network: vanilla: real 0m9.446s user 0m0.493s sys 0m1.425s sendfile: real 0m9.143s user 0m0.055s sys 0m0.986s The total time doesn't vary much (the reduction above is just jitter). But it reduces user+sys time by almost a factor of 2. Note that is changed Giampaolo's patch to call sendfile on the whole file, not by block. |
|||
| msg183601 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月06日 16:31 | |
> Note that is changed Giampaolo's patch to call sendfile > on the whole file, not by block. That's not compatible across POSIX platforms. |
|||
| msg183602 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2013年03月06日 16:40 | |
> That's not compatible across POSIX platforms. What do you mean ? I juste call fstat() before calling senfile() to find out the file size, and pass it as `count`. |
|||
| msg183603 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月06日 16:44 | |
Ah ok, I misinterpreted what you wrote then. Generally speaking though, you don't need to know the file size: you just decide a blocksize (>= 65536 is usually ok) and use sendfile() as you use send(). |
|||
| msg183606 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2013年03月06日 16:55 | |
> Ah ok, I misinterpreted what you wrote then. > Generally speaking though, you don't need to know the file size: you just decide a blocksize (>= 65536 is usually ok) and use sendfile() as you use send(). But then you make much more syscalls than what's necessary, and your heuristic for the block size is never going to match the choice the kernel makes. Anyway, here are the numbers, do you think it's interesting to merge (I mean, you're the Python ftp expert ;-) ? |
|||
| msg183607 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月06日 18:09 | |
Specifying a big blocksize doesn't mean the transfer will be faster. send/sendfile won't send more than a certain amount of bytes anyways. If I'm not mistaken I recall from previous benchmarks that after a certain point (131072 or something) increasing the blocksize results in equal or even worse performances. Another thing I don't like is that by doing so you implicitly assume that the file is "fstat-eable". I don't know if there are cases where it's not, but the less assumptions we do the better. Note: I'm sure that for both send() and sendfile() blocksize>=65536 is faster than blocksize=8192 (the current default) so it probably makes sense to change that (I'll file a separate issue). |
|||
| msg183609 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2013年03月06日 19:49 | |
> Specifying a big blocksize doesn't mean the transfer will be faster. > send/sendfile won't send more than a certain amount of bytes anyways. The transfer won't be faster mainly because it's really I/O bound. But it will use less CPU, only because you're making less syscalls. > If I'm not mistaken I recall from previous benchmarks that after a certain point (131072 or something) increasing the blocksize results in equal or even worse performances. I can perfectly believe this for a send loop, maybe because you're exceeding the socket buffer size, or because your working set doesn't fit into caches anymore, etc. But for sendfile(), I don't see how calling it repeatedly could not be slower than calling it once with the overall size: that's how netperf and vsftpd use it, and probably others. > Another thing I don't like is that by doing so you implicitly assume that the file is "fstat-eable". I don't know if there are cases where it's not, but the less assumptions we do the better. Well, the file must be mmap-able, so I doubt fstat() is the biggest concern... |
|||
| msg183622 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月06日 23:53 | |
> The transfer won't be faster mainly because it's really I/O bound. > But it will use less CPU, only because you're making less syscalls. Have you actually measured this? |
|||
| msg183637 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2013年03月07日 07:58 | |
>> The transfer won't be faster mainly because it's really I/O bound. >> But it will use less CPU, only because you're making less syscalls. > > Have you actually measured this? """ vanilla over Gb/s: real 0m9.035s user 0m0.523s sys 0m1.412s block-sendfile over Gb/s: real 0m9.683s user 0m0.253s sys 0m1.212s full-sendfile over Gb/s: real 0m9.014s user 0m0.059s sys 0m1.000s """ As you can see, the throughput doesn't vary (the difference in "real time" is just part of the variance). However, the CPU usage (user+sys) is less for block-sendfile than send loop, and less for full-sendfile than block-sendfile. """ vanilla over loopback: real 0m3.200s user 0m0.541s sys 0m0.702s block-sendfile over loopback: real 0m2.713s user 0m0.248s sys 0m0.197s full-sendfile over loopback: real 0m1.718s user 0m0.055s sys 0m0.082s """ Same thing for loopback, except that here, zero-copy makes a difference on the throughput because we're not I/O bound, but really CPU/memory bound (and here sendfile of the complete file really outperforms block-sendfile). I don't have access to a 10Gb/s network, but basic math hints that sendfile could make a difference on the overall throughput. |
|||
| msg183654 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月07日 12:28 | |
It seems you're right, sorry. We need to take that into account then. In the meantime I rewrote the original patch and got rid of the "use_sendfile" explicit argument in order to attempt to use sendfile() by default and fall back on using send() if bytes sent were 0. TSL_FTP related changes are still left out for now as I'm planning a little refactoring first. |
|||
| msg183655 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2013年03月07日 13:02 | |
> In the meantime I rewrote the original patch and got rid of the "use_sendfile" explicit argument in order to attempt to use sendfile() by default and fall back on using send() if bytes sent were 0. """ # block until socket is writable select.select([], [sockno], []) """ I don't get it, why do you use select? |
|||
| msg183666 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月07日 14:51 | |
It's necessary because sendfile() can fail with EAGAIN. As for your "blocksize = filesize" argument I changed my opinion: despite being less CPU consuming we might incur into problems if that number is too big. 'count' parameter on Linux, for example, is expected to be an unsigned int. Other plarforms will also use different data types so we better stick with a fixed blocksize value (currently 8192). |
|||
| msg183669 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2013年03月07日 15:07 | |
> It's necessary because sendfile() can fail with EAGAIN. It can fail with EAGAIN if the input FD is non-blocking, exactly like the current implementation which calls fp.read(). Furthermore, since sendfile actually supports only regular file and regular files don't support non-blocking I/O, it's unlikely to ever happen. > As for your "blocksize = filesize" argument I changed my opinion: despite being less CPU consuming we might incur into problems if that number is too big. 'count' parameter on Linux, for example, is expected to be an unsigned int. 'count' is size_t, like for mmap() and any other function accepting a length, so nothing wrong can happen. A platform which would have a sendfile prototype which doesn't support sending a complete file at once would be completely broken... |
|||
| msg183670 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年03月07日 15:27 | |
> As for your "blocksize = filesize" argument I changed my opinion: > despite being less CPU consuming we might incur into problems if > that number is too big. 'count' parameter on Linux, for example, is > expected to be an unsigned int. > Other plarforms will also use different data types so we better stick > with a fixed blocksize value (currently 8192). If you really think a blocksize is necessary, you could choose a much larger one for sendfile() (such as 16 MB). Then the overhead of system calls would be much smaller. |
|||
| msg183673 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月07日 15:46 | |
> 'count' is size_t, like for mmap() and any other function accepting a length, so nothing wrong can happen. Then why 'offset' and 'count' parameters have a different data type? Linux: sendfile(..., off_t *offset, size_t count); Solaris: sendfile(..., off_t *off, size_t len); HP-UX: sendfile(..., off_t offset, bsize_t nbytes); > A platform which would have a sendfile prototype which doesn't support > sending a complete file at once would be completely broken... You can't send a complete file at once in the first place unless it's very small. The usual way to send a file is chunk by chunk, so it wouldn't surprise me if sendfile() prototype does not support the use case you're describing. Anyway, Antoine's suggestion makes sense to me: it's probably ok to just use a big value and be done with it. 16MB looks a little bit too much to me as the maximum amount of bytes sent per call is a lot less than 1MB, but even then it would probably be ok. >> It's necessary because sendfile() can fail with EAGAIN. > It can fail with EAGAIN if the input FD is non-blocking It will. Try it yourself. > Furthermore, since sendfile actually supports only regular file and regular > files don't support non-blocking I/O, it's unlikely to ever happen. EAGAIN is caused by the socket fd not being ready yet, not the file fd. Please try the patch before making such assumptions. We're going OT here. |
|||
| msg183680 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2013年03月07日 16:38 | |
> Then why 'offset' and 'count' parameters have a different data type? Because offsets can be negative (e.g. for lseek), while a size can't. That's why 'count' is size_t, not ssize_t. >> Furthermore, since sendfile actually supports only regular file and regular >> files don't support non-blocking I/O, it's unlikely to ever happen. > > EAGAIN is caused by the socket fd not being ready yet, not the file fd. > Please try the patch before making such assumptions. I didn't see the socket could be set to non-blocking. In that case, there's a problem with the patch, since select can block arbitrarily long because it doesn't take the socket timeout into account. Also, apparently socket.sendall() doesn't retry on EAGAIN, it doesn't use BEGIN_SELECT_LOOP. The risk of false positive (EAGAIN after select reported ready) shouldn't be as bad as for sendto(), since usually you'll just get a partial write for a stream oriented socket, but this could be bad for e.g. a SCTP socket (since it's message-oriented). > We're going OT here. I'm leaving this topic, you can do as you like... |
|||
| msg183754 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月08日 19:03 | |
> Because offsets can be negative On Linux (and presumably on all POSIX platforms) passing a negative offset results in EINVAL. > In that case, there's a problem with the patch, since select can block > arbitrarily long because it doesn't take the socket timeout into > account. Right. I will fix that. > Also, apparently socket.sendall() doesn't retry on EAGAIN, > it doesn't use BEGIN_SELECT_LOOP. socket.sendall() is not supposed to return EAGAIN in the first place. And again, I don't see how this is related with the issue at hand. > I'm leaving this topic, you can do as you like... Bye. |
|||
| msg183755 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月08日 19:22 | |
A much larger patch which should address all issues is in attachment. Updates: - use poll() instead of select() whenever possible - take socket timeout into account - take SSL/FTPS into account - when using select() look for EMFILE in case num fds > FD_SETSIZE - look for (AttributeError, io.UnsupportedOperation) when invoking file.fileno() instead of Exception, which seemed too general |
|||
| msg183814 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年03月09日 13:09 | |
I don't understand why you must put the socket in non-blocking mode for sendfile(). Also I think the support code (_use_send() / _use_sendfile()) should be factored out somewhere. There could even be a socket.sendfile() method with the appropriate fallbacks? |
|||
| msg183827 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月09日 16:08 | |
> I don't understand why you must put the socket in > non-blocking mode for sendfile(). I did that mainly because I'm using select() / poll() and it seems kind of "natural" to set the socket in non-blocking mode (proftpd does the same). I'm not sure whether it actually makes any difference though (on Linux it works either way, blocking or not). I'm removing the non-blocking mode in the new attached patch. We can take take a look at the buildbots later and see how they behave. > Also I think the support code (_use_send() / _use_sendfile()) should be factored out somewhere. Agreed and turned them into methods. > There could even be a socket.sendfile() method with the appropriate fallbacks? Perhaps. The only thing which is not clear is how to deal with blocking vs. non-blocking sockets. Also, Windows should also be covered and expose TransmitFile. It's probably better to discuss this elsewhere. |
|||
| msg183828 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年03月09日 16:28 | |
> > I don't understand why you must put the socket in > > non-blocking mode for sendfile(). > > I did that mainly because I'm using select() / poll() and it seems > kind of "natural" to set the socket in non-blocking mode (proftpd does > the same). But why do you need to use select() / poll() ? Can't you just call sendfile() right from the start? |
|||
| msg183831 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月09日 16:41 | |
Because otherwise sendfile() fails with EAGAIN many times before sending any actual data. select() / poll() make sure the while loop awakens only when the socket is ready to be written (as opposed to continuously catching EAGAIN and wait for sendfile() to succeed). |
|||
| msg183832 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年03月09日 16:48 | |
> Because otherwise sendfile() fails with EAGAIN many times before > sending any actual data. EAGAIN on a blocking fd? Is it documented somewhere? The Linux man page for sendfile() says: EAGAIN Nonblocking I/O has been selected using O_NONBLOCK and the write would block. FreeBSD apparently says something similar: [EAGAIN] The socket is marked for non-blocking I/O and not all data was sent due to the socket buffer being filled. If specified, the number of bytes successfully sent will be returned in *sbytes. |
|||
| msg183834 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月09日 16:57 | |
I see. Well, what I'm experiencing right now if I remove the select() / poll() call is a sequence of EAGAIN errors alternated by successful sendfile() calls. Either the man page is wrong or I'm missing something. |
|||
| msg183835 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年03月09日 17:00 | |
> I see. Well, what I'm experiencing right now if I remove the > select() / poll() call is a sequence of EAGAIN errors alternated by > successful sendfile() calls. > Either the man page is wrong or I'm missing something. Weird. I guess it would be nice to dig a bit more :-) |
|||
| msg184881 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月21日 16:12 | |
After digging a bit further it seems EAGAIN occurs in case a timeout was previously set against the socket as in ftplib.FTP(..., timeout=2) (at least on Linux, FWICT). As such, we can debate whether avoid using select/poll if timeout was not set. I'll that a look at the man pages of the other POSIX platforms and figure whether EAGAIN is interpreted as on Linux. Other than that, the patch is reasonably ok to me and can be committed as-is and blocksize argument tuning can be discussed in a separate ticket. |
|||
| msg184882 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年03月21日 16:17 | |
> After digging a bit further it seems EAGAIN occurs in case a timeout was previously > set against the socket as in ftplib.FTP(..., timeout=2) (at least on Linux, FWICT). Ah, indeed. That's because socket timeout makes the underlying fd non-blocking. Which means there probably should be a higher-level sendfile() facility for sockets, taking into account the socket timeout... |
|||
| msg185293 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2013年03月26日 19:39 | |
I created issue 17552 in order to discuss about socket.sendfile() addition. |
|||
| msg220261 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) | Date: 2014年06月11日 12:28 | |
Updated patch which uses the newly added socket.sendfile() method (issue 17552). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:24 | admin | set | github: 57773 |
| 2016年09月08日 22:54:07 | christian.heimes | set | versions: + Python 3.6, Python 3.7, - Python 3.5 |
| 2014年06月11日 12:29:03 | giampaolo.rodola | set | files: + ftplib-sendfile5.patch |
| 2014年06月11日 12:28:42 | giampaolo.rodola | set | assignee: giampaolo.rodola type: enhancement -> performance messages: + msg220261 versions: + Python 3.5, - Python 3.4 |
| 2013年03月26日 19:39:46 | giampaolo.rodola | set | dependencies:
+ Add a new socket.sendfile() method messages: + msg185293 |
| 2013年03月21日 16:17:48 | pitrou | set | messages: + msg184882 |
| 2013年03月21日 16:12:00 | giampaolo.rodola | set | messages: + msg184881 |
| 2013年03月09日 17:00:02 | pitrou | set | messages: + msg183835 |
| 2013年03月09日 16:57:06 | giampaolo.rodola | set | messages: + msg183834 |
| 2013年03月09日 16:48:34 | pitrou | set | messages: + msg183832 |
| 2013年03月09日 16:41:27 | giampaolo.rodola | set | messages: + msg183831 |
| 2013年03月09日 16:28:37 | pitrou | set | messages: + msg183828 |
| 2013年03月09日 16:08:48 | giampaolo.rodola | set | files:
+ ftplib-sendfile4.patch messages: + msg183827 |
| 2013年03月09日 13:09:21 | pitrou | set | messages: + msg183814 |
| 2013年03月08日 19:22:46 | giampaolo.rodola | set | versions: + Python 3.4, - Python 3.3 |
| 2013年03月08日 19:22:34 | giampaolo.rodola | set | files:
+ ftplib-sendfile3.patch messages: + msg183755 |
| 2013年03月08日 19:03:26 | giampaolo.rodola | set | messages: + msg183754 |
| 2013年03月07日 16:39:29 | neologix | set | nosy:
- neologix |
| 2013年03月07日 16:38:57 | neologix | set | messages: + msg183680 |
| 2013年03月07日 15:46:57 | giampaolo.rodola | set | messages: + msg183673 |
| 2013年03月07日 15:27:51 | pitrou | set | messages: + msg183670 |
| 2013年03月07日 15:07:41 | neologix | set | messages: + msg183669 |
| 2013年03月07日 14:51:12 | giampaolo.rodola | set | messages: + msg183666 |
| 2013年03月07日 13:02:54 | neologix | set | messages: + msg183655 |
| 2013年03月07日 12:28:33 | giampaolo.rodola | set | files:
+ ftplib-sendfile2.patch messages: + msg183654 |
| 2013年03月07日 07:58:11 | neologix | set | messages: + msg183637 |
| 2013年03月06日 23:53:07 | giampaolo.rodola | set | messages: + msg183622 |
| 2013年03月06日 19:49:04 | neologix | set | messages: + msg183609 |
| 2013年03月06日 18:09:57 | giampaolo.rodola | set | messages: + msg183607 |
| 2013年03月06日 16:55:36 | neologix | set | messages: + msg183606 |
| 2013年03月06日 16:44:03 | giampaolo.rodola | set | messages: + msg183603 |
| 2013年03月06日 16:40:18 | neologix | set | messages: + msg183602 |
| 2013年03月06日 16:31:52 | giampaolo.rodola | set | messages: + msg183601 |
| 2013年03月06日 16:13:55 | neologix | set | nosy:
+ neologix messages: + msg183599 |
| 2011年12月11日 18:52:44 | pitrou | set | nosy:
+ pitrou messages: + msg149237 |
| 2011年12月10日 16:40:36 | eric.araujo | set | nosy:
+ eric.araujo messages: + msg149170 |
| 2011年12月10日 13:33:03 | rosslagerwall | set | nosy:
+ rosslagerwall |
| 2011年12月09日 06:56:06 | giampaolo.rodola | create | |