homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urllib2 does not download 4 MB file completely using ftp
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: bbrazil, ctheune, giampaolo.rodola, iritkatriel, orsenthil, pitrou, python-dev, r.david.murray, sspapilin
Priority: normal Keywords: patch

Created on 2012年06月04日 20:14 by sspapilin, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
test.py sspapilin, 2012年06月04日 20:14 Downloads file into stdout
issue15002-tcpdump-early-fin bbrazil, 2012年07月07日 12:12
d3c6ab639306.diff ctheune, 2014年04月16日 17:30 review
issue15002.patch orsenthil, 2014年04月19日 07:34 review
Repositories containing patches
https://bitbucket.org/ctheune/cpython#3.4-15002
Messages (17)
msg162287 - (view) Author: (sspapilin) Date: 2012年06月04日 20:14
File test.py is
#!/usr/bin/env python
import urllib2
print urllib2.urlopen('ftp://ftp.ripe.net/pub/stats/ripencc/delegated-ripencc-extended-latest').read()
When I issue
python test.py > out.txt
, I get file about 100KB in size, the beginning of the actual file. I repeated it a hundred times, and almost every time I get 98305 byte file, and a couple of times a 49153 bytes or 188417 bytes file.
When I replace urllib2 with urllib in test.py, I get full (4 MB) file.
I have Ubuntu 12.04 64-bit, Python 2.7.3 (from default Ubuntu repository, up-to-date as of 4-june-2012) and slow, 64KB/s, Internet connection.
However, I asked my friend with Windows and faster connection to check it, and he got partial download as well, while he had another size of partial file (50109 bytes). I do not know his OS ant Python versions.
msg162319 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012年06月05日 02:17
The same problem exists in Python3.
msg162320 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012年06月05日 02:22
That's surprising! I shall test it with http debug mode and see what's happening.
msg164828 - (view) Author: Brian Brazil (bbrazil) * Date: 2012年07月07日 12:12
I've tested this on head, and the issue appears to be buggy ftp code in python.
From the attached tcpdump for fetching delegated-ripencc-20120706:
12:57:19.933607 IP myhost.39627 > ftp.ripe.net.ftp: Flags [.], ack 511, win 115, options [nop,nop,TS val 129353190 ecr 1632444059], length 0
12:57:19.934853 IP myhost.39627 > ftp.ripe.net.ftp: Flags [F.], seq 97, ack 511, win 115, options [nop,nop,TS val 129353191 ecr 1632444059], length 0
and a bit later:
12:57:20.043701 IP ftp.ripe.net.42707 > myhost.50818: Flags [.], seq 46337:47785, ack 1, win 227, options [nop,nop,TS val 2552550247 ecr 129353204], length 1448
12:57:20.043717 IP myhost.50818 > ftp.ripe.net.42707: Flags [.], ack 47785, win 353, options [nop,nop,TS val 129353218 ecr 2552550247], length 0
12:57:20.043816 IP ftp.ripe.net.42707 > myhost.50818: Flags [FP.], seq 47785:49153, ack 1, win 227, options [nop,nop,TS val 2552550247 ecr 129353204], length 1368
12:57:20.043992 IP myhost.50818 > ftp.ripe.net.42707: Flags [F.], seq 1, ack 49154, win 376, options [nop,nop,TS val 129353218 ecr 2552550247], length 0
12:57:20.094067 IP ftp.ripe.net.42707 > myhost.50818: Flags [.], ack 2, win 227, options [nop,nop,TS val 2552550299 ecr 129353218], length 0
As you can see we're sending a FIN without sending a close command to the control connection, and in response the server stops sending data about 49k in. Per RFC 959 section 2.3: "The server may abort data transfer if the control connections are closed without command." so this is acceptable behaviour on the part of the server, and means we need to keep the control connection open for longer.
msg164833 - (view) Author: Brian Brazil (bbrazil) * Date: 2012年07月07日 12:48
More particularly, the ftpwrapper's ftp member is being GCed sometime after FtpHandler.ftp_open returns.
msg216465 - (view) Author: Christian Theune (ctheune) * Date: 2014年04月16日 13:26
Looking into this. 
It seems that it doesn't happen for all servers, I can download large files reliably from other sources.
I'll make another wireshark recording to get more details for me to analyze.
msg216466 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014年04月16日 13:28
> I'll make another wireshark recording to get more details for me to analyze.
Thank you! That will be useful. Please test it against 3.x version as it has seen cleanups recently.
msg216472 - (view) Author: Christian Theune (ctheune) * Date: 2014年04月16日 14:22
This is actually the same problem as #18879.
Changing the sample to keep a reference to the addinfourl object avoids this issue.
This is even worse than #18879 in the sense that the error goes undetected and just leaves you with partial data.
Looking at the solution in #18879 I think we can reuse that, maybe even better by refactoring that to a common file proxy object.
msg216507 - (view) Author: Christian Theune (ctheune) * Date: 2014年04月16日 17:27
I wasn't able to come up with a good testcase. :(
I tried similar approaches as in #18879 but I wasn't able to make them trigger the behaviour as it also seems to be an issue regarding actual network performance ... :/
Backport to 2.7 is currently missing as I'd need #18879 to be backported. If that is OK (I'd like to have this in 2.7) then I'd be happy to port both.
msg216508 - (view) Author: Christian Theune (ctheune) * Date: 2014年04月16日 17:34
Antoine, I'm adding you here as I'm leveraging your patch from #18879. 
I'd need some feedback about the backport, but this patch should be OK for 3.4. Also, if you had an idea how to test this - I tried, but failed so far.
msg216509 - (view) Author: Christian Theune (ctheune) * Date: 2014年04月16日 17:35
Antoine, could you check my last comment in here?
(The nosy list got reset accidentally when I made that comment and got a conflict from the tracker).
msg216511 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014年04月16日 17:39
Christian , with respect to patch, I agree with the logic (using something similar to #18879). Does all current unittests succeed with this? (I suspect not) A unittest for coverage would be helpful.
msg216513 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014年04月16日 17:42
Well, this looks ok on the principle, but I haven't investigated the urllib-specific parts, so I'll let Senthil delve into this :)
msg216849 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014年04月19日 07:34
Christian's patch is good.It helps in setting the socket.makefile file descriptor to a well behaving well file close wrapper and thus will help us prevent some tricky fd close issues.
I added tests for coverage to ensure that we are asserting the type and covering all the methods in urllib.response. Attaching the patch built on top of Chritians one.
msg216910 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014年04月20日 16:45
New changeset bb71b71322a3 by Senthil Kumaran in branch '3.4':
urllib.response object to use _TemporaryFileWrapper (and _TemporaryFileCloser)
http://hg.python.org/cpython/rev/bb71b71322a3
New changeset 72fe23edfec6 by Senthil Kumaran in branch '3.4':
NEWS entry for #15002
http://hg.python.org/cpython/rev/72fe23edfec6
New changeset 8c8315bac6a8 by Senthil Kumaran in branch 'default':
merge 3.4
http://hg.python.org/cpython/rev/8c8315bac6a8 
msg216911 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014年04月20日 16:47
This is fixed in 3.4 and 3.5. I will backport to 2.7 ( I think, it is worth it).
msg414287 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022年03月01日 18:22
2.7 backport is no longer relevant.
History
Date User Action Args
2022年04月11日 14:57:31adminsetgithub: 59207
2022年03月01日 18:22:22iritkatrielsetstatus: open -> closed


messages: + msg414287
nosy: + iritkatriel
2014年04月20日 16:47:02orsenthilsetresolution: fixed
messages: + msg216911
stage: patch review -> resolved
2014年04月20日 16:45:09python-devsetnosy: + python-dev
messages: + msg216910
2014年04月19日 07:34:44orsenthilsetfiles: + issue15002.patch

messages: + msg216849
2014年04月16日 17:42:36pitrousetmessages: + msg216513
2014年04月16日 17:39:03orsenthilsetmessages: + msg216511
2014年04月16日 17:35:29ctheunesetnosy: + pitrou
messages: + msg216509
2014年04月16日 17:34:33ctheunesetmessages: + msg216508
2014年04月16日 17:31:43twouterssetstage: needs patch -> patch review
2014年04月16日 17:30:29ctheunesetfiles: + d3c6ab639306.diff
keywords: + patch
2014年04月16日 17:28:05ctheunesethgrepos: + hgrepo237
2014年04月16日 17:27:17ctheunesetmessages: + msg216507
2014年04月16日 17:20:33ctheunesetversions: + Python 3.4, Python 3.5, - Python 3.2, Python 3.3
2014年04月16日 14:22:37ctheunesetmessages: + msg216472
2014年04月16日 13:28:49orsenthilsetmessages: + msg216466
2014年04月16日 13:26:40ctheunesetnosy: + ctheune
messages: + msg216465
2012年07月07日 12:48:02bbrazilsetmessages: + msg164833
2012年07月07日 12:17:00pitrousetnosy: + giampaolo.rodola
2012年07月07日 12:12:26bbrazilsetfiles: + issue15002-tcpdump-early-fin
nosy: + bbrazil
messages: + msg164828

2012年06月05日 02:22:07orsenthilsetassignee: orsenthil
messages: + msg162320
2012年06月05日 02:17:40r.david.murraysetversions: + Python 3.2, Python 3.3
nosy: + r.david.murray, orsenthil

messages: + msg162319

stage: needs patch
2012年06月04日 20:17:36sspapilinsettitle: urllib2 does not download 4 MB file fully using ftp -> urllib2 does not download 4 MB file completely using ftp
2012年06月04日 20:14:41sspapilincreate

AltStyle によって変換されたページ (->オリジナル) /