This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008年03月06日 16:10 by carljm, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| urllib-issue.patch | carljm, 2008年03月06日 16:09 | patch | ||
| urllib_issue_updated.patch | ysj.ray, 2010年07月20日 08:24 | patch against the trunk. | review | |
| urllib_issue_updated.diff | ysj.ray, 2010年07月20日 08:43 | patch against py3k | review | |
| urllib_ftptests_doubleencode.patch | ted.turocy, 2010年08月01日 01:23 | patch against py3k to add test cases exhibiting original bug. | review | |
| Messages (10) | |||
|---|---|---|---|
| msg63324 - (view) | Author: Carl Meyer (carljm) * | Date: 2008年03月06日 16:09 | |
Both urllib and urllib2 call urllib.unquote() multiple times on data in the userinfo section of an FTP URL. One call occurs at the end of the urllib.splituser() function. In urllib, the other call appears in URLOpener.open_ftp(). In urllib2, the other two occur in FTPHandler.ftp_open() and Request.get_host(). The effect of this is that if the userinfo section of an FTP url should need to contain a literal % sign followed by two digits, the % sign must be double-encoded as %2525 (for urllib) or triple-encoded as %252525 (for urllib2) in order for the URL to be accessed. The proper behavior would be to only ever unquote a given data segment once. The W3's URI: Generic Syntax RFC (http://gbiv.com/protocols/uri/rfc/rfc3986.html) addresses this very issue in section 2.4 (When to Encode or Decode): "Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, or vice versa in the case of percent-encoding an already percent-encoded string." The solution would be to standardize where in urllib and urllib2 the unquoting happens, and then make sure it happens nowhere else. I'm not familiar enough with the libraries to know where it should be removed without possibly breaking other behavior. It seems that just removing the map/unquote call in urllib.splituser() would fix the problem in urllib. I would guess the call in urllib2 Request.get_host() should also be removed, as the RFC referenced above says clearly that only individual data segments of the URL should be decoded, not larger portions that might contain delimiters (: and @). I've attached a patchset for these suggested changes. Very superficial testing suggests that the patch doesn't break anything obvious, but I make no guarantees. |
|||
| msg64201 - (view) | Author: Sean Reifschneider (jafo) * (Python committer) | Date: 2008年03月20日 20:23 | |
Fred: You most recently touched the code impacted by this test, does this sound reasonable? |
|||
| msg110858 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010年07月20日 03:39 | |
Re-assigned as per maintainers list. |
|||
| msg110872 - (view) | Author: ysj.ray (ysj.ray) | Date: 2010年07月20日 07:39 | |
By confirming the "%2525" case, I guess this is really a bug. But the patch cause test_urllib2.py failed. I modified the patch to fix it. |
|||
| msg110875 - (view) | Author: ysj.ray (ysj.ray) | Date: 2010年07月20日 08:43 | |
This bug exists on py3k also, so I worked out a patch for py3k, too. |
|||
| msg110876 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2010年07月20日 08:46 | |
Thanks Ray Allen. Usually the patch against the py3k branch is enough, it will be ported to other branches. |
|||
| msg112248 - (view) | Author: Theodore Turocy (ted.turocy) | Date: 2010年08月01日 01:23 | |
I am attaching a patch against py3k which adds some tests to demonstrate the original bug when issuing a Request() to a ftp:// URL. To do this, the tests add checks for user and passwd. The previous version of checks asserted that the .user and .passwd of the returned request should be "". Checking .user is necessary to verify the original bug. I was confused by a comment in the fixture, "ftp authentication not yet implemented by FTPHandler", which appeared to justify the assumption that .user and .passwd must be "". This may be true, but .user and .passwd are being set by the Request. The test includes the minimal augmentation to extend on the original behavior of the test. Someone with greater knowledge than I might look at that comment to see if it is out-of-date, or simply too vague. |
|||
| msg121257 - (view) | Author: Jessica McKellar (jesstess) * (Python triager) | Date: 2010年11月16日 01:17 | |
I can confirm that the combination of urllib_issue_updated.diff and urllib_ftptests_doubleencode.patch apply cleanly against py3k, that the added tests exercise the described bug, and that the full test suite passes after applying the patches. The patches look clean and conforming to PEP 8. |
|||
| msg121476 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2010年11月18日 16:47 | |
Fixed in r86520 (py3k) and r86522 (release31-maint). unquote happens at the first level inside Request class, _parse method. Shall port to py2.7. |
|||
| msg121611 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2010年11月20日 11:24 | |
back-ported to release27-maint in r86554. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:31 | admin | set | github: 46497 |
| 2010年11月20日 11:24:35 | orsenthil | set | status: open -> closed messages: + msg121611 |
| 2010年11月18日 16:47:46 | orsenthil | set | nosy:
- BreamoreBoy messages: + msg121476 resolution: fixed stage: patch review -> resolved |
| 2010年11月16日 01:17:32 | jesstess | set | nosy:
+ jesstess messages: + msg121257 |
| 2010年08月01日 01:28:07 | r.david.murray | set | type: resource usage -> behavior stage: test needed -> patch review |
| 2010年08月01日 01:23:56 | ted.turocy | set | files:
+ urllib_ftptests_doubleencode.patch nosy: + ted.turocy messages: + msg112248 |
| 2010年07月31日 17:39:57 | dstanek | set | nosy:
+ dstanek |
| 2010年07月20日 08:46:11 | orsenthil | set | messages: + msg110876 |
| 2010年07月20日 08:43:55 | ysj.ray | set | files:
+ urllib_issue_updated.diff messages: + msg110875 |
| 2010年07月20日 08:24:15 | ysj.ray | set | files: + urllib_issue_updated.patch |
| 2010年07月20日 08:23:55 | ysj.ray | set | files: - urllib_issue_updated.patch |
| 2010年07月20日 07:39:34 | ysj.ray | set | files:
+ urllib_issue_updated.patch nosy: + ysj.ray messages: + msg110872 |
| 2010年07月20日 03:39:47 | BreamoreBoy | set | versions:
+ Python 3.1, Python 2.7, Python 3.2, - Python 2.6, Python 3.0 nosy: + BreamoreBoy messages: + msg110858 assignee: fdrake -> orsenthil |
| 2009年04月22日 17:22:30 | ajaksu2 | set | keywords: + easy |
| 2009年02月13日 02:04:12 | ajaksu2 | set | nosy: + jjlee |
| 2009年02月12日 18:00:36 | ajaksu2 | set | nosy:
+ orsenthil stage: test needed versions: + Python 3.0, - Python 2.5 |
| 2008年03月20日 20:23:47 | jafo | set | versions:
+ Python 2.6 nosy: + fdrake, jafo messages: + msg64201 priority: normal assignee: fdrake type: behavior -> resource usage |
| 2008年03月06日 16:10:03 | carljm | create | |