This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2017年05月29日 04:04 by Nam.Nguyen, last changed 2022年04月11日 14:58 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 1849 | merged | Nam.Nguyen, 2017年05月29日 04:04 | |
| PR 1850 | closed | Nam.Nguyen, 2017年05月29日 09:58 | |
| PR 2289 | merged | vstinner, 2017年06月20日 13:12 | |
| PR 2290 | merged | vstinner, 2017年06月20日 13:14 | |
| PR 2291 | merged | vstinner, 2017年06月20日 13:19 | |
| PR 2292 | merged | vstinner, 2017年06月20日 13:29 | |
| PR 2293 | merged | vstinner, 2017年06月20日 13:43 | |
| PR 2295 | merged | vstinner, 2017年06月20日 13:47 | |
| PR 2296 | merged | vstinner, 2017年06月20日 13:49 | |
| PR 2294 | merged | vstinner, 2017年06月20日 13:54 | |
| Messages (20) | |||
|---|---|---|---|
| msg294667 - (view) | Author: Nam Nguyen (Nam.Nguyen) * | Date: 2017年05月29日 04:04 | |
Reported by Orange Tsai: ========== Hi, Python Security Team import urllib from urlparse import urlparse url = 'http://127.0.0.1#@evil.com/' print urlparse(url).netloc # 127.0.0.1 print urllib.urlopen(url).read() # will access evil.com I have tested on the latest version of Python 2.7.13. ========== |
|||
| msg294671 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2017年05月29日 05:06 | |
See also Issue 18140, where it looks like people _want_ the hash (#) to be part of the username and/or password. Another option may be to raise an exception. |
|||
| msg294694 - (view) | Author: Nam Nguyen (Nam.Nguyen) * | Date: 2017年05月29日 13:24 | |
I think the best behavior is to do what popular web browsers do. Chrome and Firefox, for example, parses this is host 127.0.0.1, path /, fragment #@evil.com. If the code does want to support username/password, it should do a custom opener (with basic HTTP authentication) instead. |
|||
| msg295318 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月07日 08:30 | |
> I think the best behavior is to do what popular web browsers do. Chrome and Firefox, for example, parses this is host 127.0.0.1, path /, fragment #@evil.com. I agree that in case of doubt, it's better to follow the implementation of most popular web browser which indirectly define the "standard". |
|||
| msg296422 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 13:23 | |
When porting the change to Python 3.4, I found this older change:
if _hostprog is None:
- _hostprog = re.compile('^//([^/?]*)(.*)$')
+ _hostprog = re.compile('//([^/?]*)(.*)', re.DOTALL)
match = _hostprog.match(url)
if match:
- host_port = match.group(1)
- path = match.group(2)
- if path and not path.startswith('/'):
+ host_port, path = match.groups()
+ if path and path[0] != '/':
path = '/' + path
return host_port, path
made by:
commit 44eceb6e2aca4e6d12ce64fa0f90279ffff19c25
Author: Serhiy Storchaka <storchaka@gmail.com>
Date: Tue Mar 3 20:21:35 2015 +0200
Issue #23563: Optimized utility functions in urllib.parse.
With this change, newlines are now accepted in URLs.
@Serhiy: Was it a deliberate change?
|
|||
| msg296426 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 13:33 | |
New changeset 4899d847ed3f56b2a712799f896aa1f28540a5c0 by Victor Stinner in branch '3.5': bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) (#2290) https://github.com/python/cpython/commit/4899d847ed3f56b2a712799f896aa1f28540a5c0 |
|||
| msg296427 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 13:37 | |
New changeset 536c1f1246f4faa302f9f5613fc3444e7ae09b4a by Victor Stinner in branch '3.6': bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) (#2289) https://github.com/python/cpython/commit/536c1f1246f4faa302f9f5613fc3444e7ae09b4a |
|||
| msg296430 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 14:09 | |
New changeset 82acabd3c52508d9e3f83a41fe7c684619cbbe7b by Victor Stinner in branch '3.6': bpo-30500: Fix the NEWS entry (#2296) https://github.com/python/cpython/commit/82acabd3c52508d9e3f83a41fe7c684619cbbe7b |
|||
| msg296431 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 14:09 | |
New changeset 410860662f53945cddf5886801c5a88a84801fec by Victor Stinner in branch '3.5': bpo-30500: Fix the NEWS entry (#2295) https://github.com/python/cpython/commit/410860662f53945cddf5886801c5a88a84801fec |
|||
| msg296432 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 14:09 | |
New changeset 8457706ee308a621103e9b9c760ca9da3cc4e7c0 by Victor Stinner in branch 'master': bpo-30500: Fix the NEWS entry (#2293) https://github.com/python/cpython/commit/8457706ee308a621103e9b9c760ca9da3cc4e7c0 |
|||
| msg296433 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年06月20日 14:11 | |
Oh, I didn't expected that newlines can be in a host name. In any case if newlines are a problem, it is better to check explicitly whether a host name contains CR, LF or other special characters. And it is better to do such checks when format a request rather than when parse an URL. |
|||
| msg296436 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 14:20 | |
New changeset d4324baca4c03eb8d55446cd1b74b32ec5633af5 by Victor Stinner in branch '2.7': bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) (#2294) https://github.com/python/cpython/commit/d4324baca4c03eb8d55446cd1b74b32ec5633af5 |
|||
| msg296439 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 14:31 | |
> New changeset d4324baca4c03eb8d55446cd1b74b32ec5633af5 by Victor Stinner in branch '2.7': Oh, I was too fast. I wanted to see an agreement on DOTALL before merging this one. I missed that the 2.7 change also added DOTALL. I had to handle many branches (2.7, 3.3, 3.4, 3.5, 3.6, master), conflicts (especially in 2.7!), and there was a mistake in the NEWS entry (http://... => //...). Sorry, I missed the DOTALL in 2.7 :-p |
|||
| msg296442 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 14:39 | |
I tested my system python2 (Python 2.7.13 on Fedora 25): haypo@selma$ python2 Python 2.7.13 (default, May 10 2017, 20:04:28) >>> urllib.splithost('//hostname/url') ('hostname', '/url') >>> urllib.splithost('//host\nname/url') # newline in hostname, accepted ('host\nname', '/url') >>> print(urllib.splithost('//host\nname/url')[0]) # newline in hostname, accepted host name >>> urllib.splithost('//hostname/ur\nl') # newline in URL, rejected (None, '//hostname/ur\nl') => Newline is accepted in the hostname, but not in the URL path. With my change (adding DOTALL), newlines are accepted in the hostname and in the URL: haypo@selma$ ./python Python 2.7.13+ (heads/2.7:b39a748, Jun 19 2017, 18:07:19) >>> import urllib >>> urllib.splithost("//hostname/url") ('hostname', '/url') >>> urllib.splithost("//host\nname/url") # newline in hostname, accepted ('host\nname', '/url') >>> urllib.splithost("//hostname/ur\nl") # newline in URL, accepted ('hostname', '/ur\nl') More generally, it seems like the urllib module doesn't try to validate URL content. It just try to "split" them. Who is responsible to validate URLs? Is it the responsability of the application developer to implement a whitelist? |
|||
| msg296446 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年06月20日 14:57 | |
The urllib package consists of two parts: urllib.parse and urllib.request. I think urllib.request is responsible for making valid requests and validate arguments if needed. |
|||
| msg296448 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2017年06月20日 14:58 | |
Or more low-level modules used by urllib.request: http, ftplib, etc. |
|||
| msg296455 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017年06月20日 15:26 | |
I created bpo-30713: "Reject newline character (U+000A) in URLs in urllib.parse", to discuss how to handle newlines in URLs. |
|||
| msg297939 - (view) | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2017年07月08日 04:51 | |
New changeset b0fba8874a4bd6bf4773e6efdbd8fa762e9f05bd by Ned Deily (Victor Stinner) in branch '3.6': bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) (#2289) https://github.com/python/cpython/commit/b0fba8874a4bd6bf4773e6efdbd8fa762e9f05bd |
|||
| msg298211 - (view) | Author: Larry Hastings (larry) * (Python committer) | Date: 2017年07月12日 12:51 | |
New changeset cc54c1c0d2d05fe7404ba64c53df4b1352ed2262 by larryhastings (Victor Stinner) in branch '3.4': bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) (#2291) https://github.com/python/cpython/commit/cc54c1c0d2d05fe7404ba64c53df4b1352ed2262 |
|||
| msg299187 - (view) | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2017年07月26日 02:43 | |
New changeset 052f9d6860c48c5abcff8e16212e77cf4249d66c by Ned Deily (Victor Stinner) in branch '3.3': [3.3] bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) (#2292) https://github.com/python/cpython/commit/052f9d6860c48c5abcff8e16212e77cf4249d66c |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:47 | admin | set | github: 74685 |
| 2019年05月10日 17:59:50 | ned.deily | set | messages: - msg342098 |
| 2019年05月10日 17:36:39 | ned.deily | set | messages: + msg342098 |
| 2017年07月26日 02:48:01 | ned.deily | set | status: open -> closed assignee: vstinner resolution: fixed stage: resolved |
| 2017年07月26日 02:43:55 | ned.deily | set | messages: + msg299187 |
| 2017年07月12日 12:51:48 | larry | set | nosy:
+ larry messages: + msg298211 |
| 2017年07月08日 04:51:41 | ned.deily | set | nosy:
+ ned.deily messages: + msg297939 |
| 2017年06月20日 15:26:26 | vstinner | set | messages: + msg296455 |
| 2017年06月20日 14:58:59 | serhiy.storchaka | set | messages: + msg296448 |
| 2017年06月20日 14:57:28 | serhiy.storchaka | set | messages: + msg296446 |
| 2017年06月20日 14:39:54 | vstinner | set | messages: + msg296442 |
| 2017年06月20日 14:31:26 | vstinner | set | messages: + msg296439 |
| 2017年06月20日 14:20:38 | vstinner | set | messages: + msg296436 |
| 2017年06月20日 14:11:19 | serhiy.storchaka | set | messages: + msg296433 |
| 2017年06月20日 14:09:24 | vstinner | set | messages: + msg296432 |
| 2017年06月20日 14:09:08 | vstinner | set | messages: + msg296431 |
| 2017年06月20日 14:09:08 | vstinner | set | messages: + msg296430 |
| 2017年06月20日 13:54:25 | vstinner | set | pull_requests: + pull_request2344 |
| 2017年06月20日 13:50:10 | vstinner | set | versions: + Python 3.3, Python 3.4 |
| 2017年06月20日 13:49:27 | vstinner | set | pull_requests: + pull_request2343 |
| 2017年06月20日 13:47:56 | vstinner | set | pull_requests: + pull_request2342 |
| 2017年06月20日 13:43:58 | vstinner | set | pull_requests: + pull_request2341 |
| 2017年06月20日 13:37:26 | vstinner | set | messages: + msg296427 |
| 2017年06月20日 13:33:37 | vstinner | set | messages: + msg296426 |
| 2017年06月20日 13:29:57 | vstinner | set | pull_requests: + pull_request2340 |
| 2017年06月20日 13:23:25 | vstinner | set | nosy:
+ serhiy.storchaka messages: + msg296422 |
| 2017年06月20日 13:19:43 | vstinner | set | pull_requests: + pull_request2339 |
| 2017年06月20日 13:14:16 | vstinner | set | pull_requests: + pull_request2338 |
| 2017年06月20日 13:12:19 | vstinner | set | pull_requests: + pull_request2337 |
| 2017年06月17日 01:57:01 | martin.panter | link | issue18140 dependencies |
| 2017年06月07日 08:47:09 | vstinner | set | title: urllib connects to a wrong host -> [security] urllib connects to a wrong host |
| 2017年06月07日 08:30:25 | vstinner | set | nosy:
+ vstinner messages: + msg295318 |
| 2017年05月29日 13:24:14 | Nam.Nguyen | set | messages: + msg294694 |
| 2017年05月29日 09:58:56 | Nam.Nguyen | set | pull_requests: + pull_request1937 |
| 2017年05月29日 05:06:10 | martin.panter | set | nosy:
+ martin.panter messages: + msg294671 |
| 2017年05月29日 04:23:50 | Mariatta | set | versions: + Python 3.5, Python 3.6 |
| 2017年05月29日 04:04:12 | Nam.Nguyen | create | |