homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urllib.parse.urlopen shouldn't ignore installed opener when called with any SSL argument
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Alecz, David Ford (FirefighterBlu3), ber, martin.panter, moritzs, r.david.murray
Priority: normal Keywords: patch

Created on 2013年07月24日 12:26 by moritzs, last changed 2022年04月11日 14:57 by admin.

Files
File name Uploaded Description Edit
urlopen-explained.py Alecz, 2014年07月24日 20:46
urllib.request.py-do-not-overwrite-existing-opener.diff David Ford (FirefighterBlu3), 2015年07月02日 20:29 patch is obsolete
urllib.request.py-do-not-overwrite-existing-opener.diff David Ford (FirefighterBlu3), 2015年07月02日 23:40 patch is obsolete
urllib.request.py-do-not-overwrite-existing-opener.diff David Ford (FirefighterBlu3), 2015年07月03日 04:32 patch is obsolete
ssl_context_tests.py David Ford (FirefighterBlu3), 2015年07月03日 04:34
urllib.request.py-do-not-overwrite-existing-opener.diff David Ford (FirefighterBlu3), 2015年07月04日 17:59
Messages (12)
msg193640 - (view) Author: Moritz Sichert (moritzs) * Date: 2013年07月24日 12:26
If you pass any of cafile, capath or cadefault to urllib.parse.urlopen it creates a new opener that contains the HTTPSHandler that was according to the ca* arguments. It then uses this new opener to execute the request.
If you installed a custom opener with urllib.parse.install_opener it won't be used.
urlopen should either add a HTTPSHandler to the installed opener or not accept any Handler specific arguments at all.
msg223894 - (view) Author: Alecz (Alecz) Date: 2014年07月24日 20:46
I just want to point out that the documentation states that an opener can be used with urlopen:
 urllib.request.install_opener(opener)
 Install an OpenerDirector instance as the default global opener. Installing an opener is only necessary if you want urlopen to use that opener; 
Reference:
https://docs.python.org/3/library/urllib.request.html?highlight=urllib.request#urllib.request.install_opener
Issue 17483 was closed (rejected) because it was considered that a custom opener can be used when opening https links.
msg246096 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015年07月02日 20:29
the attached diff (for py3) fixes the (badly) broken urlopen :}
previously, if SSL args were detected, all installed handlers via _opener got wiped out. now, we check if _opener already exists and if so we just make sure the HTTPSHandler is added thus preserving installed handlers.
msg246105 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015年07月02日 23:40
Unfortunately more breakage exists within urllib.request. A context supplied to urlopen() is useless in the following pseudo code:
build_some_openers()
context = ssl.foo()
urlopen('foo.com', context=context)
<test against foo.com -- foo.com ssl setup is munged with non-verify, out of date, or something that doesn't make happy with a default context>
When urlopen() runs, it does indeed (with my earlier patch) add another HTTPSHandler(context). However, the default added HTTPSHandler is called first in the chain (see the bisect.insort) and will cause the urlopen attempt to fail if the SSL connection does not work with a default or void context.
The end-user specified context will never be reached whether they attempt to install their own HTTPSHandler or not since the default installed HTTPSHandler will raise an exception.
Therefore, I've attached another patch to urllib.request which ensures that (a) existing opener chain is not discarded and (b) a default opener chain is not made with an HTTPSHandler in it, only adding the HTTPSHandler at urlopen() time if 'https' is found in the URL.
msg246108 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015年07月03日 00:30
See also issue 23166.
msg246120 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015年07月03日 01:51
perhaps an HTTPSHandler() should only merged into the handler chain if an https URI is found and no existing handler is found that has an https_open() defined
msg246126 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015年07月03日 04:32
Third version of this patch and a short test suite specifically for this problem.
per awareness of :issue:`23166` I rewrote my patch to handle subclassed HTTPS handlers. There are also exceptions raised if an attempt to install more than one HTTPS handler is done. HTTPS does not play well with multiple handlers trying to process data and will immediately fail on a second attempt to negotiate a handshake on an established socket.
Additionally, if (a) an HTTPSHandler is already in the chain and (b) doesn't have an SSL context, then we create an appropriate context and attach it to the existing handler
msg246127 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015年07月03日 04:34
Test cases for SSL _opener, contexts and HTTPSHandlers in this file
msg246268 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015年07月04日 17:59
In my quest for completeness, I discovered a lack of handling given HTTP->HTTPS redirect. So I've attached another version of this patch which ensures an HTTPS handler is installed if such a redirect is found.
msg257133 - (view) Author: David Ford (FirefighterBlu3) (David Ford (FirefighterBlu3)) * Date: 2015年12月28日 22:07
can we get a review on this please? i'd love to move forward and get the fixes in place.
msg257282 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016年01月01日 08:22
I closed Issue 18543 as a duplicate, which points out this also affects Python 2.7.9+, and can also be triggered by setting the "context" parameter.
Overall, I am not comfortable with the complication and hacks in the current patch. Maybe it would be simpler to not fix the bug, and just document that you cannot use the SSL context parameters with a custom opener. If necessary, we could add a new API to pass the context to the handler properly, as suggested by Benjamin in <https://bugs.python.org/issue23166#msg233484>, but that might have to be in 3.6+ only.
Some specific comments on the patch:
The url[:5] check should probably check for a colon as well.
What’s the problem with multiple HTTPS handlers? Is it a side effect of your patch, or did it already exist? Is it only a problem for urlopen(), or also for calling open() directly?
You can use any() instead of building a list and testing if it is empty.
I think you can handle https: URLs without defining https_open(), for instance I have a handler that defines default_open() and checks the request manually. Does your patch account for that?
The patch seems to modify the default installed opener with the supplied SSL parameters. Also I wonder if it would break any multithreaded usage that currently works.
It would be good to have test cases integrated into the existing suite.
msg309761 - (view) Author: Bernhard Reiter (ber) (Python committer) Date: 2018年01月10日 09:51
Yesterday I ran into the same problem and I agree that it should be solved, by either documenting the issue or fixing it.
As the code in https://github.com/python/cpython/blob/master/Lib/urllib/request.py#L199
still shows the branch and
https://docs.python.org/3.7/library/urllib.request.html
today does not hint upon it, I'm adding the Python 3.7 Version indicator.
In 3.7 (dev) docs context is indicated as preferred so the situation may be a little bit better, but still not resolved.
Maybe a fix should change OpenerDirector() to be able to replace or remove a handler.
History
Date User Action Args
2022年04月11日 14:57:48adminsetgithub: 62743
2018年01月10日 09:51:33bersetnosy: + ber

messages: + msg309761
versions: + Python 3.7
2017年01月27日 00:21:38martin.panterlinkissue29379 superseder
2017年01月27日 00:14:31martin.panterlinkissue15769 superseder
2016年01月01日 08:22:55martin.pantersetnosy: + martin.panter
title: urllib.parse.urlopen shouldn't ignore installed opener when called with any ca* argument -> urllib.parse.urlopen shouldn't ignore installed opener when called with any SSL argument
messages: + msg257282

versions: + Python 2.7
stage: patch review
2016年01月01日 08:14:34martin.panterlinkissue23166 superseder
2015年12月28日 22:07:14David Ford (FirefighterBlu3)setmessages: + msg257133
versions: + Python 3.5, Python 3.6
2015年07月04日 17:59:36David Ford (FirefighterBlu3)setfiles: + urllib.request.py-do-not-overwrite-existing-opener.diff

messages: + msg246268
2015年07月03日 04:34:23David Ford (FirefighterBlu3)setfiles: + ssl_context_tests.py

messages: + msg246127
2015年07月03日 04:32:31David Ford (FirefighterBlu3)setfiles: + urllib.request.py-do-not-overwrite-existing-opener.diff

messages: + msg246126
2015年07月03日 01:51:22David Ford (FirefighterBlu3)setmessages: + msg246120
2015年07月03日 00:30:22r.david.murraysetmessages: + msg246108
2015年07月02日 23:40:17David Ford (FirefighterBlu3)setfiles: + urllib.request.py-do-not-overwrite-existing-opener.diff

messages: + msg246105
2015年07月02日 20:30:00David Ford (FirefighterBlu3)setfiles: + urllib.request.py-do-not-overwrite-existing-opener.diff

nosy: + David Ford (FirefighterBlu3)
messages: + msg246096

keywords: + patch
2014年07月24日 20:46:47Aleczsetfiles: + urlopen-explained.py
nosy: + Alecz
messages: + msg223894

2013年07月24日 15:11:36r.david.murraysetnosy: + r.david.murray
2013年07月24日 12:26:58moritzscreate

AltStyle によって変換されたページ (->オリジナル) /