homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urljoin should raise a TypeError if URL is not a string
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, iritkatriel, jacobtylerwalls, jaraco, orsenthil, serhiy.storchaka, vajrasky
Priority: normal Keywords: patch

Created on 2013年09月25日 20:55 by jaraco, last changed 2022年04月11日 14:57 by admin.

Files
File name Uploaded Description Edit
urljoin_throws_type_error.patch vajrasky, 2013年09月26日 10:12 review
urljoin_throws_type_error_v2.patch vajrasky, 2013年09月28日 09:02 review
urljoin_throws_type_error_v3.patch vajrasky, 2013年10月01日 14:11 review
Pull Requests
URL Status Linked Edit
PR 26687 open jacobtylerwalls, 2021年06月12日 01:47
Messages (12)
msg198418 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2013年09月25日 20:55
>>> import urllib.parse
>>> urllib.parse.urljoin('foo', [])
'foo'
That should raise a TypeError, not succeed quietly as if an empty string were passed.
msg198435 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013年09月26日 10:12
This is the preliminary patch to raise TypeError in that situation.
msg198455 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2013年09月26日 22:59
Thanks Vajraski for the patch (especially the tests).
A colleague reminded me of an aphorism by Raymond Hettinger from the recent PyCon paraphrased: duck typing is superior to isinstance.
Maybe instead consider something like this:
diff -r 7f13d5ecf71f Lib/urllib/parse.py
--- a/Lib/urllib/parse.py	Sun Sep 22 09:33:45 2013 -0400
+++ b/Lib/urllib/parse.py	Thu Sep 26 18:52:18 2013 -0400
@@ -405,10 +405,9 @@
 def urljoin(base, url, allow_fragments=True):
 """Join a base URL and a possibly relative URL to form an absolute
 interpretation of the latter."""
- if not base:
- return url
- if not url:
- return base
+ if not base or not url:
+ # one of the inputs is empty, so simply concatenate
+ return base + url
 base, url, _coerce_result = _coerce_args(base, url)
 bscheme, bnetloc, bpath, bparams, bquery, bfragment = \
 urlparse(base, '', allow_fragments)
This implementation is not only shorter and raises TypeErrors if the types aren't compatible, but those errors are triggered by the underlying implementation. Furthermore, if either the url or base were to be a duck-type that met the necessary interface, it need not be a string subclass.
I don't feel strongly either way, but I'm slightly more inclined to accept the simpler implementation. I'm interested in the everyone's thoughts on either approach.
msg198462 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013年09月27日 05:21
What about this one?
>>> urljoin(['a'], [])
['a']
>>> urljoin(['a'], ['b'])
..... omitted ......
AttributeError: 'list' object has no attribute 'decode'
Is this desirable? Both patches missed this case. Should we only accept str and bytes?
msg198493 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2013年09月27日 17:40
I don't mind the AttributeError - that's a reasonable exception when passing invalid types in, and that's in fact the current behavior. The example of (['a'], []) does bother me though. Those inputs are also seemingly invalid, though somewhat more compatible from a 'urljoin' perspective than two different types.
You've given me reason to re-consider your patch.
msg198510 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013年09月28日 09:02
I modified the patch to handle the last case using your way as well.
Anyway, I found out that urlsplit and urlparse got the same issue as well.
>>> urlparse('python.org', b'http://')
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/home/sky/Code/python/programming_language/cpython/Lib/urllib/parse.py", line 292, in urlparse
 url, scheme, _coerce_result = _coerce_args(url, scheme)
 File "/home/sky/Code/python/programming_language/cpython/Lib/urllib/parse.py", line 109, in _coerce_args
 raise TypeError("Cannot mix str and non-str arguments")
TypeError: Cannot mix str and non-str arguments
>>> urlparse('python.org', b'')
ParseResult(scheme=b'', netloc='', path='python.org', params='', query='', fragment='')
>>> urlparse('python.org', 0)
ParseResult(scheme=0, netloc='', path='python.org', params='', query='', fragment='')
Same thing happens in urlsplit. Fortunately, urlunsplit and urlunparse don't have this issue.
msg198750 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013年10月01日 05:07
I have provided my comments in the review tool. Please check them out.
msg198784 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013年10月01日 14:11
Okay, attached the patch based on your comment, Senthil Kumaran. Thanks.
The reason I use isinstance to check the type is because I want to support the inheritance as much as possible.
>>> class new_str(str):
... pass
>>> urljoin(new_str('http://python.org') + new_str('hehe'))
will not work with the newest patch.
Also, in Lib/urllib/parse.py, most of the time, we use isinstance(x, str) not type(x) == str.
But I don't know. Maybe it does not matter.
msg198808 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2013年10月01日 21:14
Vajrasky, you're right. Comparing against type(obj) is an anti-pattern. isinstance is better. Duck typing is even better (in many cases).
msg230742 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014年11月06日 13:50
About acceptable behavior with wrong arguments types see discussions in issue22766 and http://comments.gmane.org/gmane.comp.python.devel/150073 .
msg386875 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021年02月12日 20:01
Still broken in 3.10:
Running Release|x64 interpreter...
Python 3.10.0a5+ (heads/master:bf2e7e55d7, Feb 11 2021, 23:09:25) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.parse
>>> urllib.parse.urljoin('foo', [])
'foo'
>>>
msg395598 - (view) Author: Jacob Walls (jacobtylerwalls) * Date: 2021年06月11日 03:57
Hi vajrasky, do you have any interest in converting your patch to a GitHub PR? If not I can see about doing so myself. Cheers.
History
Date User Action Args
2022年04月11日 14:57:51adminsetgithub: 63293
2021年06月12日 01:47:56jacobtylerwallssetpull_requests: + pull_request25276
2021年06月11日 10:49:11iritkatrielsetversions: + Python 3.11, - Python 3.8
2021年06月11日 03:57:18jacobtylerwallssetnosy: + jacobtylerwalls
messages: + msg395598
2021年02月12日 20:01:14iritkatrielsetnosy: + iritkatriel

messages: + msg386875
versions: + Python 3.8, Python 3.9, Python 3.10, - Python 2.7, Python 3.4, Python 3.5
2014年11月06日 13:50:51serhiy.storchakasetnosy: + serhiy.storchaka

messages: + msg230742
versions: + Python 3.5, - Python 3.3
2013年10月01日 21:14:06jaracosetmessages: + msg198808
2013年10月01日 14:11:39vajraskysetfiles: + urljoin_throws_type_error_v3.patch

messages: + msg198784
2013年10月01日 05:07:45orsenthilsetmessages: + msg198750
2013年09月28日 21:28:10terry.reedysetnosy: + orsenthil

type: behavior
stage: patch review
2013年09月28日 09:02:46vajraskysetfiles: + urljoin_throws_type_error_v2.patch

messages: + msg198510
2013年09月27日 17:40:02jaracosetmessages: + msg198493
2013年09月27日 05:21:49vajraskysetmessages: + msg198462
2013年09月26日 22:59:51jaracosetmessages: + msg198455
2013年09月26日 10:12:03vajraskysetfiles: + urljoin_throws_type_error.patch

nosy: + vajrasky
messages: + msg198435

keywords: + patch
2013年09月26日 08:36:24berker.peksagsetnosy: + berker.peksag
2013年09月25日 20:55:41jaracocreate

AltStyle によって変換されたページ (->オリジナル) /