homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: [urllib.request]: Comparison of HTTP headers should be insensitive to the case
Type: behavior Stage: test needed
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: 18986 Superseder: Make urllib.request.Request.has_header() etc case-insensitive
View: 2275
Assigned To: orsenthil Nosy List: BreamoreBoy, cocobear, karlcow, martin.panter, orsenthil, yselivanov
Priority: normal Keywords: patch

Created on 2009年03月24日 04:48 by cocobear, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue-5550-test-1.patch karlcow, 2013年03月20日 15:03 review
issue-5550-2.patch karlcow, 2013年03月20日 20:38 review
issue5550-5.patch karlcow, 2014年09月25日 02:02 review
Messages (19)
msg84058 - (view) Author: cocobear (cocobear) Date: 2009年03月24日 04:48
take a look at following code:
<code>
import urllib2
headers = [("Content-Type","application/oct-stream"),]
opener = urllib2.build_opener()
opener.addheaders = headers
urllib2.install_opener(opener)
print "after install_opener"
ret = opener.open('http://www.google.com',data="word=ss")
print ret.read()
</code>
I got real send data by wireshark:
POST / HTTP/1.1
Accept-Encoding: identity
Content-Length: 7
Host: www.dict.cn
Content-Type: application/x-www-form-urlencoded
Connection: close
word=ss
I had already set HTTP header of Content-Type, but actally urllib2
change this header.
I got this piece of code from urllib2.py:
 if request.has_data(): # POST
 data = request.get_data()
 if not request.has_header('Content-type'):
 request.add_unredirected_header(
 'Content-type',
 'application/x-www-form-urlencoded')
 if not request.has_header('Content-length'):
 request.add_unredirected_header(
 'Content-length', '%d' % len(data))
 scheme, sel = splittype(request.get_selector())
 sel_host, sel_path = splithost(sel)
 if not request.has_header('Host'):
 request.add_unredirected_header('Host', sel_host or host)
 for name, value in self.parent.addheaders:
 name = name.capitalize()
 if not request.has_header(name):
 request.add_unredirected_header(name, value)
first,urllib2 will add Content-Type if it's POST method, then it try to
add headers which holded by opener, but there's already 'Content-Type',
so the attribute->addheaders of opener will not append.
I can use Request to add 'Content-Type' header:
 request = urllib2.Request(url,headers=headers,data=body)
The example code that I wrote doesn't correct? or it's a problem of urllib2?
msg184684 - (view) Author: karl (karlcow) * Date: 2013年03月19日 21:34
The wireshark trace is a different domain than the code example. But let's see. cocobear added:
headers = [("Content-Type","application/oct-stream"),]
with a "Content-Type", not the capitalized "Type".
BUT in the source code or urllib/request.py there is
http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l1184
if not request.has_header('Content-type')
And if you check has_header at
http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l367
it compares exactly the string it has received, when HTTP headers are case insensitive. It reminds me of
http://bugs.python.org/issue12455
http://bugs.python.org/issue17322
with capitalization issues.
I'm changing the title to be more exact.
msg184768 - (view) Author: karl (karlcow) * Date: 2013年03月20日 15:03
orsenthil,
Added test for python 3.3 for testing the case insensitivity for has_header and get_header. The tests fail as it should.
See issue-5550-test-1.patch
I will make a patch to urllib/request.py to make it work.
OK with you?
msg184790 - (view) Author: karl (karlcow) * Date: 2013年03月20日 20:38
First, Sanity check for myself to be sure to understand.
==============================================
→ python3.3
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> req = urllib.request.Request('http://www.python.org/')
>>> req.headers
{}
>>> req.unredirected_hdrs
{}
>>> r = urllib.request.urlopen(req)
>>> req.headers
{}
>>> req.unredirected_hdrs
{'User-agent': 'Python-urllib/3.3', 'Host': 'www.python.org'}
>>> req.header_items()
[('User-agent', 'Python-urllib/3.3'), ('Host', 'www.python.org')]
>>> req.has_header('host')
False
>>> req.has_header('Host')
True
>>> req.get_header('host')
>>> req.get_header('Host')
'www.python.org'
>>> 'host' in req.unredirected_hdrs
False
>>> 'Host' in req.unredirected_hdrs
True
>>> 'host' in req.header_items()
False
>>> 'Host' in req.header_items()
False
>>> req.unredirected_hdrs.items()
dict_items([('User-agent', 'Python-urllib/3.3'), ('Host', 'www.python.org')])
>>> req.headers.get('Host', req.unredirected_hdrs.get('Host',None))
'www.python.org'
>>> req.headers.get('Host')
>>> req.headers.get('host'.capitalize(), req.unredirected_hdrs.get('host'.capitalize(),None))
'www.python.org'
>>> req.headers.get('HOST'.capitalize(), req.unredirected_hdrs.get('HOST'.capitalize(),None))
'www.python.org'
>>> req.headers.get('host'.title(), req.unredirected_hdrs.get('host'.title(),None))
'www.python.org'
>>> 'host'.capitalize() in req.unredirected_hdrs
True
==============================================
OK. The two add methods force the capitalization thought capitalize() (And not title() which is an issue by itself)
http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l359
 def add_header(self, key, val):
 # useful for something like authentication
 self.headers[key.capitalize()] = val
 def add_unredirected_header(self, key, val):
 # will not be added to a redirected request
 self.unredirected_hdrs[key.capitalize()] = val
HTTP headers are case insensitive. The way the methods get_header() and has_header() are currently designed. We could also capitalize the variable before requesting it.
So something like 
 def get_header(self, header_name, default=None):
 return self.headers.get(
 header_name.capitalize(),
 self.unredirected_hdrs.get(header_name.capitalize(), default))
 def has_header(self, header_name):
 return (header_name.capitalize() in self.headers or
 header_name.capitalize() in self.unredirected_hdrs)
The method to add headers on request is 
>>> req.add_header("foo-bar","booh")
>>> req.headers
{'Foo-bar': 'booh'}
>>> req.unredirected_hdrs
{'User-agent': 'Python-urllib/3.3', 'Host': 'www.python.org'}
>>> req.header_items()
[('Foo-bar', 'booh'), ('User-agent', 'Python-urllib/3.3'), ('Host', 'www.python.org')]
So if someone add an header it will be capitalized. And the query will be correct.
The issue is more with addheader which doesn't have the same constraint.
http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l1624
Personally I would have made everything case insensitive ;)
Also note that in this module the casing is not consistent when the values are hardcoded. Sometimes Content-Type, sometimes Content-type.
Anyway I submitted a patch with the code modification AND the test. And this is the result when running the test suite.
→ ./python.exe Lib/test/test_urllib2net.py 
test_sni (__main__.HTTPSTests) ... skipped 'test disabled - test server needed'
test_custom_headers (__main__.OtherNetworkTests) ... ok
test_file (__main__.OtherNetworkTests) ... ok
test_ftp (__main__.OtherNetworkTests) ... ok
test_headers_case_sensitivity (__main__.OtherNetworkTests) ... ok
test_sites_no_connection_close (__main__.OtherNetworkTests) ... /Users/karl/Documents/2011/cpython/Lib/socket.py:370: ResourceWarning: unclosed <socket.socket object, fd=5, family=2, type=1, proto=6>
 self._sock = None
ok
test_urlwithfrag (__main__.OtherNetworkTests) ... ok
test_close (__main__.CloseSocketTest) ... ok
test_ftp_basic (__main__.TimeoutTest) ... ok
test_ftp_default_timeout (__main__.TimeoutTest) ... ok
test_ftp_no_timeout (__main__.TimeoutTest) ... ok
test_ftp_timeout (__main__.TimeoutTest) ... ok
test_http_basic (__main__.TimeoutTest) ... ok
test_http_default_timeout (__main__.TimeoutTest) ... ok
test_http_no_timeout (__main__.TimeoutTest) ... ok
test_http_timeout (__main__.TimeoutTest) ... ok
----------------------------------------------------------------------
Ran 16 tests in 15.259s
OK (skipped=1)
[137983 refs]
msg184799 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013年03月20日 21:25
Karl - a patch is welcome. BTW, I wanted to search for an already existing and I think open issue with a patch ( probably written by me) somewhere. We have to see what's the diff of that issue with new one. I will share that issue in this id. But yeah, this is a good discussion and +1 for this fix.
msg185402 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013年03月28日 06:02
Case insensitive lookup will be more fool-proof than capitalized look up.
I would in favor of going with case insensitive look up than the capitalized look up one. Because the headers could be added in multiple ways and we cannot be assured that they are all stored as Capitalized header values.
msg185441 - (view) Author: karl (karlcow) * Date: 2013年03月28日 12:24
Yes case insensitive make sense. It seems it will require a lot of modifications in the code. So I guess maybe the first step is to identify where it could break.
msg220885 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014年06月17日 20:45
@Karl do you intend following up on this issue?
msg221401 - (view) Author: karl (karlcow) * Date: 2014年06月24日 06:46
@Mark,
yup, I can do that.
I just realized that since my contribution there was a PSF Contributor Agreement. This is signed.
I need to dive a bit again in the code to remember where things fail.
msg227348 - (view) Author: karl (karlcow) * Date: 2014年09月23日 12:06
Ok this is an attempt at solving the issue with lowercase. I find my get_header a bit complicated, but if you have a better idea. :) I'll modify the patches.
I have try to run the tests on the mac here but I have an issue currently.
→ ./python.exe -V
Python 3.5.0a0
Traceback (most recent call last):
 File "./Tools/scripts/patchcheck.py", line 6, in <module>
 import subprocess
 File "/Users/karl/code/cpython/Lib/subprocess.py", line 353, in <module>
 import signal
ImportError: No module named 'signal'
make: *** [patchcheck] Error 1
msg227353 - (view) Author: karl (karlcow) * Date: 2014年09月23日 12:47
And I had to do a typo in patch3. Submitting patch4. Sorry about that.
msg227411 - (view) Author: karl (karlcow) * Date: 2014年09月24日 07:26
Ok my tests are ok.
→ ./python.exe -m unittest -v Lib/test/test_urllib2net.py 
test_close (Lib.test.test_urllib2net.CloseSocketTest) ... ok
test_custom_headers (Lib.test.test_urllib2net.OtherNetworkTests) ... FAIL
test_file (Lib.test.test_urllib2net.OtherNetworkTests) ... test_ftp (Lib.test.test_urllib2net.OtherNetworkTests) ... skipped "Resource 'ftp://gatekeeper.research.compaq.com/pub/DEC/SRC/research-reports/00README-Legal-Rules-Regs' is not available"
test_headers_case_sensitivity (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_redirect_url_withfrag (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_sites_no_connection_close (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_urlwithfrag (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_ftp_basic (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_default_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_no_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_basic (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_default_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_no_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
I wonder if test_custom_headers fails because of my modifications.
msg227418 - (view) Author: karl (karlcow) * Date: 2014年09月24日 07:52
Opened issue #22478 for the tests failing. Not related to my modification.
msg227501 - (view) Author: karl (karlcow) * Date: 2014年09月25日 02:02
OK after fixing my repo (Thanks orsenthil) I got the tests running properly. The inspection order of the two dictionary was not right, so I had to modify a bit the patch.
→ ./python.exe -m unittest -v 
Lib.test.test_urllib2net.OtherNetworkTests.test_headers_case_sensitivity
test_headers_case_sensitivity (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.286s
OK
→ ./python.exe -m unittest -v 
Lib.test.test_urllib2net.OtherNetworkTests.test_custom_headers
test_custom_headers (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
----------------------------------------------------------------------
Ran 1 test in 0.575s
OK
New patch issue5550-5.patch
unlinking issue5550-4.patch
msg227646 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2014年09月26日 20:31
I left some comments in the codereview.
I think that having some half-baked solution is great, but I really would like to see a proper fix, i.e. with remove_header and other methods fixed. Ideally, you should add a UserDict implementation for headers that implements proper __XXXitem__ methods.
msg227648 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2014年09月26日 20:35
Ideally, we should just wait when PEP 455 lands, so we can use TransformDict for headers.
Also, I don't think we can land a fix for this in any pythons out there, I would focus on making this right in 3.5
msg239651 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015年03月31日 00:58
According to <https://bugs.python.org/issue18986#msg236161> that PEP isn’t going to be accepted, although no reason seems to have been given. So I wouldn’t hold back because of that.
msg254050 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015年11月04日 10:57
Regarding the original bug reported by Cocobear, I think this is by design. The "addheaders" attribute seems to serve as a set of default header fields, although this is hardly obvious from the documentation. In fact the current documentation suggests you have to use the application/x-www-form-urlencoded content type for request data. I think the documentation should be relaxed; I opened Issue 23360 about that.
The rest of the thread seems to have side-tracked into patches to make Request.has_header() and friends case insensitive. I’ll have to study the threads closer, but I think this side-track is a duplicate of Issue 2275.
msg254085 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015年11月05日 06:50
The patch here does the same thing as John’s issue2775.patch in Issue 2275. I am closing this thread, because I think John’s patch in the other one is superior (except for being way out of date).
History
Date User Action Args
2022年04月11日 14:56:46adminsetgithub: 49800
2015年11月05日 06:51:52martin.pantersetsuperseder: Implement PEP 3108 -> Make urllib.request.Request.has_header() etc case-insensitive
2015年11月05日 06:50:43martin.pantersetstatus: open -> closed
superseder: Implement PEP 3108
resolution: duplicate
messages: + msg254085
2015年11月04日 10:57:34martin.pantersetmessages: + msg254050
2015年03月31日 00:58:26martin.pantersetnosy: + martin.panter
dependencies: + Add a case-insensitive case-preserving dict
messages: + msg239651
2014年09月26日 20:35:44yselivanovsetmessages: + msg227648
2014年09月26日 20:31:13yselivanovsetnosy: + yselivanov
messages: + msg227646
2014年09月25日 02:02:28karlcowsetfiles: - issue-5550-4.patch
2014年09月25日 02:02:14karlcowsetfiles: + issue5550-5.patch

messages: + msg227501
2014年09月24日 07:52:18karlcowsetmessages: + msg227418
2014年09月24日 07:26:54karlcowsetmessages: + msg227411
2014年09月23日 12:47:41karlcowsetfiles: + issue-5550-4.patch

messages: + msg227353
2014年09月23日 12:46:53karlcowsetfiles: - issue-5550-3.patch
2014年09月23日 12:06:37karlcowsetfiles: + issue-5550-3.patch

messages: + msg227348
2014年06月24日 06:46:10karlcowsetmessages: + msg221401
2014年06月17日 20:45:38BreamoreBoysetnosy: + BreamoreBoy
messages: + msg220885
2013年03月28日 12:24:05karlcowsetmessages: + msg185441
2013年03月28日 06:02:45orsenthilsetmessages: + msg185402
2013年03月20日 21:25:53orsenthilsetmessages: + msg184799
2013年03月20日 20:38:47karlcowsetfiles: + issue-5550-2.patch

messages: + msg184790
2013年03月20日 15:03:02karlcowsetfiles: + issue-5550-test-1.patch
keywords: + patch
messages: + msg184768
2013年03月20日 01:17:13orsenthilsetassignee: orsenthil
2013年03月19日 21:34:56karlcowsetnosy: + karlcow

messages: + msg184684
title: urllib2 use of opener.addheaders -> [urllib.request]: Comparison of HTTP headers should be insensitive to the case
2012年11月09日 13:26:39ezio.melottisetversions: + Python 3.3, Python 3.4, - Python 3.1
2010年07月09日 17:14:13BreamoreBoysetnosy: + orsenthil
stage: test needed

versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.5
2009年03月24日 04:48:59cocobearcreate

AltStyle によって変換されたページ (->オリジナル) /