homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urllib.request add_header() currently allows trailing spaces (and other weird stuff)
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: karlcow, martin.panter, orsenthil, piotr.dobrogost, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013年02月28日 21:27 by karlcow, last changed 2022年04月11日 14:57 by admin.

Files
File name Uploaded Description Edit
issue-17322-1.patch karlcow, 2013年03月03日 03:08 review
request_header_space_removal.patch r.david.murray, 2013年03月03日 15:42
Messages (22)
msg183234 - (view) Author: karl (karlcow) * Date: 2013年02月28日 21:27
For HTTP header field names parsing, see http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-22#section-3.2.4
 No whitespace is allowed between the header field-name and colon. In
 the past, differences in the handling of such whitespace have led to
 security vulnerabilities in request routing and response handling. A
 server MUST reject any received request message that contains
 whitespace between a header field-name and colon with a response code
 of 400 (Bad Request). A proxy MUST remove any such whitespace from a
 response message before forwarding the message downstream.
In python3.3 currently
 
>>> import urllib.request
>>> req = urllib.request.Request('http://www.example.com/')
>>> req.add_header('FoO ', 'Yeah')
>>> req.header_items()
[('Foo ', 'Yeah'), ('User-agent', 'Python-urllib/3.3'), ('Host', 'www.example.com')]
The space has not been removed. So we should fix that at least. This is a bug. I'm not familiar with the specific security issues mentioned in the spec. 
Note that many things can be done too: :/
>>> req.add_header('FoO \n blah', 'Yeah')
>>> req.add_header('Foo:Bar\nFoo2', 'Yeah')
>>> req.header_items()
[('Foo:bar\nfoo2', 'Yeah'), ('Foo \n blah', 'Yeah'), ('Foo ', 'Yeah'), ('User-agent', 'Python-urllib/3.3'), ('Host', 'www.example.com')]
I will check for making a patch tomorrow.
msg183236 - (view) Author: karl (karlcow) * Date: 2013年02月28日 21:46
Yet another one leading spaces :(
>>> req = urllib.request.Request('http://www.example.com/')
>>> req.header_items()
[]
>>> req.add_header(' Foo3', 'Ooops')
>>> req.header_items()
[(' foo3', 'Ooops')]
>>> req.headers
{' foo3': 'Ooops'}
msg183322 - (view) Author: karl (karlcow) * Date: 2013年03月02日 14:21
http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l359
 def add_header(self, key, val):
 # useful for something like authentication
 self.headers[key.capitalize()] = val
and http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l271
in __init__ of class Request:
 for key, value in headers.items():
 self.add_header(key, value)
Tests seem to be there, but there are none. Is there a reason why there are no tests?
http://hg.python.org/cpython/file/3.3/Lib/test/test_urllib2.py#l98
Or should I add a test in
http://hg.python.org/cpython/file/3.3/Lib/test/test_urllib.py
I'm confused :) 
I need guidance.
msg183357 - (view) Author: karl (karlcow) * Date: 2013年03月03日 03:08
I created 4 tests for testing trailing and leading spaces on
* add_unredirected_header()
* add_header()
and modified the functions.
Tests passed.
→ ./python.exe Lib/test/test_urllib2net.py 
[...]
test_headers_with_spaces (__main__.OtherNetworkTests) ... ok
[...]
----------------------------------------------------------------------
Ran 16 tests in 17.148s
OK (skipped=1)
[137988 refs]
See the patch issue-17322-1.patch
msg183382 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年03月03日 14:48
Given that this is an RFC violation it looks appropriate as a bug fix for all active versions to me. The patch looks good, though I might decide to break up the test.
msg183383 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年03月03日 15:20
Ah, but that's a draft and not approved. Hmm. There is a possibility this change would break code, if the code tries to look up a header by the same key it used to set it. On the other hand, the key use for the set is already modified by the existing code...difficult call.
Having looked in more detail at the test, I think it actually belongs in test_urllib2.
msg183384 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年03月03日 15:42
Here is a modified patch with the tests moved to test_urllib2. I'll give people some time to comment on whether this should be applied at all, and if so if it should be backported. I'm leaning toward doing both, at the moment.
Karl, thanks for the report and patch. The urllib tests could use a bit of reorganization to make them more discoverable, but I don't know that that is very likely to happen :)
The case of doing an add_header with newlines in it is probably worth a separate issue. We fixed a similar issue in the email package a while back, but that one was much more likely to be an issue since it is much more likely for a program to be adding headers to an email message based on user input than it is to be adding headers to a Request based on user input. But I'm sure it does happen, so it is probably worth fixing.
msg183385 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013年03月03日 16:29
David & Karl - 
I had been thinking on this. My understanding of the RFC implies that "server" should reject when the headers contain the whitespace and I had a little concern if the client library should feel free to cleanup a wrongly set headers? Would it be a good idea to see what curl is doing?
msg183393 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年03月03日 18:15
Good point. Seeing what curl does sounds like a good idea.
msg183397 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013年03月03日 18:57
Looks like curl is sending the headers without removing spaces. 
Check Raw here link here. (The link would probably be online a week)
http://requestb.in/1kfodmj1?inspect
$ curl --header "X-MyHeader : 123" http://requestb.in/1kfodmj1
ok
$ curl --header " X-MyHeader : 123"http://requestb.in/1kfodmj1
ok
$ curl --header " X-MyHeader\nY-Header : 123" http://requestb.in/1kfodmj1
ok
msg183398 - (view) Author: karl (karlcow) * Date: 2013年03月03日 19:14
Hello,
So I tested a bit. The production rules defined by the specification are clear. Spaces before and after are forbidden. 
 header-field = field-name ":" OWS field-value BWS
 field-name = token
 field-value = *( field-content / obs-fold )
 field-content = *( HTAB / SP / VCHAR / obs-text )
 obs-fold = CRLF ( SP / HTAB )
 ; obsolete line folding
 ; see Section 3.2.4
and 
 token = 1*tchar
and tchar as 
 tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
 "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
Here are the production rules for HTTP headers for messages (so both Request and Responses). 
You can have funky headers, I guess that would be interesting to add to the urllib tests too. Basically to have something in the library, which check if header contains the tchar characters and sends back a warning of exception when not part of it.
curl has a bug too, IMHO. Though, one might argue that it is practical for testing bugs. :)
On the side of parsing it's clear for the trailing space but unknown for the leading spaces. I sent a long email explaining the issue to the HTTP WG.
See http://lists.w3.org/Archives/Public/ietf-http-wg/2013JanMar/1166.html
Let's see what will be the answers
msg183399 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013年03月03日 19:18
Oh wow. Thank you very much Karl for the care. I am having the same
inclination are you too, but determining a definite answer is really
helpful before going ahead into making the change.
msg183400 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013年03月03日 19:29
The curl example also suggest to think about "pragamatic de-facto"
stuff. Will removing the spaces cause any breakage? I can say for
sure. But if someone can think of it, it would be good for at least us
know.
msg183410 - (view) Author: karl (karlcow) * Date: 2013年03月03日 23:08
OK. I'm inclined to think that we should both remove trailing and leading spaces/tabs should be removed.
Reasons: 
1. Production rules forbid them.
2. Trailing spaces 
 2.a Conformant servers will ignore with a 400 bad request (opportunity for another bugs?)
 2.b Conformant proxies will rewrite the header by removing spaces.
3. Leading spaces are continuation lines. See below.
I had completely missed that. The syntax for headers is:
 header-field = field-name ":" OWS field-value BWS
 field-name = token
 field-value = *( field-content / obs-fold )
 field-content = *( HTAB / SP / VCHAR / obs-text )
 obs-fold = CRLF ( SP / HTAB )
 ; obsolete line folding
 ; see Section 3.2.4
obs-fold is about line folding which is basically a header that would look like:
foo: bar\n
 blah: something
which could be the equivalent of:
foo: bar blah: something
with "foo" the header field-name and "bar blah: something" the header field-value.
which is clearly not the intent of an author doing:
request.add_header('Accept', 'text/html')
request.add_header(' User-Agent', 'foobar/1.0')
because this would be parsed by a conformant server as
Accept: text/html User-Agent: foobar/1.0
msg183412 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年03月04日 01:34
It is a bug in the program, though, and not particularly in the client library. As mentioned, it can even be useful for testing servers.
In the email package we faithfully reproduce such headers if they are passed in. The only one we disallow is something that looks like a field label preceded by a newline, since that could be used for a header injection attack if the field value comes from user input.
msg183415 - (view) Author: karl (karlcow) * Date: 2013年03月04日 03:50
R. David Murray,
You are right it is not specific to the client library. HTTP headers are part of the message (Request/Response) with both the same constraints. Constraints are put on receivers (receiving a message) and senders (sending a message) of the message (which is not specifically related to client or server).
Maybe the way forward in the future is to have a header factory shared by all HTTP libs? I noticed that http.client and http.server had similar issues: 
in http.server
 send_header
in http.client
 putheader
Which are similar features aka constructing headers for sending with the message. 
And what would be the elegant way to solve this current bug?
Ah... before I forget... The WG is having a meeting in 2 weeks. To make a summary of the HTTPBIS work. See the agenda. 
http://tools.ietf.org/wg/httpbis/agenda?item=agenda-86-httpbis.html
The current documents are in Last Call with no issues unresolved.
http://trac.tools.ietf.org/wg/httpbis/trac/report/20
So if R. David is worried that it will change, we can wait a bit more before taking actions, if we are going the way of removing leading/trailing spaces.
msg183451 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年03月04日 13:40
A crazy idea that occurred to me was to create an "rfc822-style-header management" module, and share it between email, http, and urllib. We'd probably break too many things backward-compatibility wise if we did that, but I still think it is an interesting thought :)
On the other hand, having thought about this particular issue some more: in the email package we have the constraint of needing to be able to exactly reproduce the input data, whereas in the http world that constraint does not apply. So in the http world, given that headers are *already* being transformed by the code in question (title casing), it seems reasonable that blank stripping also be done, even just from an API standpoint.
Really I guess the only question is how likely this is to break existing code. I'm pretty sure it is small enough that doing this in 3.4 would be fine, but I don't know how to estimate if it is small enough to also change it in maintenance releases. Since this particular bit is a new standard, maybe we just go with 3.4?
msg183459 - (view) Author: karl (karlcow) * Date: 2013年03月04日 14:48
R. David.:
> A crazy idea that occurred to me was to create an "rfc822-style-header management" module, and share it between email, http, and urllib.
Yes it is basically what I had in mind when I said:
>Maybe the way forward in the future is to have a header factory shared by all HTTP libs?
I'm not sure if it's easy to share in between emails and HTTP. Emails are now defined by RFC5322:
https://tools.ietf.org/html/rfc5322#section-2.2
2.2. Header Fields
 Header fields are lines beginning with a field name, followed by a
 colon (":"), followed by a field body, and terminated by CRLF. A
 field name MUST be composed of printable US-ASCII characters (i.e.,
 characters that have values between 33 and 126, inclusive), except
 colon. A field body may be composed of printable US-ASCII characters
 as well as the space (SP, ASCII value 32) and horizontal tab (HTAB,
 ASCII value 9) characters (together known as the white space
 characters, WSP). A field body MUST NOT include CR and LF except
 when used in "folding" and "unfolding", as described in section
 2.2.3. All field bodies MUST conform to the syntax described in
 sections 3 and 4 of this specification.
Maybe it's doable and worth exploring but I have the feeling we would end up with something along
field-name ":" field-value
and all the rules for field-name and field-value being different for HTTP and email, because they really are. Folding, set of characters, etc. :)
Another thing which should be also the opportunity for opening another issue. HTTP headers and python dictionaries is not a very good match. :) But this is drifting off-topic. :)
msg183467 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年03月04日 15:58
Aren't the folding rules are the same? The character set rules are different, I think, but the email package is going to be flexible in that regard.
The email package also uses a data structure that is not a python dictionary (it is actually a list with an extra dict-like interface), and its features may well be useful for handling http headers.
But you are right, we are wandering off topic :)
msg227361 - (view) Author: karl (karlcow) * Date: 2014年09月23日 14:34
Just a follow up for giving the stable version of the now new RFC version for HTTP 1.1
HTTP header field names parsing
http://tools.ietf.org/html/rfc7230#section-3.2.4 
msg236425 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015年02月23日 04:23
See Issue 22928 for a patch making the "http.client" module even more stricter with header field names and values.
msg237920 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年03月12日 09:25
Trailing spaces, newlines and like were disabled in put_header() in issue22928. Now we can start long-standing deprecation process for headers that don't conform RFC 7230.
History
Date User Action Args
2022年04月11日 14:57:42adminsetgithub: 61524
2015年03月12日 09:25:59serhiy.storchakasetversions: + Python 3.5, - Python 2.7, Python 3.2, Python 3.3, Python 3.4
nosy: + serhiy.storchaka

messages: + msg237920

stage: commit review -> needs patch
2015年02月23日 04:23:27martin.pantersetnosy: + martin.panter
messages: + msg236425
2014年09月23日 14:34:06karlcowsetmessages: + msg227361
2013年03月06日 20:10:31piotr.dobrogostsetnosy: + piotr.dobrogost
2013年03月04日 15:58:53r.david.murraysetmessages: + msg183467
2013年03月04日 14:48:02karlcowsetmessages: + msg183459
2013年03月04日 13:40:03r.david.murraysetmessages: + msg183451
2013年03月04日 03:50:35karlcowsetmessages: + msg183415
2013年03月04日 01:34:03r.david.murraysetmessages: + msg183412
2013年03月03日 23:08:09karlcowsetmessages: + msg183410
2013年03月03日 19:29:32orsenthilsetmessages: + msg183400
2013年03月03日 19:18:19orsenthilsetmessages: + msg183399
2013年03月03日 19:14:28karlcowsetmessages: + msg183398
2013年03月03日 18:57:55orsenthilsetmessages: + msg183397
2013年03月03日 18:15:11r.david.murraysetmessages: + msg183393
2013年03月03日 16:29:23orsenthilsetmessages: + msg183385
2013年03月03日 15:42:42r.david.murraysetfiles: + request_header_space_removal.patch

messages: + msg183384
2013年03月03日 15:20:24r.david.murraysetmessages: + msg183383
2013年03月03日 14:48:37r.david.murraysetversions: + Python 2.7, Python 3.2, Python 3.4
nosy: + r.david.murray

messages: + msg183382

type: behavior
stage: commit review
2013年03月03日 03:08:08karlcowsetfiles: + issue-17322-1.patch
keywords: + patch
messages: + msg183357
2013年03月02日 14:21:14karlcowsetmessages: + msg183322
2013年02月28日 21:46:38karlcowsetmessages: + msg183236
2013年02月28日 21:27:54karlcowsettitle: urllib.request add_header() currently allows trailing spaces -> urllib.request add_header() currently allows trailing spaces (and other weird stuff)
2013年02月28日 21:27:14karlcowcreate

AltStyle によって変換されたページ (->オリジナル) /