
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: urllib2 requests history + HEAD support
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: ajaksu2, brian.curtin, denversc, dstanek, eric.araujo, ezio.melotti, ipatrol, jackdied, jjlee, jorend, koder_ua, orsenthil, pitrou, poke, python-dev, santoso.wijaya
Priority: normal Keywords: easy, needs review, patch

Created on 2007-03-03 14:01 by koder_ua, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
http_history_and_rtype_svn.diff | koder_ua, 2007-03-07 09:41 |
1673007_methods.patch | dstanek, 2010-08-04 03:55 |
issue1673007_urllib_Request_method_v1.patch | denversc, 2011-03-27 19:41 | patch in cpython tree at changeset c1787fa6a3d3 to add "method" parameter to Request constructor
issue1673007.diff | ezio.melotti, 2011-10-10 11:20 | Updated patch
urllib.request-cleanup.diff | eric.araujo, 2011-10-17 13:07 |
urllib.request-doc.diff | eric.araujo, 2011-10-18 16:42 |
Messages (26)
msg52030 - Author: KDanilov aka koder (koder_ua) Date: 2007-03-03 14:01
1) Add a history of all sent and received headers/requests to the addinfourl object. Save the redirection history too.
>>> fd = urllib2.urlopen("http://www.python.org/")
>>> print fd.history[0].request_line
GET / HTTP/1.1
>>> print fd.history[0].sended
[('Accept-Encoding', 'identity'), ('Host', 'www.python.org'), ('Connection', 'close'), ('User-Agent', 'Python-urllib/2.6')]
2) Add support for HEAD (and other) requests:
>>> fd = urllib2.urlopen("http://www.python.org/", request_cmd="HEAD")
>>> print len(fd.read())
0
Please send email here:
koder_dot_mail_at_gmail_dot_com
msg52031 - Author: KDanilov aka koder (koder_ua) Date: 2007-03-07 09:41
Made a diff file with svn and deleted the previous one.
File Added: http_history_and_rtype_svn.diff
msg52032 - Author: Jason Orendorff (jorend) Date: 2007-04-20 21:45
koder_ua, what's the history useful for? I'm not against it... exactly... it just seems like I would never use it. If I did want a history of HTTP activity, I would need to maintain it myself. I wouldn't want it tied to a particular object--these urlopen() objects, typically you open them, read them, and then throw them away.
I'm iffy on the API for 2) as well, but I can see the appeal.
The thing is... this patch is really, really rough. In one of the tests, an attribute is called "sended_hdrs" in one place and "sended_headers" in another. (Actually it should be called "request_headers". Likewise "recived" and "res" should each be spelled "response".) The whitespace isn't PEP 8. It's also missing a doc patch.
msg81793 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-02-12 18:13
Patch has tests too, might need updating.
:)
msg96705 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2009-12-20 17:59
Having a HEAD request for urllib2 might be a good idea. I shall use this
patch to add the functionality.
But, having a history support in the urllib2 module is not a good idea
IMO. It is best left to the clients which might use urllib2.
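For illustration, here is one way a client could keep such a history itself, using a small opener handler. This is only a sketch and not part of any patch on this issue: `HistoryHandler` and its `history` attribute are hypothetical names, and the sketch relies on urllib.request calling a handler's `http_request` method for each outgoing HTTP request.

```python
from urllib.request import BaseHandler, Request, build_opener


class HistoryHandler(BaseHandler):
    """Hypothetical client-side helper: remember every request sent."""

    # Assumption: a high order makes this run after the default HTTP
    # handler has added its own headers (Host, User-Agent, ...).
    handler_order = 1000

    def __init__(self):
        self.history = []

    def http_request(self, req):
        # Store the method, the URL and a snapshot of the headers.
        self.history.append(
            (req.get_method(), req.full_url, req.header_items())
        )
        return req

    # Record HTTPS requests the same way.
    https_request = http_request


handler = HistoryHandler()
opener = build_opener(handler)
```

After a call such as `opener.open(url)`, `handler.history` would hold one entry per request that went through the opener. Redirected requests are re-issued through the same opener, so they should show up as additional entries.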
msg96706 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-12-20 18:01
+1 for HEAD
msg96708 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2009-12-20 18:14
If you know you want a HEAD request, it means you already know it
will be an HTTP request, so why not directly use httplib or httplib2
instead of urllib?
Aside: s/sended/sent/
Cheers
msg96791 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2009-12-22 02:09
Here is a discussion and explanation from the submitter on what is meant
by history of requests.
http://mail.python.org/pipermail/python-dev/2007-March/072069.html 
msg110528 - Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-16 23:45
Seems to be +1 for HEAD, -1 for history; the keyword says easy, so this should be feasible for 3.2. :)
msg112748 - Author: David Stanek (dstanek) Date: 2010-08-04 03:55
I have attached a patch to add support for HEAD, PUT and DELETE methods. The code review is available here: http://codereview.appspot.com/1696061.
I have started working on another patch that validates that the method is properly set. For instance, it doesn't make sense to have a HEAD or DELETE with post data. The problem is that the interface is so wide open that it is hard to catch all possible user errors. A user could call Request.__init__ correctly, but then set req.method to an invalid method. If there is some interest I'll finish up the patch.
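As a rough illustration of the kind of validation described above (this is not the code from the attached patch), the check could live in `get_method()`, so that it also catches a method assigned after construction. `CheckedRequest`, `KNOWN_METHODS` and `BODYLESS_METHODS` are hypothetical names, and the choice of which methods must not carry data simply mirrors the HEAD/DELETE example in this message.

```python
from urllib.request import Request

# Hypothetical sets, chosen only for this sketch.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE"}
BODYLESS_METHODS = {"HEAD", "DELETE"}


class CheckedRequest(Request):
    """Sketch: validate the method every time it is asked for."""

    def get_method(self):
        method = super().get_method()
        if method not in KNOWN_METHODS:
            raise ValueError("unknown HTTP method: %r" % (method,))
        if method in BODYLESS_METHODS and self.data is not None:
            raise ValueError("%s request must not carry data" % method)
        return method


req = CheckedRequest("http://www.python.org/", method="HEAD")
print(req.get_method())
```

Because the check runs on every `get_method()` call, setting `req.method` to something invalid later is still detected when the request is about to be sent.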
msg115366 - Author: (ipatrol) Date: 2010-09-02 12:03
Can this be somehow implemented as a bugfix patch as well on other versions?
msg115370 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-09-02 13:00
Only stable (2.7 and 3.1) and development versions (3.2) get bug fixes.
msg115372 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-02 13:42
New features can only go in 3.2. From a quick look, the patch looks ok.
msg132359 - Author: Denver Coneybeare (denversc) * Date: 2011-03-27 19:41
I decided to take a look at this old, forgotten issue and propose an updated patch. I like the submitter's idea that urllib.request.Request.__init__() should take a "method" parameter to override the return value of get_method(). I've created and attached a patch (issue1673007_urllib_Request_method_v1.patch) which implements this functionality, adds unit tests, and updates the documentation.
msg145294 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-10-10 11:20
Attached an updated patch that addresses the comments of Éric in the review and adds an entry to the whatsnew.
msg145298 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-10-10 13:42
Hi Ezio, I had probably overlooked this one, but it's a very interesting one
for me. Do you mind if I commit it?
msg145302 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-10-10 15:31
I made some more comments.
msg145329 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-10-11 03:56
We have discussed the API a bit on IRC and these are the outcomes:
1) should method always have priority or should 'POST' always be used whenever data is passed?
2) if the method is e.g. 'GET' and data is passed, should an error be raised?
3) should Request.method be private?
4) should the value of Request.method be initialized in the __init__ or can it also be None?
5) if the value of Request.method is always initialized, should we deprecate get_method?
IMHO
    r = Request(url, data, method=foo)
should behave like
    class FooRequest(Request):
        def get_method(self):
            return foo
    r = FooRequest(url, data)
when foo is not None and
    class FooRequest(Request): pass
    r = FooRequest(url, data)
when foo is None, so the answers to the 5 questions would be
1) method always has the highest priority, data is used to decide between GET and POST when method is None;
2) data is simply ignored if the method doesn't use it -- no errors are raised;
3) maybe; currently it is not (see below);
4) method matches the value passed to the constructor, defaulting to None;
5) since it's not initialized, get_method shouldn't be deprecated;
This is also what the patch implements.
Regarding 3), one might look at Request.method and see None instead of GET/POST, and this might be confusing. We could document that get_method() should be used instead to see what method will be actually used or make Request.method private and avoid the problem. If we do so there won't be a way to change the method after instance creation (but that wasn't even supported before[0], so it's probably not a problem). OTOH, the other args (e.g. data) are available and documented. I'm fine with either documenting that get_method() should be used instead or making Request.method private.
Another approach is to move the get_method() logic into __init__ and set self.method to method (if specified), or otherwise to POST (if there is data) or GET. Then get_method() would simply return self.method (and could possibly be deprecated).
[0]: it's actually possible to replace the get_method() method, but if Request.method is public one could set it to a new value or to None to restore the normal behavior (GET or POST depending on data).
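The equivalence described in this message can be checked against the `method` argument as it exists in current `urllib.request`. The snippet below is a small illustration only; `HeadRequest` and `URL` are names made up for the example, and no network access is involved.

```python
from urllib.request import Request

URL = "http://www.python.org/"  # placeholder URL, never contacted


class HeadRequest(Request):
    """The pre-existing workaround: override get_method() in a subclass."""

    def get_method(self):
        return "HEAD"


subclassed = HeadRequest(URL)
with_argument = Request(URL, method="HEAD")

# With no explicit method, the presence of data decides between GET and POST.
plain = Request(URL)
with_data = Request(URL, data=b"key=value")

for req in (subclassed, with_argument, plain, with_data):
    print(type(req).__name__, req.get_method())
```

Only `get_method()` is queried here, in line with the suggestion above that callers should use it rather than read `Request.method` directly.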
msg145331 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-10-11 05:42
Our discussion stemmed from this point. If you look at the change proposed, the Request class takes a new parameter named 'method', and it is initialized to None:
 class Request:
     def __init__(self, url, data=None, headers={},
-                 origin_req_host=None, unverifiable=False):
+                 origin_req_host=None, unverifiable=False,
+                 method=None):
But in HTTP terms that actually defaults to the GET method for the commonly used scenario where only the required parameter (url) is passed.
This happens in the get_method call:
     def get_method(self):
-        if self.data is not None:
+        """Return a string indicating the HTTP request method."""
+        if self.method is not None:
+            return self.method
+        elif self.data is not None:
             return "POST"
         else:
             return "GET"
Since it is understood that the default action of Request(url) is to do a GET, I proposed that we have Request's method parameter default to GET instead of None, so the change would look like:
 class Request:
     def __init__(self, url, data=None, headers={},
-                 origin_req_host=None, unverifiable=False):
+                 origin_req_host=None, unverifiable=False,
+                 method="GET"):
And it is more meaningful when someone is looking at the Request signature. Specifying method=None and implicitly meaning "GET" for normal situations was not intuitive to me. (This is not the case when we do not pass an explicit method arg.)
At this point, Ezio's summary of the API changes discussed becomes interesting. I read it again and it seems to me that the assumption is that get_method is an important method which determines what method should be used, and that method should be given preference over data.
My point is that get_method is a useful helper function for sending the correct method to the http.client code, which does the actual task. In the current situation, get_method "determines", based on the data parameter, whether it should send a "GET" or a "POST"; but if we start using a method argument, then get_method should just return what was initialized by the method arg (defaulting to "GET").
2) The next problem comes when a user has specified both data and method="GET". This becomes an invalid scenario, but a decision has to be taken as to what should be given preference?
- As the user has sent "data", should the request be considered a POST?
- But user specified it as "GET" (intentionally or by mistake), so should the data not be used and Request should do only a GET?
- Or should we throw an error?
My personal take on this is -1 on throwing an error; when data is sent, just do the POST (data overrides method).
BTW, this needs to be discussed irrespective of point 1). But having method="GET" could give rise to this scenario more often. A person would just send data and forget about changing the method to "POST".
Coming to specific questions which Ezio pointed:
My take:
1) should method always have priority or should 'POST' always be used whenever data is passed?
If data is passed use POST.
2) if the method is e.g. 'GET' and data is passed, should an error be raised?
Nope, give data the priority and do POST. (As urllib is currently doing)
3) should Request.method be private?
Not necessarily, it should be public.
4) should the value of Request.method be initialized in the __init__ or can it also be None?
My take - It should be initialized to default (GET), instead of None.
5) if the value of Request.method is always initialized, should we deprecate get_method?
This is an interesting question. get_method essentially becomes less useful, or it could serve as an arbiter when data and GET are sent, and may be used as a reliable way to get the Request's method. It should not be deprecated.
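To make the rule proposed in this message concrete, here is a standalone sketch of "method defaults to GET, data overrides method". `DataFirstRequest` is a hypothetical class written only for this illustration. It is not urllib's behaviour and it is not what was eventually committed.

```python
class DataFirstRequest:
    """Hypothetical model of the 'data overrides method' proposal."""

    def __init__(self, url, data=None, method="GET"):
        self.url = url
        self.data = data
        self.method = method

    def get_method(self):
        # Under this proposal, any data forces a POST.
        if self.data is not None:
            return "POST"
        return self.method


url = "http://www.python.org/"  # placeholder URL, never contacted
print(DataFirstRequest(url).get_method())
print(DataFirstRequest(url, data=b"x").get_method())
print(DataFirstRequest(url, data=b"x", method="PUT").get_method())
```

One consequence of this rule, visible in the last line, is that an explicit PUT with data would also be reported as POST.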
msg145335 - Author: Patrick Westerhoff (poke) Date: 2011-10-11 09:01
Senthil, I highly disagree with what you said:
> The next problem comes when a user has specified both data and method="GET".
> This becomes an invalid scenario, but a decision has to be taken as to what
> should be given preference?
That is incorrect, RFC2616 states that the server should ignore the message body when the request method does not define any semantics for it, but there is nothing that makes the inclusion of a message body along with the GET request method invalid.
> - As the user has sent "data", should the request be considered a POST?
No, absolutely not. Sending data via other request methods, like DELETE or PUT, is semantically correct and should be supported completely if we are going to include a way to set the request method. If I set the method to PUT, and include data, I don’t want the library to override that to POST just because I set data.
> - But user specified it as "GET" (intentionally or by mistake), so should the
> data not be used and Request should do only a GET?
If data is included and the request method is explicitly set to GET, make a GET request and include the data in the request body. It might not be a semantically good decision to send data over GET, but it still is not disallowed and as such should be possible (for whatever reasons).
> - Or should we throw an error?
We especially shouldn’t throw an error, as this is not invalid at all.
> A person would just send data and forget about changing the method to "POST".
I agree that the library should still default to POST if data is included and the request method was not explicitly set (see also below).
> 1) should method always have priority or should 'POST' always be used whenever
> data is passed?
> If data is passed use POST.
No, if data is passed and no special method is set, use POST, otherwise use the method the user specified, because that is what he expects.
> 2) if the method is e.g. 'GET' and data is passed, should an error be raised?
> Nope, give data the priority and do POST. (As urllib is currently doing)
Don't give data any priority if the method was set explicitly. Otherwise the ability to set a custom method is totally useless, especially with request methods where a message body is semantically useful (i.e. all other than HEAD and GET).
> 3) should Request.method be private?
> Not necessarily, it should be public.
Depends on the way the method will be set. Looking at the way the other request parameters are set (especially with the accessors being "deprecated"), it makes sense to leave this public.
> 4) should the value of Request.method be initialized in the __init__ or can it
> also be None?
> My take - It should be initialized to default (GET), instead of None.
Initializing it to GET will make the implementation difficult, especially if we want to continue supporting the behaviour of setting the method to POST when data is changed (and the method was not explicitly set). Unless we override the built-in property accessors I don’t think this is possible.
> 5) if the value of Request.method is always initialized, should we deprecate
> get_method?
> This is an interesting question. get_method essentially becomes less useful, or
> it could serve as an arbiter when data and GET are sent, and may be used as
> a reliable way to get the Request's method. It should not be deprecated.
To my understanding, and this is also why I provided the same patch on the duplicate bug, `get_method` is used by the other libraries to get the used request method. Unless we change the other libraries to determine the method in a different way, deprecating `get_method` won’t get us anywhere.
What I tried to respect with the patch, and I think this was also Denver’s intention, is to add this functionality without breaking the current behaviour. That behaviour is that GET is default, and POST is set as default if there is any data. The functionality requires that the request method can be set (and the default behaviour can be overridden) without looking at the data (as explained above).
Ideally I would probably like to see the functionality of `get_method` done in another library, which performs the request. I.e. check `request.method` and use that if it’s set, otherwise check `request.data` and choose either POST or GET. But again this would require far too many changes in other libraries for this simple change.
And again on the `data` property: I think the name "data" is a bit confusing. Request does not provide any encoding details on that "data", and it is actually just the request body in its original form. What I usually do in my subclass of Request is to provide a way to encode the data I pass to the constructor (often even with multipart encoding for file streams), while the `request.data` attribute to me still means "request body".
Regarding those questions on the implementation, I agree with Ezio, and as I said this is probably the only way to add this functionality without breaking previous usages. And if we break previous usages (or restrict its functionality with weird priority rules based on the data), we better not add this functionality at all.
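A subclass along the lines described above (encode the data in the constructor, keep `request.data` as the raw request body) might look like the following sketch. `FormRequest` and its `fields` parameter are hypothetical names, and URL-encoding is just one possible encoding picked for the example. It also shows an explicitly set method taking priority over the presence of data, as argued in this message.

```python
from urllib.parse import urlencode
from urllib.request import Request


class FormRequest(Request):
    """Hypothetical subclass: encode form fields into the request body."""

    def __init__(self, url, fields=None, **kwargs):
        body = None
        if fields is not None:
            # The encoded bytes become the request body (Request.data).
            body = urlencode(fields).encode("ascii")
        super().__init__(url, data=body, **kwargs)


url = "http://www.python.org/"  # placeholder URL, never contacted

# An explicit method is kept even though a body is present.
put = FormRequest(url, {"name": "value"}, method="PUT")

# Without an explicit method, a body still means POST.
post = FormRequest(url, {"name": "value"})

print(put.get_method(), put.data)
print(post.get_method(), post.data)
```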
msg145337 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-10-11 10:02
Patrick,
Lots of valid points. I had not looked at the RFC when I mentioned data over the request (GET) method; I was trying to derive the current functionality of the module (so that users can have a seamless experience) with an additional method="GET" as default. Note, my intention was to be explicit when we give the method arg.
But yeah, when the user has specified the method (PUT/DELETE etc.) and given data, the correct rules should apply on how that method deals with data.
As you pointed to the RFC, I realize it clearly says the data (message-body) should be ignored and the method should be given preference whenever the specification of the method has nothing to do with data. I take back my argument on giving data preference even over GET.
Now the question arises: can we in any way default to method="GET" and maintain compatibility as well as consistency with user expectations? At the moment, just by sending data with a Request, the method is assumed to be POST. If that is not possible, then the way the current patch does it seems to be a good way to achieve the purpose.
msg145622 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-16 15:55
New changeset 0a0aafaa9bf5 by Senthil Kumaran in branch 'default':
Fix closes issue 1673007 urllib.request to support HEAD requests with a new method arg.
http://hg.python.org/cpython/rev/0a0aafaa9bf5 
msg145623 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-10-16 15:56
I have committed the patch, accommodating the doc review comments which Ezio had mentioned. At the moment, the current way seems to be the most backwards compatible one, and I did not want to delay it. Let's hope we get some feedback on this method arg. Thanks!
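As a hedged illustration of the committed `method` argument in use, the sketch below sends a HEAD and a GET request to a throwaway local server, so it does not depend on any external site. `DemoHandler`, `fetch` and `demo` are names invented for this example, and the empty `ProxyHandler` is there only on the assumption that proxy settings from the environment should not interfere with a loopback request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class DemoHandler(BaseHTTPRequestHandler):
    """Tiny local server that answers HEAD and GET with the same headers."""

    BODY = b"hello"

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_HEAD(self):
        # HEAD: headers only, no body.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url, method):
    """Send one request with an explicit method; return (status, body)."""
    opener = build_opener(ProxyHandler({}))  # ignore environment proxies
    with opener.open(Request(url, method=method), timeout=5) as resp:
        return resp.status, resp.read()


def demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        return fetch(url, "HEAD"), fetch(url, "GET")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The HEAD response carries the same headers as the GET response but an empty body, which is the behaviour the original report asked for with `len(fd.read())` being 0.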
msg145686 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-10-17 13:07
Attached patch changes one occurrence of ugly whitespace, changes "not x == y" to "x != y" and "not x in y" to "x not in y". Senthil, feel free to apply none, some or all of these minor cleanups.
msg145734 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-10-17 17:19
Hi Eric, 
The changes suggested in the patch are good for readability, I
shall include them all. Thanks!
msg145840 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-10-18 16:42
Doc patch to fix a reST error and tweak a few things.
History
Date | User | Action | Args
2022-04-11 14:56:22 | admin | set | github: 44649
2012-02-29 13:09:45 | eric.araujo | link | issue9845 superseder
2011-10-18 16:42:25 | eric.araujo | set | files: + urllib.request-doc.diff; messages: + msg145840
2011-10-17 17:19:40 | orsenthil | set | messages: + msg145734
2011-10-17 13:07:27 | eric.araujo | set | files: + urllib.request-cleanup.diff; messages: + msg145686
2011-10-16 15:56:59 | orsenthil | set | messages: + msg145623
2011-10-16 15:55:00 | python-dev | set | status: open -> closed; nosy: + python-dev; messages: + msg145622; resolution: fixed; stage: commit review -> resolved
2011-10-11 10:02:08 | orsenthil | set | messages: + msg145337
2011-10-11 09:01:26 | poke | set | messages: + msg145335
2011-10-11 05:42:18 | orsenthil | set | messages: + msg145331
2011-10-11 03:56:52 | ezio.melotti | set | messages: + msg145329
2011-10-11 02:38:12 | ezio.melotti | set | assignee: ezio.melotti -> orsenthil
2011-10-10 15:31:09 | eric.araujo | set | messages: + msg145302
2011-10-10 13:42:37 | orsenthil | set | messages: + msg145298
2011-10-10 11:20:41 | ezio.melotti | set | files: + issue1673007.diff; assignee: orsenthil -> ezio.melotti; keywords: + needs review; nosy: - BreamoreBoy; messages: + msg145294; stage: patch review -> commit review
2011-10-09 22:44:22 | poke | set | nosy: + poke
2011-10-09 22:36:59 | ezio.melotti | link | issue13142 superseder
2011-03-28 14:47:13 | brian.curtin | link | issue8150 superseder
2011-03-28 14:45:43 | brian.curtin | set | nosy: + brian.curtin
2011-03-27 21:26:53 | santoso.wijaya | set | nosy: + santoso.wijaya; versions: + Python 3.3, - Python 3.2
2011-03-27 19:41:39 | denversc | set | files: + issue1673007_urllib_Request_method_v1.patch; nosy: + denversc; messages: + msg132359
2010-09-02 13:42:47 | pitrou | set | nosy: + pitrou; messages: + msg115372; versions: - Python 3.1, Python 2.7
2010-09-02 13:00:09 | eric.araujo | set | messages: + msg115370
2010-09-02 12:03:47 | ipatrol | set | nosy: + ipatrol; messages: + msg115366; versions: + Python 3.1, Python 2.7
2010-08-09 17:42:16 | jackdied | set | nosy: + jackdied
2010-08-04 03:55:13 | dstanek | set | files: + 1673007_methods.patch; nosy: + dstanek; messages: + msg112748
2010-07-16 23:45:13 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg110528; versions: - Python 2.7
2009-12-22 02:09:39 | orsenthil | set | messages: + msg96791
2009-12-20 18:14:42 | eric.araujo | set | nosy: + eric.araujo; messages: + msg96708
2009-12-20 18:01:55 | ezio.melotti | set | messages: + msg96706; versions: + Python 3.2
2009-12-20 17:59:47 | orsenthil | set | messages: + msg96705
2009-10-19 02:56:48 | ezio.melotti | set | nosy: + ezio.melotti
2009-10-17 01:22:16 | orsenthil | set | assignee: orsenthil
2009-04-22 17:25:51 | ajaksu2 | set | keywords: + easy
2009-02-13 01:21:08 | ajaksu2 | set | nosy: + jjlee
2009-02-12 18:13:41 | ajaksu2 | set | nosy: + ajaksu2, orsenthil; stage: patch review; type: enhancement; messages: + msg81793; versions: + Python 2.7, - Python 2.6
2007-03-03 14:01:08 | koder_ua | create |
