Issue 23498: Expose http.cookiejar.split_header_words()


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/67686

classification

Title:	Expose http.cookiejar.split_header_words()
Type:	enhancement	Stage:
Components:	Library (Lib)	Versions:

process

Dependencies:	Superseder:
Status:	open	Resolution:
Assigned To:	orsenthil	Nosy List:	berker.peksag, martin.panter, orsenthil, r.david.murray
Priority:	normal	Keywords:

Created on 2015-02-22 00:58 by martin.panter, last changed 2022-04-11 14:58 by admin.


Messages (1)

msg236397

Author: Martin Panter (martin.panter) * (Python committer)

Date: 2015-02-22 00:58

I propose to document split_header_words() so that it can be used to parse various kinds of HTTP-based header fields. Perhaps it should live in a more general module like "http", or "email.policy.HTTP" (hinted at in Issue 3609). Perhaps there is also room for finding a better name, such as parse_header_attributes() or something, since splitting space-separated words is not its most important property.
The function takes a series of header field values, as returned from Message.get_all(failobj=()). The field values may be separate strings and may also be comma-separated. It parses space- or semicolon-separated name=value attributes from each field value. Examples:
RFC 2965 Set-Cookie2 fields:
>>> import http.cookiejar
>>> from pprint import pprint
>>> cookies = (
... 'Cookie1="VALUE";Version=1;Discard, Cookie2="Same field";Version=1',
... 'Cookie3="Separate header field";Version=1',
... )
>>> pprint(http.cookiejar.split_header_words(cookies))
[[('Cookie1', 'VALUE'), ('Version', '1'), ('Discard', None)],
 [('Cookie2', 'Same field'), ('Version', '1')],
 [('Cookie3', 'Separate header field'), ('Version', '1')]]
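To show where such field values might come from, here is a minimal sketch that parses a raw header block with email.message_from_string() and hands the result of Message.get_all() to split_header_words(). The raw header text and the variable names are made up for illustration and are not taken from a real response.

```python
import email
import http.cookiejar

# Hypothetical raw header block: two Set-Cookie2 fields, the first one
# holding two comma-separated cookies.
raw_headers = (
    'Set-Cookie2: Cookie1="VALUE";Version=1;Discard, '
    'Cookie2="Same field";Version=1\n'
    'Set-Cookie2: Cookie3="Separate header field";Version=1\n'
    '\n'
)
message = email.message_from_string(raw_headers)

# failobj=() yields an empty sequence when the field is absent, so the
# result can always be passed straight to split_header_words().
values = message.get_all("Set-Cookie2", failobj=())
parsed = http.cookiejar.split_header_words(values)

# A field that is not present simply produces an empty result.
missing = http.cookiejar.split_header_words(
    message.get_all("Transport", failobj=())
)
```

Here parsed should hold the same three attribute lists as in the example above, and missing should be an empty list.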
RTSP 1.0 (RFC 2326) Transport header field:
>>> transport = 'RTP/AVP;unicast;mode="PLAY, RECORD", RTP/AVP/TCP;interleaved=0-1'
>>> pprint(http.cookiejar.split_header_words((transport,)))
[[('RTP/AVP', None), ('unicast', None), ('mode', 'PLAY, RECORD')],
 [('RTP/AVP/TCP', None), ('interleaved', '0-1')]]
The parsing of spaces seems to be an attempt to parse headers like WWW-Authenticate, although it mixes up the parameters when given this example from RFC 7235:
>>> auth = 'Newauth realm="apps", type=1, title="Login to \\"apps\\"", Basic realm="simple"'
>>> pprint(http.cookiejar.split_header_words((auth,)))
[[('Newauth', None), ('realm', 'apps')],
 [('type', '1')],
 [('title', 'Login to "apps"')],
 [('Basic', None), ('realm', 'simple')]]
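One possible workaround for the mixed-up grouping, sketched here only as an illustration: regroup the output after the fact. regroup_challenges() is a hypothetical helper, not part of the standard library, and it relies on the assumption that a group whose first pair has no value starts a new challenge, while any other group carries parameters of the preceding challenge.

```python
import http.cookiejar


def regroup_challenges(groups):
    """Hypothetical helper: rebuild challenges from split_header_words() output.

    Heuristic assumption: a group whose first pair has no value starts a
    new challenge (that pair's name is the scheme); any other group holds
    parameters that belong to the preceding challenge.
    """
    challenges = []
    for group in groups:
        first_name, first_value = group[0]
        if first_value is None:
            # New challenge: scheme name followed by its first parameters.
            challenges.append((first_name, dict(group[1:])))
        elif challenges:
            # Parameters split off by a comma; merge into the last challenge.
            challenges[-1][1].update(group)
        else:
            # Parameters with no preceding scheme; keep them under None.
            challenges.append((None, dict(group)))
    return challenges


auth = 'Newauth realm="apps", type=1, title="Login to \\"apps\\"", Basic realm="simple"'
challenges = regroup_challenges(http.cookiejar.split_header_words((auth,)))
```

For the RFC 7235 example this should give one "Newauth" challenge with realm, type and title, followed by one "Basic" challenge with realm. The heuristic does not attempt to handle token68 credentials or parameters that legitimately have no value, so it is not a general WWW-Authenticate parser.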
Despite that, the function is still very useful for parsing many kinds of header fields that use semicolons. All the alternatives in the standard library that I know of have disadvantages:
* cgi.parse_header() does not split comma-separated values apart, and ignores any attribute without an equals sign, such as "Discard" and "unicast" above
* email.message.Message.get_params() and get_param() do not split comma-separated values either, and parsing header values other than the first one in a Message object is awkward
* email.headerregistry.ParameterizedMIMEHeader looks relevant, but I couldn’t figure out how to use it
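A small sketch of the second point, comparing Message.get_params() with split_header_words() on the first Set-Cookie2 value from above. The variable names are illustrative only. cgi.parse_header() is not shown because the cgi module was removed in Python 3.13.

```python
import email.message
import http.cookiejar

value = 'Cookie1="VALUE";Version=1;Discard, Cookie2="Same field";Version=1'

message = email.message.Message()
message["Set-Cookie2"] = value

# get_params() splits on semicolons only, so the comma that separates the
# two cookies ends up inside a parameter name.
params = message.get_params(header="Set-Cookie2")
names_with_comma = [name for name, _ in params if "," in name]

# split_header_words() treats the comma as a boundary and yields two groups.
groups = http.cookiejar.split_header_words([value])
```

With get_params(), "Discard" and "Cookie2" are expected to be fused into a single parameter name, whereas split_header_words() returns one attribute list per cookie.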

History
Date	User	Action	Args
2022-04-11 14:58:13	admin	set	github: 67686
2021-04-27 01:30:40	orsenthil	set	assignee: orsenthil
2015-02-22 18:55:44	berker.peksag	set	nosy: + orsenthil, r.david.murray, berker.peksag
2015-02-22 00:58:10	martin.panter	create
