Message 237411 - Python tracker


In-reply-to
Author	yaaboukir
Recipients	PaulMcMillan, benjamin.peterson, martin.panter, orsenthil, pitrou, python-dev, soilandreyes, vstinner, yaaboukir
Date	2015-03-07.02:53:09
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1425696791.58.0.0875698066931.issue23505@psf.upfronthosting.co.za>

Content

From: cve-assign () mitre org
Date: Thu, 5 Mar 2015 16:42:02 -0500 (EST)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
We think that the issue reduces to the question of whether it's
acceptable for urlparse to provide inconsistent information about the
structure of a URL.
https://docs.python.org/2/library/urlparse.html says:
 urlparse.urlparse(urlstring[, scheme[, allow_fragments]])
 Parse a URL into six components, returning a 6-tuple. This
 corresponds to the general structure of a URL:
 scheme://netloc/path;parameters?query#fragment.
 urlparse.urlunparse(parts)
 Construct a URL from a tuple as returned by urlparse(). The parts
 argument can be any six-item iterable. This may result in a
 slightly different, but equivalent URL, if the URL that was parsed
 originally had unnecessary delimiters (for example, a ? with an
 empty query; the RFC states that these are equivalent).
The first issue is that the urlunparse documentation is ambiguous. We
believe the reasonable interpretation is that there is a missing third
sentence: "This ALWAYS results in a URL that is either identical or
equivalent to the URL that was parsed originally." There's another
interpretation that we believe is unreasonable: "This may result in a
slightly different, but equivalent URL, if the URL that was parsed
originally had unnecessary delimiters. If the URL that was parsed
originally did not have unnecessary delimiters, then the behavior of
urlunparse is UNDEFINED."
So, our expectation is that urlunparse(urlparse(original_url)) should
not have any significant effect on the meaning of original_url. We
also think that a Python user should be able to rely on that property
to make security-relevant decisions. To simplify the situation, consider
a case where the URL is used exclusively within Python code, and is
never accessed by any web browser.
The actual behavior is:
 >>> from urlparse import urlparse, urlunparse
 >>> print urlparse("////example.com")
 ParseResult(scheme='', netloc='', path='//example.com', params='', query='', fragment='')
 >>> print urlparse(urlunparse(urlparse("////example.com")))
 ParseResult(scheme='', netloc='example.com', path='', params='', query='', fragment='')
 >>> print urlparse(urlunparse(urlparse(urlunparse(urlparse("////example.com")))))
 ParseResult(scheme='', netloc='example.com', path='', params='', query='', fragment='')
Here, urlunparse(urlparse(original_url)) does have a significant
effect on the meaning of original_url. The Python user may have wanted
to make a security-relevant decision based on whether netloc was an
empty string. However, netloc is different depending on whether
urlunparse(urlparse(original_url)) occurs at least once. The user's
application (suppose it's called "PyNetlocExaminer") is affected in a
security-relevant way.
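
The round-trip property discussed above can be checked directly. The
following is a minimal sketch for Python 3, where these functions live in
urllib.parse rather than in the urlparse module. The helper name
roundtrip_is_stable is hypothetical, and the result it reports for
"////example.com" may depend on the Python version in use.

```python
from urllib.parse import urlparse, urlunparse


def roundtrip_is_stable(url):
    """Return True if parse -> unparse -> parse yields the same components.

    Hypothetical helper, not part of the standard library.
    """
    first = urlparse(url)
    second = urlparse(urlunparse(first))
    return first == second


if __name__ == "__main__":
    # The second candidate is the input from the session above.
    for candidate in ("http://example.com/a?b=1#c", "////example.com"):
        print(candidate, roundtrip_is_stable(candidate))
```

A False result for a given input means that components such as netloc
cannot be trusted to stay the same once the URL has been rebuilt and
parsed again.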
The next question is, if there is a CVE for a report of a
security-relevant problem, what product is named as the primary
affected product within that CVE. There is no perfect answer to this
question. Especially in the case of a general-purpose language such as
Python, there's an extremely wide range of bugs that might become
security-relevant in some applications. What we usually try to do is
make the CVE useful to users who may need to perform a software
update. Specifically:
 1. If the language implementation is not ever going to be changed
 (for example: because the language maintainer believes the
 observed behavior has always been correct, or the language
 maintainer believes that it has retroactively become correct
 because any change would break compatibility with other
 applications), then the application is named as the primary
 affected product in the CVE. In other words, if the inconsistency
 between netloc='' and netloc='example.com' were actually the
 intended behavior all along, then PyNetlocExaminer would be named
 in the CVE. Here, realistically, the end user would need to
 update or manually fix PyNetlocExaminer.
 2. If the language implementation is incorrect and is planned to be
 changed at some point, and that would eliminate the
 security-relevant problem, then the language implementation is
 named in the CVE. (An application might also be named in the CVE,
 especially if there are very few affected applications.) This
 option occurs regardless of whether the language maintainer
 believes that it is a language vulnerability. (The language
 maintainer has the option of composing a dispute that would be
 appended to the CVE.) Here, the end user may ultimately decide to
 address the problem by updating their Python installation, not by
 updating PyNetlocExaminer.
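
For the first case above, where the application rather than the language
implementation would have to change, one possible manual fix is for the
application to stop trusting a single parse. The sketch below is only an
illustration under that assumption: the function name
netloc_is_reliably_empty is hypothetical, it uses the Python 3 urllib.parse
module, and it is not taken from any real application.

```python
from urllib.parse import urlparse, urlunparse


def netloc_is_reliably_empty(url):
    """Return True only if netloc is empty before and after a round trip.

    Hypothetical defensive check for an application that makes a
    security-relevant decision based on an empty netloc.
    """
    first = urlparse(url)
    # A path that starts with "//" is the pattern that can reappear as a
    # netloc once the URL is rebuilt and parsed again, so reject it early.
    if first.netloc or first.path.startswith("//"):
        return False
    second = urlparse(urlunparse(first))
    return not second.netloc


if __name__ == "__main__":
    for candidate in ("/local/path", "////example.com", "//example.com/x"):
        print(candidate, netloc_is_reliably_empty(candidate))
```

The early rejection of paths beginning with "//" makes the answer for
"////example.com" independent of how urlunparse rebuilds that input.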
Again, this is imperfect. It works best in the relatively common case
where a language bug has security relevance in many applications. It
might work especially poorly in a case where a language bug has
security relevance in exactly one application. However, it seems
preferable to do the above consistently, rather than make the outcome
depend on application populations, or depend on reaching universal
agreement about what code should have been written differently.
- -- 
CVE assignment team, MITRE CVE Numbering Authority
M/S M300
202 Burlington Road, Bedford, MA 01730 USA
[ PGP key available through http://cve.mitre.org/cve/request_id.html ]

History
Date	User	Action	Args
2015-03-07 02:53:11	yaaboukir	set	recipients: + yaaboukir, orsenthil, pitrou, vstinner, benjamin.peterson, python-dev, martin.panter, PaulMcMillan, soilandreyes
2015-03-07 02:53:11	yaaboukir	set	messageid: <1425696791.58.0.0875698066931.issue23505@psf.upfronthosting.co.za>
2015-03-07 02:53:11	yaaboukir	link	issue23505 messages
2015-03-07 02:53:09	yaaboukir	create