| Author | mgiuca |
|---|---|
| Recipients | mgiuca |
| Date | 2008年07月12日.13:46:13 |
| SpamBayes Score | 0.00023638742 |
| Marked as misclassified | No |
| Message-id | <1215870377.1.0.267386206374.issue3347@psf.upfronthosting.co.za> |
| In-reply-to | |

Content:
urllib.robotparser is broken in Python 3.0, due to a bytes object
appearing where a str is expected.
Example:

```
>>> import urllib.robotparser
>>> r = urllib.robotparser.RobotFileParser('http://www.python.org/robots.txt')
>>> r.read()
TypeError: expected an object with the buffer interface
```
This is because the variable f in RobotFileParser.read is opened by
urlopen as a binary file, so f.read() returns a bytes object.
I've included a patch which checks whether the value is a bytes object and, if
so, decodes it as UTF-8. A more thorough fix might determine the charset of the
document (from f.headers['Content-Type']), but at least this works and will be
sufficient in almost all cases.
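Purely as illustration of the idea described above (not the attached patch itself), the decode step could look roughly like this; the charset lookup in the comments is the hypothetical "more thorough" variant mentioned above:

```python
import urllib.request
import urllib.robotparser

# Sketch only: the attached patch makes the equivalent change inside
# RobotFileParser.read() itself.
f = urllib.request.urlopen('http://www.python.org/robots.txt')
raw = f.read()                        # bytes in Python 3
if isinstance(raw, bytes):
    raw = raw.decode('utf-8')         # simple fix: assume UTF-8
    # More thorough (hypothetical): honour the declared charset, e.g.
    # charset = f.headers.get_content_charset() or 'utf-8'
    # raw = raw.decode(charset)
parser = urllib.robotparser.RobotFileParser()
parser.parse(raw.splitlines())
```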
Also, there are currently no test cases for urllib.robotparser.
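For what such a test could look like, here is a minimal sketch (the class and file layout are hypothetical, not an existing test) that exercises the parser with a canned robots.txt rather than a live fetch:

```python
import unittest
import urllib.robotparser

class RobotFileParserTest(unittest.TestCase):
    def test_parse_and_can_fetch(self):
        # Parse a canned robots.txt instead of downloading one.
        lines = ["User-agent: *", "Disallow: /private/"]
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(lines)
        self.assertFalse(
            parser.can_fetch("TestBot", "http://www.example.com/private/page.html"))
        self.assertTrue(
            parser.can_fetch("TestBot", "http://www.example.com/index.html"))

if __name__ == "__main__":
    unittest.main()
```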
Patch (robotparser.py.patch) is for branch /branches/py3k, revision 64891.
Commit log:
Lib/urllib/robotparser.py: Fixed robotparser for Python 3.0. urlopen
returns bytes objects where str is expected. Decode the bytes using UTF-8.
History:

| Date | User | Action | Args |
|---|---|---|---|
| 2008年07月12日 13:46:17 | mgiuca | set | spambayes_score: 0.000236387 -> 0.00023638742 recipients: + mgiuca |
| 2008年07月12日 13:46:17 | mgiuca | set | spambayes_score: 0.000236387 -> 0.000236387 messageid: <1215870377.1.0.267386206374.issue3347@psf.upfronthosting.co.za> |
| 2008年07月12日 13:46:15 | mgiuca | link | issue3347 messages |
| 2008年07月12日 13:46:14 | mgiuca | create | |