Message138893

Author: Devin Jeanpierre
Recipients: Devin Jeanpierre, benjamin.peterson, petri.lehtinen, r.david.murray, tim.peters
Date: 2011-06-24 08:40:44
SpamBayes Score: 2.4577007e-12
Marked as misclassified: No
Message-id: <1308904845.51.0.466556934052.issue11909@psf.upfronthosting.co.za>
In-reply-to:

Content:
You're right, and good catch. If a doctest starts with a "# coding: XXX" line, this will break.
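To make the failure mode concrete, here's a minimal sketch (the source string is hypothetical): tokenize's byte-stream entry point runs PEP 263 encoding detection, so a coding line at the top of a doctest's source changes how the rest of it gets decoded.

```python
import io
import tokenize

# Hypothetical doctest source that declares its own encoding.
src = b"# coding: latin-1\nname = 'caf\xe9'\n"

# tokenize.detect_encoding() is what tokenize.tokenize() uses under
# the hood; it honors the coding cookie on line 1 (normalizing the
# name), so the rest of the stream is decoded as Latin-1, not UTF-8.
encoding, _lines = tokenize.detect_encoding(io.BytesIO(src).readline)
print(encoding)
```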
One option is to replace the call to tokenize.tokenize with a call to tokenize._tokenize and pass 'utf-8' as a parameter. Downside: that's a private, undocumented API. The alternative is to manually prepend a coding line that specifies UTF-8, so that any coding line in the doctest itself would be ignored.
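A sketch of the second option (the doctest source here is hypothetical): PEP 263 detection stops at the first coding cookie it finds, so a UTF-8 cookie prepended as line 1 shadows any cookie inside the doctest.

```python
import io
import tokenize

# Hypothetical doctest source with its own (conflicting) coding line.
doctest_src = "# coding: latin-1\nx = 1\n"

# Prepend a UTF-8 coding line; detection finds it on line 1 and
# never looks at the inner latin-1 cookie.
prefixed = "# coding: utf-8\n" + doctest_src
readline = io.BytesIO(prefixed.encode("utf-8")).readline
encoding, _lines = tokenize.detect_encoding(readline)
print(encoding)  # the inner latin-1 cookie is ignored
```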
My preferred option would be to add the ability to read unicode to the tokenize API, and then use that. I can file a separate ticket if that sounds good, since it's probably useful to others too.
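For what it's worth, current CPython already has machinery in this direction, though it went undocumented for a long time: tokenize.generate_tokens() accepts a readline that returns str, so tokenization runs on already-decoded text and no encoding detection (hence no coding-line handling) takes place. A sketch:

```python
import io
import tokenize

# Tokenize decoded text directly; no bytes, no PEP 263 detection.
src = "x = 1\ny = 2\n"
names = [tok.string
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)
         if tok.type == tokenize.NAME]
print(names)
```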
One other thing to be worried about: I'm not sure how doctest would treat tests with leading "coding: XXX" lines. I'd hope it ignores them; if it doesn't, then this is more complicated and the above approaches wouldn't work.
I'll see if I have the time to play around with this (and add more test cases to the patch, correspondingly) this weekend.