
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Classification
Title: Doctest sees directives in strings when it should only see them in comments
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.3
Process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Devin Jeanpierre, benjamin.peterson, petri.lehtinen, r.david.murray, tim.peters
Priority: normal Keywords: patch

Created on 2011-04-22 19:31 by Devin Jeanpierre, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description
comments.diff	Devin Jeanpierre, 2011-04-22 19:31	patch to tip
comments2.diff	Devin Jeanpierre, 2011-07-03 23:15
comments3.diff	Devin Jeanpierre, 2011-07-04 03:45
Messages (8)
msg134278 - (view) Author: Devin Jeanpierre (Devin Jeanpierre) * Date: 2011-04-22 19:31
From the doctest source:
'Option directives are comments starting with "doctest:". Warning: this may give false positives for string-literals that contain the string "#doctest:". Eliminating these false positives would require actually parsing the string; but we limit them by ignoring any line containing "#doctest:" that is *followed* by a quote mark.'
This isn't a huge deal, but it is a bit annoying. Beyond being confusing, it contradicts the doctest documentation, which states:
'Doctest directives are expressed as a special Python comment following an example's source code'
No mention is made of this corner case where the regexp breaks.
As per the comment in the source, the patched version parses the source using the tokenize module and runs a modified directive regex on all comment tokens to find directives.
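The approach can be illustrated with a small sketch (the regex and the function name here are hypothetical illustrations, not the actual patch): tokenize the example source and apply the directive pattern only to COMMENT tokens, so a "#doctest:" inside a string literal can never produce a false positive.

```python
import io
import re
import tokenize

# Hypothetical directive pattern, in the spirit of doctest's: a comment
# beginning with "doctest:" followed by the option text.
DIRECTIVE_RE = re.compile(r'#\s*doctest:\s*([^\n\'"]*)')

def find_directives(source):
    """Return the option text of every doctest directive comment."""
    directives = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok_type, tok_string, _, _, _ in tokens:
        # Only COMMENT tokens are inspected; STRING tokens are skipped,
        # which is exactly what eliminates the false positives.
        if tok_type == tokenize.COMMENT:
            m = DIRECTIVE_RE.match(tok_string)
            if m:
                directives.append(m.group(1).strip())
    return directives

src = 's = "text with #doctest: +SKIP inside"  # doctest: +ELLIPSIS\n'
print(find_directives(src))  # only the real comment's directive is found
```

The line-based regex in the original code cannot make this distinction, because by the time it runs, a "#doctest:" inside a string and one inside a comment look identical.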
msg138780 - (view) Author: Petri Lehtinen (petri.lehtinen) * (Python committer) Date: 2011-06-21 11:10
The patch looks good to me. It passes the old doctests tests and adds a new test case for what it's fixing.
msg138886 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-24 01:45
For the most part the patch looks good to me, too. My one concern is the encoding: tokenize detects the encoding itself, so is it possible for a doctest fragment to be detected as some encoding other than utf-8?
msg138893 - (view) Author: Devin Jeanpierre (Devin Jeanpierre) * Date: 2011-06-24 08:40
You're right, and good catch. If a doctest starts with a "#coding:XXX" line, this would break.
One option is to replace the call to tokenize.tokenize with a call to tokenize._tokenize and pass 'utf-8' as a parameter. Downside: that's a private and undocumented API. The alternative is to manually prepend a coding line that specifies UTF-8, so that any coding line in the doctest would be ignored.
My preferred option would be to add the ability to read unicode to the tokenize API, and then use that. I can file a separate ticket if that sounds good, since it's probably useful to others too.
One other thing to worry about: I'm not sure how doctest treats tests with leading "coding:XXX" lines. I'd hope it ignores them; if it doesn't, this is more complicated and the above approaches wouldn't work.
I'll see if I have the time to play around with this (and add corresponding test cases to the patch) this weekend.
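The encoding concern can be demonstrated directly (a sketch of the behavior, not the patch itself): the bytes-based tokenize.tokenize() honours a PEP 263 coding cookie via tokenize.detect_encoding(), while the str-based tokenize.generate_tokens() performs no sniffing at all and sees the cookie as an ordinary comment.

```python
import io
import tokenize

# The bytes API sniffs the coding cookie, so this fragment would be
# decoded as Latin-1 rather than UTF-8.
fragment = b"# coding: latin-1\nx = 1\n"
encoding, _ = tokenize.detect_encoding(io.BytesIO(fragment).readline)
print(encoding)  # detect_encoding normalizes "latin-1" to "iso-8859-1"

# The str API does no sniffing: the cookie is just a COMMENT token.
tokens = list(tokenize.generate_tokens(io.StringIO(fragment.decode("ascii")).readline))
print(tokens[0].type == tokenize.COMMENT)  # True
```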
msg138948 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-24 14:16
I agree that having a unicode API for tokenize seems to make sense, and that would indeed require a separate issue.
That's a good point about doctest not otherwise supporting coding cookies. Those really only apply to source files, so no doctest fragment ought to contain a coding cookie at the start, and your patch ought to be fine. But I'm not familiar with the doctest internals, so having some tests to prove everything is fine would be great.
Your code could use the tokenize sniffer to make sure the fragment reads as utf-8 and throw an error otherwise. But using a unicode interface to tokenize would probably be cleaner, since I suspect it would mimic what doctest does otherwise (ignore coding cookies). But I don't *know* the latter, so your checking it would be appreciated.
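The "sniff and reject" alternative mentioned above could look roughly like this (check_fragment_encoding is a hypothetical name for illustration, not part of doctest or the patch):

```python
import io
import tokenize

def check_fragment_encoding(source_bytes):
    """Raise if the fragment declares an encoding other than UTF-8."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    if encoding not in ("utf-8", "utf-8-sig"):
        raise ValueError("doctest fragment declares a non-UTF-8 encoding: %r"
                         % encoding)
    return encoding

# With no cookie and no BOM, detect_encoding defaults to UTF-8.
print(check_fragment_encoding(b"x = 1\n"))
# An explicit UTF-8 cookie is also accepted.
print(check_fragment_encoding(b"# coding: utf-8\nx = 1\n"))
```

A unicode interface to tokenize would avoid this check entirely, since a str input has no encoding left to sniff.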
msg139715 - (view) Author: Devin Jeanpierre (Devin Jeanpierre) * Date: 2011-07-03 23:15
Updated the patch to the newest revision; it now uses the _tokenize function and includes a test case verifying that the encoding directive is ignored during the tokenization (and every other) step.
I'll file a tokenize bug separately.
msg139721 - (view) Author: Devin Jeanpierre (Devin Jeanpierre) * Date: 2011-07-04 00:40
Erp, I forgot to run this against the rest of the tests. Disregard; I'll fix it up a bit later.
msg139732 - (view) Author: Devin Jeanpierre (Devin Jeanpierre) * Date: 2011-07-04 03:45
Updated.
History
Date	User	Action	Args
2022-04-11 14:57:16	admin	set	github: 56118
2011-07-04 03:45:34	Devin Jeanpierre	set	files: + comments3.diff; messages: + msg139732
2011-07-04 00:40:33	Devin Jeanpierre	set	messages: + msg139721
2011-07-03 23:15:50	Devin Jeanpierre	set	files: + comments2.diff; messages: + msg139715
2011-06-24 14:16:25	r.david.murray	set	messages: + msg138948
2011-06-24 08:40:44	Devin Jeanpierre	set	messages: + msg138893
2011-06-24 01:45:14	r.david.murray	set	nosy: + r.david.murray, benjamin.peterson; messages: + msg138886
2011-06-21 11:10:42	petri.lehtinen	set	nosy: + tim.peters, petri.lehtinen; messages: + msg138780
2011-04-22 19:47:07	r.david.murray	set	stage: patch review; type: enhancement; versions: + Python 3.3
2011-04-22 19:31:22	Devin Jeanpierre	create
