This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2010年10月13日 15:43 by belopolsky, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| Calendar for 2010.pdf | belopolsky, 2010年10月13日 15:43 | |||
| calendar.diff | doerwalter, 2010年10月13日 16:20 | review | ||
| calendar_bytes_output.patch | lambacck, 2010年11月22日 03:25 | review | ||
| calendar_bytes_output_v2.patch | lambacck, 2010年11月22日 16:55 | review | ||
| test_calendar_byteoutput.patch | catherine, 2011年07月31日 03:34 | patch to test_calendar.py | review | |
| Messages (31) | |||
|---|---|---|---|
| msg118532 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010年10月13日 15:43 | |
$ python3 -m calendar --type=html 2010 Produces HTML which renders into the attached PDF file. It looks like bytes vs. strings issue. |
|||
| msg118533 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010年10月13日 16:00 | |
The issue is clearly with using print() for bytes' output: if len(args) == 1: print(cal.formatyearpage(datetime.date.today().year, **optdict)) I am not sure, however how this can be fixed because calendar interface allows user to specify the encoding to be used for HTML and that encoding may not be the same or even compatible with sys.stdout encoding. I would suggest adding -o/--output option to calendar CLI to specify the output file and make it required when specified encoding is not "compatible" with that of sys.stdout. What constitutes "compatible" still needs to be defined. |
|||
| msg118536 - (view) | Author: Walter Dörwald (doerwalter) * (Python committer) | Date: 2010年10月13日 16:20 | |
Does the following patch fix your problems? |
|||
| msg118542 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010年10月13日 16:42 | |
On Wed, Oct 13, 2010 at 12:20 PM, Walter Dörwald <report@bugs.python.org> wrote: .. > Does the following patch fix your problems? > Probably not because with your patch print() will encode HTML stream with the system default encoding which may be different from that specified by -e/--encode argument to calendar. The result would be an ill-formed XML stream with encoding specified by the meta-tag not matching the actual encoding. |
|||
| msg118555 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010年10月13日 18:13 | |
There is a similar issue with printing text calendar with specified encoding: $ python3 -m calendar --encoding=utf8 b' 2010\n\n .. |
|||
| msg118576 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010年10月13日 20:13 | |
Note that .decode(encoding) calls have been deliberately removed in py3k: see issue 3059. |
|||
| msg122082 - (view) | Author: Chris Lambacher (lambacck) * | Date: 2010年11月22日 03:25 | |
I have attached a patch that conditionally uses sys.stdout.buffer.write to output binary (encoded) data to stdout. |
|||
| msg122084 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010年11月22日 03:35 | |
It may be acceptable, but your patch elides the line breaks in output. Can you add unit tests? |
|||
| msg122131 - (view) | Author: Chris Lambacher (lambacck) * | Date: 2010年11月22日 14:48 | |
I don't understand what you mean by "elides the line breaks in output". They are still there, you just don't see them as \n because print() is no longer implicitly converting the bytes to a string (which appears to actually result in a repr). |
|||
| msg122136 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010年11月22日 15:27 | |
On Mon, Nov 22, 2010 at 9:48 AM, Chris Lambacher <report@bugs.python.org> wrote: .. > I don't understand what you mean by "elides the line breaks in output". It is actually not that bad: $ ./python.exe -m calendar -t html| wc -l 121 $ python2.7 -m calendar -t html| wc -l 122 At first, I thought that html was printed one line at a time, but now I realize that it is prepared in-memory and printed in one shot. The extra '\n' in python2.7 is probably a bug in 2.7. I think this is a reasonable approach. Just a few comments on the patch: 1. Unit tests needed. 2. It may be appropriate to add a warning to the documentation stating that using -e option may mess up the terminal. 3. In the following snippet, p is an unconditional shortcut to sys.stdout.buffer.write. I would just call it "write" optdict = dict(encoding=encoding, css=options.css) + p = sys.stdout.buffer.write 4. While white space consistency with 2.7 is not very important, the following should be fixed, IMO: $ ./python.exe -m calendar -e ascii| wc -l 35 $ ./python.exe -m calendar| wc -l 36 5. I wonder how ./python.exe -m calendar -e ascii will look on Windows. I don't think cmd window is smart about displaying unix line endings. |
|||
| msg122147 - (view) | Author: Chris Lambacher (lambacck) * | Date: 2010年11月22日 16:55 | |
I am attaching a new patch which fixes the majority of the comments raised. 1. Any suggestions about how to test the output of the console program (the case that this bug affects) would be appreciated. 2. Agreed, included in the output for --help 3. Agreed, included in new patch 4. Okay. That is because of using write vs print. Changed to always use write 5. Doesn't seem to cause any issues on windows terminal. |
|||
| msg122148 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010年11月22日 17:14 | |
On Mon, Nov 22, 2010 at 11:55 AM, Chris Lambacher <report@bugs.python.org> wrote: .. > 1. Any suggestions about how to test the output of the console program (the case that this bug affects) > would be appreciated. For month displays, a doctest may be most appropriate. See Lib/test/test_syntax.py for one example of using doctests in regrtest suite. Note that sys.stdout being StringIO may present a challenge to your approach. You can also take a look at test_pydoc which also tests console-oriented text and html output. |
|||
| msg122149 - (view) | Author: Chris Lambacher (lambacck) * | Date: 2010年11月22日 17:24 | |
The test_pydoc method looks workable, but I'll need to come back to it later because I don't have any more time to work on it today. |
|||
| msg122191 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2010年11月23日 05:03 | |
I wonder: do we really need "encoding" option to the calendar output? It is really not the job of the calendar module to deal with encodings. The calendar module should produce text calendars as unicode strings and HTML calendars as UTF-8 bytes. If someone wants to save them in a specific encoding, there are plenty of ways to achieve that. |
|||
| msg122219 - (view) | Author: Chris Lambacher (lambacck) * | Date: 2010年11月23日 15:11 | |
I don't think we *need* to have the encoding in the HTML calendar, but I doubt we could remove it at this point, deprecate maybe, but since I don't use the module I don't have a sense for how often the need for encoding comes up. The one thing that the encoding is doing in the HTML calendar is ensuring that Unicode characters that are not directly translatable in the target encoding are replaced with html character references (the "xmlcharrefreplace" option to the errors argument in the encode method). The encoding may be a holdover from Python 2x where the expected result was a string. The docs say HTMLCalendar was added in Python 2.5 at the same time that the Local versions were added. |
|||
| msg124409 - (view) | Author: Ron Adam (ron_adam) * | Date: 2010年12月20日 22:13 | |
The problem is in the following line...
return ''.join(v).encode(encoding, "xmlcharrefreplace")
The .encode(encoding, "xmlcharrefreplace") is returning a bytes object.
Here is the simplest change to resolve the problem.
return str(''.join(v).encode(encoding, "xmlcharrefreplace"),
encoding=encoding)
But maybe a different way to implement "xmlcharrefreplace" is what is needed. Would it be better if it were a function or method in the xml (or html) module?
Just because it can be implemented as an encoding doesn't mean it should be.
|
|||
| msg124410 - (view) | Author: Chris Lambacher (lambacck) * | Date: 2010年12月20日 22:38 | |
Sorry in advance for the long winded response.
Ron, have you looked at my patch?
The underlying issue is that the semantics for print() between Python 1 and 3. print() does not accept a bytes type in Python 3. In Python 2 str was a "bytes" type and so print happily sent encoded strings to stdout.
This presents an issue for both --type=html and the text version if an encoding is asked for. Just using print() will result in repr being called on the byte string and you get either an invalid HTML file or a text file with extra junk in it (same junk in both).
If you ask for an encoding, you are going to get bytes. Changing it back into a string to mask that effect does not actually fix things for you because once you do print() you are back to a default encoding and therefore more broken because you are not doing what the user asked for (which is a particular encoding).
In order for:
return str(''.join(v).encode(encoding, "xmlcharrefreplace"),
encoding=encoding)
to solve the issue, you would also need to take away the ability for the user to specify an encoding (at the command line and via the API). It's already a string, why make it a byte and then a string again? If you don't want to deal with encoding, then return a string and leave it up to the consumer of the API to handle the desired encoding (and the "xmlcharrefreplace", maybe with a note in the docs).
If you do want to deal with encoding (which I think we are stuck with), then solve the real issue by not using print() (see my patch).
I think the only reason that my patch was not accepted, and why this is still languishing is that I said I would provide tests and have not had time to do so.
Please feel free to correct me if I am wrong about any of the above.
|
|||
| msg124414 - (view) | Author: Ron Adam (ron_adam) * | Date: 2010年12月21日 02:06 | |
Oops. You're right. I miss understood how the encode method works in this particular case. ;-/ I agree with your comments as well. |
|||
| msg141457 - (view) | Author: Catherine Devlin (catherine) | Date: 2011年07月31日 03:34 | |
Very simplistic patch to test_calendar.py that adds a test for this fix. This fails on the unpatched trunk b/c the unpatched output begins with b"b'<?xm" instead of b"<?xml ". Test needs a quick review... I'm not familiar with the conventions yet! |
|||
| msg141518 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2011年08月01日 14:37 | |
Patch looks good. I made a few very minor style comments on Rietveld (follow the "review" link). |
|||
| msg141732 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年08月07日 01:00 | |
FWIW, I think removing the encoding facility (more so because it can interfere with the terminal encoding) and just dealing with strings to output the html version of calendar seems a neater way than definiing write = sys.stdout.buffer.write and writing the bytes to the STDOUT. Is this not feasible at the moment? |
|||
| msg141734 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2011年08月07日 01:23 | |
BTW, accessing sys.stdout.buffer is not really safe, because if sys.stdout has been replaced with some other object, the buffer attribute might not be available (IIRC StringIO doesn't have it). |
|||
| msg141736 - (view) | Author: Chris Lambacher (lambacck) * | Date: 2011年08月07日 01:40 | |
Senthil: I think that would fundamentally make things worse. The HTML calendar apparently always provides a bytes type, but lets assume it provided you with unicode and we used sys.stdout.write on the output. Fundamentally you would get the same behavior as the patch provides since if you don't provide an encoding it uses sys.getdefaultencoding() which, IIUC is fundamentally what happens if you sys.stdout.write a unicode string. Ezio: I think it is highly unlikely that someone would be fiddling around with sys.stdout and then calling the main() function on calendar.main() (after setting up sys.args so that option parser gets the right parameters). |
|||
| msg141738 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年08月07日 06:06 | |
Chris, as you already suggested in msg124410, I would prefer the removal of encoding specification from command line. Providing the encoding argument via API with deprecating warning is a good idea. The output should be string and it should be just be printed. It should not be converted to bytes and then decoded back to string. |
|||
| msg141744 - (view) | Author: Chris Lambacher (lambacck) * | Date: 2011年08月07日 15:13 | |
Senthil: I wasn't advocating the removal of the ability to specify encoding, only stating that avoiding the use of sys.stdout.buffer.write could only sensibly be done if we also removed the ability to specify an encoding (I can see how my wording might have been confusing). Earlier (msg122219) I said that we probably can't remove the encoding option because that would break backwards compatibility. FWIW, I think that the encoding parameter to the console program is probably sensible because there are no guarantees that the sys.getdefaultencoding() is actually set to the console encoding and I would guess that most people using the html option are not sending output to the console anyway. They are likely redirecting the output somewhere else because there is no "output this to a file" option. |
|||
| msg141746 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年08月07日 19:31 | |
I don't think "WARNING: this may corrupt your terminal" is warranted. Also, I think we can safely remove the --encoding option. Calling a module from the command-line is intended as a demonstration of the module's functionality, it's not an official API that we have to keep compatible. Apart from that, Chris' latest patch and Catherine's minimal test look ok to me. |
|||
| msg141765 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年08月08日 10:46 | |
I agree with Chris's point that people using html format in console are probably redirecting to a file and encoding parameter should be helpful. What do we settle upon? Leave the encoding argument as such? Antoine - I was specifically interested to know if sys.stdout.buffer.write as in the patch was good(/proper) way to go forward writing the bytes? |
|||
| msg141839 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年08月09日 19:17 | |
Le 2011年8月08日 10:46:20 +0000, Senthil Kumaran <report@bugs.python.org> a écrit : > > Antoine - I was specifically interested to know if > sys.stdout.buffer.write as in the patch was good(/proper) way to go > forward writing the bytes? Yes, it is. |
|||
| msg141889 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011年08月11日 01:25 | |
New changeset 9fc7ef60ea06 by Senthil Kumaran in branch '3.2': Fix closes Issue10087 - fixing the output of calendar display in the html format. Patch by Chris Lambacher. Test Contributed by catherine. http://hg.python.org/cpython/rev/9fc7ef60ea06 New changeset 23316468ed4f by Senthil Kumaran in branch '3.2': News item for Issue10087. http://hg.python.org/cpython/rev/23316468ed4f |
|||
| msg141973 - (view) | Author: Éric Araujo (eric.araujo) * (Python committer) | Date: 2011年08月12日 16:55 | |
There were comments by Ezio and me on Rietveld. Also, the commit adds a period after the help text for --encoding, but all other help text aren’t capitalized and don’t use periods, as is usual in help messages. |
|||
| msg142002 - (view) | Author: Senthil Kumaran (orsenthil) * (Python committer) | Date: 2011年08月13日 00:36 | |
Hello Éric, I might have ignored some minor stylistic comments. The '.' in the help text and , after the last TestName, I am not sure if it is of concern. I think, to update the stylistic comments, if the submitters (if they care) could have updated the patch, or a separate commit on style changes can be done (if they are worth it). And yeah, it is bad to ignore substantiative comments. I hope, I did not overlook any. Ezio's main point was he was worried about using sys.stdout.buffer (as I was too) and we got that clarified. Thanks, Senthil |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:07 | admin | set | github: 54296 |
| 2011年08月13日 00:36:55 | orsenthil | set | messages: + msg142002 |
| 2011年08月12日 16:55:23 | eric.araujo | set | messages: + msg141973 |
| 2011年08月11日 01:25:32 | python-dev | set | status: open -> closed nosy: + python-dev messages: + msg141889 resolution: fixed stage: test needed -> resolved |
| 2011年08月09日 19:17:00 | pitrou | set | messages: + msg141839 |
| 2011年08月08日 11:19:25 | vstinner | set | nosy:
- vstinner |
| 2011年08月08日 10:46:19 | orsenthil | set | messages: + msg141765 |
| 2011年08月07日 19:31:28 | pitrou | set | messages: + msg141746 |
| 2011年08月07日 15:13:36 | lambacck | set | messages: + msg141744 |
| 2011年08月07日 06:06:42 | orsenthil | set | nosy:
+ pitrou messages: + msg141738 |
| 2011年08月07日 01:40:11 | lambacck | set | messages: + msg141736 |
| 2011年08月07日 01:23:33 | ezio.melotti | set | messages: + msg141734 |
| 2011年08月07日 01:00:59 | orsenthil | set | nosy:
+ orsenthil messages: + msg141732 |
| 2011年08月01日 14:37:35 | eric.araujo | set | messages: + msg141518 |
| 2011年07月31日 13:56:43 | r.david.murray | set | nosy:
+ r.david.murray |
| 2011年07月31日 03:34:11 | catherine | set | files:
+ test_calendar_byteoutput.patch nosy: + catherine messages: + msg141457 |
| 2010年12月21日 02:06:11 | ron_adam | set | nosy:
doerwalter, georg.brandl, rhettinger, belopolsky, vstinner, ron_adam, gpolo, ezio.melotti, eric.araujo, lambacck messages: + msg124414 |
| 2010年12月20日 22:38:29 | lambacck | set | nosy:
doerwalter, georg.brandl, rhettinger, belopolsky, vstinner, ron_adam, gpolo, ezio.melotti, eric.araujo, lambacck messages: + msg124410 |
| 2010年12月20日 22:13:20 | ron_adam | set | nosy:
doerwalter, georg.brandl, rhettinger, belopolsky, vstinner, ron_adam, gpolo, ezio.melotti, eric.araujo, lambacck messages: + msg124409 |
| 2010年12月20日 16:55:24 | ron_adam | set | nosy:
+ ron_adam |
| 2010年11月23日 15:11:16 | lambacck | set | messages: + msg122219 |
| 2010年11月23日 05:03:14 | belopolsky | set | messages: + msg122191 |
| 2010年11月22日 17:24:27 | lambacck | set | messages: + msg122149 |
| 2010年11月22日 17:14:33 | belopolsky | set | messages: + msg122148 |
| 2010年11月22日 16:55:43 | lambacck | set | files:
+ calendar_bytes_output_v2.patch messages: + msg122147 |
| 2010年11月22日 15:27:45 | belopolsky | set | messages: + msg122136 |
| 2010年11月22日 14:48:54 | ezio.melotti | set | nosy:
+ ezio.melotti |
| 2010年11月22日 14:48:17 | lambacck | set | messages: + msg122131 |
| 2010年11月22日 03:35:58 | belopolsky | set | messages: + msg122084 |
| 2010年11月22日 03:25:39 | lambacck | set | files:
+ calendar_bytes_output.patch nosy: + lambacck messages: + msg122082 |
| 2010年11月22日 01:54:52 | belopolsky | set | nosy:
+ rhettinger |
| 2010年10月18日 16:22:48 | eric.araujo | set | nosy:
+ eric.araujo |
| 2010年10月13日 20:13:17 | belopolsky | set | nosy:
+ georg.brandl, gpolo messages: + msg118576 |
| 2010年10月13日 18:13:49 | belopolsky | set | messages: + msg118555 |
| 2010年10月13日 16:42:36 | belopolsky | set | messages: + msg118542 |
| 2010年10月13日 16:20:35 | doerwalter | set | files:
+ calendar.diff nosy: + doerwalter messages: + msg118536 keywords: + patch |
| 2010年10月13日 16:00:24 | belopolsky | set | nosy:
+ vstinner messages: + msg118533 |
| 2010年10月13日 15:43:26 | belopolsky | create | |