Message 184726
Author: terry.reedy
Recipients: BreamoreBoy, ezio.melotti, josephoenix, orsenthil, r.david.murray, terry.reedy
Date: 2013-03-20.02:57:36
Message-id: <1363748257.59.0.189145622848.issue2052@psf.upfronthosting.co.za>
Content:
In 3.2, the charset declaration is on line 1629:
content="text/html; charset=ISO-8859-1" />
That charset was only ever standard for Western European documents limited to the Latin-1 repertoire. Now even such limited-character documents often use 'utf-8' (python.org does). The result of putting an incorrect charset declaration in an HTML file is that the browser will not display the file correctly.
For instance, I tried an input sequence containing the line 'c\u3333', which displays in IDLE as 'c㌳'. The string from HtmlDiff.make_file() must be written to a file opened with encoding='utf-8', not the above charset or an equivalent. Firefox then reads the three bytes of the utf-8 encoding as three separate characters and displays 'cãŒ³'. To check:
>>> 'c㌳'.encode().decode(encoding='Latin-1')
'cã\x8c³'
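A minimal reproduction sketch (not part of the original report; the line contents and the output filename are only illustrative) of how the mismatched declaration bites:

    import difflib

    # Two small inputs whose diff contains the non-Latin-1 character U+3333.
    before = ['a\n', 'c\u3333\n']
    after = ['b\n', 'c\u3333\n']

    html = difflib.HtmlDiff().make_file(before, after)

    # make_file() returns a str; that str cannot be encoded as Latin-1, so the
    # file has to be written as utf-8 -- which then contradicts the
    # charset=ISO-8859-1 declared in the template.
    with open('diff.html', 'w', encoding='utf-8') as f:
        f.write(html)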
To me, the clear implication of "returns a string which is a complete HTML file containing a table showing line by line differences with inter-line and intra-line changes highlighted" is that the resulting file will display correctly. The current template charset prevents that; changing it to 'utf-8' results in a file that displays correctly (tested). So the current behavior, and the code that causes it, is to me clearly a bug. I would like to fix it before 2.7.4 comes out.
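Until the template itself is changed, a possible user-level workaround (a sketch only, not the committed fix; it simply replaces the exact charset string quoted above in the generated output) is to patch the string before writing it:

    import difflib

    html = difflib.HtmlDiff().make_file(['a\n'], ['b\n'])

    # Make the declared charset match the encoding actually used to write the file.
    html = html.replace('charset=ISO-8859-1', 'charset=utf-8')

    with open('diff.html', 'w', encoding='utf-8') as f:
        f.write(html)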