Message327082
| Author |
nascheme |
| Recipients |
nascheme, xtreak |
| Date |
2018-10-04.20:17:40 |
| Message-id |
<1538684260.3.0.545547206417.issue34801@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Thank you for the research. The problem is indeed that \v is getting treated as a line separator. That is an intentional design choice, see:
https://bugs.python.org/issue12855
It would seem to have some surprising implications for CSV parsing. E.g. if someone embeds a \v character in a quoted field, parsing the file using codecs.getreader() will cause the field to be split across two rows.
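A minimal sketch of the line splitting described above, using only the standard library. The sample bytes are invented for illustration. The example shows only how lines are handed to a consumer such as csv.reader, not which rows csv would finally build from them.

```python
import codecs
import io

# Hypothetical sample: one record whose quoted field contains \v (0x0b).
data = b'1,"a\x0bb",2\n'

# codecs.getreader() returns a StreamReader class. Iterating an instance
# yields lines split with str.splitlines() semantics, so \v ends a line.
reader = codecs.getreader("utf-8")(io.BytesIO(data))
codec_lines = list(reader)

# str.splitlines() itself treats \v as a line boundary in the same way.
split_lines = data.decode("utf-8").splitlines(keepends=True)

print(codec_lines)
print(split_lines)
```

Both lists contain two entries, with the break falling right after the \v inside the quoted field.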
Someone else has run into the same issue:
https://www.enigma.com/blog/the-secret-world-of-newline-characters
I'm not sure anything should be done. Perhaps we should do something to reduce the chances that people trip over this issue. E.g. if I want to parse a file containing Unicode text with the CSV module, how do I do it while allowing \v characters (or other newline-like characters other than \n) within fields? |
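One possible approach to the question above, offered as a sketch and not as something the message itself proposes: decode through the io text layer instead of codecs.getreader(). io.TextIOWrapper recognises only \n, \r and \r\n as line endings, so a \v stays inside its field. The sample bytes are invented for illustration.

```python
import csv
import io

# Hypothetical sample: one record whose quoted field contains \v (0x0b).
data = b'1,"a\x0bb",2\n'

# newline="" leaves line endings untranslated, which is what the csv
# documentation recommends for file objects passed to csv.reader.
text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")
rows = list(csv.reader(text))

print(rows)
```

For a file on disk, open(path, encoding="utf-8", newline="") returns the same kind of text wrapper.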
|
History
| Date | User | Action | Args |
|---|---|---|---|
| 2018-10-04 20:17:40 | nascheme | set | recipients: + nascheme, xtreak |
| 2018-10-04 20:17:40 | nascheme | set | messageid: <1538684260.3.0.545547206417.issue34801@psf.upfronthosting.co.za> |
| 2018-10-04 20:17:40 | nascheme | link | issue34801 messages |
| 2018-10-04 20:17:40 | nascheme | create | |