Message155579
| Author |
eli.bendersky |
| Recipients |
Arfrever, effbot, eli.bendersky, flox, philthompson10, scoder |
| Date |
2012-03-13.08:24:19 |
| SpamBayes Score |
6.7723605e-15 |
| Marked as misclassified |
No |
| Message-id |
<1331627060.08.0.8066115602.issue14246@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Stefan,
Thanks a lot for taking the time to review the patch. As you correctly say, the current patch's goal is just to align with existing behavior in the Python implementation of ET.
I understand the problem you are describing, but at least it's not a regression vs. previous behavior, while the original problem this issue complains about *is* a regression.
I propose to commit this to fix the regression and open a separate issue with the insight you provided. One easy solution could be to just require the encoding to be UTF-8 when passing unicode to the module, and to document it explicitly. Another solution would be to actually fix it in the module itself.
If there is a decision to fix it, the fix should then cover both the C and Python implementations, in all possible places. All functions reading XML from strings will suffer from the same problem, since those strings get passed to xmlparse_Parse in pyexpat, which just uses PyArg_ParseTuple with the "s#" format, encoding unicode as UTF-8 without looking at the encoding declared in the XML itself. |
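To make the mismatch concrete, here is a minimal sketch that does not depend on any particular pyexpat version. It only simulates what happens when a unicode string is encoded as UTF-8 while the XML declaration names a different encoding, then shows the workaround of passing bytes that match the declaration. The sample document `DOC` and both helper names are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the declaration says ISO-8859-1 and the text
# contains one non-ASCII character (U+00E9).
DOC = '<?xml version="1.0" encoding="iso-8859-1"?><a>\u00e9</a>'


def misread_as_declared(text, declared="iso-8859-1"):
    """Simulate the mismatch: encode the str as UTF-8, then interpret
    the resulting bytes using the encoding named in the declaration."""
    return text.encode("utf-8").decode(declared)


def parse_with_declared_encoding(text, declared="iso-8859-1"):
    """Encode the str in the declared encoding and pass bytes to the
    parser, so the declaration and the payload agree."""
    return ET.fromstring(text.encode(declared))


# The simulated misread turns U+00E9 into the two characters U+00C3 U+00A9.
garbled = misread_as_declared(DOC)

# Passing correctly encoded bytes preserves the character.
root = parse_with_declared_encoding(DOC)
```

Whether a given Python release actually exhibits the misread when handed a unicode string depends on how its pyexpat handles the declaration, so the first helper is a simulation of the problem rather than a reproduction of it.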
|
History
|
|---|
| Date |
User |
Action |
Args |
| 2012-03-13 08:24:20 | eli.bendersky | set | recipients:
+ eli.bendersky, effbot, philthompson10, scoder, Arfrever, flox |
| 2012-03-13 08:24:20 | eli.bendersky | set | messageid: <1331627060.08.0.8066115602.issue14246@psf.upfronthosting.co.za> |
| 2012-03-13 08:24:19 | eli.bendersky | link | issue14246 messages |
| 2012-03-13 08:24:19 | eli.bendersky | create |
|