Message 155579 - Python tracker



In-reply-to
Author	eli.bendersky
Recipients	Arfrever, effbot, eli.bendersky, flox, philthompson10, scoder
Date	2012-03-13.08:24:19
Message-id	<1331627060.08.0.8066115602.issue14246@psf.upfronthosting.co.za>

Content

Stefan,
Thanks a lot for taking the time to review the patch. As you correctly say, the current patch's goal is just to align with the existing behavior of the Python implementation of ET.
I understand the problem you are describing, but at least it's not a regression vs. previous behavior, while the original problem this issue complains about *is* a regression.
I propose to commit this to fix the regression and open a separate issue with the insight you provided. One easy solution could be to just require the encoding to be UTF-8 when passing unicode to the module, and to document it explicitly. Another solution would be to actually fix it in the module itself.
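As a rough illustration of the first option (requiring UTF-8 when unicode is passed), here is a minimal sketch. `require_utf8_declaration` is a hypothetical helper, not something that exists in ElementTree or in the patch, and the regular expression is only a simplified approximation of how an XML declaration can look.

```python
import re

# Hypothetical pattern (an assumption for illustration): matches an XML
# declaration at the start of the text and captures its encoding name.
# "[^>]*?" cannot run past the "?>" that closes the declaration.
_DECL_ENCODING = re.compile(
    r'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([^"\']+)["\']'
)


def require_utf8_declaration(text: str) -> str:
    """Return text unchanged, or raise if it declares a non-UTF-8 encoding.

    Hypothetical helper: a str with no encoding declaration is accepted,
    since XML defaults to UTF-8 in that case.
    """
    match = _DECL_ENCODING.match(text)
    if match and match.group(1).lower() != "utf-8":
        raise ValueError(
            "unicode input must declare UTF-8, got %r" % match.group(1)
        )
    return text
```

A check like this would have to sit in front of every entry point that accepts unicode, which is part of why it needs to be documented explicitly.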
If there is a decision to fix it, the fix should then cover both the C and Python implementations, in all possible places. All functions reading XML from strings suffer from the same problem, since the strings get passed to xmlparse_Parse in pyexpat, which just uses PyArg_ParseTuple with the "s#" format, encoding unicode in UTF-8 without looking at the encoding declared in the XML itself.
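To make the mismatch concrete, here is a minimal sketch that simulates it at the byte level. It does not pass a str to the parser (how str input is handled may differ between Python versions); instead it encodes the document to UTF-8 by hand, as "s#" would, and feeds the bytes to the parser. `DOC` and `parse_bytes_text` are made-up names for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: declares ISO-8859-1 and contains one
# non-ASCII character (U+00E9, e with acute accent).
DOC = '<?xml version="1.0" encoding="iso-8859-1"?><a>\u00e9</a>'


def parse_bytes_text(data: bytes) -> str:
    """Parse raw bytes and return the text of the root element."""
    return ET.fromstring(data).text


# Bytes that agree with the declaration: the parser decodes them correctly.
matching = parse_bytes_text(DOC.encode("iso-8859-1"))

# Bytes produced the way "s#" would produce them: UTF-8, regardless of the
# declaration. The parser trusts the declaration and decodes the two UTF-8
# bytes 0xC3 0xA9 as two separate ISO-8859-1 characters.
mismatched = parse_bytes_text(DOC.encode("utf-8"))

print(ascii(matching))    # '\xe9'
print(ascii(mismatched))  # '\xc3\xa9'
```

The second result is the classic mojibake: one character comes back as two, with no error raised.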

