
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author eli.bendersky
Recipients Arfrever, effbot, eli.bendersky, flox, philthompson10, scoder
Date 2012-03-13.08:24:19
Message-id <1331627060.08.0.8066115602.issue14246@psf.upfronthosting.co.za>
In-reply-to
Content
Stefan,
Thanks a lot for taking the time to review the patch. As you correctly say, the current patch's goal is just to align with the existing behavior of the Python implementation of ET.
I understand the problem you are describing, but at least it's not a regression vs. previous behavior, while the original problem this issue complains about *is* a regression.
I propose to commit this to fix the regression and open a separate issue with the insight you provided. One easy solution could be to just require the encoding to be UTF-8 when passing unicode to the module, and to document it explicitly. Another solution would be to actually fix it in the module itself.
If we decide to fix it, the fix should cover both the C and Python implementations, in all affected places. All functions reading XML from strings suffer from the same problem, since the strings get passed to xmlparse_Parse in pyexpat, which just uses PyArg_ParseTuple with the "s#" format, encoding unicode as UTF-8 without looking at the encoding declared in the XML itself.
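To make the mismatch concrete, here is a minimal sketch. The sample document and the variable names are made up for illustration; it only shows what happens at the byte level when text is encoded as UTF-8 but then interpreted according to a different declared encoding, and it does not claim to reproduce the exact behavior of any particular Python version. It also shows one way a caller could sidestep the ambiguity today, by encoding the string to match the declaration before parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the document declares ISO-8859-1 and contains
# one non-ASCII character ("\xe9" is e-acute).
doc = '<?xml version="1.0" encoding="iso-8859-1"?><a>\xe9</a>'

# Byte-level view of the problem: the str is encoded as UTF-8, but a
# consumer that trusts the declaration reads those bytes as ISO-8859-1,
# so one character turns into two.
utf8_bytes = "\xe9".encode("utf-8")        # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("iso-8859-1")  # two characters instead of one

# Caller-side workaround (an assumption, not the fix proposed here):
# pass bytes whose encoding matches the declaration.
root = ET.fromstring(doc.encode("iso-8859-1"))
text = root.text
```

Passing bytes keeps the declaration and the actual encoding consistent, so the parser does not have to guess.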
History
Date                 User           Action  Args
2012-03-13 08:24:20  eli.bendersky  set     recipients: + eli.bendersky, effbot, philthompson10, scoder, Arfrever, flox
2012-03-13 08:24:20  eli.bendersky  set     messageid: <1331627060.08.0.8066115602.issue14246@psf.upfronthosting.co.za>
2012-03-13 08:24:19  eli.bendersky  link    issue14246 messages
2012-03-13 08:24:19  eli.bendersky  create
