This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2012-09-07 06:38 by dcallagh, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Messages (4) | |||
|---|---|---|---|
| msg169974 | Author: Dan Callaghan (dcallagh) | Date: 2012-09-07 06:38 | |
Python 2.7.3 (default, Jul 24 2012, 10:05:38)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> c = u'\u65e5\u672c\u8a9e'
>>> import xml.dom.minidom
Encoded as UTF-8, everything is fine:
>>> xml.dom.minidom.parseString('<?xml version="1.0" encoding="UTF-8" ?><x>%s</x>' % c.encode('UTF-8'))
<xml.dom.minidom.Document instance at 0x7f310d27dcf8>
But not when encoded as ISO-2022-JP:
>>> xml.dom.minidom.parseString('<?xml version="1.0" encoding="ISO-2022-JP" ?><x>%s</x>' % c.encode('ISO-2022-JP'))
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/lib64/python2.7/site-packages/_xmlplus/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/dom/expatbuilder.py", line 942, in parseString
    return builder.parseString(string)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 48
lxml can handle it fine, though:
>>> import lxml.etree
>>> lxml.etree.fromstring('<?xml version="1.0" encoding="ISO-2022-JP" ?><x>%s</x>' % c.encode('ISO-2022-JP'))
<Element x at 0x7f310d284960>
>>> _.text == c
True
|
|||
| msg169982 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2012-09-07 09:22 | |
This is similar to issue13612: pyexpat does not support multibyte encodings. |
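Since the limitation is in how pyexpat handles multibyte encodings, one common workaround (a sketch, not something proposed in this thread) is to let Python's own codec decode the bytes and hand the parser a `str`. It assumes Python 3, where pyexpat parses `str` input as UTF-8 and that overrides the encoding named in the XML declaration, so expat never has to handle ISO-2022-JP itself. `parse_via_python_codec` is a hypothetical helper, and the caller is assumed to already know the document's encoding.

```python
import xml.dom.minidom


def parse_via_python_codec(data: bytes, encoding: str):
    """Hypothetical helper: decode with Python's codec, then parse the str.

    Assumption: in Python 3, pyexpat parses str input as UTF-8, which
    overrides the encoding named in the XML declaration.
    """
    text = data.decode(encoding)
    return xml.dom.minidom.parseString(text)


# Same payload as in the report: three CJK characters.
c = '\u65e5\u672c\u8a9e'

# Build genuine ISO-2022-JP bytes by encoding the whole document.
raw = ('<?xml version="1.0" encoding="ISO-2022-JP" ?><x>%s</x>' % c).encode('iso-2022-jp')

doc = parse_via_python_codec(raw, 'iso-2022-jp')
print(doc.documentElement.firstChild.data == c)  # True
```

Decoding up front means expat only ever sees UTF-8, so any codec Python ships can be used; the trade-off is that the encoding must be known, or sniffed from the declaration, before parsing.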
|||
| msg377715 | Author: Irit Katriel (iritkatriel) * (Python committer) | Date: 2020-09-30 18:06 | |
I don't see this problem on 3.10. Is this still an issue, or can it be closed?
Running Release|Win32 interpreter...
Python 3.10.0a0 (heads/bpo17490-dirty:00eb063b66, Sep 27 2020, 13:20:24) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> c = u'\u65e5\u672c\u8a9e'
>>> import xml.dom.minidom
>>> xml.dom.minidom.parseString('<?xml version="1.0" encoding="UTF-8" ?><x>%s</x>' % c.encode('UTF-8'))
<xml.dom.minidom.Document object at 0x015FC9E8>
>>> xml.dom.minidom.parseString('<?xml version="1.0" encoding="ISO-2022-JP" ?><x>%s</x>' % c.encode('ISO-2022-JP'))
<xml.dom.minidom.Document object at 0x01493208>
>>>
|
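One caveat when comparing the two sessions: in Python 3, `'...%s...' % c.encode('ISO-2022-JP')` formats a `bytes` object into a `str`, which inserts the bytes' repr (`b'...'`) rather than the encoded bytes, so the 3.10 session hands the parser a plain-ASCII string. The sketch below (an illustration, not part of the thread) shows that effect, then keeps the document as bytes using `bytes` %-formatting (PEP 461, Python 3.5+). It prints whatever the parser does with the genuine ISO-2022-JP input rather than asserting a particular result.

```python
import xml.dom.minidom

c = '\u65e5\u672c\u8a9e'
encoded = c.encode('ISO-2022-JP')

# str % bytes: the bytes object is converted with str(), i.e. its repr.
as_text = '<?xml version="1.0" encoding="ISO-2022-JP" ?><x>%s</x>' % encoded
print(as_text)  # ASCII text containing b'...' instead of the encoded bytes

# bytes % bytes (PEP 461): the encoded bytes are spliced in unchanged.
as_bytes = b'<?xml version="1.0" encoding="ISO-2022-JP" ?><x>%s</x>' % encoded

# Diagnostic only: report what minidom/pyexpat does with real ISO-2022-JP bytes.
try:
    doc = xml.dom.minidom.parseString(as_bytes)
except Exception as exc:
    print('parse failed:', type(exc).__name__, exc)
else:
    node = doc.documentElement.firstChild
    text = node.data if node is not None else ''
    print('parsed; text matches original:', text == c)
```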
|||
| msg378996 | Author: Irit Katriel (iritkatriel) * (Python committer) | Date: 2020-10-19 19:27 | |
Closing: this now works for me on Python 3.8 and 3.10. It was fixed sometime in the last 8 years. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:35 | admin | set | github: 60081 |
| 2020-10-19 19:27:05 | iritkatriel | set | status: open -> closed; resolution: works for me; messages: + msg378996; stage: resolved |
| 2020-09-30 18:06:32 | iritkatriel | set | nosy: + iritkatriel; messages: + msg377715 |
| 2012-09-07 09:22:32 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg169982 |
| 2012-09-07 06:38:03 | dcallagh | create | |