Created on 2008-10-19 15:32 by thomaslee, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| minidom-toprettyxml-01.patch | thomaslee, 2008-10-19 15:32 | A patch implementing the proposed changes. | ||
| minidom-Text-toprettyxml.patch | danken, 2011-10-01 18:21 | toprettyxml() should conserve Text.data | review | |
| minidom-Text-toprettyxml-02.patch | danken, 2011-10-02 12:03 | preserve only <elem>Text</elem> nodes. | review | |
| issue4147.diff | ezio.melotti, 2011-11-15 10:21 | patch against 2.7 | ||
| Messages (21) | |||
|---|---|---|---|
| msg74978 - (view) | Author: Thomas Lee (thomaslee) (Python committer) | Date: 2008-10-19 15:32 |
For XML elements containing only text data, it would be nice if toprettyxml could omit the whitespace it normally injects before & after the text, e.g.
<person>
  <first-name>
    Bob
  </first-name>
</person>
Becomes:
<person>
  <first-name>Bob</first-name>
</person>
From what I understand the handling of whitespace within XML elements is application-defined, so I'm classifying this as a nice-to-have feature. However it should be noted that in our particular case, the existing behaviour caused a few problems with a third-party system which treated whitespace as being significant.
|
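For readers who want to reproduce the case in the report, here is a minimal sketch. It assumes a Python version that already includes the fix eventually committed for this issue; on older releases the text `Bob` would appear on its own indented line. The two-space `indent` argument is just a choice for readability.

```python
from xml.dom import minidom

# The example document from the report, written without any whitespace.
doc = minidom.parseString("<person><first-name>Bob</first-name></person>")

# toprettyxml() puts child elements on their own indented lines. On fixed
# versions, an element whose only child is a text node keeps that text on
# the same line as its tags.
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```

Checking for the substring `<first-name>Bob</first-name>` in `pretty` is a quick way to tell whether a given interpreter has the fixed behaviour.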
|||
| msg90546 - (view) | Author: Jim Garrison (jgarrison) | Date: 2009-07-15 22:22 |
Also needed here. While pretty-printing should be able to insert non-significant whitespace BETWEEN xml elements, it should never alter the content of (i.e. insert whitespace into) existing text elements. |
|||
| msg90574 - (view) | Author: Jim Garrison (jgarrison) | Date: 2009-07-16 16:55 |
To clarify: ... it should never alter the content of (i.e. insert whitespace into) existing text elements that contain non-whitespace characters. |
|||
| msg102247 - (view) | Author: Michel Samia (m-samia) | Date: 2010-04-03 12:11 |
Could you please apply that patch? People are starting to use non-standard libraries to process XML files because of this issue; for example, this person is using lxml or pyxml: http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace/ Is there any problem with that patch? |
|||
| msg111604 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2010-07-26 12:05 |
@Thomas: could you provide a unit test to go with your patch? |
|||
| msg122289 - (view) | Author: anatoly techtonik (techtonik) | Date: 2010-11-24 16:50 |
This one is a bug. |
|||
| msg144745 - (view) | Author: Dan Kenigsberg (danken) | Date: 2011-10-01 18:21 |
Here's another take on fixing this bug, with an accompanying unit test. Personally, I'm monkey-patching xml.dom.minidom in order to avoid it, but please consider fixing it properly upstream. |
|||
| msg144746 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011-10-01 20:49 |
New changeset 086ca132e161 by R David Murray in branch '3.2':
#4147: minidom's toprettyxml no longer adds whitespace to text nodes.
http://hg.python.org/cpython/rev/086ca132e161
New changeset fa0b1e50270f by R David Murray in branch 'default':
merge #4147: minidom's toprettyxml no longer adds whitespace to text nodes.
http://hg.python.org/cpython/rev/fa0b1e50270f
|
|||
| msg144747 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011-10-01 20:49 |
New changeset 406c5b69cb1b by R David Murray in branch '2.7':
#4147: minidom's toprettyxml no longer adds whitespace to text nodes.
http://hg.python.org/cpython/rev/406c5b69cb1b
|
|||
| msg144748 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-10-01 20:53 |
This looks correct to me, and it tested out fine on the test suite (and the provided test failed without the provided fix), so I committed it. I have a small concern that the change in output might be a bit radical for a bug fix release, but it does seem to me that this is clearly a bug. If people think it shouldn't go in the bug fix releases let me know and I'll back it out. Thanks for the patch, Dan. |
|||
| msg144755 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2011-10-02 01:14 |
The patch seems wrong to me:
>>> from xml.dom import minidom
>>> d = minidom.parseString('<foo><bar>AAA</bar>BBB<bar>CCC</bar></foo>')
>>> print(d.toprettyxml())
<?xml version="1.0" ?>
<foo>
<bar>AAA </bar>
BBB <bar>CCC </bar>
</foo>
Even if the newlines are gone, the indentation before the closing tag is preserved. Also a newline is added before the text node BBB.
It would be good to check what the XML standard says about the whitespace. I'm pretty sure HTML has well defined rules about it, but I don't know if that's the same for XML.
FWIW the link in msg102247 contains a different fix (not sure if it's any better), and also a link to an article about XML and whitespace: http://www.oracle.com/technetwork/articles/wang-whitespace-092897.html (the link seems broken in the page).
|
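The mixed-content case from this message can be checked by round-tripping the pretty-printed output, as in the sketch below. It assumes a Python version with the final fix for this issue; the variable names are illustrative only. The idea is that text which is the sole child of an element should survive unchanged, while a bare text node sitting between elements may still pick up surrounding whitespace.

```python
from xml.dom import minidom

# Mixed content: <foo> holds two elements with a bare text node between them.
doc = minidom.parseString("<foo><bar>AAA</bar>BBB<bar>CCC</bar></foo>")
pretty = doc.toprettyxml()
print(pretty)

# Re-parse the pretty-printed output to see what happened to the text nodes.
reparsed = minidom.parseString(pretty)
bar_texts = [bar.firstChild.data for bar in reparsed.getElementsByTagName("bar")]

# Text nodes that are direct children of <foo> and are not pure whitespace.
foo = reparsed.documentElement
loose_texts = [
    node.data
    for node in foo.childNodes
    if node.nodeType == node.TEXT_NODE and node.data.strip()
]

print(bar_texts)          # text inside each <bar>
print(repr(loose_texts))  # repr() makes any added whitespace visible
```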
|||
| msg144770 - (view) | Author: Dan Kenigsberg (danken) | Date: 2011-10-02 12:03 |
Oh dear. Thanks, Ezio, for pointing out that the former patch is wrong. It is also quite naive, since the whole NATURE of toprettyxml() is to add whitespace to Text nodes. Thomas Lee's http://bugs.python.org/file11832/minidom-toprettyxml-01.patch made an effort to touch only "simple" Text nodes, those confined within a single <elem></elem>. I did not expect http://bugs.python.org/file23286/minidom-Text-toprettyxml.patch to get in so quickly, after the former one spent several years in the queue. However, now is the time to fix it, possibly with my second patch. |
|||
| msg144771 - (view) | Author: Dan Kenigsberg (danken) | Date: 2011-10-02 12:57 |
btw, http://www.w3.org/TR/xml/#sec-white-space is a bit vague on how a parser should deal with whitespace, and seems to allow non-preservation of text nodes. Preserving "simple" text nodes is allowed, too, and is more polite to applications reading the prettyxml. |
|||
| msg144778 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-10-02 16:55 |
Heh, you happened to post your patch at a time when I wanted something to do as a break from something I didn't want to do...and I *thought* I understood the problem, after reading the various links. But clearly I didn't. We don't have someone who has stepped forward to be xml maintainer, so I just went ahead and committed it. I should find time to look at your new patch some time today, or perhaps Ezio will have time. (Clearly minidom doesn't have enough tests.) |
|||
| msg147655 - (view) | Author: anatoly techtonik (techtonik) | Date: 2011-11-15 07:40 |
I would revert this patch (leaving several test cases though) until a proper fix is available. |
|||
| msg147659 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-11-15 08:40 |
Yeah, I just haven't found time to do the revert yet (my first naive attempt using hg commands failed and I haven't found time to figure it out or do the reverse-patch method). |
|||
| msg147662 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2011-11-15 10:21 |
Here is a new patch based on Dan's last patch. Correct me if I'm wrong, but it seems to me that it's not possible for a node to have only text nodes as children and yet have more than one child; i.e. you can't have two or more adjacent text nodes, because they would be considered a single text node. I therefore replaced the all() check with a check that there is only one child and that it is a text node. I also added a test that checks where the \n and \t are added, because testing only that the DOM is preserved is not enough. |
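The condition described in this message (exactly one child, and that child is a text node) can be sketched as a standalone predicate. The helper name `has_single_text_child` is hypothetical and is not taken from the committed patch; it only illustrates the check, not how minidom wires it into its writer.

```python
from xml.dom import Node, minidom


def has_single_text_child(element):
    """Hypothetical helper: True if the element's only child is a text node."""
    children = element.childNodes
    return len(children) == 1 and children[0].nodeType == Node.TEXT_NODE


# <a> has a single text child, <b> has mixed content, <d> has no children.
doc = minidom.parseString("<root><a>text</a><b><c/>tail</b><d/></root>")
results = {
    child.tagName: has_single_text_child(child)
    for child in doc.documentElement.childNodes
}
print(results)
```

Only elements for which the predicate holds would be written on a single line; everything else keeps the usual one-node-per-line layout.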
|||
| msg147685 - (view) | Author: Dan Kenigsberg (danken) | Date: 2011-11-15 15:39 |
Technically, adjacent Text nodes are not illegal, but preserving this oddity in pretty-print is impossible. It is perfectly fine to pretty-print only the simple case of len()==1. |
|||
| msg147848 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2011-11-18 10:47 |
I did some tests, creating an element ('elem') that contains two adjacent text nodes ('text'). With my latest patch the prettyprint is:
<?xml version="1.0" ?>
<elem>
  text
  text
</elem>
Here both the text nodes are printed on a newline and they are indented.
With your patch it should be:
<?xml version="1.0" ?>
<elem>texttext</elem>
I'm not sure there's any reason to prefer the second option though.
|
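The adjacent-text-node experiment can be reproduced by building the DOM by hand, since a parser would deliver the two strings as one node. The sketch below assumes a Python version with the committed fix, and additionally uses `normalize()`, which is not mentioned in the thread, to merge the two nodes and show the single-child layout.

```python
from xml.dom import minidom

# Build <elem> with two adjacent text nodes; parsing alone would not produce
# this shape, so the nodes are created and appended explicitly.
doc = minidom.Document()
elem = doc.createElement("elem")
doc.appendChild(elem)
elem.appendChild(doc.createTextNode("text"))
elem.appendChild(doc.createTextNode("text"))

count_before = len(elem.childNodes)
pretty_before = doc.toprettyxml()
print(pretty_before)

# normalize() merges adjacent text nodes into a single node, after which the
# element has exactly one text child.
elem.normalize()
count_after = len(elem.childNodes)
pretty_after = doc.toprettyxml()
print(pretty_after)
```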
|||
| msg147879 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011-11-18 15:36 |
New changeset 7262f8f276ff by Ezio Melotti in branch '2.7':
#4147: minidom's toprettyxml no longer adds whitespace around a text node when it is the only child of an element. Initial patch by Dan Kenigsberg.
http://hg.python.org/cpython/rev/7262f8f276ff
New changeset 5401daa96a21 by Ezio Melotti in branch '3.2':
#4147: minidom's toprettyxml no longer adds whitespace around a text node when it is the only child of an element. Initial patch by Dan Kenigsberg.
http://hg.python.org/cpython/rev/5401daa96a21
New changeset cb6614e3438b by Ezio Melotti in branch 'default':
#4147: merge with 3.2.
http://hg.python.org/cpython/rev/cb6614e3438b
|
|||
| msg147880 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2011-11-18 15:39 |
I committed my patch with a few more tests. This should be fixed now. Thanks for the report and the patch! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:40 | admin | set | github: 48397 |
| 2011-11-18 15:39:13 | ezio.melotti | set | status: open -> closed resolution: fixed messages: + msg147880 stage: commit review -> resolved |
| 2011-11-18 15:36:32 | python-dev | set | messages: + msg147879 |
| 2011-11-18 10:47:05 | ezio.melotti | set | messages: + msg147848 |
| 2011-11-15 15:39:46 | danken | set | messages: + msg147685 |
| 2011-11-15 10:21:32 | ezio.melotti | set | files: + issue4147.diff messages: + msg147662 stage: test needed -> commit review |
| 2011-11-15 10:05:53 | pefu | set | nosy: + pefu |
| 2011-11-15 08:41:45 | ezio.melotti | set | priority: normal -> release blocker assignee: ezio.melotti nosy: + benjamin.peterson, georg.brandl |
| 2011-11-15 08:40:12 | r.david.murray | set | messages: + msg147659 |
| 2011-11-15 07:40:17 | techtonik | set | messages: + msg147655 |
| 2011-11-15 00:22:25 | Arfrever | set | nosy: + Arfrever |
| 2011-10-02 16:55:13 | r.david.murray | set | messages: + msg144778 |
| 2011-10-02 12:57:59 | danken | set | messages: + msg144771 |
| 2011-10-02 12:03:18 | danken | set | files: + minidom-Text-toprettyxml-02.patch messages: + msg144770 |
| 2011-10-02 01:14:16 | ezio.melotti | set | status: closed -> open messages: + msg144755 nosy: + ezio.melotti, - BreamoreBoy stage: resolved -> test needed |
| 2011-10-01 20:53:00 | r.david.murray | set | status: open -> closed type: enhancement -> behavior versions: + Python 2.7, Python 3.3 nosy: + r.david.murray messages: + msg144748 stage: test needed -> resolved |
| 2011-10-01 20:49:38 | python-dev | set | messages: + msg144747 |
| 2011-10-01 20:49:14 | python-dev | set | nosy: + python-dev messages: + msg144746 |
| 2011-10-01 18:21:03 | danken | set | files: + minidom-Text-toprettyxml.patch messages: + msg144745 |
| 2010-11-24 16:50:50 | techtonik | set | messages: + msg122289 |
| 2010-11-24 16:47:25 | techtonik | set | nosy: + techtonik |
| 2010-11-03 13:26:02 | tod | set | nosy: + tod |
| 2010-07-26 12:05:28 | BreamoreBoy | set | versions: + Python 3.2, - Python 2.7 nosy: + BreamoreBoy messages: + msg111604 stage: test needed |
| 2010-04-25 10:24:44 | danken | set | nosy: + danken |
| 2010-04-03 12:11:36 | m-samia | set | nosy: + m-samia messages: + msg102247 |
| 2009-07-16 16:55:00 | jgarrison | set | messages: + msg90574 |
| 2009-07-15 22:22:55 | jgarrison | set | nosy: + jgarrison messages: + msg90546 |
| 2008-10-19 15:32:24 | thomaslee | create | |