GNOME Bugzilla – Bug 523187
xmlSaveFormatFileEnc doesn't work if new nodes were added to an existing xmlDoc
Last modified: 2008-03-19 16:03:11 UTC
write_to_file_formatted doesn't work on new nodes added to an existing Document. Here are the source files and the result of the program:

=== program.cc ===
#include <iostream>
#include <libxml++/libxml++.h>

using namespace std;
using namespace xmlpp;

int main (void)
{
  try
  {
    DomParser parser;
    parser.parse_file("test.xml");
    if(parser)
    {
      Document *doc = parser.get_document();
      Node *root = doc->get_root_node();
      Element *tmp_elem = root->add_child("newnode00");
      Node *tmp_node = dynamic_cast<Node*>(tmp_elem);
      tmp_elem = tmp_node->add_child("childnode01");
      tmp_elem->set_child_text("text 0.1");
      tmp_elem = tmp_node->add_child("childnode02");
      tmp_elem->set_child_text("text 0.2");
      doc->write_to_file_formatted("result.xml");
    }
    else
      cout << "Error reading test.xml\n";
  }
  catch(const std::exception &ex)
  {
    cout << "Exception caught: " << ex.what() << endl;
  }
  return 0;
}

=== test.xml ===
<?xml version="1.0"?>
<test>
  <testchild>
    <another/>
  </testchild>
</test>

=== result.xml ===
<?xml version="1.0"?>
<test>
  <testchild>
    <another/>
  </testchild>
<newnode00><childnode01>text 0.1</childnode01><childnode02>text 0.2</childnode02></newnode00></test>
Created attachment 107545: libxmlpp_formatted.cc
Confirmed. And here is the test as an attachment to make life easier when re-testing.
Created attachment 107546: test.xml
The XML file used by the test program.
Created attachment 107547: libxml_formatted.c
Here is a C test case that seems to show that the problem is in libxml.
If libxml2 detects that there might be some mixed content for a node, it will refuse to add extra nodes for 'formatting', as it can't guess whether the text nodes present are there for formatting or as legitimate content. The test document has text children, hence libxml2 disables 'formatting' in that subtree. All spaces in content are significant by default in XML. Years of SGML experience proved that no heuristic could reliably tell 'significant' white space from 'indenting' white space, hence the POV of the XML spec and why libxml2 is just being careful. If you think you know better, just go through the document before calling libxml2 and remove all those 'not significant' text nodes. Libxml2 won't try this because a generation of markup hackers failed to find a proper algorithm, but if you have time to spend and are not afraid of breaking your users' documents, go for it! From a libxml2 POV, not a bug.
Daniel
That makes sense. Thanks, Daniel. There's no need to be nasty about it though.
Sorry if this sounds nasty. Maybe it's because I see this pointed out more often than it should be. Maybe it is possible to do; I'm afraid it's not in the general case. Sometimes the default behaviour could be enhanced, that's for sure, for example if you have a DTD and know the content model of the elements. But this is a risky business. There is also a cost issue: if you start doing a lot of analysis when saving just because someone switched a 'please indent' flag somewhere, this can have serious consequences in terms of throughput, while this could be handled in a deterministic way if the program had done the indentation while modifying the tree.
Daniel
I guess there should theoretically be some way to specify indenting even with child text nodes, and maybe even a way to say that text should be wrapped at a line limit and indented, but that's not something I'm going to work on, of course. Thanks, Daniel.