GNOME Bugzilla – Bug 786812
CDATA entries polluting HtmlDocument dumps
Last modified: 2017-09-01 21:39:50 UTC
With gxml from master (534ef078), loading HTML content with HtmlDocument, editing it, and then dumping the document back produces content that includes CDATA entries. These entries break the content (e.g., if you try to open it in a browser). I would expect the dump not to contain these CDATA entries.

This is a quick example to reproduce it (using Gjs):

== GXML CODE ==

#!/usr/bin/gjs

const GXml = imports.gi.GXml;

let html = '\
<!doctype html>\
<html>\
    <head>\
        <style>\
            * { color: red; }\
        </style>\
    </head>\
    <body>\
        <script type="text/javascript">\
        </script>\
    </body>\
</html>\
';

let document = GXml.HtmlDocument.from_string(html, 0);
print(document.to_string());

== GXML OUTPUT ==

<?xml version="1.0"?>
<html><head> <style><![CDATA[ * { color: red; } ]]></style></head><body> <script type="text/javascript"><![CDATA[ ]]></script></body></html>

== LIBXML EXAMPLE ==

In case it is useful, I used libxml directly to see what happens:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/HTMLparser.h>
#include <libxml/HTMLtree.h>

char *html = "\
<!doctype html>\
<html>\
    <head>\
        <style>\
            * { color: red; }\
        </style>\
    </head>\
    <body>\
        <script type=\"text/javascript\">\
        </script>\
    </body>\
</html>\
";

int main(int argc, char **argv)
{
    htmlDocPtr doc;
    xmlChar *result = NULL;
    int len = 0;

    doc = htmlReadMemory (html, strlen (html), "", NULL, 0);
    htmlDocDumpMemory (doc, &result, &len);
    printf ("%s", result);
    xmlFreeDoc(doc);

    return 0;
}

== LIBXML OUTPUT ==

<!DOCTYPE html>
<html> <head> <style> * { color: red; } </style> </head> <body> <script type="text/javascript"> </script> </body> </html>
This bug should be fixed upstream at: https://git.gnome.org/browse/gxml/commit/?id=9960fafdfb373578ab2f43f0c552c96b6ce3b1bb
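
For builds that predate this commit, one possible stopgap is to post-process the dumped string and drop the CDATA markers while keeping the text they wrap. The sketch below is only an illustration: `stripCdata` is a hypothetical helper, not part of GXml, and the sample string is the GXML output from the report with the XML declaration left out. It assumes the document contains no CDATA sections that should be preserved.

```javascript
// Hypothetical helper (not a GXml API): removes CDATA section markers
// from a markup string and keeps the wrapped text unchanged.
function stripCdata(markup) {
    // [\s\S]*? matches lazily across newlines, so each CDATA section
    // is handled on its own instead of being merged with the next one.
    return markup.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, '$1');
}

// Sample input taken from the GXML OUTPUT section of the report.
const dumped =
    '<html><head> <style><![CDATA[ * { color: red; } ]]></style></head>' +
    '<body> <script type="text/javascript"><![CDATA[ ]]></script></body></html>';

const cleaned = stripCdata(dumped);
```

In the Gjs script from the report, this would be applied as `print(stripCdata(document.to_string()));`. It does not remove the `<?xml version="1.0"?>` declaration and it is a textual workaround only; the upstream fix is the proper solution.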