Bug 786812 - CDATA entries polluting HtmlDocument dumps
Status: RESOLVED FIXED
Product: gxml
Classification: Other
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: GXml maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2017-08-25 18:27 UTC by Martin Abente Lahaye
Modified: 2017-09-01 21:39 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Martin Abente Lahaye 2017-08-25 18:27:42 UTC
With gxml from master (534ef078), loading HTML content with HtmlDocument, editing it, and then dumping the document back produces output that includes CDATA entries. These entries break the content (e.g., if you try to open it in a browser). I would expect the dump not to contain these CDATA entries.

This is a quick example to reproduce it (using Gjs):

== GXML CODE ==

#!/usr/bin/gjs
const GXml = imports.gi.GXml;

let html = '\
<!doctype html>\
<html>\
<head>\
  <style>\
  * { color: red; }\
  </style>\
</head>\
<body>\
  <script type="text/javascript">\
  </script>\
</body>\
</html>\
';

let document = GXml.HtmlDocument.from_string(html, 0);
print(document.to_string());

== GXML OUTPUT ==

<?xml version="1.0"?>
<html><head>  <style><![CDATA[  * { color: red; }  ]]></style></head><body>  <script type="text/javascript"><![CDATA[  ]]></script></body></html>
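To make the problem easy to check mechanically, here is a minimal sketch in plain JavaScript. Both `hasCdata` and `unwrapCdata` are hypothetical helpers written for this report and are not part of GXml. The unwrapping step is only a possible stopgap, on the assumption that the CDATA sections sit inside raw-text HTML elements such as `<style>` and `<script>`; it is not a substitute for fixing the serializer.

```javascript
// Hypothetical helper (not a GXml API): true if a serialized document
// contains at least one CDATA section.
function hasCdata(dump) {
    return dump.includes('<![CDATA[');
}

// Hypothetical stopgap (not a GXml API): replace each CDATA section with
// the text it wraps. Assumes the sections only appear inside <style> or
// <script>, where HTML treats the content as raw text anyway.
function unwrapCdata(dump) {
    return dump.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, (_match, text) => text);
}

// The document line from the GXml output above (XML declaration omitted).
const gxmlDump =
    '<html><head>  <style><![CDATA[  * { color: red; }  ]]></style></head>' +
    '<body>  <script type="text/javascript"><![CDATA[  ]]></script></body></html>';

const cleaned = unwrapCdata(gxmlDump);
```

In the Gjs script above, the same check could be applied to `document.to_string()`, for example `print(hasCdata(document.to_string()));`, to confirm whether a given gxml build is still affected.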

== LIBXML EXAMPLE ==

In case it is useful, I used libxml directly, to see what happens:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/HTMLparser.h>
#include <libxml/HTMLtree.h>

char *html = "\
<!doctype html>\
<html>\
<head>\
  <style>\
  * { color: red; }\
  </style>\
</head>\
<body>\
  <script type=\"text/javascript\">\
  </script>\
</body>\
</html>\
";

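int main(void)
{
    htmlDocPtr doc;
    xmlChar *result = NULL;
    int len = 0;

    /* Parse the in-memory HTML with libxml's HTML parser (empty base URL, default encoding and options). */
    doc = htmlReadMemory (html, (int) strlen (html), "", NULL, 0);
    if (doc == NULL)
        return 1;

    /* Serialize the tree with libxml's HTML serializer and print the result. */
    htmlDocDumpMemory (doc, &result, &len);
    if (result != NULL) {
        printf ("%s", (const char *) result);
        xmlFree (result);  /* the dump buffer is owned by the caller */
    }

    xmlFreeDoc (doc);
    return 0;
}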

== LIBXML OUTPUT ==

<!DOCTYPE html>
<html>
<head>  <style>  * { color: red; }  </style>
</head>
<body>  <script type="text/javascript">  </script>
</body>
</html>
Comment 1 Daniel Espinosa 2017-09-01 21:39:50 UTC
This bug should be fixed upstream at:

https://git.gnome.org/browse/gxml/commit/?id=9960fafdfb373578ab2f43f0c552c96b6ce3b1bb