GNOME Bugzilla – Bug 357992
HTMLParser misses HTML_PARSE_NOCDATA option
Last modified: 2021-07-05 13:26:28 UTC
The XML parser supports an XML_PARSE_NOCDATA option that prevents CDATA nodes from appearing in the tree. This simplifies the tree structure and text handling quite a bit. The problem is that the HTML parser has no equivalent option and, even worse, it happily generates CDATA nodes for the content of style and script tags even though the original document contained no CDATA sections. This can be prevented "by hand" by setting the "cdataBlock" callback of the parser context's SAX handler ("sax->cdataBlock") to NULL; the libxml2 code handles this correctly and reports the content as ordinary text instead. However, a parser option such as HTML_PARSE_NOCDATA would make this much cleaner.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a long time (resources are unfortunately quite limited, so not every ticket can be handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/. Thank you for your understanding and your help.