After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 769760 - Attributes containing server side includes are not "round tripped" properly
Attributes containing server side includes are not "round tripped" properly
Status: RESOLVED FIXED
Product: libxml2
Classification: Platform
Component: general
git master
Other Mac OS
: Normal normal
: ---
Assigned To: Daniel Veillard
libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2016-08-11 18:29 UTC by Patrick Toomey
Modified: 2020-08-15 16:58 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to replicate Xerces behavior (935 bytes, application/mbox)
2016-08-11 18:29 UTC, Patrick Toomey
Details

Description Patrick Toomey 2016-08-11 18:29:09 UTC
Created attachment 333131 [details]
Patch to replicate Xerces behavior

Assume we have a fragment of html/xml with the following content. 

<html><body><p><a href='&lt;!--"&gt;&lt;script&gt;alert(1)&lt;/script&gt;-->'>test1</a></p></body></html>

Notice that the href attribute uses single quotes and the quoted value contains an escaped server side include (<!-- ... -->). If one round trips the above (parses it and then prints it out again as html/xml) you get the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<p>
<a href="<!--"><script>alert(1)</script>-->">test1</a>
</p>
</body>
</html>

The resulting output is semantically different than the input. The href attribute quoting is changed to use double quotes, and the original double quote found in the attribute is not escaped. As a result, the attribute value is closed early and the remaining data is inserted into the document unescaped. 

Performing the same round trip using Xerces results in the following:

<html>
<body>
<p>
<a href="<!--%22><script>alert(1)</script>-->">test1</a>
</p>
</body>
</html>

In this case, the href attribute quoting is also changed to use double quotes. But, the double quote found inside of the attribute is escaped, so as to preserve semantics of the original attribute value. 

I've attached a proof of concept patch that a colleague of mine pulled together to replicate the Xerces behavior (at least for a basic test case). It is not clear if a more thorough patch would be required to handle other cases.
Comment 1 Mike Dalessio 2018-03-19 20:32:44 UTC
Hi all,

I'd like to bump this issue, as I've just published medium-severity CVE-2018-8048 for the Loofah sanitization library due to the behavior mentioned in this thread:

https://github.com/flavorjones/loofah/issues/144

I'm providing a C reproduction of the problem which I believe makes a case for this "server-side include" behavior to be considered an inconsistent feature in libxml2:

https://gist.github.com/flavorjones/129b004957484dfced1cd21be3c64c06

That gist provides C code which demonstrates that libxml2 can be input a specially-crafted HTML fragment which, when serialized and re-parsed by libxml2, will contain element attributes that did not exist in the original document structure.

I'd appreciate if the core maintainers would take a look at the code in that gist, and be open to a conversation about how this is unexpected and inconsistent behavior at best, and introduces a vector for XSS injection at worst.

Thanks for your consideration,
-mike
Comment 2 Mike Dalessio 2018-03-20 20:29:54 UTC
I'd like to also point readers to another CVE issued by another sanitization library:

* CVE-2018-3740
* https://github.com/rgrove/sanitize/issues/176

I'd like to ask the libxml2 maintainers again to consider addressing this issue.
Comment 3 Mike Dalessio 2018-03-23 01:49:23 UTC
A third CVE has been issued for behavior due to this commit in a downstream HTML sanitization library:

* CVE-2018-3741
* https://groups.google.com/d/msg/rubyonrails-security/tP7W3kLc5u4/uDy2Br7xBgAJ

Three different owners of three different downstream libraries focused on security have declared CVEs because of vulnerabilities in their software introduced by this behavior.

Can I ask again that the libxml2 maintainers consider addressing this issue?
Comment 4 Nick Wellnhofer 2020-08-15 16:58:04 UTC
Should be fixed by removing the offending commit from 2013 for various reasons:

https://gitlab.gnome.org/GNOME/libxml2/-/commit/c1ba6f54d32b707ca6d91cb3257ce9de82876b6f

That said, it's still a bad idea to use a 20+ years old, unmaintained HTML 4 parser to sanitize input for the modern web.