Bug 784894 - URI Escaping does not follow RFC 3986
Status: RESOLVED OBSOLETE
Product: libxml2
Classification: Platform
Component: general
Version: git master
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2017-07-13 09:31 UTC by idra
Modified: 2021-07-05 13:25 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Add RFC3986 compatible URI Escape function (8.79 KB, patch)
2017-07-13 09:31 UTC, idra
Small Python script illustrating the character classes (1.99 KB, text/plain)
2017-08-22 14:02 UTC, John Dennis

Description idra 2017-07-13 09:31:56 UTC
Created attachment 355487
Add RFC3986 compatible URI Escape function

URI escaping still follows the obsolete RFC 2396. The main issue is that it does not escape characters that have since been moved into the reserved character set, which causes interop issues in some dependent software, as evidenced here:
https://bugzilla.redhat.com/show_bug.cgi?id=1458237

Attached is a draft patch (it compiles but is not tested yet) that adds support for RFC 3986 conformant escaping. I added it as a separate function to avoid breaking applications that may depend on the old escaping for interoperability.
Comment 1 Nick Wellnhofer 2017-07-13 12:05:35 UTC
I'd suggest simply changing xmlURIEscapeStr to use ISA_UNRESERVED instead of IS_UNRESERVED. It seems that the ISA_* macros are for RFC 3986 and the IS_* macros are for RFC 2396.

This will make xmlURIEscapeStr escape the characters !*'() unless overridden by the 'list' argument.
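
To make the effect of that change concrete, here is a minimal Python sketch. It is not libxml2 code: escape() is a hypothetical stand-in that percent-encodes every byte outside an "unreserved" set plus an optional exception list, and the two sets are taken from the RFC 2396 and RFC 3986 grammars. It only assumes, as stated above, that IS_UNRESERVED corresponds to the RFC 2396 set and ISA_UNRESERVED to the RFC 3986 set.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)
UNRESERVED_2396 = ALNUM | set("-_.!~*'()")  # RFC 2396: alphanum | mark
UNRESERVED_3986 = ALNUM | set("-._~")       # RFC 3986, Section 2.3


def escape(text, unreserved, keep=""):
    """Percent-encode every UTF-8 byte that is not in `unreserved` or `keep`.

    Hypothetical helper for illustration only; `keep` plays the role of
    the 'list' argument of xmlURIEscapeStr.
    """
    allowed = unreserved | set(keep)
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if byte < 128 and ch in allowed:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


sample = "a(b)*'c'!"
print(escape(sample, UNRESERVED_2396))        # a(b)*'c'!
print(escape(sample, UNRESERVED_3986))        # a%28b%29%2A%27c%27%21
print(escape("a(b)", UNRESERVED_3986, "()"))  # a(b)
```

The last call shows how an exception list would restore the old behaviour for selected characters.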
Comment 2 John Dennis 2017-08-21 23:57:29 UTC
The original bug report was erroneous. RFC 3986 lists the reserved character set in Section 2.2, but that does not tell you which characters must be escaped, because what needs to be escaped depends on the URI component. The only way to know the escaping rules for a specific part of a URI is to read the "Collected ABNF for URI" in Appendix A.
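
As an illustration of component-dependent escaping, here is a small Python sketch built on urllib.parse.quote from the standard library (this is not a libxml2 API). The "safe" strings follow the pchar and query productions of Appendix A; escape_segment and escape_query are hypothetical helper names. In Python 3.7 and later, quote() never escapes letters, digits, or "_.-~", which matches the RFC 3986 unreserved set.

```python
from urllib.parse import quote

# RFC 3986, Appendix A:
#   pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
#   query = *( pchar / "/" / "?" )
SUB_DELIMS = "!$&'()*+,;="
PCHAR_EXTRA = SUB_DELIMS + ":@"   # allowed in a path segment besides unreserved
QUERY_EXTRA = PCHAR_EXTRA + "/?"  # allowed in a query besides unreserved


def escape_segment(text):
    """Escape text for use as a single path segment."""
    return quote(text, safe=PCHAR_EXTRA)


def escape_query(text):
    """Escape text for use as a whole query string."""
    return quote(text, safe=QUERY_EXTRA)


print(escape_segment("a/b?c=d"))  # a%2Fb%3Fc=d
print(escape_query("a/b?c=d"))    # a/b?c=d
```

The same input is escaped differently depending on where it will be placed. An application that packs key/value pairs into a query would likely also need to escape "&" and "=" inside the values, which the grammar alone does not decide.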

But the current libxml2 API does not provide a public entry point that allows you to specify the URI component you need to escape.

I think the best you can do with the existing API is to call xmlURIEscapeStr() with a non-NULL second parameter listing the characters that should be left unescaped in addition to those it never escapes. But what characters are those for specific components of a URI? Well, it's pretty hard to figure out without looking at the source, and even then it's not easy.
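
One way to derive such a second parameter is with set arithmetic, sketched below in Python. The sketch assumes that the characters xmlURIEscapeStr() never escapes are exactly the RFC 2396 unreserved set (as Comment 1 suggests for the IS_* macros); that assumption should be checked against the libxml2 source. The ALLOWED table and exception_list() are hypothetical names, and the component grammars come from RFC 3986 Appendix A.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)

# Assumption: what xmlURIEscapeStr() leaves alone regardless of 'list'
# (RFC 2396 unreserved = alphanum | mark).
NEVER_ESCAPED = ALNUM | set("-_.!~*'()")

UNRESERVED_3986 = ALNUM | set("-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED_3986 | SUB_DELIMS | set(":@")

# Characters RFC 3986 allows unescaped in each component.
ALLOWED = {
    "segment": PCHAR,
    "query": PCHAR | set("/?"),
    "fragment": PCHAR | set("/?"),
}


def exception_list(component):
    """Characters to pass as the second parameter for `component`."""
    return "".join(sorted(ALLOWED[component] - NEVER_ESCAPED))


for name in ALLOWED:
    print(name, exception_list(name))
# segment $&+,:;=@
# query $&+,/:;=?@
# fragment $&+,/:;=?@
```

Under the same assumption, every character in NEVER_ESCAPED is also allowed in all three components, so for these components the exception list is the only thing that needs adjusting.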

Maybe the bug report really needs to be "libxml2 does not provide an API to escape component-specific parts of a URI according to RFC 3986."
Comment 3 John Dennis 2017-08-22 14:02:02 UTC
Created attachment 358149
Small Python script illustrating the character classes

I found it difficult to evaluate exactly which characters were subject to escaping in the various RFCs, what libxml2 implements, and what the differences were. At least I found it difficult to do without the inevitable human error that occurs when reading specs and code.

The little Python script builds "sets" of characters and allows you to perform set operations on them (e.g. union, intersection, difference).

It's also the only way I was confident I could come up with the right set of exceptions to pass in the 2nd parameter of xmlURIEscapeStr().
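
For readers without access to the attachment, the following is a minimal sketch of the same idea. It is not the attached script. It builds the reserved sets from RFC 2396 and RFC 3986 and compares them with set operations; the variable names are illustrative.

```python
# RFC 2396, Sections 2.2 and 2.3
RESERVED_2396 = set(";/?:@&=+$,")
MARK_2396 = set("-_.!~*'()")  # part of 'unreserved' in RFC 2396

# RFC 3986, Section 2.2
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED_3986 = GEN_DELIMS | SUB_DELIMS  # union


def show(chars):
    """Render a character set as a sorted string."""
    return "".join(sorted(chars))


newly_reserved = RESERVED_3986 - RESERVED_2396  # difference
moved_from_mark = MARK_2396 & RESERVED_3986     # intersection
still_reserved = RESERVED_2396 & RESERVED_3986

print("newly reserved: ", show(newly_reserved))   # !#'()*[]
print("moved from mark:", show(moved_from_mark))  # !'()*
print("still reserved: ", show(still_reserved))   # $&+,/:;=?@
```

The "moved from mark" line shows the characters that were unreserved in RFC 2396 and became reserved in RFC 3986.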
Comment 4 GNOME Infrastructure Team 2021-07-05 13:25:51 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxml2/-/issues/

Thank you for your understanding and your help.