After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 708621 - append_form_encoded encode all non alphanum including valid dot, dash, unerscore and tilde
append_form_encoded encode all non alphanum including valid dot, dash, unersc...
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: Misc
unspecified
Other Linux
: Normal critical
: ---
Assigned To: libsoup-maint@gnome.bugs
libsoup-maint@gnome.bugs
Depends on:
Blocks:
 
 
Reported: 2013-09-23 13:21 UTC by Alban Browaeys
Modified: 2013-09-24 14:54 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
append_form_encoded: replace custom uri encode by g_uri_escape_string (1.29 KB, patch)
2013-09-23 13:21 UTC, Alban Browaeys
needs-work Details | Review
append_form_encoded: sync urlencode with php RFC1738 implementation. (917 bytes, patch)
2013-09-23 21:16 UTC, Alban Browaeys
committed Details | Review

Description Alban Browaeys 2013-09-23 13:21:54 UTC
Created attachment 255560 [details] [review]
append_form_encoded: replace custom uri encode by  g_uri_escape_string

This breaks facebook connect via empathy and gnome online accounts (even though the latter only complains in debug mode that it cannot refresh the credentials).

append_form_encoded: replace custom uri encode by
 g_uri_escape_string.

Previous detection of characters to mangle bypassed only for alphanum.
g_uri_escape_string also preserve dot, dash, underscore and tilde.
Fixes facebook authentication , ie :
v=1.0
was mangled to:
v=1%2E0
before base64 encoding and sent to the facebook api (ending up in a
generic not-authorized response from facebook).
Comment 1 Dan Winship 2013-09-23 14:10:35 UTC
Comment on attachment 255560 [details] [review]
append_form_encoded: replace custom uri encode by  g_uri_escape_string

It's not standard URI encoding though; forms have slightly-different rules. In particular, ' ' must be encoded to '+', not "%20".

The current code matches what the HTML 4.01 spec says to do (http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1), though I'm not even a teeny bit surprised that some sites depend on the browser *not* doing this. HTML5 has slightly different rules (http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#application/x-www-form-urlencoded-encoding-algorithm), which also allow "*", "-", ".", and "_" to appear unencoded as well, so I guess we should do that.
Comment 2 Alban Browaeys 2013-09-23 18:22:05 UTC
in light of your comment I believe the issue might be in the user (empathy sasl code ends up calling this function to url encode then base64 the "querystring".
Thus it looks like misuse of a function that works the way it should. I still need to check anew the api and empathy .
Comment 3 Alban Browaeys 2013-09-23 20:49:26 UTC
https://developers.facebook.com/docs/chat/ provides an example in php. The query is build with php http_build_query. Regarding the php background  of facebook  I believe they kept http_build_query definition of urlencoded.
http://git.php.net/?p=php-src.git;a=blob;f=ext/standard/http.c#l210 -> http://git.php.net/?p=php-src.git;a=blob;f=ext/standard/http.c#l28 with enc_type TSRMLS_CC thus calls into http://git.php.net/?p=php-src.git;a=blob;f=ext/standard/url.c#l488 :
citation : /* Allow only alphanumeric chars and '_', '-', '.'; escape the rest */

from http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1 :
Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').

thuse this only apply to space and reserved characters.

http://www.ietf.org/rfc/rfc1738.txt section "Reserved" .
   The characters ";",
   "/", "?", ":", "@", "=" and "&" are the characters which may be
   reserved for special meaning within a scheme. No other characters may
   be reserved within a scheme.
  (...)
   Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL.
Comment 4 Dan Winship 2013-09-23 21:14:28 UTC
well, it says "reserved", but then immediately afterward says "Non-alphanumeric characters are replaced by '%HH'", implying that they meant that all non-alphanumeric characters are considered reserved for their purposes. (It can't mean "encode *only* the characters from the 'reserved' production in RFC 1738" since that would mean that "+" didn't get encoded, which would make it indistinguishable from " " in the output.)

At any rate, the html5 rule is unambiguous (and is basically what php does).
Comment 5 Alban Browaeys 2013-09-23 21:16:08 UTC
Created attachment 255595 [details] [review]
append_form_encoded: sync urlencode with php RFC1738  implementation.

Attached is a patched based on php  understanding of rfc1738 aka what facebook expected (I tested this new version , works well too. Both versions did not break either jabber, gtalk nor messenger.live.com via telepathy-gabble.

PS: sorry not default enctype TSRMLS_CC, but default enctype PHP_QUERY_RFC1738 for http_build_query.
Comment 6 Dan Winship 2013-09-24 14:53:56 UTC
pushed to master. it will eventually be in 2.44.1. Thanks for the patch.