GNOME Bugzilla – Bug 708621
append_form_encoded encode all non alphanum including valid dot, dash, unerscore and tilde
Last modified: 2013-09-24 14:54:00 UTC
Created attachment 255560 [details] [review] append_form_encoded: replace custom uri encode by g_uri_escape_string This breaks facebook connect via empathy and gnome online accounts (even though the latter only complains in debug mode that it cannot refresh the credentials). append_form_encoded: replace custom uri encode by g_uri_escape_string. Previous detection of characters to mangle bypassed only for alphanum. g_uri_escape_string also preserve dot, dash, underscore and tilde. Fixes facebook authentication , ie : v=1.0 was mangled to: v=1%2E0 before base64 encoding and sent to the facebook api (ending up in a generic not-authorized response from facebook).
Comment on attachment 255560 [details] [review] append_form_encoded: replace custom uri encode by g_uri_escape_string It's not standard URI encoding though; forms have slightly-different rules. In particular, ' ' must be encoded to '+', not "%20". The current code matches what the HTML 4.01 spec says to do (http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1), though I'm not even a teeny bit surprised that some sites depend on the browser *not* doing this. HTML5 has slightly different rules (http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#application/x-www-form-urlencoded-encoding-algorithm), which also allow "*", "-", ".", and "_" to appear unencoded as well, so I guess we should do that.
in light of your comment I believe the issue might be in the user (empathy sasl code ends up calling this function to url encode then base64 the "querystring". Thus it looks like misuse of a function that works the way it should. I still need to check anew the api and empathy .
https://developers.facebook.com/docs/chat/ provides an example in php. The query is build with php http_build_query. Regarding the php background of facebook I believe they kept http_build_query definition of urlencoded. http://git.php.net/?p=php-src.git;a=blob;f=ext/standard/http.c#l210 -> http://git.php.net/?p=php-src.git;a=blob;f=ext/standard/http.c#l28 with enc_type TSRMLS_CC thus calls into http://git.php.net/?p=php-src.git;a=blob;f=ext/standard/url.c#l488 : citation : /* Allow only alphanumeric chars and '_', '-', '.'; escape the rest */ from http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1 : Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A'). thuse this only apply to space and reserved characters. http://www.ietf.org/rfc/rfc1738.txt section "Reserved" . The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme. (...) Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
well, it says "reserved", but then immediately afterward says "Non-alphanumeric characters are replaced by '%HH'", implying that they meant that all non-alphanumeric characters are considered reserved for their purposes. (It can't mean "encode *only* the characters from the 'reserved' production in RFC 1738" since that would mean that "+" didn't get encoded, which would make it indistinguishable from " " in the output.) At any rate, the html5 rule is unambiguous (and is basically what php does).
Created attachment 255595 [details] [review] append_form_encoded: sync urlencode with php RFC1738 implementation. Attached is a patched based on php understanding of rfc1738 aka what facebook expected (I tested this new version , works well too. Both versions did not break either jabber, gtalk nor messenger.live.com via telepathy-gabble. PS: sorry not default enctype TSRMLS_CC, but default enctype PHP_QUERY_RFC1738 for http_build_query.
pushed to master. it will eventually be in 2.44.1. Thanks for the patch.