GNOME Bugzilla – Bug 508773
g_uri_escape_string() documentation unclear.
Last modified: 2008-01-15 15:58:23 UTC
g_uri_escape_string() takes "a string of reserved characters that are allowed to be used.", but it's not clear how this string is used:
http://library.gnome.org/devel/glib/unstable/glib-URI-Functions.html#g-uri-escape-string
Is it a list of reserved characters that may _not_ be used (and will therefore be escaped), or is it a list of reserved characters (a subset of some well-known set of reserved characters) that will not be escaped?
Added:

 * Normally all characters that are not "unreserved" (i.e. ASCII alphanumerical
 * characters plus dash, dot, underscore and tilde) are escaped.
 * But if you specify characters in @reserved_chars_allowed they are not
 * escaped. This is useful for the "reserved" characters in the URI
 * specification, since those are allowed unescaped in some portions of
 * a URI.
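To make the rule in that comment concrete, here is a minimal sketch in plain C. It is not GLib's implementation: sketch_is_unreserved() and sketch_escape() are hypothetical names, and the sketch models only the behavior described above (unreserved bytes pass through, bytes listed in the allowed string pass through, everything else becomes %XX).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of GLib: nonzero for the "unreserved" set
 * described above (ASCII alphanumerics plus dash, dot, underscore, tilde). */
static int sketch_is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical stand-in for the documented behavior. Returns a newly
 * malloc'd string (caller frees), or NULL if allocation fails.
 * `allowed` may be NULL, meaning no extra characters are left unescaped. */
char *sketch_escape(const char *s, const char *allowed)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(s);
    char *out = malloc(len * 3 + 1);   /* worst case: every byte becomes %XX */
    char *p = out;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (sketch_is_unreserved(c) ||
            (allowed != NULL && strchr(allowed, c) != NULL)) {
            *p++ = (char)c;            /* kept as-is */
        } else {
            *p++ = '%';                /* escaped as %XX */
            *p++ = hex[c >> 4];
            *p++ = hex[c & 15];
        }
    }
    *p = '\0';
    return out;
}
```

With this sketch, sketch_escape("a/b c", NULL) escapes both the slash and the space, while sketch_escape("a/b c", "/") leaves the slash alone and escapes only the space.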
Thanks. That's very helpful. g_uri_unescape_string() could use some similar clarification. For instance, it's not clear whether anything would happen if illegal_characters was NULL, or if that's again just modifying some standard set of characters to expect to be escaped.
Adding this:

 * If any of the characters in @illegal_characters or the character zero appears
 * as an escaped character in @escaped_string then that is an error and %NULL
 * will be returned. This is useful if you want to avoid for instance having a
 * slash being expanded in an escaped path element, which might confuse pathname
 * handling.
and:

 * @illegal_characters: an optional string of illegal characters not to be allowed.
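The behavior described in these two comments can be sketched in plain C as follows. This is an illustration, not GLib's code: sketch_hex_value() and sketch_unescape() are hypothetical names, and returning NULL for a malformed escape (a '%' not followed by two hex digits) is an assumption of the sketch that the comments above do not state.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit. */
static int sketch_hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical stand-in for the documented behavior. Returns a newly
 * malloc'd string (caller frees), or NULL on error.
 * `illegal` may be NULL; an escaped zero byte is still rejected then. */
char *sketch_unescape(const char *escaped, const char *illegal)
{
    size_t len = strlen(escaped);
    char *out = malloc(len + 1);       /* output is never longer than input */
    char *p = out;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        char c = escaped[i];
        if (c == '%') {
            int hi = (i + 1 < len) ? sketch_hex_value(escaped[i + 1]) : -1;
            int lo = (i + 2 < len) ? sketch_hex_value(escaped[i + 2]) : -1;
            if (hi < 0 || lo < 0) {    /* malformed escape: assumed to be an error */
                free(out);
                return NULL;
            }
            c = (char)(unsigned char)((hi << 4) | lo);
            /* Only characters that arrived in escaped form are checked. */
            if (c == '\0' || (illegal != NULL && strchr(illegal, c) != NULL)) {
                free(out);
                return NULL;
            }
            i += 2;                    /* skip the two hex digits */
        }
        *p++ = c;
    }
    *p = '\0';
    return out;
}
```

In this sketch sketch_unescape("a%2Fb", "/") returns NULL because the slash arrives escaped, whereas sketch_unescape("a/b", "/") succeeds because the slash is literal. Passing NULL for the second argument still rejects "%00".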
Many thanks. I assume that reserved_chars_allowed and illegal_characters may be UTF-8? Sorry if this is getting annoying.
No, they are bytes. URI escaping works on a byte basis, not a Unicode character basis. No encoding is specified, and in fact there might not be one (for example in an inline encoded data URI).
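A small plain-C sketch of what "on a byte basis" means in practice. The function name sketch_escape_bytes() is hypothetical and this is not GLib code. It takes a raw buffer with an explicit length, assumes no encoding, and escapes each byte outside the unreserved set on its own.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of GLib: percent-escapes a raw byte buffer.
 * Returns a newly malloc'd NUL-terminated string (caller frees), or NULL
 * if allocation fails. */
char *sketch_escape_bytes(const unsigned char *data, size_t len)
{
    static const char hex[] = "0123456789ABCDEF";
    char *out = malloc(len * 3 + 1);   /* worst case: every byte becomes %XX */
    char *p = out;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = data[i];
        int unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                         (c >= '0' && c <= '9') ||
                         c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            *p++ = (char)c;
        } else {
            /* Each byte is escaped individually; multi-byte sequences are
             * not interpreted as characters. */
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 15];
        }
    }
    *p = '\0';
    return out;
}
```

Given the two bytes 0xC3 0xA9 (the UTF-8 encoding of "é"), the sketch produces "%C3%A9", one escape per byte. A lone 0xFF byte, which is not valid UTF-8 at all, is handled the same way and becomes "%FF".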