GNOME Bugzilla – Bug 710004
URLs should be displayed as unescaped unicode
Last modified: 2015-07-28 15:15:48 UTC
Created attachment 257115 [details] Searching "main page" in japanese in Firefox and Epiphany. Every time a link has a character different than english is shown as hexadecimal code in epiphany inside the page and the address bar, which makes them unreadable to humans. And sometimes (when page title is different of link) the only way to search a page is by those characters, making it impossible to find them.
Created attachment 270550 [details] [review] ephy-title-box: Show correctly UTF-8 URL This fix only the url in the subtitle when it shown. If I do the same in the location entry, when the user edit an UTF-8 url it get a non-valid url.
Created attachment 270560 [details] [review] nautilus-floating-bar: Show correctly UTF-8 URL
*** This bug has been marked as a duplicate of bug 596399 ***
Created attachment 284557 [details] [review] ephy-title-box: Show correctly UTF-8 URL Rebase for master.
Created attachment 284561 [details] [review] ephy-title-box: Show correctly UTF-8 URL
Created attachment 286957 [details] [review] ephy-title-box: Show correctly UTF-8 URL
Created attachment 286958 [details] ephy-title-box: Show correctly UTF-8 URL
Created attachment 286959 [details] [review] ephy-title-box: Show correctly UTF-8 URL
Review of attachment 286959 [details] [review]: ::: src/ephy-title-box.c @@ +683,3 @@ EphyTitleBoxPrivate *priv; + EphyEmbedShellMode mode; + gchar *decode_uri; How about "decoded_uri" instead of "decode_uri" (decode is a verb!).
Created attachment 292448 [details] [review] ephy-title-box: Show correctly UTF-8 URL Both patches are simple and correct; I'm uploading a new version of the second patch because I want to rename the decode_uri variable in the second patch to unescaped_uri, which seems more sensible to me.
You are wrong. Try this url, for example, and see what do you see. https://he.wikipedia.org/wiki/%D7%99%D7%A9%D7%A8%D7%90%D7%9C#.D7.9B.D7.9C.D7.9B.D7.9C.D7.94 With the patch, I see in the title this label: https://he.wikipedia.org/wiki/ישראל#.D7.9B.D7.9C.D7.9B.D7.9C.D7.94 But I had to see this title: https://he.wikipedia.org/wiki/ישראל#כלכה (in English: https://en.wikipedia.org/wiki/Israel#Economy) Also, when I click on the title and try to edit the url in the location entry, I see the uncidoe url: https://he.wikipedia.org/wiki/%D7%99%D7%A9%D7%A8%D7%90%D7%9C#.D7.9B.D7.9C.D7.9B.D7.9C.D7.94 But I had to see this url: https://he.wikipedia.org/wiki/ישראל#כלכה (The same for the url in the statusbar)
So it looks like g_uri_unescape_string() can't handle the # character....
It look like we needs something else to handle this, but I don't know what or how. Firefox handle it correctly, so maybe we can to try looking in Firefox's code?
(In reply to comment #12) > So it looks like g_uri_unescape_string() can't handle the # character.... Er, just kidding. I went digging in that function, couldn't find anything wrong, was about to start looking into GtkLabel... would have been a huge waste of time, those URLs both display fine for me (with the attached patches). The first URL looks like it's displayed unescaped, but that's because the portion after the # was not escaped in the first place. So the patches are fine. The location entry still needs to be fixed.
Created attachment 292478 [details] proof
Created attachment 292479 [details] more proof -- it works
*** Bug 596399 has been marked as a duplicate of this bug. ***
Created attachment 292482 [details] [review] Properly display UTF-8 URLs in the location entry
Michael, for a moment I thought you are right, but infact it still dosn't work. Take this link: https://he.wikipedia.org/wiki/ישראל#כלכלה It will move you to the link https://he.wikipedia.org/wiki/ישראל, but not the the section #כלכלה, and in the title you see ישראל#כלכלה. Take the escaped unicode link: https://he.wikipedia.org/wiki/%D7%99%D7%A9%D7%A8%D7%90%D7%9C#.D7.9B.D7.9C.D7.9B.D7.9C.D7.94 It will move you to the link https://he.wikipedia.org/wiki/ישראל#.D7.9B.D7.9C.D7.9B.D7.9C.D7.94 and to the section #כלכלה, but the title will be ישראל#.D7.9B.D7.9C.D7.9B.D7.9C.D7.94, not ישראל#כלכלה. The same with the float bar (put the cursor on the links in the comment #11 and see.
(In reply to comment #19) > Michael, for a moment I thought you are right, but infact it still dosn't work. > > Take this link: https://he.wikipedia.org/wiki/ישראל#כלכלה > It will move you to the link https://he.wikipedia.org/wiki/ישראל, but not > the the section #כלכלה, and in the title you see ישראל#כלכלה. This is an unrelated, very annoying bug: see bug #722718 > Take the escaped unicode link: > https://he.wikipedia.org/wiki/%D7%99%D7%A9%D7%A8%D7%90%D7%9C#.D7.9B.D7.9C.D7.9B.D7.9C.D7.94 > It will move you to the link > https://he.wikipedia.org/wiki/ישראל#.D7.9B.D7.9C.D7.9B.D7.9C.D7.94 > and to the section #כלכלה, but the title will be > ישראל#.D7.9B.D7.9C.D7.9B.D7.9C.D7.94, > not ישראל#כלכלה. > > The same with the float bar (put the cursor on the links in the comment #11 and > see. But isn't that expected? The dots aren't escape characters, only the %s are. The new behavior seems to be exactly the same as Firefox's.
Created attachment 292517 [details] [review] Display unescaped URIs in the location entry completion
Created attachment 292522 [details] [review] Display unescaped URIs in the bookmark properties dialog
Still need to do the history dialog, and bookmarks editor window.
Created attachment 292557 [details] [review] Add ephy_tree_model_node_add_column_full Allow creating a tree model column that derives its values from a property, but is modified by a function for display.
Created attachment 292558 [details] [review] Add ephy_node_view_add_column_full Allow creating an EphyNodeView column that derives its values from a property, but is modified by a function for display.
Created attachment 292559 [details] [review] Display unescaped URLs in the bookmarks editor
Created attachment 292560 [details] [review] Display unescaped URIs in the history dialog
OK, done. The patches are all very small/simple, but since there are so many, I wonder if we should convert the URL much earlier internally, unescaping the URL before it ever reaches the history or bookmarks database, rather than unescaping it just before display.
I think we needs the same patch for the files download's title. I just downoladed a file and get this title: %D7%90%D7%95%D7%A8%D7%95%D7%AA%20-%20%D7%96%D7%A8%D7%A2%D7%95%D7%A0%D7%99%D7%9D%20-%20%D7%94.%20%D7%99%D7%A1%D7%95%D7%A8%D7%99%D7%9D%20%D7%9E%D7%9E%D7%A8%D7%A7%D7%99%D7%9D.pdf
(In reply to comment #29) > I think we needs the same patch for the files download's title. > I just downoladed a file and get this title I confirm. [1] contains a test for this. But I think this one is different enough to merit a new bug, since we'll probably want to fix it in WebKit. [1] https://bugzilla.mozilla.org/show_bug.cgi?id=619847
Review of attachment 292517 [details] [review]: This seems to be causing trace #234872
Well that didn't work
+ Trace 234873
Comment on attachment 292517 [details] [review] Display unescaped URIs in the location entry completion This seems to be a bug in g_markup. I don't think it should hold up this patchset, since nobody seems to be able to reproduce it.
I think most of this patches do the escaping at the wrong place. * nautilus-loating-bar: this is supposed to be a generic widget that can show any message, not necessarily uris. It should be the caller then one escaping the message only when it's a URI. That message is built inside EphyWebView in ephy_web_view_set_loading_message(). That's probably where the escaping should happen. * Title box and location entry: Both are updated from the location controller. And the location controller location is set by the window. I think the window should give the URI escaped to the location controller. Again in this case, sometimes the window doesn't even need to escape the URI, for example when using the typed address. * And probably same for the others, I haven't looked at them in detail. I would probably help to store the escaped URI in EphyWebView on demand, adding a function to get it, ephy_web_view_get_display_address() for example. That way the URL is escaped only once and used everywhere.
(In reply to Carlos Garcia Campos from comment #34) > I would probably help to store the escaped URI in EphyWebView on demand, > adding a function to get it, ephy_web_view_get_display_address() for > example. That way the URL is escaped only once and used everywhere. Well the EphyWebView already has separate address and URI properties. The address property seems to be what is intended to be displayed to the user, so I think we should just use the address property for this.
(In reply to Carlos Garcia Campos from comment #34) > I think most of this patches do the escaping at the wrong place. > > * nautilus-loating-bar: this is supposed to be a generic widget that can > show any message, not necessarily uris. It should be the caller then one > escaping the message only when it's a URI. That message is built inside > EphyWebView in ephy_web_view_set_loading_message(). That's probably where > the escaping should happen. OK, I agree, this can be handled in ephy_web_view_set_loading_message() and ephy_web_view_set_link_message(). > * Title box and location entry: Both are updated from the location > controller. And the location controller location is set by the window. I > think the window should give the URI escaped to the location controller. > Again in this case, sometimes the window doesn't even need to escape the > URI, for example when using the typed address. OK, I agree. These cases are both covered by decoding the URI in ephy_web_view_set_address(). With one exception: the location entry completion. This still has to be decoded in EphyLocationEntry itself, as in the patch above, since the GtkTreeModel passed to ephy_location_entry_set_completion() needs to have the percent-encoded form of the URI; only the view can have the decoded URI (since we must ensure not to attempt to load the decoded URI; the bugs would be very subtle: it would work almost always, unless the encoded character was something that would mess with the URI or the server, like a / or a ?). > * And probably same for the others, I haven't looked at them in detail. The other cases are bookmarks dialog and history dialog, which have the same problem as the location entry completion: they need to store the percent-encoded URI, and only convert it for display.
(In reply to Yosef Or Boczko from comment #29) > I think we needs the same patch for the files download's title. > I just downoladed a file and get this title: > %D7%90%D7%95%D7%A8%D7%95%D7%AA%20- > %20%D7%96%D7%A8%D7%A2%D7%95%D7%A0%D7%99%D7%9D%20-%20%D7%94. > %20%D7%99%D7%A1%D7%95%D7%A8%D7%99%D7%9D%20%D7%9E%D7%9E%D7%A8%D7%A7%D7%99%D7%9 > D.pdf Can you provide a link to a file like this? I think something else is wrong with my test case.
Here's a new set of patches; it should properly handle everything except downloads. Note that I am not quite sure what to do with the Copy Location context menu items on the bookmarks and history dialogs; currently those return a percent-encoded URI reflecting the value that's actually stored rather than what is displayed, but maybe those should be decoded too. I think not decoding those is probably the right thing to do, though, so the user can copy-paste the URI into the address bar and be sure that the load will work.
Created attachment 308150 [details] [review] EphyWebView: decode the URI before storing it in address This means the address property can now be used to get a user-friendly decoded address suitable for display. The uri property should now be used to get the percent-encoded address. It turns out that most uses of these functions were already correct for this new scheme, but except we must be sure to store the percent-encoded address in the bookmarks database and to display the decoded address when creating a web app. For more information on encoded URIs, see RFC 3986 sections 2.1 and 2.4.
Created attachment 308151 [details] [review] EphyWebView: Decode URI before setting the loading or status message This ensures that the URI displayed in the floating bar is decoded.
Created attachment 308152 [details] [review] Display decoded URIs in the location entry completion
Created attachment 308153 [details] [review] Display decoded URIs in the bookmark properties dialog
Created attachment 308154 [details] [review] Add ephy_tree_model_node_add_column_full Allow creating a tree model column that derives its values from a property, but is modified by a function for display.
Created attachment 308155 [details] [review] Add ephy_node_view_add_column_full Allow creating an EphyNodeView column that derives its values from a property, but is modified by a function for display.
Created attachment 308156 [details] [review] Display decoded URIs in the bookmarks editor
Created attachment 308157 [details] [review] Display decoded URIs in the history dialog
Comment on attachment 308150 [details] [review] EphyWebView: decode the URI before storing it in address This one was maybe a bad idea; I already found a few cases I missed. I will add a second property.
Created attachment 308158 [details] [review] EphyWebView: add display-address property This stores a decoded URI. It is the only URI that is appropriate for display to the user.
Created attachment 308159 [details] [review] Show decoded URI in the new web application dialog
Created attachment 308160 [details] [review] EphyWindow: store a decoded address, not a percent-encoded address This means the location controller will get a user-friendly display URI.
I just test the new patches, and now it works really good, even when I typed on the location entry I see the right character. Thanks!
Review of attachment 308151 [details] [review]: Ok
Review of attachment 308152 [details] [review]: Ok
Review of attachment 308153 [details] [review]: Ok
Review of attachment 308154 [details] [review]: Looks good
Review of attachment 308155 [details] [review]: Ok
Review of attachment 308156 [details] [review]: Please, consider my comment before committing. ::: src/bookmarks/ephy-bookmarks-editor.c @@ +1487,3 @@ + char *unescaped_url = g_uri_unescape_string (url, NULL); + g_value_set_string (value, unescaped_url); + g_free (unescaped_url); This could be simplified and you avoid a heap allocation by using g_value_take_string(). const char *url = g_value_get_string (value); g_value_take_string (value, g_uri_unescape_string (url, NULL));
Review of attachment 308157 [details] [review]: Please, fix the memory leak before committing. ::: src/ephy-history-window.c @@ +775,3 @@ +{ + int col_id = GPOINTER_TO_INT (user_data); + const char *url; This is not correct, gtk_tree_model_get will return a copy of the string that you should free.
Review of attachment 308158 [details] [review]: I'm not sure we want another property for this. This is actually a different view of the existing address property, that we store just for convenience to return a const char and avoid decoding multiple times. If you need to monitor it, you can connect to the notify of the address property and then call get_display_address(). So, I think this could be simplified by just decoding the current address on demand in get_display_address().
Review of attachment 308159 [details] [review]: Ok
Review of attachment 308160 [details] [review]: Perfect.
(In reply to Carlos Garcia Campos from comment #57) > ::: src/bookmarks/ephy-bookmarks-editor.c > @@ +1487,3 @@ > + char *unescaped_url = g_uri_unescape_string (url, NULL); > + g_value_set_string (value, unescaped_url); > + g_free (unescaped_url); > > This could be simplified and you avoid a heap allocation by using > g_value_take_string(). > > const char *url = g_value_get_string (value); > g_value_take_string (value, g_uri_unescape_string (url, NULL)); Cool, thanks. (In reply to Carlos Garcia Campos from comment #58) > ::: src/ephy-history-window.c > @@ +775,3 @@ > +{ > + int col_id = GPOINTER_TO_INT (user_data); > + const char *url; > > This is not correct, gtk_tree_model_get will return a copy of the string > that you should free. Oops, good catch. (In reply to Carlos Garcia Campos from comment #59) > I'm not sure we want another property for this. This is actually a different > view of the existing address property, that we store just for convenience to > return a const char and avoid decoding multiple times. If you need to > monitor it, you can connect to the notify of the address property and then > call get_display_address(). So, I think this could be simplified by just > decoding the current address on demand in get_display_address(). OK.
Attachment 308151 [details] pushed as b2f5f1a - EphyWebView: Decode URI before setting the loading or status message Attachment 308152 [details] pushed as 17e4d34 - Display decoded URIs in the location entry completion Attachment 308153 [details] pushed as 072ccac - Display decoded URIs in the bookmark properties dialog Attachment 308154 [details] pushed as e9f2882 - Add ephy_tree_model_node_add_column_full Attachment 308155 [details] pushed as 9652445 - Add ephy_node_view_add_column_full Attachment 308156 [details] pushed as 7218cb4 - Display decoded URIs in the bookmarks editor Attachment 308157 [details] pushed as 71be3ea - Display decoded URIs in the history dialog
Naturally I pushed before fixing your comments... sigh.
Created attachment 308269 [details] [review] Fixup for 'display decoded URIs in bookmarks editor' I pushed before addressing Carlos's review comments. Bad Michael!
Created attachment 308270 [details] [review] Fixup for 'display decoded URIs in the history dialog'
Attachment 308269 [details] pushed as 339e290 - Fixup for 'display decoded URIs in bookmarks editor' Attachment 308270 [details] pushed as dbee788 - Fixup for 'display decoded URIs in the history dialog'
(In reply to Carlos Garcia Campos from comment #59) > So, I think this could be simplified by just > decoding the current address on demand in get_display_address(). I'd rather decode in set_address() and store it in the priv struct without any property. Otherwise, the caller of get_display_address() has to free the memory.
Created attachment 308271 [details] [review] EphyWebView: add get_display_address() This returns a decoded URI. It is the only URI that is appropriate for display to the user.
Created attachment 308272 [details] [review] Show decoded URI in the new web application dialog
Created attachment 308273 [details] [review] EphyWindow: store a decoded address, not a percent-encoded address This means the location controller will get a user-friendly display URI.
Created attachment 308274 [details] [review] EphyWebView: add get_display_address() This returns a decoded URI. It is the only URI that is appropriate for display to the user.
Created attachment 308275 [details] [review] Show decoded URI in the new web application dialog
Created attachment 308276 [details] [review] EphyWindow: store a decoded address, not a percent-encoded address This means the location controller will get a user-friendly display URI.
Blaarg Bugzilla... OK Carlos, can you look over attachment #308274 [details]? I'm not sure if that's what you want.
(In reply to Michael Catanzaro from comment #68) > (In reply to Carlos Garcia Campos from comment #59) > > So, I think this could be simplified by just > > decoding the current address on demand in get_display_address(). > > I'd rather decode in set_address() and store it in the priv struct without > any property. Otherwise, the caller of get_display_address() has to free the > memory. I'm not proposing to not store the decoded value, but to do it only the first time get_display_address is called instead of in set_address. I guess it doesn't matter in this case, because get_display_address will always be called, even in case of redirection for the provisional URI.
Review of attachment 308274 [details] [review]: Ok
(In reply to Carlos Garcia Campos from comment #76) > I'm not proposing to not store the decoded value, but to do it only the > first time get_display_address is called instead of in set_address. I guess > it doesn't matter in this case, because get_display_address will always be > called, even in case of redirection for the provisional URI. I prefer to do it in set_address(); your suggestion seems like a hyperoptimization.
Attachment 308274 [details] pushed as a3317ec - EphyWebView: add get_display_address() Attachment 308275 [details] pushed as 9610744 - Show decoded URI in the new web application dialog Attachment 308276 [details] pushed as 3c4525d - EphyWindow: store a decoded address, not a percent-encoded address