GNOME Bugzilla – Bug 753481
gedit-recent: fix filtering of a filenames that need uri escaping
Last modified: 2015-08-13 20:54:18 UTC
When a user types a filename into the search popover, gedit tries to find matches from recently used documents. It does this by returning all recently used documents that have what the user typed as a substring of the document uri. GtkRecentInfo provides two different apis for returning the associated uri: gtk_recent_info_get_uri() and gtk_recent_info_get_uri_display(). The former returns the uri in a canonical representation with non-ascii characters escaped using url escaping rules. The latter returns the uri with non-ascii characters unescaped. GeditRecent uses the canonicalized, escaping function, so its substring search fails on non-ascii filenames. This commit changes the code over to use the other variant.
Created attachment 309021 [details] [review] gedit-recent: fix filtering of a filenames that need uri escaping
to reproduce, try naming a document 日本語.txt and then try to search for it in the recent document selector popover
Review of attachment 309021 [details] [review]: Thanks a lot it makes sense. If you want to go the extra mile, probably we should use g_utf8_casefold and g_utf8_normalize instead of plain strdown...
Comment on attachment 309021 [details] [review] gedit-recent: fix filtering of a filenames that need uri escaping Attachment 309021 [details] pushed as 2564431 - gedit-recent: fix filtering of a filenames that need uri escaping
Created attachment 309036 [details] [review] gedit-recent: casefold/normalize utf8 strings before comparing them The recent file list is currently checked against the open document selector search entry by converting both the uri and the user input to lowercase. This isn't ideal, because the same lowercase string can be represented using different valid byte sequences, and because there isn't a one-to-one mapping of uppercase and lowercase letters. This commit changes the logic to instead normalize the string into the standard reduced form, and use utf8 case folding to be more robust and various inputs.
Created attachment 309037 [details] [review] highlight-mode-selector: normalize/casefold search string This is like the previous commit, but before the highlight mode selector.
I still need to test attachment 309036 [details] [review] and attachment 309037 [details] [review]
Review of attachment 309037 [details] [review]: Well, as much as I like utf8 I kind of hope that we do not start getting programming languages with weird characters in their names :-) That said, sure, if when testing you do not see problems, it looks good. If we were concerned about performance we could store the casefolded version in the model, but I do not think it matters in practice for this case
Review of attachment 309036 [details] [review]: ::: gedit/gedit-recent.c @@ +259,1 @@ + if (strstr (uri_casefolded, config->substring_filter) == NULL) I guess you also need to normalize and casefold the substring_filter outside the while loop
Review of attachment 309036 [details] [review]: ::: gedit/gedit-recent.c @@ +259,1 @@ + if (strstr (uri_casefolded, config->substring_filter) == NULL) And at that point we can drop the g_utf8_strdown on the only caller in gedit-open-document-selector.c:on_entry_changed which I think it is more correct, the caller should not be aware of the filtering internals
(In reply to Paolo Borelli from comment #8) > Review of attachment 309037 [details] [review] [review]: > > Well, as much as I like utf8 I kind of hope that we do not start getting > programming languages with weird characters in their names :-) Yea, I don't see anyone contributing syntax highlighting for Gödel any time soon (and regardless, i don't think many people would write Gödel instead of Gödel into the search box). I mainly just did this patch for completeness. > That said, sure, if when testing you do not see problems, it looks good. If > we were concerned about performance we could store the casefolded version in > the model, but I do not think it matters in practice for this case will push after testing
(In reply to Paolo Borelli from comment #9) > I guess you also need to normalize and casefold the substring_filter outside > the while loop It is normalized and casefolded outside the while loop (by the caller). > And at that point we can drop the g_utf8_strdown on the only caller in > gedit-open-document-selector.c:on_entry_changed > > which I think it is more correct, the caller should not be aware of the > filtering internals I agree doing it here instead of in the caller makes sense, but that's a separate change, so i'll do that first, and follow up with another commit to do the strdown to normalize update. The only advantage I could see of keeping doing it in the caller is if there was some valid scenario where the list got updated much more frequently than the entry got changed, since then it would cache the computation. I don't think that scenario exists.
(In reply to Ray Strode [halfline] from comment #12) I agree doing it here instead of in the caller makes sense, but that's a > separate change, so i'll do that first, and follow up with another commit to > do the strdown to normalize update. I am not obsessive about microcommits, but I do appreciate when someone takes the time of doing things that way :-) Feel free to push directly both changes if testing does not turn up unexpected issues. > > The only advantage I could see of keeping doing it in the caller is if there > was some valid scenario where the list got updated much more frequently than > the entry got changed, since then it would cache the computation. I don't > think that scenario exists. Yes, I do not think it is a problem
Created attachment 309231 [details] [review] highlight-mode-selector: normalize/casefold search string This is like the previous commit, but before the highlight mode selector.
Created attachment 309232 [details] [review] gedit-recent: don't require caller to manually lowercase filter string gedit_recent_get_items filters out items from the recent file list that match a set filter string. The filter string must be explicitly lowercased by the caller, which is confusing. This commit changes the code to allow the caller to pass the filter string unmodified. gedit_recent_get_items will now implicitly lowercase the string on behalf of the caller.
Created attachment 309233 [details] [review] gedit-recent: casefold/normalize utf8 strings before comparing them The recent file list is currently checked against the open document selector search entry by converting both the uri and the user input to lowercase. This isn't ideal, because the same lowercase string can be represented using different valid byte sequences, and because there isn't a one-to-one mapping of uppercase and lowercase letters. This commit changes the logic to instead normalize the string into the standard reduced form, and use utf8 case folding to be more robust and various inputs.
Attachment 309231 [details] pushed as 2e27965 - highlight-mode-selector: normalize/casefold search string Attachment 309232 [details] pushed as 99a802e - gedit-recent: don't require caller to manually lowercase filter string Attachment 309233 [details] pushed as 6248e11 - gedit-recent: casefold/normalize utf8 strings before comparing them
(i screwed up the commit message on attachment 309231 [details] [review] oh well)