After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 753481 - gedit-recent: fix filtering of a filenames that need uri escaping
gedit-recent: fix filtering of a filenames that need uri escaping
Status: RESOLVED FIXED
Product: gedit
Classification: Applications
Component: general
unspecified
Other All
: Normal normal
: ---
Assigned To: Gedit maintainers
Gedit maintainers
Depends on:
Blocks:
 
 
Reported: 2015-08-10 17:10 UTC by Ray Strode [halfline]
Modified: 2015-08-13 20:54 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gedit-recent: fix filtering of a filenames that need uri escaping (2.71 KB, patch)
2015-08-10 17:10 UTC, Ray Strode [halfline]
committed Details | Review
gedit-recent: casefold/normalize utf8 strings before comparing them (7.38 KB, patch)
2015-08-10 23:32 UTC, Ray Strode [halfline]
none Details | Review
highlight-mode-selector: normalize/casefold search string (3.91 KB, patch)
2015-08-10 23:32 UTC, Ray Strode [halfline]
none Details | Review
highlight-mode-selector: normalize/casefold search string (3.92 KB, patch)
2015-08-13 20:48 UTC, Ray Strode [halfline]
committed Details | Review
gedit-recent: don't require caller to manually lowercase filter string (6.19 KB, patch)
2015-08-13 20:49 UTC, Ray Strode [halfline]
committed Details | Review
gedit-recent: casefold/normalize utf8 strings before comparing them (6.00 KB, patch)
2015-08-13 20:49 UTC, Ray Strode [halfline]
committed Details | Review

Description Ray Strode [halfline] 2015-08-10 17:10:29 UTC
When a user types a filename into the search popover, gedit tries
to find matches from recently used documents.  It does this by
returning all recently used documents that have what the user typed
as a substring of the document uri.

GtkRecentInfo provides two different apis for returning the associated
uri: gtk_recent_info_get_uri() and gtk_recent_info_get_uri_display().
The former returns the uri in a canonical representation with non-ascii
characters escaped using url escaping rules.  The latter returns the
uri with non-ascii characters unescaped.  GeditRecent uses the
canonicalized, escaping function, so its substring search fails
on non-ascii filenames.

This commit changes the code over to use the other variant.
Comment 1 Ray Strode [halfline] 2015-08-10 17:10:33 UTC
Created attachment 309021 [details] [review]
gedit-recent: fix filtering of a filenames that need uri escaping
Comment 2 Ray Strode [halfline] 2015-08-10 17:14:43 UTC
to reproduce, try naming a document

日本語.txt

and then try to search for it in the recent document selector popover
Comment 3 Paolo Borelli 2015-08-10 19:28:12 UTC
Review of attachment 309021 [details] [review]:

Thanks a lot it makes sense.

If you want to go the extra mile, probably we should use g_utf8_casefold and g_utf8_normalize instead of plain strdown...
Comment 4 Ray Strode [halfline] 2015-08-10 23:31:34 UTC
Comment on attachment 309021 [details] [review]
gedit-recent: fix filtering of a filenames that need uri escaping

Attachment 309021 [details] pushed as 2564431 - gedit-recent: fix filtering of a filenames that need uri escaping
Comment 5 Ray Strode [halfline] 2015-08-10 23:32:18 UTC
Created attachment 309036 [details] [review]
gedit-recent: casefold/normalize utf8 strings before comparing them

The recent file list is currently checked against the open document
selector search entry by converting both the uri and the user input
to lowercase.

This isn't ideal, because the same lowercase string can be represented
using different valid byte sequences, and because there isn't a
one-to-one mapping of uppercase and lowercase letters.

This commit changes the logic to instead normalize the string into
the standard reduced form, and use utf8 case folding to be more robust
and various inputs.
Comment 6 Ray Strode [halfline] 2015-08-10 23:32:22 UTC
Created attachment 309037 [details] [review]
highlight-mode-selector: normalize/casefold search string

This is like the previous commit, but before the highlight
mode selector.
Comment 7 Ray Strode [halfline] 2015-08-10 23:33:40 UTC
I still need to test attachment 309036 [details] [review] and attachment 309037 [details] [review]
Comment 8 Paolo Borelli 2015-08-11 09:19:17 UTC
Review of attachment 309037 [details] [review]:

Well, as much as I like utf8 I kind of hope that we do not start getting programming languages with weird characters in their names :-)


That said, sure, if when testing you do not see problems, it looks good. If we were concerned about performance we could store the casefolded version in the model, but I do not think it matters in practice for this case
Comment 9 Paolo Borelli 2015-08-11 09:19:44 UTC
Review of attachment 309036 [details] [review]:

::: gedit/gedit-recent.c
@@ +259,1 @@
+				if (strstr (uri_casefolded, config->substring_filter) == NULL)

I guess you also need to normalize and casefold the substring_filter outside the while loop
Comment 10 Paolo Borelli 2015-08-11 09:27:37 UTC
Review of attachment 309036 [details] [review]:

::: gedit/gedit-recent.c
@@ +259,1 @@
+				if (strstr (uri_casefolded, config->substring_filter) == NULL)

And at that point we can drop the g_utf8_strdown on the only caller in gedit-open-document-selector.c:on_entry_changed

which I think it is more correct, the caller should not be aware of the filtering internals
Comment 11 Ray Strode [halfline] 2015-08-12 18:59:10 UTC
(In reply to Paolo Borelli from comment #8)
> Review of attachment 309037 [details] [review] [review]:
> 
> Well, as much as I like utf8 I kind of hope that we do not start getting
> programming languages with weird characters in their names :-)
Yea, I don't see anyone contributing syntax highlighting for Gödel any time soon (and regardless, i don't think many people would write Gödel instead of Gödel into the search box).

I mainly just did this patch for completeness.

> That said, sure, if when testing you do not see problems, it looks good. If
> we were concerned about performance we could store the casefolded version in
> the model, but I do not think it matters in practice for this case
will push after testing
Comment 12 Ray Strode [halfline] 2015-08-12 19:11:17 UTC
(In reply to Paolo Borelli from comment #9)

> I guess you also need to normalize and casefold the substring_filter outside
> the while loop

It is normalized and casefolded outside the while loop (by the caller).

> And at that point we can drop the g_utf8_strdown on the only caller in
> gedit-open-document-selector.c:on_entry_changed
> 
> which I think it is more correct, the caller should not be aware of the
> filtering internals
I agree doing it here instead of in the caller makes sense, but that's a separate change, so i'll do that first, and follow up with another commit to do the strdown to normalize update.

The only advantage I could see of keeping doing it in the caller is if there was some valid scenario where the list got updated much more frequently than the entry got changed, since then it would cache the computation.  I don't think that scenario exists.
Comment 13 Paolo Borelli 2015-08-12 20:09:14 UTC
(In reply to Ray Strode [halfline] from comment #12)
I agree doing it here instead of in the caller makes sense, but that's a
> separate change, so i'll do that first, and follow up with another commit to
> do the strdown to normalize update.

I am not obsessive about microcommits, but I do appreciate when someone takes the time of doing things that way :-)

Feel free to push directly both changes if testing does not turn up unexpected issues.

> 
> The only advantage I could see of keeping doing it in the caller is if there
> was some valid scenario where the list got updated much more frequently than
> the entry got changed, since then it would cache the computation.  I don't
> think that scenario exists.

Yes, I do not think it is a problem
Comment 14 Ray Strode [halfline] 2015-08-13 20:48:56 UTC
Created attachment 309231 [details] [review]
highlight-mode-selector: normalize/casefold search string

This is like the previous commit, but before the highlight
mode selector.
Comment 15 Ray Strode [halfline] 2015-08-13 20:49:09 UTC
Created attachment 309232 [details] [review]
gedit-recent: don't require caller to manually lowercase filter string

gedit_recent_get_items filters out items from the recent file list that
match a set filter string. The filter string must be explicitly
lowercased by the caller, which is confusing.

This commit changes the code to allow the caller to pass the filter
string unmodified. gedit_recent_get_items will now implicitly lowercase
the string on behalf of the caller.
Comment 16 Ray Strode [halfline] 2015-08-13 20:49:16 UTC
Created attachment 309233 [details] [review]
gedit-recent: casefold/normalize utf8 strings before comparing them

The recent file list is currently checked against the open document
selector search entry by converting both the uri and the user input
to lowercase.

This isn't ideal, because the same lowercase string can be represented
using different valid byte sequences, and because there isn't a
one-to-one mapping of uppercase and lowercase letters.

This commit changes the logic to instead normalize the string into
the standard reduced form, and use utf8 case folding to be more robust
and various inputs.
Comment 17 Ray Strode [halfline] 2015-08-13 20:49:31 UTC
Attachment 309231 [details] pushed as 2e27965 - highlight-mode-selector: normalize/casefold search string
Attachment 309232 [details] pushed as 99a802e - gedit-recent: don't require caller to manually lowercase filter string
Attachment 309233 [details] pushed as 6248e11 - gedit-recent: casefold/normalize utf8 strings before comparing them
Comment 18 Ray Strode [halfline] 2015-08-13 20:54:18 UTC
(i screwed up the commit message on attachment 309231 [details] [review] oh well)