GNOME Bugzilla – Bug 646808
Add search-based retrieval
Last modified: 2015-02-17 09:20:42 UTC
Folks' API is currently just a big bag of contacts with change notification. This works fine when all contacts are stored locally, there are relatively few contacts, and all changes are guaranteed to be pushed to us; it does not scale to a world with remote stores which we have to poll (bug 643718) or where there are many contacts. It also requires each application to re-implement search. <http://lists.freedesktop.org/archives/telepathy/2011-March/005369.html> outlines a series of API additions to Folks to allow search-based retrieval of contacts.
Bug #648805 discusses a few points that relate to this bug (particularly the guarantee of consistent Individual linking regardless of search, which we agreed upon in the discussion linked above).
From discussions at the hackfest: • Implement it using a “view” object which wraps the IndividualAggregator and searches over the whole set of Individuals. • Automatically update the set of results as the IndividualAggregator changes. • Have a “search harder” flag for searching backends like LDAP and Facebook which are search-only. • Have a strictness enum (e.g. strict/fuzzy/auto-correct) for the level of strictness of matching property values to search terms. • All searches are boolean AND, not boolean OR. Nobody uses boolean OR in real life. • Don't bother pushing query strings down into the backends, since by the time the user does a search, all of the individuals have been loaded and displayed in gnome-contacts anyway. • Support efficient narrowing of searches, though this is an internal implementation detail.
We need to return which part of the results match the search terms.
(In reply to comment #2) > • Have a strictness enum (e.g. strict/fuzzy/auto-correct) for the level of > strictness of matching property values to search terms. This probably should work towards relaxing strictness and be advisory: if a backend only supports strict (case-insensitive) searches, "fuzzy" etc. is ignored. > • All searches are boolean AND, not boolean OR. Nobody uses boolean OR in real > life. But some filters should be able to match multiple fields (e.g. name or nickname or org. name matches the search terms). > • Don't bother pushing query strings down into the backends, since by the time > the user does a search, all of the individuals have been loaded and displayed > in gnome-contacts anyway. Isn't this an external consideration for Folks? It deserves to be usable outside of Gnome, I think. > • Support efficient narrowing of searches, though this is an internal > implementation detail. I'm afraid incremental search can't be efficiently internalized. Recognizing when a new search string is an appended increment of the previous one is perhaps easy; but the narrowed result usually sheds so many contacts that reusing the same view object for it and signalling the dropped contacts may be impractical.
(In reply to comment #3) > We need to return which part of the results match the search terms. Where is this needed? A UI could simply highlight text matches in the displayed data.
(In reply to comment #2) > • Have a “search harder” flag for searching backends like LDAP and Facebook > which are search-only. > • Have a strictness enum (e.g. strict/fuzzy/auto-correct) for the level of > strictness of matching property values to search terms. > • All searches are boolean AND, not boolean OR. Nobody uses boolean OR in real > life. Also, in my design I tried to not impose too strict requirements on the implementation of searches. A backend should produce a result set that generally follows the idea of the query case, in a way that does not cause undue surprises for the user. This will ease mapping onto various search mechanisms (LDAP, CardDAV, the gazillion of web APIs, etc...) without killing yourselves over strict semantics. Of course, care should be taken to avoid data loss in use cases that aim to enumerate all contacts useful in a certain way (e.g. list all locally stored contacts with an email address).
Punting bugs that won't be fixed by Folks 0.6.0.
See also bug #658576 to get a single individual from his ID.
(In reply to comment #2) > From discussions at the hackfest: > • Have a “search harder” flag for searching backends like LDAP and Facebook > which are search-only. I've split this off as its own bug #660299, since we need to implement new backends to add API for it. But we should prototype it before we close this bug and merge its code to make sure we won't need to break API when we get around to it.
Created attachment 201031 [details] [review] Add (shallow) search functionality to Folks Patches from branch: http://cgit.collabora.com/git/user/treitter/folks.git/log/?h=bgo646808-search =============================== This branch isn't quite finished (in part) because I need feedback on it. As we've switched to not breaking API, I think we need more scrutiny for new API. So I'd like to get early feedback, roll it in, make sure we have realistic use cases, and future-proof ourselves a little (by factoring in deep-searching before we finalize the new API in this branch; see below). The design goals in this branch should match what we came up with in our Folks/Contacts hackfest a little while back. Please let me know if I missed anything in the implementation. SearchView is primarily meant for "one off" searching like Gnome Shell's people search provider, an email program's type-ahead contact matching when composing a new email, etc. Programs which expose a browser-like interface to contacts (Empathy, Gnome Contacts) may wish to just stick with using the IndividualAggregator directly (though I have optimized the match-all search ("") and it seems to have similar performance to the plain IndividualAggregator based on the existing EDS stress test (I'll add more test cases before merging, so we should be able to catch and fix any issues which show up with additional test cases). Some more thoughts on the branch: * It would be nice to create an IndividualView interface like: public interface IndividualView { public signal individuals_changed_detailed (MultiMap <Individual?, Individual?>); public abstract async void prepare () throws GLib.Error; public abstract Map<string, Individual> individuals {} public abstract is_prepared {}; public abstract is_quiescent {}; ... } and have IndividualAggregator and SearchView both implement it. However, this would break ABI for subclasses of IndividualAggregator which override its signals. There should be zero instances of this in the wild, but it's a theoretical possibility. * I think the best way to work in deep searching would be to add a construct-only property to SearchView (like "deep-search") and add a bool deep_search parameter to the SearchView constructor. Setting deep-search to true would have the SearchView /only/ return deep-search results. In the case that the application wants both shallow and deep results, they would simply add another SearchView and display the combined results however they'd like. This way we wouldn't need to specially-label some Individuals as being from a deep search. (Though, on the other hand, that would potentially be useful in the case that the user wanted to copy search results locally (but I don't want to encourage that data aliasing)). * Before merging, we need at least one real-world use of this functionality. I'm planning to rebase the Gnome Shell search provider upon this and test it out before finalizing this branch.
Review of attachment 201031 [details] [review]: Looks like a good start to me. I haven't reviewed the test cases much more than just skimming over them, but I've taken a reasonably close look at the rest of the code. I can't think of any architectural problems with it other than what I've noted in comments. I'll reply to comment #10 separately. ::: folks/query.vala @@ +27,3 @@ + * @since UNRELEASED + */ +public enum Folks.QueryStrictness Seeing how you've documented the elements of this enum made me think that perhaps it would be more flexible to use a set of flags here instead of an enum. e.g. OR-able flags for: whole-word matching, case sensitivity, per-field matching, and anything else we come up with. I don't think it would make any of the internals of folks more complex, and would give that extra bit of flexibility in the API. @@ +31,3 @@ + /** + * Whole-word matches only ('red' will not match 'Alfred'). + * Case-insensitiive ('alfred' will match 'Alfred'). s/insensitiive/insensitive/ @@ +65,3 @@ + "full_name", + "nickname", + "structured_name" If these are property names, they should be using hyphens instead of underscores. Same for the ones below. @@ +95,3 @@ + }; + + QueryStrictness _strictness; This should be private and have a default value. @@ +107,3 @@ + } + + private string[] _match_fields; Needs a default value. @@ +116,3 @@ + * + * Also note that more fields (particularly rarely-matched fields) will + * negatively impact performance, so only include important fields. Might be good to note that the field names should come from Query.MATCH_FIELDS_* (if I'm right in assuming that). ::: folks/search-view.vala @@ +23,3 @@ + +/** + * A view of {@link Individual}s which match a given {@link Query}. Might want to mention that a new SearchView/Query has to be created for each new query string being used. @@ +71,3 @@ + /** + * Whether {@link IndividualAggregator.prepare} has successfully completed for + * this aggregator. s/this aggregator/this view's aggregator/. @@ +88,3 @@ + * + * @see IndividualAggregator.individuals_changed_detailed + * @since UNRELEASED Needs a @param line for the changes parameter. @@ +120,3 @@ + * This function is guaranteed to be idempotent. + * + * @since UNRELEASED Needs a @throws line. @@ +129,3 @@ + this._aggregator.individuals_changed_detailed.connect ( + this._aggregator_individuals_changed_detailed_cb); + yield this._aggregator.prepare (); If this throws an error, this._prepare_pending will get stuck at true. @@ +156,3 @@ + this._individuals.unset (individual_old.id); + view_changes.remove_all (individual_old); + view_changes.set (individual_old, null); Shouldn't individual_old only be added to view_changes if it matched the query? i.e. If it actually existed in this._individuals before. @@ +167,3 @@ + { + this._individuals.set (individual_new.id, + individual_new); I can't currently think of a situation where this would be a problem, but I'm hesitant about adding to and removing from this._individuals in the same loop (the outermost loop over changes.get_keys()). If it's possible to refactor the code to do all the removals first, followed by all the additions to this._individuals I'd feel happier about it. We've had bugs in the IndividualAggregator before which were caused by adding a given ID to a collection and then removing it immediately afterwards because we mis-ordered the operations on the collection. ::: folks/simple-query.vala @@ +33,3 @@ + /* These are guaranteed to be non-null */ + private string _query_string; + private HashSet<string> _query_tokens; It would be more efficient to use a simple array for this, since it's built once and then just iterated over a lot. It doesn't need fast lookups. @@ +71,3 @@ + token_str += t; + } + token_str += "]"; I wouldn't bother building a fancy string for the debug output. It's probably clearer to just call debug() for each of the tokens. @@ +86,3 @@ + * resultant contacts + * + * @since UNRELEASED Need a @param line for match_fields. @@ +107,3 @@ + { + /* Treat an empty query string as "match all" */ + if (this._query_tokens.size < 1) Could also check the size of this.match_fields here. @@ +133,3 @@ + /* Build up a set of tokens found on for the current prop_name, + * since we can't safely modify tokens_remaining in place */ + var tokens_found = new HashSet<string> (); This could be a linked list or array instead of a hash set. @@ +144,3 @@ + + foreach (var token_found in tokens_found) + tokens_remaining.remove (token_found); Instead of using tokens_found, it should be possible to explicitly use a Gee.Iterator to iterate over tokens_remaining above. You can then call iter.remove() to remove tokens which have been matched. That should be more efficient than maintaining tokens_found and should allow for earlier exiting of the function once tokens_remaining.size == 0. @@ +235,3 @@ + else + { + /* FIXME: handle non-string AFDs */ Reminder: FIXME here. @@ +253,3 @@ + else + { + /* FIXME: need to do something here */ Print a warning to the console and return false, perhaps? @@ +291,3 @@ + else + { + /* FIXME: need to do something here */ Warn and return false? ::: tests/eds/search-view.vala @@ +17,3 @@ + * Authors: Raul Gutierrez Segales <raul.gutierrez.segales@collabora.co.uk> + * Travis Reitter <travis.reitter@collabora.co.uk> + * Empty line. @@ +55,3 @@ + this._search_view = null; + this._aggregator = null; + this._eds_backend.tear_down (); Set this._eds_backend and this._expected_individuals to null. @@ +226,3 @@ + this._expected_individuals.remove (persona.display_id); + + /* FIXME: cut */ Reminder: some FIXMEs here.
(In reply to comment #10) *snip* > Some more thoughts on the branch: > > * It would be nice to create an IndividualView interface like: > > public interface IndividualView > { > public signal individuals_changed_detailed (MultiMap <Individual?, > Individual?>); > public abstract async void prepare () throws GLib.Error; > public abstract Map<string, Individual> individuals {} > public abstract is_prepared {}; > public abstract is_quiescent {}; > ... > } > > and have IndividualAggregator and SearchView both implement it. However, this > would break ABI for subclasses of IndividualAggregator which override its > signals. There should be zero instances of this in the wild, but it's a > theoretical possibility. It would be nice, but probably not worth breaking ABI for. I guess we should file a bug about it for the Next Big API Break. > * I think the best way to work in deep searching would be to add a > construct-only property to SearchView (like "deep-search") and add a bool > deep_search parameter to the SearchView constructor. Setting deep-search to > true would have the SearchView /only/ return deep-search results. In the case > that the application wants both shallow and deep results, they would simply add > another SearchView and display the combined results however they'd like. This > way we wouldn't need to specially-label some Individuals as being from a deep > search. (Though, on the other hand, that would potentially be useful in the > case that the user wanted to copy search results locally (but I don't want to > encourage that data aliasing)). I'm not convinced about only showing deep results in a deep-search–enabled SearchView. Combining the results from two SearchViews feels like a nasty can of worms, especially since the same Individual may be returned by both SearchViews. Why would we have to label some Individuals as being from a deep search? Otherwise, I agree with the implementation of deep search being as a construct-only property on SearchView. My vote for the property name is for ‘search-at-level-11’. > * Before merging, we need at least one real-world use of this functionality. > I'm planning to rebase the Gnome Shell search provider upon this and test it > out before finalizing this branch. Great work! \o/
Created attachment 201569 [details] [review] Add (shallow) search functionality to Folks (try 2) Patches from updated branch: http://cgit.collabora.com/git/user/treitter/folks.git/log/?h=bgo646808-search =============== This addresses all of Philip's concerns and actually tests and implements the strictness flags (I accidentally glossed over that in the first run). Remaining comments: > @@ +167,3 @@ > + { > + this._individuals.set (individual_new.id, > + individual_new); > > I can't currently think of a situation where this would be a problem, but I'm > hesitant about adding to and removing from this._individuals in the same loop > (the outermost loop over changes.get_keys()). If it's possible to refactor the > code to do all the removals first, followed by all the additions to > this._individuals I'd feel happier about it. We've had bugs in the > IndividualAggregator before which were caused by adding a given ID to a > collection and then removing it immediately afterwards because we mis-ordered > the operations on the collection. ======================= Just moved the addition code into a second loop over changes.get_keys() to guarantee this. We could add some clever shortcut to reduce the amount of extra work this adds, but I'd like to hold off until/unless we can prove that this slow-down matters, since it would make debugging a uglier.
Review of attachment 201569 [details] [review]: Just a quick review since I believe you already have some updates to this branch in the pipeline. ::: folks/query.vala @@ +127,3 @@ + }; + + QueryStrictnessFlags _strictness_flags = QueryStrictnessFlags.DEFAULTS; This should be private. ::: folks/search-view.vala @@ +144,3 @@ + { + this._aggregator.individuals_changed_detailed.disconnect ( + this._aggregator_individuals_changed_detailed_cb); You should set this._prepare_pending=false here, or it will get stuck at true when the throw statement below is executed. @@ +172,3 @@ + { + /* individual_old removed or replaced */ + this._individuals.unset (individual_old.id); You can combine the this._individuals.has_key() and .unset() calls for efficiency: .unset() returns a bool indicating whether the individual was in the map before being removed. You can use this in the if() instead of .has_key(). ::: folks/simple-query.vala @@ +30,3 @@ +public class Folks.SimpleQuery : Folks.Query +{ + private const string _whitespace_chars = " \t\n\r"; isspace() also recognises ‘\f’ and ‘\v’ as whitespace — we might want to do the same. @@ +105,3 @@ + * {@inheritDoc} + */ + public override bool is_match (Individual individual) In case you haven't done it already, it might be wise to examine the generated C code for the speed-critical methods like this one and see if there are any unnecessary copies/strdup()s being introduced by Vala which we could eliminate. (I haven't done so, and we can always do this later; I'm just making a note about it.)
(In reply to comment #14) > ::: folks/simple-query.vala > @@ +30,3 @@ > +public class Folks.SimpleQuery : Folks.Query > +{ > + private const string _whitespace_chars = " \t\n\r"; > > isspace() also recognises ‘\f’ and ‘\v’ as whitespace — we might want to do the > same. Sure - though note that we'll have to delay support of \v due to bgo#664689.
(In reply to comment #14) > @@ +105,3 @@ > + * {@inheritDoc} > + */ > + public override bool is_match (Individual individual) > > In case you haven't done it already, it might be wise to examine the generated > C code for the speed-critical methods like this one and see if there are any > unnecessary copies/strdup()s being introduced by Vala which we could eliminate. > (I haven't done so, and we can always do this later; I'm just making a note > about it.) I've only done some basic checking -- will do more before the final merge.
Created attachment 202041 [details] [review] Add (shallow) search functionality to Folks (try 3) Patches from branch: http://cgit.collabora.com/git/user/treitter/folks.git/log/?h=bgo646808-search ================ This addresses the last review's points and adds query changing after creating a SearchView. Next up, query-changing debouncing.
Review of attachment 202041 [details] [review]: ::: folks/search-view.vala @@ +33,3 @@ + * + * A {@link SearchView} is strictly bound to its initial {@link Query} so a new + * {@link SearchView} must be created with a new {@link Query} as necessary. Is this paragraph relevant any more? @@ +71,3 @@ + /* Re-evaluate all Individuals (only if necessary) */ + if (this._is_prepared) + this._evaluate_all_aggregator_individuals (); Out of interest, what's the reasoning behind allowing SearchView.query to be settable, but not allowing SimpleQuery.query_string to be set? I can see the advantage in being about to (for example) change a view's query from a SimpleQuery to a ComplexQueryBecauseIStillCantFindWhatImLookingFor easily, but I wonder if most search needs (i.e. live search with debouncing) would be served fine just by updating the value of SimpleQuery.query_string, and leaving SearchView.query alone. @@ +85,3 @@ + * For clients who only wish to have a snapshot of search results, this + * property is valid once {@link SearchView.prepare} is finished and this + * SearchView may be unreferenced and ignored afterward. Either add a @link to “SearchView” or change it to prose (e.g. “search view”), but I think it's a bit ugly to have “SearchView” in plain text. @@ +238,3 @@ + { + individual.notify.disconnect (this._individual_notify_cb); + this._individuals.unset (individual.id); As noted before, you can combine the .has_key() and .unset() calls to save a lookup in the hash table: if (individual != null && this._individuals.unset (individual.id) == true) { individual.notify.disconnect (this._individual_notify_cb); return true; } @@ +246,3 @@ + } + + private void _evaluate_all_aggregator_individuals () Might be clearer to call this (e.g.) “_add_all_aggregator_individuals”.
Created attachment 202318 [details] [review] Add (shallow) search functionality to Folks (try 4) Patches from branch: http://cgit.collabora.com/git/user/treitter/folks.git/log/?h=bgo646808-search ===================== Cleans up some code related to re-evaluating matches in SearchViews and addresses points in comment #18 (Does not yet implement debouncing)
(In reply to comment #18) > @@ +71,3 @@ > + /* Re-evaluate all Individuals (only if necessary) */ > + if (this._is_prepared) > + this._evaluate_all_aggregator_individuals (); > > Out of interest, what's the reasoning behind allowing SearchView.query to be > settable, but not allowing SimpleQuery.query_string to be set? I can see the > advantage in being about to (for example) change a view's query from a > SimpleQuery to a ComplexQueryBecauseIStillCantFindWhatImLookingFor easily, but > I wonder if most search needs (i.e. live search with debouncing) would be > served fine just by updating the value of SimpleQuery.query_string, and leaving > SearchView.query alone. My original idea was that Queries were simple objects describing a given query. Most importantly, they should contain as little state (and signaling) as possible. But given that your case is likely to be very common, it seems worthwhile (so I've done this in the latest patch (including a couple subtests)). > @@ +246,3 @@ > + } > + > + private void _evaluate_all_aggregator_individuals () > > Might be clearer to call this (e.g.) “_add_all_aggregator_individuals”. This pointed out a little confusion in the code; it was possible for the Aggregator's individuals-changed-detailed callback to *remove* individuals that were indicated as being added (not just avoiding adding them if they failed to match the query). The code that was indeed re-evaluating was abusing this quirk in the callback handler. So I've cleaned that up and made the code reflect my original intent with the word "evaluate" ("add or remove (as necessary)").
(In reply to comment #8) > See also bug #658576 to get a single individual from his ID. Out of curiosity, does this branch fix this? (I have to confess I didn't look at it yet).
(In reply to comment #21) > (In reply to comment #8) > > See also bug #658576 to get a single individual from his ID. > > Out of curiosity, does this branch fix this? (I have to confess I didn't look > at it yet). I haven't tried it yet, but it may already work if you search for their ID and set the match_fields to {"id"}. But that isn't an intended goal of this branch - we logically-AND all terms, so you could get at most one result with this type of query (and you have to construct the Query, SearchView, prepare the SearchView, etc.). I think it's best fixed in the bug mentioned in the original comment as some simple utility function on IndividualAggregator.
Review of attachment 202318 [details] [review]: Perhaps it's time to push the unrelated changes to debug.vala and various bits of documentation. The rest looks OK to me. How long's the debouncing stuff going to take? It would be good to get this all pushed soon.
(In reply to comment #23) > Review of attachment 202318 [details] [review]: > > Perhaps it's time to push the unrelated changes to debug.vala and various bits > of documentation. Those were already pushed. They appear later in this branch because I haven't rebased it upon master since I added them (to avoid breaking continuity). Though it may be worthwhile to add a -2 branch to tidy that all up. I did miss commit a5b766e235 though, so thanks. I've pushed that to origin now. > The rest looks OK to me. How long's the debouncing stuff going to take? It > would be good to get this all pushed soon. My estimate is a week for debouncing (almost entirely because I'm doing something else in the meantime; it shouldn't be much work). The idea was to wait to merge till I finished the Gnome Shell rebase to make sure our API is right. I'm fairly happy with our API now though I'm shelving the Gnome Shell rebase temporarily. It requires more structural changes to Gnome Shell's search classes than I want to spend time on just now (and fmuellner from Gnome Shell is cleaning that up a little anyhow). I'd also like to get in an is-quiescent signal here as well, since it's mandatory for "one-shot" use (as the Shell and probably other applications would desire). This is really simple to add, so I should get that done about the time of debouncing, then we can merge.
(In reply to comment #24) > (In reply to comment #23) > > Review of attachment 202318 [details] [review] [details]: > > > > Perhaps it's time to push the unrelated changes to debug.vala and various bits > > of documentation. > > Those were already pushed. They appear later in this branch because I haven't > rebased it upon master since I added them (to avoid breaking continuity). > Though it may be worthwhile to add a -2 branch to tidy that all up. Whoops. Please excuse my brain fart. > I did miss commit a5b766e235 though, so thanks. I've pushed that to origin now. I don't see it there. What am I missing? > > The rest looks OK to me. How long's the debouncing stuff going to take? It > > would be good to get this all pushed soon. > > My estimate is a week for debouncing (almost entirely because I'm doing > something else in the meantime; it shouldn't be much work). *snip* Great!
(In reply to comment #25) > > I did miss commit a5b766e235 though, so thanks. I've pushed that to origin now. > > I don't see it there. What am I missing? I am a great magician. Not only is it now visible, its commit hash is now 5db7b6b7! Presto change-o. (Honestly, I know I pushed it last time and git seemed fine with it. No idea what happened.)
Just so it doesn't get lost in the time it's been since I've last worked on this branch, it still needs a couple bits before merging: * a working client - I started some work on this but it won't be completed; so, I'm going to fulfill this requirement by creating a search tool (possibly by just by adding it as a feature in folks-inspect) * a flag for "search harder" and a simple prototype client that uses it. We don't necessarily need to close bug #660299 before merging this branch, but I want to be convinced we won't have to break API to support it later if we do put that off.
(In reply to comment #27) > Just so it doesn't get lost in the time it's been since I've last worked on > this branch, it still needs a couple bits before merging: > > * a working client - I started some work on this but it won't be completed; > so, I'm going to fulfill this requirement by creating a search tool (possibly > by just by adding it as a feature in folks-inspect) > > * a flag for "search harder" and a simple prototype client that uses it. We > don't necessarily need to close bug #660299 before merging this branch, but I > want to be convinced we won't have to break API to support it later if we do > put that off. Also, we'll need to include API for sorting and pagination if we want to support them efficiently and native in our API. I don't see why this would matter for any local contacts (where we're talking about a few hundred contacts in large cases), but it could make sense for search-only PersonaStores.
The more I think about it, the more I think we need to provide a match strength indicator for each result. It can be easy to assume that you'll only get a few results if you search for the name "bob", so ordering won't matter too much, but the fuzzier the searching we support (eg, Android-like numbers mapped to keypad letters), the greater the importance of ranking.
Before finalising the new API, it would be useful to check that it can support behaviour such as what’s implemented here: https://github.com/openshine/gnome-sms/blob/master/src/helper.vala, just as a check.
Here are some more notes I have locally that I haven't included here yet: * add 'is-quiescent' to SearchView * add de-bouncing to changes in the SearchView query (may need to expose the debounce timeout, in ms; probably should keep this internal until demanded) * Create search tool for Folks search. Once it works for all obvious use cases, merge search branch * double-check critical code paths in Search code to make sure it doesn't do a lot of unnecessary strdups/copies * re-consider whole-word as the default for search -- at least, it doesn't make sense for email and IM addresses
(In reply to comment #30) > Before finalising the new API, it would be useful to check that it can support > behaviour such as what’s implemented here: > https://github.com/openshine/gnome-sms/blob/master/src/helper.vala, just as a > check. Matching phone numbers? That should be handled because we compare phone details via values_equal(), which includes normalization. But it's certainly worth including a few test cases to make sure it works as it should.
Created attachment 292888 [details] [review] core: Implement base classes for searching
Created attachment 292889 [details] [review] inspect: Add search option to folks-inspect
Patches updated to include some work done internally. Also updated to apply cleanly to recent master. None of the review comments have been purposefully addressed, and the branch might not even compile. But at least it applies to master now. http://cgit.collabora.com/git/user/pwith/folks.git/log/?h=search
Created attachment 295024 [details] [review] core: Implement base classes for searching
Created attachment 295025 [details] [review] inspect: Add search option to folks-inspect
I’ve updated the patches to address all the review comments so far. Here’s a brief summary of the changes: • Added SearchView.is-quiescent, proxying IndividualAggregator.is-quiescent. This may change when we add search-only persona stores. • Added SearchView.refresh() for explicitly refreshing results. This is currently pointless because we listen to PersonaStore signals and dynamically update the result set. However, it will be useful for search-only PersonaStores (bug #660299), which do not emit change notifications. The only other interaction we need with such stores is to pass them a Query object when preparing the SearchView (and when SearchView.query is changed), which fits in fine with the current API. • Various compilation, documentation and style changes. Here are a few issues I’ve addressed without code changes: • Debouncing: I don’t think this belongs in folks. The UI should do the debouncing, so that any UI which _does_ need folks to return results as fast as possible doesn’t have to wait for a debounce timeout. • Suitability for gnome-sms: Looks like it all matches up nicely to me. • Return the part of the results which matches the search terms (comment #3): This is too complex to implement generally, and of limited application. If programs really want it, we can add it later once we’ve got a better idea of their requirements. And here are the remaining issues, somewhat intertwined, and needing comment from you, Travis: • Sorting and pagination: I think it’s a bad idea. If we’re searching, the whole point is to narrow the search space down to a handful of results, for which you do not need pagination. Sorting should be by match strength (see below). Sorting and pagination are incredibly hard to implement in a general manner, and from a quick bit of research, none of our backends (specifically, EDS and Tracker) support sorting or pagination natively anyway, so there are no performance gains to be had by farming it out to them. In terms of performance, we have the option of changing the implementation of SearchView.individuals to be lazy if needed in future, retrieving results as the client iterates over the Map, rather than retrieving them all at once. That would be an engineering challenge, but my point is that we aren’t tying ourselves into a hole by our choice of SearchView.individuals API at the moment. • Match strength: I think this is a really important idea, and should be how we sort results: by match strength, rather than full name. In the common case of searching for a contact by name, match strength should approximate sorting the results by full name anyway. So, what needs to be done: 1. Travis, I’d appreciate your opinions on sorting and pagination. If you agree with me, I can go ahead and remove the SearchView.sorted_individuals and SearchView.get_sorted_page() APIs. 2. We need to incorporate match strength into the results. Do we actually need to expose a number for it? Or would it be OK to always sort SearchView.individuals by decreasing match strength? I suspect there are no useful situations where we need to have a measure of the relative match strength of two results. 3. That does mean SearchView.individuals has to be a SortedSet<Individual>.
Created attachment 295114 [details] [review] core: Implement base classes for searching
Created attachment 295115 [details] [review] inspect: Add search option to folks-inspect
(In reply to comment #38) > I’ve updated the patches to address all the review comments so far. Here’s a > brief summary of the changes: > > • Added SearchView.is-quiescent, proxying IndividualAggregator.is-quiescent. > This may change when we add search-only persona stores. > • Added SearchView.refresh() for explicitly refreshing results. This is > currently pointless because we listen to PersonaStore signals and dynamically > update the result set. However, it will be useful for search-only PersonaStores > (bug #660299), which do not emit change notifications. The only other > interaction we need with such stores is to pass them a Query object when > preparing the SearchView (and when SearchView.query is changed), which fits in > fine with the current API. My preference would be to move that API to the branch for bug #660299 so we don't introduce API until it actually does something under at least one set of conditions. I'd be more compelled if there were a language-based reason to add anticipated API where it would be much more awkward to add it later. > • Various compilation, documentation and style changes. > > Here are a few issues I’ve addressed without code changes: > > • Debouncing: I don’t think this belongs in folks. The UI should do the > debouncing, so that any UI which _does_ need folks to return results as fast as > possible doesn’t have to wait for a debounce timeout. Fair enough. If we truly can prove there are cases where it would be meaningful, we should be able to add it later (first, internally, then expose it if needed). > • Suitability for gnome-sms: Looks like it all matches up nicely to me. > • Return the part of the results which matches the search terms (comment #3): > This is too complex to implement generally, and of limited application. If > programs really want it, we can add it later once we’ve got a better idea of > their requirements. Agreed. It's a lot of work to get right and it's a very marginal benefit. > And here are the remaining issues, somewhat intertwined, and needing comment > from you, Travis: > > • Sorting and pagination: I think it’s a bad idea. If we’re searching, the > whole point is to narrow the search space down to a handful of results, for > which you do not need pagination. Sorting should be by match strength (see > below). Sorting and pagination are incredibly hard to implement in a general > manner, and from a quick bit of research, none of our backends (specifically, > EDS and Tracker) support sorting or pagination natively anyway, so there are no > performance gains to be had by farming it out to them. In terms of performance, > we have the option of changing the implementation of SearchView.individuals to > be lazy if needed in future, retrieving results as the client iterates over the > Map, rather than retrieving them all at once. That would be an engineering > challenge, but my point is that we aren’t tying ourselves into a hole by our > choice of SearchView.individuals API at the moment. Good points. Lazily retrieving results while iterating the Map is an interesting idea if it comes to that. I think the idea behind pushing sorting down to the backends was that they could tack a "SORT" modifier onto their query if possible as that would likely be very efficient. ...Except then we'd have sorted Personas but not Individuals. So, I'm not sure why this was a compelling idea when I suggested it. Agreed on pagination. > • Match strength: I think this is a really important idea, and should be how > we sort results: by match strength, rather than full name. In the common case > of searching for a contact by name, match strength should approximate sorting > the results by full name anyway. Yes, definitely, match strength is most important. Most people will be most interested in full name matches (so, if it matters, that should be prioritized) though matching on other fields is also useful. And I think it's also important that we match within strings (eg, someone's surname needs to match them strongly). I occasionally still see search features which only match from the front of the field and it's jarring how much it's making the user work harder for something that's trivial in software. It /might/ be meaningful to factor in word boundaries though (eg, if someone looking for "Alice Smith" isn't going to start typing "esmi" or "e smi"). My biggest concern with sorting is that applications don't re-sort results around rapidly on the user, particularly when they're going to trigger some action based on the top/highlighted result without manually waiting for things to settle. Gnome Shell and Unity are both really bad at this. Gnome Shell can present results slowly and non-deterministically enough that the same query, typed twice at similar rates, can easily present different results at the time the user hits Enter. Unity often shuffles entire categories around between me finishing my query and hitting enter. Eg, just now, I typed "time", and was about to hit Enter. Most of the typing time, the top results were the applications "Time Tracker", "Time & Date", and "GIMP Image Editor" (?!) but just before hitting Enter, the "Files & Folders" category moved to the top, so I ended up opening a document with "time" in the filename. For added confusion, I just re-tried this, and ended up with the category ordering: Applications, Music, Files & Folders. So, new category in the middle and the three categories were in a different order. They're stable in this order when I re-open the search (it presents the last search and results) but not if I re-do the same query on the same data. It's possible that there's no good trade-off between live results and sort stability. On the other hand, for "one-shot" results (which I think is an important mode for SearchView to support), Google somehow does extremely well with this through keeping all relevant data in RAM (and RAM disks), high parallelization, and very short timeouts on their servers returning results. In theory, their results are non-deterministic though I've never noticed it myself. Surely, if a bunch of networked computers can have very fast timeouts and return results near-instantly, we could do the same locally on a very limited data set. It's a little late in the process to think about this, but, if every change to the query were a new "one-shot" search (optimizing for additional typing narrowing existing results, as Gnome Shell does), one-shot searching may be the simplest thing to support and sufficient for most use cases. What are the use cases for live search that couldn't be reasonably handled by a one-shot query and listening to key IndividualAggregator signals like "individuals-deleted"? > So, what needs to be done: > 1. Travis, I’d appreciate your opinions on sorting and pagination. If you > agree with me, I can go ahead and remove the SearchView.sorted_individuals and > SearchView.get_sorted_page() APIs. See above. > 2. We need to incorporate match strength into the results. Do we actually need > to expose a number for it? Or would it be OK to always sort > SearchView.individuals by decreasing match strength? I suspect there are no > useful situations where we need to have a measure of the relative match > strength of two results. The upside of putting a number on match strength would be be that the application could handle results streaming in and sorting itself. This would be important for search-only backends which could have significant delays in results and/or return them piecemeal. That leads to the sorting instability I'd like to avoid but it may be unavoidable in some cases (where we have to query server(s) we have no control over and which perform poorly). I think we /should/ sort results before initially presenting them in an order that makes sense to reduce sort instability. In the common case, late results might not affect the visible portion of the results in the application. > 3. That does mean SearchView.individuals has to be a SortedSet<Individual>. See answer to 2.
Created attachment 295198 [details] [review] core: Implement base classes for searching
Created attachment 295199 [details] [review] inspect: Add search option to folks-inspect
Here are some updated patches. Main changes: • Added a SearchView.unprepare() method to mirror SearchView.prepare() and IndividualAggregator.unprepare(). • Moved the search tests to use the dummy backend, which makes them a bit more reliable. • Pushed several fixes to master which make the tests pass for me now. Thanks for your reply Travis. I’ll comment on it soon.
(In reply to comment #41) > (In reply to comment #38) > > • Added SearchView.refresh() for explicitly refreshing results. This is > > currently pointless because we listen to PersonaStore signals and dynamically > > update the result set. However, it will be useful for search-only PersonaStores > > (bug #660299), which do not emit change notifications. The only other > > interaction we need with such stores is to pass them a Query object when > > preparing the SearchView (and when SearchView.query is changed), which fits in > > fine with the current API. > > My preference would be to move that API to the branch for bug #660299 so we > don't introduce API until it actually does something under at least one set of > conditions. > > I'd be more compelled if there were a language-based reason to add anticipated > API where it would be much more awkward to add it later. I think it should be added in this first round, so that people can start using it appropriately. Otherwise they will have to retrofit it to applications, which will never work as well. > > • Match strength: I think this is a really important idea, and should be how > > we sort results: by match strength, rather than full name. In the common case > > of searching for a contact by name, match strength should approximate sorting > > the results by full name anyway. > > Yes, definitely, match strength is most important. Most people will be most > interested in full name matches (so, if it matters, that should be prioritized) > though matching on other fields is also useful. And I think it's also important > that we match within strings (eg, someone's surname needs to match them > strongly). I occasionally still see search features which only match from the > front of the field and it's jarring how much it's making the user work harder > for something that's trivial in software. It /might/ be meaningful to factor in > word boundaries though (eg, if someone looking for "Alice Smith" isn't going to > start typing "esmi" or "e smi"). > > My biggest concern with sorting is that applications don't re-sort results > around rapidly on the user, particularly when they're going to trigger some > action based on the top/highlighted result without manually waiting for things > to settle. Agreed. > On the other hand, for "one-shot" results (which I think is an important mode > for SearchView to support), Google somehow does extremely well with this > through keeping all relevant data in RAM (and RAM disks), high parallelization, > and very short timeouts on their servers returning results. In theory, their > results are non-deterministic though I've never noticed it myself. Surely, if a > bunch of networked computers can have very fast timeouts and return results > near-instantly, we could do the same locally on a very limited data set. SearchView does support one-shot results as: yield search_view.prepare (); use_my_results (search_view.individuals); > It's a little late in the process to think about this, but, if every change to > the query were a new "one-shot" search (optimizing for additional typing > narrowing existing results, as Gnome Shell does), one-shot searching may be the > simplest thing to support and sufficient for most use cases. > > What are the use cases for live search that couldn't be reasonably handled by a > one-shot query and listening to key IndividualAggregator signals like > "individuals-deleted"? A search operation where the user changes persona stores part-way through. For example, I could imagine opening gnome-contacts to search for a contact who I added on my phone today. My phone isn’t currently paired via Bluetooth, so the contact isn’t found. I realise this, pair my phone, and the results should be updated without me having to touch gnome-contacts. I think the existing API, with its dynamic updates to SearchView.individuals, satisfies both this and the one-shot use case. > > So, what needs to be done: > > 1. Travis, I’d appreciate your opinions on sorting and pagination. If you > > agree with me, I can go ahead and remove the SearchView.sorted_individuals and > > SearchView.get_sorted_page() APIs. > > See above. OK, I will remove those APIs in the next iteration. > > 2. We need to incorporate match strength into the results. Do we actually need > > to expose a number for it? Or would it be OK to always sort > > SearchView.individuals by decreasing match strength? I suspect there are no > > useful situations where we need to have a measure of the relative match > > strength of two results. > > The upside of putting a number on match strength would be be that the > application could handle results streaming in and sorting itself. This would be > important for search-only backends which could have significant delays in > results and/or return them piecemeal. That leads to the sorting instability I'd > like to avoid but it may be unavoidable in some cases (where we have to query > server(s) we have no control over and which perform poorly). > > I think we /should/ sort results before initially presenting them in an order > that makes sense to reduce sort instability. In the common case, late results > might not affect the visible portion of the results in the application. If the results are a SortedSet, the application doesn’t have to worry about sorting, so doesn’t need the match strengths. Two situations: 1. One-shot: Application takes sorted results from SearchView.individuals and displays them in order. 2. Dynamic: Application has taken and displayed an initial set of results. SearchView.individuals_changed_detailed is emitted with a SortedSet<Individual> for added and another for removed. The application can either display those Individuals without ordering; ignore them if it doesn’t want to upset the UI; or query for them in SearchView.individuals to get their relative position, using SortedSet.lower() and .higher() to get the next elements (which, for the first and last elements in the signalled SortedSet have already been displayed). So in summary my API proposal is: public class SearchView { public SortedSet<Individual> individuals; public signal void individuals_changed_detailed (SortedSet<Individual> added, SortedSet<Individual> removed); } The change to individuals_changed_detailed does lose us the information about mappings between old and new individuals, but I think that’s OK — it was intended for updating editor views, which are not applicable for search results. If the user wants that, they will have to match it up with IndividualAggregator.individuals_changed_detailed.
(In reply to comment #45) > (In reply to comment #41) > > (In reply to comment #38) > > > • Added SearchView.refresh() for explicitly refreshing results. This is > > > currently pointless because we listen to PersonaStore signals and dynamically > > > update the result set. However, it will be useful for search-only PersonaStores > > > (bug #660299), which do not emit change notifications. The only other > > > interaction we need with such stores is to pass them a Query object when > > > preparing the SearchView (and when SearchView.query is changed), which fits in > > > fine with the current API. > > > > My preference would be to move that API to the branch for bug #660299 so we > > don't introduce API until it actually does something under at least one set of > > conditions. > > > > I'd be more compelled if there were a language-based reason to add anticipated > > API where it would be much more awkward to add it later. > > I think it should be added in this first round, so that people can start using > it appropriately. Otherwise they will have to retrofit it to applications, > which will never work as well. How can you know if you're using it appropriately if it's a no-op? :) Any application using SearchView is going to have to set up the signal handlers either way (unless they can guarantee they're only using search-only backends, and even then, I think that'd be a bad practice in case their backends ever change) and they'll just use mostly the same logic after calling refresh(). Is there any advantage to having refresh() be a public function rather than just allowing applications to set the existing Query and refresh afterward (or even having them create a new equivalent Query and setting that)? If we made that change, applications could be check to only pass in the existing Query if they don't want to do extra work which would be pretty obvious. It's not a deal-breaker to me, just thought I'd explain my logic. > SearchView does support one-shot results as: > yield search_view.prepare (); > use_my_results (search_view.individuals); Ah, of course. Good point. > > It's a little late in the process to think about this, but, if every change to > > the query were a new "one-shot" search (optimizing for additional typing > > narrowing existing results, as Gnome Shell does), one-shot searching may be the > > simplest thing to support and sufficient for most use cases. > > > > What are the use cases for live search that couldn't be reasonably handled by a > > one-shot query and listening to key IndividualAggregator signals like > > "individuals-deleted"? > > A search operation where the user changes persona stores part-way through. For > example, I could imagine opening gnome-contacts to search for a contact who I > added on my phone today. My phone isn’t currently paired via Bluetooth, so the > contact isn’t found. I realise this, pair my phone, and the results should be > updated without me having to touch gnome-contacts. Good point. > I think the existing API, with its dynamic updates to SearchView.individuals, > satisfies both this and the one-shot use case. > > > 2. We need to incorporate match strength into the results. Do we actually need > > > to expose a number for it? Or would it be OK to always sort > > > SearchView.individuals by decreasing match strength? I suspect there are no > > > useful situations where we need to have a measure of the relative match > > > strength of two results. > > > > The upside of putting a number on match strength would be be that the > > application could handle results streaming in and sorting itself. This would be > > important for search-only backends which could have significant delays in > > results and/or return them piecemeal. That leads to the sorting instability I'd > > like to avoid but it may be unavoidable in some cases (where we have to query > > server(s) we have no control over and which perform poorly). > > > > I think we /should/ sort results before initially presenting them in an order > > that makes sense to reduce sort instability. In the common case, late results > > might not affect the visible portion of the results in the application. > > If the results are a SortedSet, the application doesn’t have to worry about > sorting, so doesn’t need the match strengths. > > Two situations: > 1. One-shot: Application takes sorted results from SearchView.individuals and > displays them in order. > 2. Dynamic: Application has taken and displayed an initial set of results. > SearchView.individuals_changed_detailed is emitted with a SortedSet<Individual> > for added and another for removed. The application can either display those > Individuals without ordering; ignore them if it doesn’t want to upset the UI; > or query for them in SearchView.individuals to get their relative position, > using SortedSet.lower() and .higher() to get the next elements (which, for the > first and last elements in the signalled SortedSet have already been > displayed). That works. I do think the dynamic case's support for inserting the changes in sorted order isn't immediately obvious for anyone not familiar with SortedSet (I had to look up its API reference), so we should be careful to include an example in our own API reference. > So in summary my API proposal is: > public class SearchView { > public SortedSet<Individual> individuals; > > public signal void individuals_changed_detailed (SortedSet<Individual> added, > SortedSet<Individual> > removed); > } > > The change to individuals_changed_detailed does lose us the information about > mappings between old and new individuals, but I think that’s OK — it was > intended for updating editor views, which are not applicable for search > results. If the user wants that, they will have to match it up with > IndividualAggregator.individuals_changed_detailed. Hmm, that is a good point and can make the API a bit less convenient. In effect, every application would have to listen to IndividualAggregator.individuals_changed_detailed where they're using SearchView because, otherwise, any Individual the application presents in its UI could have been replaced by the time the user interacts with it. And we'd have to explain this clearly in the API docs.
(In reply to comment #46) > (In reply to comment #45) > > (In reply to comment #41) > > > (In reply to comment #38) > > > > • Added SearchView.refresh() for explicitly refreshing results. This is > > > > currently pointless because we listen to PersonaStore signals and dynamically > > > > update the result set. However, it will be useful for search-only PersonaStores > > > > (bug #660299), which do not emit change notifications. The only other > > > > interaction we need with such stores is to pass them a Query object when > > > > preparing the SearchView (and when SearchView.query is changed), which fits in > > > > fine with the current API. > How can you know if you're using it appropriately if it's a no-op? :) Because it’s documented. Do you think that’s enough? > Any application using SearchView is going to have to set up the signal handlers > either way (unless they can guarantee they're only using search-only backends, > and even then, I think that'd be a bad practice in case their backends ever > change) and they'll just use mostly the same logic after calling refresh(). Nope, they can wait for SearchView.is_quiescent then just read SearchView.individuals, without needing the other signals. > Is there any advantage to having refresh() be a public function rather than > just allowing applications to set the existing Query and refresh afterward (or > even having them create a new equivalent Query and setting that)? If we made > that change, applications could be check to only pass in the existing Query if > they don't want to do extra work which would be pretty obvious. I was imagining the use case of a ‘refresh’ button in the UI, where the user will explicitly want to refresh that instant. > > I think the existing API, with its dynamic updates to SearchView.individuals, > > satisfies both this and the one-shot use case. > > > > > 2. We need to incorporate match strength into the results. Do we actually need > > > > to expose a number for it? Or would it be OK to always sort > > > > SearchView.individuals by decreasing match strength? I suspect there are no > > > > useful situations where we need to have a measure of the relative match > > > > strength of two results. > That works. I do think the dynamic case's support for inserting the changes in > sorted order isn't immediately obvious for anyone not familiar with SortedSet > (I had to look up its API reference), so we should be careful to include an > example in our own API reference. Yeah, for sure. I’ll see if any better API designs come to mind when I’m implementing it. > > So in summary my API proposal is: > > public class SearchView { > > public SortedSet<Individual> individuals; > > > > public signal void individuals_changed_detailed (SortedSet<Individual> added, > > SortedSet<Individual> > > removed); > > } > > > > The change to individuals_changed_detailed does lose us the information about > > mappings between old and new individuals, but I think that’s OK — it was > > intended for updating editor views, which are not applicable for search > > results. If the user wants that, they will have to match it up with > > IndividualAggregator.individuals_changed_detailed. > > Hmm, that is a good point and can make the API a bit less convenient. In > effect, every application would have to listen to > IndividualAggregator.individuals_changed_detailed where they're using > SearchView because, otherwise, any Individual the application presents in its > UI could have been replaced by the time the user interacts with it. And we'd > have to explain this clearly in the API docs. What makes you think that? All individuals notified as being added or removed in IndividualAggregator.individuals_changed_detailed would still be notified in SearchView.individuals_changed_detailed, so no stale individuals could remain in the UI. The only information missing would be the associations between added and removed individuals — i.e. “individual A replaces individual B” rather than “individual A was added and coincidentally individual B was removed”.
Created attachment 295303 [details] [review] core: Implement base classes for searching
Created attachment 295304 [details] [review] inspect: Add search option to folks-inspect
Here are some updated patches: • New API implemented. • Match strength implemented and tested to give reasonable results (and in a reasonable order) with name matching. • folks-inspect search command tweaked to print one line per result, instead of the full individual’s details. You can get those by using the ‘individuals’ command. • Unit tests cleaned up a little. One thing I’ve spotted is that searching doesn’t ignore accents in the way you’d expect. We should probably port the clever tpaw-live-search.c code to SimpleQuery so we can do matching without accents. I don’t have time for that tonight though, and I wanted to get these patches out for a bit of review first.
(In reply to comment #47) > (In reply to comment #46) > > (In reply to comment #45) > > > (In reply to comment #41) > > > > (In reply to comment #38) > > > > > • Added SearchView.refresh() for explicitly refreshing results. This is > > > > > currently pointless because we listen to PersonaStore signals and dynamically > > > > > update the result set. However, it will be useful for search-only PersonaStores > > > > > (bug #660299), which do not emit change notifications. The only other > > > > > interaction we need with such stores is to pass them a Query object when > > > > > preparing the SearchView (and when SearchView.query is changed), which fits in > > > > > fine with the current API. > > How can you know if you're using it appropriately if it's a no-op? :) > > Because it’s documented. Do you think that’s enough? Sure > > Any application using SearchView is going to have to set up the signal handlers > > either way (unless they can guarantee they're only using search-only backends, > > and even then, I think that'd be a bad practice in case their backends ever > > change) and they'll just use mostly the same logic after calling refresh(). > > Nope, they can wait for SearchView.is_quiescent then just read > SearchView.individuals, without needing the other signals. Ah, good point. > > Is there any advantage to having refresh() be a public function rather than > > just allowing applications to set the existing Query and refresh afterward (or > > even having them create a new equivalent Query and setting that)? If we made > > that change, applications could be check to only pass in the existing Query if > > they don't want to do extra work which would be pretty obvious. > > I was imagining the use case of a ‘refresh’ button in the UI, where the user > will explicitly want to refresh that instant. That seems unnecessary for a UI. I don't think I've seen any UI that includes a "refresh results" button but it seems particularly like overkill when the application can automatically keep the results up-to-date anyhow. And, in the worst case (if the UI doesn't auto-update and the user expects different results upon re-query), the user can always submit the same query which I think is pretty self-evident. > > > I think the existing API, with its dynamic updates to SearchView.individuals, > > > satisfies both this and the one-shot use case. > > > > > > > 2. We need to incorporate match strength into the results. Do we actually need > > > > > to expose a number for it? Or would it be OK to always sort > > > > > SearchView.individuals by decreasing match strength? I suspect there are no > > > > > useful situations where we need to have a measure of the relative match > > > > > strength of two results. > > That works. I do think the dynamic case's support for inserting the changes in > > sorted order isn't immediately obvious for anyone not familiar with SortedSet > > (I had to look up its API reference), so we should be careful to include an > > example in our own API reference. > > Yeah, for sure. I’ll see if any better API designs come to mind when I’m > implementing it. > > > > So in summary my API proposal is: > > > public class SearchView { > > > public SortedSet<Individual> individuals; > > > > > > public signal void individuals_changed_detailed (SortedSet<Individual> added, > > > SortedSet<Individual> > > > removed); > > > } > > > > > > The change to individuals_changed_detailed does lose us the information about > > > mappings between old and new individuals, but I think that’s OK — it was > > > intended for updating editor views, which are not applicable for search > > > results. If the user wants that, they will have to match it up with > > > IndividualAggregator.individuals_changed_detailed. > > > > Hmm, that is a good point and can make the API a bit less convenient. In > > effect, every application would have to listen to > > IndividualAggregator.individuals_changed_detailed where they're using > > SearchView because, otherwise, any Individual the application presents in its > > UI could have been replaced by the time the user interacts with it. And we'd > > have to explain this clearly in the API docs. > > What makes you think that? All individuals notified as being added or removed > in IndividualAggregator.individuals_changed_detailed would still be notified in > SearchView.individuals_changed_detailed, so no stale individuals could remain > in the UI. The only information missing would be the associations between added > and removed individuals — i.e. “individual A replaces individual B” rather than > “individual A was added and coincidentally individual B was removed”. Ah, sorry, you're right. That's only the case if the Individual is being displayed in something like an editor, like you said. And, in that case, it's up to that UI to track updates if they want to, eg, avoid silently overwriting details upon save. So, that sounds fine to me then.
Created attachment 295965 [details] [review] core: Implement base classes for searching This bumps our GLib dependency to 2.40.0 for use of g_str_tokenize_and_fold() to perform the fuzzy search matching. This adds the following new API allowing for dynamic search views over an IndividualAggregator: • Query interface • SimpleQuery class • SearchView class It includes a fairly comprehensive test suite.
Created attachment 295966 [details] [review] inspect: Add search option to folks-inspect
The set of patches I just uploaded have the following changes: • Drop QueryStrictnessFlags. Xavier and I could not think of a sensible use case for the non-default flags, and they were massively complicating item 2… • Use g_str_tokenize_and_fold() to effortlessly support matching ASCII search queries against non-ASCII tokens from Individuals. I would like to spend some time cleaning up the tests, because I’ve turned them into a bit of a mess over the iterations of these patches. But apart from that, I consider the patch set done and in need of review. Will you have time in the next couple of days for that, Travis, or should I ask someone else? Thanks. :-) (In reply to comment #51) > > > Is there any advantage to having refresh() be a public function rather than > > > just allowing applications to set the existing Query and refresh afterward (or > > > even having them create a new equivalent Query and setting that)? If we made > > > that change, applications could be check to only pass in the existing Query if > > > they don't want to do extra work which would be pretty obvious. > > > > I was imagining the use case of a ‘refresh’ button in the UI, where the user > > will explicitly want to refresh that instant. > > That seems unnecessary for a UI. I don't think I've seen any UI that includes a > "refresh results" button but it seems particularly like overkill when the > application can automatically keep the results up-to-date anyhow. > > And, in the worst case (if the UI doesn't auto-update and the user expects > different results upon re-query), the user can always submit the same query > which I think is pretty self-evident. I’m not arguing this very well. What about the case where a new search-only store is added in the background, after search results have returned? I’m not sure we want the SearchView to automatically search it, as that would lead to the result set jumping around seemingly for no reason. If the UI thinks it’s a good idea, it can call the refresh() method to explicitly request the new store be queried. At the very least, asking the user’s code to do ‘search_view.query = search_view.query’ is yucky, and refresh() is more descriptive. It also allows for feedback on when the refresh finishes (when the async method returns).
(In reply to comment #54) > The set of patches I just uploaded have the following changes: > • Drop QueryStrictnessFlags. Xavier and I could not think of a sensible use > case for the non-default flags, and they were massively complicating item 2… > • Use g_str_tokenize_and_fold() to effortlessly support matching ASCII search > queries against non-ASCII tokens from Individuals. > > I would like to spend some time cleaning up the tests, because I’ve turned them > into a bit of a mess over the iterations of these patches. But apart from that, > I consider the patch set done and in need of review. Will you have time in the > next couple of days for that, Travis, or should I ask someone else? Thanks. :-) I'll try to but please don't block on me if you're in a hurry. > (In reply to comment #51) > > > > Is there any advantage to having refresh() be a public function rather than > > > > just allowing applications to set the existing Query and refresh afterward (or > > > > even having them create a new equivalent Query and setting that)? If we made > > > > that change, applications could be check to only pass in the existing Query if > > > > they don't want to do extra work which would be pretty obvious. > > > > > > I was imagining the use case of a ‘refresh’ button in the UI, where the user > > > will explicitly want to refresh that instant. > > > > That seems unnecessary for a UI. I don't think I've seen any UI that includes a > > "refresh results" button but it seems particularly like overkill when the > > application can automatically keep the results up-to-date anyhow. > > > > And, in the worst case (if the UI doesn't auto-update and the user expects > > different results upon re-query), the user can always submit the same query > > which I think is pretty self-evident. > > I’m not arguing this very well. What about the case where a new search-only > store is added in the background, after search results have returned? I’m not > sure we want the SearchView to automatically search it, as that would lead to > the result set jumping around seemingly for no reason. If the UI thinks it’s a > good idea, it can call the refresh() method to explicitly request the new store > be queried. So, it sounds like refresh() will mainly be to pull in results due to added sources. Maybe a different name could clarify that it's not required simply for new Individuals but only for additions of search-only PersonaStores and to re-query those PersonaStores for new members. Would requery() be sufficiently more-descriptive? > At the very least, asking the user’s code to do ‘search_view.query = > search_view.query’ is yucky, and refresh() is more descriptive. It also allows > for feedback on when the refresh finishes (when the async method returns). You're right that that expression is a little cryptic if you don't carefully check "query"'s set() function though I would expect a short comment to remind of the intended side effect in that case. But it seems worthwhile to avoid this so we can wait on the async function.
Created attachment 295999 [details] [review] core: Implement base classes for searching This bumps our GLib dependency to 2.40.0 for use of g_str_tokenize_and_fold() to perform the fuzzy search matching. This adds the following new API allowing for dynamic search views over an IndividualAggregator: • Query interface • SimpleQuery class • SearchView class It includes a fairly comprehensive test suite.
Created attachment 296000 [details] [review] inspect: Add search option to folks-inspect
Updated the unit tests to simplify a few things and give better coverage of the string tokenisation and folding code. I think this is entirely ready for review.
Review of attachment 295999 [details] [review]: With the caveat that I haven't build or smoke-tested this, here is my review: This looks pretty good at this point. Most of my comments are fairly minor. + /** + * Determines whether a given {@link Individual} matches this query. + * + * @param individual an {@link Individual} to match against + * @return a positive integer if the individual matches this query, or zero + * if they do not match; higher numbers indicate a better match + * @since UNRELEASED + */ + public abstract uint is_match (Individual individual); If we intend to not leak a given value for match strength, we might want to keep this private and make the public version return a bool instead. + private Query _query; + /** + * The {@link Query} that this view is based upon. + * + * If this {@link SearchView} has already been prepared, this will force a + * re-evaluation of all {@link Individual}s in the + * {@link IndividualAggregator}, which can be an expensive operation. s/this will force/setting this will force/ s/, which/ which/ + /** + * Emitted when one or more {@link Individual}s are added to or removed from + * the view. + * + * The structure of `changes` is the same as in + * {@link IndividualAggregator.individuals_changed_detailed}. + * + * @param added a set of {@link Individual}s added to the search view + * @param removed a set of {@link Individual}s removed from the search view + * + * @see IndividualAggregator.individuals_changed_detailed + * @since UNRELEASED + */ + public signal void individuals_changed_detailed (SortedSet<Individual> added, + SortedSet<Individual> removed); The reference to `changes` is outdated and needs to be replaced with `added` and `removed` + public async void unprepare () throws GLib.Error + { + if (!this._is_prepared || this._prepare_pending) + { + return; + } + + try + { + /* Release the aggregator. */ + yield this._aggregator.unprepare (); Are we sure we want to chain up to unprepare() (or, at least, recommend it unconditionally in the docstring)? It seems pretty likely that clients done with a SearchView won't necessarily want to unprepare the IndividualAggregator. + /* FIXME: This is a pretty big hack. Ideally, we would use a custom + * SortedSmallSet implementation, written in C and using a GPtrArray, + * instead of TreeSet and this GObject data hackery. + * + * However, since we’re dealing with small result sets, this is good + * enough for a first implementation. */ + var key = "folks-match-strength-%p".printf (this); Maybe we should factor out this special string as a function result since it's aliased elsewhere. It'd add a little function call overhead but maybe there's a decoration to hint inlining to the compiler? + /* Break match strength ties by display name. */ + var display_name = a.display_name.collate (b.display_name); + if (display_name != 0) + return display_name; + How would ties even happen unless objects are being compared to themselves (which I would expect any function using this to avoid)? + /* individuals_new added (ind_old == null) or replacing (!= null) + */ + if (individuals_new != null && individuals_new.size > 0) + { I'm not sure what this comment is saying since we just skip over this block in the first case. This is repeated again below it. + private void _individual_notify_cb (Object obj, ParamSpec ps) + { + var individual = obj as Individual; + if (individual != null) + { + var had_individual = + (individual != null && this._individuals.contains (individual)); Redundant "individual != null" check. + * This re-evaluates the query immediately, so most clients should implement + * de-bouncing to ensure re-evaluation only happens when (for example) the + * user has stopped typing a new query. Probably true currently though I don't see why the debouncing window couldn't be so small the user wouldn't feel an artificial slow-down (eg, 50 ms, though that number is arbitrary and maybe 100 ms would be fine). Maybe we should specifically suggest that a very narrow window is probably fine and experimentation is recommended? + * @param query_string text to match contacts against. Results will match all + * tokens within the whitespace-delimited string (logical-ANDing the tokens). + * A value of "" will match all contacts. However, it is recommended to not + * use a query at all if filtering is not required. + * @param match_fields the field names to apply this query to. See + * {@link Query.match_fields} for more details. An empty array will match all + * contacts. However, it is not recommended to use a query if filtering is not + * required. It might be better in each case to say "it is recommended to use the IndividualAggregator directly if filtering is not required", if that means what I think it does (and it probably does, since I think I wrote that). + foreach (var prop_name in this.match_fields) + { + /* FIXME: can't be var because of bgo#638208 */ + unowned ObjectClass iclass = individual.get_class (); This is apparently fixed in Vala. + /* FIXME: there's a lot of unnecessary work here just to get the key + * and value types due to bgo#663339 */ + var multi_map_keys = prop_value_multi_map.get_keys (); + var key_type = multi_map_keys.element_type; + var multi_map_values = prop_value_multi_map.get_values (); + var value_type = multi_map_values.element_type; This is supposed to be fixed in libgee.
Review of attachment 296000 [details] [review]: Looks good to merge after the first patch.
Created attachment 296265 [details] [review] fixup! core: Implement base classes for searching
Created attachment 296266 [details] [review] fixup! core: Implement base classes for searching
Created attachment 296267 [details] [review] fixup! core: Implement base classes for searching
Thanks for the review. I’ve addressed all the comments except the ones below in three fixup commits. I can provide a squashed version if you would find that easier to review. (In reply to comment #59) > + /** > + * Determines whether a given {@link Individual} matches this query. > + * > + * @param individual an {@link Individual} to match against > + * @return a positive integer if the individual matches this query, or zero > + * if they do not match; higher numbers indicate a better match > + * @since UNRELEASED > + */ > + public abstract uint is_match (Individual individual); > > If we intend to not leak a given value for match strength, we might want to > keep this private and make the public version return a bool instead. I don’t think it’s worth the complexity, and exposing the match strength here does give us a bit of an escape hatch if we want to expose it in future. (Also, there is actually no way to have a private member of an interface.) > + /* Break match strength ties by display name. */ > + var display_name = a.display_name.collate (b.display_name); > + if (display_name != 0) > + return display_name; > + > > How would ties even happen unless objects are being compared to themselves > (which I would expect any function using this to avoid)? > > > + /* individuals_new added (ind_old == null) or replacing (!= > null) > + */ > + if (individuals_new != null && individuals_new.size > 0) > + { > > I'm not sure what this comment is saying since we just skip over this block in > the first case. This is repeated again below it. I’ve refactored that whole block, since I noticed it was using MultiMap.get_keys() and .get_values() which are a performance nightmare. > + * This re-evaluates the query immediately, so most clients should implement > + * de-bouncing to ensure re-evaluation only happens when (for example) the > + * user has stopped typing a new query. > > Probably true currently though I don't see why the debouncing window couldn't > be so small the user wouldn't feel an artificial slow-down (eg, 50 ms, though > that number is arbitrary and maybe 100 ms would be fine). Maybe we should > specifically suggest that a very narrow window is probably fine and > experimentation is recommended? Not sure what you mean here? Are you referring to it suggesting that the query is only updated once the user has finished entering all of their search terms? If so, that’s not quite what I meant, but I could clarify the wording — “when (for example) the user has paused in their typing.”? > + foreach (var prop_name in this.match_fields) > + { > + /* FIXME: can't be var because of bgo#638208 */ > + unowned ObjectClass iclass = individual.get_class (); > > This is apparently fixed in Vala. Yes, though it now disallows use of ‘var’ there because it’s implicitly owned. We have to stick with ‘unowned ObjectClass’. I’ve removed the FIXME.
Review of attachment 296265 [details] [review]: Looks good. - /* FIXME: can't be var because of bgo#638208 */ unowned ObjectClass iclass = individual.get_class (); Any reason you cut the comment without taking advantage of Vala now supporting 'var' here?
Review of attachment 296266 [details] [review]: Looks good
Review of attachment 296267 [details] [review]: Looks good
(In reply to comment #64) > Thanks for the review. I’ve addressed all the comments except the ones below in > three fixup commits. I can provide a squashed version if you would find that > easier to review. > > (In reply to comment #59) > > + /** > > + * Determines whether a given {@link Individual} matches this query. > > + * > > + * @param individual an {@link Individual} to match against > > + * @return a positive integer if the individual matches this query, or zero > > + * if they do not match; higher numbers indicate a better match > > + * @since UNRELEASED > > + */ > > + public abstract uint is_match (Individual individual); > > > > If we intend to not leak a given value for match strength, we might want to > > keep this private and make the public version return a bool instead. > > I don’t think it’s worth the complexity, and exposing the match strength here > does give us a bit of an escape hatch if we want to expose it in future. > > (Also, there is actually no way to have a private member of an interface.) OK, no big deal. > > + /* individuals_new added (ind_old == null) or replacing (!= > > null) > > + */ > > + if (individuals_new != null && individuals_new.size > 0) > > + { > > > > I'm not sure what this comment is saying since we just skip over this block in > > the first case. This is repeated again below it. > > I’ve refactored that whole block, since I noticed it was using > MultiMap.get_keys() and .get_values() which are a performance nightmare. Great. > > + * This re-evaluates the query immediately, so most clients should implement > > + * de-bouncing to ensure re-evaluation only happens when (for example) the > > + * user has stopped typing a new query. > > > > Probably true currently though I don't see why the debouncing window couldn't > > be so small the user wouldn't feel an artificial slow-down (eg, 50 ms, though > > that number is arbitrary and maybe 100 ms would be fine). Maybe we should > > specifically suggest that a very narrow window is probably fine and > > experimentation is recommended? > > Not sure what you mean here? Are you referring to it suggesting that the query > is only updated once the user has finished entering all of their search terms? > If so, that’s not quite what I meant, but I could clarify the wording — “when > (for example) the user has paused in their typing.”? Sorry, I was tired when I did this review. Yeah, the main thing is to encourage debouncing to the point the user has paused typing with the intention of seeing results updated. With a little searching (heh), it looks like a reasonable range for this single timeout value is about 500 to 1000 ms: http://ux.stackexchange.com/questions/38543/amount-of-time-to-determine-a-user-has-stopped-typing But an even better approach seems to be one that uses two different timeouts: http://ux.stackexchange.com/a/34363 It might be best to just leave application developers to their own devices and just give them the high-level summary (de-bounce as long as the user is actively typing). Though, given our context, it may just be a best practice to not debounce at all (or at least stop once they hit two characters) since some real names or nicknames can be 2-3 characters (eg, "Jo", "Bob") and we should be able to return all matching local contacts (from a pool on the order of thousands) nearly instantly on any system Folks would run on. My Android KitKat phone's Contacts app appears to not debounce at all. The instant I type the first character, I have results (searching all fields). > > + foreach (var prop_name in this.match_fields) > > + { > > + /* FIXME: can't be var because of bgo#638208 */ > > + unowned ObjectClass iclass = individual.get_class (); > > > > This is apparently fixed in Vala. > > Yes, though it now disallows use of ‘var’ there because it’s implicitly owned. > We have to stick with ‘unowned ObjectClass’. I’ve removed the FIXME. Ah, please ignore my comment on the first of these three patches.
Comments from Erick by e-mail, with my replies inline: + SearchView::individuals_changed_detailed signal: On the > documentation it says that I will be able to insert the added results > its proper place on a partially added list of results. Am I getting > this right? Will I be able of that. Will it use the same rating scale > of the previous emissions of the signal Yup, that’s correct. It will use the same match strength scale, but that scale is not part of the API, so you should only use it for pairwise comparisons. The scale does not change over the process’ lifetime. I’ll expand the documentation in a fixup commit shortly. > + SearchView::completed I would like a signal with something like > this. To know the matching completed. I know I needed to hack > something like this for a shell-search provider, and right now I don't > find a proper reason for it's use in contacts, but that can be helpful > in some way. Depends whether you want ‘live’ or ‘snapshot’ search results — whether they will continue to update over a long period of time as persona stores go online and offline or individuals are edited. For a shell search provider, you want snapshot results. For a search in gnome-contacts you probably want live results. In both cases, SearchView.individuals is guaranteed to be correct after SearchView.prepare() finishes. So your ‘complete signal’ is the async return from SearchView.prepare(). For live results, you then need to continue listening to the SearchView.individuals_changed_detailed signal. Does that make sense and fit in with how you want to use it?
Created attachment 296481 [details] [review] fixup! core: Implement base classes for searching
(In reply to Philip Withnall from comment #70) > Created attachment 296481 [details] [review] [review] > fixup! core: Implement base classes for searching Unless anybody has any further comments I plan to merge this tomorrow and do a release tomorrow or early next week.
(In reply to Philip Withnall from comment #71) > (In reply to Philip Withnall from comment #70) > > Created attachment 296481 [details] [review] [review] [review] > > fixup! core: Implement base classes for searching > > Unless anybody has any further comments I plan to merge this tomorrow and do > a release tomorrow or early next week. Sounds good to me. Thanks for all your hard work!
Pushed. 0.11.0 will be released shortly. Attachment 295999 [details] pushed as 8a07a8c - core: Implement base classes for searching Attachment 296000 [details] pushed as b4c3375 - inspect: Add search option to folks-inspect
Attachment 295999 [details] causes compilation error when using clang: tests/dummy/search-view.vala:63:373: error: hex escape sequence out of range ...{"Pa\xf1", "persona3"}, {"Pa\x6e\x303", "persona3"}, ... ^~~~~
Created attachment 296994 [details] [review] tests: Fix UTF-8 encoding of the letter énye in test vectors
Whoops, fixed. Thanks for the notice. Attachment 296994 [details] pushed as 8e7b403 - tests: Fix UTF-8 encoding of the letter énye in test vectors