GNOME Bugzilla – Bug 749938
Can't combine search specifiers
Last modified: 2016-04-27 05:14:23 UTC
Created attachment 304045 [details] relevant debug messages, combining search specifiers For example, the queries 'hello from:matthew' or 'to:matthew from:matthew' do not work correctly. I have attached the relevant debug messages, the sqlite error is: 'unable to use function MATCH in the requested context.' I don't think having multiple MATCH operators like that using AND is valid http://sqlfiddle.com/#!7/074ac/9
I can reproduce this. (Rather easily too -- I'm surprised I hadn't run across this already.) Jim, do you know how to fix this, or should I start boning up on SQL?
Created attachment 304298 [details] [review] fix to allow combining search specifiers Here is a fix attempt, please have a look. Basically it uses this query syntax, the ":" character to specify columns http://www.sqlite.org/fts3.html#section_3 . I am not sure if this approach is compatible with how ":" characters in queries are currently handled but it behaves fine for me.
Comment on attachment 304298 [details] [review] fix to allow combining search specifiers diff --git a/src/engine/imap-db/imap-db-account.vala b/src/engine/imap-db/imap-db-account.vala index 9e29cfa..efe8ff2 100644 --- a/src/engine/imap-db/imap-db-account.vala +++ b/src/engine/imap-db/imap-db-account.vala @@ -959,6 +959,7 @@ private class Geary.ImapDB.Account : BaseObject { Gee.List<SearchTerm>? terms = query.get_search_terms(field); if (terms == null || terms.size == 0) continue; + field = field ?? ""; // Each SearchTerm is an AND but the SQL text within in are OR ... this allows for // each user term to be AND but the variants of each term are or. So, if terms are @@ -977,43 +978,45 @@ private class Geary.ImapDB.Account : BaseObject { // // party* OR parti* eventful* OR event* StringBuilder builder = new StringBuilder(); + bool is_first_exact = true; foreach (SearchTerm term in terms) { if (term.sql.size == 0) continue; if (term.is_exact) { + if (is_first_exact) + builder.append_printf("%s:", field); builder.append_printf("%s ", term.parsed); + is_first_exact = false; } else { bool is_first_sql = true; foreach (string sql in term.sql) { if (!is_first_sql) builder.append(" OR "); - builder.append_printf("%s ", sql); + builder.append_printf("%s:%s ", field, sql); is_first_sql = false; } + is_first_exact = true; } } - phrases.set(field ?? "MessageSearchTable", builder.str); + phrases.set(field ?? "", builder.str); } return phrases; } - private void sql_add_query_phrases(StringBuilder sql, Gee.HashMap<string, string> query_phrases) { - foreach (string field in query_phrases.keys) - sql.append(" AND %s MATCH ?".printf(field)); - } - private int sql_bind_query_phrases(Db.Statement stmt, int start_index, Gee.HashMap<string, string> query_phrases) throws Geary.DatabaseError { int i = start_index; + StringBuilder query_expression = new StringBuilder(); // This relies on the keys being returned in the same order every time // from the same map. It might not be guaranteed, but I feel pretty // confident it'll work unless you change the map in between. foreach (string field in query_phrases.keys) - stmt.bind_string(i++, query_phrases.get(field)); + query_expression.append_printf("%s ", query_phrases.get(field)); + stmt.bind_string(i++, query_expression.str); return i - start_index; } @@ -1105,10 +1108,10 @@ private class Geary.ImapDB.Account : BaseObject { WHERE id IN ( SELECT docid FROM MessageSearchTable - WHERE 1=1 + WHERE MessageSearchTable + MATCH ? + ) """); - sql_add_query_phrases(sql, query_phrases); - sql.append(")"); if (blacklisted_ids_sql != "") sql.append(" AND id NOT IN (%s)".printf(blacklisted_ids_sql)); @@ -1809,8 +1812,11 @@ private class Geary.ImapDB.Account : BaseObject { WHERE docid IN ( """); sql_append_ids(sql, id_map.keys); - sql.append(")"); - sql_add_query_phrases(sql, query_phrases); + sql.append(""" + ) + AND MessageSearchTable + MATCH ? + """); Db.Statement stmt = cx.prepare(sql.str); sql_bind_query_phrases(stmt, 0, query_phrases);
Created attachment 304412 [details] [review] fix to allow combining search specifiers Sorry, first time using bugzilla, I didn't manage to add my comment or attach the new patch... There are minor issues, e.g. some comments are no longer relevant and get_query_phrases doesn't need to return a HashMap, but it should actually work now.
Comment on attachment 304412 [details] [review] fix to allow combining search specifiers Well, I spoke too soon again. Unfortunately the fix isn't as simple as I'd hoped, it doesn't work for searching columns for phrases, e.g. 'body:"hello there"'.
Sorry I haven't been able to look at your patch yet. For now, I'll let you continue your work, but if you'd like me to look at a WIP, just say so. Thanks!
Created attachment 305024 [details] [review] fix to allow combine search specifiers No problem, they haven't worked fully... Can you look at this one?
In a little bit of testing, it seems to work. I'm going to run this for a few days to make sure there are no surprises. I think this partially invalidates the comment in the get_query_phrases about how the search system works? Would you be willing to update that text?
The comment is still correct, it's referring to the query syntax for each individual MATCH, not the way different MATCHes are combined like I initially thought.
Hey Matthew, I'm keen to land this patch, and will have a look at it later in the week. Are you still happy with them as-is? Also, if you have them in git, getting them formatted with `git format-patch` would be great. Thanks!
Created attachment 326466 [details] [review] fix to allow combine search specifiers, git format-patch Hi Michael. Yes I am still happy with it.
Created attachment 326614 [details] database disk image is malformed error Hey Matthew, I have been running this patch for a while and seems to work fine, in that queries like "from:me hello" do indeed work. Thanks! However looking at the debug log, a different error in a similar context as the ones you reported in the attachment above is being logged - debug output is attached. This particular output occurred running the search "from:me hello", where "hello" was actually found in message. I'm pretty sure that contrary to the message the DB isn't corrupted, since I can reproduce this with either of two newly added accounts - unless Geary or the FTS implementation is doing something wrong. Also, it doesn't seem affect the search results - rather just the highlighting. So, do you also see this message with this patch applied when running a similar search? Since it affects just the highlighting, is there an additional query for that also needs to be fixed that the patch misses?
Ah, nevermind, I was able to reproduce the issue in c:12 on master (see Bug 765515).
Review of attachment 326466 [details] [review]: So, finally to review the actual patch now... :) ::: src/engine/imap-db/imap-db-account.vala @@ +1009,3 @@ + sql.append_printf(""" + %s + SELECT %s Here, a series of additional SELECT sub-statements are generated for each additonal search term after the first. What's the reason why you took that approach, rather than using the boolean AND/OR/NOT connectives within the MATCH argument string itself, per the FTS docs in section 3.1 or 3.2? <https://www.sqlite.org/fts3.html#section_3_1>? It strikes me that the query might more more efficient that way. E.g. Doing: > "MATCH 'term1' AND 'term2'" Instead of: > "MATCH 'term1' AND SELECT ... MATCH 'term2'" I note the enhanced version (§3.1) is a compile-time dependency, but I wonder if we can get away with the standard version (§3.2) fine - use AND in search_async and no operator (an implicit OR) in do_get_search_matches should work? Even if we have to use the enhanced version both Ubuntu and Fedora have have already enabled it. Arch and other distros that haven't would need a bug filed against them, but I'd be happy to do that.
Yeah, it does look excessive, having to INTERSECT the (almost identical) SELECTs. It takes advantage of that "MATCH x AND y" approach when searching within one column but when it comes to searching different columns (using search specifiers) there are no examples shown. Basically I want to combine WHERE x MATCH y with WHERE a MATCH b. In the original code it tries to do 'WHERE field1 MATCH ? AND field2 MATCH ?' which unfortunately as I discovered doesn't work.
Review of attachment 326466 [details] [review]: Ah, I see now that the standard connectives are being used in ::get_query_phrases, which I guess means the additional selects are possibly the best we can do for the moment. It doesn't seem to be a performance issue on my accounts with 50k+ messages, but that's on a SSD drive. Let's get this committed and see how it tests out for everyone else.
Adam: Please go ahead and commit this when you have a moment.
I've pushed this to master. Mike, I'll leave it to you to close this bug if appropriate.
Yep, this is good to resolve as fixed. Thanks all!