GNOME Bugzilla – Bug 570312
Cannot search strings that contain SQLite wildcards
Last modified: 2009-03-08 23:11:47 UTC
Please describe the problem: As explained here http://www.sqlite.org/lang_expr.html , we need a "ESCAPE" clause for this wildcards: %, _, \, ... Steps to reproduce: 1. Have an album called "1_2_3". 2. Create a smartplaylist with the rule Album -> Contains -> "_". Actual results: The song is not included in the smartplaylist. Expected results: The song should be included. Does this happen every time? Yes. Other information:
Created attachment 127810 [details] [review] proposed patch This fixes the problem.
Change SQLiteEscapeClause to a "private const string ESCAPE_CLAUSE, and rename SQLiteEscape to EscapeLike and make it static.
(In reply to comment #2) > Change SQLiteEscapeClause to a "private const string ESCAPE_CLAUSE, and rename > SQLiteEscape to EscapeLike and make it static. > Thanks for the review Gabriel! I agree it shouldn't be a property, but on my tests it wasn't working as a field, maybe because I should place it *before* the other readonly fields? I'll test.
Created attachment 127907 [details] [review] fix v2 Changed according to Gab's comments.
Your patch breaks smart playlists with an "Album is <text>" rule, where <text> contains an escaped character. I don't know if properly escaping the text in ``StringQueryValue`` is even possible, since it depends on the type of query and not merely the data type. One solution that occurs to me, though it might not work, is to use a custom SQLite function for escaping strings. For example: "contains"), "LIKE HYENA_ESCAPE_LIKE('%',{0},'%')" + ESCAPE_CLAUSE where: HYENA_ESCAPE_LIKE(args[]) -> args[0] + EscapeLike (args[1]) + args[2] This would let escaping depend on the type of query being performed, but might degrade performance. To create a custom SQLite function: using Mono.Data.Sqlite; [SqliteFunction (Name = "EXAMPLE", FuncType = FunctionType.Scalar, Arguments = 1)] internal class ExampleFunction : SqliteFunction { public override object Invoke (object[] args) { return "example: " + args[0]; } } ------- Also, using this method, escaping single-quotes should be moved back into `ToSql()`.
Thanks for your feedback. (In reply to comment #5) > Your patch breaks smart playlists with an "Album is <text>" rule, where <text> > contains an escaped character. Are you sure about that? Indeed, what I was testing to verify this bug, is smart playlists. Actually, the original report contains a typo (I should have used the character "%" in the example, not the "_" because the latter means any character of length zero), but the scenario you mention is precisely the one that *is not* working and that this patch tends to fix. Test again please.
s/zero/one/
Oh, wait, I see what you mean, I haven't tested the operator [IS], but only [CONTAINS]. I'll test again...
Created attachment 128249 [details] [review] Escape SQLite wildcard characters like LIKE queries Using custom functions
The point of these patches is so when peoples' search includes a % (etc) we treat it as a literal character (by escaping it) so that SQLITE doesn't treat it as a non-literal, right? Quoting John: "I don't know if properly escaping the text in ``StringQueryValue`` is even possible, since it depends on the type of query and not merely the data type." Why is this true? StringQueryValues should always search for the literal value of the string entered (possibly escaped/normalized as per the other bug) - you're thinking maybe we do want to support the SQL/special treatment of %_ etc?
> The point of these patches is so when peoples' search includes a % (etc) we > treat it as a literal character (by escaping it) so that SQLITE doesn't treat > it as a non-literal, right? Exactly > Why is this true? StringQueryValues should always search for the literal value > of the string entered (possibly escaped/normalized as per the other bug) - > you're thinking maybe we do want to support the SQL/special treatment of %_ > etc? The problem is that % and _ can *only* be escaped in LIKE queries -- in other queries, escaping them changes the value of the literal string being searched and causes incorrect results. Since the query value has no knowledge of what kind of query it's being used in, it can't know how to escape its value. The escaping has to be either: * pushed up to a layer of the code that knows what kind of query is being executed, or * pushed down to escape only in LIKE queries (my patch).
Ah, ok, thanks for explaining. I don't like the idea of pushing it to the custom function - I guess b/c I worry it'll be calculated more than once, though that may be mistaken. Could you verify that it's only run once per query (and not once per row)?
Created attachment 128252 [details] [review] Escape SQLite wildcard characters like LIKE queries (v2) > I don't like the idea of pushing it to the custom function - I guess b/c I > worry it'll be calculated more than once, though that may be mistaken. Could > you verify that it's only run once per query (and not once per row)? Oh dear, it *is* being run per-row. I had assumed constant functions would be optimized to run only once. Here's a patch that uses the other method, checking the query operator for the presence of "LIKE" and escaping the query value if needed. I don't know if this is a good idea; is QueryField supposed to handle escaping like this?
I like the discussions going on here. (In reply to comment #13) > Created an attachment (id=128252) [edit] > Escape SQLite wildcard characters like LIKE queries (v2) > > > I don't like the idea of pushing it to the custom function - I guess b/c I > > worry it'll be calculated more than once, though that may be mistaken. Could > > you verify that it's only run once per query (and not once per row)? > > Oh dear, it *is* being run per-row. I had assumed constant functions would be > optimized to run only once. > > Here's a patch that uses the other method, checking the query operator for the > presence of "LIKE" and escaping the query value if needed. I don't know if this > is a good idea; is QueryField supposed to handle escaping like this? I like the patch because it's simple, but it's a bit hacky, because in this case you can guess which is the operator by simply scanning the text that it provider. However, IMO what happens here really is that the architecture on how Operators and Values are handled, is not ideal. Because I'm thinking on another case (covered in the bug which is blocked by this one) in which this problem happens again, and I'm not sure the same technique as the one this patch excercises will be enough. What I'm trying to say is that I was thinking about doing another patch that changes a bit the architecture on how Operators and Values are combined, in order to design properly how Values should be transformed in respect to Ops. Does that sound good?
s/provider/provides
Created attachment 128432 [details] [review] slighty better patch? This patch uses the approach described on comment#14. Let me know what you think.
Created attachment 128441 [details] [review] smaller patch this patch is smaller than the last one
Thanks Andrés, I like the changing of ToSql to accept the op arg, committed.
Thanks for committing Gabriel. However, I believe this is broken again. It has conflicted with some recent changes from John Milikin (specifically, Hyena.StringUtil.SearchKey seems to be removing the wildcards we are trying to escape). I will add some unit tests that demonstrate the issue.
Created attachment 129035 [details] [review] unit tests This patch updates a test to pass with the changes from the last commit, and adds a test that should pass when we fix the last issue.
Created attachment 129118 [details] [review] proposed fix What was happenning is that JMilikin's stripping function was too agressive with the punctuation. We can discuss this but IMHO there are a lot of things that may be affected by this (for example, as my unit tests include: artists like "Kelli O'Hara", compilation albums like "100% Techno") or a ton of other things (it may be weird to use this kind of punctuation in a quick query, but not in a smart playlist). So this patch fixes it and adds and removes the corresponding unit tests. Please review.
Created attachment 129121 [details] [review] fix2 v2 Better fix, includes escaping in ExactStringQueryValue.
Created attachment 129123 [details] [review] alternative fix This is an alternative fix if you don't agree with comment#21. So either this patch or the one in comment#22 should be committed, depending on your opinion.
I don't understand why the alternative fix is needed if I didn't agree we shouldn't strip punctuation. Anyway, let's do discuss: should we strip punctuation? Your "100% Techno" will still match things, it'll just match "100 techno" in addition to "100% techno" (and "1.0.0.% tech.-no" for that matter).
Gabriel, thanks for your comment. The only (In reply to comment #24) > I don't understand why the alternative fix is needed if I didn't agree we > shouldn't strip punctuation. Because when I created the patch that was committed for this bug, the "ExactStringQueryValue" didn't exist. Now that it exists, that type is ignoring the type of escape needed to match exact strings. This doesn't have any specific use case where I can think it's breaking anything, *but* it increases correctness (it's like replacing ' with '' after SearchKey() as it is in the current codebase; useless right now but more correct for the future) and is *indeed* needed for the next version of the patch I'm cooking for the bug that depends on this one: bug 564355 (because, if you see the old patch posted there, UriQueryValue inherits from ExactStringQueryValue). > Anyway, let's do discuss: should we strip punctuation? Your "100% Techno" will > still match things, it'll just match "100 techno" in addition to "100% techno" > (and "1.0.0.% tech.-no" for that matter). Yep, I see what you mean. Well, the only case in which I think this kind of stripping would be useful is if, let's say someone wants to search for "O'Hara" but writes "O`hara" instead. In this case it's a feature, but it may be seen as a bug if someone is writing a quick query and inserts "O'" and sees that every string that contains an 'o' is included, whereas the results should be much more limited. John, what do you think about this?
Created attachment 129179 [details] [review] Escape LIKE in `ExactStringQueryValue.ToSql()` The lack of escaping in `ExactStringQueryValue` is indeed a bug; it will cause problems when people attempt to search for filenames containing % or _. Attached is a patch to fix this. It moves all escaping logic into a public static method, `EscapeString`. > Yep, I see what you mean. Well, the only case in which I think this kind of > stripping would be useful is if, let's say someone wants to search for > "O'Hara" but writes "O`hara" instead. In this case it's a feature, but it may > be seen as a bug if someone is writing a quick query and inserts "O'" and > sees that every string that contains an 'o' is included, whereas the results > should be much more limited. I believe the reason behind requesting stripping of punctuation is so that a search for "ohara" will also match "o'hara", along with "o! hara!" or whatever odd stylistic characters artists insert into their names/titles.
(In reply to comment #26) > Created an attachment (id=129179) [edit] > Escape LIKE in `ExactStringQueryValue.ToSql()` > > The lack of escaping in `ExactStringQueryValue` is indeed a bug; it will cause > problems when people attempt to search for filenames containing % or _. > Attached is a patch to fix this. It moves all escaping logic into a public > static method, `EscapeString`. This is what I intended to fix in my latest patch (which includes unit tests, and fixes one). Only difference is that your EscapeString is public and mine protected...
I think it's better to strip punctuation. IMO, it's better to show more results than not enough. Example : Bob searches for "100%", but his techno album is named "100 % Techno" (with a space before the %). He will be so happy that banshee still finds it that it will forgive it for also finding his "Top 100 Love Songs" album. If I'm not confused, that means that the patch in comment #23 would get committed, because unit tests are good. ;)
Actually we don't strip spaces so that use case fails.
Sorry, I'm wrong. Doh.
(In reply to comment #30) > Sorry, I'm wrong. Doh. Heh, I was confused about that as well, but then realized that maybe we do strip (trim) multiple spaces into just one, so that's why that search may work. I'll update the patch with more unit tests that contain this situation.
We don't combine multiple spaces into one, so it won't work if you search for "100 % techno" with quotations around it, but it will if you leave the quotations off (b/c then 100 and techno are searched for in separate, ANDed conditions). Please don't update anything right now, I'm in the middle of applying/committing the patches.
Thanks guys, I combined your patches and committed them.
(In reply to comment #32) > We don't combine multiple spaces into one, so it won't work if you search for > "100 % techno" with quotations around it, but it will if you leave the > quotations off (b/c then 100 and techno are searched for in separate, ANDed > conditions). Nice to know, but wouldn't then be good to trim multiple spaces into one? I'll try to make a patch for that. > Please don't update anything right now, I'm in the middle of > applying/committing the patches. (In reply to comment #33) > Thanks guys, I combined your patches and committed them. Thanks for committing Gabriel. However, without wanting to make this discussion longer: if Bertrand's argument didn't apply because of the space issue, shouldn't we not discard again the idea about not stripping punctuation? Indeed, I even had an argument against Bertrand's point (which indeed is really good), so keep reading if you care: (In reply to comment #28) > I think it's better to strip punctuation. IMO, it's better to show more results > than not enough. > Example : Bob searches for "100%", but his techno album is named "100 % Techno" > (with a space before the %). He will be so happy that banshee still finds it > that it will forgive it for also finding his "Top 100 Love Songs" album. But this circumstances seem to fit more in a situation in which Bob is doing a quick search, not a SmartPlayList for instance. Indeed, there is a problem if Bob has this kind of albums: * Top 100 Love Songs * 100% Techno * 100% Techno Vol.2 * 100% Techno Vol.3 So let's say Bob stablishes the SmartPlayList to look for "100%": he will freak out and file a bug, and then we will have to close the bug as FEATURE, and he will be a bit disappointed about it. However, let's say we rectify and we strip punctuation, and Bob has the "inaccurate" album name "100 % Techno" (or whatever example that really works): he would file a bug saying that his search for "100%" didn't return any results, and then we would close it as INVALID, pointing out to him his spelling error on the album. So Bob would say "Doh!" and be happy to know that it was his fault and not Banshee, and would correct the album name. Sorry for the long post. I don't care much about this little detail, but I thought I should mention it. Now I will get back to work (on the bug that depended on this ;) ).
(In reply to comment #34) > (In reply to comment #32) > > We don't combine multiple spaces into one, so it won't work if you search for > > "100 % techno" with quotations around it, but it will if you leave the > > quotations off (b/c then 100 and techno are searched for in separate, ANDed > > conditions). > > Nice to know, but wouldn't then be good to trim multiple spaces into one? I'll > try to make a patch for that. FYI: I opened this as bug 573484 and there's a patch there pending for review.