GNOME Bugzilla – Bug 672690
Document searching very slow / useless in the shell
Last modified: 2013-04-19 15:34:23 UTC
The new document search provider for GNOME Shell is very slow, making it more or less useless for any real-life usage. On my reasonably fast (Core i7) machine with a good internet connection, it takes ~10 seconds until the first search result (from Documents) pops up. Disabling online documents makes it a little bit faster, but not enough to make it actually usable. This is with GNOME / GNOME Documents / GNOME Shell 3.3.92.
Even though you're not the first one to report this problem, the previous reports I had were still on the order of 1 second, not 10 seconds. Do you have a very large number of documents on your machine? Owen actually tried to track down the performance hit yesterday, and it looks like most of the time is spent inside tracker, where our query hits a slow path in the DB code. Needs more investigation.
Basically the problem here is that a query like:

  SELECT ?urn WHERE { ?urn a rdfs:Resource FILTER (fn:contains (fn:lower-case (tracker:coalesce(nie:title(?urn), nfo:fileName(?urn))), "tod")) }

devolves to loading the title and filename for each document and checking whether it matches the filter expression. So if you have a large number of documents, performance is going to get slow.

What helped a lot in my testing was to add a full-text search to narrow the set of documents before doing a more exact search, using:

  SELECT ?urn WHERE { ?urn a rdfs:Resource ; fts:match "tod*" FILTER (fn:contains (fn:lower-case (tracker:coalesce(nie:title(?urn), nfo:fileName(?urn))), "tod")) }

instead. For the full query:

  SELECT DISTINCT ?urn nie:url(?urn) nfo:fileName(?urn) nie:mimeType(?urn) nie:title(?urn)
    tracker:coalesce(nco:fullname(?creator), nco:fullname(?publisher), '')
    tracker:coalesce(nfo:fileLastModified(?urn), nie:contentLastModified(?urn)) AS ?mtime
    nao:identifier(?urn) rdf:type(?urn) nie:dataSource(?urn)
    ( EXISTS { ?urn nao:hasTag nao:predefined-tag-favorite } )
    ( EXISTS { ?urn nco:contributor ?contributor FILTER ( ?contributor != ?creator ) } )
  WHERE {
    ?urn a rdfs:Resource
    OPTIONAL { ?urn nco:creator ?creator . }
    OPTIONAL { ?urn nco:publisher ?publisher . }
    FILTER (( (fn:contains (fn:lower-case (tracker:coalesce(nie:title(?urn), nfo:fileName(?urn))), "tod") || fn:contains (fn:lower-case (tracker:coalesce(nco:fullname(?creator), nco:fullname(?publisher))), "tod")))
      && (((fn:starts-with (nie:url(?urn), "file:///home/otaylor/Desktop")) || (fn:starts-with (nie:url(?urn), "file:///home/otaylor/Documents")) || (fn:starts-with (nie:url(?urn), "file:///home/otaylor/Downloads")) || (fn:starts-with (nao:identifier(?urn), "gd:collection:local:"))) || (nie:dataSource(?urn) = "gd:goa-account:account_1311966202"))
      && ((true) || ((nie:dataSource(?urn) = "gd:goa-account:account_1311966202") || false))
      && (((fn:contains(rdf:type(?urn), "nfo#DataContainer")) && (fn:starts-with(nao:identifier(?urn), "gd:collection"))) || fn:contains(nie:mimeType(?urn), "application/pdf") || fn:contains(rdf:type(?urn), "nfo#Presentation") || fn:contains(rdf:type(?urn), "nfo#Spreadsheet") || fn:contains(rdf:type(?urn), "nfo#PaginatedTextDocument")))
  }
  ORDER BY DESC (?mtime) LIMIT 50 OFFSET 0

this change took the time down from ~490ms to ~60ms. (I'm unsure about the data size reduction: there are about 32000 rdf:Resources in my tracker database, but it's possible that some of them are filtered out by other terms in the above, so that the linear-search part of the original query ran across a smaller set than that. The number of documents matching fts:match "tod*" is 6.)

The downsides of this approach are:

* tracker:fts can only do prefix matches, so we change from matching anywhere in the string to matching only at the beginning of words. But to me, this is likely an improvement.

* While the set of properties indexed by FTS is large (tracker-sparql -q 'SELECT ?p { ?p tracker:fulltextIndexed true }'), this set only contains direct text properties, and not indirect properties like nco:creator or nco:publisher. To handle this, you'd do something like include contact objects in the initial search, and then do a second fast search to look for documents with that creator or publisher.

[I do think substring matching on the creator or publisher is a bit low value, because it's going to be very indifferently set, and when it is set, it should be prioritized below a title/filename search. ("Notes for Cosimo" is a better search result for "cosimo" than some document that Cosimo happened to have originally created.)]

(Thanks to Jürg Billeter for discussion on IRC)
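To illustrate the trade-off described in this comment, here is a minimal Python sketch of the idea, not of how tracker implements it. It contrasts a linear substring scan with a lookup in a word-prefix index followed by the exact filter. The document titles, the function names, and the in-memory dictionary index are all assumptions made up for the illustration.

```python
from collections import defaultdict

def linear_substring_search(titles, needle):
    """Check every title, like the original FILTER-only query."""
    needle = needle.lower()
    return {doc_id for doc_id, title in titles.items()
            if needle in title.lower()}

def build_prefix_index(titles):
    """Map every prefix of every word to the documents containing that word.

    Hypothetical stand-in for a full-text index; a real one is far more compact.
    """
    index = defaultdict(set)
    for doc_id, title in titles.items():
        for word in title.lower().split():
            for end in range(1, len(word) + 1):
                index[word[:end]].add(doc_id)
    return index

def narrowed_search(titles, index, needle):
    """Narrow with the prefix index first, then apply the exact filter."""
    needle = needle.lower()
    candidates = index.get(needle, set())
    return {doc_id for doc_id in candidates
            if needle in titles[doc_id].lower()}

# Made-up sample data.
titles = {
    1: "TODO list",
    2: "Custody agreement",
    3: "Notes for today",
    4: "Budget",
}
index = build_prefix_index(titles)

substring_hits = linear_substring_search(titles, "tod")
prefix_hits = narrowed_search(titles, index, "tod")
print(sorted(substring_hits), sorted(prefix_hits))
```

In this sample, "Custody agreement" contains "tod" only in the middle of a word, so the substring scan returns it and the prefix-narrowed search does not. That is the behaviour change named in the first downside above.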
Created attachment 210664
FTS: proof of concept patch

This shows the speedups possible, for both the shell and the builtin search. (Andreas: if you are able to test this, it would be interesting to know whether it makes things fast for your situation.)

Needs here:
- Better attention to quoting
- Perhaps avoiding doing fts searches for 't*' when you type the first letter in the shell; maybe don't search unless there are at least 2 or 3 letters.
- Some resolution of the creator/publisher situation: either just say we don't want to search that, or take the approach I outlined in the other comment.
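The first two items in the list above could look roughly like the following Python sketch. It is not taken from the attached patch. The helper names and the default minimum length of 3 are assumptions. It escapes only backslashes and double quotes for the SPARQL string literal, treats the input as a single term, and makes no attempt to handle characters that are special to the full-text query syntax.

```python
def escape_sparql_string(text):
    """Escape backslashes and double quotes for a double-quoted SPARQL literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def build_fts_clause(user_input, min_length=3):
    """Return an fts:match clause, or None if the input is too short to search.

    Hypothetical helper: min_length=3 is an assumed threshold, and the input
    is treated as one term rather than split into words.
    """
    term = user_input.strip().lower()
    if len(term) < min_length:
        return None
    return 'fts:match "%s*"' % escape_sparql_string(term)

print(build_fts_clause("t"))
print(build_fts_clause("Tod"))
print(build_fts_clause('to"d'))
```

A caller would skip the query entirely when the helper returns None, which avoids the expensive 't*' search while the user is still typing.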
The patch makes quite a significant difference. In some cases I get a nearly instant response now, so that's great. With others there is still a 2-3 second delay until the results are displayed. I really couldn't figure out a pattern here, other than that (as expected) searching for one or two letters is slow most of the time. Also, the delay is much more visible in GNOME Shell than in GNOME Documents itself.

Another thing I noticed is that gjs-console sometimes stays around with pretty high CPU usage for a few seconds even after the search results are already displayed. I remember also seeing this behaviour directly after login: gjs-console pegging my CPU for a few seconds, at least long enough to be noticeable (but I guess that might be another bug).

(In reply to comment #3)
> Created an attachment (id=210664)
> FTS: proof of concept patch
Created attachment 210708
shellSearchProvider: cancel any in-progress search

I think what might be going on with shell searches is this: for searches within gnome-documents, previous searches are cancelled on update, but that doesn't happen for shell searches. So if you type your query slowly, you first get a (slow) query for a single letter, and the faster query doesn't run until later. Try this patch. It doesn't make much of a noticeable difference for me, but with a larger set of documents it may help.
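The cancel-on-update behaviour described here can be sketched in Python with asyncio. This is only an illustration of the pattern, not the code from the attachment (which targets the gjs-based search provider). The class name, the fake query function, and the timings are assumptions.

```python
import asyncio

class SearchProvider:
    """Keeps at most one query in flight; a new search cancels the old one."""

    def __init__(self, run_query):
        self._run_query = run_query  # coroutine function: text -> results
        self._current = None

    async def search(self, text):
        # Cancel the previous query if it is still running.
        if self._current is not None and not self._current.done():
            self._current.cancel()
        task = asyncio.create_task(self._run_query(text))
        self._current = task
        try:
            return await task
        except asyncio.CancelledError:
            # This search was superseded by a newer one.
            return None

async def fake_query(text):
    # Hypothetical backend: shorter queries match more and take longer.
    await asyncio.sleep(0.05 / len(text))
    return [f"result for {text!r}"]

async def demo():
    provider = SearchProvider(fake_query)
    # Simulate typing "t", "to", "tod" in quick succession.
    calls = [asyncio.create_task(provider.search(t)) for t in ("t", "to", "tod")]
    return await asyncio.gather(*calls)

results = asyncio.run(demo())
print(results)
```

Only the last query delivers results; the slow single-letter query is dropped instead of delaying the faster one.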
Review of attachment 210708:

This patch is functionally equivalent to the one in https://bugzilla.gnome.org/show_bug.cgi?id=672733 (testing one or the other would still be useful)
(In reply to comment #6)
> Review of attachment 210708:
>
> This patch is functionally equivalent to the one in
> https://bugzilla.gnome.org/show_bug.cgi?id=672733 (testing one or the other
> would still be useful)

I have now pushed the patch from that bug to git, so you can just update to latest master to test this.
Another thing we could try doing is reducing the limit we provide to tracker (which is currently always 50) for the shell search, where we are unlikely to display 50 items in the UI. Testing using tracker-sparql wasn't that promising though. Using the query generated by gnome-documents (with my fts:match patch) for the string 't' with different LIMIT values gave:

  LIMIT   time (s)
  1       0.224
  10      0.237
  20      0.251
  50      0.268

(Some noise in the timings, but you get the point.)

I think the reason for this is that tracker needs to query everything that matches 't*' from the database, filter with the provided filter on field values, sort by modified time, and only then can apply the LIMIT. The sort by modified time can, in theory, be moved before the 'filter with the provided filter' step, but it's hard to avoid having to assemble a large temporary result set. So I don't think this is a very promising approach.

If the cancelling isn't good enough, then I think the right thing is probably to avoid even starting searches for 1 or 2 characters: have shellSearchProvider just immediately return [].
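The reasoning about LIMIT can be modelled with a small Python sketch. It is a toy model of the filter, sort, then limit order described above, not of tracker's actual query execution. The row data and function name are made up.

```python
def run_query(rows, prefix, limit):
    """Filter, sort by mtime descending, then apply the limit.

    Returns the limited results and the number of rows examined.
    """
    examined = 0
    matches = []
    for title, mtime in rows:
        examined += 1
        if any(word.startswith(prefix) for word in title.lower().split()):
            matches.append((title, mtime))
    # The newest match can be anywhere, so every match is needed before sorting.
    matches.sort(key=lambda row: row[1], reverse=True)
    return matches[:limit], examined

# Made-up data: 1000 documents, all matching the prefix "t".
rows = [(f"title {i}", i) for i in range(1000)]

examined_counts = {}
for limit in (1, 10, 20, 50):
    results, examined = run_query(rows, "t", limit)
    examined_counts[limit] = examined
    print(limit, len(results), examined, results[0])
```

In this model the number of rows examined is the same for every LIMIT value, which is consistent with the nearly flat timings in the table above.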
(In reply to comment #5)
> Created an attachment (id=210708)
> shellSearchProvider: cancel any in-progress search

Looks good; this basically makes the GNOME Shell search about as fast as the search in GNOME Documents. So both patches combined deliver a substantial improvement (though 2-3 seconds for some searches is still pretty bad for such a fast machine; on slower machines that might translate to a very substantial wait). One thing that is still very slow at the moment is the initial all-documents view after starting the application (or when I empty the search entry); both take ~5 seconds.
Comment on attachment 210708
shellSearchProvider: cancel any in-progress search

Obsoleted by Florian's patch that has been committed
FTS support is now being worked on with a similar patch in bug 668728. I think the other performance problems of the shell search provider have been fixed in the meantime, so I'm going to close this as a duplicate of bug 668728.

*** This bug has been marked as a duplicate of bug 668728 ***