GNOME Bugzilla – Bug 639345
[patch] [filesystem] implement search
Last modified: 2011-01-18 11:26:52 UTC
Created attachment 178170 [details] [review] filesystem: implemented search The filesystem plugin would be better off with the search functionality implemented.
Attached a first version of the patch, with the following known issues: - skip is ignored - use create_content() which is blocking, but that's a general issue in the fs plugin, since this is used in other places - it might be cleaner to use a GCancellable, but its doc says it cannot be used in multiple concurrent operations, which I think is the case here
Other notable limitation: does not handle symlinks
Review of attachment 178170 [details] [review]: A few comments: 1) A search operation, must *always* be finished by invoking the user callback with remaining_count=0. This means: - When the operation is stopped because of an error, we must invoke the user callback with the error and remaining_count=0. - When the operation is cancelled, we must stop the operation and invoke the user callback with NULL error, reamining_count=0, and NULL media. 2) I think you are not closing some files properly or at least there is some problem with that, after a few tests I got lots of these (even after restarting test-ui). "grl-filesystem.c:737: Got error while searching: Too many files opened" 3) The plugin does not filter media files and returns all kind of regular files (see file_is_valid_content). 4) Cancellation support is missing. We really need this because search operations on the filesystem plugin can be pretty expensive, but this can be added later on. 5) Maybe we should do better chunking of the result emission to improve responsiveness (see for example produce_from_path), but we could investigate this later. I think we should fix 1-3 before commiting the patch, then we can commit and investigate 4-5.
Regarding 1): When the search operation is finished because we've crawled the whole fs hierarchy we had to crawl, is it acceptable to have the same behaviour as a cancelled operation? In other words: can we return "-1" as "remaining" for each call to the callback with a media, and then return "remaining=0" with a NULL media when we've gone through the whole hierarchy? I'm asking that because the documentation does not seem to talk about these kind of cases, I think it should (but maybe we should move that discussion to the ml)
(In reply to comment #4) > Regarding 1): > When the search operation is finished because we've crawled the whole fs > hierarchy we had to crawl, is it acceptable to have the same behaviour as a > cancelled operation? > In other words: can we return "-1" as "remaining" for each call to the callback > with a media, and then return "remaining=0" with a NULL media when we've gone > through the whole hierarchy? That would be ok. > I'm asking that because the documentation does not seem to talk about these > kind of cases, I think it should (but maybe we should move that discussion to > the ml) Yes, we should update the docs to clarify these situations.
About 2), it's an algorithmic issue: we should scan the fs tree depth-first, so that the number of open files for that scan is at most the height of the tree at a given moment.
> About 2), it's an algorithmic issue: we should scan the fs tree depth-first, so > that the number of open files for that scan is at most the height of the tree > at a given moment. As discussed on IRC, let's try to use a FIFO to queue directories and process them one by one to avoid the problem. Also, for the record, let's have a configuration option for the plugin to limit the depth of the search. I propose a integer value so that: -1 => No limit 0 => only search the root directoy 1 => one search first-level subdirs ... We could set the default to some sane value, maybe something between 5-10? I guess that would be better than setting the default to -1.
Created attachment 178308 [details] [review] filesystem: implemented search Here's a new version, using a FIFO of directories. It makes the code much simpler (especially for memory management) and avoids having too many open files. Also noteworthy: I've removed the cancelling and error members of SearchOperation. error was never really used (none of the error we get is individually fatal to the crawling). cancelling is not needed any more because of the change of architecture. Support for cancel() should be put in search_next_directory(), probably using a GCancellable (now possible since we don't use concurrent operations). Also note that, depending on circumstances/hardware, the FIFO search as is might be slower. We could make it faster by having several search_next_directory() calls running in parallel, but that could strain the system too much, and would prevent us from using GCancellable. File type filtering and depth limit still need to be implemented, will have a look next week
Review of attachment 178308 [details] [review]: I think we should call search_next_directory from got_children in case the call to g_file_enumerate_children_finish raises an error. Otherwise, if the first directory fails, for example, the search is aborted (search_next_directory is never invoked again) and no user callback is ever called. I would wait until you add at least the support for filtering media content before pushing the patch, but other than that the patch looks good to me. BTW, using a FIFO for crawling makes the system way more responsive while the search process is on-going :)
Created attachment 178559 [details] [review] filesystem: implemented search
Created attachment 178560 [details] [review] filesystem: added depth limit to search
Updated patch, now handle file filtering. Added another patch that adds depth limit. There still one bug I need to investigage in it: after doing a filesystem search with grilo-test-ui, when quitting I get the following warnings: (lt-grilo-test-ui:30683): GLib-GObject-WARNING **: invalid uninstantiatable type `(null)' in cast to `GObject' (lt-grilo-test-ui:30683): GLib-GObject-CRITICAL **: g_object_get_data: assertion `G_IS_OBJECT (object)' failed This makes me think there should be one unref too many somewhere...
Pushed both patches. For the record, cancellation support is still missing, should be added in the future. --- commit ce984d37dad6f36b01b7d6d300b7e4e52889f671 Author: Guillaume Emont <gemont@igalia.com> Date: Mon Jan 17 18:11:57 2011 +0100 filesystem: added depth limit to search It is controlled by a "maximum-search-depth" configuration option. Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> commit 8b6c7a053f728737ae6595b8862ca2b6ef998999 Author: Guillaume Emont <gemont@igalia.com> Date: Wed Jan 12 18:47:09 2011 +0100 filesystem: implemented search Fixes https://bugzilla.gnome.org/show_bug.cgi?id=639345 Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
On the refcount bug I was mentioning, it seems to reproduce when doing search with other plugins, so it's either a bug common to many plugins, or a bug in grilo-test-ui (more likely). Therefore, this bug is definitely fixed. For the cancellation thing, I think it deserves a separate bug, especially since it's an issue for all operations in the filesystem plugin, not only search.