GNOME Bugzilla – Bug 680834
Indexing mount-points on demand
Last modified: 2021-05-26 22:23:32 UTC
Created attachment 219890 [details] [review] tracker-miner-fs: Indexing mount-points on demand It would be nice having Tracker able to index a mount-point and to add the related device as a datasource to the store when an application wants it. This patch adds a dbus method to the /org/freedesktop/Tracker1/Miner/Files/Index object, which receives a mount-point-uri as parameter and indexes that given mount-point just while the sender bus name stays available, i.e., it stops the indexing process as soon as no applications are interested on it. I'm quite sure it needs a lot of enhancements and reviews. Pls, let me know! :)
Comment on attachment 219890 [details] [review] tracker-miner-fs: Indexing mount-points on demand Hi there, thank you for the patch, some comments: >+ <method name="IndexMount"> >+ <annotation name="org.freedesktop.DBus.GLib.Async" value="true"/> >+ <arg type="s" name="mount_point_uri" direction="in" /> >+ </method> >+ I am not entirely sure about adding a new method. There is little difference between this and indexing a path (which you can already do). We also have code paths proven for mount points popping up and down and I wonder how to integrate preferences/config into this. That is to say, we have a preference to *not* index removable media already and we should probably play nicely with that. What is clear is that the "client monitoring" code would have to be integrated some how... I was thinking something along the lines of updating that preference: Index removable media: o Disable entirely o Enable on application requests for data (GNOME Documents, etc). o Enable in all cases Perhaps something like that would work. In the middle case, you could use the existing API and it would only index mount points in those scenarios. > static void >+set_up_mount_point_cb (GObject *source, >+ GAsyncResult *result, >+ gpointer user_data) >+{ >+ TrackerSparqlConnection *connection = TRACKER_SPARQL_CONNECTION (source); >+ gchar *device_urn = user_data; >+ GError *error = NULL; >+ >+ tracker_sparql_connection_update_finish (connection, result, &error); >+ >+ if (error) { >+ g_critical ("Could not set mount point in database '%s', %s", >+ device_urn, >+ error->message); >+ printf ("Could not set mount point in database '%s', %s", >+ device_urn, >+ error->message); No printf() calls please :) >+/** >+ * tracker_miner_fs_mount_add: >+ * @fs: a #TrackerMinerFS >+ * @file: #GMount for the directory to inspect >+ * @recurse: whether the directory should be inspected recursively >+ * >+ * Tells the filesystem miner to inspect a mount. This method intents >+ * to escape the strict duality dictated by the index-removable-devices >+ * and index-optical-discs gsettings keys. >+ * >+ * Since: >+ **/ Was this code copied from somewhere else? Just wondering :) looks familiar. >+ tracker_sparql_connection_update_async (tracker_miner_get_connection (TRACKER_MINER (fs)), >+ queries->str, >+ G_PRIORITY_LOW, >+ NULL, >+ set_up_mount_point_cb, >+ g_strdup (urn)); >+ My only question would be about the priority being low and how we might need that to be higher in cases where applications demand it ahead of other queue processing...: >diff --git a/src/libtracker-miner/tracker-miner-fs.h b/src/libtracker-miner/tracker-miner-fs.h >index f4642a1..e06ae95 100644 >--- a/src/libtracker-miner/tracker-miner-fs.h >+++ b/src/libtracker-miner/tracker-miner-fs.h >@@ -97,6 +97,8 @@ GType tracker_miner_fs_get_type (void) G_GNUC_CONST; > void tracker_miner_fs_directory_add (TrackerMinerFS *fs, > GFile *file, > gboolean recurse); >+void tracker_miner_fs_mount_add (TrackerMinerFS *fs, >+ GMount *mount); ... > gboolean tracker_miner_fs_directory_remove (TrackerMinerFS *fs, > GFile *file); > gboolean tracker_miner_fs_directory_remove_full (TrackerMinerFS *fs, I would keep the directory and mount point APIs grouped better/together. > static void >+application_appears (GDBusConnection *connection, >+ const gchar *name, >+ const gchar *name_owner, >+ gpointer user_data) >+{ >+ IndexMountData *data = user_data; >+ >+ tracker_miner_fs_mount_add (TRACKER_MINER_FS (data->miner), data->mount); >+} I quite like the way this has been approached. >+static void >+application_disappears (GDBusConnection *connection, >+ const gchar *name, >+ const gchar *name_owner, >+ gpointer user_data) >+{ >+ IndexMountData *data = user_data; >+ >+ tracker_miner_stop (data->miner); >+} We can't do this sadly. It stops ALL indexing for everything. What we need to do instead is to remove all GFile objects from the queues pertaining to the mount point. That way other items in the queues can continue as normal. >+static void >+handle_method_call_index_mount (TrackerMinerFilesIndex *miner, >+ GDBusMethodInvocation *invocation, >+ GVariant *parameters) >+{ >+ TrackerMinerFilesIndexPrivate *priv; >+ TrackerDBusRequest *request; >+ guint *watch_name_id; >+ const gchar *mount_point_uri; >+ const gchar *sender; >+ GFile *file; >+ GFileInfo *file_info; >+ GError *internal_error; >+ IndexMountData *data; >+ >+ g_variant_get (parameters, "(&s)", &mount_point_uri); >+ >+ tracker_gdbus_async_return_if_fail (mount_point_uri != NULL, invocation); >+ >+ request = tracker_g_dbus_request_begin (invocation, "%s(uri:'%s')", __FUNCTION__, mount_point_uri); >+ >+ file = g_file_new_for_uri (mount_point_uri); >+ file_info = g_file_query_info (file, >+ G_FILE_ATTRIBUTE_STANDARD_TYPE, >+ G_FILE_QUERY_INFO_NOFOLLOW_SYMLINKS, >+ NULL, NULL); >+ >+ if (!file_info) { >+ internal_error = g_error_new_literal (1, 0, "File does not exist"); I can't remember if we translate error strings or not, I think we do. >+ tracker_dbus_request_end (request, internal_error); >+ g_dbus_method_invocation_return_gerror (invocation, internal_error); >+ >+ g_error_free (internal_error); >+ >+ g_object_unref (file); >+ >+ return; >+ } >+ >+ g_object_unref (file_info); >+ >+ data = g_slice_new0 (IndexMountData); >+ data->mount = g_object_ref (g_file_find_enclosing_mount (file, NULL, &internal_error)); I generally prefer to have _new() and _free() functions for typedefs because then it's clear that the structure is created/freed completely and correctly in all cases. Otherwise it can get messy keeping track of members freed and not. It's also useful for destroy function handlers :) >+ priv = TRACKER_MINER_FILES_INDEX_GET_PRIVATE (miner); >+ data->miner = g_object_ref (TRACKER_MINER_FS (priv->files_miner)); >+ >+ if (data->mount) { >+ internal_error = g_error_new_literal (1, 0, "Mount does not exist"); >+ tracker_dbus_request_end (request, internal_error); >+ } >+ >+ sender = g_dbus_method_invocation_get_sender (invocation); >+ >+ watch_name_id = g_bus_watch_name (G_BUS_TYPE_SESSION, >+ sender, >+ G_BUS_NAME_WATCHER_FLAGS_NONE, >+ application_appears, >+ application_disappears, >+ data, >+ NULL); What happens to watch_name_id here? We likely don't need to keep track of it actually. We should outlive the application calling us. -- Generally, not a bad start. Needs more work. I think we need to stream line a few things. Thanks for the initial effort here.
Hello Martyn, thanks for your review! (In reply to comment #1) > (From update of attachment 219890 [details] [review]) > Hi there, thank you for the patch, some comments: > > >+ <method name="IndexMount"> > >+ <annotation name="org.freedesktop.DBus.GLib.Async" value="true"/> > >+ <arg type="s" name="mount_point_uri" direction="in" /> > >+ </method> > >+ > > I am not entirely sure about adding a new method. There is little difference > between this and indexing a path (which you can already do). Yes, it's pretty close to the IndexFile DBus method. But I wonder how could we add this "client monitoring" code without having a new dbus method for it. It would be insane having the IndexFile DBus method checking whether a mount-point is passed as parameter, and then using the miner-appears/disappers mechanism accordingly. Besides, we don't want to make the 'IndexFile' method work [only] in the app-driven semantic since it's already used out there with other approaches. > We also have code > paths proven for mount points popping up and down and I wonder how to integrate > preferences/config into this. That is to say, we have a preference to *not* > index removable media already and we should probably play nicely with that. Okey, but this preference is for automated behavior. I think it's fine when it responds to a user action. > What is clear is that the "client monitoring" code would have to be integrated > some how.. Agreed! :) > I was thinking something along the lines of updating that preference: > > Index removable media: > o Disable entirely > o Enable on application requests for data (GNOME Documents, etc). > o Enable in all cases > > Perhaps something like that would work. In the middle case, you could use the > existing API and it would only index mount points in those scenarios. That would work good with a new DBus method for app-driven semantic. Converting a widely used method to this approach would lead to unexpected behaviors on scenarios where a user does not know what it's doing under the hood. Cheers, Felipe.
(In reply to comment #2) > Hello Martyn, thanks for your review! > > (In reply to comment #1) > > (From update of attachment 219890 [details] [review] [details]) > > Hi there, thank you for the patch, some comments: > > > > >+ <method name="IndexMount"> > > >+ <annotation name="org.freedesktop.DBus.GLib.Async" value="true"/> > > >+ <arg type="s" name="mount_point_uri" direction="in" /> > > >+ </method> > > >+ > > > > I am not entirely sure about adding a new method. There is little difference > > between this and indexing a path (which you can already do). > > Yes, it's pretty close to the IndexFile DBus method. But I wonder how could we > add this "client monitoring" code without having a new dbus method for it. It > would be insane having the IndexFile DBus method checking whether a mount-point > is passed as parameter, and then using the miner-appears/disappers mechanism > accordingly. Actually, I wonder if we should just have a IndexFileForProcess type of API, this way it serves for other cases where apps are interested in a target location and we can build the monitoring code in with a quick call to the IndexFile functions to do the same job. Then we would also need to figure the mount point additional work that's necessary. The only remaining question is, does it make any sense to do this mount point work this way? If we have the preferences configured as I mentioned before, we would automatically index or not index a location when it is mounted. So a simple cross reference with the preferences and the URI passed to an API like this would be enough I think. > Besides, we don't want to make the 'IndexFile' method work [only] in the > app-driven semantic since it's already used out there with other approaches. Indeed. > > We also have code > > paths proven for mount points popping up and down and I wonder how to integrate > > preferences/config into this. That is to say, we have a preference to *not* > > index removable media already and we should probably play nicely with that. > > Okey, but this preference is for automated behavior. I think it's fine when it > responds to a user action. Well, yes - but you could also argue that you're indexing this data behind the user's back for removable media too right ... or is this a question/dialog we're presenting the user? My concern here is that in the end, it really depends on if we should be allowing blanket bans on certain areas of indexing, regardless of what apps ask for. If the answer to that is yes, then it makes sense to do what I suggest above. If no, then your original patch with the new API makes more sense. > > I was thinking something along the lines of updating that preference: > > > > Index removable media: > > o Disable entirely > > o Enable on application requests for data (GNOME Documents, etc). > > o Enable in all cases > > > > Perhaps something like that would work. In the middle case, you could use the > > existing API and it would only index mount points in those scenarios. > > That would work good with a new DBus method for app-driven semantic. Converting > a widely used method to this approach would lead to unexpected behaviors on > scenarios where a user does not know what it's doing under the hood. I would suggest we leave the existing API as-is and only suggest this in the new API call.
Created attachment 221334 [details] [review] tracker-miner-fs: Indexing mount-points on demand Hi, I liked your proposal. Having an IndexFileForProcess DBus method would make Tracker a more suitable backend tool for [client-]applications. Talking again about mount-points, does it make sense to index a mount-point and ignore that it is related to a device data-source? This way, I think we should add the device as datasource automatically. I've made some improvements to my first patch proposal (keeping the mount-point work). I hope you like it AND point the necessary enhancements. Thanks!
Hello Felipe, I've looked at the latest patch and fixed a few things. The work is all now in a branch "index-mount-points". I'm happy to commit this to master when I've figured out what to do with app_vanishes(). We remove the directory from the index when the process dies, BUT! if this directory was indexed by miner-fs or another app, then it breaks for those cases. We can't know that locations are exclusively for one process or not. The way I think we could fix this is to set the files by removing tracker:available. This makes it faster to reuse them if a process comes back and monitors the same space again too. Thoughts?
Hi! I think the best way to make sure that app_vanishes() doesn't clobber things the miner-fs is doing is to add an 'owner' parameter to the tracker_indexing_tree_add() and tracker_indexing_tree_remove() functions. The indexing tree could keep a GHashTable of root node -> list of owners; when removing a root node, it would remove the given owner from the list and then, if the list of owners is empty, actually remove that node and all children from config_tree. There aren't many root nodes (just the directories configured for indexing, and anything IndexFile or IndexFileForProcess added), so this shouldn't take much extra RAM. My only other comment on this patch is that tracker_miner_fs_add_mount() should have a comment as to why we're OK with index-removable-devices being overridden.
The latest version of this patchset can be found here: https://git.gnome.org/browse/tracker/log/?h=index-mount-points I will try to rebase them on top of master.
Now rebased here: https://git.gnome.org/browse/tracker/log/?h=wip/index-mount-points
(In reply to comment #0) > Created an attachment (id=219890) [details] [review] > tracker-miner-fs: Indexing mount-points on demand > > It would be nice having Tracker able to index a mount-point and to add the > related device as a datasource to the store when an application wants it. > > This patch adds a dbus method to the > /org/freedesktop/Tracker1/Miner/Files/Index object, which receives a > mount-point-uri as parameter and indexes that given mount-point just while the > sender bus name stays available, i.e., it stops the indexing process as soon as > no applications are interested on it. I don't know whether the API changed in later patches, but this would be problematic for Totem for example, where you'd probably want to suspend the indexing (and related actions, such as thumbnailing) to avoid resource consumption while viewing a media.
(In reply to comment #9) > I don't know whether the API changed in later patches, but this would be > problematic for Totem for example, where you'd probably want to suspend the > indexing (and related actions, such as thumbnailing) to avoid resource > consumption while viewing a media. The basic concept of this API is still there. I don't see how suspending indexing etc relates to this patch though; I would expect tracker to monitor some sort of session "low resource mode" indicator that is not limited to the interaction with Totem.
What happens if IndexFileForProcess gets called 50000 times? That g_bus_watch_name on sender of it is never being cleaned up. That's a FD resource that gets leaked and is easily exploitable by applications. I don't understand why Tracker should monitor the appearance and vanishing of applications. Why don't the applications that want Tracker to do something initiate the action? Also, the feature-branch needs some serious cleaning up. Revert, fix build, indentation mistakes in commit 1 that together with new code in commit 2 get partly fixed, etc.
(In reply to Philip Van Hoof from comment #11) > What happens if IndexFileForProcess gets called 50000 times? That > g_bus_watch_name on sender of it is never being cleaned up. That's a FD > resource that gets leaked and is easily exploitable by applications. I'm not sure what you mean; the way this should work is that the watch is only held during the duration of the async call - that is until the requested file is indexed. So most of those should turn into no-ops after the file has been indexed the first time and Tracker realizes that its graph is up to date. I also believe Tracker already handles multiple requests of the same file by compressing them in the same queued element to the miner. If this is called 50000 times, by the same application in any case, and the application quits before those calls are processed, they should get cancelled from the g_bus_watch_name() logic. If you're worried about keeping 50000 watches alive, it could easily be solved by keeping e.g. a hash table from application bus name -> watch id, and avoiding watching again if we have one already. > I don't understand why Tracker should monitor the appearance and vanishing > of applications. > > Why don't the applications that want Tracker to do something initiate the > action? The application does request an action to Tracker, by calling this method. Tracker needs to watch the application so that quitting it (or the application crashing) cancels the request. This is because Tracker should stop indexing e.g. a large external HDD passed to this function (the primary use case the patchset was developed) when no other applications are interested in that data. > Also, the feature-branch needs some serious cleaning up. Revert, fix build, > indentation mistakes in commit 1 that together with new code in commit 2 get > partly fixed, etc. Absolutely, that needs doing.
AFAICT, it seems a better option to keep the accounting of the clients more tightly coupled to TrackerIndexingTree, it is the actual source of information for the miner(s) to determine which files need and don't need indexing. tracker_miner_fs_add_*() APIs are something I'd like to eventually deprecate/remove in favor of TrackerIndexingTree, would be great if we didn't keep adding API there. That said, seems like a worthwhile feature for 1.6, I may pick it up again at some point.
A feature like this will a requirement to show media files on hot-pluggable storage, be it an SD card in a tablet, or an external hard drive with a laptop.
(In reply to Cosimo Cecchi from comment #12) > (In reply to Philip Van Hoof from comment #11) > > What happens if IndexFileForProcess gets called 50000 times? That > > g_bus_watch_name on sender of it is never being cleaned up. That's a FD > > resource that gets leaked and is easily exploitable by applications. > > I'm not sure what you mean; the way this should work is that the watch is > only held during the duration of the async call - that is until the > requested file is indexed. So most of those should turn into no-ops after > the file has been indexed the first time and Tracker realizes that its graph > is up to date. I also believe Tracker already handles multiple requests of > the same file by compressing them in the same queued element to the miner. > > If this is called 50000 times, by the same application in any case, and the > application quits before those calls are processed, they should get > cancelled from the g_bus_watch_name() logic. > > If you're worried about keeping 50000 watches alive, it could easily be > solved by keeping e.g. a hash table from application bus name -> watch id, > and avoiding watching again if we have one already. What I'm worried about is Tracker's miners having to keep state for other applications. We already have GraphUpdated, which we kept stateless for reasons that we thought about many many hours, for applications to get notified. We also went to great lengths to keep it stateless. The hashtable that you propose illustrates the state: application-bus-name as key for a watch-id means that we are storing per-application state. I wonder why a filesystem miner needs to keep application state. Applications have graphupdated to get themselves notified by changes to metadata and resources that they might be interested in. Even the mount-point of a removable device becoming available and disappearing (tracker:available property on the tracker:Volume stuff) can be tracked with it. > > I don't understand why Tracker should monitor the appearance and vanishing > > of applications. > > > > Why don't the applications that want Tracker to do something initiate the > > action? (In reply to Bastien Nocera from comment #12) > A feature like this will a requirement to show media files on hot-pluggable > storage, be it an SD card in a tablet, or an external hard drive with a > laptop. We already support hot-pluggable storage on multiple devices (the N9 if you attach a mass storage device and the Jolla if you insert a sdcard). I'm sture we can improve things. But I really hope Martyn and Carlos use a good design rather than quickly doing things out of urgency.
(In reply to Philip Van Hoof from comment #15) > Applications have graphupdated to get themselves notified by changes to > metadata and resources that they might be interested in. > > Even the mount-point of a removable device becoming available and > disappearing (tracker:available property on the tracker:Volume stuff) can be > tracked with it. You are ignoring the other part of my comment where I explain why the existing API is not enough for the use case I want to address. I don't think you have a good understanding of the problem we're trying to solve with this patch. > We already support hot-pluggable storage on multiple devices (the N9 if you > attach a mass storage device and the Jolla if you insert a sdcard). > > I'm sture we can improve things. But I really hope Martyn and Carlos use a > good design rather than quickly doing things out of urgency. This issue was opened in July 2012; I think the characterization of "quickly doing things out of urgency" is not exactly applicable here. I don't like the tone this conversation is taking; if you have a problem with the design of the feature, please explain it clearly and try to understand the reasons behind this request, instead of passive-aggressively label it as bad design. Fortunately I've had the chance to extensively talk to Martyn and Carlos about this feature in person and on IRC, and they have all the information to take a good decision to move this forward.
(In reply to Cosimo Cecchi from comment #16) > I don't like the tone this conversation is taking; if you have a problem > with the design of the feature, please explain it clearly and try to > understand the reasons behind this request, instead of passive-aggressively > label it as bad design. After this I joined IRC and addressed Cosimo personally. Cosimo and me had a good discussion on this on IRC. Nobody is passive-aggressive. We both have legitimate concerns and we understand each-other's concerns and mostly (if not completely) agree. We're both going to take our time to think about what we said to each other (technically). So there's no need to escalate this. Please let us iron this out. It's not as simple as "bah! this must be done! Just do it!". It's about an important API which we must get right. Let us engineers do this. I entrust Cosimo with this task and I'm sure he'll propose to me something that addresses my concerns, and that will be acceptable for Carlos and Martyn too. I also think the results will be fantastic (not just perfect, perfect usually isn't good enough). Bastien, I respect you. But please let the Tracker team handle this. We understand the necessity of the API. We understand the urgency. We just want to get it fantastic (= more than perfect). Like we do for all of our APIs. Please trust us.
(In reply to Philip Van Hoof from comment #17) > (In reply to Cosimo Cecchi from comment #16) > > > I don't like the tone this conversation is taking; if you have a problem > > with the design of the feature, please explain it clearly and try to > > understand the reasons behind this request, instead of passive-aggressively > > label it as bad design. > > After this I joined IRC and addressed Cosimo personally. > > Cosimo and me had a good discussion on this on IRC. <snip> > Bastien, I respect you. But please let the Tracker team handle this. We > understand the necessity of the API. We understand the urgency. We just want > to get it fantastic (= more than perfect). Like we do for all of our APIs. > > Please trust us. Want to look again at who wrote what maybe.
(In reply to Bastien Nocera from comment #18) > Want to look again at who wrote what maybe. I wrote "I wonder why a filesystem miner needs to keep application state.". That's a legitimate concern. Among other things this means that the proposed API won't survive us getting shut down, the session closed and/or the computer rebooted. At restart instead of assuming that we should cancel the request because the requesting application and we ourselves have vanished, we might need to take other actions (the removable device nor the necessity of having to index it, after that event, hasn't necessarily disappeared - and making the assumption that we ca easily make that assumption is often wrong). Note that tracker not only runs on a typical GNOME desktop. Also on TVs and digital TV setup boxes, on the dashboard of a car, on two (or more) smartphones and on external harddrives. On servers serving as search-index for SMB Windows shares and NAS drives. Those are just the applications I remember in the first five minutes of thinking about it. I'm also sure a lot of consumers have not even told us about their use-cases. This also means that making desktop-assumptions like : if an application disappears, its requests can be discarded, can be wrong. When I look at the API proposal of this bug, I have those concerns. Mentioning them isn't necessarily "passive aggressive". When I listed these to Cosimo on IRC he no longer saw my questions as passive aggressive. I just want us to get this right.
(In reply to Philip Van Hoof from comment #19) > Mentioning them isn't necessarily "passive aggressive". When I listed these > to Cosimo on IRC he no longer saw my questions as passive aggressive. You're quoting Cosimo from comment 16 like I wrote that.
If I'm really supposed to take part of that conversation... (In reply to Philip Van Hoof from comment #15) > (In reply to Bastien Nocera from comment #12) > > > A feature like this will a requirement to show media files on hot-pluggable > > storage, be it an SD card in a tablet, or an external hard drive with a > > laptop. > > We already support hot-pluggable storage on multiple devices (the N9 if you > attach a mass storage device and the Jolla if you insert a sdcard). Except that a 32GB card isn't an 8TB NAS, or a 500GB spinning disk that the user might plug into their laptop. The expectations aren't the same either. If I launch indexing on my NAS [1] (as you mentioned in comment 19), I don't have any expectation of availability. It's going to be ready "when it's done". Ditto for the SD card in the phone (comment 15). When I attach a hard-drive to my machine, I want 1) media to be available ASAP, and only the app know which media I'm actually interested in, 2) interactivity, or perceived interactivity not to go down, so hammering the hard disk is out of the question because Tracker cannot know what I plugged it in for, and what media interests me. In the case of external media on laptops (or convertible tablets), Tracker is competing against the file chooser, so it needs to do less work to be more efficient. Apps know what the user wants to do, Tracker doesn't and can't unless told. [1]: Synology has a home-made indexer using Postgres as storage
Those are all good reasons to have such an api. But none of the use-cases require us to keep state about the applications requesting in our process.
An example stateless solution would be similar to unsolicited events in IMAP, except use DBus's signaling infrastructure: 1) The application start listening on a DBus signal with a arg0 filter being a uuid it generates for itself. 2) It then registers a request (with a priority/strategy as parameter) and attaches that uuid as cookie to it. 3) Tracker's FS miner reacts to this (based on the priority and/or strategy, perhaps combined with its own decisions), when the request is ready, with a DBus signal emit having as arg0 the uuid-cookie. 4) If the data to pass to the application at this point is large, then a new API with the uuid-cookie as in parameter and a array of objects as out parameter is needed. After reading that (and at a certain timeout), the data is discarded from Tracker's state. Hopefully the data to pass at that point is small. In which case it can be part of the signal emit. 5) The signal emissions should also, just like how GraphUpdated works, collect multiple events together to avoid hammering the ipc bus with massive amounts of signals. Perhaps kdbus and arg0 filtering solves performance issues of waking up a lot applications plus dbus-monitor. But not all platforms already have this. 6) The priority/strategy could survive a reboot, session restart, tracker shutting down, etc. It could be registered using stored state in meta.db. In the end, the info is on the task being requested. Not on the application or process requesting it. PID x might request it, PID y might handle it. A disappearing PID x should not necessarily discard the request in Tracker's FS miner.
IMO this pattern would make sense if the request were super-verbose (eg. notifying about every file found, or somesuch), or if tracker exclusively dealt with services from which high reliability is expected. In reality, this is more for apps we should consider short-lived, be it either due to crashes, the app closing... This is very much akin to the Clear/SetRdfTypes calls in tracker-extract, we are already doing the very same business of keeping track of requestors, and it seems the same kind of "please do this first" soft request, apps should still listen to GraphUpdated as usual. As for possible abusive behavior, AFAICS the only increasing resource on g_bus_watch_name_on_connection() is memory, I'm sure there will be other things going wrong before either tracker-extract (and tracker-miner-fs after this) get hindered in any way, and on the side we don't care about. It is also worth noting that the final control of the indexed resources is in the miner, files/directories not passing filters will still be ignored, and unmounts can just unschedule things and drop watches on callers. I took the initial patch and reworked it considerably, the patch actually implements this behavior for the current IndexFile DBus call, so watches will happen automatically on index requests on directories. Combining this and tracker-extract's SetRdfTypes, eg. gnome-music may request indexing music pieces from an usb stick ASAP.
Created attachment 309322 [details] [review] tracker-miner-fs: Keep cache of IndexFile requesters on directories The senders for all DBus requests to the IndexFile method on directories will be now watched, the dbus presence of the senders will control the lifetime of the directory on the indexed directories tree. There may be multiple requests on a same directory, in such case the directory will be indexed/monitored for as long as there is alive requesters on it. Requests on already indexed directories (or children of recursively indexed ones) will be silently ignored. Unmounts will also silently drop the IndexFile listeners, applications should issue new requests on volume mounts if desired. The patch is loosely based on initial work from Felipe Borges.
Forgot to mention, I intend to get this in before the feature freeze coming the next week.
Can you also please provide a test application for the new features? I've already requested this sort of thing many times for other features (such as file changes), and didn't get anything. I would have no idea how to integrate that in grilo's Tracker plugin for example.
Created attachment 309379 [details] raw dbus client I tested the feature with the attached raw dbus client, just run it as test-client /some/mount/folder/. FWIW this feature is also available through tracker_miner_manager_index_file(_async) from libtracker-control (this library is separate from libtracker-miner, and has its standalone .pc file): https://git.gnome.org/browse/tracker/tree/src/libtracker-control/tracker-miner-manager.h#n134 As for file updates, there's a sample GraphUpdated client at: https://git.gnome.org/browse/tracker/tree/examples/libtracker-sparql/class-signal.c It's perhaps a bit too raw, but running it helps figuring out the data to be expected there, would be nice making this part of developer docs, but it's probably better to be taken in a separate bug.
(In reply to Carlos Garnacho from comment #24) > IMO this pattern would make sense if the request were super-verbose (eg. > notifying about every file found, or somesuch), or if tracker exclusively > dealt with services from which high reliability is expected. What makes you think Tracker isn't used in such a use-case? I personally think that starting and stopping the tracker-miner-fs process should have no influence on the system. Right now we do quite a lot to make sure it doesn't have any influence. With this API the tracker-miner-fs process goes in the direction of influencing the entire desktop and system (in a embedded use-case) based on it getting started and stopped. > In reality, this is more for apps we should consider short-lived, be it > either due to crashes, the app closing... Right now it's for apps that we can consider short-lived. But that's an assumption on how and for what the new API will be used. Often such things diverge. > This is very much akin to the > Clear/SetRdfTypes calls in tracker-extract, we are already doing the very > same business of keeping track of requestors, Yah, and that's not a nice API. > and it seems the same kind of > "please do this first" soft request, apps should still listen to > GraphUpdated as usual. > As for possible abusive behavior, AFAICS the only increasing resource on > g_bus_watch_name_on_connection() is memory, I'm sure there will be other > things going wrong before either tracker-extract (and tracker-miner-fs after > this) get hindered in any way, and on the side we don't care about. Memory itself isn't my main concern (although it is a small one, as I think we should aim to have a fixed amount of memory instead of making it possible that memory balloons). The state is. And the state disappearing and because of it behaviour changing when we get started and stopped. [cut - about all good changes to the patch]
(In reply to Philip Van Hoof from comment #29) > (In reply to Carlos Garnacho from comment #24) > > IMO this pattern would make sense if the request were super-verbose (eg. > > notifying about every file found, or somesuch), or if tracker exclusively > > dealt with services from which high reliability is expected. > > What makes you think Tracker isn't used in such a use-case? Didn't claim that, but right, I just don't know, not many non-regular downstreams raise their voices on the Tracker ML. I'd wish to hear from them, but so far I find optimizing for this usecase quite nebulous. > > I personally think that starting and stopping the tracker-miner-fs process > should have no influence on the system. > > Right now we do quite a lot to make sure it doesn't have any influence. With > this API the tracker-miner-fs process goes in the direction of influencing > the entire desktop and system (in a embedded use-case) based on it getting > started and stopped. I see it rather the opposite way, the desktop and system are able to influence tracker-miner-fs doings. We've had this IndexFile call for years, and clients were already able to do so, except tracker-miner-fs would just keep indexing the directory even if all requesters were long gone. And I could even argue that this is a way to "influence" the system, keeping tracker-miner-fs churning on possibly large directory trees, slow drives, etc... even if all interest on that content has vanished. And there's plenty of prior art here, eg. any app can use TrackerMinerManager to pause/resume miners on whatever reason. > > > In reality, this is more for apps we should consider short-lived, be it > > either due to crashes, the app closing... > > Right now it's for apps that we can consider short-lived. But that's an > assumption on how and for what the new API will be used. Often such things > diverge. Right, we can't dictate the clients' lifetime. I however can't see how can we put all usecases together, I see the value of your proposal for long lived apps (maybe even making this part of the ontology), but to apps that we can consider prone to disappearing unexpectedly (crashes, etc), it would have the same flaw that IndexFile currently has, so there'd be basically no value in porting to such more complex system. I ultimately think this is a problem of settings, index-removable-devices defaults to false (for reasons IMO), although for the "talks with long lived system services" usecase probably setting this to true makes sense, because it would be mostly safe to assume that some client is interested in some content (are removable devices useful in those contexts? is this needed for other similar folders?). It is worth noting that the response to "are we interested in removable devices?" is most usually yes, and it is completely independent to the setting value. > > > This is very much akin to the > > Clear/SetRdfTypes calls in tracker-extract, we are already doing the very > > same business of keeping track of requestors, > > Yah, and that's not a nice API. I don't think it's the best example of dbus api either, but it fits a purpose, and clearly has a demand seeing it's being used around. > > > and it seems the same kind of > > "please do this first" soft request, apps should still listen to > > GraphUpdated as usual. > > > > As for possible abusive behavior, AFAICS the only increasing resource on > > g_bus_watch_name_on_connection() is memory, I'm sure there will be other > > things going wrong before either tracker-extract (and tracker-miner-fs after > > this) get hindered in any way, and on the side we don't care about. > > Memory itself isn't my main concern (although it is a small one, as I think > we should aim to have a fixed amount of memory instead of making it possible > that memory balloons). The state is. And the state disappearing and because > of it behaviour changing when we get started and stopped. > Memory: this just keeps a short ":1.xx" address string, a GFile, and a few scaffolding structs/pointers, I see dbus rejecting connections before we get anywhere serious. We might keep a limit on watched directory GFiles per client though. State: I really think your main concerns are already materialized in IndexFile as it currently exists. As for this name watching feature bringing a behavior change, it is so in the same terms that unmount events or configuration changes are, so it's really nothing that deserves special treatment clients shouldn't be already aware of.
(In reply to Carlos Garnacho from comment #30) > (In reply to Philip Van Hoof from comment #29) [cutting here and there] > > Right now we do quite a lot to make sure it doesn't have any influence. With > > this API the tracker-miner-fs process goes in the direction of influencing > > the entire desktop and system (in a embedded use-case) based on it getting > > started and stopped. > I see it rather the opposite way, the desktop and system are able to > influence tracker-miner-fs doings. We've had this IndexFile call for years, > and clients were already able to do so, except tracker-miner-fs would just > keep indexing the directory even if all requesters were long gone. > > And I could even argue that this is a way to "influence" the system, keeping > tracker-miner-fs churning on possibly large directory trees, slow drives, > etc... even if all interest on that content has vanished. Hmm. I see Tracker's services as services. Not as desktop apps that can be turned off and on by other apps. Services can start and stop, just like computers can reboot. But apps use an API to get info and instruct us to do things. Not to control it completely. The fact that it runs under the user's session is just accidental and because it needs no extra privileges. If we churn, then that's a bug to solve. The user can stop the service. Or the user can configure the service not to be interested in certain content any more. But apps deciding for the user what content is and what content isn't interesting? Maybe apps can prioritize this (what I guess these APIs and what IndexFile are for). But letting other apps make big policy decisions? Sounds a bit chaotic to me. > And there's plenty of prior art here, eg. any app can use > TrackerMinerManager to pause/resume miners on whatever reason. Yes but those implement our interface and/or use our defined APIs. Pause, resume, etc. But indeed. Since we offer a Stop API and meanwhile we keep state in the process for other applications, and the Stop API loses that state: that means that two applications can interfere with each other yet only do legal calls on us: One requests state (with this API). The other stops and starts tracker-miner-fs. Suddenly the first's functionality stops working because of the legal actions the second took. Isn't that weird? Both did completely legal things. And yet one of them has a bug. GraphUpdated avoids this mess entirely. You can start and stop tracker-store as often as you like. Makes no difference for its correctness to all of its consumers (both in and out). > Right, we can't dictate the clients' lifetime. I however can't see how can > we put all usecases together, I see the value of your proposal for long > lived apps (maybe even making this part of the ontology), but to apps that > we can consider prone to disappearing unexpectedly (crashes, etc), it would > have the same flaw that IndexFile currently has, so there'd be basically no > value in porting to such more complex system. I understand there is a flaw with continuing on a request while the requesting app has vanished. But meanwhile I don't know why we would want to fix it? It got requested. So we fulfill it. Whether the requester crashes or not. Whether we reboot or get stopped and started, or not. Until something unrequests the request. Also, doesn't this make your API a blocking-style API? How can a run and exit script use this? The script vanishes. Or must it wait in a sleep-loop (= blocking style API)? They can't even use select() to wait for this ;) > I ultimately think this is a problem of settings, index-removable-devices > defaults to false (for reasons IMO), although for the "talks with long lived > system services" usecase probably setting this to true makes sense, because > it would be mostly safe to assume that some client is interested in some > content (are removable devices useful in those contexts? is this needed for > other similar folders?). Yes I agree. Sounds like we are trying to solve a deeper problem with a hack. > It is worth noting that the response to "are we interested in removable > devices?" is most usually yes, and it is completely independent to the > setting value. Maybe we need an API that allows a desktop to implement asking the user about it? ie. the gnome-session could implement TrackerRemovableDeviceIndexAsker (or another fancy interface name) that looks a bit like AskUserToIndex(in s device, out b answer) And then gnome-session could just ask: Hey peep, you want us to index the USB stick you just inserted? [Yah, NOoh!] > > > This is very much akin to the > > > Clear/SetRdfTypes calls in tracker-extract, we are already doing the very > > > same business of keeping track of requestors, > > > > Yah, and that's not a nice API. > > I don't think it's the best example of dbus api either, but it fits a > purpose, and clearly has a demand seeing it's being used around. It can also mean that we are missing something more fundamental. > > > and it seems the same kind of > > > "please do this first" soft request, apps should still listen to > > > GraphUpdated as usual. > > > > > > > As for possible abusive behavior, AFAICS the only increasing resource on > > > g_bus_watch_name_on_connection() is memory, I'm sure there will be other > > > things going wrong before either tracker-extract (and tracker-miner-fs after > > > this) get hindered in any way, and on the side we don't care about. > > > > Memory itself isn't my main concern (although it is a small one, as I think > > we should aim to have a fixed amount of memory instead of making it possible > > that memory balloons). The state is. And the state disappearing and because > > of it behaviour changing when we get started and stopped. > > > > Memory: this just keeps a short ":1.xx" address string, a GFile, and a few > scaffolding structs/pointers, I see dbus rejecting connections before we get > anywhere serious. We might keep a limit on watched directory GFiles per > client though. ok, sure > State: I really think your main concerns are already materialized in > IndexFile as it currently exists. As for this name watching feature bringing > a behavior change, it is so in the same terms that unmount events or > configuration changes are, so it's really nothing that deserves special > treatment clients shouldn't be already aware of. So because we already have a problem in our APIs, it's fine to introduce a new problem :-) ?
(In reply to Philip Van Hoof from comment #31) > (In reply to Carlos Garnacho from comment #30) > > (In reply to Philip Van Hoof from comment #29) > > [cutting here and there] > > > > Right now we do quite a lot to make sure it doesn't have any influence. With > > > this API the tracker-miner-fs process goes in the direction of influencing > > > the entire desktop and system (in a embedded use-case) based on it getting > > > started and stopped. > > > I see it rather the opposite way, the desktop and system are able to > > influence tracker-miner-fs doings. We've had this IndexFile call for years, > > and clients were already able to do so, except tracker-miner-fs would just > > keep indexing the directory even if all requesters were long gone. > > > > And I could even argue that this is a way to "influence" the system, keeping > > tracker-miner-fs churning on possibly large directory trees, slow drives, > > etc... even if all interest on that content has vanished. > > Hmm. I see Tracker's services as services. Not as desktop apps that can be > turned off and on by other apps. Services can start and stop, just like > computers can reboot. But apps use an API to get info and instruct us to do > things. This problem extends to any dbus api that does not apply changes persistently. If we got rid of all redundance, the only source of intructions for tracker-miner-fs would be store/GSettings, but then that only makes sense if indexing is done "when it's done" and we drop all sense of interactivity. Even then, as you say people find their ways around broken/insufficient APIs, and I really think it is quite sane to at least keep the lifetime of the "behavior changes" in the scope of the miner, rather than lending it to the client. > > Not to control it completely. IndexFile is only additive, you can't prevent anything else from being indexed if it has to. > > The fact that it runs under the user's session is just accidental and > because it needs no extra privileges. > > If we churn, then that's a bug to solve. The user can stop the service. Or > the user can configure the service not to be interested in certain content > any more. That still misses the crashing scenario. IMO, we should not only aim for elegangly written apps that will be educated with their resources, but with the random joe that is eg. tinkering with this gnome platform/jolla tablet/whatnot. > > But apps deciding for the user what content is and what content isn't > interesting? Maybe apps can prioritize this (what I guess these APIs and > what IndexFile are for). But letting other apps make big policy decisions? > Sounds a bit chaotic to me. I really don't get what "big policy decisions" are you talking about :), indexing a folder not canonically coming from settings isn't one IMO, we already do that if index-removable-devices is true. Except that this setting is false by default, and yet we want removable devices indexed on some circumstances. This is precisely what IndexFile meant to cater for, indexing files/directories not specifically told by the configuration, gnome-documents eg. ensures the gnome help manual PDF file is indexed, we even expose this feature through the "tracker index -f ..." CLI tool. > > > And there's plenty of prior art here, eg. any app can use > > TrackerMinerManager to pause/resume miners on whatever reason. > > Yes but those implement our interface and/or use our defined APIs. Pause, > resume, etc. TrackerMinerManager also has an index_file request, just like pause/resume APIs > > But indeed. Since we offer a Stop API and meanwhile we keep state in the > process for other applications, and the Stop API loses that state: that > means that two applications can interfere with each other yet only do legal > calls on us: > > One requests state (with this API). The other stops and starts > tracker-miner-fs. Suddenly the first's functionality stops working because > of the legal actions the second took. How is it legal starting/stopping tracker-miner-fs? that should be up to the session manager (eg. not in our hands), miners very much expect to survive any caller, proof of that is the clunky way to send SIGTERM to miners on "tracker control -t". > > Isn't that weird? Both did completely legal things. And yet one of them has > a bug. GraphUpdated avoids this mess entirely. You can start and stop > tracker-store as often as you like. Makes no difference for its correctness > to all of its consumers (both in and out). In the current shape of things, I don't see "restart miner" as a legal scenario except for testing purposes, and again, these dbus apis were already stateless from the start. > > > Right, we can't dictate the clients' lifetime. I however can't see how can > > we put all usecases together, I see the value of your proposal for long > > lived apps (maybe even making this part of the ontology), but to apps that > > we can consider prone to disappearing unexpectedly (crashes, etc), it would > > have the same flaw that IndexFile currently has, so there'd be basically no > > value in porting to such more complex system. > > I understand there is a flaw with continuing on a request while the > requesting app has vanished. But meanwhile I don't know why we would want to > fix it? It got requested. So we fulfill it. Whether the requester crashes or > not. Whether we reboot or get stopped and started, or not. Until something > unrequests the request. > > Also, doesn't this make your API a blocking-style API? How can a run and > exit script use this? The script vanishes. Or must it wait in a sleep-loop > (= blocking style API)? They can't even use select() to wait for this ;) That was my only counterargument for not doing this as a separate DBus call, I think once-off script usecases are a small enough, there might be some in the wild, so the concern is valid. But still, that looks like the pattern I wouldn't want to encourage myself. > > > I ultimately think this is a problem of settings, index-removable-devices > > defaults to false (for reasons IMO), although for the "talks with long lived > > system services" usecase probably setting this to true makes sense, because > > it would be mostly safe to assume that some client is interested in some > > content (are removable devices useful in those contexts? is this needed for > > other similar folders?). > > Yes I agree. Sounds like we are trying to solve a deeper problem with a hack. IMO, we are making up the whole "deeper problem", specially if all we know is "there might be other usecases"... I don't quite get how the dbus name watching feature turns into a flaw to avoid here. > > > It is worth noting that the response to "are we interested in removable > > devices?" is most usually yes, and it is completely independent to the > > setting value. > > Maybe we need an API that allows a desktop to implement asking the user > about it? > > ie. the gnome-session could implement TrackerRemovableDeviceIndexAsker (or > another fancy interface name) that looks a bit like AskUserToIndex(in s > device, out b answer) > > And then gnome-session could just ask: Hey peep, you want us to index the > USB stick you just inserted? [Yah, NOoh!] That would still need gnome-session to talk to tracker-miner-fs somehow, no? > > > > > This is very much akin to the > > > > Clear/SetRdfTypes calls in tracker-extract, we are already doing the very > > > > same business of keeping track of requestors, > > > > > > Yah, and that's not a nice API. > > > > I don't think it's the best example of dbus api either, but it fits a > > purpose, and clearly has a demand seeing it's being used around. > > It can also mean that we are missing something more fundamental. FWIW, The name watching feature is the least of my problems here. IMO such rethoric questions belong to a Tracker 2.x/future/whatnot. This bug has been opened for 3 years, with a patch containing a valid approach that fits the current APIs and tracker-miner-fs ways of working. And it grabs enough attention to have several people at Guadec asking me specifically about this bug. I could call it fair to block this if there were plans/branches to do this "the proper way", we currently are quite blank at what that is though (I eg. think your proposal sort of misses the point wrt the original request usecases, but again, IMO you're trying to put together two irreconciliable things). And we don't have plenty of manpower to realize it either. > > > > > and it seems the same kind of > > > > "please do this first" soft request, apps should still listen to > > > > GraphUpdated as usual. > > > > > > > > > > As for possible abusive behavior, AFAICS the only increasing resource on > > > > g_bus_watch_name_on_connection() is memory, I'm sure there will be other > > > > things going wrong before either tracker-extract (and tracker-miner-fs after > > > > this) get hindered in any way, and on the side we don't care about. > > > > > > Memory itself isn't my main concern (although it is a small one, as I think > > > we should aim to have a fixed amount of memory instead of making it possible > > > that memory balloons). The state is. And the state disappearing and because > > > of it behaviour changing when we get started and stopped. > > > > > > > Memory: this just keeps a short ":1.xx" address string, a GFile, and a few > > scaffolding structs/pointers, I see dbus rejecting connections before we get > > anywhere serious. We might keep a limit on watched directory GFiles per > > client though. > > ok, sure > > > State: I really think your main concerns are already materialized in > > IndexFile as it currently exists. As for this name watching feature bringing > > a behavior change, it is so in the same terms that unmount events or > > configuration changes are, so it's really nothing that deserves special > > treatment clients shouldn't be already aware of. > > So because we already have a problem in our APIs, it's fine to introduce a > new problem :-) ? Let's try to define what the "new problem" is. I really think the patch takes the status quo to a better shape, and not applying it would still leave us in an inconsistent state, not any better nor closer to your ideal. I really would like to know if you downright disapprove this patch in essence (eg. see fundamental flaws) or it just doesn't go in the direction you'd like to see Tracker going, because one has more weight than the other to keep blocking this (the perception from our userbase is quite hindered already I'd say). Sam/Martyn/Jürg/anyone , opinions?
Carlos, not a lot to add to what hasn't already been said, except that your approach sounds great to me, and would work well for the gnome-documents/gnome-photos use cases. We want apps to be able to have their own policies as to whether removable devices are indexed or not. Having a centralized "do you want to index this device?" on hotplug sounds like an extremely confusing UX design. Philip, did you actually read through the latest Carlos' patch? Most of the comments that you brought up in this last round seem non-sequiturs when you look at what the patch does.
I think Carlos' points have made a lot of sense to me. My personal feeling is that I don't think all removable media should be indexed by default because you can't guarantee it's what the user wants every time. That is why it is OFF by default. APIs ---- Applications have more context, reason, requirement and (should have) control about what they want and when they want it. Tracker has less information and can't guess this correctly for apps/cases. This means you have to empower the applications with a rich enough API. I've been thinking about this on a few levels; over the years we've had: a) APIs that relate to a process lifetime. b) APIs that relate content to a GRAPH (or application you could say). c) APIs that relate content to a period of inactivity (removing DB resources after removable media hasn't been mounted for X days). d) others? I think we need to cater for cases like this (and maybe some we've not considered?). I wonder if application developers have any other "life-time" needs for the data we should be thinking about? The API approaches I mentioned above are not all used for indexing content, some are for pausing miners and other things. The Problem ----------- Generally speaking, what makes this problem hard, is knowing when the data should be cleaned up so as not to overpopulate the database. I think knowing WHEN to populate is an easier problem to solve. I think you have to consider both in order to have a fully rounded solution and avoid future problems or bugs. Populating ---------- We should do this: 1. When Tracker is mandated to (e.g. default preferences - i.e. $HOME/Documents or Indexing Removable Devices). 2. When an application is asking Tracker to index a file or directory (e.g. IndexFile API). I think #2 is currently lacking good APIs for application developers. Currently, the IndexFile API is permanent and a quick solution. It's not really well thought out. Consider: - How long does the application need this content? - What if other applications need this content after the first application? - What if an application A doesn't want to see the content but application B does (GRAPHS?) - When are any one of the applications done with that content? - When a device is physically removed, what we do with the data in the DB? - How long to we remember it (in case it is plugged in again) / do we remember it? ... Currently #2 is a permanent solution unless the content is removed by the user (e.g. rm $FILE). Should it be? Perhaps we need an API that allows for more life time sensitivity, e.g.: (somewhat crude...) IndexFile(URI); <-- permanent IndexFileForProcessDuration(URI); IndexFileUntilReboot(URI); IndexFileUntilDays(URI, daysUntilRemoved); IndexRemovableDevice(URI, application, daysUntilRemoved); As for content from removable devices, perhaps (again crude): RemovableDeviceAddInterest(application, device_type, callback) RemovableDeviceRemoveInterest(application, device_type, callback) Where 'callback' allows us to call applications with each file we find. We do this internally already from libtracker-miner to tracker-miner-fs, calling apps with interest could also be done. Is this too much? Maybe a MIME type is enough? These sort of APIs help with the problem of unpopulating the database too. Any other ideas from you guys? Honestly though, I would rather hear from application developers *HOW* they would prefer to get their data and work to that specification.
Review of attachment 309322 [details] [review]: Seems that you did away with IndexFileForProcess and just changed the behaviour of IndexFile. Which is a D-Bus ABI break technically. If IndexFile isn't widely used then it doesn't really matter, but I think it's cleaner to add a new function call. I also think that this patch doesn't solve the problem of interference between the 'applications opt-in indexing' configuration and the 'gsettings' configuration. For example, if an app calls IndexFile(/foo), where /foo is a directory that is configured to be indexed anyway, and then the app disappears, /foo will be removed from the indexing tree. But that's wrong, because /foo was still configured to be indexed. So I think we need to do reference counting inside the TrackerIndexingTree. It's a little ugly but it's the only way to allow multiple ways of controlling what is indexed. I think the discussions about API design are a bit moot. Any embedded users who don't want to use these methods can avoid using them. On the desktop, I agree that things like Miner.Pause provide ways for apps to break a user's desktop, but there are about a billion ways for an app to break a user's desktop, and sandboxing seems like the way to go to fix that. Meanwhile, not being able to use Tracker index removable devices is a big hole in the GNOME desktop which lots of people (including me) want to see fixed. ::: src/miners/fs/tracker-miner-files-index.c @@ +370,2 @@ if (is_dir) { + TrackerIndexingTree *indexing_tree; I'd really appreciate a comment here explaining what's going on. I guess it's correct, but can't really work it out at a glance :) ::: src/miners/fs/tracker-miner-files-peer-listener.c @@ +19,3 @@ + * Author: Carlos Garnacho <carlosg@gnome.org> + */ + this class could really do with a comment at the top explaining its purpose
Created attachment 309503 [details] [review] libtracker-miner: Add ownership tracking to Tracker indexing config The TrackerIndexingTree class keeps track of each directory or file that has been explicitly selected for indexing and monitoring. As a new feature, we want to allow any application to add to the list of directories being indexed and monitored, and to later be able to remove things from the list again. Previously, the tracker-miner-fs process was totally in control of what went in the IndexingTree. In order to ensure that applications can't interfere with the existing configuration, we need to track *who* added something to the indexing tree. This is done with a string identifier. The tracker_miner_fs_directory_add() functions now take an 'owner' parameter so they can keep track of who is interested in each configured indexing root. The tracker_miner_fs_ignore() function lets a given owner that previously expressed interest in an indexing root now say that they are not interested. If that indexing root has no 'owners' left, it is deleted from the TrackerIndexingTree, which cancels any running indexing tasks and removes any monitors. We need to make sure that nobody can claim the internal 'tracker-miner-fs' ID through the external API -- that would make it possible to override the internal configuration.
I may be totally on the wrong planet, but this is what I think we need to do to make sure apps can't interfere with the indexing configuration set by the user. Not fully tested yet
s/tracker_miner_fs_ignore()/tracker_miner_fs_unwatch()/ in the above message
(In reply to Carlos Garnacho from comment #32) > This bug has been opened for 3 years, with a patch containing a valid > approach that fits the current APIs and tracker-miner-fs ways of working. > And it grabs enough attention to have several people at Guadec asking me > specifically about this bug. > > I could call it fair to block this if there were plans/branches to do this > "the proper way", we currently are quite blank at what that is though (I eg. > think your proposal sort of misses the point wrt the original request > usecases, but again, IMO you're trying to put together two irreconciliable > things). And we don't have plenty of manpower to realize it either. I yield. This is a good argument. If nobody else is concerned enough to solve this the proper way within three years, then surely it's important enough for us to accept a solution done in the way proposed by those who are affected by what they wanted to fix. Let's keep the door open for better ideas that solve this, though. By that I mean that we should be willing to deprecate and change APIs in future. > Let's try to define what the "new problem" is. I really think the patch > takes the status quo to a better shape, and not applying it would still > leave us in an inconsistent state, not any better nor closer to your ideal. > > I really would like to know if you downright disapprove this patch in > essence (eg. see fundamental flaws) or it just doesn't go in the direction > you'd like to see Tracker going, because one has more weight than the other > to keep blocking this (the perception from our userbase is quite hindered > already I'd say). I (just) don't like the direction. I'd like to see tracker-miner-fs go in the direction of statelessness. But I admit that my ideals and ideology are, here, of lesser importance. > Sam/Martyn/Jürg/anyone , opinions? I'm indeed also very interested in other opinions. But I don't want to stall this any further. I feel that I got the opportunity to make my point of view more than clear :-) Thanks.
Hey :), replying to a bunch of comments in a row. (In reply to Philip Van Hoof from comment #39) > (In reply to Carlos Garnacho from comment #32) > > This bug has been opened for 3 years, with a patch containing a valid > > approach that fits the current APIs and tracker-miner-fs ways of working. > > And it grabs enough attention to have several people at Guadec asking me > > specifically about this bug. > > > > I could call it fair to block this if there were plans/branches to do this > > "the proper way", we currently are quite blank at what that is though (I eg. > > think your proposal sort of misses the point wrt the original request > > usecases, but again, IMO you're trying to put together two irreconciliable > > things). And we don't have plenty of manpower to realize it either. > > I yield. This is a good argument. If nobody else is concerned enough to > solve this the proper way within three years, then surely it's important > enough for us to accept a solution done in the way proposed by those who are > affected by what they wanted to fix. > > Let's keep the door open for better ideas that solve this, though. By that I > mean that we should be willing to deprecate and change APIs in future. Thanks :), and of course! FWIW I do consider places of the miners public API terribly ad-hoc, and this a quite good example (ReindexMimeTypes an even better one...). At this point I can only hope to deprecate and replace by newer APIs/ways as you say, but I also think the end result should be just as easy for clients as it were (say, if we add some "complex" mechanism, we should also wrap it conveniently on libtracker-control) > > > Let's try to define what the "new problem" is. I really think the patch > > takes the status quo to a better shape, and not applying it would still > > leave us in an inconsistent state, not any better nor closer to your ideal. > > > > I really would like to know if you downright disapprove this patch in > > essence (eg. see fundamental flaws) or it just doesn't go in the direction > > you'd like to see Tracker going, because one has more weight than the other > > to keep blocking this (the perception from our userbase is quite hindered > > already I'd say). > > I (just) don't like the direction. I'd like to see tracker-miner-fs go in > the direction of statelessness. But I admit that my ideals and ideology are, > here, of lesser importance. Fair enough, on the big picture sounds like a worthwhile goal. I'm glad we get to agree about the priorities :). (In reply to Sam Thursfield from comment #35) > Review of attachment 309322 [details] [review] [review]: > > Seems that you did away with IndexFileForProcess and just changed the > behaviour of IndexFile. Which is a D-Bus ABI break technically. If IndexFile > isn't widely used then it doesn't really matter, but I think it's cleaner to > add a new function call. Yeah, it was also brought up above, I'd ponder just going for IndexFileForProcess as the original patch did, although I really didn't want to extend the API on this side... > > I also think that this patch doesn't solve the problem of interference > between the 'applications opt-in indexing' configuration and the 'gsettings' > configuration. For example, if an app calls IndexFile(/foo), where /foo is > a directory that is configured to be indexed anyway, and then the app > disappears, /foo will be removed from the indexing tree. But that's wrong, > because /foo was still configured to be indexed. That's what the extra code in handle_method_call_index_file() means to prevent :). It will only add the file to the TrackerIndexingTree if it wasn't part already of a recursively indexed directory, or a indexing root itself. This kind of accounts on configuration being "static" though, we might also ensure there is no file in the TrackerIndexingTree at the time of adding it from config, ensuring it is removed from there would also tear down the watches. Another case that might happen (because of the way we call tracker_file_system_forget_files() on tracker_indexing_tree_remove()) is that clients indexing a subfolder (eg /mnt/foo/subfolder) of other folders requested by other clients (eg /mnt/foo) are left deaf. That is not specific to IndexFile, and might also be triggered with certain configurations. And anyway, that already involves some assumptions on the drive content from the client inspecting the subfolder, which is arguably wrong. > > So I think we need to do reference counting inside the TrackerIndexingTree. > It's a little ugly but it's the only way to allow multiple ways of > controlling what is indexed. > > I think the discussions about API design are a bit moot. Any embedded users > who don't want to use these methods can avoid using them. On the desktop, I > agree that things like Miner.Pause provide ways for apps to break a user's > desktop, but there are about a billion ways for an app to break a user's > desktop, and sandboxing seems like the way to go to fix that. I agree there, there's things that belong to the current session, and offering it doesn't bring the obligation to use it. > > Meanwhile, not being able to use Tracker index removable devices is a big > hole in the GNOME desktop which lots of people (including me) want to see > fixed. > > ::: src/miners/fs/tracker-miner-files-index.c > @@ +370,2 @@ > if (is_dir) { > + TrackerIndexingTree *indexing_tree; > > I'd really appreciate a comment here explaining what's going on. I guess > it's correct, but can't really work it out at a glance :) As explained above, that's mainly some checks so 1) we don't add watches to directories already being inspected and 2) we only keep watches from files that originally came from IndexFile. > > ::: src/miners/fs/tracker-miner-files-peer-listener.c > @@ +19,3 @@ > + * Author: Carlos Garnacho <carlosg@gnome.org> > + */ > + > > this class could really do with a comment at the top explaining its purpose Yup, you're right it can do with some extra comments. (In reply to Sam Thursfield from comment #36) > Created attachment 309503 [details] [review] [review] > libtracker-miner: Add ownership tracking to Tracker indexing config ... and this patch in turn modifies public C API :) (tracker-indexing-tree.h and tracker-miner-fs.h are public headers). During the past times, I really intended to get rid of tracker_miner_fs_add* (I never finished the blow as you see), so it would be great if we avoided API growing there, would be nice to have this as a tracker_indexing_tree_add_with_source() or somesuch. It would also be great if NULL were a shortcut for "self", so we don't require as many changes around. (In reply to Martyn Russell from comment #34) > Perhaps we need an API that allows for more life time sensitivity, e.g.: > (somewhat crude...) > > IndexFile(URI); <-- permanent > IndexFileForProcessDuration(URI); > IndexFileUntilReboot(URI); > IndexFileUntilDays(URI, daysUntilRemoved); > IndexRemovableDevice(URI, application, daysUntilRemoved); > > As for content from removable devices, perhaps (again crude): > > RemovableDeviceAddInterest(application, device_type, callback) > RemovableDeviceRemoveInterest(application, device_type, callback) This seems to go quite in the opposite direction than what Philip suggests, I'd suggest that we take some time to chat about this at #tracker before we go wild adding API, probably tracker 1.7/1.8 material at least :)
Created attachment 309646 [details] [review] tracker-miner-fs: Keep cache of IndexFile requesters on directories The senders for all DBus requests to the IndexFile method on directories will be now watched, the dbus presence of the senders will control the lifetime of the directory on the indexed directories tree. There may be multiple requests on a same directory, in such case the directory will be indexed/monitored for as long as there is alive requesters on it. Requests on already indexed directories (or children of recursively indexed ones) will be silently ignored. Unmounts will also silently drop the IndexFile listeners, applications should issue new requests on volume mounts if desired. The patch is loosely based on initial work from Felipe Borges.
That's the patch that's going into the impending release, with some better comments, and splitting this into an IndexFileForProcess call in order to preserve backwards compat. It was agreed on #tracker that we should probably do roughly as sam's patch does, but will be better to work further on this as internal API, so it can be made part of 1.6.x, so this bug will be kept open (and also for future planning).
Comment on attachment 309646 [details] [review] tracker-miner-fs: Keep cache of IndexFile requesters on directories Attachment 309646 [details] pushed as 614589d - tracker-miner-fs: Keep cache of IndexFile requesters on directories
Regarding indexing of mount points, I noticed some rough edges while trying to index non-native (ie. non file:///) mount points (bug 792337).
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new enhancement request ticket at https://gitlab.gnome.org/GNOME/tracker/-/issues/ Thank you for your understanding and your help.