GNOME Bugzilla – Bug 772570
[MinerFs][FileNotifier][FileSystem] FileSystem's Node Tree goes out of sync if a directory inside an index root is deleted during first crawl
Last modified: 2021-05-26 22:23:41 UTC
This is a bit of a corner case. It only happens during the first crawl/indexing done by the MinerFs.

If a folder with many subfolders and files inside one of the indexed locations is deleted during the first crawl, the TrackerFileSystem node tree may go out of sync. After crawling has finished, many of the deleted folders and some files may still be referenced there.

This happens mainly because during the first crawl been_crawled in TrackerMinerFs is set to FALSE, so on file/folder deletions the created/updated item queues are not checked for children of the deleted folders. The deletion is queued, and tracker_file_system_forget_files is called correctly, so the nodes in the tree corresponding to the deleted folder and its contents are marked to be removed once the last file reference is dropped.

However, the reenqueue mechanism doesn't play nicely in this particular case. All queued files corresponding to the deleted folder and its contents will most likely encounter an error during the calls to item_add_or_update, so they won't be inserted. However, when the contents of the folders are picked up, we'll perform a lookup_file_urn on their parents (which weren't inserted). Doing so will re-insert them as nodes in the FileSystem if they had been removed. Because they were never inserted, we'll get a "Parent not indexed yet" error, so we'll reenqueue both of them. This goes on until REENTRY_MAX is reached for each file. When this happens, there is a reference that is never dropped, so the file effectively leaks and gets stuck in TrackerFileSystem's node tree.

So, as a summary: in tracker-miner-fs.c, the reenqueue mechanism currently has two problems:

--Leaking GFiles when REENTRY_MAX is reached for a file: we call g_object_ref on each file before item_reenqueue is called, but no unref is called when it is decided not to reenqueue them. Because of this they get stuck in the FileSystem's node tree.
Here: https://git.gnome.org/browse/tracker/tree/src/libtracker-miner/tracker-miner-fs.c#n2604

--During the first crawl, if a folder is deleted, the created/updated item queues are not cleaned of any queued items that correspond to the contents of this directory (because been_crawled is still FALSE). Due to this, the reenqueue mechanism will force a retry of each file's parent, reinserting it into the FileSystem's node tree even if it was already removed. This out-of-sync node tree can potentially prevent all contents from being correctly inserted if the folder that was deleted is reinserted.
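To make the two problems above more concrete, here is a minimal sketch of one possible way to address them. It is not the actual tracker-miner-fs.c code: it uses plain C stand-ins instead of GFile/GObject and the miner's real queues, and every name in it (File, file_ref, file_unref, retry_or_drop, is_inside, purge_children) as well as the REENTRY_MAX value is hypothetical. It also assumes directory paths are given without a trailing slash.

```c
/* Sketch only: plain C stand-ins for GFile refcounting and the miner queue.
 * All names and the REENTRY_MAX value are hypothetical, not Tracker's. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REENTRY_MAX 2 /* assumed value, for illustration only */

typedef struct {
    const char *path;
    int refs;      /* stand-in for the GObject reference count */
    int reentries; /* how many times this item has been reenqueued */
} File;

static void file_ref(File *f)   { f->refs++; }
static void file_unref(File *f) { assert(f->refs > 0); f->refs--; }

/* Problem 1: a reference is taken before the retry decision is made.
 * Returns 1 if the item is reenqueued (the queue then owns that reference),
 * 0 if REENTRY_MAX was reached, in which case the reference is released
 * here instead of being leaked. */
static int retry_or_drop(File *f)
{
    file_ref(f);
    if (f->reentries < REENTRY_MAX) {
        f->reentries++;
        return 1; /* reference handed over to the queue */
    }
    file_unref(f); /* the release the report describes as missing */
    return 0;
}

/* True if `path` is `dir` itself or lies underneath it. */
static int is_inside(const char *dir, const char *path)
{
    size_t n = strlen(dir);
    return strncmp(dir, path, n) == 0 && (path[n] == '\0' || path[n] == '/');
}

/* Problem 2: drop every queued item under a deleted directory, whether or
 * not the first crawl has finished. Compacts the array in place, releases
 * the queue's reference on each dropped item, and returns the new length. */
static size_t purge_children(File **queue, size_t len, const char *deleted_dir)
{
    size_t kept = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_inside(deleted_dir, queue[i]->path))
            file_unref(queue[i]);
        else
            queue[kept++] = queue[i];
    }
    return kept;
}
```

In this sketch, a caller that gives up on an item after REENTRY_MAX retries ends with the same reference count it started with, and purging on deletion removes the stale children before they can trigger a lookup on their already-forgotten parent. Whether the real fix should unref at the give-up point or avoid taking the reference in the first place is a design choice for the maintainers.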
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new enhancement request ticket at https://gitlab.gnome.org/GNOME/tracker/-/issues/ Thank you for your understanding and your help.