Bug 726264 - better handling of symlinks / git-annex support
Status: RESOLVED OBSOLETE
Product: tracker
Classification: Core
Component: Miners
Version: 0.16.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: tracker-general
QA Contact: tracker-general
Duplicates: 722451
Depends on:
Blocks:
 
 
Reported: 2014-03-13 17:49 UTC by Christophe Rhodes
Modified: 2021-05-26 22:26 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
tracker-miner-fs: Follow symlinks when querying file info (1.18 KB, patch)
2017-11-03 15:31 UTC, Carlos Garnacho
tracker-miner-fs: Refine ignore-directories-with-content to allow git dirs (1.49 KB, patch)
2017-11-03 15:32 UTC, Carlos Garnacho

Description Christophe Rhodes 2014-03-13 17:49:05 UTC
I manage my Music collection in ~/Music using git-annex, which keeps the file content under ~/Music/.git/annex/ and maintains a symlink farm within regular directories under ~/Music/

tracker correctly indexes all the symlinks, but refuses to follow them; I think it also refuses to recurse down the (hidden) ~/Music/.git/ directory.  This means that gnome-music, for example, thinks that I have no music available.

I think it would be nice if tracker (perhaps only when recursing down directories) were to follow symlinks to files, and add the link targets to the set of things to be indexed.  (I think there would also be an argument to handle links to directories similarly, but for me that is lower priority).

The workaround of telling tracker to index the ~/Music/.git/ path itself has its own problems: git-annex keeps older versions of content around, so that it can roll the checkout status back to earlier points in its history.  That means that operations like retagging don't actually delete the music content; they add a new file in the annex and repoint the symlink, so indexing ~/Music/.git directly makes multiple music files with the same content but marginally different metadata visible.

Thanks,

Christophe
Comment 1 Aleksander Morgado 2014-03-14 14:52:18 UTC
Another workaround is to use 'git annex direct' for that repo checkout, right?
Comment 2 Martyn Russell 2014-03-14 19:32:26 UTC
(In reply to comment #0)
> I manage my Music collection in ~/Music using git-annex, which keeps the file
> content under ~/Music/.git/annex/ and maintains a symlink farm within regular
> directories under ~/Music/
> 
> tracker correctly indexes all the symlinks, but refuses to follow them; I think
> it also refuses to recurse down the (hidden) ~/Music/.git/ directory.  This
> means that gnome-music, for example, thinks that I have no music available.
> 
> I think it would be nice if tracker (perhaps only when recursing down
> directories) were to follow symlinks to files, and add the link targets to the
> set of things to be indexed.  (I think there would also be an argument to
> handle links to directories similarly, but for me that is lower priority).
> 
> The workaround of telling tracker to index the ~/Music/.git/ path itself has
> its own problems: git-annex keeps older versions of content around, so that it
> can roll the checkout status back to earlier points in its history.  That means
> that operations like retagging don't actually delete the music content, but
> adds a new file in the annex and repoints the symlink, so indexing ~/Music/.git
> directly makes multiple music files with the same content but marginally
> different metadata visible.

So the problem here is that we use G_FILE_QUERY_INFO_NOFOLLOW_SYMLINKS quite a lot with the GIO APIs. I don't remember why, possibly to avoid recursive issues. It's been in there a while. I suspect this branch would fix your problems:

  https://git.gnome.org/browse/tracker/log/?h=follow-symlinks

I knocked it up quickly. Sadly, to make it into master, it would have to be a preference or optional somehow because this is the sort of change that can have quite nasty effects for people.
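
For readers unfamiliar with the flag, here is a minimal sketch of the distinction it controls. This is not Tracker code and does not use GIO; it is a Python analogue in which `os.stat(..., follow_symlinks=False)` plays the role of G_FILE_QUERY_INFO_NOFOLLOW_SYMLINKS. The helper names and the object file name are made up for illustration.

```python
import os
import stat
import tempfile


def describe(path: str, follow_symlinks: bool) -> str:
    """Classify a path the way a crawler might, with or without following links."""
    # follow_symlinks=False inspects the link itself rather than its target,
    # which is the same idea as querying with NOFOLLOW_SYMLINKS.
    info = os.stat(path, follow_symlinks=follow_symlinks)
    if stat.S_ISLNK(info.st_mode):
        return "symlink"
    if stat.S_ISDIR(info.st_mode):
        return "directory"
    if stat.S_ISREG(info.st_mode):
        return "regular"
    return "other"


def make_annex_like_tree(root: str) -> str:
    """Create a tiny git-annex-style layout: content under .git/annex, a link beside it."""
    store = os.path.join(root, ".git", "annex", "objects")
    os.makedirs(store)
    target = os.path.join(store, "content-0001")  # hypothetical object name
    with open(target, "wb") as handle:
        handle.write(b"not really audio")
    link = os.path.join(root, "song.mp3")
    os.symlink(target, link)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        link = make_annex_like_tree(root)
        print(describe(link, follow_symlinks=False))
        print(describe(link, follow_symlinks=True))
```

Without following, the music file is reported as a symlink, so an indexer looking for regular audio files has nothing to extract from it. Following the link reports a regular file.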

(In reply to comment #1)
> Another workaround is to use 'git annex direct' for that repo checkout, right?

Sadly, using direct mode is not suggested and pretty much goes against the point of git-annex.
Comment 3 Aleksander Morgado 2014-03-14 20:50:10 UTC
> (In reply to comment #1)
> > Another workaround is to use 'git annex direct' for that repo checkout, right?
> 
> Sadly, using direct mode is not suggested and pretty much goes against the
> point of git-annex.

I don't think that's true. Isn't it just a different workflow, in which the files may be updated, and therefore you control when you want the updates to get committed?
Comment 4 Christophe Rhodes 2014-03-15 16:06:14 UTC
It's not just a "different" work flow: there is more potential for data loss with direct-mode.  If it comes to it, I'll do it (I have enough redundant copies elsewhere) but the place where I have this issue is also the place where I am likely to be doing substantial retagging, so it would make me nervous.
Comment 5 Martyn Russell 2014-03-17 09:41:17 UTC
(In reply to comment #3)
> > (In reply to comment #1)
> > > Another workaround is to use 'git annex direct' for that repo checkout, right?
> > 
> > Sadly, using direct mode is not suggested and pretty much goes against the
> > point of git-annex.
> 
> I don't think that's true. Isn't it just a different workflow, in which the
> files may be updated, and therefore you control when you want the updates to
> get committed?

From their page: 

http://git-annex.branchable.com/direct_mode/

"""
This is for experts only. You can lose data doing this, or check enormous files directly into your git repository, and it's your fault if you do! Also, there should be no good reason to need to do this, ever.
"""

I did a lot of reading and playing when I was looking into git-annex and I always wanted to use direct mode, because I don't like the idea of symlinks so much (messes up the shell colours :) - anyway...

(In reply to comment #4)
> It's not just a "different" work flow: there is more potential for data loss
> with direct-mode.  If it comes to it, I'll do it (I have enough redundant
> copies elsewhere) but the place where I have this issue is also the place where
> I am likely to be doing substantial retagging, so it would make me nervous.

I've put it up for review on the mailing list here:

  https://mail.gnome.org/archives/tracker-list/2014-March/msg00028.html

to gauge interest in a full blown solution here.
Comment 6 Hashem Nasarat 2014-04-25 04:14:00 UTC
Perhaps this is a different issue, but my ~/Music is a symlink to /media/large-hdd/Music, and unless I explicitly add /media/large-hdd/Music to tracker, it doesn't index ~/Music.
Comment 7 Martyn Russell 2014-04-25 08:25:58 UTC
(In reply to comment #6)
> Perhaps this is a different issue, but my ~/Music is a symlink to
> /media/large-hdd/Music, and unless I explicitly add /media/large-hdd/Music to
> tracker, it doesn't index ~/Music.

That's right, we don't follow symlinks. The branch above should resolve that.
Comment 8 o589926 2015-02-10 12:30:28 UTC
Any news on this?

In my case I have my music in ~/annex/music and XDG_MUSIC_DIR is set accordingly. I would really like to listen to music with Gnome Music or Audacious, but both of them rely on tracker to find the music. The result is that none of my music is found.
Comment 9 Martyn Russell 2015-02-13 23:56:38 UTC
(In reply to markus from comment #8)
> Any news on this?
> 
> In my case I have my music in ~/annex/music and XDG_MUSIC_DIR is set
> accordingly. I would really like to listen to music with Gnome Music or
> Audacious, but both of them rely on tracker to find the music. The result is
> that none of my music is found.

I created a branch to handle this, but it didn't get merged because there were some deeper concerns by the community about the code base's readiness for it.

I did add to the original branch too - so it is more complete than it was when I sent out the review email

Carlos had the initial concern. CCing here to comment.
Comment 10 Uwe Geuder 2015-02-23 21:49:23 UTC
As additional information: (just in case somebody is as confused as I just was by tracker's behaviour)

I can confirm that tracker (I'm on version 0.17) in most cases does not follow symbolic links. However, if I create a symbolic link to a directory while tracker is already running it will follow the link and index the files. When I reboot the system the files get removed from the index and will never appear again.

From that behaviour I would guess that when inotify (or whatever mechanism is used inside tracker) reports a newly created element, there is no check whether the new element is a symbolic link. In contrast, when the system is (re-)started and a static directory tree is being scanned, that processing will stop at each symbolic link. (Guessing means I have not read the code, you guessed it ;)

If the intention of the original code was not to follow symbolic links, it is buggy.

Myself I would have liked that it had always followed symbolic links. I have not decided yet whether I try Martyn's patch or whether I try to find a solution without symbolic links within my files to be indexed. Will sleep over it first, too late anyway...
Comment 11 Uwe Geuder 2015-02-24 19:02:28 UTC
After sleeping about it I solved my problem by replacing the symbolic link by a bind mount. Seems to work.

Again, this is just for the information of those who are hitting this "feature". Bind mounting is not a universally useful solution to the problem, but maybe it suits someone else, too.
Comment 12 van.de.bugger 2015-03-04 21:45:51 UTC
There are more users who have troubles with gnome-music and symlinks, see #722451.
Comment 13 Vadim Rutkovsky 2015-03-05 08:51:45 UTC
*** Bug 722451 has been marked as a duplicate of this bug. ***
Comment 14 Carlos Garnacho 2015-03-05 12:32:36 UTC
(In reply to Martyn Russell from comment #9)
> (In reply to markus from comment #8)
> > Any news on this?
> > 
> > In my case I have my music in ~/annex/music and XDG_MUSIC_DIR is set
> > accordingly. I would really like to listen to music with Gnome Music or
> > Audacious, but both of them rely on tracker to find the music. The result is
> > that none of my music is found.
> 
> I created a branch to handle this, but it didn't get merged because there
> were some deeper concerns by the community about the code base's readiness
> for it.
> 
> I did add to the original branch too - so it is more complete than it was
> when I sent out the review email
> 
> Carlos had the initial concern. CCing here to comment.

My concern with the initial work is that IMO it isn't enough to just follow symlinks, see this example treedir:

/mnt
  a.mp3
  link -> $HOME/folder/

$HOME/folder/ (indexed recursively)
  link1 -> /mnt
  subfolder/
    link1 -> .
    link2 -> $HOME/

There's several pitfalls there:
- There's several closed cycles there (~/folder/link1 with /mnt/link, folder/subfolder/link1 alone, folder/subfolder/link2)
- folder/subfolder/link2 would make $HOME indexed recursively, regardless of settings

Such behavior would make tracker keep recursing until some limit is hit; I'm uncertain whether that would first be path lengths, the database size limits we impose (and I'm not sure tracker-miner-fs would stop at that), or user patience.

It'd be also quite wasteful if we indexed (and extracted info from) ~/folder/link1/a.mp3, ~/folder/link1/link/link1/a.mp3... We'd keep duplicated information in the database, with the corresponding maintenance burden (just imagine the database updates after renaming ~/folder/link1 to ~/folder/link2)
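
The pitfalls above can be reproduced with a small sketch. It rebuilds the example tree in a temporary directory and crawls it while following every symlink blindly. This is an illustration in Python, not tracker-miner-fs code; `build_example_tree`, `naive_crawl` and the `max_depth` cut-off are hypothetical names, and the depth limit exists only so the demo terminates.

```python
import os
import tempfile
from typing import List


def build_example_tree(root: str) -> str:
    """Recreate the example layout under `root` and return the indexed folder."""
    mnt = os.path.join(root, "mnt")
    home = os.path.join(root, "home")
    folder = os.path.join(home, "folder")
    subfolder = os.path.join(folder, "subfolder")
    os.makedirs(mnt)
    os.makedirs(subfolder)
    with open(os.path.join(mnt, "a.mp3"), "wb") as handle:
        handle.write(b"")
    os.symlink(folder, os.path.join(mnt, "link"))       # /mnt/link -> $HOME/folder
    os.symlink(mnt, os.path.join(folder, "link1"))      # folder/link1 -> /mnt
    os.symlink(".", os.path.join(subfolder, "link1"))   # subfolder/link1 -> .
    os.symlink(home, os.path.join(subfolder, "link2"))  # subfolder/link2 -> $HOME
    return folder


def naive_crawl(top: str, max_depth: int) -> List[str]:
    """Follow every symlink blindly, stopping only at an artificial depth limit."""
    found: List[str] = []

    def visit(directory: str, depth: int) -> None:
        if depth > max_depth:
            return
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isdir(path):        # isdir() follows symlinks
                visit(path, depth + 1)
            elif os.path.isfile(path):
                found.append(path)

    visit(top, 0)
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        folder = build_example_tree(root)
        for depth in (1, 3, 5):
            hits = naive_crawl(folder, depth)
            unique = {os.path.realpath(p) for p in hits}
            print(depth, len(hits), len(unique))
```

Every path found resolves to the same single a.mp3, and the number of distinct paths leading to it keeps growing as the depth limit is raised, which is the duplication described above.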

What IMO should happen here is:

- We should store only *real* paths/uris, no symlinks in between, that'd be file:///mnt/a.mp3 for the previous example
- We should implement proper cycle detection so we don't use any more resources than necessary, this should fall in place if we do the above.
- We could provide some tracker:realuri() sparql function that helps getting rid of symlinks for query purposes (so eg. tracker info ~/folder/link1/a.mp3 works as expected)
- We should store the target uri for symlinks, preparing tracker-miner-fs for optionally indexing/removing subtrees as symlinks are made is also a plus, looks tricky though as you need some kind of ref counting.
- This would still require reworking of queries if you expect that querying for contents of ~/folder returns a.mp3, there's no other option than checking contained symlinks and translating paths/uris, maybe a sparql function could be convenient for the latter
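
The first two points can be sketched as follows, under stated assumptions. This is a Python illustration, not a proposal for the actual tracker-miner-fs implementation: `crawl_real_paths` stores only resolved file URIs and visits each real directory once, and `real_uri` is a hypothetical stand-in for the tracker:realuri() idea, which is only suggested here and not an existing function.

```python
import os
import tempfile
from pathlib import Path
from typing import Set


def real_uri(path: str) -> str:
    """Resolve every symlink in `path` and return a file:// URI.

    Hypothetical stand-in for the tracker:realuri() idea from the comment.
    """
    return Path(os.path.realpath(path)).as_uri()


def crawl_real_paths(top: str) -> Set[str]:
    """Index each file once, keyed by its real URI; visit each real directory once."""
    seen_dirs: Set[str] = set()
    indexed: Set[str] = set()
    pending = [os.path.realpath(top)]
    while pending:
        directory = pending.pop()
        if directory in seen_dirs:
            continue  # already visited under its real path: this breaks cycles
        seen_dirs.add(directory)
        with os.scandir(directory) as entries:
            for entry in entries:
                resolved = os.path.realpath(entry.path)
                if os.path.isdir(resolved):
                    pending.append(resolved)
                elif os.path.isfile(resolved):
                    indexed.add(Path(resolved).as_uri())
                # dangling links are neither and are skipped
    return indexed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        mnt = os.path.join(root, "mnt")
        folder = os.path.join(root, "home", "folder")
        os.makedirs(mnt)
        os.makedirs(folder)
        Path(mnt, "a.mp3").touch()
        os.symlink(folder, os.path.join(mnt, "link"))
        os.symlink(mnt, os.path.join(folder, "link1"))
        print(crawl_real_paths(folder))
        print(real_uri(os.path.join(folder, "link1", "a.mp3")))
```

Keying everything on the resolved path makes cycle detection fall out of the bookkeeping, as the comment predicts. It does nothing about the other pitfall: a link to $HOME would still pull all of $HOME into the index regardless of settings.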
Comment 15 kaptoxic 2016-04-07 03:47:13 UTC
I believe this would be a very useful feature.

I am probably missing something, but it seems a cycle detection would suffice (e.g. by tracking the path of visited nodes) to handle the problem of cycles. On the other hand, indexing two files multiple times (via symlinks) is probably fine as that seems to be aligned with the intended behaviour.
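
A minimal sketch of what this suggestion could look like, assuming "tracking the path of visited nodes" means remembering the directories on the current descent. It is written in Python for illustration and is not taken from Tracker; `crawl_with_ancestor_check` is a made-up name. Directories are identified by their device and inode numbers, so a link back into any ancestor is cut, while the same file reached through two different links is still reported twice.

```python
import os
import tempfile
from typing import List, Tuple

Identity = Tuple[int, int]  # (st_dev, st_ino) of a real directory


def crawl_with_ancestor_check(top: str) -> List[str]:
    """Follow directory symlinks, but never re-enter a directory on the current path."""
    found: List[str] = []

    def visit(directory: str, ancestors: Tuple[Identity, ...]) -> None:
        info = os.stat(directory)               # follows symlinks
        identity = (info.st_dev, info.st_ino)   # same pair means same real directory
        if identity in ancestors:
            return                              # cycle: we are already inside this directory
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isdir(path):
                visit(path, ancestors + (identity,))
            elif os.path.isfile(path):
                found.append(path)

    visit(top, ())
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        music = os.path.join(top, "music")
        os.makedirs(music)
        with open(os.path.join(music, "song.mp3"), "wb") as handle:
            handle.write(b"")
        os.symlink(music, os.path.join(top, "alias"))  # second route to the same files
        os.symlink(top, os.path.join(music, "loop"))   # link back to an ancestor
        for path in crawl_with_ancestor_check(top):
            print(os.path.relpath(path, top))
```

The crawl terminates, but the index ends up with one entry per route to a file, which is the representation question raised in comment 14.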
Comment 16 Jason Hunter 2017-11-03 12:45:22 UTC
..and so nothing ever happens here, which makes this useless for me. 

Do we have any other indexers which actually work?
Comment 17 Carlos Garnacho 2017-11-03 15:29:45 UTC
Strange place to ask that ;). The project is nowadays mostly developed during free time, we simply don't have the attention span to attend all feature requests...

It seems easy enough to just follow symlinks of regular files, and pretend the symlink is the real thing. AFAICS that should work for the git annex case. I'm attaching 2 patches that should let tracker-miner-fs index those.

However, indexing of recursive hierarchies of symlinks is an entirely different yak to shave. It is not only a problem of cycle detection as comment #15 points out, but also of representation in stored metadata. Tracker wants to index a certain file exactly once, and ideally queries for paths with any symlink in between should resolve to the file as indexed.
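
The limited behaviour described in the second paragraph (follow links to regular files, never recurse through links to directories) can be sketched like this. It is a Python illustration of the idea, not the attached patches; `crawl_files_only` is a hypothetical name, and skipping hidden directories is an assumption made here so the git-annex object store under .git is not crawled directly.

```python
import os
import tempfile
from typing import List


def crawl_files_only(top: str) -> List[str]:
    """Index regular files and links to regular files; ignore links to directories."""
    results: List[str] = []
    stack = [top]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in sorted(entries, key=lambda item: item.name):
                if entry.is_dir(follow_symlinks=False):
                    # A real directory: recurse, unless it is hidden (assumption).
                    if not entry.name.startswith("."):
                        stack.append(entry.path)
                elif entry.is_file(follow_symlinks=True):
                    # A regular file, or a symlink whose target is one.
                    results.append(entry.path)
                # Symlinked directories and dangling links are ignored.
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = os.path.join(root, ".git", "annex")
        album = os.path.join(root, "album")
        os.makedirs(store)
        os.makedirs(album)
        target = os.path.join(store, "content-0001")  # hypothetical object name
        with open(target, "wb") as handle:
            handle.write(b"")
        os.symlink(target, os.path.join(album, "song.mp3"))
        os.symlink(album, os.path.join(root, "shortcut"))        # directory link, skipped
        os.symlink(os.path.join(root, "missing"), os.path.join(root, "broken"))
        print(crawl_files_only(root))
```

Because no directory link is ever entered, cycles cannot occur, and each annexed file shows up once under the name of its link.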
Comment 18 Carlos Garnacho 2017-11-03 15:31:57 UTC
Created attachment 362910
tracker-miner-fs: Follow symlinks when querying file info

This makes tracker-miner-fs kind of work with directories managed by
git annex. This change is only in regards to the stored metadata (eg.
files are seen as "regular" instead of symlinks), but directories
shall not be recursed.
Comment 19 Carlos Garnacho 2017-11-03 15:32:03 UTC
Created attachment 362911
tracker-miner-fs: Refine ignore-directories-with-content to allow git dirs

The intent of adding '.git' here was ignoring code repositories, which are
still pointless to index. In order to make Tracker friendlier to directories
managed by git annex, change this setting so we look for files commonly
found in the root of source code repos, other than '.git'.
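
To illustrate the effect of such a setting, here is a rough sketch in Python. It is not the patch itself: `should_ignore_directory` is a made-up helper, and the marker file names are hypothetical examples, since the list actually chosen by the patch is not given in this report.

```python
import os
import tempfile
from typing import Iterable

# Hypothetical marker files; the real list used by the patch is not stated here.
SOURCE_REPO_MARKERS = ("configure.ac", "Makefile", "CMakeLists.txt")


def should_ignore_directory(directory: str,
                            markers: Iterable[str] = SOURCE_REPO_MARKERS) -> bool:
    """Ignore a directory when it directly contains any of the marker entries."""
    return any(os.path.lexists(os.path.join(directory, name)) for name in markers)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        music = os.path.join(root, "Music")
        project = os.path.join(root, "project")
        for directory in (music, project):
            os.makedirs(os.path.join(directory, ".git"))
        with open(os.path.join(project, "Makefile"), "w") as handle:
            handle.write("all:\n")
        print(should_ignore_directory(music))
        print(should_ignore_directory(project))
        print(should_ignore_directory(music, markers=(".git",)))
```

With '.git' as the marker, a git-annex music directory is skipped like any code checkout. With markers that only appear in source trees, the music directory is crawled while the code repository is still ignored.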
Comment 20 Sam Thursfield 2021-05-26 22:26:31 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new enhancement request ticket at
  https://gitlab.gnome.org/GNOME/tracker/-/issues/

Thank you for your understanding and your help.