Bug 123345 – Handle file/directory moves done within and outside rhythmbox




Summary:	Handle file/directory moves done within and outside rhythmbox


Status:	RESOLVED OBSOLETE

Product:	rhythmbox
Classification:	Other
Component:	general
Version:	unspecified
Hardware:	Other other

Importance:	Normal normal
Target Milestone:	---
Assigned To:	RhythmBox Maintainers
QA Contact:	RhythmBox Maintainers

URL:
Whiteboard:

Duplicates:	327039 347642 408346 566111 587483 612686 614781 (view as bug list)
Depends on:
Blocks:

Reported:	2003-09-26 23:05 UTC by Andy Buckley
Modified:	2018-05-24 10:22 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Andy Buckley 2003-09-27 00:10:47 UTC

Package: rhythmbox
Severity: enhancement
Version: 0.5.3
Synopsis: Request for inbuilt file structure manipulation
Bugzilla-Product: rhythmbox
Bugzilla-Component: General

Description:
Description of Problem:
Currently, restructuring a music collection's file layout or
renaming files (from ID3 tags, for example) makes Rhythmbox lose track of
those files. After reimporting, metadata such as last-played time and rating
is lost. A preferable approach would be a built-in facility for
moving and renaming song files that updates the library XML as required.

Steps to reproduce the problem:
1. Move or rename a song file
2. Delete current entry and reimport song

Actual Results:
Song metadata has been lost and manual reimporting was required

Expected Results:
Moving and renaming (and maybe generating filenames and folder structure
from e.g. ID3 tags) should be possible from within Rhythmbox,
maintaining the song-metadata link.

Thanks!




------- Bug moved to this database by unknown@bugzilla.gnome.org 2003-09-26 20:10 -------

The original reporter (buckley@hep.phy.cam.ac.uk) of this bug does not have an account here.
Reassigning to the exporter, unknown@bugzilla.gnome.org.
Reassigning to the default owner of the component, rhythmbox-maint@bugzilla.gnome.org.

Comment 1 Chris Ness 2004-04-08 20:24:21 UTC

I did some quick tests to see how long an MD5 sum takes to run on a regular ogg file:

[nesscg@woman tmp]$ time md5sum The\ Tragically\ Hip\ -\ Fully\ Completely\ -\
Fifty-Mission\ Cap.ogg; mv The\ Tragically\ Hip\ -\ Fully\ Completely\ -\
Fifty-Mission\ Cap.ogg The\ Tragically\ Hip\ -\ Fully\ Completely\ -\
Fifty-Mission\ Cap-1.ogg; time md5sum The\ Tragically\ Hip\ -\ Fully\
Completely\ -\ Fifty-Mission\ Cap-1.ogg
c94d67262f5466d33a0261fc788a5e69  The Tragically Hip - Fully Completely -
Fifty-Mission Cap.ogg
 
real    0m0.112s
user    0m0.046s
sys     0m0.006s
c94d67262f5466d33a0261fc788a5e69  The Tragically Hip - Fully Completely -
Fifty-Mission Cap-1.ogg
 
real    0m0.056s
user    0m0.039s
sys     0m0.014s

I'm using the first reading, since the file wasn't loaded into memory yet (the
cache-miss cost is missing from the second reading because the file had already
been loaded into RAM by the first call to `md5sum`), and applying it to a
regular-sized music library of 1 GB of files.

[nesscg@woman tmp]$ du -h The\ Tragically\ Hip\ -\ Fully\ Completely\ -\
Fifty-Mission\ Cap-1.ogg
3.4M    The Tragically Hip - Fully Completely - Fifty-Mission Cap-1.ogg

>>> (1000/3.4)*0.112          # could have used 1024, doesn't really matter
32.941176470588239

That's about 33 seconds to add MD5 sums to the XML file, never mind all the other
data that needs to go into it.

This doesn't have to be done at startup of RB but can be done in the background
by a thread that checks to see if files have an empty MD5SUM entry.  

After the initial scan it only needs to be done when new files are found.  Build
a hash for each new file and compare it against the existing hashes.  If it's not
found, it's a new file.  Otherwise, update the entry for that hash to reflect the
file's new state information.
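
The bookkeeping described above could look roughly like the following Python sketch. It is only an illustration: the dictionary-based library and the names `file_md5` and `reconcile` are hypothetical and are not Rhythmbox's actual database code.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a whole file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reconcile(entries_by_hash, unknown_paths, hash_file=file_md5):
    """Match files found at unknown locations against known hashes.

    entries_by_hash: dict mapping an MD5 hex string to a library entry
                     (a plain dict with at least a "location" key; hypothetical).
    unknown_paths:   paths found on disk that no library entry points at.
    Returns the paths that are genuinely new; moved files keep their entry
    (rating, play count, ...) and only get a new location.
    """
    new_paths = []
    for path in unknown_paths:
        entry = entries_by_hash.get(hash_file(path))
        if entry is None:
            new_paths.append(path)      # never seen before: import it
        else:
            entry["location"] = path    # seen before: it was moved or renamed
    return new_paths
```

Only files at unknown locations are hashed here, which matches the point that the work is needed just for newly found files after the initial scan.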

I think this is overkill and will cause imports of large sets of new files to
take forever.

Thoughts?

Comment 2 Christophe Fergeau 2004-04-08 20:50:43 UTC

I think some way to identify music files will be necessary sooner or later, but
it's pretty difficult to get right. A perfect solution would solve this problem,
would help to identify duplicate oggs/mp3s as in bug 133109 (for which md5 on
the audio content won't work) and could also be used to handle ipod/rhythmbox
synchronization. Maybe using a well chosen set of tags would work for that.

Comment 3 Chris Ness 2004-04-09 03:09:33 UTC

I re-thought my previous comment: an MD5 of the whole file wouldn't work if you
changed the tags, because they are embedded in the file and the hash would
therefore change too.

You could run `md5sum` on a subset of the file, that is, the encoded audio data
only.  The Vorbis spec shows that once you get past the 3 header blocks
(IDENTIFICATION, COMMENT, SETUP; the "High Level Decode Process" section of
http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-intro.html is the one you want)
you are left with only the audio data.  That could be hashed and used as a
signature of the track.
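
As a rough illustration of that idea, here is a Python sketch that walks the Ogg pages of a file, skips pages until the three Vorbis header packets are complete, and hashes only the bodies of the remaining pages. It assumes a single, unchained Vorbis stream whose first audio packet starts on a fresh page after the headers; the function names are made up for this example, and nothing here is Rhythmbox code.

```python
import hashlib
import io


def iter_ogg_pages(stream):
    """Yield (lacing_values, body) for each Ogg page in a binary stream."""
    while True:
        header = stream.read(27)            # fixed-size part of the page header
        if len(header) < 27:
            return                          # end of stream
        if header[:4] != b"OggS":
            raise ValueError("lost Ogg page sync")
        segment_count = header[26]
        lacing = stream.read(segment_count) # one length byte per segment
        body = stream.read(sum(lacing))
        yield lacing, body


def vorbis_audio_md5(stream, header_packets=3):
    """MD5 over page bodies that follow the Vorbis header packets.

    Assumption: one logical stream, with audio starting on the page after
    the one in which the last header packet ends.
    """
    digest = hashlib.md5()
    packets_done = 0
    for lacing, body in iter_ogg_pages(stream):
        if packets_done >= header_packets:
            digest.update(body)             # audio page: hash payload only
        else:
            # A packet ends on any lacing value below 255.
            packets_done += sum(1 for value in lacing if value < 255)
    return digest.hexdigest()


# Usage (any file object opened in binary mode works):
# with open("track.ogg", "rb") as handle:
#     print(vorbis_audio_md5(handle))
```

Page headers are deliberately left out of the hash, because page sequence numbers and checksums can shift when the comment header grows or shrinks.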

Not sure how MP3s or the other audio types handle their audio streams and metadata.

This would ONLY solve the problem of changing tags and file system locations.

Concerning Christophe's comment:
================================
I don't think I would want something to "inspect" the decoded audiostream (we're
talking serious CPU usage here) and come back with a fingerprint.  You could in
theory have the same fingerprint for a live version of a song as well as the
studio version.  Highly unlikely but possible.  

It would be nice to find duplicates across formats (mp3, ogg, flac), but
fingerprinting the song itself would likely not be very efficient or reliable.
I just can't see that happening any time soon, since the two codecs
(ogg vs. mp3) throw away different parts of the stream when encoding.

I think there is software that will fingerprint uncompressed music in the wild
though.

Reading bug 133109 it seems to make sense to stick with the tag fields and file
location (for the playlist example) for finding dups.

Comment 4 James "Doc" Livingston 2006-01-15 08:53:24 UTC

*** Bug 327039 has been marked as a duplicate of this bug. ***

Comment 5 Alex Lancaster 2006-07-18 12:33:53 UTC

*** Bug 347642 has been marked as a duplicate of this bug. ***

Comment 6 Alex Lancaster 2006-07-18 12:35:23 UTC

Bug #347642 wants to be able to track file moves done outside rhythmbox (e.g. in nautilus).

Comment 7 Alex Lancaster 2007-02-16 02:30:47 UTC

*** Bug 408346 has been marked as a duplicate of this bug. ***

Comment 8 Alex Lancaster 2007-02-16 02:34:45 UTC

A related bug for the same problem with podcasts: bug #339677.

Comment 9 Rob 2007-03-14 03:43:12 UTC

Is there any update on this?  I was going to submit a bug related to this before I stumbled upon this report.  What if the Rhythmbox database just stored rhythmbox-specific information and called upon it when a file matching the ID3 tags is imported?  If a file is deleted via Rhythmbox, then the associated entries in the rhythmbox database are deleted.  If the file becomes missing, then the database entries are held on to.

The problems I foresee resulting from this implementation are:

Changing a file's tags (e.g. correcting faulty information) would break the match.
Deleting files via external programs could leave a very large number of orphaned entries in Rhythmbox's database.

Possible solutions (although I believe that they may be over-the-top and along the lines of "But the thing is we shouldn't *have* to do that!") are maintenance scripts/applets/programs to empty portions of the database for files deleted and conversion scripts/applets/programs to update the location pointer for files that are moved.

In my opinion, an implementation for this problem should be included in Rhythmbox before it reaches 1.0 because people with days upon days of music shouldn't have to re-rate all of their tracks.

Comment 10 Jonathan Matthew 2007-03-14 09:00:12 UTC

No one is working on this at the moment.

I'm not sure if I'm reading you correctly, but it sounds like you're proposing to somehow use the file's metadata as the primary key in the database rather than the URI.  I don't think this is a good solution, as it swaps the existing problem (can't detect moved files) for a new set of problems and questions (editing tags loses user data, can't have multiple files with the same metadata, and it's not at all clear how radio streams, podcasts, etc. could be handled in this scheme).  This doesn't seem like a good tradeoff to me.

Comment 11 Paul Jakma 2008-01-13 16:59:00 UTC

This bug is a limitation which hits me reasonably regularly, both with regard to duplicates and with respect to metadata-lifetime (ratings particularly). This limitation essentially makes "ratings" useless past the short-term.

The solution mooted, of md5'ing the audio-data subset of a file, would fix my problems.

As a cheap, but less-general, solution it'd be nice to take advantage of the inode number. E.g. I run a script to hard-link duplicate MP3s on a regular basis.
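
To illustrate the inode idea, here is a minimal Python sketch. It assumes a POSIX filesystem and a hypothetical library kept as a dict keyed by `(st_dev, st_ino)`; the helper names are invented for this example.

```python
import os


def file_identity(path):
    """Return (device, inode); unchanged by renames within one filesystem."""
    info = os.stat(path)
    return (info.st_dev, info.st_ino)


def find_moved(entries_by_identity, candidate_paths):
    """Return {old_location: new_location} for files recognised by inode.

    entries_by_identity: dict mapping (device, inode) to a library entry
                         (a plain dict with a "location" key; hypothetical).
    candidate_paths:     paths on disk that the library does not know about.
    """
    moves = {}
    for path in candidate_paths:
        entry = entries_by_identity.get(file_identity(path))
        if entry is not None and entry["location"] != path:
            moves[entry["location"]] = path
            entry["location"] = path    # keep rating etc., update location only
    return moves
```

Hard links to the same file share one identity under this scheme. The identity is lost when a file is copied to another filesystem, or when a tool rewrites it by creating a new file and renaming it over the old one, which is why this is the less general option.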

Comment 12 Jonathan Matthew 2008-12-31 10:30:06 UTC

*** Bug 566111 has been marked as a duplicate of this bug. ***

Comment 13 Jonathan Matthew 2009-07-02 21:25:41 UTC

*** Bug 587483 has been marked as a duplicate of this bug. ***

Comment 14 Jonathan Matthew 2010-03-14 00:59:42 UTC

*** Bug 612686 has been marked as a duplicate of this bug. ***

Comment 15 Paul Jakma 2010-03-14 17:45:22 UTC

Instead of using the MD5 (or other hash) sum directly, another option would be to have some kind of UUID tag. This would have the advantage that the identity of the file would not have to depend on the accompanying content directly. It could of course be an option to generate the UUID from the MD5, when lacking other identifiers.
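
A small Python sketch of that fallback, assuming a hypothetical UUID tag stored as a string and using the standard `hashlib` and `uuid` modules. The function names are illustrative only.

```python
import hashlib
import uuid


def uuid_from_content(audio_bytes):
    """Derive a UUID from the MD5 of the given bytes.

    The 16-byte MD5 digest is reused as the UUID value; version=3 makes the
    uuid module overwrite the version and variant bits accordingly. Note that
    this is not the same value uuid.uuid3() would give, since that function
    hashes a namespace plus a name.
    """
    digest = hashlib.md5(audio_bytes).digest()
    return uuid.UUID(bytes=digest, version=3)


def track_uuid(existing_tag, audio_bytes):
    """Prefer a UUID already stored in the file's tags; otherwise derive one."""
    if existing_tag:
        return uuid.UUID(existing_tag)
    return uuid_from_content(audio_bytes)
```

Once the UUID is written into a tag, the identity no longer depends on the content, so later edits to the file would not change it.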

Comment 16 Jonathan Matthew 2010-04-03 23:57:54 UTC

*** Bug 614781 has been marked as a duplicate of this bug. ***

Comment 17 Jonathan Matthew 2010-04-04 03:08:12 UTC

From "JamesIsIn" in bug 614781:

I see no reason to give Rb the functionality to navigate and change the
folder/file hierarchy in order to solve the matter of missing files.  I have a
file browser for doing that already (Nautilus).  Additionally, md5
fingerprints may have value but seem superfluous to this matter.  Although
the eventual inclusion of md5 information in Rb may facilitate file
re-location (especially automated re-location), it is not necessary to make file
re-location dependent upon md5 information (and doing so may in some
circumstances be detrimental).

All Rb really needs to have is the ability to re-associate path/to/a.flac with
path/is/now/b.flac in its xml files.
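
A sketch of such a re-association in Python. It assumes the library XML consists of `<entry>` elements that each carry a `<location>` child holding the file URI; the sample document below is a made-up stand-in, not a real Rhythmbox database, and `relocate` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def relocate(root, old_uri, new_uri):
    """Point every entry located at old_uri to new_uri; return how many changed."""
    changed = 0
    for entry in root.iter("entry"):
        location = entry.find("location")
        if location is not None and location.text == old_uri:
            location.text = new_uri     # all other fields are left untouched
            changed += 1
    return changed


# Hypothetical sample library with one entry.
SAMPLE = """<rhythmdb>
  <entry type="song">
    <title>Example Track</title>
    <location>file:///path/to/a.flac</location>
    <rating>5</rating>
  </entry>
</rhythmdb>"""

root = ET.fromstring(SAMPLE)
relocate(root, "file:///path/to/a.flac", "file:///path/is/now/b.flac")
print(ET.tostring(root, encoding="unicode"))
```

Because only the location text changes, the rating and other per-track data stay attached to the entry.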

I think the most appropriate place to include that additional functionality is
within the Missing Files area (through a button, a right-click option, and/or a
menu option).

Comment 18 Colan Schwartz 2012-10-24 01:48:21 UTC

What about going with some sort of inotify implementation (https://secure.wikimedia.org/wikipedia/en/wiki/Inotify)?  The application would be notified whenever there are changes to the watched area(s).
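
For reference, inotify reports a rename as an IN_MOVED_FROM event followed by an IN_MOVED_TO event that share the same cookie. The Python sketch below shows only the pairing step, on events that have already been read. The `Event` tuple is a hypothetical stand-in for whatever inotify binding is used, and it carries full paths for simplicity, whereas inotify itself reports names relative to the watched directory.

```python
from collections import namedtuple

# Mask bits as defined in <sys/inotify.h>.
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080

# Hypothetical, simplified event record.
Event = namedtuple("Event", "mask cookie path")


def pair_moves(events):
    """Pair MOVED_FROM/MOVED_TO events by cookie.

    Returns (moves, vanished): moves is a list of (old_path, new_path),
    vanished lists sources with no matching destination, i.e. files moved
    out of the watched area.
    """
    pending = {}
    moves = []
    for event in events:
        if event.mask & IN_MOVED_FROM:
            pending[event.cookie] = event.path
        elif event.mask & IN_MOVED_TO and event.cookie in pending:
            moves.append((pending.pop(event.cookie), event.path))
    return moves, list(pending.values())
```

This only helps while the application is running and watching; moves made while it is closed would still need one of the other approaches discussed here.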

Comment 19 GNOME Infrastructure Team 2018-05-24 10:22:46 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/rhythmbox/issues/13.