GNOME Bugzilla – Bug 123345
Handle file/directory moves done within and outside rhythmbox
Last modified: 2018-05-24 10:22:46 UTC
Package: rhythmbox
Severity: enhancement
Version: 0.5.3
Synopsis: Request for inbuilt file structure manipulation
Bugzilla-Product: rhythmbox
Bugzilla-Component: General

Description:
Description of Problem: Currently, "restructuring" a music collection's file structure or renaming files (from ID3 tags, for example) makes Rhythmbox lose track of those files. After reimporting, any metadata such as last-played time or rating is lost. A preferable method would be to have a built-in facility for moving and renaming song files, updating the library XML as required.

Steps to reproduce the problem:
1. Move or rename a song file
2. Delete current entry and reimport song

Actual Results: Song metadata has been lost and manual reimporting was required.

Expected Results: Moving and renaming (and maybe generating filenames and folder structure from e.g. ID3 tags) should be possible from within Rhythmbox, maintaining the song-metadata link.

Thanks!

------- Bug moved to this database by unknown@bugzilla.gnome.org 2003-09-26 20:10 -------

The original reporter (buckley@hep.phy.cam.ac.uk) of this bug does not have an account here. Reassigning to the exporter, unknown@bugzilla.gnome.org. Reassigning to the default owner of the component, rhythmbox-maint@bugzilla.gnome.org.
I did some quick tests to see how long an MD5 sum takes to run on a regular ogg file:

    [nesscg@woman tmp]$ time md5sum The\ Tragically\ Hip\ -\ Fully\ Completely\ -\ Fifty-Mission\ Cap.ogg; mv The\ Tragically\ Hip\ -\ Fully\ Completely\ -\ Fifty-Mission\ Cap.ogg The\ Tragically\ Hip\ -\ Fully\ Completely\ -\ Fifty-Mission\ Cap-1.ogg; time md5sum The\ Tragically\ Hip\ -\ Fully\ Completely\ -\ Fifty-Mission\ Cap-1.ogg
    c94d67262f5466d33a0261fc788a5e69 The Tragically Hip - Fully Completely - Fifty-Mission Cap.ogg
    real 0m0.112s
    user 0m0.046s
    sys 0m0.006s
    c94d67262f5466d33a0261fc788a5e69 The Tragically Hip - Fully Completely - Fifty-Mission Cap-1.ogg
    real 0m0.056s
    user 0m0.039s
    sys 0m0.014s

I'm taking the first reading, since the file wasn't loaded into memory yet (the cache miss cost is not in the second reading, as the file had already been loaded into RAM by the first call to `md5sum`), and applying that to a regular sized music library of 1 Gig of files:

    [nesscg@woman tmp]$ du -h The\ Tragically\ Hip\ -\ Fully\ Completely\ -\ Fifty-Mission\ Cap-1.ogg
    3.4M The Tragically Hip - Fully Completely - Fifty-Mission Cap-1.ogg

    >>> (1000/3.4)*0.112 // could have used 1024, doesn't really matter
    32.941176470588239

That's roughly 33 seconds to add MD5 sums into the XML file, never mind all the other data that needs to go into it. This doesn't have to be done at startup of RB but can be done in the background by a thread that checks to see if files have an empty MD5SUM entry. After the initial scan it only needs to be done when new files are found: build a hash for each new file and compare it against the old hashes. If it's not found, it's a new file. Otherwise, modify the entry for that hash to reflect the new state information for the file.

I think this is overkill and will cause imports of large sets of new files to take forever. Thoughts?
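The compare-against-old-hashes step described above could look roughly like the sketch below. It is only an illustration, not Rhythmbox code: the `library` dictionary layout (MD5 digest mapped to a dict with a `path` key plus user data) and the function names `md5_of_file` and `reconcile` are made up for this example, and only files at paths the library does not already know are hashed.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reconcile(library, root):
    """Match files under root against library entries by MD5.

    library is a hypothetical in-memory stand-in for the XML database:
    it maps an MD5 hex digest to a dict holding "path" and user data.
    Returns two lists of paths: (moved, added).
    """
    known_paths = {entry["path"] for entry in library.values()}
    moved, added = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if path in known_paths:
                continue  # already tracked at this location, skip hashing
            digest = md5_of_file(path)
            if digest in library:
                # Same content at a new location: keep ratings etc.
                library[digest]["path"] = path
                moved.append(path)
            else:
                library[digest] = {"path": path}
                added.append(path)
    return moved, added
```

A moved file keeps its existing entry (and so its rating), while an unknown hash creates a fresh entry. As a later comment points out, hashing the whole file breaks as soon as the embedded tags are edited.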
I think some way to identify music files will be necessary sooner or later, but it's pretty difficult to get right. A perfect solution would solve this problem, would help to identify duplicate oggs/mp3s as in bug 133109 (for which md5 on the audio content won't work) and could also be used to handle ipod/rhythmbox synchronization. Maybe using a well chosen set of tags would work for that.
Just re-thought my previous comment: MD5 wouldn't work if you changed the tags, because they are embedded in the file and therefore the hash would change too. You could run `md5sum` on a subset of the file, that is, the encoded music bits only. The Vorbis specs show that once you get past the 3 header blocks (IDENTIFICATION, COMMENT, SETUP; see http://www.xiph.org/ogg/vorbis/doc/vorbis-spec-intro.html, "High Level Decode Process" is the section you want) you are left with only the audio data. That could be made into a hash and therefore a signature of the track. Not sure how MP3s or the other audio types handle their audio streams and metadata. This would solve the problem of roving tags and file system location names ONLY.

Concerning Christophe's comment:
================================
I don't think I would want something to "inspect" the decoded audio stream (we're talking serious CPU usage here) and come back with a fingerprint. You could in theory have the same fingerprint for a live version of a song as well as the studio version. Highly unlikely, but possible. It would be nice to cross between media (mp3, ogg, flac) to find duplicates, but trying to fingerprint the song itself would likely not be very efficient or reliable. I just can't see that happening any time soon, since both algorithms (ogg vs. mp3) throw away different parts of the stream when encoding. I think there is software in the wild that will fingerprint uncompressed music, though. Reading bug 133109, it seems to make sense to stick with the tag fields and file location (for the playlist example) for finding dups.
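To make the "hash only the audio bits" idea concrete, here is a minimal sketch. Note that it swaps in MP3 rather than Vorbis, because the ID3 tag layout is simple to skip: an ID3v2 tag is a 10-byte header at the start of the file with a 28-bit "syncsafe" size, and an ID3v1 tag is the last 128 bytes when they start with `TAG`. Skipping the three Vorbis header packets would need an Ogg page parser and is not shown. The function names are made up for the example.

```python
import hashlib


def audio_payload(data):
    """Return data with a leading ID3v2 tag and a trailing ID3v1 tag removed."""
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        # The tag size is stored as four bytes carrying 7 bits each.
        size = (data[6] << 21) | (data[7] << 14) | (data[8] << 7) | data[9]
        start = 10 + size
        if data[5] & 0x10:  # footer-present flag: 10 more bytes
            start += 10
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return data[start:end]


def audio_md5(data):
    """MD5 over the audio payload only, so that retagging keeps the hash."""
    return hashlib.md5(audio_payload(data)).hexdigest()
```

With this, two copies of a track that differ only in their ID3 tags give the same `audio_md5`, while a plain `md5sum` of the files would differ. It only addresses changed tags and changed locations, as noted above, not re-encoded duplicates.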
*** Bug 327039 has been marked as a duplicate of this bug. ***
*** Bug 347642 has been marked as a duplicate of this bug. ***
Bug #347642 wants to be able to track file moves done outside rhythmbox (e.g. in nautilus).
*** Bug 408346 has been marked as a duplicate of this bug. ***
A related bug for the same problem with podcasts: bug #339677.
Is there any update for this? I was going to submit a bug related to this before I stumbled upon this filing. What if the Rhythmbox database just stored Rhythmbox-specific information and called upon it when a file matching the ID3 tags is imported? If a file is deleted via Rhythmbox, then the associated entries in the Rhythmbox database are deleted. If the file goes missing, then the database entries are held on to.

The problems I foresee resulting from this implementation are:
- Changing a file's tag (e.g. correcting faulty information) would break the match.
- Deleting files via external programs could leave a very large number of stale entries in Rhythmbox's database.

Possible solutions (although I believe that they may be over-the-top and along the lines of "But the thing is we shouldn't *have* to do that!") are maintenance scripts/applets/programs to empty the portions of the database belonging to deleted files, and conversion scripts/applets/programs to update the location pointer for files that are moved. In my opinion, an implementation for this problem should be included in Rhythmbox before it reaches 1.0, because people with days upon days of music shouldn't have to re-rate all of their tracks.
No one is working on this at the moment. I'm not sure if I'm reading you correctly, but it sounds like you're proposing to somehow use the file's metadata as the primary key in the database rather than the URI. I don't think this is a good solution, as it swaps the existing problem (can't detect moved files) for a new set of problems and questions (editing tags loses user data, can't have multiple files with the same metadata, and it's not at all clear how radio streams, podcasts, etc. could be handled in this scheme). This doesn't seem like a good tradeoff to me.
This bug is a limitation which hits me reasonably regularly, both with regard to duplicates and with respect to metadata-lifetime (ratings particularly). This limitation essentially makes "ratings" useless past the short-term. The solution mooted, of md5'ing the audio-data subset of a file, would fix my problems. As a cheap, but less-general, solution it'd be nice to take advantage of the inode number. E.g. I run a script to hard-link duplicate MP3s on a regular basis.
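For what the inode idea would look like in practice, here is a small sketch using only `os.stat`. The helper name `file_identity` is made up for illustration. The assumption is a POSIX filesystem: the (device, inode) pair survives a rename within the same filesystem and is shared by hard links, but it changes when a file is copied or moved to another filesystem, which is why this is the cheap but less general option.

```python
import os


def file_identity(path):
    """Return a (device, inode) pair identifying the file behind path."""
    info = os.stat(path)
    return (info.st_dev, info.st_ino)
```

A library entry could store this pair next to the URI and, when the URI stops resolving, look for a file in the watched directories with the same pair before declaring the track missing.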
*** Bug 566111 has been marked as a duplicate of this bug. ***
*** Bug 587483 has been marked as a duplicate of this bug. ***
*** Bug 612686 has been marked as a duplicate of this bug. ***
Instead of using the MD5 (or other hash) sum directly, another option would be to have some kind of UUID tag. This would have the advantage that the identity of the file would not have to depend on the accompanying content directly. It could of course be an option to generate the UUID from the MD5, when lacking other identifiers.
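A rough sketch of that fallback, assuming a hypothetical `track_uuid` helper and that any existing identifier arrives as a UUID string read from a tag (which tag field would hold it is left open here): if the file already carries an identifier it is used as is, otherwise one is derived from the MD5 of the audio bytes.

```python
import hashlib
import uuid


def track_uuid(existing, audio_bytes):
    """Return the track's UUID, deriving one from MD5 if none is stored.

    existing is the UUID string found in the file's tags, or None.
    """
    if existing:
        return uuid.UUID(existing)
    digest = hashlib.md5(audio_bytes).digest()  # 16 bytes, same size as a UUID
    # version=3 marks the result as an MD5-based UUID.
    return uuid.UUID(bytes=digest, version=3)
```

Once the UUID is written back into the file, later tag edits or re-encodes no longer change the track's identity, which is the advantage over using the hash directly.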
*** Bug 614781 has been marked as a duplicate of this bug. ***
From "JamesIsIn" in bug 614781: I see no reason to give Rb the functionality to navigate and change the folder/file hierarchy in order to solve the matter of missing files. I have a file browser for doing that already (Nautilus). Additionally, md5 fingerprints may have value, but they are seemingly superfluous to this matter. Though the eventual inclusion of md5 information in Rb may facilitate file re-location (especially automated re-location), it is not necessary to make file re-location dependent upon md5 information (and doing so may in some circumstances be detrimental). All Rb really needs is the ability to re-associate path/to/a.flac with path/is/now/b.flac in its xml files. I think the most appropriate place to include that additional functionality is within the Missing Files area (through a button, a right-click option, and/or a menu option).
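The re-association itself is a small operation. The sketch below assumes the library XML stores each track as an `<entry>` element with a `<location>` child holding the file URI; that layout, the sample document, and the `relocate` helper are assumptions for illustration, not taken from the Rhythmbox source.

```python
import xml.etree.ElementTree as ET


def relocate(xml_text, old_uri, new_uri):
    """Point every <location> equal to old_uri at new_uri.

    Returns the updated XML text and the number of entries changed.
    All other child elements (rating, play count, ...) are left alone.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for location in root.iter("location"):
        if location.text == old_uri:
            location.text = new_uri
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical library fragment, for illustration only.
SAMPLE = (
    "<rhythmdb><entry type=\"song\">"
    "<location>file:///path/to/a.flac</location>"
    "<rating>5</rating>"
    "</entry></rhythmdb>"
)
updated, count = relocate(
    SAMPLE, "file:///path/to/a.flac", "file:///path/is/now/b.flac"
)
```

Because only the location text changes, the rating and other per-track data stay attached to the entry, which is the whole point of doing this from the Missing Files view instead of deleting and reimporting.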
What about going with some sort of inotify implementation (https://secure.wikimedia.org/wikipedia/en/wiki/Inotify)? The application would be notified whenever there are changes to the watched area(s).
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/rhythmbox/issues/13.