GNOME Bugzilla – Bug 329988
Tracks sometimes appear as if just added
Last modified: 2006-02-07 08:08:15 UTC
After closing and restarting rhythmbox, tracks occasionally start appearing as if they had only just been added (I normally sort my library by "Date Added", so I spot them right away). Tags are re-read with whatever plugin happens to be active at that time. To my knowledge I haven't touched that part of the filesystem in between, and the mod times (as reported by "ls -l") are unchanged from the original date, so it appears that the files really are untouched. This is somewhat hard to reproduce: it often, but not always, happens after I do a suspend/resume cycle on my laptop.
Another piece of data that may be useful: I've noticed that this mostly happens with tracks whose full URI/filename contains awkward characters like "&" (e.g. in the artist name in one of the directories, like below, or in the album directory or title filename). This song recently showed up as if it had just been added (even though it hasn't been touched on disk), and I noticed this in the debug log (after running rb with "-d"):
[0x9733108] [rhythmdb_execute_stat_info_cb] rhythmdb.c:1858 (00:22:44): got error on file:///home/blah/Jam Spoon/Tripomatic Fairytales 2001/08_Right in the Night Fall in Love with Music.mp3: Couldn't access file:///home/blah/Jam Spoon/Tripomatic Fairytales 2001/08_Right in the Night Fall in Love with Music.mp3: File not found
It appears that the unescaped version of the URI doesn't contain the appropriate "&", which leads me to suspect that maybe the "&" gets dropped from the full escaped URI?
How long ago were these first added to your library? If they were added via Import before Nov 29, or via library watching before Jan 17, then this is probably due to an issue with URI canonicalisation occasionally not working when upgrading from pre-canonicalisation to an RB that uses it.
It seems that the & is being used verbatim in the URIs in the <location> tag in rhythmdb.xml:
<location>file:///home/alex/mp3/complete-rips/Jam%20&%20Spoon/Tripomatic%20Fairytales%202001/08_Right%20in%20the%20Night%20Fall%20in%20Love%20with%20Music.mp3</location>
But these appear to be "escaped" (aka percent-encoded) URIs, in which case the & should be replaced with %26. It is confusing, though, because sometimes & does need to be used in URIs (e.g. ones that use CGI parameters). In this case, however, it looks like %26 should be used rather than &. To get an idea of how confusing it can be, check out this post on an XML list: http://mailman.lyra.org/pipermail/dav-dev/2003-June/004761.html
(In reply to comment #2)
> How long ago were these first added to your library? If they were added via
> Import before Nov 29, or via library watching before Jan 17, then this is
> probably due to an issue with URI canonicalisation occasionally not working
> when upgrading from pre-canonicalisation to an RB that uses it.
They were added on Jan 6 via "Import Directory". Here's the full <entry>:
<entry type="song">
  <title>Right in the Night (Fall in Love with Music)</title>
  <genre>Electronica/Dance</genre>
  <artist>Jam & Spoon</artist>
  <album>Tripomatic Fairytales 2001</album>
  <track-number>8</track-number>
  <duration>364</duration>
  <file-size>8183936</file-size>
  <location>file:///home/alex/mp3/complete-rips/Jam%20&%20Spoon/Tripomatic%20Fairytales%202001/08_Right%20in%20the%20Night%20Fall%20in%20Love%20with%20Music.mp3</location>
  <mountpoint>file:///</mountpoint>
  <mtime>1136562955</mtime>
  <first-seen>1139213957</first-seen>
  <last-seen>1139214166</last-seen>
  <bitrate>179</bitrate>
  <date>727564</date>
  <mimetype>application/x-id3</mimetype>
</entry>
Could be related to bug #326653? Those tracks also have "&" in their path.
Oh, it gets worse. Rhythmbox gets passed things from external programs, which can be any of the following, and we have no idea which:
a) a path
b) a path escaped URI (i.e. only spaces and things escaped)
c) a fully escaped URI (pretty much everything that isn't a number or letter)
That location shouldn't be a problem. RB uses the "path escaped" form, which escapes spaces and not ampersands. Ampersands MUST be escaped when stored in XML, so libxml does that (and undoes it when giving it to RB).
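To illustrate the two independent layers of escaping described above (percent-encoding in the URI, and entity escaping done by the XML serializer), here is a minimal Python sketch. It is not Rhythmbox code (RB is written in C and uses libxml); it uses only the Python standard library, and the path is a shortened, hypothetical stand-in for the one in this report.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

path = "/home/alex/mp3/Jam & Spoon/track.mp3"  # hypothetical path

# "Path escaped" form: spaces are percent-encoded, "&" is left alone.
path_escaped = "file://" + quote(path, safe="/&")

# "Fully escaped" form: "&" is percent-encoded as well.
fully_escaped = "file://" + quote(path, safe="/")

# XML escaping is a separate layer, applied by the serializer on write.
elem = ET.Element("location")
elem.text = path_escaped
xml_text = ET.tostring(elem, encoding="unicode")

# Parsing undoes the XML escaping and hands back the original URI.
round_tripped = ET.fromstring(xml_text).text
```

Under these assumptions the string written to disk contains "&amp;" while the application only ever sees "&", so the XML layer alone should not change the URI.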
(In reply to comment #6)
> That location shouldn't be a problem. RB uses the "path escaped" form, which
> escapes spaces and not ampersands. Ampersands MUST be escaped when stored in
> XML, so libxml does that (and undoes it when giving it to RB).
I understand, but what I'm saying is that maybe "&" should be escaped to %26 rather than &. I don't know if it should, but I did a more systematic search of my database, and the one thing that all of the files that have mysteriously shown up as if they had just been added have in common is a "&" somewhere in their path.
Can you find one of these tracks, and check rhythmdb.xml to see if it is there multiple times? Don't do a search on the location containing the ampersand; use the title or something.
I think the problem is that it is in the db multiple times with slightly different escaping. Because Rhythmbox uses the location as a unique identifier, it will treat "file:///path/to/the&file.mp3" and "file:///path/to/the%26file.mp3" as different, even though they are really the same.
We should really use the db versioning in rhythmdb.xml to perform forced canonicalisation once. I'll whip up a patch to do that.
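A minimal sketch of why the two spellings collide only after canonicalisation. This is an illustration in Python, not the actual rhythmdb code; the canonicalise helper and its choice of characters left literal are assumptions made for the example.

```python
from urllib.parse import quote, unquote

def canonicalise(uri: str) -> str:
    """Decode every percent-escape, then re-encode with one fixed rule."""
    scheme, sep, rest = uri.partition("://")
    # Assumption: "/" and "&" stay literal, mirroring the "path escaped" form.
    return scheme + sep + quote(unquote(rest), safe="/&")

a = "file:///path/to/the&file.mp3"
b = "file:///path/to/the%26file.mp3"

# As raw strings the two locations are different keys...
raw_keys = {a, b}
# ...but they collapse to a single key once canonicalised.
canonical_keys = {canonicalise(a), canonicalise(b)}
```

Any lookup keyed on the raw string would store these as two entries; keyed on the canonical form, they are one.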
Created attachment 58803
perform rhythmdb upgrades
This updates the current rhythmdb version to 1.1, and performs URI canonicalisation when loading a 1.0 version db.
Alex: can you make a backup of your db, and then run with this patch applied to see if it fixes the problem?
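The idea of a one-time, version-gated upgrade can be sketched roughly as follows. This is Python for illustration only and does not reproduce the attached patch; the upgrade and canonicalise functions and the sample location are hypothetical, and only the <rhythmdb version="..."> root and <location> element come from the files quoted in this report.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote

CURRENT_VERSION = "1.1"

def canonicalise(uri):
    # Assumed rule: decode all escapes, re-encode leaving "/" and "&" literal.
    scheme, sep, rest = uri.partition("://")
    return scheme + sep + quote(unquote(rest), safe="/&")

def upgrade(root):
    """Canonicalise locations once, when an old-version db is loaded."""
    if root.get("version") == "1.0":
        for loc in root.iter("location"):
            loc.text = canonicalise(loc.text)
    # Stamp the new version so the pass is not repeated on the next load.
    root.set("version", CURRENT_VERSION)
    return root

old_db = ET.fromstring(
    '<rhythmdb version="1.0">'
    '<entry type="song"><location>file:///music/a%26b.mp3</location></entry>'
    '</rhythmdb>'
)
new_db = upgrade(old_db)
```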
(In reply to comment #8)
> Can you find one of these tracks, and check rhythmdb.xml to see if it is there
> multiple times? Don't do a search on the location containing the ampersand, use
> the title or something.
>
> What I think the problem is, is that it is in the db multiple times with
> slightly different escaping. Because Rhythmbox uses the location as a unique
> identifier, it will treat "file:///path/to/the&file.mp3" and
> "file:///path/to/the%26file.mp3" as different, even though they are really the
> same.
Yep, bingo, 3 different versions...
grep 08_Right .gnome2/rhythmbox/rhythmdb.xml
<location>file:///home/alex/mp3/complete-rips/Jam%20&%20Spoon/Tripomatic%20Fairytales%202001/08_Right%20in%20the%20Night%20Fall%20in%20Love%20with%20Music.mp3</location>
<location>file:///home/alex/mp3/complete-rips/Jam%20%26%20Spoon/Tripomatic%20Fairytales%202001/08_Right%20in%20the%20Night%20Fall%20in%20Love%20with%20Music.mp3</location>
<location>file:///home/alex/mp3/complete-rips/Jam%20%20Spoon/Tripomatic%20Fairytales%202001/08_Right%20in%20the%20Night%20Fall%20in%20Love%20with%20Music.mp3</location>
(In reply to comment #9)
> Created an attachment (id=58803)
> perform rhythmdb upgrades
>
> This updates the current rhythmdb version to 1.1, and performs URI
> canonicalisation when loading a 1.0 version db.
>
> Alex: can you make a backup of your db, and then run with this patch applied to
> see if it fixes the problem.
I did so: backed up the old version, started rhythmbox, then closed it. But I still see dupes in the new rhythmdb.xml.
It seems to be the new one (1.1):
$ head -2 rhythmdb.xml
<?xml version="1.0" standalone="yes"?>
<rhythmdb version="1.1"> <entry type="song">
but the dupes are still there:
$ grep 08_Right rhythmdb.xml
<location>file:///home/alex/mp3/complete-rips/Jam%20&%20Spoon/Tripomatic%20Fairytales%202001/08_Right%20in%20the%20Night%20Fall%20in%20Love%20with%20Music.mp3</location>
<location>file:///home/alex/mp3/complete-rips/Jam%20&%20Spoon/Tripomatic%20Fairytales%202001/08_Right%20in%20the%20Night%20Fall%20in%20Love%20with%20Music.mp3</location>
<location>file:///home/alex/mp3/complete-rips/Jam%20%20Spoon/Tripomatic%20Fairytales%202001/08_Right%20in%20the%20Night%20Fall%20in%20Love%20with%20Music.mp3</location>
Is this supposed to fix/remove the dupes from rhythmdb.xml, or will it simply ensure that they aren't rediscovered twice?
Created attachment 58839
better patch
This now discards the duplicates properly. It also merges the user data (rating, play count and last-played time) from the duplicates.
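A rough sketch of what discarding duplicates while merging their user data could look like. This is illustrative Python, not the patch itself: the dictionary keys are hypothetical, and the merge policy (highest rating, summed play counts, most recent last-played time) is an assumption, since the comment above does not say how the values are combined. Entries are assumed to have canonicalised locations already.

```python
def merge_duplicates(entries):
    """Collapse entries sharing a location, combining their user data."""
    merged = {}
    for entry in entries:
        loc = entry["location"]
        if loc not in merged:
            # First sighting of this location: keep a copy as the survivor.
            merged[loc] = dict(entry)
            continue
        kept = merged[loc]
        # Assumed policy: best rating, total plays, latest play time.
        kept["rating"] = max(kept["rating"], entry["rating"])
        kept["play_count"] += entry["play_count"]
        kept["last_played"] = max(kept["last_played"], entry["last_played"])
    return list(merged.values())

entries = [
    {"location": "file:///a&b.mp3", "rating": 3, "play_count": 2, "last_played": 100},
    {"location": "file:///a&b.mp3", "rating": 5, "play_count": 4, "last_played": 200},
    {"location": "file:///c.mp3", "rating": 1, "play_count": 0, "last_played": 0},
]
result = merge_duplicates(entries)
```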
Better, I now have only two duplicates and not three:
$ grep 08_Right rhythmdb.xml
<location>file:///home/alex/mp3/complete-rips/Jam%20&%20Spoon/Tripomatic%20Fairytales%202001/08_Right%20in%20the%20Night%20Fall%20in%20Love%20with%20Music.mp3</location>
<location>file:///home/alex/mp3/complete-rips/Jam%20%20Spoon/Tripomatic%20Fairytales%202001/08_Right%20in%20the%20Night%20Fall%20in%20Love%20with%20Music.mp3</location>
This was after trying the patch using my now modified 1.1 database. I tried going back to my original 1.0 database, but I get the same result. Somewhere along the line I may have moved the directory from "Jam Spoon" to "Jam & Spoon"; could that be the issue?
So the second <entry> (with the old directory name "Jam Spoon") is actually listed as <hidden>1</hidden>, which means that it can't find it on disk, and it waits for a certain length of time before deleting it from the db, right? If so, the patch seems to be working.
Yep, that's what is supposed to happen. If this looks good, I'll commit it to cvs and hopefully it will fix any upgrade canonicalisation issues people are seeing.
Looks good. Commit away!
One query: is the db format backwards compatible? i.e. if I have a 1.1 db file, but run it on rb that reads or emits 1.0 db files, will it still read it OK? I would expect it would since there doesn't seem to be any new syntax, but it would just write it back as a 1.0 file.
(In reply to comment #17)
> One query: is the db format backwards compatible? i.e. if I have a 1.1 db
> file, but run it on rb that reads or emits 1.0 db files, will it still read it
> OK?
>
> I would expect it would since there doesn't seem to be any new syntax, but it
> would just write it back as a 1.0 file.
Yes, because rhythmdb previously wasn't actually checking what version it was loading. And there are no incompatible changes.
Patch committed to cvs, which *should* fix the remaining canonicalisation issues.