GNOME Bugzilla – Bug 661100
Rescan Music Library changes import date (Date Added column) on files with special characters
Last modified: 2011-12-11 21:04:05 UTC
When I hit Tools/Rescan Music Library, *some* of the files in my library get their Import date set to the current date. It appears to select random files for this reset. The files in question were already in my library, and weren't moved or copied or modified. It is not the behavior I'd expect since these are old files, and it pollutes my 'Recently added' smart playlist. * Steps to reproduce: - With a big music library, hit Rescan Music Library. - Watch the count of items in the Recently Added playlist. - With a little bit of bad luck, it should increase. Note: if I rescan twice, the second time I get no new new files.
I can confirm this issue. The exact same thing happened to me today, though I did not have the time to file an issue yet. It appears to affect those files that have non-alphanumeric characters in the track title and/or artist name (,&@$=+ etc).
Well observed! Indeed, all the tracks reset have punctuations (&,'=?) in their *track titles*. Spaces are not in this list. Album and artist with punctuations are not reset.
I cannot reproduce this problem with master (and I have tried with non-alphanumeric characters). Bertrand, did you confirm the bug because you reproduced it, or because 2 people reproduced it? If the former, can you tell me how you did it? Or anyone else can extract an exact pattern of steps to do it?
I just reproduced it on 2.2.0: - move the content of ~/Music/ somewhere else, leaving it empty - start with an empty library (--db=blabla) - import two tracks (one with a title containing a comma, one without any punctuation) - they are copied to the ~/Music - wait a couple of secs - hit Rescan Library - the 'Imported date' field in the properties is now different for the two files: for the first file it is when you hit Rescan, for the second it's when you actually imported
Matthias, thanks for the information. I did all the steps you mention except (1) and (4). Can you tell me if those steps are required to reproduce the bug please?
I just ran into this problem (losing a couple of thousand songs from my playlists :-/ ) This issue may be related to #652953. I had a look at all the newest tracks to see whether I can find some commonalities. All the new tracks have symbols in their filenames that are percent-encoded. Comparing the tracks that got readded as new with the counterparts that remained as their old tracks, I only found examples for '&', ',' and "="; "?" and the apostrophe seem unaffected. Going by #652953, I assume that the symbol got replaced by the percent-encoding. This applies only to the file name, not the directory name. For example, the following URI still has a & in the directory of the URI, but not in the file name: file:///home/wolki/Musik/Einzel/The%20Mamas%20&%20the%20Papas/The%20Mamas%20%26%20the%20Papas%20-%20California%20Dreamin'.mp3 This seems to hold for all of the thousands of new entries: all of them have percent-encodings for each instance of '?,=' in the file name, but not in the directory. Also, almost all of them have one of these symbols (and the ones that don't are genuinely new files). I tried to reproduce it on a fresh database, but failed: apparently, the percent-encodings were used from the beginning. In #652953, apparently adding as files instead of folders is the cause, but I'm pretty sure I added most of the files by refreshing the library in the past. As this only affects some of the files within an album, I can compare the date they were added, for example I have an album that's in the database since February 2009, but the one track that contains a comma in its name is now listed as having been added as yesterday. Also, while refreshing the library again, some tracks were added as new another time,, including some that I added for the first time in the refresh that recreated thousands of files. (e.g. file:///home/wolki/Musik/Alben/Various%20Artists/Blues%20Heroes/04%20-%20John%20Lee%20Hooker%20-%20One%20Bourbon%2C%20One%20Scotch%2C%20One%20Beer.flac , while the tracks that did not contain a comma stayed at their old versions). It also affected a small number of older files, unfortunately I can't say whether they survived the first rescan and were hit by the second one, or whether they were hit by both.
*** Bug 662650 has been marked as a duplicate of this bug. ***
*** Bug 663306 has been marked as a duplicate of this bug. ***
I might have good news: there is a high chance that this is indeed a duplicate of bug 631970. To confirm this, can anyone affected by this bug try the last patch that is proposed in that bug to check if it fixes the problem? Thanks in advance for testing!
I tried to build an ubuntu 2.2.0 package with the patch but I get a compile error ./Banshee.IO.Gio/Directory.cs(121,44): error CS0103: The name `Path' does not exist in the current context I'm not fluent in C# and I have no idea why Path can't be found (apparently it's part of the standard library isn't it?)
Created attachment 200682 [details] [review] patch with "System.IO" namespace Thanks for testing! (In reply to comment #10) > I tried to build an ubuntu 2.2.0 package with the patch but I get a compile > error That's weird! Please test this patch instead.
I'd say that it is a duplicate. Its definitely related to inconsistencies in the way Banshee is handling Uri encoded paths. My database contains Uris which do not have any percent encoded commas or ampersands. It seems that when it is doing the re-scan it is encoding the Uris with percent encoded symbols in the file name only and not the directory name which is why the SQL statements I posted in Bug 663306 dont work. So yeah these lines from DatabaseImportManager.cs should return NULL but dont. if (DatabaseTrackInfo.ContainsUri (uri, PrimarySourceIds)) { return null; }
Review of attachment 200682 [details] [review]: I was having the same problem here as well as those in 631970. I could simply edit a track's info with save metadata on, then a rescan would pick it up as a "new" track, causing it to show up with a new import date. The cause (and fix) suggested in 631970 is part of the above patch and I can confirm is solves this issue as well. SafeUri directly uses the glib function g_filename_to_uri (and a couple other related glib functions) which do NOT encode commas and ampersands it seems. I confirmed this with with a bit of C code just to make sure, and that is how glib itself is working. If this is "correct" or not, I can't say. It is however different than Uri.EscapeDataString (originally used as part of the directory walk.) What's important is that this patch should at least make it consistent, and prevent it from finding existing tracks as "new" and therefore losing any metadata.
Xepher, thanks for your testing! I wanted a 2nd person to test the patch and I was hanging out yesterday on IRC with Age Bosma (Forage), and suddenly he couldn't reproduce the bug anymore, so he wasn't able to test the patch. But now I just got up in the morning and had a revelation! And I now know what's the root cause of the issue (and I can reproduce the bug), so I'll be committing shortly your patch and maybe some other patch to update the database of people who may still have records imported with a buggy Banshee (with buggy I mean, a Banshee version that doesn't have your patch applied).
So what is the correct form of URI that we want stored in the database? I believe it should just be the g_filename_to_uri converted version and this should be used everywhere. I'm not sure why the code currently percent-encodes the filename separately from the path. g_filename_to_uri calls g_escape_uri_string with the the UNSAFE_PATH parameter which means that '/', '&', '=', ':', '@', '+', '$' and ',' will be allowed and all other non alpha-numeric characters will be percent encoded.
Hi, I've been using banshee for a couple of days with the patch, and the bug didn't happen yet. Since it's rather nondeterministic when it happens, I'm waiting for a few more days, but I'm optimistic. Well done guys!
This morning I compiled Banshee with the patch on my main computer (the one with the biggest library), launched it and hit Rescan. The bug was still there apparently: Banshee modified the 'added date' of about 800 files with punctuations in their names (among the ~20000 in my library) to the current date. Does it mean that the patch doesn't solve the bug? I don't know, maybe there is something to update first in the database... What do you think? Let me know if you need me to test something else.
Created attachment 201094 [details] [review] db migration (In reply to comment #17) > This morning I compiled Banshee with the patch on my main computer (the one > with the biggest library), launched it and hit Rescan. The bug was still there > apparently: Banshee modified the 'added date' of about 800 files with > punctuations in their names (among the ~20000 in my library) to the current > date. > Does it mean that the patch doesn't solve the bug? I don't know, maybe there is > something to update first in the database... What do you think? Let me know if > you need me to test something else. Hey Matthias, yes I'm aware of that and I made this additional patch to solve it.
Sorry if I wasn't clear on what I'd tested myself either... After my library got corrupted by the original issue (and showed over 2000 new tracks) I rolled back to a backup I had from a few months prior and then did almost all of my tests and troubleshooting from that known-good point. It seems if the database had already been polluted with "bad" data, then fixing it back to the prior "correct" stuff caused almost the same problem for the first rescan... I think it should/would have remained correct after that though. However, Andres' patch above is a much more in-depth fix to correct errors already stored in the database, and is thus a much better solution.
Review of attachment 201094 [details] [review]: Thanks for the patch ! I committed it with some changes: - Prefix the name of the DB function with BANSHEE_ - Use static methods instead of allocating 2 SafeUri instances that are immediately discarded. This should be safe as we're only looking at URIs that start with file://
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.
*** Bug 652953 has been marked as a duplicate of this bug. ***