GNOME Bugzilla – Bug 520516
Cover/Album art filenames not robust, fail for non-ASCII, parenthesis, etc
Last modified: 2010-02-18 01:56:56 UTC
Banshee trunk does not longer respect characters which are written inside ( ) to calculate the filename of the coverart file in .local/share/album-art. Take for example artist="Foo" album="Bar (CD1)". Previously banshee looked for a file named foo-barcd1.*, now it looks for foo-bar.* This is especially bad for multi-disc albums with individual covers.
Created attachment 106618 [details] Example Here you see that the previously working cover art does not work anymore. If I rename nineinchnails-thefragilecd1 to nineinchnails-thefragile it works for both discs
Doesn't Banshee handle this gracefully? Of course, newly imported albums would be fine (they'll match the new pattern), but as soon as you play the music, doesn't it recheck (and reget) the cover art?
Well, the "new pattern" does not always work ("This is especially bad for multi-disc albums with individual covers.") as only one file can be used. Another case comes to mind: you have an album and a single with the same name, you appended "(Single)" to the album name, now it won't be possible to use album art for the single. Anyway, I have not noticed that banshee tries to fetch cover art... is this even enabled in trunk, yet? all by multi-disc albums don't show cover art.
Oh, I see... your example above showed the same image for both albums, but I suppose that's after you renamed the image. I see what you mean.
It was pointed out to me on IRC that album names should not contain "( )". Well, not all artists seem to agree with this rule, from my collection: Oasis - (What's The Story) Morning Glory? AC/DC - Highway To Hell (Live) Marilyn Manson - Lest We Forget (The Best Of) Exploited - Let's Start A War (Said Maggie One Day) Bad Religion - Punk Rock Songs (The Epic Years) Die Ärzte - Rock'n'Roll Realschule (MTV Unplugged) In Flames - Trigger (EP) Slipknot - Vol 3: (The Subliminal Verses) Die Ärzte - Zu Spät (Live) All those "( )" are part of the actual album name, as seen on the covers. Will Farrington said this would "screw over" people using a special tagging scheme of classical music "Composer - AlbumName (Performer)" but I don't see the problem at all. Saving the art as composer-albumnameperformer won't hurt at all. But, not having the possibility to differentiate between "AC/DC - Highway to Hell" (the album) and "AC/DC - Highway to Hell (Live)" which is a single release of a live album is bad. Changing the subject again because multi-artist is the smaller problem here.
So...when fixing this, we should be careful, because the current behavior is actually quite nice in some instances - when playing from last.fm or similar, and they add (Live from X) or something, since we ignore that part it can sometimes still get cover art when it otherwise wouldn't match. So maybe we need to try w/ the ()'s text, and then without if that returns no matches.
(In reply to comment #5) > It was pointed out to me on IRC that album names should not contain "( )". > Well, not all artists seem to agree with this rule, from my collection: > > All those "( )" are part of the actual album name Let alone the Sigur Rós album effectively typed as "( )": http://en.wikipedia.org/wiki/(_) :).
*** Bug 541350 has been marked as a duplicate of this bug. ***
Check out the patches in bug 530690, which may implicitly fix this bug.
The "URL-encode cover filenames" patch?
Created attachment 123053 [details] [review] patch This patch should fix this bug. It only affects the filename of the cached cover, so the queries to last.fm etc should all be safe.
(In reply to comment #3) > Another case comes to mind: you have an album and a single with the same name, > you appended "(Single)" to the album name, now it won't be possible to use > album art for the single. > Anyway, I have not noticed that banshee tries to fetch cover art... is this > even enabled in trunk, yet? all by multi-disc albums don't show cover art. It looks like this issue is getting fixed, but I wanted to point out another case where parentheses are absolutely necessary. The rock band Weezer has *three* self-titled albums. We don't want all of them lumped together as a single album, so the usual practice is to specify the color of the cover in parenthesis, e.g., "Weezer (green album)". Even Musicbrainz--which is particularly anal about doing things exactly the right way--follows this practice.
We need a migration path for this, too - we don't want to have to refetch a ton of cover art and end up having dupes. Aaron and I talked with pvanhoof at the Summit a few months ago about a more general media-art specification. Here are our notes: http://live.gnome.org/MediaArtStorageSpec
*** Bug 530690 has been marked as a duplicate of this bug. ***
In the media art spec, what is the purpose of stripping arbitrary characters such as @ or =? Since the "normalized" string will be digested with MD5 anyway, there's no filesystem character restrictions to worry about. Considering the vast array of styles used in artist and album names, declaring some characters forbidden strikes me as a poor idea. Ditto for stripping pairs of parens/brackets/etc. It's not Banshee's duty to decide which albums are worthy of displaying cover art. If there is a problem with poor metadata from some sources like Last.FM, that ought to be dealt with in the Last.FM plugin, not in a far-reaching cover-art spec. The section on thumbnailing media mentions only the sizes "normal" and "large", from the thumbnail spec. Looking at Banshee's media art cache, it appears to use at least the following sizes: 34, 36, 48, 64, and 300 px. The heuristics for discovering art is unnecessarily restrictive. If I'm reading that section correctly, the following obvious cases will fail to properly discover art: * Album with 10 tracks, cover.jpg, and 4 misc support files (lyrics, sheet music, etc) * Two tracks from a 10-track album, with cover.jpg * Album with cover.png In addition, these rules appear to be duplicates: * Any file called cover.* or front.* or album.* (jpg, jpeg). Case insensitive. * Any file case independent called cover.jpeg,jpg will serve as album art
John, I agree re: char stripping of something that's going to be MD5'd. In terms of the hueristics, I don't think we'll probably (need to ) change our much. If you don't like our current ones, file a bug. That page of notes is definitely not final.
Why don't you simply use a metafile to index everything? Something like: $ cat /home/user/.cache/media-art/media-art.xml <media-art> <album artist="Sigur Rós" album="( )"> <front size="300px">1234567890abcdef.png</front> </album> <radio name="Radio ga ga"> <logo size="64px">fedcba0987654321.png</front> </radio> </media-art> That way: - No worry about file system - No name collision since we use the exact tags (seriously, who cares about duplicates – add a "Clean cache" function if it's really needed) - No hash collision (just a random string) (and yes, collisions may happen) - Constant-time search for human ("human-findable" – this *is* important) - Multiple file size supported - Changeable artworks (by human and computer) I admit I found two drawbacks: - Non constant-time search for computer (but SELECTable if imported with SQL) - Relatively considerable overhead in memory (if loaded) No?
A single index file with random filenames has several disadvantages: * If the index becomes corrupted, all cover art is lost to applications but continues to consume disk space. * Multiple applications dealing with cover art will have to coordinate their activity, using file locking or similar. * Using your file format, any application that wanted to find a cover for a particular album would have to manage its own parsing and indexing of the index file. In exchange for these, it has no advantage over using predictable digests in a directory structure aside from easier human searchability. ------------------ As an aside, it would be nice if the media art spec had instructions on how to handle non-ASCII text with regards to combining characters. Taking Sigur Rós as an example, it could be represented using at least two methods: "Sigur R\u00f3s" "Sigur R\u0301os" I've encountered both in file tags, mostly depending on which OS the file was tagged on. Either can be easily transformed into the other, so there should be a decision made about which will be used for calculating the MD5 digest.
*** Bug 570999 has been marked as a duplicate of this bug. ***
Created attachment 129755 [details] [review] Implement the media art storage spec (v1) Initial, rough, patch to implement the spec. Mostly works, but a few bugs remain. ------------ Changes to the spec: I couldn't find the "discussion page" for the spec Gabriel mentioned in IRC, so I shall write my notes here. podcast- and radio- covers are stored without a "C" section. There is no point to a section that will always contain the same data. The track- cover type is added, for storing per-track cover art. Its format is track-{album_artist/track_artist}-{album_title}-{title}. The album- cover type is modified to take into account *only* album-artist, because track-artist is per-track. To obtain a hexdigest for an artist/title, the text is decomposed into NKFD form and encoded with UTF-8. ------------ Albums don't have any knowledge of where they are located, because there's no guarantee an album will be contained in only a single directory. Tracks, however, *do* know where they are located. This makes the .mediaartlocal directory difficult to use properly. If album art is placed in such a directory, it will not be copied to ~/.cache/media-art because the indexer is per-track. In the album list no art will be displayed, because that list is per-album. Essentially, placing covers in .mediaartlocal prevents them from being displayed. It might not be possible to implement a .mediaartlocal directory as described in the spec. ------------ The cover metadata query jobs (Banshee.Metadata.*QueryJob) appear to run every time a track is played. This is a bit silly; they ought to be run only when a track has been added or modified. In addition, because the query jobs run only when a track has been played, importing the library results in entirely blank cover art in the album browser. ------------ This patch uses GNOME's implementation of the thumbnail spec. Because the thumbnail spec only specifies the "normal" (128x128) and "large" (256x256) sizes, additional conversions are required to properly scale pixbufs for display in Banshee. There appears to be a significant amount of pixbuf scaling going on when playing tracks. While I haven't noticed any performance problems due to this, it might be more noticeable on a slower sytem.
@Comment #20: Feel free to add remarks to the spec wiki page. The spec isn't finished and can be subject to changes. Just mark them so that I know what got changed and why. Then we can make the necessary decisions and adapt our projects (so far that's Banshee and Tracker, but if our spec is sane then I hope to see many more projects starting to use this specification). Multi resolution thumbnails other than normal and large isn't something I want this spec to specify. That's either for the thumbnailer's part of the story (the thumbnail spec) or specific for Banshee's application domain (store them in Banshee's own private cache directory). The right solution for that isn't to push them into this spec, but rather to convince the thumbnail-spec people that for certain applications such thumbs would be a) useful and b) shared by many applications meaning that therefore letting each application have its own caches is c) a bad idea (otherwise c) isn't a bad idea idea, because you don't gain anything anyway).
*** Bug 574166 has been marked as a duplicate of this bug. ***
*** Bug 575309 has been marked as a duplicate of this bug. ***
I have a CD album whose ID3 information is in a non-latin unicode character encoding. Naturally, when loading this new album into Banshee it tries to scan it's online database for matching cover art and it can't because the website only contains English album information. I am assuming this so far, I have no idea if this is what banshee is actually doing. Not being able to find this information automatically is fine, for I have included the cover.jpg in the album's directory. However, Banshee does not load this file. I have noticed the following behavior: when Banshee finds cover art (whether online or provided in the directory) it copies the jpg to ~/.cache/album-art. The file is then named "[artist]-[album].jpg" Using this information I then copied my cover.jpg to the album art directory and named it using the same convention. However, Banshee yet again failed to load this image. I then erased my entire album art cache directory and placed the same cover.jpg from the non-english album into an english album's directory and loaded this into Banshee. Banshee then found the cover.jpg and displayed it. (The included cover.jpg was NOT the matching cover.jpg for that album. This was done so that I could verify the included jpg was loading, and not an automatically downloaded copy) Hope this helps.
Created attachment 130685 [details] [review] If stripping characters yields an empty string, return original string I was thinking about potential workarounds for this, to display non-ASCII covers while not requiring an upgrade of all Banshee users' cache directories. What if the escaping function is modified so that if stripping unwanted characters yields an empty (ie unusable) string, it simply returns the original input? All existing cached art would continue to work correctly, and art without any "allowed" characters would work as well. The only albums that would have their cache locations changed currently don't work at all anyway.
Created attachment 132257 [details] Image scaled by banhsee Please excuse my ignorance, if I'm wrong, but it seems in banshee new scaling is implemented, now there are 48x48 images. The problem is that scaling quality, is extremely poor. This results in album art images looking very unpleasant. I attach banshee scaled image and the same image scaled by GIMP. I hope you'll see the difference.
Created attachment 132258 [details] Image scaled by GIMP
I have just upcated this spec: http://live.gnome.org/MediaArtStorageSpec to have leading and trailing white spaces removed at "Generating a unique and safe file name".
I have run in to this problem with a number of albums with different cover art for different parenthesized versions, notably: Sigur Rós - ( ) Nine Inch Nails - Head Like A Hole Nine Inch Nails - Head Like A Hole (UK) Nine Inch Nails - Head Like A Hole (Promo) Nine Inch Nails - We're In This Together (Green) Nine Inch Nails - We're In This Together (Orange) Nine Inch Nails - We're In This Together (Yellow) some other singles so it would be nice to see this fixed. (In reply to comment #6) > So...when fixing this, we should be careful, because the current behavior is > actually quite nice in some instances - when playing from last.fm or similar, > and they add (Live from X) or something, since we ignore that part it can > sometimes still get cover art when it otherwise wouldn't match. So maybe we > need to try w/ the ()'s text, and then without if that returns no matches. This fallback idea would not work well for me. Using the Head Like A Hole releases as an example, the US version (unparenthesized) has cover art, the UK version has separate cover art from the US version and the promo version doesn't have cover art. If you were to fall back to using no parenthesis then the promo version would get cover art for the US version which is undesirable and leads to confusion in the album view. This fallback doesn't seem necessary for anyone who simply tags their tracks correctly.
Hey Phillip, John, et al: I was looking at the media art spec again, and man, it's complicated. Let's simplify it. Zero character stripping or anything, just a simple $artworktype-$md5.jpg where $artworktype is "album", "podcast", etc, and $md5 is the MD5 of the NFKD of the concatenation of the relevant bits for each artworktype: eg artist-album for an album.
(In reply to comment #30) > $artworktype-$md5.jpg I quite often manually add files to .cache/album-art/ and also see people ask how to do that on IRC quite often. There's no way anyone will manually calculate md5s though. So in this case we absolutely need a way to set (or overwrite) cover art from banshee's GUI.
John, Thanks for the patch, sorry it hasn't gotten the attention prior to now that it deserves. I'm a little wary of supporting (potentially) multiple images per track (eg a track-specific image, an album-wide one, etc). It adds a lot of complexity. We have to check multiple paths every time we do a lookup. All our lookup code should really become aware of the different types, and ask specifically for what it wants (album vs track art, are fallbacks ok, etc). Our fetchers (last.fm, embedded, etc) and saving all needs to be aware of the various types, too. With your patch - correct me if I'm mistaken - in the Save* methods, it looks like album art is saved to a track-specific location. What do you think of making this patch a bit more incremental - changing to using a simple type-hash.jpg naming scheme instead of the current one, but only supporting type=album for now? What is the GotDuplicateCover method supposed to do? It looks expensive (and doesn't appear to close album_stream). Michael: agreed, though I don't think it blocks this patch/bug (bug #336350 for those keeping score at home)
(In reply to comment #31) > (In reply to comment #30) > > $artworktype-$md5.jpg > > I quite often manually add files to .cache/album-art/ and also see people ask > how to do that on IRC quite often. There's no way anyone will manually > calculate md5s though. So in this case we absolutely need a way to set (or > overwrite) cover art from banshee's GUI. May I suggest Muine's awesome "drag an image to the album image area" feature for this?
Just a bump on Comment #24 from Anton Birkel. It seems that the topic covered in that particular comment was lost in the shuffle. In short, if cover.jpg is saved under a directory that contains non-ASCII characters, Banshee will never pick it up.
Slav, that is probably a result of what this bug is all about: if the artist or album name has non-ascii characters, there are often issues with storing it properly in the cache. Wouter, the manually-edit-cover-art bug is a different one.
Created attachment 136798 [details] [review] A Patch to support any language in the artist/album name I had a problem that I didn't got the album art for Hebrew albums. So I wrote this little patch to fix it. I hope you'll include it in the next release.
Good job Gil. Just want to confirm that it works for me. Thank you.
*** Bug 586652 has been marked as a duplicate of this bug. ***
Gil, thanks for the patch. As indicated by Gabriel, I think we're going towards implementing the media art spec linked above. Also, with your patch, we go over each part twice. I think the EscapePart method is called often, so that's costly. For future patches, please follow the code formatting guidelines in the HACKING file (indent with spaces, etc.)
Created attachment 137503 [details] [review] First step towards implementing the media art spec This patch is a low impact version of John's big patch, taking into account Gabriel's remarks in comment #32 : - The ArtworkId property is not renamed - The cover fetchers are not modified, except where they were making assumptions about ArtworkId - Only "album" and "podcast" prefixes are supported - The digest calculation is not modified from John's patch Open questions : - On the first run after applying the patch, banshee has no cover art. Should we try to migrate from the old cache, or re-fetch everything ? - The ~/.cache/album-art is left untouched. Maybe we should delete it ?
(In reply to comment #40) > Open questions : > - On the first run after applying the patch, banshee has no cover art. Should > we try to migrate from the old cache, or re-fetch everything ? I for example have lots of custom cover art in the current folder. Starting from scratch would be a PITA :( > - The ~/.cache/album-art is left untouched. Maybe we should delete it ? Please don't do that if you don't migrate everything before.
Bertrand, thanks for the comment. It was the first time I wrote a patch to a Mono project (although I'm developing in C# for almost 6 years), I read the HACKING file but maybe I just missed the indent thing... I just want to say that most of my songs and albums are in Hebrew, and doesn't contain any of the ABC letters, that's why I got into this situation that I can't see almost any album art from my impressive collection... I created this patch to help me and others like me to fully enjoy all the features of banshee. And now for some business - I didn't understood why you were using hash codes as the album arts file names in the first place. In UNIX almost every character is a legal character in the filesystem (except NULL) and in Windows it's enough to use the Path.GetInvalidFileNameChars() function to exclude illegal character. I just wanted to keep things as they are as much as I can so I created the ilegal_characters array, and used the second call to RemoveCharacters() to eliminate them. But if you ask me we can easily skip this method call. I think the media art spec is a lovely idea, but if it won't be ready to the upcoming release, please at least include my patch so more non-English speakers could easily start migrate to banshee.
I'm with Gil on this one. This works for nonenglish speakers now, so please let's keep it until a better solution is found.
The reason I don't want to do that, Slav and Gil, is that we should migrate the old artwork, and I'd rather we not have to do that more than once. Migrating the artwork is already going to be tricky. We could migrate just the music library (eg not podcast) cover art by creating a job that goes through every album in the music library and, if the artwork is on disk using the old filename, moves it. Could store the state of the job (so that it can be restarted across Banshee restarts/ran in the bg) by saving the last AlbumID checked (and SELECTing them ORDER BY'd AlbumID). Another consideration: should we put the album art in sub folders by the first 2 chars of the MD5 hash, to avoid having directories with thousands of files? We do this for the Last.fm data cache already. That does mean a fairly complicated directory structure when it comes to the size-specific cache dirs too, but is probably a good idea. Phillip, et all: thoughts?
Gabriel, by using my patch (as is) there is no need in migration. I keep the old filename format in the album art cache directory, but just add more characters to it. On plain English albums there will be no effect. Plain non-English albums, didn't had any album art before, so now when you'll try to play such song, the album art will appear (which is good news!) Hybrid albums (that contains English and non-English letters) will have the old album-art, and when you'll try to play such song a new album art image will be created with the new name. Any how the user won't feel the difference. I think it's good enough for a temporary solution until the new media art will be ready.
Gil, I think a migration would still be needed for your patch; album art for a song with mostly ASCII chars in it but one or two non-ASCII characters would have a different filename with your patch than without.
(In reply to comment #30) > Hey Phillip, John, et al: > > I was looking at the media art spec again, and man, it's complicated. Let's > simplify it. Zero character stripping or anything, just a simple > > $artworktype-$md5.jpg > > where $artworktype is "album", "podcast", etc, and $md5 is the MD5 of the NFKD > of the concatenation of the relevant bits for each artworktype: eg > artist-album for an album. I still think we should keep it simple and do this, plus with the migration job I described above. Sorry for all the misdirection with the complicated cover art spec and all.
*** Bug 595457 has been marked as a duplicate of this bug. ***
Bulk changing the assignee to banshee-maint@gnome.bugs to make it easier for people to get updated on all banshee bugs by following that address. It's usually quite apparent who is working on a given bug by the comments and/or patches attached.
I don't know if this is related or not, but with the new drag-and-drop cover art management, Banshee doesn't even let me add my own cover art for Sigur Ros' ( ) album. Anytime I try to add art for that album, Banshee catches an exception (the first time Banshee actually crashed).
Created attachment 149540 [details] [review] Gil's patch does work, but it doesn't install, so I just recreated it. No code changes, just created against newer git code.
*** Bug 605407 has been marked as a duplicate of this bug. ***
I have another album which causes issues. Cross (✝) by Justice. Because they use the ✝ sign it seems that, allthough I manually set the cover art it's not picked up. http://musicbrainz.org/release/9da1a863-f3f2-4618-bdce-f0c88c055ba5.html http://www.amazon.com/exec/obidos/ASIN/B000OZ2GT4/musicbrainz0d-20?v=glance&s=music
*** Bug 607397 has been marked as a duplicate of this bug. ***
*** Bug 608731 has been marked as a duplicate of this bug. ***
*** Bug 587534 has been marked as a duplicate of this bug. ***
Created attachment 154094 [details] [review] Simplified spec + migration After discussing this on IRC with Gabriel and Aaron we decided to go with a simplified spec: * Image files consist of a prefix ('album-', 'podcast-') and a single MD5 hash. * The hash is calculated on *unmodified* artist and album name. That is, no lower-casing, no stripping, no removal of 'special' characters (@#!...) and no removal of what's inside brackets. * For album covers, the hashed string is in this format: '<artist>\t<album>'. If we used a space, for the string "foo bar baz" we would be able to tell whether it's album "bar baz" by "foo" or album "baz" by "foo bar". * Strings are normalised to NFKD before hashing. I also updated the migration method to copy over existing covers from ~/.cache/album-art/ to ~/.cache/media-art/. Migration runs in the main thread but it's fast enough: on my box it copied 560 files in less than 3 seconds. Please test. To undo, just delete the media-art directory.
Review of attachment 154094 [details] [review]: Looks pretty good, Alexander. Let's not forget to remove the verbose Information logging and change the copy to mv before the final commit. ::: src/Core/Banshee.ThickClient/Banshee.Collection.Gui/ArtworkManager.cs @@ +337,3 @@ + if (ServiceManager.DbConnection.TableExists ("PodcastSyndications")) { + sql = "SELECT Title FROM PodcastSyndications"; + using (var reader = new HyenaDataReader (ServiceManager.DbConnection.Query (sql))) { You can simplify this, making it foreach (var title in db.QueryEnumerable<string> (sql)) @@ +347,3 @@ + } + + var old_file = String.Format ("podcast-{0}", old_digest) + ".jpg"; add the ".jpg" to the String.Format, instead of concatenating; use String.Format above in album/artist too @@ +353,3 @@ + var new_path = new SafeUri (Paths.Combine (root_path, new_file)); + + if(Banshee.IO.File.Exists (old_path) && !Banshee.IO.File.Exists (new_path)) { if( => if ( (and above too) ::: src/Dap/Banshee.Dap.MassStorage/Banshee.Dap.MassStorage/MassStorageSource.cs @@ -514,3 @@ - coverart_id = String.Format ("podcast-{0}", Banshee.Base.CoverArtSpec.EscapePart (track.AlbumTitle)); - } else { - coverart_id = track.ArtworkId; Why is this needed? Shouldn't .ArtworkId return the right value for podcast items? ::: src/Extensions/Banshee.Podcasting/Banshee.Podcasting/PodcastService.cs @@ +610,3 @@ { + string title = String.IsNullOrEmpty (feed.Title) ? " " : feed.Title; + return String.Format ("podcast-{0}", Banshee.Base.CoverArtSpec.Digest (title)); Why the " " part, why not just return null?
Created attachment 154099 [details] [review] Fixes This patch addresses issues raised in the previous comment. It also moves the images instead of copying them and deletes the album-art directory afterwards. !!! BACK IT UP BEFORE USING THIS PATCH !!! With moving, migration works much faster: 0.11s vs 3s on my library.
You still have a Banshee.IO.File.Copy in the podcast loop.
Works perfectly here, great job Alexander! "Migrated 781 files in 0.497912s" for the record
Comment on attachment 154099 [details] [review] Fixes Let's get this in, with that Move change for podcasts.
Thanks Gabriel, committed! A couple of notes: * Someone with editing privs should update the proposed spec [1] * Normalisation is broken for some Unicode characters (most notably for Korean). See bnc#480152 [2], would be great if someone could tackle it. [1] http://live.gnome.org/MediaArtStorageSpec [2] https://bugzilla.novell.com/show_bug.cgi?id=480152