GNOME Bugzilla – Bug 530690
Non-ASCII artists/albums' cover.jpg not recognized
Last modified: 2009-01-13 18:52:24 UTC
When trying to get cover art for (in my case) Swedish artists that have the letters åäö in the artist or album name, the covers are not displayed by Banshee. When adding them manually to ~/.config/banshee/covers, you have to remove the ÅÄÖ/åäö letters from the filename before Banshee will accept the image files, use them in the application, and use them when syncing with your iPod. I would like Banshee to accept filenames containing these letters, to make it less painful to get cover art working for artists with non-American/English names.
Anyone able to reproduce this problem or ever seen this sort of problem before?
(In reply to comment #1)
> Anyone able to reproduce this problem or ever seen this sort of problem before?

Well, in my case it also happens with tags that contain Japanese or Chinese characters.
I can confirm this for Japanese artists. Also happening for some artists that use a lot of umlauts like Motörhead, Queensrÿche and such. Please support Spin̈al Tap along with UTF-8! :-)
Can confirm that it does not work with åäö (Swedish) with Banshee-1 on Hardy. Would be nice if someone added support for it.
Can also confirm this with Banshee 1.2: Even with the addition of scanning for cover art in music folders (cover.jpg, folder.jpg, etc.), it still fails.
*** Bug 542179 has been marked as a duplicate of this bug. ***
To reproduce the same bug, edit the tags of some MP3 files this way:

1. Persian:
   Album Artist: فرامرز اصلانی
   Album: به یاد حافظ
2. Chinese:
   Album Artist: 孟庭苇
   Album: 真经典
3. English with non-ASCII characters:
   Album Artist: Guns N' Roses
   Album: Live Era: '87–'93

Banshee can't find cover.jpg in the folder when the tags are set as in these examples. [openSUSE 11.0/Banshee 1.2]
*** Bug 546172 has been marked as a duplicate of this bug. ***
I have the same problem here with Banshee 1.2.0.
I have the same problem with Banshee 1.2.1 and Russian album/artist names.... It's really frustrating to have a large collection of music without album covers...
*** Bug 556666 has been marked as a duplicate of this bug. ***
Here's what I found about this: All cover art providers (Last.fm, cover.jpg, embedded, etc.) won't do anything if the track.ArtworkId property is null. This property is null if either the artist name or the track title does not contain any alphanumeric ASCII character (a-z 0-9). (See the CoverArtSpec.CreateArtistAlbumId method.) So instead of dropping the non-ASCII characters, maybe we should "normalize" or transliterate (ä -> ae) them?
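To illustrate the failure described above, here is a minimal Python sketch (not Banshee's C# code). It reuses the character class from the regular expression quoted later in this thread. The function names, the "artist-album" ID layout, and the rule that an empty part yields a null ID are assumptions taken from the comment above, not from the Banshee source.

```python
import re

def escape_part(part):
    """Drop everything except ASCII letters and digits, then lowercase."""
    return re.sub(r"[^A-Za-z0-9]*", "", part).lower()

def create_artist_album_id(artist, album):
    """Hypothetical stand-in for the ID generator: returns None when
    either part has no ASCII alphanumeric characters left."""
    escaped_artist = escape_part(artist)
    escaped_album = escape_part(album)
    if not escaped_artist or not escaped_album:
        return None  # assumption: this is the "null ArtworkId" case
    return f"{escaped_artist}-{escaped_album}"

if __name__ == "__main__":
    print(create_artist_album_id("Motörhead", "Overkill"))
    print(create_artist_album_id("孟庭苇", "真经典"))
```

Under these assumptions, a name written entirely in a non-Latin script is reduced to an empty string and gets no ID at all, while a name like "Motörhead" only loses its non-ASCII letters.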
Problem solved by changing line 102 of Core/Banshee.Core/Banshee.Base/CoverArtSpec.cs from:

    return Regex.Replace (part, @"[^A-Za-z0-9]*", "").ToLower ();

to:

    return Regex.Replace (part, @" *", "").ToLower ();

I don't know why there is an "EscapePart" function here. Any modern filesystem can handle UTF-8 filenames. Banshee should write cover files named "<artist>-<album>.jpg" where <artist> and <album> are the exact copy of the tag. For those who have a filesystem that doesn't support "non-ascii" filenames, it should be possible to "normalize" it, but I'm not even sure that Banshee will run on such a filesystem.
BTW, it doesn't break cover art download. Some artwork will be re-downloaded because the cover art filename has changed. I'd like to hear more about the regular expression that Banshee uses to generate the filename (TrackInfo.ArtworkId).
Created attachment 121602
Warren Seine's patch as diff

Here's Warren's patch. For me, it seems to "help", but at the same time I get some cover art that doesn't ever reload. I'm going to spend some time "flushing" my cache or reimporting my library's tracks to see if it's really a problem caused by this patch.
Any feedback? Is someone experiencing problems with Banshee writing files with Unicode file names? I'm running Banshee with this patch (with a 5000-song library) and haven't found any regression yet. This bug is not a showstopper, but it should be easy to fix and I'm sure it affects many people.
Has the patch already been applied to the SVN version, or does it need to be manually added and compiled?
The patch is one line change, so it can be applied to pretty much any recent version. But no, it's not yet in the trunk, otherwise this bug would probably be closed ^^.
Most filesystems have a set of disallowed characters. In Linux this is "/", on Mac it's "/" and ":", and on Windows there's a whole bunch of them. The regex should detect and normalize/remove these. Also, instead of removing disallowed characters, they should be converted to an underscore or dash or something. I know this sounds crazy, but I have albums in my library that differ only in the addition of a character that is not allowed. So the albums "album" and "album?" should be converted to "album" and "album_".
Created attachment 122700
Allow unicode characters to be used in cover art filenames

Escape any of these characters: /?<>:\*|" to an underscore. In EscapePart, just above the regex, there is code to truncate the part if it contains a left paren; what is the reasoning for this? It seems unnecessary.
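The escaping rule described above can be sketched in Python as follows (not the attached C# patch). The character set is the one listed in the comment. The function name is borrowed from the thread for readability, and the sketch does not model any case folding or paren truncation.

```python
import re

# The characters named in the patch description: / ? < > : \ * | "
DISALLOWED = re.compile(r'[/?<>:\\*|"]')

def escape_part(part):
    """Replace each disallowed character with an underscore and keep
    everything else, including non-ASCII letters, unchanged."""
    return DISALLOWED.sub("_", part)

if __name__ == "__main__":
    for name in ["AC/DC", "album", "album?", "孟庭苇"]:
        print(name, "->", escape_part(name))
```

With this rule "album" and "album?" map to different names, but two names that differ only in which disallowed character they contain (for example "Bar?" and "Bar*") would still map to the same name.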
Just tried John's patch and it seems to be working as expected, with the covers to my Japanese albums being downloaded and used. (They're the only non-ASCII albums I have.)
I agree that the solution is better than mine. However it's not perfect as two albums may be escaped in the same way (if artist "Foo" named its albums "Bar!" and "Bar?"), but I'm pretty sure this won't happen in real life :) Please commit this patch to SVN.
Never say never. IMO, it would be much better to just encode the filenames in a standard way, e.g., base-64, rfc1522 mailheader-encoded, url-encoded, or something like that, that will map each artist/album to a unique (7-bit safe) filename.
Created attachment 122801
URL-encode cover filenames

Attached a patch to properly URL-encode the artist and album names in album art filenames. Works perfectly here. Don't forget to run automake after applying the patch.
Created attachment 122802
URL-encode cover filenames

Updated patch; now also encode '-'.
Bas, both of your patches are causing an error when running make: ./Banshee.Base/CoverArtSpec.cs(105,32): error CS0103: The name `HttpUtility' does not exist in the current context
@Thomas, you're wrong. The patch works. Did you "automake" as Bas suggested? I really don't like the URL-encode style. It produces ugly file names, even if it works. Another solution might be to use unique hashes and index them inside a structured document (XML). That way, it will work perfectly since we could write the exact tag in the file, but that would be a big change in the way Banshee handles cover art. As expected, my solution did fail when trying to match "AC/DC". The solution is obsolete.
Is it possible for HttpUtility to selectively URL-encode only certain characters? If encoding is only applied to invalid characters, it should allow readable filenames while maintaining uniqueness in the case of album names differing only by invalid characters.
| ./Banshee.Base/CoverArtSpec.cs(105,32): error CS0103: The name `HttpUtility' does not exist in the current context Thomas, don't forget to run automake after applying the patch.
Warren: you don't like the "ugly" filenames, but you do like totally unrecognizable filenames with just a hash number? That really doesn't make much sense. Or maybe you misunderstand what UrlEncode does: it only encodes non-text characters, i.e., "Björk - Human Behaviour" becomes "Bj%c3%b6rk-Human+Behaviour". I agree, not very pretty, but acceptable IMO. At least it allows you to manually inspect the directory.
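For a rough idea of what URL-encoded cover names look like, here is a Python sketch using the standard library's urllib.parse.quote_plus rather than the .NET HttpUtility call used in the patch. The helper names and the "artist-album" layout are assumptions for illustration. Python emits uppercase hex digits ("%C3%B6"), whereas the example quoted above shows lowercase.

```python
from urllib.parse import quote_plus

def encode_part(part):
    """URL-encode one tag value. quote_plus never escapes '-', so it is
    encoded by hand here, mirroring the updated patch that also encodes
    '-' (an assumption about the patch's intent: keep '-' as separator)."""
    return quote_plus(part).replace("-", "%2D")

def cover_id(artist, album):
    """Hypothetical ID layout: encoded artist, a dash, encoded album."""
    return encode_part(artist) + "-" + encode_part(album)

if __name__ == "__main__":
    print(cover_id("Björk", "Human Behaviour"))
    print(cover_id("AC/DC", "Back in Black"))
```

Because every unsafe character gets its own escape sequence, names such as "Bar!" and "Bar?" stay distinct, and the result is plain ASCII.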
I didn't say I liked it either :) It's just an "easy" way to bypass the file system problem. A grep in the file is all you need to get the cover. Usually, UrlEncode transforms spaces to "%20". Based on your patch, I guess it's the expected behavior. However, if your solution gives something like "Bj%c3%b6rk-Human+Behaviour" it's fine. But what should I say about my Japanese artists with Japanese trackname?
Come on, it's not like the cache directory is designed to be manipulated by users directly anyway. Who cares about the filenames, as long as they work fine? Bas' patch looks good to me.
@Wouter: Yes, you're right. But what about file name length too? A Japanese character takes 3 bytes in UTF-8, so once URL-encoded it becomes 9 ASCII characters (bytes, usually). I have a few Japanese anime soundtracks that won't fit in the 255 bytes allowed for a file name on ext3.
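The length concern can be checked with a little arithmetic. This Python sketch assumes percent-encoding of the UTF-8 bytes and a 255-byte file name limit, as stated in the comment above. The helper name is hypothetical, and the ".jpg" extension and separator are ignored.

```python
from urllib.parse import quote

NAME_MAX = 255  # file name limit in bytes, as cited for ext3 above

def encoded_length(name):
    """Length in ASCII characters of the percent-encoded UTF-8 form."""
    return len(quote(name, safe=""))

if __name__ == "__main__":
    one_char = "あ"  # 3 bytes in UTF-8, each byte becomes "%XX"
    print(encoded_length(one_char))
    # Largest number of such characters that still fits in one name.
    print(NAME_MAX // encoded_length(one_char))
```

Under these assumptions a combined artist and album name of more than 28 such characters would exceed the limit.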
Why don't we simply use the albumid which is in the database for covers? If the filename isn't that important...
That would be nice, but it would require a bit of rework of the cover logic, as currently all functions are called with (artist,album) names. If we don't require readable filenames, it would be easier to just use some kind of checksum. A simple md5 of "Artist\0Album" or so would do fine.
@Bas: It would require computing the hash each time the track is played; I'm not sure we can accept that.
Uh, why is that a problem?
Because it's a CPU-consuming task? Maybe it's not. In that case, never mind.
Created attachment 123083
Digest concatenated artist/album to generate the cover art ID

Here's an implementation of the cover art ID generator that uses hex digests. Provides reasonably unique IDs at the cost of human readability.

> @Bas: It would require to compute the hash each time the track is played, I'm
> not sure we can accept it.

Digesting such short strings is almost instant, even on slow / small systems.
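A digest-based ID along these lines could look like the following Python sketch. It is not the attached patch: the choice of MD5 and the NUL separator follow the earlier "Artist\0Album" suggestion in this thread, and the function name is hypothetical.

```python
import hashlib

def cover_art_id(artist, album):
    """Hex digest of the UTF-8 bytes of artist + NUL + album.
    The NUL separator keeps ("ab", "c") distinct from ("a", "bc")."""
    data = f"{artist}\0{album}".encode("utf-8")
    return hashlib.md5(data).hexdigest()

if __name__ == "__main__":
    print(cover_art_id("孟庭苇", "真经典"))
    print(cover_art_id("Foo", "Bar!"))
    print(cover_art_id("Foo", "Bar?"))
```

The result is always 32 ASCII hex characters regardless of the script or length of the tags, so it avoids both the disallowed-character and file name length issues raised earlier, at the cost of readable file names.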
Pretty nice and (quite) unbreakable solution. Sounds great to me.
I'm going to close this as a duplicate of bug #520516 because they have a common solution - a more robust naming scheme. Warren, I think we'll do something like your patch, but probably using MD5 to hash the name instead of URL-encoding it. Please see the notes/draft of a new media-art spec linked to on the other bug. *** This bug has been marked as a duplicate of 520516 ***