After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 530690 - Non-ASCII artists/albums' cover.jpg not recognized
Non-ASCII artists/albums' cover.jpg not recognized
Status: RESOLVED DUPLICATE of bug 520516
Product: banshee
Classification: Other
Component: Importing
git master
Other All
: Normal normal
: 1.x
Assigned To: Banshee Maintainers
Banshee Maintainers
: 542179 546172 556666 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2008-04-30 08:21 UTC by Magnus Zetterberg
Modified: 2009-01-13 18:52 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Warren Seine's patch as diff (606 bytes, patch)
2008-10-29 20:39 UTC, Andrew Conkling
needs-work Details | Review
Allow unicode characters to be used in cover art filenames (609 bytes, patch)
2008-11-14 22:35 UTC, John Millikin
rejected Details | Review
URl-encode cover filenames (1.34 KB, patch)
2008-11-16 17:12 UTC, Bas Zoetekouw
none Details | Review
URL-encode cover filenames (1.37 KB, patch)
2008-11-16 17:23 UTC, Bas Zoetekouw
rejected Details | Review
Digest concatenated artist/album to generate the cover art ID (2.05 KB, patch)
2008-11-19 23:00 UTC, John Millikin
rejected Details | Review

Description Magnus Zetterberg 2008-04-30 08:21:19 UTC
When trying to get cover arts for(in my case)swedish artists which have åäö letters in the artistname/albumname, these cover arts are not displayed by Banshee. 
When going to ~/.config/banshee/covers and manually adding them into this directory, you have to manually remove the ÅÄÖ/åäö letters from the filename for Banshee to accept and use the image files in the application and then using them when syncing with you Ipod.

I would like Banshee to accept filenames with these letters inside them to make it more painless to get 'non-american/english' named artists covers arts working.

Other information:
Comment 1 Magnus Zetterberg 2008-05-06 14:58:56 UTC
Anyone able to reproduce this problem or ever seen this sort of problem before?
Comment 2 Bartek 2008-06-01 20:09:55 UTC
(In reply to comment #1)
> Anyone able to reproduce this problem or ever seen this sort of problem before?
> 

Well, In my case it happens also with tags that contain japanese or chinese letters.
Comment 3 Matteo Settenvini 2008-06-18 17:09:22 UTC
I can confirm this for Japanese artists. Also happening for some artists that use a lot of umlauts like Motörhead, Queensrÿche and such.

Please support Spin̈al Tap along with UTF-8! :-)
Comment 4 Mikael Håkansson 2008-06-24 17:06:59 UTC
Can confirm that it does not work with åäö (swedish) with Banshee-1 on Hardy.

Would be nice if someone added support for it.
Comment 5 Thomas Pifer 2008-07-31 05:06:28 UTC
Can also confirm this with Banshee 1.2:  Even with the addition of scanning for cover art in music folders (cover.jpg, folder.jpg, etc.), it still fails.
Comment 6 Andrew Conkling 2008-07-31 12:24:52 UTC
*** Bug 542179 has been marked as a duplicate of this bug. ***
Comment 7 Arash Karimpoor 2008-08-03 08:51:19 UTC
For reproducing the same bug edit tags of some mp3 files this way:

1. Persian:
Album Artist: فرامرز اصلانی
Album: به یاد حافظ

2: Chinese:
Album Artist: 孟庭苇
Album: 真经典

3. English with non-ASCII characters:
Album Artist: Guns N' Roses
Album: Live Era: '87–'93

can't find cover.jpg in folder when named like examples. [openSUSE 11.0/Banshee 1.2]
Comment 8 Andrew Conkling 2008-08-04 18:41:09 UTC
*** Bug 546172 has been marked as a duplicate of this bug. ***
Comment 9 longhong 2008-08-05 02:15:47 UTC
I've the same problem here with Banshee1.2.0
Comment 10 Ivan Stetsenko 2008-09-18 11:25:58 UTC
I have the same problem with Banshee 1.2.1 and Russian album/artist names....
It's really frustrating to have a large collection of music without album covers...
Comment 11 Bertrand Lorentz 2008-10-17 12:15:11 UTC
*** Bug 556666 has been marked as a duplicate of this bug. ***
Comment 12 Bertrand Lorentz 2008-10-17 12:23:43 UTC
Here's what I found about this :
All cover art providers (Last.fm, covert.jpg, embedded, etc.) won't do anything if the track.ArtworkId property is null.
This property is null if either the artist name or the track title do not contain any alphanumeric ASCII character (a-z 0-9). (See the CovertArtSpec.CreateArtistAlbumId method).

So instead of dropping the non ASCII characters, maybe we should "normalize" or transliterate (ä -> ae) them ?
Comment 13 Warren Seine 2008-10-26 02:01:07 UTC
Problem solved by changing line 102 on Core/Banshee.Core/Banshee.Base/CoverArtSpec.cs from:
return Regex.Replace (part, @"[^A-Za-z0-9]*", "").ToLower ();
to:
return Regex.Replace (part, @" *", "").ToLower ();

I don't know why there is an "EscapePart" function here. Any modern filesystem can handle UTF-8 filenames. Banshee should write cover files named "<artist>-<album>.jpg" where <artist> and <album> are the exact copy of the tag. For those who have a filesystem that doesn't support "non-ascii" filenames, it should be possible to "normalize" it, but I'm not even sure that Banshee will run on such filesystem.
Comment 14 Warren Seine 2008-10-26 02:40:18 UTC
BTW, it doesn't break cover art download. Some artworks will be re-downloaded as the cover art filename has changed.

I'd like to hear more about that regular expression that Banshee use to generate the filename (TrackInfo.ArtworkId).
Comment 15 Andrew Conkling 2008-10-29 20:39:08 UTC
Created attachment 121602 [details] [review]
Warren Seine's patch as diff

Here's Warren's patch.

For me, it seems to "help", but at the same time I get some cover art that doesn't ever reload. I'm going to spend some time "flushing" my cache or reimporting my library's tracks to see if it's really a problem caused by this patch.
Comment 16 Warren Seine 2008-11-14 19:45:51 UTC
Any feedback? Is someone experiencing problems with Banshee writing files with Unicode file names? I'm running Banshee with this patch (with a 5000-song library) and haven't found any regression yet.

This bug is not a showstopper, but it should be easy to fix and I'm sure it affects many people.
Comment 17 Thomas Pifer 2008-11-14 20:26:05 UTC
Has the patch already been applied to the SVN version or does it need manually added and compiled?
Comment 18 Warren Seine 2008-11-14 21:12:07 UTC
The patch is one line change, so it can be applied to pretty much any recent version. But no, it's not yet in the trunk, otherwise this bug would probably be closed ^^.
Comment 19 John Millikin 2008-11-14 22:19:12 UTC
Most filesystems have a set of disallowed characters. In Linux this is "/", on Mac it's "/" and ":", and on Windows there's a whole bunch of them. The regex should detect and normalize/remove these.

Also, instead of removing disallowed characters, they should be converted to an underscore or dash or something. I know this sounds crazy, but I have albums in my library that differ only in the addition of a character that is not allowed. So the albums "album" and "album?" should be converted to "album" and "album_".
Comment 20 John Millikin 2008-11-14 22:35:46 UTC
Created attachment 122700 [details] [review]
Allow unicode characters to be used in cover art filenames

Escape any of these characters: /?<>:\*|" to an underscore.

In EscapePart, just above the regex, there is code to truncate the part if it contains a left paren; what is the reasoning for this? It seems unnecessary.
Comment 21 Thomas Pifer 2008-11-15 02:06:13 UTC
Just tried John's patch and it seems to be working as expected, with the covers to my Japanese albums being downloaded and used. (They're the only non-ASCII albums I have.)
Comment 22 Warren Seine 2008-11-15 05:29:33 UTC
I agree that the solution is better than mine. However it's not perfect as two albums may be escaped in the same way (if artist "Foo" named its albums "Bar!" and "Bar?"), but I'm pretty sure this won't happen in real life :)

Please commit this patch to SVN.
Comment 23 Bas Zoetekouw 2008-11-16 14:53:28 UTC
Never say never.  IMO, it would be much better to just encode the filenames in a standard way, e.g., base-64, rfc1522 mailheader-encoded, url-encoded, or something like that, that will map each artist/album to a unique (7-bit safe) filename.
Comment 24 Bas Zoetekouw 2008-11-16 17:12:05 UTC
Created attachment 122801 [details] [review]
URl-encode cover filenames

Attached a patch to properly url-encode the artist ad album names in album art filenames.  Works perfectly here.

Dont forget to run automake after applying the patch.
Comment 25 Bas Zoetekouw 2008-11-16 17:23:00 UTC
Created attachment 122802 [details] [review]
URL-encode cover filenames

Updated patch; now also encode '-'.
Comment 26 Thomas Pifer 2008-11-19 01:47:06 UTC
Bas, both of your patches are causing an error when running make:

./Banshee.Base/CoverArtSpec.cs(105,32): error CS0103: The name `HttpUtility' does not exist in the current context
Comment 27 Warren Seine 2008-11-19 02:31:25 UTC
@Thomas, you're wrong. The patch works. Did you "automake" as Bas suggested?

I really don't like the URL-encode style. It produces ugly file names, even if it works. Another solution might be to use unique hashes and index it inside a stuctured document (XML). That way, it will work perfectly since we could write the exact tag in the file, but that would be a big change in the way Banshee handles cover art.

As expected, my solution did fail when trying to match "AC/DC". The solution is obsolete.
Comment 28 John Millikin 2008-11-19 02:44:11 UTC
Is it possible for HttpUtility to selectively URL-encode only certain characters? If encoding is only applied to invalid characters, it should allow readable filenames while maintaining uniqueness in the case of album names differing only by invalid characters.
Comment 29 Bas Zoetekouw 2008-11-19 12:53:25 UTC
| ./Banshee.Base/CoverArtSpec.cs(105,32): error CS0103: The name `HttpUtility' does not exist in the current context

Thomas, don't forget to run automake after applying the patch.
Comment 30 Bas Zoetekouw 2008-11-19 12:58:45 UTC
Warren:  you don't like the "ugly" filenames, but you do like totally unrecognizable filenames with just a hash number?  That really doesn't make much sense.

Or maybe you misunderstand what UrlEncode does:  it only encodes non-text characters, i.e., "Björk - Human Behaviour" becomes "Bj%c3%b6rk-Human+Behaviour".  I agree, not very pretty, but acceptable IMO.  At least it allows you to manually inspect the directory.
Comment 31 Warren Seine 2008-11-19 14:52:59 UTC
I didn't say I liked it either :) It's just an "easy" way to bypass the file system problem. A grep in the file is all you need to get the cover.

Usually, UrlEncode transforms spaces to "%20". Based on your patch, I guess it's the expected behavior. However, if your solution gives something like "Bj%c3%b6rk-Human+Behaviour" it's fine. But what should I say about my Japanese artists with Japanese trackname?
Comment 32 Wouter Bolsterlee (uws) 2008-11-19 16:28:41 UTC
Come on, it's not like the cache directory is designed to be manipulated by users directly anyway. Who cares about the filenames, as long as they work fine? Bas' patch looks good to me.
Comment 33 Warren Seine 2008-11-19 16:46:38 UTC
@Wouter: Yes, you're right. But what about file name length too? UTF-8 URL-encoded characters are 9 ASCII characters (bytes, usually). I have a few Japanese anime soundtracks that won't fit in the 255 allowed bytes of the ext3 FS.
Comment 34 Matteo Settenvini 2008-11-19 18:16:04 UTC
Why don't we simply use the albumid which is in the database for covers?
If the filename isn't that important...
Comment 35 Bas Zoetekouw 2008-11-19 18:19:10 UTC
That would be nice, but it would require a bit of rework of the cover logic, as currently ass functions are called with (artist,album) names.  

If we don't require readable filenames, it would be easier to just use some kind of checksum.  A simple md5 of "Artist\0Album" or so would do fine.
Comment 36 Warren Seine 2008-11-19 18:31:38 UTC
@Bas: It would require to compute the hash each time the track is played, I'm not sure we can accept it.
Comment 37 Bas Zoetekouw 2008-11-19 18:32:37 UTC
Uh, why is that a problem?
Comment 38 Warren Seine 2008-11-19 19:13:17 UTC
Because it's a CPU-consuming task? Maybe it's not. In that case, nevermind.
Comment 39 John Millikin 2008-11-19 23:00:57 UTC
Created attachment 123083 [details] [review]
Digest concatenated artist/album to generate the cover art ID

Here's an implementation of the cover art ID generator that uses hex digests. Provides reasonably unique IDs at the cost of human readability.

> @Bas: It would require to compute the hash each time the track is played, I'm
> not sure we can accept it.

Digesting such short strings is almost instant, even on slow / small systems.
Comment 40 Warren Seine 2008-11-19 23:24:52 UTC
Pretty nice and (quite) unbreakable solution. Sounds great to me.
Comment 41 Gabriel Burt 2009-01-13 18:52:24 UTC
I'm going to close this as a duplicate of bug #520516 because they have a common solution - a more robust naming scheme.  Warren, I think we'll do something like your patch, but probably using MD5 to hash the name instead of urlencode it.  Please see the notes/draft of a new media-art spec linked to on the other bug.

*** This bug has been marked as a duplicate of 520516 ***