After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 458224 - tracks with special-character-only metadata stored on disk incorrectly
tracks with special-character-only metadata stored on disk incorrectly
Status: RESOLVED FIXED
Product: banshee
Classification: Other
Component: Metadata
git master
Other All
: Normal minor
: 2.x
Assigned To: Aaron Bockover
Banshee Maintainers
: 560664 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2007-07-19 08:26 UTC by Adrian Sampson
Modified: 2009-02-12 22:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
replace special-char strings with _ instead of deleting them (526 bytes, patch)
2007-07-19 08:29 UTC, Adrian Sampson
none Details | Review
Updated patch for SVN (538 bytes, patch)
2008-02-13 06:29 UTC, Andrew Conkling
none Details | Review
Allow more characters, replace invalid characters with underscore (970 bytes, patch)
2009-01-15 06:44 UTC, John Millikin
none Details | Review
Allow even more characters, replace invalid characters with underscore (969 bytes, patch)
2009-01-15 20:15 UTC, Dave Steinberg
none Details | Review
Allow more characters, replace invalid characters with underscore (v3) (1.06 KB, patch)
2009-02-01 03:31 UTC, John Millikin
none Details | Review
Allow more characters, replace invalid characters with underscore (v4) (1.62 KB, patch)
2009-02-04 04:27 UTC, John Millikin
reviewed Details | Review
Allow more characters, replace invalid characters with underscore (v5) (1.53 KB, patch)
2009-02-11 04:21 UTC, John Millikin
committed Details | Review

Description Adrian Sampson 2007-07-19 08:26:12 UTC
Please describe the problem:
Artist and album names consisting only of characters removed by Banshee (the characters "\:'~`!@#$%^&*_-+|?/><[] according to FileNamePattern.cs) are stored in incorrect locations when using MoveOnInfoSave or "copy files to music folder when importing". The only band/record affected by this that I know of is !!! and their self-titled debut LP.

Steps to reproduce:
1. Obtain media files whose artist or album contain only special characters that are deleted by Banshee when storing files (for instance, !!!).
2. Set Banshee to use a folder hierarchy wherein at least one directory name consists only of a single metadata field (such as "Artist/Album" or "Album").
3. Set Banshee to "copy files to music folder when importing".
3. Import the music.

Actual results:
A file that should be stored in a directory called path/x where x is a directory name made up of the above characters is instead placed directly in path. For instance, when using the "Artist/Album" hierarchy option, tracks from the album "Myth Takes" by !!! are placed in library/Myth Takes. Worse, tracks from !!! by !!! are placed directly in library.

Expected results:
I expect the tracks to be filed in a path that differentiates them from other artists or albums (for example, something like library/!!!/!!!/trackname).

Does this happen every time?
Yes.

Other information:
One possible solution is to use iTunes' style of escaping: replace special characters with _ (or something else benign) instead of deleting them.
Comment 1 Adrian Sampson 2007-07-19 08:29:44 UTC
Created attachment 91962 [details] [review]
replace special-char strings with _ instead of deleting them

A tiny change replaces "!!!" with "_" in the example case.
Comment 2 Andrew Conkling 2008-02-07 02:02:13 UTC
I'll try to get your patch up to speed for SVN.
Comment 3 Andrew Conkling 2008-02-13 06:29:09 UTC
Created attachment 105121 [details] [review]
Updated patch for SVN

OK, here's the patch. I'll solicit a speedy review from the mailing list.
Comment 4 Ruben Vermeersch 2008-02-13 14:04:31 UTC
Do consider that users with /apps/banshee/library/move_on_info_save will have all their files moved when this is applied. Even worse, due to a bug (is it a bug?), each time a file is played, it will move, leading to some songs of an album named with String.Empty, others with _. Perhaps we should look into that first before changing the FileNamePattern stuff once again (causing me great library maintenance pain).
Comment 5 Bertrand Lorentz 2008-08-15 11:14:09 UTC
The other potential issue I see with the patch is that, for example, if "!!!" ever release an album called "$$$", it will end up in the same directory as "!!!".
Crazy, I know, but everything's possible...
Sadly, I don't have a magic solution for that.

Changing version as the issue and the patch apply to trunk.
Comment 6 Andrew Conkling 2008-09-17 15:18:35 UTC
(In reply to comment #5)
> The other potential issue I see with the patch is that, for example, if "!!!"
> ever release an album called "$$$", it will end up in the same directory as
> "!!!".
> Crazy, I know, but everything's possible...
> Sadly, I don't have a magic solution for that.

Would it be terrible for those to be put into the same folder? a) It's unlikely and b) the same thing would happen if you have multiple releases of "Various Artists / <insert some common name here>".

Meanwhile, there is a valid (but still obscure :P) case for what the patch does do, and it's not being addressed. And I think Ruben's concerns in comment 4 are handled differently now (as of 1.2, I believe).
Comment 7 Gabriel Burt 2009-01-13 21:06:44 UTC
Really not sure what I think of this (besides that it sucks we have to deal w/ the reality of FAT filesystems (well, at least not the original w/ 8-char filename limits :)).  Aaron? 
Comment 8 John Millikin 2009-01-15 06:44:28 UTC
Created attachment 126487 [details] [review]
Allow more characters, replace invalid characters with underscore

This patch provides the underscore replacement behavior of Andrew's, and reduces the set of invalid characters to that of Fat32.
Comment 9 Dave Steinberg 2009-01-15 20:15:28 UTC
Created attachment 126532 [details] [review]
Allow even more characters, replace invalid characters with underscore

A tiny modification to John's patch: I removed ' from invalid_path_characters. John confirmed on the mailing list that it's not invalid on FAT32, and that he had just left it in by mistake.  Thanks John!

This really seems the best approach.  Is there any good reason to remove common characters that are supported on every modern file system (and even FAT32, as well!)?   ;-)
Comment 10 Gabriel Burt 2009-01-16 02:35:33 UTC
I'm wondering if the invalid list came from what's illegal in FAT16?  Can somebody do the research (and ideally include links to definitive resources) to see if that's the case?

Do some portable devices have FAT16 drives?

We could be smarter and use HAL or gvfs to see what type of file system we're writing to, and only use the strict/fat16 naming on actual FAT16 partitions.
Comment 11 John Millikin 2009-01-16 03:03:14 UTC
> I'm wondering if the invalid list came from what's illegal in FAT16?  Can
> somebody do the research (and ideally include links to definitive resources) to
> see if that's the case?

FAT16 has more illegal characters, but still many less than the current list. Also, some characters not allowed in FAT16 are not considered invalid by Banshee. Wikipedia has lists of which characters are supported by various Windows filesystems:

http://en.wikipedia.org/wiki/NTFS
http://en.wikipedia.org/wiki/8.3_filename

> Do some portable devices have FAT16 drives?

FAT16 is popular in devices that use SD cards and such, I think. A Microsoft support page states that periods are not allowed in FAT16[1], so I don't know how well Banshee would work on such devices.

[1] http://support.microsoft.com/kb/120138
Comment 12 John Millikin 2009-01-30 23:11:37 UTC
> <gabaug> please make a list of ones you can justify, ones you can't on the bu

Out of the original escape list, characters that ought to be escaped. NTFS and FAT32 have the same set of illegal characters:

" (fat32)
\ (fat32)
: (fat32, hfs)
* (fat32)
| (fat32)
? (fat32)
/ (fat32, ext3)
> (fat32)
< (fat32)

Additional characters that are invalid in FAT16
+ , . ; = [ ]

Note that this includes periods, precluding filename extensions from being used in FAT16 filesystems. According to the mailing list[1], extensions are required to import media, so Banshee would likely not function properly on a FAT16 filesystem anyway.

Furthermore, the following characters are allowed by Banshee but illegal in FAT16:
, . ; =

Characters Banshee disallows, but valid on mainstream filesystems:
' ~ ` ! @ # $ % ^ & _ -

Characters Banshee disallows, but valid on mainstream non-FAT16 filesystems:
' ~ ` ! @ # $ % ^ & _ - + [ ]

Sources for what characters are valid:
FAT16: http://support.microsoft.com/kb/120138
FAT32: http://msdn.microsoft.com/en-us/library/dd317748(VS.85).aspx
NTFS: http://msdn.microsoft.com/en-us/library/dd317748(VS.85).aspx
HFS: http://www.comentum.com/File-Systems-HFS-FAT-UFS.html

[1] http://mail.gnome.org/archives/banshee-list/2009-January/msg00026.html
Comment 13 John Millikin 2009-02-01 03:31:55 UTC
Created attachment 127680 [details] [review]
Allow more characters, replace invalid characters with underscore (v3)

Of course, remembering to attach the patch would help.
Comment 14 Gabriel Burt 2009-02-04 00:56:34 UTC
From mcs/class/corlib/System.IO/Path.cs, but this doesn't account for having FAT32 mounted under Linux...perhaps we should just gank the Windows array?  Not sure what all those chars are, would think could just have \ / : * ? " < > |

        public static char[] GetInvalidFileNameChars ()
        {
            // return a new array as we do not want anyone to be able to change the values
            if (Environment.IsRunningOnWindows) {
                return new char [41] { '\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07',
                    '\x08', '\x09', '\x0A', '\x0B', '\x0C', '\x0D', '\x0E', '\x0F', '\x10', '\x11', '\x12',
                    '\x13', '\x14', '\x15', '\x16', '\x17', '\x18', '\x19', '\x1A', '\x1B', '\x1C', '\x1D',
                    '\x1E', '\x1F', '\x22', '\x3C', '\x3E', '\x7C', ':', '*', '?', '\\', '/' };
            } else {
                return new char [2] { '\x00', '/' };
            }
        }

        public static char[] GetInvalidPathChars ()
        {
            // return a new array as we do not want anyone to be able to change the values
            if (Environment.IsRunningOnWindows) {
                return new char [36] { '\x22', '\x3C', '\x3E', '\x7C', '\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07',
                    '\x08', '\x09', '\x0A', '\x0B', '\x0C', '\x0D', '\x0E', '\x0F', '\x10', '\x11', '\x12',
                    '\x13', '\x14', '\x15', '\x16', '\x17', '\x18', '\x19', '\x1A', '\x1B', '\x1C', '\x1D',
                    '\x1E', '\x1F' };
            } else {
                return new char [1] { '\x00' };
            }
        }
Comment 15 John Millikin 2009-02-04 04:27:36 UTC
Created attachment 127902 [details] [review]
Allow more characters, replace invalid characters with underscore (v4)

Adds the control characters to the set of invalid characters. Uses the set of characters posted above, in System.IO/Path.cs
Comment 16 Gabriel Burt 2009-02-09 02:17:53 UTC
This looks fine to me.  Aaron, thoughts?
Comment 17 John Millikin 2009-02-10 19:51:49 UTC
*** Bug 560664 has been marked as a duplicate of this bug. ***
Comment 18 John Millikin 2009-02-11 04:21:45 UTC
Created attachment 128443 [details] [review]
Allow more characters, replace invalid characters with underscore (v5)

Updated for trunk, where filename escaping was moved into `Hyena.StringUtil`.
Comment 19 Gabriel Burt 2009-02-12 22:56:26 UTC
Thanks a lot John, committed to trunk.  I also added some tests, handled null input, and moved the Regex stuff to static-init rather than lazy load.