GNOME Bugzilla – Bug 593138
Taglibsharp: textfields may include extra characters after terminator.
Last modified: 2010-03-20 02:55:43 UTC
When reading an id3v2 tag encoded in utf16, taglibsharp eventually decodes the byte-string into text using ByteVector.ToStrings(StringType,int,int). This function supports decoding multiple strings from one bytevector. Some files will include an extraneous byte after the two terminating 0, 0 bytes. Taglibsharp interprets this as a new string, and attempts to decode it. However, decoding a single byte always results in 0xfffd - the unknown unicode character. The ToStrings method is called via an overload from Id3v2.TextInformationFrame.ParseRawData() (at "data.ToStrings (encoding, 1)"). After the strings are read, a correction occurs: // Bad tags may have one or more nul characters at the // end of a string, resulting in empty strings at the // end of the FieldList. Strip them off. The code then proceeds to drop empty strings. However, for a multi-byte encoding such as utf16, the extra bytes may be odd in number (a single, byte with value 84 is what I'm seeing - not sure if the extra byte's value means anything), which causes the extra strings to be non-empty but consist solely of the single character 0xfffd. The likely cause is a resized or rewritten tag (I can't otherwise imagine a good reason for storing a utf16 tag in an odd number of bytes). The becomes is visible when (for instance) the .Title property of Id3v2.Tag returns a title ending in "; ?" (where ? is char 0xfffd). Title returns the .ToString representation of a TextInformationFrame (TIT2); this .ToString simply performs 'string.Join ("; ", Text)', so the extra string consisting solely of character 0xfffd is added to the output.
I'm also seeing some files seem to have an extra byte even without the 0,0 terminator; I'll see if I can figure out a pattern. In any case, the result is that the taglibsharp reads certain tags "incorrectly" - I compared with VLC player and foobar2000, and both read these tags as expected (i.e. without extra characters such as 0xfffd).
This problems seems to _only_ occur with extra bytes with the value 84. Does this mean anything?
This looks like an unsynchronization error in taglibsharp: The extra byte being 84 is coincidental; what's happening is that the next frame is starting one byte earlier than taglibsharp expects. When this occurs, the extra byte has the value of the first byte of the next frameId; most frames are text frames and start with 'T' - which has value 84. Taglibsharp, in Id3v2.Tag.Parse(ByteVector), corrects for unsynchronization on the entire tag if the header unsynchronization flag is set. However, this flag is merely indicative; and in any case each individual frame's header contains a frame-size field which MUST include the length of the frame AFTER unsynchronization. So, by removing 0x00 bytes after 0xFF byted _before_ splitting a tag into frames, taglibsharp is changing the length of frames but failing to account for that when decoding the frames. Thus, utf16 encoded text frames that start with a BOM (i.e. which always contain 0xfffe, and are thus are always at least 1 byte longer when unsynchronized) will have a header indicating frame size that's (at least) 1 byte greater that the actual frame size post-de-unsynchronization. A potential fix might be to simply remove the de-unsynchronization step from Id3v2.Tag.Parse(ByteVector) - each frame can be seperately de-unsynchronized. I'm not sure about the validity of mp3's in the wild; Ideally, if the overall tag header indicates that all frames are unsynchronized, this is simply in addition to the flag that each frame must set - i.e. it's redundant - so simply deleting that code from Id3v2.Tag.Parse(ByteVector) is sufficient. When the overall tag header and the separate frame headers conflict however, a choice needs to be made; I'm not sure which to listen to in that case.
In limited testing, removing the lines: if ((header.Flags & HeaderFlags.Unsynchronisation) != 0) SynchData.ResynchByteVector (data); and changing the call to FrameFactory.CreateFrame to frame = FrameFactory.CreateFrame (data, ref frame_data_position, header.MajorVersion, false); within Id3v2.Tag.Parse(ByteVector data) seems to fix these problems and not introduce too many new ones. Now I hope I never need to think of the term post-de-unsynchronization again ;-).
Hey Eamon, thanks for all this investigation! (In reply to comment #4) > In limited testing, removing the lines: > > > if ((header.Flags & HeaderFlags.Unsynchronisation) != 0) > SynchData.ResynchByteVector (data); > > and changing the call to FrameFactory.CreateFrame to > frame = FrameFactory.CreateFrame (data, ref frame_data_position, > header.MajorVersion, false); > > > within Id3v2.Tag.Parse(ByteVector data) It's better to post modifications as patch files. See http://banshee-project.org/contribute/write-code/ for more info. (Although, take in account that taglib-sharp is in Mono SVN, so no git required.) > seems to fix these problems Please, include in your patch new unit tests that will pass with your patch and fail without it. > and not introduce too many new ones. The patch will likely not go in unless it doesn't introduce any regressions. If you know already what regressions is your patch introducing, please write unit tests for them (if they are not already there) and mark them as NotWorking on the patch, to be able to identify them. Further versions of the patch should hopefully fix this regressions so the NotWorking category can be removed. Thanks!
Moving to the taglib-sharp product, but all of Andrés' recommendations still apply ;)
I'll see what I can do on the unit-testing front, but my hobby music-project is on a low burn right now... I'm running VS.NET on windows, not monodevelop on mono; getting the unit tests to run is just an extra hurdle I haven't taken yet. (If anyone happens to know of a handy automatic way of keeping monodevelop+VS.NET projects in sync, that would help)!
(In reply to comment #7) > I'll see what I can do on the unit-testing front, but my hobby music-project is > on a low burn right now... I'm running VS.NET on windows, not monodevelop on > mono; getting the unit tests to run is just an extra hurdle I haven't taken Ok, I would recommend to do this progressively. A first step would be to post the patch without the unit tests (the patch the includes the modifications of the source code that you were mentioning) and attaching a small version of the file that helps reproducing the problem. From that point, somebody may help you transform the file into a unit test if you don't find the time. > yet. (If anyone happens to know of a handy automatic way of keeping > monodevelop+VS.NET projects in sync, that would help)! Why don't you just use VS.NET projects? MonoDevelop supports them. Thanks
Eamon: ping. TagLibSharp has had some fixes since this bug was filed, can you check with the last version? If you don't have time to investigate this further (either to test or to create the proper unit tests), at least posting the offending file would be appreciated (if it's small enough and doesn't have copyright restricions; otherwise you can send it by email) so others can look at it. Otherwise this bug would be closed as NORESPONSE. Thanks.
No, the bug is still present in SVN. It occurs for all files for which HeaderFlags.Unsynchronisation is set (for the entire tag, not just field-wise) and unsynchronization is actually necessary. Since unsynchronization is necessary for UTF-16 text; any full-tag unsynchronized tag with utf-16 content will be garbled (frame boundaries will break). As long as the two lines if ((header.Flags & HeaderFlags.Unsynchronisation) != 0) SynchData.ResynchByteVector (data); are in Tag.cs, this code cannot work - once the byte vector has been resynched, its length has changed, and that means you in general cannot correctly compute frame boundaries (well, if resynching is an invertible function, it might be possible albeit hard to do). This is due to a change between id3v2.3 and id3v2.4 (see http://www.id3.org/id3v2.4.0-changes): Unsynchronisation [S:6.1] is done on frame level, instead of on tag level, making it easier to skip frames, increasing the streamability of the tag. The unsynchronisation flag in the header [S:3.1] indicates if all frames has been unsynchronized, while the new unsynchronisation flag in the frame header [S:4.1.2] indicates unsynchronisation. To avoid false synchronisations in the frame header the size description and flag field has been rewritten [S:4]. Resynchronisation of the complete tag when the unsynchronisation flag in the tag header is set might result in a corrupt tag. That last sentence is the relevant bit here; taglibsharp is indeed resynchronizing the entire tag and that results in a corrupt tag in common cases for id3v2.4 (but it's required for id2v2.3). Is that the info you were looking for? In short, it looks like this is idv2 version dependent; there's no code as far as I can see that checks MajorVersion (or the underlying field) to switch on this difference, so currently taglibsharp will incorrectly read/write id3v2.4 fields whose frames contain deunsynchronized data and where the (useless) overall tag unsynchronization option is set. Essentially, 2.4 never deunsynchronizes the entire tag, so that flag is redundant (each frame is required to have a unsynchronization flag anyhow).
Note that this means that my earlier suggestion to simply remove the lines is too simplistic; v2.4 doesn't need tag-level unsynchronization; v2.3 does.
Did the above comment (about id3v2.3 vs. id3v2.4) reach the right ears? I'm setting the status from NEEDINFO back to UNCONFIRMED, which I hope is the right thing to do?
Would the following function acceptably: if ((header.Flags & HeaderFlags.Unsynchronisation & (header.MajorVersion <=3)) != 0) SynchData.ResynchByteVector (data); This seems to be the way that Taglib accomplishes it. Of course I may not know what I'm talking about.
(In reply to comment #13) > Would the following function acceptably: > > if ((header.Flags & HeaderFlags.Unsynchronisation & (header.MajorVersion > <=3)) != 0) > SynchData.ResynchByteVector (data); > > > This seems to be the way that Taglib accomplishes it. Of course I may not know > what I'm talking about. Sorry, it should have been: if (((header.Flags & HeaderFlags.Unsynchronisation) !=0) && (header.MajorVersion <= 3)) SynchData.ResynchByteVector(data);
One more comment. The above change HAS corrected a bug I was having with various mp3 files that caused Taglib-sharp to drop all remaining frames after encountering some MCDI frames and other various ones. Applications that used Taglib however did not exhibit this problem. Thanks to Eamon for bringing up the initial symptoms.
Created attachment 154813 [details] [review] Patch for the problem I attached a slightly more robust patch for the issue. Taglibsharp does support per-frame unsynchronization, so if you disable unsynchronization on a Tag level for 2.4, you need to enable it at a frame level. Fortunately, that's just a one line change later on in the same function when it calls the frame factory method.
Created attachment 154834 [details] Mp3 file with correct 2.4 id3 tag which is read correctly by foobar2000 + itunes This file demonstrates the problem. It's a variant of the normal sample_v2_only file, modified to contain a slightly different tag (written by foobar2000) which exhibits the issue.
Created attachment 154841 [details] TestCase for id3v 2.4 correct reading. Attached a test case which tests whether id3v2.4 tags are correctly read/written by referring to earlier attached .mp3 file This test is simply a modification of the current id3 2.3 test. With the patch, reading works. Writing fails: currently the writing code just like the reading code unsynchronizes the whole tag. This means tags written by taglibsharp can be read by taglibsharp (without the patch), but they might not survive other id3 software. With the patch, the reading of correct frames is fixed, but that means tags can't be round-tripped anymore. Fixing the writing code is also trivial, but there's a compatibility issue here: if people used taglibsharp previously to write 2.4 tags, those are now corrupt. I'm not sure if other software can deal with those corrupt tags currently, but certainly post-patch taglibsharp cannot. Incidentally, although current tags are corrupted, this usually doesn't actually result in any big problems since frames were unsynchronized previously (so this only really changes frame headers) - and those are mostly synch-safe (coincidentally).
To summarize: - I wrote a patch to correct the id3v 2.4 tag reading. The patch doesn't address writing; Taglibsharp still corrupts(!) id3 2.4 tags it writes (though this usually leaves part of the tag valid). - I attached an mp3 file which is read incorrectly by taglib before the patch and correctly after the patch - I attached a testcase which verifies this. The testcase still fails since with fixed reading, id3 2.4 tags cannot be roundtripped. Id3 v2.4 tag writing should be fixed too (which I'll be glad to do). Ideally taglibsharp should be able to read the old-style corrupt tags in addition to standard tags, though I'm unsure of how best to do this. Implementing backwards-compatibility for corrupt tags should probably be a different case, though.
One last detail, then I'm off for today; I just checked, id3 v2.4 tags written by taglibsharp indeed are corrupt and not completely read by either foobar2000 or itunes. However, on the test-file I used the first performer and title survived (being at the start of the tag, these have the best chance of surviving), so I'm not sure how noticable corruption is right now.
Created attachment 154847 [details] [review] Patch that fixed both reading + writing. This patch includes the previous reading-only patch and also fixing tag-writing. Specifically, the Render method was altered to check for Tag version in the unsynchronisation logic. If it's version 2.3, then the full tag is unsynchronized (current behaviour). If it's version 2.4, then instead each frame is unsynchronized (i.e. frame's unsynch flag is set and then rendered, tag isn't unsynchronized). This patch passes the aforementioned unit tests and the resulting file is readable in itunes and foobar2000. Without the patch, the unit test fails and the resulting file is only partially readable in itunes and foobar2000. (itunes skips most fields but extracts performer+title correctly, foobar reads most fields correctly, but original artist is filled with gibberish and a few fields are missing.)
Eamon and Chris, thanks for your awesome work. That being said, the last detail to get this committed would be to have the 3 patches in 1, how? Just making sure that you transform the testcase into a unit test that is added to the test suite of TagLibSharp (including also the binary mp3 file to be able to launch the unit test).
- The sample mp3 is tiny and can be dropped into the samples directory together with the rest of the sample mp3 and other files. It's loaded by the unit test, so without it, the unit test will always fail. That can be checked in independently of the rest. - The main patch can be applied to taglib-sharp/src/TagLib/Id3v2/Tag.cs. The modifications are minor and only in that file; that too can be checked in independently. - The unit test is in the standard format for Taglib-sharp as-is (e.g. it's marked as an nunit test-fixture and implements the methods of IFormatTest). You can just drop it (Id3V24FormatTest.cs) next to the existing test (Id3V2FormatTest.cs). The unit test can be checked in independently of the main patch and the sample mp3; however, it will fail unless you also include them. If you want everything as one patch, well, in what format do you expect that? Incidentally, there's a few other (mostly trivial) things I wouldn't mind addressing (silverlight support, a few file-dependent performance issues, an empty picture frame bug, easier .net 4 + VS.NET project files) but right now it's just too much hassle untangling these - if there were a different way to submit cohesive patch-sets more easily, that'd be great.
(In reply to comment #23) > If you want everything as one patch, well, in what format do you expect that? Using the Subversion command line tool, you can just type "svn diff > patch.diff" in the base directory, and it will generate a diff for all the files. (You can also use visual tools for this, such as TortoiseSVN->Create Patch option.) > Incidentally, there's a few other (mostly trivial) things I wouldn't mind > addressing (silverlight support, a few file-dependent performance issues, an > empty picture frame bug, easier .net 4 + VS.NET project files) Please file different bugs for these other issues. Thanks!
(In reply to comment #24) > (In reply to comment #23) > > If you want everything as one patch, well, in what format do you expect that? > > Using the Subversion command line tool, you can just type "svn diff > > patch.diff" in the base directory, Oh, and don't forget to run "svn add path/to/new/file.cs" commands for new files before using "svn diff".
Created attachment 154860 [details] [review] Above three patches rolled into one. This patch includes the mp3 sample, the unit test verifying it, and the new read/write tweaks to fix id3v2.4 support. Some notes: - I'm not sure what version of monodevelop was used to make the project files, but when I loaded em with 2.6.1 on windows, I'm missing gnome-vfs for some not-relevant-now test project, but also saving the project to include a reference to the new csharp unit-test file essentially rewrote the entire .mdp file. So, I didn't include that file in the patch, you'll need to include the unit test manually. - The project file references nunit 2.2. I'm using 2.5.3; the new test won't compile on nunit 2.4 and earlier, so that might be a problem. Unfortunately nunit.org is down at the moment; in any case, the unit tests are really simple: it should hopefully be easy to fix (and in any case, an nunit update might be nice anyhow...)
(In reply to comment #26) > Created an attachment (id=154860) [details] [review] > Above three patches rolled into one. Great, looking good! > This patch includes the mp3 sample, the unit test verifying it, and the new > read/write tweaks to fix id3v2.4 support. > > Some notes: > - I'm not sure what version of monodevelop was used to make the project files, > but when I loaded em with 2.6.1 on windows, I'm missing gnome-vfs for some > not-relevant-now test project, but also saving the project to include a > reference to the new csharp unit-test file essentially rewrote the entire .mdp > file. So, I didn't include that file in the patch, you'll need to include the > unit test manually. No worries, better to include the unit-test file in the Makefile.am in the dir tests. > - The project file references nunit 2.2. I'm using 2.5.3; the new test won't > compile on nunit 2.4 and earlier, so that might be a problem. Unfortunately > nunit.org is down at the moment; in any case, the unit tests are really simple: > it should hopefully be easy to fix (and in any case, an nunit update might be > nice anyhow...) Yes, updating NUnit is fine, but not to the latest version which is almost not packaged anywhere. I would say the best to use right now is 2.4.8, so please can you update the patch to use the API of that version. While you're at it, as well: + public class Id3V24FormatTest : IFormatTest + { + private static string sample24unsynchronization_file... You say that you used tabs in some places and 4 spaces in others. Please unify this (Banshee's source code is 4 spaces, but it seems taglibsharp sources are tabs; but its tests are 4 spaces, sigh). + Assert.AreEqual("MP3 title unicode (ኢትዮጵያ)", file.Tag.Title); Can you use unicode scape characters instead of placing them directly in the file to avoid encoding issues? + public void TestCorruptionResistance() { This method is empty, can you remove it or fill it or say why it needs to be empty in a comment?
s/You say/You see/
TestCorruptionResistance is required by interface - the whole interface is a technically unnecessary; but that's what other tests did so I thought I'd follow that convention (presumably it codifies tests you'd actually like to have). I'll see what I can on the others points; but this project's on the back burner again for a while so that might take a while.
I just spent half a day debugging a pesky issue on an ID3v2.4 tagged mp3 file which caused all frames after APIC to be dropped. Apparently early resynchronisation is to blame, should have searched BGO first :-/ Confirming and assigning the bug to myself, will check the patches later this week.
As far as I can tell, the patches work and the error lies in a detail of the id3 v2.4 spec (relevant sections linked above). I'm sorry I don't have more time to spend on this the coming weeks; Andres mentioned a few issues he'd like to address before merging. The most crucial of these updates seem to be the unit tests which I wrote for nunit 2.5+ which is not part of mono yet. Cheers, Eamon.
Oh, and the unit test source file isn't included in the monodevelop project by the patch. When I did that on my windows system the resultant patch completely rewrote the .mdp file and also updated some details about references - since I didn't understand what was going on there (and don't normally use monodevelop), I left that out.
Don't worry Eamon, I'll take care of the unit tests. Thanks for all the research!
OK, I've got it up and running in my local git branch; as I suspected it also fixes the issue from comment 30. I should get a Mono SVN account later this week, will commit the patch as soon as I receive it.
Oh, one more minor detail would be nice; could you add the various tmpwrite.* files in the sample directory (outputs of unit tests) to svn:ignore? That cleans things up a bit.
Committed, thanks Eamon! ------------------------------------------------------------------------ r153926 | akojev | 2010-03-20 13:31:42 +1100 (Sat, 20 Mar 2010) | 8 lines 2010-03-20 Alexander Kojevnikov <alexander@kojevnikov.com> * src/TagLib/Id3v2/Tag.cs: * tests/Makefile.am: * tests/fixtures/TagLib.Tests.FileFormats/Id3V24FormatTest.cs: * tests/samples/sample_v2_4_unsynch.mp3: * tests/tests.csproj: Patch and unit tests from Eamon Nerbonne fixing ID3v2 unsyncing (bgo#593138)
(In reply to comment #35) > Oh, one more minor detail would be nice; could you add the various tmpwrite.* > files in the sample directory (outputs of unit tests) to svn:ignore? That > cleans things up a bit. Fixed in this commit: ------------------------------------------------------------------------ r153927 | akojev | 2010-03-20 13:53:58 +1100 (Sat, 20 Mar 2010) | 10 lines 2010-03-20 Alexander Kojevnikov <alexander@kojevnikov.com> * src: * docs: * tests: * tests/samples: * .gitignore: * examples: Add svn:ignore and .gitignore