GNOME Bugzilla – Bug 729113
id3demux: Produces text/html output caps and fails
Last modified: 2014-04-29 13:49:23 UTC
I have an MP3 file that doesn't play in GStreamer. It starts with an ID3 tag, but id3demux sets its output caps to text/html, and then the pipeline fails.
Are you just bragging, or are you going to show us the file? ;)
Argh, the file was too big and the attachment was silently ignored.
Note: I'll remove it as soon as we have a fix, since I have no idea where the file comes from or what its copyright status is. http://people.collabora.com/~nicolas/a.mp3
$ gst-launch-1.0 filesrc location= /home/tpm/samples/misc/729113-typefind-html.mp3 ! id3demux ! fakesink dump=true | grep ^0 | head -n4
00000000 (0x7f53bc0061a0): 3c 48 54 4d 4c 3e 0d 0a 3c 48 45 41 44 3e 3c 54  <HTML>..<HEAD><T
00000010 (0x7f53bc0061b0): 49 54 4c 45 3e 50 61 67 65 20 45 78 70 69 72 65  ITLE>Page Expire
00000020 (0x7f53bc0061c0): 64 3c 2f 54 49 54 4c 45 3e 0d 0a 3c 2f 48 45 41  d</TITLE>..</HEA
00000030 (0x7f53bc0061d0): 44 3e 0d 0a 3c 42 4f 44 59 20 42 47 43 4f 4c 4f  D>..<BODY BGCOLO
Just saying...
Yeah, I'm not sure how that should be handled: there's the ID3 tag, then some random HTML content, and then eventually the MP3 packets.
According to the provider of this file, the HTML is inside the ID3 tag, but it is clearly detected as coming after it, and running typefinding on that data obviously yields text/html. So I'd guess it's a parser bug; otherwise we really do have HTML garbage between the ID3 tag and the MP3 data. Most MP3 players can skip garbage without any issue, though...
Even GStreamer does, actually :-P
gst-launch-1.0 filesrc location=a.mp3 ! mad ! pulsesink
The HTML isn't in the ID3 tag. It starts immediately after - the ID3 tag is declared as 54210 bytes long.
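To make "it starts immediately after the tag" concrete, here is a minimal Python sketch (not GStreamer code; the function name id3v2_end_offset is hypothetical) that computes where an ID3v2 tag ends from its 10-byte header. It assumes the ID3v2 layout: "ID3", two version bytes, one flags byte, then a 4-byte "syncsafe" size that uses only the low 7 bits of each byte and excludes the header itself.

```python
def id3v2_end_offset(data: bytes) -> int:
    """Return the offset just past an ID3v2 tag at the start of data,
    or 0 if data does not begin with an ID3v2 header (illustrative sketch)."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    major_version = data[3]
    flags = data[5]
    size_bytes = data[6:10]
    # Syncsafe integer: the top bit of each byte must be clear.
    if any(b & 0x80 for b in size_bytes):
        raise ValueError("invalid syncsafe size in ID3v2 header")
    size = 0
    for b in size_bytes:
        size = (size << 7) | b
    # The declared size excludes the 10-byte header, and in ID3v2.4 also
    # the optional 10-byte footer signalled by flag bit 0x10.
    footer = 10 if major_version == 4 and flags & 0x10 else 0
    return 10 + size + footer


# A synthetic ID3v2.3 header whose size field encodes 54210 (0x00 0x03 0x27 0x42).
header = b"ID3\x03\x00\x00" + bytes([0x00, 0x03, 0x27, 0x42])
print(id3v2_end_offset(header))  # 54220: the payload after the tag starts here
```

Whatever sits at that offset is what the downstream typefinder gets to look at, which is why HTML placed right after the declared tag end gets typefound as text/html.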
I confirm this. It's really the worst case for us. Since the demuxer doesn't know what the valid output types are, I'm not sure how we could handle this.
I'm tempted to just WONTFIX this. It's clearly a broken file from a buggy server script and probably rather unique. Not sure it's worth jumping through hoops to "fix" this.
Yes and no. For example, a hardware MP3 player needs to support this kind of corruption to get the branding. But I suspect our model isn't really prepared for that. For example, if it were a JPEG instead of HTML, we would render the JPEG, but the JPEG decoder would later fail when it receives the MP3 data. That said, we could also argue that if you were implementing a commercial MP3 player, you would use a static pipeline with a custom decoder that also handles ID3. To be honest, I would find GStreamer a bit useless in that scenario.
What "branding" is this a test file for?
(In reply to comment #12)
> What "branding" is this a test file for?
I'm not talking about this file specifically. To be allowed to use MP3 branding on your device, I know you have to go through certification and pass a few tests, and I know these tests include corruption between MPEG frames. I guessed they most likely also include corruption between the ID3 tag and the first frame, which is exactly the scenario we are facing.
No, it's not the same scenario at all. We should be recognising files with some junk at the beginning or between some frames just fine.
I've crafted two files with 6 bytes of corruption at a similar location: http://people.collabora.com/~nicolas/test.mp3 and http://people.collabora.com/~nicolas/test2.mp3. test2.mp3 has random values as the 6 bytes; this is recognized as junk and skipped. test.mp3 has specially crafted junk (a partial PNG header) and fails. So that matches what has just been said: it will fail if the extra data isn't plain junk (an HTML file, PNG, JPEG, etc.), or if you are unlucky enough that the junk triggers a type in the typefinder.
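The two test files can be mimicked with a toy model. This Python sketch is a deliberate simplification and not how GStreamer typefinding works (the real typefinders are probability-based and far more thorough); the signature table and the names find_mpeg_sync and toy_typefind are hypothetical. It checks the bytes after the ID3 tag against a few known magic prefixes first, and only otherwise scans forward for an MPEG audio frame sync (11 set bits), so random junk is skipped while a recognizable header wins.

```python
from typing import Optional

# Hypothetical, heavily simplified signature table for illustration only.
KNOWN_SIGNATURES = {
    b"<HTML>": "text/html",
    b"\x89PNG": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def find_mpeg_sync(data: bytes, start: int = 0) -> Optional[int]:
    """Offset of the first 11-bit MPEG audio frame sync at or after start.
    Only the sync bits are checked; real parsers validate the whole header."""
    for i in range(start, len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            return i
    return None


def toy_typefind(payload: bytes) -> str:
    """Classify the data that follows an ID3v2 tag (toy model)."""
    # A known magic prefix takes priority over "junk then MPEG frames".
    for magic, caps in KNOWN_SIGNATURES.items():
        if payload.startswith(magic):
            return caps
    if find_mpeg_sync(payload) is not None:
        return "audio/mpeg"
    return "unknown"


frame_start = b"\xff\xfb\x90\x64"  # bytes resembling an MPEG-1 Layer III frame header
random_junk = bytes([0x13, 0x37, 0x42, 0x00, 0x7F, 0x10])
png_junk = b"\x89PNG\r\n"  # 6 bytes of a PNG signature

print(toy_typefind(random_junk + frame_start))  # audio/mpeg (junk skipped)
print(toy_typefind(png_junk + frame_start))     # image/png (like test.mp3)
```

Checking signatures before scanning for a frame sync reproduces the trade-off discussed in this bug: it is what lets genuinely embedded content be detected, and also what lets crafted junk hijack the result.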
Let's close this. I don't think this is really a bug; it's expected and reasonable behaviour. There are probably non-hackish ways to make this work, but they would require more thought about typefinding design issues, and that would be an enhancement and a fair bit more work. To be honest, I don't know if it's worth it. If you think it is, please clone an enhancement bug from this one.