GNOME Bugzilla – Bug 699124
vorbisdec: crash on shutdown in webkit unit test
Last modified: 2013-05-28 06:58:08 UTC
I found a crash in the vorbis decoder when running the WebKit test media/W3C/video/paused/paused_true_during_pause.html as part of https://bugs.webkit.org/show_bug.cgi?id=115166 The backtrace is the following:
[Trace 231870 attached: Thread 14 (0x7f283aa789a0, LWP 23665), Thread 1 (0x7f26cdb23700, LWP 23706)]
This causes a test timeout in 18/1000 runs, and 6/1000 runs end in a crash like this. With 1.0.4 this worked fine 1000/1000. If you need more info, I would be glad to provide it.
Do you get any criticals on stdout when that happens? Could you attach an xz -9 compressed GST_DEBUG=*:6 debug log? Clearly the reason for the crash is that 'out' in copy_samples_s() is NULL, but I wonder whether that happens because the allocator returned NULL or because the buffer could not be mapped as writable. It's a bit unfortunate that we can't differentiate between 'error' and 'shutting down/flushing' in the buffer/alloc API.
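For illustration, a defensive check along these lines in the decoder's sample-copy path would turn the NULL dereference into a recoverable flow return. This is a minimal sketch with assumed names (copy_samples_checked and its pcm/size parameters are hypothetical), not the actual gstvorbisdec.c code:

#include <string.h>
#include <gst/gst.h>

/* Sketch, not the real copy_samples_s(): guard both the buffer pointer
 * and the writable map before copying decoded samples into the buffer. */
static GstFlowReturn
copy_samples_checked (GstBuffer *out, const gfloat *pcm, gsize size)
{
  GstMapInfo map;

  if (out == NULL)
    return GST_FLOW_ERROR;      /* allocator returned NULL; note the API
                                 * can't tell "error" from "flushing" */

  if (!gst_buffer_map (out, &map, GST_MAP_WRITE))
    return GST_FLOW_ERROR;      /* buffer could not be mapped writable */

  memcpy (map.data, pcm, MIN (size, map.size));
  gst_buffer_unmap (out, &map);

  return GST_FLOW_OK;
}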
I can't remember any changes that could've caused a regression like this, especially since there have been no changes in the vorbis plugin since then. If the allocator had returned NULL there would've been a g_critical() when mapping, and mapping also never returns NULL for system memory (unless it's already mapped incompatibly with the requested mapping flags). Is WebKit using/providing a custom allocator somewhere, especially one that is used for the audio part of the pipeline?
(In reply to comment #2)
> Is WebKit using/providing a custom allocator somewhere, especially one
> that is used for the audio part of the pipeline?
Nope.
Sebastian: I don't think it's a regression; it sounds to me like it's only being noticed now because of timing changes in the unit test. I suspect the bug existed all along (we had similar issues in the video class IIRC). I don't quite understand why the system memory allocator would return NULL, though; not sure that can happen. There must be some other issue then, like what I mentioned. Could you provide the info I requested in comment #1?
(In reply to comment #1)
> Do you get any criticals on stdout when that happens?
(DumpRenderTree:19890): GStreamer-CRITICAL **: gst_buffer_map_range: assertion `GST_IS_BUFFER (buffer)' failed
Ok, could you get a debug log for this? It really shouldn't happen at all, and currently I don't see how it could happen in this situation.
Reproducing with a debug log enabled is harder, but I managed to do it with G_DEBUG=fatal_warnings:
[Trace 231881 attached: Thread 14 (0x7f71169789a0, LWP 11118), Thread 9 (0x7f6fba4a2700, LWP 11158), Thread 1 (0x7f6fb981e700, LWP 11159)]
Created attachment 242902: Pipeline graph. Attaching the PNG with the pipeline graph, as suggested by Sebastian. I think the backtrace is the same, but just in case...
[Trace 231882 attached: Thread 14 (0x7ffe8bd0f700, LWP 3067), Thread 1 (0x7fffe86589a0, LWP 3007)]
The pipeline graph would only be really interesting for the case where GStreamer complains about a loop in the pipeline graph. For the crash it would be more interesting to know what the negotiated allocator is and whether the element is currently flushing and ... From the backtrace it looks to me like the pipeline is currently shutting down and thus flushing, so it could very well be that allocation fails there and should be handled more gracefully. The pipeline graph loop, however, would be a separate, serious problem.
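One shape that more graceful handling could take, sketched here under assumed names (allocate_output_checked is hypothetical; gst_audio_decoder_allocate_output_buffer() and the flushing check are real API), is to treat a failed allocation during flushing as "shutting down" rather than as a hard error:

#include <gst/gst.h>
#include <gst/audio/gstaudiodecoder.h>

/* Sketch: distinguish "pipeline tearing down" from a genuine allocation
 * failure instead of letting a NULL buffer propagate into the copy path. */
static GstFlowReturn
allocate_output_checked (GstAudioDecoder *dec, gsize size, GstBuffer **out)
{
  *out = gst_audio_decoder_allocate_output_buffer (dec, size);
  if (*out != NULL)
    return GST_FLOW_OK;

  /* If the src pad is flushing, the pipeline is shutting down: stop
   * processing quietly instead of posting an error. */
  if (GST_PAD_IS_FLUSHING (GST_AUDIO_DECODER_SRC_PAD (dec)))
    return GST_FLOW_FLUSHING;

  return GST_FLOW_ERROR;
}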
(In reply to comment #9)
> The pipeline graph would only be really interesting for the case where
> GStreamer complains about a loop in the pipeline graph.
I managed to find the loop warning. Because of the loop, the method that dumps the pipeline never terminates and doesn't generate anything, so I can't upload the pipeline graph. Here you have the backtrace:
[Trace 231883 attached: Thread 12 (0x7ffe7d39d700, LWP 16413)]
Ok, that's also not very useful... What exactly is the bin in which the loop is detected (the message says that)? Then we would need to find between which elements of that bin the loop actually happens.
(In reply to comment #11)
> What exactly is the bin in which the loop is detected (the message says
> that)? Then we would need to find between which elements of that bin the
> loop actually happens.
"loop detected in the graph of bin 'decodebin0'!!" I couldn't investigate the elements. After trying the gst_audio_decoder_negotiate solution, I ran the program 8671 times and got 17 criticals. Only one was caused by the loop; the other 16 were "file gstdecodebin2.c: line 3698 (gst_decode_bin_expose): should not be reached". I'll try to provide a backtrace if useful. I also had some timeouts, but I couldn't investigate the cause yet.
Ok, so my patch "fixed" the allocator problem. But there are still two issues:

a) This weird loop. You should try to inspect the elements yourself with gdb: grab the child elements out of decodebin, look at the hierarchy of elements inside decodebin, and check what links to what (see the sketch after this comment).

b) The assertion in gst_decode_bin_expose(). This basically happens when the internal state of decodebin is messed up, so it might be related to a), or it might be something completely different. A backtrace won't be that useful, unfortunately; instead, look with gdb at where the gst_decode_chain_expose() call before the assertion goes wrong and returns FALSE.

And of course a debug log of all this would be useful, but that's a bit difficult to get for timing-related things like this :( So let's resort to gdb and manual inspection of the pipeline and what happens there for now.
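As a starting point for that manual inspection, a helper along these lines (dump_bin_links is a hypothetical name; the iterator calls are real GStreamer 1.0 API) prints every child of a bin and the element each of its src pads links to, which should expose where the cycle closes:

#include <gst/gst.h>

/* Diagnostic sketch: walk a bin's children and print "element:pad -> peer
 * element" for every src pad. GST_ITERATOR_RESYNC handling is omitted
 * for brevity. */
static void
dump_bin_links (GstBin *bin)
{
  GstIterator *elems = gst_bin_iterate_elements (bin);
  GValue eitem = G_VALUE_INIT;

  while (gst_iterator_next (elems, &eitem) == GST_ITERATOR_OK) {
    GstElement *elem = GST_ELEMENT (g_value_get_object (&eitem));
    GstIterator *pads = gst_element_iterate_src_pads (elem);
    GValue pitem = G_VALUE_INIT;

    while (gst_iterator_next (pads, &pitem) == GST_ITERATOR_OK) {
      GstPad *pad = GST_PAD (g_value_get_object (&pitem));
      GstPad *peer = gst_pad_get_peer (pad);

      if (peer != NULL) {
        GstElement *target = gst_pad_get_parent_element (peer);

        g_print ("%s:%s -> %s\n", GST_ELEMENT_NAME (elem),
            GST_PAD_NAME (pad),
            target != NULL ? GST_ELEMENT_NAME (target) : "(no parent)");

        if (target != NULL)
          gst_object_unref (target);
        gst_object_unref (peer);
      }
      g_value_reset (&pitem);
    }
    if (G_IS_VALUE (&pitem))
      g_value_unset (&pitem);
    gst_iterator_free (pads);
    g_value_reset (&eitem);
  }
  if (G_IS_VALUE (&eitem))
    g_value_unset (&eitem);
  gst_iterator_free (elems);
}

Compiled into a debug build, it can be called from the gdb prompt on the decodebin instance to see which link forms the loop.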
(In reply to comment #13)
> Ok, so my patch "fixed" the allocator problem.
The patch, btw, was to always let gst_audio_decoder_allocate_output_buffer() return a buffer, even if negotiation failed.
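For reference, that workaround might look roughly like this sketch (allocate_output_never_null is a hypothetical name, and the real patch lives in the GstAudioDecoder base class, where it would still prefer the negotiated allocator when negotiation succeeds):

#include <gst/gst.h>
#include <gst/audio/gstaudiodecoder.h>

/* Sketch of the described workaround, not the actual patch: never hand
 * the caller NULL, even when (re)negotiation fails. */
static GstBuffer *
allocate_output_never_null (GstAudioDecoder *dec, gsize size)
{
  if (!gst_audio_decoder_negotiate (dec))
    GST_WARNING_OBJECT (dec,
        "negotiation failed, falling back to system memory");

  /* A NULL allocator selects the default system-memory allocator, so
   * this always returns a mappable buffer. */
  return gst_buffer_new_allocate (NULL, size, NULL);
}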
As Tim said, the problem is also present in 1.0.4; it just happens more often in 1.0.7. Once you solve the problem with the buffer (which I tested with Sebastian's workaround of always providing the buffer), the crash still happens.
I just checked the different tags: the problem was almost nonexistent in 1.0.4, but it began to be noticeable in 1.0.5 and remained like that through 1.0.6 and 1.0.7.
Probably related or equivalent to bug #700399, and possibly related to bug #700342.
*** Bug 700399 has been marked as a duplicate of this bug. ***
Xabier tracked this down to the change made in bug #690420, which is very much related to bug #700342. The other part of this bug is tracked in bug #700006. I'd prefer to keep this bug open until those other bugs are fixed and we can confirm that everything works then.
This is confirmed to be fixed now as part of all the other bugs.
(In reply to comment #20)
> This is confirmed to be fixed now as part of all the other bugs.
\o/