GNOME Bugzilla – Bug 475455
GstBin should use internal_links for state changes
Last modified: 2018-05-04 08:58:46 UTC
Hi, a few weeks ago I had a working pipeline that did transcoding and ran properly, even with the loop in the graph. With the current gstreamer cvs code some elements seem to be wrongly unreffed by the gstreamer scheduler; here is a typical result:

(<unknown>:3022): GStreamer-CRITICAL **: Trying to dispose element mulawdec0, but it is not in the NULL state.
...
(<unknown>:23228): GStreamer-CRITICAL **: gst_object_unref: assertion `((GObject *) object)->ref_count > 0' failed

The element is unreffed by gstbin.c:1589 remove_from_queue(), called from gstbin.c:1669 update_degree(). Attached is a debug log about the mulawdec0 element (only mulawdec0).

Regards.
Created attachment 95277 [details] debug log
Could you run your code with export G_DEBUG=fatal_warnings in gdb and get a full stack trace with debugging symbols for the warning? Could you make your code, or a simplified test program that demonstrates the bug, available?
Created attachment 95330 [details] Information about used pipeline
Added stack traces and logs:
- loop_detected_gdb_bt.txt: gdb backtrace from G_DEBUG=fatal_warnings
- loop_detected_log.txt.gz: associated log with GST_DEBUG=5
- critical_gdb_bt.txt: gdb backtrace from G_DEBUG=fatal_criticals (to avoid a core dump on the warnings from the detected loop and from some elements unable to answer the latency query)
- critical_log.txt.gz: associated log with GST_DEBUG=5
Created attachment 95411 [details] gdb bt with G_DEBUG=fatal_criticals
Created attachment 95412 [details] associated log to G_DEBUG =fatal_criticals
Created attachment 95413 [details] gdb bt with G_DEBUG=fatal_warnings
Created attachment 95414 [details] associated log to G_DEBUG=fatal_warnings
Strange case... From the following command:

  for elt in `gzip -d -c critical_log.txt.gz | egrep "add_to_queue|remove_from_queue" | cut -f2 -d\' | sort -u`
  do
    echo __ $elt __
    gzip -d -c critical_log.txt.gz | egrep "add_to_queue|remove_from_queue" | egrep $elt
  done

- most elements are added to the queue but not removed
- some elements are added to and removed from the queue (NetRtpBin, VoIPRtpBin)
- one element (alawdec0) is removed from the queue but not added (or at least not with add_to_queue() !?)
Seems like the loop might have caused the problem. With pipelines like that we should really use the internal_links function to walk the graph (the graph then does not really contain a loop).
A quick and dirty patch could be this one:

  static void
  remove_from_queue (GstBinSortIterator * bit, GstElement * element)
  {
    if (g_queue_find (bit->queue, element)) {
      GST_DEBUG_OBJECT (bit->bin, "removing '%s' from queue",
          GST_ELEMENT_NAME (element));
      g_queue_remove (bit->queue, element);
      gst_object_unref (element);
    } else {
      GST_DEBUG_OBJECT (bit->bin,
          "unable to remove '%s' from queue - not inserted",
          GST_ELEMENT_NAME (element));
    }
  }

That is quite inefficient, as we scan the whole queue for every element. A more efficient solution could be to reuse the result of g_queue_find() and remove the entry with g_queue_delete_link(); a sketch follows below. But it's just a workaround.
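A minimal sketch of that more efficient variant, assuming the same GstBinSortIterator fields as in the patch above (untested; it scans the queue only once per call):

  static void
  remove_from_queue (GstBinSortIterator * bit, GstElement * element)
  {
    /* find the element's link once and delete it directly instead of
     * scanning the queue a second time with g_queue_remove() */
    GList *link = g_queue_find (bit->queue, element);

    if (link != NULL) {
      GST_DEBUG_OBJECT (bit->bin, "removing '%s' from queue",
          GST_ELEMENT_NAME (element));
      g_queue_delete_link (bit->queue, link);
      gst_object_unref (element);
    } else {
      GST_DEBUG_OBJECT (bit->bin,
          "unable to remove '%s' from queue - not inserted",
          GST_ELEMENT_NAME (element));
    }
  }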
I committed your workaround, it's better than crashing and burning hard.

        * gst/gstbin.c: (remove_from_queue):
        Work around a problem with pipelines containing (semi)loops until a
        proper, more complicated solution is ready. See #475455.
Wouldn't an implementation based on internal_links also fix bug #563828? There the problem is that the pipeline is

  a ! multiqueue ! b ! multiqueue ! c

where the same multiqueue is used twice. In reality there's no loop ;) But then, bug #563828 should also be fixed in decodebin2, because linking twice to the multiqueue is suboptimal.
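For illustration, a minimal sketch (hypothetical element names, GStreamer 1.0 API assumed) of how such a pipeline shares a single multiqueue, which makes it a loop at the element level but not at the pad level:

  #include <gst/gst.h>

  static GstElement *
  build_shared_multiqueue_pipeline (void)
  {
    /* a ! multiqueue ! b ! multiqueue ! c with one shared multiqueue:
     * two independent queues live inside the same element, so the element
     * graph looks cyclic while the pad graph is not */
    GstElement *pipe = gst_pipeline_new (NULL);
    GstElement *a = gst_element_factory_make ("audiotestsrc", "a");
    GstElement *mq = gst_element_factory_make ("multiqueue", "mq");
    GstElement *b = gst_element_factory_make ("identity", "b");
    GstElement *c = gst_element_factory_make ("fakesink", "c");

    gst_bin_add_many (GST_BIN (pipe), a, mq, b, c, NULL);

    gst_element_link_pads (a, "src", mq, "sink_%u");   /* requests sink_0, creates src_0 */
    gst_element_link_pads (mq, "src_0", b, "sink");
    gst_element_link_pads (b, "src", mq, "sink_%u");   /* requests sink_1, creates src_1 */
    gst_element_link_pads (mq, "src_1", c, "sink");

    return pipe;
  }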
(In reply to comment #10)
> Seems like the loop might have caused the problem. With pipelines like that we
> should really use the internal_links function to walk the graph (the graph
> then does not really contain a loop).

How exactly would that work? When walking the graph via pad connections (i.e. ... ---> sinkpad --- internal_links ---> srcpad --- peer ---> sinkpad) we would have a graph with pads as nodes. For every pad it's possible to get the parent element, of course. Here it must be ensured that there is no cycle in the pad graph, but there could be cycles when transforming the pad graph into the corresponding element graph. Right?

For iterating over all elements in topological order we would then walk this pad graph from the sink element's sink pads. But when do we change the state of an element that owns a pad? The first time all "dependencies" for at least one srcpad are done? Example: a(src1,src2) and (sink1)b, with a.src1 linked to b.sink1. After b has changed state, a can change state too, no matter what the peer element of a.src2 does.

Comparison: currently we require that all srcpad "dependencies" are done. Does the "at least one srcpad" rule work as expected? Probably not. Example: a(src1,src2), (sink1)b, (sink1)c, with a.src1->b.sink1 and a.src2->c.sink1. a could then change state already once b or c has changed state, which would in the end mean that a is trying to push data to an element in the wrong state (say a and b are PAUSED, c is still READY).

If we allow an element to change state when at least one srcpad dependency is done, but require that for every bin the state changes are done first for those elements with the fewest remaining srcpad dependencies, would this work? But isn't this then the same behaviour as we have now? If we require all srcpad dependencies to be done before changing state, there should be no difference which graph is walked. Unless I'm missing something, the rtpbin <----> rtpbin loop example would then still not work, because both rtpbins would wait for each other.
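To make the pad-level walk concrete, here is a minimal sketch (assuming the GStreamer 1.0 iterator API; GST_ITERATOR_RESYNC and error handling are omitted) of one step of the sinkpad -> internal_links -> srcpad -> peer traversal described above. The bookkeeping that decides when an element may actually change state is deliberately left out:

  #include <gst/gst.h>

  static void
  walk_one_step (GstPad * sinkpad)
  {
    /* the internal links of a sink pad give the src pads it feeds inside
     * the same element; the peer of each such src pad is the next sink pad */
    GstIterator *it = gst_pad_iterate_internal_links (sinkpad);
    GValue item = G_VALUE_INIT;

    while (gst_iterator_next (it, &item) == GST_ITERATOR_OK) {
      GstPad *srcpad = GST_PAD (g_value_get_object (&item));
      GstPad *peer = gst_pad_get_peer (srcpad);

      if (peer != NULL) {
        GstElement *next = gst_pad_get_parent_element (peer);

        if (next != NULL) {
          GST_DEBUG ("%s:%s feeds element %s", GST_DEBUG_PAD_NAME (srcpad),
              GST_ELEMENT_NAME (next));
          /* a real implementation would recurse from 'peer' here and keep
           * per-element bookkeeping of remaining srcpad dependencies */
          gst_object_unref (next);
        }
        gst_object_unref (peer);
      }
      g_value_reset (&item);
    }
    g_value_unset (&item);
    gst_iterator_free (it);
  }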
I think the solution here is to split the element and have the parts share a common structure: instead of having a single multiqueue element, have multiple single queues that are all connected by some shared object. And/or have a sink that gives its buffers to a separate source element (so they could even live in separate gst pipelines if they don't have to be synchronized).
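A minimal sketch of the second idea, assuming appsink and appsrc as a hypothetical sink/source pair (GStreamer 1.0 API, error handling omitted): the sink hands each buffer over to a source element that can live in another pipeline.

  #include <gst/gst.h>
  #include <gst/app/gstappsink.h>
  #include <gst/app/gstappsrc.h>

  /* called by appsink for every buffer; pushes it into the appsrc of the
   * other pipeline */
  static GstFlowReturn
  on_new_sample (GstAppSink * sink, gpointer user_data)
  {
    GstAppSrc *src = GST_APP_SRC (user_data);
    GstSample *sample = gst_app_sink_pull_sample (sink);
    GstBuffer *buf = gst_buffer_ref (gst_sample_get_buffer (sample));

    gst_sample_unref (sample);
    /* gst_app_src_push_buffer() takes ownership of buf */
    return gst_app_src_push_buffer (src, buf);
  }

  static void
  bridge_pipelines (GstElement * appsink, GstElement * appsrc)
  {
    GstAppSinkCallbacks cb = { 0, };

    cb.new_sample = on_new_sample;
    gst_app_sink_set_callbacks (GST_APP_SINK (appsink), &cb, appsrc, NULL);
  }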
Should we still worry about implementing the state change stuff in GstBin with iterate_internal_links instead of what we do now? It shouldn't matter unless we want to allow loops in the element graph that are not loops in the link graph.
I'm not so worried, we could mark this as an enhancement...
Let's close this then; it does not seem that useful after all, and it is non-trivial/impossible to implement.