GNOME Bugzilla – Bug 763496
queue: Can release serialized (ALLOCATION) query before downstream returned it
Last modified: 2016-11-25 11:39:53 UTC
==7075== Thread 16 queue0:src:
==7075== Invalid read of size 8
==7075==    at 0x54348A7: gst_pad_peer_query (gstpad.c:3995)
==7075==    by 0x71C2EF1: gst_base_transform_default_propose_allocation (gstbasetransform.c:1435)
==7075==    by 0x741DD86: gst_video_filter_propose_allocation (gstvideofilter.c:64)
==7075==    by 0x71C5EC0: gst_base_transform_default_query (gstbasetransform.c:1535)
==7075==    by 0x5434407: gst_pad_query (gstpad.c:3900)
==7075==    by 0x54349CA: gst_pad_peer_query (gstpad.c:4032)
==7075==    by 0x10D9FB1E: gst_queue_push_one (gstqueue.c:1422)
==7075==    by 0x10D9FB1E: gst_queue_loop (gstqueue.c:1485)
==7075==    by 0x545FB70: gst_task_func (gsttask.c:332)
==7075==    by 0x59AD35D: g_thread_pool_thread_proxy (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.4600.2)
==7075==    by 0x59AC9C4: g_thread_proxy (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.4600.2)
==7075==    by 0x8294453: start_thread (pthread_create.c:334)
==7075==    by 0x798DEAC: clone (in /lib/x86_64-linux-gnu/libc-2.22.so)
==7075==  Address 0xeaca2d0 is 0 bytes inside a block of size 80 free'd
==7075==    at 0x4C2AE6B: free (vg_replace_malloc.c:530)
==7075==    by 0x74269DB: gst_query_unref (gstquery.h:229)
==7075==    by 0x74269DB: gst_video_decoder_negotiate_pool (gstvideodecoder.c:3764)
==7075==    by 0x742CBCF: gst_video_decoder_negotiate_unlocked (gstvideodecoder.c:3861)
==7075==    by 0x742CBCF: gst_video_decoder_allocate_output_frame (gstvideodecoder.c:4005)
==7075==    by 0x1E743D9B: get_output_buffer.part.4 (gstavviddec.c:1190)
==7075==    by 0x1E744FBE: gst_ffmpegviddec_video_frame (gstavviddec.c:1376)
==7075==    by 0x1E744FBE: gst_ffmpegviddec_frame (gstavviddec.c:1511)
==7075==    by 0x1E745CED: gst_ffmpegviddec_handle_frame (gstavviddec.c:1624)
==7075==    by 0x7422865: gst_video_decoder_decode_frame (gstvideodecoder.c:3417)
==7075==    by 0x7422D1C: gst_video_decoder_chain_forward (gstvideodecoder.c:2201)
==7075==    by 0x7425472: gst_video_decoder_chain (gstvideodecoder.c:2503)
==7075==    by 0x542E37E: gst_pad_chain_data_unchecked (gstpad.c:4155)
==7075==    by 0x542E37E: gst_pad_push_data (gstpad.c:4407)
==7075==    by 0x5436082: gst_pad_push (gstpad.c:4526)
==7075==    by 0x71C162E: gst_base_transform_chain (gstbasetransform.c:2369)
==7075==  Block was alloc'd at
==7075==    at 0x4C29C0F: malloc (vg_replace_malloc.c:299)
==7075==    by 0x598B558: g_malloc (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.4600.2)
==7075==    by 0x59A2742: g_slice_alloc (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.4600.2)
==7075==    by 0x59A2DDD: g_slice_alloc0 (in /lib/x86_64-linux-gnu/libglib-2.0.so.0.4600.2)
==7075==    by 0x5446E83: gst_query_new_custom (gstquery.c:672)
==7075==    by 0x74268F4: gst_video_decoder_negotiate_pool (gstvideodecoder.c:3705)
==7075==    by 0x742CBCF: gst_video_decoder_negotiate_unlocked (gstvideodecoder.c:3861)
==7075==    by 0x742CBCF: gst_video_decoder_allocate_output_frame (gstvideodecoder.c:4005)
==7075==    by 0x1E743D9B: get_output_buffer.part.4 (gstavviddec.c:1190)
==7075==    by 0x1E744FBE: gst_ffmpegviddec_video_frame (gstavviddec.c:1376)
==7075==    by 0x1E744FBE: gst_ffmpegviddec_frame (gstavviddec.c:1511)
==7075==    by 0x1E745CED: gst_ffmpegviddec_handle_frame (gstavviddec.c:1624)
==7075==    by 0x7422865: gst_video_decoder_decode_frame (gstvideodecoder.c:3417)
==7075==    by 0x7422D1C: gst_video_decoder_chain_forward (gstvideodecoder.c:2201)
Created attachment 323711: queue: Only unblock upstream waiting for the query once downstream is finished ... when flushing and deactivating pads. Otherwise downstream might have a query that was already unreffed by upstream, causing crashes or other interesting effects.
Created attachment 323714: queue: Only unblock upstream waiting for the query once downstream is finished ... when flushing and deactivating pads. Otherwise downstream might have a query that was already unreffed by upstream, causing crashes or other interesting effects.
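The ordering the patch describes can be illustrated with a minimal pthreads model in C. This is a sketch under stated assumptions, not the gstqueue.c code: every name (`streaming_loop`, `flush_thread`, `run_flush_model`, the flags) is hypothetical, a plain `Query` struct stands in for the GstQuery, and `pthread_join()` stands in for waiting until the source pad's streaming task has stopped. The flush path sets `flushing` and wakes the streaming thread, but releases the upstream waiter only after the streaming thread is gone, so upstream never frees a query that downstream is still using.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a serialized ALLOCATION query. */
typedef struct { int answered; } Query;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;        /* models the queue lock */
static pthread_cond_t item_add = PTHREAD_COND_INITIALIZER;      /* wakes the streaming thread */
static pthread_cond_t query_handled = PTHREAD_COND_INITIALIZER; /* wakes the upstream waiter */
static Query *pending = NULL; /* query queued for the streaming thread */
static int in_use = 0;        /* streaming thread is still touching the query */
static int query_done = 0;    /* downstream answered the query */
static int unblocked = 0;     /* flush released the upstream waiter */
static int flushing = 0;
static pthread_t streaming;

static void sleep_ms(long ms) {
  struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
  nanosleep(&ts, NULL);
}

/* Streaming (src pad) thread: takes the query and forwards it "downstream". */
static void *streaming_loop(void *arg) {
  (void) arg;
  pthread_mutex_lock(&lock);
  while (!pending && !flushing)
    pthread_cond_wait(&item_add, &lock);
  if (flushing) {                 /* flushed before the query was picked up */
    pthread_mutex_unlock(&lock);
    return NULL;
  }
  Query *q = pending;
  pending = NULL;
  in_use = 1;
  pthread_mutex_unlock(&lock);

  sleep_ms(50);                   /* models the peer query running downstream */
  q->answered = 1;

  pthread_mutex_lock(&lock);
  in_use = 0;
  query_done = 1;
  pthread_cond_signal(&query_handled);
  pthread_mutex_unlock(&lock);
  return NULL;
}

/* Flush/deactivate path, following the ordering described in the patch. */
static void *flush_thread(void *arg) {
  (void) arg;
  sleep_ms(10);                   /* typically lands while downstream is busy */
  pthread_mutex_lock(&lock);
  flushing = 1;
  pthread_cond_signal(&item_add); /* unblock the streaming loop */
  /* Before the fix, the upstream waiter was also released around here,
   * while downstream could still be using the query. */
  pthread_mutex_unlock(&lock);

  pthread_join(streaming, NULL);  /* wait until the streaming thread is finished */

  pthread_mutex_lock(&lock);
  unblocked = 1;                  /* only now release upstream */
  pthread_cond_signal(&query_handled);
  pthread_mutex_unlock(&lock);
  return NULL;
}

/* Upstream side: queue the query, wait, then free it.
 * Returns 1 if the streaming thread was still using the query at release. */
static int run_flush_model(void) {
  pthread_t flusher;
  Query *q = calloc(1, sizeof *q);
  if (!q) return -1;

  pthread_create(&streaming, NULL, streaming_loop, NULL);
  pthread_create(&flusher, NULL, flush_thread, NULL);

  pthread_mutex_lock(&lock);
  pending = q;
  pthread_cond_signal(&item_add);
  while (!query_done && !unblocked)
    pthread_cond_wait(&query_handled, &lock);
  int was_in_use = in_use;
  pending = NULL;                 /* drop it if it was never picked up */
  pthread_mutex_unlock(&lock);

  free(q);                        /* stands in for the gst_query_unref() in the valgrind log */
  pthread_join(flusher, NULL);
  return was_in_use;
}
```

With this ordering `run_flush_model()` should return 0 whether the flush lands before or during the downstream query, because upstream is woken either by downstream finishing or by the flush path after the streaming thread has exited.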
Review of attachment 323714:

Someone needs to review this in a bit more detail

::: plugins/elements/gstqueue.c
@@ +1665,3 @@
 queue->srcresult = GST_FLOW_FLUSHING;
 /* the item del signal will unblock */
+ GST_QUEUE_SIGNAL_DEL (queue);

I think this can deadlock, as we now wait for downstream to shut down before
multiqueue has the same problem btw
This is working for me so far, BTW.
Just ran into the following failure, but haven't been able to reproduce it so far. It looked like it might be related, though:

gstcheck.c:79:F:general:test_queries_while_flushing:0: Unexpected critical/warning: gst_pad_query: assertion 'GST_IS_QUERY (query)' failed
FAIL elements/queue (exit status: 1)
Comment on attachment 323714
queue: Only unblock upstream waiting for the query once downstream is finished

commit 722ad087338520047241a319a506e464017bf0da
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Fri Mar 11 16:04:52 2016 +0200

    queue: Only unblock upstream waiting for the query once downstream is finished

    ... when flushing and deactivating pads. Otherwise downstream might have a
    query that was already unreffed by upstream, causing crashes or other
    interesting effects.

    https://bugzilla.gnome.org/show_bug.cgi?id=763496
Should this be cherry-picked into 1.8, or is it too risky/unsure?
It's on my to-backport list. Just wanted to wait a bit for it to mature :)
It seems like this commit can cause a deadlock: stress testing an application revealed one. The backtraces of the two deadlocked threads are pasted at the bottom. I'm running version 1.10.1.

Thread 1101 is sending an allocation query to a queue. The thread is waiting in gst_queue_handle_sink_query for the serialised query to be handled. As far as I can tell, the pad's stream lock is held while it waits. Meanwhile, thread 4 is setting the pipeline state to NULL and is waiting for that same stream lock before flushing the queue. No other threads related to the pipeline are running.

Reverting 722ad087338520047241a319a506e464017bf0da appears to remove the deadlock.
Trace 236883
Thread 1101 (Thread 0x449ff460 (LWP 7965))
Thread 4 (Thread 0x42577460 (LWP 5122))
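The circular wait described in this report can be reproduced in miniature. The C/pthreads sketch below is a simplified model under stated assumptions, not GStreamer code: `stream_lock` stands in for the sink pad's stream lock, `query_handled` for the condition the serialized sink query waits on, and the function names (`state_change_thread`, `run_deadlock_model`) are hypothetical. Both sides use timeouts so the program reports the cycle instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical model of the two threads in the report. */
static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER; /* sink pad stream lock */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;       /* queue lock */
static pthread_cond_t query_handled = PTHREAD_COND_INITIALIZER;
static int handled = 0;
static int state_change_got_lock = 0;

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now. */
static struct timespec deadline_ms(long ms) {
  struct timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);
  ts.tv_sec += ms / 1000;
  ts.tv_nsec += (ms % 1000) * 1000000L;
  if (ts.tv_nsec >= 1000000000L) {
    ts.tv_sec += 1;
    ts.tv_nsec -= 1000000000L;
  }
  return ts;
}

/* "Thread 4": needs the stream lock before it can flush and release the
 * query waiter. Polls for roughly 100 ms, then gives up instead of hanging. */
static void *state_change_thread(void *arg) {
  (void) arg;
  for (int i = 0; i < 10; i++) {
    if (pthread_mutex_trylock(&stream_lock) == 0) {
      state_change_got_lock = 1;
      pthread_mutex_lock(&qlock);
      handled = 1;                         /* flush: release the waiter */
      pthread_cond_signal(&query_handled);
      pthread_mutex_unlock(&qlock);
      pthread_mutex_unlock(&stream_lock);
      return NULL;
    }
    struct timespec pause = { 0, 10 * 1000000L };
    nanosleep(&pause, NULL);
  }
  return NULL;
}

/* "Thread 1101": holds the stream lock while waiting for the serialized
 * query to be handled. Returns 1 if the circular wait was observed. */
static int run_deadlock_model(void) {
  pthread_t t;
  int timed_out = 0;

  pthread_mutex_lock(&stream_lock);        /* held for the whole wait */
  pthread_create(&t, NULL, state_change_thread, NULL);

  pthread_mutex_lock(&qlock);
  struct timespec dl = deadline_ms(500);
  while (!handled) {
    if (pthread_cond_timedwait(&query_handled, &qlock, &dl) == ETIMEDOUT) {
      timed_out = 1;                       /* nobody could signal us */
      break;
    }
  }
  pthread_mutex_unlock(&qlock);
  pthread_mutex_unlock(&stream_lock);
  pthread_join(t, NULL);

  return timed_out && !state_change_got_lock;
}
```

In the real pipeline neither side has a timeout, so both threads block forever, which is the hang described above.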
Please file a new bug about this, and ideally provide a testcase. Thanks :)