GNOME Bugzilla – Bug 600648
multiqueue: queues up too much data, excessive memory use with subtitle streams
Last modified: 2015-12-02 16:15:34 UTC
Hi, first of all, I'm not sure if this is really a problem with multiqueue or just how it's used in decodebin2. The problem is that, for some reason, multiqueue queues up far too much data.

In my specific case ( http://samples.mplayerhq.hu/Matroska/subtitles/SSA_15subtitles.mkv ) I have a Matroska file with 15 subtitle streams, one audio stream and one video stream. This is all handled correctly by decodebin2 and it exposes the pads. Unfortunately that's where the problems begin: the video and audio streams are pushed downstream more or less synchronized, while the subtitle streams get buffers pushed with timestamps of ~20 seconds at a time when the video is at ~6 seconds. This sounds like a problem in the file itself, but looking at the debug output of matroskademux or at pipeline2 below makes it clear that this is not the case.

pipeline1, to reproduce the problem:

gst-launch-0.10 -v filesrc location=SSA_15subtitles.mkv ! decodebin2 caps="application/x-ssa; video/x-raw-yuv" name=dbin dbin. ! "video/x-raw-yuv" ! fakesink dbin.src5 ! "application/x-ssa" ! fakesink

pipeline2, which does not have the problem:

gst-launch-0.10 -v filesrc location=SSA_15subtitles.mkv ! matroskademux name=demux ! queue ! fakesink demux.subtitle_05 ! queue ! "application/x-ssa" ! fakesink

The problem becomes visible in the chain output of fakesink's verbose mode:

/GstPipeline:pipeline0/GstFakeSink:fakesink0: last-message = "chain ******* < (460800 bytes, timestamp: 0:00:06.256000000, duration: 0:00:00.041708299, offset: -1, offset_end: -1, flags: 256) 0x7f2fd817dd30"
/GstPipeline:pipeline0/GstFakeSink:fakesink0: last-message = "chain ******* < (460800 bytes, timestamp: 0:00:06.297000000, duration: 0:00:00.041708299, offset: -1, offset_end: -1, flags: 256) 0x7f2fd800d690"
/GstPipeline:pipeline0/GstFakeSink:fakesink1: last-message = "chain ******* < ( 20 bytes, timestamp: 0:00:19.620000000, duration: 0:00:01.130000000, offset: -1, offset_end: -1, flags: 0) 0x7f2fd804acb0"
/GstPipeline:pipeline0/GstFakeSink:fakesink1: last-message = "chain ******* < ( 24 bytes, timestamp: 0:00:20.840000000, duration: 0:00:02.630000000, offset: -1, offset_end: -1, flags: 0) 0x7f2fd804ac30"
Ok, another case to show this problem:

gst-launch-0.10 -v filesrc location=SSA_15subtitles.mkv ! decodebin2 caps="application/x-ssa; video/x-raw-yuv" name=dbin dbin. ! "video/x-raw-yuv" ! fakesink sync=true dbin.src5 ! "application/x-ssa" ! fakesink

In this case it really queues 20 seconds of video (because the video fakesink is synchronized, while the text sink isn't).
Created attachment 148136 [details] [review]
playbin2: Transform QoS events to be meaningful for upstream elements

This is necessary because the sinks don't notice the group switches and the decoders/demuxers have a different running time than the sinks. Fixes bug #600648.
Comment on attachment 148136 [details] [review]
playbin2: Transform QoS events to be meaningful for upstream elements

Sorry, wrong bug.
*** Bug 599220 has been marked as a duplicate of this bug. ***
*** Bug 609914 has been marked as a duplicate of this bug. ***
*** Bug 605446 has been marked as a duplicate of this bug. ***
I don't know if this really is a problem. multiqueue tries to keep data queued for all input streams, resizing the queues as big as needed (up to a limit). Maybe you would like it to exclude sparse streams from its algorithm.
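If it's only the memory use that hurts, one rough workaround is to shrink the limits of the multiqueue that decodebin2 creates internally. A minimal sketch, assuming a 0.10 application: the callback name and the chosen limits are made up for illustration, and decodebin2 may adjust these limits itself, so this is not a real fix for the sparse-stream behaviour:

/* decodebin2 is a GstBin, so we can watch for its internal multiqueue
 * via the "element-added" signal and lower its limits. */
static void
on_element_added (GstBin *bin, GstElement *element, gpointer user_data)
{
  if (g_strcmp0 (G_OBJECT_TYPE_NAME (element), "GstMultiQueue") == 0) {
    g_object_set (element,
        "max-size-bytes", (guint) (2 * 1024 * 1024),  /* 2 MB */
        "max-size-buffers", (guint) 0,                /* no buffer-count limit */
        "max-size-time", (guint64) (2 * GST_SECOND),  /* 2 seconds */
        NULL);
  }
}

/* ... after creating decodebin2: */
g_signal_connect (dbin, "element-added", G_CALLBACK (on_element_added), NULL);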
I've got a patch set adding an event to notify downstream elements whether a stream is sparse (and possibly other stream specific properties later on), getting multiqueue to avoid eating too much at once for those, and oggdemux signalling sparse streams with that event. It fixes the problem for my test case (brings the delay down to maybe two seconds). If deemed OK, all demuxers (and possibly decoders) will have to be patched to use this signalling.
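Roughly like this, as a sketch of the signalling idea rather than the actual patches; the "stream-properties" event name and the "sparse" field are assumptions (0.10 API):

/* Demuxer side: announce that this stream is sparse. */
GstEvent *event;

event = gst_event_new_custom (GST_EVENT_CUSTOM_DOWNSTREAM,
    gst_structure_new ("stream-properties",
        "sparse", G_TYPE_BOOLEAN, TRUE, NULL));
gst_pad_push_event (srcpad, event);

/* Downstream (e.g. multiqueue) side: detect it. */
if (GST_EVENT_TYPE (event) == GST_EVENT_CUSTOM_DOWNSTREAM) {
  const GstStructure *s = gst_event_get_structure (event);
  gboolean sparse = FALSE;

  if (s != NULL && gst_structure_has_name (s, "stream-properties"))
    gst_structure_get_boolean (s, "sparse", &sparse);
}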
Created attachment 177132 [details] [review]
gstevent: add a new properties event

This is for signalling stream-specific properties, such as a stream being sparse (the only one currently).
Created attachment 177133 [details] [review]
multiqueue: use the new properties event to avoid overreading subtitles

This will mean a much lower delay before a subtitle track change takes effect.
Created attachment 177134 [details] [review]
oggdemux: signal stream properties when activating a chain

This will allow playbin2's multiqueue to know which streams are sparse, to avoid a lot of backlog when switching subtitle streams.
Created attachment 177183 [details] [review]
gstevent: add a new properties event

This is for signalling stream-specific properties, such as a stream being sparse (the only one currently).
Couldn't we piggy-back the GST_EVENT_BUFFERSIZE to indicate whether a stream is sparse or not? That would allow signalling both the interleave (to fine-tune the buffering needed) and the sparse nature. From the gstevent documentation:

"""
GST_EVENT_BUFFERSIZE
Notification of buffering requirements. Currently not used yet.
"""
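For reference, signalling the interleave with the existing event would look roughly like this (0.10 API; the values are illustrative). Note that the buffer-size event has no field for a sparse flag today, so piggy-backing that would still need a core change:

/* Announce expected buffering requirements downstream. */
GstEvent *event;

event = gst_event_new_buffer_size (GST_FORMAT_TIME,
    0,              /* minsize: no minimum interleave */
    2 * GST_SECOND, /* maxsize: expected worst-case interleave */
    FALSE);         /* async */
gst_pad_push_event (srcpad, event);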
A buffer size event would seem to make sense. However Wim mentioned a few other things he'd like to bundle with 'sparse', which might make less sense to bundle with a buffer size event.
I'm currently working on another solution for this by adding an activate-stream event and using it in multiqueue, etc., and then additionally letting multiqueue sync the deactivated/not-linked streams to the running time of the working streams. That does not mean that this sparse-stream event is unnecessary, but my solution also fixes a few other problems and would also fix the queues-too-much behaviour for non-sparse streams.

Currently the not-linked throttling in multiqueue works by counting buffers and letting not-linked streams wait until they have the same buffer count as the stream with the largest buffer count. This works quite well when all streams have approximately the same frame duration, but fails otherwise (i.e. in most situations).
That's not how the not-linked handling works in multiqueue at all, at least not originally. The buffer ID tracking is intended to ensure that buffers in a not-linked stream are emitted only after other buffers in linked streams that arrived earlier have been emitted. That is, it's attempting to maintain the relative order in which not-linked/linked stream buffers arrive in the output. Linked streams push as quickly as downstream allows, and not-linked streams follow along. So it works well when the incoming streams are inherently linked (all being pushed by the same demuxer), but doesn't necessarily work well if there is no inherent relationship between the buffers' arrival timing.
Yes, you're right. All buffers and events get a unique, increasing ID in multiqueue, and buffers/events of non-linked pads are only pushed after a linked pad has pushed a buffer/event with a higher ID. This keeps the order in which the buffers/events are pushed from the demuxer. By also taking the events into account, this doesn't strictly follow the buffer timing from the demuxer, e.g. when sparse streams with filler newsegment events are used. What was the reason for doing this throttling for events too?
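For reference, the ordering described above amounts to something like this (simplified pseudo-C; the field names are illustrative, not the literal multiqueue code):

/* Every item (buffer or event) gets a global, increasing ID on the
 * way into the multiqueue: */
item->posid = mq->counter++;

/* A not-linked pad may only push its item once some linked pad has
 * pushed an item with an equal or higher ID (mq->highid tracks the
 * highest ID pushed by any linked pad): */
while (sq->srcresult == GST_FLOW_NOT_LINKED && mq->highid < item->posid)
  g_cond_wait (sq->turn, mq->qlock);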
Ok, syncing by the running time is implemented in my branch now: http://git.collabora.co.uk/?p=user/slomo/gstreamer.git;a=shortlog;h=refs/heads/stream-active

This doesn't completely fix the issue though; for sparse streams like subtitles we still need something like the sparse stream event mentioned above, and then keep more than a single buffer in multiqueue.

Example of what happens right now: assume you have two subtitle streams, one with buffers [1s,30s], [20s,5s] (timestamp, duration) and the other one with [8s,3s], [25s,5s]. The first one is activated first; its first buffer goes to the subtitle renderer, the second one is pushed to the subtitle renderer too, and then the subtitle renderer synchronizes, i.e. blocks. multiqueue has already advanced the running time to 20s now, and the first buffer of the second stream is dropped. When we switch to the second stream, only the buffer at 25s will be shown because the other one is already gone.
This can still be reproduced with the latest GIT and there's a testcase now: https://git.keema.collabora.co.uk/git/insanity/insanity-gst.git

The stream-switch test with seed=3235044814 reproduces it quite reliably (without any debug output... with debug output it doesn't happen that often). Sometimes there are up to 30s of buffers queued somewhere between the demuxer and the sinks. This seems to be caused by burst pushes from the demuxer, where 10-20s of data are immediately pushed downstream and accepted, and also by multiple audio streams returning GST_FLOW_OK for some time (although all but one should return GST_FLOW_NOT_LINKED). Note that this test generates a non-sparse text stream without disconts and a fixed 25fps framerate.
This might be possible to solve in 0.11 with the reconfigure event (which is also sent when the linking state of a stream changes), without the addition of a stream-activate event.
FWIW the problem (IIRC) here is that after switching streams, input-selector will return OK on all pads until the newly selected pad has pushed its first buffer. If the scheduler is unhappy this will cause lots of buffers to get lost on the now unconnected pads, with multiqueue believing the buffers were accepted correctly (and not triggering its NOT_LINKED logic to synchronize streams).

Another problem is that if a stream switch happens at a bad time, multiqueue could wait for input-selector to accept new buffers before advancing the stream time and pushing a new buffer on the newly selected pad... while input-selector waits for multiqueue to push a buffer on the newly activated pad before accepting anything on the other pads. These two problems also combine with each other, making it even more annoying :)

I think for 0.11 we could solve this by explicitly notifying about enabled and disabled streams (disabled pads might be used again very soon, as opposed to completely unlinked pads that will never be used). This could be done with the help of the reconfigure event, and by adding an optional timestamp field to the reconfigure event to make sure that playback continues from that position if possible. This would also require demuxers and/or decoders to handle that.
*** Bug 658757 has been marked as a duplicate of this bug. ***
Created attachment 232879 [details] [review]
event: add stream flags to stream-start event

Alternative to attachment 177183 [details] [review]:

event: add stream flags to stream-start event

API: gst_event_set_stream_flags()
API: gst_event_parse_stream_flags()
API: GST_STREAM_FLAG_NONE
API: GST_STREAM_FLAG_SPARSE

https://bugzilla.gnome.org/show_bug.cgi?id=600648
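Usage would look roughly like this (a sketch; the stream id is illustrative):

/* Demuxer side: mark a subtitle stream as sparse on its stream-start
 * event. */
GstEvent *event = gst_event_new_stream_start ("demux-stream-5");
gst_event_set_stream_flags (event, GST_STREAM_FLAG_SPARSE);
gst_pad_push_event (srcpad, event);

/* multiqueue side: detect the flag. */
GstStreamFlags flags = GST_STREAM_FLAG_NONE;
gst_event_parse_stream_flags (event, &flags);
if (flags & GST_STREAM_FLAG_SPARSE) {
  /* treat this single queue as belonging to a sparse stream */
}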
*** Bug 691074 has been marked as a duplicate of this bug. ***
Created attachment 232968 [details] [review]
patch for memory consumption issue with sparse streams

I'm using this little hack to check the multiqueue peer pads' caps for whether they look like a subtitle stream and, if so, apply Vincent's is_sparse logic.
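The idea is roughly the following (a sketch of the approach, not the attached patch; the exact set of caps prefixes checked is illustrative, 0.10 API):

/* Look at the downstream peer's negotiated caps and guess "subtitle"
 * from the media type name. */
static gboolean
peer_looks_like_subtitle_stream (GstPad *srcpad)
{
  GstPad *peer = gst_pad_get_peer (srcpad);
  GstCaps *caps;
  gboolean sparse = FALSE;

  if (peer == NULL)
    return FALSE;

  caps = gst_pad_get_negotiated_caps (peer);
  if (caps != NULL && gst_caps_get_size (caps) > 0) {
    const gchar *name =
        gst_structure_get_name (gst_caps_get_structure (caps, 0));

    sparse = g_str_has_prefix (name, "text/")
        || g_str_has_prefix (name, "application/x-ssa")
        || g_str_has_prefix (name, "application/x-ass")
        || g_str_has_prefix (name, "subpicture/");
    gst_caps_unref (caps);
  }
  gst_object_unref (peer);

  return sparse;
}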
Created attachment 233045 [details] [review]
multiqueue: use new stream-start event sparse flag to avoid overreading subtitles

Vincent's multiqueue patch ported to the new stream-start flags API.
Created attachment 233046 [details] [review]
matroskademux: mark subtitle streams as sparse in stream-start event
Don't know if I made a mistake or omitted something when porting over Vincent's patch, but this doesn't work for me with Andreas's file from bug #691074; the pipeline just doesn't preroll at all. Haven't investigated why yet.
Created attachment 233240 [details]
gst-launch-0.10 playbin2 uri=file:///net/media/Videos/HD/test.mkv text-sink=fakesink video-sink=fakesink audio-sink=fakesink

This still happens with my patch already applied.
Any ideas what else could be munching away large amounts of memory when playing certain MKV files with subtitles?
Run with massif and see where all the memory is taken.
And to (sanely) visualize the output, install and use massif-visualizer.
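For example (the massif output file name contains the process id):

valgrind --tool=massif gst-launch-0.10 playbin2 uri=file:///net/media/Videos/HD/test.mkv text-sink=fakesink video-sink=fakesink audio-sink=fakesink
massif-visualizer massif.out.<pid>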
I'll do it, but it may take a little longer.
What about the remaining multiqueue patch? Does it make sense independently of whether it fixes this bug or not? I'm not sure I completely understand what exactly it is doing; can someone explain? :)
As I understand it, it makes multiqueue treat all sparse streams in the multiqueue as "filled" at all times for buffering purposes, so it basically makes the buffering behaviour depend only on the audio or video streams. I think this makes sense. The reason I haven't applied it yet is that I had some issues with prerolling after applying this patch, and I haven't investigated those issues yet.
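In pseudo-C, something like this in the buffering decision (a sketch, not the literal patch; the names are illustrative):

/* When deciding whether the multiqueue as a whole has buffered enough,
 * pretend sparse streams are always full, so only audio/video drive
 * the buffering. */
gboolean all_full = TRUE;

for (tmp = mq->queues; tmp; tmp = tmp->next) {
  GstSingleQueue *sq = tmp->data;

  if (sq->is_sparse)
    continue;   /* sparse: counted as filled for buffering purposes */

  if (!single_queue_is_full (sq))
    all_full = FALSE;
}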
Have there been any updates on this? I just noticed that every time I have sparse streams in the pipeline, the decodebin multiqueue gets filled completely, which can be quite a lot of data.
Created attachment 313866 [details] [review]
multiqueue: use new stream-start event sparse flag to avoid overreading subtitles

This will mean a much lower delay before a subtitle track change takes effect. Also avoids excessive memory usage in many cases.

This will also consider sparse streams as (individually) never full, so as to avoid blocking all playback due to one sparse stream.
This last variant was updated to apply against master. In addition, I made a change so that we don't consider the time limit for sparse streams when checking whether a single queue is full or not. The rationale is that we'll never be able to figure out a proper time limit for sparse streams.
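So the single-queue full check becomes, roughly (a sketch, not the literal commit; the names are illustrative):

/* A sparse stream's queue is never reported as full, since no
 * sensible time limit exists for it. */
static gboolean
single_queue_is_full (GstSingleQueue *sq)
{
  if (sq->is_sparse)
    return FALSE;

  return sq->cur_bytes >= sq->max_bytes
      || sq->cur_buffers >= sq->max_buffers
      || (sq->max_time != 0 && sq->cur_time >= sq->max_time);
}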
That last patch still has an issue: it blocks on the initial SSA file. I need to dig deeper into it.
commit f6069c2c728e74db370bbf5de0e2e4bf60c30c04
Author: Vincent Penquerc'h <vincent.penquerch@collabora.co.uk>
Date:   Tue Jan 8 21:16:42 2013 +0000

    multiqueue: use new stream-start event sparse flag to avoid overreading subtitles

    This will mean a much lower delay before a subtitles track changes take effect.
    Also avoids excessive memory usage in many cases.

    This will also consider sparse streams as (individually) never full, so
    as to avoid blocking all playback due to one sparse stream.

    https://bugzilla.gnome.org/show_bug.cgi?id=600648
(note: this probably needs a whole bunch of other multiqueue commits as well to function properly)