GNOME Bugzilla – Bug 606382
decodebin: Upstream events (seek,qos) get lost when switching groups/chains
Last modified: 2018-05-05 15:47:27 UTC
Take attachment #151008 of bug #606348. This is a chained ogg file with a new chain starting at position 17s. Seeking beyond the 17s boundary causes a decoder error.
oggdemux needs to cache the headers of the chains and push them right after activating the new chains.
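For illustration, a minimal sketch of that header-caching idea (the struct and helper names are invented for the example and are not the actual oggdemux code): each stream keeps its header packets, and they are replayed on the pad before any data when the chain is activated.

```c
#include <gst/gst.h>

/* Hypothetical per-stream bookkeeping; not the real oggdemux structures. */
typedef struct {
  GstPad *pad;      /* exposed source pad for this stream */
  GList  *headers;  /* cached GstBuffer* header packets, in order */
} OggStreamCtx;

/* Push the cached headers before any data of the newly activated chain,
 * so downstream decoders can sync. */
static GstFlowReturn
push_cached_headers (OggStreamCtx *stream)
{
  GList *l;

  for (l = stream->headers; l != NULL; l = l->next) {
    GstFlowReturn ret;

    /* gst_pad_push() takes ownership, so push a reference and keep the
     * cached copy around for a possible later re-activation. */
    ret = gst_pad_push (stream->pad, gst_buffer_ref (GST_BUFFER (l->data)));
    if (ret != GST_FLOW_OK)
      return ret;
  }
  return GST_FLOW_OK;
}
```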
This commit does not completely fix it but makes it mostly work. I think there is a race left in playbin2 when it actually switches chains, which makes it fail sometimes.

commit 0201326db12ac1dfd224385789ae36bbe62f052e
Author: Wim Taymans <wim.taymans@collabora.co.uk>
Date: Fri Jan 8 16:57:40 2010 +0100

oggdemux: push headers when activating chains

Keep a list of headers for each stream of a chain. When a chain is activated, push the headers before pushing the data so that decoders can sync.

Fix seeking in chains, take the chain start time into account when comparing timestamps.

See #606382
It's a problem in decodebin2 somewhere. It seems that it doesn't mark chains and groups that got a new pad after no_more_pads as pending/incomplete/unexposed.
(In reply to comment #3)
> It a problem in decodebin2 somewhere. It seems that it doesn't mark chains and
> groups, that got a new pad after no_more_pads, as pending/incomplete/unexposed.

Why are there new pads after no-more-pads at all? Adding a new pad after no-more-pads will make decodebin2 add that new pad to the next/a new group. How should it detect that this new pad belongs to the currently active group?
There are two types of pad groups:

1) unclosed groups: pads are added to an element but at no time is there ever a no-more-pads signal emitted. The demuxer usually has no idea when all pads are added. Decodebin2 should assume that when more pads are added, they belong to the current group of pads. A similar thing happens when pads are removed. Decodebin2 also uses the multiqueues to decide when enough data has been processed so that a reasonable amount of pads are added to the group and playback can begin. This is usually the case for mpeg(ts)demux.

2) closed groups: pads are added to an element and when they are all added, a no-more-pads signal is emitted. In this situation a new group can be started by first adding the new pads (they go to a new group because the previous group was closed), removing the old pads and then signaling no-more-pads. This is currently common for chained oggs (from live radio or local streams) and chained asf. In this case, decodebin2 uses the no-more-pads signal to remove the old group and activate a new group. No multiqueue overrun is needed to decide when playback can start.
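As a rough sketch of the closed-group handover described in 2), from a hypothetical demuxer's point of view (the helper and pad lists are made up; only the GStreamer calls are real):

```c
#include <gst/gst.h>

/* Switch from the pads of the finished chain to the pads of the new one.
 * Because the previous group was closed with no-more-pads, the new pads
 * land in a new pending group inside decodebin2. */
static void
switch_to_new_chain (GstElement *demux, GList *old_pads, GList *new_pads)
{
  GList *l;

  /* 1) add the pads of the new chain first */
  for (l = new_pads; l != NULL; l = l->next)
    gst_element_add_pad (demux, GST_PAD (l->data));

  /* 2) remove the pads of the old chain */
  for (l = old_pads; l != NULL; l = l->next)
    gst_element_remove_pad (demux, GST_PAD (l->data));

  /* 3) tell downstream that the new group is complete */
  gst_element_no_more_pads (demux);
}
```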
Yes, this is exactly what is implemented in decodebin2. But you said: "It seems that it doesn't mark chains and groups, that got a new pad after no_more_pads, as pending/incomplete/unexposed." That would be against 2) because new pads that are added after no-more-pads should open a new group.
What I saw in debug logs is that the no-more-pads did somehow not expose the new group, leaving the new pads unlinked and erroring out.
Either I'm looking at something wrong or I have a different behavior here. I'm using totem and if I seek beyond 17 secs I see in the logs that the new pads are exposed and the position bar keeps moving, but no audio is played. And there are lots of pulsering overflows happening.
And that happens every time.
Ok, for me something completely different happens every time :) Seeking after 17s sometimes works, but most of the time playback continues at some position before 17s. And seeking in general is not very accurate.
Oh, and is this actually a regression? I would be surprised if seeking to another chain had worked at some point.
I don't think it's a regression. I would also be surprised if it ever worked before at any point, but it would still be nice to get this fixed up. It feels like we're really close to a fix, but I might be wrong of course :)
A thing I just noticed: when I seek to 22 secs it goes to 5 secs. When I seeked to 31, it went to 13.

22 = 17 + 5
31 ~ 17 + 13
(In reply to comment #13)
> A thing I just noticed. When I seek to 22 secs it goes to 5 secs. When I seeked
> to 31, it went to 13.
>
> 22 = 17 + 5
> 31 ~ 17 + 13

Oh, good catch. It's the same for me. So somewhere the chain start timestamp is getting subtracted or something.
When reverting the begintime chunk of 0201326db12ac1dfd224385789ae36bbe62f052e, the reported timestamps (as reported by totem) are more correct, but there's no sound.
The commit below fixes the position reporting, but still no sound.

Module: gst-plugins-base
Branch: master
Commit: 125f7dfdb058888a4f2ba0e837259be1e1a2ff2b
URL: http://cgit.freedesktop.org/gstreamer/gst-plugins-base/commit/?id=125f7dfdb058888a4f2ba0e837259be1e1a2ff2b

Author: Thiago Santos <thiago.sousa.santos@collabora.co.uk>
Date: Mon Jan 18 15:22:52 2010 -0300

oggdemux: granulepos is relative to its chain

When performing seeks, the granulepos should be offset by its chain start time to avoid using wrong values to update segment's last_stop. A sample file is indicated on bug #606382
Ok, this seems to fix the seeking, but wtay still has errors with decodebin2.

Module: gst-plugins-base
Branch: master
Commit: 4b771bff7a224d179c67cdf8ab105f12079b3286
URL: http://cgit.freedesktop.org/gstreamer/gst-plugins-base/commit/?id=4b771bff7a224d179c67cdf8ab105f12079b3286

Author: Thiago Santos <thiago.sousa.santos@collabora.co.uk>
Date: Tue Jan 19 08:39:14 2010 -0300

oggdemux: No need to subtract begin time

Last stop is already based on the chain start and there is no need to subtract the chain start as it may lead to a negative overflow.

This was causing seeking issues when the target chain was not the first one (that has chain start = 0)

Fixes #606382
Created attachment 151825 [details]
deadlock trace

I'm getting this deadlock when seeking around the chain switch (17 secs)
Created attachment 151826 [details] [review]
patch for the deadlock

This patch fixes the deadlock, but I still see lots of "seek failed" when using seek.c example.

Dunno if the locking is optimal, but at least it works.
(In reply to comment #19)
> Created an attachment (id=151826) [details] [review]
> patch for the deadlock
>
> This patch fixes the deadlock, but I still see lots of "seek failed" when using
> seek.c example.
>
> Dunno if the locking is optimal, but at least it works.

You attached the backtrace again :)
Created attachment 151828 [details] [review]
Patch for the deadlock

Oops. Now the real patch.
Created attachment 151833 [details]
debug log that leads to "seek failed"

This is part of the debug log showing why the "seek failed" error happens with seek.c: the seek event flows upstream until it reaches the multiqueue, which has already been unlinked due to the chain switching.
Comment on attachment 151833 [details]
debug log that leads to "seek failed"

Bah, copy and paste fail.
Created attachment 151834 [details]
debug log that leads to "seek failed"

The seek event fails because of the chain switch. It goes upstream up to the multiqueue, which is already unlinked from the demuxer, so it fails.
What seems to happen is this:

* oggdemux detects a new chain, therefore
* oggdemux emits no-more-pads
* the current decodebin group is marked as closed, but data is still going out (the multiqueue isn't empty)
* oggdemux removes the old source pads
* the current decodebin group isn't linked to anything upstream anymore
* oggdemux adds new pads
* decodebin creates a new group with multiqueue/decoder
* oggdemux pushes data from the second chain
* the data goes into the new decodebin2 group, which will only be exposed when the first group is drained

The problem is that:
* the first group is still pushing out data, but isn't linked to oggdemux anymore
* the second group (i.e. where we want to seek) isn't exposed yet

The seek arrives through the first group... which isn't linked anymore... and therefore oggdemux never gets the seek :(

So basically... we need a way for upstream events (SEEK, but also QOS would be good) coming from outside decodebin2 to make their way up to the demuxer.

I'll think about how this can be fixed... but it feels quite tricky.
Reminder: There's still a patch for oggdemux's deadlock on chain seeking pending review
Sorry, completely forgot about that. Bit late to put it in now for the next pre-release though, need to look at it again with a clear head. Not entirely sure about the possibility of a locking order problem with the stream lock and chain lock.
Comment on attachment 151828 [details] [review]
Patch for the deadlock

I think Wim was also not convinced the locking/locking order is entirely right here.
Unsetting blocker status: while a deadlock is quite bad, it should only happen in corner cases (who has chained oggs as local files? heavy scrubbing/seeking) and I don't think we'll have a solution for this until the release. And it's not really a regression.
Still fails in current git, but in slightly different ways.
How does it fail now?
This has become more evident with adaptive demuxers where group switches are frequent. The smaller the fragments, the more it happens.

Here is an idea I had for implementation:

Keep the removed group around and attach probes to the sink pads, then push a dummy custom upstream event/query to flow through the branch and only remove the group after it has reached the probe. Events received in this probe should be forwarded to the new group
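A rough sketch of that drain-marker idea (all names below are hypothetical; only the probe API is real GStreamer). The marker could be created with gst_event_new_custom (GST_EVENT_CUSTOM_UPSTREAM, gst_structure_new_empty (DRAIN_MARKER)) and sent to the old group's exposed source pads with gst_pad_send_event(); the sink-pad probe tears the group down once it sees the marker come back.

```c
#include <gst/gst.h>

#define DRAIN_MARKER "x-decodebin-drain-marker"   /* hypothetical name */

/* Hypothetical helpers, not real GStreamer API: */
static void     schedule_group_removal  (gpointer group);
static gboolean forward_to_linked_group (gpointer group, GstEvent *event);

static GstPadProbeReturn
old_group_sink_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  gpointer old_group = user_data;   /* hypothetical bookkeeping for the group */

  if (GST_PAD_PROBE_INFO_TYPE (info) & GST_PAD_PROBE_TYPE_EVENT_UPSTREAM) {
    GstEvent *event = GST_PAD_PROBE_INFO_EVENT (info);

    if (gst_event_has_name (event, DRAIN_MARKER)) {
      /* Everything that was queued before the marker has now drained,
       * so it is safe to schedule removal of this group. */
      schedule_group_removal (old_group);
      return GST_PAD_PROBE_DROP;
    }

    /* Any other upstream event is proxied to the group that is still linked
     * to the demuxer (the helper is assumed to consume the ref it is given);
     * GST_PAD_PROBE_HANDLED leaves the original event to the probe, so drop
     * our reference on success. */
    if (forward_to_linked_group (old_group, gst_event_ref (event))) {
      gst_event_unref (event);
      return GST_PAD_PROBE_HANDLED;
    }
  }
  return GST_PAD_PROBE_OK;
}
```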
We've just stumbled across this problem as well, when streaming over HLS. We often experience variant switches at the start of playback, and therefore seeking early in the piece is not terribly reliable, as the seek events often get lost.

A question: if we implement a solution along the lines that Thiago has described, does anyone know how decodebin2 will behave when it receives a FLUSH_START event from the upstream element which has now received the SEEK event? I think we would want it to destroy the new groups (which are not yet linked downstream), and push the event into the old linked group (which is linked downstream).
(In reply to Thiago Sousa Santos from comment #32)
> This has become more evident with adaptive demuxers where group switches are
> frequent. The smaller the fragments, the more it happens.
>
> Here is an idea I had for implementation:
>
> Keep the removed group around

I'm a bit confused as to what you mean by "removed group". If it's still got data (i.e. EOS hasn't been seen on the source pads of that group), then it needs to stay around. If it received EOS and there's a pending group... there's no reason to keep it around ?

> and attach probes to the sink pads

Having probes on the sink pads is definitely something we need in order not to "lose" events/queries that are travelling upstream. But I'd say we should put those in immediately before unlinking the demuxer source pads (i.e. to handle events/queries travelling upstream between the moment we unlink from the demuxer and the moment we un-expose that group).

> , then push
> a dummy custom upstream event/query to flow through the branch and only
> remove the group after it has reached the probe.

At what moment do you intend to push that custom upstream event ? When un-exposing the group ?

> Events received in this probe should be forwarded to the new group

(In reply to Duncan Palmer from comment #33)
> A question; if we implement a solution along the lines that Thiago has
> described, does anyone know how decodebin2 will behave when it receives a
> FLUSH_START event from the upstream element which has now received the SEEK
> event? I think we would want it to destroy the new groups (which are not yet
> linked downstream), and push the event into the old linked group (which is
> linked downstream).

Until now this bug was looking at not losing events/queries coming from downstream, but you are indeed right that there is some special handling required for events coming from upstream.

In the case of FLUSH_START I would propose the following:

In group event probes (whether linked or unlinked) if we see a FLUSH_START we should forward that to *all* other groups. The rationale behind this is that upstream (the demuxer) asked us to flush, therefore we should flush everything downstream from that... which includes all the groups.

By doing so we:
1) Ensure all groups are flushed
2) Downstream of decodebin gets the FLUSH_START event (there is at least one group exposed which will push that through decodebin's current source pads)

Once that is done, we should also expose the latest group (which is the one currently linked to the demuxer) and remove the previous ones.
Assigning it to myself since I plan to fix it !
(In reply to Edward Hervey from comment #34)
> In the case of FLUSH_START I would propose the following:
>
> In group event probes (whether linked or unlinked) if we see a FLUSH_START
> we should forward that to *all* other groups. The rationale behind this is
> that upstream (the demuxer) asked us to flush, therefore we should flush
> everything downstream from that... which includes all the groups.
> By doing so we:
> 1) Ensure all groups are flushed
> 2) Downstream of decodebin gets the FLUSH_START event (there is at least
> one group exposed which will push that through decodebin's current source
> pads)
> Once that is done, we should also expose the latest group (which is the
> one currently linked to the demuxer) and remove the previous ones.

Would there be any problems if upstream did a variant switch after FLUSH_STOP, and before pushing any data into the exposed group? I don't think there would be, as the exposed group (now flushed) would receive an EOS, and things would proceed in the way they usually do for a variant switch.
(In reply to Duncan Palmer from comment #36)
> Would there be any problems if upstream did a variant switch after
> FLUSH_STOP, and before pushing any data into the exposed group? I don't
> think there would be, as the exposed group (now flushed) would receive an
> EOS, and things would proceed in the way they usually do for a variant
> switch.

Not that I can think of (for the reason you stated). I'll start giving it a go today.
Created attachment 302568 [details] [review]
WIP: decodebin2: Forward event/queries for unlinked groups

When upstream events/queries reach sinkpads of unlinked groups (i.e. no longer linked to the upstream demuxer), this patch attempts to find the linked group and forward it upstream of that group.

This is done by adding upstream event/query probes on new group sinkpads and then:
* Checking if the pad is linked or not (has a peer or not)
* If there is a peer, just let the event/query follow through normally
* If there is no peer, we find a pad to which to proxy it and return GST_PROBE_HANDLED if it succeeded (allowing the event/query to be properly returned to the initial called)

Note that this is definitely not thread-safe for the time being
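For readers who want to see roughly what such a probe looks like, here is a condensed sketch under the same assumptions as the patch description; find_linked_group_sinkpad() is a hypothetical stand-in for decodebin2's group lookup, and the probe would be attached with gst_pad_add_probe() using GST_PAD_PROBE_TYPE_EVENT_UPSTREAM | GST_PAD_PROBE_TYPE_QUERY_UPSTREAM.

```c
#include <gst/gst.h>

/* Hypothetical helper: returns a (reffed) sink pad belonging to the group
 * that is currently linked to the demuxer, or NULL. */
static GstPad *find_linked_group_sinkpad (gpointer group);

static GstPadProbeReturn
unlinked_group_probe (GstPad *sinkpad, GstPadProbeInfo *info, gpointer group)
{
  GstPad *proxy;
  gboolean handled = FALSE;

  /* If the pad still has a peer, the event/query can travel upstream
   * normally and we stay out of the way. */
  if (gst_pad_is_linked (sinkpad))
    return GST_PAD_PROBE_OK;

  proxy = find_linked_group_sinkpad (group);
  if (proxy == NULL)
    return GST_PAD_PROBE_OK;

  if (GST_PAD_PROBE_INFO_TYPE (info) & GST_PAD_PROBE_TYPE_QUERY_UPSTREAM) {
    /* Run the query against the upstream peer of the linked pad; the
     * answer ends up in the caller's query structure. */
    handled = gst_pad_peer_query (proxy, GST_PAD_PROBE_INFO_QUERY (info));
  } else if (GST_PAD_PROBE_INFO_TYPE (info) & GST_PAD_PROBE_TYPE_EVENT_UPSTREAM) {
    GstEvent *event = GST_PAD_PROBE_INFO_EVENT (info);

    /* Push a reference through the linked pad; on success drop the
     * original, since GST_PAD_PROBE_HANDLED makes the probe own it. */
    handled = gst_pad_push_event (proxy, gst_event_ref (event));
    if (handled)
      gst_event_unref (event);
  }

  gst_object_unref (proxy);

  /* HANDLED reports success to whoever sent the event/query. */
  return handled ? GST_PAD_PROBE_HANDLED : GST_PAD_PROBE_OK;
}
```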
With the above patch, the various queries/events do end up in the upstream elements even if the group isn't linked. The next phase I'm working on is ensuring some action is taken on downstream events, such as on FLUSH_START where we drain/remove all non-last groups and switch the last group to be the active one.
Created attachment 302846 [details] [review]
decodebin2: Forward event/queries for unlinked groups

When upstream events/queries reach sinkpads of unlinked groups (i.e. no longer linked to the upstream demuxer), this patch attempts to find the linked group and forward it upstream of that group.

This is done by adding upstream event/query probes on new group sinkpads and then:
* Checking if the pad is linked or not (has a peer or not)
* If there is a peer, just let the event/query follow through normally
* If there is no peer, we find a pad to which to proxy it and return GST_PROBE_HANDLED if it succeeded (allowing the event/query to be properly returned to the initial called)

Note that this is definitely not thread-safe for the time being
Created attachment 302847 [details] [review]
decodebin2: Handle flushing with multiple decode groups

When an upstream element wants to flush downstream, we need to take all chains/groups into consideration.

To that effect, when a FLUSH_START event is seen, after having it sent downstream we mark all those chains/groups as "drained" (as if they had seen a EOS event on the endpads).

When a FLUSH_STOP event is received, we check if we need to switch groups. This is done by checking if there are next groups. If so, we will switch over to the latest next_group. The actual switch will be done when that group is blocked.
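As a very loose sketch of the control flow described in the patch (the structures and helpers below only mimic decodebin2 internals and are hypothetical):

```c
#include <gst/gst.h>

/* Hypothetical stand-ins for decodebin2 internals: */
typedef struct { GList *next_groups; } DecodeBinState;
static void mark_all_groups_drained  (DecodeBinState *state);
static void schedule_switch_to_group (DecodeBinState *state, gpointer group);

static void
handle_flush_event (DecodeBinState *state, GstEvent *event)
{
  switch (GST_EVENT_TYPE (event)) {
    case GST_EVENT_FLUSH_START:
      /* After the event has gone downstream, treat every chain/group as if
       * its endpads had seen EOS so nothing keeps waiting for more data. */
      mark_all_groups_drained (state);
      break;
    case GST_EVENT_FLUSH_STOP:
      /* If newer groups were prepared while flushing, switch to the most
       * recent one; the actual switch happens once that group is blocked. */
      if (state->next_groups != NULL)
        schedule_switch_to_group (state, g_list_last (state->next_groups)->data);
      break;
    default:
      break;
  }
}
```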
Created attachment 302962 [details] [review]
decodebin2: Forward event/queries for unlinked groups

When upstream events/queries reach sinkpads of unlinked groups (i.e. no longer linked to the upstream demuxer), this patch attempts to find the linked group and forward it upstream of that group.

This is done by adding upstream event/query probes on new group sinkpads and then:
* Checking if the pad is linked or not (has a peer or not)
* If there is a peer, just let the event/query follow through normally
* If there is no peer, we find a pad to which to proxy it and return GST_PROBE_HANDLED if it succeeded (allowing the event/query to be properly returned to the initial called)

Note that this is definitely not thread-safe for the time being
Created attachment 302963 [details] [review]
decodebin2: Handle flushing with multiple decode groups

When an upstream element wants to flush downstream, we need to take all chains/groups into consideration.

To that effect, when a FLUSH_START event is seen, after having it sent downstream we mark all those chains/groups as "drained" (as if they had seen a EOS event on the endpads).

When a FLUSH_STOP event is received, we check if we need to switch groups. This is done by checking if there are next groups. If so, we will switch over to the latest next_group. The actual switch will be done when that group is blocked.
Created attachment 309329 [details] [review]
decodebin2: Forward event/queries for unlinked groups

When upstream events/queries reach sinkpads of unlinked groups (i.e. no longer linked to the upstream demuxer), this patch attempts to find the linked group and forward it upstream of that group.

This is done by adding upstream event/query probes on new group sinkpads and then:
* Checking if the pad is linked or not (has a peer or not)
* If there is a peer, just let the event/query follow through normally
* If there is no peer, we find a pad to which to proxy it and return GST_PROBE_HANDLED if it succeeded (allowing the event/query to be properly returned to the initial called)

Note that this is definitely not thread-safe for the time being
Created attachment 309330 [details] [review]
decodebin2: Handle flushing with multiple decode groups

When an upstream element wants to flush downstream, we need to take all chains/groups into consideration.

To that effect, when a FLUSH_START event is seen, after having it sent downstream we mark all those chains/groups as "drained" (as if they had seen a EOS event on the endpads).

When a FLUSH_STOP event is received, we check if we need to switch groups. This is done by checking if there are next groups. If so, we will switch over to the latest next_group. The actual switch will be done when that group is blocked.
commit eaf9ca90c7000f5c7156d13969b9135f7273fa79
Author: Edward Hervey <edward@centricular.com>
Date: Mon May 4 11:19:28 2015 +0200

decodebin2: Handle flushing with multiple decode groups

When an upstream element wants to flush downstream, we need to take all chains/groups into consideration.

To that effect, when a FLUSH_START event is seen, after having it sent downstream we mark all those chains/groups as "drained" (as if they had seen a EOS event on the endpads).

When a FLUSH_STOP event is received, we check if we need to switch groups. This is done by checking if there are next groups. If so, we will switch over to the latest next_group. The actual switch will be done when that group is blocked.

https://bugzilla.gnome.org/show_bug.cgi?id=606382

commit 2d3743e37dd13d2211e4ec2c596c5a420e944d18
Author: Edward Hervey <edward@centricular.com>
Date: Wed Apr 29 15:56:39 2015 +0200

decodebin2: Forward event/queries for unlinked groups

When upstream events/queries reach sinkpads of unlinked groups (i.e. no longer linked to the upstream demuxer), this patch attempts to find the linked group and forward it upstream of that group.

This is done by adding upstream event/query probes on new group sinkpads and then:
* Checking if the pad is linked or not (has a peer or not)
* If there is a peer, just let the event/query follow through normally
* If there is no peer, we find a pad to which to proxy it and return GST_PROBE_HANDLED if it succeeded (allowing the event/query to be properly returned to the initial called)

Note that this is definitely not thread-safe for the time being

https://bugzilla.gnome.org/show_bug.cgi?id=606382
This is still a problem, you can easily reproduce it with the HLS seek tests in gst-validate:

> validate.hls.playback.seek_with_stop.hls_bibbop
> validate.hls.playback.seek_forward.hls_bibbop
> validate.hls.playback.seek_backward.hls_bibbop

The seek events are returning FALSE, although the seek handler in hlsdemux returns TRUE.
The problem here is that hlsdemux switches to a new bitrate at the same time as we send a seek. So the seek event goes to hlsdemux, returns TRUE, goes up to playbin. Playbin sees that elements changed, does a gst_iterator_resync() and sends the event again. But this time the streamsynchronizer pad at playsink is not linked because the bitrate switch is currently happening.
Related to this is also

commit 53f135cec7ac63b8951797568b9e8e153bd6ad2e
Author: Sebastian Dröge <sebastian@centricular.com>
Date: Fri Oct 23 12:02:28 2015 +0300

playbin: Send upstream events directly to playsink

Send event directly to playsink instead of letting GstBin iterate over all sink elements. The latter might send the event multiple times in case the SEEK causes a reconfiguration of the pipeline, as can easily happen with adaptive streaming demuxers.

What would then happen is that the iterator would be reset, we send the event again, and on the second time it will fail in the majority of cases because the pipeline is still being reconfigured

Doesn't fix it, but makes it better.
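The principle behind that change, sketched from the application side rather than inside playbin (function and parameter names below are made up for the example): hand the seek to one well-known element instead of letting GstBin's default send_event iterate over all sinks and possibly resend it during a reconfiguration.

```c
#include <gst/gst.h>

/* Illustration of the "send directly to one element" idea; not the playbin
 * code. gst_element_send_event() takes ownership of the event. */
static gboolean
seek_element_directly (GstElement *target, gdouble rate, gint64 position)
{
  GstEvent *seek = gst_event_new_seek (rate, GST_FORMAT_TIME,
      GST_SEEK_FLAG_FLUSH | GST_SEEK_FLAG_ACCURATE,
      GST_SEEK_TYPE_SET, position,
      GST_SEEK_TYPE_NONE, GST_CLOCK_TIME_NONE);

  /* The event goes only to "target", so a pipeline reconfiguration cannot
   * cause it to be delivered twice via a resynced sink iterator. */
  return gst_element_send_event (target, seek);
}
```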
This fixes things even more, enough to enable the HLS seeking tests in gst-validate again.

commit 36b80edb7263118467dfcaee3923f7c964ae6bc8
Author: Sebastian Dröge <sebastian@centricular.com>
Date: Tue Oct 27 15:44:06 2015 +0200

decodebin: Send SEEK events directly to adaptive streaming demuxers

This makes sure that they will always get SEEK events, even if we're currently in the middle of a group switch (i.e. switching to another representation/bitrate/etc).

https://bugzilla.gnome.org/show_bug.cgi?id=606382
Created attachment 315338 [details]
logs showing hang on seek

I'm currently seeing a problem where gst_element_seek_simple(decodebin) is not returning.

log.2 shows that hlsdemux receives the seek event fine, but that flush-start is not propagated to any groups within the decodebin. This is because hlsdemux has just switched bitrates, and the src pad probe has yet to be put in place by hlsdemux. A quick hack to work around this is for hlsdemux to ensure the streaming thread is stopped before sending flush-start downstream.

With the quick hack in place, I can then produce the scenario in log.3. Search for WARNING and CRITICAL. In this case, flush-start is making its way into a group which is in the process of being created, and causing grief.

I'm not sure why either scenario causes a hang, but in any case it's obviously wrong. Any suggestions?
I don't think this can be fixed in any meaningful and reliable way in the existing decodebin design. There are already ideas how to solve it with a new design, see Edward's talk from the GStreamer Conference this year :)
Bummer...

I've put in a bit of a hack to try and avoid the issue. Seems to be working so far.

What I try to do is avoid sending a seek in whilst decodebin is creating a new group as a result of a bitrate switch. The way I've done this is to add a new signal to decodebin, which I've called pad-activated. It fires after gst_decode_pad_activate() is called. I know how many pads I expect to be 'activated' (it's the same as the number of src pads created by the tsdemux), so I can tell when the new group has been built by counting the number of times pad-activated has fired.

hlsdemux emits a message to indicate when it's switching to a new playlist. So, between these two things, I have a slightly fragile mechanism for detecting when decodebin is busy creating a new decode group. There is still a race with hlsdemux - the streaming thread needs to be paused before sending the seek event so we're not racing with the beginning of a bitrate switch.

This is a bit horrible, but is the best I can think of, seeing as not allowing seeks isn't an option for me. Another avenue I haven't investigated is to determine why a seek request can hang at the moment. If it instead failed cleanly, that's something I could deal with.

If anyone has any suggestions for a nicer workaround, I'd be grateful ;)
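For concreteness, a sketch of the application-side bookkeeping for that workaround. It assumes the custom, non-upstream "pad-activated" signal described above (with a guessed (decodebin, pad) signature), plus knowledge of how many pads the demuxer will expose; all names are hypothetical. It would be wired up with g_signal_connect (decodebin, "pad-activated", G_CALLBACK (on_pad_activated), &tracker), and the application defers its seek calls while group_building is TRUE.

```c
#include <gst/gst.h>

/* Hypothetical tracker for the workaround described above. */
typedef struct {
  guint expected_pads;     /* src pads the demuxer will create for the group */
  guint activated_pads;    /* "pad-activated" signals seen for this group */
  gboolean group_building; /* TRUE while decodebin builds the new group */
} SwitchTracker;

/* Assumed signature of the custom signal patched into decodebin. */
static void
on_pad_activated (GstElement *decodebin, GstPad *pad, SwitchTracker *t)
{
  t->activated_pads++;
  if (t->activated_pads >= t->expected_pads)
    t->group_building = FALSE;   /* new decode group complete, seeking is safe */
}

/* Called when hlsdemux posts its "switching playlist" message. */
static void
on_playlist_switch (SwitchTracker *t)
{
  t->activated_pads = 0;
  t->group_building = TRUE;      /* hold back seeks until the group is built */
}
```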
The hang might happen because we're flushing a decode group that is currently draining. With that, it would never go EOS anymore and we would wait forever for it to drain before switching to the next group. There's a pad probe that should prevent it, but maybe that's broken.
And this bug can nicely die now that we have decodebin3 which never ends up in that situation.