After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 765933 - rtpjitterbuffer: Fix stall when receiving already lost packet
rtpjitterbuffer: Fix stall when receiving already lost packet
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other All
: Normal critical
: 1.8.2
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2016-05-03 11:15 UTC by Håvard Graff (hgr)
Modified: 2016-05-11 06:45 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
test and fix (6.48 KB, patch)
2016-05-03 11:15 UTC, Håvard Graff (hgr)
reviewed Details | Review
Updated test and fix (7.36 KB, patch)
2016-05-03 14:04 UTC, Stian Selnes (stianse)
needs-work Details | Review
rebased and updated test and fix (7.26 KB, patch)
2016-05-06 08:03 UTC, Håvard Graff (hgr)
none Details | Review
updated (again) test and fix (7.58 KB, patch)
2016-05-06 08:40 UTC, Håvard Graff (hgr)
committed Details | Review

Description Håvard Graff (hgr) 2016-05-03 11:15:30 UTC
Created attachment 327210 [details] [review]
test and fix

When a packet arrives that has already been considered lost as part of a
large gap the "lost timer" for this will be cancelled. If the remaining
packets of this large gap never arrives, there will be missing entries
in the queue and the loop function will keep waiting for these packets
to arrive and never push another packet, effectively stalling the
pipeline.

The proposed fix consideres parts of a large gap definitely lost (since
they are calculated from latency) and ignores the late arrivals.

In practice the issue is rare since large gaps are sceduled immediately,
and for the stall to happen the late arrival needs to be processed
before this times out.
Comment 1 Sebastian Dröge (slomo) 2016-05-03 11:26:07 UTC
Review of attachment 327210 [details] [review]:

Makes sense to me

::: gst/rtpmanager/gstrtpjitterbuffer.c
@@ +1996,3 @@
+
+    if (test->num > 1 && test->type == TIMER_TYPE_LOST &&
+        seqnum >= test->seqnum && seqnum < (test->seqnum + test->num)) {

Shouldn't this take extended seqnums or somehow care for wraparounds?
Comment 2 Stian Selnes (stianse) 2016-05-03 11:30:31 UTC
Indeed. Will write an additional test and update the patch.
Comment 3 Stian Selnes (stianse) 2016-05-03 14:04:36 UTC
Created attachment 327228 [details] [review]
Updated test and fix

Updated the patch to properly handle seqnum wraparound.

However, it was by chance working with the previous patch as well since it turns out the stall only happens when the first packet of a large gap arrives late, and for the first packet the wraparound is not an issue. For the other packets the lost event will actually be sent, and the late packets will be considered old when the loop function handles them and thus dropped.

Anyway, the updated patch includes tests with wraparound and handles this case properly (and not by chance) in the jitterbuffer.
Comment 4 Sebastian Dröge (slomo) 2016-05-06 06:19:55 UTC
Comment on attachment 327228 [details] [review]
Updated test and fix

Patch looks good but a) the test part does not apply (we don't have a test_latency_changed_event test in master yet) and b) it times out here :)

72%: Checks: 22, Failures: 0, Errors: 6
elements/rtpjitterbuffer.c:1445:E:general:test_considered_lost_packet_in_large_gap_arrives:0: (after this point) Test timeout expired
elements/rtpjitterbuffer.c:1445:E:general:test_considered_lost_packet_in_large_gap_arrives:1: (after this point) Test timeout expired
[...]
Comment 5 Håvard Graff (hgr) 2016-05-06 06:40:45 UTC
(In reply to Sebastian Dröge (slomo) from comment #4)
> Comment on attachment 327228 [details] [review] [review]
> Updated test and fix
> 
> Patch looks good but a) the test part does not apply (we don't have a
> test_latency_changed_event test in master yet) and b) it times out here :)
> 
> 72%: Checks: 22, Failures: 0, Errors: 6
> elements/rtpjitterbuffer.c:1445:E:general:
> test_considered_lost_packet_in_large_gap_arrives:0: (after this point) Test
> timeout expired
> elements/rtpjitterbuffer.c:1445:E:general:
> test_considered_lost_packet_in_large_gap_arrives:1: (after this point) Test
> timeout expired
> [...]

I will rebase on latest master and investigate. (we are a bit out of sync ATM)
Comment 6 Håvard Graff (hgr) 2016-05-06 08:03:38 UTC
Created attachment 327364 [details] [review]
rebased and updated test and fix

Also fixed up a few spelling-errors
Comment 7 Sebastian Dröge (slomo) 2016-05-06 08:16:13 UTC
Comment on attachment 327364 [details] [review]
rebased and updated test and fix

Applies and does not timeout anymore but fails:

Running suite(s): rtpjitterbuffer
81%: Checks: 22, Failures: 4, Errors: 0
elements/rtpjitterbuffer.c:482:F:general:test_considered_lost_packet_in_large_gap_arrives:2: 'expected_seqnum' (65536) is not equal to 'seqnum' (0)
elements/rtpjitterbuffer.c:482:F:general:test_considered_lost_packet_in_large_gap_arrives:3: 'expected_seqnum' (65536) is not equal to 'seqnum' (0)
elements/rtpjitterbuffer.c:482:F:general:test_considered_lost_packet_in_large_gap_arrives:4: 'expected_seqnum' (65537) is not equal to 'seqnum' (1)
elements/rtpjitterbuffer.c:482:F:general:test_considered_lost_packet_in_large_gap_arrives:5: 'expected_seqnum' (65537) is not equal to 'seqnum' (1)
Comment 8 Håvard Graff (hgr) 2016-05-06 08:40:29 UTC
Created attachment 327369 [details] [review]
updated (again) test and fix
Comment 9 Sebastian Dröge (slomo) 2016-05-06 11:34:37 UTC
commit 8f7962e1c3178bb6f84c7d9d06e3172ae7dfd273
Author: Havard Graff <havard.graff@gmail.com>
Date:   Tue May 3 11:45:01 2016 +0200

    rtpjitterbuffer: Fix stall when receiving already lost packet
    
    When a packet arrives that has already been considered lost as part of a
    large gap the "lost timer" for this will be cancelled. If the remaining
    packets of this large gap never arrives, there will be missing entries
    in the queue and the loop function will keep waiting for these packets
    to arrive and never push another packet, effectively stalling the
    pipeline.
    
    The proposed fix conciders parts of a large gap definitely lost (since
    they are calculated from latency) and ignores the late arrivals.
    
    In practice the issue is rare since large gaps are scheduled immediately,
    and for the stall to happen the late arrival needs to be processed
    before this times out.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=765933