After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 722370 - rtprtxsend: push rtx buffers from a different thread to avoid long retransmission delays
rtprtxsend: push rtx buffers from a different thread to avoid long retransmis...
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other Linux
: Normal normal
: 1.3.1
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2014-01-16 19:55 UTC by George Kiagiadakis
Modified: 2014-01-21 14:02 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description George Kiagiadakis 2014-01-16 19:55:33 UTC
This is a follow-up on https://bugzilla.gnome.org/show_bug.cgi?id=711084 for one of the patches that were not commited.

The problem essentially is that currently rtprtxsend pushes retransmission buffers from its chain() function, which requires chain() to be called, i.e. there need to be some "master stream" buffers ready to be sent in order to send rtx buffers for older requests. This introduces a retransmission delay that can be critical for stream reconstruction on the receiver end and is unnecessary.

This patch attempts to fix this by having a separate thread that pushes retransmission buffers as soon as they are requested.

On this attempt, the new thread only pushes retransmission buffers, as Wim mentioned it should in https://bugzilla.gnome.org/show_bug.cgi?id=711084#c68. Also, some unit tests that were fixed in order to have a deterministic output and not fail at random due to the new thread.

Please note that there are still two unit tests failing at random: rtprtx::test_drop_multiple_sender and rtprtx::test_drop_one_sender. These unit tests are quite complex and weird, but despite having analyzed them fully, I am unable to find a reason for their failure. It looks like some rtx buffers are lost between rtprtxsend and rtprtxreceive in their pipelines, sometimes. At this point, I'm all for removing them, since they do not offer much in my opinion. The functionality is more or less also tested in the rest of the tests, which work fine.

The code can be found here: http://cgit.collabora.com/git/user/gkiagia/gst-plugins-good.git/log/?h=rtxsend-new-thread
Comment 1 Wim Taymans 2014-01-20 13:13:32 UTC
(In reply to comment #0)
> Please note that there are still two unit tests failing at random:
> rtprtx::test_drop_multiple_sender and rtprtx::test_drop_one_sender. These unit
> tests are quite complex and weird, but despite having analyzed them fully, I am
> unable to find a reason for their failure. It looks like some rtx buffers are
> lost between rtprtxsend and rtprtxreceive in their pipelines, sometimes. At
> this point, I'm all for removing them, since they do not offer much in my
> opinion. The functionality is more or less also tested in the rest of the
> tests, which work fine.

Please don't remove the tests, the reason why they fail in this case is because they reveal a bug in the elements. The EOS event jumps over the RTX packets and causes the pipeline to stop early and racily. The rtx sender should probably place the EOS event in the queue and only push it forward when the queue is drained.
Comment 2 George Kiagiadakis 2014-01-21 12:57:46 UTC
(In reply to comment #1)
> Please don't remove the tests, the reason why they fail in this case is because
> they reveal a bug in the elements. The EOS event jumps over the RTX packets and
> causes the pipeline to stop early and racily. The rtx sender should probably
> place the EOS event in the queue and only push it forward when the queue is
> drained.

Thanks for the hint. I had actually thought of something going wrong with the EOS event, so I tried to drop all EOS events from the pad probe that is installed right after the queue element and manually stop the pipeline a few seconds after the last expected EOS event was received (there are 4 sources, so 4 EOS events), but that didn't really help.

But anyway, I tried to implement your suggestion and it seems that if I send the EOS event from the srcpad task *and* flush the queue afterwards so that no more rtx buffers can be sent, then it works.

The patch is in the same branch.
Comment 3 Wim Taymans 2014-01-21 14:01:45 UTC
commit 1a300eb50967e88ad73ffeeb44a84e268d4705b8
Author: George Kiagiadakis <george.kiagiadakis@collabora.com>
Date:   Tue Jan 21 13:42:38 2014 +0100

    rtprtxsend: ensure that no rtx buffers are sent after EOS
    
    To do that, enqueue the EOS event to be sent from the srcpad task
    thread and flush the queue right afterwards, so that no more rtx
    buffers can be sent, even if there are more requests coming in.
    
    Fixes https://bugzilla.gnome.org/show_bug.cgi?id=722370

commit 133913a11ae35656797a7101ad0e01d484bea46b
Author: George Kiagiadakis <george.kiagiadakis@collabora.com>
Date:   Wed Jan 15 09:46:14 2014 +0100

    rtprtxsend: run a new GstTask on the src pad
    
    The reason behind this is to minimize the retransmission delay.
    Previously, when a NACK was received, rtprtxsend would put a
    retransmission packet in a queue and it would send it from chain(),
    i.e. only after a new buffer would arrive.
    
    This unfortunately was causing big delays, in the order of 60-100 ms,
    which can be critical for the receiver side.
    
    By having a separate GstTask for pushing buffers out of rtxsend,
    we can push buffers out right after receiving the event, without
    waiting for chain() to get called.
Comment 4 Wim Taymans 2014-01-21 14:02:05 UTC
pushed the whole branch