Bug 567857 – [udpsrc] loop on gst_poll_wait when POLLERR because of icmp

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 567857 - [udpsrc] loop on gst_poll_wait when POLLERR because of icmp


Summary:	[udpsrc] loop on gst_poll_wait when POLLERR because of icmp


Status:	RESOLVED FIXED

Product:	GStreamer
Classification:	Platform
Component:	gst-plugins-good
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	0.10.15
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2009-01-15 13:11 UTC by Aurelien Grimaud
Modified:	2009-02-24 20:10 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Flush POLLERR in udpsrc before retry poll (2.17 KB, patch) 2009-01-15 13:12 UTC, Aurelien Grimaud	committed	Details \| Review
test case (8.83 KB, text/plain) 2009-01-15 21:09 UTC, Aurelien Grimaud		Details

Description Aurelien Grimaud 2009-01-15 13:11:15 UTC

I am using the same fd to send and recv rtp.
This fd is used in gstudpsink and gstudpsrc.
udpsrc tries to read from fd while udpsink sends RTP.

When distant is unreachable, udpsink sendto will lead to a icmp packet sent back.
Therefore, ppoll in gst_poll_wait called from udpsrc will return 1 with POLLERR set on rtp fd.
udpsrc then loops on poll because FIONREAD returns 0 (icmp packet).
When next RTP packet is sent (20ms later in ALAW there), sendto returns an error and the udpsink post a message on bus : Connection refused.
Meanwhile (20ms) udpsrc looped about a hundred times on poll, consuming all cpu.

The comment before retry in gstudpsrc.c is right : "We know someone will also do something with the socket so that we don't go into an infinite loop in the select()"
Problem is when ?

We can choose to stop the udpsrc when the message 'Connection refused' is received on bus. This means that all cpu will be consumed for 20 ms.

Moreover, the icmp error may only be temporary. It may happen at the beginning of the process, just let the distant time to bind its ports (It should have done it before, right, but who knows ? : linphone for instance has such bug)

So the problem is about icmp errors on RTP sockets.
Are they considered to be fatal (therefore, why loop on udpsrc ?) or not and we have a performance issue.

fd has POLLERR revent anytime the poll is done, this is why loop is so fast.
But pollerr is set only when icmp is received because of a send.

POLLERR should be flushed before loop on poll.
This can be done using recvmsg : patch attached.

side effect : udpsink does not see error anymore and does not return GST_FLOW_ERROR

Comment 1 Aurelien Grimaud 2009-01-15 13:12:27 UTC

Created attachment 126501 [details] [review]
Flush POLLERR in udpsrc before retry poll

Comment 2 Wim Taymans 2009-01-15 18:11:53 UTC

I can't reproduce. Are you allocating the socket yourself or using the one from udpsrc?

Comment 3 Aurelien Grimaud 2009-01-15 20:15:49 UTC

I am allocating the socket, passing the fd with the sockfd property of udpsrc.
I use gstreamer to handle media in sip calls.
The distant did not bind it's socket though provided RTP endpoints when sending its SIP INVITE.
The distant was sipp.
I'll try to build a simple application which is able to reproduce this issue.

There is a problem becasuse both udpsrc and udpsink shares the same fd.
So icmp packet caused by udpsink influence udpsrc behavior.

Comment 4 Aurelien Grimaud 2009-01-15 21:07:38 UTC

Here is server-PCMA.c to reproduce the issue.
This happens because the socket is connected to remote.
If not connected, everything is ok.

GST_DEBUG=udpsrc:5 ./server-PCMA

Comment 5 Aurelien Grimaud 2009-01-15 21:09:02 UTC

Created attachment 126535 [details]
test case

Comment 6 Aurelien Grimaud 2009-01-15 21:20:50 UTC

Why connect an udp socket ?
Well remote is well identified and seen with lsof or netstat.
Udp packets from wrong source are not delivered.
You can gain performance by using send and recv instead of sendto and recvfrom.

We plan to add property to use connected udp socket on udpsrc / udpsink.
http://bugzilla.gnome.org/show_bug.cgi?id=563323

Comment 7 Wim Taymans 2009-01-16 11:23:04 UTC

I see know and I'm asking myself the following questions:

 1) should we not disable the POLLERR in the udpsrc? we don't really want to
    receive errors from the sender here. This requires us to add a method to
    the GstPoll to unset the POLLERR flag on the sockets.
 2) we should likely read the errors on the udp socket in multiudpsink instead.
    We don't need to add GstPoll to multiudpsink for that, I think. We can
    simply read/clear the error queue when the last sendto returned an error.

Would this work too?

Comment 8 Aurelien Grimaud 2009-01-16 13:54:23 UTC

You cannot disable pollerr ...

man poll :
The bits returned in revents can include any of those specified in events, or one of the values POLLERR, POLLHUP, or POLLNVAL. (These three bits are meaningless in the events field, and will be set in the revents field whenever the corresponding condition is true.)

for point 2, the sendto is not on error the first time. It is on error because of ICMP return.
so if errors are read from udpsink, you'll still have the gst_poll burst. maybe half less, but still.

Comment 9 Wim Taymans 2009-01-16 16:57:11 UTC

Yes, I figured that out experimenting a bit. So it seems your patch is pretty much the only thing that can be done.

Comment 10 Wim Taymans 2009-02-23 19:53:54 UTC

commit 969622b439cc7232e8889cd5c7a5579cba6b665f
Author: Aurelien Grimaud <gstelzz at yahoo dot fr>
Date:   Mon Feb 23 20:49:37 2009 +0100

    Read ICMP error messages instead of looping
    
    When we are dealing with connected sockets shared between a udpsrc and a udpsink
    we might receive ICMP connection refused error messages in udpsrc that will
    cause it to go into a bursty loop because the poll returns right away without a
    message to read.
    
    Instead of looping, read the error message from the error queue in udpsrc.
    Fixes #567857.

Comment 11 Wim Taymans 2009-02-23 19:58:51 UTC

should we post a WARNING message on the bus when the source receives an error message? this way the application could at least get some idea about errors.

Comment 12 Aurelien Grimaud 2009-02-24 20:10:01 UTC

Well, it sure would be better for the application to be aware of ICMP problems instead of silently hiding them.
Application will _have to_ deal with it, since there will have a bus post each 20ms (for pcma).
Can one bus post each 20ms lead to major performances problem ?
If so, a property "warn for icmp" could be added.
This could be a way to detect runaway calls from the caller and take signaling decisions (Hang Up) based on media failures (icmp errors) (We use gstreamer for telephony applications)