GNOME Bugzilla – Bug 781218
rtsp-stream: issue when getting EADDRINUSE on bind and only one address is in the pool (alloc_ports_one_family)
Last modified: 2018-05-06 12:53:43 UTC
Created attachment 349728
Patch to fix issue when there is only one address in the pool and bind fails.

There is an issue when bind fails in alloc_ports_one_family and only one address is in the address pool. The code jumps back to the 'again' label, but on that second pass gst_rtsp_address_pool_acquire_address returns NULL and the function returns.

With this patch, if we get EADDRINUSE, we just try to bind to another port instead of going all the way back to the 'again' label. We use the existing 'count' variable to limit this to a maximum of 20 tries.
Review of attachment 349728:

::: gst/rtsp-server/rtsp-stream.c
@@ +1314,3 @@
+      if (tmp_rtp > 65534 || ++count > 20) {
+        /* port outside of range or failed 20 times */
+        goto port_error;

This code block is duplicated, maybe add a helper?

@@ +1318,3 @@
+      /* rtp_socket is already bound to a port, close and allocate another */
+      g_clear_object (&rtp_socket);
+      if (rtp_socket == NULL) {

This check is a no-op, since g_clear_object will always set rtp_socket to NULL.
Created attachment 349741
Patch to fix issue when there is only one address in the pool and bind fails.

Updated the patch with a helper function 'next_port'. Also moved the part where the rtp_socket is being created into the 'bind_again' label.
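For readers without the attachment at hand, the retry idea described above can be sketched roughly as follows. This is not the patch itself: it uses plain POSIX UDP sockets on the loopback address instead of GSocket and the address pool, and the names bind_udp_with_retry and MAX_BIND_TRIES, the step of two ports per retry, and the exact shape of next_port are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Assumed retry budget, mirroring the "maximum of 20 tries" in the report. */
#define MAX_BIND_TRIES 20

/* Hypothetical helper: move to the next candidate port, two ports further on
 * (assuming the following odd port is reserved for RTCP).
 * Returns 0 on success, -1 if the port range or the retry budget is exhausted. */
static int
next_port (int *port, int *count)
{
  int candidate = *port + 2;

  if (candidate > 65534 || ++(*count) > MAX_BIND_TRIES)
    return -1;

  *port = candidate;
  return 0;
}

/* Hypothetical function: bind a UDP socket on 127.0.0.1, starting at
 * start_port. On EADDRINUSE only the port is changed and bind is retried;
 * any other error aborts. Returns the socket fd, or -1 on failure. */
static int
bind_udp_with_retry (int start_port, int *bound_port)
{
  int port = start_port;
  int count = 0;

  assert (bound_port != NULL);

  for (;;) {
    struct sockaddr_in addr;
    int saved_errno;
    int fd = socket (AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
      return -1;

    memset (&addr, 0, sizeof (addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
    addr.sin_port = htons ((uint16_t) port);

    if (bind (fd, (struct sockaddr *) &addr, sizeof (addr)) == 0) {
      *bound_port = port;
      return fd;
    }

    /* Close the failed socket before trying again with a fresh one. */
    saved_errno = errno;
    close (fd);

    if (saved_errno != EADDRINUSE || next_port (&port, &count) < 0) {
      errno = saved_errno;
      return -1;
    }
  }
}
```

The point of the sketch is that the address stays the same across retries and only the port changes, so a pool containing a single address is never asked for a second one.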
Let's put this on hold for now. We are investigating whether our test case uses a correct approach for this. It uses UDP sockets and a network share, and it failed only very rarely. This patch might just be a workaround for some other underlying problem.
Don't know if this is what you're running into, but for what it's worth, I've seen issues with 'make check' where, depending on the timing with which the various tests are run, some tests would try to use the same ports as other tests (and would always work fine when run individually, of course). I fixed a bunch of those a while back, but I'm sure there are more such cases (IIRC).
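One common way to keep concurrently running tests from colliding on a port, not necessarily what those earlier fixes did, is to bind to port 0 and let the kernel pick a free port, then read the chosen port back with getsockname(). A minimal sketch with plain POSIX sockets follows; bind_ephemeral_udp is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: bind a UDP socket on 127.0.0.1 to a kernel-chosen
 * port and report that port. Returns the socket fd, or -1 on failure. */
static int
bind_ephemeral_udp (int *bound_port)
{
  struct sockaddr_in addr;
  socklen_t len = sizeof (addr);
  int fd;

  assert (bound_port != NULL);

  fd = socket (AF_INET, SOCK_DGRAM, 0);
  if (fd < 0)
    return -1;

  memset (&addr, 0, sizeof (addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  addr.sin_port = htons (0);    /* 0 asks the kernel for any free port */

  if (bind (fd, (struct sockaddr *) &addr, sizeof (addr)) != 0 ||
      getsockname (fd, (struct sockaddr *) &addr, &len) != 0) {
    close (fd);
    return -1;
  }

  *bound_port = ntohs (addr.sin_port);
  return fd;
}
```

This only helps where the test does not need a specific port number; a test that requires a fixed port or an even/odd RTP/RTCP pair would still need some form of retry.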
I don't know if that is related to this. We have a Jenkins host, and bind() is run on the target. Maybe the test that runs just before happens to use the same port, and the port is then somehow unavailable. But that should not be the case for UDP anyway. Investigation continues.
I take it that this is not a regression then? Please let us know what your investigation results in, and ideally provide a testcase for the behaviour you see. Maybe that also helps to understand if there really is a bug or if things are working as intended and whatever you're doing should be solved differently.
Closing this bug report as no further information has been provided. Please feel free to reopen this bug report if you can provide the information that was asked for in a previous comment. Thanks!