GNOME Bugzilla – Bug 663944
souphttpsrc: leaking file descriptors + memory when destroyed while connecting
Last modified: 2014-05-08 08:20:03 UTC
souphttpsrc leaks fds and libsoup objects when the following contions are met: 1. connetion to a server is in progress (not established yet); 2. the pipeline is switched to NULL state and then destroyed. There's almost 100% chance of hitting this when destroying the pipeline while the server is not responding. My video server runs into max open fds limit in 3 hours when a single MJPEG source goes down because of this bug. It appears to me that the problem is in how souphttpsrc handles its separate main context. It creates a new main context and passes it to libsoup, which passes it to GIO. When the element is destroyed, the context goes away immediately (that's how I see it - may be wrong). But GIO depends on the context for freeing resources associated with the connection. Cancellable object uses the context to initiate cancellation, which leads to destruction of related objects and fds. My guess is that the context is destroyed early and that's why GIO cancellation doesn't work, which results in several file descriptors leaked and unfreed libsoup and GIO objects. The following patch makes resource leaks go away, which proves the point. diff --git a/ext/soup/gstsouphttpsrc.c b/ext/soup/gstsouphttpsrc.c index 3011058..2f03c8d 100644 --- a/ext/soup/gstsouphttpsrc.c +++ b/ext/soup/gstsouphttpsrc.c @@ -1267,7 +1267,7 @@ gst_soup_http_src_start (GstBaseSrc * bsrc) if (src->proxy == NULL) { src->session = soup_session_async_new_with_options (SOUP_SESSION_ASYNC_CONTEXT, - src->context, SOUP_SESSION_USER_AGENT, src->user_agent, + /* src->context */ g_main_context_default(), SOUP_SESSION_USER_AGENT, src->user_agent, SOUP_SESSION_TIMEOUT, src->timeout, #ifdef HAVE_LIBSOUP_GNOME SOUP_SESSION_ADD_FEATURE_BY_TYPE, SOUP_TYPE_PROXY_RESOLVER_GNOME, @@ -1276,7 +1276,7 @@ gst_soup_http_src_start (GstBaseSrc * bsrc) } else { src->session = soup_session_async_new_with_options (SOUP_SESSION_ASYNC_CONTEXT, - src->context, SOUP_SESSION_PROXY_URI, src->proxy, + /* src->context */ g_main_context_default(), SOUP_SESSION_PROXY_URI, src->proxy, SOUP_SESSION_TIMEOUT, src->timeout, SOUP_SESSION_USER_AGENT, src->user_agent, NULL); }
Created attachment 201489 [details] [review] libsoup resource leak fix This is a fix for releasing resources allocated by libsoup when pipeline goes to NULL state. It only works correctly with a patch to libsoup which I have sent to libsoup mailing list. Here's the libsoup patch: diff -ur libsoup/libsoup/soup-socket.c libsoup_new/libsoup/soup-socket.c --- libsoup/libsoup/soup-socket.c 2011-11-12 16:57:51.000000000 +0400 +++ libsoup_new/libsoup/soup-socket.c 2011-11-16 02:22:13.583630395 +0400 @@ -679,9 +679,6 @@ GSocketConnection *conn; guint status; - if (priv->async_context && !priv->use_thread_context) - g_main_context_pop_thread_default (priv->async_context); - conn = g_socket_client_connect_finish (G_SOCKET_CLIENT (client), result, &error); status = socket_connected (sacd->sock, conn, error); @@ -736,6 +733,10 @@ priv->connect_cancel, async_connected, sacd); g_object_unref (client); + + if (priv->async_context && !priv->use_thread_context) + g_main_context_pop_thread_default (priv->async_context); + } /** @@ -1057,9 +1058,6 @@ GError *error = NULL; guint status; - if (priv->async_context && !priv->use_thread_context) - g_main_context_pop_thread_default (priv->async_context); - if (g_tls_connection_handshake_finish (G_TLS_CONNECTION (priv->conn), result, &error)) status = SOUP_STATUS_OK; @@ -1093,10 +1091,14 @@ if (priv->async_context && !priv->use_thread_context) g_main_context_push_thread_default (priv->async_context); + g_tls_connection_handshake_async (G_TLS_CONNECTION (priv->conn), G_PRIORITY_DEFAULT, cancellable, handshake_async_ready, data); + + if (priv->async_context && !priv->use_thread_context) + g_main_context_pop_thread_default (priv->async_context); } /**
Created attachment 201490 [details] [review] libsoup g_main_context_pop_thread_default() patch This patch to libsoup allows to add extra iteration of the main context in gst_soup_http_src_stop() without breaking things. It appears to work, but needs review.
email to libsoup mailing list: http://mail.gnome.org/archives/libsoup-list/2011-November/msg00000.html
Created attachment 201497 [details] [review] souphttpsrc resource leak fix #2 additionally, g_main_loop_run() should be wrapped with g_main_context_push_thread_default()/g_main_context_pop_thread_default()
Created attachment 201498 [details] [review] souphttpsrc resource leak fix #3 Dummy idle callback to suppress glib's "idle source without callback" warning. Extra care when iterating the context.
Is this still a problem with GStreamer 1.0 and latest libsoup?
Yes, this problem still persists. I tested with gstreamer-1.1.4 and libsoup-2.43.90. The same patches apply (attachments to this bug). Souphttpsrc patch applies to gst-plugins-good-1.1.4 as is, and libsoup patch applies to libsoup-2.43.90 with a trivial one-liner conflict resolution. Without these patches, gstreamer leaks 3 fds per connection attempt to an offline hosts. There's also one new bug in glib/gio which reuslts in leaking 1 fd even with these patches applied - I'll post a patch for that in a new bug report. Attaching it here for reference.
Created attachment 254966 [details] [review] glib-2.37.5 patch against souphttpsrc fd leak
I think the souphttpsrc patch is not correct, when the main context is destroyed all GSources will be destroyed too, and inside their finalize function they should make sure to not leak anything. This only looks like a workaround for some GSource not cleaning up properly. Could you also create separate bugs for the libsoup and glib problems against these products?
The cause of leaks at souphttpsrc termination is that GIO cancellables don't have a chance to be run because corresponding glib message loop is no more. Other code does cleanups in a normal way. Some of the cleanups are made with g_cancellable_cancel(), which assumes that there's a message loop which will execute deferred operations. The souphttpsrc patch makes sure that all these cancellables are actually processed. That's done by iterating the loop until it has nothing to do.
*** Bug 721256 has been marked as a duplicate of this bug. ***
Hello all, what is the finally resolution for this fd leak?
None at all yet, but I can reproduce this here with GLib 2.40.0, libsoup 2.46.0 and GStreamer git master. All the problems still seem to be there as before in libsoup, glib and souphttpsrc.
I locally updated all patches to these latest versions and fixed a few similar leaks elsewhere in related code and my testcase succeeds now. I'll provide updated patches in a bit and make sure they get integrated wherever necessary.
Comment on attachment 254966 [details] [review] glib-2.37.5 patch against souphttpsrc fd leak This one is fixed already in latest GLib independently, see bug #726611.
Comment on attachment 201490 [details] [review] libsoup g_main_context_pop_thread_default() patch See bug #729737 for the libsoup fixes
I'll attach the souphttpsrc fix here once I found a way to a) fix the problem and b) don't crash when using an old libsoup without the fix from bug #729737
(In reply to comment #17) > I'll attach the souphttpsrc fix here once I found a way to a) fix the problem > and b) don't crash when using an old libsoup without the fix from bug #729737 bug 693911! :)
(In reply to comment #18) > (In reply to comment #17) > > I'll attach the souphttpsrc fix here once I found a way to a) fix the problem > > and b) don't crash when using an old libsoup without the fix from bug #729737 > > bug 693911! :) Yes, but I'd like to have a fix for the old code too :) I looked at the new API and it seems it mostly means removing a lot of complicated code... so will look into implementing that afterwards.
Also see bug #729796.
And fixed in souphttpsrc too now, but only if a new enough libsoup is available that contains the fix from bug #729737. commit 2a7abc98dbd5a184dd83c57370c690436040fdec Author: Sebastian Dröge <sebastian@centricular.com> Date: Thu May 8 09:49:24 2014 +0200 souphttpsrc: Clean up all pending operations from libsoup before unreffing When we cancel connection attempts and similar things, there are still some operations pending on our main context from the GCancellables. We should let them all run before unreffing our context, otherwise we leak file descriptors. Unfortunately this requires libsoup 2.47.0 or newer as earlier versions steal our main context from us and we can't use it for cleanup later without assertions and funny crashes. Based on a patch by Dmitry Shatrov <shatrov@gmail.com>. https://bugzilla.gnome.org/show_bug.cgi?id=663944