After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 663944 - souphttpsrc: leaking file descriptors + memory when destroyed while connecting
souphttpsrc: leaking file descriptors + memory when destroyed while connecting
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-good
git master
Other Linux
: Normal major
: 1.3.2
Assigned To: GStreamer Maintainers
GStreamer Maintainers
: 721256 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2011-11-13 00:49 UTC by Dmitry Shatrov
Modified: 2014-05-08 08:20 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
libsoup resource leak fix (978 bytes, patch)
2011-11-15 23:12 UTC, Dmitry Shatrov
none Details | Review
libsoup g_main_context_pop_thread_default() patch (1.50 KB, patch)
2011-11-15 23:16 UTC, Dmitry Shatrov
rejected Details | Review
souphttpsrc resource leak fix #2 (1.33 KB, patch)
2011-11-16 00:17 UTC, Dmitry Shatrov
none Details | Review
souphttpsrc resource leak fix #3 (1.95 KB, patch)
2011-11-16 00:33 UTC, Dmitry Shatrov
committed Details | Review
glib-2.37.5 patch against souphttpsrc fd leak (562 bytes, patch)
2013-09-15 11:05 UTC, Dmitry Shatrov
committed Details | Review

Description Dmitry Shatrov 2011-11-13 00:49:15 UTC
souphttpsrc leaks fds and libsoup objects when the following contions are met:
  1. connetion to a server is in progress (not established yet);
  2. the pipeline is switched to NULL state and then destroyed.

There's almost 100% chance of hitting this when destroying the pipeline while the server is not responding. My video server runs into max open fds limit in 3 hours when a single MJPEG source goes down because of this bug.

It appears to me that the problem is in how souphttpsrc handles its separate main context. It creates a new main context and passes it to libsoup, which passes it to GIO. When the element is destroyed, the context goes away immediately (that's how I see it - may be wrong). But GIO depends on the context for freeing resources associated with the connection. Cancellable object uses the context to initiate cancellation, which leads to destruction of related objects and fds. My guess is that the context is destroyed early and that's why GIO cancellation doesn't work, which results in several file descriptors leaked and unfreed libsoup and GIO objects.

The following patch makes resource leaks go away, which proves the point.

diff --git a/ext/soup/gstsouphttpsrc.c b/ext/soup/gstsouphttpsrc.c
index 3011058..2f03c8d 100644
--- a/ext/soup/gstsouphttpsrc.c
+++ b/ext/soup/gstsouphttpsrc.c
@@ -1267,7 +1267,7 @@ gst_soup_http_src_start (GstBaseSrc * bsrc)
   if (src->proxy == NULL) {
     src->session =
         soup_session_async_new_with_options (SOUP_SESSION_ASYNC_CONTEXT,
-        src->context, SOUP_SESSION_USER_AGENT, src->user_agent,
+        /* src->context */ g_main_context_default(), SOUP_SESSION_USER_AGENT, src->user_agent,
         SOUP_SESSION_TIMEOUT, src->timeout,
 #ifdef HAVE_LIBSOUP_GNOME
         SOUP_SESSION_ADD_FEATURE_BY_TYPE, SOUP_TYPE_PROXY_RESOLVER_GNOME,
@@ -1276,7 +1276,7 @@ gst_soup_http_src_start (GstBaseSrc * bsrc)
   } else {
     src->session =
         soup_session_async_new_with_options (SOUP_SESSION_ASYNC_CONTEXT,
-        src->context, SOUP_SESSION_PROXY_URI, src->proxy,
+        /* src->context */ g_main_context_default(), SOUP_SESSION_PROXY_URI, src->proxy,
         SOUP_SESSION_TIMEOUT, src->timeout,
         SOUP_SESSION_USER_AGENT, src->user_agent, NULL);
   }
Comment 1 Dmitry Shatrov 2011-11-15 23:12:51 UTC
Created attachment 201489 [details] [review]
libsoup resource leak fix

This is a fix for releasing resources allocated by libsoup when pipeline goes to NULL state. It only works correctly with a patch to libsoup which I have sent to libsoup mailing list. Here's the libsoup patch:

diff -ur libsoup/libsoup/soup-socket.c libsoup_new/libsoup/soup-socket.c
--- libsoup/libsoup/soup-socket.c	2011-11-12 16:57:51.000000000 +0400
+++ libsoup_new/libsoup/soup-socket.c	2011-11-16 02:22:13.583630395 +0400
@@ -679,9 +679,6 @@
 	GSocketConnection *conn;
 	guint status;
 
-	if (priv->async_context && !priv->use_thread_context)
-		g_main_context_pop_thread_default (priv->async_context);
-
 	conn = g_socket_client_connect_finish (G_SOCKET_CLIENT (client),
 					       result, &error);
 	status = socket_connected (sacd->sock, conn, error);
@@ -736,6 +733,10 @@
 				       priv->connect_cancel,
 				       async_connected, sacd);
 	g_object_unref (client);
+
+	if (priv->async_context && !priv->use_thread_context)
+		g_main_context_pop_thread_default (priv->async_context);
+
 }
 
 /**
@@ -1057,9 +1058,6 @@
 	GError *error = NULL;
 	guint status;
 
-	if (priv->async_context && !priv->use_thread_context)
-		g_main_context_pop_thread_default (priv->async_context);
-
 	if (g_tls_connection_handshake_finish (G_TLS_CONNECTION (priv->conn),
 					       result, &error))
 		status = SOUP_STATUS_OK;
@@ -1093,10 +1091,14 @@
 
 	if (priv->async_context && !priv->use_thread_context)
 		g_main_context_push_thread_default (priv->async_context);
+
 	g_tls_connection_handshake_async (G_TLS_CONNECTION (priv->conn),
 					  G_PRIORITY_DEFAULT,
 					  cancellable, handshake_async_ready,
 					  data);
+
+	if (priv->async_context && !priv->use_thread_context)
+		g_main_context_pop_thread_default (priv->async_context);
 }
 
 /**
Comment 2 Dmitry Shatrov 2011-11-15 23:16:59 UTC
Created attachment 201490 [details] [review]
libsoup g_main_context_pop_thread_default() patch

This patch to libsoup allows to add extra iteration of the main context in gst_soup_http_src_stop() without breaking things. It appears to work, but needs review.
Comment 3 Dmitry Shatrov 2011-11-15 23:20:12 UTC
email to libsoup mailing list:
http://mail.gnome.org/archives/libsoup-list/2011-November/msg00000.html
Comment 4 Dmitry Shatrov 2011-11-16 00:17:07 UTC
Created attachment 201497 [details] [review]
souphttpsrc resource leak fix #2

additionally, g_main_loop_run() should be wrapped with g_main_context_push_thread_default()/g_main_context_pop_thread_default()
Comment 5 Dmitry Shatrov 2011-11-16 00:33:15 UTC
Created attachment 201498 [details] [review]
souphttpsrc resource leak fix #3

Dummy idle callback to suppress glib's "idle source without callback" warning.
Extra care when iterating the context.
Comment 6 Sebastian Dröge (slomo) 2013-08-21 19:06:45 UTC
Is this still a problem with GStreamer 1.0 and latest libsoup?
Comment 7 Dmitry Shatrov 2013-09-15 11:04:41 UTC
Yes, this problem still persists. I tested with gstreamer-1.1.4 and libsoup-2.43.90.

The same patches apply (attachments to this bug). Souphttpsrc patch applies to gst-plugins-good-1.1.4 as is, and libsoup patch applies to libsoup-2.43.90 with a trivial one-liner conflict resolution.

Without these patches, gstreamer leaks 3 fds per connection attempt to an offline hosts.

There's also one new bug in glib/gio which reuslts in leaking 1 fd even with these patches applied - I'll post a patch for that in a new bug report. Attaching it here for reference.
Comment 8 Dmitry Shatrov 2013-09-15 11:05:32 UTC
Created attachment 254966 [details] [review]
glib-2.37.5 patch against souphttpsrc fd leak
Comment 9 Sebastian Dröge (slomo) 2013-09-16 09:18:56 UTC
I think the souphttpsrc patch is not correct, when the main context is destroyed all GSources will be destroyed too, and inside their finalize function they should make sure to not leak anything. This only looks like a workaround for some GSource not cleaning up properly.

Could you also create separate bugs for the libsoup and glib problems against these products?
Comment 10 Dmitry Shatrov 2013-09-16 09:57:53 UTC
The cause of leaks at souphttpsrc termination is that GIO cancellables don't have a chance to be run because corresponding glib message loop is no more.

Other code does cleanups in a normal way. Some of the cleanups are made with g_cancellable_cancel(), which assumes that there's a message loop which will execute deferred operations. The souphttpsrc patch makes sure that all these cancellables are actually processed. That's done by iterating the loop until it has nothing to do.
Comment 11 Sebastian Dröge (slomo) 2013-12-31 08:56:58 UTC
*** Bug 721256 has been marked as a duplicate of this bug. ***
Comment 12 zhangyanping 2014-01-02 02:59:42 UTC
Hello all, what is the finally resolution for this fd leak?
Comment 13 Sebastian Dröge (slomo) 2014-05-07 16:50:58 UTC
None at all yet, but I can reproduce this here with GLib 2.40.0, libsoup 2.46.0 and GStreamer git master. All the problems still seem to be there as before in libsoup, glib and souphttpsrc.
Comment 14 Sebastian Dröge (slomo) 2014-05-07 17:11:53 UTC
I locally updated all patches to these latest versions and fixed a few similar leaks elsewhere in related code and my testcase succeeds now. I'll provide updated patches in a bit and make sure they get integrated wherever necessary.
Comment 15 Sebastian Dröge (slomo) 2014-05-07 17:23:17 UTC
Comment on attachment 254966 [details] [review]
glib-2.37.5 patch against souphttpsrc fd leak

This one is fixed already in latest GLib independently, see bug #726611.
Comment 16 Sebastian Dröge (slomo) 2014-05-07 17:36:28 UTC
Comment on attachment 201490 [details] [review]
libsoup g_main_context_pop_thread_default() patch

See bug #729737 for the libsoup fixes
Comment 17 Sebastian Dröge (slomo) 2014-05-07 17:52:27 UTC
I'll attach the souphttpsrc fix here once I found a way to a) fix the problem and b) don't crash when using an old libsoup without the fix from bug #729737
Comment 18 Dan Winship 2014-05-07 18:03:07 UTC
(In reply to comment #17)
> I'll attach the souphttpsrc fix here once I found a way to a) fix the problem
> and b) don't crash when using an old libsoup without the fix from bug #729737

bug 693911! :)
Comment 19 Sebastian Dröge (slomo) 2014-05-07 19:14:05 UTC
(In reply to comment #18)
> (In reply to comment #17)
> > I'll attach the souphttpsrc fix here once I found a way to a) fix the problem
> > and b) don't crash when using an old libsoup without the fix from bug #729737
> 
> bug 693911! :)

Yes, but I'd like to have a fix for the old code too :) I looked at the new API and it seems it mostly means removing a lot of complicated code... so will look into implementing that afterwards.
Comment 20 Sebastian Dröge (slomo) 2014-05-08 08:12:50 UTC
Also see bug #729796.
Comment 21 Sebastian Dröge (slomo) 2014-05-08 08:19:57 UTC
And fixed in souphttpsrc too now, but only if a new enough libsoup is available that contains the fix from bug #729737.

commit 2a7abc98dbd5a184dd83c57370c690436040fdec
Author: Sebastian Dröge <sebastian@centricular.com>
Date:   Thu May 8 09:49:24 2014 +0200

    souphttpsrc: Clean up all pending operations from libsoup before unreffing 
    
    When we cancel connection attempts and similar things, there are still
    some operations pending on our main context from the GCancellables. We
    should let them all run before unreffing our context, otherwise we leak
    file descriptors.
    
    Unfortunately this requires libsoup 2.47.0 or newer as earlier versions
    steal our main context from us and we can't use it for cleanup later
    without assertions and funny crashes.
    
    Based on a patch by Dmitry Shatrov <shatrov@gmail.com>.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=663944