After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 498509 - Insufficient unreferencing of async context?
Insufficient unreferencing of async context?
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: API
2.2.x
Other All
: High critical
: ---
Assigned To: Dan Winship
Dan Winship
Depends on:
Blocks:
 
 
Reported: 2007-11-20 14:30 UTC by Wouter Cloetens
Modified: 2008-10-01 17:38 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch (2.07 KB, patch)
2007-11-20 15:44 UTC, Dan Winship
none Details | Review
Add a new test. (18.60 KB, patch)
2007-11-21 12:05 UTC, Wouter Cloetens
none Details | Review
Add a new test. (18.60 KB, patch)
2007-11-21 12:19 UTC, Wouter Cloetens
none Details | Review
Extension of context-test to test for dangling ref in SoupAsyncSession (552 bytes, patch)
2008-09-30 11:55 UTC, Arnout Vandecappelle
none Details | Review
Fix for dangling ref in SoupAsyncSession and extension of context-test to test for it (1.45 KB, patch)
2008-09-30 12:29 UTC, Arnout Vandecappelle
none Details | Review

Description Wouter Cloetens 2007-11-20 14:30:52 UTC
Please describe the problem:
Maybe this is related to bug 494173.
I start my async client session in a dedicated main loop with a dedicated GMainContext. There can be several simultaneous sessions, and some of them have a limited lifespan, so they must be clearly separated. After a while, my app crashes with too many open fds. This turns out to be due to the pipe pair of the GMainContext not being closed, ergo the GMainContext is not destroyed at the end of my session's lifecycle.

Further debugging reveals this:

1. src->context = g_main_context_new ();
                                                        // context refcount = 1
2. src->loop = g_main_loop_new (src->context, TRUE);
                                                        // context refcount = 2
3. src->session = soup_session_async_new_with_options (
      SOUP_SESSION_ASYNC_CONTEXT, src->context, NULL);
                                                        // context refcount = 3
                                                        // session refcount = 1
4. src->msg = soup_message_new (SOUP_METHOD_GET, src->location);
5. g_signal_connect (src->msg, "got_chunk",
                    G_CALLBACK (soup_got_chunk), src);
6. soup_message_set_flags (src->msg, SOUP_MESSAGE_OVERWRITE_CHUNKS);

7. soup_session_queue_message (src->session, src->msg,
                               soup_response, src);
                                                        // context refcount = 5
                                                        // session refcount = 2
8. g_object_unref (src->session);
                                                        // session refcount = 1
9. g_main_loop_run (src->loop);

// data is received. In got_chunk callback:
10. g_main_loop_quit (src->loop);
11. soup_message_io_pause (msg);

// back in main thread:
12. soup_message_io_unpause (src->msg);
13. g_main_loop_run (src->loop);

// lather, rinse, repeat until clean

// main thread detects that Content-Length bytes have been read,
// or there is an external interrupt forcing the session to stop.

                                                        // context refcount = 5
                                                        // session refcount = 1
14. soup_session_abort (src->session);  /* This unrefs the message. */
                                                        // context refcount = 5
                                                        // session refcount = 1
15. g_object_unref (src->session);
                                                        // context refcount = 4
                                                        // session refcount = 0
16. g_main_loop_unref (src->loop);
                                                        // context refcount = 3
17. g_main_context_unref (src->context);
                                                        // context refcount = 2


If the session is aborted before step 7 (queueing the message) is reached, teardown occurs correctly, i.e. the context refcount drops to 0 and the pipe pair is closed.


Steps to reproduce:


Actual results:


Expected results:


Does this happen every time?


Other information:
Comment 1 Dan Winship 2007-11-20 15:44:44 UTC
Created attachment 99403 [details] [review]
patch

I think this fixes it, but none of the test cases currently tests using a non-NULL async context, so I need to write a new test before committing. But let me know if it works for you.
Comment 2 Dan Winship 2007-11-21 04:24:48 UTC
Should be fixed (and regression tested) in svn now. This will go out with
libsoup 2.2.104 as part of GNOME 2.20.2 next week. Thanks for the bug report.

(btw, the reason valgrind was being unhelpful is because it tries to merge
related leaks together based on their backtraces, but the default level
[--leak-resolution=low] seems to end up merging *unrelated* leaks together,
causing some of them to be hidden. --leak-resolution=med seems to work
better.)
Comment 3 Wouter Cloetens 2007-11-21 12:05:34 UTC
Created attachment 99431 [details] [review]
Add a new test.
Comment 4 Wouter Cloetens 2007-11-21 12:18:41 UTC
Note on comment 0: step 15 is a bug. I should not be unreffing the session there.

I'm not sure this is fixed.
I've added a test that isolates my use case from GStreamer. It exposes the refcount of the GMainContext object. Messy and dangerous, but it's for debug purposes only.

Run with '-d' for debug info.

Run with -'c' to let the server send the Content-Length header. Because of the 'got-body' signal not being emitted (fixed in bug 452280), I had code to abort the session when all the data arrived.

Run with '-w' to unref the session and the GMainContext until their refcount is 0; useful for comparing Valgrind results.

What I see is the following:
- When running with '-c', session_abort() called before 'got-body' arrives. The session refcount is one higher than without -c, and does not fall to 0 after session_abort() and unreffing the session. The context refcount is 3 at the end of the test.
- Even when running without '-c', the context refcount is 1 at the end.
Comment 5 Wouter Cloetens 2007-11-21 12:19:37 UTC
Created attachment 99433 [details] [review]
Add a new test.

Oops; inverted -w logic.
Comment 6 Dan Winship 2007-11-21 16:27:42 UTC
fixed; the problem was that if you finalized a SoupMessage before it emitted "finished", it would leak the I/O state (which includes a reference to the socket, which includes a reference to the async_context).
Comment 7 Arnout Vandecappelle 2008-09-30 11:55:41 UTC
Created attachment 119639 [details] [review]
Extension of context-test to test for dangling ref in SoupAsyncSession
Comment 8 Arnout Vandecappelle 2008-09-30 11:57:24 UTC
I'd like to reopen this bug.  In libsoup-2.4, I'm still observing the problem.  I have patched the context-test (attachment #119639 [details]) to also check the ref_count of the session.  Note that if the call to soup_session_abort() is removed, the ref_count is correct.

In soup_session_async::do_idle_run_queue, the session is reffed.  However, if the session is aborted before the idle callback happens, it is never unreffed.  I'm working on a patch that registers a DestroyNotify function for the idle callback.
Comment 9 Arnout Vandecappelle 2008-09-30 12:29:29 UTC
Created attachment 119642 [details] [review]
Fix for dangling ref in SoupAsyncSession and extension of context-test to test for it
Comment 10 Arnout Vandecappelle 2008-09-30 12:32:04 UTC
The fix is even simpler than expected: since 2.12, glib has the g_source_is_destroyed () function, which makes the trick with the reffing and unreffing and weak pointer unnecessary.

Patch is against 2.4.1, not trunk.  I hope that's fine.
Comment 11 Dan Winship 2008-09-30 13:35:44 UTC
(In reply to comment #8)
> I'd like to reopen this bug.

In general, it's better to file a new bug than to add more comments to a year-old bug. (Because that lets the developers decide how to handle it; bugzilla lets you merge two bugs together, but doesn't let you easily split two bugs apart.)

Anyway

(In reply to comment #10)
> The fix is even simpler than expected: since 2.12, glib has the
> g_source_is_destroyed () function, which makes the trick with the reffing and
> unreffing and weak pointer unnecessary.

That doesn't seem like it should work... if idle_run_queue() is getting run, then it would do the unref of the session, and so there's no dangling ref. And likewise, if idle_run_queue() is getting run, then the idle source can't have been destroyed, can it?

At any rate, the patch on bug 533473 complicates this code a bit more and would break your patch.

I can confirm that the refcount of the session is wrong after the abort call though, so I will look in to this.
Comment 12 Arnout Vandecappelle 2008-09-30 15:07:50 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > The fix is even simpler than expected: since 2.12, glib has the
> > g_source_is_destroyed () function, which makes the trick with the reffing and
> > unreffing and weak pointer unnecessary.
> 
> That doesn't seem like it should work... if idle_run_queue() is getting run,
> then it would do the unref of the session, and so there's no dangling ref. And
> likewise, if idle_run_queue() is getting run, then the idle source can't have
> been destroyed, can it?

As far as I understand, it is possible that a source still triggers after it has been destroyed.  "Destroyed", in that case, does not equal "freed".

You're right, though, that the fix doesn't work.  It is still possible that the async_context (and therefore the idle source) is kept alive externally even after the session has been aborted, unreffed and freed, so the g_source_is_destroyed() function is no guarantee that the session still exists.


> At any rate, the patch on bug 533473 complicates this code a bit more and would
> break your patch.

ACK.

The solution would therefore be to use a DestroyNotify after all.
Alternatively, a more general solution could be to use g_object_weak_ref() to register a cancellation callback (g_source_destroy()) for all the situations where weak pointers are used now.  But that basically boils down to the same thing.  Note that in that case the g_source_is_destroyed() check still needs to be done (I think).

By the way, I wonder if attachment 118879 [details] [review] of bug 533473 mightn't create a similar situation.  Isn't it possible that the connection is destroyed/aborted without call to got_connection, leaving a dangling ref to the session?
Comment 13 Dan Winship 2008-09-30 15:49:27 UTC
fixed in trunk; I changed soup-session-async to keep track of the
idle_run_queue's GSource, and destroy it when finalizing the session, rather
than having do_idle_run_queue ref the session. I'm not sure why i didn't do it that way originally.

(In reply to comment #12)
> By the way, I wonder if attachment 118879 [details] [review] [edit] of bug 533473 mightn't create a
> similar situation.  Isn't it possible that the connection is destroyed/aborted
> without call to got_connection, leaving a dangling ref to the session?

No. No one except the session ever sees the connection object, so the only way it could be destroyed is if the session is destroyed, and that can't happen while we're holding an extra ref on it. And aborting the session would cause the soup connection attempt to fail, resulting it it calling the callback. So it's all good. (Or at least, if it's not, I don't want to fix it, because the SoupConnection stuff is all getting rewritten to be less messy anyway :)
Comment 14 Arnout Vandecappelle 2008-10-01 17:38:56 UTC
Unfortunately, the original problem persists: when the main loop is quit after a connection is initiated but before the connection is established, there is a dangling ref to the session.  I think it is caused by the g_object_ref in SoupSessionAsync::run_queue().  The main_loop is stopped before got_connection is called.  This is because soup_socket_connect_async() goes into CONTINUE mode, but the connect_watch is never triggered.  I'm not entirely sure why not.