After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 698912 - Deadlocks in get_connection() in soup-session.c
Deadlocks in get_connection() in soup-session.c
Status: RESOLVED DUPLICATE of bug 703463
Product: libsoup
Classification: Core
Component: Misc
2.42.x
Other Linux
: Normal major
: ---
Assigned To: libsoup-maint@gnome.bugs
libsoup-maint@gnome.bugs
Depends on:
Blocks:
 
 
Reported: 2013-04-26 01:25 UTC by Matthew Barnes
Modified: 2013-07-12 22:14 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Matthew Barnes 2013-04-26 01:25:51 UTC
I wrote a gravatar.com module for Evolution that uses the SoupRequest API.

https://git.gnome.org/browse/evolution/diff/modules/gravatar/e-gravatar-photo-source.c?id=91aee2a805d70c9f7a5f0d089139fbfc710939c1

It's pretty simple; just calls soup_request_send() with a SoupSessionSync.

Works fine with libsoup 2.40, but with libsoup 2.42 I'm getting very frequent deadlocks that all look like this:

Thread 111 (Thread 0x92cfbb70 (LWP 1053))

  • #0 __kernel_vsyscall
  • #1 __pthread_cond_wait
    at pthread_cond_wait.c line 153
  • #2 g_cond_wait
    at /tmp/buildd/glib2.0-2.36.0/./glib/gthread-posix.c line 756
  • #3 get_connection
    at soup-session.c line 1747
  • #4 soup_session_process_queue_item
    at soup-session.c line 1796
  • #5 soup_session_send
    at soup-session.c line 4148
  • #6 soup_request_http_send
    at soup-request-http.c line 110
  • #7 soup_request_send
    at soup-request.c line 218
  • #8 gravatar_photo_source_get_photo_thread
    at e-gravatar-photo-source.c line 94
  • #9 run_in_thread
    at /tmp/buildd/glib2.0-2.36.0/./gio/gsimpleasyncresult.c line 871
  • #10 io_job_thread
    at /tmp/buildd/glib2.0-2.36.0/./gio/gioscheduler.c line 89
  • #11 g_task_thread_pool_thread
    at /tmp/buildd/glib2.0-2.36.0/./gio/gtask.c line 1242
  • #12 g_thread_pool_thread_proxy
    at /tmp/buildd/glib2.0-2.36.0/./glib/gthreadpool.c line 309
  • #13 g_thread_proxy
    at /tmp/buildd/glib2.0-2.36.0/./glib/gthread.c line 798
  • #14 start_thread
    at pthread_create.c line 304
  • #15 clone
    at ../sysdeps/unix/sysv/linux/i386/clone.S line 130

It's not freezing Evolution, but these deadlocked threads keep piling up.
Comment 1 Matthew Barnes 2013-04-27 14:05:05 UTC
For the time being I worked around the deadlocks by creating a new SoupSession instance for each SoupRequest sent.  Not very efficient, but seems to work.
Comment 2 Dan Winship 2013-04-30 13:13:29 UTC
(In reply to comment #1)
> For the time being I worked around the deadlocks by creating a new SoupSession
> instance for each SoupRequest sent.  Not very efficient, but seems to work.

Do you use soup_session_abort()? If so, do the deadlocks only occur after doing an abort?
Comment 3 Matthew Barnes 2013-04-30 13:45:50 UTC
I was not calling soup_session_abort() myself, but perhaps it's used internally within libsoup to implementation cancellation of soup_request_send()?

Before the work around, the context was simply a central SoupSessionSync with a SoupRequester feature that I was using to create and submit SoupRequests from various worker threads.

g_cancellable_cancel() is called on unfinished requests when the user moves on to another email message, so cancellation is indeed frequent.  I did not notice whether cancellation triggered the deadlocks, but it seems plausible.
Comment 4 Dan Winship 2013-05-31 18:19:01 UTC
Tried reproducing this and failed...

Is it possible you are leaking streams somewhere? (I tried tracing through the logic of e-photo-cache, but it's pretty hairy, and then you have to consider all of its callers too...) Anyway, when using the streaming APIs, it doesn't actually read the response body from the server until you read from the returned GInputStream. So if you leak the stream without reading from it or closing it first, then the message will remain in-progress forever, and so will never yield its SoupConnection back to the pool, and then after this happens enough times, you'll have hit the max-connections limit, and so all further requests will end up blocked in get_connection() waiting for the situation to change, which it never will.

This is a pretty bad failure mode, so libsoup ought to be a little more clever here...
Comment 5 Matthew Barnes 2013-05-31 19:41:38 UTC
I've worked around the issue (poorly) in the current 3.9 code, but was able to reproduce the deadlock quite easily in the commit cited in comment #0.  Is that the revision you were testing?

I suppose it's possible the stream is being leaked.  e-photo-cache is invoked to handle SoupRequests with a custom "mail://contact-photo/..." URI.  The resulting stream is handed off to WebKit via SoupRequest.send_finish() and after that I don't know what happens to it.

You can find this higher-level logic in evolution/mail/e-mail-request.c, which should be a little easier to trace through.
Comment 6 Dan Winship 2013-06-07 17:32:48 UTC
(In reply to comment #5)
> I've worked around the issue (poorly) in the current 3.9 code, but was able to
> reproduce the deadlock quite easily in the commit cited in comment #0.  Is that
> the revision you were testing?

No, I just wrote a program that made lots of requests simultaneously in lots of threads, and cancelled some, etc, but it always completed fine.

I just tried jhbuilding evo, but I'm not sure it's actually working. I can't find any place to turn the gravatar support on, and I can't get it to successfully connect to my IMAP server anyway. And trying to install the rawhide packages wants to pull in the entire universe...

> You can find this higher-level logic in evolution/mail/e-mail-request.c, which
> should be a little easier to trace through.

Yeah, but that doesn't help if the leak is happening in the lower-level parts.
Comment 7 Matthew Barnes 2013-06-07 18:11:40 UTC
Okay thanks.  I'll proceed then on the assumption that Evolution is leaking streams somewhere and see if I can track it down before I pester you further.
Comment 8 Matthew Barnes 2013-06-20 14:02:53 UTC
Starting to see the same SoupSession hang in other contexts now.

For example, bug 702747.  Maybe we're leaking streams there too.
Comment 9 Matthew Barnes 2013-06-21 04:07:21 UTC
Another case, reported downstream:
https://bugzilla.redhat.com/show_bug.cgi?id=976529
Comment 10 Dan Winship 2013-06-29 17:14:33 UTC
(In reply to comment #9)
> Another case, reported downstream:
> https://bugzilla.redhat.com/show_bug.cgi?id=976529

In gdata, which doesn't use the streams API, so maybe we're leaking streams internally...
Comment 11 Dan Winship 2013-07-01 16:57:39 UTC
tried and failed to reproduce this again.

How do I reproduce the gravatar case? What do I have to do to cause evolution to try to load gravatars? (I've updated to F19 now, so I should be able to rebuild it with the broken version of the code.)
Comment 12 Matthew Barnes 2013-07-01 17:13:06 UTC
Revert commit 1b2104e in Evolution and in the Headers tab of Mail Preferences, enable: [x] Show the photograph of sender in the message preview

(There's no switch for Gravatar photos specifically.)

When last I tried this with libsoup 2.42, the deadlocks were frequent.  I'm still trying to get around to checking our streams for reference leaks.  Sorry for the delay.
Comment 13 Dan Winship 2013-07-01 17:27:24 UTC
(In reply to comment #12)
>I'm still trying to get around to checking our streams for reference leaks.

Easy way to do this would be to add printfs to soup_body_input_stream_init() and _finalize(). But as mentioned above, the rhbz bug implies the problem is internal to libsoup.
Comment 14 Dan Winship 2013-07-05 18:05:21 UTC
probably a dup of bug 703463, which is now fixed in git. If you could verify that, that would be great
Comment 15 Matthew Barnes 2013-07-08 17:30:11 UTC
After reverting commit 1b2104e in Evolution and building libsoup from git (version 2.43.2, revision 2081c31), I can still reproduce the deadlocked get_connection() calls almost immediately.

Eventually we starve the thread pool, the message pane stops updating and the status bar fills with "Retrieving message" tasks that never finish and don't respond to cancellation.

I added some debug output to our send_finish() method in e-mail-request.c and in all cases the reference count on the returned GMemoryInputStream was 2: one reference being transferred to the caller and another held by our internal GSimpleAsyncResult.  Also verified the returned streams *are* being finalized.

So bug is still present and doesn't appear to be a stream reference leak.

Happy to try any other debugging steps you can think of.
Comment 16 Dan Winship 2013-07-12 21:02:51 UTC
(In reply to comment #15)
> After reverting commit 1b2104e in Evolution and building libsoup from git
> (version 2.43.2, revision 2081c31), I can still reproduce the deadlocked
> get_connection() calls almost immediately.

Just tried this: with "old" libsoup, I can reproduce the hangs, and gdb shows they're hanging in exactly the same way as bug 703463. ("p soup_connection_get_state (priv->connections->data)" in a stack frame with "SoupSessionPrivate *priv" returns SOUP_CONNECTION_REMOTE_DISCONNECTED). With the patched libsoup, I get no hangs. So I'm thinking you must have been in some sort of jhbuild dll hell where it wasn't pulling in the libsoup.so you thought it was?

I'm building rawhide and f19 packages now. I assume you'll get mail from koji when they're done.

marking DUPLICATE, but reopen if it's really still not fixed for you.

*** This bug has been marked as a duplicate of bug 703463 ***
Comment 17 Matthew Barnes 2013-07-12 22:14:00 UTC
I don't use jhbuild, but it's entirely possible I was pulling in the wrong libsoup.so.  I'll retest with that in mind.  Thanks for looking into this.