GNOME Bugzilla – Bug 777161
Deadlock under soup_auth_manager_use_auth()
Last modified: 2018-09-21 16:27:47 UTC
I put my computer to hibernation at night and resume in the morning. I have a pppoe connection which times out after I resume and gets redialed (pppd). In the meantime, it seems libsoup/evolution gets confused and stops being able to read/update calendars till I "pkill evolution-calendar-factory" A backtrace log using gdb reveals the following: [New LWP 23946] [New LWP 23947] [New LWP 23948] [New LWP 23949] [New LWP 23952] [New LWP 23953] [New LWP 23965] [New LWP 23967] [New LWP 23970] [New LWP 24083] [New LWP 24085] [New LWP 24088] [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". 0x00007f3be2526b99 in syscall () from /usr/lib/libc.so.6
+ Trace 237054
Thread 13 (Thread 0x7f3b9effd700 (LWP 24088))
Thread 12 (Thread 0x7f3ba4ff9700 (LWP 24085))
Thread 11 (Thread 0x7f3b9ffff700 (LWP 24083))
Thread 1 (Thread 0x7f3be3c4f740 (LWP 23944))
The backtrace shows that one thread (the main thread) is in the soup_session_abort() call, which several other threads are in soup_auth_manager_use_auth() waiting for some lock to be acquired. The problem is that the auth_got_headers() at soup-auth-manager.c line 626 holds the private structure lock, when the CalDAV signal handler calls the soup_auth_manager_use_auth() at soup-auth-manager.c line 809, which tries to acquire the same lock at the same thread.
Created attachment 343350 [details] [review] proposed patch Either make the lock recursive (what this patch does), or free the lock when the outer callbacks are going to be called.
Hi. I've been using this patch for the last few days and I haven't seen any deadlocks in evolution calendar backend since.
This doesn't seem to be new in libsoup, thus it would be nice to get the proposed patch (if accepted) also to any future stable release. Thanks in advance.
Comment on attachment 343350 [details] [review] proposed patch I don't think a recursive mutex is the right solution. (Recursive mutexes are almost never the right solution.) But I haven't looked enough to know if there are issues with just releasing the lock around the signal emission.
> (Recursive mutexes are almost never the right solution.) Right, I understand, but here the problem is that the signal emission means calling anything outside libsoup control, and users (like me), can do weird things, which look correct though. Thus I chose to use recursive mutex, also because signal emission is synchronous, not asynchronous, thus if anything needs to modify the libsoup object state can do it only in the same thread, which is perfectly fine limitation (not a limitation as such, from my point of view). There are more users hitting this issue, which prevents them to use Google calendars from evolution-data-server. Could you reconsider this, please?
Using a recursive mutex and allowing re-entrant calls only works if the internal SoupAuthManager state is guaranteed to be fully self-consistent at the point when the signal is emitted, and if the function emitting the signal doesn't assume that the state of the auth-manager after the signal emission is the same as the state before the signal emission. But if all of that is true, then we don't need a recursive mutex; you can just release the mutex around the signal emission instead. So, if you can prove that a recursive mutex will work, then you don't need it.
Yes, the recursive mutex works, but look at the backtrace in comment #0. Three threads are inside auth_got_headers() and the main thread is in soup_session_abort(). Thus the two can interleave. I do not know libsoup internals and I see that soup_session_abort() waits on a GCond, not on the mutex, thus this can be irrelevant. One thing I didn't think of before, is it "legal" to call soup_auth_manager_use_auth() inside the SoupSession::authenticate signal? If it's wrong, then the problem is in CalDAV code instead. The SoupSession::authenticate is called with already chosen 'auth', thus it can be eventually wrong to try to change it. Maybe? I do not know how much sense it would be to try to switch authentication method during authentication. It surely makes sense when libsoup begins for example Basic authentication and I realize in "authenticate" handler that the correct authentication method is OAuth (or any other), in which case I want to change it, without a round trip of cancelling current message send and re-issuing it with different authentication object. Again, I can be wrong. Please, correct me, if I am. I also noticed that I call soup_auth_manager_use_auth() with the same 'auth' object as the one passed to "authenticate" callback, thus it is a nonsense in this case. And the three threads are all working with different SoupSession object. I'm more and more convinced that it's me doing things wrong, not libsoup. What do you think, Dan?
I made a change which will not call soup_auth_manager_use_auth() when inside the "authenticate" signal handler for the time being. If it's the right thing to do, then we can move this bug report back to evolution-data-server and close it. Created commit_a40d3cd in eds master (3.25.1+) [1] Created commit_d0b262d in eds gnome-3-24 (3.24.1+) [1] https://git.gnome.org/browse/evolution-data-server/commit/?id=a40d3cd
I haven't had a chance to figure this out completely, but if you aren't allowed to call use_auth() from inside an authenticate callback, then that at least needs to be documented (and there should probably be a return-if-fail). So leave this bug open anyway.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/libsoup/issues/99.