GNOME Bugzilla – Bug 680961
Broken dispatching of EAuthenticationSession
Last modified: 2014-01-21 17:38:51 UTC
Also (bug #680960) after recent Matthew's changes, the "Retrieving "Out of Office" settings" never ends here. To be honest, I do not recall whether it ever worked, but I suppose it did, when Matthew was fixing it after account-mgmt merge. Steps: a) Open EWS account properties b) select "Out of Office" section See the bottom activity never ends, clicking Cancel beside the activity name does nothing too.
Seems to work fine here, although we probably need a way to cancel submitting data to the server if it gets stuck when you click OK.
I see in backtrace that this is waiting on reply from source-registry, but it either missed it, or something like that. I have two such threads now, because I opened EWS account Properties twice. The source-registry is idle, no thread doing anything.
+ Trace 230615
Thread 2 (Thread 0x7f0e756ed700 (LWP 16335))
It seems to recover if I run source registry from a console, if I left it on D-Bus, then the previous behaviour is back again. This is what I see in the source-registry console. AUTH (EWS): Initiated AUTH (x): Initiated AUTH (z): Initiated AUTH (EWS): Complete (SUCCESS) AUTH (x): Complete (SUCCESS) AUTH (y): Initiated AUTH (z): Complete (SUCCESS) AUTH (y): Complete (SUCCESS) AUTH (EWS): Initiated AUTH (EWS): Complete (ERROR - Client cancelled the operation) AUTH (EWS): Initiated AUTH (EWS): Complete (ERROR - Client cancelled the operation) The last two requests are from the time when I open Properties of the offending EWS account. The first is for OOO, the second is for Delegates (I work on now). I guess the error is misleading?
I'm setting this to blocker and moving to eds, because I cannot open any ews book. Steps: a) close evolution and all related process b) $ evolution -c contacts d) when books from enabled ews accounts are shown select one of it e) wait few seconds, might not do much f) select another ews book - still nothing g) get backtrace of running addressbook-factory see two threads stuck in e_source_registry_authenticate_sync() call h) get backtrace of source-registry see all threads are idle
Created attachment 221254 [details] eds debug patch for evolution-data-server; This adds tons of prints around authentication which helps to identify the issue.
After quite some investigation I found out (with the above patch) that the main problem is with authentication_session_execute(), when it is executed. Usual password prompts are executed on the main thread, with the default main context, which is all fine. This changes for ews, which on start calls e_backend_authenticate_sync() on a dedicated thread which has set its own thread default main context - ews is updating list of sources by checking for available folders. There happens this: - source registry starts - EWS starts to fetch its folders, with authentication still running - in evolution select EWS book - book factory starts auth request, and waits - in source registry the book's auth request is received and it's piled into the group of waiting, because this one is still pending - ews finished folder sync, and thus also closes the below thread, but the authentication_session_execute_thread() is finished on IDLE in the main context which authentication_session_execute() had as the thread's default, which is just gone now, thus the callback is never called Or something along that causes the issue, basically the authentication_session_execute() should be always run in the main thread, or in a thread which doesn't get rid of the main context before it is able to receive pending requests. Breakpoint 1, authentication_session_execute (session=0x7fffb4004360 [EAuthenticationSession], io_priority=0, cancellable=0x0, callback=0x7ffff7dd6afb <source_registry_server_auth_session_cb>, user_data= 0xc03100) at e-authentication-session.c:656 656 g_return_if_fail (E_IS_AUTHENTICATION_SESSION (session)); (gdb) bt
+ Trace 230679
Continuing. [New Thread 0x7fffba7fc700 (LWP 30285)] authentication_session_execute: for uid '1342640443.11026.89@zyxPad' mc:0x7fffac002800 default:0
Still not able to reproduce the problem for some reason, but I think I at least understand it in theory. I changed EAuthenticationMediator to export its D-Bus interface from an isolated thread so its signal emissions can't be inhibited by something calling g_main_context_push_thread_default(). Basically the same hack I had to add to ESourceRegistry. Don't know if it fixes this bug, but authentication still works for me. Can you retest with this commit? http://git.gnome.org/browse/evolution-data-server/commit/?id=e13cb4e0ba820694f908fe39255ff8f7a6239038
Created attachment 221345 [details] simpler eds debug patch Unfortunately not, it's still broken. This is easier debug patch, please apply it and, with at least one EWS account configured and enabled, run the source registry on console like this: $ ./evolution-source-registry >/dev/null you'll then see error messages only. This issue will be fine once there will be no line like this in the console output of the source registry: authentication_session_execute: '1344263232.10334.4@zyxPad' on default context:0 aka each "on default context" should be 1. It is 1 for remote requests (like if I run evolution and it asks for passwords for mail accounts), but 0 for local requests. It worked for remote requests even before your recent change.
I'm afraid your latest change caused this crash (it calls g_thread_join() in a thread which it tries to join).
+ Trace 230682
Thread 2 (Thread 0x7fad797fa700 (LWP 24595))
(In reply to comment #9) > I'm afraid your latest change caused this crash (it calls g_thread_join() in a > thread which it tries to join). Okay, I'm going to revert the change then since it doesn't solve the problem and it was stable for me before.
(In reply to comment #4) > I'm setting this to blocker and moving to eds, because I cannot open any ews > book. Steps: > a) close evolution and all related process > b) $ evolution -c contacts > d) when books from enabled ews accounts are shown select one of it > e) wait few seconds, might not do much > f) select another ews book - still nothing > g) get backtrace of running addressbook-factory > see two threads stuck in e_source_registry_authenticate_sync() call > h) get backtrace of source-registry > see all threads are idle Following these steps using that latest git revision still works fine for me. Using our "gss-test1" account, I get contacts shown for the "Contacts" and "shared-book-mcrha" address books when I click on the in the sidebar. Seems like maybe we're looking in the wrong place here.
FWIW, on my build that works: $ pkg-config --modversion glib-2.0 => 2.32.3 $ pkg-config --modversion gtk+-3.0 => 3.4.2 $ pkg-config --modversion libsoup-2.4 => 2.38.1
I use slightly newer glib, 2.32.4, and gtk3, 3.4.3. Libsoup is the same. Try to login to my account (mcrha, same password), I have configured more sources, though from my point of view, if you apply debug patch from comment #8 and test it, then you'll also see the "default context:0" on the console.
Looking at your debug patch, it's making wrong assumptions. The authentication session does not need to run on the global default main context. The Authenticator D-Bus interface will emit signals from whatever was the thread-default main context on the current thread at the time the interface was exported. The problem I thought I was correcting with the now-reverted patch was that the thread-default main context may be different when the D-Bus interface is first exported versus when the try_password() method is invoked. try_password() relies on receiving a D-Bus signal -- either Accepted or Rejected or Cancel -- in order to complete, but if the D-Bus signals are being suppressed by a different thread-default main context then try_password() eventually just time out after several minutes. That said, I set up an EWS account with your user name and I think I might be seeing the problem now. But the gss-test1 account still works fine. Odd. I'll dig further.
(In reply to comment #14) > That said, I set up an EWS account with your user name and I think I might be > seeing the problem now. I take it back. I think it was just taking awhile to download contacts but all your address books show content for me now. I even stopped everything, cleared out the caches, and retested per comment #6 and contacts show up fine after a short syncing delay. So I still can't reproduce whatever it is you're seeing.
I didn't download changes in the folder, it was already done. If you look into backtrace of comment #6, the authentication_session_execute() is called inside ews_backend_sync_folders_thread(). The this is that the authentication_session_execute() creates a new GSimpleAsyncResult, which remembers default_thread_context, which is currently tight to something from the backtrace in comment #6, and this thing is removed before the simple async result is delivered on "idle". That's my current theory about the issue. To be able to successfully reproduce this, you might want follow steps from comment #4 and make sure your request for authentication from step d), aka when selecting ews book, will be piled into waiting, aka the ews_backend_sync_folders_thread() will not be finished yet when you'll select the book in UI. It's quite easy here. By the way, do you see, with the debug patch, that the default context is 0?
Created attachment 222205 [details] [review] proposed eds patch for evolution-data-server; This change makes it work properly for me. I verified with my debug patch that this does what it should. It's enough to "pretend" authentication_session_execute() is run with the default main context as a thread default context and I can reproduce the issue again.
Okay, so it looks like the execute() method needs to run in a particular main context, but we can't assume it's always the global default main context. I'm sure that's a coincidence. Probably we need to break API and have e_authentication_session_new() explicitly take a GMainContext argument from whatever currently HAPPENS to be using the global default main context, and then have e_authentication_session_execute() schedule an idle source on the GMainContext we were given. The idle source will then execute the authentication session, similar to the helper function in your patch.
(In reply to comment #18) > Probably we need to break API and have e_authentication_session_new() > explicitly take a GMainContext argument from whatever currently HAPPENS to be > using the global default main context, and then have > e_authentication_session_execute() schedule an idle source on the GMainContext > we were given. The idle source will then execute the authentication session, > similar to the helper function in your patch. Actually, EAuthenticationSession is the wrong place to do this. e_authentication_mediator_new() should take the GMainContext argument and execute its try_password() method from an idle callback on that GMainContext. That would get at the problem I was trying to solve with my first patch, and makes more sense to me as to why its needed.
Are you sure? Check the backtrace in comment #6, there is none try_password() in it.
Created attachment 222223 [details] yet-another eds debug patch for evolution-data-server; So, the evil is e_async_closure_new(), called from e_source_registry_server_authenticate_sync(), which sets its own default thread context, which this debug patch proves. Before it (and after freeing the EAsyncClosure) the default thread context is the same as the default main context.
Here's the relevant output: collection_backend_authenticate_sync: '1344263232.10334.4@zyxPad' on default context:1, thread_default:(nil) e_source_registry_server_authenticate_sync: before closure_new '1344263232.10334.4@zyxPad' on default context:1, thread_default:(nil) e_source_registry_server_authenticate_sync: after closure_new '1344263232.10334.4@zyxPad' on default context:0, thread_default:0x7faf10005ce0 authentication_session_execute: '1344263232.10334.4@zyxPad' on default context:0, thread_default:0x7faf10005ce0 ... e_source_registry_server_authenticate_sync: before finish '1344263232.10334.4@zyxPad' on default context:0, thread_default:0x7faf10005ce0 e_source_registry_server_authenticate_sync: after closure_free '1344263232.10334.4@zyxPad' on default context:1, thread_default:(nil)
Can you break on e_source_registry_server_authenticate_sync() and then post a full backtrace with all threads?
Here you are. This is for the first call: Breakpoint 1, e_source_registry_server_authenticate_sync (server=0x762010 [ESourceRegistryServer], session=0x7fffd401a800 [EAuthenticationSession], cancellable=0x0, error=0x7fffdc595bb8) at e-source-registry-server.c:2038 2038 g_return_val_if_fail (E_IS_SOURCE_REGISTRY_SERVER (server), FALSE); (gdb) t a a bt
+ Trace 230729
Thread 1 (Thread 0x7ffff72bc980 (LWP 19807))
This is for the last. I have about 4 calls of it: Breakpoint 1, e_source_registry_server_authenticate_sync (server=0x762010 [ESourceRegistryServer], session=0x7fffac001c00 [EAuthenticationSession], cancellable=0x0, error=0x7fffbaffcbb8) at e-source-registry-server.c:2038 2038 g_return_val_if_fail (E_IS_SOURCE_REGISTRY_SERVER (server), FALSE); (gdb) t a a bt
+ Trace 230730
I recommitted my original solution from comment #7 with improved thread-safety, just to eliminate EAuthenticationMediator from the equation. Running a GDBusInterface in an isolated thread where its signals can't be inhibited by other code is still the best approach, imo. http://git.gnome.org/browse/evolution-data-server/commit/?id=43931f398f0289fac528489a01c8404c231b9f61 I still want to get to the bottom of WHY e_source_registry_server_authenticate_sync() works in one main context but not another. There's a deeper issue here, and I want to understand it and not just hack around it.
No, it doesn't work, furthermore, it's worse then before. The first start I didn't get any other than GAL from my ews account, the evolution was stuck in waiting for authentication on mailer part (why is that running when I run in contacts i really do not understand) with source registry being idle. The second start I got list of other books from ews account, but they didn't open again. The book factory was waiting for a response, the source registry in this state:
+ Trace 230733
Thread 1 (Thread 0x7f8b642ae980 (LWP 9212))
Created attachment 222301 [details] [review] yet-another version of the eds patch for evolution-data-server; nailed down to source_registry_server_maybe_start_auth_session().
Fixed in: http://git.gnome.org/browse/evolution-data-server/commit/?id=4afcf9c43a1141230d270974d6d20d0bf9396414 Similar to your patch, but uses an explicit main context instead of assuming the default.
It seems there's a regression on this, see downstream bug from F18 Beta, which has 3.6.2-2 of evolution and 3.6.2-1 of evolution-data-server: https://bugzilla.redhat.com/show_bug.cgi?id=880825
*** Bug 694739 has been marked as a duplicate of this bug. ***
I'm reopening this due to more people facing this issue in 3.6.x.
Created attachment 237554 [details] [review] proposed ed spatch for evolution-data-server; This reverts change from comment #29 and adds change from comment #28. Using g_main_context_ref_thread_default() this way leads to similar issues, aka is plain wrong. Prove above.
It seems to be better with Milan’s patch. I can do now search and save messages for offline (both feats I had not been able to achieve previously).
Although however, when I was trying to Quit Evolution now I still got it frozen (of course, I have no clue whether this is the same issue as before): (gdb) t a a bt
+ Trace 231612
Thread 1 (Thread 0x7fc417959a00 (LWP 21615))
*** Bug 708697 has been marked as a duplicate of this bug. ***
*** Bug 709013 has been marked as a duplicate of this bug. ***
I am able to reproduce this when I have configured a GOA EWS account and I change password on the server, thus the one stored in keyring is outdated. Then evolution can get into this state of "waiting for nothing in e_source_registry_authenticate_sync".
Created attachment 261486 [details] [review] eds patch for evolution-data-server; There was basically no error reporting from the server to the client when any such occurred during authentication on the server side, thus the client was left waiting for a response which never happened. Even the 'dismiss' signal has a comment that it can be used for such cases, it's not the right thing to do, because it hides the errors from a user, thus he/she doesn't know that something went wrong. For that a new ServerError signal was added and the error is properly propagated to the client. Note the GOA errors don't strip the GDBus error from the message, which looks odd in the UI, thus I added a workaround for that too.
Created commit eaf23db in eds master (3.11.3+) Created commit 25ddf92 in eds gnome-3-10 (3.10.3+)
*** Bug 709608 has been marked as a duplicate of this bug. ***
*** Bug 721644 has been marked as a duplicate of this bug. ***