GNOME Bugzilla – Bug 688018
Could not ensure credentials for account <myprincipal>: Timeout was reached
Last modified: 2021-07-05 10:58:42 UTC
ensure_credentials_sync sets up a thread default context for its duration. It also calls get_ticket_sync which then goes and uses the global default context. This causes dbus timeouts for Alban Browaeys. This fix changes get_ticket_sync to always use the thread default context, which fixes ensure_credentials_sync time outs, but retains the proper context for e.g. refresh_account
Created attachment 228613 [details] [review] kerberos: use thread default context, not default context for loop
prahal is still seeing a segfault, so let's fully investigate that before pushing attachment 228613 [details] [review]
The following snippet in on_identity_looked_up_to_ensure_credentials looks suspicious: look_up_identity (self, identifier, cancellable, (GAsyncReadyCallback) on_identity_looked_up, operation_result); g_simple_async_result_complete_in_idle (operation_result); g_object_unref (operation_result); The operation_result being passed as user_data to look_up_identity will be lost once the g_simple_async_result_complete_in_idle is done calling the callback. In which case on_identity_looked_up will crash.
Since on_identity_looked_up can set the op_res_gpointer to NULL, there should be a NULL check before calling goa_identity_service_identity_get_expiration_timestamp
Thanks Debarshi . Your irc comment was right (I extended it to another similar location and now I cannot crash goa anymore). in short you told me to remove the two last lines from on_identity_looked_up .. first I found that I mixed up and did that on on_identity_looked_up_to_ensure_credentials. A few minutes ago I fixed my mistake than it was still crashing in any of my tests (namely : test 1: kdestroy then goa-daemon --replace test 2: goa-daemon --replace then kdestroy) and then looking further decided to apply to both at the same time. All works now. @@ -1716,8 +1716,6 @@ on_identity_looked_up (GoaKerberosProvider *provider, NULL, NULL); - g_simple_async_result_complete_in_idle (operation_result); - g_object_unref (operation_result); } static void @@ -1780,8 +1778,6 @@ on_identity_looked_up_to_ensure_credentials (GoaKerberosProvider *self, on_identity_looked_up, operation_result); - g_simple_async_result_complete_in_idle (operation_result); - g_object_unref (operation_result); } This one top of Ray patch.
cleanup to previous changes (as to still unref): @@ -1836,10 +1834,6 @@ ensure_credentials_sync (GoaProvider *provider, operation_result); g_main_loop_run (loop); - g_main_loop_unref (loop); - - g_main_context_pop_thread_default (context); - g_main_context_unref (context); lookup_error = NULL; if (g_simple_async_result_propagate_error (operation_result, &lookup_error)) @@ -1851,6 +1845,12 @@ ensure_credentials_sync (GoaProvider *provider, } identity = g_simple_async_result_get_op_res_gpointer (operation_result); + g_simple_async_result_complete_in_idle (operation_result); + g_object_unref (operation_result); + g_main_context_pop_thread_default (context); + g_main_loop_unref (loop); + g_main_context_unref (context); + now = g_date_time_new_now_local (); timestamp = goa_identity_service_identity_get_expiration_timestamp (identity); @@ -1866,7 +1866,6 @@ ensure_credentials_sync (GoaProvider *provider, g_date_time_unref (now); g_date_time_unref (expiration_time); - g_object_unref (operation_result); return TRUE; } Debarshi, I would like for this to be your commit (I could make a git patch , though signing this change I only start to grasp I feel uneasy)
Created attachment 228634 [details] [review] kerberos: Don't complete and unref the async operation yet
Created attachment 228638 [details] [review] kerberos: Set an error if look_up_identity fails to find an identity Last night Alban showed me a backtrace where goa_identity_service_identity_get_expiration_timestamp was being called with a identity=NULL. Marking failures from look_up_identity as an error should avoid that code path.
I am still unable to reproduce the crashes that Alban has been seeing. However, looking at the code these two areas looked suspicious to me, so hopefully this helps.
comment 8 patch is partial: ie in comment 5 + comment 6 I move the complete and unred async operation to after: identity = g_simple_async_result_get_op_res_gpointer (operation_result); in "ensure_credentials_sync". Instead of doing complete and unref in "on_identity_looked_up" then identity = g_simple_async_result_get_op_res_gpointer the async operation. Ie I move the complete and unref after the last call to the async operation.
Review of attachment 228634 [details] [review]: totally right.
Review of attachment 228638 [details] [review]: ::: src/goabackend/goakerberosprovider.c @@ +629,3 @@ + error = g_error_new_literal (GOA_ERROR, + GOA_ERROR_NOT_AUTHORIZED, + ""); hmm, not sure about this. the error code doesn't really fit for "not found", there's no error message etc. Also, there's code that checks for identity != NULL in on_identity_looked_up now which will never succeed, so if we do go this route we should scrub that code, or change it to a warning or something, too. I think it's probably better, though, to just check if identity is NULL at the caller. Note, I don't think it normally be NULL in practice, because look_up should always succeed if get_ticket_sync succeeds. If get_ticket_sync fails, then ensure_credentials_sync will fail earlier. It would only be NULL without error if something external happened in the small race window between get_ticket_sync and look_up_identity.
(In reply to comment #5) > @@ -1716,8 +1716,6 @@ on_identity_looked_up (GoaKerberosProvider *provider, > NULL, > NULL); > > - g_simple_async_result_complete_in_idle (operation_result); > - g_object_unref (operation_result); > } This one is important. Removing it means we don't clean up the operation properly.
(In reply to comment #6) > cleanup to previous changes (as to still unref): > @@ -1836,10 +1834,6 @@ ensure_credentials_sync (GoaProvider *provider, > operation_result); > > g_main_loop_run (loop); > - g_main_loop_unref (loop); > - > - g_main_context_pop_thread_default (context); > - g_main_context_unref (context); > > lookup_error = NULL; > if (g_simple_async_result_propagate_error (operation_result, &lookup_error)) > @@ -1851,6 +1845,12 @@ ensure_credentials_sync (GoaProvider *provider, > } > > identity = g_simple_async_result_get_op_res_gpointer (operation_result); > + g_simple_async_result_complete_in_idle (operation_result); Generally callers don't complete operations they initiate, the code they initiate completes the operations. If you have this call here, then you're calling it after the local main loop has stopped running.
(In reply to comment #12) > ::: src/goabackend/goakerberosprovider.c > @@ +629,3 @@ > + error = g_error_new_literal (GOA_ERROR, > + GOA_ERROR_NOT_AUTHORIZED, > + ""); > > hmm, not sure about this. the error code doesn't really fit for "not found", > there's no error message etc. I used GOA_ERROR_NOT_AUTHORIZED because if the error is bubbled up to the caller via goa-daemon then only a NOT_AUTHORIZED error will trigger the "an account needs attention" notification. I left out the error message because I wasn't really sure what it should be and if we are going to use this in gnome-3-6, then the string freeze will be a problem. > Also, there's code that checks for identity != NULL in on_identity_looked_up > now which will never succeed, so if we do go this route we should scrub that > code, or change it to a warning or something, too. I removed it. Didn't I? > I think it's probably better, though, to just check if identity is NULL at the > caller. > > Note, I don't think it normally be NULL in practice, because look_up should > always succeed if get_ticket_sync succeeds. If get_ticket_sync fails, then > ensure_credentials_sync will fail earlier. It would only be NULL without error > if something external happened in the small race window between get_ticket_sync > and look_up_identity. Ok. As you wish, really. I couldn't reproduce Alban's crash so I wrote this, just in case. :-)
Comment on attachment 228613 [details] [review] kerberos: use thread default context, not default context for loop While testing the patches in bug 687962 I encountered timeouts after restarting the session and trying to EnsureCredentials. This patch fixed it for me.
this change was ray patch to apply first . Else no crashes but timeouts.
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/issues/ Thank you for your understanding and your help.