GNOME Bugzilla – Bug 658013
login hangs when a11y is turned on in the greeter session
Last modified: 2011-09-25 18:16:20 UTC
This was debugged to a hanging dbus call in the registryd, or somesuch. Ray knows.
I have not been able to reproduce this while playing with the osk on the login screen.
This seems to have been gremlins, cosmic rays, or bad luck or something. i only hit the issue the one time and haven't reproduced since after considerable effort.
To give more details about what I was seeing though... I applied the patch from attachment 195094 [details] [review] (bug 612662) and then logged in with the on screen keyboard. I clicked the virtual "enter" key not the Sign In button after typing the password. At that point things just hung and they stayed hung for the 10-15 minutes i was poking around. I attached to gnome-shell with gdb and it was stuck in a nested main loop in atk-bridge. I believe it was the send_and_allow_reentry function. That function sends off a dbus message over the a11y bus and waits for the reply in a nested main loop. The reply function quits the main loop and things unstick from there. In this particular case it was trying to let at-spi-registryd know about the key release event from me hitting the enter key in the on screen keyboard. It never unstuck though. It was waiting patiently in the main loop blocking the shell. This suggests the reply callback either never got called, or when it got called the loop didn't quit. For reference the reply callback is: static void set_reply (DBusPendingCall * pending, void *user_data) { SpiReentrantCallClosure* closure = (SpiReentrantCallClosure *) user_data; closure->reply = dbus_pending_call_steal_reply (pending); dbus_pending_call_unref (pending); switch_main_context (NULL); g_main_loop_quit (closure->loop); } Interestingly enough, the a11y dbus daemon and at-spi-registryd were both already dead by the time I started gdb. Examining the running state of the pending call, I saw it had a 25 second timeout set on it, and also the bus connection had recorded the dbud-daemon's death already. As I said though, it sat there for 10-15 minutes and never timed out. I got distracted by something else and put this down and haven't been able to reproduce it since, so I don't have anything else to add.
I've been able to reproduce this artificially. The DBusPendingCall is never notified that the connection went away. I suspect this is a libdbus bug; I need to investigate more / write a test case / file a bug against dbus. The dbus release schedule is not the same as ours, so, regardless, I should try to do something in at-spi2-atk to prevent it from happening. So I've added a g_timeout to check for disconnection and quit the main loop if the connection has gone away.
Matthias saw this again, reopening.
matthias managed to get a trace today: Thread 6 (Thread 0x7f440a976700 (LWP 3635)): Breakpoint 1 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87. Thread 5 (Thread 0x7f440a175700 (LWP 3636)): Note: breakpoint 1 also set at pc 0x33e62e7343. Breakpoint 2 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87. Thread 4 (Thread 0x7f440976d700 (LWP 3637)): Note: breakpoints 1 and 2 also set at pc 0x33e62e7343. Breakpoint 3 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87. Thread 3 (Thread 0x7f4408f5c700 (LWP 3638)): Note: breakpoints 1, 2 and 3 also set at pc 0x33e62e7343. Breakpoint 4 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87. Thread 2 (Thread 0x7f43dd1ec700 (LWP 3667)): Note: breakpoints 1, 2, 3 and 4 also set at pc 0x33e62e7343. Breakpoint 5 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87. Thread 1 (Thread 0x7f44132f19c0 (LWP 3634)): Note: breakpoints 1, 2, 3, 4 and 5 also set at pc 0x33e62e7343. Breakpoint 6 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87. Thread 6 (Thread 0x7f440a976700 (LWP 3635)): #0 0x00000033e62e7343 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x000000360a045048 in g_main_context_poll (n_fds=1, fds=0x7f4404001170, priority=<optimized out>, timeout=-1, context=0x2e669e0) at gmain.c:3402 #2 g_main_context_iterate (context=0x2e669e0, block=<optimized out>, dispatch=1, self=<optimized out>) at gmain.c:3084 #3 0x000000360a045885 in g_main_loop_run (loop=0x7f4404001150) at gmain.c:3297 #4 0x00007f440a97b9fb in dconf_context_thread (data=0x2e669e0) at dconfcontext.c:11 #5 0x000000360a06a466 in g_thread_create_proxy (data=0x2e657b0) at gthread.c:1962 #6 0x00000033e6a07d31 in start_thread (arg=0x7f440a976700) at pthread_create.c:305 #7 0x00000033e62efdfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 5 (Thread 0x7f440a175700 (LWP 3636)): #0 0x00000033e62e7343 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x000000360a045048 in g_main_context_poll (n_fds=3, fds=0x7f43fc006010, priority=<optimized out>, timeout=-1, context=0x7f440400ce00) at gmain.c:3402 #2 g_main_context_iterate (context=0x7f440400ce00, block=<optimized out>, dispatch=1, self=<optimized out>) at gmain.c:3084 #3 0x000000360a045885 in g_main_loop_run (loop=0x7f440400cdb0) at gmain.c:3297 #4 0x000000360b0c7996 in gdbus_shared_thread_func (user_data=0x7f440400cdd0) at gdbusprivate.c:276 #5 0x000000360a06a466 in g_thread_create_proxy (data=0x7f440400cef0) at gthread.c:1962 #6 0x00000033e6a07d31 in start_thread (arg=0x7f440a175700) at pthread_create.c:305 #7 0x00000033e62efdfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 4 (Thread 0x7f440976d700 (LWP 3637)): #0 0x00000033e62e7343 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00000033ffc2c08f in poll_func (ufds=0x7f4400001460, nfds=2, timeout=-1, userdata=0x2e9bb60) at pulse/thread-mainloop.c:75 #2 0x00000033ffc1df06 in pa_mainloop_poll (m=0x2e9ef00) at pulse/mainloop.c:879 #3 0x00000033ffc1e539 in pa_mainloop_iterate (m=0x2e9ef00, block=<optimized out>, retval=0x0) at pulse/mainloop.c:961 #4 0x00000033ffc1e5f0 in pa_mainloop_run (m=0x2e9ef00, retval=0x0) at pulse/mainloop.c:979 #5 0x00000033ffc2c03f in thread (userdata=0x2e9a690) at pulse/thread-mainloop.c:94 #6 0x0000003400038be8 in internal_thread_func (userdata=0x2e9b510) at pulsecore/thread-posix.c:83 #7 0x00000033e6a07d31 in start_thread (arg=0x7f440976d700) at pthread_create.c:305 #8 0x00000033e62efdfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 3 (Thread 0x7f4408f5c700 (LWP 3638)): #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165 #1 0x00000033f2a23020 in PR_WaitCondVar (cvar=0x2f0c150, timeout=4294967295) at ../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:417 #2 0x000000340a0b5ab7 in js::GCHelperThread::threadLoop (this=0x2ea5678, rt=0x2ea5400) at jsgc.cpp:2084 #3 0x00000033f2a283b3 in _pt_root (arg=0x2f0c1f0) at ../../../mozilla/nsprpub/pr/src/pthreads/ptthread.c:187 #4 0x00000033e6a07d31 in start_thread (arg=0x7f4408f5c700) at pthread_create.c:305 #5 0x00000033e62efdfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 2 (Thread 0x7f43dd1ec700 (LWP 3667)): #0 0x00000033e62e7343 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x00000033ffc2c08f in poll_func (ufds=0x7f43e4006f60, nfds=2, timeout=-1, userdata=0x34b2ac0) at pulse/thread-mainloop.c:75 #2 0x00000033ffc1df06 in pa_mainloop_poll (m=0x2f04000) at pulse/mainloop.c:879 #3 0x00000033ffc1e539 in pa_mainloop_iterate (m=0x2f04000, block=<optimized out>, retval=0x0) at pulse/mainloop.c:961 #4 0x00000033ffc1e5f0 in pa_mainloop_run (m=0x2f04000, retval=0x0) at pulse/mainloop.c:979 #5 0x00000033ffc2c03f in thread (userdata=0x1bec8a0) at pulse/thread-mainloop.c:94 #6 0x0000003400038be8 in internal_thread_func (userdata=0x48667d0) at pulsecore/thread-posix.c:83 #7 0x00000033e6a07d31 in start_thread (arg=0x7f43dd1ec700) at pthread_create.c:305 #8 0x00000033e62efdfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Thread 1 (Thread 0x7f44132f19c0 (LWP 3634)): #0 0x00000033e62e7343 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87 #1 0x000000360a045048 in g_main_context_poll (n_fds=2, fds=0x3309d70, priority=<optimized out>, timeout=-1, context=0x2e696c0) at gmain.c:3402 #2 g_main_context_iterate (context=0x2e696c0, block=<optimized out>, dispatch=1, self=<optimized out>) at gmain.c:3084 #3 0x000000360a045885 in g_main_loop_run (loop=0x49d7e30) at gmain.c:3297 #4 0x00007f440b9cfddc in send_and_allow_reentry (message=0x1c1b8d0, bus=0x2e6ab60) at ../../atk-adaptor/event.c:118 #5 Accessibility_DeviceEventController_NotifyListenersSync (key_event=0x7fff4fdc34b0) at ../../atk-adaptor/event.c:152 #6 spi_atk_bridge_key_listener (event=<optimized out>, data=<optimized out>) at ../../atk-adaptor/event.c:221 #7 0x00000036134b47d0 in notify_hf (key=<optimized out>, value=<optimized out>, data=<optimized out>) at cally/cally-util.c:462 #8 0x000000360a032ac0 in g_hash_table_foreach_remove_or_steal (hash_table=0x36f0000, func=0x36134b47c0 <notify_hf>, user_data=0x348d040, notify=0) at ghash.c:1309 #9 0x00000036134b4dd4 in cally_key_snooper (event=0x373f2a0, actor=<optimized out>, user_data=<optimized out>) at cally/cally-util.c:493 #10 cally_key_snooper (actor=<optimized out>, event=0x373f2a0, user_data=<optimized out>) at cally/cally-util.c:473 #11 0x00000036134b6418 in _clutter_marshal_BOOLEAN__BOXED (closure=0x2e871e0, return_value=0x7fff4fdc37b0, n_param_values=<optimized out>, param_values=0x3714860, invocation_hint=<optimized out>, marshal_data=<optimized out>) at clutter-marshal.c:85 #12 0x000000360a80eb54 in g_closure_invoke (closure=0x2e871e0, return_value=0x7fff4fdc37b0, n_param_values=2, param_values=0x3714860, invocation_hint=<optimized out>) at gclosure.c:774 #13 0x000000360a821e0b in signal_emit_unlocked_R (node=<optimized out>, detail=0, instance=0x1bc3bb0, emission_return=0x7fff4fdc38f0, instance_and_params=0x3714860) at gsignal.c:3272 #14 0x000000360a82b2e6 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=0x7fff4fdc3958) at gsignal.c:3013 #15 0x000000360a82b622 in g_signal_emit (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>) at gsignal.c:3060 #16 0x0000003613438598 in clutter_actor_event (actor=0x1bc3bb0 [ClutterStage], event=0x373f2a0, capture=1) at ./clutter-actor.c:8655 #17 0x0000003613478ca3 in emit_event (is_key_event=1, event=0x373f2a0) at ./clutter-main.c:1961 #18 emit_keyboard_event (event=0x373f2a0) at ./clutter-main.c:2016 #19 _clutter_process_event_details (event=0x373f2a0, context=0x1bc5ec0, stage=<optimized out>) at ./clutter-main.c:2106 #20 _clutter_process_event (event=0x373f2a0) at ./clutter-main.c:2329 #21 0x0000003613491108 in _clutter_stage_process_queued_events (stage=0x1bc3bb0 [ClutterStage]) at ./clutter-stage.c:898 #22 0x000000361347ab20 in clutter_clock_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ./clutter-master-clock.c:360 #23 0x000000360a044b3d in g_main_dispatch (context=0x1bce490) at gmain.c:2441 #24 g_main_context_dispatch (context=0x1bce490) at gmain.c:3011 #25 0x000000360a045338 in g_main_context_iterate (context=0x1bce490, block=<optimized out>, dispatch=1, self=<optimized out>) at gmain.c:3089 #26 0x000000360a045885 in g_main_loop_run (loop=0x1bcd490) at gmain.c:3297 #27 0x00000036114569e1 in meta_run () at core/main.c:555 #28 0x00000000004029e1 in main (argc=1, argv=0x7fff4fdc3e68) at main.c:571 A debugging session is active. Inferior 1 [process 3634] will be detached. Quit anyway? (y or n) Detaching from program: /usr/bin/gnome-shell, process 3634
The timeout source added recently to work around this bug won't work because it's added to the default context instead of the main context associated with the nested main loop.
Created attachment 197377 [details] [review] Prevent gnome-shell getting stuck on the login screen The workaround that was committed for this didn't have the intended effect, since the timeout was added to the default main context, not the one that is used in the recursive mainloop. Without this patch, my login screen would freeze when hitting Enter in the password entry, with it, I could successfully log in 10 out of 10 times.
Review of attachment 197377 [details] [review]: ::: atk-adaptor/event.c @@ +122,3 @@ g_main_loop_run (closure.loop); if (closure.timeout != -1) g_source_remove (closure.timeout); Note we can't call g_source_remove() anymore, we have to use g_source_destroy()
I'm fine with the patch; Colin's observation is technically unrelated to it but we might as well make the change.
Colin's point was that g_source_remove only works on the default context. g_source_attach returns a "tag" integer for convienence, but g_source_remove assumes the tag passed to it is associated with a source on the default context.
Created attachment 197412 [details] [review] Prevent gnome-shell getting stuck on the login screen The workaround that was committed for this didn't have the intended effect, since the timeout was added to the default main context, not the one that is used in the recursive mainloop. Without this patch, my login screen would freeze when hitting Enter in the password entry, with it, I could successfully log in 10 out of 10 times.
Created attachment 197413 [details] [review] Prevent gnome-shell getting stuck on the login screen The workaround that was committed for this didn't have the intended effect, since the timeout was added to the default main context, not the one that is used in the recursive mainloop. Without this patch, my login screen would freeze when hitting Enter in the password entry, with it, I could successfully log in 10 out of 10 times.
Updated patch has the g_source_destroy fix. This has gotten release team approval for freeze break.
Comment on attachment 197413 [details] [review] Prevent gnome-shell getting stuck on the login screen Looks good to me. Thanks for the patch! >From 51b5abcea9ee948d11aeb5d6062186f3d4f5ee59 Mon Sep 17 00:00:00 2001 >From: Matthias Clasen <mclasen@redhat.com> >Date: Fri, 23 Sep 2011 17:31:53 -0400 >Subject: [PATCH] Prevent gnome-shell getting stuck on the login screen > >The workaround that was committed for this didn't have the >intended effect, since the timeout was added to the default >main context, not the one that is used in the recursive >mainloop. > >Without this patch, my login screen would freeze when hitting >Enter in the password entry, with it, I could successfully log >in 10 out of 10 times. > >https://bugzilla.gnome.org/show_bug.cgi?id=658013 >--- > atk-adaptor/event.c | 8 ++++++-- > 1 files changed, 6 insertions(+), 2 deletions(-) > >diff --git a/atk-adaptor/event.c b/atk-adaptor/event.c >index 79c2884..586d196 100644 >--- a/atk-adaptor/event.c >+++ b/atk-adaptor/event.c >@@ -100,6 +100,7 @@ send_and_allow_reentry (DBusConnection * bus, DBusMessage * message) > DBusPendingCall *pending; > SpiReentrantCallClosure closure; > GMainContext *main_context; >+ GSource *source; > > main_context = (g_getenv ("AT_SPI_CLIENT") ? NULL : > spi_global_app_data->main_context); >@@ -114,10 +115,13 @@ send_and_allow_reentry (DBusConnection * bus, DBusMessage * message) > return NULL; > } > dbus_pending_call_set_notify (pending, set_reply, (void *) &closure, NULL); >- closure.timeout = g_timeout_add (500, timeout_reply, &closure); >+ source = g_timeout_source_new (500); >+ g_source_set_callback (source, timeout_reply, &closure, NULL); >+ closure.timeout = g_source_attach (source, main_context); >+ g_source_unref (source); > g_main_loop_run (closure.loop); > if (closure.timeout != -1) >- g_source_remove (closure.timeout); >+ g_source_destroy (source); > > g_main_loop_unref (closure.loop); > return closure.reply; >-- >1.7.6.2
The following fix has been pushed: 51b5abc Prevent gnome-shell getting stuck on the login screen
Created attachment 197435 [details] [review] Prevent gnome-shell getting stuck on the login screen The workaround that was committed for this didn't have the intended effect, since the timeout was added to the default main context, not the one that is used in the recursive mainloop. Without this patch, my login screen would freeze when hitting Enter in the password entry, with it, I could successfully log in 10 out of 10 times.