After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 658013 - login hangs when a11y is turned on in the greeter session
login hangs when a11y is turned on in the greeter session
Status: RESOLVED FIXED
Product: at-spi
Classification: Platform
Component: at-spi2-atk
unspecified
Other Linux
: Normal normal
: ---
Assigned To: Li Yuan
Depends on:
Blocks: 652646
 
 
Reported: 2011-09-01 21:08 UTC by Matthias Clasen
Modified: 2011-09-25 18:16 UTC
See Also:
GNOME target: 3.2
GNOME version: ---


Attachments
Prevent gnome-shell getting stuck on the login screen (1.69 KB, patch)
2011-09-23 21:34 UTC, Matthias Clasen
reviewed Details | Review
Prevent gnome-shell getting stuck on the login screen (1.69 KB, patch)
2011-09-24 19:41 UTC, Matthias Clasen
none Details | Review
Prevent gnome-shell getting stuck on the login screen (1.79 KB, patch)
2011-09-24 19:43 UTC, Matthias Clasen
accepted-commit_now Details | Review
Prevent gnome-shell getting stuck on the login screen (1.79 KB, patch)
2011-09-25 18:16 UTC, Matthias Clasen
committed Details | Review

Description Matthias Clasen 2011-09-01 21:08:51 UTC
This was debugged to a hanging dbus call in the registryd, or somesuch.
Ray knows.
Comment 1 Matthias Clasen 2011-09-08 18:33:59 UTC
I have not been able to reproduce this while playing with the osk on the login screen.
Comment 2 Ray Strode [halfline] 2011-09-08 19:49:15 UTC
This seems to have been gremlins, cosmic rays, or bad luck or something.

i only hit the issue the one time and haven't reproduced since after considerable effort.
Comment 3 Ray Strode [halfline] 2011-09-08 20:02:33 UTC
To give more details about what I was seeing though...

I applied the patch from attachment 195094 [details] [review] (bug 612662) and then logged in with the on screen keyboard.  I clicked the virtual "enter" key not the Sign In button after typing the password. 

At that point things just hung and they stayed hung for the 10-15 minutes i was poking around.  I attached to gnome-shell with gdb and it was stuck in a nested main loop in atk-bridge.

I believe it was the send_and_allow_reentry function.   That function sends off a dbus message over the a11y bus and waits for the reply in a nested main loop.  The reply function quits the main loop and things unstick from there.  In this particular case it was trying to let at-spi-registryd know about the key release event from me hitting the enter key in the on screen keyboard.  It never unstuck though.  It was waiting patiently in the main loop blocking the shell.  This suggests the reply callback either never got called, or when it got called the loop didn't quit.

For reference the reply callback is:

static void
set_reply (DBusPendingCall * pending, void *user_data)
{
  SpiReentrantCallClosure* closure = (SpiReentrantCallClosure *) user_data;

  closure->reply = dbus_pending_call_steal_reply (pending);
  dbus_pending_call_unref (pending);
  switch_main_context (NULL);
  g_main_loop_quit (closure->loop);
}

Interestingly enough, the a11y dbus daemon and at-spi-registryd were both already dead by the time I started gdb.  Examining the running state of the pending call, I saw it had a 25 second timeout set on it, and also the bus connection had recorded the dbud-daemon's death already.  As I said though, it sat there for 10-15 minutes and never timed out.

I got distracted by something else and put this down and haven't been able to reproduce it since, so I don't have anything else to add.
Comment 4 Mike Gorse 2011-09-19 21:19:22 UTC
I've been able to reproduce this artificially.  The DBusPendingCall is never notified that the connection went away.  I suspect this is a libdbus bug; I need to investigate more / write a test case / file a bug against dbus.  The dbus release schedule is not the same as ours, so, regardless, I should try to do something in at-spi2-atk to prevent it from happening.  So I've added a g_timeout to check for disconnection and quit the main loop if the connection has gone away.
Comment 5 Ray Strode [halfline] 2011-09-22 17:24:28 UTC
Matthias saw this again, reopening.
Comment 6 Ray Strode [halfline] 2011-09-23 21:11:47 UTC
matthias managed to get a trace today:


    Thread 6 (Thread 0x7f440a976700 (LWP 3635)):
    Breakpoint 1 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87.
     
    Thread 5 (Thread 0x7f440a175700 (LWP 3636)):
    Note: breakpoint 1 also set at pc 0x33e62e7343.
    Breakpoint 2 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87.
     
    Thread 4 (Thread 0x7f440976d700 (LWP 3637)):
    Note: breakpoints 1 and 2 also set at pc 0x33e62e7343.
    Breakpoint 3 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87.
     
    Thread 3 (Thread 0x7f4408f5c700 (LWP 3638)):
    Note: breakpoints 1, 2 and 3 also set at pc 0x33e62e7343.
    Breakpoint 4 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87.
     
    Thread 2 (Thread 0x7f43dd1ec700 (LWP 3667)):
    Note: breakpoints 1, 2, 3 and 4 also set at pc 0x33e62e7343.
    Breakpoint 5 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87.
     
    Thread 1 (Thread 0x7f44132f19c0 (LWP 3634)):
    Note: breakpoints 1, 2, 3, 4 and 5 also set at pc 0x33e62e7343.
    Breakpoint 6 at 0x33e62e7343: file ../sysdeps/unix/sysv/linux/poll.c, line 87.
     
    Thread 6 (Thread 0x7f440a976700 (LWP 3635)):
    #0 0x00000033e62e7343 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
    #1 0x000000360a045048 in g_main_context_poll (n_fds=1, fds=0x7f4404001170, priority=<optimized out>, timeout=-1, context=0x2e669e0) at gmain.c:3402
    #2 g_main_context_iterate (context=0x2e669e0, block=<optimized out>, dispatch=1, self=<optimized out>) at gmain.c:3084
    #3 0x000000360a045885 in g_main_loop_run (loop=0x7f4404001150) at gmain.c:3297
    #4 0x00007f440a97b9fb in dconf_context_thread (data=0x2e669e0) at dconfcontext.c:11
    #5 0x000000360a06a466 in g_thread_create_proxy (data=0x2e657b0) at gthread.c:1962
    #6 0x00000033e6a07d31 in start_thread (arg=0x7f440a976700) at pthread_create.c:305
    #7 0x00000033e62efdfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
     
    Thread 5 (Thread 0x7f440a175700 (LWP 3636)):
    #0 0x00000033e62e7343 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
    #1 0x000000360a045048 in g_main_context_poll (n_fds=3, fds=0x7f43fc006010, priority=<optimized out>, timeout=-1, context=0x7f440400ce00) at gmain.c:3402
    #2 g_main_context_iterate (context=0x7f440400ce00, block=<optimized out>, dispatch=1, self=<optimized out>) at gmain.c:3084
    #3 0x000000360a045885 in g_main_loop_run (loop=0x7f440400cdb0) at gmain.c:3297
    #4 0x000000360b0c7996 in gdbus_shared_thread_func (user_data=0x7f440400cdd0) at gdbusprivate.c:276
    #5 0x000000360a06a466 in g_thread_create_proxy (data=0x7f440400cef0) at gthread.c:1962
    #6 0x00000033e6a07d31 in start_thread (arg=0x7f440a175700) at pthread_create.c:305
    #7 0x00000033e62efdfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
     
    Thread 4 (Thread 0x7f440976d700 (LWP 3637)):
    #0 0x00000033e62e7343 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
    #1 0x00000033ffc2c08f in poll_func (ufds=0x7f4400001460, nfds=2, timeout=-1, userdata=0x2e9bb60) at pulse/thread-mainloop.c:75
    #2 0x00000033ffc1df06 in pa_mainloop_poll (m=0x2e9ef00) at pulse/mainloop.c:879
    #3 0x00000033ffc1e539 in pa_mainloop_iterate (m=0x2e9ef00, block=<optimized out>, retval=0x0) at pulse/mainloop.c:961
    #4 0x00000033ffc1e5f0 in pa_mainloop_run (m=0x2e9ef00, retval=0x0) at pulse/mainloop.c:979
    #5 0x00000033ffc2c03f in thread (userdata=0x2e9a690) at pulse/thread-mainloop.c:94
    #6 0x0000003400038be8 in internal_thread_func (userdata=0x2e9b510) at pulsecore/thread-posix.c:83
    #7 0x00000033e6a07d31 in start_thread (arg=0x7f440976d700) at pthread_create.c:305
    #8 0x00000033e62efdfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
     
    Thread 3 (Thread 0x7f4408f5c700 (LWP 3638)):
    #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:165
    #1 0x00000033f2a23020 in PR_WaitCondVar (cvar=0x2f0c150, timeout=4294967295) at ../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:417
    #2 0x000000340a0b5ab7 in js::GCHelperThread::threadLoop (this=0x2ea5678, rt=0x2ea5400) at jsgc.cpp:2084
    #3 0x00000033f2a283b3 in _pt_root (arg=0x2f0c1f0) at ../../../mozilla/nsprpub/pr/src/pthreads/ptthread.c:187
    #4 0x00000033e6a07d31 in start_thread (arg=0x7f4408f5c700) at pthread_create.c:305
    #5 0x00000033e62efdfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
     
    Thread 2 (Thread 0x7f43dd1ec700 (LWP 3667)):
    #0 0x00000033e62e7343 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
    #1 0x00000033ffc2c08f in poll_func (ufds=0x7f43e4006f60, nfds=2, timeout=-1, userdata=0x34b2ac0) at pulse/thread-mainloop.c:75
    #2 0x00000033ffc1df06 in pa_mainloop_poll (m=0x2f04000) at pulse/mainloop.c:879
    #3 0x00000033ffc1e539 in pa_mainloop_iterate (m=0x2f04000, block=<optimized out>, retval=0x0) at pulse/mainloop.c:961
    #4 0x00000033ffc1e5f0 in pa_mainloop_run (m=0x2f04000, retval=0x0) at pulse/mainloop.c:979
    #5 0x00000033ffc2c03f in thread (userdata=0x1bec8a0) at pulse/thread-mainloop.c:94
    #6 0x0000003400038be8 in internal_thread_func (userdata=0x48667d0) at pulsecore/thread-posix.c:83
    #7 0x00000033e6a07d31 in start_thread (arg=0x7f43dd1ec700) at pthread_create.c:305
    #8 0x00000033e62efdfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
     
    Thread 1 (Thread 0x7f44132f19c0 (LWP 3634)):
    #0 0x00000033e62e7343 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
    #1 0x000000360a045048 in g_main_context_poll (n_fds=2, fds=0x3309d70, priority=<optimized out>, timeout=-1, context=0x2e696c0) at gmain.c:3402
    #2 g_main_context_iterate (context=0x2e696c0, block=<optimized out>, dispatch=1, self=<optimized out>) at gmain.c:3084
    #3 0x000000360a045885 in g_main_loop_run (loop=0x49d7e30) at gmain.c:3297
    #4 0x00007f440b9cfddc in send_and_allow_reentry (message=0x1c1b8d0, bus=0x2e6ab60) at ../../atk-adaptor/event.c:118
    #5 Accessibility_DeviceEventController_NotifyListenersSync (key_event=0x7fff4fdc34b0) at ../../atk-adaptor/event.c:152
    #6 spi_atk_bridge_key_listener (event=<optimized out>, data=<optimized out>) at ../../atk-adaptor/event.c:221
    #7 0x00000036134b47d0 in notify_hf (key=<optimized out>, value=<optimized out>, data=<optimized out>) at cally/cally-util.c:462
    #8 0x000000360a032ac0 in g_hash_table_foreach_remove_or_steal (hash_table=0x36f0000, func=0x36134b47c0 <notify_hf>, user_data=0x348d040, notify=0) at ghash.c:1309
    #9 0x00000036134b4dd4 in cally_key_snooper (event=0x373f2a0, actor=<optimized out>, user_data=<optimized out>) at cally/cally-util.c:493
    #10 cally_key_snooper (actor=<optimized out>, event=0x373f2a0, user_data=<optimized out>) at cally/cally-util.c:473
    #11 0x00000036134b6418 in _clutter_marshal_BOOLEAN__BOXED (closure=0x2e871e0, return_value=0x7fff4fdc37b0, n_param_values=<optimized out>, param_values=0x3714860,
    invocation_hint=<optimized out>, marshal_data=<optimized out>) at clutter-marshal.c:85
    #12 0x000000360a80eb54 in g_closure_invoke (closure=0x2e871e0, return_value=0x7fff4fdc37b0, n_param_values=2, param_values=0x3714860, invocation_hint=<optimized out>)
    at gclosure.c:774
    #13 0x000000360a821e0b in signal_emit_unlocked_R (node=<optimized out>, detail=0, instance=0x1bc3bb0, emission_return=0x7fff4fdc38f0, instance_and_params=0x3714860)
    at gsignal.c:3272
    #14 0x000000360a82b2e6 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=0x7fff4fdc3958) at gsignal.c:3013
    #15 0x000000360a82b622 in g_signal_emit (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>) at gsignal.c:3060
    #16 0x0000003613438598 in clutter_actor_event (actor=0x1bc3bb0 [ClutterStage], event=0x373f2a0, capture=1) at ./clutter-actor.c:8655
    #17 0x0000003613478ca3 in emit_event (is_key_event=1, event=0x373f2a0) at ./clutter-main.c:1961
    #18 emit_keyboard_event (event=0x373f2a0) at ./clutter-main.c:2016
    #19 _clutter_process_event_details (event=0x373f2a0, context=0x1bc5ec0, stage=<optimized out>) at ./clutter-main.c:2106
    #20 _clutter_process_event (event=0x373f2a0) at ./clutter-main.c:2329
    #21 0x0000003613491108 in _clutter_stage_process_queued_events (stage=0x1bc3bb0 [ClutterStage]) at ./clutter-stage.c:898
    #22 0x000000361347ab20 in clutter_clock_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ./clutter-master-clock.c:360
    #23 0x000000360a044b3d in g_main_dispatch (context=0x1bce490) at gmain.c:2441
    #24 g_main_context_dispatch (context=0x1bce490) at gmain.c:3011
    #25 0x000000360a045338 in g_main_context_iterate (context=0x1bce490, block=<optimized out>, dispatch=1, self=<optimized out>) at gmain.c:3089
    #26 0x000000360a045885 in g_main_loop_run (loop=0x1bcd490) at gmain.c:3297
    #27 0x00000036114569e1 in meta_run () at core/main.c:555
    #28 0x00000000004029e1 in main (argc=1, argv=0x7fff4fdc3e68) at main.c:571
    A debugging session is active.
     
    Inferior 1 [process 3634] will be detached.
     
    Quit anyway? (y or n) Detaching from program: /usr/bin/gnome-shell, process 3634
Comment 7 Ray Strode [halfline] 2011-09-23 21:23:51 UTC
The timeout source added recently to work around this bug won't work because it's added to the default context instead of the main context associated with the nested main loop.
Comment 8 Matthias Clasen 2011-09-23 21:34:38 UTC
Created attachment 197377 [details] [review]
Prevent gnome-shell getting stuck on the login screen

The workaround that was committed for this didn't have the
intended effect, since the timeout was added to the default
main context, not the one that is used in the recursive
mainloop.

Without this patch, my login screen would freeze when hitting
Enter in the password entry, with it, I could successfully log
in 10 out of 10 times.
Comment 9 Colin Walters 2011-09-23 21:40:21 UTC
Review of attachment 197377 [details] [review]:

::: atk-adaptor/event.c
@@ +122,3 @@
   g_main_loop_run  (closure.loop);
   if (closure.timeout != -1)
     g_source_remove (closure.timeout);

Note we can't call g_source_remove() anymore, we have to use g_source_destroy()
Comment 10 Mike Gorse 2011-09-23 21:45:38 UTC
I'm fine with the patch; Colin's observation is technically unrelated to it but we might as well make the change.
Comment 11 Ray Strode [halfline] 2011-09-24 01:06:36 UTC
Colin's point was that g_source_remove only works on the default context.  g_source_attach returns a "tag" integer for convienence, but g_source_remove assumes the tag passed to it is associated with a source on the default context.
Comment 12 Matthias Clasen 2011-09-24 19:41:14 UTC
Created attachment 197412 [details] [review]
Prevent gnome-shell getting stuck on the login screen

The workaround that was committed for this didn't have the
intended effect, since the timeout was added to the default
main context, not the one that is used in the recursive
mainloop.

Without this patch, my login screen would freeze when hitting
Enter in the password entry, with it, I could successfully log
in 10 out of 10 times.
Comment 13 Matthias Clasen 2011-09-24 19:43:56 UTC
Created attachment 197413 [details] [review]
Prevent gnome-shell getting stuck on the login screen

The workaround that was committed for this didn't have the
intended effect, since the timeout was added to the default
main context, not the one that is used in the recursive
mainloop.

Without this patch, my login screen would freeze when hitting
Enter in the password entry, with it, I could successfully log
in 10 out of 10 times.
Comment 14 Matthias Clasen 2011-09-24 19:45:27 UTC
Updated patch has the g_source_destroy fix. 
This has gotten release team approval for freeze break.
Comment 15 Mike Gorse 2011-09-24 20:26:18 UTC
Comment on attachment 197413 [details] [review]
Prevent gnome-shell getting stuck on the login screen

Looks good to me. Thanks for the patch!




>From 51b5abcea9ee948d11aeb5d6062186f3d4f5ee59 Mon Sep 17 00:00:00 2001
>From: Matthias Clasen <mclasen@redhat.com>
>Date: Fri, 23 Sep 2011 17:31:53 -0400
>Subject: [PATCH] Prevent gnome-shell getting stuck on the login screen
>
>The workaround that was committed for this didn't have the
>intended effect, since the timeout was added to the default
>main context, not the one that is used in the recursive
>mainloop.
>
>Without this patch, my login screen would freeze when hitting
>Enter in the password entry, with it, I could successfully log
>in 10 out of 10 times.
>
>https://bugzilla.gnome.org/show_bug.cgi?id=658013
>---
> atk-adaptor/event.c |    8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
>diff --git a/atk-adaptor/event.c b/atk-adaptor/event.c
>index 79c2884..586d196 100644
>--- a/atk-adaptor/event.c
>+++ b/atk-adaptor/event.c
>@@ -100,6 +100,7 @@ send_and_allow_reentry (DBusConnection * bus, DBusMessage * message)
>   DBusPendingCall *pending;
>   SpiReentrantCallClosure closure;
>   GMainContext *main_context;
>+  GSource *source;
> 
>   main_context = (g_getenv ("AT_SPI_CLIENT") ? NULL :
>                   spi_global_app_data->main_context);
>@@ -114,10 +115,13 @@ send_and_allow_reentry (DBusConnection * bus, DBusMessage * message)
>       return NULL;
>     }
>   dbus_pending_call_set_notify (pending, set_reply, (void *) &closure, NULL);
>-  closure.timeout = g_timeout_add (500, timeout_reply, &closure);
>+  source = g_timeout_source_new (500);
>+  g_source_set_callback (source, timeout_reply, &closure, NULL);
>+  closure.timeout = g_source_attach (source, main_context);
>+  g_source_unref (source);
>   g_main_loop_run  (closure.loop);
>   if (closure.timeout != -1)
>-    g_source_remove (closure.timeout);
>+    g_source_destroy (source);
>   
>   g_main_loop_unref (closure.loop);
>   return closure.reply;
>-- 
>1.7.6.2
Comment 16 Matthias Clasen 2011-09-25 18:16:13 UTC
The following fix has been pushed:
51b5abc Prevent gnome-shell getting stuck on the login screen
Comment 17 Matthias Clasen 2011-09-25 18:16:20 UTC
Created attachment 197435 [details] [review]
Prevent gnome-shell getting stuck on the login screen

The workaround that was committed for this didn't have the
intended effect, since the timeout was added to the default
main context, not the one that is used in the recursive
mainloop.

Without this patch, my login screen would freeze when hitting
Enter in the password entry, with it, I could successfully log
in 10 out of 10 times.