GNOME Bugzilla – Bug 348305
Force-quit dialog ("application not responding...") pops up too quickly
Last modified: 2006-09-13 16:32:52 UTC
Please describe the problem: Many apps take a few seconds or more to shut down normally, especially if the computer is busy with other activity. Prime examples are Firefox (when lots of windows and tabs are open) and Evolution. Between Gnome 2.12 and 2.14 the time allowed before the force-quit dialog appears seems to have been reduced to only a couple of seconds. As a result, it frequently offers to kill programs that are in the process of normal shutdown. Steps to reproduce: 1. Open Evolution, Firefox, or other memory-intensive app 2. Load up the app with lots of web pages or whatever 3. Click the close button Actual results: Application shuts down normally, but Gnome-panel thinks it's crashed and offers to force-quit it. Expected results: Application shuts down normally, without intervention from Gnome-panel Does this happen every time? Not every time, just when the normal shutdown is slow. Other information: This occurred rarely for Gnome 2.12, probably because the timeout was longer.
Ubuntu bug about that: https://launchpad.net/distros/ubuntu/+source/metacity/+bug/29584 "Whenever I have made some edits to a document, then try to close the application using the X in the upper right corner, it asks if I want to close, cancel, or save. Before I even get a chance to respond, within 1-2 seconds, a second dialog pops up saying: -------------- Warning The window "*Unsaved Document 1 - gedit" is not responding. Forcing this application to quit will cause you to lose any unsaved changes. [Cancel] [Force Quit] -------------- This has happened in different applications, gedit, gimp, etc. The warning is actually a false alarm because the window is actually responding just fine. I can cancel the warning and continue saving or exiting as normal. Note that this only happens for the X in the upper right corner of the window, NOT for the Close or Quit options on the menu, which wait for me to make a choice normally before exiting. I am running Dapper Drake Flight 3 on AMD64 (plus a bunch of other installed software and updates). ... I can confirm this. Thought it only happens in my amd64 box, but not with x86 box. Both are running Dapper with latest updates." Happens on my edgy desktop (GNOME 2.16) amd64, was not happening with a i386 on the same machine.
Created attachment 72461 [details] verbose log Log got from a basic debug session with gedit running before metacity, the dialog pops after a second or so when trying to close gedit with non-saved change, metacity has been stopped after getting the "window is not responding" dialog
This log file looks like a gedit or gtk bug; the "pong" reply contains the wrong timestamp, so metacity doesn't correlate it with the _NET_WM_PING. The original bug report here is just that the timeout isn't quite right (a pretty inevitable attribute of timeouts, though perhaps some tweaking is possible). That's different from a wrong pong reply, which breaks things always/reliably.
I'm not sure the issue is different from the original bug. That worked fine with an i386 distribution for me and the dialog is triggered really too quickly since I've installed an amd64 version. That happens with epiphany, firefox, gedit, gnumeric, gimp and probably any other GNOME application so seems to be a GTK issue. Do you have any pointer on where to start to look if that's a GTK bug?
If it's 64-bit specific it's pretty easy to guess there's a 32-64 conversion mistake or endianness mistake somewhere with the timestamp. The GDK code looks hard to have wrong, though. It's in gdkevents-x11.c and just returns back the exact event that was sent. So perhaps it's in the metacity code packing or unpacking this event (display.c, search for ping and pong, window.c, search for icccm_message), but I don't see anything offhand. Sticking some debug spew in there or looking for compiler warnings might help. (Suggest removing the casts to "Time" and see if some valid warnings were hidden.) It looks like XClientMessageEvent has a different size on 32 and 64 bit, since it has an array of 5 long; I would guess (but don't know) that Xlib magically converts to 32 bits with the right endianness on the wire, I don't see how any client message code could work at all otherwise. Maybe the broken thing is just the verbose print of the timestamp - it's possible. Then the real bug here is different. Is there a warning maybe about the %ul format getting the wrong arg on 64-bit? btw, I was slightly wrong that a bad pong breaks things always; it breaks things if the app takes longer to exit than the timeout. The normal/working case is that the app can take longer than the timeout to exit, as long as it replies within the timeout to the ping message.
Thank you for the pointer from display.c L4171: if ((Time)event->xclient.data.l[1] == ping_data->timestamp) removing the "(Time)" casting for it gives a compiler warning: "display.c: In function ‘process_pong_message’: display.c:4171: warning: comparison between signed and unsigned" a " printf("(Time)event->xclient.data.l[1]: %lu\n ping_data->timestamp: %lu\n", (Time)event->xclient.data.l[1], ping_data->timestamp);" prints that (when trying to close a gedit with some pending changes to store): "(Time)event->xclient.data.l[1]: 18446744072056098569 ping_data->timestamp: 2641514249"
changing the condition for "if ((guint32)event->xclient.data.l[1] == (guint32)ping_data->timestamp)" fixes the issue
I feel like I don't really understand what's going on... first, is the current code the same as this or is there a precedence issue: "((Time) xclient.data.l[1]) == ping_data->timestamp" If the current code is the same as that, then why does the cast to (Time) (signed long? or not?) not work?
We have a mixture of usage of Time and guint32 for timestamps throughout our codebase; I'd bet that it's literally half and half right now (a quick search showed that ping_data->timestamp is ultimately set from a timestamp declared as guint32 approximately half the time). I think Owen's comment from bug 115650 comment 43 may be applicable, though I don't fully understand everything it causes: "...the issue is that CARD32 server side items [such as Time] are represented as longs in the Xlib API, but they still only have 32 bits of data in them." I asked mclasen on irc and he said that gtk+ _always_ uses guint32 for timestamp declarations (in fact, he didn't know where 'Time' came from). Maybe we should switch to doing so?
Looks like Time is defined as CARD32 which unsigned int on 64-bit and unsigned long on 32-bit. Why it isn't unsigned int unconditionally, who knows. See CARD32 in Xmd.h and Time in Xproto.h So ping_data->timestamp is unsigned 32, and the client message contains signed long. using guint32 for this always in metacity is probably good. In any case, shows the danger of casts... any cleanup patch should be sure it reduces rather than increases those ;-)
Created attachment 72568 [details] [review] s/Time/guint32/ in the code This replaces every occurrence of Time in the code with guint32. It does add some casts, but only in printf-like debugging statements (since %lu doesn't work for guint32's, and we already had such casts in places where we were using guint32 previously). I don't have a 64-bit system, so I can't really test it, unfortunately. So I'll leave this patch here and wait for feedback.
Instead of adding the casts, why not just change the %lu to %u?
Created attachment 72569 [details] [review] s/Time/guint32/, remove casts
I saw one place where the l[1] from the xclient event is passed to %u, that one should be %ld I think: http://bugzilla.gnome.org/attachment.cgi?id=72569&action=diff#metacity/src/display.c_sec15 (or alternatively, just use the timestamp local variable instead of printing from the client message) Looks fine to me otherwise
Created attachment 72571 [details] [review] Updated to catch issue pointed out by Havoc in comment 14
I will tokenly mark it commit now though I'm sure you are not waiting for me to do so ;-)
metacity 2.16.1 builds fine on amd64 with the patch and that fixes the issue
2006-09-13 Elijah Newren <newren gmail com> * src/common.h (MetaWindowMenuFunc): * src/core.[ch] (meta_core_user_lower_and_unfocus, meta_core_user_focus, meta_core_show_window_menu, meta_core_begin_grab_op, meta_core_end_grab_op): * src/delete.c (delete_ping_reply_func, delete_ping_timeout_func, meta_window_delete): * src/display.[ch] (struct MetaDisplay, struct MetaPingData, sanity_check_timestamps, meta_display_open, event_callback, meta_spew_event, meta_display_set_grab_op_cursor, meta_display_begin_grab_op, meta_display_end_grab_op, meta_display_ping_timeout, meta_display_ping_window, process_pong_message, timestamp_too_old, meta_display_set_input_focus_window): * src/keybindings.[ch] (grab_keyboard, ungrab_keyboard, meta_screen_grab_all_keys, meta_window_grab_all_keys, meta_window_ungrab_all_keys, error_on_generic_command, error_on_command, error_on_terminal_command): * src/metacity-dialog.c (on_realize, warn_about_no_sm_support, error_about_command, main): * src/screen.[ch] (struct _MetaScreen, meta_screen_new, meta_screen_show_desktop, meta_screen_apply_startup_properties): * src/session.c (warn_about_lame_clients_and_finish_interact): * src/window.[ch] (struct _MetaWindow, intervening_user_event_occurred, window_activate, meta_window_delete, meta_window_focus, meta_window_send_icccm_message, meta_window_client_message, menu_callback, meta_window_show_menu, struct EventScannerData, check_use_this_motion_notify, meta_window_begin_grab_op, meta_window_set_user_time): * src/workspace.[ch] (focus_ancestor_or_mru_window, meta_workspace_activate_with_focus, meta_workspace_activate, meta_workspace_focus_default_window, focus_ancestor_or_mru_window): Fix issues on 64-bit machines with timestamps by using guint32 (like gtk+ does) instead of Time. #348305