After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 348305 - Force-quit dialog ("application not responding...") pops up too quickly
Force-quit dialog ("application not responding...") pops up too quickly
Status: RESOLVED FIXED
Product: metacity
Classification: Other
Component: general
2.16.x
Other All
: Normal major
: ---
Assigned To: Metacity maintainers list
Metacity maintainers list
Depends on:
Blocks:
 
 
Reported: 2006-07-22 03:55 UTC by Chris Koresko
Modified: 2006-09-13 16:32 UTC
See Also:
GNOME target: ---
GNOME version: 2.15/2.16


Attachments
verbose log (172.83 KB, text/plain)
2006-09-09 18:34 UTC, Sebastien Bacher
  Details
s/Time/guint32/ in the code (45.22 KB, patch)
2006-09-11 21:07 UTC, Elijah Newren
none Details | Review
s/Time/guint32/, remove casts (49.50 KB, patch)
2006-09-11 22:14 UTC, Elijah Newren
none Details | Review
Updated to catch issue pointed out by Havoc in comment 14 (49.49 KB, patch)
2006-09-11 22:53 UTC, Elijah Newren
committed Details | Review

Description Chris Koresko 2006-07-22 03:55:12 UTC
Please describe the problem:
Many apps take a few seconds or more to shut down normally, especially if the computer is busy with other activity.  Prime examples are Firefox (when lots of windows and tabs are open) and Evolution.  Between Gnome 2.12 and 2.14 the time allowed before the force-quit dialog appears seems to have been reduced to only a couple of seconds.  As a result, it frequently offers to kill programs that are in the process of normal shutdown.

Steps to reproduce:
1. Open Evolution, Firefox, or other memory-intensive app
2. Load up the app with lots of web pages or whatever
3. Click the close button


Actual results:
Application shuts down normally, but Gnome-panel thinks it's crashed and offers to force-quit it.

Expected results:
Application shuts down normally, without intervention from Gnome-panel

Does this happen every time?
Not every time, just when the normal shutdown is slow.

Other information:
This occurred rarely for Gnome 2.12, probably because the timeout was longer.
Comment 1 Sebastien Bacher 2006-09-09 18:30:26 UTC
Ubuntu bug about that: https://launchpad.net/distros/ubuntu/+source/metacity/+bug/29584

"Whenever I have made some edits to a document, then try to close the application using the X in the upper right corner, it asks if I want to close, cancel, or save. Before I even get a chance to respond, within 1-2 seconds, a second dialog pops up saying:

--------------
Warning

The window "*Unsaved Document 1 - gedit" is not responding.

Forcing this application to quit will cause you to lose any unsaved changes.

[Cancel] [Force Quit]
--------------

This has happened in different applications, gedit, gimp, etc. The warning is actually a false alarm because the window is actually responding just fine. I can cancel the warning and continue saving or exiting as normal.

Note that this only happens for the X in the upper right corner of the window, NOT for the Close or Quit options on the menu, which wait for me to make a choice normally before exiting.

I am running Dapper Drake Flight 3 on AMD64 (plus a bunch of other installed software and updates).
...
I can confirm this. Thought it only happens in my amd64 box, but not with x86 box. Both are running Dapper with latest updates."


Happens on my edgy desktop (GNOME 2.16) amd64, was not happening with a i386 on the same machine.
Comment 2 Sebastien Bacher 2006-09-09 18:34:51 UTC
Created attachment 72461 [details]
verbose log

Log got from a basic debug session with gedit running before metacity, the dialog pops after a second or so when trying to close gedit with non-saved change, metacity has been stopped after getting the "window is not responding" dialog
Comment 3 Havoc Pennington 2006-09-09 19:58:17 UTC
This log file looks like a gedit or gtk bug; the "pong" reply contains the wrong timestamp, so metacity doesn't correlate it with the _NET_WM_PING.

The original bug report here is just that the timeout isn't quite right (a pretty inevitable attribute of timeouts, though perhaps some tweaking is possible). That's different from a wrong pong reply, which breaks things always/reliably.

Comment 4 Sebastien Bacher 2006-09-10 12:26:51 UTC
I'm not sure the issue is different from the original bug. That worked fine with an i386 distribution for me and the dialog is triggered really too quickly since I've installed an amd64 version. That happens with epiphany, firefox, gedit, gnumeric, gimp and probably any other GNOME application so seems to be a GTK issue. Do you have any pointer on where to start to look if that's a GTK bug? 
Comment 5 Havoc Pennington 2006-09-10 17:02:07 UTC
If it's 64-bit specific it's pretty easy to guess there's a 32-64 conversion mistake or endianness mistake somewhere with the timestamp.

The GDK code looks hard to have wrong, though. It's in gdkevents-x11.c and just returns back the exact event that was sent.

So perhaps it's in the metacity code packing or unpacking this event (display.c, search for ping and pong, window.c, search for icccm_message), but I don't see anything offhand. Sticking some debug spew in there or looking for compiler warnings might help. (Suggest removing the casts to "Time" and see if some valid warnings were hidden.)

It looks like XClientMessageEvent has a different size on 32 and 64 bit, since it has an array of 5 long; I would guess (but don't know) that Xlib magically converts to 32 bits with the right endianness on the wire, I don't see how any client message code could work at all otherwise.

Maybe the broken thing is just the verbose print of the timestamp - it's possible. Then the real bug here is different. Is there a warning maybe about the %ul format getting the wrong arg on 64-bit?

btw, I was slightly wrong that a bad pong breaks things always; it breaks things if the app takes longer to exit than the timeout. The normal/working case is that the app can take longer than the timeout to exit, as long as it replies within the timeout to the ping message.
Comment 6 Sebastien Bacher 2006-09-11 15:19:39 UTC
Thank you for the pointer

from display.c L4171:
      if ((Time)event->xclient.data.l[1] == ping_data->timestamp)

removing the "(Time)" casting for it gives a compiler warning:

"display.c: In function ‘process_pong_message’:
display.c:4171: warning: comparison between signed and unsigned"

a "      printf("(Time)event->xclient.data.l[1]: %lu\n ping_data->timestamp: %lu\n", (Time)event->xclient.data.l[1], ping_data->timestamp);" prints that (when trying to close a gedit with some pending changes to store):

"(Time)event->xclient.data.l[1]: 18446744072056098569
 ping_data->timestamp: 2641514249"

Comment 7 Sebastien Bacher 2006-09-11 15:28:03 UTC
changing the condition for "if ((guint32)event->xclient.data.l[1] == (guint32)ping_data->timestamp)" fixes the issue
Comment 8 Havoc Pennington 2006-09-11 16:43:03 UTC
I feel like I don't really understand what's going on... 
first, is the current code the same as this or is there a precedence issue:
 "((Time) xclient.data.l[1]) == ping_data->timestamp"

If the current code is the same as that, then why does the cast to (Time) (signed long? or not?) not work?
Comment 9 Elijah Newren 2006-09-11 18:02:15 UTC
We have a mixture of usage of Time and guint32 for timestamps throughout our codebase; I'd bet that it's literally half and half right now (a quick search showed that ping_data->timestamp is ultimately set from a timestamp declared as guint32 approximately half the time).  I think Owen's comment from bug 115650 comment 43 may be applicable, though I don't fully understand everything it causes:

  "...the issue is that CARD32 server side items [such as Time] are represented
  as longs in the Xlib API, but they still only have 32 bits of data in them."

I asked mclasen on irc and he said that gtk+ _always_ uses guint32 for timestamp declarations (in fact, he didn't know where 'Time' came from).  Maybe we should switch to doing so?
Comment 10 Havoc Pennington 2006-09-11 19:25:52 UTC
Looks like Time is defined as CARD32 which unsigned int on 64-bit and unsigned long on 32-bit. Why it isn't unsigned int unconditionally, who knows.

See CARD32 in Xmd.h and Time in Xproto.h

So ping_data->timestamp is unsigned 32, and the client message contains signed long.

using guint32 for this always in metacity is probably good. In any case, shows the danger of casts... any cleanup patch should be sure it reduces rather than increases those ;-)
Comment 11 Elijah Newren 2006-09-11 21:07:36 UTC
Created attachment 72568 [details] [review]
s/Time/guint32/ in the code

This replaces every occurrence of Time in the code with guint32.  It does add some casts, but only in printf-like debugging statements (since %lu doesn't work for guint32's, and we already had such casts in places where we were using guint32 previously).  I don't have a 64-bit system, so I can't really test it, unfortunately.  So I'll leave this patch here and wait for feedback.
Comment 12 Havoc Pennington 2006-09-11 21:43:56 UTC
Instead of adding the casts, why not just change the %lu to %u?
Comment 13 Elijah Newren 2006-09-11 22:14:21 UTC
Created attachment 72569 [details] [review]
s/Time/guint32/, remove casts
Comment 14 Havoc Pennington 2006-09-11 22:26:47 UTC
I saw one place where the l[1] from the xclient event is passed to %u, that one should be %ld I think:
http://bugzilla.gnome.org/attachment.cgi?id=72569&action=diff#metacity/src/display.c_sec15

(or alternatively, just use the timestamp local variable instead of printing from the client message)

Looks fine to me otherwise
Comment 15 Elijah Newren 2006-09-11 22:53:25 UTC
Created attachment 72571 [details] [review]
Updated to catch issue pointed out by Havoc in comment 14
Comment 16 Havoc Pennington 2006-09-11 23:24:31 UTC
I will tokenly mark it commit now though I'm sure you are not waiting for me to do so ;-)
Comment 17 Sebastien Bacher 2006-09-13 10:20:45 UTC
metacity 2.16.1 builds fine on amd64 with the patch and that fixes the issue
Comment 18 Elijah Newren 2006-09-13 16:32:52 UTC
2006-09-13  Elijah Newren  <newren gmail com>

	* src/common.h (MetaWindowMenuFunc):
	* src/core.[ch] (meta_core_user_lower_and_unfocus,
	  meta_core_user_focus, meta_core_show_window_menu,
	  meta_core_begin_grab_op, meta_core_end_grab_op):
	* src/delete.c (delete_ping_reply_func, delete_ping_timeout_func,
	  meta_window_delete):
	* src/display.[ch] (struct MetaDisplay, struct MetaPingData,
	  sanity_check_timestamps, meta_display_open, event_callback,
	  meta_spew_event, meta_display_set_grab_op_cursor,
	  meta_display_begin_grab_op, meta_display_end_grab_op,
	  meta_display_ping_timeout, meta_display_ping_window,
	  process_pong_message, timestamp_too_old,
	  meta_display_set_input_focus_window):
	* src/keybindings.[ch] (grab_keyboard, ungrab_keyboard,
	  meta_screen_grab_all_keys, meta_window_grab_all_keys,
	  meta_window_ungrab_all_keys, error_on_generic_command,
	  error_on_command, error_on_terminal_command):
	* src/metacity-dialog.c (on_realize, warn_about_no_sm_support,
	  error_about_command, main):
	* src/screen.[ch] (struct _MetaScreen, meta_screen_new,
	  meta_screen_show_desktop, meta_screen_apply_startup_properties):
	* src/session.c (warn_about_lame_clients_and_finish_interact):
	* src/window.[ch] (struct _MetaWindow,
	  intervening_user_event_occurred, window_activate,
	  meta_window_delete, meta_window_focus,
	  meta_window_send_icccm_message, meta_window_client_message,
	  menu_callback, meta_window_show_menu, struct EventScannerData,
	  check_use_this_motion_notify, meta_window_begin_grab_op,
	  meta_window_set_user_time):
	* src/workspace.[ch] (focus_ancestor_or_mru_window,
	  meta_workspace_activate_with_focus, meta_workspace_activate,
	  meta_workspace_focus_default_window,
	  focus_ancestor_or_mru_window):
	Fix issues on 64-bit machines with timestamps by using guint32
	(like gtk+ does) instead of Time.  #348305