After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 742636 - Freeze if window closed while resume-events pending
Freeze if window closed while resume-events pending
Status: RESOLVED FIXED
Product: gtk+
Classification: Platform
Component: .General
3.14.x
Other Linux
: Normal normal
: ---
Assigned To: gtk-bugs
gtk-bugs
: 735101 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2015-01-09 10:12 UTC by Tom Hughes
Modified: 2015-06-03 21:15 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Stack trace (2.46 KB, application/octet-stream)
2015-01-09 10:12 UTC, Tom Hughes
  Details
Patch to send resume-events (1.05 KB, patch)
2015-01-24 12:27 UTC, Tom Hughes
none Details | Review
New patch along the lines suggested (1.84 KB, patch)
2015-01-26 21:52 UTC, Tom Hughes
committed Details | Review
Fix for the previous patch (636 bytes, patch)
2015-01-30 22:44 UTC, Tom Hughes
none Details | Review
Fix again, with spelling corrected (636 bytes, patch)
2015-01-30 22:45 UTC, Tom Hughes
none Details | Review

Description Tom Hughes 2015-01-09 10:12:57 UTC
Created attachment 294145 [details]
Stack trace

If a window is closed, and the underlying GDK window object is destroyed while it's frame clock has GDK_FRAME_CLOCK_PHASE_RESUME_EVENTS pending then the frame clock will be destroyed without ever sending the resume-events signal, and the X11 event handling on the display will remain paused forever.

I have been seeing lockups in a python application (gramps) built using GTK where it will occasionally lockup when a dialog is closed. Investigation showed that it was looping being told there were X events to process but not doing so because _gdk_display_pause_events had been called and had paused event handling.

In turn that has happened because gdkwindow has received a flush-events signal, so gdk_window_flush_events has paused event handling and asked the frame clock to resume when it is ready. The resume-events signal never arrives however and hence event handling remains paused.

Further investigation revealed that the reason gdkframeclockidle.c never sends the resume-events signal is that it is destroyed (because the associated window is closed) before gdk_frame_clock_paint_idle runs and gets a change to resume the events.

I'm attaching a stack trace from the point where the frame clock is destroyed, showing the trace back to the OK button handler in the python code. My environment is:

  Gramps 4.1.1
  Python 3.4.1
  Python GObject 3.14.0
  GTK 3.14.6
  GLib 1.2.10

I'm not 100% sure this is a GTK bug (maybe GTK is being misused by the application in some way) but at the very least the frame clock code should probably detect this case and issue a diagnostic, and maybe recover by sending the resume-events anyway?
Comment 1 Tom Hughes 2015-01-24 12:27:01 UTC
Created attachment 295318 [details] [review]
Patch to send resume-events

I've been running with this patch, which sends a resume-events signal when a clock is destroyed while GDK_FRAME_CLOCK_PHASE_RESUME_EVENTS is in the requested mask, for a week now without a single instance of the hangs that I was seeing at regular intervals before.
Comment 2 Owen Taylor 2015-01-26 19:03:27 UTC
Hmm, do you know what sequence of code gramps is doing to trigger this? At what point does it destroy the window? I'm curious why it doesn't happen for other applications.

Your patch looks workable, but I think it would be just a hair better to add a 'guint frame_clock_events_paused : 1' in the GdkWindow structure in gdkwindowinternals.h - group with the other bitfields - and then have the gdkwindow.c code resume events if that is set when unsetting the frame clock.

The reasons I think it would be better:

 * The need to resume events isn't really because the frame clock is being disposed, but rather that it is being disconnected from the window. Now GdkWindow knows that it disposes a frame clock when disconnecting it from the window, but gdkframeclockidle.c doesn't.

 * As a matter of general principle, I don't really like emitting a "phase" signal out of dispose, even when the phase signal is for an internal phase like "resume-events" - the structure of emitting the phase signals is currently all lined up in flush_idle and paint_idle.
Comment 3 Tom Hughes 2015-01-26 21:52:21 UTC
Created attachment 295483 [details] [review]
New patch along the lines suggested

I'm afraid I don't know how gramps manages to trigger this when other applications don't - as the bug only happens occasionally it's hard to debug and in any case by the time you know you have hit the problem it's too late to now what happened to get you there.

What I do know is that it only seems to happen when closing a dialog with OK and not with Cancel so most likely it somehow related to extra work being done in the former case, which mostly means updating the database to reflect any changes made in the dialog being closed.

I wondered if it was something like a callback from the DB update triggering some further update in the window contents and it then hanging it it gets to the close before the repaint has happened but I can't find such a code path currently, though I'm no expert in the gramps code.

I've attached an alternative patch along the lines you suggested, and I've built a version with that to try but it will take a few days before I can be sure it is working...
Comment 4 Tom Hughes 2015-01-30 22:39:25 UTC
Urgh. I hope you didn't apply my patch as it - it is buggy!

I was going to post an updated one once I had confirmed that it worked properly...
Comment 5 Tom Hughes 2015-01-30 22:44:39 UTC
Created attachment 295834 [details] [review]
Fix for the previous patch

So it looks like you did apply it as it was - here's the fix to go on top of it.
Comment 6 Tom Hughes 2015-01-30 22:45:43 UTC
Created attachment 295835 [details] [review]
Fix again, with spelling corrected
Comment 7 Matthias Clasen 2015-02-01 23:02:33 UTC
I had discovered the defect already, and pushed the fix in 561ff51abb9629c11855d346432cb1792b117815
Comment 8 Benjamin Berg 2015-05-31 20:05:59 UTC
*** Bug 735101 has been marked as a duplicate of this bug. ***
Comment 9 Egmont Koblinger 2015-06-03 21:15:25 UTC
Coming from bug 735101 I'd like to say a giant thanks to everyone who worked on fixing this! :)