After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 629608 - Revamp and modernize X error traps
Revamp and modernize X error traps
Status: RESOLVED FIXED
Product: gtk+
Classification: Platform
Component: .General
unspecified
Other All
: Normal normal
: ---
Assigned To: gtk-bugs
gtk-bugs
Depends on:
Blocks:
 
 
Reported: 2010-09-14 03:11 UTC by Havoc Pennington
Modified: 2010-09-22 01:14 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Remove trailing whitespace and obsolete doc comments from gdk_error_trap code (3.37 KB, patch)
2010-09-14 03:11 UTC, Havoc Pennington
accepted-commit_now Details | Review
Replace crufty old code for gdk error traps with GQueue and GSlice (2.60 KB, patch)
2010-09-14 03:11 UTC, Havoc Pennington
accepted-commit_now Details | Review
Revamp and modernize X error traps (24.40 KB, patch)
2010-09-14 03:11 UTC, Havoc Pennington
accepted-commit_now Details | Review
Only store error codes in inner-most X error trap (1.79 KB, patch)
2010-09-20 20:15 UTC, Owen Taylor
committed Details | Review
Don't prematurely remove the X error handler (1.30 KB, patch)
2010-09-21 04:44 UTC, Matthias Clasen
none Details | Review
Don't prematurely remove the X error handler (1.63 KB, patch)
2010-09-21 05:40 UTC, Matthias Clasen
none Details | Review

Description Havoc Pennington 2010-09-14 03:11:02 UTC
The main objective here was to avoid the need to sync just
to ignore an error. Now, syncing is automatic, and only
happens when we need to know the error code. To just ignore
an error we keep around the trap's range of sequence numbers
and ignore errors in that range.

I have not tested this beyond "testgtk still starts up fine." 
In particular I haven't written a test program to actually trigger 
errors.
Comment 1 Havoc Pennington 2010-09-14 03:11:09 UTC
Created attachment 170214 [details] [review]
Remove trailing whitespace and obsolete doc comments from gdk_error_trap code

(there are actual docs in the template file, these were some kind of
pre-gtk-doc comments without useful info)
Comment 2 Havoc Pennington 2010-09-14 03:11:13 UTC
Created attachment 170215 [details] [review]
Replace crufty old code for gdk error traps with GQueue and GSlice

No need to do a manual free list these days.
Comment 3 Havoc Pennington 2010-09-14 03:11:16 UTC
Created attachment 170216 [details] [review]
Revamp and modernize X error traps

* add per-display gdk_x11_display_error_trap_push()
  (X11-specific because gdk_error_trap_push() probably
  should have been)
* make gdk_error_trap_push() handle only GDK displays
  not displays opened without a GDK wrapper
* make gdk_error_trap_pop() and gdk_x11_display_error_trap_pop()
  automatically sync only if needed, so manual gdk_flush() is not
  required
* add gdk_error_trap_pop_ignored() which just asynchronously
  ignores errors, so never needs to sync
* add G_GNUC_WARN_UNUSED_RESULT to plain pop(), because
  if you use plain pop() and don't need the return value,
  the async gdk_error_trap_pop_ignored() should be used
  instead. This results in lots of warnings to clean
  up in a later patch.

The main objective here was to avoid the need to sync just
to ignore an error. Now, syncing is automatic, and only
happens when we need to know the error code.
Comment 4 Matthias Clasen 2010-09-14 11:48:55 UTC
Review of attachment 170214 [details] [review]:

Looks good
Comment 5 Matthias Clasen 2010-09-14 11:49:12 UTC
Review of attachment 170215 [details] [review]:

Looks good
Comment 6 Matthias Clasen 2010-09-14 11:50:48 UTC
Review of attachment 170216 [details] [review]:

Looks good to me (without testing it).
We should probably write a test for X error handling - thats one thing thats fairly straightforward to test in X.
Comment 7 Matthias Clasen 2010-09-18 22:22:21 UTC
Committed with some minimal test.
Comment 8 Owen Taylor 2010-09-20 17:43:29 UTC
This stuff is pretty nifty; the asynchronous error handling idea has been floating around for a long time without anybody picking it up, and using XLastKnownRequestProcessed gets rid of the whole "do I need to sync" and even is compatible with existing code that syncs manually.

(_gdk_x11_send_client_message_async could presumably be removed now and replaced with an error trap.)

But I don't really understand how the async error traps work since you remove the X error handler when gdk_x11_display_error_trap_pop_ignored() is called. Any errors that occur after that point will be untrapped?

The need to leave error traps around is a huge problem for integration with other code that does error traps in the same process; I don't see how it works to have GTK+ using this async code and Clutter in the same process.

(luminocity contains an implementation of async error traps using Xlibint.h and display->async_handlers, but it's documented as being non-thread-safe and having side effects. Perhaps XCB could be used to make a better version or be extended for that purpose.)

But maybe I'm just completely missing something.
Comment 9 Havoc Pennington 2010-09-20 18:06:35 UTC
No, I just didn't think of that. doh. (I translated from where I coded this before with an always-installed handler and just unthinkingly kept the GDK behavior.)

I guess we could always force a sync (if we aren't already synced) in the outermost pop_ignored() ... which would at least keep the benefits that nested traps would only sync once and you wouldn't have to manually sync.

Only alternative would be to leave our error handler installed always I suppose.

(That might actually work fine with clutter since clutter will save/restore our handler?)

I think it's logical that if GDK "owns" a display it also sets the default error handler, fwiw.

In the clutter case there's also the dream solution of a clutter/gdk backend.
Comment 10 Matthias Clasen 2010-09-20 18:10:18 UTC
After some irc discussion, to make async error traps fully work, we probably need the traps to record their serial ranges, and leave traps in place until we know their range has been fully processed.
Comment 11 Owen Taylor 2010-09-20 18:28:48 UTC
(In reply to comment #9)
> No, I just didn't think of that. doh. (I translated from where I coded this
> before with an always-installed handler and just unthinkingly kept the GDK
> behavior.)
> 
> I guess we could always force a sync (if we aren't already synced) in the
> outermost pop_ignored() ... which would at least keep the benefits that nested
> traps would only sync once and you wouldn't have to manually sync.
> 
> Only alternative would be to leave our error handler installed always I
> suppose.

Or leave it installed until the trap list is empty.
 
> (That might actually work fine with clutter since clutter will save/restore our handler?)

It sort of works. if we make the assumption that:

 - Error traps are saved and restored properly
 - Error traps never do anything but blanket ignore all errors

Then the worst that will happen is that someone else's error trap will eat the errors that we were going to ignore. Which seems mostly harmless.

If someone else had a complicated error handler, like the one you implemented here, then things might easily go sour. But the main error handlers we have to interact with - Clutter, the one in gdk/x11/xsettings-client.c, etc, do fit the simple model.

> I think it's logical that if GDK "owns" a display it also sets the default
> error handler, fwiw.

Perhaps. It does makes the error trap something else that needs to be carefully negotiated for SWT, Openoffice.org, etc.

(This patch already breaks the hacks that Mutter was using - I think I can make Mutter just use the GDK error traps though - the only point of the Metacity/Mutter error trap *seems* to have been to get errors logged to g_log instead of stderr)
Comment 12 Matthias Clasen 2010-09-20 18:38:12 UTC
> Only alternative would be to leave our error handler installed always I
> suppose.

That is not a bad idea.
Comment 13 Owen Taylor 2010-09-20 18:48:25 UTC
Oh, another important problem with the patch - it causes any X errors on displays not managed by GTK+ to be silently ignored.
Comment 14 Owen Taylor 2010-09-20 19:11:40 UTC
And a request for an improvement - since the code is saving serial ranges, it should use it to not-trap errors outside of a error trap. There was a confusing problem in Mutter where an error was accidentally getting ignored in one gdk_error_trap_push()/pop() pair and with this change migrated to another gdk_error_trap_push()/pop() pair where the return value was checked.
Comment 15 Havoc Pennington 2010-09-20 19:22:15 UTC
It's tough to solve the "messing with other displays" problem. It only matters if the other display reads from its socket while our trap is pushed, right? But once we assume that can happen, if we ignore then we might ignore something it was trapping; and if we don't ignore we might die on something it was ignoring. Unless I'm missing something. The good news is that, without threads, this normally should not come up. The bad news is, threads would be one of the main reasons to open a different display.

I'm not sure there's a great solution here other than fixing Xlib or Xcb to have a per-display error handler.

One possible approach is to call the previous error handler on stuff from other displays, though it's really only a partial solution, it might improve things.

The previous code had the same issue right? So at least it isn't new.

I don't think I understand comment 14; doesn't this code address it:
      if (SEQUENCE_COMPARE (trap->start_sequence, <=, error->serial) &&
          (trap->end_sequence == 0 ||
           SEQUENCE_COMPARE (trap->end_sequence, >, error->serial)))
        {
          ignore = TRUE;
          trap->error_code = error->error_code;
        }

If it doesn't then I don't think I know what you mean
Comment 16 Owen Taylor 2010-09-20 19:26:26 UTC
(In reply to comment #14)
> And a request for an improvement - since the code is saving serial ranges, it
> should use it to not-trap errors outside of a error trap. There was a confusing
> problem in Mutter where an error was accidentally getting ignored in one
> gdk_error_trap_push()/pop() pair and with this change migrated to another
> gdk_error_trap_push()/pop() pair where the return value was checked.

Actually, was wrong about what was going wrong in Mutter, the problem is that the behavior of nested error traps changed. Previously:

 gdk_error_trap_push();
 gdk_error_trap_push();
 /* Do something triggering an error */
 gdk_error_trap_pop(); /* returns error */
 gdk_error_trap_pop(); /* returns success */

Now:

 gdk_error_trap_push();
 gdk_error_trap_push();
 /* Do something triggering an error */
 gdk_error_trap_pop(); /* returns error */
 gdk_error_trap_pop(); /* returns error */

I consider this a bug - pushing a trap should hide the error from enclosing code.
Comment 17 Owen Taylor 2010-09-20 19:32:11 UTC
(In reply to comment #15)
> It's tough to solve the "messing with other displays" problem. It only matters
> if the other display reads from its socket while our trap is pushed, right? But
> once we assume that can happen, if we ignore then we might ignore something it
> was trapping; and if we don't ignore we might die on something it was ignoring.
> Unless I'm missing something. The good news is that, without threads, this
> normally should not come up. The bad news is, threads would be one of the main
> reasons to open a different display.

Remember, gdk_x_error is installed by _gdk_windowing_init(), so it's always in place, not just during traps. Since it is replacing the default X error handler, it needs to act like it and die with a verbose error when an error occurs. Ignoring threads (and the whole XSetErrorHandler thing is pretty messed up for the multithreaded case), then ignoring works fine if people using other displays use simple save/ignore errors/restore error traps.

> I'm not sure there's a great solution here other than fixing Xlib or Xcb to
> have a per-display error handler.
> 
> One possible approach is to call the previous error handler on stuff from other
> displays, though it's really only a partial solution, it might improve things.
> 
> The previous code had the same issue right? So at least it isn't new.
> 
> I don't think I understand comment 14; doesn't this code address it:
>       if (SEQUENCE_COMPARE (trap->start_sequence, <=, error->serial) &&
>           (trap->end_sequence == 0 ||
>            SEQUENCE_COMPARE (trap->end_sequence, >, error->serial)))
>         {
>           ignore = TRUE;
>           trap->error_code = error->error_code;
>         }
> 
> If it doesn't then I don't think I know what you mean

Yeah, see my other comment - the problem is scoping - the error must only be stored in the innermost trap.
Comment 18 Havoc Pennington 2010-09-20 19:39:30 UTC
I see, we want only the innermost push frame to get the error. That seems right (and an easy change).
Comment 19 Owen Taylor 2010-09-20 20:15:07 UTC
Created attachment 170710 [details] [review]
Only store error codes in inner-most X error trap

When an error occurs with nested traps in place, only the innermost
trap should have the error code stored in it; outer traps are
shielded by the inner trap.
Comment 20 Havoc Pennington 2010-09-20 20:17:28 UTC
Review of attachment 170710 [details] [review]:

Looks good to me
Comment 21 Owen Taylor 2010-09-20 20:36:55 UTC
Comment on attachment 170710 [details] [review]
Only store error codes in inner-most X error trap

Attachment 170710 [details] pushed as 14e38da - Only store error codes in inner-most X error trap
Comment 22 Matthias Clasen 2010-09-21 04:44:53 UTC
Created attachment 170730 [details] [review]
Don't prematurely remove the X error handler

We cannot just remove the X error handler when gdk_error_trap_pop_ignored()
is called, we need to wait for the trap to actually be outdated.
Comment 23 Matthias Clasen 2010-09-21 04:58:08 UTC
That patch is not quite right, since it makes use not enfore proper pairing of pushes and pops anymore. A correct fix is probably to only remove the X error handler if the push_count is 0 and all traps have been deleted.
Comment 24 Matthias Clasen 2010-09-21 05:40:01 UTC
Created attachment 170733 [details] [review]
Don't prematurely remove the X error handler

I have convinced myself that the patch is correct after all; it just needs to pop the error handler in every place where it frees a GdkErrorTrap. In that way, _gdk_error_handler_push_count actually counts the number of GdkErrorTraps that are in place.
Comment 25 Havoc Pennington 2010-09-21 12:21:38 UTC
Review of attachment 170733 [details] [review]:

Isn't the problem here that this handler may not be removed until the next time there's a push ... the "delete old dead traps" stuff is pretty lazy and the goal was just to run it whenever we were going to add to the list again.
This would mean that we'd do something like:

push()   // saves previous error handler
blah blah
pop()    // leaves gdk_x_error in place
run various main loop iterations
push()   // with side effect of XSetErrorHandler() back to previous

This seems hard to manage. If it's something like Mutter trying to permanently replace the handler, it's OK-ish I suppose if unpredictable. With something like Clutter that might have temporarily pushed a handler, if we'd pushed with the clutter handler pushed, we'd end up putting the clutter handler back when it had been popped, perhaps. If we'd pushed when clutter was not pushed we might put gdk_x_error back with clutter still pushed.
Comment 26 Matthias Clasen 2010-09-21 14:23:52 UTC
But if the intention is to actually ignore X errors for a certain range of serials, then just leaving the traps around, but removing the gdk_x_error error handler that makes use of them is not achieving that.

I agree that to make this work, we probably need to purge traps more often (after each roundtrip ?) 

Btw, I notice that there seems to be an off-by-one error in the end_sequence calculation. gdk_display_error_event does

      if (SEQUENCE_COMPARE (trap->start_sequence, <=, error->serial) &&
          (trap->end_sequence == 0 ||
           SEQUENCE_COMPARE (trap->end_sequence, >, error->serial)))

so end_sequence is the first request _after_ the trap. But delete_outdated_error_traps does

      if (trap->end_sequence != 0 &&
          SEQUENCE_COMPARE (trap->end_sequence, <, processed_sequence))

I think that should be <=, since end_sequence is already after the trap, so if we know that's processed, the trap is dead.
Comment 27 Matthias Clasen 2010-09-21 16:20:09 UTC
Actually, Owen informed me that we have gdk_x_error in place, independent of all the handler pushing and popping, from _gdk_windowing_init. So things are probably fine, modulo the off-by-one error.
Comment 28 Havoc Pennington 2010-09-21 16:56:29 UTC
I think with the off by one I was trying to consider that we could get multiple things with one sequence (events + a reply). However, I think it might be guaranteed that the error event is always the first thing with the sequence number. i.e. X may not allow sending some motion events with the sequence from a request, before sending the error event for that request. If the error event is always the first thing with a sequence then <= should be OK.
Comment 29 Havoc Pennington 2010-09-21 16:56:59 UTC
well, except what I said makes no sense since we aren't ignoring errors on end_sequence. so nevermind, you're right.