GNOME Bugzilla – Bug 758816
Incorrect choice of signal marshaller causes crash when promoting a pawn in GNOME Chess when built with Fedora or Debian hardening flags
Last modified: 2017-03-14 11:21:10 UTC
3.18.0 crashes 100% when promoting a pawn Backtrace at Red Hat #1280470
I can only reproduce with Fedora builds, never with anything I build outside of mock, so maybe a GCC bug, or an issue caused by hardening somehow. What's strange is that the backtrace is simple, but I just don't see what could possibly be wrong.
+ Trace 235773
Will take a look.
Warning: this one is hard! The next step is to figure out why only Fedora builds crash; it might be a particular optimization or hardening flag, for example. I really have no clue what is wrong here. :(
(In reply to Michael Catanzaro from comment #3) > Warning: this one is hard! > > The next step is to figure out why only Fedora builds crash; it might be a > particular optimization or hardening flag, for example. I really have no > clue what is wrong here. :( Yeah, I tried reproducing using this: git checkout f20e68a978bab1d2f36e03b9ff04bb2995a1f822 (Prepare 3.18) make && make-install jhbuild run gnome-chess but and didn't see any crashes. :| The only thing I could see on the logs when promoting was: (gnome-chess:7030): Gtk-WARNING **: Content added to the action area of a dialog using header bars (gnome-chess:7030): Gtk-WARNING **: Content added to the action area of a dialog using header bars (gnome-chess:7030): Gtk-WARNING **: Content added to the action area of a dialog using header bars (gnome-chess:7030): Gtk-WARNING **: Content added to the action area of a dialog using header bars *For which we already have this bug: https://bugzilla.gnome.org/show_bug.cgi?id=738946
I tested by setting the Fedora CFLAGS in my jhbuildrc: CFLAGS = -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic But still could not trigger the crash.
I disabled hardening in the Fedora package and the crash went away... reenabled it and the crash returned... not sure what is wrong, but I'd say it's not our problem.
Reported this here: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/AYHBD5JYGIPZRZ5ENXSNT35YP5HTM256/
Thanks Michael! :D
Florian Weimer found the problem: "The backtrace with its register dump suggests that the upper 32 bits of the return value from the signal were clipped. Unfortunately, the glib signals mechanism does not have compile-time or run-time type checks, so it can easily happen, but is somewhat difficult to track down. It seems that g_cclosure_user_marshal_ENUM__VOID invokes the callback with an int return value, but the registered callback (_chess_application_show_promotion_type_selector_chess_scene_choose_promotion_type) has a return type of PieceType *. Any change in register allocation can make this bug appear and disappear, it's not related to the hardening flags." The signal declaration in Vala is: public signal PieceType? choose_promotion_type (); Reassigning this to Vala. I guess Vala got confused by the nullability of the return value. It needs to use a different marshaller, or pass NULL to allow GLib to pick the best marshaller.
*** Bug 761685 has been marked as a duplicate of this bug. ***
(In reply to Michael Catanzaro from comment #9) > or pass NULL to > allow GLib to pick the best marshaller. I hear it's nowadays best practice to just always pass NULL, and never define your own marshallers unless profiling suggests it improves performance.
I've encountered this bug on Debian testing. I can workaround the underlying Vala compiler issue by changing the return type of the choose_promotion_type() signal and show_promotion_type_selector() function to be the base type "PieceType" instead of the nullable type "PieceType?". I use PieceType.PAWN instead of null as a cancel value. Obviously the best solution would be to fix Vala, but the change avoids the issue, at least as far as gnome-chess is concerned.
FWIW: the workaround I used in Fedora is "%undefine _hardened_build" in the spec. :(
This bug should be an improvement and recommend not to use nulleable types in signals on Vala code, but in C code and use a VAPI to be usable in your Vala code. Tagged for Vala 2.0 milestone. I'll add this recommendation to my book as a Vala limitation: https://www.gitbook.com/book/danielespinosa/vala-tips-tricks/details
I think we should fix the compiler to make nullable return types work. There's no good reason to prevent use of nullable return types, right? I believe all we have to do is use g_cclosure_marshal_generic instead of g_cclosure_user_marshal_ENUM__VOID.
Created attachment 347839 [details] [review] signalmodule: Handle nullable ValueTypes properly and treat them as pointer Nullable value-types are actually pointers to heap-allocated structures. Therefore a pointer-based marshaller is required for those types.
Attachment 347839 [details] pushed as f6e29bd - signalmodule: Handle nullable ValueTypes properly and treat them as pointer