After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 758816 - Incorrect choice of signal marshaller causes crash when promoting a pawn in GNOME Chess when built with Fedora or Debian hardening flags
Incorrect choice of signal marshaller causes crash when promoting a pawn in G...
Status: RESOLVED FIXED
Product: vala
Classification: Core
Component: Code Generator: GSignal
0.34.x
Other Linux
: Normal enhancement
: 0.36
Assigned To: Vala maintainers
Vala maintainers
: 761685 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2015-11-29 21:57 UTC by Michael Catanzaro
Modified: 2017-03-14 11:21 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
signalmodule: Handle nullable ValueTypes properly and treat them as pointer (5.87 KB, patch)
2017-03-13 15:21 UTC, Rico Tzschichholz
committed Details | Review

Description Michael Catanzaro 2015-11-29 21:57:36 UTC
3.18.0 crashes 100% when promoting a pawn

Backtrace at Red Hat #1280470
Comment 1 Michael Catanzaro 2015-11-30 00:36:12 UTC
I can only reproduce with Fedora builds, never with anything I build outside of mock, so maybe a GCC bug, or an issue caused by hardening somehow. What's strange is that the backtrace is simple, but I just don't see what could possibly be wrong.

  • #0 chess_scene_select_square
    at chess-scene.c line 634
  • #1 chess_view_real_button_press_event
    at chess-view.c line 1513
  • #2 _gtk_marshal_BOOLEAN__BOXEDv
    at gtkmarshalers.c line 131
  • #3 _g_closure_invoke_va
    at gclosure.c line 864
  • #4 g_signal_emit_valist
    at gsignal.c line 3292
  • #5 g_signal_emit
    at gsignal.c line 3439
  • #6 gtk_widget_event_internal
    at gtkwidget.c line 7692
  • #7 propagate_event
    at gtkmain.c line 2517
  • #8 propagate_event
    at gtkmain.c line 2619
  • #9 gtk_main_do_event
    at gtkmain.c line 1850
  • #10 gdk_event_source_dispatch
    at gdkeventsource.c line 90
  • #11 g_main_context_dispatch
    at gmain.c line 3154
  • #12 g_main_context_dispatch
    at gmain.c line 3769
  • #13 g_main_context_iterate
    at gmain.c line 3840
  • #14 g_main_context_iteration
    at gmain.c line 3901
  • #15 g_application_run
    at gapplication.c line 2311
  • #16 chess_application_main
    at gnome-chess.c line 8387
  • #17 __libc_start_main
    at libc-start.c line 289
  • #18 _start

Comment 2 Sahil Sareen 2015-12-01 04:17:38 UTC
Will take a look.
Comment 3 Michael Catanzaro 2015-12-02 15:30:25 UTC
Warning: this one is hard!

The next step is to figure out why only Fedora builds crash; it might be a particular optimization or hardening flag, for example. I really have no clue what is wrong here. :(
Comment 4 Sahil Sareen 2015-12-03 12:36:03 UTC
(In reply to Michael Catanzaro from comment #3)
> Warning: this one is hard!
> 
> The next step is to figure out why only Fedora builds crash; it might be a
> particular optimization or hardening flag, for example. I really have no
> clue what is wrong here. :(

Yeah, I tried reproducing using this:
git checkout f20e68a978bab1d2f36e03b9ff04bb2995a1f822 (Prepare 3.18)
make && make-install
jhbuild run gnome-chess
but and didn't see any crashes. :|

The only thing I could see on the logs when promoting was: 
(gnome-chess:7030): Gtk-WARNING **: Content added to the action area of a dialog using header bars
(gnome-chess:7030): Gtk-WARNING **: Content added to the action area of a dialog using header bars
(gnome-chess:7030): Gtk-WARNING **: Content added to the action area of a dialog using header bars
(gnome-chess:7030): Gtk-WARNING **: Content added to the action area of a dialog using header bars

*For which we already have this bug: https://bugzilla.gnome.org/show_bug.cgi?id=738946
Comment 5 Michael Catanzaro 2015-12-19 14:14:00 UTC
I tested by setting the Fedora CFLAGS in my jhbuildrc:

CFLAGS = -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic

But still could not trigger the crash.
Comment 6 Michael Catanzaro 2016-01-27 04:23:07 UTC
I disabled hardening in the Fedora package and the crash went away... reenabled it and the crash returned... not sure what is wrong, but I'd say it's not our problem.
Comment 8 Sahil Sareen 2016-01-27 05:42:33 UTC
Thanks Michael! :D
Comment 9 Michael Catanzaro 2016-01-27 19:45:32 UTC
Florian Weimer found the problem:

"The backtrace with its register dump suggests that the upper 32 bits of
the return value from the signal were clipped.  Unfortunately, the glib
signals mechanism does not have compile-time or run-time type checks, so
it can easily happen, but is somewhat difficult to track down.

It seems that g_cclosure_user_marshal_ENUM__VOID invokes the callback
with an int return value, but the registered callback
(_chess_application_show_promotion_type_selector_chess_scene_choose_promotion_type)
has a return type of PieceType *.

Any change in register allocation can make this bug appear and
disappear, it's not related to the hardening flags."

The signal declaration in Vala is:

public signal PieceType? choose_promotion_type ();

Reassigning this to Vala. I guess Vala got confused by the nullability of the return value. It needs to use a different marshaller, or pass NULL to allow GLib to pick the best marshaller.
Comment 10 Michael Catanzaro 2016-02-07 22:34:16 UTC
*** Bug 761685 has been marked as a duplicate of this bug. ***
Comment 11 Michael Catanzaro 2016-10-14 21:49:02 UTC
(In reply to Michael Catanzaro from comment #9)
> or pass NULL to
> allow GLib to pick the best marshaller.

I hear it's nowadays best practice to just always pass NULL, and never define your own marshallers unless profiling suggests it improves performance.
Comment 12 Tim Greenfield 2016-12-31 01:12:55 UTC
I've encountered this bug on Debian testing.

I can workaround the underlying Vala compiler issue by changing the return type of the choose_promotion_type() signal and show_promotion_type_selector() function to be the base type "PieceType" instead of the nullable type "PieceType?". I use PieceType.PAWN instead of null as a cancel value.

Obviously the best solution would be to fix Vala, but the change avoids the issue, at least as far as gnome-chess is concerned.
Comment 13 Michael Catanzaro 2016-12-31 02:11:53 UTC
FWIW: the workaround I used in Fedora is "%undefine _hardened_build" in the spec. :(
Comment 14 Daniel Espinosa 2017-02-20 14:07:55 UTC
This bug should be an improvement and recommend not to use nulleable types in signals on Vala code, but in C code and use a VAPI to be usable in your Vala code.

Tagged for Vala 2.0 milestone.

I'll add this recommendation to my book as a Vala limitation:

https://www.gitbook.com/book/danielespinosa/vala-tips-tricks/details
Comment 15 Michael Catanzaro 2017-02-20 16:32:05 UTC
I think we should fix the compiler to make nullable return types work. There's no good reason to prevent use of nullable return types, right? I believe all we have to do is use g_cclosure_marshal_generic instead of g_cclosure_user_marshal_ENUM__VOID.
Comment 16 Rico Tzschichholz 2017-03-13 15:21:12 UTC
Created attachment 347839 [details] [review]
signalmodule: Handle nullable ValueTypes properly and treat them as pointer

Nullable value-types are actually pointers to heap-allocated structures.
Therefore a pointer-based marshaller is required for those types.
Comment 17 Rico Tzschichholz 2017-03-14 11:16:44 UTC
Attachment 347839 [details] pushed as f6e29bd - signalmodule: Handle nullable ValueTypes properly and treat them as pointer