After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 705591 - SIGSEGV in various conformance tests on wayland
SIGSEGV in various conformance tests on wayland
Status: VERIFIED FIXED
Product: cogl
Classification: Platform
Component: Wayland
unspecified
Other All
: Normal major
: ---
Assigned To: Cogl maintainer(s)
Cogl maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2013-08-06 21:18 UTC by U. Artie Eoff
Modified: 2013-08-20 15:29 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
GDB backtrace (9.56 KB, text/plain)
2013-08-06 21:18 UTC, U. Artie Eoff
  Details
wayland: Call eglTerminate before destroying wl_display, not after (1.56 KB, patch)
2013-08-19 14:33 UTC, Neil Roberts
none Details | Review

Description U. Artie Eoff 2013-08-06 21:18:45 UTC
Created attachment 251001 [details]
GDB backtrace

Various cogl conformance tests produce SIGSEGV during cogl shutdown/cleanup on wayland.

$ export COGL_DRIVER=gles2
$ export COGL_RENDERER=egl_wayland
$ libtool --mode=execute \
  gdb --eval-command="start" --eval-command="b test_color_hsl" \
  --args ./test-conformance test_color_hsl

See attached gdb backtrace for more details.

wayland (master) heads/master-0-gc1fd097
drm (master) heads/master-0-g3c967e7
mesa (master) heads/master-0-g00a945f
weston (master) heads/master-0-g09252d4
cogl (cogl-1.16) heads/cogl-1.16-0-ga4fa571
Comment 1 U. Artie Eoff 2013-08-07 16:14:17 UTC
Only seeing this issue on a 64bit software stack.  That is, 32bit software stack seems ok.
Comment 2 Robert Bragg 2013-08-08 16:46:02 UTC
Thanks for this Artie. I'm afraid I run a 32bit distro so I'm not able to test this myself currently. I know Neil runs a 64bit distro though so when he gets back from holiday (next week I think) then hopefully he'll be able to take a look into this.
Comment 3 Neil Roberts 2013-08-19 14:29:58 UTC
I am not able to replicate this on my 64bit distro even after checking out exactly the same commit hashes for wayland, drm, mesa, weston and cogl. However looking at the backtrace I can see a potential issue. We are calling eglTerminate after destroying the Wayland display and Mesa is trying to access some data in the display. Maybe this works for me because of some differences in our libc libraries that causes free() not to clear the data or something. I am attaching a patch to swap the order. Artie, It would be great if you could verify whether this fixes the bug for you.
Comment 4 Neil Roberts 2013-08-19 14:33:27 UTC
Created attachment 252233 [details] [review]
wayland: Call eglTerminate before destroying wl_display, not after

The eglTerminate code in Mesa will try to destroy the wl_drm object
which involves using data structures in the wl_display. Cogl was
disconnecting the display before calling eglTerminate which meant that
this would end up accessing potentially garbage data.
Comment 5 Neil Roberts 2013-08-19 14:56:24 UTC
Just a further note to say that I can see the problem in Valgrind. I get a bunch of errors like the following without the patch but if I apply it they go away.

==13117== Invalid read of size 4
==13117==    at 0x3A0AE05C93: __pthread_mutex_unlock_full (in /usr/lib64/libpthread-2.17.so)
==13117==    by 0x77D2F3A: wl_proxy_destroy (wayland-client.c:293)
==13117==    by 0x6970353: wl_drm_destroy (wayland-drm-client-protocol.h:192)
==13117==    by 0x6971B49: dri2_terminate (platform_wayland.c:626)
==13117==    by 0x695B7CC: eglTerminate (eglapi.c:345)
==13117==    by 0x4C9642E: _cogl_winsys_renderer_disconnect (cogl-winsys-egl-wayland.c:126)
...
==13117==  Address 0x9ac9d30 is 256 bytes inside a block of size 336 free'd
==13117==    at 0x4A084C4: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==13117==    by 0x77D37DD: wl_display_disconnect (wayland-client.c:634)
==13117==    by 0x4C96411: _cogl_winsys_renderer_disconnect (cogl-winsys-egl-wayland.c:122)
==13117==    by 0x4C44BAD: _cogl_renderer_free (cogl-renderer.c:252)
...
Comment 6 Robert Bragg 2013-08-19 17:07:46 UTC
(In reply to comment #4)
> Created an attachment (id=252233) [details] [review]
> wayland: Call eglTerminate before destroying wl_display, not after
> 
> The eglTerminate code in Mesa will try to destroy the wl_drm object
> which involves using data structures in the wl_display. Cogl was
> disconnecting the display before calling eglTerminate which meant that
> this would end up accessing potentially garbage data.

Even if this doesn't fix the crash Artie is seeing, this patch looks good to land to me:

Reviewed-by: Robert Bragg <robert@linux.intel.com>
Comment 7 U. Artie Eoff 2013-08-19 19:28:22 UTC
(In reply to comment #3)
> I am not able to replicate this on my 64bit distro even after checking out
> exactly the same commit hashes for wayland, drm, mesa, weston and cogl. However
> looking at the backtrace I can see a potential issue. We are calling
> eglTerminate after destroying the Wayland display and Mesa is trying to access
> some data in the display. Maybe this works for me because of some differences
> in our libc libraries that causes free() not to clear the data or something. I
> am attaching a patch to swap the order. Artie, It would be great if you could
> verify whether this fixes the bug for you.

The attached patch fixes this bug for me :)
Comment 8 Neil Roberts 2013-08-20 10:14:51 UTC
(In reply to comment #7)

Great, thanks. The patch is now in master and the 1.16 branch.