GNOME Bugzilla – Bug 705591
SIGSEGV in various conformance tests on wayland
Last modified: 2013-08-20 15:29:22 UTC
Created attachment 251001 [details]
GDB backtrace

Various cogl conformance tests produce SIGSEGV during cogl shutdown/cleanup on wayland.

$ export COGL_DRIVER=gles2
$ export COGL_RENDERER=egl_wayland
$ libtool --mode=execute \
    gdb --eval-command="start" --eval-command="b test_color_hsl" \
    --args ./test-conformance test_color_hsl

See attached gdb backtrace for more details.

wayland (master) heads/master-0-gc1fd097
drm (master) heads/master-0-g3c967e7
mesa (master) heads/master-0-g00a945f
weston (master) heads/master-0-g09252d4
cogl (cogl-1.16) heads/cogl-1.16-0-ga4fa571
I am only seeing this issue on a 64-bit software stack. The 32-bit software stack seems OK.
Thanks for this Artie. I'm afraid I run a 32bit distro so I'm not able to test this myself currently. I know Neil runs a 64bit distro though so when he gets back from holiday (next week I think) then hopefully he'll be able to take a look into this.
I am not able to replicate this on my 64bit distro even after checking out exactly the same commit hashes for wayland, drm, mesa, weston and cogl. However, looking at the backtrace I can see a potential issue. We are calling eglTerminate after destroying the Wayland display, and Mesa is trying to access some data in the display. Maybe this works for me because of some differences in our libc libraries that cause free() not to clear the data or something. I am attaching a patch to swap the order. Artie, it would be great if you could verify whether this fixes the bug for you.
Created attachment 252233 [details] [review]
wayland: Call eglTerminate before destroying wl_display, not after

The eglTerminate code in Mesa will try to destroy the wl_drm object
which involves using data structures in the wl_display. Cogl was
disconnecting the display before calling eglTerminate which meant that
this would end up accessing potentially garbage data.
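To illustrate why the order matters, here is a minimal, self-contained C sketch. It does not use the real Wayland or EGL APIs: FakeWlDisplay, FakeEglDisplay, fake_wl_display_disconnect, fake_egl_terminate and teardown are hypothetical stand-ins invented for this illustration, and the stale access is only counted rather than actually performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct wl_display (NOT the real Wayland type). */
typedef struct {
    bool connected;      /* false once "disconnected"; freed in real life */
} FakeWlDisplay;

/* Hypothetical stand-in for the EGL display (NOT the real Mesa type). */
typedef struct {
    FakeWlDisplay *wl;   /* the EGL side keeps a pointer into the wl display */
    bool terminated;
    int stale_accesses;  /* accesses made after the wl display went away */
} FakeEglDisplay;

/* Models wl_display_disconnect(): the display must not be touched afterwards. */
void fake_wl_display_disconnect(FakeWlDisplay *wl)
{
    wl->connected = false;
}

/* Models eglTerminate(): destroying the wl_drm object touches the wl display. */
void fake_egl_terminate(FakeEglDisplay *egl)
{
    if (!egl->wl->connected)
        egl->stale_accesses++;   /* would be a use-after-free in real code */
    egl->terminated = true;
}

typedef enum { TERMINATE_FIRST, DISCONNECT_FIRST } TeardownOrder;

/* Runs one teardown in the given order and returns the stale access count. */
int teardown(TeardownOrder order)
{
    FakeWlDisplay wl = { .connected = true };
    FakeEglDisplay egl = { .wl = &wl, .terminated = false, .stale_accesses = 0 };

    if (order == TERMINATE_FIRST) {
        fake_egl_terminate(&egl);          /* order after the patch */
        fake_wl_display_disconnect(&wl);
    } else {
        fake_wl_display_disconnect(&wl);   /* order before the patch */
        fake_egl_terminate(&egl);
    }

    /* Either way, both objects end up torn down. */
    assert(egl.terminated && !wl.connected);
    return egl.stale_accesses;
}
```

Under these assumptions, teardown(TERMINATE_FIRST) reports no stale accesses, while teardown(DISCONNECT_FIRST) reports one, which corresponds to the order Cogl used before the patch.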
Just a further note to say that I can see the problem in Valgrind. I get a bunch of errors like the following without the patch but if I apply it they go away.

==13117== Invalid read of size 4
==13117==    at 0x3A0AE05C93: __pthread_mutex_unlock_full (in /usr/lib64/libpthread-2.17.so)
==13117==    by 0x77D2F3A: wl_proxy_destroy (wayland-client.c:293)
==13117==    by 0x6970353: wl_drm_destroy (wayland-drm-client-protocol.h:192)
==13117==    by 0x6971B49: dri2_terminate (platform_wayland.c:626)
==13117==    by 0x695B7CC: eglTerminate (eglapi.c:345)
==13117==    by 0x4C9642E: _cogl_winsys_renderer_disconnect (cogl-winsys-egl-wayland.c:126)
...
==13117==  Address 0x9ac9d30 is 256 bytes inside a block of size 336 free'd
==13117==    at 0x4A084C4: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==13117==    by 0x77D37DD: wl_display_disconnect (wayland-client.c:634)
==13117==    by 0x4C96411: _cogl_winsys_renderer_disconnect (cogl-winsys-egl-wayland.c:122)
==13117==    by 0x4C44BAD: _cogl_renderer_free (cogl-renderer.c:252)
...
(In reply to comment #4)
> Created an attachment (id=252233) [details] [review]
> wayland: Call eglTerminate before destroying wl_display, not after
>
> The eglTerminate code in Mesa will try to destroy the wl_drm object
> which involves using data structures in the wl_display. Cogl was
> disconnecting the display before calling eglTerminate which meant that
> this would end up accessing potentially garbage data.

Even if this doesn't fix the crash Artie is seeing, this patch looks good to land to me:

Reviewed-by: Robert Bragg <robert@linux.intel.com>
(In reply to comment #3)
> I am not able to replicate this on my 64bit distro even after checking out
> exactly the same commit hashes for wayland, drm, mesa, weston and cogl. However
> looking at the backtrace I can see a potential issue. We are calling
> eglTerminate after destroying the Wayland display and Mesa is trying to access
> some data in the display. Maybe this works for me because of some differences
> in our libc libraries that causes free() not to clear the data or something. I
> am attaching a patch to swap the order. Artie, It would be great if you could
> verify whether this fixes the bug for you.

The attached patch fixes this bug for me :)
(In reply to comment #7) Great, thanks. The patch is now in master and the 1.16 branch.