GNOME Bugzilla – Bug 733282
glx_event_filter_cb: SIGSEGV
Last modified: 2014-09-11 15:12:54 UTC
cogl-1.18.2-2.fc22.x86_64
kernel-3.16.0-0.rc4.git3.1.fc22.x86_64
xorg-x11-server-Xorg-1.15.99.904-3.fc22.x86_64

00:02.0 VGA compatible controller [0300]: Intel Corporation 3rd Gen Core processor Graphics Controller [8086:0166] (rev 09)
        Subsystem: Lenovo Device [17aa:21f9]
        Kernel driver in use: i915
        Kernel modules: i915

totem always crashes at startup.

(gdb) t a a bt full
+ Trace 233829
Thread 1 (Thread 0x7ffff7faf980 (LWP 27482))
Just looking at this briefly; the implication seems to be that you are receiving a GLX_BufferSwapComplete event while there are no outstanding swaps as far as Cogl knows.

onscreen->pending_frame_infos represents a queue of pending CoglFrameInfo structs that we keep until we receive a complete event from the X server. In this case it looks like Cogl receives a swap-complete event and tries to pop an info struct off this queue, which is empty, so you get a NULL dereference.

There aren't that many places where we push and pop structs to and from that queue:

cogl-onscreen.c:cogl_onscreen_swap_buffers_with_damage() and cogl_onscreen_swap_region() - we push structs before we ask X to handle the swap, and if we know the X server doesn't support GLX swap complete events we also pop the structs after we've sent the request to X.
cogl-winsys-glx.c:flush_pending_notifications_cb() - we pop when flushing completions
cogl-onscreen.c:_cogl_onscreen_free() - we pop when freeing a cogl onscreen

If you're comfortable with debugging C code, you could use gdb or some printf debugging to trace these points where we push and pop CoglFrameInfos on the onscreen->pending_frame_infos queue, to find out more precisely what kind of mismatch you are seeing.

For reference, this code hasn't been touched for quite some time, so it's a bit surprising to hear of a breakage like this. My instinct at the moment is that this is the result of an X server/ddx driver change. Are you perhaps using dri3 for example I wonder?
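The push/pop tracing suggested above can be sketched as a small stand-alone model. This is not Cogl's code: FrameInfoSketch, PendingQueueSketch, pending_push() and pending_pop() are hypothetical names made up for illustration, and the real CoglFrameInfo and queue types differ. The sketch only shows a FIFO where every push and pop is logged and where a pop on an empty queue is reported instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for CoglFrameInfo; the real struct has other fields. */
typedef struct FrameInfoSketch {
    long frame_counter;
    struct FrameInfoSketch *next;
} FrameInfoSketch;

/* Hypothetical stand-in for the onscreen->pending_frame_infos queue. */
typedef struct {
    FrameInfoSketch *head;
    FrameInfoSketch *tail;
    int length;
} PendingQueueSketch;

/* Called at swap time, before the swap request is sent to X. */
static void pending_push(PendingQueueSketch *q, FrameInfoSketch *info)
{
    assert(info != NULL);
    info->next = NULL;
    if (q->tail != NULL)
        q->tail->next = info;
    else
        q->head = info;
    q->tail = info;
    q->length++;
    fprintf(stderr, "push frame %ld (pending=%d)\n",
            info->frame_counter, q->length);
}

/* Called when a swap-complete event arrives. Returns NULL and logs a
 * message if nothing is pending, so the caller can detect the mismatch. */
static FrameInfoSketch *pending_pop(PendingQueueSketch *q)
{
    FrameInfoSketch *info = q->head;
    if (info == NULL) {
        fprintf(stderr, "pop on empty queue: unexpected swap-complete\n");
        return NULL;
    }
    q->head = info->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->length--;
    fprintf(stderr, "pop frame %ld (pending=%d)\n",
            info->frame_counter, q->length);
    return info;
}
```

With one push followed by two pops, the second pop returns NULL and logs the unexpected event, which is the situation the backtrace points at.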
(In reply to comment #1)
<snip>
> My instinct at the moment is that this is the result of an X server/ddx driver
> change. Are you perhaps using dri3 for example I wonder?

Given WebKit traces that mention:
+ Trace 233836
I'd say that yes, I'm using DRI3. I'm using the devel version of Fedora 21.
It could help to try testing with DRI3 disabled via LIBGL_DRI3_DISABLE=1.

I'm not yet familiar with how dri2 clients interact with a dri3 compositor, so if it doesn't make a difference to only disable dri3 for totem then I'd also try relaunching the shell with dri3 disabled:

LIBGL_DRI3_DISABLE=1 gnome-shell -r
(In reply to comment #3)
> It could help to try testing with DRI3 disabled via LIBGL_DRI3_DISABLE=1.

That works around the problem for me.
(In reply to comment #1)
> Just looking at this briefly; the implication seems to be that you are
> receiving a GLX_BufferSwapComplete event while there are no outstanding swaps
> as far as Cogl knows.

This is bizarre. This changed in DRI3 so that the GLX_BufferSwapComplete event was sent directly from the X server when presented to the screen, but it should only happen for Pixmaps that have selected for the event:

http://cgit.freedesktop.org/xorg/xserver/tree/glx/glxcmds.c#n2509

Maybe mesa is getting confused between both the GLX_BufferSwapComplete and the PresentCompleteNotify events? Are you seeing multiple GLX events for the same Pixmap when perhaps you shouldn't be?
*** Bug 735425 has been marked as a duplicate of this bug. ***
I did some debug printfs in cogl when this happened. The problem is that the first swapbuffer gets *two* GLX_BufferSwapComplete events for the same drawable. If those are handled fast enough everything continues fine, but if we handle the second one after the second pending frame info is freed then it crashes.
Can you look in mesa and see if this is being hit twice? http://cgit.freedesktop.org/mesa/mesa/tree/src/glx/glxext.c#n132 If so, we're getting two events out of the X server. If not, another event is being synthesized in mesa, which we should figure out.
Jasper: It is being hit twice, so we're getting two events from the Xserver.
Yeah, glXSwapBuffers is called once, but we get two GLX_BufferSwapComplete from the server. Is there any other call that can generate BufferSwapComplete?
Under DRI2, the GLX_BufferSwapComplete is synthesized from the DRI2_BufferSwapComplete event inside mesa: http://cgit.freedesktop.org/mesa/mesa/tree/src/glx/dri2.c#n96 Are we possibly hitting that path, or is the server path hit twice for the same drawable?
Jasper: That code is never hit, because this is with DRI3, not DRI2. It instead hits the point in comment #8 twice. If I run with LIBGL_DRI3_DISABLE=1 the problem disappears.
So it sounds like the code in the X server is being hit twice. Can you see if present_vblank_notify is being called twice for the same drawable? http://cgit.freedesktop.org/xorg/xserver/tree/present/present.c#n181 The mesa code doesn't call Present with any notifies, so we shouldn't be hitting the for loop below. If that happens, I have no idea what to tell you, other than to badger Keith.
It's called twice, first with kind PresentCompleteKindPixmap (0), then with kind PresentCompleteKindMSCNotify (1). Both times with mode PresentCompleteModeCopy (1). I guess both of these trickle down to the app with no way to differentiate between them?
On the client side, in __glXWireToEvent for the GLX_BufferSwapComplete the first awire is:

{type = 104 'h', pad = 0 '\000', sequenceNumber = 123, event_type = 33153, pad2 = 0, drawable = 39845905, ust_hi = 4, ust_lo = 2559142700, msc_hi = 0, msc_lo = 19740, sbc = 1}

And the second is:

{type = 104 'h', pad = 0 '\000', sequenceNumber = 124, event_type = 33153, pad2 = 0, drawable = 39845905, ust_hi = 4, ust_lo = 2608674467, msc_hi = 0, msc_lo = 22712, sbc = 1}

Both seem more or less identical...
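To make the comparison of the two dumps concrete, here is a rough sketch that joins the hi/lo halves into 64-bit values and checks which fields match. SwapCompleteWireSketch, join_hi_lo() and same_swap() are hypothetical names, and the struct is a cut-down illustration rather than the real wire layout; only the field values are taken from the dumps above. The 0x8181 constant is, as far as I recall, GLX_COPY_COMPLETE_INTEL from GLX_INTEL_swap_event, which would match the Copy mode reported earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down, hypothetical view of the fields shown in the gdb dumps. */
typedef struct {
    uint16_t event_type;
    uint32_t drawable;
    uint32_t ust_hi, ust_lo;
    uint32_t msc_hi, msc_lo;
    uint32_t sbc;
} SwapCompleteWireSketch;

/* Values copied from the first and second awire dumps. */
static const SwapCompleteWireSketch first_event =
    { 33153, 39845905u, 4, 2559142700u, 0, 19740, 1 };
static const SwapCompleteWireSketch second_event =
    { 33153, 39845905u, 4, 2608674467u, 0, 22712, 1 };

/* Join the two 32-bit wire halves into one 64-bit counter. */
static uint64_t join_hi_lo(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Two events that share drawable and swap count (sbc) both claim to
 * complete the same swap, whatever their timestamps say. */
static int same_swap(const SwapCompleteWireSketch *a,
                     const SwapCompleteWireSketch *b)
{
    assert(a != NULL && b != NULL);
    return a->drawable == b->drawable && a->sbc == b->sbc;
}
```

The joined ust values are 19739011884 and 19788543651, and the msc values differ by 2972, so the timestamps do move between the two events. The drawable and sbc = 1 are the same in both, which fits a single swap being reported as complete twice.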
http://cgit.freedesktop.org/xorg/xserver/tree/present/present_event.c#n176 I wonder if this should only call complete_notify for PresentCompleteKindPixmap, since this is where we lose the "kind" details.
Keithp: Is the above correct? Does X need a fix here?
Patches here:

http://lists.x.org/archives/xorg-devel/2014-September/043722.html

Will test tomorrow.
Yeah, I've sent a patch to xorg-devel which passes the 'kind' argument to the notify hook GLX uses, and then has GLX ignore non-pixmap notifications. Testing and review welcome, as always!
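The idea behind the patch can be sketched roughly as follows. This is not the actual X server code: the enum, the counter and glx_notify_hook_sketch() are hypothetical names for illustration, and the real hook takes more arguments. Only the two kind values (0 for Pixmap, 1 for MSCNotify) come from the debugging output earlier in this bug.

```c
#include <assert.h>

/* Kind values as reported earlier in this bug; names are made up here. */
enum present_kind_sketch {
    KIND_PIXMAP_SKETCH = 0,     /* PresentCompleteKindPixmap */
    KIND_MSC_NOTIFY_SKETCH = 1  /* PresentCompleteKindMSCNotify */
};

/* Counts how many swap-complete events the sketch would send to the client. */
static int swap_complete_events_sent = 0;

/* Hypothetical notify hook: with the 'kind' passed through, the GLX side
 * can ignore everything that is not a pixmap completion.
 * Returns 1 if a swap-complete event would be sent, 0 if ignored. */
static int glx_notify_hook_sketch(enum present_kind_sketch kind)
{
    assert(kind == KIND_PIXMAP_SKETCH || kind == KIND_MSC_NOTIFY_SKETCH);
    if (kind != KIND_PIXMAP_SKETCH)
        return 0; /* MSC notifications do not complete a buffer swap */
    swap_complete_events_sent++;
    return 1;
}
```

Feeding the hook the two notifications seen for one swap (Pixmap, then MSCNotify) yields a single swap-complete event, so the client pops exactly one pending frame info per swap.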
I tested the patch here and it fixes the crash.
Closing NOTGNOME as the X fix fixes this.