Bug 733282 - glx_event_filter_cb: SIGSEGV
Status: RESOLVED NOTGNOME
Product: cogl
Classification: Platform
Component: GLX
Version: 1.18.x
Hardware: Other Linux
Importance: Normal blocker
Target Milestone: ---
Assigned To: Cogl maintainer(s)
QA Contact: Cogl maintainer(s)
Duplicates: 735425
Depends on:
Blocks:
 
 
Reported: 2014-07-16 22:02 UTC by Igor Gnatenko
Modified: 2014-09-11 15:12 UTC
See Also:
GNOME target: 3.14
GNOME version: 3.13/3.14



Description Igor Gnatenko 2014-07-16 22:02:04 UTC
cogl-1.18.2-2.fc22.x86_64
kernel-3.16.0-0.rc4.git3.1.fc22.x86_64
xorg-x11-server-Xorg-1.15.99.904-3.fc22.x86_64

00:02.0 VGA compatible controller [0300]: Intel Corporation 3rd Gen Core processor Graphics Controller [8086:0166] (rev 09)
	Subsystem: Lenovo Device [17aa:21f9]
	Kernel driver in use: i915
	Kernel modules: i915

totem always crashes at startup.



(gdb) t a a bt full

Thread 1 (Thread 0x7ffff7faf980 (LWP 27482))

  • #0 glx_event_filter_cb
    at winsys/cogl-winsys-glx.c line 439
  • #1 glx_event_filter_cb
    at winsys/cogl-winsys-glx.c line 561
  • #2 _cogl_renderer_handle_native_event
    at ./cogl-renderer.c line 732
  • #3 cogl_xlib_renderer_handle_event
    at ./cogl-xlib-renderer.c line 589
  • #4 cogl_xlib_filter
    at x11/clutter-backend-x11.c line 131
  • #5 clutter_backend_x11_translate_event
    at x11/clutter-backend-x11.c line 639
  • #6 clutter_x11_handle_event
    at x11/clutter-event-x11.c line 200
  • #7 gtk_clutter_filter_func
    at ./gtk-clutter-embed.c line 233
  • #8 gdk_event_apply_filters
  • #9 _gdk_x11_display_queue_events
  • #10 gdk_display_get_event
  • #11 gdk_event_source_dispatch
  • #12 g_main_context_dispatch
  • #13 g_main_context_iterate.isra
  • #14 g_main_context_iteration
  • #15 g_application_run
  • #16 main
    at totem.c line 282

Comment 1 Robert Bragg 2014-07-17 14:20:51 UTC
Just looking at this briefly; the implication seems to be that you are receiving a GLX_BufferSwapComplete event while there are no outstanding swaps as far as Cogl knows.

onscreen->pending_frame_infos represents a queue of pending CoglFrameInfo structs that we keep until we receive a complete event from the X server. In this case it looks like cogl receives a swap-complete event, tries to pop an info struct off this queue which is empty and so you get a NULL dereference.

There aren't that many places where we push and pop structs to and from that queue:

cogl-onscreen.c:cogl_onscreen_swap_buffers_with_damage() and cogl_onscreen_swap_region() - we push structs before we ask X to handle the swap, and if we know the X server doesn't support GLX swap complete events we also pop the structs after we've sent the request to X
cogl-winsys-glx.c:flush_pending_notifications_cb() - we pop when flushing completions
cogl-onscreen.c:_cogl_onscreen_free() - we pop when freeing a cogl onscreen
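
As a rough illustration of the failure mode described above, here is a minimal C sketch of a pending-frame queue. The names FrameInfo, PendingQueue, pending_push, pending_pop and handle_swap_complete are hypothetical stand-ins, not Cogl's actual types or functions. The only point is that a swap-complete event arriving while the queue is empty has to be detected before the popped pointer is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for CoglFrameInfo; the real struct has more fields. */
typedef struct FrameInfo {
    long frame_counter;
    struct FrameInfo *next;
} FrameInfo;

/* Hypothetical stand-in for the onscreen->pending_frame_infos queue. */
typedef struct {
    FrameInfo *head;
    FrameInfo *tail;
    size_t length;
} PendingQueue;

/* Called before asking X to swap: record that one completion is expected.
 * Returns 0 on success, -1 if allocation fails. */
int pending_push(PendingQueue *q, long frame_counter)
{
    assert(q != NULL);
    FrameInfo *info = malloc(sizeof *info);
    if (info == NULL)
        return -1;
    info->frame_counter = frame_counter;
    info->next = NULL;
    if (q->tail != NULL)
        q->tail->next = info;
    else
        q->head = info;
    q->tail = info;
    q->length++;
    return 0;
}

/* Returns the oldest pending info, or NULL when the queue is empty. */
FrameInfo *pending_pop(PendingQueue *q)
{
    assert(q != NULL);
    FrameInfo *info = q->head;
    if (info == NULL)
        return NULL;
    q->head = info->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->length--;
    return info;
}

/* Handles one swap-complete event. Returns 1 if it matched a pending swap
 * and 0 if it was unexpected. Without the NULL check, the unexpected case
 * would dereference a NULL pointer. */
int handle_swap_complete(PendingQueue *q, long *completed_frame)
{
    FrameInfo *info = pending_pop(q);
    if (info == NULL)
        return 0;
    *completed_frame = info->frame_counter;
    free(info);
    return 1;
}
```

Pushing one entry and then delivering two completion events makes the first call to handle_swap_complete return 1 and the second return 0, rather than crashing.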

If you're comfortable with debugging C code, you could use gdb or some printf debugging to trace these points where we push and pop CoglFrameInfos to and from the onscreen->pending_frame_infos queue, to find out more precisely what kind of mismatch you are seeing.

For reference; this code hasn't been touched for quite some time so it's a bit surprising to hear of a breakage like this.

My instinct at the moment is that this is the result of an X server/ddx driver change. Are you perhaps using dri3 for example I wonder?
Comment 2 Bastien Nocera 2014-07-17 14:25:27 UTC
(In reply to comment #1)
<snip>
> My instinct at the moment is that this is the result of an X server/ddx driver
> change. Are you perhaps using dri3 for example I wonder?

Given WebKit traces that mention:
  • #5 dri3_get_buffers

I'd say that yes, I'm using DRI3. I'm using the devel version of Fedora 21.
Comment 3 Robert Bragg 2014-07-19 21:12:11 UTC
It could help to try testing with DRI3 disabled via LIBGL_DRI3_DISABLE=1. I'm not yet familiar with how DRI2 clients interact with a DRI3 compositor, so if disabling DRI3 only for totem doesn't make a difference, I'd also try relaunching the shell with DRI3 disabled: LIBGL_DRI3_DISABLE=1 gnome-shell -r
Comment 4 Bastien Nocera 2014-07-23 12:09:17 UTC
(In reply to comment #3)
> It could help to try testing with DRI3 disabled via LIBGL_DRI3_DISABLE=1.

That works around the problem for me.
Comment 5 Jasper St. Pierre (not reading bugmail) 2014-07-23 12:34:24 UTC
(In reply to comment #1)
> Just looking at this briefly; the implication seems to be that you are
> receiving a GLX_BufferSwapComplete event while there are no outstanding swaps
> as far as Cogl knows.

This is bizarre. This changed in DRI3 so that the GLX_BufferSwapComplete event was sent directly from the X server when presented to the screen, but it should only happen for Pixmaps that have selected for the event:

http://cgit.freedesktop.org/xorg/xserver/tree/glx/glxcmds.c#n2509

Maybe mesa is getting confused between both the GLX_BufferSwapComplete and the PresentCompleteNotify events? Are you seeing multiple GLX events for the same Pixmap when perhaps you shouldn't be?
Comment 6 Bastien Nocera 2014-08-26 10:30:00 UTC
*** Bug 735425 has been marked as a duplicate of this bug. ***
Comment 7 Alexander Larsson 2014-08-29 08:44:24 UTC
I did some debug printfs in cogl when this happened. The problem is that the first swapbuffer gets *two* GLX_BufferSwapComplete events for the same drawable. If those are handled fast enough everything continues fine, but if we handle the second one after the second pending frame info is freed then it crashes.
Comment 8 Jasper St. Pierre (not reading bugmail) 2014-08-29 15:59:15 UTC
Can you look in mesa and see if this is being hit twice?

http://cgit.freedesktop.org/mesa/mesa/tree/src/glx/glxext.c#n132

If so, we're getting two events out of the X server. If not, another event is being synthesized in mesa, which we should figure out.
Comment 9 Alexander Larsson 2014-09-01 07:12:56 UTC
Jasper: It is being hit twice, so we're getting two events from the Xserver.
Comment 10 Alexander Larsson 2014-09-01 07:15:25 UTC
Yeah, glXSwapBuffers is called once, but we get two GLX_BufferSwapComplete from the server. Is there any other call that can generate BufferSwapComplete?
Comment 11 Jasper St. Pierre (not reading bugmail) 2014-09-01 20:40:12 UTC
Under DRI2, the GLX_BufferSwapComplete is synthesized from the DRI2_BufferSwapComplete event inside mesa: http://cgit.freedesktop.org/mesa/mesa/tree/src/glx/dri2.c#n96

Are we possibly hitting that path, or is the server path hit twice for the same drawable?
Comment 12 Alexander Larsson 2014-09-02 10:56:17 UTC
Jasper: That code is never hit, because this is with DRI3, not DRI2. It instead hits the point in comment #8 twice. If I run with LIBGL_DRI3_DISABLE=1 the problem disappears.
Comment 13 Jasper St. Pierre (not reading bugmail) 2014-09-03 22:26:19 UTC
So it sounds like the code in the X server is being hit twice. Can you see if present_vblank_notify is being called twice for the same drawable?

http://cgit.freedesktop.org/xorg/xserver/tree/present/present.c#n181

The mesa code doesn't call Present with any notifies, so we shouldn't be hitting the for loop below.

If that happens, I have no idea what to tell you, other than to badger Keith.
Comment 14 Alexander Larsson 2014-09-04 11:13:23 UTC
It's called twice, first with KIND PresentCompleteKindPixmap (0) then with KIND PresentCompleteKindMSCNotify (1). Both times with mode PresentCompleteModeCopy (1).

I guess both of these trickle down to the app with no way to differentiate between them?
Comment 15 Alexander Larsson 2014-09-04 11:19:52 UTC
On the client side, in __glXWireToEvent for the GLX_BufferSwapComplete the first awire is:

  {type = 104 'h', pad = 0 '\000', sequenceNumber = 123, event_type = 33153, pad2 = 0, drawable = 39845905, ust_hi = 4, ust_lo = 2559142700, msc_hi = 0, msc_lo = 19740, sbc = 1}

And the second is:

  {type = 104 'h', pad = 0 '\000', sequenceNumber = 124, event_type = 33153, pad2 = 0, drawable = 39845905, ust_hi = 4, ust_lo = 2608674467, msc_hi = 0, msc_lo = 22712, sbc = 1}

Both seem more or less identical...
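
For comparison, the split fields in the two dumps above can be joined into 64-bit values. This is a small sketch that assumes the usual high/low 32-bit split; SwapCompleteWire and the helper names are hypothetical and are not mesa's definitions. With the values shown, both events carry the same sbc, while the UST values differ by 49531767 and the MSC values by 2972.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the wire event fields printed above. */
typedef struct {
    uint32_t ust_hi, ust_lo;
    uint32_t msc_hi, msc_lo;
    uint32_t sbc;
} SwapCompleteWire;

/* Joins a high and a low 32-bit half into one 64-bit value. */
uint64_t join_halves(uint32_t hi, uint32_t lo)
{
    return ((uint64_t) hi << 32) | lo;
}

/* 64-bit UST (timestamp) of the event. */
uint64_t wire_ust(const SwapCompleteWire *w)
{
    assert(w != 0);
    return join_halves(w->ust_hi, w->ust_lo);
}

/* 64-bit MSC (frame counter) of the event. */
uint64_t wire_msc(const SwapCompleteWire *w)
{
    assert(w != 0);
    return join_halves(w->msc_hi, w->msc_lo);
}
```

Since sbc is 1 in both events, a client that keys on the swap count alone sees the same swap reported as complete twice.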
Comment 16 Alexander Larsson 2014-09-04 11:25:02 UTC
http://cgit.freedesktop.org/xorg/xserver/tree/present/present_event.c#n176

I wonder if this should only call complete_notify for PresentCompleteKindPixmap, since this is where we lose the "kind" details.
Comment 17 Alexander Larsson 2014-09-04 11:26:55 UTC
Keithp: Is the above correct? Does X need a fix here?
Comment 18 Alexander Larsson 2014-09-04 15:39:17 UTC
Patches here: http://lists.x.org/archives/xorg-devel/2014-September/043722.html
Will test tomorrow
Comment 19 Keith Packard 2014-09-04 15:40:45 UTC
Yeah, I've sent a patch to xorg-devel which passes the 'kind' argument to the notify hook GLX uses, and then has GLX ignore non-pixmap notifications. Testing and review welcome, as always!
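
The approach described in comment 19 can be sketched roughly as follows. This is not the actual xorg-devel patch: the enum, struct, and function names are hypothetical, and the numeric kind values are simply the ones reported in comment 14.

```c
#include <assert.h>

/* Hypothetical names; values as reported in comment 14. */
enum complete_kind {
    KIND_PIXMAP = 0,     /* completion of a presented pixmap */
    KIND_MSC_NOTIFY = 1  /* MSC notification, not a finished swap */
};

/* Hypothetical per-drawable state that counts events sent to the client. */
typedef struct {
    unsigned swap_complete_events_sent;
} GlxDrawableState;

/* Hypothetical notify hook. It receives the kind and only emits a
 * swap-complete event for pixmap completions. Returns 1 if an event
 * was sent and 0 if the notification was ignored. */
int glx_complete_notify(GlxDrawableState *d, enum complete_kind kind)
{
    assert(d != 0);
    if (kind != KIND_PIXMAP)
        return 0;
    d->swap_complete_events_sent++;
    return 1;
}
```

Calling the hook once with each kind for the same drawable leaves the counter at 1, so the client would see a single GLX_BufferSwapComplete per swap.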
Comment 20 Alexander Larsson 2014-09-05 07:45:35 UTC
I tested the patch here and it fixes the crash.
Comment 21 Alexander Larsson 2014-09-11 15:12:54 UTC
Closing as NOTGNOME, since the X server fix resolves this.