GNOME Bugzilla – Bug 727115
Segfault when running as X11 compositor (process_damage on dead window)
Last modified: 2014-03-27 13:06:51 UTC
drago01> Jasper: ok http://www.fpaste.org/88990/13958701/ <drago01> Jasper: why do we process damage on a non existent window? * API (~api@92.12.165.83.dynamic.mundo-r.com) hat #gnome-shell betreten <drago01> Jasper: <drago01> master does <drago01> if (window_actor == NULL) <drago01> return; <drago01> that code is removed in wayland * mclasen_afk (~mclasen@pool-108-20-138-235.bstnma.fios.verizon.net) hat #gnome-shell betreten * rib (~bob@cpc26-heme10-2-0-cust305.9-1.cable.virginm.net) hat #gnome-shell betreten <Jasper> drago01, where? <drago01> Jasper: https://git.gnome.org/browse/mutter/tree/src/compositor/compositor.c#n154 vs. https://git.gnome.org/browse/mutter/tree/src/compositor/compositor.c?h=wayland#n139 <Jasper> drago01, I removed it intentionally <Jasper> drago01, why is MetaWindowActor NULL there? <drago01> Jasper: got destroyed ? <Jasper> #1 0x00007fe1cf2779e6 in process_damage (compositor=0x2678ab0, window=0x0, event=0x7fffe7137b90) at compositor/compositor.c:146 <Jasper> that seems utterly broken * mclasen_afk ist jetzt bekannt als mclasen <Jasper> we should not be calling process_damage with a NULL window ever <Jasper> that's why I removed it -- because we need to figure out what's giving us NULL windows rather than just fizzling these requests out <Jasper> meta_window_get_compositor_private does <Jasper> if (!window) <Jasper> return NULL; <Jasper> which is just broken <Jasper> drago01, can you figure out how to reproduce it? <drago01> ok lets see <yoseforb> Jasper: Hey <Jasper> yoseforb, hey <yoseforb> Jasper: I just want to say something is of some strange for me - already one week and there isn't even one crashes in my gnome-shell! <Jasper> your ccache is broken <yoseforb> I use gnome-shell from git for many time, and this almost never not happen <Jasper> there is no way that is actually true <Jasper> yoseforb, oh, you're probably using x11 <Jasper> yeah, we moved all the crashes to the wayland branch * shanttu hat die Verbindung getrennt (Leaving) <Jasper> use wayland. that will make things behave as normal again <drago01> Jasper: trying to raise the pidgin window by clicking on the tray icon <yoseforb> what this mean yoe move all the crashed to the wayland branch? the crashes is gone? or there is a leak instead? * aday_ (~aday@5d605f47.skybroadband.com) hat #gnome-shell betreten <drago01> yoseforb: should I commit some breakages? ;) <Jasper> according to the internet, GNOME can't possibly work <Jasper> so go for it drago01 <Jasper> drago01, I assume what happens then is that the window unmanages, and then we damage events for it? <Jasper> So the meta_display_lookup_x_window returns NULL? <Jasper> Can you add some print statements there? <yoseforb> Jasper: but the memory usage is groun up <yoseforb> now it on 740MB <Jasper> Are you using any extensions? <yoseforb> no <Jasper> Try hitting the trash button in the looking glass. <Jasper> That forces a GC <drago01> Jasper: yes that would make sense <Jasper> drago01, though I have no idea why clicking the tray icon would cause a window to unmanage <drago01> Jasper: it maps / unmaps it <Jasper> ugh <Jasper> instead of using _NET_ACTIVATE_WINDOW? <yoseforb> Jasper: It still there (just 3MB it gone from the memory usage) <Jasper> Probably a memory fragmentation issue then * aday hat die Verbindung getrennt (Ping timeout: 600 seconds) <drago01> Jasper: not sure about that but it is definitly that window that is null and cannot be found by meta_display_lookup_x_window <Jasper> drago01, so why are we getting damage events on it <drago01> Jasper: hmm maybe a timing thing <drago01> Jasper: i.e window sends a damage event gets destroyed and we handle get the damage event for from before the destruction <Jasper> drago01, right. Can you add some print statements to meta_window_unmanage / meta_compositor_process_event to figure out if that's the case? * yoseforb hat die Verbindung getrennt (yoseforb) * stefano hat die Verbindung getrennt (Leaving) * owen hat die Verbindung getrennt (Read error: 148 (No route to host)) <drago01> Jasper: it unmanages the pidgin window <drago01> Jasper: but the damage event seems to be for a different window <drago01> Jasper: no idea which one the id does not show up in xwininfo -tree -root * API hat die Verbindung getrennt (Remote closed the connection) <drago01> Jasper: also <drago01> MetaWindow *window = modified != None ? meta_display_lookup_x_window (display, modified) : NULL; <drago01> if (meta_compositor_process_event (display->compositor, event, window)) <drago01> there is a case where it always passes NULL <drago01> which might have been the reason for the null guard in the first place <drago01> but it makes no sense to do meta_display_lookup_x_window in the calle and in the caller
Program received signal SIGSEGV, Segmentation fault. meta_window_actor_process_x11_damage (self=0x0, event=0x7fffe7137b90) at compositor/meta-window-actor.c:1559 1559 if (priv->surface) (gdb) bt
+ Trace 233399
(In reply to comment #0) > <drago01> MetaWindow *window = modified != None ? meta_display_lookup_x_window > (display, modified) : NULL; > <drago01> if (meta_compositor_process_event (display->compositor, event, > window)) > <drago01> there is a case where it always passes NULL > <drago01> which might have been the reason for the null guard in the first > place > <drago01> but it makes no sense to do meta_display_lookup_x_window in the calle > and in the caller https://git.gnome.org/browse/mutter/tree/src/compositor/compositor.c?h=wayland#n924 We actually always look up the window here for damage events, rather than in display.c. I imagine that at one point, display.c didn't have the damage event base, so it needed to be here. We really should only be getting damage events on windows we select for. The weird case is that there's a damage in the event queue for a window after it's already unmanaged, but I'm not 100% sure that's what's happening here.
Back in ancient times we had this: https://git.gnome.org/browse/mutter/tree/src/compositor/compositor-clutter.c?id=9b3a0d1a#n1362 and https://git.gnome.org/browse/mutter/tree/src/compositor/compositor-clutter.c?id=9b3a0d1a#n1370 This got removed at some point and we ended up with the NULL check.
Seems like Owen added that in https://git.gnome.org/browse/mutter/commit/?id=c60d4c2bc43d374269f3148bc752a48b6033376a Removing the NULL guard in process_damage looks wrong given that history ... we should probably just re add it (it has been there for a reason).
Yeah, I'm fine with adding back the NULL check if "we got damage for an unmanaged window" is what's going on. I was concerned with it being generic, though. For stuff like maximize / minimize, we should *never* have a NULL MetaWindowActor. I'm probably also going to remove the NULL check in meta_window_get_compositor_private.
(In reply to comment #5) > Yeah, I'm fine with adding back the NULL check if "we got damage for an > unmanaged window" is what's going on. > > I was concerned with it being generic, though. For stuff like maximize / > minimize, we should *never* have a NULL MetaWindowActor. I'm probably also > going to remove the NULL check in meta_window_get_compositor_private. Well the thing here is not a MetaWindow that has not MetaWindowActor but a X window that has no MetaWindow actor in the first place.
Created attachment 273084 [details] [review] compositor: Don't call process_damage if the window is gone We might get a damage event for an already unmanaged window calling process_damage is pointless and causes a crash so simply skip that case.
Review of attachment 273084 [details] [review]: If you've verified that this is what's going on (and since it happens when Pidgin unmaps/remaps the window, I'm quite sure it is), LGTM.
Attachment 273084 [details] pushed as 6eeaf09 - compositor: Don't call process_damage if the window is gone