After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 731332 - Fix occasional crash on startup
Fix occasional crash on startup
Status: RESOLVED FIXED
Product: mutter
Classification: Core
Component: general
unspecified
Other All
: Normal normal
: ---
Assigned To: mutter-maint
mutter-maint
Depends on:
Blocks:
 
 
Reported: 2014-06-06 13:14 UTC by Florian Müllner
Modified: 2014-06-11 21:36 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
display: Create compositor before screen (2.97 KB, patch)
2014-06-06 13:14 UTC, Florian Müllner
accepted-commit_now Details | Review
compositor: Check for plugin_mgr before accessing it (3.65 KB, patch)
2014-06-06 13:14 UTC, Florian Müllner
accepted-commit_now Details | Review
display: Fix segfault in meta_display_update_active_window_hint() (1.03 KB, patch)
2014-06-06 13:14 UTC, Florian Müllner
accepted-commit_now Details | Review
screen: Split workspace initialization from meta_screen_new() (5.30 KB, patch)
2014-06-11 15:16 UTC, Florian Müllner
committed Details | Review

Description Florian Müllner 2014-06-06 13:14:39 UTC
We currently don't take it very well when _NET_CURRENT_DESKTOP is set and not 0 - in fact, we added two separate regressions to make sure that we don't survive this ...
Comment 1 Florian Müllner 2014-06-06 13:14:41 UTC
Created attachment 278027 [details] [review]
display: Create compositor before screen

Since commit 8b2b65246a86, we assume that the compositor always
exists. Alas, the assumption is wrong - the compositor is currently
initialized after the screen, but meta_screen_new() itself may
call a compositor function if initialization involves a workspace
switch (which will happen when meta_workspace_activate() is called
more than once and for different workspaces - or in other words,
when _NET_CURRENT_DESKTOP is set and not 0).
Fix the resulting crash by creating the compositor before the screen,
so that the assumption becomes valid.
Comment 2 Florian Müllner 2014-06-06 13:14:45 UTC
Created attachment 278028 [details] [review]
compositor: Check for plugin_mgr before accessing it

The previous commit fixed the assumption of always having a compositor,
but by separating meta_compositor_new() from meta_compositor_manage(),
we still crash when calling a compositor function from meta_screen_new(),
this time because we access the plugin_mgr before it is initialized.
Add appropriate checks to all public methods that access plugin_mgr to
fix - this is still nicer than the code removed in commit 8b2b65246a6,
as the checks are kept private to a single file and we don't have any
fallback code in case the check fails.
Comment 3 Florian Müllner 2014-06-06 13:14:48 UTC
Created attachment 278029 [details] [review]
display: Fix segfault in meta_display_update_active_window_hint()

We may end up calling the function from a meta_workspace_activate()
call in meta_screen_new(), in which case display->screen is still
unset. This is a regression from commit d7519f4ebc32, fix it by
adding back the NULL check the commit removed.
Comment 4 drago01 2014-06-10 20:44:01 UTC
Review of attachment 278027 [details] [review]:

OK.
Comment 5 drago01 2014-06-10 20:44:49 UTC
Review of attachment 278028 [details] [review]:

OK.
Comment 6 drago01 2014-06-10 20:45:15 UTC
Review of attachment 278029 [details] [review]:

OK.
Comment 7 Jasper St. Pierre (not reading bugmail) 2014-06-10 20:53:41 UTC
I'm uncomfortable with these patches. I meant to reply earlier, but didn't. I'd prefer if we had a meta_screen_init(); or similar that we do after construction to initialize. I'm afraid of blanket: if (foo == NULL) return; tests, because it's too easy to hit that case and leave a state variable uninitialized after we properly initialize foo.
Comment 8 Florian Müllner 2014-06-11 15:16:51 UTC
Created attachment 278276 [details] [review]
screen: Split workspace initialization from meta_screen_new()

Since commit 8b2b65246a86, we assume that the compositor always
exists. Alas, the assumption is wrong - the compositor is currently
initialized after the screen, but meta_screen_new() itself may
call a compositor function if initialization involves a workspace
switch (which will happen when meta_workspace_activate() is called
more than once and for different workspaces - or in other words,
when _NET_CURRENT_DESKTOP is set and not 0).
So carefully split out the offending bits and only call them after
the compositor has been initialized.

To be honest, I'm not entirely convinced that this approach is actually better - it does look cleaner than "random" NULL checks, but it's fairly fragile (and not obviously so), so there's a good chance to break again the next time someone touches it (after all, we managed to add *two* unrelated regressions which triggered a crash when restarting from a workspace other than the first in the last couple of months).
Comment 9 Jasper St. Pierre (not reading bugmail) 2014-06-11 15:48:33 UTC
It's up to you: I trust your judgement. Really, either we crash during initialization, or we silently carry on with silently-broken behavior. I prefer the former, since it's easier to debug and fix, but if you prefer the latter, I'm not going to hold you back.
Comment 10 Florian Müllner 2014-06-11 16:16:13 UTC
(In reply to comment #9)
> Really, either we crash during initialization, or we silently carry on with
> silently-broken behavior.

Mmmh? Neither the original patch set nor the new patch carries on with "silently-broken" behavior - they both fix the crash during initialization and work exactly as expected after that.


(In reply to comment #9)
> I prefer the former, since it's easier to debug and fix

That sounds great, but the conditions for a crash may be reasonably obscure - I must have restarted mutter/gnome-shell countless times after those regressions, and so must you and others; yet this was only caught completely by chance ...
Comment 11 Jasper St. Pierre (not reading bugmail) 2014-06-11 17:10:39 UTC
(In reply to comment #10)
> Mmmh? Neither the original patch set nor the new patch carries on with
> "silently-broken" behavior - they both fix the crash during initialization and
> work exactly as expected after that.

After a refactoring with your initial patches, we could change the order of things and thus we'll always hit the: `if (screen == NULL)` test, and silently not ever do the proper initialization, which could be tough to track down. With the other patch, we'll crash instead, which could be easier to catch.

But whatever.
Comment 12 Florian Müllner 2014-06-11 17:31:57 UTC
Don't get me wrong, I'm not outright against the new patch. But have you looked at it? I can point to you at least three bits that look like they make more sense elsewhere, but the whole order is just so fragile that the consequences would be something between crashing under some conditions, subtly changing behavior or working fine standalone but breaking gnome-shell - and no, not all of those are obvious crashes that are easily traced back.


(In reply to comment #11)
> After a refactoring with your initial patches, we could change the order of
> things and thus we'll always hit the: `if (screen == NULL)` test, and silently
> not ever do the proper initialization

Not sure I'm following you - the only NULL check I see is the one I added back in update_active_window_hint() (it just wasn't that obvious before, but the check was there for 10 years or so). Again, I don't have a real preference for either approach yet (review on the other patch would help there), but not setting a root window property until the first focus window change hardly appears as catastrophic as you are implying?
Comment 13 Florian Müllner 2014-06-11 21:36:20 UTC
Attachment 278276 [details] pushed as 8100cef - screen: Split workspace initialization from meta_screen_new()