GNOME Bugzilla – Bug 731332
Fix occasional crash on startup
Last modified: 2014-06-11 21:36:44 UTC
We currently don't take it very well when _NET_CURRENT_DESKTOP is set and not 0 - in fact, we added two separate regressions to make sure that we don't survive this ...
Created attachment 278027 [details] [review] display: Create compositor before screen Since commit 8b2b65246a86, we assume that the compositor always exists. Alas, the assumption is wrong - the compositor is currently initialized after the screen, but meta_screen_new() itself may call a compositor function if initialization involves a workspace switch (which will happen when meta_workspace_activate() is called more than once and for different workspaces - or in other words, when _NET_CURRENT_DESKTOP is set and not 0). Fix the resulting crash by creating the compositor before the screen, so that the assumption becomes valid.
Created attachment 278028 [details] [review] compositor: Check for plugin_mgr before accessing it The previous commit fixed the assumption of always having a compositor, but by separating meta_compositor_new() from meta_compositor_manage(), we still crash when calling a compositor function from meta_screen_new(), this time because we access the plugin_mgr before it is initialized. Add appropriate checks to all public methods that access plugin_mgr to fix - this is still nicer than the code removed in commit 8b2b65246a6, as the checks are kept private to a single file and we don't have any fallback code in case the check fails.
Created attachment 278029 [details] [review] display: Fix segfault in meta_display_update_active_window_hint() We may end up calling the function from a meta_workspace_activate() call in meta_screen_new(), in which case display->screen is still unset. This is a regression from commit d7519f4ebc32, fix it by adding back the NULL check the commit removed.
Review of attachment 278027 [details] [review]: OK.
Review of attachment 278028 [details] [review]: OK.
Review of attachment 278029 [details] [review]: OK.
I'm uncomfortable with these patches. I meant to reply earlier, but didn't. I'd prefer if we had a meta_screen_init(); or similar that we do after construction to initialize. I'm afraid of blanket: if (foo == NULL) return; tests, because it's too easy to hit that case and leave a state variable uninitialized after we properly initialize foo.
Created attachment 278276 [details] [review] screen: Split workspace initialization from meta_screen_new() Since commit 8b2b65246a86, we assume that the compositor always exists. Alas, the assumption is wrong - the compositor is currently initialized after the screen, but meta_screen_new() itself may call a compositor function if initialization involves a workspace switch (which will happen when meta_workspace_activate() is called more than once and for different workspaces - or in other words, when _NET_CURRENT_DESKTOP is set and not 0). So carefully split out the offending bits and only call them after the compositor has been initialized. To be honest, I'm not entirely convinced that this approach is actually better - it does look cleaner than "random" NULL checks, but it's fairly fragile (and not obviously so), so there's a good chance to break again the next time someone touches it (after all, we managed to add *two* unrelated regressions which triggered a crash when restarting from a workspace other than the first in the last couple of months).
It's up to you: I trust your judgement. Really, either we crash during initialization, or we silently carry on with silently-broken behavior. I prefer the former, since it's easier to debug and fix, but if you prefer the latter, I'm not going to hold you back.
(In reply to comment #9) > Really, either we crash during initialization, or we silently carry on with > silently-broken behavior. Mmmh? Neither the original patch set nor the new patch carries on with "silently-broken" behavior - they both fix the crash during initialization and work exactly as expected after that. (In reply to comment #9) > I prefer the former, since it's easier to debug and fix That sounds great, but the conditions for a crash may be reasonably obscure - I must have restarted mutter/gnome-shell countless times after those regressions, and so must you and others; yet this was only caught completely by chance ...
(In reply to comment #10) > Mmmh? Neither the original patch set nor the new patch carries on with > "silently-broken" behavior - they both fix the crash during initialization and > work exactly as expected after that. After a refactoring with your initial patches, we could change the order of things and thus we'll always hit the: `if (screen == NULL)` test, and silently not ever do the proper initialization, which could be tough to track down. With the other patch, we'll crash instead, which could be easier to catch. But whatever.
Don't get me wrong, I'm not outright against the new patch. But have you looked at it? I can point to you at least three bits that look like they make more sense elsewhere, but the whole order is just so fragile that the consequences would be something between crashing under some conditions, subtly changing behavior or working fine standalone but breaking gnome-shell - and no, not all of those are obvious crashes that are easily traced back. (In reply to comment #11) > After a refactoring with your initial patches, we could change the order of > things and thus we'll always hit the: `if (screen == NULL)` test, and silently > not ever do the proper initialization Not sure I'm following you - the only NULL check I see is the one I added back in update_active_window_hint() (it just wasn't that obvious before, but the check was there for 10 years or so). Again, I don't have a real preference for either approach yet (review on the other patch would help there), but not setting a root window property until the first focus window change hardly appears as catastrophic as you are implying?
Attachment 278276 [details] pushed as 8100cef - screen: Split workspace initialization from meta_screen_new()