GNOME Bugzilla – Bug 788834
Calling meta_window_get_monitor on an unmanaged window crashes
Last modified: 2018-06-28 16:32:37 UTC
We were seeing the following crash when calling meta_window_get_monitor on a recently destroyed window (from within gnome-shell): Program terminated with signal SIGSEGV, Segmentation fault.
+ Trace 238047
Mario Sanchez-Prada did some preliminary investigation into this and it looks like the reason for the crash is a null-ptr dereference in meta_window_get_monitor. window->monitor gets set to NULL when the window is destroyed. In that case, we should return something sensible for meta_window_get_monitor and let the caller deal with it.
Created attachment 361331 [details] [review] core: Return -1 if calling meta_window_get_monitor on an unmanaged window This is the most sensible thing to do other than crashing.
FTR, I'm pasting down at the bottom the log from IRC where we discussed a bit about this with Ray. TL;DR: Ray is (rightfully so) concerned about returning -1 in a function that is usually fed straight into arrays (i.e. index), which would not necessary crash as it does now, making issues (programmer errors) harder to detect, debug and fix. My personal take on this (which I think Sam agrees with) is that having a way to crash the shell by calling this get_monitor() function when the MetaWindow has already been unmanaged (which is not possible to know unless you connect to the 'unmanaged' signal) is a bit of a pain, and I particularly would understand returning -1 in that case (as we can't return any index) as a sensible API. Besides, I've checked and there are only 2 places in gnome-shell at the moment that fed the output of get_monitor() straight into an array, which I think could be fixed by simply checking the index != -1 before using it. Comments? IRC log below: <msanchez> jadahl: hi! Quick one: would you be happy with a patch in mutter like this one? https://bugzilla.gnome.org/attachment.cgi?id=361331&action=diff <halfline> sounds a little risky <halfline> if the output gets fed straight into an array <msanchez> We had a problem on Endless with what we call DesktopOverlay (containes an actor that covers the whole desktop for the purpose of showing/managing overlay actors coming out from the side), where the shell would crash when closing overlay windows because b ythe time an notify:allocation callback is called, the associated MetaWindow will be unmanaged <halfline> (which I think it does usually?) <msanchez> halfline: not sure what you mean, exactly <smspillaz> halfline: So in this case, there's basically two things that can be done - either crash or give the caller a value that it might be able to do something useful with <smspillaz> The way we used to handle this in compiz was that we had a "destroyed" prop on the window struct and the policy was "well, its your own fault if you don't check if a window is destroyed first" <smspillaz> but I always thought that was a silly way of doing it <halfline> yea but "your own fault" is us too <halfline> i mean what are the callers doing <halfline> aren't they doing something like priv->monitors[meta_window_get_monitor(window)] or something like that? <msanchez> halfline: they are calling meta_window_get_monitor() <halfline> so -1 would be hard to debug memory corruption ? <halfline> msanchez: what are they doing with the result i mean <smspillaz> well under the status quo, that will either crash because window->monitor is unset, or it would crash because they were passing -1 to an array <halfline> -1 to an array doesn't mean crash necessarily <smspillaz> That's true <halfline> so it could turn a crasher into a hard to debug problem <smspillaz> Indeed <msanchez> halfline: oh yeah, sorry. That's exactly what's doing, using it as an idx to an array <smspillaz> I think the "best" solution here is actually to make meta_window_get_monitor return the monitor through an outparam and take a GError, but I haven't seen that pattern elsewhere in mutter <smspillaz> plus API break <halfline> to me it seems like no matter what, fixes need to be in the caller <halfline> or the callers caller <smspillaz> halfline: do you know if MetaWindow has some sort of property which indicates if the window is unmanaged or destroyed? <halfline> or whatever <smspillaz> halfline: well, IMO exposing bindable API that can crash intuitively seems like the wrong thing to do, but I guess that's where we might differ in opinion <smspillaz> halfline: at the very least, we should probably have an assertion there <msanchez> halfline: I see what you mean. At the same time I think that function returning -1 when there's no monitor available seems sensible to me <msanchez> just checked the shell and there are only a few places where the output of get_monitor() gets fed straight into the monitors array <smspillaz> but I still think you shouldn't be able to crash the shell by calling a public API (and I've updated the "Returns:" section of the gtkdoc to say "it should return -1") <msanchez> and in many of those cases there's a null-check right after accessing the array to check that we actually have a monitor <smspillaz> (in the event that the window is not managed) <smspillaz> msanchez: this is in the JS side right? <msanchez> yes, in the shell's js part <smspillaz> mmm <smspillaz> I wonder if there are any places where we do priv->monitors[meta_window_get_monitor(window)] <smspillaz> in mutter <halfline> hey guys gotta run, but i'll bbl and we can chat more <smspillaz> (side-point: Callers probably shouldn't be using the return value of something that returns a signed-int to directly index into an array ;-)) <halfline> or maybe jadahl will chime in <smspillaz> halfline: nw. Its late for me at the moment, so if you have any more thoughts it'd probably be best to chime in on the bug report <msanchez> I think it will, yes <msanchez> thanks <halfline> ok <smspillaz> FWIW we do not call meta_window_get_monitor in mutter (and none in the C-side of gnome-shell afaict)
I think returning -1 is more sane than crashing immediately, but we can at the time also g_warning() that a get_monitor() call is only valid on a managed window (though I can't see a way for gnome-shell to know that). On the other hand, on a headless system, calling meta_window_get_monitor() would be valid, and should also return -1, and the caller will not unlikely have an empty array to look up values from, and here we can't warn. For 3.26 I think returning -1 while checking gnome-shell for places the return value used incorrectly. For 3.28 I'd like to remove all APIs that passes/identifies a monitor via an array index, but instead just expose a subset of the MetaLogicalMonitor API.
Dealing with the gnome-shell fallout in https://bugzilla.gnome.org/show_bug.cgi?id=788882
*** Bug 789167 has been marked as a duplicate of this bug. ***
Sam (hi! :)), any update on this? We'd getting some crashes in ubuntu, and we'd like to backport it.
Hi Marco (long time, no chat :)). I believe this is blocked on #788882
I seem to be hitting this regularly when switching my HDMI receiver off/on (which seems to briefly look like disconnecting my monitor from PC). Applying this patch and the patches from #788882 left me crashing slightly differently in this scenario, with this backtrace: https://nickurak.ca/~atrus/gdb-788882-bt.txt Wrapping that code with a check for window->monitor being set seems to address the issue, and I can switch my receiver off/on without any ill effects: https://nickurak.ca/~atrus/788882-patch.txt (Note that I'm doing this on top of 3.26.2, but the patches all seem to apply on mainline just fine)
(In reply to Jeremy Nickurak from comment #8) > Wrapping that code with a check for window->monitor being set seems to > address the issue, and I can switch my receiver off/on without any ill > effects: > > https://nickurak.ca/~atrus/788882-patch.txt Looks sane, I think it's something to add here too.
Created attachment 366117 [details] [review] don't crash in meta_screen_get_monitor_geometry on missing window->monitor Attaching patch to the defect (in case my web-hosted copy should disappear).
Hello! Just FYI this appears to be the second biggest crasher of gnome-shell, of those still affecting Ubuntu 18.04. A fix in time for 3.26.3 would be great... https://errors.ubuntu.com/?package=gnome-shell&period=month
Review of attachment 366117 [details] [review]: Incorrect coding style and there is no commit subject/message/author. Otherwise looks sane.
Created attachment 366680 [details] [review] don't crash in meta_screen_get_monitor_geometry on missing window->monitor Replaced attachment 366117 [details] [review] with output from git format-patch (with commit metadata) and (hopefully) corrected coding style (the diff looks a little awkward because matching style resulted in diff chunks coincidentally matching some context it didn't before).
Comment on attachment 366680 [details] [review] don't crash in meta_screen_get_monitor_geometry on missing window->monitor The following fix has been pushed: attachment 366680 [details] [review] as commit 6dcce19 Don't attempt to get monitor geometry on non-existent monitors
Created attachment 367172 [details] [review] Don't attempt to get monitor geometry on non-existent monitors
^^^ Fix landed in commit 71b4ef59, present on both master and gnome-3-26 branches. So does this bug now qualify as "Fixed"?
Attachment 361331 [details] has not been landed yet, as it needs to go with fixes for bug 788882, so I think we should wait before marking all the bug as fixed.
Downstream bug report: https://launchpad.net/bugs/1724439
Bump. Although errors.ubuntu.com says this is the 8th-most common gnome-shell crash, if you don't count the Xwayland bugs and look at 18.04 only then this is the 2nd-most common crash after general heap corruption. Thus this is the most common crash in 18.04 Xorg sessions that has a definite and clear cause.
Fixed with commit 8626c69c