After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 788834 - Calling meta_window_get_monitor on an unmanaged window crashes
Calling meta_window_get_monitor on an unmanaged window crashes
Status: RESOLVED FIXED
Product: mutter
Classification: Core
Component: general
unspecified
Other Linux
: Normal normal
: ---
Assigned To: mutter-maint
mutter-maint
: 789167 (view as bug list)
Depends on: 788882
Blocks:
 
 
Reported: 2017-10-11 14:14 UTC by Sam Spilsbury
Modified: 2018-06-28 16:32 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
core: Return -1 if calling meta_window_get_monitor on an unmanaged window (1.08 KB, patch)
2017-10-11 14:15 UTC, Sam Spilsbury
committed Details | Review
don't crash in meta_screen_get_monitor_geometry on missing window->monitor (2.76 KB, patch)
2017-12-31 01:11 UTC, Jeremy Nickurak
none Details | Review
don't crash in meta_screen_get_monitor_geometry on missing window->monitor (3.16 KB, patch)
2018-01-11 18:36 UTC, Jeremy Nickurak
committed Details | Review
Don't attempt to get monitor geometry on non-existent monitors (3.06 KB, patch)
2018-01-21 13:45 UTC, Marco Trevisan (Treviño)
committed Details | Review

Description Sam Spilsbury 2017-10-11 14:14:03 UTC
We were seeing the following crash when calling meta_window_get_monitor on a recently destroyed window (from within gnome-shell):

Program terminated with signal SIGSEGV, Segmentation fault.
  • #0 meta_window_get_monitor
    at core/window.c line 3536
  • #0 meta_window_get_monitor
    at core/window.c line 3536
  • #1 ffi_call_unix64
    at ../src/x86/unix64.S line 76
  • #2 ffi_call
    at ../src/x86/ffi64.c line 525
  • #3 ??
    from /lib/libgjs.so.0
  • #4 ??
    from /lib/libgjs.so.0
  • #5 js::CallJSNative
    at ./js/src/jscntxtinlines.h line 226
  • #6 js::Invoke
    at ./js/src/vm/Interpreter.cpp line 498
  • #7 Interpret
    at ./js/src/vm/Interpreter.cpp line 2602
  • #8 js::RunScript
    at ./js/src/vm/Interpreter.cpp line 448
  • #9 js::Invoke
    at ./js/src/vm/Interpreter.cpp line 517
  • #10 js::Invoke
    at ./js/src/vm/Interpreter.cpp line 554
  • #11 js::jit::InvokeFunction
    at ./js/src/jit/VMFunctions.cpp line 75
  • #12 ??
  • #13 ??
  • #14 ??
  • #15 ??
  • #16 ??
  • #17 js::jit::ArraySpliceDenseInfo
    from /lib/x86_64-linux-gnu/libjs.so.0
  • #18 ??
  • #19 ??
  • #20 ??
  • #21 ??
  • #22 ??

Mario Sanchez-Prada did some preliminary investigation into this and it looks like the reason for the crash is a null-ptr dereference in meta_window_get_monitor. window->monitor gets set to NULL when the window is destroyed.

In that case, we should return something sensible for meta_window_get_monitor and let the caller deal with it.
Comment 1 Sam Spilsbury 2017-10-11 14:15:49 UTC
Created attachment 361331 [details] [review]
core: Return -1 if calling meta_window_get_monitor on an unmanaged window

This is the most sensible thing to do other than crashing.
Comment 2 Mario Sánchez Prada 2017-10-11 15:44:45 UTC
FTR, I'm pasting down at the bottom the log from IRC where we discussed a bit about this with Ray.

TL;DR: Ray is (rightfully so) concerned about returning -1 in a function that is usually fed straight into arrays (i.e. index), which would not necessary crash as it does now, making issues (programmer errors) harder to detect, debug and fix. My personal take on this (which I think Sam agrees with) is that having a way to crash the shell by calling this get_monitor() function when the MetaWindow has already been unmanaged (which is not possible to know unless you connect to the 'unmanaged' signal) is a bit of a pain, and I particularly would understand returning -1 in that case (as we can't return any index) as a sensible API. Besides, I've checked and there are only 2 places in gnome-shell at the moment that fed the output of get_monitor() straight into an array, which I think could be fixed by simply checking the index != -1 before using it.

Comments?

IRC log below:

<msanchez> jadahl: hi! Quick one: would you be happy with a patch in mutter like this one? https://bugzilla.gnome.org/attachment.cgi?id=361331&action=diff

<halfline> sounds a little risky
<halfline> if the output gets fed straight into an array

<msanchez> We had a problem on Endless with what we call DesktopOverlay (containes an actor that covers the whole desktop for the purpose of showing/managing overlay actors coming out from the side), where the shell would crash when closing overlay windows because b ythe time an notify:allocation callback is called, the associated MetaWindow will be unmanaged

<halfline> (which I think it does usually?)

<msanchez> halfline: not sure what you mean, exactly

<smspillaz> halfline: So in this case, there's basically two things that can be done - either crash or give the caller a value that it might be able to do something useful with
<smspillaz> The way we used to handle this in compiz was that we had a "destroyed" prop on the window struct and the policy was "well, its your own fault if you don't check if a window is destroyed first"
<smspillaz> but I always thought that was a silly way of doing it

<halfline> yea but "your own fault" is us too
<halfline> i mean what are the callers doing
<halfline> aren't they doing something like priv->monitors[meta_window_get_monitor(window)] or something like that?

<msanchez> halfline: they are calling meta_window_get_monitor()

<halfline> so -1 would be hard to debug memory corruption ?
<halfline> msanchez: what are they doing with the result i mean

<smspillaz> well under the status quo, that will either crash because window->monitor is unset, or it would crash because they were passing -1 to an array

<halfline> -1 to an array doesn't mean crash necessarily

<smspillaz> That's true

<halfline> so it could turn a crasher into a hard to debug problem

<smspillaz> Indeed

<msanchez> halfline: oh yeah, sorry. That's exactly what's doing, using it as an idx to an array

<smspillaz> I think the "best" solution here is actually to make meta_window_get_monitor return the monitor through an outparam and take a GError, but I haven't seen that pattern elsewhere in mutter
<smspillaz> plus API break

<halfline> to me it seems like no matter what, fixes need to be in the caller
<halfline> or the callers caller

<smspillaz> halfline: do you know if MetaWindow has some sort of property which indicates if the window is unmanaged or destroyed?

<halfline> or whatever

<smspillaz> halfline: well, IMO exposing bindable API that can crash intuitively seems like the wrong thing to do, but I guess that's where we might differ in opinion
<smspillaz> halfline: at the very least, we should probably have an assertion there

<msanchez> halfline: I see what you mean. At the same time I think that function returning -1 when there's no monitor available seems sensible to me
<msanchez> just checked the shell and there are only a few places where the output of get_monitor() gets fed straight into the monitors array

<smspillaz> but I still think you shouldn't be able to crash the shell by calling a public API (and I've updated the "Returns:" section of the gtkdoc to say "it should return -1")

<msanchez> and in many of those cases there's a null-check right after accessing the array to check that we actually have a monitor

<smspillaz> (in the event that the window is not managed)
<smspillaz> msanchez: this is in the JS side right?

<msanchez> yes, in the shell's js part

<smspillaz> mmm
<smspillaz> I wonder if there are any places where we do priv->monitors[meta_window_get_monitor(window)]
<smspillaz> in mutter

<halfline> hey guys gotta run, but i'll bbl and we can chat more

<smspillaz> (side-point: Callers probably shouldn't be using the return value of something that returns a signed-int to directly index into an array ;-))

<halfline> or maybe jadahl will chime in

<smspillaz> halfline: nw. Its late for me at the moment, so if you have any more thoughts it'd probably be best to chime in on the bug report

<msanchez> I think it will, yes
<msanchez> thanks

<halfline> ok

<smspillaz> FWIW we do not call meta_window_get_monitor in mutter (and none in the C-side of gnome-shell afaict)
Comment 3 Jonas Ådahl 2017-10-12 02:47:42 UTC
I think returning -1 is more sane than crashing immediately, but we can at the time also g_warning() that a get_monitor() call is only valid on a managed window (though I can't see a way for gnome-shell to know that). On the other hand, on a headless system, calling meta_window_get_monitor() would be valid, and should also return -1, and the caller will not unlikely have an empty array to look up values from, and here we can't warn.

For 3.26 I think returning -1 while checking gnome-shell for places the return value used incorrectly.

For 3.28 I'd like to remove all APIs that passes/identifies a monitor via an array index, but instead just expose a subset of the MetaLogicalMonitor API.
Comment 4 Sam Spilsbury 2017-10-12 15:01:09 UTC
Dealing with the gnome-shell fallout in https://bugzilla.gnome.org/show_bug.cgi?id=788882
Comment 5 Marco Trevisan (Treviño) 2017-10-22 00:09:19 UTC
*** Bug 789167 has been marked as a duplicate of this bug. ***
Comment 6 Marco Trevisan (Treviño) 2017-10-22 00:21:01 UTC
Sam (hi! :)), any update on this?

We'd getting some crashes in ubuntu, and we'd like to backport it.
Comment 7 Sam Spilsbury 2017-10-22 02:59:50 UTC
Hi Marco (long time, no chat :)). I believe this is blocked on #788882
Comment 8 Jeremy Nickurak 2017-12-19 04:48:08 UTC
I seem to be hitting this regularly when switching my HDMI receiver off/on (which seems to briefly look like disconnecting my monitor from PC).

Applying this patch and the patches from #788882 left me crashing slightly differently in this scenario, with this backtrace:

https://nickurak.ca/~atrus/gdb-788882-bt.txt

Wrapping that code with a check for window->monitor being set seems to address the issue, and I can switch my receiver off/on without any ill effects:

https://nickurak.ca/~atrus/788882-patch.txt

(Note that I'm doing this on top of 3.26.2, but the patches all seem to apply on mainline just fine)
Comment 9 Marco Trevisan (Treviño) 2017-12-28 21:06:12 UTC
(In reply to Jeremy Nickurak from comment #8)
> Wrapping that code with a check for window->monitor being set seems to
> address the issue, and I can switch my receiver off/on without any ill
> effects:
> 
> https://nickurak.ca/~atrus/788882-patch.txt

Looks sane, I think it's something to add here too.
Comment 10 Jeremy Nickurak 2017-12-31 01:11:32 UTC
Created attachment 366117 [details] [review]
don't crash in meta_screen_get_monitor_geometry on missing window->monitor

Attaching patch to the defect (in case my web-hosted copy should disappear).
Comment 11 Daniel van Vugt 2018-01-11 02:20:32 UTC
Hello!

Just FYI this appears to be the second biggest crasher of gnome-shell, of those still affecting Ubuntu 18.04. A fix in time for 3.26.3 would be great...

https://errors.ubuntu.com/?package=gnome-shell&period=month
Comment 12 Jonas Ådahl 2018-01-11 02:47:45 UTC
Review of attachment 366117 [details] [review]:

Incorrect coding style and there is no commit subject/message/author. Otherwise looks sane.
Comment 13 Jeremy Nickurak 2018-01-11 18:36:02 UTC
Created attachment 366680 [details] [review]
don't crash in meta_screen_get_monitor_geometry on missing window->monitor

Replaced attachment 366117 [details] [review] with output from git format-patch (with commit metadata) and (hopefully) corrected coding style (the diff looks a little awkward because matching style resulted in diff chunks coincidentally matching some context it didn't before).
Comment 14 Marco Trevisan (Treviño) 2018-01-21 13:45:34 UTC
Comment on attachment 366680 [details] [review]
don't crash in meta_screen_get_monitor_geometry on missing window->monitor

The following fix has been pushed:
attachment 366680 [details] [review] as commit 6dcce19 Don't attempt to get monitor geometry on non-existent monitors
Comment 15 Marco Trevisan (Treviño) 2018-01-21 13:45:44 UTC
Created attachment 367172 [details] [review]
Don't attempt to get monitor geometry on non-existent monitors
Comment 16 Daniel van Vugt 2018-02-02 01:48:18 UTC
^^^
Fix landed in commit 71b4ef59, present on both master and gnome-3-26 branches.

So does this bug now qualify as "Fixed"?
Comment 17 Marco Trevisan (Treviño) 2018-02-05 15:27:36 UTC
Attachment 361331 [details] has not been landed yet, as it needs to go with fixes for bug 788882, so I think we should wait before marking all the bug as fixed.
Comment 18 Daniel van Vugt 2018-02-19 02:47:34 UTC
Downstream bug report:
https://launchpad.net/bugs/1724439
Comment 19 Daniel van Vugt 2018-03-22 08:44:58 UTC
Bump.

Although errors.ubuntu.com says this is the 8th-most common gnome-shell crash, if you don't count the Xwayland bugs and look at 18.04 only then this is the 2nd-most common crash after general heap corruption.

Thus this is the most common crash in 18.04 Xorg sessions that has a definite and clear cause.
Comment 20 Marco Trevisan (Treviño) 2018-06-28 16:31:20 UTC
Fixed with commit 8626c69c