GNOME Bugzilla – Bug 788908
gnome-shell crashed with SIGSEGV in clutter_actor_get_allocation_box (from _st_create_shadow_pipeline_from_actor)
Last modified: 2019-10-10 15:21:28 UTC
Created attachment 361476 [details] Stacktrace I've a VM with gnome-shell running, resizing the guest window multiple times, or changing the monitor settings from g-c-c after a while leads to a crash in: clutter_actor_get_allocation_box where the actor there is an invalid object (as per quick gdb check)
+ Trace 238050
Full stacktrace attached
Do you also have a "backtrace full" you can attach? Also the output of gjs_dumpstack() could be useful.
(In reply to Jonas Ådahl from comment #1) > Do you also have a "backtrace full" you can attach? Also the output of > gjs_dumpstack() could be useful. This is BT full: As per gjs_dumpstack() unfortunately, if I attach to gnome-shell it continues the execution when it segfaults, I don't get why :o.
+ Trace 238051
Similar trace could also be:
+ Trace 238054
(In reply to Marco Trevisan (Treviño) from comment #3) > Similar trace could also be That one is a duplicate of bug 788627.
Mh, yeah... I think they're actually they've the same root cause, so the actor has been destroyed, thus it fails immediately when trying to get data allocation box or it just get some compromised data and then it fails when trying to generate an insanely big texture.
Created attachment 361796 [details] [review] StIcon: only compute shadow pipeline when the texture is properly allocated Creating the shadow pipeline requires the actor to be allocated in order to get its dimensions, however in the current state we just compute it even if this is not the case. This causes _st_create_shadow_pipeline_from_actor (when getting the allocation box) to trigger an allocation cycle, which might lead to a convolution to st_icon_finish_update causing breakage on data as soon as we return from it. Waiting for the texture size change before trying to update the shadow pipeline is a way for avoiding this. Another option is to do simply as we do with other actors, so initializing the pipeline at paint if we don't already have one for current data, which is probably a better thing to do. So let me know if you prefer that way.
Some debugging on this, highlighting the issue: st_icon_set_icon_name (0x555558ad8af0) st_icon_style_changed (0x555556ceb2a0) st_icon_update (0x555556ceb2a0): Object pending texture 0x555557cd7bc0, is floating: 1, refs 1 st_icon_update (0x555556ceb2a0): FINISH NOW 0x555557cd7bc0! st_icon_finish_update (0x555556ceb2a0): destroying icon texture 0x555557cdd7e0 st_icon_finish_update (0x555556ceb2a0): Object icon texture 0x555557cd7bc0, refs 3 priv = 0x555556ceae10, instance priv is 0x555556ceae10 st_icon_update_shadow_pipeline (0x555556ceb2a0) : icon texture 0x555557cd7bc0 _st_create_shadow_pipeline_from_actor: actor 0x555557cd7bc0 | thread 128328 st_icon_style_changed (0x555556ceb2a0) st_icon_update (0x555556ceb2a0): Object pending texture 0x5555557d34d0, is floating: 1, refs 1 st_icon_update (0x555556ceb2a0): FINISH NOW 0x5555557d34d0! st_icon_finish_update (0x555556ceb2a0): destroying icon texture 0x555557cd7bc0 st_icon_finish_update (0x555556ceb2a0): Object icon texture 0x5555557d34d0, refs 3 priv = 0x555556ceae10, instance priv is 0x555556ceae10 st_icon_update_shadow_pipeline (0x555556ceb2a0) : icon texture 0x5555557d34d0 Thread 1 "gnome-shell" received signal SIGSEGV, Segmentation fault. clutter_actor_get_allocation_box (self=self@entry=0x555557cd7bc0, box=box@entry=0x7fffffff3240) at /media/M2/GNOME/mutter/clutter/clutter/clutter-actor.c:9800 9800 *box = self->priv->allocation; (gdb) bt
+ Trace 238073
$2 = (ClutterActor *) 0x5555557d34d0 (gdb) (gdb) call gjs_dumpstack() == Stack trace for context 0x555555969000 == Panel<._removeStyleClassName@resource:///org/gnome/shell/ui/panel.js:1166:9 wrapper@resource:///org/gnome/gjs/modules/lang.js:178:22 Transparency<._getAlphas@/usr/share/gnome-shell/extensions/ubuntu-dock@ubuntu.com/theming.js:592:17 wrapper@resource:///org/gnome/gjs/modules/lang.js:178:22 Transparency<._updateStyles@/usr/share/gnome-shell/extensions/ubuntu-dock@ubuntu.com/theming.js:537:9 wrapper@resource:///org/gnome/gjs/modules/lang.js:178:22 Transparency<._init@/usr/share/gnome-shell/extensions/ubuntu-dock@ubuntu.com/theming.js:353:9 wrapper@resource:///org/gnome/gjs/modules/lang.js:178:22 _Base.prototype._construct@resource:///org/gnome/gjs/modules/lang.js:110:5 Class.prototype._construct/newClassConstructor@resource:///org/gnome/gjs/modules/lang.js:213:20 ThemeManager<._init@/usr/share/gnome-shell/extensions/ubuntu-dock@ubuntu.com/theming.js:59:30 wrapper@resource:///org/gnome/gjs/modules/lang.js:178:22 _Base.prototype._construct@resource:///org/gnome/gjs/modules/lang.js:110:5 Class.prototype._construct/newClassConstructor@resource:///org/gnome/gjs/modules/lang.js:213:20 DockedDash<._init@/usr/share/gnome-shell/extensions/ubuntu-dock@ubuntu.com/docking.js:363:30 wrapper@resource:///org/gnome/gjs/modules/lang.js:178:22 _Base.prototype._construct@resource:///org/gnome/gjs/modules/lang.js:110:5 Class.prototype._construct/newClassConstructor@resource:///org/gnome/gjs/modules/lang.js:213:20 DockManager<._createDocks@/usr/share/gnome-shell/extensions/ubuntu-dock@ubuntu.com/docking.js:1744:20 wrapper@resource:///org/gnome/gjs/modules/lang.js:178:22 DockManager<._toggle@/usr/share/gnome-shell/extensions/ubuntu-dock@ubuntu.com/docking.js:1690:9 wrapper@resource:///org/gnome/gjs/modules/lang.js:178:22 As you can see, basically the same function gets called on an object that has been just destroyed...
Created attachment 361838 [details] [review] StIcon: only compute shadow pipeline when the texture is properly allocated Some cleanups
Review of attachment 361838 [details] [review]: ::: src/st/st-icon.c @@ +277,3 @@ if (priv->shadow_spec) priv->shadow_pipeline = _st_create_shadow_pipeline_from_actor (priv->shadow_spec, priv->icon_texture); + Stray newline. @@ +290,3 @@ + +static void +st_icon_update_shadow_pipeline_on_allocation (StIcon *icon) Shouldn't you be able to do this by setting the ClutterActorClass::allocate vfunc, calling the parents allocate(), then maybe update the shadow pipeline? Thus no need for signals. You'd have to make sure not to eagerly replace valid pipelines only.
Created attachment 361841 [details] [review] StIcon: only compute shadow pipeline when the texture is allocated (at paint) This version is just creating the pipeline at paint, when we're sure about the allocation box of the texture icon actor.
Created attachment 362437 [details] [review] StIcon: only compute shadow pipeline when the texture is properly allocated Patch updated to fix the glitches I found during some testing of the previous version I posted here that was generating the texture at allocation. As you can see here: https://usercontent.irccloud-cdn.com/file/Mpe1Nqgi/out.mp4 (The other version that do it only at paint wasn't affected). Now, the problem here is that if the actor size and the texture size are different we can't use a fast path (as per commit 7015bb2ca975, which is probably also the reason why we get these crashes now), thus we use an offscreen mode. But that mode seem to behave badly when we're in allocation (while it's fine at paint). So, i've added a StPrivateShadowCreateFlags enum where we can define the mode we allow to be used in shadow creation, and thus for st-icon we allow only the texture mode at allocation point. PS: avoiding to do this only for st-icon by using some more checks at allocation was still possible, but I thought it was less elegant and portable
We have four downstream (Fedora) reports that look very similar to this and #788627 , with the "failed to allocate 18446744072098939136" bytes error message; https://bugzilla.redhat.com/show_bug.cgi?id=1526164 has some debugging work done by the reporter. The others (which I've closed as dupes of that bug) are https://bugzilla.redhat.com/show_bug.cgi?id=1508398 , https://bugzilla.redhat.com/show_bug.cgi?id=1506325 and https://bugzilla.redhat.com/show_bug.cgi?id=1502183 . So it'd be great to have a fix for this (these?) bug (bugs?) that we could port downstream. Thanks.
I think all these bugs are the same (and I'm also pretty sure bug 788627 is a duplicate of this). The patch fixes the problem, Jonas already did a first-pass review on IRC, but we were waiting for a final review by Florian. Cheers
I had asked whether there was a clear reproducer a while ago - is there? (The thing is, I'm not a fan of either patch, and the commit message suggests that there may be an easier fix, so I'd like to check that first)
Reporter of #1508398 downstream says "This happens occasionally, I am not able to reproduce it intentionally." Reporter of #1506325 says "When continously maximize and minimize the window of Visual Studio Code using key bindings. (Shell Extension Hide Top Bar enabled)", so you could try that, I guess. Reporter of #1514850 wrote "Right after booting the system, I started Thunderbird and Firefox. Gnome Shell crashed during Firefox startup. This is the second time this happend with Fedora 27, though I don't remember it ever happening with Fedora 26." #1515926: "Opening gnucash". Seems too simple to be a reliable reproducer, but hey, could try it. #1516253: "Error after boot". #1516633: "Just after log in". #1517234, #1525979, #1502183: nothing. Reporter of #1526164 wrote "Running dual monitors. I have a TRENDnet KVM switch that I use to switch my primary monitor between this workstation and another. This crash happened when I switched _to_ this workstation. Then my session crashed and went to the login screen" - I've asked if it's reliably reproducible, but sounds like his case may depend on that hardware KVM switch.
(In reply to Florian Müllner from comment #14) > I had asked whether there was a clear reproducer a while ago - is there? In my case, to reproduce this I only have to run g-s in a virtual machine, open a window, then another one who's going to be maximized. Then resize the VM to request a resolution change. Not sure if that works 100% in all the setups, but here is always happening. > (The thing is, I'm not a fan of either patch, and the commit message > suggests that there may be an easier fix, so I'd like to check that first) Attachment 361841 [details] was the simplest that worked. So basically allocating the shadow at paint, but I would have preferred to do it earlier when possible, thus the other version. If you want that to be structured differently, let me know... I'm open to refactor things, just let me know what's your ideal way.
If it is indeed the same thing, I was able to trigger crashes simply by opening and closing my laptop lid quickly. See bug 788627 and last comments there. Not tried that for a while (at the time I was experimenting to see what would make GNOME Shell get my monitor set-up correct).
*** Bug 788627 has been marked as a duplicate of this bug. ***
Note I've been running with attachment 362437 [details] [review] for a few weeks now and the issue appears to be resolved for me (came here from https://bugzilla.redhat.com/show_bug.cgi?id=1526164)
Review of attachment 362437 [details] [review]: ::: src/st/st-icon.c @@ +64,3 @@ static void st_icon_update (StIcon *icon); static gboolean st_icon_update_icon_size (StIcon *icon); +static void st_icon_update_shadow_pipeline (StIcon *icon, StPrivateShadowCreateFlags flags); nit: newline after St *icon, and parameter variable names should be aligned. @@ +169,3 @@ + + st_icon_update_shadow_pipeline (icon, ST_SHADOW_TEXTURE_MODE); +} Looks a bit of a missuse of the allocate vfunc. Shouldn't it be enough to update the pipeline on paint (especially if we potentially update the pipeline immediately when the pixbuf is ready). @@ +181,3 @@ if (priv->icon_texture) { + st_icon_update_shadow_pipeline (icon, ST_SHADOW_ANY_MODE); Why would you want to use ANY_MODE here? Drawing nothing (in case no texture is available) offscreen should still result in nothing I'd assume? @@ +295,3 @@ +static void +st_icon_update_shadow_pipeline (StIcon *icon, + StPrivateShadowCreateFlags shadow_flags) nit: parameter variable names should be aligned @@ +316,3 @@ + _st_create_shadow_pipeline_from_actor (priv->shadow_spec, + priv->icon_texture, + shadow_flags); I'd rather see we only call _st_create_shadow_pipeline_from_actor() if we know it wont fail (i.e. the texture actor has a texture). Then the create_function will actually work, and not unexpectedly return NULL. That'd mean we don't need to add a flag the shadow_flag to the create_shadow_pipeline() function. @@ +328,3 @@ StIcon *icon) { + st_icon_clear_shadow_pipeline (icon); Shouldn't it be possible to clear() and then update() immediately? I suppose we'd need to add a check for clutter_actor_has_allocation() in update() (and also rename to maybe_update() I guess. ::: src/st/st-private.c @@ +413,3 @@ _st_create_shadow_pipeline_from_actor (StShadow *shadow_spec, + ClutterActor *actor, + StPrivateShadowCreateFlags flags) nit: parameter variable names should stay aligned (also in the header)
Review of attachment 362437 [details] [review]: Let me know if I should keep the enum modes around or not, and see the comments. ::: src/st/st-icon.c @@ +64,3 @@ static void st_icon_update (StIcon *icon); static gboolean st_icon_update_icon_size (StIcon *icon); +static void st_icon_update_shadow_pipeline (StIcon *icon, StPrivateShadowCreateFlags flags); So you want me to move the lines above too? @@ +169,3 @@ + + st_icon_update_shadow_pipeline (icon, ST_SHADOW_TEXTURE_MODE); +} Indeed we can, I just tried to replicate what we did before, where the pipeline was updated as soon as possible. But since that leads to the crash we know because of missing allocation infos, we try now doing it at allocation time. But if we prefer just doing it at paint it's ok... It also seem that the glitches I mentioned in comment #11 aren't visible anymore (although sometimes I notice some tiny flickering). In that case I can get rid of the StPrivateShadowCreateFlags completely, as it was the only reason for being there. @@ +181,3 @@ if (priv->icon_texture) { + st_icon_update_shadow_pipeline (icon, ST_SHADOW_ANY_MODE); Yeah, I was guessing the same... But I just kept using the same codepath we were using before: try with the texture, if it's not there, just use the offscreen paint. In fact, in this case we could probably be just more restrictive, but... There's just a problem, and we still have with current codebase (just not underlined because we've the function in such way): when using the TEXTURE_MODE for some icons the shadow pipeline is not generated (see https://i.imgur.com/rvBrjjK.png, while this is when using the ANY_MODE: https://i.imgur.com/B7gGzTL.png). Reason for this is that (texture_width == width && texture_height == height) fails. The status is this one: Actor size is 16.000000x16.000000, scaled size is 16.000000x16.000000, texture size is 48x48, shadow pipeline: (nil). Explaination why that mode is used is explained in commit 7015bb2ca (as per fixing Bug 788039). @@ +316,3 @@ + _st_create_shadow_pipeline_from_actor (priv->shadow_spec, + priv->icon_texture, + shadow_flags); Well, true, but it's even true that I didn't want to duplicate the same checks in both places, plus even _st_create_shadow_pipeline might fail (basically never, but still, we don't have anything ensuring it's not NULL), so I'd prefer to leave such more specific checks at lower levels than here. It's even true that the flag might not needed anymore, so... @@ +328,3 @@ StIcon *icon) { + st_icon_clear_shadow_pipeline (icon); So in case it's not allocated, you'd still queue a relayout/redraw?
(In reply to Marco Trevisan (Treviño) from comment #21) > Review of attachment 362437 [details] [review] [review]: > > Let me know if I should keep the enum modes around or not, and see the > comments. > > ::: src/st/st-icon.c > @@ +64,3 @@ > static void st_icon_update (StIcon *icon); > static gboolean st_icon_update_icon_size (StIcon *icon); > +static void st_icon_update_shadow_pipeline (StIcon *icon, > StPrivateShadowCreateFlags flags); > > So you want me to move the lines above too? Personally, I don't think that's needed. > > @@ +169,3 @@ > + > + st_icon_update_shadow_pipeline (icon, ST_SHADOW_TEXTURE_MODE); > +} > > Indeed we can, I just tried to replicate what we did before, where the > pipeline was updated as soon as possible. But since that leads to the crash > we know because of missing allocation infos, we try now doing it at > allocation time. > > But if we prefer just doing it at paint it's ok... It also seem that the > glitches I mentioned in comment #11 aren't visible anymore (although > sometimes I notice some tiny flickering). Could the flickering be icons loading slowly? Anyway, I think using vfuncs only for what they are intended to be used for seems best. I guess we could schedule an g_idle (that we cancel on paint if not triggered) that'll load things, but might be too much work for too little gain. > > In that case I can get rid of the StPrivateShadowCreateFlags completely, as > it was the only reason for being there. > > @@ +181,3 @@ > if (priv->icon_texture) > { > + st_icon_update_shadow_pipeline (icon, ST_SHADOW_ANY_MODE); > > Yeah, I was guessing the same... But I just kept using the same codepath we > were using before: try with the texture, if it's not there, just use the > offscreen paint. > > In fact, in this case we could probably be just more restrictive, but... > > There's just a problem, and we still have with current codebase (just not > underlined because we've the function in such way): when using the > TEXTURE_MODE for some icons the shadow pipeline is not generated (see > https://i.imgur.com/rvBrjjK.png, while this is when using the ANY_MODE: > https://i.imgur.com/B7gGzTL.png). Reason for this is that (texture_width == > width && texture_height == height) fails. The status is this one: > > Actor size is 16.000000x16.000000, scaled size is 16.000000x16.000000, > texture size is 48x48, shadow pipeline: (nil). > > Explaination why that mode is used is explained in commit 7015bb2ca (as per > fixing Bug 788039). Could it then be that if we create the pipeline too early (e.g. it's allocated but the pixbuf is still pending), we'll create the shadow pipeline by using the offscreen, which happens to be empty -> no shadow? Only ever trying to create the pipeline when all requirements are met, that only st-icon.c knows about (cogl texture available (no matter if it'll be used directly or via an offscreen), icon actor allocated) would avoid that issue. > > @@ +316,3 @@ > + _st_create_shadow_pipeline_from_actor (priv->shadow_spec, > + priv->icon_texture, > + shadow_flags); > > Well, true, but it's even true that I didn't want to duplicate the same > checks in both places, plus even _st_create_shadow_pipeline might fail > (basically never, but still, we don't have anything ensuring it's not NULL), > so I'd prefer to leave such more specific checks at lower levels than here. > > It's even true that the flag might not needed anymore, so... > > @@ +328,3 @@ > StIcon *icon) > { > + st_icon_clear_shadow_pipeline (icon); > > So in case it's not allocated, you'd still queue a relayout/redraw? Yes, we need to redraw anyway since the pixbuf changed.
(In reply to Jonas Ådahl from comment #22) > > Indeed we can, I just tried to replicate what we did before, where the > > pipeline was updated as soon as possible. But since that leads to the crash > > we know because of missing allocation infos, we try now doing it at > > allocation time. > > > > But if we prefer just doing it at paint it's ok... It also seem that the > > glitches I mentioned in comment #11 aren't visible anymore (although > > sometimes I notice some tiny flickering). > > Could the flickering be icons loading slowly? Mh, might be, but maybe it was just an impression... I can't replicate it right now. So I guess we're all set. > Anyway, I think using vfuncs only for what they are intended to be used for > seems best. I guess we could schedule an g_idle (that we cancel on paint if > not triggered) that'll load things, but might be too much work for too > little gain. Yeah, ok... I generally prefer to load before paint, especially if it's something involving some cycles like blurring, but if that's ok with you, we can just do it at paint. > > Explaination why that mode is used is explained in commit 7015bb2ca (as per > > fixing Bug 788039). > > Could it then be that if we create the pipeline too early (e.g. it's > allocated but the pixbuf is still pending), we'll create the shadow pipeline > by using the offscreen, which happens to be empty -> no shadow? No I think it's not the case. At that stage the priv->pending_texture is NULL while priv->icon_texture is set, so I think that's not the case. It's just that the icon itself than the actor is in, it seems, as it's centered on that.
Let's continue in https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/3
Reopening for discussion not covered by the merge request
(In reply to Florian Müllner from comment #14) > (The thing is, I'm not a fan of either patch, and the commit message > suggests that there may be an easier fix, so I'd like to check that first) What I was referring to is that _st_create_shadow_pipeline_from_actor() has undefined behavior likely leading to a crash under certain (not quite clear) conditions. Right now this seems to be only triggered by the use in st-icon, so fixing that isn't wrong. However it would be better to fix the function to always have well defined behavior - that is, return a valid shadow pipeline or NULL - to make sure that the next code that uses the function doesn't run into the same trap.
(In reply to Florian Müllner from comment #26) > (In reply to Florian Müllner from comment #14) > > (The thing is, I'm not a fan of either patch, and the commit message > > suggests that there may be an easier fix, so I'd like to check that first) > > What I was referring to is that _st_create_shadow_pipeline_from_actor() has > undefined behavior likely leading to a crash under certain (not quite clear) > conditions. Agree, however as I explained in comments or commits above, what I've noticed is that this is happening so far (at least in st-icon), when trying to allocate an actor in order to get its allocation box, as this is leading to some convolution. It might not be the only cause, but at least adding the allocation check on that function seems to be enough here (included in the 1st patch).
(In reply to Florian Müllner from comment #26) > certain (not quite clear) conditions I say they conditions are not quite clear, because I cannot reproduce the crash, however with an early check for clutter_actor_has_allocation(), all icon shadows disappear - so not having a valid allocation when calling the function doesn't trigger the crash by itself. Also according to https://gitlab.gnome.org/GNOME/mutter/blob/master/clutter/clutter/clutter-actor.c#L9787, the call to get_allocation_box() itself should trigger the allocation ...
(In reply to Marco Trevisan (Treviño) from comment #27) > It might not be the only cause, but at least adding the allocation check on > that function seems to be enough here (included in the 1st patch). Sure, but I'm still curious - maybe it's not the needs_allocation flag that is the problem, but one of the other checks in clutter_actor_has_allocation()? For instance, does clutter_actor_is_mapped() work?
(In reply to Florian Müllner from comment #28) > (In reply to Florian Müllner from comment #26) > > certain (not quite clear) conditions > > I say they conditions are not quite clear, because I cannot reproduce the > crash, however with an early check for clutter_actor_has_allocation(), all > icon shadows disappear - so not having a valid allocation when calling the > function doesn't trigger the crash by itself. True, only when this trigger allocation, and subsequent signals then cause to recall the very same function reusing invalid data. > Also according to > https://gitlab.gnome.org/GNOME/mutter/blob/master/clutter/clutter/clutter- > actor.c#L9787, the call to get_allocation_box() itself should trigger the > allocation ... Correct, and that's what I've explained on comment #6. After so long time, I actually can't recall all the details I went through by that, but I can try to remember if this is unclear.
>because I cannot reproduce the crash Here happen really often when: - Going to overview - Close a Gnome Terminal window with many tabs - When Gnome Terminal popups about closing multiple tabs, Shell crash with this error. Only step I'm not sure is: Do I click on Gnome Terminal window just after clicking on the "Shell overview close button". Not sure...
What is the content of StShadow (priv->shadow_spec) for when this crashes? For the blur_.. g_malloc0(), the only explanation for the crash is that the shadow spec from the theme node is invalid, thus not related to the actor being allocated or not. The reason being that in various backtraces I've seen for that crash, both the width and height values are all sane (16x16) but the blur value, coming directly from the theme node is always "optimized out".
>What is the content of StShadow (priv->shadow_spec) for when this crashes? How can I give you this information?
(In reply to Cédric Bellegarde from comment #33) > >What is the content of StShadow (priv->shadow_spec) for when this crashes? > > How can I give you this information? Build gnome-shell with "-O0 -g" (might not be needed), trigger the crash to get a core dump, then open it with gdb. In gdb, navigate to the _st_create_shadow_pipeline_from_actor frame (run "bt", then "frame N" where N is the frame number of said function). Then type print *shadow_spec
(gdb) print *shadow_spec $1 = {color = {red = 0 '\000', green = 0 '\000', blue = 0 '\000', alpha = 0 '\000'}, xoffset = 0, yoffset = 0, blur = 33554432.000000253, spread = 33554440.1875, inset = 1098907648, ref_count = 0}
(In reply to Cédric Bellegarde from comment #35) > (gdb) print *shadow_spec > $1 = {color = {red = 0 '\000', green = 0 '\000', blue = 0 '\000', alpha = 0 > '\000'}, > xoffset = 0, yoffset = 0, blur = 33554432.000000253, spread = > 33554440.1875, > inset = 1098907648, ref_count = 0} Can you reproduce the issue when using patch at https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/3 ? As iirc this was happening when the actor sizes where invalid.
(In reply to Marco Trevisan (Treviño) from comment #36) > (In reply to Cédric Bellegarde from comment #35) > > (gdb) print *shadow_spec > > $1 = {color = {red = 0 '\000', green = 0 '\000', blue = 0 '\000', alpha = 0 > > '\000'}, > > xoffset = 0, yoffset = 0, blur = 33554432.000000253, spread = > > 33554440.1875, > > inset = 1098907648, ref_count = 0} > > Can you reproduce the issue when using patch at > https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/3 ? > > As iirc this was happening when the actor sizes where invalid. As far as I can tell, the content of shadow_spec is not affected by the allocation state of the actor. The flow goes something like this: 1. We create a theme node from the StIcon This doesn't use the actor state, just the type name and place in the stage tree. 2. We use said theme node to fetch the shadow spec. This also doesn't use the allocation, just the theme node and a entry key string ("icon-shadow"). 3. The shadow spec we got is messed up causing invalid allocations due to the "blur" and "spread" being something very wrong. So, the issue here seems to rather be about a) corruption of the theme node data structure, making the us fetch corrupted data, or b) a bug in the theme code that makes us fetch uninitialized data, or c) something else.
Just reporting that I can reproduce this reliably on Endless' fork of GNOME Shell 3.26, I actually reported it in https://gitlab.gnome.org/GNOME/gnome-shell/issues/36 before being pointed out here by Florian! I've been debugging this for a while today and I ended up exactly in the same situations described here: the contents of `shadow_spec` show a ref_count of 0 at the time of the failure, and somehow it seems (directly or indirectly) related to the call to clutter_actor_get_allocation_box() inside _st_create_shadow_pipeline_from_actor(), which somehow invalidates the priv->shadow_spec of the icon, causing the crash (plus some warnings, see https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/24#note_54952). My early (and naive) experiments show that early returning from _st_create_shadow_pipeline_from_actor() to prevent a new reallocation from happening (see [1]) "fix" the issue, confirming Florian's comment #c28 here, but that seems to break icons shadows, so not a valid fix I guess. Last, I can confirm as well that applying the commits from https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/3 seem to fix the problem as well for me, so it's probably better to close the gitlab issue and continue here? [1] https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/24/diffs?commit_id=735ce62e8b63ac2ec6950eff114c655f4f1c1e84
Another short heads-up: I haven't inspected in detail the code in https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/3, but I tested it carefully and confirmed that it doesn't present the issue of my naive early return, and icon shadows do still work with that MR. At this point, and considering that MR in there is fairly more complex than what I was proposing, I guess I should close the new issue and associated MR I opened, to keep everything in one place. @Marco Are you planning to keep working on this? If so, considering I have a reproducer 100% reliable, I'd like to coordinate with you to help getting this nasty thing fixed upstream, let me know how I can help. Thanks.
(In reply to Mario Sánchez Prada from comment #39) > I have a reproducer 100% reliable, I'd like to coordinate with you to > help getting this nasty thing fixed upstream I may have come up with a simpler approach, but as I don't see the issue, I'm essentially guessing - could you test wip/fmuellner/guard-shadow-pipeline-from-actor-call? I would prefer to not change the API to expose the different code paths to callers, and put the burden of picking the right one on them (and crash in case they picked the wrong one) ...
@Florian Can't test it right now, but I will as soon as possible, likely tomorrow or Friday
I've tested your patch now (needed a couple of small fixes) and, while it fixes the crash, it looks like it's still not solving the root issue, since I still see lots of CRITICALs due to _st_paint_shadow_with_opacity() receiving a NULL shadow. More details in: https://gitlab.gnome.org/GNOME/gnome-shell/commit/9d133d8ebbf9d99db0a02eaa6843c7e43cdec480#note_62721
Thanks for testing! (In reply to Mario Sánchez Prada from comment #42) > (needed a couple of small fixes) Meh, I should have compile-tested at least :-( > it looks like it's still not solving the root issue, since > I still see lots of CRITICALs due to _st_paint_shadow_with_opacity() > receiving a NULL shadow. But it does support my hypothesis: - icon calls create_shadow_pipeline with spec - method triggers a style change that results in g_clear_pointer being called (but with the patch the refcount doesn't drop to 0 until the end of the call, so no crash) - method returns a pipeline that supposedly matches the spec (but doesn't, as spec has been nulled in the meantime) I'll propose a merge request that I belief makes create_shadow_pipeline() safe to call in all cases. With regard to StIcon, that should turn the crash into a waste of computing cycles (by creating pipelines that are thrown away immediately) - I'll review Marco's patches to address that issue separately.
Did anyone happen to notice that the magic number 18446744072098939136 is the result of a signed multiply overflow? (gdb) printf "%x\n",18446744072098939136 a0000100 (gdb) printf "%lx\n",18446744072098939136 ffffffffa0000100 (gdb) printf "%lx\n",0x5000010*0x5000010 ffffffffa0000100 (gdb) printf "%lu\n",0x5000010*0x5000010 18446744072098939136 (gdb) printf "%lu\n",0x5000010u*0x5000010u 2684354816 The large number arises because the arguments to blur_pixels in src/st/st-private.c are declares as gint when the are actually sizes, and so should be gsize. Then *rowstride_out * *height_out would not overflow, and the allocation would request a mere 2.6 gigabytes instead of 2^63 bytes.
That was not quite right. gdb seems to have 32 bit longs. (gdb) printf "%lx\n",0x5000010ll*0x5000010ll 190000a0000100 (gdb) printf "%lx\n",0x5000010l*0x5000010l ffffffffa0000100 (gdb) printf "%lu\n",0x5000010ll*0x5000010ll 7036877102121216 (gdb) printf "%lu\n",0x5000010l*0x5000010l 18446744072098939136 The allocation would be for 7 x 10^15 instead of 1.8 x 10^19 But still, pixels_out = g_malloc0 (*rowstride_out * *height_out); should be pixels_out = g_malloc0 ((gsize)*rowstride_out * (gsize)*height_out); to allow for the case where the factors are both (say) 65536.
(In reply to Florian Müllner from comment #43) > [...] > I'll propose a merge request that I belief makes create_shadow_pipeline() > safe to call in all cases. With regard to StIcon, that should turn the crash > into a waste of computing cycles (by creating pipelines that are thrown away > immediately) - I'll review Marco's patches to address that issue separately. Hi again. Sorry for the (loooong) delayed response, but I did manage to test your patch from https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/49/commits today and I can confirm they also fix the issue entirely, as I'm not seeing neither crashes nor CRITICAL messages at all now on the console. And to be clear, I did make sure first that the crash was still there by reverting first the 3 commits from https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/3/commits that I had temporarily applied, which led me to the crash in nearly no time. Then I reverted those + applied yours and everything was fine again. Thanks!
Created attachment 370462 [details] Stack trace of gnome-shell crashing, from 2018-04-02 This appears to be hitting me on Arch Linux on amd64 with gnome-shell 3.28.0-1 (Wayland session). It happened twice when I had mpv --opengl-backend=wayland running in fullscreen and was leaving fullscreen.
Can someone please review Florian's PR? This has been open for several months, meanwhile people just keep hitting this crash in released distributions (which means, on Wayland, they lose their desktop session). I've found two more dupes of https://bugzilla.redhat.com/show_bug.cgi?id=1526164 just this morning and I am not in any way being methodical about this.
Pushed my fix in commit f6a08472a0fef65f, so closing. Changing StIcon to follow the pattern used in StEntry and StLabel still makes sense IMHO, but we already have a merge request[0] for that, so no need to keep this open. [0] https://gitlab.gnome.org/GNOME/gnome-shell/merge_requests/3"
Hi everyone, I came across this thread after searching for a solution to constant logfile errors and occasional reboots. So I'm running Debian 10, Linux debian 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u1 (2019-09-20) x86_64 GNU/Linux And my logfile comes up constantly with the following error:- gnome-shell[1381]: _st_paint_shadow_with_opacity: assertion 'shadow_pipeline != NULL' failed I see from the thread here that the issue is marked as resolved. Can someone let me know how I can fix this on my installation. King regards
Well, according to packages.debian.org, buster has gnome-shell 3.30.2, and the commit mentioned in #c49 was included in that release, so *this* bug ought to be fixed. It's possible you have a different bug with a similar symptom; best thing to do would be to file a new issue (on GitLab, not here) with a full traceback.