Bug 642957 – gs crashes while move the only one window from initial workspace to the second

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 642957 - gs crashes while move the only one window from initial workspace to the second


Summary:	gs crashes while move the only one window from initial workspace to the second


Status:	RESOLVED FIXED

Product:	mutter
Classification:	Core
Component:	general
Version:	2.91.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Owen Taylor
QA Contact:	mutter-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-02-22 12:28 UTC by Yu Chen
Modified:	2011-07-13 14:40 UTC

See Also:	https://bugzilla.redhat.com/show_bug.cgi?id=680207
GNOME target:	---
GNOME version:	---

Attachments
crash message from abrt (19.00 KB, text/plain) 2011-03-04 17:58 UTC, Yu Chen		Details
util: fix a reentrancy problem with meta_later (4.13 KB, patch) 2011-03-29 19:28 UTC, Dan Winship	needs-work	Details \| Review
window: hack around a crash (884 bytes, patch) 2011-03-29 19:29 UTC, Dan Winship	committed	Details \| Review
util: fix a reentrancy problem with meta_later (4.07 KB, patch) 2011-07-12 17:47 UTC, Dan Winship	committed	Details \| Review

Description Yu Chen 2011-02-22 12:28:47 UTC

If there is only one windows in the desktop session, then new workspace can not handle while drag it from the first workspace to the second one.

how to reproduce this issue:

1, login the new session
2, open a gnome-terminal window, it will appear in the initial workspace and the second will be created automatically
3, switch to overview by press [Windows] key
4, move pointer to the right edge to active workspace bar
5, drag the terminal window from the first workspace to the second one
6, gnome-shell will hang up few seconds and then restart automatically

Comment 1 Owen Taylor 2011-02-22 12:59:37 UTC

This isn't immediately reproducible for me. 

(I dragged windows around a *lot* when developing the animation code :-)

Does it happen reliably for you?

Comment 2 Yu Chen 2011-02-22 14:07:12 UTC

 
Yes, it happens all the time. If there is more than 1 window, this issue never happen. 

My os is fedora15 + gnome-shell git build from jhbuild.

The following is the output from terminal:
--------------------------------------------------------------------

[jcome@horn ~]$ ~/gnome-shell/bin/gnome-shell --replace
Gtk-Message: Failed to load module "pk-gtk-module"
Gtk-Message: Failed to load module "atk-bridge"
Window manager warning: Log level 16: AT-SPI: Accessibility bus not found - Using session bus.

(mutter:6167): Clutter-DEBUG: Signal type minimize not supported

(mutter:6167): Clutter-DEBUG: Signal type maximize not supported

(mutter:6167): Clutter-DEBUG: Signal type restore not supported

      JS LOG: Cannot create "Network" item, .desktop file not found or corrupt.
[1298354106,000,xklavier_config.c:xkl_config_registry_load_helper/] 	Missing registry file /usr/share/xmodmap/base.xml
      JS LOG: GNOME Shell started at Tue Feb 22 2011 13:55:06 GMT+0800 (HKT)
Window manager warning: CurrentTime used to choose focus window; focus window may not be correct.
Window manager warning: Got a request to focus 0x4406c4a (jcome@horn) with a timestamp of 0.  This shouldn't happen!
Shell killed with signal 11
[jcome@horn ~]$ 
** (metacity:6194): WARNING **: AT-SPI: Accessibility bus not found - Using session bus.

[jcome@horn ~]$ 

--------------------------------------------------------------------

Comment 3 Owen Taylor 2011-02-22 14:21:16 UTC

If you have a second computer you can use to ssh into the affected system to get a backtrace, that would be very useful.

 - ssh in
 - run 

   $ ~/gnome-shell/bin/gnome-shell --replace
     (will automatically connect to the current X sesion)
   (gdb) run
   [ reproduce crash ]
   (gdb) backtrace
   [ attach results here ]

If you don't have a second computer, this may be possible to do from a different virtual terminal, but it's trickier.

Comment 4 Yu Chen 2011-03-04 17:58:11 UTC

Created attachment 182508 [details]
crash message from abrt

Comment 5 Yu Chen 2011-03-04 18:02:17 UTC

Comment on attachment 182508 [details]
crash message from abrt

seems like a mutter related issue. should I file it in mutter?

Comment 6 Yu Chen 2011-03-06 14:21:38 UTC

reproduced in other computer with nouveau driver (fedora 15)

Comment 7 Yu Chen 2011-03-19 14:44:20 UTC

After update my fedora 15 today, the issue is gone. 

Thanks you gnome-shell team for your great work.

Comment 8 Jean-François Fortin Tam 2011-03-27 16:53:48 UTC

Downstream report (?): https://bugzilla.redhat.com/show_bug.cgi?id=680207
Still seeing this happening with today's Fedora 15 packages.

Comment 9 Dan Winship 2011-03-29 19:28:12 UTC

Created attachment 184625 [details] [review]
util: fix a reentrancy problem with meta_later

Calling meta_later_add() or meta_later_remove() from within a
META_LATER_BEFORE_REDRAW callback ended up being a no-op, because of
how run_repaint_laters() was fiddling with the laters list. (This
resulted in a crash in window.c:idle_calc_repaint(), which assumed it
would only be called when a certain queue was non-empty, but was
getting called anyway because of a failed meta_later_remove() call.)

Fix this by having run_repaint_laters() work on a copy of the laters
list instead, and add refcounting to MetaLater so that removing a
later that run_repaint_laters() hasn't gotten to yet won't cause
problems.

Comment 10 Dan Winship 2011-03-29 19:29:19 UTC

Created attachment 184626 [details] [review]
window: hack around a crash

If idle_calc_showing() gets called when its queue is empty (which
shouldn't happen), just return rather than crashing.

=======

Alternative one-line hack to prevent the crash, without worrying about
bugs in the larger patch (or the existence of other code somewhere else
that depends on the buggy behavior...)

Comment 11 Dan Winship 2011-03-29 19:32:07 UTC

Oh, and with either patch, you'll notice a bug in the workspace-fixing-up animation. This is caused by the lostWorkspaces handling at the end up WorkspacesView._updateWorkspaceActors, which tries to tween workspace.actor.y to "workspace.x" which, besides being the wrong coordinate, doesn't exist because it got removed in Alex's "don't make the workspace actors be the same aspect ratio as the screen" patch. I'm not sure what it should be replaced with...

Comment 12 Owen Taylor 2011-04-01 17:04:53 UTC

I think for this release, the short patch is much less likely to reveal unexpected other problems. In fact, I might go even further and make it a silent return, since the other queues just silently ignore empty lists. But I'm OK with the g_return_if_fail() since it does indicate a logic bug.

Subsequently we need to fix the real issue.

I think the workspace.x bug might be filed elsewhere? Know it was mentioned by alexl in some context.

Comment 13 Owen Taylor 2011-04-01 17:07:24 UTC

Review of attachment 184626 [details] [review]:

I'd like it if the subject was more specific

 window: don't crash on spurious calls to idle_calc_showing()

Say. Other than that, I'm OK with this pending r-t approval.

Comment 14 Dan Winship 2011-04-02 11:35:15 UTC

Comment on attachment 184626 [details] [review]
window: hack around a crash

pushed the hacky patch, with the better commit message

Comment 15 Dan Winship 2011-05-16 10:35:43 UTC

poke

Comment 16 Colin Walters 2011-05-16 21:24:21 UTC

Oh interesting; this patch probably explains a weird crash I saw (somewhere, I forget) where _pendingAppIds was null in appDisplay.js, and I couldn't see where that was happening.  But it would make sense if the meta_later_remove call was a no-op.

Comment 17 Owen Taylor 2011-07-09 04:13:31 UTC

Review of attachment 184625 [details] [review]:

Generally, definitely good to fix this up

::: src/core/util.c
@@ +719,3 @@
+    {
+      later->notify (later->data);
+      later->notify = NULL;

Don't think this should happen until the refcount goes to zero - freeing the data of a callback currently being executed is dangerous; but you need to clear ->func here or otherwise mark the Later as destroyed since otherwise laters from laters_copy could be invoked after they are destroyed.

@@ +774,3 @@
         }
       else
+        meta_later_remove (later->id);

don't love the o(n^2) for accumulating a lot of one-shot laters, but simplicity is worth a lot, so I think the copy approach is probably better than something more complex along the lines of GHookList where we are careful to not mess up our iteration.

(I was already taking an easy way out by just removing everything and adding it back in - since that ended up being appends, it was also O(n^2), just O(n^2) in the number of non-one-shots rather than the number of one-shots)

@@ +811,2 @@
       destroy_later (later);
+      unref_later (later);

rather than adding an unref_later after every destroy_later, shouldn't destroy_later remove the initial reference itself?

And that implies that unref_later() shouldn't call destroy_later() - the only way we should ever get to a zero reference count is if destroy_later() has already been called.

Comment 18 Dan Winship 2011-07-12 17:43:40 UTC

(In reply to comment #17)
> don't love the o(n^2) for accumulating a lot of one-shot laters, but simplicity
> is worth a lot, so I think the copy approach is probably better than something
> more complex along the lines of GHookList where we are careful to not mess up
> our iteration.

So given that we're not using META_LATER_IDLE or META_LATER_RESIZE anywhere, I was going to suggest just getting rid of the later machinery altogether and just using clutter_threads_add_repaint_func() directly. Except that the repaint_func methods have exactly the same problem with callbacks-modifying-the-list as meta_later does/did.

Maybe glib needs a GModifiableWhileIteratingList type... I know evolution and libsoup both define one as well...

Also, why do we need to start a ClutterTimeline? If all we care about is that the BEFORE_REDRAW later runs before the next redraw, then if the clock isn't running, can't we just wait until it starts again before running? Or if we want BEFORE_REDRAW to mean "asap, and in particular before a redraw" (which I guess we probably do), then we could handle it the way the code currently handles META_LATER_RESIZE.

Comment 19 Dan Winship 2011-07-12 17:47:51 UTC

Created attachment 191828 [details] [review]
util: fix a reentrancy problem with meta_later

fixes from review

Comment 20 Owen Taylor 2011-07-12 19:25:03 UTC

(In reply to comment #18)
> (In reply to comment #17)
> > don't love the o(n^2) for accumulating a lot of one-shot laters, but simplicity
> > is worth a lot, so I think the copy approach is probably better than something
> > more complex along the lines of GHookList where we are careful to not mess up
> > our iteration.
> 
> So given that we're not using META_LATER_IDLE or META_LATER_RESIZE anywhere, I
> was going to suggest just getting rid of the later machinery altogether and
> just using clutter_threads_add_repaint_func() directly. Except that the
> repaint_func methods have exactly the same problem with
> callbacks-modifying-the-list as meta_later does/did.
> 
> Maybe glib needs a GModifiableWhileIteratingList type... I know evolution and
> libsoup both define one as well...

GHookList was meant to be an abstraction in that area. Complete API disaster. With a lot of hands-on help from Tim Janik I was able to figure out how to use it from GMainLoop, but we shortly after dumped it and wrote the GMainLoop code from scratch. Not sure if a better abstraction would be possible.

> Also, why do we need to start a ClutterTimeline? If all we care about is that
> the BEFORE_REDRAW later runs before the next redraw, then if the clock isn't
> running, can't we just wait until it starts again before running? Or if we want
> BEFORE_REDRAW to mean "asap, and in particular before a redraw" (which I guess
> we probably do), then we could handle it the way the code currently handles
> META_LATER_RESIZE.

BEFORE_REDRAW means "after finishing all the other stuff (like event processing) that you would do before redraw". This is basically universally what you want for any sort of batching. Just occasionally (like the code to bind texture pixmaps) you might mean "before the next redraw, whenever that is", but that's rare.

So that's the big thing that we're getting over clutter_threads_add_repaint_func() - we have something that acts like g_idle_add() and gets run deterministically, rather than an awkward abstraction that not only isn't reentrant, but you also have to force to run.

(The reason to use a timeline over clutter_actor_queue_redraw(stage) or whatever, is that that clutter_actor_queue_redraw(stage) would invalidate the whole stage, while we should draw only the actual damaged area, and redraw nothing if nothing gets damaged.)

META_LATER_RESIZE is actually used in window.c. META_LATER_IDLE seemed useful, but I later figured out that "at idle" means "however much later you want" - especially before the clipped and obscured-stacking optimizations it was very possible for a redrawing client to keep mutter 100% occupied and never to go fully idle.


- Owen

Comment 21 Owen Taylor 2011-07-12 19:48:23 UTC

Review of attachment 191828 [details] [review]:

Looks good to me

Comment 22 Dan Winship 2011-07-13 14:40:46 UTC

Attachment 191828 [details] pushed as cd7a968 - util: fix a reentrancy problem with meta_later