After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 653833 - Gnome shell hangs after fast user switch [intel i915]
Gnome shell hangs after fast user switch [intel i915]
Status: RESOLVED FIXED
Product: gnome-shell
Classification: Core
Component: general
3.0.x
Other Linux
: High critical
: ---
Assigned To: gnome-shell-maint
gnome-shell-maint
Depends on:
Blocks:
 
 
Reported: 2011-07-01 17:24 UTC by Steeve McCauley
Modified: 2012-03-19 03:00 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
tweener: clean up some dead code (1.99 KB, patch)
2012-02-13 03:41 UTC, Ray Strode [halfline]
none Details | Review
tweener: don't blow up after 16 minutes of blocked animations (4.28 KB, patch)
2012-02-13 03:41 UTC, Ray Strode [halfline]
none Details | Review
tweener: use one ticker for each tween (5.72 KB, patch)
2012-02-16 01:08 UTC, Ray Strode [halfline]
none Details | Review
tweener: adapt to new gjs tweener api (2.31 KB, patch)
2012-02-16 01:09 UTC, Ray Strode [halfline]
none Details | Review
tweener: make timeline loop indefinitely (3.53 KB, patch)
2012-02-16 18:44 UTC, Ray Strode [halfline]
committed Details | Review

Description Steeve McCauley 2011-07-01 17:24:06 UTC
After swtiching users (either with Ctrl+Alt+F? or the Switch user menu item) gnome shell sometimes hangs (clicking on the User menu displays an underline, but the menu does not appear).  Alt+F2 then 'r' <Enter> restarts the shell most of the time, occasionally I am forced to do a Ctrl+Alt+Backspace to restart X.

This is the info about my graphics driver from lspci,

00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 12) (prog-if 00 [VGA controller])
        Subsystem: Acer Incorporated [ALI] Device 0297
        Flags: bus master, fast devsel, latency 0, IRQ 43
        Memory at fb800000 (64-bit, non-prefetchable) [size=4M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at ec00 [size=8]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915
Comment 1 Rui Matos 2011-07-01 17:31:31 UTC
Might be related to bug 651378 ?
Comment 2 Steeve McCauley 2011-07-01 17:59:02 UTC
Not sure if it would be related to 651378, as it never recovers.

I did see this in the .xsession-errors file,

(gnome-shell:14421): Clutter-CRITICAL **: clutter_actor_queue_relayout: assertion `CLUTTER_IS_ACTOR (self)' failed
** Message: Error: Stream contains no data.
gsttypefindelement.c(954): gst_type_find_element_activate (): /GstPlayBin2:play/GstURIDecodeBin:uridecodebin0/GstDecodeBin2:decodebin20/GstTypeFindElement:typefind:
Can't typefind empty stream
Comment 3 Dave Airlie 2011-11-17 20:17:08 UTC
I think we worked this out to being a 16-minute animation timeout, if you are seeing the same issue.

/usr/share/gnome-shell/js/ui/tweener.js has a 1000*1000 number in it, try bumping t that up a long way and see if it reoccurs.
Comment 4 hanckmann 2012-01-02 18:55:37 UTC
I experience the same issue.
Can anyone confirm if the proposed solution is valid? Will this be permanently fixed in future releases?
Comment 5 hanckmann 2012-01-02 18:57:59 UTC
(I forgot to mention:)
And if the solutions works, then please describe it in a more elaborate way. I did not manage make the change.
Comment 6 Milan Bouchet-Valat 2012-01-04 14:44:55 UTC
Just open /usr/share/gnome-shell/js/ui/tweener.js using a text editor as root, and replace "1000*1000" with e.g. "2", and check this fixes the bug (after restarting the Shell).

But Dave, why do you think that's the problem?
Comment 7 Dave Airlie 2012-01-04 14:50:38 UTC
Because myself and Ray worked it out one day on irc.

The timeout occurs while gnome-shell is vt switched and is blocked from rendering, something gets badly stuck, you want to change it to a much larger number just to make it take days instead.

Ray said he'd fix it someday :-)
Comment 8 Milan Bouchet-Valat 2012-01-04 14:52:47 UTC
Ah, so my explanation is wrong, it should be "replace 1000*1000 with 10000000000". Thanks for the details.
Comment 9 Ray Strode [halfline] 2012-01-19 18:02:51 UTC
yea, i gotta look into this...
Comment 10 Ray Strode [halfline] 2012-02-13 03:40:31 UTC
Someone else hit this today so I took a quick look at this.  I wrote a patch that I think should fix it, though, I haven't tried to reproduce the problem yet to confirm the patch works.  I'll post the patch now, so I don't lose it, and test it later.
Comment 11 Ray Strode [halfline] 2012-02-13 03:41:42 UTC
Created attachment 207426 [details] [review]
tweener: clean up some dead code

The tweener code maintains a _startTime property that
it no long uses and also has a lingering comment about
a currentTime property that has long since been removed.

This commit just mops up that remnant cruft.
Comment 12 Ray Strode [halfline] 2012-02-13 03:41:45 UTC
Created attachment 207427 [details] [review]
tweener: don't blow up after 16 minutes of blocked animations

Clutter.Timeline requires a duration be passed to it at construct
time.  Once that duration has elapsed, the timeline completes
and any associated animations stop.

Tweener uses a Clutter.Timeline, but doesn't really care about this
duration.  It has its own logic for determining when to stop the
animations it's driving.  The timeline object is merely being used as
a source for getting regular "ticks" to know when to draw the next
frame. As far as the tweener code is concerned, the only requirement
on duration is that it is at least as long as the animations being
tweened.

Given the specific duration of the timeline isn't that
important, it was arbitrarily picked to be a 1000 seconds. Under
normal circumstances, animations should be less than a second, so
this is a more than reasonable upper bound.

There are circumstances where the timeline can be blocked
longer than 1000 seconds, however.  For instance, if the user's
laptop is suspended, it may stay suspended indefinitely long.
Likewise, if the machine is fast user switched to another X server,
the DRI lock will prevent clutter from drawing until the user
swiches back. In either of these cases, the clutter timeline may
stop ticking before the tweener animation completes and the shell
will stall forever waiting for its pending animation to finish.

This commit gives the tweener code one last tick in the event the
timeline expires, so that it can finish off any pending animations.
Comment 13 Mantas Mikulėnas (grawity) 2012-02-14 15:25:58 UTC
I seem to have the same issue with gnome-shell 3.2.2.1 ("Gallium 0.4 on AMD CEDAR", ATI Radeon). It does not hang forever, though; it recovers after 20-30 seconds.

Applying patches #11 & #12 did not fix the issue.

Am I hitting the same bug or a different one?
Comment 14 Mantas Mikulėnas (grawity) 2012-02-14 15:27:47 UTC
(Forgot to include: this happens whenever I switch VTs or suspend the system; and after returning gnome-shell *always* hangs, no exceptions.)
Comment 15 Ray Strode [halfline] 2012-02-14 16:57:29 UTC
Sounds different.  comment 13 almost sounds like bug 651378 but that's fixed I think.
Comment 16 Ray Strode [halfline] 2012-02-14 17:05:34 UTC
Comment on attachment 207427 [details] [review]
tweener: don't blow up after 16 minutes of blocked animations

<airlied> halfline: just got hung gdm login on f-u-s with new package
<halfline> airlied: bummer
<halfline> airlied: was this after more than 16 minutes ?
<airlied> halfline: yeah it was definitely > 16 mins
<airlied> I switched user a few hours ago
<airlied> then went to switch again
<halfline> okay
<halfline> i'll have to reproduce artificially with a small timeout and see why the patch isn't enough
Comment 17 Ray Strode [halfline] 2012-02-14 20:46:05 UTC
So, it's not enough to give one last tick, because animations can always get queued up right before the timeline expires, in which case they won't know they need to complete when the timeline ends.  Really ClutterTimeline isn't a good fit for tweener as it currently stands, but it's pretty important to use, since it's synchronized with the clutter paint loop.  So we have a few choices that I can think of:

1) Fix up ClutterTimeline to fit the needs of tweener better
2) Fix up tweener.js in gjs to deal with ClutterTimeline's finite duration
3) Fix up tweener.js in gnome-shell to have a UINT_MAX timeout.

Of the three choices 3) is by far the easiest, but it doesn't really fix the bug, just makes it only affect people who resume after leaving their laptops suspended for 2 months.

I'm not sure 1) makes sense, given clutter has its own builtin tweener like api, which works fine with cluttertimeline.

So that leaves 2) and what i'm currently leaning toward.
Comment 18 Owen Taylor 2012-02-15 15:08:52 UTC
Have you considered using a looping ClutterTimeline to get called at the right time, but ignore the time parameter, and use GLib.get_monotonic.time() as the time source?
Comment 19 Ray Strode [halfline] 2012-02-16 00:47:45 UTC
(In reply to comment #18)
> Have you considered using a looping ClutterTimeline to get called at the right
> time, but ignore the time parameter, and use GLib.get_monotonic.time() as the
> time source?

Yea, Jasper and I talked about that as an option.
Comment 20 Ray Strode [halfline] 2012-02-16 00:50:13 UTC
(In reply to comment #13)
> I seem to have the same issue with gnome-shell 3.2.2.1 ("Gallium 0.4 on AMD
> CEDAR", ATI Radeon). It does not hang forever, though; it recovers after 20-30
> seconds.
> 
> Applying patches #11 & #12 did not fix the issue.
> 
> Am I hitting the same bug or a different one?

I started running into this too.  It's the X server getting stuck spawning xkbcomp a bunch of times (like thousands) when first entering the vt.

I started to look into it with gdb but gave up patience after an hour or so of fighting the debugger (which got really confused by the X server's weird builtin scheduler thing)
Comment 21 Ray Strode [halfline] 2012-02-16 01:08:37 UTC
Created attachment 207712 [details] [review]
tweener: use one ticker for each tween

Tweener tries to keep one global ticker for all
animations, so that animations share a common
clock, I guess.

This isn't important in practice though, since
the ticker is normally replaced (by gnome-shell at least)
with one that uses a clutter timeline, and all clutter
timelines are sync'd to a common master clock.

Furthermore, having one ticker poses a problem. The
ticker based on clutter's timeline object have a fixed
duration and so, animations queued toward the end of a
ticker's lifetime can overflow past that ticker's life. This
can only happen in practice, if the ticker is frozen for a
while because the system is suspended or the user is VT
switched away.

This commit changes tweener to have one ticker per tween,
making the ticker shipped with the shell be more compatible
with us.
Comment 22 Ray Strode [halfline] 2012-02-16 01:09:13 UTC
Created attachment 207713 [details] [review]
tweener: adapt to new gjs tweener api

gjs now lets us have one ticker per animation.
This fixes the impedance mismatch we currently
have between tweener and Clutter.Timeline.
Comment 23 Ray Strode [halfline] 2012-02-16 01:10:52 UTC
attachment 207712 [details] [review] is for gjs and attachment 207713 [details] [review] builds on the earlier shell patches (which are still needed).
Comment 24 Ray Strode [halfline] 2012-02-16 01:16:05 UTC
the above two patches didn't take very long to do, and I don't think the proposed solution in comment 18 will take very long either, so I may go ahead and give it a try too so we can see which one we ultimately like better.
Comment 25 Ray Strode [halfline] 2012-02-16 16:58:15 UTC
Just for easy reference, Mantas filed bug 670164 to track the xkbcomp / vt switch freeze up shenangins
Comment 26 Ray Strode [halfline] 2012-02-16 18:44:01 UTC
Created attachment 207802 [details] [review]
tweener: make timeline loop indefinitely

Tweener uses a clutter timeline to manage all
active animations running at a given moment.
The timeline is mopped up when no animations
are going any more.

Clutter requires timelines to have a finite
duration, but since animations can happen at
any moment, no fixed duration can accomodate
the shell's needs.

To combat this problem, we pick a relatively
long duration: 10 seconds.  No string of animations
should take that long, so that should be good
enough in practice.

Unfortunately, this tactic fails when the user
suspends their machine, or VT switches.  A tween
can take much longer than 10 seconds to complete
in those cases and bad things will happen.

This commit changes the clutter timeline to
automatically loop back to 0, so that, despite,
it's fixed duration property, it effectively never
stops. Since the timeline loops, its
concept of elapsed time no longer increases
monotonically, so we now ignore it and track time
ourselves with GLib.get_monotonic_time().

This partially reverts commit
35764fa09e4341e79732409c4e74c226d19f780f.
Comment 27 Ray Strode [halfline] 2012-02-16 18:46:48 UTC
Okay attachment 207802 [details] [review] is the other approach.  Both seem to work okay here.

(I artificially test this by changing the timeline duration to 1.5 seconds, then starting an overview animation and quickly changing VTs, waiting, and changing back).

attachment 207802 [details] [review] has the advantage of being only 7 lines + comments, the other attachments have the advantage of being seemingly more "right" to me.

Owen, thoughts?
Comment 28 Ray Strode [halfline] 2012-03-16 17:15:27 UTC
Everywhere I wrote "10 seconds" in comment 26 I meant "16 minutes" i think rereading it.
Comment 29 Colin Walters 2012-03-17 18:21:01 UTC
Review of attachment 207802 [details] [review]:

"...so that despite its fixed duration..."

Overall makes sense to me, just some minor comments.

::: js/ui/tweener.js
@@ +241,3 @@
         // animation then only do frame dropping from there.
         if (this._startTime < 0)
+            this._startTime = GLib.get_monotonic_time();

Note that GLib.get_monotonic_time() returns microseconds, whereas Clutter.TimeLine.get_elapsed_time() returns milliseconds.

One unfortunate issue with this is how we handle 64 bit values in JS is we just convert them to regular JS numbers, i.e. doubles.  That's probably OK in this case with monotonic time because we only run into precision issues after 2^53 microseconds or 272.38 years.

@@ +253,2 @@
     getTime : function() {
+        return this._currentTime;

And so now we're making this function return microseconds too.  Nothing appears to actually call it right now, but, maybe we should divide by 1000?
Comment 30 Ray Strode [halfline] 2012-03-18 00:42:35 UTC
(In reply to comment #29)
> Review of attachment 207802 [details] [review]:
> 
> "...so that despite its fixed duration..."
> 
> Overall makes sense to me, just some minor comments.
So you like this better than the four patch approach above it?
 
> ::: js/ui/tweener.js
> @@ +241,3 @@
>          // animation then only do frame dropping from there.
>          if (this._startTime < 0)
> +            this._startTime = GLib.get_monotonic_time();
> 
> Note that GLib.get_monotonic_time() returns microseconds, whereas
> Clutter.TimeLine.get_elapsed_time() returns milliseconds.
> 
> One unfortunate issue with this is how we handle 64 bit values in JS is we just
> convert them to regular JS numbers, i.e. doubles.  That's probably OK in this
> case with monotonic time because we only run into precision issues after 2^53
> microseconds or 272.38 years.
Okay, the precision probably doesn't matter too much as long as it increments linearly, strictly monotonic, and at least precise enough to capture 60fps.
> @@ +253,2 @@
>      getTime : function() {
> +        return this._currentTime;
> 
> And so now we're making this function return microseconds too.  Nothing appears
> to actually call it right now, but, maybe we should divide by 1000?
It's called by tweener.js in gjs.
Comment 31 Ray Strode [halfline] 2012-03-19 02:40:15 UTC
Okay, I'm going to go ahead and push attachment 207802 [details] [review] (with the proposed changes).  I don't think we want this bug in 3.4 since it's pretty visible for users who suspend or user switch and the freeze is tomorrow.

I actually prefer the 4 patch approach over the one patch approach, but the one patch approach is more conservative than the other patches would be (doesn't create changes across multiple modules, doesn't change gjs api, etc).
Comment 32 Ray Strode [halfline] 2012-03-19 02:59:58 UTC
Attachment 207802 [details] pushed as 207abe9 - tweener: make timeline loop indefinitely