Bug 692706 - Frequent crash in cally_stage_notify_key_focus_cb
Status: RESOLVED FIXED
Product: clutter
Classification: Platform
Component: cally
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: clutter-maint
QA Contact: clutter-maint
Duplicates: 684799
Depends on:
Blocks:
Reported: 2013-01-28 13:29 UTC by Matthias Clasen
Modified: 2013-05-17 15:14 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Non tested patch (1.46 KB, patch)
2013-01-28 18:53 UTC, Alejandro Piñeiro Iglesias (IRC: infapi00)
Patch status: none

gnome-shell crashing with the described strategy (1.01 MB, video/ogg)
2013-02-21 22:38 UTC, Pedro F.

Use a weak pointer to hold the key focus in CallyStage (1.86 KB, patch)
2013-05-06 22:48 UTC, Emmanuele Bassi (:ebassi)
Patch status: committed

Description Matthias Clasen 2013-01-28 13:29:55 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=827158
which has quite a few duplicates

The stacktrace is here: https://bugzilla.redhat.com/attachment.cgi?id=588172
Comment 1 Alejandro Piñeiro Iglesias (IRC: infapi00) 2013-01-28 18:52:28 UTC
(In reply to comment #0)
> See https://bugzilla.redhat.com/show_bug.cgi?id=827158
> which has quite a few duplicates
> 
> The stacktrace is here: https://bugzilla.redhat.com/attachment.cgi?id=588172

Judging from the backtrace, the problem seems to be related to using an invalid accessible object. I say invalid because it seems to be non-NULL: if it were just NULL, the g_return_if_fail (ATK_IS_OBJECT (accessible)) check in atk_object_notify_state_change would still catch it.

A tentative theory is that cally-stage uses the key-focus ClutterActor after it has been disposed (and therefore also gets an invalid accessible object from it), since the code just stores a plain pointer. One option would be to add a reference or a weak reference to work around that. I will upload a patch implementing this. In any case, to confirm the theory I first need to be able to reproduce the crash. Judging from the Red Hat Bugzilla report, it seems really common, but other people I know who use that distro don't have so many problems. A wild guess: this could be related to extensions. Some of the people on the bug report use extensions, and I tend not to.
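The dangling-pointer theory and the weak-reference workaround can be pictured with a toy model in plain C. Everything below (`Object`, `WeakNotify`, `object_weak_ref`, `object_free`, `Stage`, `stage_track_key_focus`) is hypothetical: it only mimics the idea behind GLib's `g_object_weak_ref()` and is neither the GObject implementation nor the attached patch. The stage keeps a non-owning pointer to the focused actor, and the actor runs a callback just before it is freed so the stage can forget it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model, NOT GLib: an object that calls a "weak notify" callback
 * just before it is freed, in the spirit of g_object_weak_ref(). */
typedef struct Object Object;
typedef void (*WeakNotify) (void *data, Object *where_the_object_was);

struct Object {
  WeakNotify notify;   /* at most one weak reference in this sketch */
  void *notify_data;
};

static Object *object_new (void)
{
  return calloc (1, sizeof (Object));
}

static void object_weak_ref (Object *obj, WeakNotify notify, void *data)
{
  assert (obj != NULL && notify != NULL);
  obj->notify = notify;
  obj->notify_data = data;
}

static void object_free (Object *obj)
{
  if (obj->notify != NULL)
    obj->notify (obj->notify_data, obj); /* obj is still valid here */
  free (obj);
}

/* Hypothetical stand-in for CallyStage: it stores the focused actor as a
 * plain, non-owning pointer. Assumes the stage outlives the actors. */
typedef struct Stage {
  Object *key_focus;
} Stage;

/* Weak-notify callback: forget the focus once that actor is gone. The
 * equality check keeps a stale registration on an old actor from
 * clearing a newer focus. */
static void key_focus_destroyed (void *data, Object *where_the_object_was)
{
  Stage *stage = data;
  if (stage->key_focus == where_the_object_was)
    stage->key_focus = NULL;
}

static void stage_track_key_focus (Stage *stage, Object *focus)
{
  stage->key_focus = focus;
  if (focus != NULL)
    object_weak_ref (focus, key_focus_destroyed, stage);
}
```

Without the weak reference, `stage.key_focus` would still hold the freed address after `object_free()`, and a later focus notification would read freed memory. A guard like `g_return_if_fail (ATK_IS_OBJECT (accessible))` only reliably rejects NULL, not a stale non-NULL address, which matches the symptom described above.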
Comment 2 Alejandro Piñeiro Iglesias (IRC: infapi00) 2013-01-28 18:53:55 UTC
Created attachment 234642
Non tested patch

Untested patch based on a tentative theory (see previous comment). As soon as I can reproduce this bug with specific steps, I will confirm whether this patch solves the problem. Uploading it in case someone wants to test it.
Comment 3 Pedro F. 2013-01-29 05:28:11 UTC
I'm one of the reporters. I've given up on gnome-shell extensions in general because of the crashes, and gnome-shell still crashes, so I'd say it's not related to extensions.

On extensions.gnome.org, none is enabled. GSettings still has a value, but I'm guessing it's a leftover:
$ gsettings get org.gnome.shell enabled-extensions
['Analog_Clock@l300lvl.co.nr']
Comment 4 Alejandro Piñeiro Iglesias (IRC: infapi00) 2013-01-29 10:08:59 UTC
(In reply to comment #3)
> I'm one of the reporters; I've given up on gnome-shell extensions in general
> because of the crashes and gnome-shell still crashes, so I'd say it's not
> related to extensions.

Ok, thanks for the feedback.
Comment 5 Joanmarie Diggs (IRC: joanie) 2013-01-29 13:11:29 UTC
(In reply to comment #3)
> I'm one of the reporters; I've given up on gnome-shell extensions in general
> because of the crashes and gnome-shell still crashes, so I'd say it's not
> related to extensions.

Does it still crash with this particular stack trace? If so, I would love the steps to reproduce the crash.
Comment 6 Pedro F. 2013-01-30 22:29:14 UTC
I cannot give you exact steps.
What usually happens is that I get the feeling I am "too fast" for gnome-shell.

For example, I was able to crash it again this way (not reproducible):
1- have 3 programs open (Firefox, Google Chrome, Terminal)
2- do ALT+TAB, TAB, TAB (for example)
3- "something" happens: I am no longer pressing ALT+TAB but the window switcher does not disappear; no window switch happens;
4- I select the desired window from the window list with the mouse
5- gnome-shell crashes.

As you can see, step 3 is undefined. I don't know what triggers it.
It is as if keypresses are only partially handled, gnome-shell gets into an undefined state, and it crashes afterwards.


For context on why I put it that way: the _first few times_ I tried gnome-shell 3.6, I would sometimes reach an inconsistent state in which, for example, the ALT+F2 box would never disappear no matter what I did; I had to run killall -HUP gnome-shell.
It hasn't happened recently, and it didn't (always?) crash the shell, so I have no backtrace of that particular situation :(
Comment 7 Pedro F. 2013-02-18 23:09:36 UTC
Try:
$ while [ 1 ]; do notify-send --expire-time=1000 --hint=int:transient:1 test ; sleep 1; done

Then close the notification by clicking its 'x', once or twice, or three or more times. Just keep at it. I usually trigger a crash in less than ten seconds.

Credit for the command line goes to this unrelated issue: https://ask.fedoraproject.org/question/575/notify-send-ignores-timeout .
Comment 8 Alejandro Piñeiro Iglesias (IRC: infapi00) 2013-02-19 12:07:36 UTC
(In reply to comment #7)
> Try:
> $ while [ 1 ]; do notify-send --expire-time=1000 --hint=int:transient:1 test ; sleep 1; done
> 
> And close clicking on 'x' the notification once or twice. Or thrice or more.
> Just insist. I usually trigger a crash in less than ten seconds.
> 
> Credit for the command-line to, due to a unrelated issue:
> https://ask.fedoraproject.org/question/575/notify-send-ignores-timeout .

Ok, thanks for this script.

FWIW, we tried to reproduce this bug over the previous weeks without luck. Let's see if we are luckier this time with the script.
Comment 9 Alejandro Piñeiro Iglesias (IRC: infapi00) 2013-02-21 16:37:39 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > Try:
> > $ while [ 1 ]; do notify-send --expire-time=1000 --hint=int:transient:1 test ;
> > sleep 1; done
> > 
> > And close clicking on 'x' the notification once or twice. Or thrice or more.
> > Just insist. I usually trigger a crash in less than ten seconds.
> > 
> > Credit for the command-line to, due to a unrelated issue:
> > https://ask.fedoraproject.org/question/575/notify-send-ignores-timeout .
> 
> Ok, thanks for this script.
> 
> FWIW, we were trying to reproduce this bug on the previous weeks without luck.
> Lets see if we are luckier this time with the script.

I just tried this script with gnome-shell from master. Although I kept at it for about 5 minutes, I was not able to reproduce the crash. The same goes for the gnome-shell installed on my system (Ubuntu).

Could someone with the specific Fedora version try it (since this was initially reported as a downstream bug)?
Comment 10 Pedro F. 2013-02-21 16:47:23 UTC
If you want to try on a Fedora system, here is what I have found out (these steps were attempted on Tuesday, 19 Feb):

Using the Fedora 18 x86 Live CD, the above script does _not_ trigger the crash (gnome-shell-3.6.2-something).

Using the Fedora 18 x86 Live CD, after running `sudo yum install gnome-shell` and logging out and back in, the above script _does_ trigger the crash (gnome-shell-3.6.3-something; I'm not on Fedora now, so I can't be precise).

I don't think this makes sense, since this was supposed to be a yet-to-be-squashed bug, but it appears to be one that showed up after other bugs were fixed.


Anyway, here is a more precise way to trigger the crash:

$ notify-send --expire-time=1000 --hint=int:transient:1 test ; notify-send --expire-time=1000 --hint=int:transient:1 test ;

You _have_ to click the 'x' on both notifications. As soon as the pointer hovers over the 'x' on the second notification, gnome-shell crashes.
Comment 11 Joanmarie Diggs (IRC: joanie) 2013-02-21 17:39:11 UTC
> You _have_ to click on the 'x' on both notifications. As soon as the 'x' is
> hovered on the second notification, gnome-shell will crash.

Does it need to be a live CD? I tried it on my running F18 with gnome-shell 3.6.3: I clicked the 'x' on both notifications, but there was no crash.
Comment 12 Pedro F. 2013-02-21 22:38:48 UTC
Created attachment 237121
gnome-shell crashing with the described strategy

The 'x' didn't appear on the second notification, so I had to run the command twice.

I did _not_ click on the 'x' the second time it appeared.

The only reasons I can think of for it not crashing on joanie's PC are:
1) Architecture? Most devs tend to use amd64; I use x86.
2) Previously installed, unrelated libraries unexpectedly affecting the current ones. See RHBZ #817841 for an example of such weird interactions: https://bugzilla.redhat.com/show_bug.cgi?id=817841 (though I never thought I would hit that kind of weird bug again anytime soon).
Comment 13 Kalev Lember 2013-02-25 12:41:59 UTC
According to the Fedora retrace server, it does indeed look like an i686-only crash.

3080 crashes on i686 and 0 on x86_64: https://retrace.fedoraproject.org/faf/problems/20295/
Comment 14 Pedro F. 2013-03-09 21:53:10 UTC
Still happens on gnome-shell-3.6.3.1-1.fc18.i686 .
Comment 15 Pedro F. 2013-03-12 08:23:10 UTC
It also happens on the development branch: gnome-shell-3.7.90-1.fc19.i686 (using GNOME-3.7.90.iso)
Comment 16 Fabio Durán Verdugo 2013-04-07 16:00:25 UTC
*** Bug 684799 has been marked as a duplicate of this bug. ***
Comment 17 Pedro F. 2013-05-05 22:46:00 UTC
And also on gnome-shell-3.8.1-1.fc19.i686 .

Do you want me to keep reporting at each minor version if it still occurs?
Comment 18 Emmanuele Bassi (:ebassi) 2013-05-06 17:58:44 UTC
even if we're discounting the idea of extensions, can anybody experiencing this crash test the attached patch and see if it fixes the issue?

the stack trace in https://bugzilla.redhat.com/show_bug.cgi?id=827158 seems to be half-decent, even if it has far too many optimized-out values.
Comment 19 Kalev Lember 2013-05-06 18:52:22 UTC
I have done a test build for Fedora 19 with the patch applied. Can anybody who's experiencing this crash please give it a try?

Direct link to the patched rpm:
http://kojipkgs.fedoraproject.org//work/tasks/6336/5336336/clutter-1.14.2-1.fc19.kalev0.i686.rpm

... or if you need other subpackages:
http://koji.fedoraproject.org/koji/taskinfo?taskID=5336334

(The links will be valid only for a few days before getting garbage collected by koji)
Comment 20 Pedro F. 2013-05-06 22:28:33 UTC
With clutter-1.14.2-1.fc19.kalev0.i686.rpm the crash does NOT occur :)
Comment 21 Emmanuele Bassi (:ebassi) 2013-05-06 22:39:43 UTC
Review of attachment 234642:

::: clutter/cally/cally-stage.c
@@ +166,3 @@
    */
   self->priv->key_focus = key_focus;
+  g_object_weak_ref (G_OBJECT (key_focus),

I guess this could be replaced by a simpler:

  g_object_add_weak_pointer (G_OBJECT (key_focus), (gpointer *) &(self->priv->key_focus));

without requiring a full function.
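One detail the one-line suggestion leaves implicit: when the key focus moves, the weak pointer registered on the previously focused actor has to be unregistered (GLib provides `g_object_remove_weak_pointer()` for this). Otherwise, destroying the old actor later would reset a field that by then refers to the new focus. The toy model below is plain C with hypothetical names (`Object`, `Stage`, `stage_set_key_focus`); it is not GLib and not the committed patch, only a sketch of the add/remove pairing on a focus change, assuming each object has a single weak-pointer slot.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model, NOT GLib: an object with a single weak-pointer slot,
 * enough to show why the slot must be unregistered on focus change. */
typedef struct Object {
  struct Object **weak_location; /* NULL when nobody is registered */
} Object;

static Object *object_new (void)
{
  return calloc (1, sizeof (Object));
}

/* Destroy obj, first resetting the registered pointer (if any) to NULL. */
static void object_free (Object *obj)
{
  assert (obj != NULL);
  if (obj->weak_location != NULL)
    *obj->weak_location = NULL;
  free (obj);
}

/* Hypothetical stand-in for CallyStage. */
typedef struct Stage {
  Object *key_focus; /* weak: the stage does not own the focused actor */
} Stage;

/* Move the key focus: unregister from the old actor, then register on the
 * new one, so each actor only ever clears a pointer that refers to it. */
static void stage_set_key_focus (Stage *stage, Object *focus)
{
  if (stage->key_focus == focus)
    return;
  if (stage->key_focus != NULL)
    stage->key_focus->weak_location = NULL;   /* cf. g_object_remove_weak_pointer() */
  stage->key_focus = focus;
  if (focus != NULL)
    focus->weak_location = &stage->key_focus; /* cf. g_object_add_weak_pointer() */
}
```

Compared with the weak-reference approach in the first patch, the weak pointer needs no separate callback function: the object itself clears the registered location, which is the simplification suggested in the review.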
Comment 22 Emmanuele Bassi (:ebassi) 2013-05-06 22:48:34 UTC
Created attachment 243441
Use a weak pointer to hold the key focus in CallyStage

a slightly reworked patch that uses weak pointers; testing appreciated.
Comment 24 Pedro F. 2013-05-07 09:22:32 UTC
The crash also does _not_ happen with clutter-1.14.2-1.fc19.kalev1.i686.rpm :)
Comment 25 Alejandro Piñeiro Iglesias (IRC: infapi00) 2013-05-09 10:05:43 UTC
So we have a more mature patch (thanks to Emmanuele) that, after testing (thanks to Kalev and Pedro F.), seems to work fine. Any objection to committing the patch and closing the bug?
Comment 26 Kalev Lember 2013-05-17 14:23:33 UTC
Any thoughts on committing the patch? I'd be happy to backport it to both F18 and F19 if we can wrap this up upstream.

P.S. We are at 20836 reported crashes for F18, according to https://retrace.fedoraproject.org/faf/problems/20295/
Comment 27 Emmanuele Bassi (:ebassi) 2013-05-17 14:26:39 UTC
I don't have any objection to committing the patch; I was just waiting for API (Alejandro) to review it. :-)

I can push it to clutter-1.16 and clutter-1.14.
Comment 28 Emmanuele Bassi (:ebassi) 2013-05-17 14:34:21 UTC
okay, pushed.