After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 687364 - first entry to overview blocks compositor for a moment
first entry to overview blocks compositor for a moment
Status: RESOLVED OBSOLETE
Product: gnome-shell
Classification: Core
Component: general
3.6.x
Other Linux
: Normal normal
: ---
Assigned To: gnome-shell-maint
gnome-shell-maint
: 687849 691789 693446 (view as bug list)
Depends on:
Blocks: 687362
 
 
Reported: 2012-11-01 18:13 UTC by Simon McVittie
Modified: 2015-04-27 15:57 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
strace log while opening the overview for the first time (854.57 KB, text/plain)
2012-11-01 18:38 UTC, Simon McVittie
Details
strace from run 2 (659.69 KB, text/plain)
2012-11-05 15:31 UTC, Simon McVittie
Details
matching sysprof from run 2 (184.42 KB, application/octet-stream)
2012-11-05 15:32 UTC, Simon McVittie
Details
strace with patches from #687465 (529.50 KB, text/plain)
2012-11-12 19:00 UTC, Simon McVittie
Details
corresponding sysprof (777.20 KB, application/octet-stream)
2012-11-12 19:00 UTC, Simon McVittie
Details

Description Simon McVittie 2012-11-01 18:13:10 UTC
+++ This bug was initially created as a clone of Bug #687362 +++

I'm using an instrumented GLib with access()-based "trace points" similar to [ACCESS], with gnome-shell under strace - so my numbers are likely to be larger than they would be in real life.

The first time the Shell enters the overview, it does not get back to the GLib main loop for a moment (~1.6 seconds, with the instrumentation I mentioned), after performing all this I/O inside dispatch(). During this time, the compositor can't render frames.

Main thread:
* several X11 (?) round trips
* stat() on each icon directory (/usr/share/pixmaps/gnome,
  /usr/share/icons/gnome, /home/smcv/.icons/hicolor, etc.)
* starting several image-loading threads
* stat() on index.theme in some of the locations I mentioned
* mutex contention with the image-loading threads

Image-loading threads:
* actually loading the images, obviously
* mutex contention with each other and the main thread

[ACCESS]: http://people.gnome.org/~federico/news-2006-03.html#timeline-tools
Comment 1 Simon McVittie 2012-11-01 18:25:38 UTC
Some possible ways this could be addressed (some of these are probably unworkable, I'm erring on the side of "suggest it so people can reject it"):

* Make the application-chooser a separate process with a (transparent?) X11
  window, with a special case in compositing so it's only composited into
  the overview rather than being treated as an application window?
  (Would this mean that libgnome-menus could live in that process, and
  not in the main Shell process? I haven't dug into how gnome-menus is
  used yet.)

* stat() things in a secondary thread so these syscalls can't block the
  main loop?

* Put all icon loading on a thread (probably that same secondary thread?)
  and make it synchronous, rather than using the async GIO functions
  that end up in a generic thread pool, so there's less lock contention?

* Make a loading "wishlist" for the secondary thread and hand it off
  atomically, so there's only one unlock/lock handshake?

* Lazily load application icons in a background thread while the Shell
  is otherwise idle, so that opening the overview is quicker?
  (Not necessarily particularly useful: I would expect that typical Shell
  users would start their login session by opening the overview to look
  for a document or application anyway, at which point the Shell is
  waiting for the icons.)

* Eagerly load application icons while Shell starts up? (This directly
  conflicts with the goal of making Shell startup quicker.)

Bug #671406 is related: if as much I/O as possible was happening in a background thread or process, the compositor thread could display a spinner or something.
Comment 2 Simon McVittie 2012-11-01 18:38:30 UTC
Created attachment 227838 [details]
strace log while opening the overview for the first time

The format is according to strace -ttt -f:

{process (thread) ID} {timestamp} {syscall} = {result}

(I should have remembered to use -T as well, I'll do that next time.)

Process 1739 is the main thread.

"Process" 1768 is the GDBus worker thread.

access("START: ..."), access("END: ...") and access("MARK: ...") syscalls are from the GLib instrumentation I added in order to diagnose sources of blocking; they wouldn't happen "in real life".

You can see the main thread doing one dispatch() from lines 7 to 14, a second from lines 19 to 32, the one I'm concerned about from lines 37 to 1755, and some less eventful dispatch() calls after that.

The backtrace after line 1755 indicates where the offending GSource was added: the GSource itself is gdk_event_source_dispatch().

The ioctls on fd 10 are DRI, presumably. I think fd 6 is X11.

fd 8 is the GWakeup for the GDBus thread. fd 7 is certainly GDBus; fds 20, 21, 30, 37 are probably GDBus too.

fds 13, 25 are D-Bus in the main thread - probably the session and system buses for dbus-glib.
Comment 3 Simon McVittie 2012-11-05 15:31:39 UTC
Created attachment 228129 [details]
strace from run 2

Here's another run of essentially the same thing. I had no windows open and did not search for any applications; I did have the extra instrumentation in GLib, and both sysprof and strace attached to gnome-shell.

Profiling highlights (timings are non-cumulative):

6.58% additional_selector_matches_style()
5.43% add_matched_properties()
3.43% sel_matches_style_real()
3.43% st_theme_node_get_type()
Comment 4 Simon McVittie 2012-11-05 15:32:05 UTC
Created attachment 228130 [details]
matching sysprof from run 2
Comment 5 Simon McVittie 2012-11-12 18:59:01 UTC
(In reply to comment #0)
> The first time the Shell enters the overview, it does not get back to the GLib
> main loop for a moment (~1.6 seconds, with the instrumentation I mentioned)

With my patches from Bug #687465, this is down to 833ms.
Comment 6 Simon McVittie 2012-11-12 19:00:18 UTC
Created attachment 228805 [details]
strace with patches from #687465
Comment 7 Simon McVittie 2012-11-12 19:00:41 UTC
Created attachment 228806 [details]
corresponding sysprof
Comment 8 Simon McVittie 2012-11-13 17:41:56 UTC
(In reply to comment #0)
> I'm using an instrumented GLib with access()-based "trace points" similar to
> [ACCESS], with gnome-shell under strace - so my numbers are likely to be
> larger than they would be in real life.

There is indeed considerable tracing overhead there. Unfortunately, without the higher-overhead tracing, I can't see what bad things are going on in the main thread...

With more restrained instrumentation (the minimum needed to make GLib log long dispatch() calls as a g_warning(), no strace, no sysprof), opening the overview for the first time typically blocks for one long dispatch() and one even longer dispatch(), followed by 1-2 dispatch()es close to the 20ms reporting threshold.

Average of 3 runs each, without and with the patches from Bug #687465:

Long dispatch:   141ms (std.dev 3ms) down to  61ms (std.dev. 2ms)
Longer dispatch: 189ms (std.dev 4ms) down to 156ms (std.dev. 3ms)

(In reply to comment #1)
> * Put all icon loading on a thread (probably that same secondary thread?)
>   and make it synchronous, rather than using the async GIO functions
>   that end up in a generic thread pool, so there's less lock contention?

I tried this with the same profiling, and got a mean of 61, 157 ms (across 3 runs) for the first two long dispatches (std.dev: 2, 2 respectively). So it's the same timing as the thread-pool, or possibly fractionally slower, making this only worth doing if it simplifies the implementation (but, equally, not worth avoiding if it makes something else easier).
Comment 9 Jasper St. Pierre (not reading bugmail) 2012-11-27 23:51:41 UTC
*** Bug 687849 has been marked as a duplicate of this bug. ***
Comment 10 drago01 2013-02-14 21:28:30 UTC
*** Bug 693446 has been marked as a duplicate of this bug. ***
Comment 11 drago01 2013-03-03 19:16:40 UTC
*** Bug 691789 has been marked as a duplicate of this bug. ***
Comment 12 Emmanuel Touzery 2013-03-13 07:33:13 UTC
i otherwise think this was a regression in 3.6 compared to 3.4. i never saw that in 3.4 and i see it at every single login in 3.6.
Comment 13 Emmanuel Touzery 2013-07-11 06:32:13 UTC
This problem is much less pronounced, to the point that i can say it has disappeared for me in 3.8, on the same two computers on which I had a big problem in 3.6.
Comment 14 Jean-François Fortin Tam 2014-01-03 22:13:54 UTC
Simon, are you still seeing this in 3.10?
Comment 15 Simon McVittie 2014-01-06 11:48:39 UTC
Sorry, I'm still on Debian's Shell 3.8.4-5, and not actively working on this issue.

To reproduce: hard-disk based laptop, Alt+F2, "r", wait for hard disk activity to cease, press Windows key.

On 3.8, first entry to the overview does still seem to pause frame output for ~ half a second - I don't see the animated transition from normal window positions to their overview positions, just a pause in output followed by the windows abruptly "teleporting". The second and subsequent entries to the overview are noticeably smoother, with enough frame rate for the animated transition to look good.

Has something specific been done to improve this between 3.8 and 3.10, or are you hoping that it has improved "accidentally"?
Comment 16 Jasper St. Pierre (not reading bugmail) 2014-01-06 16:17:10 UTC
(In reply to comment #15)
> Has something specific been done to improve this between 3.8 and 3.10, or are
> you hoping that it has improved "accidentally"?

https://bugzilla.gnome.org/show_bug.cgi?id=712727
https://bugzilla.gnome.org/show_bug.cgi?id=712753
Comment 17 Simon McVittie 2015-04-27 15:57:02 UTC
Successive Shell versions generally get better, and I am not actively working on improving this. There is probably still room for improvement (the test I described on Comment #15 still doesn't animate smoothly the first time in Debian's Shell 3.14), but this bug doesn't have a whole lot of value without someone actively working on tracking down individual causes in a more current version; resolving OBSOLETE on that basis.