GNOME Bugzilla – Bug 687362
[metabug] avoid blocking the compositor thread
Last modified: 2021-07-05 14:17:44 UTC
[I'm going to treat this as a metabug and file clones for concrete bugs, if that's OK?]
The compositor (which currently lives in the main thread of GNOME Shell) is a single point of failure for rendering applications: if its thread becomes blocked in a syscall, nothing can be output.
One concrete symptom which I think can be attributed to this, for instance:
* Boot a machine with GNOME 3 (I used a mixture of 3.4 and 3.6.1 from Debian experimental, including Shell 3.6.1)
* Log in
* Wait for I/O to cease
* Play a video in a small windowed Totem
* Open the Shell calendar menu by clicking on the time
* Observe the video in the window apparently freezing for a moment
I have some strace traces for concrete situations in which I/O is clearly happening in the main thread, with the longest being 1606ms while opening the overview for the first time (although that time is exaggerated by adding access()-based trace points and running under strace - it wouldn't normally have taken that long).
I should point out that I have a set of local patches that reduce the number of synchronous constructions of DBus proxies. They're currently blocked on the new GDBus bindings for gjs.
(In reply to comment #1)
> I should point out that I have a set of local patches that reduce the number of
> synchronous constructions of DBus proxies. They're currently blocked on the new
> GDBus bindings for gjs.
Sounds useful. Is the thing they're blocked on Bug #669350? Do you have a work-in-progress branch in git somewhere, or a bug, representing that? At some point I'm going to grep the strace output for obvious synchronousness (including construction of GDBus proxies), ignoring timing, to look for things that'll be easy to fix - it'd be great if I could compare with your branch and cross off some of those as "already being dealt with". My approach to this bug at the moment is that I'm going to list both the biggest pauses in descending order (#687364 is the worst of these) and the most obvious instances of blocking, then work on fixing whichever look as though they'll have the best work:result ratio.
Ok, I rebased and pushed wip/gdbus-2 for gnome-shell and gjs, which addresses most GDBus proxies + NetworkManager objects (using dbus-glib).
Excuse me, does that mean that if the Shell core or one of its extensions does something SLOW (stuck in a syscall), then it doesn't harm window manager/compositor responsiveness anymore? Just for non-specialists...
(In reply to comment #4)
> does that mean that if the Shell core or one of its extensions does
> something SLOW (stuck in a syscall), then it doesn't harm window
> manager/compositor responsiveness anymore?
No. If Shell core or an extension does something slow and synchronous in the compositor thread, the compositor becomes unresponsive. That's a fact that cannot be changed. This (meta)bug is about identifying particular slow things that take place in the compositor thread, and avoiding them - which could mean moving them to another thread or process, making them asynchronous, making them faster, breaking up a large task into smaller sub-tasks that can be distributed between several frames, or whatever.
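To illustrate the last option mentioned above (breaking a large task into smaller sub-tasks spread over several frames), here is a minimal sketch in plain JavaScript. It is not code from gnome-shell. The names runInSlices, budgetMs, schedule, now and onDone are hypothetical, and schedule stands in for whatever "call me back when idle" hook the main loop offers (in gjs that might be a thin wrapper around GLib.idle_add, which is an assumption and is not exercised here).

```javascript
// Process `items` in small slices so that no single main-loop callback
// runs much longer than `budgetMs`. Hypothetical helper, for illustration only.
//
// options.schedule(cb): must arrange for cb() to be called later from the
//   main loop (assumption: e.g. a wrapper around an idle source in gjs,
//   or cb => setTimeout(cb, 0) where setTimeout exists).
// options.now(): returns the current time in milliseconds.
// options.onDone(slices): called once, with the number of slices used.
function runInSlices(items, processItem, options) {
    const { budgetMs = 4, schedule, now = Date.now, onDone = () => {} } = options;
    let index = 0;
    let slices = 0;

    function slice() {
        const deadline = now() + budgetMs;
        slices++;

        // Always handle at least one item per slice so progress is guaranteed,
        // then keep going until the list is exhausted or the budget is spent.
        do {
            processItem(items[index]);
            index++;
        } while (index < items.length && now() < deadline);

        if (index < items.length)
            schedule(slice);    // yield to the main loop, continue afterwards
        else
            onDone(slices);
    }

    if (items.length === 0)
        onDone(0);
    else
        schedule(slice);        // never do the work in the caller's stack frame
}
```

This only helps when each individual item is cheap: a single processItem call that blocks in a syscall still stalls the thread for as long as it takes, so slow I/O would still need to be made asynchronous or moved elsewhere.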
Did anyone consider pushing extensions out of the compositor thread? Sorry for the lame question...
In our current threading model, JavaScript can only be run in one thread.
(In reply to comment #7)
> In our current threading model, JavaScript can only be run in one thread.
Can't find a bug on this. IMHO it's better to fix the root of the problem, i.e. push extensions to another thread after having JS run multithreaded.
(In reply to comment #8)
> IMHO it's better to fix the root of the problem, i.e. push extensions to
> another thread after having JS run multithreaded.
Our current code is not thread-safe, nor would requiring extensions to be thread-safe make any sense. Even without extensions, there's a lot of stuff we can do to not block the compositor thread.
Smells like a big regression to me (compared to GNOME 2). Sorry for the off-topic.
You're free to try it if you want.
I believe bug #686502 belongs in the blockers here as well
(In reply to comment #9)
> (In reply to comment #8)
> > IMHO it's better to fix the root of the problem, i.e. push extensions to
> > another thread after having JS run multithreaded.
>
> Our current code is not thread-safe, nor would requiring extensions to be
> thread-safe make any sense.
>
> Even without extensions, there's a lot of stuff we can do to not block the
> compositor thread.
I wonder why the compositor and the panel share one process. I remember that in GNOME 2, metacity and gnome-panel were two different processes, so that even if one blocked, the other still worked. Also, I know that JS cannot currently run multi-threaded; however, why not push all the JS activity to another thread? I have never read GNOME's source code but I believe that it is possible.
The problem can also be seen when attempting to run GLib.spawn_command_line_sync or GLib.spawn_command_line_async. E.g. if these commands are executed every second while playing a video, you can see how the video stops for a few milliseconds. In my opinion it is essential to add multiprocessing to the desktop in order to solve these problems.
Spawning something async is multiprocessing; it should not and does not freeze the desktop here. Spawning something synchronously will block the main thread, yes, but we don't do that anywhere in gnome-shell.
I don't know if GLib.spawn runs in a separate thread, but even if it does, the time it takes to prepare the context seems to be too high. At least that is what happens in Cinnamon (not really tested in GNOME Shell, but it should be the same). I am testing right now a proxy using GDBus to execute commands "delayed" in some auxiliary process. Maybe it proves a good idea; I'll have to try. Thanks...
If you have an extension or something like that which uses GLib.spawn_sync or similar in the gnome-shell process, that's your fault. Use spawn_async. I cannot notice any lag here.
(In reply to comment #17)
> If you have an extension or something like that uses GLib.spawn_sync or similar
> in the gnome-shell process, that's your fault. Use spawn_async. I cannot notice
> any lag here.
Wow, what a solid position! Then you must confess this bug is a [regression], because GNOME 2 had very good isolation of the main processes from the applets. GNOME 3 is very easily offended by extensions.
Suppose that a single instruction takes p time to run and I run q instructions; then the total execution time is p×q. If p×q is relatively high because I do many things (and I take the greatest care not to use any instruction that GNOME 3.0 considers "malicious"), then I have an increasingly slow desktop. What is the conclusion? Do I need to buy a faster processor? I believe that in GNOME 3.0 we could have a collection of processes, properly separated into independent threads, so that an important process is not left blocked merely by the execution of something as trivial as a spawnCommandLine. Now I have an example of how I can separate the process using GDBus, but should we have to resort to this? Wouldn't it be better to create a better distribution of all processes? Mmm, sorry, my mistake: I should say a distribution of all processes, because currently there is only one task for the entire desktop. Anything would be better than that, I think. Greetings.
Correction: a better processor will not solve anything, because modern processors have the same frequency and only have more cores. That is the requirement for running modern desktops. GNOME Shell is not included among them, not because it is not modern, but because of the number of processes that it has. My example of GDBus: https://github.com/lestcape/Drives-Manager-Temp/tree/master/drivesManager%40lestcape/daemon
Can we keep the off-topic ramblings to a minimum please?
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/ Thank you for your understanding and your help.