GNOME Bugzilla – Bug 777255
fontconfig update racy, causes update storm that freezes the desktop
Last modified: 2017-08-08 17:16:55 UTC
I'm trying to debug my desktop freezing often when I update fonts, and I think I managed to reduce it to this: 1. Something changes ~/.local/share/fonts. 2. GSD notices and reinitializes fontconfig, which updates the cache. 3. GSD sends out an XSettings update. 2. Something changes ~/.local/share/fonts, again. 3. All other Gtk processes receive the XSettings update and reinitialize fontconfig. This includes the shell. 4. The processes all discover that ~/.local/share/fonts has changed and attempt to update the cache. This needs a lock on the cache. 5. The processes wait on the lock, one by one rescanning the fonts in the dir. Since the shell takes part, the desktop is frozen until the shell has finished its rebuild. If the dir contains a lot of fonts and is slow to scan, the desktop can be frozen for quite a while. Running "touch abc; sleep 0.1; rm abc" in ~/.local/share/fonts quite reliably freezes the shell for a second. Doing this in /usr/share/fonts/noto/ (which contains the entire Noto set including CJK) freezes the desktop for over two and a half minutes. Possibly related to bug #768717 I believe this is a regression from the removal of the timeout in bug #748776, or at least the timeout made this a lot unlikelier.
CC'ing Behdad in the hopes he takes a glance.
Likely a dupe of bug 773556, but there's no versioning information in the bug report to assert whether or not the patch is there. Please test that patch.
Ah, sorry; g-s-d 3.22.1, GTK 3.22.6, fontconfig 2.12.1. Bug #768717 sounds like a dupe of bug #773556. Adding commit f1b8257 onto g-s-d 3.22.1 doesn't help.
Maybe fontconfig can at least mitigate the problem; posted another bug at https://bugs.freedesktop.org/show_bug.cgi?id=100096
Maybe disabling the updates of caches on just-in-use case may be needed and laeve updating caches to applications. so they can update anytime when they want to do.
Ok let's get to the bottom of this. Originally there was a 2 second quiet period before gsd would update fontconfig cache and message apps. I removed that in 2015: https://bugzilla.gnome.org/show_bug.cgi?id=748776 Apparently my comment below was not quite accurate? """ commit 578524da528e554fddfa6cf070554984c0ebfcfd Author: Behdad Esfahbod <behdad@behdad.org> Date: Sun May 3 10:37:17 2015 -0400 xsettings: Remove update timeout GIO's GFileMonitor already delays directory monitor notifications, so directory-level changes will already be coalesced even if there's a lot of changes happening. Monitoring files (which happens for the XML config files) are signalled immediately now. https://bugzilla.gnome.org/show_bug.cgi?id=748776 """ supposedly that was fixed in https://bugzilla.gnome.org/show_bug.cgi?id=773556 Do we know if the storm was happening with the fix for bug 773556 or without? Is the wait for CHANGES_DONE good enough, or should we allow for more quiet period checking? I'm not sure about the supposed "waiting" of apps while other apps are updating the cache. I don't think there's any code to wait. It might just be that the CPU is taken? From what I understand, each process will try updating the cache, builds a new one in a temp file, and tries to move it to the final destination. if that fails, it just moves on. I'm going to follow up with my plan in: https://bugs.freedesktop.org/show_bug.cgi?id=64766#c47 to make scanning 10x faster. That should help a lot...
(In reply to Behdad Esfahbod from comment #6) > Apparently my comment below was not quite accurate? Testing with PyGI and reading the GLocalFileMonitor code, it appears the coalescing only applies to CHANGED events, which get rate-limited. You also get uncoalesced CREATED, DELETED and ATTRIBUTE_CHANGED events. A CHANGES_DONE_HINT is emitted whenever a file is closed after writing, a file is CREATED or two seconds after the last CHANGED without a close. All this applies for every file individually, even if you monitor a directory. I think the commit should be reverted. In addition, tif another event is received during the FcInitReinitialize, the timeout should be restarted. If the update takes a long time, this helps waiting for the directories to settle before notifying the applications. In addition, Gtk and Clutter should avoid triggering a cache update after they get the timestamp update from GSD. Apparently this needs new API in fontconfig.
(In reply to Jan Alexander Steffens (heftig) from comment #7) > (In reply to Behdad Esfahbod from comment #6) > > Apparently my comment below was not quite accurate? > > Testing with PyGI and reading the GLocalFileMonitor code, it appears the > coalescing only applies to CHANGED events, which get rate-limited. You also > get uncoalesced CREATED, DELETED and ATTRIBUTE_CHANGED events. A > CHANGES_DONE_HINT is emitted whenever a file is closed after writing, a file > is CREATED or two seconds after the last CHANGED without a close. All this > applies for every file individually, even if you monitor a directory. > > I think the commit should be reverted. In addition, tif another event is > received during the FcInitReinitialize, the timeout should be restarted. If > the update takes a long time, this helps waiting for the directories to > settle before notifying the applications. Agree with revert... > In addition, Gtk and Clutter should avoid triggering a cache update after > they get the timestamp update from GSD. Apparently this needs new API in > fontconfig. I think it's doable by calling FcConfigSetRescanInterval(..., 0) from Gtk+ / Pango somehow after receiving the notification and before recreating the pango fontmap...
(In reply to Behdad Esfahbod from comment #6) > Do we know if the storm was happening with the fix for bug 773556 or without? Yes, as I said in comment #3 I did try this patch. The experiments with GFileMonitor I just did (comment #7) also show that the kind of coalescing that happens is not the one we need and there are events other than CREATED and CHANGED that can trigger a premature update. > I'm not sure about the supposed "waiting" of apps while other apps are updating the cache. When it wants to update a cache file for a directory, fontconfig locks the cache file using fcntl F_SETLKW, blocking if the lock is already taken (FcDirCacheLock). The lock is taken before the directory is scanned and released afterwards (FcDirCacheRescan).
(In reply to Jan Alexander Steffens (heftig) from comment #9) > (In reply to Behdad Esfahbod from comment #6) > > Do we know if the storm was happening with the fix for bug 773556 or without? > > Yes, as I said in comment #3 I did try this patch. The experiments with > GFileMonitor I just did (comment #7) also show that the kind of coalescing > that happens is not the one we need and there are events other than CREATED > and CHANGED that can trigger a premature update. > > > I'm not sure about the supposed "waiting" of apps while other apps are updating the cache. > > When it wants to update a cache file for a directory, fontconfig locks the > cache file using fcntl F_SETLKW, blocking if the lock is already taken > (FcDirCacheLock). The lock is taken before the directory is scanned and > released afterwards (FcDirCacheRescan). Ouch. I don't know if the blocking was intended... Definitely not what I thought we are doing... I don't think we should block.
Yeah, all of this is 2015 code. We didn't have it before...
Jan, given that you have a good setup, can you propose a revert of the timer and adjust it as you see fit, with filtering events, etc? I'll let Akira comment re the blocking lock. I don't think it should be part of fontconfig.
Sure, I'll give it a shot. I'm tempted to do the fontconfig update using g_task_run_in_thread, but I don't know if this is safe. My gut says yes, since fontconfig apparently creates a new config first and then replaces the current config with the new one.
Async update with "cancelling" seems to work, as far as test-fontconfig-monitor can say. Now I still need to check if the patches actually improve the situation when applied to the real running GSD. Do you prefer piecemeal patches (reverts + refactoring + changes) or a single squashed patch?
Created attachment 356817 [details] [review] xsettings: Delay fontconfig update until changes have settled Wait for the changes to the font directories to settle before starting the fontconfig update. In addition, make the update asynchronous and restart it if more changes arrive while the update runs, delaying the notify further. Try to avoid sending out a premature signal to Gtk and Clutter, which would then detect the additional changes and proceed with blocking updates themselves. This would even freeze the Shell. Revert commits f1b8257a and 578524da, which were added in the mistaken belief that GLocalFileMonitor already delays and coalesces change events. Use a GObject for the monitor handle in order to fix a leak (fontconfig_monitor_stop did not free the slice) and use refcounting to keep the handle alive over the async operation.
Thanks Jan. Any particular reason you renamed the file and API? Makes code review harder. Also, I do maintain fontconfig-monitor as a separate repo: http://github.com/behdad/fontconfig-monitor Maybe we can pull it in using git submodules, but maybe that's overkill. I'm thinking about moving it into fontconfig tree itself...
Ah, sorry. I renamed it because the GObject macros want the PREFIX_TYPENAME schema and my OCD abhors inconsistency. Do you want it adjusted again?
I'd be ok if you rename it to FcMonitor fc-monitor. We can put it in fontconfig itself later. Looks great otherwise!
fontconfig-monitor.c:125:52: warning: unused parameter ‘self’ [-Wunused-parameter] gsd_fontconfig_monitor_init (GsdFontconfigMonitor *self) ^ Mark it G_UNUSED?
Can you also add a --verbose to main() to print the debug messages?
Created attachment 356841 [details] [review] xsettings: Delay fontconfig update until changes have settled (v2) - Rename GsdFontconfigMonitor to FcMonitor - Mark fc_monitor_init argument unused - Don't restart monitors if we were stopped during the async update
You can print the debug messages via env G_MESSAGES_DEBUG=all . Should I still add the option parsing?
(In reply to Jan Alexander Steffens (heftig) from comment #22) > You can print the debug messages via env G_MESSAGES_DEBUG=all . Should I > still add the option parsing? Thanks. I'll add later.
Bastien, this looks good to me.
(In reply to Behdad Esfahbod from comment #24) > Bastien, this looks good to me. I don't maintain this code anymore, sorry.
Review of attachment 356841 [details] [review]: looks good to me too
Humm. the patch doesn't seem to contain fc-monitor.h changes. Can you attach a new one? I'll commit after. Thanks.
Ah, nevermind. Figured it out.
Pushed. Let's see if I broke the build... Marking fixed. I'm also working on fontconfig upstream to make scanning 10 times faster as well as fixing the cache lock: https://lists.freedesktop.org/archives/fontconfig/2017-August/005986.html Between this change and those, we should be in really good shape.