After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 777255 - fontconfig update racy, causes update storm that freezes the desktop
fontconfig update racy, causes update storm that freezes the desktop
Status: RESOLVED FIXED
Product: gnome-settings-daemon
Classification: Core
Component: xsettings
3.22.x
Other Linux
: Normal normal
: ---
Assigned To: gnome-settings-daemon-maint
gnome-settings-daemon-maint
Depends on:
Blocks:
 
 
Reported: 2017-01-14 20:10 UTC by Jan Alexander Steffens (heftig)
Modified: 2017-08-08 17:16 UTC
See Also:
GNOME target: ---
GNOME version: 3.21/3.22


Attachments
xsettings: Delay fontconfig update until changes have settled (22.86 KB, patch)
2017-08-03 06:48 UTC, Jan Alexander Steffens (heftig)
none Details | Review
xsettings: Delay fontconfig update until changes have settled (v2) (22.14 KB, patch)
2017-08-03 11:22 UTC, Jan Alexander Steffens (heftig)
accepted-commit_now Details | Review

Description Jan Alexander Steffens (heftig) 2017-01-14 20:10:49 UTC
I'm trying to debug my desktop freezing often when I update fonts, and I think I managed to reduce it to this:

1. Something changes ~/.local/share/fonts.

2. GSD notices and reinitializes fontconfig, which updates the cache.

3. GSD sends out an XSettings update.

2. Something changes ~/.local/share/fonts, again.

3. All other Gtk processes receive the XSettings update and reinitialize fontconfig.  This includes the shell.

4. The processes all discover that ~/.local/share/fonts has changed and attempt to update the cache. This needs a lock on the cache.

5. The processes wait on the lock, one by one rescanning the fonts in the dir.

Since the shell takes part, the desktop is frozen until the shell has finished its rebuild. If the dir contains a lot of fonts and is slow to scan, the desktop can be frozen for quite a while.

Running "touch abc; sleep 0.1; rm abc" in ~/.local/share/fonts quite reliably freezes the shell for a second. Doing this in /usr/share/fonts/noto/ (which contains the entire Noto set including CJK) freezes the desktop for over two and a half minutes.

Possibly related to bug #768717

I believe this is a regression from the removal of the timeout in bug #748776, or at least the timeout made this a lot unlikelier.
Comment 1 Jan Alexander Steffens (heftig) 2017-01-14 20:38:29 UTC
CC'ing Behdad in the hopes he takes a glance.
Comment 2 Bastien Nocera 2017-01-15 12:27:57 UTC
Likely a dupe of bug 773556, but there's no versioning information in the bug report to assert whether or not the patch is there.

Please test that patch.
Comment 3 Jan Alexander Steffens (heftig) 2017-01-15 17:36:05 UTC
Ah, sorry; g-s-d 3.22.1, GTK 3.22.6, fontconfig 2.12.1.

Bug #768717 sounds like a dupe of bug #773556.

Adding commit f1b8257 onto g-s-d 3.22.1 doesn't help.
Comment 4 Jan Alexander Steffens (heftig) 2017-03-07 08:13:12 UTC
Maybe fontconfig can at least mitigate the problem; posted another bug at https://bugs.freedesktop.org/show_bug.cgi?id=100096
Comment 5 Akira TAGOH 2017-07-30 09:39:07 UTC
Maybe disabling the updates of caches on just-in-use case may be needed and laeve updating caches to applications. so they can update anytime when they want to do.
Comment 6 Behdad Esfahbod 2017-08-01 09:58:46 UTC
Ok let's get to the bottom of this.

Originally there was a 2 second quiet period before gsd would update fontconfig cache and message apps.  I removed that in 2015:

  https://bugzilla.gnome.org/show_bug.cgi?id=748776

Apparently my comment below was not quite accurate?
"""
commit 578524da528e554fddfa6cf070554984c0ebfcfd
Author: Behdad Esfahbod <behdad@behdad.org>
Date:   Sun May 3 10:37:17 2015 -0400

    xsettings: Remove update timeout
    
    GIO's GFileMonitor already delays directory monitor notifications, so
    directory-level changes will already be coalesced even if there's a
    lot of changes happening.
    
    Monitoring files (which happens for the XML config files) are signalled
    immediately now.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=748776
"""

supposedly that was fixed in https://bugzilla.gnome.org/show_bug.cgi?id=773556

Do we know if the storm was happening with the fix for bug 773556 or without?  Is the wait for CHANGES_DONE good enough, or should we allow for more quiet period checking?

I'm not sure about the supposed "waiting" of apps while other apps are updating the cache.  I don't think there's any code to wait.  It might just be that the CPU is taken?  From what I understand, each process will try updating the cache, builds a new one in a temp file, and tries to move it to the final destination.  if that fails, it just moves on.

I'm going to follow up with my plan in:

  https://bugs.freedesktop.org/show_bug.cgi?id=64766#c47

to make scanning 10x faster.  That should help a lot...
Comment 7 Jan Alexander Steffens (heftig) 2017-08-01 10:54:58 UTC
(In reply to Behdad Esfahbod from comment #6)
> Apparently my comment below was not quite accurate?

Testing with PyGI and reading the GLocalFileMonitor code, it appears the coalescing only applies to CHANGED events, which get rate-limited. You also get uncoalesced CREATED, DELETED and ATTRIBUTE_CHANGED events. A CHANGES_DONE_HINT is emitted whenever a file is closed after writing, a file is CREATED or two seconds after the last CHANGED without a close. All this applies for every file individually, even if you monitor a directory.

I think the commit should be reverted. In addition, tif another event is received during the FcInitReinitialize, the timeout should be restarted. If the update takes a long time, this helps waiting for the directories to settle before notifying the applications.

In addition, Gtk and Clutter should avoid triggering a cache update after they get the timestamp update from GSD. Apparently this needs new API in fontconfig.
Comment 8 Behdad Esfahbod 2017-08-01 11:04:43 UTC
(In reply to Jan Alexander Steffens (heftig) from comment #7)
> (In reply to Behdad Esfahbod from comment #6)
> > Apparently my comment below was not quite accurate?
> 
> Testing with PyGI and reading the GLocalFileMonitor code, it appears the
> coalescing only applies to CHANGED events, which get rate-limited. You also
> get uncoalesced CREATED, DELETED and ATTRIBUTE_CHANGED events. A
> CHANGES_DONE_HINT is emitted whenever a file is closed after writing, a file
> is CREATED or two seconds after the last CHANGED without a close. All this
> applies for every file individually, even if you monitor a directory.
> 
> I think the commit should be reverted. In addition, tif another event is
> received during the FcInitReinitialize, the timeout should be restarted. If
> the update takes a long time, this helps waiting for the directories to
> settle before notifying the applications.

Agree with revert...


> In addition, Gtk and Clutter should avoid triggering a cache update after
> they get the timestamp update from GSD. Apparently this needs new API in
> fontconfig.

I think it's doable by calling FcConfigSetRescanInterval(..., 0) from Gtk+ / Pango somehow after receiving the notification and before recreating the pango fontmap...
Comment 9 Jan Alexander Steffens (heftig) 2017-08-01 11:05:32 UTC
(In reply to Behdad Esfahbod from comment #6)
> Do we know if the storm was happening with the fix for bug 773556 or without?

Yes, as I said in comment #3 I did try this patch. The experiments with GFileMonitor I just did (comment #7) also show that the kind of coalescing that happens is not the one we need and there are events other than CREATED and CHANGED that can trigger a premature update.

> I'm not sure about the supposed "waiting" of apps while other apps are updating the cache.

When it wants to update a cache file for a directory, fontconfig locks the cache file using fcntl F_SETLKW, blocking if the lock is already taken (FcDirCacheLock). The lock is taken before the directory is scanned and released afterwards (FcDirCacheRescan).
Comment 10 Behdad Esfahbod 2017-08-01 11:07:56 UTC
(In reply to Jan Alexander Steffens (heftig) from comment #9)
> (In reply to Behdad Esfahbod from comment #6)
> > Do we know if the storm was happening with the fix for bug 773556 or without?
> 
> Yes, as I said in comment #3 I did try this patch. The experiments with
> GFileMonitor I just did (comment #7) also show that the kind of coalescing
> that happens is not the one we need and there are events other than CREATED
> and CHANGED that can trigger a premature update.
> 
> > I'm not sure about the supposed "waiting" of apps while other apps are updating the cache.
> 
> When it wants to update a cache file for a directory, fontconfig locks the
> cache file using fcntl F_SETLKW, blocking if the lock is already taken
> (FcDirCacheLock). The lock is taken before the directory is scanned and
> released afterwards (FcDirCacheRescan).

Ouch.  I don't know if the blocking was intended... Definitely not what I thought we are doing...  I don't think we should block.
Comment 11 Behdad Esfahbod 2017-08-01 11:10:53 UTC
Yeah, all of this is 2015 code.  We didn't have it before...
Comment 12 Behdad Esfahbod 2017-08-01 11:26:00 UTC
Jan, given that you have a good setup, can you propose a revert of the timer and adjust it as you see fit, with filtering events, etc?

I'll let Akira comment re the blocking lock.  I don't think it should be part of fontconfig.
Comment 13 Jan Alexander Steffens (heftig) 2017-08-01 12:09:29 UTC
Sure, I'll give it a shot.

I'm tempted to do the fontconfig update using g_task_run_in_thread, but I don't know if this is safe. My gut says yes, since fontconfig apparently creates a new config first and then replaces the current config with the new one.
Comment 14 Jan Alexander Steffens (heftig) 2017-08-02 14:34:56 UTC
Async update with "cancelling" seems to work, as far as test-fontconfig-monitor can say. Now I still need to check if the patches actually improve the situation when applied to the real running GSD.

Do you prefer piecemeal patches (reverts + refactoring + changes) or a single squashed patch?
Comment 15 Jan Alexander Steffens (heftig) 2017-08-03 06:48:38 UTC
Created attachment 356817 [details] [review]
xsettings: Delay fontconfig update until changes have settled

Wait for the changes to the font directories to settle before starting
the fontconfig update. In addition, make the update asynchronous and
restart it if more changes arrive while the update runs, delaying the
notify further.

Try to avoid sending out a premature signal to Gtk and Clutter, which
would then detect the additional changes and proceed with blocking
updates themselves. This would even freeze the Shell.

Revert commits f1b8257a and 578524da, which were added in the mistaken
belief that GLocalFileMonitor already delays and coalesces change
events.

Use a GObject for the monitor handle in order to fix a leak
(fontconfig_monitor_stop did not free the slice) and use refcounting to
keep the handle alive over the async operation.
Comment 16 Behdad Esfahbod 2017-08-03 10:35:45 UTC
Thanks Jan.

Any particular reason you renamed the file and API?  Makes code review harder.  Also, I do maintain fontconfig-monitor as a separate repo:

  http://github.com/behdad/fontconfig-monitor

Maybe we can pull it in using git submodules, but maybe that's overkill.  I'm thinking about moving it into fontconfig tree itself...
Comment 17 Jan Alexander Steffens (heftig) 2017-08-03 10:42:58 UTC
Ah, sorry. I renamed it because the GObject macros want the PREFIX_TYPENAME schema and my OCD abhors inconsistency. Do you want it adjusted again?
Comment 18 Behdad Esfahbod 2017-08-03 10:45:25 UTC
I'd be ok if you rename it to FcMonitor fc-monitor.  We can put it in fontconfig itself later.

Looks great otherwise!
Comment 19 Behdad Esfahbod 2017-08-03 10:45:56 UTC
fontconfig-monitor.c:125:52: warning: unused parameter ‘self’ [-Wunused-parameter]
 gsd_fontconfig_monitor_init (GsdFontconfigMonitor *self)
                                                    ^

Mark it G_UNUSED?
Comment 20 Behdad Esfahbod 2017-08-03 10:47:36 UTC
Can you also add a --verbose to main() to print the debug messages?
Comment 21 Jan Alexander Steffens (heftig) 2017-08-03 11:22:51 UTC
Created attachment 356841 [details] [review]
xsettings: Delay fontconfig update until changes have settled (v2)

- Rename GsdFontconfigMonitor to FcMonitor
- Mark fc_monitor_init argument unused
- Don't restart monitors if we were stopped during the async update
Comment 22 Jan Alexander Steffens (heftig) 2017-08-03 11:25:09 UTC
You can print the debug messages via env G_MESSAGES_DEBUG=all . Should I still add the option parsing?
Comment 23 Behdad Esfahbod 2017-08-03 11:25:59 UTC
(In reply to Jan Alexander Steffens (heftig) from comment #22)
> You can print the debug messages via env G_MESSAGES_DEBUG=all . Should I
> still add the option parsing?

Thanks.  I'll add later.
Comment 24 Behdad Esfahbod 2017-08-03 11:26:41 UTC
Bastien, this looks good to me.
Comment 25 Bastien Nocera 2017-08-08 13:53:57 UTC
(In reply to Behdad Esfahbod from comment #24)
> Bastien, this looks good to me.

I don't maintain this code anymore, sorry.
Comment 26 Rui Matos 2017-08-08 13:59:51 UTC
Review of attachment 356841 [details] [review]:

looks good to me too
Comment 27 Behdad Esfahbod 2017-08-08 17:12:21 UTC
Humm.  the patch doesn't seem to contain fc-monitor.h changes.  Can you attach a new one?  I'll commit after.  Thanks.
Comment 28 Behdad Esfahbod 2017-08-08 17:12:36 UTC
Ah, nevermind.  Figured it out.
Comment 29 Behdad Esfahbod 2017-08-08 17:16:55 UTC
Pushed.  Let's see if I broke the build...
Marking fixed.  I'm also working on fontconfig upstream to make scanning 10 times faster as well as fixing the cache lock:

  https://lists.freedesktop.org/archives/fontconfig/2017-August/005986.html

Between this change and those, we should be in really good shape.