Bug 792235 – High CPU load after mounts changes

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 792235 - High CPU load after mounts changes


Summary:	High CPU load after mounts changes


Status:	RESOLVED FIXED

Product:	gnome-settings-daemon
Classification:	Core
Component:	housekeeping
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Richard Hughes
QA Contact:	gnome-settings-daemon-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2018-01-05 10:13 UTC by Ondrej Holy
Modified:	2018-02-01 08:20 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
housekeeping: Do nothing if mounts haven't changed since last time (3.22 KB, patch) 2018-01-05 10:15 UTC, Ondrej Holy	rejected	Details \| Review
housekeeping: Rate limit mounts_changed signal (3.08 KB, patch) 2018-01-05 10:15 UTC, Ondrej Holy	none	Details \| Review
housekeeping: Do not use g_list_length (1.37 KB, patch) 2018-01-05 10:15 UTC, Ondrej Holy	committed	Details \| Review
housekeeping: Rate limit mounts_changed signal (3.26 KB, patch) 2018-01-05 15:05 UTC, Ondrej Holy	reviewed	Details \| Review

Description Ondrej Holy 2018-01-05 10:13:51 UTC

If I cause 1000 mount_changes within 10 seconds, gsd-housekeeping is using more than 10% of CPU almost whole minute (even though that all those new mounts are ignored and even though that no other changes happening for rest of time). This can easily happen in enterprise environments, where autofs is heavily used. CPU usage with the following patches is less than 1%...

10% is not that bad, but this begins to be a problem when other GUnixMountsMonitor consumers (e.g. gvfsd-trash, gvfs-udisks2-volume-monitor...) do the same. I am preparing similar fixes also for them...

See:
https://bugzilla.redhat.com/show_bug.cgi?id=1154183

Comment 1 Ondrej Holy 2018-01-05 10:15:05 UTC

Created attachment 366366 [details] [review]
housekeeping: Do nothing if mounts haven't changed since last time

The housekeeping daemon can cause high CPU load for a quite long time
in situations when it recieved a big amount of mount_changed signals
within a short time. This can easily happen in enterprise environments,
where nfs and bind mounts are often used together with autofs. Use
g_unix_mounts_changed_since to check whether the mounts changed since
the last check. This prevents high CPU load after the last real change,
because the processing is slow and the daemon is not able to handle the
signals in real time.

Comment 2 Ondrej Holy 2018-01-05 10:15:10 UTC

Created attachment 366367 [details] [review]
housekeeping: Rate limit mounts_changed signal

The housekeeping daemon can cause high CPU load in situations when it
receives a big amount of mounts_changed signals within a short time.
The processing is slow and the daemon is not able to handle the signals
in real time. Let's limit the number of checks per time to reduce CPU
load.

Comment 3 Ondrej Holy 2018-01-05 10:15:16 UTC

Created attachment 366368 [details] [review]
housekeeping: Do not use g_list_length

Do not use g_list_length when the number of elements can be easily
counted in the previous for-cycle.

Comment 4 Ondrej Holy 2018-01-05 10:16:54 UTC

Philip, don't you have any comments to this?

Comment 5 Philip Withnall 2018-01-05 12:42:37 UTC

Review of attachment 366366 [details] [review]:

Looks OK to me. The old code would never have worked, because time_read was always NULL.

Comment 6 Philip Withnall 2018-01-05 12:47:33 UTC

Review of attachment 366367 [details] [review]:

::: plugins/housekeeping/gsd-disk-space.c
@@ +926,3 @@
+        else if (time_read > 0)
+                ldsm_mounts_changed_id = g_timeout_add_seconds (MOUNTS_CHANGED_RATE_SECONDS,
+                                                                ldsm_mounts_changed_cb, NULL);

Surely if you’re going to defer the work to ldsm_mounts_changed_cb(), you should return early here? Otherwise you call ldsm_check_all_mounts() once now *and* once in the timeout callback.

(I assume ldsm_check_all_mounts() is what’s being slow?)

Comment 7 Philip Withnall 2018-01-05 12:50:50 UTC

Review of attachment 366368 [details] [review]:

Looks OK. It might be possible to merge the second loop (the one after `multiple_volumes` is set) into the first, since they seem to be independent.

Comment 8 Philip Withnall 2018-01-05 12:51:33 UTC

(In reply to Ondrej Holy from comment #4)
> Philip, don't you have any comments to this?

Aside from the trivial third patch, I can’t ack the other two, since I’m not a g-s-d maintainer (and don’t want to be!). I’ve left some review comments though.

Comment 9 Ondrej Holy 2018-01-05 15:03:15 UTC

Philip, thanks, sure, let's wait for g-s-d maint, I just wanted to be sure that you don't have a better idea what to do, e.g. on the GLib side...

(In reply to Philip Withnall from comment #6)
> Review of attachment 366367 [details] [review] [review]:
> 
> ::: plugins/housekeeping/gsd-disk-space.c
> @@ +926,3 @@
> +        else if (time_read > 0)
> +                ldsm_mounts_changed_id = g_timeout_add_seconds
> (MOUNTS_CHANGED_RATE_SECONDS,
> +                                                               
> ldsm_mounts_changed_cb, NULL);
> 
> Surely if you’re going to defer the work to ldsm_mounts_changed_cb(), you
> should return early here? Otherwise you call ldsm_check_all_mounts() once
> now *and* once in the timeout callback.

I want to call it now and the callback will return on g_unix_mounts_changed_since check, though it would be safer to not rely on it...

> (I assume ldsm_check_all_mounts() is what’s being slow?)

Yep

Comment 10 Ondrej Holy 2018-01-05 15:05:07 UTC

Created attachment 366384 [details] [review]
housekeeping: Rate limit mounts_changed signal

Let's add pending flag and do not rely on g_unix_mounts_changed_since check for sure...

Comment 11 Philip Withnall 2018-01-08 10:53:52 UTC

Review of attachment 366384 [details] [review]:

::: plugins/housekeeping/gsd-disk-space.c
@@ +931,3 @@
+
+        ldsm_mounts_changed_id = g_timeout_add_seconds (MOUNTS_CHANGED_RATE_SECONDS,
+                                                        ldsm_mounts_changed_cb, NULL);

Isn’t this going to result in an infinite loop of 5s timeout callbacks?  Each time ldsm_mounts_changed() is called by ldsm_mounts_changed_cb(), we’ll have `ldsm_mounts_changed_id == 0 && !ldsm_mounts_changed_pending`, which are the conditions for another timeout to be scheduled.

Comment 12 Ondrej Holy 2018-01-08 12:04:15 UTC

It is expected that another timeout is scheduled in this case, but ldsm_mounts_changed won't be called again from ldsm_mounts_changed_cb if !ldsm_mounts_changed_pending...

Comment 13 Ondrej Holy 2018-01-15 13:44:05 UTC

Review of attachment 366366 [details] [review]:

Hmm, after some tests I've realized that this is not actually needed for master. It seems that current monitor implementations do not flood the program by the big amount of signals like in RHEL 6 where mount_changed signal seems strickly emitted for each change of mtab file... however timestamp returned in RHEL 6 has precision by seconds, which is not enough and we can miss some changes due to it, so this can't be used for older versions anyway.

Comment 14 Philip Withnall 2018-01-15 13:50:08 UTC

(In reply to Ondrej Holy from comment #13)
> Review of attachment 366366 [details] [review] [review]:
> 
> timestamp returned in RHEL 6 has precision by seconds, which is not enough
> and we can miss some changes due to it, so this can't be used for older
> versions anyway.

Is it safe to use on master? What’s the precision there?

Comment 15 Ondrej Holy 2018-01-15 15:59:51 UTC

I suppose that this should be safe on master. The timestamp seems has microsec/nanosec precision there depends on concrete implementation, but maybe it also depends on concrete kernel/filesystem... :-(

Finally, I have found what is the difference between RHEL 6 and master (i.e. Fedora) and actually, the attachment 366366 [details] [review] still makes sense, when g_file_monitor is used to generate "mounts_changed" (i.e. when /etc/mtab is used). However, /proc/? are used on almost all modern distributions and GIOChannel is used for monitoring, which has the nice side-effect, that no "mounts_changed" events are generated if the application is busy...

It probably makes sense to emit all events for file changes when we are busy, respectively it can be also probably reduced to just one change of each type, but it doesn't make sense to emit "mounts_changed" for each file change since the "mount_changed" signal doesn't carry any additional info like list of mounts, or what was added/removed... 

g_idle_add seems like a possible alternative to skip the accumulated signals for apps where rate limiting using g_timeout_add_seconds isn't really suitable... but it would be best to fix it in GLib directly and not in each app separately. Or is there a better way to merge/skip all the accumulated signals of a certain type? I am not really an expert on main loop and signals... :-/

Maybe GIOChannel could be also used to monitor /etc/mtab file if used, but not sure that it would be backportable to RHEL 6 though...

Comment 16 Philip Withnall 2018-01-16 16:11:55 UTC

(In reply to Ondrej Holy from comment #15)
> Or is there a better way to merge/skip all the accumulated
> signals of a certain type? I am not really an expert on main loop and
> signals... :-/

Nope, you have to do it manually, because GLib can’t know about the semantics of the signals to be able to merge/squash them.

> Maybe GIOChannel could be also used to monitor /etc/mtab file if used, but
> not sure that it would be backportable to RHEL 6 though...

Would that work? We can use GIOChannel to monitor /proc because the files there are marked as POLLPRI whenever they change. I’m not sure you can do that for a normal file (/etc/mtab).

Comment 17 Ondrej Holy 2018-01-29 15:00:24 UTC

See Bug(In reply to Philip Withnall from comment #16)
> (In reply to Ondrej Holy from comment #15)
> > Or is there a better way to merge/skip all the accumulated
> > signals of a certain type? I am not really an expert on main loop and
> > signals... :-/
> 
> Nope, you have to do it manually, because GLib can’t know about the
> semantics of the signals to be able to merge/squash them.

See Bug 793006 as an alternative for attachment 366366 [details] [review].

> > Maybe GIOChannel could be also used to monitor /etc/mtab file if used, but
> > not sure that it would be backportable to RHEL 6 though...
> 
> Would that work? We can use GIOChannel to monitor /proc because the files
> there are marked as POLLPRI whenever they change. I’m not sure you can do
> that for a normal file (/etc/mtab).

Then probably not...

Comment 18 Philip Withnall 2018-01-31 18:36:26 UTC

OK, but #793006 has been merged (thanks for that). The g_list_length() change here is still to be committed, and then I think this bug can be closed?

Comment 19 Philip Withnall 2018-01-31 18:39:13 UTC

*bug #793006

Comment 20 Ondrej Holy 2018-02-01 08:11:47 UTC

Comment on attachment 366368 [details] [review]
housekeeping: Do not use g_list_length

Attachment 366368 [details] pushed as b004381 - housekeeping: Do not use g_list_length

Comment 21 Ondrej Holy 2018-02-01 08:19:50 UTC

Actually, attachment 366384 [details] [review] is still useful to reduce the CPU load. gsd-housekeeping isn't anything critical, so even bigger timeout could be used, but bug 793006 was crucial... so let's close it for now and will reopen if needed.

Comment 22 Ondrej Holy 2018-02-01 08:20:57 UTC

Philip, thanks for the assistance!