GNOME Bugzilla – Bug 792235
High CPU load after mounts changes
Last modified: 2018-02-01 08:20:57 UTC
If I cause 1000 mount_changes within 10 seconds, gsd-housekeeping is using more than 10% of CPU almost whole minute (even though that all those new mounts are ignored and even though that no other changes happening for rest of time). This can easily happen in enterprise environments, where autofs is heavily used. CPU usage with the following patches is less than 1%... 10% is not that bad, but this begins to be a problem when other GUnixMountsMonitor consumers (e.g. gvfsd-trash, gvfs-udisks2-volume-monitor...) do the same. I am preparing similar fixes also for them... See: https://bugzilla.redhat.com/show_bug.cgi?id=1154183
Created attachment 366366 [details] [review] housekeeping: Do nothing if mounts haven't changed since last time The housekeeping daemon can cause high CPU load for a quite long time in situations when it recieved a big amount of mount_changed signals within a short time. This can easily happen in enterprise environments, where nfs and bind mounts are often used together with autofs. Use g_unix_mounts_changed_since to check whether the mounts changed since the last check. This prevents high CPU load after the last real change, because the processing is slow and the daemon is not able to handle the signals in real time.
Created attachment 366367 [details] [review] housekeeping: Rate limit mounts_changed signal The housekeeping daemon can cause high CPU load in situations when it receives a big amount of mounts_changed signals within a short time. The processing is slow and the daemon is not able to handle the signals in real time. Let's limit the number of checks per time to reduce CPU load.
Created attachment 366368 [details] [review] housekeeping: Do not use g_list_length Do not use g_list_length when the number of elements can be easily counted in the previous for-cycle.
Philip, don't you have any comments to this?
Review of attachment 366366 [details] [review]: Looks OK to me. The old code would never have worked, because time_read was always NULL.
Review of attachment 366367 [details] [review]: ::: plugins/housekeeping/gsd-disk-space.c @@ +926,3 @@ + else if (time_read > 0) + ldsm_mounts_changed_id = g_timeout_add_seconds (MOUNTS_CHANGED_RATE_SECONDS, + ldsm_mounts_changed_cb, NULL); Surely if you’re going to defer the work to ldsm_mounts_changed_cb(), you should return early here? Otherwise you call ldsm_check_all_mounts() once now *and* once in the timeout callback. (I assume ldsm_check_all_mounts() is what’s being slow?)
Review of attachment 366368 [details] [review]: Looks OK. It might be possible to merge the second loop (the one after `multiple_volumes` is set) into the first, since they seem to be independent.
(In reply to Ondrej Holy from comment #4) > Philip, don't you have any comments to this? Aside from the trivial third patch, I can’t ack the other two, since I’m not a g-s-d maintainer (and don’t want to be!). I’ve left some review comments though.
Philip, thanks, sure, let's wait for g-s-d maint, I just wanted to be sure that you don't have a better idea what to do, e.g. on the GLib side... (In reply to Philip Withnall from comment #6) > Review of attachment 366367 [details] [review] [review]: > > ::: plugins/housekeeping/gsd-disk-space.c > @@ +926,3 @@ > + else if (time_read > 0) > + ldsm_mounts_changed_id = g_timeout_add_seconds > (MOUNTS_CHANGED_RATE_SECONDS, > + > ldsm_mounts_changed_cb, NULL); > > Surely if you’re going to defer the work to ldsm_mounts_changed_cb(), you > should return early here? Otherwise you call ldsm_check_all_mounts() once > now *and* once in the timeout callback. I want to call it now and the callback will return on g_unix_mounts_changed_since check, though it would be safer to not rely on it... > (I assume ldsm_check_all_mounts() is what’s being slow?) Yep
Created attachment 366384 [details] [review] housekeeping: Rate limit mounts_changed signal Let's add pending flag and do not rely on g_unix_mounts_changed_since check for sure...
Review of attachment 366384 [details] [review]: ::: plugins/housekeeping/gsd-disk-space.c @@ +931,3 @@ + + ldsm_mounts_changed_id = g_timeout_add_seconds (MOUNTS_CHANGED_RATE_SECONDS, + ldsm_mounts_changed_cb, NULL); Isn’t this going to result in an infinite loop of 5s timeout callbacks? Each time ldsm_mounts_changed() is called by ldsm_mounts_changed_cb(), we’ll have `ldsm_mounts_changed_id == 0 && !ldsm_mounts_changed_pending`, which are the conditions for another timeout to be scheduled.
It is expected that another timeout is scheduled in this case, but ldsm_mounts_changed won't be called again from ldsm_mounts_changed_cb if !ldsm_mounts_changed_pending...
Review of attachment 366366 [details] [review]: Hmm, after some tests I've realized that this is not actually needed for master. It seems that current monitor implementations do not flood the program by the big amount of signals like in RHEL 6 where mount_changed signal seems strickly emitted for each change of mtab file... however timestamp returned in RHEL 6 has precision by seconds, which is not enough and we can miss some changes due to it, so this can't be used for older versions anyway.
(In reply to Ondrej Holy from comment #13) > Review of attachment 366366 [details] [review] [review]: > > timestamp returned in RHEL 6 has precision by seconds, which is not enough > and we can miss some changes due to it, so this can't be used for older > versions anyway. Is it safe to use on master? What’s the precision there?
I suppose that this should be safe on master. The timestamp seems has microsec/nanosec precision there depends on concrete implementation, but maybe it also depends on concrete kernel/filesystem... :-( Finally, I have found what is the difference between RHEL 6 and master (i.e. Fedora) and actually, the attachment 366366 [details] [review] still makes sense, when g_file_monitor is used to generate "mounts_changed" (i.e. when /etc/mtab is used). However, /proc/? are used on almost all modern distributions and GIOChannel is used for monitoring, which has the nice side-effect, that no "mounts_changed" events are generated if the application is busy... It probably makes sense to emit all events for file changes when we are busy, respectively it can be also probably reduced to just one change of each type, but it doesn't make sense to emit "mounts_changed" for each file change since the "mount_changed" signal doesn't carry any additional info like list of mounts, or what was added/removed... g_idle_add seems like a possible alternative to skip the accumulated signals for apps where rate limiting using g_timeout_add_seconds isn't really suitable... but it would be best to fix it in GLib directly and not in each app separately. Or is there a better way to merge/skip all the accumulated signals of a certain type? I am not really an expert on main loop and signals... :-/ Maybe GIOChannel could be also used to monitor /etc/mtab file if used, but not sure that it would be backportable to RHEL 6 though...
(In reply to Ondrej Holy from comment #15) > Or is there a better way to merge/skip all the accumulated > signals of a certain type? I am not really an expert on main loop and > signals... :-/ Nope, you have to do it manually, because GLib can’t know about the semantics of the signals to be able to merge/squash them. > Maybe GIOChannel could be also used to monitor /etc/mtab file if used, but > not sure that it would be backportable to RHEL 6 though... Would that work? We can use GIOChannel to monitor /proc because the files there are marked as POLLPRI whenever they change. I’m not sure you can do that for a normal file (/etc/mtab).
See Bug(In reply to Philip Withnall from comment #16) > (In reply to Ondrej Holy from comment #15) > > Or is there a better way to merge/skip all the accumulated > > signals of a certain type? I am not really an expert on main loop and > > signals... :-/ > > Nope, you have to do it manually, because GLib can’t know about the > semantics of the signals to be able to merge/squash them. See Bug 793006 as an alternative for attachment 366366 [details] [review]. > > Maybe GIOChannel could be also used to monitor /etc/mtab file if used, but > > not sure that it would be backportable to RHEL 6 though... > > Would that work? We can use GIOChannel to monitor /proc because the files > there are marked as POLLPRI whenever they change. I’m not sure you can do > that for a normal file (/etc/mtab). Then probably not...
OK, but #793006 has been merged (thanks for that). The g_list_length() change here is still to be committed, and then I think this bug can be closed?
*bug #793006
Comment on attachment 366368 [details] [review] housekeeping: Do not use g_list_length Attachment 366368 [details] pushed as b004381 - housekeeping: Do not use g_list_length
Actually, attachment 366384 [details] [review] is still useful to reduce the CPU load. gsd-housekeeping isn't anything critical, so even bigger timeout could be used, but bug 793006 was crucial... so let's close it for now and will reopen if needed.
Philip, thanks for the assistance!