After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 788922 - Processes % CPU column is inaccurate (only ever an integer multiple of the number of CPUs)
Processes % CPU column is inaccurate (only ever an integer multiple of the nu...
Status: RESOLVED FIXED
Product: system-monitor
Classification: Core
Component: process list
3.26.x
Other Linux
: Normal normal
: ---
Assigned To: System-monitor maintainers
System-monitor maintainers
Depends on:
Blocks:
 
 
Reported: 2017-10-13 09:27 UTC by Daniel van Vugt
Modified: 2017-10-31 18:28 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
0001-Fix-inaccurate-CPU-values-in-the-Processes-table.patch (3.56 KB, patch)
2017-10-27 09:46 UTC, Daniel van Vugt
none Details | Review
Fix-inaccurate-CPU-values-in-the-Processes-table-v2.patch (1.69 KB, patch)
2017-10-27 10:00 UTC, Daniel van Vugt
none Details | Review
Fix-inaccurate-CPU-values-in-the-Processes-table-v3.patch (1.45 KB, patch)
2017-10-27 11:17 UTC, Daniel van Vugt
committed Details | Review

Description Daniel van Vugt 2017-10-13 09:27:23 UTC
Processes % CPU column is inaccurate (only ever an integer multiple of the number of CPUs).

So for example on my desktop with 8 CPU threads, the Processes tab only ever shows a process is using either 0%, 8%, 16%, 24%, 32%...
Comment 1 Benoît Dejean 2017-10-14 15:27:03 UTC
Hello Daniel,

that's what the "Solaris mode" is all about. I'm not sure what the current status.
The point is about displaying an overall % or a per-cpu %.
Comment 2 Daniel van Vugt 2017-10-16 01:35:06 UTC
I understand the difference between overall and per-cpu % but it's still a bug :)

I spent a decade developing monitoring software for Solaris and other Unixes and they never had this kind of precision problem. This seems to be just a simple math error. The precision is first limited to 1.0% and then multiplied by n-cpus afterwards.
Comment 3 Benoît Dejean 2017-10-16 10:50:38 UTC
Which libgtop version are you using ?
Comment 4 Daniel van Vugt 2017-10-17 01:34:49 UTC
ii  libgtop-2.0-11:amd64                       2.38.0-1                                    amd64        gtop system monitoring library (shared)
ii  libgtop2-common                            2.38.0-1                                    all          gtop system monitoring library (common)
Comment 5 Benoît Dejean 2017-10-26 08:43:31 UTC
Can you please give the /proc/PID/stat and /proc/PID/cpu of a process consuming CPU, 2 times, waiting 10s between each collection ?
Comment 6 Daniel van Vugt 2017-10-26 08:55:21 UTC
I'm busy with other work right now. But I'm also not bothered about this bug -- when I have time I'll just jump into the code and find the problem myself.
Comment 7 Benoît Dejean 2017-10-26 16:48:05 UTC
The code is based on /proc/PID/stat sum of 14th and 15th fields.
On each refresh, it computes the difference to see how much was consumed by a process. If you're not using the solaris mode, this difference is multiplied by the number of cores. And then it's a percent of the total cpu time.

So this would mean that between to reads of /proc/PID/stat, there's only a difference of 1, 2, 3, etc ticks...

What kernel or ubuntu version are you running ?
Comment 8 Daniel van Vugt 2017-10-27 09:13:34 UTC
It sounds to me (without looking at the source code) that the mistake is the multiplying by the number of cores. You should never need to do that for any mode.

You don't need to multiply by number of cores if you know the machine's actual total CPU times across all cores (from /proc/stat), and delta that over the same interval. Then "Solaris" mode means to divide (not multiply) that result by NCPUs, as is documented in the top(1) man page.

Normal per-process CPU usage as a percentage (with high fractional precision):

delta(/proc/PID/stat fields 14+15) * 100 / delta(sum first line of /proc/stat)

And Solaris mode is just that same value divided by NCPUs.
Comment 9 Daniel van Vugt 2017-10-27 09:21:38 UTC
Sorry, that's wrong. It should be a multiplication still. Serves me right for working mentally.

You've just lost some precision somewhere in the equation. It should be maintaining fractional precision all the way through and not get staggered at integer values of NCPUs. A simple math bug, which should be easy to track down.
Comment 10 Daniel van Vugt 2017-10-27 09:46:00 UTC
Created attachment 362392 [details] [review]
0001-Fix-inaccurate-CPU-values-in-the-Processes-table.patch

A fix.
Comment 11 Daniel van Vugt 2017-10-27 10:00:40 UTC
Created attachment 362393 [details] [review]
Fix-inaccurate-CPU-values-in-the-Processes-table-v2.patch

A better fix.
Comment 12 Daniel van Vugt 2017-10-27 11:17:39 UTC
Created attachment 362399 [details] [review]
Fix-inaccurate-CPU-values-in-the-Processes-table-v3.patch

An even better fix.
Comment 13 Benoît Dejean 2017-10-27 13:04:16 UTC
My point is that I'm not able to reproduce this in the first place.
Comment 14 Daniel van Vugt 2017-10-30 02:30:02 UTC
It's easy. Just run gnome-system-monitor on a multi-core machine.

Observe that %CPU values in the Processes table are truncated to multiples of the number of CPUs (actually the number of unique CPU threads). So on a typical laptop, that means you will only see %CPU as multiples of 2 or 4, despite the fact it should be possible to report precision to well below 1% CPU.

The above patch makes %CPU accurate to the nearest 1% CPU, instead of N% CPU.
Comment 15 Benoît Dejean 2017-10-30 07:49:04 UTC
I don't see that on a 2 and 4 core computers.
Your patch only improves a little bit the precision by doing the same ops in 1 step instead of 2. I'd still like to know what distribution you are running.
Comment 16 Daniel van Vugt 2017-10-30 08:01:03 UTC
I'm running Ubuntu 17.10.

This is a basic programming problem.

The problem is that presently we have:
  (int)(A * 100 / B) * num_cpus

The first part "(int)(A * 100 / B)" is an integer, so multiplying by num_cpus after that guarantees your result is always a multiple of num_cpus. That's not accurate to 1% unless num_cpus==1.

The correct way to write it is:
  (int)(A * 100 * num_cpus / B)
so that less information is lost.

Example:
  A = 57
  B = 1000
  num_cpus = 4
then the old code would show: 20
the new code I propose would show: 22
which is more correct because the precise result is 22.8
Comment 17 Benoît Dejean 2017-10-30 15:53:29 UTC
You're right, sorry for wasting your time, I don't know why I didn't pay more attention. Let me test something and I'll commit that.
Comment 18 Benoît Dejean 2017-10-31 18:28:27 UTC
Review of attachment 362399 [details] [review]:

Commit 9224515cd6b19ba2dc59afe14684749b4dbf0349
Comment 19 Benoît Dejean 2017-10-31 18:28:41 UTC
This problem has been fixed in our software repository. The fix will go into the next software release. Once that release is available, you may want to check for a software upgrade provided by your Linux distribution.