GNOME Bugzilla – Bug 788922
Processes % CPU column is inaccurate (only ever an integer multiple of the number of CPUs)
Last modified: 2017-10-31 18:28:41 UTC
Processes % CPU column is inaccurate (only ever an integer multiple of the number of CPUs). So for example on my desktop with 8 CPU threads, the Processes tab only ever shows a process is using either 0%, 8%, 16%, 24%, 32%...
Hello Daniel, that's what the "Solaris mode" is all about. I'm not sure what the current status. The point is about displaying an overall % or a per-cpu %.
I understand the difference between overall and per-cpu % but it's still a bug :) I spent a decade developing monitoring software for Solaris and other Unixes and they never had this kind of precision problem. This seems to be just a simple math error. The precision is first limited to 1.0% and then multiplied by n-cpus afterwards.
Which libgtop version are you using ?
ii libgtop-2.0-11:amd64 2.38.0-1 amd64 gtop system monitoring library (shared) ii libgtop2-common 2.38.0-1 all gtop system monitoring library (common)
Can you please give the /proc/PID/stat and /proc/PID/cpu of a process consuming CPU, 2 times, waiting 10s between each collection ?
I'm busy with other work right now. But I'm also not bothered about this bug -- when I have time I'll just jump into the code and find the problem myself.
The code is based on /proc/PID/stat sum of 14th and 15th fields. On each refresh, it computes the difference to see how much was consumed by a process. If you're not using the solaris mode, this difference is multiplied by the number of cores. And then it's a percent of the total cpu time. So this would mean that between to reads of /proc/PID/stat, there's only a difference of 1, 2, 3, etc ticks... What kernel or ubuntu version are you running ?
It sounds to me (without looking at the source code) that the mistake is the multiplying by the number of cores. You should never need to do that for any mode. You don't need to multiply by number of cores if you know the machine's actual total CPU times across all cores (from /proc/stat), and delta that over the same interval. Then "Solaris" mode means to divide (not multiply) that result by NCPUs, as is documented in the top(1) man page. Normal per-process CPU usage as a percentage (with high fractional precision): delta(/proc/PID/stat fields 14+15) * 100 / delta(sum first line of /proc/stat) And Solaris mode is just that same value divided by NCPUs.
Sorry, that's wrong. It should be a multiplication still. Serves me right for working mentally. You've just lost some precision somewhere in the equation. It should be maintaining fractional precision all the way through and not get staggered at integer values of NCPUs. A simple math bug, which should be easy to track down.
Created attachment 362392 [details] [review] 0001-Fix-inaccurate-CPU-values-in-the-Processes-table.patch A fix.
Created attachment 362393 [details] [review] Fix-inaccurate-CPU-values-in-the-Processes-table-v2.patch A better fix.
Created attachment 362399 [details] [review] Fix-inaccurate-CPU-values-in-the-Processes-table-v3.patch An even better fix.
My point is that I'm not able to reproduce this in the first place.
It's easy. Just run gnome-system-monitor on a multi-core machine. Observe that %CPU values in the Processes table are truncated to multiples of the number of CPUs (actually the number of unique CPU threads). So on a typical laptop, that means you will only see %CPU as multiples of 2 or 4, despite the fact it should be possible to report precision to well below 1% CPU. The above patch makes %CPU accurate to the nearest 1% CPU, instead of N% CPU.
I don't see that on a 2 and 4 core computers. Your patch only improves a little bit the precision by doing the same ops in 1 step instead of 2. I'd still like to know what distribution you are running.
I'm running Ubuntu 17.10. This is a basic programming problem. The problem is that presently we have: (int)(A * 100 / B) * num_cpus The first part "(int)(A * 100 / B)" is an integer, so multiplying by num_cpus after that guarantees your result is always a multiple of num_cpus. That's not accurate to 1% unless num_cpus==1. The correct way to write it is: (int)(A * 100 * num_cpus / B) so that less information is lost. Example: A = 57 B = 1000 num_cpus = 4 then the old code would show: 20 the new code I propose would show: 22 which is more correct because the precise result is 22.8
You're right, sorry for wasting your time, I don't know why I didn't pay more attention. Let me test something and I'll commit that.
Review of attachment 362399 [details] [review]: Commit 9224515cd6b19ba2dc59afe14684749b4dbf0349
This problem has been fixed in our software repository. The fix will go into the next software release. Once that release is available, you may want to check for a software upgrade provided by your Linux distribution.